There is a wide range of application contexts and configuration opportunities for jadice server. We will therefore only address the most common scenarios here. They can serve as useful illustrations and as inspiration for your own implementations.
Most of these scenarios follow this pattern:
- Receiving a file from the client
- Processing it on the server side
- Returning the result to the client

This simple pattern matches the simplicity of the examples. In practice, the results are usually not returned immediately but processed further in cascaded steps. Furthermore, the client is not necessarily the source and target of the data streams. Instead, a central file, mail, or archive server or other directories can fulfill this function.
The Job is first created on the client side and then attached to 1…n Nodes.
Jobs can be configured with these settings:
- JobListener implementation, for details see the section called “Creating a JobListener”
- TimeLimit and similar Limits, for details see the section called “Configuring Limits”
- Workflow (= acyclic graph, defined by interlinked Nodes)
- Type: this denominator marks Jobs of a similar kind. Among other purposes, this piece of information is used for statistics on the server side.
In order to create a job, a JMSJobFactory has to be generated and initialized first. This JMSJobFactory establishes a connection to the messaging system and manages all further communication between the client application and jadice server. The method createServerJob() from Example 6.1, “Initializing the JobFactory and Creating a Job (with ActiveMQ as message broker)” forms the basis for the following examples:
Example 6.1. Initializing the JobFactory and Creating a Job (with ActiveMQ as message broker)
public Job createServerJob() throws JobException {
  if (jobFactory == null) {
    // Create a job factory with the parameter "brokerUrl" and the default JMS queue name
    jobFactory = new JMSJobFactory(
        new ActiveMQConnectionFactory("tcp://<Broker_IP>:<Broker-Port>"),
        JMSJobFactory.DEFAULT_QUEUE_NAME);

    // Provide connection credentials (optional)
    jobFactory.setCredentials(new Credentials("my-jms-user", "my-jms-password"));

    // Connect to the messaging system
    jobFactory.connect();
  }

  // Create a job for jadice server
  Job job = jobFactory.createJob();
  return job;
}
Once the JobFactory has been initialized correctly, it is used to create all subsequent conversion jobs.
Example 6.2. Configuring and Executing a Job
// Create a job
try (Job job = createServerJob()) {
  // Apply a timeout limit:
  job.apply(new TimeLimit(60, TimeUnit.SECONDS));

  // Declare the job type
  job.setType("first-example");

  // Attach a JobListener (see below)
  job.addJobListener(…);

  // Assemble the workflow (see below)
  job.attach(…);

  // Perform the job
  job.submit();
}
When all jobs created by this JobFactory have completed and no further jobs are to be created, the connection to the messaging system must be closed. Otherwise, connections are not released and a resource leak can occur.
Example 6.3. Shutting Down the JobFactory at the End of its Life Cycle
public void disconnect() {
  if (jobFactory != null) {
    jobFactory.close();
    jobFactory = null;
  }
}
With the help of a JobListener, Job statuses and error messages from the server can be processed.
Example 6.4. Example of a JobListener Implementation
public class MyJobListener implements JobListener {

  public void stateChanged(Job job, State oldState, State newState) {
    dump("stateChanged", job, oldState, newState, null, null);
  }

  public void executionFailed(Job job, Node node, String messageId, String reason, Throwable cause) {
    dump("executionFailed", job, node, messageId, reason, cause);
  }

  public void errorOccurred(Job job, Node node, String messageId, String message, Throwable cause) {
    dump("errorOccurred", job, node, messageId, message, cause);
  }

  public void warningOccurred(Job job, Node node, String messageId, String message, Throwable cause) {
    dump("warningOccurred", job, node, messageId, message, cause);
  }

  public void subPipelineCreated(Job job, Node parent, Set<? extends Node> createdNodes) {
    dump("subPipelineCreated", job, parent);
  }

  private void dump(String ctx, Job job, Object... args) {
    System.err.println("Context: " + ctx);
    System.err.println("Job: " + job.toString());
    if (args == null) {
      return;
    }
    for (Object arg : args) {
      // String concatenation is null-safe here (some callbacks pass null arguments)
      System.err.println("  " + arg);
    }
  }
}
jadice server offers two implementations of this interface which can be applied in the integration:
- TraceListener: forwards error messages to the client log via Apache Commons Logging.
- JobListenerAdapter: empty default implementation of the JobListener interface. Classes derived from it only have to override the desired methods.
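If only a few of these callbacks are of interest, deriving from the JobListenerAdapter keeps the listener short. The following minimal sketch (the class name is chosen here for illustration) overrides only errorOccurred and can be registered via job.addJobListener(…) as shown in Example 6.2:

public class ErrorOnlyListener extends JobListenerAdapter {
  @Override
  public void errorOccurred(Job job, Node node, String messageId, String message, Throwable cause) {
    // React to server-side errors only; all other callbacks keep the empty default implementation
    System.err.println("Error in job " + job + " at node " + node + ": " + message);
  }
}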
In order to curb the use of resources when processing a Job or its nodes, applying Limits to them is advisable. The following Limits are applicable:
Table 6.1. Available Limits

Type of Limit | Description | Context: Job | Context: Node
---|---|---|---
TimeLimit | Maximum processing time | ☑ | ☑
StreamCountLimit | Maximum number of streams | ☐ | ☑
StreamSizeLimit | Maximum size of the streams | ☐ | ☑
PageCountLimit | Maximum number of pages of a generated document | ☐ | ☑[a]
NodeCountLimit | Maximum number of nodes that a job may consist of | ☑ | ☒

[a] This applies to nodes generating documents that know pagination, compare javadoc
Table 6.2. Explanation of Table 6.1, “Available Limits”

Symbol | Meaning
---|---
☑ | is directly considered at this particular point
☒ | is not considered at this particular point
☐ | is not considered but passed on to the nodes (see below)
You can define what is to happen when a Limit is exceeded.

Example 6.5. Examples for the Use of Limits
TimeLimit tl = new TimeLimit(60, TimeUnit.SECONDS);
tl.setExceedAction(WhenExceedAction.ABORT); // default action

NodeCountLimit ncl = new NodeCountLimit(20);
ncl.setExceedAction(WhenExceedAction.WARN);
With Limit.WhenExceedAction ABORT the whole job is aborted. With Limit.WhenExceedAction WARN the client receives a warning.
Because the workflow is defined on the client side, not all nodes may be known in advance, and it may not be sensible to attach limits to every node. Limits can therefore be attached to a job and are then inherited by the respective nodes. The following rules apply to this inheritance (a short sketch follows the list):
- Limits with Limit.WhenExceedAction WARN are inherited in any event.
- Limits with Limit.WhenExceedAction ABORT are not inherited by nodes on which a limit of the same class with Limit.WhenExceedAction ABORT has already been applied, even if that limit is less restrictive.
- If Limits of the same class with Limit.WhenExceedAction ABORT are set both on the client side and at the security interface, the more restrictive limits take precedence, see the section called “Restrictions”.
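The following sketch illustrates these rules using only the job-level apply(…) call shown in Example 6.2; the concrete limit values are chosen arbitrarily for illustration:

// Limits applied to the job itself
TimeLimit jobTime = new TimeLimit(120, TimeUnit.SECONDS); // ABORT is the default action
NodeCountLimit nodeCount = new NodeCountLimit(20);
nodeCount.setExceedAction(WhenExceedAction.WARN);

job.apply(jobTime);
job.apply(nodeCount);

// The WARN limit is inherited by every node in any event.
// The ABORT TimeLimit is only inherited by nodes that do not already carry
// an ABORT limit of the same class; even a less restrictive one blocks inheritance.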
jadice server offers powerful modules to detect unknown file formats. These modules are used when automatically converting unknown files or e-mails (see the section called “Converting Unknown Entry Data into a Consistent Format (PDF)” and the section called “Converting E-Mails to PDF”).
Moreover, these modules can be activated by the StreamAnalysisNode and thus used for your own purposes.
Example 6.6. Using the StreamAnalysisNode
try (Job job = createServerJob()) { job.setType("run stream analysis"); // Instantiate nodes: // 1. data input node StreamInputNode siNode = new StreamInputNode(); // 2. analysis node StreamAnalysisNode saNode = new StreamAnalysisNode(); // 3. output node StreamOutputNode soNode = new StreamOutputNode(); // Assemble the workflow job.attach(siNode.appendSuccessor(saNode).appendSuccessor(soNode)); // Perform the job and send data job.submit(); siNode.addStream(…); siNode.complete(); // Wait for server reply for (Stream stream : soNode.getStreamBundle()) { // Reading the meta data final StreamDescriptor descr = stream.getDescriptor(); final String mimeType = descr.getMimeType(); } }
Note

The method getStreamBundle() blocks until jadice server has finished the job. For asynchronous processing, a StreamListener can be implemented instead.
The JadiceDocumentInfoNode analyzes a document by extracting the meta data that becomes available when the document is loaded with the jadice document platform. This information is fed into the StreamDescriptor and passed on to the next node as an IDocumentInfo. The analyzed document's format has to be supported by jadice document platform 5. In the simplest example, this information is passed on directly to the client and printed to the console there with the help of the NotificationNode.
Example 6.7. Using JadiceDocumentInfoNode
try (Job job = createServerJob()) { job.setType("retrieve document info"); // Instantiate info node JadiceDocumentInfoNode infoNode = new JadiceDocumentInfoNode(); // Create a listener and attach it to a NotificatioNode DocumentInfoListener documentInfoListener = new DocumentInfoListener(); NotificationNode notifyNode = new NotificationNode(); notifyNode.addNotificationResultListener(documentInfoListener); // Assemble the workflow StreamInputNode siNode = new StreamInputNode(); siNode.appendSuccessor(infoNode); infoNode.appendSuccessor(notifyNode); // Discard the data at the end of the analysis: notifyNode.appendSuccessor(new NullNode()); // Perform the job job.attach(siNode); job.submit(); // Submit the data to jadice server and end transmission siNode.addStream(…); siNode.complete(); // Wait for server reply (see above) documentInfoListener.waitForDocumentInfo(); // Retrieve and dump document info: IDocumentInfo documentInfo = documentInfoListener.getDocumentInfo(); System.out.println("Number of pages : " + documentInfo.getPageCount()); // As example here: Details of the first page System.out.println("format : " + documentInfo.getFormat(0)); System.out.println("size (pixels) : " + documentInfo.getSize(0).getWidth() + "x" + documentInfo.getSize(0).getHeight()); System.out.println("resolution (dpi): " + documentInfo.getVerticalResolution(0) + "x" + documentInfo.getHorizontalResolution(0)); }
Example 6.8. The NotificationNode.NotificationListener Used in Example 6.7, “Using JadiceDocumentInfoNode”
public class DocumentInfoListener implements NotificationListener {

  /**
   * DocumentInfo will be generated by the JadiceDocumentInfoNode and attached to the StreamDescriptor
   */
  private IDocumentInfo documentInfo;

  /**
   * Latch in order to block the current thread until {@link #documentInfo} is available.
   * NOTE: This example does not perform any error handling if the job aborts or no result is available!
   */
  private CountDownLatch latch = new CountDownLatch(1);

  @Override
  public void notificationReceived(StreamDescriptor streamDescriptor) {
    final Serializable prop = streamDescriptor.getProperties().get(JadiceDocumentInfoNode.PROPERTY_NAME);
    if (prop instanceof IDocumentInfo) {
      documentInfo = (IDocumentInfo) prop;
      latch.countDown();
    }
  }

  public void waitForDocumentInfo() throws InterruptedException {
    // Block until documentInfo is available
    latch.await();
  }

  public IDocumentInfo getDocumentInfo() {
    return documentInfo;
  }
}
It is possible to merge several PDF documents into one document with the PDFMergeNode.
Example 6.9. Using the PDFMergeNode
try (Job job = createServerJob()) { job.setType("merge pdfs"); // Instantiate nodes: // 1. data input node StreamInputNode siNode = new StreamInputNode(); // 2. merge input data (1...n streams to a single stream) PDFMergeNode pmNode = new PDFMergeNode(); // 3. output node StreamOutputNode soNode = new StreamOutputNode(); // Assemble the workflow and perform the job job.attach(siNode.appendSuccessor(pmNode).appendSuccessor(soNode)); job.submit(); // Send PDF documents siNode.addStream(…); siNode.addStream(…); // ... possible further PDF documents // Signalise the end of input data siNode.complete(); // Wait for server reply for (Stream stream : soNode.getStreamBundle()) { // Reading the data InputStream is = stream.getInputStream(); // Work with this data (not shown) … } }
Most conversion processes create PDF files (e.g. via LibreOffice). However, by inserting the ReshapeNode it is possible to convert the result further to the TIFF format. The following example shows the change to the workflow from Example 6.9, “Using the PDFMergeNode”. Instead of the PDFMergeNode, a conversion to TIFF with subsequent aggregation is attached:
Example 6.10. Converting to TIFF
// (...)
ReshapeNode reshapeNode = new ReshapeNode();
reshapeNode.setTargetMimeType("image/tiff");

// Join all incoming data into one resulting stream
reshapeNode.setOutputMode(ReshapeNode.OutputMode.JOINED);

// Assemble the workflow and include the TIFF converter node
job.attach(siNode.
    appendSuccessor(reshapeNode).
    appendSuccessor(soNode));

// (...)
}
In order to display documents with their annotations in standard software, these annotations have to be anchored in the source format. You can use the ReshapeNode for this anchoring. The following example shows the necessary association of the document's and the annotations' data streams:
Example 6.11. Permanent Anchoring of Annotations
/**
 * Stub interface in order to bundle a document and its annotations
 */
interface DocumentAndAnnotations {
  InputStream getContent();
  List<InputStream> getAnnotations();
}

public void convert(DocumentAndAnnotations doc) throws JMSException, JobException, IOException {
  try (Job job = createServerJob()) {
    job.setType("imprint annotations");

    // Instantiate nodes:
    StreamInputNode inputNode = new StreamInputNode();
    ReshapeNode reshapeNode = new ReshapeNode();
    StreamOutputNode outputNode = new StreamOutputNode();

    // Define the target MIME type (e.g. PDF)
    reshapeNode.setTargetMimeType("application/pdf");
    // Associate the annotation streams with the content
    reshapeNode.setOutputMode(ReshapeNode.OutputMode.ASSOCIATED_STREAM);

    // Assemble the workflow and perform the job
    job.attach(inputNode.appendSuccessor(reshapeNode).appendSuccessor(outputNode));
    job.submit();

    // Send the document content (with explicitly declared MIME type here)
    final StreamDescriptor contentSD = new StreamDescriptor("application/pdf");
    inputNode.addStream(doc.getContent(), contentSD);

    // Process annotations:
    for (InputStream annoStream : doc.getAnnotations()) {
      StreamDescriptor annoSD = new StreamDescriptor();
      // Associate document and annotation:
      annoSD.setParent(contentSD);
      // Declare the annotations' MIME type (e.g. Filenet P8):
      annoSD.setMimeType(ReshapeNode.AnnotationMimeTypes.FILENET_P8);
      // Send the annotation stream
      inputNode.addStream(annoStream, annoSD);
    }

    // Signal the end of the input data
    inputNode.complete();

    // Handle the job result (not shown)
    …
  }
}
Two settings are essential for this configuration:
- The data streams of the document and the annotations have to be connected via the StreamDescriptor hierarchy, which means that the document's StreamDescriptor has to be set as the parent of the annotations' StreamDescriptors.
- There are pre-defined constants for the available annotation MIME types in the class ReshapeNode; one of these must be set in any case.
For further information on annotation formats and their properties see the
annotations manual of the jadice document platform 5.
Confidential File Content
Note that an analysis of the document's content does not occur at any time during processing. Contents which are concealed by annotations can still be contained in the target data stream (depending on the file format). Moreover, objectionable and confidential (meta) data may remain in the document.
In order to reduce network load, files are often packed. These compressed files can be unpacked by jadice server before further processing. Depending on the file format, this unpacking is realized in different node classes:
Table 6.3. Node Classes for Unpacking Archive Files
File format | Node class | Remarks
---|---|---
ZIP | UnZIPNode |
RAR | UnRARNode |
GZIP | UnGZIPNode |
TAR | UnTARNode |
The following code example shows this process using the UnZIPNode:
Example 6.12. Using UnZIPNode
try (Job job = createServerJob()) { job.setType("unpack zip"); // Instantiate nodes: // 1. data input node StreamInputNode siNode = new StreamInputNode(); // 2. unpacking of ZIP archives UnZIPNode unzipNode = new UnZIPNode(); // 3. output node StreamOutputNode soNode = new StreamOutputNode(); // Assemble the workflow job.attach(siNode.appendSuccessor(unzipNode).appendSuccessor(soNode)); // Perform the job job.submit(); // Send data siNode.addStream(…); // Signalise the end of input data siNode.complete(); // Wait for server reply for (Stream stream : soNode.getStreamBundle()) { // Reading the data: (1 stream per file in the archive) System.out.println("file name: " + stream.getDescriptor().getFileName()); InputStream is = stream.getInputStream(); // Work with this data (not shown) … } }
Document standardization is extremely useful, especially in long-term archiving. Standardization achieved by receiving the data stream, analyzing it automatically, processing it dynamically in a target-oriented way, and finally archiving it has many advantages:
The retrieving application does not need any knowledge of the source file and format. There is no danger of corrupted or malicious data or documents. Moreover, network traffic is minimized. Because of its structure jadice server can flexibly control the conversion result at any time.
Example 6.13. Converting Miscellaneous Data Streams to PDF
try (Job job = createServerJob()) { job.setType("convert to pdf"); // Instantiate nodes: // 1. data input node StreamInputNode siNode = new StreamInputNode(); // 2. node for dynamic data converserion DynamicPipelineNode dpNode = new DynamicPipelineNode(); dpNode.setRuleset(new URI("resource:/dynamic-pipeline-rules/default.xml")); // 3. merge input data (1...n streams to a single stream) PDFMergeNode pmNode = new PDFMergeNode(); // 4. output node StreamOutputNode soNode = new StreamOutputNode(); // Assemble the workflow job.attach(siNode.appendSuccessor(dpNode).appendSuccessor(pmNode).appendSuccessor(soNode)); // Perform the job and send data job.submit(); siNode.addStream(…); siNode.complete(); // Wait for server reply for (Stream stream : soNode.getStreamBundle()) { // Work with the result InputStream is = stream.getInputStream(); … } }
If you add your own implementation of a JobListener to this job, you can find out which further Nodes have been created dynamically by jadice server, with the help of the method subPipelineCreated.
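A minimal sketch of such a listener, based on the JobListenerAdapter described in the section called “Creating a JobListener” (the class name is chosen here for illustration); it merely logs the dynamically created nodes:

public class PipelineLoggingListener extends JobListenerAdapter {
  @Override
  public void subPipelineCreated(Job job, Node parent, Set<? extends Node> createdNodes) {
    // Log every node that the dynamic pipeline has added behind the given parent node
    for (Node node : createdNodes) {
      System.out.println("Dynamically created node: " + node);
    }
  }
}

It is registered with the job via job.addJobListener(new PipelineLoggingListener()) before submit().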
You can find the applied rules in the directory /server-config/dynamic-pipeline-rules. These XML-based rules can be adapted to your requirements. The XML schema, which can be found in the same directory, will help you.
Example 6.14. Accessing LibreOffice
try (Job job = createServerJob()) {
  job.setType("libreoffice to pdf");

  // Instantiate nodes:
  // 1. data input node
  StreamInputNode siNode = new StreamInputNode();
  // 2. Conversion via LibreOffice
  LibreOfficeConversionNode loNode = new LibreOfficeConversionNode();
  // 3. merge input data (1...n streams to a single stream)
  PDFMergeNode pmNode = new PDFMergeNode();
  // 4. output node
  StreamOutputNode soNode = new StreamOutputNode();

  // Assemble the workflow
  job.attach(siNode.appendSuccessor(loNode).appendSuccessor(pmNode).appendSuccessor(soNode));

  // Perform the job and send document data
  job.submit();
  siNode.addStream(…);
  siNode.complete();

  // Wait for server reply
  for (Stream stream : soNode.getStreamBundle()) {
    // Read the data
    InputStream is = stream.getInputStream();
    // Work with this data (not shown)
    …
  }
}
Note
In order to access LibreOffice the class path has to be set according to the parameters described in the section called “Configuring LibreOffice”.
Note
Documents in Word2007 format (file extension docx) have to be pre-processed by the StreamAnalysisNode before conversion (see the section called “Identifying Unknown Entry Data”).
When converting e-mails, the e-mail is fetched directly from the mail server. Thus, the respective access information has to be provided.
The process is similar to the dynamic conversion (see the section called “Converting Unknown Entry Data into a Consistent Format (PDF)”). The e-mail is analyzed, and all potential attachments such as office documents, pictures etc. are converted and summarized in a list that is appended to the e-mail's text.
During this process archive files are unpacked and their content is integrated in the conversion.
Example 6.15. Converting E-Mails Fetched Directly from the Server
try (Job job = createServerJob()) { job.setType("mail to pdf"); // Instantiate nodes: // 1. input node that retrieves the mail from a mail server JavamailInputNode jiNode = new JavamailInputNode(); // Configuration of the mail server jiNode.setStoreProtocol("<protocol>"); // POP3 or IMAP jiNode.setHostName("<server>"); jiNode.setUsername("<user>"); jiNode.setPassword("<password>"); jiNode.setFolderName("<e-mail folder>"); jiNode.setImapMessageUID(…); // 2. Perform the email conversion ScriptNode scNode = new ScriptNode(); scNode.setScript(new URI("resource:email-conversion/EmailConversion.groovy")); // 3. merge data (1...n streams to a single stream) PDFMergeNode pmNode = new PDFMergeNode(); // 4. output node StreamOutputNode soNode = new StreamOutputNode(); // Assemble the workflow and perform the job job.attach(jiNode.appendSuccessor(scNode).appendSuccessor(pmNode).appendSuccessor(soNode)); job.submit(); // Wait for server reply for (Stream stream : soNode.getStreamBundle()) { // Work with the result InputStream is = stream.getInputStream(); … } }
If you do not want to fetch e-mails with the JavamailInputNode via an IMAP or POP3 account but want to import them as an eml file, for example, you have to use the MessageRFC822Node in between, as it carries out the separation of e-mail header and body:
Example 6.16. Converting an eml File
Job job = createServerJob();

// Instantiate nodes:
// 1. input node that receives the mail from the client
StreamInputNode siNode = new StreamInputNode();
// 2. Separation of mail header and mail body
MessageRFC822Node msgNode = new MessageRFC822Node();
// 3. Perform the e-mail conversion
ScriptNode scNode = new ScriptNode();
scNode.setScript(new URI("resource:email-conversion/EmailConversion.groovy"));

// Further procedure as above
E-mails in MS Outlook format (msg files) can be converted into a format supported by jadice server with the TNEFNode, without launching MS Outlook, and are then ready for further conversion:
Example 6.17. Converting a msg File
Job job = createServerJob();

// Instantiate nodes:
// 1. input node that receives the mail from the client
StreamInputNode siNode = new StreamInputNode();
// 2. Pre-processing of MSG files
TNEFNode tnefNode = new TNEFNode();
tnefNode.setInputFormat(InputFormat.MSG);
// 3. Perform the e-mail conversion
ScriptNode scNode = new ScriptNode();
scNode.setScript(new URI("resource:email-conversion/EmailConversion.groovy"));

// Further procedure as above
Note
Note that the mail body of msg files is usually in Rich Text Format (rtf) and is thus converted to PDF via LibreOffice in the standard configuration.
In the configuration shown above, a separator page containing the meta data of the attachment is automatically generated for every file attachment. If you do not want these separator pages, you can deactivate them for all attachments with the following configuration of the ScriptNode:
scNode.getParameters().put("showAttachmentSeparators", false);
Another configuration option concerns the conversion of formatted e-mails. If an e-mail has been sent in both HTML and plain text format, the HTML part is converted by default. If the plain text part should be converted instead, the ScriptNode needs to be configured as follows:
scNode.getParameters().put("preferPlainTextBody", true);
Regardless of which part is chosen for conversion, the other one can additionally be attached to the e-mail. Thus, the converted e-mail can be displayed in both HTML and plain text format. In order to attach the other format, use the following configuration:
scNode.getParameters().put("showAllAlternativeBody", true);
The following setting prevents jadice server from loading images and other files from unknown sources which are referenced in e-mails:
scNode.getParameters().put("allowExternalHTTPResolution", false);
The parameter unhandledAttachmentAction controls the treatment of attachments whose format cannot be detected or is not targeted for conversion with jadice server:
scNode.getParameters().put("unhandledAttachmentAction", "failure");
The following values are accepted for this parameter:
Value | Meaning
---|---
warning | A warning is written into the log.
error | An error is written into the log (default value).
failure | The respective job aborts with an error.
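The ScriptNode parameters shown above can also be combined in a single configuration; the following snippet is merely an illustration with arbitrarily chosen values:

// Combined ScriptNode configuration (values chosen for illustration)
scNode.getParameters().put("showAttachmentSeparators", false);
scNode.getParameters().put("preferPlainTextBody", true);
scNode.getParameters().put("showAllAlternativeBody", true);
scNode.getParameters().put("allowExternalHTTPResolution", false);
scNode.getParameters().put("unhandledAttachmentAction", "failure");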
In order to indicate image files which are referenced in an e-mail but have not been converted, the following placeholders are inserted in the image files' places:
Placeholder | Meaning
---|---
(placeholder graphic) | The image was not loaded due to the setting allowExternalHTTPResolution.
(placeholder graphic) | The image file could not be loaded.
By default, if images within the HTML markup of an e-mail body are too large to fit onto a PDF page, either by width or by height, they are split onto multiple pages.
Note
In general, complex HTML markup is not intended for media that use a page layout. Hence, converting HTML content can be lossy, especially when content has to be split at page breaks. Moreover, HTML cannot always be forced into a maximum width, which is why pages are sometimes wider than the chosen format. For example, tables or complex layouts in HTML markup cannot always be scaled appropriately onto a specific page format.
Nevertheless, in order to still match the PDF's page format (if possible) when images in the HTML markup exceed the width or height of the page, the MailBodyCreatorNode can receive the following configuration options via setHtmlProcessingMode():
Table 6.4. Converting E-Mails: Scaling images in HTML markup to make them fit on a PDF page
Configuration Values | Description
---|---
 | Default setting. If an image is too wide or too high, it is split onto multiple pages.
 | Images that do not fit onto the page format are moved to the attachments part of the PDF document. A placeholder is inserted at the original image position.
 | Moves all images to the attachments part of the PDF document.
 | If possible, images are scaled so that they fit onto the PDF pages without increasing the page format. This cannot be guaranteed in all cases, since for complex HTML markup the conversion could otherwise be lossy.
PDF documents can become very large due to images embedded in the file. With the help
of our PDFImageOptimizationNode
you can reduce the file size of an existing PDF document
by decreasing the resolution of the embedded images. The resolution will be reduced according
to a DPI threshold value, which can be set as a parameter. The node checks for each
individual embedded image whether its resolution exceeds the threshold value. If it does,
the image will be replaced by a JPEG image whose resolution corresponds to the threshold
value. The image quality which is to be used in generating the JPEG can also be set as a
parameter (as a percentage value).
The size of the page on which the image is located plays an important role in determining
its resolution. Usually, all pages in PDF documents have the same size (such as A4). However,
individual page sizes can be set in PDF format, which may result in documents containing pages
of different sizes. For such documents the PDFImageOptimizationNode
provides the option of
setting a target page size for individual pages. If you choose not to specify a target page
size, the resolution will be calculated according to the page size used for the overall document.
Setting a target page size may be sensible, particularly if you are concerned about image quality when printing the document. By setting the target page size you may thus substantially reduce the overall document's size while retaining image quality.
Example 6.18. Using the PDFImageOptimizationNode
try (Job job = createServerJob()) { job.setType("optimize images"); // Instantiate nodes: // 1. data input node StreamInputNode siNode = new StreamInputNode(); // 2. optimize embedded images PDFImageOptimizationNode imgOptimizationNode = new PDFImageOptimizationNode(); // 3. set the image resolution threshold to 150 DPI (default: 300) imgOptimizationNode.setMaxResolution(150); // 4. set the JPEG image quality to 80 percent (default: 75) imgOptimizationNode.setJPEGQuality(0.8f); // 5. set the page size of the output device (optional) imgOptimizationNode.setTargetPageSize(PDFImageOptimizationNode.PageSize.A4); // 6. output node StreamOutputNode soNode = new StreamOutputNode(); // Assemble the workflow and perform the job job.attach(siNode.appendSuccessor(imgOptimizationNode).appendSuccessor(soNode)); job.submit(); // Send PDF document siNode.addStream(…); // Signalise the end of input data siNode.complete(); // Wait for server reply for (Stream stream : soNode.getStreamBundle()) { // Reading the data InputStream is = stream.getInputStream(); // Work with this data (not shown) … } }
The ExternalProcessCallNode makes accessing external software very easy. jadice server takes care of automatically converting incoming and outgoing data streams to temporary files and deleting them after the external software has processed the data. The only precondition for this operation is that the software can be invoked via the command line on the server.
Example 6.19. Using the ExternalProcessCallNode
try (Job job = createServerJob()) { job.setType("external my converter"); // Instantiate nodes: // 1. data input node StreamInputNode siNode = new StreamInputNode(); // 2. start an external process ExternalProcessCallNode epcNode = new ExternalProcessCallNode(); // Configuration: // - Program name (back slashes must be escaped!) epcNode.setProgramName("C:\\Programme\\MyConverter\\MyConverter.exe"); // - Command line parameters (jadice server will substitute ${infile} / ${outfile:pdf}) epcNode.setArguments("-s -a ${infile} /convert=${outfile:pdf}"); // 3. output node StreamOutputNode soNode = new StreamOutputNode(); // Assemble the workflow job.attach(siNode.appendSuccessor(epcNode).appendSuccessor(soNode)); // Submit job and send data job.submit(); StreamDescriptor sd = new StreamDescriptor(); // jadice server will use the name when it stores this file and passes it to the external program sd.setFileName("myfile.dat"); siNode.addStream(new BundledStream(…, sd)); siNode.complete(); // Wait for server reply for (Stream stream : soNode.getStreamBundle()) { // Work with this data (not shown) InputStream is = stream.getInputStream(); … } }
E-mails and archive files that are to be converted sometimes contain files that are irrelevant in the business context. To filter out such files, a rule-based filtering on the file name can be configured. These rules are defined in a server-side configuration file (UTF-8 encoding required). The following table gives a quick introduction to such filtering rules.
Table 6.5. Examples of rules to filter out files during extraction of archive file formats

Example | Explanation
---|---
**/CVS/* | Matches all files in CVS directories that can be located anywhere in the directory tree. Matches: CVS/Repository, org/apache/CVS/Entries, org/apache/jakarta/tools/ant/CVS/Entries. But not: org/apache/CVS/foo/bar/Entries, because the subdirectories "foo/" and "bar/" of the folder CVS don't match.
org/apache/jakarta/** | Matches all files in the org/apache/jakarta directory tree. Matches: org/apache/jakarta/tools/ant/docs/index.html, org/apache/jakarta/test.xml. But not: org/apache/xyz.java, because the subfolder "jakarta/" of "org/apache" is missing, hence this rule doesn't match.
org/apache/**/CVS/* | Matches all files in CVS directories that are located anywhere in the directory tree under org/apache. Matches: org/apache/CVS/Entries, org/apache/jakarta/tools/ant/CVS/Entries. But not: org/apache/CVS/foo/bar/Entries, because the subdirectories "foo/" and "bar/" of the folder CVS don't match.
**/test/** | Matches all files that have a test element in their path, including test as a filename.
More information about possible rules can be found at https://ant.apache.org/manual/dirtasks.html#patterns.
In order to apply a ruleset for an archive file format, a configuration for the extraction worker has to be added to the configuration file server-config/application/workers.xml.
This functionality is available for the following workers: UnZIPWorker, UnRARWorker, UnSevenZIPWorker and UnTARWorker.
Example 6.20. Configuring a worker for filtering out files of archive file formats (workers.xml)
<bean id="unzipFilterRulesBean"
      class="com.levigo.jadice.server.archive.worker.filter.AntPatternArchiveEntryFilter">
  <!-- The file unzipFilterRules.txt has to be provided in the folder <jadice-server>/server-config/custom/ -->
  <property name="antPatternFilterRulesURI" value="resource://custom/unzipFilterRules.txt" />
</bean>

<workers:worker class="com.levigo.jadice.server.archive.worker.UnZIPWorker">
  <property name="filters">
    <util:list>
      <bean class="com.levigo.jadice.server.archive.worker.filter.OSXFilter" />
      <ref bean="unzipFilterRulesBean"/>
    </util:list>
  </property>
</workers:worker>
The namespace xmlns:util="http://www.springframework.org/schema/util" and the schemaLocations http://www.springframework.org/schema/util as well as http://www.springframework.org/schema/util/spring-util-2.5.xsd have to be present in the workers.xml.