There is a wide range of application contexts and configuration opportunities for jadice server. We will therefore only address the most common scenarios here. They can serve as useful illustrations and as inspiration for your own implementations.
Most of these scenarios follow this pattern:
- Receiving a file from the client
- Processing it on the server side
- Returning the result to the client

This simple pattern matches the simplicity of the examples. In practice, the results are usually not returned immediately but processed further in cascaded steps. Furthermore, the client is not necessarily the source and target of the data streams. Instead, a central file, mail, or archive server or other directories can fulfill this function.
The Job is first created on the client side and then attached to 1…n Nodes.
Jobs can be configured with these settings:
- JobListener implementation, for details see the section called “Creating a JobListener”
- TimeLimit and similar Limits, for details see the section called “Configuring Limits”
- Workflow (= acyclic graph, defined by interlinked Nodes)
- Type: this denominator marks Jobs of a similar kind. Among other purposes, this piece of information is used for statistics on the server side.
In order to create a job, a JMSJobFactory has to be generated and initialized first. This JMSJobFactory establishes a connection to the messaging system and manages all further communication between the client application and jadice server. The method createServerJob() from Example 6.1, “Initializing the JobFactory and Creating a Job (with ActiveMQ as message broker)” forms the basis for the following examples:
Example 6.1. Initializing the JobFactory and Creating a Job (with ActiveMQ as message broker)
public Job createServerJob() throws JobException {
  if (jobFactory == null) {
    // Create a job factory with the parameter "brokerUrl" and the default JMS queue name
    jobFactory = new JMSJobFactory(
        new ActiveMQConnectionFactory("tcp://<Broker_IP>:<Broker-Port>"),
        JMSJobFactory.DEFAULT_QUEUE_NAME);

    // Provide connection credentials (optional)
    jobFactory.setCredentials(new Credentials("my-jms-user", "my-jms-password"));

    // Connect to the messaging system
    jobFactory.connect();
  }

  // Create a job for jadice server
  Job job = jobFactory.createJob();
  return job;
}
Once the JobFactory has been initialized correctly, it is used to create all subsequent conversion jobs.
Example 6.2. Configuring and Executing a Job
// Create a job
try (Job job = createServerJob()) {
  // Apply a timeout limit:
  job.apply(new TimeLimit(60, TimeUnit.SECONDS));

  // Declare the job type
  job.setType("first-example");

  // Attach a JobListener (see below)
  job.addJobListener(…);

  // Assemble the workflow (see below)
  job.attach(…);

  // Perform the job
  job.submit();
}
When all jobs created by this JobFactory have completed and no further jobs are to be created, the connection to the messaging system must be closed. Otherwise, connections are not released and a resource leak can occur.
Example 6.3. Shutting Down the JobFactory at the End of its Life Cycle
public void disconnect() {
  if (jobFactory != null) {
    jobFactory.close();
    jobFactory = null;
  }
}
With the help of a JobListener, Job statuses and error messages from the server can be processed.
Example 6.4. Example of a JobListener Implementation
public class MyJobListener implements JobListener {

  public void stateChanged(Job job, State oldState, State newState) {
    dump("stateChanged", job, oldState, newState, null, null);
  }

  public void executionFailed(Job job, Node node, String messageId, String reason, Throwable cause) {
    dump("executionFailed", job, node, messageId, reason, cause);
  }

  public void errorOccurred(Job job, Node node, String messageId, String message, Throwable cause) {
    dump("errorOccurred", job, node, messageId, message, cause);
  }

  public void warningOccurred(Job job, Node node, String messageId, String message, Throwable cause) {
    dump("warningOccurred", job, node, messageId, message, cause);
  }

  public void subPipelineCreated(Job job, Node parent, Set<? extends Node> createdNodes) {
    dump("subPipelineCreated", job, parent);
  }

  private void dump(String ctx, Job job, Object... args) {
    System.err.println("Context: " + ctx);
    System.err.println("Job: " + job.toString());
    if (args == null) {
      return;
    }
    for (Object arg : args) {
      // String concatenation is null-safe here (some callbacks pass null arguments)
      System.err.println("  " + arg);
    }
  }
}
jadice server offers two implementations of this interface which can be applied in the integration:
- TraceListener: forwards error messages to the client log via Apache Commons Logging.
- JobListenerAdapter: empty default implementation of the JobListener interface. Classes derived from it only have to override the desired methods.
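If only a few of these callbacks are of interest, deriving from the JobListenerAdapter keeps the listener short. The following minimal sketch (the class name is chosen here for illustration) overrides only errorOccurred and can be registered via job.addJobListener(…) as shown in Example 6.2:

public class ErrorOnlyListener extends JobListenerAdapter {
  @Override
  public void errorOccurred(Job job, Node node, String messageId, String message, Throwable cause) {
    // React to server-side errors only; all other callbacks keep the empty default implementation
    System.err.println("Error in job " + job + " at node " + node + ": " + message);
  }
}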
In order to curb the use of resources when processing a Job or its nodes, applying Limits to them is advisable. The following Limits are applicable:
Table 6.1. Available Limits

Type of Limit | Description | Context: Job | Context: Node
---|---|---|---
TimeLimit | Maximum processing time | ☑ | ☑
StreamCountLimit | Maximum number of streams | ☐ | ☑
StreamSizeLimit | Maximum size of the streams | ☐ | ☑
PageCountLimit | Maximum number of pages of a generated document | ☐ | ☑[a]
NodeCountLimit | Maximum number of nodes that a job may consist of | ☑ | ☒

[a] This applies to nodes generating documents that know pagination, compare javadoc
Table 6.2. Explanation of Table 6.1, “Available Limits”

Symbol | Meaning
---|---
☑ | is directly considered at this particular point
☒ | is not considered at this particular point
☐ | is not considered but passed on to the nodes (see below)
You can define what is to happen when a Limit is exceeded.

Example 6.5. Examples for the Use of Limits
TimeLimit tl = new TimeLimit(60, TimeUnit.SECONDS);
tl.setExceedAction(WhenExceedAction.ABORT); // default action

NodeCountLimit ncl = new NodeCountLimit(20);
ncl.setExceedAction(WhenExceedAction.WARN);
With Limit.WhenExceedAction ABORT the whole job is aborted. With Limit.WhenExceedAction WARN the client receives a warning.
Because the workflow is defined on the client side, not all nodes may be known in advance, and it may not be sensible to attach limits to every node. Limits can therefore be attached to a job and are then inherited by the respective nodes. The following rules apply to this inheritance (a short sketch follows the list):
- Limits with Limit.WhenExceedAction WARN are inherited in any event.
- Limits with Limit.WhenExceedAction ABORT are not inherited by nodes on which a limit of the same class with Limit.WhenExceedAction ABORT has already been applied, even if that limit is less restrictive.
- If Limits of the same class with Limit.WhenExceedAction ABORT are set both on the client side and at the security interface, the more restrictive limits take precedence, see the section called “Restrictions”.
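The following sketch illustrates these rules using only the job-level apply(…) call shown in Example 6.2; the concrete limit values are chosen arbitrarily for illustration:

// Limits applied to the job itself
TimeLimit jobTime = new TimeLimit(120, TimeUnit.SECONDS); // ABORT is the default action
NodeCountLimit nodeCount = new NodeCountLimit(20);
nodeCount.setExceedAction(WhenExceedAction.WARN);

job.apply(jobTime);
job.apply(nodeCount);

// The WARN limit is inherited by every node in any event.
// The ABORT TimeLimit is only inherited by nodes that do not already carry
// an ABORT limit of the same class; even a less restrictive one blocks inheritance.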
jadice server offers powerful modules to detect unknown file formats. These modules are used when automatically converting unknown files or e-mails (see the section called “Converting Unknown Entry Data into a Consistent Format (PDF)” and the section called “Converting E-Mails to PDF”).
Moreover, these modules can be activated by the StreamAnalysisNode and thus used for your own purposes.
Example 6.6. Using the StreamAnalysisNode
try (Job job = createServerJob()) { job.setType("run stream analysis"); // Instantiate nodes: // 1. data input node StreamInputNode siNode = new StreamInputNode(); // 2. analysis node StreamAnalysisNode saNode = new StreamAnalysisNode(); // 3. output node StreamOutputNode soNode = new StreamOutputNode(); // Assemble the workflow job.attach(siNode.appendSuccessor(saNode).appendSuccessor(soNode)); // Perform the job and send data job.submit(); siNode.addStream(…); siNode.complete(); // Wait for server reply for (Stream stream : soNode.getStreamBundle()) { // Reading the meta data final StreamDescriptor descr = stream.getDescriptor(); final String mimeType = descr.getMimeType(); } }
Note

The method getStreamBundle() blocks until jadice server has finished the job. For asynchronous processing, a StreamListener can be implemented instead.
The JadiceDocumentInfoNode analyzes a document by extracting the meta data that becomes available when the document is loaded with the jadice document platform. This information is fed into the StreamDescriptor and passed on to the next node as an IDocumentInfo. The analyzed document's format has to be supported by jadice document platform 5. In the simplest example, this information is passed on directly to the client and printed to the console there with the help of the NotificationNode.
Example 6.7. Using JadiceDocumentInfoNode
try (Job job = createServerJob()) { job.setType("retrieve document info"); // Instantiate info node JadiceDocumentInfoNode infoNode = new JadiceDocumentInfoNode(); // Create a listener and attach it to a NotificatioNode DocumentInfoListener documentInfoListener = new DocumentInfoListener(); NotificationNode notifyNode = new NotificationNode(); notifyNode.addNotificationResultListener(documentInfoListener); // Assemble the workflow StreamInputNode siNode = new StreamInputNode(); siNode.appendSuccessor(infoNode); infoNode.appendSuccessor(notifyNode); // Discard the data at the end of the analysis: notifyNode.appendSuccessor(new NullNode()); // Perform the job job.attach(siNode); job.submit(); // Submit the data to jadice server and end transmission siNode.addStream(…); siNode.complete(); // Wait for server reply (see above) documentInfoListener.waitForDocumentInfo(); // Retrieve and dump document info: IDocumentInfo documentInfo = documentInfoListener.getDocumentInfo(); System.out.println("Number of pages : " + documentInfo.getPageCount()); // As example here: Details of the first page System.out.println("format : " + documentInfo.getFormat(0)); System.out.println("size (pixels) : " + documentInfo.getSize(0).getWidth() + "x" + documentInfo.getSize(0).getHeight()); System.out.println("resolution (dpi): " + documentInfo.getVerticalResolution(0) + "x" + documentInfo.getHorizontalResolution(0)); }
Example 6.8. The NotificationNode.NotificationListener Used in Example 6.7, “Using JadiceDocumentInfoNode”
public class DocumentInfoListener implements NotificationListener {

  /**
   * DocumentInfo will be generated by the JadiceDocumentInfoNode and attached to the StreamDescriptor
   */
  private IDocumentInfo documentInfo;

  /**
   * Latch in order to block the current thread until {@link #documentInfo} is available.
   * NOTE: This example does not perform any error handling if the job aborts or no result is available!
   */
  private CountDownLatch latch = new CountDownLatch(1);

  @Override
  public void notificationReceived(StreamDescriptor streamDescriptor) {
    final Serializable prop = streamDescriptor.getProperties().get(JadiceDocumentInfoNode.PROPERTY_NAME);
    if (prop instanceof IDocumentInfo) {
      documentInfo = (IDocumentInfo) prop;
      latch.countDown();
    }
  }

  public void waitForDocumentInfo() throws InterruptedException {
    // Block until documentInfo is available
    latch.await();
  }

  public IDocumentInfo getDocumentInfo() {
    return documentInfo;
  }
}
It is possible to merge several PDF documents into one document with the PDFMergeNode.
Example 6.9. Using the PDFMergeNode
try (Job job = createServerJob()) { job.setType("merge pdfs"); // Instantiate nodes: // 1. data input node StreamInputNode siNode = new StreamInputNode(); // 2. merge input data (1...n streams to a single stream) PDFMergeNode pmNode = new PDFMergeNode(); // 3. output node StreamOutputNode soNode = new StreamOutputNode(); // Assemble the workflow and perform the job job.attach(siNode.appendSuccessor(pmNode).appendSuccessor(soNode)); job.submit(); // Send PDF documents siNode.addStream(…); siNode.addStream(…); // ... possible further PDF documents // Signalise the end of input data siNode.complete(); // Wait for server reply for (Stream stream : soNode.getStreamBundle()) { // Reading the data InputStream is = stream.getInputStream(); // Work with this data (not shown) … } }
Most conversion processes create PDF files (e.g. via LibreOffice). However, by inserting the ReshapeNode it is possible to convert the result further to the TIFF format. The following example shows the change to the workflow from Example 6.9, “Using the PDFMergeNode”. Instead of the PDFMergeNode, a conversion to TIFF with subsequent aggregation is attached:
Example 6.10. Converting to TIFF
// (...)
ReshapeNode reshapeNode = new ReshapeNode();
reshapeNode.setTargetMimeType("image/tiff");

// Join all incoming data into one resulting stream
reshapeNode.setOutputMode(ReshapeNode.OutputMode.JOINED);

// Assemble the workflow and include the TIFF converter node
job.attach(siNode.
    appendSuccessor(reshapeNode).
    appendSuccessor(soNode));

// (...)
}
In order to display documents with their annotations in standard software, these annotations have to be anchored in the source format. You can use the ReshapeNode for this anchoring. The following example shows the necessary association of the document's and the annotations' data streams:
Example 6.11. Permanent Anchoring of Annotations
/**
 * Stub interface in order to bundle a document and its annotations
 */
interface DocumentAndAnnotations {
  InputStream getContent();
  List<InputStream> getAnnotations();
}

public void convert(DocumentAndAnnotations doc) throws JMSException, JobException, IOException {
  try (Job job = createServerJob()) {
    job.setType("imprint annotations");

    // Instantiate nodes:
    StreamInputNode inputNode = new StreamInputNode();
    ReshapeNode reshapeNode = new ReshapeNode();
    StreamOutputNode outputNode = new StreamOutputNode();

    // Define the target MIME type (e.g. PDF)
    reshapeNode.setTargetMimeType("application/pdf");
    // Associate the annotation streams with the content
    reshapeNode.setOutputMode(ReshapeNode.OutputMode.ASSOCIATED_STREAM);

    // Assemble the workflow and perform the job
    job.attach(inputNode.appendSuccessor(reshapeNode).appendSuccessor(outputNode));
    job.submit();

    // Send the document content (with explicitly declared MIME type here)
    final StreamDescriptor contentSD = new StreamDescriptor("application/pdf");
    inputNode.addStream(doc.getContent(), contentSD);

    // Process annotations:
    for (InputStream annoStream : doc.getAnnotations()) {
      StreamDescriptor annoSD = new StreamDescriptor();
      // Associate document and annotation:
      annoSD.setParent(contentSD);
      // Declare the annotations' MIME type (e.g. Filenet P8):
      annoSD.setMimeType(ReshapeNode.AnnotationMimeTypes.FILENET_P8);
      // Send the annotation stream
      inputNode.addStream(annoStream, annoSD);
    }

    // Signal the end of the input data
    inputNode.complete();

    // Handle the job result (not shown)
    …
  }
}
Two settings are essential for this configuration:
- The data streams of the document and the annotations have to be connected via the StreamDescriptor hierarchy, which means that the document's StreamDescriptor has to be set as the parent of the annotations' StreamDescriptors.
- There are pre-defined constants for the available annotation MIME types in the class ReshapeNode; one of these must be set in any case.
For further information on annotation formats and their properties see the
annotations manual of the jadice document platform 5.
Confidential File Content
Note that an analysis of the document's content does not occur at any time during processing. Contents which are concealed by annotations can still be contained in the target data stream (depending on the file format). Moreover, objectionable and confidential (meta) data may remain in the document.
In order to reduce network load, files are often packed. These compressed files can be unpacked by jadice server before further processing. Depending on the file format, this unpacking is realized in different node classes:
Table 6.3. Node Classes for Unpacking Archive Files
File format | Node class | Remarks
---|---|---
ZIP | UnZIPNode |
RAR | UnRARNode |
GZIP | UnGZIPNode |
TAR | UnTARNode |
The following code example shows this process using the UnZIPNode:
Example 6.12. Using UnZIPNode
try (Job job = createServerJob()) { job.setType("unpack zip"); // Instantiate nodes: // 1. data input node StreamInputNode siNode = new StreamInputNode(); // 2. unpacking of ZIP archives UnZIPNode unzipNode = new UnZIPNode(); // 3. output node StreamOutputNode soNode = new StreamOutputNode(); // Assemble the workflow job.attach(siNode.appendSuccessor(unzipNode).appendSuccessor(soNode)); // Perform the job job.submit(); // Send data siNode.addStream(…); // Signalise the end of input data siNode.complete(); // Wait for server reply for (Stream stream : soNode.getStreamBundle()) { // Reading the data: (1 stream per file in the archive) System.out.println("file name: " + stream.getDescriptor().getFileName()); InputStream is = stream.getInputStream(); // Work with this data (not shown) … } }
Document standardization is extremely useful, especially in long-term archiving. Standardization achieved by receiving the data stream, analyzing it automatically, processing it dynamically in a target-oriented way, and finally archiving it has many advantages:
The retrieving application does not need any knowledge of the source file and format. There is no danger of corrupted or malicious data or documents. Moreover, network traffic is minimized. Because of its structure jadice server can flexibly control the conversion result at any time.
Example 6.13. Converting Miscellaneous Data Streams to PDF
try (Job job = createServerJob()) { job.setType("convert to pdf"); // Instantiate nodes: // 1. data input node StreamInputNode siNode = new StreamInputNode(); // 2. node for dynamic data converserion DynamicPipelineNode dpNode = new DynamicPipelineNode(); dpNode.setRuleset(new URI("resource:/dynamic-pipeline-rules/default.xml")); // 3. merge input data (1...n streams to a single stream) PDFMergeNode pmNode = new PDFMergeNode(); // 4. output node StreamOutputNode soNode = new StreamOutputNode(); // Assemble the workflow job.attach(siNode.appendSuccessor(dpNode).appendSuccessor(pmNode).appendSuccessor(soNode)); // Perform the job and send data job.submit(); siNode.addStream(…); siNode.complete(); // Wait for server reply for (Stream stream : soNode.getStreamBundle()) { // Work with the result InputStream is = stream.getInputStream(); … } }
If you add your own implementation of a JobListener to this job, you can find out which further Nodes have been created dynamically by jadice server, with the help of the method subPipelineCreated.
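A minimal sketch of such a listener, based on the JobListenerAdapter described in the section called “Creating a JobListener” (the class name is chosen here for illustration); it merely logs the dynamically created nodes:

public class PipelineLoggingListener extends JobListenerAdapter {
  @Override
  public void subPipelineCreated(Job job, Node parent, Set<? extends Node> createdNodes) {
    // Log every node that the dynamic pipeline has added behind the given parent node
    for (Node node : createdNodes) {
      System.out.println("Dynamically created node: " + node);
    }
  }
}

It is registered with the job via job.addJobListener(new PipelineLoggingListener()) before submit().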
You can find the applied rules in the directory /server-config/dynamic-pipeline-rules. These XML-based rules can be adapted to your requirements. The XML schema, which can be found in the same directory, will help you.
Example 6.14. Accessing LibreOffice
try (Job job = createServerJob()) {
  job.setType("libreoffice to pdf");

  // Instantiate nodes:
  // 1. data input node
  StreamInputNode siNode = new StreamInputNode();
  // 2. Conversion via LibreOffice
  LibreOfficeConversionNode loNode = new LibreOfficeConversionNode();
  // 3. merge input data (1...n streams to a single stream)
  PDFMergeNode pmNode = new PDFMergeNode();
  // 4. output node
  StreamOutputNode soNode = new StreamOutputNode();

  // Assemble the workflow
  job.attach(siNode.appendSuccessor(loNode).appendSuccessor(pmNode).appendSuccessor(soNode));

  // Perform the job and send document data
  job.submit();
  siNode.addStream(…);
  siNode.complete();

  // Wait for server reply
  for (Stream stream : soNode.getStreamBundle()) {
    // Read the data
    InputStream is = stream.getInputStream();
    // Work with this data (not shown)
    …
  }
}
Note
In order to access LibreOffice the class path has to be set according to the parameters described in the section called “Configuring LibreOffice”.
Note
Documents in Word2007 format (file extension docx) have to be pre-processed by the StreamAnalysisNode before conversion (see the section called “Identifying Unknown Entry Data”).
When converting e-mails, the e-mail is fetched directly from the mail server. Thus, the respective access information has to be provided.
The process is similar to the dynamic conversion (see the section called “Converting Unknown Entry Data into a Consistent Format (PDF)”). The e-mail is analyzed, and all potential attachments such as office documents, pictures etc. are converted and summarized in a list that is appended to the e-mail's text.
During this process archive files are unpacked and their content is integrated in the conversion.
Example 6.15. Converting E-Mails Fetched Directly from the Server
try (Job job = createServerJob()) { job.setType("mail to pdf"); // Instantiate nodes: // 1. input node that retrieves the mail from a mail server JavamailInputNode jiNode = new JavamailInputNode(); // Configuration of the mail server jiNode.setStoreProtocol("<protocol>"); // POP3 or IMAP jiNode.setHostName("<server>"); jiNode.setUsername("<user>"); jiNode.setPassword("<password>"); jiNode.setFolderName("<e-mail folder>"); jiNode.setImapMessageUID(…); // 2. Perform the email conversion ScriptNode scNode = new ScriptNode(); scNode.setScript(new URI("resource:email-conversion/EmailConversion.groovy")); // 3. merge data (1...n streams to a single stream) PDFMergeNode pmNode = new PDFMergeNode(); // 4. output node StreamOutputNode soNode = new StreamOutputNode(); // Assemble the workflow and perform the job job.attach(jiNode.appendSuccessor(scNode).appendSuccessor(pmNode).appendSuccessor(soNode)); job.submit(); // Wait for server reply for (Stream stream : soNode.getStreamBundle()) { // Work with the result InputStream is = stream.getInputStream(); … } }
If you do not want to fetch e-mails with the JavamailInputNode via an IMAP or POP3 account but want to import them as an eml file, for example, you have to use the MessageRFC822Node in between, as it carries out the separation of e-mail header and body:
Example 6.16. Converting an eml File
Job job = createServerJob();

// Instantiate nodes:
// 1. input node that receives the mail from the client
StreamInputNode siNode = new StreamInputNode();
// 2. Separation of mail header and mail body
MessageRFC822Node msgNode = new MessageRFC822Node();
// 3. Perform the e-mail conversion
ScriptNode scNode = new ScriptNode();
scNode.setScript(new URI("resource:email-conversion/EmailConversion.groovy"));

// Further procedure as above
E-mails in MS Outlook format (msg files) can be converted into a format supported by jadice server with the TNEFNode, without launching MS Outlook, and are then ready for further conversion:
Example 6.17. Converting a msg File
Job job = createServerJob();

// Instantiate nodes:
// 1. input node that receives the mail from the client
StreamInputNode siNode = new StreamInputNode();
// 2. Pre-processing of MSG files
TNEFNode tnefNode = new TNEFNode();
tnefNode.setInputFormat(InputFormat.MSG);
// 3. Perform the e-mail conversion
ScriptNode scNode = new ScriptNode();
scNode.setScript(new URI("resource:email-conversion/EmailConversion.groovy"));

// Further procedure as above
Note
Note that the mail body of msg files is usually in Rich Text Format (rtf) and is thus converted to PDF via LibreOffice in the standard configuration.
In the configuration shown above, a separator page containing the meta data of the attachment is automatically generated for every file attachment. If you do not want these separator pages, you can deactivate them for all attachments with the following configuration of the ScriptNode:
scNode.getParameters().put("showAttachmentSeparators", false);
Another configuration option concerns the conversion of formatted e-mails. If an e-mail has been sent in both HTML and plain text format, the HTML part is converted by default. If the plain text part should be converted instead, the ScriptNode needs to be configured as follows:
scNode.getParameters().put("preferPlainTextBody", true);
Regardless of which part is chosen for conversion, the other one can additionally be attached to the e-mail. Thus, the converted e-mail can be displayed in both HTML and plain text format. In order to attach the other format, use the following configuration:
scNode.getParameters().put("showAllAlternativeBody", true);
The following setting prevents jadice server from loading images and other files from unknown sources which are referenced in e-mails:
scNode.getParameters().put("allowExternalHTTPResolution", false);
The parameter unhandledAttachmentAction controls the treatment of attachments whose format cannot be detected or is not targeted for conversion with jadice server:
scNode.getParameters().put("unhandledAttachmentAction", "failure");
The following values are accepted for this parameter:
Value | Meaning
---|---
warning | A warning is written into the log.
error | An error is written into the log (default value).
failure | The respective job aborts with an error.
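The ScriptNode parameters shown above can also be combined in a single configuration; the following snippet is merely an illustration with arbitrarily chosen values:

// Combined ScriptNode configuration (values chosen for illustration)
scNode.getParameters().put("showAttachmentSeparators", false);
scNode.getParameters().put("preferPlainTextBody", true);
scNode.getParameters().put("showAllAlternativeBody", true);
scNode.getParameters().put("allowExternalHTTPResolution", false);
scNode.getParameters().put("unhandledAttachmentAction", "failure");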
In order to indicate image files which are referenced in an e-mail but have not been converted, the following placeholders are inserted in the image files' places:
Placeholder | Meaning
---|---
(placeholder graphic) | The image was not loaded due to the setting allowExternalHTTPResolution.
(placeholder graphic) | The image file could not be loaded.
By default, if images within the HTML markup of an e-mail body are too large to fit onto a PDF page, either by width or by height, they are split onto multiple pages.
Note
In general, complex HTML markup is not intended for media that use a page layout. Hence, converting HTML content can be lossy, especially when content has to be split at page breaks. Moreover, HTML cannot always be forced into a maximum width, which is why pages are sometimes wider than the chosen format. For example, tables or complex layouts in HTML markup cannot always be scaled appropriately onto a specific page format.
Nevertheless, in order to still match the PDF's page format (if possible) when images in the HTML markup exceed the width or height of the page, the MailBodyCreatorNode can receive the following configuration options via setHtmlProcessingMode():
Table 6.4. Converting E-Mails: Scaling images in HTML markup to make them fit on a PDF page
Configuration Values | Description
---|---
 | Default setting. If an image is too wide or too high, it is split onto multiple pages.
 | Images that do not fit onto the page format are moved to the attachments part of the PDF document. A placeholder is inserted at the original image position.
 | Moves all images to the attachments part of the PDF document.
 | If possible, images are scaled so that they fit onto the PDF pages without increasing the page format. This cannot be guaranteed in all cases, since for complex HTML markup the conversion could otherwise be lossy.
PDF documents can become very large due to images embedded in the file. With the help
of our PDFImageOptimizationNode
you can reduce the file size of an existing PDF document
by decreasing the resolution of the embedded images. The resolution will be reduced according
to a DPI threshold value, which can be set as a parameter. The node checks for each
individual embedded image whether its resolution exceeds the threshold value. If it does,
the image will be replaced by a JPEG image whose resolution corresponds to the threshold
value. The image quality which is to be used in generating the JPEG can also be set as a
parameter (as a percentage value).
The size of the page on which the image is located plays an important role in determining
its resolution. Usually, all pages in PDF documents have the same size (such as A4). However,
individual page sizes can be set in PDF format, which may result in documents containing pages
of different sizes. For such documents the PDFImageOptimizationNode
provides the option of
setting a target page size for individual pages. If you choose not to specify a target page
size, the resolution will be calculated according to the page size used for the overall document.
Setting a target page size may be sensible, particularly if you are concerned about image quality when printing the document. By setting the target page size you may thus substantially reduce the overall document's size while retaining image quality.
Example 6.18. Using the PDFImageOptimizationNode
try (Job job = createServerJob()) { job.setType("optimize images"); // Instantiate nodes: // 1. data input node StreamInputNode siNode = new StreamInputNode(); // 2. optimize embedded images PDFImageOptimizationNode imgOptimizationNode = new PDFImageOptimizationNode(); // 3. set the image resolution threshold to 150 DPI (default: 300) imgOptimizationNode.setMaxResolution(150); // 4. set the JPEG image quality to 80 percent (default: 75) imgOptimizationNode.setJPEGQuality(0.8f); // 5. set the page size of the output device (optional) imgOptimizationNode.setTargetPageSize(PDFImageOptimizationNode.PageSize.A4); // 6. output node StreamOutputNode soNode = new StreamOutputNode(); // Assemble the workflow and perform the job job.attach(siNode.appendSuccessor(imgOptimizationNode).appendSuccessor(soNode)); job.submit(); // Send PDF document siNode.addStream(…); // Signalise the end of input data siNode.complete(); // Wait for server reply for (Stream stream : soNode.getStreamBundle()) { // Reading the data InputStream is = stream.getInputStream(); // Work with this data (not shown) … } }
The ExternalProcessCallNode makes accessing external software very easy. jadice server takes care of automatically converting incoming and outgoing data streams to temporary files and deleting them after the external software has processed the data. The only precondition for this operation is that the software can be invoked via the command line on the server.
Example 6.19. Using the ExternalProcessCallNode
try (Job job = createServerJob()) { job.setType("external my converter"); // Instantiate nodes: // 1. data input node StreamInputNode siNode = new StreamInputNode(); // 2. start an external process ExternalProcessCallNode epcNode = new ExternalProcessCallNode(); // Configuration: // - Program name (back slashes must be escaped!) epcNode.setProgramName("C:\\Programme\\MyConverter\\MyConverter.exe"); // - Command line parameters (jadice server will substitute ${infile} / ${outfile:pdf}) epcNode.setArguments("-s -a ${infile} /convert=${outfile:pdf}"); // 3. output node StreamOutputNode soNode = new StreamOutputNode(); // Assemble the workflow job.attach(siNode.appendSuccessor(epcNode).appendSuccessor(soNode)); // Submit job and send data job.submit(); StreamDescriptor sd = new StreamDescriptor(); // jadice server will use the name when it stores this file and passes it to the external program sd.setFileName("myfile.dat"); siNode.addStream(new BundledStream(…, sd)); siNode.complete(); // Wait for server reply for (Stream stream : soNode.getStreamBundle()) { // Work with this data (not shown) InputStream is = stream.getInputStream(); … } }
E-mails and archive files that are to be converted sometimes contain files that are irrelevant in the business context. To filter out such files, a rule-based filtering on the file name can be configured. These rules are defined in a server-side configuration file (UTF-8 encoding required). The following table gives a quick introduction to such filtering rules.
Table 6.5. Examples of rules to filter out files during extraction of archive file formats

Example | Explanation
---|---
**/CVS/* | Matches all files in CVS directories that can be located anywhere in the directory tree. Matches: CVS/Repository, org/apache/CVS/Entries, org/apache/jakarta/tools/ant/CVS/Entries. But not: org/apache/CVS/foo/bar/Entries, because the subdirectories "foo/" and "bar/" of the folder CVS don't match.
org/apache/jakarta/** | Matches all files in the org/apache/jakarta directory tree. Matches: org/apache/jakarta/tools/ant/docs/index.html, org/apache/jakarta/test.xml. But not: org/apache/xyz.java, because the subfolder "jakarta/" of "org/apache" is missing, hence this rule doesn't match.
org/apache/**/CVS/* | Matches all files in CVS directories that are located anywhere in the directory tree under org/apache. Matches: org/apache/CVS/Entries, org/apache/jakarta/tools/ant/CVS/Entries. But not: org/apache/CVS/foo/bar/Entries, because the subdirectories "foo/" and "bar/" of the folder CVS don't match.
**/test/** | Matches all files that have a test element in their path, including test as a filename.
More information about possible rules can be found at https://ant.apache.org/manual/dirtasks.html#patterns.
In order to apply a ruleset for an archive file format, a configuration for the extraction worker has to be added to the configuration file server-config/application/workers.xml.
This functionality is available for the following workers: UnZIPWorker, UnRARWorker, UnSevenZIPWorker and UnTARWorker.
Example 6.20. Configuring a worker for filtering out files of archive file formats (workers.xml)
<bean id="unzipFilterRulesBean"
      class="com.levigo.jadice.server.archive.worker.filter.AntPatternArchiveEntryFilter">
  <!-- The file unzipFilterRules.txt has to be provided in the folder <jadice-server>/server-config/custom/ -->
  <property name="antPatternFilterRulesURI" value="resource://custom/unzipFilterRules.txt" />
</bean>

<workers:worker class="com.levigo.jadice.server.archive.worker.UnZIPWorker">
  <property name="filters">
    <util:list>
      <bean class="com.levigo.jadice.server.archive.worker.filter.OSXFilter" />
      <ref bean="unzipFilterRulesBean"/>
    </util:list>
  </property>
</workers:worker>
The namespace xmlns:util="http://www.springframework.org/schema/util" and the schemaLocations http://www.springframework.org/schema/util as well as http://www.springframework.org/schema/util/spring-util-2.5.xsd have to be present in the workers.xml.