
Smooks User Guide


Overview

Smooks is a Java Framework/Engine for processing XML and non XML data (CSV, EDI, Java etc).

Smooks can be used to:

  • Perform a wide range of Data Transforms - XML to XML, CSV to XML, EDI to XML, XML to EDI, XML to CSV, Java to XML, Java to EDI, Java to CSV, Java to Java, XML to Java, EDI to Java etc.
  • Populate a Java Object Model from a data source (CSV, EDI, XML, Java etc). Populated object models can be used as a transformation result in themselves, or can be used by (e.g.) Templating resources for generating XML or other character based results. Also supports Virtual Object Models (Maps and Lists of typed data), which can be used by EL and Templating functionality.
  • Process huge messages (GBs) - Split, Transform and Route message fragments to JMS, File, Database etc destinations.
  • Enrich a message with data from a Database, or other Datasources.
  • Perform Extract Transform Load (ETL) operations by leveraging Smooks' Transformation, Routing and Persistence functionality.



Smooks supports both DOM and SAX processing models, but adds a more "code friendly" layer on top of them. It allows you to plug in your own "ContentHandler" implementations (written in Java or Groovy), or reuse the many existing handlers.

Smooks is an ideal fit as part of an overall Integration Solution.



Getting Started

The easiest way to get started with Smooks is to:

  1. Have a quick read of the Smooks Basics section.
  2. Download and try out some of the tutorials.

The tutorials are the perfect base upon which to integrate Smooks into your application.

FAQs

See the Online FAQ

Maven

All Milyn components (including Smooks) are available from the Codehaus Maven Repository at http://dist.codehaus.org.

If you're building Milyn related components, please use:

  1. Java 1.5 (not 1.6 and above)
  2. Maven v2.0.8

Artifact IDs

The artifactId for each of the Milyn components is as follows:

  • Smooks Core: "milyn-smooks-core"
  • Smooks Cartridges:
    • CSV: "milyn-smooks-csv"
    • EDI: "milyn-smooks-edi"
    • Javabean: "milyn-smooks-javabean"
    • JSON: "milyn-smooks-json"
    • Routing: "milyn-smooks-routing"
    • Templating: "milyn-smooks-templating"
    • CSS: "milyn-smooks-css"
    • Servlet: "milyn-smooks-servlet"
  • Commons: "milyn-commons" (all components depend on Commons, so there's no need to specify a dependency on this module if you're dependent on one of the others)
  • EdiSax: "milyn-edisax"
  • Magger: "milyn-magger"
  • Tinak: "milyn-tinak"

Note: All cartridges depend on Smooks Core (i.e. "milyn-smooks-core"). Therefore, if your project depends on one of the cartridges, there's no need to specify the dependency on Smooks Core.

See the POMs in the Tutorials as examples of Maven based applications that depend on different Smooks Cartridges.

Smooks v1.1+

The Maven repo groupId for all Milyn components (Smooks v1.1+) is "org.milyn".

Smooks v1.1 is not released yet, but SNAPSHOTs are available from "http://snapshots.repository.codehaus.org". You need to list this Maven repo in your project POM e.g.

<repositories>
    .....
    <repository>
        <id>codehaus.m2.snapshots</id>
        <url>http://snapshots.repository.codehaus.org</url>
        <releases>
            <enabled>false</enabled>
        </releases>
    </repository>
</repositories>

Once Smooks v1.1 is released, the binaries will be available from the maven2 repo on ibiblio.

Smooks v1.0/v1.0.x

The Maven repo groupId for all Milyn components (pre Smooks v1.1) is "milyn".

Also note that the v1.0 and v1.0.1 binaries are not available from ibiblio. You need to list this Maven repo in your project POM e.g.

<repositories>
    .....
    <repository>
        <id>codehaus</id>
        <url>http://dist.codehaus.org</url>
    </repository>
</repositories>

Smooks Basics

The most commonly accepted definition of Smooks would be that it is a "Transformation Engine". However, at its core, Smooks makes no mention of "data transformation". The smooks-core codebase is designed simply to support hooking of custom "Visitor" logic into an Event Stream produced by a data Source of some kind (XML, CSV, EDI, Java etc). As such, smooks-core is simply a "Structured Data Event Stream Processor".

Of course, the most common application of this will be in the creation of Transformation solutions i.e. implementing Visitor logic that uses the Event Stream produced from a Source message to produce a Result of some other kind. The capabilities in smooks-core enable more than this however. We have implemented a range of other solutions based on this processing model:

  1. Java Binding: Population of a Java Object Model from the Source message.
  2. Message Splitting & Routing: The ability to perform complex splitting and routing operations on the Source message, including routing to multiple destinations concurrently, as well as routing different data formats concurrently (XML, EDI, CSV, Java etc).
  3. Huge Message Processing: The ability to declaratively consume (transform, or split and route) huge messages without writing lots of high maintenance code.

Basic Processing Model

As stated above, the basic principle of Smooks is to take a data Source of some kind (e.g. XML) and from it generate an Event Stream, to which you apply Visitor logic to produce a Result of some other kind (e.g. EDI).

Many different data Source and Result types are supported, meaning many different transformation types are supported, including (but not limited to):

  1. XML to XML
  2. XML to Java
  3. Java to XML
  4. Java to Java
  5. EDI to XML
  6. EDI to Java
  7. Java to EDI
  8. CSV to XML
  9. CSV to ...
  10. etc etc

In terms of the Event Model used to map between the Source and Result, Smooks currently supports DOM and SAX Event Models. We will concentrate on the SAX event model here. If you want low level details on either model, please consult the Smooks Developer Guide. The SAX event model is based on the hierarchical SAX events generated from an XML Source (startElement, endElement etc). However, this event model can be just as easily applied to other structured/hierarchical data Sources (EDI, CSV, Java etc).

The most important events (typically) are the visitBefore and visitAfter events. The following illustration tries to convey the hierarchical nature of these events.

Simple Example

In order to consume the SAX Event Stream produced from the Source message, you need to implement one or more of the SAXVisitor interfaces (depending on which events you need to consume).

The following is a very simple example of how you implement Visitor logic and target that logic at the visitBefore and visitAfter events for a specific element in the Event Stream.  In this case we target the Visitor logic at the <xxx> element events.





As you can see, the Visitor implementation is very simple; one method implementation per event.  To target this implementation at the <xxx> element visitBefore and visitAfter events, we need to create a Smooks configuration as shown (more on "Resource Configurations" in the following sections).

The Smooks code to execute this is very simple:

Smooks smooks = new Smooks("/smooks/echo-example.xml");


smooks.filter(new StreamSource(inputStream), null);


Note that in this case we don't produce a Result (it's specified as "null").  Also note that we don't interact with the "execution" of the filtering process in any way, since we don't explicitly create an ExecutionContext and supply it to the Smooks.filter method call.
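As a rough, JDK-only illustration of these mechanics (not the Smooks SAXVisitor API), the sketch below uses the standard SAX parser to fire visitBefore/visitAfter callbacks only for a target element, ignoring all other elements. The ElementVisitor interface and VisitorDispatchSketch class are hypothetical stand-ins introduced for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for a Smooks SAX visitor: one method per event.
interface ElementVisitor {
    void visitBefore(String element);
    void visitAfter(String element);
}

class VisitorDispatchSketch {
    // Parses the XML and routes start/end events for `target` elements to the visitor.
    static void filter(String xml, String target, ElementVisitor visitor) throws Exception {
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(target)) visitor.visitBefore(qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(target)) visitor.visitAfter(qName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
    }

    // Records the events a visitor receives, in order, e.g. "before:xxx".
    static List<String> trace(String xml, String target) {
        List<String> events = new ArrayList<>();
        try {
            filter(xml, target, new ElementVisitor() {
                public void visitBefore(String element) { events.add("before:" + element); }
                public void visitAfter(String element) { events.add("after:" + element); }
            });
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return events;
    }
}
```

Tracing "<a><xxx><xxx/></xxx><b/></a>" with target "xxx" records before:xxx, before:xxx, after:xxx, after:xxx, which mirrors the hierarchical nesting of the visitBefore/visitAfter events.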

This example illustrated the lower level mechanics of the Smooks Programming Model. In reality however, users are not going to want to solve their problems by implementing lots of Java code themselves from scratch. For this reason, Smooks is shipped with quite a lot of pre-built functionality i.e. ready to use Visitor logic. We bundle this Visitor logic based on functionality and we call the bundles "Cartridges".

Smooks Cartridges

The basic functionality of Smooks Core can be extended through the creation of what we call a "Smooks Cartridge". A Cartridge is simply a Java archive (jar) containing reusable Content Handlers (Visitor Logic). A Smooks Cartridge should provide "ready to use" support for a specific type of XML analysis or transformation.


The available cartridges are listed below, with DOM and SAX support shown in parentheses where applicable:

  • JavaBean: Enables population of a Java Object Model from data embedded in a data stream (XML, non XML, Java etc). See Tutorials.
  • Templating (DOM: FreeMarker, XSL, StringTemplate; SAX: FreeMarker, XSL, StringTemplate): Enables fragment-level templating using different templating solutions, e.g. FreeMarker, StringTemplate and XSLT. See Tutorials.
  • Routing (DOM: File, JMS, Database; SAX: File, JMS, Database): Enables routing of message fragments (including populated object models) to a range of different destination types. See Tutorials.
  • Scripting (DOM: Groovy; SAX: Groovy): Enables fragment-level Transformation/Analysis using different scripting languages. Currently supports Groovy. See Tutorials.
  • EDI: Smooks Cartridge that converts an EDI message data stream into a stream of SAX events.
  • CSV: Smooks Cartridge that converts a Comma Separated Value (CSV) data stream into a stream of SAX events.
  • JSON: Smooks Cartridge that converts a JSON formatted data stream into a stream of SAX events. (Only available in v1.1-SNAPSHOT.)
  • Misc: Contains miscellaneous resources for performing common analysis/transformation tasks on an XML stream, e.g. rename an element, delete an element, delete an attribute etc.
  • Servlet: Plugs Smooks into the J2EE Servlet Container. This allows Smooks to be used for Servlet Response Analysis and Transformation, e.g. to optimize the Servlet Response for the requesting browser make/model. See Tutorials.
  • CSS: Makes Cascading Style Sheet (CSS) information easily available to web content analysis or transformation logic. Supports linked or inline CSS.

Filtering Process Selection (DOM or SAX?)

This is done by Smooks based on the following criteria:

  1. If all visitor resources (i.e. not including non element visitor resources) implement only the DOM visitor interfaces (DOMElementVisitor or SerializationUnit), then the DOM processing model is selected.
  2. If all visitor resources (i.e. not including non element visitor resources) implement only the SAX visitor interface (SAXElementVisitor), then the SAX processing model is selected.
  3. If all visitor resources (i.e. not including non element visitor resources) implement both the DOM and SAX visitor interfaces, then the DOM processing model is selected, unless the Smooks resource configuration contains the stream.filter.type global configuration parameter (see below).

The stream.filter.type global configuration parameter is configured ("DOM"/"SAX") as follows:

<resource-config selector="global-parameters">
    <param name="stream.filter.type">SAX</param>
</resource-config>
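The three selection rules above can be modeled as a small decision function. This is a hypothetical sketch, not Smooks' implementation: each element visitor resource is represented only by the set of filter types it supports, and combinations the rules don't mention (for example, some DOM-only visitors mixed with some SAX-only visitors) are rejected rather than guessed.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the documented selection rules; not Smooks code.
enum FilterType { DOM, SAX }

class FilterSelectionSketch {
    private static final Set<FilterType> DOM_ONLY = EnumSet.of(FilterType.DOM);
    private static final Set<FilterType> SAX_ONLY = EnumSet.of(FilterType.SAX);
    private static final Set<FilterType> BOTH = EnumSet.allOf(FilterType.class);

    // Each entry lists the filter types one element visitor resource supports.
    // streamFilterType is the value of the "stream.filter.type" global parameter, or null.
    static FilterType select(List<? extends Set<FilterType>> visitors, String streamFilterType) {
        if (visitors.stream().allMatch(DOM_ONLY::equals)) return FilterType.DOM; // rule 1
        if (visitors.stream().allMatch(SAX_ONLY::equals)) return FilterType.SAX; // rule 2
        if (visitors.stream().allMatch(BOTH::equals)) {                          // rule 3
            return "SAX".equalsIgnoreCase(streamFilterType) ? FilterType.SAX : FilterType.DOM;
        }
        // Mixtures are not covered by the rules above, so the sketch refuses to guess.
        throw new IllegalStateException("Visitor mix not covered by the documented rules");
    }
}
```

Note how the global parameter only matters in rule 3: when every visitor supports both models, DOM is the default and "SAX" must be requested explicitly.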

Checking the Smooks Execution Process

As Smooks performs the filtering process (processing the Event Stream generated from the Source), it publishes events that can be captured and programmatically analyzed during/after execution.

The easiest way to generate an execution report out of Smooks is to configure the ExecutionContext to generate a report. Smooks supports generation of an HTML report via the HtmlReportGenerator.

The following is an example of how to configure Smooks to generate an HTML report.

Smooks smooks = new Smooks("/smooks/smooks-transform-x.xml");
ExecutionContext execContext = smooks.createExecutionContext();

execContext.setEventListener(new HtmlReportGenerator("/tmp/smooks-report.html"));
smooks.filter(new StreamSource(inputStream), new StreamResult(outputStream), execContext);

The HtmlReportGenerator is a very useful tool during development with Smooks.  It's the nearest thing Smooks has to an IDE based Debugger (which we hope to have in a future release).  It can be very useful for diagnosing issues, or simply as a tool for comprehending a Smooks transformation.

An example HtmlReportGenerator report can be seen online here

Of course you can also write and use your own ExecutionEventListener implementations.

Smooks Resources

A "Smooks Resource" is anything that can be used by Smooks in the process of analyzing or transforming a data stream. They could be pieces of Java logic (DOMElementVisitor), some text or script resource, or perhaps simply a configuration parameter.

Resource Configuration Properties

  • selector: Selector string. Used by Smooks to "lookup" a resource configuration. This is typically the message fragment name, but as mentioned above, not all resources are transformation/analysis resources targeted at a message fragment - this is why we didn't call this attribute "target-fragment".
  • selector-namespace: The XML namespace of the selector target for this resource. This is used to target ContentDeliveryUnits at XML elements from a specific XML namespace e.g. "http://www.w3.org/2002/xforms". If not defined, the resource is targeted at all namespaces.
  • target-profile: A list of 1 or more profile targeting expressions. (supports wildcards "*").



Example selectors:

  1. The target fragment name (e.g. for HTML - table, tr, pre etc). This type of selector can be contextual in a similar way to contextual selectors in CSS e.g. "td ol li" will target the resource at all "li" elements nested inside an "ol" element, which is in turn nested inside a "td" element. See the sample configurations below. Also supports wildcard based fragment selection ("*").
  2. "$document" is a special selector that targets a resource at the "document" fragment i.e. the whole document, or document root node fragment.
  3. Targeting a specific SmooksXMLReader at a specific profile. See the csv-to-xml and edi-to-xml tutorials.
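A rough sketch of how a contextual selector such as "td ol li" could be matched against the path of element names from the document root down to the current element. It assumes CSS-style descendant semantics (each earlier step must appear among the ancestors, in order, but not necessarily as the immediate parent) and a bare "*" wildcard per step; "$document" and namespaces are not handled. ContextualSelectorSketch is hypothetical, not the Smooks selector implementation.

```java
import java.util.List;

class ContextualSelectorSketch {
    // path lists element names from the root down to the current element, e.g. [td, ol, li].
    static boolean matches(String selector, List<String> path) {
        String[] steps = selector.trim().split("\\s+");
        int p = path.size() - 1;
        // The last step must match the current element itself.
        if (p < 0 || !stepMatches(steps[steps.length - 1], path.get(p))) return false;
        p--;
        // Earlier steps must match ancestors, walking outwards; intermediate elements are skipped.
        for (int s = steps.length - 2; s >= 0; s--) {
            while (p >= 0 && !stepMatches(steps[s], path.get(p))) p--;
            if (p < 0) return false;
            p--;
        }
        return true;
    }

    static boolean stepMatches(String step, String element) {
        return step.equals("*") || step.equals(element);
    }
}
```

For example, "td ol li" matches the path table/tr/td/div/ol/li but not ol/li, because no "td" ancestor is present.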

XML Based Configuration

Smooks can be manually configured (through code), but the easiest way of working is through XML.

Basic Sample:

Note that it does not use any profiling. Each resource-config element maps directly to an instance of the SmooksResourceConfiguration class.

 <?xml version='1.0'?>
 <smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.0.xsd">
      <resource-config selector="order order-header">
          <resource type="xsl">/com/acme/transform/OrderHeaderTransformer.xsl</resource>
      </resource-config>
      <resource-config selector="order-items order-item">
          <resource>com.acme.transform.MyJavaOrderItemTransformer</resource>
      </resource-config>
 </smooks-resource-list>

More Complex Sample:

This sample uses profiling. So resource 1 is targeted at both "message-exchange-1" and "message-exchange-2", whereas resource 2 is only targeted at "message-exchange-1" and resource 3 at "message-exchange-2" (see Smooks.createExecutionContext(String)).

 <?xml version='1.0'?>
 <smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.0.xsd">
      <profiles>
          <profile base-profile="message-exchange-1" sub-profiles="message-producer-A, message-consumer-B" />
          <profile base-profile="message-exchange-2" sub-profiles="message-producer-A, message-consumer-C" />
      </profiles>
 (1)  <resource-config selector="order order-header" target-profile="message-producer-A">
          <resource>com.acme.transform.AddIdentityInfo</resource>
      </resource-config>
 (2)  <resource-config selector="order-items order-item" target-profile="message-consumer-B">
          <resource>com.acme.transform.MyJavaOrderItemTransformer</resource>
          <param name="execution-param-X">param-value-forB</param>
      </resource-config>
 (3)  <resource-config selector="order-items order-item" target-profile="message-consumer-C">
          <resource>com.acme.transform.MyJavaOrderItemTransformer</resource>
          <param name="execution-param-X">param-value-forC</param>
      </resource-config>
 </smooks-resource-list>
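The profile resolution in the sample above can be sketched as follows, under the assumption that a base profile expands to itself plus its declared sub-profiles and that a resource applies when its target-profile is one of those profiles. ProfileSketch is hypothetical; it only handles a single profile name or a bare "*", not the full targeting-expression syntax.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ProfileSketch {
    // Expands a base profile into itself plus its declared sub-profiles.
    static Set<String> expand(Map<String, List<String>> profiles, String baseProfile) {
        Set<String> active = new LinkedHashSet<>();
        active.add(baseProfile);
        active.addAll(profiles.getOrDefault(baseProfile, Collections.emptyList()));
        return active;
    }

    // A resource applies if its target-profile is "*" or one of the active profiles.
    static boolean isTargeted(String targetProfile, Set<String> active) {
        return "*".equals(targetProfile) || active.contains(targetProfile);
    }
}
```

With the <profiles> above, expanding "message-exchange-1" yields message-exchange-1, message-producer-A and message-consumer-B, so resources (1) and (2) apply and resource (3) does not.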

For more examples, check out the Tutorials.

Java Based Configuration

TODO!!

Smooks Usage Patterns

This section describes a number of common use cases in which Smooks can be used.

XML to XML Transformations

Smooks provides a number of alternatives when performing XML to XML Transformations. Your choice will depend on the characteristics of the transformation to be performed. If you simply need to make a "tweak" on a single element in an XML document (e.g. modify a namespace attribute on an element), your choice will be different from that of a situation where you need to perform a major restructuring of the message and (for example) enrich the data in the message with additional data from a database or some other data source.

XSL Template Driven Transformations

Smooks allows you to perform fragment based XML to XML transformations via the XslContentHandlerFactory, which adds support for "xsl" resource types into Smooks.

As with everything in Smooks, your XSL Stylesheet is defined as a <resource-config> with the resource being either an inlined XSLT, or a URI reference to an external XSL Stylesheet. The configuration of an XSL Stylesheet resource is as follows:

 <resource-config selector="target-element">
     <resource type="xsl">XSL Resource - Inline or URI</resource>

     <!-- (Optional) The action to be applied on the template content. Should the content
          generated by the template:
          1. replace ("replace") the target element, or
          2. be added to ("addto") the target element, or
          3. be inserted before ("insertbefore") the target element, or
          4. be inserted after ("insertafter") the target element.
          5. be bound to ("bindto") an ExecutionContext variable named by the "bindId" param.
          Default "replace".-->
     <param name="action">replace/addto/insertbefore/insertafter/bindto</param>

     <!-- (Optional) Is this XSL template resource a complete XSL template, or just a "Templatelet".
          Only relevant for inlined XSL resources.  URI based resources are always assumed to NOT be templatelets.
          Default "false" (for inline resources).-->
     <param name="is-xslt-templatelet">true/false</param>

     <!-- (Optional) Should the template be applied before (true) or
             after (false) Smooks visits the child elements of the target element.
             Default "false".-->
     <param name="applyTemplateBefore">true/false</param>

     <!-- (Optional) The name of the OutputStreamResource
             to which the result should be written. If set, the "action" param is ignored. -->
     <param name="outputStreamResource">xyzResource</param>

     <!-- (Optional) Template encoding.
          Default "UTF-8".-->
     <param name="encoding">encoding</param>

     <!-- (Optional) bindId when "action" is "bindto". -->
     <param name="bindId">xxxx</param>

     <!-- (Optional) Fail on XSL Transformer Warning.
          Default "true".-->
     <param name="failOnWarning">false</param> <!-- Default "true" -->

 </resource-config>

By default, XSLTs are applied to a fragment after Smooks has already processed the child content of that fragment, with the default action being that of replacing the processed fragment with the result of the templating operation. Both of these behaviors can be overridden by configuration parameters (see configuration above).

Another action of note is the "bindto" action, which allows you to bind the templating result to the Smooks ExecutionContext under the key specified in the "bindId" <param>. This action is available in support of fragment routing via the Routing Cartridge.
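To make the four positional actions concrete, here is a minimal DOM sketch that places a generated node relative to the target element. TemplateActionSketch is hypothetical, not Smooks code, and "bindto" is left out because it needs an ExecutionContext.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class TemplateActionSketch {
    // Places the template output relative to the target element according to the action.
    static void apply(String action, Element target, Node generated) {
        Node parent = target.getParentNode();
        switch (action) {
            case "replace": parent.replaceChild(generated, target); break;
            case "addto": target.appendChild(generated); break;
            case "insertbefore": parent.insertBefore(generated, target); break;
            // insertBefore with a null reference node appends, so this also works for the last child.
            case "insertafter": parent.insertBefore(generated, target.getNextSibling()); break;
            default: throw new IllegalArgumentException("Unsupported action: " + action);
        }
    }

    // Compact structure dump, e.g. "a(b,x)".
    static String describe(Node n) {
        StringBuilder sb = new StringBuilder(n.getNodeName());
        NodeList kids = n.getChildNodes();
        if (kids.getLength() > 0) {
            sb.append('(');
            for (int i = 0; i < kids.getLength(); i++) {
                if (i > 0) sb.append(',');
                sb.append(describe(kids.item(i)));
            }
            sb.append(')');
        }
        return sb.toString();
    }

    // Applies an action to <b> inside <a>, using a generated <x/> as the "template output".
    static String demo(String action) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element a = doc.createElement("a");
            Element b = doc.createElement("b");
            doc.appendChild(a);
            a.appendChild(b);
            apply(action, b, doc.createElement("x"));
            return describe(a);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The resulting structures are a(x) for replace, a(b(x)) for addto, a(x,b) for insertbefore and a(b,x) for insertafter.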

Points to Note Regarding XSL Support

  1. XSL Templating is only supported through the DOM Filter. It is not supported through the SAX Filter. This can (depending on the XSL being applied) result in lower performance when compared to SAX based application of XSL.
  2. Smooks applies XSLs on a message fragment basis (i.e. DOM Element Nodes) Vs to the whole document (i.e. DOM Document Node). This can be very useful for fragmenting/modularizing your XSLs, but don't assume that an XSL written and working standalone (externally to Smooks and on the whole document) will automatically work through Smooks without modification. For this reason, Smooks does handle XSLs targeted at the document root node differently in that it applies the XSL to the DOM Document Node (Vs the root DOM Element). The basic point here is that if you already have XSLs and are porting them to Smooks, you may need to make some tweaks to the Stylesheet.
  3. XSLs typically contain a template matched to the root element. Because Smooks applies XSLs on a fragment basis, matching against the "root element" is no longer valid. You need to make sure the Stylesheet contains a template that matches against the context node (i.e. the targeted fragment).

My XSLT Works Outside Smooks, but not Inside?

This can happen and is most likely going to be a result of one of the following:

  1. The Fragment based Processing Model: Your Stylesheet contains a template that's using an absolute path reference to the document root node. This will cause issues in the Smooks Fragment based Processing Model because the element being targeted by Smooks is not the document root node. Your XSLT needs to contain a template that matches against the context node being targeted by Smooks. See the following example.
  2. SAX Vs DOM Processing: You are not comparing like with like. Smooks currently only supports a DOM based processing for XSL. In order to do an accurate comparison, you need to use a DOMSource (namespace aware) when executing the XSLT outside Smooks. It has been noticed that a given XSL Processor does not always produce the same output when applying a given XSLT using SAX or DOM.
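For the comparison recommended in point 2, the stylesheet can be run outside Smooks against a namespace-aware DOMSource using only the standard JAXP APIs. DomXslCheck is a hypothetical helper name; the calls it makes are plain javax.xml.transform and javax.xml.parsers.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomXslCheck {
    // Applies an XSLT to a namespace-aware DOM of the whole input document.
    static String transform(String xsl, String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // DocumentBuilderFactory is not namespace aware by default
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XSLT run failed", e);
        }
    }
}
```

Run with the tutorial stylesheet below on its sample message, this produces just <xxx/> (plus the XML declaration) in place of the outer <b>: outside Smooks nothing restricts the "b" template to <b> elements nested in <c>, whereas inside Smooks that context comes from the resource-config selector.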

Example



The primary goals of this example are to introduce you to the following:

  1. A very basic Fragment Transformer written in XSLT.
  2. The Smooks configuration file.
  3. Executing the Smooks Transformation.

 


Other Relevant Info:

 

To Build: "mvn clean install"

To Run: "mvn exec:java"

The Fragment Transformer

In this example we build a very simple (silly) fragment transformer in XSLT. The applied transform is exactly the same as that carried out in the "java-basic" tutorial. Sure, it's totally trivial. The point of this tutorial is purely to demonstrate how to hook a fragment transformer into Smooks. More realistic use cases can be seen in some of the other tutorials.

So here's the XSLT we use in this tutorial:

<!-- /example/BasicXslTransform.xsl -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
				version="1.0">

    <xsl:output method="xml" encoding="UTF-8" />

    <xsl:template match="b">
        <xxx></xxx>
    </xsl:template>

</xsl:stylesheet>

It simply generates an "<xxx></xxx>" fragment for the currently matching "b" context element.

The Smooks Configuration

In order to apply this transformer to a message fragment, a Smooks Configuration needs to be created. This configuration will target the transformer at a particular message fragment.

Here's the configuration ("smooks-config.xml"):

<?xml version="1.0"?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.0.xsd">

    <resource-config selector="c b">
	<resource>/example/BasicXslTransform.xsl</resource>
        <param name="is-xslt-templatelet">false</param>
    </resource-config>

</smooks-resource-list>

As with the java-basic tutorial, the resource-config tells Smooks to apply the "/example/BasicXslTransform.xsl" resource on all <b> elements that are enclosed by a parent <c> element.

So, taking the sample message supplied with this example:

<a>
    <b>
        <c>
            <b></b>
        </c>
    </b>
</a>

Smooks produces the following output (same as with the java-basic tutorial):

<a>
    <b>
        <c>
            <xxx></xxx>
        </c>
    </b>
</a>

Executing The Transformation

Again, it's exactly the same as with the java-basic tutorial:

// Instantiate Smooks with the config...
Smooks smooks = new Smooks("smooks-config.xml");

// Filter the input message to the outputWriter...
smooks.filter(new StreamSource(messageInStream), new StreamResult(messageOutStream));

Of course, you'd typically cache the Smooks instance.

See the example/Main.java in the example source.


FreeMarker Template Driven Transformations

The FreeMarker templating engine, in conjunction with the Java Binding capabilities of the Javabean Cartridge, can be used to transform messages into a range of different formats (XML, EDI, CSV etc). Basically, this works by merging a Java Object Model (POJO) with a FreeMarker template. This has the effect of making your transforms very clear to understand and visualize and would be our preferred method of Template based transforms.

Let's walk through an example to show what we mean. We begin with our Java Object Model, our POJO, which simulates a PatientRecord (getters and setters are left out):

import java.io.Serializable;

public class PatientRecord implements Serializable {
    private String name;
}

Next we look at our FreeMarker template (saved in a file named recordTemplate.ftl):

<record>
    <firstName>${patient.name}</firstName>
     ...
</record>

Here, "patient" refers to a beanId previously bound in the ExecutionContext. This could have been bound by a previous transformation or been set as the JavaSource input like this:

//    create the input for the transformation
Map<String, Object> beanMap = new HashMap<String, Object>();
JavaSource source = new JavaSource();
source.setBeans( beanMap );
beanMap.put( "patient", patientRecord );

//    create the transformation output result
StreamResult result = new StreamResult( new StringWriter() );

//    transform ("smooks" is a Smooks instance created from the configuration below)
ExecutionContext executionContext = smooks.createExecutionContext();
smooks.filter( source, result, executionContext );

The next step is to create a Smooks configuration:

<?xml version="1.0"?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.0.xsd">

    <resource-config selector="global-parameters">
        <param name="stream.filter.type">SAX</param>
    </resource-config>

    <resource-config selector="$document">
        <resource>recordTemplate.ftl</resource>
    </resource-config>

</smooks-resource-list>

In the example above, the Java objects used in the FreeMarker template are manually injected into the Smooks Execution context (via the BeanAccessor class). While this is a valid use case, it's not expected to be the normal usage pattern.

In most cases, the Java Object model should be populated from the Source message using the Java Binding capabilities of the Javabean Cartridge. Used in this way, the Javabean Cartridge populates a Java Object model with data from the source message, which is then applied to a FreeMarker template to produce output in the result (or to an OutputStreamResource - see Routing). We refer to this method of transformation as "Model Driven Transformation" and a good example of this approach can be seen in the model-driven-basic tutorial.

One downside to the Model Driven Transformation approach can be the requirement to create a Java Object Model. To help here, the Smooks Javabean Cartridge has support for what we call "Virtual Object Models", whereby normal java.util.Map implementations can be used to fill the role of a node in the model hierarchy. An example of this approach can be seen in the model-driven-basic-virtual tutorial.
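FreeMarker itself is out of scope for a self-contained example, so the sketch below uses a toy ${beanId.property} substitution (explicitly not FreeMarker) to show how a virtual model of nested java.util.Map instances can stand in for the PatientRecord POJO. VirtualModelTemplateSketch is a hypothetical name.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VirtualModelTemplateSketch {
    // Matches placeholders such as ${patient.name}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z_][\\w.]*)\\}");

    // Replaces each placeholder with the value found by walking the nested maps.
    static String render(String template, Map<String, Object> beans) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object value = resolve(beans, m.group(1).split("\\."));
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(value)));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Each path segment selects an entry of the current Map; non-Map nodes end the walk.
    static Object resolve(Map<String, Object> beans, String[] path) {
        Object current = beans;
        for (String key : path) {
            if (!(current instanceof Map)) return null;
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }
}
```

With a bean map holding "patient" -> {name=Joe}, rendering "<firstName>${patient.name}</firstName>" gives "<firstName>Joe</firstName>", with no PatientRecord class involved.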

Java or Groovy Transformations

Smooks always allows you to hook in low level logic, targeting specific events/fragments in the source data stream. This can be really useful for working around situations not covered out of the box by the Smooks Cartridges.

For more on this please look at the Tutorials.

EDI, CSV etc to XML Transformations

Processing non XML data streams is simply a matter of configuring Smooks with an XMLReader capable of parsing the data stream format in question. See the Stream Parsers documentation for more details on configuring parsers.

Once the stream parser is configured, the data "looks" like XML and can be processed in the same way as an XML stream. The HtmlReportGenerator can be hooked up to Smooks during development. This will make visualization of this process easier because it allows you to see the event stream produced by the configured parser.
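To illustrate the idea of a non XML parser exposed as an XMLReader, here is a minimal sketch of a CSV reader that emits SAX events, run through a JDK identity Transformer so the resulting event stream can be seen as XML. This is not the Smooks CSV cartridge: the class name SimpleCsvReader and the csv-set/csv-record element names are assumptions for illustration, and quoted fields are not handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.DTDHandler;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical CSV parser: each line becomes <csv-record> with one child element per field name.
class SimpleCsvReader implements XMLReader {
    private final String[] fields;
    private ContentHandler contentHandler;
    private DTDHandler dtdHandler;
    private EntityResolver entityResolver;
    private ErrorHandler errorHandler;

    SimpleCsvReader(String... fields) { this.fields = fields; }

    @Override
    public void parse(InputSource input) throws IOException, SAXException {
        Attributes none = new AttributesImpl();
        BufferedReader in = new BufferedReader(input.getCharacterStream());
        contentHandler.startDocument();
        start("csv-set", none);
        for (String line = in.readLine(); line != null; line = in.readLine()) {
            if (line.trim().isEmpty()) continue; // skip blank lines
            String[] values = line.split(",", -1);
            start("csv-record", none);
            for (int i = 0; i < fields.length && i < values.length; i++) {
                start(fields[i], none);
                char[] text = values[i].trim().toCharArray();
                contentHandler.characters(text, 0, text.length);
                contentHandler.endElement("", fields[i], fields[i]);
            }
            contentHandler.endElement("", "csv-record", "csv-record");
        }
        contentHandler.endElement("", "csv-set", "csv-set");
        contentHandler.endDocument();
    }

    private void start(String name, Attributes atts) throws SAXException {
        contentHandler.startElement("", name, name, atts);
    }

    @Override public void parse(String systemId) throws SAXException {
        throw new SAXException("Only character-stream InputSources are supported");
    }
    // Features and properties are accepted and ignored in this sketch.
    @Override public boolean getFeature(String name) { return false; }
    @Override public void setFeature(String name, boolean value) { }
    @Override public Object getProperty(String name) { return null; }
    @Override public void setProperty(String name, Object value) { }
    @Override public void setEntityResolver(EntityResolver r) { entityResolver = r; }
    @Override public EntityResolver getEntityResolver() { return entityResolver; }
    @Override public void setDTDHandler(DTDHandler h) { dtdHandler = h; }
    @Override public DTDHandler getDTDHandler() { return dtdHandler; }
    @Override public void setContentHandler(ContentHandler h) { contentHandler = h; }
    @Override public ContentHandler getContentHandler() { return contentHandler; }
    @Override public void setErrorHandler(ErrorHandler h) { errorHandler = h; }
    @Override public ErrorHandler getErrorHandler() { return errorHandler; }

    // Serializes the SAX event stream with an identity Transformer.
    static String toXml(String csv, String... fields) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new SAXSource(new SimpleCsvReader(fields),
                    new InputSource(new StringReader(csv))), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Any consumer of SAX events (here the identity Transformer, in Smooks the filter) sees the CSV data as if it were XML, which is the point made above.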

Java Binding

The Smooks JavaBean Cartridge allows you to create Java objects from your message data. You need to add the milyn-smooks-javabean-1.0.jar to your classpath. If you are using Maven then add the milyn:milyn-smooks-javabean:1.0 dependency to the POM.

Note: As you know, Smooks supports a range of source data formats (XML, EDI, CSV, Java etc), but for the purposes of this topic, we will always refer to the message data in terms of an XML format.

In the examples we will be referring a lot to the following XML message data:

<order>
    <header>
        <date>Wed Nov 15 13:45:28 EST 2006</date>
        <customer number="123123">Joe</customer>
    </header>
    <order-items>
        <order-item>
            <product>111</product>
            <quantity>2</quantity>
            <price>8.90</price>
        </order-item>
        <order-item>
            <product>222</product>
            <quantity>7</quantity>
            <price>5.20</price>
        </order-item>
    </order-items>
</order>

In some examples we will use different XML message data. Where this happens, the data is defined explicitly in that example.

The JavaBean Cartridge is used via the org.milyn.javabean.BeanPopulator class. The minimum resource-configuration for the BeanPopulator resource is as follows:

<resource-config selector="order">
    <resource>org.milyn.javabean.BeanPopulator</resource>
    <param name="beanClass">example.model.Order</param>
</resource-config>

This configuration will create an instance of the example.model.Order class when the start of the "order" element is encountered in the data event stream (the "visitBefore" is fired). The created bean is set in the bean map under the key "order". This key is called the  beanId. In this case the beanId is derived from the beanClass class name because we didn't configure it explicitly. You can configure the beanId by setting the "beanId" parameter.

<resource-config selector="order">
    <resource>org.milyn.javabean.BeanPopulator</resource>
    <param name="beanId">myOrder</param>
    <param name="beanClass">example.model.Order</param>
</resource-config>

This configuration will set the created Order class in the bean map under the key "myOrder".

The parameters of the BeanPopulator are:

  • beanClass: The class of the object that is created on the visitBefore event of the selected node.
  • beanId: The id under which the bean is put in the bean map.

Note: The parameters setOn, addToList, setOnMethod and setOnProperty are deprecated. These have been replaced by the bean wiring feature. The bean wiring feature is explained in the POJO Object Models chapter.

The bean map is a very important part of the JavaBean Cartridge. One bean map is created per execution context (i.e. per Smooks.filter operation). Every bean created by the cartridge is put into this map under its beanId. The map is managed by the org.milyn.javabean.BeanAccessor class. Normally you don't directly access this class. If you want to get to the beans at the end of the filter process, then you supply an org.milyn.delivery.java.JavaResult object in the call to the Smooks.filter method. The BeanAccessor shares the same map with the JavaResult. That enables you to extract the created beans from the bean map. The example below shows this principle:

//Get the data to filter
StreamSource source = new StreamSource(new InputStreamReader(getClass().getResourceAsStream("data.xml")));

//Create a Smooks instance (cachable)
Smooks smooks = new Smooks("smooks-config.xml");

//Create the JavaResult, which will contain the filter result after filtering
JavaResult result = new JavaResult();

//Filter the data from the source, putting the result into the JavaResult
smooks.filter(source, result);

//Getting the Order bean which was created by the JavaBean cartridge
Order order = (Order)result.getBean("order")

The Javabean cartridge has the following requirements for javabeans:

  1. A public no-argument constructor.
  2. Public property setter methods. They don't need to follow any specific name format, but it is better if they follow the standard property setter method naming.
  3. Setting javabean properties directly is not supported.
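To make these requirements concrete, here is a minimal plain-Java sketch (not Smooks code) of how a reflection-based binder could create a bean and call a standard-named setter. The `ReflectiveBinder` helper, the `Order` bean and its `customerName` property are hypothetical; the point is only that creation needs a public no-argument constructor and that a property name maps to a public `setXxx` method.

```java
import java.lang.reflect.Method;

// Hypothetical target bean: public no-arg constructor plus a public setter.
class Order {
    private String customerName;

    public Order() {
    }

    public void setCustomerName(String customerName) {
        this.customerName = customerName;
    }

    public String getCustomerName() {
        return customerName;
    }
}

// Illustrative helper (not part of Smooks) showing why the requirements matter.
class ReflectiveBinder {

    // Requirement 1: instantiate through the public no-argument constructor.
    static Object create(Class<?> beanClass) {
        try {
            return beanClass.getConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No usable public no-arg constructor: " + beanClass, e);
        }
    }

    // Standard setter naming: "customerName" -> "setCustomerName".
    static String setterName(String property) {
        return "set" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
    }

    // Requirement 2: set the value through a public setter, never through the field.
    // Simplification: the setter parameter type must match the value's class exactly.
    static void setProperty(Object bean, String property, Object value) {
        try {
            Method setter = bean.getClass().getMethod(setterName(property), value.getClass());
            setter.invoke(bean, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No public setter for property '" + property + "'", e);
        }
    }
}
```

A bean without a public no-argument constructor, or with only a private setter, would make this sketch fail with an `IllegalStateException`.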

Data binding

Because you probably want something more than an empty Java Object Model, we will now discuss how to get the data from the XML into your Java objects. The BeanPopulator resource configuration provides the bindings parameter for this purpose. The bindings parameter accepts binding nodes. A binding node defines the binding between an XML node or attribute and a property of the bean.

The binding node configuration attributes are as follows:

Attribute Description

selector The selector that selects the source event/fragment (just like the selector of the <resource-config>) or a bean wiring selector. The bean wiring selector value will be discussed in the Javabean wiring chapter.

property The name of the property to set. This only works if you use the standard setter name format; otherwise use the setterMethod attribute.

type The type of the value. The type defines how to decode the value from the raw data to the Java type of the property.

setterMethod The name of the method to call when setting the value. You can't use this attribute together with the property attribute.

default The default value if the selected node value or attribute is empty. This doesn't do anything if the node isn't present at all, because the node doesn't get visited.

selector-namespace The namespace of the selector.

The following example shows how three node values are bound with three bean properties:

<resource-config selector="order-item">
    <resource>org.milyn.javabean.BeanPopulator</resource>
    <param name="beanId">orderItem</param>
    <param name="beanClass">example.model.OrderItem</param>
    <param name="bindings">
        <binding property="productId" type="Long" selector="order-item/product" />
        <binding property="quantity" type="Integer" selector="order-item/quantity" />
        <binding property="price" type="Double" selector="order-item/price" />
    </param>
</resource-config>

The following things happen in the example above:

  1. When the Smooks filter encounters the start of an order-item element in the source XML (on the visitBefore event), it creates an instance of the example.model.OrderItem class and puts it in the bean map under the beanId orderItem.
  2. When the Smooks filter encounters the end of an order-item/product element in the source XML (on the visitAfter event), the text data in the element is decoded into a Long. The OrderItem bean under the beanId orderItem is retrieved from the bean map, and OrderItem#setProductId(Long) is called on it with the decoded Long value as the argument.
  3. When the Smooks filter encounters the end of an order-item/quantity element in the source XML (on the visitAfter event), the text data in the element is decoded into an Integer. The OrderItem bean under the beanId orderItem is retrieved from the bean map, and OrderItem#setQuantity(Integer) is called on it with the decoded Integer value as the argument.
  4. When the Smooks filter encounters the end of an order-item/price element in the source XML (on the visitAfter event), the text data in the element is decoded into a Double. The OrderItem bean under the beanId orderItem is retrieved from the bean map, and OrderItem#setPrice(Double) is called on it with the decoded Double value as the argument.
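The four steps above can be mimicked in plain Java to show the order of events. This is a hypothetical simulation, not the cartridge's implementation: `BindingSimulation` creates the bean on the `visitBefore` of `order-item`, then decodes each child's text on its `visitAfter` and calls the matching setter on the bean fetched from a bean map.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for example.model.OrderItem.
class OrderItem {
    private Long productId;
    private Integer quantity;
    private Double price;

    public void setProductId(Long productId) { this.productId = productId; }
    public void setQuantity(Integer quantity) { this.quantity = quantity; }
    public void setPrice(Double price) { this.price = price; }

    public Long getProductId() { return productId; }
    public Integer getQuantity() { return quantity; }
    public Double getPrice() { return price; }
}

// Hypothetical simulation of the binding events; not Smooks code.
class BindingSimulation {
    // One bean map per filter run, keyed by beanId.
    final Map<String, Object> beanMap = new HashMap<>();

    // visitBefore on <order-item>: create the bean and register it as "orderItem".
    void visitBeforeOrderItem() {
        beanMap.put("orderItem", new OrderItem());
    }

    // visitAfter on a child element: decode the text, fetch the bean, call the setter.
    void visitAfterChild(String element, String text) {
        OrderItem item = (OrderItem) beanMap.get("orderItem");
        switch (element) {
            case "product":  item.setProductId(Long.valueOf(text.trim())); break;
            case "quantity": item.setQuantity(Integer.valueOf(text.trim())); break;
            case "price":    item.setPrice(Double.valueOf(text.trim())); break;
            default:         break; // no binding configured for other elements
        }
    }
}
```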

Data decoders

The type attribute of the binding node uses the data decoding feature of Smooks. A data decoder is a class which implements the org.milyn.javabean.DataDecoder interface. The value of the type attribute is used to find the right data decoder. Smooks comes with a lot of standard data decoders, like the IntegerDecoder, LongDecoder, BooleanDecoder, BigDecimalDecoder, etc. You can find these in the org.milyn.javabean.decoders package. Some decoders, like the DateDecoder, need configuration. Because Smooks treats data decoders as resources, they're configured using the resource-config XML notation. Smooks reserves a special selector prefix for data decoders, namely decoder:[DecoderName]. You replace [DecoderName] with the name that best fits the type, and then use that name in the type attribute of the binding node. The following example shows what the configuration then looks like:

<resource-config selector="header">
    <resource>org.milyn.javabean.BeanPopulator</resource>
    <param name="beanClass">org.milyn.javabean.Header</param>
    <param name="bindings">
        <binding property="date" type="OrderDateLong" selector="header/date"/>
        <binding property="customerNumber" type="Long" selector="header/customer/@number" />
        <binding property="customerName" selector="header/customer" />
    </param>
</resource-config>

<resource-config selector="decoder:OrderDateLong">
    <resource>org.milyn.javabean.decoders.DateDecoder</resource>
    <param name="format">EEE MMM dd HH:mm:ss z yyyy</param>
    <param name="locale-language">en</param>
    <param name="locale-country">IE</param>
</resource-config>

The decoder:OrderDateLong resource-config configures a DateDecoder to decode a date based on a certain date format. OrderDateLong is then used as the type in the date binding of the header resource-config, so the javabean cartridge uses this configured decoder to decode the data from the header/date XML node.

For more information about data decoders, take a look at the Javadoc of the DataDecoder interface.
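As a rough illustration of what the configured DateDecoder does with its format, locale-language and locale-country parameters, the sketch below wraps `java.text.SimpleDateFormat`. The `ValueDecoder` interface and `ConfiguredDateDecoder` class are hypothetical stand-ins; they are not the real DataDecoder contract, which is documented in its Javadoc.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical decoder contract: raw text in, typed value out.
interface ValueDecoder<T> {
    T decode(String data);
}

// Sketch of a date decoder configured like the decoder:OrderDateLong resource above.
class ConfiguredDateDecoder implements ValueDecoder<Date> {
    private final String format;
    private final Locale locale;

    ConfiguredDateDecoder(String format, String language, String country) {
        this.format = format;
        this.locale = new Locale.Builder().setLanguage(language).setRegion(country).build();
    }

    @Override
    public Date decode(String data) {
        // SimpleDateFormat is not thread-safe, so this sketch creates one per call.
        SimpleDateFormat parser = new SimpleDateFormat(format, locale);
        try {
            return parser.parse(data.trim());
        } catch (ParseException e) {
            throw new IllegalArgumentException(
                    "Cannot decode '" + data + "' with format '" + format + "'", e);
        }
    }
}
```

Creating it with `new ConfiguredDateDecoder("EEE MMM dd HH:mm:ss z yyyy", "en", "IE")` mirrors the three parameters of the decoder:OrderDateLong resource-config.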

Javabean Wiring

In the previous chapters we looked at how to convert some XML data to a single Javabean object. But you probably want something more than one single big object. This chapter looks at how you can create a complete object model with the Javabean Cartridge using javabean wiring.

Javabean wiring makes it possible to wire beans in the bean map to each other. This is done via a special selector in the binding configuration. By using the ${beanId} notation, you tell the cartridge that you want to wire another bean, from the bean map, to that property. At the moment the visitor visits the selected node, the bean that you want to wire may or may not already exist. If the target bean doesn't exist yet, it will be wired as soon as it gets created later on in the filter process. This is called late wiring.

Here is an example of bean wiring:

<!-- The normal bindings are left out in this example -->
<resource-config selector="order">
    <resource>org.milyn.javabean.BeanPopulator</resource>
    <param name="beanId">order</param>
    <param name="beanClass">example.model.Order</param>
    <param name="bindings">
        <binding property="header" selector="${header}" />
    </param>
</resource-config>

<resource-config selector="header">
    <resource>org.milyn.javabean.BeanPopulator</resource>
    <param name="beanId">header</param>
    <param name="beanClass">example.model.Header</param>
    <param name="bindings">
        <binding property="order" selector="${order}" />
    </param>
</resource-config>

In this example the first resource configuration has one binding with the selector ${header}. That configuration means that it wants to wire another bean with the beanId header. The second resource configuration has the header beanId. That resource configuration has a binding of its own, but for the order beanId of the first configuration. You can guess what the end result will be: the Order object has a reference to the Header object and the Header object has a reference to the Order object.

If the bean with the targeted beanId already exists in the bean map, the setter method of the target property will be called only once. With late wiring, however, the setter method can be called multiple times, because the javabean cartridge calls it every time a bean with that beanId is created. This makes it possible to call an 'add object to collection' method. The following example shows this situation:

<resource-config selector="order-items">
    <resource>org.milyn.javabean.BeanPopulator</resource>
    <param name="beanId">orderItemList</param>
    <param name="beanClass">example.model.OrderItemList</param>
    <param name="bindings">
        <binding setterMethod="addOrderItem" selector="${orderItem}" />
    </param>
</resource-config>

<resource-config selector="order-item">
    <resource>org.milyn.javabean.BeanPopulator</resource>
    <param name="beanId">orderItem</param>
    <param name="beanClass">example.model.OrderItem</param>
    <param name="bindings">
        <!-- bindings -->
    </param>
</resource-config>

Every time the <order-item> node is visited, the created example.model.OrderItem object will be added to the example.model.OrderItemList by calling the OrderItemList#addOrderItem(OrderItem) method.
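The wiring rules just described (an existing target is set once, a missing target is wired every time it gets created) can be modelled with a small registry. This is a hypothetical plain-Java sketch, not the cartridge's internals: `WiringContext` stores a pending wiring per target beanId and fires it each time a bean with that id is added, which is why an 'add to collection' method such as `addOrderItem` ends up being called once per `<order-item>`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Stand-in for example.model.OrderItemList with an "add object to collection" method.
class OrderItemList {
    private final List<Object> items = new ArrayList<>();
    public void addOrderItem(Object item) { items.add(item); }
    public List<Object> getItems() { return items; }
}

// Hypothetical model of (late) wiring; not the real Smooks implementation.
class WiringContext {
    private final Map<String, Object> beanMap = new HashMap<>();
    private final Map<String, List<Consumer<Object>>> pending = new HashMap<>();

    // Register a wiring binding such as selector="${orderItem}".
    void wire(String targetBeanId, Consumer<Object> setter) {
        Object existing = beanMap.get(targetBeanId);
        if (existing != null) {
            setter.accept(existing); // target already exists: the setter is called once
        } else {
            // late wiring: remember the setter until a bean with this id appears
            pending.computeIfAbsent(targetBeanId, k -> new ArrayList<>()).add(setter);
        }
    }

    // Add (or replace) a bean and fire every late wiring waiting for its beanId.
    void addBean(String beanId, Object bean) {
        beanMap.put(beanId, bean);
        for (Consumer<Object> setter : pending.getOrDefault(beanId, Collections.emptyList())) {
            setter.accept(bean);
        }
    }
}
```

In this model, registering `list::addOrderItem` for "orderItem" before any item exists makes every later `addBean("orderItem", ...)` call append to the list.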

Bean wiring works not only for nested XML structures like this:

<root>
   <a>
      <b/>
      <b/>
   </a>
   <a>
      <b/>
      <b/>
   </a>
</root>

but also for flat XML structures like this:

<root>
   <a/>
   <b/>
   <b/>

   <a/>
   <b/>
   <b/>
</root>


You don't have to worry that the last two <b> elements of the previous example get wired to the first <a> node. The cartridge makes sure that the resource configuration of the first <a> node stops wiring new B objects as soon as the new A object, of the second <a> element, gets added to the bean map. If you want to know more about how this works, take a look at the Javabean wiring internals section of the Smooks Developer Guide.
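The observable effect can be reproduced with a tiny simulation over the flat event sequence a, b, b, a, b, b. This hypothetical sketch models only the outcome (each B is wired to the most recently created A); it is not the mechanism the cartridge uses internally.

```java
import java.util.ArrayList;
import java.util.List;

// Simple A bean that collects the B objects wired to it.
class ABean {
    final List<String> bs = new ArrayList<>();
}

// Hypothetical simulation: wiring always targets the latest A in the bean map.
class FlatWiringSimulation {
    static List<ABean> process(List<String> events) {
        List<ABean> created = new ArrayList<>();
        ABean current = null; // the A currently held in the bean map
        int bCount = 0;
        for (String event : events) {
            if (event.equals("a")) {
                // A new A replaces the old one, so the old A stops receiving B objects.
                current = new ABean();
                created.add(current);
            } else if (event.equals("b") && current != null) {
                current.bs.add("b" + (++bCount));
            }
        }
        return created;
    }
}
```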

With a javabean wiring binding, the rules for the other binding configuration attributes are a bit different:

Attribute Description

property If this attribute isn't set, the beanId of the target bean will be used as the property name.

type Defining the type attribute is not allowed.

default Defining a default attribute is not allowed.

Collections

It is possible to use java.util.Collection implementations directly as the configured bean class. The cartridge understands the Collection interface and will call the add method instead of the method configured in the binding configuration. The property and setterMethod attributes are ignored by the cartridge.

The following example shows the situation where a java.util.ArrayList is used to collect all the OrderItem beans of an order:

<resource-config selector="order-items">
    <resource>org.milyn.javabean.BeanPopulator</resource>
    <param name="beanId">orderItemList</param>
    <param name="beanClass">java.util.ArrayList</param>
    <param name="bindings">
        <binding selector="${orderItem}" />
    </param>
</resource-config>

<resource-config selector="order-item">
    <resource>org.milyn.javabean.BeanPopulator</resource>
    <param name="beanId">orderItem</param>
    <param name="beanClass">example.model.OrderItem</param>
    <param name="bindings">
        <!-- bindings -->
    </param>
</resource-config>

By using the bean wiring all the created OrderItems are added to the ArrayList.

If you wire a Collection bean to another bean, be aware that the wiring happens as soon as the collection object is created. The collection is therefore still empty when the setter method is called on the bean it is wired to.

Arrays

Besides Collection objects, it is also possible to use an object array as the configured bean class. As with the Collection classes, the property and setterMethod attributes of the binding configuration are ignored.

The following example shows the situation where an example.model.OrderItem array is used to collect all the OrderItem beans of an order:

<resource-config selector="order-items">
    <resource>org.milyn.javabean.BeanPopulator</resource>
    <param name="beanId">orderItemList</param>
    <param name="beanClass">example.model.OrderItem[]</param>
    <param name="bindings">
        <binding selector="${orderItem}" />
    </param>
</resource-config>

<resource-config selector="order-item">
    <resource>org.milyn.javabean.BeanPopulator</resource>
    <param name="beanId">orderItem</param>
    <param name="beanClass">example.model.OrderItem</param>
    <param name="bindings">
        <!-- bindings -->
    </param>
</resource-config>

You should try to avoid using arrays, except when you can't use the Collection classes or your own Collection-like beans. There are two reasons:

  • There is an overhead in using arrays, because the cartridge uses an ArrayList internally. At the visitAfter event of the resource configuration, the ArrayList is converted to the actual array.
  • Arrays aren't usable for all XML structures. Because the 'ArrayList to array' conversion happens at the visitAfter event, arrays only work for nested XML structures and not for flat XML structures. If you still try, you will end up with an empty array.
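Both drawbacks follow from the accumulate-then-convert approach. The hypothetical sketch below (plain Java, not cartridge code; Strings stand in for OrderItem beans) collects wired items in an ArrayList and only produces the array when visitAfter fires for the array bean's own element. In a flat structure that element closes before any item has been wired, so the array comes out empty.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an array-typed bean; not the real cartridge code.
class ArrayBean {
    private final List<String> buffer = new ArrayList<>(); // the internal ArrayList
    private String[] array = new String[0];                 // produced at visitAfter

    // Called for every wired item (e.g. each ${orderItem}).
    void add(String item) {
        buffer.add(item);
    }

    // visitAfter of the element that created the bean: snapshot the buffer as an array.
    // Items added after this point never reach the array.
    void visitAfter() {
        array = buffer.toArray(new String[0]);
    }

    String[] getArray() {
        return array;
    }
}
```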

Maps

The JavaBean cartridge also supports the java.util.Map class as the configured bean class. It will call the put method instead of the configured method in the binding configuration. The configured property attribute becomes the key of the map entry. The configured setterMethod attribute is ignored.

In the following example we show you how the 'order-item' values are stored in a map: 

<resource-config selector="order-item">
    <resource>org.milyn.javabean.BeanPopulator</resource>
    <param name="beanId">orderItemMap</param>
    <param name="beanClass">java.util.HashMap</param>
    <param name="bindings">
        <binding property="productId" type="Long" selector="order-item product" />
        <binding property="quantity" type="Integer" selector="order-item quantity" />
        <binding property="price" type="Double" selector="order-item price" />
    </param>
</resource-config>

If the property attribute of a binding isn't defined or is empty, the name of the selected node will be used as the map entry key. In the previous example we could remove the property attributes and still get the same result, because the node names are the same as the map entry key names. When wiring a bean, the beanId of the target bean is used as the key.

There is one other way to define the map key. The value of the property attribute can start with the @ character. The rest of the value then defines the name of the attribute of the selected node from which the cartridge gets the map key. The following example demonstrates this:

XML data:
<root>
   <property name="key1">value1</property>
   <property name="key2">value2</property>
   <property name="key3">value3</property>
</root>
Smooks configuration:
<resource-config selector="root">
   <resource>org.milyn.javabean.BeanPopulator</resource>
   <param name="beanClass">java.util.HashMap</param>
   <param name="bindings">
      <binding property="@name" selector="property" />
   </param>
</resource-config>

This creates a HashMap with three entries, with the keys key1, key2 and key3.

Of course, the @ character notation doesn't work for bean wiring. The cartridge will simply use the value of the property attribute, including the @ character, as the map entry key.
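The key-resolution rules above (explicit property, @attribute lookup, fallback to the node name, and the literal or beanId key for wiring bindings) can be summarised in a small helper. `MapKeyResolver` is a hypothetical name; this sketches the documented rules and is not the cartridge's code.

```java
import java.util.Map;

// Hypothetical summary of the map-key rules for java.util.Map bean classes.
class MapKeyResolver {

    /**
     * @param property     the binding's property attribute (may be null or empty)
     * @param nodeName     name of the selected node
     * @param attributes   attributes of the selected node
     * @param wiring       true for a ${beanId} wiring binding
     * @param targetBeanId beanId of the wired bean (only used when wiring)
     */
    static String resolveKey(String property, String nodeName, Map<String, String> attributes,
                             boolean wiring, String targetBeanId) {
        boolean unset = property == null || property.isEmpty();
        if (wiring) {
            // Wiring: the target beanId is the default key and '@' has no special meaning.
            return unset ? targetBeanId : property;
        }
        if (unset) {
            return nodeName; // fall back to the name of the selected node
        }
        if (property.startsWith("@")) {
            return attributes.get(property.substring(1)); // key read from the named attribute
        }
        return property; // explicit key
    }
}
```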

Virtual Object Models (Maps & Lists)

It is possible to create a complete object model without writing your own bean classes. This virtual model is created using only maps and lists. This is very convenient if you use the javabean cartridge between two processing steps, for example XML -> Java -> EDI.

The following example demonstrates the principle:

<resource-config selector="order">
    <resource>org.milyn.javabean.BeanPopulator</resource>
    <param name="beanId">order</param>
    <param name="beanClass">java.util.HashMap</param>
    <param name="bindings">
        <binding property="header" selector="${header}" />
        <binding property="orderItems" selector="${orderItems}" />
    </param>
</resource-config>

<resource-config selector="header">
    <resource>org.milyn.javabean.BeanPopulator</resource>
    <param name="beanId">header</param>
    <param name="beanClass">java.util.HashMap</param>
    <param name="bindings">
        <binding property="order" selector="${order}" />
        <binding property="date" type="OrderDateLong" selector="header date" />
        <binding property="customerNumber" type="Integer" selector="header customer @number" />
        <binding property="customerName" type="String" selector="header customer" />
    </param>
</resource-config>

<resource-config selector="order-items">
    <resource>org.milyn.javabean.BeanPopulator</resource>
    <param name="beanId">orderItems</param>
    <param name="beanClass">java.util.ArrayList</param>
    <param name="bindings">
        <binding selector="${orderItem}" />
    </param>
</resource-config>

<resource-config selector="order-item">
    <resource>org.milyn.javabean.BeanPopulator</resource>
    <param name="beanId">orderItem</param>
    <param name="beanClass">java.util.HashMap</param>
    <param name="bindings">
        <binding property="order" selector="${order}" />
        <binding property="productId" type="Long" selector="order-item product" />
        <binding property="quantity" type="Integer" selector="order-item quantity" />
        <binding property="price" type="Double" selector="order-item price" />
    </param>
</resource-config>

Take a look at the milyn/smooks-examples/xml-to-java-virtual example for a further illustration.
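Because the virtual model is just nested Map and List instances, consuming code navigates it with plain casts. The sketch below builds by hand the shape that the configuration above is expected to produce for a one-item order and reads values back. The sample values are invented, and the "order" back-references are omitted to keep the structure acyclic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hand-built stand-in for the virtual object model (sample values are invented).
class VirtualModelExample {

    static Map<String, Object> buildOrder() {
        Map<String, Object> header = new HashMap<>();
        header.put("customerNumber", 123);
        header.put("customerName", "Joe");

        Map<String, Object> orderItem = new HashMap<>();
        orderItem.put("productId", 111L);
        orderItem.put("quantity", 2);
        orderItem.put("price", 8.90);

        List<Object> orderItems = new ArrayList<>();
        orderItems.add(orderItem);

        Map<String, Object> order = new HashMap<>();
        order.put("header", header);
        order.put("orderItems", orderItems);
        return order;
    }

    // Navigate the untyped model: order -> orderItems[index] -> quantity.
    @SuppressWarnings("unchecked")
    static int quantityOf(Map<String, Object> order, int index) {
        List<Object> items = (List<Object>) order.get("orderItems");
        Map<String, Object> item = (Map<String, Object>) items.get(index);
        return (Integer) item.get("quantity");
    }

    // Navigate order -> header -> customerName.
    @SuppressWarnings("unchecked")
    static String customerName(Map<String, Object> order) {
        return (String) ((Map<String, Object>) order.get("header")).get("customerName");
    }
}
```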

Expression Based Bindings

Sometimes your Source message can represent the underlying data differently to the Java Object Model to which you are binding e.g. your Source Order message may contain "quantity" and "price", while your Target model represents this data as "total" (i.e. price * quantity).

You can handle this use case in Smooks by using an "Expression Based Binding". This feature allows you to set the binding value based on the result of an expression evaluation (on the bean context). It can therefore be used to perform all sorts of complex binding operations, including the one outlined above.

The following is an example of a config to merge field values together. Imagine you have a message like this:

<message>
    <time>17:45</time>
    <day>25</day>
    <month>12</month>
    <year>1999</year>
</message>

And you need to merge the date field values together and decode the merged value into the "date" (type: java.util.Date) property on your target bean. Here's the config snippet...

<!-- Concat and decode the date field values into the "date" property of the Message bean... -->
<resource-config selector="message">
    <resource>org.milyn.javabean.BeanPopulator</resource>
    <param name="beanId">message</param>
    <param name="beanClass">org.milyn.javabean.expressionbinding.Message</param>
    <param name="bindings">
        <binding property="date" type="MessageDate">
            messageDate.time + " " + messageDate.day + "/" + messageDate.month + "/" + messageDate.year
        </binding>
    </param>
</resource-config>

<!-- Capture the date field values into a Map.  Used above... -->
<resource-config selector="message">
    <resource>org.milyn.javabean.BeanPopulator</resource>
    <param name="beanId">messageDate</param>
    <param name="beanClass">java.util.HashMap</param>
    <param name="bindings">
        <binding property="time" selector="message/time" />
        <binding property="day" selector="message/day" />
        <binding property="month" selector="message/month" />
        <binding property="year" selector="message/year" />
    </param>
</resource-config>

<!-- Date decoder... -->
<resource-config selector="decoder:MessageDate">
    <resource>org.milyn.javabean.decoders.DateDecoder</resource>
    <param name="format">HH:mm dd/MM/yyyy</param>
</resource-config>
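In plain Java, the expression binding above amounts to concatenating the captured messageDate values and handing the result to a date decoder configured with the HH:mm dd/MM/yyyy pattern. The sketch below illustrates that reading only; it is not how Smooks evaluates the expression, and `ExpressionBindingSketch` is a hypothetical name.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;

// Hypothetical equivalent of the "date" expression binding; not Smooks code.
class ExpressionBindingSketch {

    // Same concatenation as the binding expression: time + " " + day + "/" + month + "/" + year.
    static String mergeDateFields(Map<String, String> messageDate) {
        return messageDate.get("time") + " " + messageDate.get("day") + "/"
                + messageDate.get("month") + "/" + messageDate.get("year");
    }

    // Same pattern as the decoder:MessageDate resource (default time zone).
    static Date decode(String merged) {
        try {
            return new SimpleDateFormat("HH:mm dd/MM/yyyy").parse(merged);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + merged, e);
        }
    }
}
```

For the sample message, `mergeDateFields` yields "17:45 25/12/1999", which the decoder turns into 25 December 1999, 17:45.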

Merging Multiple Data Entities Into a Single Binding

This can be achieved using Expression Based Bindings (previous section).

Generating the Smooks Binding Configuration

The Javabean Cartridge contains the org.milyn.javabean.gen.ConfigGenerator utility class that can be used to generate a binding configuration template. This template can then be used as the basis for defining a binding.

From the commandline:

$JAVA_HOME/bin/java -classpath <classpath> org.milyn.javabean.gen.ConfigGenerator -c <rootBeanClass> -o <outputFilePath> [-p <propertiesFilePath>]
  • The "-c" commandline arg specifies the root class of the model whose binding config is to be generated.
  • The "-o" commandline arg specifies the path and filename for the generated config output.
  • The "-p" commandline arg specifies the path and filename of an optional binding configuration file that specifies additional binding parameters.



The optional "-p" properties file parameter allows specification of additional config parameters:

  • packages.included: Semicolon-separated list of packages scoping classes to be included in the binding generation.
  • packages.excluded: Semicolon-separated list of packages scoping classes to be excluded from the binding generation.



After running this utility against the target class, you typically need to perform the following follow-up tasks in order to make the binding configuration work for your Source data model.

  1. For each <resource-config>, set the selector to the event element against which the bean instance should be created, i.e. the element with which the bean's lifecycle should be associated.
  2. Update the selector on each <binding> to select the event element/attribute supplying the binding data for that bean property.
  3. Check the type attribute on the bindings. Depending on the actual property type, not all of them will be set. These must be configured by hand, e.g. you may need to configure a custom decoder for the field (e.g. for date fields).
  4. Double check that <binding> config template elements have been added for all relevant properties. Check this against the actual code.



Determining the selector values can sometimes be difficult, especially for non XML Sources (Java etc). The Html Reporting tool can be a great help here because it helps you visualise the input message model (against which the selectors will be applied) as seen by Smooks. So, first off, generate a report using your Source data, but with an empty transformation configuration. In the report, you can see the model against which you need to add your configurations. Add the configurations one at a time, rerunning the report to check they are being applied.

Model Driven Transformations

Java to Java Transformations

Smooks can transform one Java object graph to another Java object graph. For this transformation Smooks uses the SAX processing model, which means no intermediate object model is constructed for populating the target Java object graph. Instead, we go straight from the source Java object graph to a stream of SAX events, which are used to populate the target Java object graph.

Source and Target Object Models

The required mappings from the source to target Object models are as follows:

Source Model Event Stream

Using the Html Reporting tool we can see that the SAX Event Stream produced by the source Object Model is as follows:

<example.srcmodel.Order>
    <header>
        <customerNumber>
        </customerNumber>
        <customerName>
        </customerName>
    </header>
    <orderItems>
        <example.srcmodel.OrderItem>
            <productId>
            </productId>
            <quantity>
            </quantity>
            <price>
            </price>
        </example.srcmodel.OrderItem>
    </orderItems>
</example.srcmodel.Order>

So we need to target the Smooks Javabean resources at this event stream. This is shown in the Smooks Configuration.

Smooks Configuration

The Smooks configuration for performing this transform ("smooks-config.xml") is as follows (see the Source Model Event Stream above):

<?xml version="1.0"?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.0.xsd">

    <resource-config selector="global-parameters">
        <param name="stream.filter.type">SAX</param>
    </resource-config>

    <resource-config selector="example.srcmodel.Order">
        <resource>org.milyn.javabean.BeanPopulator</resource>
        <param name="beanId">lineOrder</param>
        <param name="beanClass">example.trgmodel.LineOrder</param>
        <param name="bindings">
            <binding property="customerId" selector="header/customerNumber" />
            <binding property="customerName" selector="header/customerName" />
            <binding property="lineItems" selector="${lineItems}" />
        </param>
    </resource-config>

    <resource-config selector="orderItems">
        <resource>org.milyn.javabean.BeanPopulator</resource>
        <param name="beanId">lineItems</param>
        <param name="beanClass">example.trgmodel.LineItem[]</param>
        <param name="bindings">
            <binding selector="${lineItem}" />
        </param>
    </resource-config>

    <resource-config selector="example.srcmodel.OrderItem">
        <resource>org.milyn.javabean.BeanPopulator</resource>
        <param name="beanId">lineItem</param>
        <param name="beanClass">example.trgmodel.LineItem</param>
        <param name="bindings">
            <binding property="productCode" selector="example.srcmodel.OrderItem/productId" />
            <binding property="unitQuantity" type="Integer" selector="example.srcmodel.OrderItem/quantity" />
            <binding property="unitPrice" type="BigDecimal" selector="example.srcmodel.OrderItem/price" />
        </param>
    </resource-config>

</smooks-resource-list>
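The bindings above assume a target model roughly like the following. These classes are a hypothetical reconstruction from the configuration (property names from the property attributes, Integer and BigDecimal from the type attributes, String assumed for the untyped bindings), shown only to make clear which public setters the Javabean Cartridge needs.

```java
import java.math.BigDecimal;

// Hypothetical example.trgmodel.LineItem, inferred from the binding config.
class LineItem {
    private String productCode;
    private Integer unitQuantity;
    private BigDecimal unitPrice;

    public LineItem() { }

    public void setProductCode(String productCode) { this.productCode = productCode; }
    public void setUnitQuantity(Integer unitQuantity) { this.unitQuantity = unitQuantity; }
    public void setUnitPrice(BigDecimal unitPrice) { this.unitPrice = unitPrice; }

    public String getProductCode() { return productCode; }
    public Integer getUnitQuantity() { return unitQuantity; }
    public BigDecimal getUnitPrice() { return unitPrice; }
}

// Hypothetical example.trgmodel.LineOrder; "lineItems" is wired from the LineItem[] bean.
class LineOrder {
    private String customerId;
    private String customerName;
    private LineItem[] lineItems;

    public LineOrder() { }

    public void setCustomerId(String customerId) { this.customerId = customerId; }
    public void setCustomerName(String customerName) { this.customerName = customerName; }
    public void setLineItems(LineItem[] lineItems) { this.lineItems = lineItems; }

    public String getCustomerId() { return customerId; }
    public String getCustomerName() { return customerName; }
    public LineItem[] getLineItems() { return lineItems; }
}
```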

Smooks Execution

The source object model is provided to Smooks via an org.milyn.delivery.JavaSource object. This object is created by passing the root object of the source model to its constructor. The resulting JavaSource object is then used in the Smooks#filter method. The resulting code could look as follows:

protected LineOrder runSmooksTransform(Order srcOrder) throws IOException, SAXException {
    Smooks smooks = new Smooks("smooks-config.xml");
    ExecutionContext executionContext = smooks.createExecutionContext();

    // Transform the source Order to the target LineOrder via a
    // JavaSource and JavaResult instance...
    JavaSource source = new JavaSource(srcOrder);
    JavaResult result = new JavaResult();

    // Configure the execution context to generate a report...
    executionContext.setEventListener(new HtmlReportGenerator("target/report/report.html"));

    smooks.filter(source, result, executionContext);

    return (LineOrder) result.getBean("lineOrder");
}


Processing Huge Messages (GBs)

One of the main features introduced in Smooks v1.0 is the ability to process huge messages (GBs in size). Smooks supports the following types of processing for huge messages:

  1. One-to-One Transformation: This is the process of transforming a huge message from its source format (e.g. XML) to a huge message in a target format, e.g. EDI, CSV, XML etc.
  2. Splitting & Routing: Splitting a huge message into smaller (more consumable) messages in any format (EDI, XML, Java etc.) and routing those smaller messages to a number of different destination types (File, JMS, Database).
  3. Persistence: Persisting the components of the huge message to a Database, from where they can be more easily queried and processed. Within Smooks, we consider this to be a form of Splitting and Routing (routing to a Database).

With Smooks v1.0, most of the above can be done without writing any code (i.e. in a declarative manner). Traditionally, each of these types of processing required quite a bit of ugly, unmaintainable code. It might also have been implemented as a multi-stage process, in which the huge message is first split into smaller messages (stage #1) and each smaller message is then processed in turn to persist it, route it, etc. (stage #2), all in an effort to make that code a little more maintainable and reusable. With Smooks v1.0, these use cases can also be handled in a single pass over the source message, splitting and routing in parallel (including routing to multiple destinations of different types and in different formats).

Performance Hint

When processing huge messages with Smooks, make sure you are using the SAX filter.

One-to-One Transformation

If the requirement is to process a huge message by transforming it into a single message of another format, the easiest mechanism with Smooks is to apply multiple templating operations (using XSL, FreeMarker etc.) to the Source message Event Stream, outputting to the Result stream.

The following diagram illustrates the process involved when using FreeMarker as the templating technology. The assumption here is that you're familiar with the Javabean Cartridge and how it can be used to populate the Object Models used in the FreeMarker templates (including how to define Virtual Object Models).



So in this example, you use the Source events up to the end of the header to gather <header> information from the source message and populate a Virtual Object Model, which is then used in the FreeMarker template that's triggered by the <header> element's visitAfter event. The same basic process is used for transforming the <order-items>. At the end, a static template is applied to complete the Result message.

This approach to performing a One-to-One Transformation of a huge message works simply because the only objects in memory at any one time are the order header details and the current <order-item> details (in the Virtual Object Model). Obviously it can't work if the transformation always requires full access to all the data in the source message, e.g. if the message needs to have all the order items reversed in order (or sorted). In such a case, however, you do have the option of routing the order details and items to a database and then using the database's storage, query and paging features to perform the transformation.
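The memory argument above is not specific to Smooks. As a point of comparison, the following plain StAX sketch (javax.xml.stream, part of the JDK; this is not how Smooks is implemented) streams an order message and emits one output line per `<order-item>` while holding only the header values and the current item. The element names follow the order examples in this guide; the comma-separated output format is invented for illustration.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Plain StAX comparison (not Smooks): only header values plus the current item are held.
class StreamingOrderTransform {

    static String transform(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                return process(reader);
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not stream the order message", e);
        }
    }

    private static String process(XMLStreamReader reader) throws XMLStreamException {
        StringBuilder out = new StringBuilder();      // stands in for the Result stream
        Map<String, String> header = new HashMap<>(); // kept for the whole message
        Map<String, String> item = null;              // replaced for every <order-item>
        boolean inHeader = false;
        StringBuilder text = new StringBuilder();

        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = reader.getLocalName();
                if (name.equals("header")) {
                    inHeader = true;
                } else if (name.equals("order-item")) {
                    item = new HashMap<>();
                }
                text.setLength(0);
            } else if (event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA) {
                text.append(reader.getText());
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                String name = reader.getLocalName();
                if (name.equals("header")) {
                    inHeader = false;
                } else if (name.equals("order-item")) {
                    // Emit the finished item, then drop it so it can be garbage collected.
                    out.append(header.get("customer")).append(',')
                       .append(item.get("product")).append(',')
                       .append(item.get("quantity")).append('\n');
                    item = null;
                } else if (inHeader) {
                    header.put(name, text.toString().trim());
                } else if (item != null) {
                    item.put(name, text.toString().trim());
                }
                text.setLength(0);
            }
        }
        return out.toString();
    }
}
```

In a real huge-message scenario the output would go to a Writer rather than a StringBuilder, so that the result is not accumulated in memory either.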

Splitting & Routing

Another common approach to processing large/huge messages is to split them into smaller messages that can be processed independently. Of course, Splitting and Routing is not just a solution for processing huge messages. It's often needed with smaller messages too (message size may be irrelevant) where, for example, order items in an order message need to be split out and routed (based on content, i.e. "Content Based Routing") to different departments or partners for processing. Under these conditions, the message formats required at the different destinations may also vary, e.g.:

  • "destination1" requires XML via the file system,
  • "destination2" requires Java objects via a JMS Queue,
  • "destination3" picks the messages up from a table in a Database,
  • "destination4" requires EDI messages via a JMS Queue,
  • etc.

With Smooks v1.0, all of the above is possible. You can perform multiple splitting and routing operations to multiple destinations (of different types) in a single pass over a message.

The key to processing huge messages is to make sure that you always maintain a small memory footprint. You can do this using the Javabean Cartridge by making sure you're only binding the most relevant message data (into the bean context) at any one time. In the following sections, the examples are all based on splitting and routing of order-items out of an order message. The solutions shown all work for huge messages because the Smooks Javabean Cartridge binding configurations are implemented such that the only data held in memory at any given time is the main order details (order header etc) and the "current" order item details.

Complex splitting operations are supported through use of the Javabean Cartridge to extract the data for the split messages. In this way, you can extract and recombine data from different sub-hierarchies of the source message to produce the split messages. It also means you can (through the use of templating) easily generate the split messages in a range of different formats. More on this later.

Routing to File

The example outlined in this section illustrates how you can combine the following Smooks functionality to split a message out into smaller messages on the file system.

  1. The Javabean Cartridge for extracting data from the message and holding it in variables in the bean context.
  2. The FileOutputStreamResource for managing file system streams (naming, opening, closing, throttling creation etc).
  3. FreeMarker Templating for generating the individual split messages from data bound in the bean context and writing them to the FileOutputStreamResource.

In the example, we want to process a huge order message and route the individual order item details to file. The following illustrates what we want to achieve. As you can see, the split messages don't just contain data from the order item fragments. They also contain data from the order header and root elements.

To achieve this with Smooks, we assemble the following solution.

Smooks Resource configurations #1 and #2 define the Java Bindings for extracting the order header information (config #1) and the order-item information (config #2). This is the key to processing a huge message: making sure that we only have the current order item in memory at any one time. The Smooks Javabean Cartridge manages all this for you, creating and recreating the orderItem bean as each <order-item> fragment is processed.

Smooks Resource configuration #3 manages the generation of the files on the file system. As you can see from the configuration, the file names can be dynamically constructed from data in the bean context. You can also see that it can throttle the creation of the files via the "highWaterMark" configuration parameter. This helps you manage file creation so as not to overwhelm the target file system.
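One plausible reading of this kind of throttling is: before opening a new file, wait while the number of files already in the output directory is at or above the high water mark (giving a downstream consumer time to pick files up), and give up after a timeout. The sketch below implements that reading in plain Java. The ThrottledFileCreator class and its pollMillis/timeoutMillis parameters are hypothetical, and nothing here is taken from the FileOutputStreamResource implementation; the caller is assumed to have already built the file name from bean context data.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of "high water mark" throttling for file creation;
// it is NOT the Smooks FileOutputStreamResource implementation.
final class ThrottledFileCreator {
    private final Path directory;
    private final int highWaterMark;  // maximum number of files allowed in the directory
    private final long pollMillis;    // how often to re-check the directory while blocked
    private final long timeoutMillis; // how long to wait before giving up

    ThrottledFileCreator(Path directory, int highWaterMark, long pollMillis, long timeoutMillis) {
        if (highWaterMark < 1) {
            throw new IllegalArgumentException("highWaterMark must be at least 1");
        }
        this.directory = directory;
        this.highWaterMark = highWaterMark;
        this.pollMillis = pollMillis;
        this.timeoutMillis = timeoutMillis;
    }

    private int countFiles() throws IOException {
        int count = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(directory)) {
            for (Path ignored : entries) {
                count++;
            }
        }
        return count;
    }

    // Blocks while the directory is "full", then creates (or truncates) the named file.
    OutputStream open(String fileName) throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (countFiles() >= highWaterMark) {
            if (System.currentTimeMillis() >= deadline) {
                throw new IOException("High water mark of " + highWaterMark + " files reached in " + directory);
            }
            Thread.sleep(pollMillis);
        }
        return Files.newOutputStream(directory.resolve(fileName));
    }
}
```

In a real deployment another process would consume (move or delete) the split files, so a blocked open() would normally resume rather than time out.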

Smooks Resource configuration #4 defines the FreeMarker templating resource used to write the split messages to the OutputStream created by the FileOutputStreamResource (config #3). Note how config #4 references the FileOutputStreamResource defined in config #3.

Routing to JMS

Routing to a JMS Destination is performed using the JMSRouter Visitor implementation. The following is an example of a JMSRouter configuration that routes an "orderItem" bean to a JMS Queue named "/queue/smooksRouterQueue".

<resource-config selector="order-item">
    <resource>org.milyn.routing.jms.JMSRouter</resource>
    <param name="beanId">orderItem</param>
    <param name="destinationName">/queue/smooksRouterQueue</param>
    <param name="correlationIdPattern">${order.orderId}-${order.orderItem.itemId}</param>
    <param name="highWaterMark">50</param>
</resource-config>
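The "correlationIdPattern" parameter builds each message's correlation ID from bean context data. The resolver below is a simplified, hypothetical stand-in that shows the effect of such a pattern: each ${a.b.c} token is replaced by walking a nested Map of beans. It assumes the bean context can be viewed as nested maps and that an unresolved path is an error; Smooks itself evaluates these patterns with its own expression/templating support, which this sketch does not reproduce.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of ${...} pattern resolution against a bean context; NOT Smooks API.
final class PatternResolver {
    private static final Pattern TOKEN = Pattern.compile("\\$\\{([^}]+)\\}");

    static String resolve(String pattern, Map<String, ?> beanContext) {
        Matcher m = TOKEN.matcher(pattern);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object value = lookup(m.group(1).trim(), beanContext);
            // quoteReplacement stops '$' or '\' in the value being treated as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(value)));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Walks a dotted path such as "order.orderItem.itemId" through nested maps.
    private static Object lookup(String path, Map<String, ?> context) {
        Object current = context;
        for (String part : path.split("\\.")) {
            if (!(current instanceof Map)) {
                throw new IllegalArgumentException("Cannot resolve '" + path + "'");
            }
            current = ((Map<?, ?>) current).get(part);
        }
        if (current == null) {
            throw new IllegalArgumentException("Unresolved '" + path + "'");
        }
        return current;
    }
}
```

With an "order" bean whose orderId is "1001" and whose orderItem has itemId "7", the pattern above resolves to "1001-7".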

The "orderItem" bean can be a full Object Model, in which case it's routed as a serialized ObjectMessage. It can also be the result of a templating operation that was bound into the bean context using the "bindto" templating action (for an example, see the FreeMarker Templating Javadocs). Using the Templating "bindto" action, you can route text-based messages.
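A JMS ObjectMessage carries a java.io.Serializable payload, so a bean routed this way must be serializable (and the consumer needs the bean's class on its classpath). The sketch below is a hypothetical unit-test style check that a bean survives a Java serialization round trip; the OrderItemBean class and its fields are illustrations only, and no JMS provider is involved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical order item bean; implementing Serializable is what allows it to be an ObjectMessage payload.
final class OrderItemBean implements Serializable {
    private static final long serialVersionUID = 1L;
    final String itemId;
    final int quantity;

    OrderItemBean(String itemId, int quantity) {
        this.itemId = itemId;
        this.quantity = quantity;
    }
}

// Round-trips an object through Java serialization, as a consumer of an ObjectMessage would see it.
final class SerializationCheck {
    static byte[] toBytes(Serializable bean) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(bean);
        }
        return bytes.toByteArray();
    }

    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```

If any field of the bean (or of an object it references) is not serializable, toBytes() fails with a NotSerializableException, which is a cheap way to catch the problem before it shows up at routing time.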

See the JMSRouter documentation for more details. Also check out the jms-routing tutorial.

Routing to a Database

Routing to a Database is also quite easy. Please read the "Routing to File" section above before reading this section.

We take the same scenario as in the File Routing example above, but this time we want to route the order and order item data to a Database. This is what we want to achieve:

To achieve this with Smooks, we assemble the following solution.

The main points of note with this example are:

  1. How to configure a DataSource resource (config #1).
  2. How to reference and use the DataSource resource from the SQLExecutor resource configurations (configs #2, #3 and #4).

For more on the SQLExecutor, please read the Persistence section.

Check out the db-extract-transform-load example.

Message Splitting & Routing

Please refer to the Splitting & Routing section above.

Persistence (Database Reading and Writing)

The SQLExecutor Visitor class can be used to perform event-driven read and write operations on a DataSource, using data in the bean context as the query/update parameters.
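How bean context data becomes query parameters is not spelled out here. The sketch below shows one safe way a statement such as the "select ORDERNUMBER from ORDERS where ORDERNUMBER = ${order.orderNum}" in the configuration example below could be bound: each ${...} token is replaced by a JDBC "?" placeholder and the looked-up value is passed to a PreparedStatement, so values are never spliced into the SQL text. The BoundStatement class is hypothetical and this is an assumption about the technique, not a description of SQLExecutor internals.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: turn ${...} tokens into JDBC '?' placeholders plus ordered parameters.
final class BoundStatement {
    private static final Pattern TOKEN = Pattern.compile("\\$\\{([^}]+)\\}");

    final String sql;
    final List<Object> parameters;

    private BoundStatement(String sql, List<Object> parameters) {
        this.sql = sql;
        this.parameters = parameters;
    }

    static BoundStatement bind(String statement, Map<String, ?> beanContext) {
        Matcher m = TOKEN.matcher(statement);
        StringBuffer sql = new StringBuffer();
        List<Object> params = new ArrayList<>();
        while (m.find()) {
            params.add(lookup(m.group(1).trim(), beanContext));
            m.appendReplacement(sql, "?");
        }
        m.appendTail(sql);
        return new BoundStatement(sql.toString(), Collections.unmodifiableList(params));
    }

    // Creates a PreparedStatement with the parameters bound in order (JDBC indexes are 1-based).
    PreparedStatement prepare(Connection connection) throws SQLException {
        PreparedStatement ps = connection.prepareStatement(sql);
        try {
            for (int i = 0; i < parameters.size(); i++) {
                ps.setObject(i + 1, parameters.get(i));
            }
            return ps;
        } catch (SQLException e) {
            ps.close();
            throw e;
        }
    }

    // Walks a dotted path such as "order.orderNum" through nested maps.
    private static Object lookup(String path, Map<String, ?> context) {
        Object current = context;
        for (String part : path.split("\\.")) {
            if (!(current instanceof Map)) {
                throw new IllegalArgumentException("Cannot resolve '" + path + "'");
            }
            current = ((Map<?, ?>) current).get(part);
        }
        if (current == null) {
            throw new IllegalArgumentException("Unresolved '" + path + "'");
        }
        return current;
    }
}
```

Binding values as parameters, rather than substituting them into the SQL string, avoids quoting problems and SQL injection when message data contains characters such as quotes.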

Configuration Example
<resource-config selector="customer-details">
    <resource>org.milyn.routing.db.SQLExecutor</resource>
    <param name="executeBefore">true</param>
    <param name="datasource">DBExtractTransformLoadDS</param>
    <param name="statement">select ORDERNUMBER from ORDERS where ORDERNUMBER = ${order.orderNum}</param>
    <param name="resultSetName">orderExistsRS</param>
</resource-config>
Parameter Description

datasource The name of the datasource configuration to use. See datasource section b