The Spring Batch 2.0 release has six major themes:
- Java 5 - The 1.x.x releases of Spring Batch were all based on Java 1.4. This prevented the framework from using many of the enhancements introduced in Java 5, such as generics and parameterized types. The entire framework has been updated to use these features; as a result, Java 1.4 is no longer supported. Most of the interfaces developers work with have been updated to support generic types. As an example, the ItemReader interface from 1.1 is below:
    public interface ItemReader {
        Object read() throws Exception;
        void mark() throws MarkFailedException;
        void reset() throws ResetFailedException;
    }
The same interface in version 2.0, along with the updated ItemWriter, is given below:
    public interface ItemReader<T> {
        T read() throws Exception, UnexpectedInputException, ParseException;
    }

    public interface ItemWriter<T> {
        void write(List<? extends T> items) throws Exception;
    }
As you can see, ItemReader now supports the generic type, T, which is returned from read. You may also notice that mark and reset have been removed. This is due to step processing strategy changes, which are discussed below. Many other interfaces have been similarly updated.
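To make the benefit of the generic signature concrete, here is a minimal sketch of a list-backed reader. It assumes a locally redeclared ItemReader<T> (with a simplified throws clause) so that it compiles without Spring Batch on the classpath; GenericReaderDemo and ListBackedReader are hypothetical names chosen for illustration. It follows the Spring Batch convention that read returns null once the input is exhausted.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class GenericReaderDemo {

    // Local stand-in for the 2.0 interface shown above (throws clause simplified).
    interface ItemReader<T> {
        T read() throws Exception;
    }

    // Hypothetical reader that serves items from an in-memory list.
    static final class ListBackedReader<T> implements ItemReader<T> {
        private final Iterator<T> iterator;

        ListBackedReader(List<T> items) {
            this.iterator = items.iterator();
        }

        @Override
        public T read() {
            // Returning null signals that the input is exhausted.
            return iterator.hasNext() ? iterator.next() : null;
        }
    }

    public static void main(String[] args) throws Exception {
        ItemReader<String> reader =
                new ListBackedReader<String>(Arrays.asList("alice", "bob"));
        // No cast is needed: the compiler knows read() returns a String.
        for (String name = reader.read(); name != null; name = reader.read()) {
            System.out.println(name.toUpperCase());
        }
    }
}
```

With the 1.1 interface, the loop body would have needed a cast from Object to String before calling toUpperCase.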
- Non-Sequential Step Execution - Version 2.0 also improves how steps can be configured. Rather than having to run strictly in sequence, steps may now be executed conditionally, depending on the outcome of an earlier step.
This new ‘conditional flow’ support is made easy to configure via the new namespace:
    <job id="job">
        <step id="stepA">
            <next on="FAILED" to="stepB"></next>
            <next on="*" to="stepC"></next>
        </step>
        <step id="stepB" next="stepC"></step>
        <step id="stepC"></step>
    </job>
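The transitions declared for stepA can be read as a lookup on the step's exit status: an exact match wins, and "*" acts as the fallback. The sketch below models only that lookup in plain Java. FlowSketch, stepATransitions and nextStep are hypothetical names, and this is a simplified model for illustration, not how Spring Batch implements flows internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FlowSketch {

    // Transitions declared for stepA in the XML above: pattern -> target step.
    static Map<String, String> stepATransitions() {
        Map<String, String> transitions = new LinkedHashMap<String, String>();
        transitions.put("FAILED", "stepB");
        transitions.put("*", "stepC");
        return transitions;
    }

    // Simplified resolution: prefer an exact match, otherwise use the "*" fallback.
    static String nextStep(Map<String, String> transitions, String exitStatus) {
        String exact = transitions.get(exitStatus);
        return exact != null ? exact : transitions.get("*");
    }

    public static void main(String[] args) {
        Map<String, String> transitions = stepATransitions();
        System.out.println(nextStep(transitions, "FAILED"));    // stepB
        System.out.println(nextStep(transitions, "COMPLETED")); // stepC
    }
}
```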
- Chunk-Oriented Processing - In version 1.x.x, the default processing strategy was item-oriented. In item-oriented processing, the ItemReader returns one Object (the ‘item’), which is then handed to the ItemWriter, with a commit taking place each time the number of items reaches the commit interval. For example, if the commit interval is 5, ItemReader and ItemWriter will each be called 5 times. This is illustrated in a simplified code example below:
    for (int i = 0; i < commitInterval; i++) {
        Object item = itemReader.read();
        itemWriter.write(item);
    }
In 2.x.x, this strategy has been changed to a chunk-oriented approach:
Using the same example as above, if the commit interval is 5, read will be called 5 times and write once. The items read are aggregated into a list that is ultimately written out, as the simplified example below illustrates:
    List items = new ArrayList();
    for (int i = 0; i < commitInterval; i++) {
        items.add(itemReader.read());
    }
    itemWriter.write(items);
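A slightly fuller sketch of the chunk-oriented loop is given here, with generics and end-of-input handling added. It assumes locally redeclared ItemReader and ItemWriter interfaces so that it runs without Spring Batch; ChunkLoopSketch and run are hypothetical names, and transaction handling is reduced to a comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class ChunkLoopSketch {

    // Local stand-ins for the 2.0 interfaces (throws clauses simplified).
    interface ItemReader<T> { T read() throws Exception; }
    interface ItemWriter<T> { void write(List<? extends T> items) throws Exception; }

    // Reads up to commitInterval items, writes them as one chunk, and repeats
    // until the reader returns null. Returns the number of chunks written.
    static <T> int run(ItemReader<T> reader, ItemWriter<T> writer, int commitInterval)
            throws Exception {
        int chunks = 0;
        boolean exhausted = false;
        while (!exhausted) {
            List<T> items = new ArrayList<T>();
            for (int i = 0; i < commitInterval; i++) {
                T item = reader.read();
                if (item == null) {
                    exhausted = true;
                    break;
                }
                items.add(item);
            }
            if (!items.isEmpty()) {
                writer.write(items); // a real step would commit the transaction after this call
                chunks++;
            }
        }
        return chunks;
    }

    public static void main(String[] args) throws Exception {
        final Iterator<Integer> source = Arrays.asList(1, 2, 3, 4, 5, 6, 7).iterator();
        ItemReader<Integer> reader = () -> source.hasNext() ? source.next() : null;
        ItemWriter<Integer> writer = items -> System.out.println("writing chunk " + items);
        System.out.println(run(reader, writer, 5) + " chunks written");
    }
}
```

With seven items and a commit interval of 5, the writer is called twice: once with five items and once with the remaining two.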
In version 1.x.x, Steps had only two dependencies: ItemReader and ItemWriter.
This basic configuration is fairly robust. However, there are many cases where the item needs to be transformed before writing. In version 1.x.x this can be achieved using the composite pattern.
This approach works. However, it requires an extra layer between either the reader or the writer and the Step. Furthermore, the ItemWriter would need to be registered separately as an ItemStream with the Step. For this reason, in version 2.x.x the ItemTransformer was renamed to ItemProcessor and moved up to the same level as ItemReader and ItemWriter.
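The role of the processor between reading and writing can be sketched as follows. The ItemProcessor interface is redeclared locally with the same shape as the Spring Batch one (an input type and an output type); ProcessorSketch and processChunk are hypothetical names. The sketch also mirrors the Spring Batch convention that a processor returning null filters the item out, which is worth verifying against the Javadoc of the version in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ProcessorSketch {

    // Local stand-in for ItemProcessor: input type I, output type O.
    interface ItemProcessor<I, O> {
        O process(I item) throws Exception;
    }

    // Applies the processor to every item read for a chunk before the chunk is written.
    static <I, O> List<O> processChunk(List<? extends I> readItems,
                                       ItemProcessor<I, O> processor) throws Exception {
        List<O> outputs = new ArrayList<O>();
        for (I item : readItems) {
            O result = processor.process(item);
            if (result != null) { // a null result means "skip this item"
                outputs.add(result);
            }
        }
        return outputs;
    }

    public static void main(String[] args) throws Exception {
        // Transforms each String into its length and filters out empty strings.
        ItemProcessor<String, Integer> lengthOf = s -> s.isEmpty() ? null : s.length();
        List<Integer> toWrite =
                processChunk(Arrays.asList("read", "", "process", "write"), lengthOf);
        System.out.println(toWrite); // [4, 7, 5]
    }
}
```

Because the transformation sits at the same level as the reader and writer, neither of them needs to be wrapped, and the reader and writer item types are free to differ.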
- Meta Data Enhancements - The JobRepository interface represents basic CRUD operations on Job meta-data. However, it may also be useful to query the meta-data. For that reason, the JobExplorer and JobOperator interfaces have been created.
- Scalability-
Spring Batch 1.x was always intended as a single-VM, possibly multi-threaded model, but many features were built into it that support parallel execution in multiple processes. Many projects have successfully implemented a scalable solution relying on the quality-of-service features of Spring Batch to ensure that processing only happens in the correct sequence. In 2.x those features have been exposed more explicitly. There are two approaches to scalability: remote chunking and partitioning.
Remote Chunking-
Remote chunking is a technique for dividing up the work of a step without any explicit knowledge of the structure of the data. Any input source can be split up dynamically by reading it in a single process (as per normal in 1.x) and sending the items as a chunk to a remote worker process. The remote process implements a listener pattern, responding to the request, processing the data and sending an asynchronous reply. The transport for the request and reply has to be durable, with guaranteed delivery and a single consumer; those features are readily available with any JMS implementation. However, Spring Batch builds the remote chunking feature on top of Spring Integration, so it is agnostic to the actual implementation of the message middleware.
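The request/reply shape of remote chunking can be sketched in a single JVM. In this hedged illustration, in-memory BlockingQueues stand in for the durable middleware and a thread stands in for the remote worker process, so none of the durability or delivery guarantees described above are modelled. RemoteChunkingSketch and processRemotely are hypothetical names, the "processing" is simply summing each chunk, and the empty-chunk shutdown signal is a convention of this sketch only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class RemoteChunkingSketch {

    // Sends each chunk to a worker over a request queue and collects one reply per chunk.
    static int processRemotely(List<Integer> input, int chunkSize) throws InterruptedException {
        final BlockingQueue<List<Integer>> requests = new LinkedBlockingQueue<List<Integer>>();
        final BlockingQueue<Integer> replies = new LinkedBlockingQueue<Integer>();

        // Worker: listens for chunks, processes them, and replies asynchronously.
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    List<Integer> chunk = requests.take();
                    if (chunk.isEmpty()) {
                        return; // empty chunk = shutdown signal (sketch convention)
                    }
                    int sum = 0;
                    for (int value : chunk) {
                        sum += value;
                    }
                    replies.put(sum);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        // Master: reads all the input in one place and ships it chunk by chunk.
        int sent = 0;
        for (int start = 0; start < input.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, input.size());
            requests.put(new ArrayList<Integer>(input.subList(start, end)));
            sent++;
        }
        requests.put(new ArrayList<Integer>()); // tell the worker to stop

        int total = 0;
        for (int i = 0; i < sent; i++) {
            total += replies.take(); // one reply is expected per chunk sent
        }
        worker.join();
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(processRemotely(Arrays.asList(1, 2, 3, 4, 5, 6, 7), 3)); // 28
    }
}
```

Note that the master needs no knowledge of the data's structure: it only counts items into chunks, which is the property that distinguishes this approach from partitioning.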
Partitioning-
Partitioning is an alternative approach which in contrast depends on having some knowledge of the structure of the input data, like a range of primary keys, or the name of a file to process. The advantage of this model is that the processors of each element in a partition can act as if they are a single step in a normal Spring Batch job. They don’t have to implement any special or new patterns, which makes them easy to configure and test. Partitioning in principle is more scalable than remote chunking because there is no serialization bottleneck arising from reading all the input data in one place.
In Spring Batch 2.0 partitioning is supported by two interfaces: PartitionHandler and StepExecutionSplitter. The PartitionHandler is the one that knows about the execution fabric: it has to transmit requests to remote steps and collect the results using whatever grid or remoting technology is available. PartitionHandler is an SPI, and Spring Batch provides one implementation out of the box for local execution through a TaskExecutor. This will be useful immediately when parallel processing of heavily I/O-bound tasks is required, since in those cases remote execution only complicates the deployment and doesn’t necessarily help much with performance. Other implementations will be specific to the execution fabric (e.g. one of the grid providers such as IBM, Oracle, Terracotta, Appistry, etc.); Spring Batch has no preference for any one grid provider over another.
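The idea behind local partitioned execution can be sketched with the JDK alone: split a primary-key range into sub-ranges, then run one independent task per sub-range on a thread pool. This does not use the PartitionHandler or StepExecutionSplitter APIs. PartitionSketch, split and execute are hypothetical names, and each task merely counts the keys in its range as a stand-in for a real step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PartitionSketch {

    // Splits the key range [minId, maxId] into at most gridSize contiguous ranges.
    static List<long[]> split(long minId, long maxId, int gridSize) {
        List<long[]> ranges = new ArrayList<long[]>();
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize; // ceiling division
        for (long start = minId; start <= maxId; start += size) {
            ranges.add(new long[] { start, Math.min(start + size - 1, maxId) });
        }
        return ranges;
    }

    // Runs one independent task per partition on a local thread pool
    // and returns the total number of keys processed.
    static long execute(List<long[]> ranges) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(ranges.size());
        try {
            List<Future<Long>> results = new ArrayList<Future<Long>>();
            for (final long[] range : ranges) {
                results.add(pool.submit(new Callable<Long>() {
                    public Long call() {
                        // Stand-in for a normal step that only sees its own key range.
                        return range[1] - range[0] + 1;
                    }
                }));
            }
            long processed = 0;
            for (Future<Long> result : results) {
                processed += result.get();
            }
            return processed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<long[]> ranges = split(1, 100, 4);
        System.out.println(ranges.size() + " partitions, " + execute(ranges) + " items");
    }
}
```

Unlike the remote chunking approach, no single process reads all the input here; each task works only from the boundaries of its own range.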
- Configuration - Until 2.x.x, the only option for configuring batch jobs was normal Spring bean configuration. However, version 2.x.x adds a new namespace for configuration. For example, in 1.1, configuring a job looked like the following:
    <bean class="org.springframework.batch.core.job.SimpleJob" id="myEmpExpireJob">
        <property name="steps">
            <list>
                <!-- Step bean details omitted for clarity -->
                <bean id="readEmployeeData"></bean>
                <bean id="writeEmployeeData"></bean>
                <bean id="employeeDataProcess"></bean>
            </list>
        </property>
        <property name="jobRepository" ref="jobRepository"></property>
    </bean>
In version 2.x.x, the equivalent would be:
    <job id="myEmpExpireJob">
        <!-- Step bean details omitted for clarity -->
        <step id="readEmployeeData" next="writeEmployeeData"></step>
        <step id="writeEmployeeData" next="employeeDataProcess"></step>
        <step id="employeeDataProcess"></step>
    </job>