In this tutorial we discuss the three most important interfaces of Spring Batch and give an overview of the Spring Batch item reader and writer with a sample application. One of the main goals of a batch processing framework is to read large amounts of data, perform some business processing or transformation, and write out the result. Spring Batch supports this bulk reading, processing and writing through three key interfaces: ItemReader, ItemProcessor and ItemWriter.
ItemReader is the means of providing data from many different types of input in a bulk processing system. There are many different implementations of the ItemReader interface. All implementations are expected to be stateful and will be called multiple times for each batch, with each call to read() returning a different value and finally returning null when all input data is exhausted. Below are a few frequently used implementations of ItemReader.
ItemReader Implementation | Description |
---|---|
FlatFileItemReader | Reads lines of data from an input file. Typically each line describes a record whose fields are defined by fixed positions in the file or delimited by a special character (e.g. comma (,) or pipe (\|)). |
JdbcCursorItemReader | Opens a JDBC cursor and continually retrieves the next row in the ResultSet. |
StoredProcedureItemReader | Executes a stored procedure and then reads the returned cursor and continually retrieves the next row in the ResultSet. |
All the above implementations override the read() method from the ItemReader interface. The read method defines the most essential contract of the ItemReader. It returns one item or null if no more items are left. An item might represent a line in a file, a row in a database and so on.
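The read() contract described above can be illustrated without any Spring dependency. The following is only a minimal sketch: `SimpleReader` is a hypothetical stand-in that mirrors the shape of `ItemReader.read()` (the real method also declares checked exceptions, omitted here), and `ListReader` and `drain` are names invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Dependency-free sketch of the read() contract. All names are illustrative
// stand-ins, not Spring Batch classes.
class ReaderContractSketch {

    // Hypothetical stand-in mirroring the shape of ItemReader<T>.read().
    public interface SimpleReader<T> {
        T read(); // returns the next item, or null when the input is exhausted
    }

    // Stateful reader over an in-memory list: each call advances the iterator.
    public static class ListReader<T> implements SimpleReader<T> {
        private final Iterator<T> iterator;

        public ListReader(List<T> items) {
            this.iterator = items.iterator();
        }

        @Override
        public T read() {
            return iterator.hasNext() ? iterator.next() : null;
        }
    }

    // Calls read() repeatedly until it returns null, collecting every item.
    public static <T> List<T> drain(SimpleReader<T> reader) {
        List<T> result = new ArrayList<>();
        T item;
        while ((item = reader.read()) != null) {
            result.add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        SimpleReader<String> reader =
                new ListReader<String>(Arrays.asList("line 1", "line 2"));
        System.out.println(drain(reader)); // both items, in input order
        System.out.println(reader.read()); // null: the reader stays exhausted
    }
}
```

Because the reader keeps its position between calls, a fresh instance is needed to read the same input again.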
ItemWriter is similar in functionality to an ItemReader, but with inverse operations. ItemWriter is an interface for generic output operations. The implementing class is responsible for serializing objects as necessary. Resources still need to be located, opened and closed, but an ItemWriter writes out rather than reading in. For databases these writes may be inserts or updates.
The write method defines the most essential contract of the ItemWriter. It attempts to write out the list of items passed in, as long as the writer is open. Because items are expected to be batched together into a chunk and then output, the interface accepts a list of items rather than a single item. Once the items are written out, any necessary flushing can be performed before returning from the write method.
ItemWriter Implementation | Description |
---|---|
FlatFileItemWriter | Writes data to a file or stream. Uses buffered writer to improve performance. |
StaxEventItemWriter | An implementation of ItemWriter which uses StAX and Marshaller for serializing object to XML. |
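To make the chunk-at-a-time write contract concrete, here is a small dependency-free sketch. It is an illustration under assumptions, not Spring Batch code: `SimpleWriter` is a hypothetical stand-in for the shape of `ItemWriter.write(List)` as used in this tutorial, and `InMemoryWriter` is an invented class that just collects items in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Dependency-free sketch of the write() contract. All names are illustrative
// stand-ins, not Spring Batch classes.
class WriterContractSketch {

    // Hypothetical stand-in mirroring the shape of ItemWriter<T>.write(List).
    public interface SimpleWriter<T> {
        void write(List<? extends T> items);
    }

    // Collects written items in memory and counts how many chunks arrived.
    public static class InMemoryWriter implements SimpleWriter<String> {
        private final List<String> lines = new ArrayList<>();
        private int chunkCount = 0;

        @Override
        public void write(List<? extends String> items) {
            // The whole chunk arrives in one call; a real writer would
            // serialize each item and flush before returning.
            lines.addAll(items);
            chunkCount++;
        }

        public List<String> getLines() {
            return lines;
        }

        public int getChunkCount() {
            return chunkCount;
        }
    }

    public static void main(String[] args) {
        InMemoryWriter writer = new InMemoryWriter();
        writer.write(Arrays.asList("a", "b", "c")); // first chunk of three items
        writer.write(Arrays.asList("d"));           // final, smaller chunk
        System.out.println(writer.getLines() + " in " + writer.getChunkCount() + " chunks");
    }
}
```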
The ItemReader and ItemWriter interfaces are both very useful for their specific tasks, but what if you want to insert business logic before writing? An ItemProcessor is a very simple interface for item transformation: given one object, transform it and return another. Any business or transformation logic can be plugged into this component. Assume an ItemReader provides objects of type User that need to be converted to type Employee before being written out. An ItemProcessor can be written to perform the conversion. Another typical use for an item processor is to filter out records before they are passed to the ItemWriter. Filtering simply indicates that a record should not be written.
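The User to Employee conversion and the filtering case can be sketched as follows. This is a dependency-free illustration: `SimpleProcessor` is a hypothetical stand-in for the shape of `ItemProcessor.process()`, and the `User` and `Employee` fields are assumptions made up for the example. It relies on the Spring Batch convention that a processor returning null signals that the item should be filtered out.

```java
// Dependency-free sketch of item transformation and filtering. All names and
// fields are illustrative assumptions, not Spring Batch classes.
class ProcessorContractSketch {

    // Hypothetical stand-in mirroring the shape of ItemProcessor<I, O>.process().
    public interface SimpleProcessor<I, O> {
        O process(I item); // a null result means "do not write this item"
    }

    // Assumed input type: a user with a name and an active flag.
    public static class User {
        public final String name;
        public final boolean active;

        public User(String name, boolean active) {
            this.name = name;
            this.active = active;
        }
    }

    // Assumed output type: an employee with a display name.
    public static class Employee {
        public final String displayName;

        public Employee(String displayName) {
            this.displayName = displayName;
        }
    }

    // Converts active users to employees and filters out inactive ones.
    public static class UserToEmployeeProcessor implements SimpleProcessor<User, Employee> {
        @Override
        public Employee process(User user) {
            if (!user.active) {
                return null; // filtered: never reaches the writer
            }
            return new Employee("Employee: " + user.name);
        }
    }

    public static void main(String[] args) {
        UserToEmployeeProcessor processor = new UserToEmployeeProcessor();
        System.out.println(processor.process(new User("Ann", true)).displayName);
        System.out.println(processor.process(new User("Bob", false))); // null
    }
}
```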
In this sample application we implement all three interfaces.
We require the following technologies:
Review the final project structure, a standard java project.
Below is our custom item reader. Each time it is called, it returns the next element from the list and returns null if the list is exhausted.
CustomItemReader.java

    package com.doj.batch.reader;

    import java.util.List;

    import org.springframework.batch.item.ItemReader;
    import org.springframework.batch.item.ParseException;
    import org.springframework.batch.item.UnexpectedInputException;

    /**
     * Returns the injected book names one at a time.
     *
     * @author Dinesh Rajput
     */
    public class CustomItemReader implements ItemReader<String> {

        private List<String> bookNameList;
        // Index of the next book name to return.
        private int bookCount = 0;

        @Override
        public String read() throws Exception, UnexpectedInputException, ParseException {
            if (bookCount < bookNameList.size()) {
                return bookNameList.get(bookCount++);
            } else {
                // null tells Spring Batch that the input is exhausted.
                return null;
            }
        }

        public List<String> getBookNameList() {
            return bookNameList;
        }

        public void setBookNameList(List<String> bookNameList) {
            this.bookNameList = bookNameList;
        }
    }

CustomItemProcessor is a simple custom ItemProcessor which transforms every element returned by the ItemReader. Here each book name is returned together with its author's name.
CustomItemProcessor.java

    package com.doj.batch.processor;

    import org.springframework.batch.item.ItemProcessor;

    /**
     * Appends the author name to each book name.
     *
     * @author Dinesh Rajput
     */
    public class CustomItemProcessor implements ItemProcessor<String, String> {

        @Override
        public String process(String bookNameWithoutAuthor) throws Exception {
            String bookNameWithAuthor = "Book Name - " + bookNameWithoutAuthor + " | Author Name - ";
            if ("Effective Java".equalsIgnoreCase(bookNameWithoutAuthor)) {
                bookNameWithAuthor += "Joshua Bloch";
            } else if ("Design Patterns".equalsIgnoreCase(bookNameWithoutAuthor)) {
                bookNameWithAuthor += "Erich Gamma";
            } else if ("Refactoring".equalsIgnoreCase(bookNameWithoutAuthor)) {
                bookNameWithAuthor += "Martin Fowler";
            } else if ("Head First Java".equalsIgnoreCase(bookNameWithoutAuthor)) {
                bookNameWithAuthor += "Kathy Sierra";
            } else if ("Thinking in Java".equalsIgnoreCase(bookNameWithoutAuthor)) {
                bookNameWithAuthor += "Bruce Eckel";
            }
            // Unknown titles are returned with an empty author name.
            return bookNameWithAuthor;
        }
    }

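As a possible alternative to the if/else chain in CustomItemProcessor, the title-to-author mapping could live in a lookup table. The sketch below is not part of the sample application: `AuthorLookupSketch` and `withAuthor` are hypothetical names, and the class is a plain static helper so it runs without Spring. It keeps the case-insensitive matching by using a `TreeMap` ordered with `String.CASE_INSENSITIVE_ORDER`.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical map-based variant of the author lookup; not part of the
// original sample application.
class AuthorLookupSketch {

    // Case-insensitive keys, matching the equalsIgnoreCase checks.
    private static final Map<String, String> AUTHORS =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    static {
        AUTHORS.put("Effective Java", "Joshua Bloch");
        AUTHORS.put("Design Patterns", "Erich Gamma");
        AUTHORS.put("Refactoring", "Martin Fowler");
        AUTHORS.put("Head First Java", "Kathy Sierra");
        AUTHORS.put("Thinking in Java", "Bruce Eckel");
    }

    // Returns the formatted line; unknown or null titles get an empty author.
    public static String withAuthor(String bookName) {
        String author = (bookName == null) ? "" : AUTHORS.getOrDefault(bookName, "");
        return "Book Name - " + bookName + " | Author Name - " + author;
    }

    public static void main(String[] args) {
        System.out.println(withAuthor("Effective Java"));
        System.out.println(withAuthor("refactoring")); // matched case-insensitively
    }
}
```

With this layout, adding a book means adding one map entry rather than another branch.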
CustomItemWriter is a custom ItemWriter which outputs the transformed items returned by our CustomItemProcessor.
CustomItemWriter.java

    package com.doj.batch.writer;

    import java.util.List;

    import org.springframework.batch.item.ItemWriter;

    /**
     * @author Dinesh Rajput
     */
    public class CustomItemWriter implements ItemWriter<String> {

        @Override
        public void write(List<? extends String> bookNameWithAuthor) throws Exception {
            System.out.println(bookNameWithAuthor);
        }
    }

Below is the applicationContext.xml which is required to create JobRepository, JobLauncher and TransactionManager.
JobRepository is responsible for persisting batch metadata. SimpleJobRepository is an implementation of JobRepository that stores JobInstance, JobExecution and StepExecution information using the DAOs injected via constructor arguments. Spring Batch supports two implementations of these DAOs: Map based (in-memory) and JDBC based. In a real enterprise application the JDBC variants are preferred, but in this example we use the simpler in-memory alternatives (MapJobInstanceDao, MapJobExecutionDao, MapStepExecutionDao, MapExecutionContextDao).
JobLauncher, as the name suggests, is responsible for launching a batch job. We are using the SimpleJobLauncher implementation, which requires only one dependency, a JobRepository. The JobRepository is used to obtain a valid JobExecution. The repository must be used because the provided Job could be a restart of an existing JobInstance, and only the repository can reliably recreate it.
As this example does not deal with transactional data, we use ResourcelessTransactionManager, which is mainly intended for testing purposes.
applicationContext.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <beans xmlns="http://www.springframework.org/schema/beans"
        xmlns:context="http://www.springframework.org/schema/context"
        xmlns:p="http://www.springframework.org/schema/p"
        xmlns:mvc="http://www.springframework.org/schema/mvc"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.springframework.org/schema/beans
            http://www.springframework.org/schema/beans/spring-beans-4.0.xsd
            http://www.springframework.org/schema/context
            http://www.springframework.org/schema/context/spring-context-4.0.xsd
            http://www.springframework.org/schema/mvc
            http://www.springframework.org/schema/mvc/spring-mvc-4.0.xsd">

        <bean id="transactionManager"
            class="org.springframework.batch.support.transaction.ResourcelessTransactionManager"/>

        <bean id="jobLauncher"
            class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
            <property name="jobRepository" ref="jobRepository"/>
        </bean>

        <bean id="jobRepository"
            class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean">
            <property name="transactionManager" ref="transactionManager"/>
        </bean>

        <bean id="simpleJob" class="org.springframework.batch.core.job.SimpleJob" abstract="true">
            <property name="jobRepository" ref="jobRepository" />
        </bean>

    </beans>

It’s time to wire the above three components together into a job which will perform the reading, processing and writing work for us.
simple-job.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <beans xmlns="http://www.springframework.org/schema/beans"
        xmlns:context="http://www.springframework.org/schema/context"
        xmlns:p="http://www.springframework.org/schema/p"
        xmlns:batch="http://www.springframework.org/schema/batch"
        xmlns:mvc="http://www.springframework.org/schema/mvc"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.springframework.org/schema/beans
            http://www.springframework.org/schema/beans/spring-beans-4.0.xsd
            http://www.springframework.org/schema/context
            http://www.springframework.org/schema/context/spring-context-4.0.xsd
            http://www.springframework.org/schema/mvc
            http://www.springframework.org/schema/mvc/spring-mvc-4.0.xsd
            http://www.springframework.org/schema/batch
            http://www.springframework.org/schema/batch/spring-batch-2.0.xsd">

        <import resource="applicationContext.xml"/>

        <bean id="customReader" class="com.doj.batch.reader.CustomItemReader" >
            <property name="bookNameList" >
                <list>
                    <value>Effective Java</value>
                    <value>Design Patterns</value>
                    <value>Refactoring</value>
                    <value>Thinking in Java</value>
                    <value>Head First Java</value>
                </list>
            </property>
        </bean>

        <bean id="customProcessor" class="com.doj.batch.processor.CustomItemProcessor" />

        <bean id="customWriter" class="com.doj.batch.writer.CustomItemWriter" />

        <batch:job id="simpleDojJob" job-repository="jobRepository" parent="simpleJob">
            <batch:step id="step1">
                <batch:tasklet transaction-manager="transactionManager">
                    <batch:chunk reader="customReader" processor="customProcessor"
                        writer="customWriter" commit-interval="1"/>
                </batch:tasklet>
            </batch:step>
        </batch:job>

    </beans>

First we create three beans (customReader, customWriter, customProcessor) corresponding to CustomItemReader, CustomItemWriter and CustomItemProcessor. Note that the customReader is injected with the list of book names. This list is the source of data for the customReader bean.
Next we declare the job simpleDojJob using the batch namespace. It contains a single step, step1, whose tasklet uses the transactionManager and whose chunk element references customReader, customProcessor and customWriter. The job gets its jobRepository from the job-repository attribute and inherits from the abstract simpleJob bean defined in applicationContext.xml. Note the commit-interval attribute, which is set to 1. This tells Spring Batch to commit after every item, i.e. the writer will write one item at a time.
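The effect of the commit interval can be illustrated with a simplified chunk loop. This is a dependency-free sketch of the idea only, not how Spring Batch implements a step: it leaves out transactions, restart and error handling, and the names `ChunkLoopSketch`, `run` and `chunksFor` are invented for the example. The reader, processor and writer are modelled with `java.util.function` types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Simplified illustration of chunk-oriented processing; names are hypothetical.
class ChunkLoopSketch {

    // Reads up to commitInterval items, processes them, then hands the whole
    // processed chunk to the writer in a single call. Repeats until the
    // reader returns null.
    public static <I, O> void run(Supplier<I> reader, Function<I, O> processor,
                                  Consumer<List<O>> writer, int commitInterval) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be at least 1");
        }
        boolean exhausted = false;
        while (!exhausted) {
            // Read phase: collect up to commitInterval items.
            List<I> items = new ArrayList<>();
            while (items.size() < commitInterval) {
                I item = reader.get();
                if (item == null) {
                    exhausted = true;
                    break;
                }
                items.add(item);
            }
            // Process phase: transform each item; null results are filtered out.
            List<O> chunk = new ArrayList<>();
            for (I item : items) {
                O processed = processor.apply(item);
                if (processed != null) {
                    chunk.add(processed);
                }
            }
            // Write phase: one writer call per non-empty chunk.
            if (!chunk.isEmpty()) {
                writer.accept(chunk);
            }
        }
    }

    // Runs the loop over a list of book names and returns the chunks written.
    public static List<List<String>> chunksFor(List<String> books, int commitInterval) {
        Iterator<String> it = books.iterator();
        List<List<String>> written = new ArrayList<>();
        ChunkLoopSketch.<String, String>run(
                () -> it.hasNext() ? it.next() : null,
                book -> "Book Name - " + book,
                written::add,
                commitInterval);
        return written;
    }

    public static void main(String[] args) {
        List<String> books = Arrays.asList("Effective Java", "Design Patterns", "Refactoring");
        System.out.println(chunksFor(books, 1)); // three chunks of one item each
        System.out.println(chunksFor(books, 2)); // one chunk of two, then one of one
    }
}
```

With an interval of 1 the writer is called once per item, which is why the sample job prints each book as its own one-element list.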
Spring Batch comes with a simple utility class called CommandLineJobRunner, whose main() method accepts two arguments. The first argument is the Spring application context file containing the job definition, and the second is the name of the job to be executed.
Now run it as a Java application with these two arguments.
org.springframework.batch.core.launch.support.CommandLineJobRunner
simple-job.xml simpleDojJob
[Book Name - Effective Java | Author Name - Joshua Bloch]
[Book Name - Design Patterns | Author Name - Erich Gamma]
[Book Name - Refactoring | Author Name - Martin Fowler]
[Book Name - Thinking in Java | Author Name - Bruce Eckel]
[Book Name - Head First Java | Author Name - Kathy Sierra]