Spring Batch

The Domain Language of Batch

The diagram above highlights the key concepts that make up the domain language of batch. A Job has one or more Steps, each of which has exactly one ItemReader, ItemProcessor, and ItemWriter. A Job needs to be launched (JobLauncher), and metadata about the currently running process needs to be stored (JobRepository).

Job-

A Job is an entity that encapsulates an entire batch process. As is common with other Spring projects, a Job will be wired together via an XML configuration file. This file may be referred to as the “job configuration”. However, Job is just the top of an overall hierarchy:

In Spring Batch, a Job is simply a container for Steps. It combines multiple steps that belong logically together in a flow and allows for configuration of properties global to all steps, such as restartability. The job configuration contains:

  • The simple name of the job
  • Definition and ordering of Steps
  • Whether or not the job is restartable

Spring Batch provides a default simple implementation of the Job interface in the form of the SimpleJob class, which adds some standard functionality on top of Job. However, the batch namespace abstracts away the need to instantiate it directly. Instead, the <job> tag can be used:

<job id="myEmpExpireJob">
    <!-- Step bean details omitted for clarity -->
    <step id="readEmployeeData" next="writeEmployeeData"></step>
    <step id="writeEmployeeData" next="employeeDataProcess"></step>
    <step id="employeeDataProcess"></step>
</job>
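The way the next attributes chain the three steps together can be sketched in plain Java. This is a simplified model written for this article, not Spring Batch code: the class StepOrderSketch and its resolveOrder method are hypothetical names, and the sketch only covers a linear chain of next links, not conditional flows.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model, not Spring Batch code: follows "next" links the way the
// <step next="..."> attributes in the job above chain steps together.
final class StepOrderSketch {

    // Returns step ids in execution order, starting from firstStep.
    // nextByStep maps a step id to the id named in its "next" attribute.
    static List<String> resolveOrder(String firstStep, Map<String, String> nextByStep) {
        Set<String> visited = new LinkedHashSet<>();
        String current = firstStep;
        while (current != null) {
            if (!visited.add(current)) {
                throw new IllegalStateException("Cycle detected at step " + current);
            }
            current = nextByStep.get(current); // null when the step has no "next"
        }
        return new ArrayList<>(visited);
    }

    public static void main(String[] args) {
        Map<String, String> next = new HashMap<>();
        next.put("readEmployeeData", "writeEmployeeData");
        next.put("writeEmployeeData", "employeeDataProcess");
        // prints [readEmployeeData, writeEmployeeData, employeeDataProcess]
        System.out.println(resolveOrder("readEmployeeData", next));
    }
}
```

The last step has no next entry, so the walk stops there, which mirrors the final <step> element carrying no next attribute.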

JobInstance-

A JobInstance refers to the concept of a logical job run. Let’s consider a batch job that should be run once at the end of the day, such as the ‘EndOfDay’ job from the diagram above. There is one ‘EndOfDay’ Job, but each individual run of the Job must be tracked separately. In the case of this job, there will be one logical JobInstance per day. For example, there will be a January 1st run, and a January 2nd run. If the January 1st run fails the first time and is run again the next day, it is still the January 1st run.

JobParameters-

Having discussed JobInstance and how it differs from Job, the natural question to ask is: “How is one JobInstance distinguished from another?” The answer is: JobParameters. JobParameters is a set of parameters used to start a batch job. They can be used for identification or even as reference data during the run. In other words, a JobInstance is the combination of a Job and its identifying JobParameters.

JobExecution-

A JobExecution refers to the technical concept of a single attempt to run a Job. An execution may end in failure or success, but the JobInstance corresponding to a given execution will not be considered complete unless the execution completes successfully. Using the EndOfDay Job described above as an example, consider a JobInstance for 01-01-2013 that failed the first time it was run. If it is run again with the same job parameters as the first run (01-01-2013), a new JobExecution will be created. However, there will still be only one JobInstance.
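The relationship between instances and executions can be modelled in a few lines of plain Java. This is a rough sketch under stated assumptions, not the Spring Batch classes: JobRunSketch and its methods are hypothetical, an instance is keyed simply by job name plus parameters, and each attempt is reduced to a success flag.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model, not the Spring Batch classes: one logical instance per
// (job name, parameters) pair, and one recorded execution per attempt.
final class JobRunSketch {

    // Maps an instance key to the outcome (true = success) of each attempt.
    private final Map<String, List<Boolean>> attemptsByInstance = new HashMap<>();

    // TreeMap gives a stable key regardless of parameter insertion order.
    private static String instanceKey(String jobName, Map<String, String> parameters) {
        return jobName + "|" + new TreeMap<>(parameters);
    }

    // Records one attempt and returns how many executions its instance now has.
    int run(String jobName, Map<String, String> parameters, boolean succeeded) {
        List<Boolean> attempts = attemptsByInstance.computeIfAbsent(
                instanceKey(jobName, parameters), key -> new ArrayList<>());
        attempts.add(succeeded);
        return attempts.size();
    }

    int instanceCount() {
        return attemptsByInstance.size();
    }

    // An instance counts as complete only once some execution has succeeded.
    boolean isComplete(String jobName, Map<String, String> parameters) {
        List<Boolean> attempts = attemptsByInstance.get(instanceKey(jobName, parameters));
        return attempts != null && attempts.contains(Boolean.TRUE);
    }
}
```

Running the EndOfDay job twice with the same 01-01-2013 parameter (first failing, then succeeding) yields two executions but a single instance in this model. Running it with a different date creates a second instance.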

A Job defines what a job is and how it is to be executed, and JobInstance is a purely organizational object to group executions together, primarily to enable correct restart semantics. A JobExecution, however, is the primary storage mechanism for what actually happened during a run, and as such contains many more properties that must be controlled and persisted:

  • status: A BatchStatus object that indicates the status of the execution. While running, it’s BatchStatus.STARTED; if it fails, it’s BatchStatus.FAILED; and if it finishes successfully, it’s BatchStatus.COMPLETED.
  • startTime: A java.util.Date representing the current system time when the execution was started.
  • endTime: A java.util.Date representing the current system time when the execution finished, regardless of whether or not it was successful.
  • exitStatus: The ExitStatus indicating the result of the run. It is most important because it contains an exit code that will be returned to the caller. See chapter 5 for more details.
  • createTime: A java.util.Date representing the current system time when the JobExecution was first persisted. The job may not have been started yet (and thus has no start time), but it will always have a createTime, which is required by the framework for managing job level ExecutionContexts.
  • lastUpdated: A java.util.Date representing the last time a JobExecution was persisted.
  • executionContext: The ‘property bag’ containing any user data that needs to be persisted between executions.
  • failureExceptions: The list of exceptions encountered during the execution of a Job. These can be useful if more than one exception is encountered during the failure of a Job.

Step-

A Step is a domain object that encapsulates an independent, sequential phase of a batch job. Therefore, every Job is composed entirely of one or more Steps. A Step contains all of the information necessary to define and control the actual batch processing. This is a necessarily vague description because the contents of any given Step are at the discretion of the developer writing a Job. A Step can be as simple or complex as the developer desires. A simple Step might load data from a file into the database, requiring little or no code (depending upon the implementations used). A more complex Step may have complicated business rules that are applied as part of the processing. As with Job, a Step has an individual StepExecution that corresponds to a unique JobExecution.

StepExecution-

A StepExecution represents a single attempt to execute a Step. A new StepExecution will be created each time a Step is run, similar to JobExecution. However, if a step fails to execute because the step before it fails, there will be no execution persisted for it. A StepExecution will only be created when its Step is actually started.
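The rule that a step only gets an execution once it actually starts can be illustrated with a small plain-Java model. The names here (StepExecutionSketch, runSteps) are hypothetical and each step is reduced to a function returning success or failure, so treat this as a sketch of the idea rather than how the framework is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Simplified model, not Spring Batch code: records an execution only for
// steps that were actually started.
final class StepExecutionSketch {

    // Runs steps in map iteration order and returns "stepName=STATUS"
    // for every step that started. Pass a LinkedHashMap to keep the order.
    static List<String> runSteps(Map<String, BooleanSupplier> steps) {
        List<String> executions = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> step : steps.entrySet()) {
            boolean succeeded = step.getValue().getAsBoolean();
            executions.add(step.getKey() + "=" + (succeeded ? "COMPLETED" : "FAILED"));
            if (!succeeded) {
                break; // later steps never start, so nothing is recorded for them
            }
        }
        return executions;
    }
}
```

If the second of three steps fails, the returned list holds two entries, one COMPLETED and one FAILED, and the third step does not appear at all.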

Step executions are represented by objects of the StepExecution class. Each execution contains a reference to its corresponding Step and JobExecution, and transaction-related data such as commit and rollback counts and start and end times. Additionally, each step execution will contain an ExecutionContext, which contains any data a developer needs persisted across batch runs, such as statistics or state information needed to restart. The following is a listing of the properties for StepExecution:

StepExecution Properties

  • status: A BatchStatus object that indicates the status of the execution. While it’s running, the status is BatchStatus.STARTED; if it fails, the status is BatchStatus.FAILED; and if it finishes successfully, the status is BatchStatus.COMPLETED.
  • startTime: A java.util.Date representing the current system time when the execution was started.
  • endTime: A java.util.Date representing the current system time when the execution finished, regardless of whether or not it was successful.
  • exitStatus: The ExitStatus indicating the result of the execution. It is most important because it contains an exit code that will be returned to the caller. See chapter 5 for more details.
  • executionContext: The ‘property bag’ containing any user data that needs to be persisted between executions.
  • readCount: The number of items that have been successfully read.
  • writeCount: The number of items that have been successfully written.
  • commitCount: The number of transactions that have been committed for this execution.
  • rollbackCount: The number of times the business transaction controlled by the Step has been rolled back.
  • readSkipCount: The number of times read has failed, resulting in a skipped item.
  • processSkipCount: The number of times process has failed, resulting in a skipped item.
  • filterCount: The number of items that have been ‘filtered’ by the ItemProcessor.
  • writeSkipCount: The number of times write has failed, resulting in a skipped item.

JobRepository-

JobRepository is the persistence mechanism for all of the stereotypes mentioned above. It provides CRUD operations for JobLauncher, Job, and Step implementations. When a Job is first launched, a JobExecution is obtained from the repository, and during the course of execution StepExecution and JobExecution implementations are persisted by passing them to the repository. With the batch namespace, a JobRepository is configured using the <job-repository> tag:

<job-repository id="jobRepository"/>

JobLauncher-

JobLauncher represents a simple interface for launching a Job with a given set of JobParameters:

public interface JobLauncher {

    public JobExecution run(Job job, JobParameters jobParameters) 
                throws JobExecutionAlreadyRunningException, JobRestartException;
}

ItemReader-

ItemReader is an abstraction that represents the retrieval of input for a Step, one item at a time. When the ItemReader has exhausted the items it can provide, it will indicate this by returning null. More details about the ItemReader interface and its various implementations can be found in the later chapter, ItemReaders and ItemWriters.
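The "return null when exhausted" contract is easy to see in a small sketch. The Reader interface below is a hypothetical stand-in defined only for this example (it is not the Spring Batch ItemReader interface), and the list-backed reader is just one possible source of items.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified model of the reader contract: one item per call, null at the end.
final class ReaderSketch {

    // Hypothetical stand-in for the reader abstraction described above.
    interface Reader<T> {
        T read();
    }

    // Wraps a list so that each read() hands out the next element.
    static <T> Reader<T> fromList(List<T> items) {
        Iterator<T> iterator = items.iterator();
        return () -> iterator.hasNext() ? iterator.next() : null;
    }

    // Drains a reader by calling read() until it returns null.
    static <T> List<T> readAll(Reader<T> reader) {
        List<T> result = new ArrayList<>();
        T item;
        while ((item = reader.read()) != null) {
            result.add(item);
        }
        return result;
    }
}
```

Because null is the end-of-input signal, a reader built this way cannot carry null as a real item. That is a consequence of the contract itself, not of this particular sketch.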

ItemWriter-

ItemWriter is an abstraction that represents the output of a Step, one batch or chunk of items at a time. Generally, an item writer has no knowledge of the input it will receive next, only the items that were passed in its current invocation. More details about the ItemWriter interface and its various implementations can be found in the later chapter, ItemReaders and ItemWriters.
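A writer that receives a whole chunk per call might look like the following in plain Java. Both the Writer interface and CollectingWriter are hypothetical names invented for this sketch. The real ItemWriter signature differs between Spring Batch versions, so this only mirrors the "one chunk per invocation" idea.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the writer contract: one call per chunk of items.
final class WriterSketch {

    // Hypothetical stand-in for the writer abstraction described above.
    interface Writer<T> {
        void write(List<? extends T> items);
    }

    // Stores every chunk it is handed. It only ever sees the current chunk
    // and has no knowledge of what will arrive next.
    static final class CollectingWriter<T> implements Writer<T> {
        final List<List<T>> chunks = new ArrayList<>();

        @Override
        public void write(List<? extends T> items) {
            chunks.add(new ArrayList<>(items)); // defensive copy of this chunk
        }
    }
}
```

Calling write twice, once with two items and once with one, leaves two separate chunks in the collector, which is the granularity at which a real writer would typically flush to a file or database.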

ItemProcessor-

ItemProcessor is an abstraction that represents the business processing of an item. While the ItemReader reads one item at a time and the ItemWriter writes items out in chunks, the ItemProcessor provides a place to transform an item or apply other business processing. If, while processing the item, it is determined that the item is not valid, returning null indicates that the item should not be written out. More details about the ItemProcessor interface can be found in the later chapter, ItemReaders and ItemWriters.
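Filtering by returning null can be sketched with a plain java.util.function.Function standing in for the processor. ProcessorSketch and processAll are hypothetical names. The point is only to show that a null result drops the item before it reaches the writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Simplified model of processing with filtering, not Spring Batch code.
final class ProcessorSketch {

    // Applies the processor to each item. A null result means the item is
    // filtered out and never passed on for writing.
    static <I, O> List<O> processAll(List<I> items, Function<I, O> processor) {
        List<O> output = new ArrayList<>();
        for (I item : items) {
            O processed = processor.apply(item);
            if (processed != null) {
                output.add(processed);
            }
        }
        return output;
    }
}
```

For example, a processor that upper-cases names but returns null for empty strings turns the input "alice", "", "bob" into "ALICE", "BOB".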

Batch Namespace-

Many of the domain concepts listed above need to be configured in a Spring ApplicationContext. While there are implementations of the interfaces above that can be used in a standard bean definition, a namespace has been provided for ease of configuration:

<beans:beans xmlns="http://www.springframework.org/schema/batch"
             xmlns:beans="http://www.springframework.org/schema/beans"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="
           http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
           http://www.springframework.org/schema/batch
           http://www.springframework.org/schema/batch/spring-batch-2.0.xsd">

    <job id="ioSampleJob">
        <step id="step1">
            <tasklet>
                <chunk commit-interval="2" reader="itemReader" writer="itemWriter"></chunk>
            </tasklet>
        </step>
    </job>

</beans:beans>
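To make the commit-interval="2" setting more concrete, here is a rough plain-Java model of a chunk-oriented loop. It is a sketch under simplifying assumptions, not the framework's implementation: ChunkLoopSketch is a hypothetical class, the reader, processor, and writer are plain functional interfaces, there is no real transaction, and the exact counts Spring Batch reports (for example for a final empty chunk) may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Simplified model of chunk-oriented processing, not Spring Batch code.
final class ChunkLoopSketch {

    // Counters named after the StepExecution properties listed earlier.
    int readCount, filterCount, writeCount, commitCount;

    // reader returns null when exhausted; processor returns null to filter.
    <I, O> void run(Supplier<I> reader, Function<I, O> processor,
                    Consumer<List<O>> writer, int commitInterval) {
        boolean exhausted = false;
        while (!exhausted) {
            // Read up to commitInterval items for this chunk.
            List<I> chunk = new ArrayList<>();
            while (chunk.size() < commitInterval) {
                I item = reader.get();
                if (item == null) {
                    exhausted = true;
                    break;
                }
                chunk.add(item);
                readCount++;
            }
            if (chunk.isEmpty()) {
                break; // nothing left to process
            }
            // Process each item, dropping those the processor filters out.
            List<O> processed = new ArrayList<>();
            for (I item : chunk) {
                O result = processor.apply(item);
                if (result == null) {
                    filterCount++;
                } else {
                    processed.add(result);
                }
            }
            // Write the surviving items in one call, then "commit" the chunk.
            if (!processed.isEmpty()) {
                writer.accept(processed);
                writeCount += processed.size();
            }
            commitCount++;
        }
    }
}
```

With five input items, a commit interval of 2, and a processor that filters out one item, this model reads 5 items, filters 1, writes 4, and commits 3 chunks.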
Dinesh Rajput