The diagram above highlights the key concepts that make up the domain language of batch. A Job has one or more Steps, each of which has exactly one ItemReader, ItemProcessor, and ItemWriter. A job needs to be launched (JobLauncher), and metadata about the currently running process needs to be stored (JobRepository).
A Job is an entity that encapsulates an entire batch process. As is common with other Spring projects, a Job will be wired together via an XML configuration file. This file may be referred to as the “job configuration”. However, Job is just the top of an overall hierarchy:
In Spring Batch, a Job is simply a container for Steps. It combines multiple steps that belong logically together in a flow and allows for configuration of properties global to all steps, such as restartability. The job configuration contains the name of the job, the definition and ordering of its steps, and whether or not the job is restartable.
Spring Batch provides a default simple implementation of the Job interface in the form of the SimpleJob class, which adds some standard functionality on top of Job. However, the batch namespace abstracts away the need to instantiate it directly. Instead, the <job> tag can be used:

    <job id="myEmpExpireJob">
        <!-- Step bean details omitted for clarity -->
        <step id="readEmployeeData" next="writeEmployeeData"></step>
        <step id="writeEmployeeData" next="employeeDataProcess"></step>
        <step id="employeeDataProcess"></step>
    </job>
A JobInstance refers to the concept of a logical job run. Let’s consider a batch job that should be run once at the end of the day, such as the ‘EndOfDay’ job from the diagram above. There is one ‘EndOfDay’ Job, but each individual run of the Job must be tracked separately. In the case of this job, there will be one logical JobInstance per day. For example, there will be a January 1st run, and a January 2nd run. If the January 1st run fails the first time and is run again the next day, it is still the January 1st run.
Having discussed JobInstance and how it differs from Job, the natural question to ask is: “how is one JobInstance distinguished from another?” The answer is: JobParameters. JobParameters is a set of parameters used to start a batch job. They can be used for identification or even as reference data during the run:
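One way to picture this identification rule is a key built from the job name plus its parameters. The sketch below is plain Java with no Spring Batch dependency: `JobInstanceKey` and the parameter name `schedule.date` are hypothetical names chosen for illustration, and the sketch assumes that every parameter counts toward identity.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical illustration, not a Spring Batch class: a JobInstance is
// modelled as "job name + the parameters that identify the run".
final class JobInstanceKey {
    private final String jobName;
    private final Map<String, String> parameters;

    JobInstanceKey(String jobName, Map<String, String> parameters) {
        this.jobName = jobName;
        // Sorted copy, so later changes to the caller's map cannot alter the key.
        this.parameters = new TreeMap<>(parameters);
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof JobInstanceKey)) {
            return false;
        }
        JobInstanceKey that = (JobInstanceKey) other;
        return jobName.equals(that.jobName) && parameters.equals(that.parameters);
    }

    @Override
    public int hashCode() {
        return Objects.hash(jobName, parameters);
    }
}

final class JobInstanceKeyDemo {
    public static void main(String[] args) {
        // "schedule.date" is an assumed parameter name for the EndOfDay example.
        JobInstanceKey jan1 = new JobInstanceKey("EndOfDay", Map.of("schedule.date", "01-01-2013"));
        JobInstanceKey jan1Rerun = new JobInstanceKey("EndOfDay", Map.of("schedule.date", "01-01-2013"));
        JobInstanceKey jan2 = new JobInstanceKey("EndOfDay", Map.of("schedule.date", "01-02-2013"));

        System.out.println(jan1.equals(jan1Rerun)); // true: same job, same parameters
        System.out.println(jan1.equals(jan2));      // false: same job, different date
    }
}
```

In this model, JobInstance = Job + identifying JobParameters: changing the date yields a different instance, while repeating it points back to the same one.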
A JobExecution refers to the technical concept of a single attempt to run a Job. An execution may end in failure or success, but the JobInstance corresponding to a given execution will not be considered complete unless the execution completes successfully. Using the EndOfDay Job described above as an example, consider a JobInstance for 01-01-2013 that failed the first time it was run. If it is run again with the same job parameters as the first run (01-01-2013), a new JobExecution will be created. However, there will still be only one JobInstance.
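The relationship between instances and executions can be sketched with a small in-memory log. This is a simplified model and not the Spring Batch JobRepository: `ExecutionLog`, its methods, and the string instance key are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model: each logical instance (keyed here by a plain
// string) owns the list of attempts that were made to run it.
final class ExecutionLog {
    enum Status { COMPLETED, FAILED }

    private final Map<String, List<Status>> executionsByInstance = new HashMap<>();

    // Records one attempt (a "JobExecution") against the given instance.
    void record(String instanceKey, Status outcome) {
        executionsByInstance.computeIfAbsent(instanceKey, k -> new ArrayList<>()).add(outcome);
    }

    int instanceCount() {
        return executionsByInstance.size();
    }

    int executionCount(String instanceKey) {
        return executionsByInstance.getOrDefault(instanceKey, List.of()).size();
    }

    // An instance counts as complete only once one of its executions succeeded.
    boolean isComplete(String instanceKey) {
        return executionsByInstance.getOrDefault(instanceKey, List.of()).contains(Status.COMPLETED);
    }
}

final class ExecutionLogDemo {
    public static void main(String[] args) {
        ExecutionLog log = new ExecutionLog();
        String jan1 = "EndOfDay/01-01-2013";

        log.record(jan1, ExecutionLog.Status.FAILED);    // first attempt fails
        log.record(jan1, ExecutionLog.Status.COMPLETED); // rerun with the same parameters

        System.out.println(log.instanceCount());      // 1
        System.out.println(log.executionCount(jan1)); // 2
        System.out.println(log.isComplete(jan1));     // true
    }
}
```

Two attempts produce two executions, but the log still holds a single instance for 01-01-2013.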
A Job defines what a job is and how it is to be executed, and JobInstance is a purely organizational object to group executions together, primarily to enable correct restart semantics. A JobExecution, however, is the primary storage mechanism for what actually happened during a run, and as such contains many more properties that must be controlled and persisted:
| Property | Definition |
| --- | --- |
| status | A BatchStatus object that indicates the status of the execution. While running, it is BatchStatus.STARTED; if it fails, it is BatchStatus.FAILED; and if it finishes successfully, it is BatchStatus.COMPLETED. |
| startTime | A java.util.Date representing the current system time when the execution was started. |
| endTime | A java.util.Date representing the current system time when the execution finished, regardless of whether or not it was successful. |
| exitStatus | The ExitStatus indicating the result of the run. It is most important because it contains an exit code that will be returned to the caller. See chapter 5 for more details. |
| createTime | A java.util.Date representing the current system time when the JobExecution was first persisted. The job may not have been started yet (and thus has no start time), but it will always have a createTime, which is required by the framework for managing job-level ExecutionContexts. |
| lastUpdated | A java.util.Date representing the last time a JobExecution was persisted. |
| executionContext | The ‘property bag’ containing any user data that needs to be persisted between executions. |
| failureExceptions | The list of exceptions encountered during the execution of a Job. These can be useful if more than one exception is encountered during the failure of a Job. |
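To make the timing of these properties concrete, here is a minimal sketch of how they might change over the life of one execution. `ExecutionRecord` is a hypothetical stand-in rather than the real JobExecution class, and the `NOT_STARTED` value is a placeholder invented for this sketch; only STARTED, FAILED, and COMPLETED come from the table above.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Hypothetical stand-in for illustration; it tracks only a few of the
// properties listed in the table.
final class ExecutionRecord {
    // NOT_STARTED is an assumption of this sketch, not a documented status.
    enum Status { NOT_STARTED, STARTED, COMPLETED, FAILED }

    private Status status = Status.NOT_STARTED;
    private final Date createTime = new Date(); // set as soon as the record exists
    private Date startTime;                     // stays null until start() is called
    private Date endTime;                       // set on success and on failure alike
    private final List<Throwable> failureExceptions = new ArrayList<>();

    void start() {
        status = Status.STARTED;
        startTime = new Date();
    }

    void complete() {
        status = Status.COMPLETED;
        endTime = new Date();
    }

    void fail(Throwable cause) {
        failureExceptions.add(cause);
        status = Status.FAILED;
        endTime = new Date();
    }

    Status status() { return status; }
    Date createTime() { return createTime; }
    Date startTime() { return startTime; }
    Date endTime() { return endTime; }
    List<Throwable> failureExceptions() { return failureExceptions; }
}

final class ExecutionRecordDemo {
    public static void main(String[] args) {
        ExecutionRecord execution = new ExecutionRecord();
        System.out.println(execution.startTime() == null); // true: created but not started

        execution.start();
        execution.fail(new IllegalStateException("input file missing"));

        System.out.println(execution.status());                   // FAILED
        System.out.println(execution.endTime() != null);          // true
        System.out.println(execution.failureExceptions().size()); // 1
    }
}
```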
A Step is a domain object that encapsulates an independent, sequential phase of a batch job. Therefore, every Job is composed entirely of one or more steps. A Step contains all of the information necessary to define and control the actual batch processing. This is a necessarily vague description because the contents of any given Step are at the discretion of the developer writing a Job. A Step can be as simple or complex as the developer desires. A simple Step might load data from a file into the database, requiring little or no code (depending upon the implementations used). A more complex Step may have complicated business rules that are applied as part of the processing. As with Job, a Step has an individual StepExecution that corresponds with a unique JobExecution:
A StepExecution represents a single attempt to execute a Step. A new StepExecution will be created each time a Step is run, similar to JobExecution. However, if a step fails to execute because the step before it fails, there will be no execution persisted for it. A StepExecution will only be created when its Step is actually started.
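The rule that an execution exists only for steps that actually started can be sketched as follows. `SequentialJobSketch` and `StepBody` are hypothetical names, the step names are borrowed from the `<job>` example earlier, and recording only a step name stands in for persisting a full StepExecution.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SequentialJobSketch {
    // Hypothetical step contract for this sketch: throw to signal failure.
    interface StepBody {
        void run() throws Exception;
    }

    // Runs the steps in insertion order and returns the names of the steps
    // that were actually started (one "execution" per started step).
    static List<String> run(Map<String, StepBody> steps) {
        List<String> startedSteps = new ArrayList<>();
        for (Map.Entry<String, StepBody> step : steps.entrySet()) {
            startedSteps.add(step.getKey()); // recorded at the moment the step starts
            try {
                step.getValue().run();
            } catch (Exception e) {
                break; // later steps never start, so nothing is recorded for them
            }
        }
        return startedSteps;
    }

    public static void main(String[] args) {
        Map<String, StepBody> steps = new LinkedHashMap<>();
        steps.put("readEmployeeData", () -> { });
        steps.put("writeEmployeeData", () -> { throw new IllegalStateException("disk full"); });
        steps.put("employeeDataProcess", () -> { });

        System.out.println(run(steps)); // [readEmployeeData, writeEmployeeData]
    }
}
```

The failed step still has a recorded execution, while the step after it has none.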
Step executions are represented by objects of the StepExecution class. Each execution contains a reference to its corresponding step and JobExecution, and transaction-related data such as commit and rollback counts and start and end times. Additionally, each step execution will contain an ExecutionContext, which contains any data a developer needs persisted across batch runs, such as statistics or state information needed to restart. The following is a listing of the properties for StepExecution:
| Property | Definition |
| --- | --- |
| status | A BatchStatus object that indicates the status of the execution. While it is running, the status is BatchStatus.STARTED; if it fails, the status is BatchStatus.FAILED; and if it finishes successfully, the status is BatchStatus.COMPLETED. |
| startTime | A java.util.Date representing the current system time when the execution was started. |
| endTime | A java.util.Date representing the current system time when the execution finished, regardless of whether or not it was successful. |
| exitStatus | The ExitStatus indicating the result of the execution. It is most important because it contains an exit code that will be returned to the caller. See chapter 5 for more details. |
| executionContext | The ‘property bag’ containing any user data that needs to be persisted between executions. |
| readCount | The number of items that have been successfully read. |
| writeCount | The number of items that have been successfully written. |
| commitCount | The number of transactions that have been committed for this execution. |
| rollbackCount | The number of times the business transaction controlled by the Step has been rolled back. |
| readSkipCount | The number of times read has failed, resulting in a skipped item. |
| processSkipCount | The number of times process has failed, resulting in a skipped item. |
| filterCount | The number of items that have been ‘filtered’ by the ItemProcessor. |
| writeSkipCount | The number of times write has failed, resulting in a skipped item. |
JobRepository is the persistence mechanism for all of the stereotypes mentioned above. It provides CRUD operations for JobLauncher, Job, and Step implementations. When a Job is first launched, a JobExecution is obtained from the repository, and during the course of execution, StepExecution and JobExecution implementations are persisted by passing them to the repository. With the batch namespace, a repository can be declared using the <job-repository> tag:
<job-repository id="jobRepository"/>
JobLauncher represents a simple interface for launching a Job with a given set of JobParameters:
    public interface JobLauncher {

        public JobExecution run(Job job, JobParameters jobParameters)
                throws JobExecutionAlreadyRunningException, JobRestartException;
    }
ItemReader is an abstraction that represents the retrieval of input for a Step, one item at a time. When the ItemReader has exhausted the items it can provide, it indicates this by returning null. More details about the ItemReader interface and its various implementations can be found in the later chapter, ItemReaders and ItemWriters.
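The "return null when exhausted" contract can be illustrated with a reader backed by a list. `SimpleReader` and `ListBackedReader` are hypothetical types written for this sketch; they mirror the description above rather than reproduce the actual ItemReader interface.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical single-method contract mirroring the description above.
interface SimpleReader<T> {
    T read(); // returns null once the input is exhausted
}

final class ListBackedReader<T> implements SimpleReader<T> {
    private final Iterator<T> items;

    ListBackedReader(List<T> source) {
        this.items = source.iterator();
    }

    @Override
    public T read() {
        return items.hasNext() ? items.next() : null;
    }
}

final class ReaderDemo {
    public static void main(String[] args) {
        SimpleReader<String> reader = new ListBackedReader<>(List.of("a", "b"));

        // The caller keeps asking for items until it sees null.
        String item;
        while ((item = reader.read()) != null) {
            System.out.println(item);
        }
    }
}
```

Because null doubles as the end-of-input signal, a reader following this contract cannot hand out null as a legitimate item.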
ItemWriter is an abstraction that represents the output of a Step, one batch or chunk of items at a time. Generally, an item writer has no knowledge of the input it will receive next, only the items that were passed in its current invocation. More details about the ItemWriter interface and its various implementations can be found in the later chapter, ItemReaders and ItemWriters.
ItemProcessor is an abstraction that represents the business processing of an item. While the ItemReader reads one item at a time and the ItemWriter writes items out, the ItemProcessor provides a place to transform an item or apply other business processing. If, while processing the item, it is determined that the item is not valid, returning null indicates that the item should not be written out. More details about the ItemProcessor interface can be found in the later chapter, ItemReaders and ItemWriters.
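Putting the three abstractions together, a chunk-oriented loop might look roughly like the sketch below. It is a simplification under stated assumptions: `ChunkLoopSketch` is a hypothetical class, the reader, processor, and writer are modelled with plain `java.util.function` types instead of the Spring Batch interfaces, and transactions, skips, and restarts are left out entirely.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

final class ChunkLoopSketch {
    // Counters loosely modelled on the StepExecution properties of the same name.
    int readCount;
    int filterCount;
    int writeCount;
    int chunksWritten; // sketch-only counter, not a StepExecution property

    <I, O> void run(Supplier<I> reader, Function<I, O> processor,
                    Consumer<List<O>> writer, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        boolean exhausted = false;
        while (!exhausted) {
            List<O> chunk = new ArrayList<>();
            for (int i = 0; i < chunkSize; i++) {
                I item = reader.get();
                if (item == null) {          // reader signals end of input
                    exhausted = true;
                    break;
                }
                readCount++;
                O processed = processor.apply(item);
                if (processed == null) {     // processor filtered the item out
                    filterCount++;
                } else {
                    chunk.add(processed);
                }
            }
            if (!chunk.isEmpty()) {          // assumption: empty chunks are not written
                writer.accept(chunk);
                writeCount += chunk.size();
                chunksWritten++;
            }
        }
    }

    public static void main(String[] args) {
        Iterator<Integer> source = List.of(1, 2, 3, 4, 5).iterator();
        Supplier<Integer> reader = () -> source.hasNext() ? source.next() : null;
        Function<Integer, String> processor = n -> n == 3 ? null : "item-" + n;
        List<List<String>> written = new ArrayList<>();
        Consumer<List<String>> writer = written::add;

        ChunkLoopSketch loop = new ChunkLoopSketch();
        loop.run(reader, processor, writer, 2);

        System.out.println(written); // [[item-1, item-2], [item-4], [item-5]]
        System.out.println(loop.readCount + " read, " + loop.filterCount
                + " filtered, " + loop.writeCount + " written");
    }
}
```

The writer sees whole chunks rather than single items, and the filtered item (3) is read and counted but never reaches the writer.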
Many of the domain concepts listed above need to be configured in a Spring ApplicationContext. While there are implementations of the interfaces above that can be used in a standard bean definition, a namespace has been provided for ease of configuration:
    <beans:beans xmlns:beans="http://www.springframework.org/schema/beans"
                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xmlns="http://www.springframework.org/schema/batch"
                 xsi:schemaLocation="
            http://www.springframework.org/schema/beans
            http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
            http://www.springframework.org/schema/batch
            http://www.springframework.org/schema/batch/spring-batch-2.0.xsd">

        <job id="ioSampleJob">
            <step id="step1">
                <tasklet>
                    <chunk commit-interval="2" reader="itemReader" writer="itemWriter"></chunk>
                </tasklet>
            </step>
        </job>

    </beans:beans>