Spring Batch Tutorial

What is Spring Batch

Spring Batch is an open-source, lightweight, and comprehensive framework for developing batch applications that process high volumes of data.

Spring Batch provides the classes and APIs needed to support the following features:

  • Transaction management
  • Chunk-based processing
  • Declarative I/O to read and write resources
  • Start/Stop/Restart
  • Retry/Skip
  • Web-based administration interface (with Spring Batch Admin)

Spring Batch can execute a sequence of jobs. A job consists of one or more steps that can be chained together. There are two types of steps:

  1. READ-PROCESS-WRITE steps: They read data from a source (database, file, etc.), process that data, and finally write it to a resource (database, file, etc.).
  2. Tasklets or single-task steps: They consist of a single task, for example deleting temporary files after the other steps have run.
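To make the READ-PROCESS-WRITE idea more concrete, here is a minimal plain-Java sketch of chunk-oriented processing. It does not use the Spring Batch API: the class name ChunkStepSketch, the run() method and the chunkSize parameter are hypothetical names chosen for illustration, and chunkSize only loosely plays the role of the commit-interval attribute used in the XML configuration. Error handling, transactions and restartability are deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

/** Plain-Java illustration of a READ-PROCESS-WRITE step; not the Spring Batch implementation. */
final class ChunkStepSketch {

    /**
     * Reads up to chunkSize items, processes them, then "writes" them as one chunk.
     * Repeats until the reader is exhausted and returns every chunk that was written.
     */
    static <I, O> List<List<O>> run(Iterator<I> reader, Function<I, O> processor, int chunkSize) {
        List<List<O>> written = new ArrayList<>();
        while (reader.hasNext()) {
            // READ: collect at most chunkSize items
            List<I> items = new ArrayList<>();
            while (items.size() < chunkSize && reader.hasNext()) {
                items.add(reader.next());
            }
            // PROCESS: transform each item of the chunk
            List<O> outputs = new ArrayList<>();
            for (I item : items) {
                outputs.add(processor.apply(item));
            }
            // WRITE: hand the whole chunk over at once (here we simply record it)
            written.add(outputs);
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("phone", "tablet", "mouse", "keyboard", "speaker");
        List<List<String>> chunks = run(source.iterator(), (String item) -> item.toUpperCase(), 2);
        System.out.println(chunks); // [[PHONE, TABLET], [MOUSE, KEYBOARD], [SPEAKER]]
    }
}
```

With five items and a chunk size of 2, the sketch writes three chunks, the last one being partial.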

Examples of tasks

Here are some examples of Spring Batch tasks and the appropriate step type for each:

  • READ-PROCESS-WRITE: Read data from a MySQL database, process it, and write it to a CSV file.
  • READ-PROCESS-WRITE: Read data from files in folder A, process it, and write the result to folder B.
  • TASKLET : Send newsletter to subscribers.
  • TASKLET : Clean up a folder.
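As an illustration of a single-task step, the following plain-Java sketch shows what the "clean up a folder" task could look like. The class CleanupTaskSketch and its cleanFolder() method are hypothetical names; in a real job this logic would typically sit inside the execute() method of a class implementing Spring Batch's Tasklet interface, which is not shown here so that the example stays runnable without any dependency.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Plain-Java illustration of a single-task step: emptying a folder. */
final class CleanupTaskSketch {

    /** Deletes every regular file directly inside the folder and returns how many were removed. */
    static int cleanFolder(Path folder) throws IOException {
        int deleted = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(folder)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) { // sub-folders are left untouched
                    Files.delete(entry);
                    deleted++;
                }
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        // Create a throwaway folder with two files, then clean it
        Path tmp = Files.createTempDirectory("batch-tmp");
        Files.createFile(tmp.resolve("a.tmp"));
        Files.createFile(tmp.resolve("b.tmp"));
        System.out.println("Deleted files: " + cleanFolder(tmp));
        Files.delete(tmp);
    }
}
```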

How to define jobs and tasks

In Spring Batch, we can use annotations or XML configuration files to define jobs and the sequence of their steps.

Here is an example of configuration using XML:

<job id="myJob" xmlns="http://www.springframework.org/schema/batch">
  <step id="step1" next="step2">
    <tasklet>
      <chunk reader="txtItemReader" processor="txtItemProcesser" writer="txtItemWriter" commit-interval="1" />
    </tasklet>
  </step>
  <step id="step2" next="step3">
    <tasklet>
      <chunk reader="txtItemReader" writer="dbItemWriter" processor="dbItemProcesser" commit-interval="10" />
    </tasklet>
  </step>
  <step id="step3">
    <tasklet ref="sendReportEmail" />
  </step>
</job>
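The way the next attributes chain the steps of myJob can be pictured with a small plain-Java sketch. The class StepSequenceSketch and the executionOrder() method are hypothetical and only follow the links from one step id to the next; the real framework additionally moves on only when the previous step has completed successfully, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration of how "next" attributes define the order of steps. */
final class StepSequenceSketch {

    /** Follows the "next" links starting at firstStep and returns the resulting order. */
    static List<String> executionOrder(String firstStep, Map<String, String> next) {
        List<String> order = new ArrayList<>();
        String current = firstStep;
        // Stop when a step has no "next" entry, or when a step would be visited twice
        while (current != null && !order.contains(current)) {
            order.add(current);
            current = next.get(current);
        }
        return order;
    }

    public static void main(String[] args) {
        // Same links as in the XML example: step1 -> step2 -> step3
        Map<String, String> next = new HashMap<>();
        next.put("step1", "step2");
        next.put("step2", "step3");
        System.out.println(executionOrder("step1", next)); // [step1, step2, step3]
    }
}
```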

 

Use case example

In this example, we will use Spring Batch to read data from a MySQL database, process it, and finally write it to a CSV file.

For this tutorial, we will be using the following versions:

  • spring-core: 4.3.10.RELEASE, the Spring Framework core
  • spring-batch-core: 3.0.8.RELEASE, the Spring Batch core
  • mysql-connector-java: 5.1.22, which contains the MySQL JDBC driver and other utility classes
  • spring-jdbc: 4.3.10.RELEASE, to support the communication between Spring Batch and MySQL
  • Eclipse IDE: Neon 4.6.2
  • Maven: Apache Maven 3.0.5

The pom.xml file

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.tutoref</groupId>
  <artifactId>spring-batch-example</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>spring-batch-example</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <!-- Spring core -->
    <dependency>
      <groupId>org.springframework</groupId>
      <artifactId>spring-core</artifactId>
      <version>4.3.10.RELEASE</version>
    </dependency>
    <!-- Spring Batch core -->
    <dependency>
      <groupId>org.springframework.batch</groupId>
      <artifactId>spring-batch-core</artifactId>
      <version>3.0.8.RELEASE</version>
    </dependency>
    <!-- MySQL connector -->
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>5.1.22</version>
    </dependency>
    <!-- Spring JDBC -->
    <dependency>
      <groupId>org.springframework</groupId>
      <artifactId>spring-jdbc</artifactId>
      <version>4.3.10.RELEASE</version>
    </dependency>
  </dependencies>
</project>

 

Database structure and data

For this example, we will create a product table and populate it with some data using the following script.

CREATE TABLE `product` (
  `id` int(11) NOT NULL,
  `name` varchar(64) NOT NULL,
  `unit_price` double NOT NULL,
  `quantity` int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

INSERT INTO `product` (`id`, `name`, `unit_price`, `quantity`) VALUES
(1, 'Phone', 300, 50),
(2, 'Tablet', 250, 20),
(3, 'Mouse', 20, 30),
(4, 'Keyboard', 20, 30),
(5, 'Speaker', 40, 30),
(6, 'Screen', 300, 40);

 

The Project XML Configuration

As mentioned before, the same configuration can be done using annotations. For this example, we will use XML files to configure Spring Batch.

The whole XML configuration can be written in a single file; however, it is good practice to split it into separate files (Spring context, data sources, and job descriptions).

The datasource.xml file

This file contains the definition of the data source used to communicate with the database.

<beans xmlns="http://www.springframework.org/schema/beans"
 xmlns:jdbc="http://www.springframework.org/schema/jdbc"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.springframework.org/schema/beans
  http://www.springframework.org/schema/beans/spring-beans-3.2.xsd
  http://www.springframework.org/schema/jdbc
  http://www.springframework.org/schema/jdbc/spring-jdbc-3.2.xsd">
  
 <bean id="productDataSource"
  class="org.springframework.jdbc.datasource.DriverManagerDataSource">
  <property name="driverClassName" value="com.mysql.jdbc.Driver" />
  <property name="url" value="jdbc:mysql://localhost:3306/tutoref" />
  <property name="username" value="root" />
  <property name="password" value="replaceWithYours" />
 </bean>

 <bean id="transactionManager"
  class="org.springframework.batch.support.transaction.ResourcelessTransactionManager" />

</beans>

The spring-context.xml file

This file contains the Spring context configuration.

<beans xmlns="http://www.springframework.org/schema/beans"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="
  http://www.springframework.org/schema/beans
  http://www.springframework.org/schema/beans/spring-beans-3.2.xsd">

 <!-- Job repository and job launcher configuration
      (the transaction manager is declared in datasource.xml) -->
 <bean id="jobRepository"
  class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean" />

 <bean id="jobLauncher"
  class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
  <property name="jobRepository" ref="jobRepository" />
 </bean>

</beans>

The job-products.xml file

This file contains the job and step definitions. It also imports the two previous XML files.

 

<beans xmlns="http://www.springframework.org/schema/beans"
 xmlns:batch="http://www.springframework.org/schema/batch"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.springframework.org/schema/batch
  http://www.springframework.org/schema/batch/spring-batch-2.2.xsd
  http://www.springframework.org/schema/beans
  http://www.springframework.org/schema/beans/spring-beans-3.2.xsd
 ">
 
 <import resource="spring-context.xml" />
 <import resource="datasource.xml" />

 <bean id="product" class="com.tutoref.batch.entity.Product" scope="prototype" />
 <bean id="itemProcessor" class="com.tutoref.batch.ProductItemProcessor" />
 <bean id="jobListener" class="com.tutoref.batch.ProductJobListener" />
 
 
    <!-- Reading from the database and mapping each row to a Product -->
    <bean id="productItemReader"
        class="org.springframework.batch.item.database.JdbcCursorItemReader">
        <property name="dataSource" ref="productDataSource" />
        <property name="sql" value="SELECT * FROM product" />
        <property name="rowMapper">
            <bean class="com.tutoref.batch.ProductMapper" />
        </property>

    </bean>
    
    <!-- Writing a line into an output flat file -->
    <bean id="productFlatFileItemWriter" class="org.springframework.batch.item.file.FlatFileItemWriter" scope="step">
        <property name="resource" value="file:csv/products.csv" />
        <!-- Converting a product object into a delimited list of strings -->
        <property name="lineAggregator">
            <bean class="org.springframework.batch.item.file.transform.DelimitedLineAggregator">
                <property name="delimiter" value="|" />
                <property name="fieldExtractor">
                    <!-- Reading the values of the bean's properties using reflection -->
                    <bean class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor">
                        <property name="names" value="name,unitPrice,quantity" />
                    </bean>
                </property>
            </bean>
        </property>
    </bean>
    
    <!-- And finally ... the job definition -->
    <batch:job id="productJob">
        <batch:step id="step1">
            <batch:tasklet transaction-manager="transactionManager">
                <batch:chunk reader="productItemReader" writer="productFlatFileItemWriter"
                    processor="itemProcessor" commit-interval="10" />
            </batch:tasklet>
        </batch:step>
        <batch:listeners>
            <batch:listener ref="jobListener" />
        </batch:listeners>
    </batch:job>
</beans>

The file also declares all the beans used by the job:

  • product: A reference to the Product transfer/value bean.
  • itemProcessor: A reference to the class responsible for processing items.
  • jobListener: A reference to the class responsible for listening to job execution events.
  • productItemReader: A reference to the class responsible for reading data from the database.
  • productFlatFileItemWriter: A reference to the class responsible for writing data to CSV files.

The XML above references the following classes, which we created for this example.

Project source code

The Product Bean

Represents a product

package com.tutoref.batch.entity;

/**
 * The Product Bean
 * 
 * @author tutoref
 *
 */

public class Product {

 private int id;
 private String name;
 private double unitPrice;
 private int quantity;
 
 public int getId() {
  return id;
 }
 
 public void setId(int id) {
  this.id = id;
 }
 
 public String getName() {
  return name;
 }
 
 public void setName(String name) {
  this.name = name;
 }
 
 public double getUnitPrice() {
  return unitPrice;
 }
 
 public void setUnitPrice(double unitPrice) {
  this.unitPrice = unitPrice;
 }
 
 public int getQuantity() {
  return quantity;
 }
 
 public void setQuantity(int quantity) {
  this.quantity = quantity;
 }
 
 @Override
 public String toString() {
  return "[" + this.id + " | "+this.name+"]";
 }
}

The Product mapper

This class must implement the RowMapper interface and override the mapRow() method. It is responsible for converting a ResultSet row into a Product object.

package com.tutoref.batch;

import java.sql.ResultSet;
import java.sql.SQLException;
import org.springframework.jdbc.core.RowMapper;

import com.tutoref.batch.entity.Product;

public class ProductMapper implements RowMapper<Product> {

 @Override
 public Product mapRow(ResultSet rs, int rowNum) throws SQLException {
  Product product = new Product();
  product.setId(rs.getInt("id"));
  product.setName(rs.getString("name"));
  product.setQuantity(rs.getInt("quantity"));
  product.setUnitPrice(rs.getDouble("unit_price"));
  return product;
 }

 
}

 

The Item Processor

This object processes an item retrieved from the database. It must implement the ItemProcessor interface and override the process() method.

package com.tutoref.batch;

import org.springframework.batch.item.ItemProcessor;

import com.tutoref.batch.entity.Product;

public class ProductItemProcessor implements ItemProcessor<Product, Product> {

 // Logs each item and passes it on unchanged
 @Override
 public Product process(Product product) throws Exception {
  System.out.println("Processing item " + product);
  return product;
 }

}

The Job Listener

This object is responsible for listening to job execution events. It must implement the JobExecutionListener interface and override the beforeJob() and afterJob() methods.

package com.tutoref.batch;

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;

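public class ProductJobListener implements JobExecutionListener {

 // Called once before the job starts
 @Override
 public void beforeJob(JobExecution jobExecution) {
  System.out.println("Before processing job : " + jobExecution.getId());
 }

 // Called once after the job ends, whether it succeeded or failed
 @Override
 public void afterJob(JobExecution jobExecution) {
  System.out.println("After processing job : " + jobExecution.getId());
 }

}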

The Main program

This class executes the job. It loads the Spring Batch context file (job-products.xml).

package com.tutoref.batch;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionException;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

/**
 * Spring Batch example
 *
 * Main class
 */
public class App {

    public static void main(String[] args) {

        // Load the job configuration (it imports the two other XML files)
        ApplicationContext context = new ClassPathXmlApplicationContext("job-products.xml");

        JobLauncher jobLauncher = (JobLauncher) context.getBean("jobLauncher");
        Job job = (Job) context.getBean("productJob");

        try {
            // Run the job without parameters and print its final status
            JobExecution execution = jobLauncher.run(job, new JobParameters());
            System.out.println("Job Exit Status : " + execution.getStatus());
        } catch (JobExecutionException e) {
            System.out.println("The Job has failed :" + e.getMessage());
            e.printStackTrace();
        }
    }
}

Project structure

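To recap, the project consists of the pom.xml file, the three XML configuration files (datasource.xml, spring-context.xml and job-products.xml), and the Java classes Product, ProductMapper, ProductItemProcessor, ProductJobListener and App.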

Execution results

Running the code prints the following trace on the console.

Before processing job : 0
INFO: Executing step: [step1]
Processing item [1 | Phone]
Processing item [2 | Tablet]
Processing item [3 | Mouse]
Processing item [4 | Keyboard]
Processing item [5 | Speaker]
Processing item [6 | Screen]
After processing job : 0
Job Exit Status : COMPLETED

 

It also generates a pipe-delimited CSV file (csv/products.csv) with the following content:

Phone|300.0|50
Tablet|250.0|20
Mouse|20.0|30
Keyboard|20.0|30
Speaker|40.0|30
Screen|300.0|40
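
Each line of this file is built from the name, unitPrice and quantity properties joined with the "|" delimiter. The following plain-Java sketch mimics that formatting for a single product. It is only an approximation of what the configured line aggregator produces, and the class LineAggregatorSketch and its toLine() method are hypothetical names that do not belong to Spring Batch.

```java
import java.util.StringJoiner;

/** Plain-Java illustration of how one product becomes one delimited line. */
final class LineAggregatorSketch {

    /** Joins the three exported fields with the given delimiter. */
    static String toLine(String name, double unitPrice, int quantity, String delimiter) {
        StringJoiner line = new StringJoiner(delimiter);
        line.add(name);
        line.add(String.valueOf(unitPrice)); // a double is rendered as "300.0", not "300"
        line.add(String.valueOf(quantity));
        return line.toString();
    }

    public static void main(String[] args) {
        System.out.println(toLine("Phone", 300, 50, "|")); // Phone|300.0|50
    }
}
```

This also explains why the prices appear as 300.0 rather than 300: the unit_price column is mapped to a Java double before being converted to text.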