Spring Batch Read CSV File and Write to Database Example

Prerequisites and Dependencies

To start with the Spring Batch project, we need to have **Java 8** or later installed on our system. We also require **Maven** or **Gradle** for building and managing dependencies. For this example, we will use **Maven**. The required dependencies include **spring-batch-core**, **spring-jdbc**, and **mysql-connector-java** for database connectivity.

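The **pom.xml** file should include the following dependencies. Versions are omitted here on the assumption that a parent POM or BOM (for example, spring-boot-starter-parent) manages them; otherwise, add a version element to each dependency:

<dependencies>
    <!-- Versions omitted: assumes a parent POM or BOM manages them -->
    <dependency>
        <groupId>org.springframework.batch</groupId>
        <artifactId>spring-batch-core</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-jdbc</artifactId>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
    </dependency>
</dependencies>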

For further reading on **Spring Batch** configuration, visit our article on Configuring Spring Batch.

We also need to create a **database** and a **table** to store the data. The **table** should have the same structure as the **CSV** file. We can use the following **SQL** query to create the **table**:

CREATE TABLE users (
 id INT PRIMARY KEY,
 name VARCHAR(255),
 email VARCHAR(255)
);

The **CSV** file should have the same structure as the **table**. For example:

1,John Doe,john@example.com
2,Jane Doe,jane@example.com
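
The code in this article refers to a User type that is never shown. A minimal sketch of what it might look like follows, assuming one property per CSV column and table column (id, name, email). The class shape is an assumption, not part of the original example; bean-style mappers such as BeanWrapperFieldSetMapper rely on a no-argument constructor and setters whose names match the column names.

```java
// Hypothetical domain class assumed by the examples in this article.
// When used with the Spring Batch code, declare it public in its own file
// (User.java) inside the com.example.springbatch package.
class User {
    private int id;
    private String name;
    private String email;

    // No-argument constructor so a mapper can instantiate the class reflectively
    public User() {
    }

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getEmail() {
        return email;
    }

    public void setEmail(String email) {
        this.email = email;
    }

    @Override
    public String toString() {
        return "User{id=" + id + ", name='" + name + "', email='" + email + "'}";
    }
}
```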

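Before wiring a full job, we can use the following standalone **Java** class to read the **CSV** file with **FlatFileItemReader** and write each record to the **database** with **JdbcTemplate**:

package com.example.springbatch;

import javax.sql.DataSource;

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.core.io.ClassPathResource;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.datasource.DriverManagerDataSource;

public class CSVReader {
    public static void main(String[] args) throws Exception {
        // Create a JdbcTemplate backed by the data source
        JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource());

        // Create a FlatFileItemReader for users.csv on the classpath
        FlatFileItemReader<User> reader = new FlatFileItemReader<>();
        reader.setResource(new ClassPathResource("users.csv"));
        reader.setLineMapper(lineMapper());

        // Read the CSV file and write each record to the database
        reader.open(new ExecutionContext());
        try {
            User user;
            while ((user = reader.read()) != null) {
                jdbcTemplate.update(
                        "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
                        user.getId(), user.getName(), user.getEmail());
            }
        } finally {
            // Always release the file handle, even if a read or insert fails
            reader.close();
        }
    }

    // Create a line mapper that tokenizes a line and maps it to a User
    private static DefaultLineMapper<User> lineMapper() {
        DefaultLineMapper<User> lineMapper = new DefaultLineMapper<>();
        lineMapper.setLineTokenizer(lineTokenizer());
        lineMapper.setFieldSetMapper(fieldSetMapper());
        return lineMapper;
    }

    // Create a line tokenizer that splits on commas and names the columns
    private static DelimitedLineTokenizer lineTokenizer() {
        DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
        lineTokenizer.setDelimiter(",");
        lineTokenizer.setNames(new String[] { "id", "name", "email" });
        return lineTokenizer;
    }

    // Create a field set mapper that copies the named fields onto User properties
    private static BeanWrapperFieldSetMapper<User> fieldSetMapper() {
        BeanWrapperFieldSetMapper<User> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
        fieldSetMapper.setTargetType(User.class);
        return fieldSetMapper;
    }

    // Placeholder connection settings (assumed); replace them with your own
    private static DataSource dataSource() {
        DriverManagerDataSource dataSource = new DriverManagerDataSource();
        dataSource.setDriverClassName("com.mysql.cj.jdbc.Driver");
        dataSource.setUrl("jdbc:mysql://localhost:3306/example");
        dataSource.setUsername("username");
        dataSource.setPassword("password");
        return dataSource;
    }
}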

In-Depth Look at Spring Batch Concepts

A job is the core concept in Spring Batch, representing a batch process that can be executed. It is defined using the Job interface and consists of one or more steps. Each step represents a single unit of work that is executed within the job. The Step interface is used to define a step, which can be either a chunk-oriented step or a tasklet step.

A chunk-oriented step is used for bulk processing, where data is read, processed, and written in chunks. Spring Batch implements this type of step with the ChunkOrientedTasklet class, and it consists of three main components: an item reader, an optional item processor, and an item writer. For example, reading a CSV file and writing to a database would involve using a FlatFileItemReader as the item reader and a JdbcBatchItemWriter as the item writer. For more information on configuring item readers and item writers, see our article on Configuring Item Readers and Writers.
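
To make the chunk rhythm concrete, here is a conceptual sketch in plain Java. It is not Spring Batch's implementation and uses no framework classes; the class name ChunkLoopSketch and the trimming "processor" are assumptions made purely for illustration. It reads up to chunkSize items, processes them, and hands the whole chunk to the "writer" in one go.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Conceptual sketch only: mimics the read / process / write rhythm of a
// chunk-oriented step. It is not how Spring Batch is implemented internally.
class ChunkLoopSketch {

    // Returns the chunks in the order they were "written".
    static List<List<String>> run(Iterator<String> reader, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        List<List<String>> written = new ArrayList<>();
        while (reader.hasNext()) {
            // Read: collect up to chunkSize items
            List<String> inputs = new ArrayList<>();
            while (inputs.size() < chunkSize && reader.hasNext()) {
                inputs.add(reader.next());
            }
            // Process: transform each item (here: trim whitespace)
            List<String> outputs = new ArrayList<>();
            for (String input : inputs) {
                outputs.add(input.trim());
            }
            // Write: hand over the whole chunk at once
            written.add(outputs);
        }
        return written;
    }
}
```

Calling ChunkLoopSketch.run with five items and a chunk size of 2 yields three chunks of sizes 2, 2, and 1. In a real step, each chunk is also written inside its own transaction, which this sketch does not model.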

Item processing is where data is transformed and validated. The ItemProcessor interface defines an item processor, which takes an input object, processes it, and returns an output object; returning null filters the item out so that it never reaches the writer. This allows for data transformation, filtering, and validation to be performed on the data being processed. The item processor is used in conjunction with the item reader and item writer to form a complete chunk-oriented step.
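
The transform-or-filter contract can be sketched without the framework. The SketchProcessor interface below is a local stand-in for Spring Batch's ItemProcessor (whose process method additionally declares throws Exception), and EmailNormalizer is a hypothetical processor invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Local stand-in for Spring Batch's ItemProcessor<I, O>, declared here so the
// sketch compiles without the framework on the classpath.
interface SketchProcessor<I, O> {
    O process(I item);
}

// Hypothetical processor: normalizes an email address and filters out
// values that do not contain "@" by returning null.
class EmailNormalizer implements SketchProcessor<String, String> {
    @Override
    public String process(String email) {
        if (email == null || !email.contains("@")) {
            return null; // null means "drop this item"
        }
        return email.trim().toLowerCase(Locale.ROOT);
    }
}

class ProcessorSketch {
    // Applies the processor and keeps only non-null results, which is what
    // happens to filtered items before they reach the writer.
    static List<String> processAll(List<String> inputs, SketchProcessor<String, String> processor) {
        List<String> outputs = new ArrayList<>();
        for (String input : inputs) {
            String result = processor.process(input);
            if (result != null) {
                outputs.add(result);
            }
        }
        return outputs;
    }
}
```

With real Spring Batch code, the same logic would live in a class implementing ItemProcessor and be registered on the step with the builder's processor method.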

The job repository is used to store the state of the job and its associated steps. The JobRepository interface is used to define the job repository, which provides methods for storing and retrieving job and step execution data. This allows for the job to be restarted from a previous point of failure, ensuring that data is not lost in the event of a failure. Understanding the job repository is crucial for implementing robust and fault-tolerant batch processes.

Step-by-Step Guide to Configuring Spring Batch

To configure **Spring Batch** to read a CSV file and write to a database, you create a **Job** that consists of a **Step**. This **Step** contains a **Reader**, an optional **Processor**, and a **Writer**: the **Reader** reads the CSV file, the **Processor** transforms or validates each record, and the **Writer** writes the records to the database.

The first step is to make a **JobRepository** and a **JobLauncher** available. The **JobRepository** stores the job's metadata, and the **JobLauncher** launches the job. For more information on **JobRepository** and **JobLauncher**, please refer to our article on Spring Batch Tutorial.

To read the CSV file, you can use the **FlatFileItemReader**. This reader reads the file line by line, tokenizes each line into named fields, and maps those fields onto a **User** object.

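package com.example.springbatch;

import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.core.io.ClassPathResource;

public class CSVReader {
    public FlatFileItemReader<User> reader() {
        // Split each line on commas (the default delimiter) and name the columns
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        tokenizer.setNames(new String[] { "id", "name", "email" });

        // Copy the named fields onto the matching User properties
        BeanWrapperFieldSetMapper<User> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
        fieldSetMapper.setTargetType(User.class);

        DefaultLineMapper<User> lineMapper = new DefaultLineMapper<>();
        lineMapper.setLineTokenizer(tokenizer);
        lineMapper.setFieldSetMapper(fieldSetMapper);

        FlatFileItemReader<User> reader = new FlatFileItemReader<>();
        reader.setResource(new ClassPathResource("users.csv"));
        // The sample file has no header row; call reader.setLinesToSkip(1) only if yours does
        reader.setLineMapper(lineMapper);
        return reader;
    }
}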

To write the data to the database, you can use the **JdbcBatchItemWriter**. This writer will write the data to the database in batches.
The expected output will be the data written to the database:

+----+----------+------------------+
| id | name     | email            |
+----+----------+------------------+
| 1  | John Doe | john@example.com |
| 2  | Jane Doe | jane@example.com |
+----+----------+------------------+

For more information on **JdbcBatchItemWriter**, please refer to our article on Spring Batch Item Writer.

Full Example of Spring Batch CSV to Database Configuration

To configure a **Spring Batch** job that reads a CSV file and writes to a database, you need to define a **Job** and a **Step**. The **Job** is the main entry point for the batch process, while the **Step** defines the specific task to be executed. For more information on **Job** configuration, refer to our article on Configuring a Spring Batch Job.

The **Job** is composed of one or more **Step** instances, each of which defines a self-contained batch process. In this example, we will define a single **Step** that reads a CSV file and writes to a database.
To read a CSV file, we will use the **FlatFileItemReader** class, which is a built-in **Spring Batch** class for reading flat files.

package com.example.springbatch;

import javax.sql.DataSource;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;

// Written in the Spring Batch 4.x style: JobBuilderFactory and
// StepBuilderFactory are deprecated as of Spring Batch 5.
@Configuration
@EnableBatchProcessing
public class BatchConfig {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Autowired
    public DataSource dataSource;

    @Bean
    public FlatFileItemReader<User> reader() {
        // We are using a FlatFileItemReader to read the CSV file
        return new FlatFileItemReaderBuilder<User>()
                .name("userItemReader") // required because the reader saves its state
                .resource(new ClassPathResource("users.csv"))
                .delimited()
                // one name per CSV column, in file order
                .names(new String[] { "id", "name", "email" })
                .targetType(User.class)
                .build();
    }

    @Bean
    public JdbcBatchItemWriter<User> writer() {
        // We are using a JdbcBatchItemWriter to write to the database;
        // the named parameters are resolved from the User bean properties
        return new JdbcBatchItemWriterBuilder<User>()
                .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
                .sql("INSERT INTO users (id, name, email) VALUES (:id, :name, :email)")
                .dataSource(dataSource)
                .build();
    }

    @Bean
    public Job importUserJob() {
        // We are defining a Job that consists of a single Step
        return jobBuilderFactory.get("importUserJob")
                .flow(step())
                .end()
                .build();
    }

    @Bean
    public Step step() {
        // We are defining a Step that reads from a CSV file and writes to a
        // database, committing after every 10 items
        return stepBuilderFactory.get("step")
                .<User, User>chunk(10)
                .reader(reader())
                .writer(writer())
                .build();
    }
}

When you run this **Job**, it will read the CSV file and write the data to the database. The expected output will be:

+----+----------+------------------+
| id | name     | email            |
+----+----------+------------------+
| 1  | John Doe | john@example.com |
| 2  | Jane Doe | jane@example.com |
+----+----------+------------------+

For further reading on **Spring Batch** configuration, refer to our article on Configuring Spring Batch.

Common Mistakes and Troubleshooting Tips

When using Spring Batch to read a CSV file and write to a database, several common pitfalls can occur. One of the most frequent is an incorrectly configured FlatFileItemReader, which can fail when the reader is opened with a java.lang.IllegalStateException or at read time with a parsing exception.
For more information on Spring Batch configuration, visit our Spring Batch configuration best practices page.

Mistake 1: Incorrect Delimiter

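The following code shows an example of incorrect delimiter configuration. Note that the delimiter is set on the line tokenizer (here through the builder's delimited() section), not on the FlatFileItemReader itself:

public class IncorrectDelimiterConfig {
    // WRONG
    @Bean
    public FlatFileItemReader<User> reader() {
        return new FlatFileItemReaderBuilder<User>()
                .name("userItemReader")
                .resource(new ClassPathResource("example.csv"))
                .delimited()
                .delimiter("|") // WRONG: the file is comma-separated
                .names(new String[] { "id", "name", "email" })
                .targetType(User.class)
                .build();
    }
}

Because each comma-separated line now yields one token instead of three, reading typically fails with a FlatFileParseException caused by an IncorrectTokenCountException. (An IllegalStateException with the message "Input resource must exist" points to a missing input file, not to a wrong delimiter.)
The correct configuration is:

public class CorrectDelimiterConfig {
    @Bean
    public FlatFileItemReader<User> reader() {
        return new FlatFileItemReaderBuilder<User>()
                .name("userItemReader")
                .resource(new ClassPathResource("example.csv"))
                .delimited()
                .delimiter(",") // correct delimiter (also the default)
                .names(new String[] { "id", "name", "email" })
                .targetType(User.class)
                .build();
    }
}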

Production tip: Always verify the delimiter used in the CSV file to avoid incorrect data processing.
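
One way to act on this tip is a quick pre-flight check on the first line of the file before the job starts. The helper below is a hypothetical sketch, not a Spring Batch API; it is deliberately naive and does not handle quoted fields that contain the delimiter.

```java
import java.util.regex.Pattern;

// Hypothetical pre-flight helper: checks that a sample line splits into the
// expected number of fields for a given delimiter.
class DelimiterCheck {

    static boolean matches(String line, String delimiter, int expectedFields) {
        // Pattern.quote treats the delimiter literally (important for "|"),
        // and the -1 limit keeps trailing empty fields in the count.
        return line.split(Pattern.quote(delimiter), -1).length == expectedFields;
    }
}
```

For a line such as 1,John Doe,john@example.com, the check succeeds for "," with three expected fields and fails for "|", because the line then splits into a single field.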

Mistake 2: Missing Database Configuration

Another common mistake is missing database configuration.
The following code shows an example of missing database configuration:

public class MissingDatabaseConfig {
    // WRONG
    @Bean
    public DataSource dataSource() {
        // missing database configuration
        return null; // WRONG
    }
}

Without a usable DataSource, the application typically fails at startup or throws a java.lang.NullPointerException as soon as a component tries to obtain a connection.
The correct configuration is:

// DataSourceBuilder comes from Spring Boot (org.springframework.boot.jdbc.DataSourceBuilder)
@Configuration
public class CorrectDatabaseConfig {
    @Bean
    public DataSource dataSource() {
        // correct database configuration
        return DataSourceBuilder.create()
                .driverClassName("com.mysql.cj.jdbc.Driver")
                .url("jdbc:mysql://localhost:3306/example")
                .username("username")
                .password("password")
                .build();
    }
}

For more information on database configuration, visit our database configuration in Spring Boot page.
With the correct configuration in place, the job can connect to the database and the imported rows appear in the users table.

Production tip: Always verify the database configuration to ensure correct data writing.

Production-Ready Tips and Best Practices

When deploying and running Spring Batch jobs in a production environment, it is crucial to consider several key factors to ensure reliability, scalability, and maintainability. One of the primary concerns is job configuration, which involves defining the JobRepository and JobLauncher beans. For more information on configuring these components, refer to our article on Configuring Spring Batch.

Production tip: Use a robust database as the backend for your JobRepository to ensure that job execution data is persisted reliably.

To ensure that Spring Batch jobs can recover from transient failures, it is essential to implement retry mechanisms and error handling strategies. This can be achieved with the RetryTemplate class from Spring Retry and the SkipPolicy interface provided by Spring Batch.

Production tip: Implement a retry policy that takes into account the type of exception that occurred during job execution to determine the appropriate retry strategy.
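
The decision this tip describes can be sketched as a small classifier that maps an exception type to an action. This is plain Java for illustration, not Spring Batch's retry or skip API, and the specific mapping (transient database and I/O errors are retried, malformed input is skipped, everything else fails the job) is an assumption that each application would need to adapt.

```java
import java.io.IOException;
import java.sql.SQLTransientException;

// Hypothetical classifier: decides what to do with a failure based on its type.
class FailureClassifier {

    enum Action { RETRY, SKIP, FAIL }

    static Action classify(Throwable failure) {
        // Transient infrastructure problems may succeed on a later attempt
        if (failure instanceof SQLTransientException || failure instanceof IOException) {
            return Action.RETRY;
        }
        // Malformed input will never succeed, so retrying is pointless;
        // NumberFormatException is a subclass of IllegalArgumentException
        if (failure instanceof IllegalArgumentException) {
            return Action.SKIP;
        }
        // Anything unexpected stops the job so it can be investigated
        return Action.FAIL;
    }
}
```

In a fault-tolerant Spring Batch step, the same decisions are usually expressed declaratively by listing retryable and skippable exception classes on the step builder.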

In addition to implementing retry mechanisms and error handling strategies, it is also important to consider job monitoring and logging to ensure that issues can be identified and resolved quickly. This can be achieved by using tools such as Spring Boot Actuator and logging frameworks like Logback or Log4j. For more information on monitoring and logging Spring Batch jobs, refer to our article on Monitoring Spring Batch Jobs.

Production tip: Use a logging framework to log important events and errors that occur during job execution, and consider integrating with a monitoring tool to provide real-time visibility into job execution.

By following these production-ready tips and best practices, developers can ensure that their Spring Batch jobs are reliable, scalable, and maintainable, and can be executed efficiently in a production environment. For further reading on Spring Batch and its applications, refer to our article on Spring Batch Tutorial.

Testing and Validating Spring Batch Jobs

When developing **Spring Batch** applications that read CSV files and write to databases, a thorough testing strategy is crucial to ensure data integrity and job reliability. Testing **Spring Batch** jobs involves verifying the correctness of the **JobExecution** and **StepExecution**. To achieve this, developers can leverage the **JobLauncherTestUtils** and **JobRepositoryTestUtils** classes provided by Spring Batch.

Testing a **Spring Batch** job that reads a CSV file and writes to a database involves several steps, including setting up a test database, creating a **JobLauncher**, and launching the job. The **JobLauncher** is responsible for executing the job, while the **JobRepository** is used to store the job's execution history. For more information on **Spring Batch** job configuration, see our article on Configuring Spring Batch Jobs.

To test a **Spring Batch** job, developers can create a Spring-managed test class that autowires **JobLauncherTestUtils** (from the spring-batch-test module). This utility provides methods for launching the job and returns the resulting **JobExecution** for verification. The following example demonstrates how to test a **Spring Batch** job that reads a CSV file and writes to a database:

import static org.junit.Assert.assertEquals;

import org.junit.After;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.test.JobLauncherTestUtils;
import org.springframework.batch.test.JobRepositoryTestUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = "/test-context.xml")
public class JobLauncherTest {

    @Autowired
    private JobLauncherTestUtils jobLauncherTestUtils;

    @Autowired
    private JobRepositoryTestUtils jobRepositoryTestUtils;

    @After
    public void cleanUp() {
        // remove recorded job executions so the next run starts clean
        jobRepositoryTestUtils.removeJobExecutions();
    }

    @Test
    public void testJob() throws Exception {
        // launch the job
        JobExecution execution = jobLauncherTestUtils.launchJob(new JobParameters());

        // verify that the job ran exactly one step
        assertEquals(1, execution.getStepExecutions().size());
        // verify that every step execution was successful
        for (StepExecution stepExecution : execution.getStepExecutions()) {
            assertEquals(BatchStatus.COMPLETED, stepExecution.getStatus());
        }
    }
}

The test produces no console output of its own. It passes when the job runs exactly one step and that step finishes with the **COMPLETED** status.

By using the **JobLauncherTestUtils** and **JobRepositoryTestUtils** classes, developers can easily test and validate their **Spring Batch** jobs, ensuring that they are working correctly and reliably. For further reading on **Spring Batch** testing, see our article on Testing Strategies for Spring Batch Applications.

Key Takeaways and Conclusion

The Spring Batch framework provides a robust way to read CSV files and write to a database. The key to a successful implementation is understanding the job configuration and the step execution process. By using the FlatFileItemReader class and the ItemWriter interface, developers can read data from flat files and write it to various destinations. For a more detailed explanation of the JobRepository and its role in managing job executions, refer to our guide on Configuring the JobRepository for Spring Batch.

When reading a CSV file, it is essential to define the field set and the corresponding domain model. The DelimitedLineTokenizer tokenizes each CSV line into individual named fields, and the BeanWrapperFieldSetMapper maps those fields to the domain model properties. (BeanWrapperFieldExtractor works in the opposite direction: it extracts property values from a bean when writing a file.) By using these components, developers can handle more complex CSV file structures.

Writing data to a database involves using the ItemWriter interface and a database connection. The JdbcBatchItemWriter is a popular choice for writing data to a relational database. It provides a simple way to execute SQL statements in batch mode, improving performance and reducing the overhead of individual inserts. For more information on optimizing database performance with Spring Batch, see our article on Optimizing Database Performance with Spring Batch.

As a next step, developers can explore more advanced features of Spring Batch, such as job partitioning and remote chunking. These features enable the processing of large datasets in parallel, further improving the performance and scalability of batch applications. By mastering these techniques, developers can build robust and efficient batch processing systems that meet the demands of modern enterprise applications.

Advanced Topics and Customization Options

When working with Spring Batch, **error handling** is crucial to ensure that the batch process can recover from failures. The RetryTemplate class provides a way to implement retry mechanisms, allowing the batch process to retry failed operations. This can be particularly useful when dealing with external resources, such as databases or file systems, that may be temporarily unavailable. By configuring the RetryTemplate with a **retry policy**, you can specify the number of attempts and the backoff period between attempts.
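
The idea of a fixed number of attempts with a pause between them can be sketched in plain Java. This is not Spring Retry's RetryTemplate; the class and method names are hypothetical, and the fixed backoff is an assumption chosen to keep the example short.

```java
import java.util.function.Supplier;

// Conceptual sketch of "retry up to N times with a fixed backoff".
class RetrySketch {

    static <T> T callWithRetry(Supplier<T> operation, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                last = e;
                // Pause before the next attempt, but not after the final one
                if (attempt < maxAttempts) {
                    sleep(backoffMillis);
                }
            }
        }
        // All attempts failed: rethrow the most recent failure
        throw last;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and stop retrying
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted during backoff", e);
        }
    }
}
```

An operation that fails twice and then succeeds returns its result on the third attempt; if every attempt fails, the last exception is rethrown to the caller.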

To implement **error handling** in a Spring Batch job, you can use the JobExecutionListener interface to listen for job execution events, such as job failures. This allows you to take corrective action, such as sending notifications or logging errors, when a job fails. For more information on implementing job execution listeners, see our article on Spring Batch Job Execution Listeners.

In addition to error handling, Spring Batch also provides a range of **customization options**, including the ability to create custom **item readers** and **item writers**. The ItemReader interface provides a way to read data from a variety of sources, including files, databases, and messaging systems. By implementing a custom item reader, you can read data from a specific source, such as a CSV file, and process it using a Spring Batch job.
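
The central contract of a custom reader is small: return the next item on each call and return null once the input is exhausted. The sketch below shows that contract with a local stand-in interface (SketchReader) instead of Spring Batch's ItemReader, whose read method additionally declares checked exceptions; ListBackedReader is a hypothetical in-memory source.

```java
import java.util.Iterator;
import java.util.List;

// Local stand-in for Spring Batch's ItemReader<T>, declared here so the
// sketch compiles without the framework on the classpath.
interface SketchReader<T> {
    T read();
}

// Hypothetical reader backed by an in-memory list.
class ListBackedReader<T> implements SketchReader<T> {
    private final Iterator<T> iterator;

    ListBackedReader(List<T> items) {
        this.iterator = items.iterator();
    }

    @Override
    public T read() {
        // null signals "no more input"; a chunk-oriented step stops reading then
        return iterator.hasNext() ? iterator.next() : null;
    }
}
```

A reader over two items returns each of them once and then null on every further call. A production reader would also need to consider restartability and thread safety, which this sketch leaves out.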

The ItemWriter interface provides a way to write data to a variety of destinations, including databases, files, and messaging systems. By implementing a custom item writer, you can write data to a specific destination, such as a relational database, and configure the writer to handle errors and exceptions. By using these customization options, you can create complex batch processes that meet the specific needs of your application.

Real-World Applications and Use Cases

Spring Batch is a powerful tool for reading CSV files and writing to databases, with a wide range of real-world applications. For example, in the field of finance, Spring Batch can be used to read transaction.csv files and write the data to a relational database for further analysis. This can help identify trends and patterns in financial data, enabling better decision-making. The FlatFileItemReader class is particularly useful for reading CSV files.

In the field of e-commerce, Spring Batch can be used to read product information from product.csv files and write it to a database for use in online catalogs. This can help to automate the process of updating product information, reducing the risk of errors and improving efficiency. For more information on using Spring Batch for e-commerce applications, see our article on using Spring Batch for e-commerce data integration.

Another example of a real-world application for Spring Batch is in the field of data science, where it can be used to read large datasets from data.csv files and write them to a NoSQL database for analysis. This can help to identify trends and patterns in large datasets, enabling data scientists to gain insights and make predictions. The ItemWriter interface is particularly useful for writing data to a database.

Spring Batch can also be used in healthcare to read patient data from patient.csv files and write it to a database for use in electronic health records. This can help to improve the accuracy and efficiency of patient care by providing healthcare professionals with access to up-to-date and accurate patient information. By using the JobLauncher interface, developers can launch Spring Batch jobs programmatically, making it easier to integrate with existing healthcare systems.