Christopher Jeffers
August 2012
• Intro to Spring Batch and Use-Cases
• Spring Batch Technical Explanation
– Architecture
– The Batch Job
– Skipping and Retrying Steps
– Scaling Features
• Spring Batch Evaluation
– Solving Use-Cases
– Benefits
– Issues
– Integration Options
– Future Steps
• Lightweight framework designed to enable the development of robust batch applications used in enterprise systems
• As a part of Spring, it builds on the ease of use of the POJO-based development approach, while making it easy for developers to use more advanced enterprise services when necessary
• Provides reusable functions that are essential in processing large volumes of data
• Provides scaling features, including multi-threading and massive parallelism for Spring Batch Jobs
• DataRoomBatch
– Physically delete all rows marked for deletion from a given bucket (DeepSix)
– Rerun user documents through publishing workflow
– Proactive auditing of the environment
• Public Records Batch Processing
– User inputs file with search criteria for many individuals and program searches database for changes in information, returning a report of hits to user
– Read, Process, and Write sequence
– Satisfies Government and Corporate requirements
• Current batch system for public records is not powerful enough to handle very large requests
• Have had to turn away customers because of this
• A more powerful and flexible batch solution could solve this problem
• Layered architecture
• The application layer contains all batch jobs and custom code
• Batch Core contains runtime classes necessary to launch and control a batch job
• Batch Infrastructure contains common readers and writers, and services used by both the application and the core framework
http://static.springsource.org/spring-batch/reference/html/spring-batch-intro.html
• A Job entity encapsulates an entire batch process
• A Job is composed of Steps, each of which encapsulates one phase of a batch job
– A Step can be as complex or as simple as the developer wants
http://static.springsource.org/spring-batch/reference/html/domain.html
• Typical Spring Batch Step
– Read, Process, Write sequence
• Multiple items are read and processed before being written as a “chunk”
– Size of chunk declared in configuration (commit-interval)
http://static.springsource.org/spring-batch/reference/html/configureStep.html
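The read, process, write sequence with a chunk size can be sketched in plain Java. This is an illustration of the idea only and does not use Spring Batch's own interfaces: the class `ChunkSketch`, the method `runStep`, and the parameter `commitInterval` are hypothetical names, and "writing" is simulated by collecting each chunk into a list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Plain-Java illustration of chunk-oriented processing; not Spring Batch's API.
class ChunkSketch {

    // Reads and processes items one at a time, then "writes" them a chunk at a time.
    // commitInterval stands in for the commit-interval setting (the chunk size).
    static <I, O> List<List<O>> runStep(Iterator<I> reader,
                                        Function<I, O> processor,
                                        int commitInterval) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be at least 1");
        }
        List<List<O>> writtenChunks = new ArrayList<>();
        List<O> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next()));
            if (chunk.size() == commitInterval) {
                writtenChunks.add(chunk);      // write the full chunk
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {
            writtenChunks.add(chunk);          // write the final, partial chunk
        }
        return writtenChunks;
    }

    public static void main(String[] args) {
        List<String> input = List.of("a", "b", "c", "d", "e");
        List<List<String>> chunks = runStep(input.iterator(), s -> s.toUpperCase(), 2);
        System.out.println(chunks);
    }
}
```

With five input items and a chunk size of 2, the writer is called three times: two full chunks and one partial chunk.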
• Steps can be configured to flow sequentially or conditionally
– Allows for some complex jobs
http://static.springsource.org/spring-batch/reference/html/configureStep.html
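One way to picture sequential versus conditional flow is as a table of transitions keyed by the step that just ran and its exit status. The sketch below is plain Java, not Spring Batch configuration: `FlowSketch`, its methods, the step names, and the status strings are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Plain-Java illustration of sequential and conditional step flow; not Spring Batch's API.
class FlowSketch {
    private final Map<String, Supplier<String>> steps = new HashMap<>();
    private final Map<String, String> transitions = new HashMap<>();

    // Registers a step whose body returns an exit status such as "COMPLETED" or "FAILED".
    FlowSketch step(String name, Supplier<String> body) {
        steps.put(name, body);
        return this;
    }

    // Declares which step runs next when step `from` ends with `exitStatus`.
    FlowSketch on(String from, String exitStatus, String to) {
        transitions.put(from + ":" + exitStatus, to);
        return this;
    }

    // Runs steps starting at firstStep and returns the names of the steps executed, in order.
    List<String> run(String firstStep) {
        List<String> executed = new ArrayList<>();
        String current = firstStep;
        while (current != null) {
            Supplier<String> body = steps.get(current);
            if (body == null) {
                throw new IllegalStateException("Unknown step: " + current);
            }
            executed.add(current);
            String exitStatus = body.get();
            current = transitions.get(current + ":" + exitStatus); // no transition ends the job
        }
        return executed;
    }

    public static void main(String[] args) {
        FlowSketch job = new FlowSketch()
                .step("load", () -> "FAILED")
                .step("report", () -> "COMPLETED")
                .step("cleanup", () -> "COMPLETED")
                .on("load", "COMPLETED", "report")   // the sequential path
                .on("load", "FAILED", "cleanup");    // the conditional branch
        System.out.println(job.run("load"));
    }
}
```

A purely sequential job is the special case where every step has a single transition on success.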
• The JobRepository is used to do CRUD operations with Meta-Data relating to Job and Step execution
– Example: Job Parameters, Job/Step status, etc.
http://static.springsource.org/spring-batch/reference/html/domain.html
• An item is skipped if an exception listed in the configuration is thrown, rather than the exception stopping the batch execution
• Used for exceptions that will be thrown on every attempt to process the item
– FileNotFoundException, Parse Exceptions, etc.
• SkipListener can be used to log skipped items
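The skip behaviour can be sketched in plain Java as follows. This is not Spring Batch's API: `SkipSketch` and its members are hypothetical, `NumberFormatException` stands in for a configured skippable exception, the `skipped` list plays the role a SkipListener might play, and the `skipLimit` parameter is an added assumption that caps how many bad items are tolerated.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of skip logic; not Spring Batch's API.
class SkipSketch {
    // Records skipped items, as a skip listener might log them.
    final List<String> skipped = new ArrayList<>();

    // Parses each item; a bad item is skipped instead of stopping the run,
    // until more than skipLimit items have failed.
    List<Integer> process(List<String> items, int skipLimit) {
        List<Integer> parsed = new ArrayList<>();
        for (String item : items) {
            try {
                parsed.add(Integer.parseInt(item));
            } catch (NumberFormatException e) {   // the "skippable" exception in this sketch
                if (skipped.size() >= skipLimit) {
                    throw e;                      // limit exceeded: fail the run
                }
                skipped.add(item);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        SkipSketch step = new SkipSketch();
        List<Integer> parsed = step.process(List.of("1", "x", "3"), 1);
        System.out.println("parsed=" + parsed + " skipped=" + step.skipped);
    }
}
```

Retrying a malformed item would fail the same way every time, which is why skipping rather than retrying is the appropriate response here.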
• If an exception listed in the configuration is thrown, the operation is attempted again
• Used for exceptions that may not be thrown on every attempt of the Step
– ConcurrencyFailureException, DeadlockLoserDataAccessException, etc.
• Can set a limit on number of retries
• RetryListener can be used to log retried items
• RetryTemplate can be used to further customize retry logic
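A bounded retry loop can be sketched in plain Java. This does not use RetryTemplate or any Spring class: `RetrySketch`, `withRetry`, and `TransientFailure` are hypothetical names, and `maxAttempts` is assumed here to count the total number of attempts including the first.

```java
import java.util.function.Supplier;

// Plain-Java illustration of bounded retry; not Spring Batch's or Spring Retry's API.
class RetrySketch {

    // Stands in for a transient error such as a lost deadlock.
    static class TransientFailure extends RuntimeException {
        TransientFailure(String message) {
            super(message);
        }
    }

    // Calls the operation until it succeeds or maxAttempts attempts have failed.
    static <T> T withRetry(Supplier<T> operation, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        int attempt = 0;
        while (true) {
            attempt++;
            try {
                return operation.get();
            } catch (TransientFailure e) {
                if (attempt >= maxAttempts) {
                    throw e;                      // retries exhausted
                }
                // A retry listener might log the failed attempt here.
                System.out.println("Retrying after attempt " + attempt + ": " + e.getMessage());
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new TransientFailure("deadlock loser");
            }
            return "written";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Only the designated transient exception triggers another attempt; any other exception propagates immediately.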
• Multi-Threaded Jobs or Steps
– Using Spring’s TaskExecutor object
• Parallel Steps
– Using split flows and a TaskExecutor in Job configuration.
http://static.springsource.org/spring-batch/reference/html/scalability.html
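The idea behind parallel steps, running independent flows on separate threads and waiting for all of them before the job continues, can be sketched with the JDK's `ExecutorService`. This is a plain-Java stand-in rather than Spring's TaskExecutor or a split-flow configuration; `SplitSketch` and `runSplit` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java illustration of running independent flows in parallel; not Spring Batch's API.
class SplitSketch {

    // Runs every flow on a thread pool and returns their results in submission order.
    static List<String> runSplit(List<Callable<String>> flows, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = pool.invokeAll(flows); // blocks until all flows finish
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for flows", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A flow failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<String>> flows = List.of(() -> "flow A done", () -> "flow B done");
        System.out.println(runSplit(flows, 2));
    }
}
```

Results come back in submission order even though the flows may finish in any order, which keeps the join point deterministic.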
• Remote Chunking
– Splits Step processing across multiple processes, using some middleware to communicate
http://static.springsource.org/spring-batch/reference/html/scalability.html
• Step Partitioning
– Splits input and executes remote steps in parallel
– PartitionHandler sends StepExecution requests to remote steps
– Partitioner generates the input for new step executions
http://static.springsource.org/spring-batch/reference/html/scalability.html
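The division of labour between a partitioner and a partition handler can be sketched in plain Java. This does not implement Spring Batch's Partitioner or PartitionHandler interfaces: `PartitionSketch`, `partition`, and `handle` are hypothetical, the input is assumed to be a contiguous range of row IDs, and the "remote" steps are simulated with local threads that simply count the rows in their range.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of step partitioning; not Spring Batch's API.
class PartitionSketch {

    // Partitioner role: splits the inclusive range [minId, maxId] into at most
    // gridSize contiguous sub-ranges, each returned as {start, end}.
    static List<long[]> partition(long minId, long maxId, int gridSize) {
        if (gridSize < 1 || maxId < minId) {
            throw new IllegalArgumentException("need gridSize >= 1 and maxId >= minId");
        }
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize;   // ceiling division
        List<long[]> ranges = new ArrayList<>();
        for (long start = minId; start <= maxId; start += size) {
            ranges.add(new long[] {start, Math.min(start + size - 1, maxId)});
        }
        return ranges;
    }

    // Handler role: runs one worker per range in parallel and combines the results.
    // Each worker here only counts its rows; a real worker would process them.
    static long handle(List<long[]> ranges) {
        return ranges.parallelStream()
                .mapToLong(range -> range[1] - range[0] + 1)
                .sum();
    }

    public static void main(String[] args) {
        List<long[]> ranges = partition(1, 10, 3);
        for (long[] range : ranges) {
            System.out.println(range[0] + ".." + range[1]);
        }
        System.out.println("rows processed: " + handle(ranges));
    }
}
```

Because the ranges do not overlap, workers need no coordination with each other, which is what makes this approach attractive for spreading work across machines.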
• DataRoomBatch (DeepSix Example)
– Bucket is input to JdbcCursorItemReader
– Create an Item Processor to check if the row is marked for deletion and delete it if so
– Item Writer could be empty or used to output statistics
– Partitioning easily done by dividing up number of rows per partition
• Public Records Batch Processing
– Input file is input to FlatFileItemReader
– Custom Item Processor to search the database for hits
– Custom Item Writer to compile report of search results
– Following step to send report to user
– Easy to implement a Partitioner for the input file
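The public records step could look roughly like the following plain-Java sketch. Everything specific here is an assumption made for illustration: the input line format `name,lastKnownAddress`, the in-memory map standing in for the database, the report line format, and the names `PublicRecordsSketch` and `buildReport`. It does not use FlatFileItemReader or any other Spring Batch class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of the read, process, write sequence for the
// public records use-case; not Spring Batch's API.
class PublicRecordsSketch {

    // Each input line is assumed to be "name,lastKnownAddress". A hit is a person
    // whose current record differs from the value supplied by the user.
    static List<String> buildReport(List<String> inputLines, Map<String, String> currentRecords) {
        List<String> report = new ArrayList<>();
        for (String line : inputLines) {                       // read
            String[] fields = line.split(",", 2);
            if (fields.length < 2) {
                continue;                                      // ignore malformed lines
            }
            String current = currentRecords.get(fields[0]);    // process: search for a change
            if (current != null && !current.equals(fields[1])) {
                report.add(fields[0] + ": " + fields[1] + " -> " + current); // write a hit
            }
        }
        return report;
    }

    public static void main(String[] args) {
        List<String> input = List.of("Ann,1 Elm St", "Bob,2 Oak St");
        Map<String, String> database = Map.of("Ann", "9 Pine Rd", "Bob", "2 Oak St");
        System.out.println(buildReport(input, database));
    }
}
```

Since each input line is handled independently, splitting the file by line ranges is a natural way to partition this step.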
• Part of Spring Framework
– Allows easy integration with other Spring features
– General simplicity offered by Spring
• Step flow customizable
• Basic Item Readers and Writers already available
• Features available for monitoring Jobs and Steps
• Many scaling options available
• No built-in scheduler
– Not a big issue; scheduler libraries are easily integrated
• Potentially a lot of XML configuration
– Business logic across Java and XML files can complicate debugging and maintenance
– Annotations can help
• Anything but very basic components will need to be created as new classes
• Spring Batch Admin
– Web-Based administration console
– Contains Spring Batch Integration, allowing use of Spring Integration messages to launch and monitor jobs
• Scheduler (cron, Spring Scheduling, Quartz)
• Clustering Framework (Hadoop, GridGain, Terracotta)
– Ideal for improving horizontal scaling
– Spring Data Hadoop is a fairly new Spring feature that helps integrate Spring with Hadoop
• Get Spring Batch set up with a clustered environment
– Evaluate performance
– Figure out dynamic load balancing
• Play around with more features and integration options
– Spring Batch Admin, manual job restarting, etc.
• Integrate Spring Batch Admin into Cobalt GUI?
• Look more into the information stored in Meta-data database and figure out how to use for monitoring/managing jobs
• Look into Partitioning and how much must be done to implement sending partitions off to remote machines
• Look into job/step timeout