Friday, April 30, 2010

Spring Batch integration module for GridGain

To use Spring Batch in a scalable, distributed manner for processing huge amounts of data, I am currently developing some components that make it easier to integrate Spring Batch with a compute/data grid.

Spring Batch offers several approaches to scalability; the one that best suits my needs is remote chunking.

Since I had already done some investigation with GridGain, I chose this framework to implement a distributed remote chunking system that can be easily integrated into any existing Spring Batch system.

Using GridGain is really straightforward, and setting up a grid on a development machine requires very little configuration.

The only issue I faced comes from the fact that GridGain uses serialization to deploy tasks on nodes. To deploy a remote ChunkProcessor, it must contain a serializable ItemProcessor and ItemWriter, and the Spring Batch ItemProcessor and ItemWriter interfaces are not serializable by default.
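To illustrate the problem, here is a minimal sketch using plain Java serialization as a stand-in for the grid's task deployment. Nothing here is GridGain or Spring Batch API: Writer, ChunkTask, PlainWriter and SerializableWriter are hypothetical names invented for the example. It shows that a serializable task still fails to serialize when one of its fields holds a non-serializable collaborator.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.List;

public class SerializationCheck {

    // Hypothetical stand-in for an item writer interface that does not extend Serializable.
    interface Writer<T> {
        void write(List<? extends T> items);
    }

    // Hypothetical task shipped to a remote node. The task itself is Serializable,
    // but whether it can really be serialized depends on the writer it holds.
    static class ChunkTask<T> implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Writer<T> writer;

        ChunkTask(Writer<T> writer) {
            this.writer = writer;
        }
    }

    // A writer that only implements the plain interface.
    static class PlainWriter implements Writer<String> {
        @Override
        public void write(List<? extends String> items) {
            // no-op for the example
        }
    }

    // The same writer, additionally marked Serializable.
    static class SerializableWriter implements Writer<String>, Serializable {
        private static final long serialVersionUID = 1L;

        @Override
        public void write(List<? extends String> items) {
            // no-op for the example
        }
    }

    // Returns true if the whole object graph can be written with Java serialization.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new ChunkTask<String>(new PlainWriter())));
        System.out.println(canSerialize(new ChunkTask<String>(new SerializableWriter())));
    }
}
```

The failure only shows up when the object is actually serialized, which is why it is easy to miss until the task is sent to a node.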

So instead of creating new interfaces, I made a SerializableChunkProcessor which only accepts a serializable ItemProcessor and ItemWriter. It's surely not the smartest solution, but since I can't modify the default interfaces in Spring Batch and I don't want to create my own, this workaround will suffice.
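One way such a restriction could be expressed is with generic intersection bounds, so that the compiler rejects collaborators that are not serializable. The sketch below is an assumption about the general idea, not the module's actual code: Processor and Writer are hypothetical stand-ins for the Spring Batch interfaces, and SerializableChunkHandler, UpperCase and CollectingWriter are names invented for the example.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class SerializableChunkSketch {

    // Hypothetical stand-ins for ItemProcessor and ItemWriter; neither extends Serializable.
    interface Processor<I, O> {
        O process(I item);
    }

    interface Writer<T> {
        void write(List<? extends T> items);
    }

    // Hypothetical chunk handler whose constructor only accepts collaborators
    // that implement both the plain interface and Serializable.
    static class SerializableChunkHandler<I, O> implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Processor<I, O> processor;
        private final Writer<O> writer;

        <P extends Processor<I, O> & Serializable, W extends Writer<O> & Serializable>
        SerializableChunkHandler(P processor, W writer) {
            this.processor = processor;
            this.writer = writer;
        }

        // Processes every item of the chunk, then writes the results in one call.
        void handle(List<? extends I> chunk) {
            List<O> results = new ArrayList<>();
            for (I item : chunk) {
                O result = processor.process(item);
                if (result != null) { // a null result is treated as "item filtered out"
                    results.add(result);
                }
            }
            writer.write(results);
        }
    }

    // Example serializable processor: upper-cases each item.
    static class UpperCase implements Processor<String, String>, Serializable {
        private static final long serialVersionUID = 1L;

        @Override
        public String process(String item) {
            return item.toUpperCase();
        }
    }

    // Example serializable writer: keeps what it receives in memory.
    static class CollectingWriter implements Writer<String>, Serializable {
        private static final long serialVersionUID = 1L;
        final List<String> written = new ArrayList<>();

        @Override
        public void write(List<? extends String> items) {
            written.addAll(items);
        }
    }

    public static void main(String[] args) {
        CollectingWriter writer = new CollectingWriter();
        SerializableChunkHandler<String, String> handler =
                new SerializableChunkHandler<String, String>(new UpperCase(), writer);
        handler.handle(java.util.Arrays.asList("a", "b"));
        System.out.println(writer.written);
    }
}
```

With this shape, passing a processor or writer that does not implement Serializable is a compile-time error rather than a failure at deployment time, and the existing interfaces stay untouched.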

Usage
Here is the job application context used for the integration test. As you can see, the 'real' ItemProcessor and ItemWriter are injected into the GridGain chunk writer:

Download
You can download the spring-batch-integration-gridgain module here:
http://github.com/downloads/aloiscochard/spring-batch-integration-gridgain/spring-batch-integration-gridgain-0.0.1-SNAPSHOT.jar

If you want to see a full working sample, take a look at the integration test. The full project sources can be downloaded here:
http://github.com/aloiscochard/spring-batch-integration-gridgain