Developer Without Future: Fork Join to transform list of objects

I'm starting this new blog with a little framework that is aimed to simplify the processing of lists in Java.

I'm quite sure in your career you have faced the following use case: given a list of objects you have to transform it in a new list of objects, where each object is obtained through a transformation (complicated or simple) of the original objects.

The simple way to resolve this problem is to iterate through the original list, transform each object and add it to a new list. It is something like this:

List<Object> newList = new ArrayList<Object>();
for (Object originalObject : originalList) {
newList.add(transform(originalObject));
}

It is simple, isn't it?

The problem may arise when you have a huge amount of objects.

Keep in mind that we're not talking about big data, so the 'huge' word should be intended as thousand of objects, a sizing that is too little to start thinking about a complicated map/reduce algorithm but, at the same time, it has impact on performances if the process if executed sequentially like showed in the code snippet above.

The solution I developed is based on the Fork/Join pool introduced in Java 7. Probably with Java 8, with its support for lambdas, there is a more elegant and coincise solution, but our target JVM at the moment is JDK 7, so I focused on this platform.

This is the generic list transformer.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.RecursiveTask;

public class ListTransformer<T, S> extends RecursiveTask<List<T>> {

private static final long serialVersionUID = 1L;

static final int LIMIT = 1000;

private List<S> sources;
private TransformFunction<T, S> function;

public ListTransformer ( List<S> sources, TransformFunction<T, S> function ) {

super ();
this.sources = sources;
this.function = function;
}

@Override
protected List<T> compute () {

if ( sources.size() < LIMIT ) {
List<T> preparedEntities = new ArrayList<T> ();
for ( S source : sources ) {
preparedEntities.add ( function.transform( source ) );
}
return preparedEntities;
}
else {

int divider = sources.size () / 2;

ListTransformer<T, S> curLeft = new ListTransformer<T, S> ( sources.subList ( 0, divider ), function );

ListTransformer<T, S> curRight = new ListTransformer<T, S> ( sources.subList ( divider, sources.size() ), function );

curRight.fork ();
List<T> leftResult = curLeft.compute ();
List<T> rightResult = curRight.join ();
leftResult.addAll ( rightResult );

return leftResult;
}
}

public interface TransformFunction<T, S> {

T transform (S source);
}
}

The class is parameterized: S should be intended as the source object, T as the transformed object. The process is quite simple. If the size of the processed list is above a reasonable limit, we create two new subtasks, firing the first and wait for the completion of the second, which can trigger internally other tasks until we get a list of the desired size to apply the transformation.

Let's see it in action. Suppose you have to deal with the trivial task of transforming a list of Integer into a list of String. This is a test case that shows how to implement this task with a Fork / Join approach

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;

import org.junit.Assert;
import org.junit.Test;

public class ListTransformerTest {

@Test
public void testList() {
List<Integer> source = new ArrayList<Integer>();
for (int i = 0; i < 5000; i ++) {
source.add(i);
}

ForkJoinPool pool = new ForkJoinPool ();

List<String> transform = pool.invoke ( new ListTransformer<String, Integer> ( source,
new ListTransformer.TransformFunction<String, Integer>() {

@Override
public String transform ( Integer source ) {

return source.toString ();
}

} ) );

for (int i = 0; i < 5000; i ++) {
Assert.assertEquals( "" + i, transform.get ( i ) );
}
}
}

The test checks that the order of the new list is what we're expecting. This is important, because I don't think you want that the new list contains objects in random order :-)

Now it's time to demonstrate that this is not only an exercise but it can actually give you much better performances. I created a simple JMH benchmark (in a future post I will talk about this great framework) to test the performances of the Fork / Join approach compared to the traditional sequential approach. The tests were conducted on a MacBook Pro late 2014 model (core I7 2.4 GHz, 16 GB Ram)

Benchmark (size) Mode Cnt Score Error

ListTransformerBenchmark.testNew 1000 thrpt 5 26182,426 ± 899,082 ops/s

ListTransformerBenchmark.testNew         2000 thrpt    5 16690,305 ± 2213,340 ops/s
ListTransformerBenchmark.testNew         5000 thrpt    5 9268,967 ± 257,888 ops/s
ListTransformerBenchmark.testNew        50000 thrpt    5 1356,710 ± 81,285 ops/s
ListTransformerBenchmark.testStandard    1000 thrpt    5 32834,898 ± 870,725 ops/s
ListTransformerBenchmark.testStandard    2000 thrpt    5 15759,069 ± 884,292 ops/s

ListTransformerBenchmark.testStandard 5000 thrpt 5 6198,490 ± 253,743 ops/s

ListTransformerBenchmark.testStandard 50000 thrpt 5 561,933 ± 27,503 ops/s

The testNew, as you may guess, is the method that use the Fork/Join transformation, while the testStandard method use a simple loop.

The results show that for a small list, the traditional approach is faster, probably because the overhead necessary to create the ForkJoin pool is heavier than the transformation process itself. When the list reaches a size of some thousand of objects, the multithreaded approach shines. We can see that for a list of 5000 objects, the score reached for the Fork/Join transformation is 9228 operation per second, while the sequential process stops at 6199 operations per second. A nice gain of performance I think.

For bigger lists, the Fork/Join approach shows better performances. For a list of 50000 objects, the multithreaded transformer grants a huge gain of performance (the 240%).

If you're in need of extreme optimization on your codebase, I think that this is a valuable solution.

Cheers

Developer Without Future

giovedì 12 marzo 2015

Fork Join to transform list of objects

Nessun commento:

Posta un commento