I'm quite sure in your career you have faced the following use case: given a list of objects you have to transform it in a new list of objects, where each object is obtained through a transformation (complicated or simple) of the original objects.
The simple way to resolve this problem is to iterate through the original list, transform each object and add it to a new list. It is something like this:
List<Object> newList = new ArrayList<Object>();
for (Object originalObject : originalList) {
newList.add(transform(originalObject));
}
It is simple, isn't it?
The problem may arise when you have a huge amount of objects.
Keep in mind that we're not talking about big data, so the 'huge' word should be intended as thousand of objects, a sizing that is too little to start thinking about a complicated map/reduce algorithm but, at the same time, it has impact on performances if the process if executed sequentially like showed in the code snippet above.
The solution I developed is based on the Fork/Join pool introduced in Java 7. Probably with Java 8, with its support for lambdas, there is a more elegant and coincise solution, but our target JVM at the moment is JDK 7, so I focused on this platform.
This is the generic list transformer.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.RecursiveTask;
public class ListTransformer<T, S> extends RecursiveTask<List<T>> {
private static final long serialVersionUID = 1L;
static final int LIMIT = 1000;
private List<S> sources;
private TransformFunction<T, S> function;
public ListTransformer ( List<S> sources, TransformFunction<T, S> function ) {
super ();
this.sources = sources;
this.function = function;
}
@Override
protected List<T> compute () {
if ( sources.size() < LIMIT ) {
List<T> preparedEntities = new ArrayList<T> ();
for ( S source : sources ) {
preparedEntities.add ( function.transform( source ) );
}
return preparedEntities;
}
else {
int divider = sources.size () / 2;
ListTransformer<T, S> curLeft = new ListTransformer<T, S> ( sources.subList ( 0, divider ), function );
ListTransformer<T, S> curRight = new ListTransformer<T, S> ( sources.subList ( divider, sources.size() ), function );
curRight.fork ();
List<T> leftResult = curLeft.compute ();
List<T> rightResult = curRight.join ();
leftResult.addAll ( rightResult );
return leftResult;
}
}
public interface TransformFunction<T, S> {
T transform (S source);
}
}
The class is parameterized: S should be intended as the source object, T as the transformed object. The process is quite simple. If the size of the processed list is above a reasonable limit, we create two new subtasks, firing the first and wait for the completion of the second, which can trigger internally other tasks until we get a list of the desired size to apply the transformation.
Let's see it in action. Suppose you have to deal with the trivial task of transforming a list of Integer into a list of String. This is a test case that shows how to implement this task with a Fork / Join approach
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import org.junit.Assert;
import org.junit.Test;
public class ListTransformerTest {
@Test
public void testList() {
List<Integer> source = new ArrayList<Integer>();
for (int i = 0; i < 5000; i ++) {
source.add(i);
}
ForkJoinPool pool = new ForkJoinPool ();
List<String> transform = pool.invoke ( new ListTransformer<String, Integer> ( source,
new ListTransformer.TransformFunction<String, Integer>() {
@Override
public String transform ( Integer source ) {
return source.toString ();
}
} ) );
for (int i = 0; i < 5000; i ++) {
Assert.assertEquals( "" + i, transform.get ( i ) );
}
}
}
The test checks that the order of the new list is what we're expecting. This is important, because I don't think you want that the new list contains objects in random order :-)
Now it's time to demonstrate that this is not only an exercise but it can actually give you much better performances. I created a simple JMH benchmark (in a future post I will talk about this great framework) to test the performances of the Fork / Join approach compared to the traditional sequential approach. The tests were conducted on a MacBook Pro late 2014 model (core I7 2.4 GHz, 16 GB Ram)
Benchmark (size) Mode Cnt Score Error
ListTransformerBenchmark.testNew 1000 thrpt 5 26182,426 ± 899,082 ops/s
ListTransformerBenchmark.testNew 2000 thrpt 5 16690,305 ± 2213,340 ops/s
ListTransformerBenchmark.testNew 5000 thrpt 5 9268,967 ± 257,888 ops/s
ListTransformerBenchmark.testNew 50000 thrpt 5 1356,710 ± 81,285 ops/s
ListTransformerBenchmark.testStandard 1000 thrpt 5 32834,898 ± 870,725 ops/s
ListTransformerBenchmark.testStandard 2000 thrpt 5 15759,069 ± 884,292 ops/s
ListTransformerBenchmark.testNew 5000 thrpt 5 9268,967 ± 257,888 ops/s
ListTransformerBenchmark.testNew 50000 thrpt 5 1356,710 ± 81,285 ops/s
ListTransformerBenchmark.testStandard 1000 thrpt 5 32834,898 ± 870,725 ops/s
ListTransformerBenchmark.testStandard 2000 thrpt 5 15759,069 ± 884,292 ops/s
ListTransformerBenchmark.testStandard 5000 thrpt 5 6198,490 ± 253,743 ops/s
ListTransformerBenchmark.testStandard 50000 thrpt 5 561,933 ± 27,503 ops/s
The testNew, as you may guess, is the method that use the Fork/Join transformation, while the testStandard method use a simple loop.
The results show that for a small list, the traditional approach is faster, probably because the overhead necessary to create the ForkJoin pool is heavier than the transformation process itself. When the list reaches a size of some thousand of objects, the multithreaded approach shines. We can see that for a list of 5000 objects, the score reached for the Fork/Join transformation is 9228 operation per second, while the sequential process stops at 6199 operations per second. A nice gain of performance I think.
For bigger lists, the Fork/Join approach shows better performances. For a list of 50000 objects, the multithreaded transformer grants a huge gain of performance (the 240%).
If you're in need of extreme optimization on your codebase, I think that this is a valuable solution.
Cheers
Nessun commento:
Posta un commento