The seventh version of the Java Development Kit (JDK 7) delivers a huge speed boost over JDK 6 for array accesses. It's like another year and a half of Moore's law for free, purely in software. And you don't even have to write multi-threaded code.
I've been profiling my new K-Means++ implementation for the next LingPipe release on some randomly generated data. It's basically a stress test for array gets, array sets, and simple multiply-add arithmetic. Many LingPipe modules are like this at run-time: named entity, part-of-speech tagging, language modeling, LM-based classifiers, and much more.
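To make "array gets, array sets, and simple multiply-add arithmetic" concrete, here is a minimal sketch of the kind of inner loop a k-means style clusterer spends its time in. It is not the LingPipe code; the class and method names (KMeansInnerLoop, squaredDistance, nearestCentroid) are made up for illustration, and it assumes dense double[] vectors of equal length.

```java
// Hypothetical illustration only; not LingPipe's implementation.
final class KMeansInnerLoop {

    // Squared Euclidean distance: two array reads and one
    // multiply-add per dimension. Assumes x.length == y.length.
    static double squaredDistance(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; ++i) {
            double diff = x[i] - y[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Index of the centroid closest to the point, or -1 if there
    // are no centroids. Ties go to the lowest index.
    static int nearestCentroid(double[] point, double[][] centroids) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centroids.length; ++k) {
            double d = squaredDistance(point, centroids[k]);
            if (d < bestDist) {
                bestDist = d;
                best = k;
            }
        }
        return best;
    }
}
```

Almost every operation in these loops is an indexed array read, so any per-access overhead is multiplied by points times centroids times dimensions on every iteration.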
While I was waiting for a run using JDK 1.6 to finish, I installed the following beta release of JDK 7:
> java -version
java version "1.7.0-ea"
Java(TM) SE Runtime Environment (build 1.7.0-ea-b52)
Java HotSpot(TM) 64-Bit Server VM (build 15.0-b03, mixed mode)
You can get it, too:
I believe much of the reason it's faster is the work of these fellows:
- Würthinger, Thomas, Christian Wimmer, and Hanspeter Mössenböck. 2007. Array Bounds Check Elimination for the Java HotSpot Client Compiler. PPPJ.
Java's always suffered relative to C in straight matrix multiplication because Java does range checks on every array access (set or get). With some clever static and run-time analysis, Würthinger et al. are able to eliminate most of the array bounds checks. They show that this one improvement doubles the speed of the LU matrix factorization benchmark in SciMark 2, the benchmark suite from the U.S. National Institute of Standards and Technology (NIST), which, like our clustering algorithm, is basically just a stress test for array access and arithmetic.
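As a rough picture of what a bounds check costs, the sketch below writes the implicit check out by hand. This is only a conceptual illustration: the class name BoundsCheckSketch and both methods are made up here, and the explicit test in the second method is a source-level stand-in for what the language requires on every access, not a description of the machine code HotSpot actually generates.

```java
// Hypothetical illustration of an implicit array bounds check.
final class BoundsCheckSketch {

    // What you write. The loop condition already guarantees
    // 0 <= i < a.length, so a check on a[i] can never fail.
    static double sum(double[] a) {
        double s = 0.0;
        for (int i = 0; i < a.length; ++i) {
            s += a[i];
        }
        return s;
    }

    // Roughly what the language semantics demand on each access
    // when no check is eliminated, written out explicitly.
    static double sumWithExplicitChecks(double[] a) {
        double s = 0.0;
        for (int i = 0; i < a.length; ++i) {
            if (i < 0 || i >= a.length) {
                throw new ArrayIndexOutOfBoundsException(i);
            }
            s += a[i];
        }
        return s;
    }
}
```

In the second method the test is provably redundant, since i starts at 0, only increases, and the loop exits before it reaches a.length. That is the kind of fact a compiler can establish in order to drop the check.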
So far, I've only tested on a ThinkPad Z61P notebook running 64-bit Windows Vista with an Intel Core 2 CPU (T2700; 2.0GHz) and 4GB of reasonably zippy memory. I don't know if the speedups will be as great for other OSes or for 32-bit JDKs.
I'm pretty excited about the new fork-join concurrency, too, as it's just what we'll need to parallelize the inner loops without too much work for us or the operating system.
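Here is a minimal sketch of how a fork-join task might split one of those inner loops. It assumes the ForkJoinPool and RecursiveTask classes as they appear in java.util.concurrent in the final Java 7 release, which may differ from what is in an early-access build. The class name SumOfSquaresTask, the THRESHOLD cutoff, and the sum-of-squares workload are all placeholders chosen for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical illustration: sums xs[i] * xs[i] over [lo, hi)
// by recursively splitting the range in half.
final class SumOfSquaresTask extends RecursiveTask<Double> {

    // Assumed cutoff below which the range is summed serially.
    private static final int THRESHOLD = 1000;

    private final double[] xs;
    private final int lo;
    private final int hi;

    SumOfSquaresTask(double[] xs, int lo, int hi) {
        this.xs = xs;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Double compute() {
        if (hi - lo <= THRESHOLD) {
            double sum = 0.0;
            for (int i = lo; i < hi; ++i) {
                sum += xs[i] * xs[i];
            }
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        SumOfSquaresTask left = new SumOfSquaresTask(xs, lo, mid);
        SumOfSquaresTask right = new SumOfSquaresTask(xs, mid, hi);
        left.fork();                      // run the left half asynchronously
        double rightSum = right.compute(); // do the right half in this thread
        return left.join() + rightSum;     // wait for the left half and combine
    }

    // Convenience entry point that owns and shuts down its own pool.
    static double sumOfSquares(double[] xs) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumOfSquaresTask(xs, 0, xs.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling SumOfSquaresTask.sumOfSquares(values) runs the whole computation. The serial branch is the same tight array loop as before, so the bounds-check savings and the parallelism should stack. A real application would more likely share one long-lived pool than create one per call.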