
Experimenting with Groovy @CompileStatic

posted Apr 6, 2013, 3:49 PM by Renato Athaydes   [ updated Aug 6, 2013, 11:01 PM ]
Groovy is one of my favourite languages, especially for scripting, unit tests and writing complex algorithms. The reason is that, being a Java developer (most of the time), I enjoy using a language that is so similar to Java but without most of its boilerplate code... and with an excellent set of features which makes most things much easier than in pure Java. If you want to see a few examples, have a look at this Gist I have written, which shows how to do things like database access, GUIs and simple algorithms in Groovy (and Haskell; I was comparing the two languages just for fun).

When I started a little project of my own to implement a few common machine learning algorithms (starting with K-means clustering), I decided to use Groovy. But after writing this first algorithm and running a few performance tests, I noticed that performance was not nearly as good as I had hoped for. I made some small modifications to the code but that didn't help much. So I decided to experiment with the @CompileStatic annotation, which, according to Guillaume Laforge in this post, ensures that "your code will be statically compiled, and the bytecode generated here will look like javac's bytecode, running just as fast."

I thought that would be perfect! I could write Groovy code which runs as fast as Java!! Definitely worth a try.

So, I looked up @CompileStatic in the Groovy documentation... unfortunately, it seems the Groovy team considers it only an experimental feature for now. The only page I could find that talks about it is actually a discussion of the different alternatives for implementing statically-compiled code in Groovy and whether or not this is a good idea! It barely contains any useful information on how to use it efficiently or what the limitations are.

In fact, here's the whole documentation available on @CompileStatic:

Static compilation is for the moment only supported at the method level (do not try to add it to a class) and supports direct method calls as long as you do not:

  • use methods from DefaultGroovyMethods
  • use "indirect" methods like list << obj where the AST representation is not a method call but a binary expression
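To make the second restriction concrete, here's a small sketch (my own example, not from the documentation) of the difference between the << sugar and explicit method calls:

```groovy
def list = []
list << 'a'            // "indirect": the AST sees a binary expression, not a method call
list.leftShift( 'b' )  // a direct method call... but leftShift itself
                       // comes from DefaultGroovyMethods!
list.add( 'c' )        // a plain JDK method call, which static compilation handles fine
assert list == [ 'a', 'b', 'c' ]
```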

Ok... so let's try to understand that... DefaultGroovyMethods is actually a GDK class which, according to the javadoc comments, contains "new groovy methods which appear on normal JDK classes inside the Groovy environment".

Alright... so we probably can't use things like [...].each{...} or 'string'.center( 20 ) from inside a method annotated with @CompileStatic. Fairly limiting, but what about the "indirect" methods thing? Well, it's tricky to understand the implications of this one, so let's leave it for now and see how this works out in practice.
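For illustration (my own sketch, not from the post), here is a call that resolves to DefaultGroovyMethods next to a JDK-only rewrite that early @CompileStatic should be happy with:

```groovy
// each() is added to Collection by DefaultGroovyMethods,
// so it may be off-limits inside a @CompileStatic method:
def total = 0
[ 1, 2, 3 ].each { total += it }

// A plain for-in loop over the same list uses only JDK iteration:
int total2 = 0
for ( int i in [ 1, 2, 3 ] ) {
    total2 += i
}
assert total == 6 && total2 == 6
```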

So, getting back to my machine learning project, here's what the performance was like when the code was written in pure Groovy:

Cluster size    Avg time (ms)
 500              523
 1000             735
 1500            1730
 2000            1747

To come up with these figures, I wrote a test where I generated a number of samples and then asked my k-means algorithm to classify all of them into 3 different clusters.

Here's the code to generate the samples:

def getSamples = { order ->
  def sampleValues = ( ( 1..( 1 * order ) ) +
    ( ( 2 * order + 1 )..( 3 * order ) ) +
    ( ( 4 * order + 1 )..( 5 * order ) ) ) as List
  Collections.shuffle( sampleValues )
  sampleValues.collect { new SimpleSample( it ) }
}

As you can see, this generates data which should be naturally classified into 3 clusters (e.g. (1..10) + (21..30) + (41..50) with order = 10).
In the performance table shown above, "Cluster size" refers to the "order" parameter in the code. The test generates 3 clusters with 500 to 2000 samples each. The samples are randomized before being passed to the k-means algorithm. Each test was run 5 times, and the time it took to classify all samples was recorded. The table shows the average time over all 5 runs.
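The test harness itself isn't shown in the post; a minimal sketch of how such a measurement might look is below (the classify closure is a placeholder standing in for the real k-means call):

```groovy
// Placeholder workload standing in for the k-means classification step:
def classify = { List samples -> samples.sort() }

def samples = ( 1..1000 ) as List
Collections.shuffle( samples )

// Run the workload several times and average the wall-clock time:
int runs = 5
long totalMs = 0
runs.times {
    long start = System.currentTimeMillis()
    classify( samples.clone() as List )   // clone so each run sees shuffled input
    totalMs += System.currentTimeMillis() - start
}
println "Average time: ${ totalMs / runs } ms"
```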

Adding the @CompileStatic annotation to my code to get real performance improvements proved to be quite difficult. For example, in my first try, I wanted to change as little as possible, so I decided to annotate the following method, which runs very frequently:

private static distance( Cluster cluster, BigDecimal value ) {
  cluster.mean ? Math.abs( cluster.mean - value ) : 0.0
}

Luckily, I had already written it using static types so I thought all I had to do was to add the annotation to it. But to my surprise, I got this error:

Groovyc: [Static type checking] - Reference to method is ambiguous. Cannot choose between [long java.lang.Math#abs(long), double java.lang.Math#abs(double), int java.lang.Math#abs(int), float java.lang.Math#abs(float)]

After I fixed the ambiguity, it ran ok:

private static distance( Cluster cluster, BigDecimal value ) {
  cluster.mean ? Math.abs( ( cluster.mean - value ).doubleValue() ) : 0.0
}

As the performance improvement after this change was not very convincing (probably a few ms faster, but due to high variability it's hard to tell), I decided to improve on another "inner-loop" method:

private Cluster nearestCluster( BigDecimal value ) {
  def spotOnCluster = clusters.find { it.mean == value }
  if ( spotOnCluster ) return spotOnCluster

  def nullMeanCluster = clusters.find { it.mean == null }
  if ( nullMeanCluster ) return nullMeanCluster
  return clusters.min { c1, c2 ->
      def res = distance( c1, value ) - distance( c2, value )
      res == 0 ? 0 : res > 0 ? 1 : -1
  }
}

Notice that this method uses the distance method that was optimised previously. After adding the @CompileStatic annotation to it, all I had to do was to declare the types of the closures' parameters, resulting in the following:

private Cluster nearestCluster( BigDecimal value ) {
  def spotOnCluster = clusters.find { Cluster c -> c.mean == value }
  if ( spotOnCluster ) return spotOnCluster

  def nullMeanCluster = clusters.find { Cluster c -> c.mean == null }
  if ( nullMeanCluster ) return nullMeanCluster
  return clusters.min { Cluster c1, Cluster c2 ->
      def res = distance( c1, value ) - distance( c2, value )
      res == 0 ? 0 : res > 0 ? 1 : -1
  }
}

After that, the compiler complained, correctly, about the distance method's return type (which had been omitted previously)... but it still seemed to get confused by the return type of the ternary expression. I had to alter the method to use an if/else block instead of ?:, like this:

private static double distance( Cluster cluster, BigDecimal value ) {
  if ( cluster.mean ) Math.abs( ( cluster.mean - value ).doubleValue() )
  else 0.0
}

This looks like a bug in the Groovy compiler, because the two expressions are semantically equivalent.

But going back to performance measurements, I noticed almost no improvement... I had to look at changing further methods. And then, again... and again!

It turned out that to get any significant improvement in performance, I had to make almost everything compile statically. For a full list of the changes I made, please check out my commit on GitHub.

But in the end, at least, the improvement was huge:

Cluster size    Avg time (ms) - Before    Avg time (ms) - After optimisations
 500              523                       55
 1000             735                       74
 1500            1730                      682*
 2000            1747                      276
* the variance between runs was very high, probably due to garbage collection. Although the average was 682ms, the fastest run was only 112ms, while the slowest was 2,059ms!

An improvement of around 1,000%. Not bad! But at a large cost, which makes you wonder whether writing pure Java wouldn't have been a better idea... at least for cases like this, where performance is essential. At least now we have the choice to still use some of the benefits of Groovy even for performance-critical code. Perhaps, in cases where just a very small part of the code needs to run extremely fast, this wouldn't have been as much of a problem. As usual, every case is different...

Groovy version used: 2.1.1
Java VM: 1.7.0_13
Machine: Windows 7 - 64-bit / Intel i7-2630QM CPU