Posts‎ > ‎

Writing JVM bytecode by hand using Groovy

posted Jan 30, 2017, 2:18 PM by Renato Athaydes   [ updated Jan 30, 2017, 2:24 PM ]
JVM bytecode looks similar to Assembly code. It maps to it closely enough that translating it into machine code shouldn't be too much of a challenge.

Writing it can be quite challenging, but it can be quite useful to understand the inner workings of the JVM.

But before we see how to use the Groovy compiler to let us write bytecode by hand (and create pure JVM bytecode that does not require Groovy at runtime!), let's have a quick crash-course on JVM bytecode.

A short introduction to JVM bytecode


Instructions operate on a frame consisting of an operand stack and an array of local variables (amongst other things).

For example, to load the value of an object reference stored in the local variable at index 1 onto the stack:

aload_1

Instance methods always get a reference to the this Object in the local variable at index 0. Method arguments are passed in as the next elements in the local variables array.

So, to invoke the Object#toString() method, for example, within an instance method, one can do:

aload_0
invokevirtual java/lang/Object.toString()Ljava/lang/String;

This will leave the String returned by toString() onto the stack, where it can be used by another JVM instruction.

The above code example shows how JVM methods and types are represented in bytecode. Methods follow the pattern:

<package>/<className>.<method>([argTypes])<returnType>.

Reference types are represented by:

L<package>/<typeName>;

Primitive types, by a single letter:


 Type  JVM representation
 boolean  Z
 byte  B
 char  C
 double  D
 float  F
 int  I
 long  J
 short  S
 void  V


With this knowledge, we can understand much easier the output of javap, the Java class file disassembler.

As an example, take the following, simple Java class:


public class Hello {

public static void main(String[] args) {
System.out.println("Hello world");
}
}


If you compile and run javap on the resulting class file:

javac Hello.java
javap -c Hello


The following output is shown:

public class Hello {
public Hello();
Code:
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return

public static void main(java.lang.String[]);

Code:
0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
3: ldc #3 // String Hello world
5: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
8: return
}


As you can see, it's pretty simple. A default constructor is added automatically to all Java classes.

Lets' go through each instruction in the main method:

getstatic java/lang/System.out:Ljava/io/PrintStream;

This loads the static field System.out onto the stack.

ldc #3

Loads the "Hello world" String onto the stack. #3 refers to the position of the String in the constant pool

invokevirtual java/io/PrintStream.println:(Ljava/lang/String;)V

Invokes the println method using the variables in the stack (System.out, "Hello world").

return

The return instruction is used to return the flow control to the caller without returning any value.


To find out what an instruction expects to be in the stack when it is called, and what it leaves behind, you must consult the JVM documentation.

The invokevirtual instruction, for example, uses the stack as follows:


Operand Stack

..., objectref, [arg1, [arg2 ...]] →

...


This means it expects an object reference (on which to call the provided method), plus the method arguments to be on the top of the stack... and after it returns, it leaves whatever was on the stack below the object reference.

Writing JVM bytecode with Groovy


The ability to use local variables makes it much easier to write bytecode than if we had only the stack at our disposal (see Forth for an actual language where that's exactly what you must do!).

But that begs the question:

How much difference does it make to use local variables as opposed to the stack?

To find out, the only way is to try it!

To write bytecode by hand is not very easy, though... There are some tools such as ByteBuddy and the lower-level ASM (on which ByteBuddy is based) that let you generate JVM class files, but even they are too high-level for what I wanted... they don't really let you write bytecode exactly as javap outputs it. For that, the only tool I could find was Jasmin, which has been abandoned, unfortunately, since around 2010... so it is limited to creating Java 6 bytecode.

The good news is that JVM bytecode evolves even slower than Java, the language! So this limitation is not nearly as bad as it sounds.

To make it easier to use Jasmin, I created Grasmin, a Groovy AST (Abstract Syntax Tree) Transformation which reads the first String in the body of a method as JVM bytecode (in the Jasmin syntax, which closely mirrors javap's output).

The advantage of using Grasmin over Jasmin is that the class definition itself can be written in Groovy. This means you can use the class in your other code as if it were just a normal class, even though the implementation will be written in bytecode directly!

This is what a hello-world Grasmin class looks like:


@CompileStatic
@JasminCode
class Test1 {

void exampleJasminCode() {
"""
.limit stack 2
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "Hello World!"
invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
return
"""
}

}

That's it! Notice that users of this class don't need to know anything about how it is implemented. It could return values and take arguments. But the implementation of this class contains the exact bytecode typed above.

To prove it, we can attempt a more interesting example that we can use to answer the question posed in the beginning of this section: How much difference does it make to use local variables as opposed to the stack?

Here's a simple Java function that converts an array of integers to an array of Strings:


public class JavaArrays {

public String[] stringify( int[] ints ) {
String[] result = new String[ ints.length ];
for (int i = 0; i < ints.length; i++) {
result[ i ] = Integer.toString( ints[ i ] );
}
return result;
}
}


The implementation is as simple as it gets, in good old-fashioned Java style.

According to javap, the bytecode of this Java class looks like this:



public class grasmin.test_target.JavaArrays {
  public grasmin.test_target.JavaArrays();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public java.lang.String[] stringify(int[]);
    Code:
       0: aload_1
       1: arraylength
       2: anewarray     #2                  // class java/lang/String
       5: astore_2
       6: iconst_0
       7: istore_3
       8: iload_3
       9: aload_1
      10: arraylength
      11: if_icmpge     29
      14: aload_2
      15: iload_3
      16: aload_1
      17: iload_3
      18: iaload
      19: invokestatic  #3                  // Method java/lang/Integer.toString:(I)Ljava/lang/String;
      22: aastore
      23: iinc          3, 1
      26: goto          8
      29: aload_2
      30: areturn
}


This code uses if_icmpge as the loop condition. It compares the 2 ints at the top of stack, ie. value1 ≥ value2.

My first attempt at implementing this in bytecode was similar to the above, but instead of using id_icmpge, I decided to use ifeq, which compares the int on the top of the stack to zero, because that seemed to be potentially more efficient. To do this, of course, I would have to start the loop from the last index rather than the first.

This is what I ended up with (with lots of comments, as that's the only way I can understand what I am writing!):


@CompileStatic
@JasminCode
class ArraysCode {

String[] stringify( int[] ints ) {
"""\
.limit locals 4
.limit stack 3
aload_1
arraylength ; put the length of the array on the stack
istore_2 ; store the array length on local variable 2
iload_2 ; read the array length
anewarray java/lang/String ; create a String array with the same length as the input array
astore_3 ; store the result in the local variable 3
Loop:
iload_2
ifeq End ; check if the remaining length is 0, go to End if so
iinc 2 -1 ; decrement index
aload_1 ; load the input array
iload_2 ; load current index
iaload ; read current value
invokestatic java/lang/Integer.toString(I)Ljava/lang/String;
aload_3 ; load the result array
swap ; swap so the String value is on top of the stack
iload_2 ; load current index
swap ; swap so the stack has arrayRef, index, value
aastore ; put the result String into the result array
goto Loop ; loop
End:
aload_3
areturn
"""
}

}

If you're wondering about why I used goto... well, because the only way the computer actually knows how to loop, is by using jump instructions such as goto, even if in higher-level languages we can hide those.


Unfortunately, this hand-written bytecode implementation had the exact same performance as the Java code.

I wondered if using one less local variable to store the current index (by keeping it always on the stack instead) would make this code run faster.

So, here is my second attempt (which took hours to write! Bytecode is not easy to write, especially without using local variables too much):



@CompileStatic
@JasminCode( outputDebugFile = 'build/arrays-code.j' )
class ArraysCode {

String[] stringify( int[] ints ) {
"""\
.limit locals 3
.limit stack 5
aload_1
arraylength ; put the length of the array on the stack
dup
dup ; Stack: length, length, length |TOP
anewarray java/lang/String ; create a String array with the same length as the input array
astore_2 ; store the result in the local variable 2
; Stack: length, length |TOP
Loop:
ifeq End ; check if the remaining length is 0, go to End if so
iconst_1 ; Stack: length, 1 |TOP
isub ; decrement index
dup ; Stack: index, index |TOP
aload_2 ; load the result array
swap ; Stack: index, resultArray, index |TOP
dup ; Stack: index, resultArray, index, index |TOP
aload_1 ; load the input array
swap ; Stack: index, resultArray, index, inputArray, index |TOP
iaload ; read current value
; Stack: index, resultArray, index, intValue |TOP
invokestatic java/lang/Integer.toString(I)Ljava/lang/String;
; Stack: index, resultArray, index, stringValue |TOP
aastore ; put the result String into the result array
dup ; Stack: index, index |TOP
goto Loop ; loop
End:
aload_2
areturn
"""
}

}

This implementation is a little bit shorter, and as you may notice, uses one less local variable than the previous ones. I expected this to run faster.

To my surprise, it actually runs consistently slower. Around 50% slower, to be more accurate. I have no idea why, and would love to hear if anyone has any idea!

Is some bytecode instruction I used too slow? Maybe decreasing the index with isub instead of iinc is the problem?

No idea... but this seems to show that, answering the initial question:

Trying to save local variables by using the stack does NOT seem to be a good approach!

Well, in any case, it does not really matter... the important is that writing bytecode by hand is so much fun, if you ask me... in a weird, painful yet rewarding kind-of-way.

By the way, here's the output of javap for the above implementation of ArraysCode:


public class grasmin.test_target.ArraysCode {
  public java.lang.String[] stringify(int[]);
    Code:
       0: aload_1
       1: arraylength
       2: dup
       3: dup
       4: anewarray     #7                  // class java/lang/String
       7: astore_2
       8: ifeq          28
      11: iconst_1
      12: isub
      13: dup
      14: aload_2
      15: swap
      16: dup
      17: aload_1
      18: swap
      19: iaload
      20: invokestatic  #15                 // Method java/lang/Integer.toString:(I)Ljava/lang/String;
      23: aastore
      24: dup
      25: goto          8
      28: aload_2
      29: areturn
}


As you can see, it's identical to the source.

You can compile and test yourself. Here's the first implementation, and here, this last one!

Thanks for reading.

Comments