Writing JVM bytecode by hand using Groovy

Post date: Jan 30, 2017 10:18:22 PM

JVM bytecode looks similar to Assembly code. It maps to it closely enough that translating it into machine code shouldn't be too much of a challenge.

Writing it can be quite challenging, but it can be quite useful to understand the inner workings of the JVM.

But before we see how to use the Groovy compiler to let us write bytecode by hand (and create pure JVM bytecode that does not require Groovy at runtime!), let's have a quick crash-course on JVM bytecode.

A short introduction to JVM bytecode

Instructions operate on a frame consisting of an operand stack and an array of local variables (amongst other things).

For example, to load the value of an object reference stored in the local variable at index 1 onto the stack:

aload_1

Instance methods always get a reference to the this Object in the local variable at index 0. Method arguments are passed in as the next elements in the local variables array.

So, to invoke the Object#toString() method, for example, within an instance method, one can do:

aload_0

invokevirtual java/lang/Object.toString()Ljava/lang/String;

This will leave the String returned by toString() onto the stack, where it can be used by another JVM instruction.

The above code example shows how JVM methods and types are represented in bytecode. Methods follow the pattern:

<package>/<className>.<method>([argTypes])<returnType>.

Reference types are represented by:

L<package>/<typeName>;

Primitive types, by a single letter:

With this knowledge, we can understand much easier the output of javap, the Java class file disassembler.

As an example, take the following, simple Java class:

public class Hello {

public static void main(String[] args) {

System.out.println("Hello world");

}

If you compile and run javap on the resulting class file:

javac Hello.java

javap -c Hello

The following output is shown:

public class Hello {

public Hello();

Code:

0: aload_0

1: invokespecial #1 // Method java/lang/Object."<init>":()V

4: return

public static void main(java.lang.String[]);

Code:

0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;

3: ldc #3 // String Hello world

5: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V

8: return

}

As you can see, it's pretty simple. A default constructor is added automatically to all Java classes.

Lets' go through each instruction in the main method:

getstatic java/lang/System.out:Ljava/io/PrintStream;

This loads the static field System.out onto the stack.

ldc #3

Loads the "Hello world" String onto the stack. #3 refers to the position of the String in the constant pool.

invokevirtual java/io/PrintStream.println:(Ljava/lang/String;)V

Invokes the println method using the variables in the stack (System.out, "Hello world").

return

The return instruction is used to return the flow control to the caller without returning any value.

To find out what an instruction expects to be in the stack when it is called, and what it leaves behind, you must consult the JVM documentation.

The invokevirtual instruction, for example, uses the stack as follows:

Operand Stack

..., objectref, [arg1, [arg2 ...]] →

...

This means it expects an object reference (on which to call the provided method), plus the method arguments to be on the top of the stack... and after it returns, it leaves whatever was on the stack below the object reference.

Writing JVM bytecode with Groovy

The ability to use local variables makes it much easier to write bytecode than if we had only the stack at our disposal (see Forth for an actual language where that's exactly what you must do!).

But that begs the question:

How much difference does it make to use local variables as opposed to the stack?

To find out, the only way is to try it!

To write bytecode by hand is not very easy, though... There are some tools such as ByteBuddy and the lower-level ASM (on which ByteBuddy is based) that let you generate JVM class files, but even they are too high-level for what I wanted... they don't really let you write bytecode exactly as javap outputs it. For that, the only tool I could find was Jasmin, which has been abandoned, unfortunately, since around 2010... so it is limited to creating Java 6 bytecode.

The good news is that JVM bytecode evolves even slower than Java, the language! So this limitation is not nearly as bad as it sounds.

To make it easier to use Jasmin, I created Grasmin, a Groovy AST (Abstract Syntax Tree) Transformation which reads the first String in the body of a method as JVM bytecode (in the Jasmin syntax, which closely mirrors javap's output).

The advantage of using Grasmin over Jasmin is that the class definition itself can be written in Groovy. This means you can use the class in your other code as if it were just a normal class, even though the implementation will be written in bytecode directly!

This is what a hello-world Grasmin class looks like:

@CompileStatic

@JasminCode

class Test1 {

void exampleJasminCode() {

"""

.limit stack 2

getstatic java/lang/System/out Ljava/io/PrintStream;

ldc "Hello World!"

invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V

return

"""

}

That's it! Notice that users of this class don't need to know anything about how it is implemented. It could return values and take arguments. But the implementation of this class contains the exact bytecode typed above.

To prove it, we can attempt a more interesting example that we can use to answer the question posed in the beginning of this section: How much difference does it make to use local variables as opposed to the stack?

Here's a simple Java function that converts an array of integers to an array of Strings:

public class JavaArrays {

public String[] stringify( int[] ints ) {

String[] result = new String[ ints.length ];

for (int i = 0; i < ints.length; i++) {

result[ i ] = Integer.toString( ints[ i ] );

}

return result;

}

The implementation is as simple as it gets, in good old-fashioned Java style.

According to javap, the bytecode of this Java class looks like this:

public class grasmin.test_target.JavaArrays {

public grasmin.test_target.JavaArrays();

Code:

0: aload_0

1: invokespecial #1 // Method java/lang/Object."<init>":()V

4: return

public java.lang.String[] stringify(int[]);

Code:

0: aload_1

1: arraylength

2: anewarray #2 // class java/lang/String

5: astore_2

6: iconst_0

7: istore_3

8: iload_3

9: aload_1

10: arraylength

11: if_icmpge 29

14: aload_2

15: iload_3

16: aload_1

17: iload_3

18: iaload

19: invokestatic #3 // Method java/lang/Integer.toString:(I)Ljava/lang/String;

22: aastore

23: iinc 3, 1

26: goto 8

29: aload_2

30: areturn

}

This code uses if_icmpge as the loop condition. It compares the 2 ints at the top of stack, ie. value1 ≥ value2.

My first attempt at implementing this in bytecode was similar to the above, but instead of using id_icmpge, I decided to use ifeq, which compares the int on the top of the stack to zero, because that seemed to be potentially more efficient. To do this, of course, I would have to start the loop from the last index rather than the first.

This is what I ended up with (with lots of comments, as that's the only way I can understand what I am writing!):

@CompileStatic

@JasminCode

class ArraysCode {

String[] stringify( int[] ints ) {

"""\

.limit locals 4

.limit stack 3

aload_1

arraylength ; put the length of the array on the stack

istore_2 ; store the array length on local variable 2

iload_2 ; read the array length

anewarray java/lang/String ; create a String array with the same length as the input array

astore_3 ; store the result in the local variable 3

Loop:

iload_2

ifeq End ; check if the remaining length is 0, go to End if so

iinc 2 -1 ; decrement index

aload_1 ; load the input array

iload_2 ; load current index

iaload ; read current value

invokestatic java/lang/Integer.toString(I)Ljava/lang/String;

aload_3 ; load the result array

swap ; swap so the String value is on top of the stack

iload_2 ; load current index

swap ; swap so the stack has arrayRef, index, value

aastore ; put the result String into the result array

goto Loop ; loop

End:

aload_3

areturn

"""

}

If you're wondering about why I used goto... well, because the only way the computer actually knows how to loop, is by using jump instructions such as goto, even if in higher-level languages we can hide those.

Unfortunately, this hand-written bytecode implementation had the exact same performance as the Java code.

I wondered if using one less local variable to store the current index (by keeping it always on the stack instead) would make this code run faster.

So, here is my second attempt (which took hours to write! Bytecode is not easy to write, especially without using local variables too much):

@CompileStatic

@JasminCode( outputDebugFile = 'build/arrays-code.j' )

class ArraysCode {

String[] stringify( int[] ints ) {

"""\

.limit locals 3

.limit stack 5

aload_1

arraylength ; put the length of the array on the stack

dup

dup ; Stack: length, length, length |TOP

anewarray java/lang/String ; create a String array with the same length as the input array

astore_2 ; store the result in the local variable 2

; Stack: length, length |TOP

Loop:

ifeq End ; check if the remaining length is 0, go to End if so

iconst_1 ; Stack: length, 1 |TOP

isub ; decrement index

dup ; Stack: index, index |TOP

aload_2 ; load the result array

swap ; Stack: index, resultArray, index |TOP

dup ; Stack: index, resultArray, index, index |TOP

aload_1 ; load the input array

swap ; Stack: index, resultArray, index, inputArray, index |TOP

iaload ; read current value

; Stack: index, resultArray, index, intValue |TOP

invokestatic java/lang/Integer.toString(I)Ljava/lang/String;

; Stack: index, resultArray, index, stringValue |TOP

aastore ; put the result String into the result array

dup ; Stack: index, index |TOP

goto Loop ; loop

End:

aload_2

areturn

"""

}

This implementation is a little bit shorter, and as you may notice, uses one less local variable than the previous ones. I expected this to run faster.

To my surprise, it actually runs consistently slower. Around 50% slower, to be more accurate. I have no idea why, and would love to hear if anyone has any idea!

Is some bytecode instruction I used too slow? Maybe decreasing the index with isub instead of iinc is the problem?

No idea... but this seems to show that, answering the initial question:

Trying to save local variables by using the stack does NOT seem to be a good approach!

Well, in any case, it does not really matter... the important is that writing bytecode by hand is so much fun, if you ask me... in a weird, painful yet rewarding kind-of-way.

By the way, here's the output of javap for the above implementation of ArraysCode:

public class grasmin.test_target.ArraysCode {

public java.lang.String[] stringify(int[]);

Code:

0: aload_1

1: arraylength

2: dup

3: dup

4: anewarray #7 // class java/lang/String

7: astore_2

8: ifeq 28

11: iconst_1

12: isub

13: dup

14: aload_2

15: swap

16: dup

17: aload_1

18: swap

19: iaload

20: invokestatic #15 // Method java/lang/Integer.toString:(I)Ljava/lang/String;

23: aastore

24: dup

25: goto 8

28: aload_2

29: areturn

}

As you can see, it's identical to the source.

You can compile and test yourself. Here's the first implementation, and here, this last one!

Thanks for reading.