Performance Comparison

Johan Brichau compared the performance of JavaConnect 2.0-beta and JNIPort (version 1.5 or earlier; he did not name the version). He found that JNIPort was slow when looking up Java classes. This has changed with JNIPort 1.6. A smaller part of the slowdown came from his use of Strings instead of Symbols for class names, but most of the time was lost in a single non-local return from a critical block in the method JavaClassIndex>>findClass: shown here:

symbol := aClassName asSymbol.
sharedMutex critical: [index at: symbol ifPresent: [:it | ^ it]].

Replacing this with

symbol := aClassName asSymbol.
classStatic := sharedMutex critical: [index at: symbol ifAbsent: [nil]].
classStatic notNil ifTrue: [^classStatic].

led to a significant speedup. Together with some other optimizations, this makes JNIPort 1.6 faster than JavaConnect 2.0-beta.

Before looking at the timings, please keep in mind that the performance of your application should not depend very much on the speed of JNIPort. If it does, think about the design of your code. Reducing the number of calls from Smalltalk to Java will have a much larger effect than any performance improvement in JNIPort or JavaConnect can have. The people at CodeMesh explain this very nicely.

I have reproduced Johan's tests on a MacBook Pro with a 2.4 GHz Intel Core 2 Duo, 4 GB 667 MHz DDR2 SDRAM, and Mac OS X 10.5.8. The VisualWorks version was 7.7. The Java runtime has the following versions:

  • java.vm.version: 1.5.0_20-141
  • java.runtime.version: 1.5.0_20-b02-315

To make sure that garbage collection doesn't interfere too much on the Smalltalk side, I made NewSpace larger by a factor of 10, and also doubled all other space sizes:

ObjectMemory sizesAtStartup: #(10.0 10.0 2.0 2.0 2.0 2.0 2.0)

I also increased growthRegimeUpperBound to 500 MB so that OldSpace garbage collection is not an issue. For each test, I started the image, started the Java VM, and ran three repetitions of the test code, with two global garbage collections after each repetition. The absolute times are higher than in Johan's tests, which is probably due to the hardware used.
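The repetition scheme described above could be sketched roughly as follows in a VisualWorks workspace. This is only an illustration, not the code used for the measurements: testBlock and its contents are hypothetical placeholders for the actual test code, and the sketch assumes that Time class>>millisecondsToRun: and ObjectMemory class>>globalGarbageCollect are available in the image.

```smalltalk
"A minimal sketch of the measurement loop, under the assumptions stated above."
| testBlock times |

"Hypothetical stand-in for the real test code (for example, a series of Java class lookups)."
testBlock := [10000 timesRepeat: [Object new]].

times := OrderedCollection new.
3 timesRepeat: [
    "Time one repetition, in milliseconds."
    times add: (Time millisecondsToRun: testBlock).
    "Two global garbage collections after each repetition."
    ObjectMemory globalGarbageCollect.
    ObjectMemory globalGarbageCollect].

"Answer the three timings."
times
```

Collecting after each repetition, rather than before, means that garbage left over from one run is less likely to be charged to the next.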

Here are the results: