ICOOOLPS 2009


We have confirmed that this year's ICOOOLPS will feature guest speakers:

  • Dr. Cliff Click, Chief JVM Architect, Distinguished Engineer, Azul Systems
  • Dr. Andreas Gal, TraceMonkey engineer at Mozilla

The program is as follows:


Towards an Actor-based Concurrent Machine Model
Hans Schippers, Tom Van Cutsem, Stefan Marr, Michael Haupt, Robert Hirschfeld
Universiteit Antwerpen, Vrije Universiteit Brussel, Hasso Plattner Institute

In this position paper we propose to extend an existing delegation-based machine model with concurrency primitives. The original machine model, which is built on the concepts of objects, messages, and delegation, provides support for languages enabling multi-dimensional separation of concerns (MDSOC). We propose to extend this model with an actor-based concurrency model, allowing for both true parallelism and lightweight concurrency primitives such as coroutines. In order to demonstrate its expressiveness, we informally describe how three high-level languages supporting different concurrency models can be mapped onto our extended machine model. We also provide an outlook on the extended model’s potential to support concurrency-related MDSOC features.


An Efficient Lock-Aware Transactional Memory Implementation
Justin E. Gottschlich, Jeremy G. Siek, Manish Vachharajani, Dwight Y. Winkler and Daniel A. Connors
University of Colorado at Boulder, Colorado State University, Nodeka LLC

Transactional memory (TM) is an emerging concurrency control mechanism that provides a simple and composable programming model. Unfortunately, transactions violate the semantics of mutual exclusion locks when they execute concurrently. Due to the prevalence of locks, transactions must be made lock-aware, enabling them to correctly interoperate with locks.

We present a lock-aware transactional memory (LATM) system that employs a unique communication method using local knowledge of locks coupled with granularity-based policies. Our system allows higher concurrent throughput than prior systems because it only prevents truly conflicting critical sections from executing concurrently. Furthermore, our system relaxes the prior requirement of transaction isolation when executing conflicting transactional critical sections and instead runs these transactions as irrevocable, improving transaction concurrency. We demonstrate our performance improvements mathematically and empirically.

Our system also advances LATM research in terms of program consistency. This is achieved by detecting potential deadlocks at run-time and aborting the programs that contain them. Prior systems break deadlocks, which reveals partially executed critical sections to other threads, thereby violating mutual exclusion. Because our system disallows deadlocks, it does not suffer from mutual exclusion violations, improving program consistency.


Blurring the line between Compiler and Runtime
Eric Jul
DIKU, Dept. of Computer Science, University of Copenhagen

A traditional implementation of a language often strives for a clean interface between the compiler and the runtime system, where a short, concise interface description is considered good. However, for efficiency it can be of great advantage to purposefully blur the line between compiler and runtime by letting the compiler and runtime cooperate across an otherwise well-defined interface boundary. This paper presents a few examples and is intended to generate a discussion of these and other examples.



  • Concurrency models
  • The distinction between compiler and runtime

TraceMonkey
Andreas Gal


Tracing the Meta-Level: PyPy's Tracing JIT Compiler
Carl Friedrich Bolz, Antonio Cuni, Maciej Fijalkowski, Armin Rigo
University of Düsseldorf, University of Genova, merlinux GmbH

We attempt to apply the technique of Tracing JIT Compilers in the context of the PyPy project, i.e., to programs that are interpreters for some dynamic languages, including Python. Tracing JIT compilers can greatly speed up programs that spend most of their time in loops in which they take similar code paths. However, applying an unmodified tracing JIT to a program that is itself a bytecode interpreter results in very limited or no speedup. In this paper we show how to guide tracing JIT compilers to greatly improve the speed of bytecode interpreters. One crucial point is to unroll the bytecode dispatch loop, based on two hints provided by the implementer of the bytecode interpreter. We evaluate our technique by applying it to two PyPy interpreters: one is a small example, and the other one is the full Python interpreter.


Faster than C#: efficient implementation of dynamic languages on .NET
Antonio Cuni, Davide Ancona, Armin Rigo
DISI, University of Genova

The Common Language Infrastructure (CLI) is a virtual machine expressly designed for implementing statically typed languages such as C#; therefore, programs written in dynamically typed languages are typically much slower than C# when executed on .NET.

Recent developments show that Just In Time (JIT) compilers can exploit runtime type information to generate quite efficient code. Unfortunately, writing a JIT compiler is far from being simple.

In this paper we report our positive experience with automatic generation of JIT compilers as supported by the PyPy infrastructure, by focusing on JIT compilation for .NET. Following this approach, we have in fact added a second layer of JIT compilation, by allowing dynamic generation of more efficient .NET bytecode, which in turn can be compiled to machine code by the .NET JIT compiler.

The main and novel contribution of this paper is to show that this two-layer JIT technique is effective, since programs written in dynamic languages can run on .NET as fast as (and in some cases even faster than) the equivalent C# programs.

The practicality of the approach is demonstrated by showing some promising experiments done with benchmarks written in a simple dynamic language.



  • Trace compilation vs. traditional

Alternate languages on the Java Virtual Machine (JVM)
Cliff Click
Azul Systems

There are several languages that target bytecodes and the JVM™ machine as their new "assembler," including Scala, Clojure, Jython, JRuby, the JavaScript™ programming language/Rhino, and JPC. This presentation takes a quick look at how well these languages sit on a JVM machine, what their performance is, where it goes, and why.

Some of the results are surprising: Clojure's STM ran a complex concurrent problem with 600 parallel worker threads with perfect scaling on an Azul box without modification. Some of the results are less surprising: fixnum/bignum math ops take a substantial toll on the benefit of entirely transparent integer math, and a lack of tail-call optimization gives some languages fits. Some of the languages can get "to the metal," and sometimes performance takes a backseat to other concerns. This session, for non-Java™ platform JVM machine users, is a JVM machine's-eye-view of bytecodes, JITs, and code-gen and will give you insight into why a language is (or is not!) as fast as you might expect.


Compiling Structural Types on the JVM: A Comparison of Reflective and Generative Techniques from Scala's Perspective
Gilles Dubochet, Martin Odersky
École Polytechnique Fédérale de Lausanne

This article describes Scala’s compilation technique of structural types for the JVM. The technique uses Java reflection and polymorphic inline caches. Performance measurements of this technique are presented and analysed. Further measurements compare Scala’s reflective technique with the “generative” technique used by Whiteoak to compile structural types. The article ends with a comparison of reflective and generative techniques for compiling structural types. It concludes that generative techniques may, in specific cases, exhibit higher performance than reflective approaches, but that reflective techniques are easier to implement and have fewer restrictions.


Compiling Generics Through User-Directed Type Specialization
Iulian Dragos, Martin Odersky
École Polytechnique Fédérale de Lausanne

Compilation of polymorphic code through type erasure gives compact code, but performance on primitive types is significantly hurt. Full specialization gives good performance, but at the cost of increased code size and compilation time. Instead we propose a mixed approach, which allows the programmer to decide what code to specialize. Our approach supports separate compilation, allows mixing of specialized and generic code, and gives very good results in practice.



  • Multi-language virtual machines

Thread and Execution-Context Specific Barriers via Dynamic Method Versioning
Simon Wilkinson, Ian Watson
The University of Manchester

The insertion of read and write barriers into managed code is a typical runtime compilation task of a Virtual Machine. As part of our current work in applying Thread-Level Speculation (TLS) to Java, we insert a high density of barriers that are conditionally executed based on the identity of the running thread and current execution context. Rather than perform runtime tests, it is more profitable for our TLS system to maintain thread and execution-context specific versions of methods that are compiled with unconditional barriers, and then rely on modified dispatch semantics to ensure conditional execution.

In this paper, we extract the method versioning system from our TLS implementation and present it in a general form, which we call Dynamic Method Versioning (DMV). DMV allows thread and execution-context specific versions of Java methods to be dynamically generated and compiled, with inter-version dispatch managed by a runtime policy. We describe our technique via its implementation within the Jikes Research Virtual Machine, and present initial measurements of its runtime overheads.


Using Program Metadata to Support SDT in Object-Oriented Applications
Daniel Williams, Jason D. Hiser, Jack W. Davidson
University of Virginia

Software dynamic translation (SDT) is a powerful technology that enables software malleability and adaptivity at the instruction level by providing facilities for run-time monitoring and code modification. SDT has been used as the basis for many valuable tools, including dynamic optimizers, profilers, security policy enforcement, and binary translation, to name a few. However, modern object-oriented programming techniques and their implementations (e.g., virtual functions, exceptions, dynamic code, etc.) pose unique challenges to high-performing SDT systems. In this paper, we present Metaman, a generalized program metadata manager that stores and manages program information so that it can be efficiently accessed by emerging SDT systems to improve overall runtime performance of a managed executable. Using the information collected by Metaman, the run-time performance of an existing SDT system was improved by 22%, making its execution speed only 3% slower than native (i.e., non-managed) execution.


Automatic Vectorization Using Dynamic Compilation and Tree Pattern Matching Technique in Jikes RVM
Sara El-Shobaky, Ahmed El-Mahdy, Ahmed El-Nahas
Alexandria University

In the past decade, processors have been improved by the addition of vector instructions, known as SIMD instructions. These vector instructions have dramatically enhanced the performance of many multimedia applications. This paper studies Leupers’ code selection technique, which is capable of generating SIMD instructions automatically, in the context of dynamic compilation. It develops a portable implementation using loop unrolling and tree pattern matching techniques, applied to the optimizing compiler of Jikes RVM in the phase that converts Lower Intermediate Representation code into Machine-specific Intermediate Representation using the BURS system. This implementation adds new BURS rules capable of generating SIMD instructions that manipulate subword data. Applying the suggested implementation in Jikes RVM on the IA-32 architecture results in an overall speedup at runtime despite the runtime overhead of the compilation phase.


Just-In-Time compilation on ARM processors
Michele Tartara, Simone Campanoni, Giovanni Agosta, Stefano Crespi Reghizzi
Politecnico di Milano

This paper presents a Just-In-Time compilation system for ARM processors. The complete architecture is described, starting from static compilation of the sources into CIL (Common Intermediate Language) bytecode. The intermediate languages that are used are explained, together with the instruction selection and code generation techniques. Finally, some experimental results are presented, comparing them with those of our best open source competitor: Mono.



  • The distinction between compiler and runtime
  • Instruction selection in compilers
18:00 ICOOOLPS program committee meeting