The HotPy(2) project aims to bring the optimisations developed and assessed in the original HotPy project to CPython.
The resulting HotPy(2) interpreter is expected to be about three times faster than the current CPython interpreter for pure Python code.
With the addition of a JIT compiler, speed-ups of up to a factor of 10 should be possible for pure Python code.

Much real-world code spends a substantial part of its time in library code written in C, so smaller speed-ups are to be expected there, but HotPy(2) should still provide useful performance improvements.

The Technical Overview provides information on how it works.

Get the source code from https://bitbucket.org/markshannon/hotpy_2

A talk from EuroPython 2012 giving an overview of how it works

A talk from EuroPython 2011 explaining some of the ideas that make HotPy(2) unique

Current Status

The Trace Recorder is now complete.
Traces typically contain about 20 times as many bytecode instructions as the original program.
When executing traces, the interpreter can execute the "thin" bytecode instructions about 4 times as fast as the original bytecodes.
Consequently, without any optimisation, executing traces is slow: typically about a fifth of the speed of CPython.
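
The "fifth of the speed" figure follows directly from the two ratios above. A minimal sketch of the arithmetic is given here; the function name relative_speed is purely illustrative and is not part of HotPy(2).

```python
def relative_speed(instruction_ratio: float, per_instruction_speedup: float) -> float:
    """Speed of unoptimised trace execution relative to CPython.

    instruction_ratio: trace instructions executed per original bytecode
        instruction (about 20 in the notes above).
    per_instruction_speedup: how much faster each "thin" instruction runs
        than an original bytecode (about 4 in the notes above).
    """
    if instruction_ratio <= 0 or per_instruction_speedup <= 0:
        raise ValueError("both ratios must be positive")
    # 20 times as many instructions, each 4 times as fast, takes 20 / 4 = 5
    # times as long, so the relative speed is the inverse: 4 / 20.
    return per_instruction_speedup / instruction_ratio


if __name__ == "__main__":
    speed = relative_speed(20, 4)
    print(f"Unoptimised traces run at about {speed:.0%} of CPython's speed")
```

The same function shows why the later passes matter: every reduction in the instruction ratio translates directly into a proportional gain in speed.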

The Trace Manager works, including the implementation of trace "decay" in order to limit memory use.
Traces use too much memory, but there are several ways in which this can be improved.
Trace coverage, that is, the dynamic portion of the program covered by traces, is between 96% and 99% for a small set of test programs.
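
As an illustration of the coverage figure, the sketch below assumes that "dynamic portion" means the fraction of executed bytecode instructions that ran inside a trace. That reading, the function name trace_coverage, and the counts in the example are all assumptions made for illustration; they are not measurements or code from HotPy(2).

```python
def trace_coverage(instructions_in_traces: int, total_instructions: int) -> float:
    """Fraction of dynamically executed instructions that ran inside traces."""
    if total_instructions <= 0:
        raise ValueError("total_instructions must be positive")
    if not 0 <= instructions_in_traces <= total_instructions:
        raise ValueError("instructions_in_traces must lie between 0 and the total")
    return instructions_in_traces / total_instructions


if __name__ == "__main__":
    # Hypothetical counts, chosen only to land inside the reported range.
    coverage = trace_coverage(9_700_000, 10_000_000)
    print(f"Trace coverage: {coverage:.1%}")
```

Under this reading, the remaining 1% to 4% of executed instructions would still run in the ordinary interpreter.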

The Type Specialiser works and passes almost all tests. The few failures are due to small differences in reference count totals.
After specialisation, traces are slightly smaller and about 20% faster.

The Deferred Object Creation pass is now complete, but still fails several tests.
After DOC, traces are much smaller, but the number of instructions executed still exceeds that of CPython. However, many of these are VM register loads and stores, which will be removed by the Register Allocator.

The Register Allocator is complete and produces good-quality code, but could be improved.

The Register-based interpreter is complete.

When using the full optimisation chain and the register interpreter, performance is comparable to or better than CPython.