Execution in HotPy(2) is managed by a Supervisor, which delegates the actual execution of the program to one of the interpreters (or to compiled code, if present). The Supervisor's role is to coordinate the execution of the other components. Initially the Supervisor calls the Base Interpreter to execute the program. When the Base Interpreter reaches a hot-spot in the program, it returns execution to the Supervisor.
The first time a point in a program becomes hot, the Supervisor passes execution to the Trace-recording Interpreter, which records a trace and passes it to the optimisers. On second and subsequent executions of the hot-spot, the Supervisor delegates execution to the Trace Manager.
The Trace Manager looks up the previously recorded trace matching the hot-spot and passes it to the Fast Interpreter for execution. When it cannot find a trace for a requested point in the program, it returns execution to the Supervisor.
The recorded bytecodes form traces, which are linear sequences of bytecodes. The traces are optimised and stored in a trace-cache. The next time a section of code that has been recorded is to be executed, the optimised trace is executed instead. When running, typically over 90% of the execution time is spent executing optimised traces and less than 10% running unoptimised code.
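The coordination described above can be sketched as a dispatch loop. This is a minimal illustration under stated assumptions: the component interfaces, the hotness threshold, and all names below are illustrative, not taken from the HotPy(2) source.

```python
# Illustrative sketch of the Supervisor's dispatch loop. Components are
# passed in as callables; none of these names come from HotPy(2) itself.

HOT_THRESHOLD = 2  # a point becomes "hot" on its second visit (illustrative)


class Supervisor:
    def __init__(self, base_interpreter, trace_recorder, fast_interpreter,
                 optimiser):
        self.base_interpreter = base_interpreter
        self.trace_recorder = trace_recorder
        self.fast_interpreter = fast_interpreter
        self.optimiser = optimiser
        self.visit_counts = {}   # program point -> number of visits
        self.trace_cache = {}    # program point -> optimised trace

    def run(self, point):
        log = []  # records which component handled each step
        while point is not None:
            n = self.visit_counts.get(point, 0) + 1
            self.visit_counts[point] = n
            if n < HOT_THRESHOLD:
                # Cold code: the Base Interpreter runs until it reaches a
                # hot-spot, then returns control to the Supervisor.
                log.append(("base", point))
                point = self.base_interpreter(point)
            elif point not in self.trace_cache:
                # First time hot: record a trace, optimise it, cache it.
                log.append(("record", point))
                trace, next_point = self.trace_recorder(point)
                self.trace_cache[point] = self.optimiser(trace)
                point = next_point
            else:
                # Subsequent executions: run the cached, optimised trace
                # on the Fast Interpreter.
                log.append(("fast", point))
                point = self.fast_interpreter(self.trace_cache[point])
        return log
```

The returned `log` is only there to make the delegation visible; a real supervisor would not keep it.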
In general, a 'fat' bytecode is one that performs a slow, complex operation, such that the time spent executing the instruction is much larger than the time spent dispatching it and handling its operands. A 'thin' bytecode reverses this relation between useful work and interpreter overhead; the time spent executing the instruction is often less than the time spent dispatching it.
For the purposes of HotPy(2), a 'fat' bytecode is one that can be decomposed; in other words, it can be implemented as a sequence of 'thin' bytecodes.
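As an illustration of such a decomposition (the operation names here are hypothetical, not HotPy(2)'s actual instruction set), a fat binary-add could be expressed as a surrogate built from thin operations: cheap type checks, a native add, and a generic fall-back dispatch.

```python
# Hypothetical decomposition of a fat 'binary_add' into thin operations.

def check_type(x, t):          # thin: a cheap type test
    return type(x) is t

def native_add(a, b):          # thin: close to a single machine operation
    return a + b

def call_special(a, b, name):  # fall-back: generic special-method dispatch
    return getattr(type(a), name)(a, b)

def binary_add_surrogate(a, b):
    """Same effect as a fat 'binary_add' bytecode, built from thin ops."""
    if check_type(a, int) and check_type(b, int):
        return native_add(a, b)
    return call_special(a, b, "__add__")
```

The fast path (two type checks plus a native add) is what an optimiser can later specialise; the fall-back preserves the full semantics of the fat bytecode.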
The trace recorder records any thin bytecodes that it encounters. However, when it encounters a fat bytecode, rather than executing and recording it, the recorder performs a call into a function, written entirely in thin bytecodes, that has the same effect.
The resulting trace is thus composed entirely of thin bytecodes suitable for optimisation.
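The substitution step can be sketched as follows. The opcode names and the surrogate table are illustrative placeholders, not HotPy(2)'s real bytecode set.

```python
# Sketch of fat-bytecode substitution during trace recording. Each fat
# opcode maps to a surrogate: a sequence of thin bytecodes with the same
# effect, which the recorder inlines into the trace.

THIN_OPS = {"load", "store", "check_type", "native_add", "jump_if"}

SURROGATES = {
    # fat opcode -> equivalent thin-bytecode sequence (illustrative)
    "binary_add": ["check_type", "check_type", "native_add"],
    "get_attr":   ["load", "check_type", "load"],
}

def record_trace(bytecodes):
    """Return a trace containing only thin bytecodes."""
    trace = []
    for op in bytecodes:
        if op in THIN_OPS:
            trace.append(op)
        else:
            # Fat bytecode: record its thin-only surrogate instead of
            # the fat bytecode itself.
            trace.extend(SURROGATES[op])
    return trace
```

Because every fat bytecode is replaced by its surrogate's thin bytecodes, the output trace consists of thin bytecodes only, which is the form the optimisers operate on.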
Typically, this tracing of thin bytecodes (without further optimisation) increases the number of instructions executed by a factor of 3 to 6, and increases run-time by 50% to 100%.
Typically, Type Specialisation reduces the number of instructions by about 30% and the run-time by 20%, relative to the unoptimised trace. The specialised trace will still be slower than the original (fat) bytecode.
Typically, Deferred Object Creation reduces the number of instructions by a factor of 4 to 8, and the run-time by a similar factor, when executed with a fast stack-based interpreter. For one benchmark (fannkuch), D.O.C. resulted in a 10-fold reduction in instruction count and run-time (although only a 5-fold speedup over the base-line performance).
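To see how the per-stage figures combine, the following worked example multiplies mid-range values from the ranges quoted above (the specific midpoints chosen are illustrative, not measurements):

```python
# Combining the per-stage run-time figures quoted above, using mid-range
# values. Starting from a base-line run-time of 1.0:
base = 1.0
traced = base * 1.75        # tracing adds 50%-100% run-time; take ~75%
specialised = traced * 0.80 # type specialisation cuts ~20% of run-time
doc = specialised / 6       # D.O.C. divides run-time by roughly 4 to 8

speedup_over_baseline = base / doc  # roughly 4x with these midpoints
```

With these midpoints the net speedup over the base-line comes out at a little over 4x, which is consistent with the roughly 5-fold base-line speedup observed on fannkuch.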
D.O.C. is so effective because Instruction Thinning and Type Specialisation have created long traces which expose the many redundancies in the Python execution model.
It is expected that the register-based code will execute about 40% faster than the stack-based code.
This estimate is based on published literature, rather than experimental data from the original HotPy.