RAMP Gold comprises about 36,000 lines of SystemVerilog with minimal third-party IP blocks. Our first production system targets the Xilinx Virtex-5 and is deployed on a low-cost XUP board.
RAMP Gold employs many advanced FPGA optimizations and is designed from the ground up with reliability in mind. The following figure shows the structure of RAMP Gold. The timing and functional models are both host-multithreaded. The functional model maintains architected state and correctly executes the ISA. The timing model determines how much time an instruction takes to execute in the target machine and schedules threads to execute on the functional model accordingly. The interface between the functional and timing models is designed to be simple and extensible to facilitate rapid evaluation of alternative target memory hierarchies and microarchitectures.
SPARC International, and we can boot the Linux 2.6.21 kernel as well as ROS, a prototype manycore research operating system.
Most of the timing model parameters can be configured at runtime by writing control registers that reside on the I/O bus. Among these are the size, block size, and associativity of L1 and L2 caches, the number of L2 cache banks and their latencies, and DRAM bandwidth and latency.
To measure target performance, we implement 657 64-bit hardware performance counters. Among the events we count are target L1 and L2 cache hits, misses, and writebacks; instructions retired by type; and target clock cycles. We also have a number of host performance counters to measure the simulator’s performance itself. The counters are accessed via a ring interconnect to ease routing pressure.
The timing model largely consists of behavioral SystemVerilog and relies on the memory compiler for BRAM allocation. This coding style enables the rapid prototyping of different microarchitectural ideas. Leveraging many high-level language constructs in SystemVerilog, our manycore timing model comprises only 1,000 lines of code.