Key Features
🧪 Parameter sweeps over reorder buffer (ROB) size, pipeline width (2→8), and branch predictor (2-bit, Tournament, Bi-Mode, LTAGE)
📈 Metrics: IPC and execution time on a single benchmark (bubblesort)
🛠️ Tooling: gem5 for cycle-accurate simulation and controlled experiments
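The sweep space above can be enumerated before launching any gem5 runs. The sketch below is plain Python and does not import gem5. Its parameter names (numROBEntries, fetchWidth, issueWidth, commitWidth, and so on) mirror gem5's O3 CPU parameters, and the predictor names mirror gem5's LocalBP, TournamentBP, BiModeBP and LTAGE classes. Several things are assumptions, not details from the study: the intermediate ROB sizes and the width step of 2, mapping "2-bit" to LocalBP, setting every pipeline stage to the same width, using a full factorial grid, and the helper names `o3_params` and `sweep_grid`, which are hypothetical.

```python
"""Enumerate a sweep grid for gem5 O3 runs (sketch; gem5 is not imported here)."""
from itertools import product

# Assumed sweep values: 64 and 192 ROB entries and widths 2 and 8 appear in the
# study; the intermediate points are illustrative.
ROB_ENTRIES = [32, 64, 128, 192]
WIDTHS = [2, 4, 6, 8]
# Assumed mapping of the study's predictor labels to gem5 SimObject class names.
PREDICTORS = {
    "2-bit": "LocalBP",
    "Tournament": "TournamentBP",
    "BiMode": "BiModeBP",
    "LTAGE": "LTAGE",
}
STAGES = ("fetchWidth", "decodeWidth", "renameWidth", "dispatchWidth",
          "issueWidth", "wbWidth", "commitWidth")


def o3_params(rob: int, width: int, bp_class: str) -> dict:
    """Map one design point onto gem5-style O3 CPU parameter names.

    Giving every pipeline stage the same width is a simplifying assumption.
    """
    params = {"numROBEntries": rob, "branchPred": bp_class}
    for stage in STAGES:
        params[stage] = width
    return params


def sweep_grid():
    """Yield (label, params) for every ROB x width x predictor combination."""
    for rob, width, (bp_name, bp_class) in product(ROB_ENTRIES, WIDTHS,
                                                   PREDICTORS.items()):
        label = f"rob{rob}_w{width}_{bp_name}"
        yield label, o3_params(rob, width, bp_class)


if __name__ == "__main__":
    grid = list(sweep_grid())
    print(len(grid), "design points")
    print(grid[0])
```

In a real gem5 config script, the predictor name would be resolved to its SimObject (for example `TournamentBP()`) and the remaining entries assigned to the O3 CPU object. That glue code depends on the gem5 version and is not shown here.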
Research Contributions
Showed that growing the ROB beyond 64 entries yields negligible IPC gains (IPC stays roughly flat from 64 to 192 entries), suggesting 64 entries as the sweet spot.
Demonstrated that pipeline width strongly boosts IPC; width 6 nearly matches width 8 at lower hardware cost.
Quantified branch predictor impact: Tournament and LTAGE substantially outperform the simple 2-bit predictor.
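The ROB and width findings amount to choosing the smallest setting whose IPC is close to the best one observed. Below is a minimal sketch of that selection rule, plus a helper that expresses a predictor's IPC as a gain over the 2-bit baseline. The 5% tolerance and the names `smallest_within` and `relative_gain` are hypothetical. They illustrate the idea and are not the study's stated criterion.

```python
"""Knee selection and relative-gain helpers for sweep results (illustrative)."""
from typing import Mapping


def smallest_within(ipc_by_setting: Mapping[int, float],
                    tolerance: float = 0.05) -> int:
    """Return the smallest setting (ROB entries, width, ...) whose IPC is within
    `tolerance` (a fraction, e.g. 0.05 = 5%) of the best IPC in the sweep."""
    if not ipc_by_setting:
        raise ValueError("empty sweep")
    if not 0.0 <= tolerance < 1.0:
        raise ValueError("tolerance must be in [0, 1)")
    best = max(ipc_by_setting.values())
    threshold = (1.0 - tolerance) * best
    # The best setting always qualifies, so this min() is never empty.
    return min(s for s, ipc in ipc_by_setting.items() if ipc >= threshold)


def relative_gain(ipc: float, baseline_ipc: float) -> float:
    """Fractional IPC improvement over a baseline (e.g. the 2-bit predictor)."""
    if baseline_ipc <= 0:
        raise ValueError("baseline IPC must be positive")
    return ipc / baseline_ipc - 1.0
```

Usage would be `smallest_within(rob_ipc)`, where `rob_ipc` is a hypothetical dict mapping each swept ROB size to the IPC measured by gem5. A flat curve past 64 entries makes 64 the answer for any small tolerance. A wider tolerance trades IPC for a smaller structure.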
Technical Achievements
Selected a balanced "GoodCore" configuration (width 6, 64-entry ROB, Tournament branch predictor) based on the IPC and utilization trends.
Reported comparative results on bubblesort:
• LargeCore: IPC 2.30, execution time 0.00585 s
• GoodCore: IPC 2.01, execution time 0.00655 s
• SmallCore: IPC 0.79, execution time 0.01708 s
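The reported numbers can be turned into relative figures directly. The sketch below expresses each core's IPC and speed as a fraction of LargeCore. It also checks that IPC × time is roughly constant, which is expected if all three cores execute the same instruction count N at the same clock frequency f, since time = N / (IPC · f). The equal-clock assumption is not stated in the results, and the function names are hypothetical.

```python
"""Compare the reported core configurations against LargeCore."""

# (IPC, execution time in seconds) as reported in the results list.
RESULTS = {
    "LargeCore": (2.30, 0.00585),
    "GoodCore": (2.01, 0.00655),
    "SmallCore": (0.79, 0.01708),
}


def relative_to(ref: str = "LargeCore") -> dict:
    """IPC ratio and speed ratio (ref_time / time) of every core versus `ref`."""
    ref_ipc, ref_time = RESULTS[ref]
    return {
        name: {"ipc_ratio": ipc / ref_ipc, "speed_ratio": ref_time / t}
        for name, (ipc, t) in RESULTS.items()
    }


def ipc_time_spread() -> float:
    """Max/min of IPC * time across cores.

    With a fixed instruction count N and clock f, IPC * time = N / f, so the
    spread should be close to 1.0.
    """
    products = [ipc * t for ipc, t in RESULTS.values()]
    return max(products) / min(products)


if __name__ == "__main__":
    for name, r in relative_to().items():
        print(f"{name}: {r['ipc_ratio']:.1%} of LargeCore IPC, "
              f"{r['speed_ratio']:.1%} of LargeCore speed")
    print(f"IPC*time spread: {ipc_time_spread():.3f}")
```

Run as-is, it reports GoodCore at about 87% of LargeCore's IPC and about 89% of its speed. The IPC × time spread is about 1.025, so execution time tracks 1/IPC closely across the three configurations.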
Applications
CPU microarchitecture tuning for balanced cost/performance
Design-space pruning for RTL/VLSI implementation targets
Workload-aware core selection in heterogeneous SoCs
Impact and Recognition
The study delivers a concrete, resource-aware core configuration that retains most of the benefit of a wide, deep out-of-order (OoO) core (GoodCore reaches about 87% of LargeCore's IPC) without its full hardware cost. This makes it a useful default OoO template for teaching, research, or early-stage silicon planning.