Md. Sadik Yasir Tauki - Out-of-Order Core Design Space Exploration with gem5

Out-of-Order Core Design Space Exploration with gem5
Finding a “GoodCore” between cost and performance

Project Overview
Using gem5, I evaluated how pipeline width, reorder buffer (ROB) size, and branch predictor choice affect the performance of an out-of-order (OoO) CPU. Starting from two extremes, LargeCore (width 8, ROB 192, LTAGE) and SmallCore (width 2, ROB 16, 2-bit BP), I swept intermediate configurations to identify a cost-effective core that captures most of LargeCore’s performance with far fewer resources.
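To make the two endpoints concrete, here is a minimal sketch of how they could be written down as plain Python records. It does not import gem5. The parameter names in `STAGE_WIDTH_PARAMS` and `numROBEntries` follow gem5's O3 CPU model as I understand it, while tying every stage width to one "width" knob and mapping "2-bit" to `LocalBP` are assumptions, not details taken from the project scripts. `CoreConfig` and `o3_params` are hypothetical helper names.

```python
from dataclasses import dataclass

# Width parameters of gem5's O3 CPU model. Assumption: a single "width"
# value is applied to every stage, which may differ from the actual scripts.
STAGE_WIDTH_PARAMS = (
    "fetchWidth", "decodeWidth", "renameWidth", "dispatchWidth",
    "issueWidth", "wbWidth", "commitWidth", "squashWidth",
)

# Assumed mapping from the predictor names on this page to gem5 class names.
BP_CLASS = {
    "2-bit": "LocalBP",
    "Tournament": "TournamentBP",
    "Bimode": "BiModeBP",
    "LTAGE": "LTAGE",
}


@dataclass(frozen=True)
class CoreConfig:
    """One point in the design space (hypothetical helper type)."""
    name: str
    width: int
    rob_entries: int
    predictor: str

    def o3_params(self) -> dict:
        """Return a parameter-name -> value dict for an O3-style CPU."""
        params = {p: self.width for p in STAGE_WIDTH_PARAMS}
        params["numROBEntries"] = self.rob_entries
        params["branchPred"] = BP_CLASS[self.predictor]
        return params


# The two extremes described above.
LARGE_CORE = CoreConfig("LargeCore", width=8, rob_entries=192, predictor="LTAGE")
SMALL_CORE = CoreConfig("SmallCore", width=2, rob_entries=16, predictor="2-bit")

if __name__ == "__main__":
    for core in (LARGE_CORE, SMALL_CORE):
        print(core.name, core.o3_params())
```

In a real gem5 script, a dict like this would be applied to the CPU object's attributes; that wiring is left out here because it depends on the gem5 version and the configuration style used.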

assignment 4.pdf

Report_4-1.pdf

Key Features
🧪 Parameter sweeps over ROB entries, pipeline width (2→8), and branch predictors (2-bit, Tournament, Bimode, LTAGE)
📈 Metrics: IPC and execution time on a benchmark (bubblesort)
🛠️ Tooling: gem5 for cycle-level simulation and controlled experiments
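The sweeps listed above can be sketched as a small generator of run configurations. This is an illustration only: the page gives widths 2 through 8 and ROB sizes 16, 64, and 192, so the step of 2 for width and the intermediate ROB sizes 32 and 128 are assumptions, and whether the original study ran a full grid or varied one parameter at a time is not stated. Both options are shown; `full_grid` and `one_at_a_time` are hypothetical names.

```python
from itertools import product

WIDTHS = (2, 4, 6, 8)                 # 2 -> 8 from the text; step of 2 assumed
ROB_SIZES = (16, 32, 64, 128, 192)    # 16, 64, 192 from the text; 32, 128 assumed
PREDICTORS = ("2-bit", "Tournament", "Bimode", "LTAGE")


def full_grid():
    """Every combination of width, ROB size, and predictor."""
    return [
        {"width": w, "rob_entries": r, "predictor": p}
        for w, r, p in product(WIDTHS, ROB_SIZES, PREDICTORS)
    ]


def one_at_a_time(baseline):
    """Vary a single parameter per run, holding the others at the baseline."""
    axes = {"width": WIDTHS, "rob_entries": ROB_SIZES, "predictor": PREDICTORS}
    seen, runs = set(), []
    for key, values in axes.items():
        for value in values:
            cfg = {**baseline, key: value}
            ident = tuple(sorted(cfg.items()))
            if ident not in seen:  # the baseline itself appears on every axis
                seen.add(ident)
                runs.append(cfg)
    return runs


if __name__ == "__main__":
    large = {"width": 8, "rob_entries": 192, "predictor": "LTAGE"}
    print(len(full_grid()), "runs in the full grid")
    print(len(one_at_a_time(large)), "runs when varying one parameter at a time")
```

Varying one parameter at a time keeps the run count small and isolates each effect, at the cost of missing interactions between parameters; a full grid captures those interactions but takes many more simulations.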

Research Contributions

Showed that growing the ROB beyond 64 entries yields negligible IPC gains (IPC is roughly flat from 64 to 192 entries), making 64 entries the sweet spot for this workload.
Demonstrated that pipeline width strongly boosts IPC; width 6 nearly matches width 8 at lower cost.
Quantified branch predictor impact: Tournament and LTAGE substantially outperform a simple 2-bit predictor.

Technical Achievements

Selected a balanced GoodCore = width 6, ROB 64, Tournament BP, based on IPC/utilization trends.
Reported comparative results (example set):
• LargeCore: IPC 2.30, execution time 0.00585 s
• GoodCore: IPC 2.01, execution time 0.00655 s
• SmallCore: IPC 0.79, execution time 0.01708 s
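To put the three results on a common scale, the short sketch below normalizes each core to LargeCore. The IPC and time values are the ones reported above; the function name `relative_to` and the dictionary layout are illustrative choices only.

```python
# Reported results from the list above (IPC and execution time in seconds).
RESULTS = {
    "LargeCore": {"ipc": 2.30, "time_s": 0.00585},
    "GoodCore": {"ipc": 2.01, "time_s": 0.00655},
    "SmallCore": {"ipc": 0.79, "time_s": 0.01708},
}


def relative_to(reference, results=RESULTS):
    """Express each core's IPC and execution time as a ratio to `reference`."""
    ref = results[reference]
    return {
        name: {
            "ipc_fraction": r["ipc"] / ref["ipc"],
            "slowdown": r["time_s"] / ref["time_s"],
        }
        for name, r in results.items()
    }


if __name__ == "__main__":
    for name, rel in relative_to("LargeCore").items():
        print(
            f"{name}: {rel['ipc_fraction']:.1%} of LargeCore IPC, "
            f"{rel['slowdown']:.2f}x its execution time"
        )
```

With these numbers, GoodCore reaches about 87.4% of LargeCore's IPC at roughly 1.12x its execution time, while SmallCore reaches about 34.3% of the IPC at roughly 2.92x the execution time.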

Applications

CPU microarchitecture tuning for balanced cost/performance
Design-space pruning for RTL/VLSI implementation targets
Workload-aware core selection in heterogeneous SoCs

Impact and Recognition
The study delivers a concrete, resource-aware core configuration that retains most of a wide, deep OoO core’s benefit without its full hardware cost: on the bubblesort benchmark, GoodCore reaches about 87% of LargeCore’s IPC (2.01 vs. 2.30). This makes it useful as a default OoO template for teaching, research, or early-stage silicon planning.
