Md Sadik Yasir Tauki - CPU–Memory Performance Trade-off Analysis

CPU–Memory Performance Trade-off Analysis using gem5
Evaluating how CPU model, clock frequency, and DRAM technology affect benchmark execution in gem5

Project Overview
This project studied how processor performance changes with different CPU models, clock frequencies, and memory technologies using the gem5 simulator. The work compared X86TimingSimpleCPU and X86MinorCPU, swept CPU frequency from 1 GHz to 3 GHz, and evaluated multiple DRAM types including DDR3_1600_8x8, DDR3_2133_8x8, and LPDDR2_S4_1066_1x32. The goal was to understand whether application performance is more sensitive to processor speed or memory-system behavior.

Report_3.pdf

Key Features

🖥️ Multiple CPU Models: Compared X86TimingSimpleCPU and X86MinorCPU under identical benchmark conditions.

⚙️ Frequency Sweep: Varied CPU clock from 1 GHz to 3 GHz in 500 MHz steps to measure execution-time sensitivity.

💾 Memory Technology Comparison: Evaluated performance under DDR3_1600_8x8, DDR3_2133_8x8, and LPDDR2_S4_1066_1x32.

📊 Architecture-Level Analysis: Examined execution time, miss latency, cache behavior, and memory-access trends for a sorting benchmark.
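
The experiment space described above can be enumerated with a short script. This is a minimal Python sketch, not the project's actual run script: it assumes every CPU model, clock, and DRAM type is combined with every other, which the report may not have done (it may have varied one factor at a time). The function names `frequency_sweep` and `build_runs` and the dictionary keys are hypothetical.

```python
from itertools import product

# Names taken from the project description.
CPU_MODELS = ["X86TimingSimpleCPU", "X86MinorCPU"]
MEM_TYPES = ["DDR3_1600_8x8", "DDR3_2133_8x8", "LPDDR2_S4_1066_1x32"]


def frequency_sweep(start_mhz=1000, stop_mhz=3000, step_mhz=500):
    """Return clock labels such as '1GHz' or '1500MHz' for the sweep."""
    labels = []
    for mhz in range(start_mhz, stop_mhz + 1, step_mhz):
        if mhz % 1000 == 0:
            labels.append(f"{mhz // 1000}GHz")
        else:
            labels.append(f"{mhz}MHz")
    return labels


def build_runs():
    """Enumerate every (cpu, clock, mem) combination (assumed full cross product)."""
    return [
        {"cpu": cpu, "clock": clock, "mem": mem}
        for cpu, clock, mem in product(CPU_MODELS, frequency_sweep(), MEM_TYPES)
    ]


runs = build_runs()
print(len(runs), "simulation runs")
for run in runs[:3]:
    print(run)
```

Under the full cross-product assumption, two CPU models, five clock settings, and three DRAM types give 30 runs. Each dictionary could then be turned into the arguments of one gem5 invocation.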

Research Contributions

Showed that X86MinorCPU is more sensitive to CPU-frequency scaling than X86TimingSimpleCPU. The report attributes this to MinorCPU's in-order pipeline, whose performance depends more strongly on processor throughput.
Demonstrated that X86MinorCPU is also more sensitive to memory technology, showing larger variation in overall miss latency across DRAM configurations. The report explains that its pipeline relies more heavily on efficient memory access.
Concluded that the benchmark is more sensitive to memory technology than to CPU frequency: changing the DRAM type shifted the runtime trend more than changing the processor clock did.
Analyzed the workload's data-access pattern: bubble sort makes mostly sequential accesses with strong spatial locality, which helps cache hit rates, yet performance can still suffer when the memory system becomes the bottleneck.
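
The spatial-locality claim for bubble sort can be checked with a small trace model. The sketch below is illustrative only and is not taken from the report: it records the array index of each read made during the compare step (swap writes are ignored) and measures how often an access falls in the same cache line as the previous one. The 64-byte line size and 4-byte element size are assumptions, and the helper names are hypothetical.

```python
LINE_BYTES = 64  # assumed cache-line size
ELEM_BYTES = 4   # assumed element size (a 4-byte int)


def bubble_sort_trace(values):
    """Sort a copy of values and record the index of every element read."""
    data = list(values)
    trace = []
    n = len(data)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            # Each comparison reads two neighbouring elements.
            trace.append(j)
            trace.append(j + 1)
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
    return data, trace


def same_line_fraction(trace):
    """Fraction of accesses that hit the same cache line as the previous access."""
    lines = [(idx * ELEM_BYTES) // LINE_BYTES for idx in trace]
    same = sum(1 for prev, cur in zip(lines, lines[1:]) if prev == cur)
    return same / (len(lines) - 1)


values = list(range(256, 0, -1))  # worst-case input: reverse-sorted
sorted_values, trace = bubble_sort_trace(values)
print(f"same-line fraction: {same_line_fraction(trace):.3f}")
```

Because each inner pass walks the array one element at a time, nearly every access lands in the line that was just touched. This model says nothing about capacity misses or DRAM latency, which is where the memory system can still become the bottleneck.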

Technical Achievements

Used gem5 to build a CPU–memory sensitivity study across processor models, clock frequencies, and DRAM types, covering all three dimensions the assignment required.
Instrumented the benchmark using m5_reset_stats() and m5_dump_stats() around the region of interest, as required for focused performance measurement in gem5.
Collected microarchitectural statistics including execution time, CPU instruction counts, DRAM reads, average memory-access latency, row-buffer hits, page-hit rate, and miss rates for dcache, icache, and L2. The milestone report lists sample metrics such as 683 total DRAM reads, dcache miss rate 0.00861, icache miss rate 0.000123, and L2 miss rate 0.400352.
Connected benchmark behavior with architectural effects rather than reporting only raw numbers, emphasizing why frequency and memory changes affect different CPU models differently.
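
Statistics like those listed above are typically read from gem5's stats.txt output, where each stats dump is written as a block between a "Begin Simulation Statistics" line and an "End Simulation Statistics" line. The following is a minimal Python sketch of such a reader, not the script used in the project. The stat names in `SAMPLE` are assumptions (gem5 stat names differ between versions and configurations); only the numeric values come from the milestone report.

```python
BEGIN = "---------- Begin Simulation Statistics ----------"
END = "---------- End Simulation Statistics"

# Stat names below are assumed for illustration; values are from the report.
SAMPLE = """
---------- Begin Simulation Statistics ----------
system.cpu.dcache.overall_miss_rate::total   0.00861    # dcache miss rate
system.cpu.icache.overall_miss_rate::total   0.000123   # icache miss rate
system.l2.overall_miss_rate::total           0.400352   # L2 miss rate
system.mem_ctrl.num_reads::total             683        # total DRAM reads
---------- End Simulation Statistics   ----------
"""


def parse_stats(text):
    """Return one {stat_name: float} dict per dump block in stats.txt text."""
    dumps, current = [], None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(BEGIN):
            current = {}
        elif stripped.startswith(END):
            if current is not None:
                dumps.append(current)
            current = None
        elif current is not None and stripped:
            # Drop the trailing "# description" and split name from value.
            fields = stripped.split("#", 1)[0].split()
            if len(fields) >= 2:
                try:
                    current[fields[0]] = float(fields[1])
                except ValueError:
                    pass  # skip entries whose value is not numeric
    return dumps


dumps = parse_stats(SAMPLE)
roi = dumps[-1]  # with one dump at the end of the ROI, this is the ROI block
for name, value in roi.items():
    print(f"{name:45s} {value}")
```

Returning a list of blocks keeps the reader usable when a run dumps statistics more than once; which block corresponds to the region of interest depends on where the reset and dump calls were placed.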

Tools and Software Used

gem5 for cycle-level architectural simulation.
C / C++ benchmark code with gem5 ROI instrumentation.
X86TimingSimpleCPU and X86MinorCPU models in gem5.
DDR3_1600_8x8, DDR3_2133_8x8, and LPDDR2_S4_1066_1x32 memory models.
GCC / g++ for benchmark compilation, as referenced in the assignment FAQ.
Linux-based lab environment / W135 machines for running gem5 experiments.

Applications

Computer Architecture Research: Studying the interaction between processor pipelines and memory subsystems.
Performance Bottleneck Analysis: Determining whether an application is CPU-bound or memory-bound under different architectural assumptions.
Hardware Design Exploration: Understanding how CPU and DRAM choices influence performance for benchmark workloads.

Impact and Recognition

This project provided a practical architecture-level view of CPU–memory interaction and showed that application performance is not determined by processor speed alone. The results indicated that MinorCPU responds more strongly to both clock and memory changes, while the benchmark overall was more memory-sensitive than frequency-sensitive. The work strengthened understanding of memory-bound behavior, pipeline sensitivity, and how workload access patterns influence processor performance in cycle-level simulation.
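
One simple way to express "more memory-sensitive than frequency-sensitive" is to compare the relative spread of execution time across each sweep. The sketch below is a hypothetical illustration of that comparison, not the metric used in the report, and the timing values are made-up placeholders rather than measured results.

```python
def relative_spread(times):
    """Spread of execution times relative to the fastest run: (max - min) / min."""
    fastest, slowest = min(times), max(times)
    return (slowest - fastest) / fastest


def more_sensitive_to(freq_times, mem_times):
    """Name the sweep whose execution times vary more in relative terms."""
    if relative_spread(mem_times) > relative_spread(freq_times):
        return "memory"
    return "frequency"


# Placeholder values for illustration only; NOT results from the report.
freq_times_ms = [10.0, 9.5, 9.2, 9.0, 8.9]  # one entry per clock setting
mem_times_ms = [9.0, 8.0, 11.0]             # one entry per DRAM type

print("frequency spread:", round(relative_spread(freq_times_ms), 3))
print("memory spread:   ", round(relative_spread(mem_times_ms), 3))
print("more sensitive to:", more_sensitive_to(freq_times_ms, mem_times_ms))
```

Normalizing by the fastest run makes the two sweeps comparable even though they contain different numbers of configurations.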
