Welcome to the multiprocessor assignment page of 5MD00 -- Advanced Computer Architecture. The purpose of this assignment is to get familiar with multiprocessor architectures and their programming models. State-of-the-art multicore processors may contain dozens of cores on a single die. The figures below show examples of such processors. The trend toward multicore poses new challenges to both computer architects and programmers. Putting hundreds of cores on a die is not difficult, but designing a memory hierarchy that keeps them busy is. Programming dozens of cores, in turn, requires programmers to think 'parallel'. In this assignment, we will try to tackle these challenges from the viewpoint of both computer architects and programmers.

               Die photo of 8-core Power7 (source: isscc'10)                                Die photo of 16-core Rainbow Falls (source: isscc'10)

           Die photo of 8-core Nehalem-EX (source: asscc'09)                                 Die photo of 48-core IA-32 (source: isscc'10)

In this lab you will be asked (after installing the required tools) to partition (parallelize) a C program using the well-known pthread library and run it on a multiprocessor simulator. We will look at different configurations, changing the number of processors and the level-1 and level-2 cache parameters (number of entries, block size and associativity).
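As a rough illustration of what partitioning with pthreads looks like, the sketch below splits an array sum into contiguous slices, one per thread. It is not the assignment's benchmark or the cookbook example; the names `slice_t`, `sum_slice`, `parallel_sum` and `MAX_THREADS` are hypothetical and chosen only for this example.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64  /* hypothetical upper bound for this sketch */

/* Work description for one thread: a contiguous slice of the input. */
typedef struct {
    const int *data;
    size_t begin;    /* first index of the slice (inclusive) */
    size_t end;      /* last index of the slice (exclusive) */
    long partial;    /* result written by the worker */
} slice_t;

/* Thread entry point: sum the elements of one slice. */
static void *sum_slice(void *arg)
{
    slice_t *s = (slice_t *)arg;
    long acc = 0;
    for (size_t i = s->begin; i < s->end; i++)
        acc += s->data[i];
    s->partial = acc;
    return NULL;
}

/* Sum data[0..n) using up to nthreads worker threads. */
long parallel_sum(const int *data, size_t n, int nthreads)
{
    pthread_t tid[MAX_THREADS];
    slice_t slice[MAX_THREADS];
    int created[MAX_THREADS];
    long total = 0;

    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;

    /* Static partitioning: equally sized chunks, rounded up. */
    size_t chunk = (n + (size_t)nthreads - 1) / (size_t)nthreads;

    for (int t = 0; t < nthreads; t++) {
        size_t begin = (size_t)t * chunk;
        size_t end = begin + chunk;
        if (begin > n) begin = n;
        if (end > n) end = n;
        slice[t].data = data;
        slice[t].begin = begin;
        slice[t].end = end;
        slice[t].partial = 0;
        created[t] = (pthread_create(&tid[t], NULL, sum_slice, &slice[t]) == 0);
        if (!created[t])
            sum_slice(&slice[t]);  /* fall back to the calling thread */
    }

    /* Wait for the workers and combine their partial results. */
    for (int t = 0; t < nthreads; t++) {
        if (created[t])
            pthread_join(tid[t], NULL);
        total += slice[t].partial;
    }
    return total;
}
```

Calling `parallel_sum(data, n, 4)` from `main` runs the sum on four threads; with GCC the program is typically linked with `-pthread`. Each thread writes only its own `partial` field, so no mutex is needed. Equal-sized slices balance the load only when every element costs about the same amount of work.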

We will use m5sim from the University of Michigan for the assignment and run it on top of Linux (so that is the first thing you have to install; see the instructions). You are asked to first go through the example program (our 'cookbook') and then perform the real assignment. You have to explore the multiprocessor architecture by changing the above-mentioned parameters, and produce performance-cost Pareto curves. Performance is determined by the total program execution time (counted in number of cycles). Cost is determined by the total area (for a certain technology). We will use a simple area model that counts only the total cache size and the number of cores, so we exclude costs such as tag size, bus and interconnect cost, etc. We will provide the numbers needed to calculate the area. For the cores you have two options, both based on the DEC Alpha ISA (instruction-set architecture): a simple in-order processor and a more advanced out-of-order engine.
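To make the performance-cost trade-off concrete, here is a minimal sketch of how simulated design points could be filtered down to a Pareto front. The linear area model (cores times an area per core, plus cache capacity times an area per KB) is an assumption based on the description above; the actual area numbers are the ones provided with the assignment. The names `design_t`, `design_area` and `mark_pareto` are hypothetical.

```c
#include <stddef.h>

/* One simulated configuration (hypothetical record layout). */
typedef struct {
    int cores;        /* number of cores */
    double cache_kb;  /* total cache capacity (all L1 + L2) in KB */
    double cycles;    /* total execution time reported by the simulator */
} design_t;

/* Assumed linear area model: cores and total cache size only. */
double design_area(const design_t *d, double core_area, double area_per_kb)
{
    return d->cores * core_area + d->cache_kb * area_per_kb;
}

/* a dominates b if it is no worse in both metrics and better in at least one. */
static int dominates(double area_a, double cyc_a, double area_b, double cyc_b)
{
    return area_a <= area_b && cyc_a <= cyc_b &&
           (area_a < area_b || cyc_a < cyc_b);
}

/* Set on_front[i] to 1 if design i is Pareto-optimal, 0 otherwise.
 * Returns the number of designs on the front. */
size_t mark_pareto(const design_t *d, size_t n,
                   double core_area, double area_per_kb, int *on_front)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        double area_i = design_area(&d[i], core_area, area_per_kb);
        on_front[i] = 1;
        for (size_t j = 0; j < n; j++) {
            if (j == i)
                continue;
            double area_j = design_area(&d[j], core_area, area_per_kb);
            if (dominates(area_j, d[j].cycles, area_i, d[i].cycles)) {
                on_front[i] = 0;  /* some other design is at least as good */
                break;
            }
        }
        count += (size_t)on_front[i];
    }
    return count;
}
```

The in-order and out-of-order cores would enter this model through different `core_area` values. The pairwise comparison is quadratic in the number of design points, which is fine for the few dozen configurations a typical exploration produces.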

Now turn to the install instructions, then run the example, and perform the assignment. You may also check the other links for helpful material.


 2010-01-19 Provide a naive pthread program (with source code) that achieves load-balancing, as described here.
Add a "Load-balancing" subsection in the cookbook example page to address this issue.
 2010-01-13 Update the interface script between m5sim and McPAT to version 0.3. Comments are added to "".
These comments indicate how to obtain statistics from m5sim as input to McPAT.
You can download the script "" from the attachment of the example page.
Please read the "README" file and "" file inside the zip package for details.
 2010-01-13 Deadline of the assignment is extended to 29 January 2010. 
 2010-01-12 Due to bugs in SD-VBS, we recommend that you use the disparity benchmark for the assignment. If you have already started with other SD-VBS benchmarks, you can continue to use them, but please keep in mind that their functionality may not be correct. Despite the bugs, you can still use the performance figures from those benchmarks. You can also come up with your own benchmark, in which case you should email Yifan He with a description of your benchmark.
 2009-12-26 Update the 5md00 version of sd-vbs to v0.4. The stitch benchmark is now working.
The preparation page is ready. You can start to install and test the tools.
The example page and assignment page are coming soon.


Yifan He
E-mail: y.he [at]
Office: PT 9.15 (Monday 9:45 ~ 10:45, Thursday 9:45 ~ 10:45)

Zhenyu Ye
E-mail: [at]