Homework 4

Deadline: 11th of December 2016.

Submit: by web form

General rules: read them

Problem description

Implement both the trivial and the efficient cache-oblivious algorithms for transposing n×n matrices whose elements are 32-bit integers. Measure the running time on your computer, and then use our cache simulator to determine the number of page faults of an idealized cache.

Cache simulator

Modify your program so that it prints out swapped elements instead of actually swapping elements. Pass this output to our cache simulator to determine the number of page faults of an idealized cache as presented during the lecture.

Download the simulator and start it with the following parameters:

  • B – size of a page;

  • C – number of pages in the cache.

The simulator reads commands for transposing a matrix on the standard input. Every row has one of the following forms:

  • N n – initialize an n×n matrix

  • X r1 s1 r2 s2 – swap elements (r1,s1) and (r2,s2), assuming that coordinates are numbered from 0 to n-1

  • E – terminate the transposition procedure

The output of the simulator is a text file describing results of the experiment:

  • Elements: the number of elements of the matrix

  • Accesses: the number of memory accesses

  • Misses: the number of memory transfers from the main memory to the cache

  • Missed-bytes: the total number of elements loaded to the cache

Use the following cache parameters for your experiments:

The simulator expects that the matrix is stored by rows and the first row starts at the beginning of a page.

Resulting graphs

Transpose matrices of sizes n×n where n = 64, 69, 74, 80, 87, 94, 101, 109, 118, 128, 138, 149, 161, 174, 188, 203, 219, 237, 256, 276, 298, 322, 348, 376, 406, 438, 474, 512, 552, 597, 645, 696, 752, 812, 877, 948, 1024, and so on, as much as you can store in the RAM. This sequence is generated by the function 2^{k/9} starting from k=54. Repeat the measurements a few times and determine the average for each n, so that the results are accurate.

The report is expected to study the following behaviour.

  • Compare the running time of the trivial and the cache-oblivious algorithms on a real computer. Also report on the parameters of the used computer (processor type and the size of cache).

  • Study how the number of page faults of both algorithms is influenced by the parameters of our idealized caches.

In all graphs, plot the number of page faults for a single swap of two elements versus the size of a matrix; that is, the size n is on the horizontal axis and the vertical axis determines the average time of swapping two elements of a matrix.