Computer Architecture: Computer Systems

Youngmoon Lee
Robotics Department

Week1-2: Introduction

Lecture1 Slide Lecture2 Slide

Why Do You Learn Computer Systems in Robotics Department?

  1. Modern cars, robots are computers

  2. Effectively design, program, and debug such systems

  3. Computer systems are continuously evolving

Great Ideas in Computer Systems

  1. Use abstraction to simplify programming and design

  2. Locality and Memory Hierarchy

  3. Performance via Pipelining, Parallelism, Prediction

  4. Dependability via redundancy

Week3: Bits and Instructions

Lecture3 Slide Lecture4 Slide

Lecture3 Recording(4/7) Lecture4 Recording(4/9)

Everything is 0s and 1s (a limited number of them)

  • Bit-level manipulations (bit-level operation, logic operation, shift)

  • Integer representation (2’s complement)

  • Integer sign extension

  • Integer addition, negation, multiplication, shifting

  • Byte Ordering (Little/Big Endians)
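The topics above can be sketched in a few lines of Python. This is only an illustration; the helper names and the 8/16-bit widths are chosen for the example, not taken from the course materials.

```python
# Sketch of two's-complement encoding, sign extension, and byte ordering.

def to_twos_complement(value, bits=8):
    """Encode a signed integer as an unsigned two's-complement bit pattern."""
    return value & ((1 << bits) - 1)

def sign_extend(pattern, from_bits, to_bits):
    """Widen a two's-complement pattern by replicating its sign bit."""
    if pattern & (1 << (from_bits - 1)):           # sign bit set?
        pattern |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return pattern

# -5 as an 8-bit pattern is 0b11111011 = 251.
x = to_twos_complement(-5, 8)          # 251

# Sign-extending to 16 bits keeps the value -5 (pattern 0xFFFB).
y = sign_extend(x, 8, 16)              # 65531 == 0xFFFB

# Byte ordering: the same 16-bit pattern laid out in memory both ways.
little = y.to_bytes(2, "little")       # b'\xfb\xff'
big    = y.to_bytes(2, "big")          # b'\xff\xfb'
```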

Machine Basics

  • History of Intel processors and architectures

  • C, assembly, machine code

  • Assembly Basics: Registers, operands, move

  • Intro to x86-64

Week4-5: Machine-level Programming (Assembly)

Lecture5 Slide Lecture6 Slide

Lecture5 Recording(4/14) Lecture6 Recording(4/16)

  • Instructions, Registers and Memory

  • Data movement

  • Arithmetic Operations

  • Conditional Branches and Loops

Week6: Functions and Data in Assembly Programming

Lecture7 Slide Lecture8 Slide
Lecture7 Recording (4/20 5PM) Lecture8 Recording

  • Functions and Procedures
    Stack Structure
    Calling Conventions
    Recursion

  • Arrays and Structures
    One and Multi-dimensional Arrays
    Structure Allocation, Access and Alignment

  • Memory Layout

Week7: Storage and Memory Hierarchy

Lecture9 Slide Holiday(4/30)
Lecture9 Recording (4/27 5PM)

  • Memory Hierarchy
    Registers, Cache, Main Memory, Disk, Remote Storage

Week8: Midterm Review

Holiday(4/30 Optional Make-up) Holiday(5/5 Optional Make-up)
Make-up Recording (4/30) Make-up Recording (5/5)

  • Make-up Lecture (Review of Week 5-7)

  • Make-up Lecture (Review of Week 3-4)


Midterm Project

Getting started

Your computer needs to be able to run Java and Python 3 scripts for this project. You may use Linux, Windows 10, or macOS.

If you run Windows 10, install Ubuntu by following this tutorial:
https://ubuntu.com/tutorials/tutorial-ubuntu-on-windows#1-overview
You will need to set up a sudo password during installation.

$sudo apt update
$sudo apt install python3
$sudo apt install default-jre
$sudo apt install unzip zip vim git
If asked, enter your sudo password.

Download the project files with git clone (and take this chance to learn git: https://rogerdudler.github.io/git-guide/index.ko.html).
Throughout the project, commit your changes with git commit -m "change message" so we can track them (at least once per task).

$git clone https://github.com/leeymcj/comuter-system.git
$cd comuter-system

List the project files:

$ls

You should see ex1.s, ex2.s, and factorial.s (the midterm project files), as well as argmax.s, dot.s, main.s, and matmul.s (where you will write your final project code).

You can start editing:

$vim argmax.s

Submission
Zip the entire project, including the git log:

$zip <MyName>.zip -r * .git
Then upload <MyName>.zip to the LMS homework submission.

Assembly Practice (4 tasks)

In previous courses (C programming), we dealt with C program files (.c file extension) and used the gcc compiler to run these higher-level language programs. Now, we are learning the RISC-V assembly language, a lower-level language much closer to machine code. For context, gcc takes the C code we write, first compiles it down to assembly code (e.g. x86, ARM), and then assembles that down to machine code/binary.

In this midterm task, we will write RISC-V assembly programs, each of which has a .s file extension. To run them, we will use Venus, a RISC-V simulator that you can find here. There is also a .jar version of Venus, which we have provided in the tools folder under your base lab repository.

Goals

  • Get an idea of how to translate C code to RISC-V (one of many assembly languages).

  • Practice running and debugging RISC-V assembly code.

  • Write RISC-V functions with the correct function calling procedure.

0. RISC-V

For now, you can ignore ecalls (environment calls), which transfer control to the OS (for input/output).

riscv.pdf

1. Translating from C to RISC-V

Open the files ex1.c and ex1.s. The assembly code provided (.s file) is a translation of the given C program into RISC-V.

Find/explain the following components of this assembly file.

  • The register representing the variable k.

  • The register representing the variable sum.

  • The registers acting as pointers to the source and dest arrays.

  • The assembly code for the loop found in the C code.

  • How the pointers are manipulated in the assembly code.

2. Get to know RISC-V and Venus Simulator

  1. Paste the contents of ex2.s into the Venus editor.

  2. Click the “Simulator” tab and click the “Assemble & Simulate from Editor” button. This will prepare the code you wrote for execution. If you click back to the “Editor” tab, your simulation will be reset.

  3. In the simulator, to execute the next instruction, click the “step” button.

  4. To undo an instruction, click the “prev” button.

  5. To run the program to completion, click the “run” button.

  6. To reset the program from the start, click the “reset” button.

  7. The contents of all 32 registers are shown on the right-hand side, and the console output is at the bottom.

  8. To view the contents of memory, click the “Memory” tab on the right. You can navigate to different portions of your memory using the dropdown menu at the bottom.

Paste the contents of ex1.s in Venus and record your answers to the following questions. Some of the questions will require you to run the RISC-V code using Venus’ simulator tab.

  1. What do the .data, .word, .text directives mean (i.e. what do you use them for)? Hint: think about the 4 sections of memory (we will learn in the lecture)

  2. Run the program to completion. What number did the program output? What does this number represent?

  3. At what address is n stored in memory? Hint: Look at the contents of the registers.

  4. Without actually editing the code (i.e. without going into the “Editor” tab), have the program calculate the 13th Fibonacci number (0-indexed) by manually modifying the value of a register. You may find it helpful to first step through the code. If you prefer to look at decimal values, change the “Display Settings” option at the bottom.

3. Factorial

In this exercise, you will be implementing the factorial function in RISC-V. This function takes in a single integer parameter n and returns n!. A stub of this function can be found in the file factorial.s.

You will only need to add instructions under the factorial label, and the argument that is passed into the function is configured to be located at the label n.
You may solve this problem using either recursion or iteration.

As a sanity check, make sure your function correctly returns 3! = 6, 7! = 5040, and 8! = 40320.
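As a cross-check for your own testing, the iterative approach can be sketched in Python. The RISC-V register names in the comments are just one possible assignment, not a requirement of the assignment's stub.

```python
# A minimal iterative reference for the factorial task, assuming n >= 0.
# Each line maps naturally onto RISC-V: an accumulator register, a counter
# register, and a loop with a conditional branch.

def factorial(n):
    result = 1          # e.g. keep the accumulator in t0
    while n > 1:        # conditional branch back to the loop top
        result *= n     # e.g. mul t0, t0, a0
        n -= 1          # e.g. addi a0, a0, -1
    return result
```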

You need to test this using the online version of Venus.

Week9: Memory Hierarchy and Locality

Lecture10 Slide Holiday(5/5)
Lecture10 Recording(5/7)

  • Locality: Temporal and Spatial Locality

Week10: Cache

Lecture11 Slide Lecture12 Slide
Lecture11 Recording(5/12) Lecture12 Recording(5/14)

  • Cache structure

  • Cache hit and miss analysis

  • Cache-Friendly Programming

Week11: OS Memory Management

Lecture13 Slide Lecture14Slide
Lecture13 Recording(5/19) Lecture14 Recording(5/21)

  • Virtual Memory

  • VM for caching, programming, memory protection

  • Process: fork, wait, execve

Week12: Concurrent Programming

Lecture15 Lecture16
Lecture15 Recording(5/26) Lecture16 Recording(5/28)

  • Process and Thread

  • Sharing

  • Synchronization (Mutex)

Week13: Parallel and Cloud Computing

Lecture17 Lecture18
Lecture17 Recording(6/2) Lecture18 Recording(6/4)

  • Race, Mutex, Semaphore

  • Parallel Speedup and Amdahl's Law

  • MapReduce

Week14: Industry-Coupled Problems

Lecture19 Lecture20
Lecture19 Recording(6/9) Lecture20 Recording(6/11)

  • Real-World Problems in Industry

  • Artificial Intelligence

  • Autonomous cars, IoTs

Week15: Everything We Learn

Lecture21
Lecture21 Recording(6/16)

  • Final Review

  • Final Exam (6/18)

Final Project

The purpose of the IC-PBL project is to help you understand how a machine learning system (or any C program) actually runs on computer hardware. To achieve this, you will implement such a system in the RISC-V assembly language.

You will learn how to use registers efficiently, write functions, use calling conventions for calling your functions, as well as external ones, allocate memory on the stack and heap, work with pointers and more!

To make the project more interesting, you will implement functions which operate on matrices and vectors – for example, matrix multiplication. You will then use these functions to construct a simple Artificial Neural Net (ANN), which will be able to classify handwritten digits to their actual number! You will see that ANNs can be simply implemented using basic numerical operations, such as vector inner product, matrix multiplications, and thresholding.

Neural Networks

A neural network tries to approximate a (non-linear) function that maps your input to a desired output. A basic neuron computes a weighted linear combination of its inputs, followed by a non-linearity, for example a threshold. Consider the neural network on the right, which implements the logical AND operation:

It is easy to see that for A=0, B=0, the linear combination 0∗0.6+0∗0.6=0 is less than the threshold of 1 and results in a 0 output. With an input A=0, B=1 or A=1, B=0, the linear combination is 1∗0.6+0∗0.6=0.6, which is less than 1 and also results in a 0 output.

Similarly, A=1, B=1 results in 1∗0.6+1∗0.6=1.2, which is greater than the threshold and results in a 1 output! What is interesting is that this simple neuron operation can also be described as an inner product between the vector [A,B] and the weight vector [0.6,0.6], followed by a thresholding (non-linear) operation.
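The AND neuron above can be written out in a few lines of Python. The `neuron` helper is purely illustrative (a threshold of exactly 1 is assumed to fire).

```python
# Sketch of the AND neuron described above: an inner product of the input
# [A, B] with the weight vector [0.6, 0.6], then a threshold at 1.

def neuron(inputs, weights, threshold=1.0):
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1 if s >= threshold else 0

for a in (0, 1):
    for b in (0, 1):
        print(a, b, neuron([a, b], [0.6, 0.6]))  # 1 only when a == b == 1
```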

More complex functions cannot be described by a single neuron alone. We can extend the system into a network of neurons in order to approximate such functions. For example, the following two-layer network approximates the logical function XOR:

On the left is a two-layer network. The network takes 2 inputs, computes 2 intermediate values, and finally computes a single final output. It can be written as matrix multiplications with matrices m_0 and m_1, with thresholding operations in between, as shown below. Convince yourself that this implements XOR for the appropriate inputs!
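Since the figure's actual m_0 and m_1 are not reproduced here, the sketch below uses one hypothetical set of weights that implements XOR with the same two-layer, threshold-at-1 structure: hidden unit 1 computes OR, hidden unit 2 computes AND, and the output fires when OR holds but AND does not.

```python
# Hypothetical two-layer XOR network (weights chosen for illustration only).

def step(s, threshold=1.0):
    return 1 if s >= threshold else 0

m0 = [[1.0, 1.0],     # hidden unit 1: fires when A + B >= 1  (OR)
      [0.6, 0.6]]     # hidden unit 2: fires when 0.6A + 0.6B >= 1  (AND)
m1 = [1.0, -1.0]      # output: fires when OR - AND >= 1

def xor(a, b):
    hidden = [step(row[0] * a + row[1] * b) for row in m0]
    return step(m1[0] * hidden[0] + m1[1] * hidden[1])
```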

You are probably wondering how the weights of the network were determined. That is beyond the scope of this project, and we encourage you to take advanced classes in numerical linear algebra, signal processing, machine learning, and optimization. We will only say that the weights can be trained by giving the network pairs of correct inputs and outputs and adjusting the weights so that the error between the network's outputs and the correct outputs is minimized. Learning the weights is called “training”; using the weights on new inputs is called “inference”. We will only perform inference, and you will be given weights that were pre-trained by your dedicated TAs.

Handwritten Digit Classification

In this project we will implement a similar, but slightly more complex, network that can classify handwritten digits. As input, we will use the MNIST dataset, which contains 60,000 28x28 images of handwritten digits ranging from 0 to 9. We will treat these images as “flattened” input vectors of size 784x1. As in the example above, we will perform matrix multiplications with pre-trained weight matrices m_0 and m_1. Instead of thresholding, we will use two different non-linearities: the ReLU and ArgMax functions. Details are provided below.

Matrix Format

In this project, all two-dimensional matrices will be stored as a one-dimensional vector in row-major order. One way to think about it is that we can create a 1D vector from a 2D matrix by concatenating together all the rows in the matrix. Alternatively, we could concatenate all the columns together instead, which is known as column-major order. For a more in-depth look at row-major vs. column-major order, see this Wikipedia page.
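Row-major layout can be made concrete with a short Python sketch (the matrix values here are arbitrary): element (i, j) of an n-by-m matrix lands at flat index i * m + j.

```python
# Flattening a 2D matrix into a row-major 1D vector.

matrix = [[1, 2, 3],
          [4, 5, 6]]          # 2 x 3

flat = [x for row in matrix for x in row]   # [1, 2, 3, 4, 5, 6]

n, m = 2, 3
i, j = 1, 2
assert flat[i * m + j] == matrix[i][j]      # element (1, 2) is at index 5
```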

Vector/Array Strides (used for matrix multiplication, which consists of dot products!)

The stride of a vector is the number of memory locations between consecutive elements of our vector, measured in the size of our vector’s elements. If our stride is n, then the memory addresses of our vector elements are n * sizeof(element) bytes apart.

So far, all the arrays/vectors we’ve worked with have had stride 1, meaning there is no gap between consecutive elements. Now, to do the row * column dot products with our row-major matrices that we’ll need for matrix multiplication, we will need to consider vectors with varying strides. Specifically, we’ll need to do this when considering a column vector in a flattened, row-major representation of a 2D matrix.

Let’s take a look at a practical example. We have the vector int *a with 3 elements.

  • If the stride is 1, then our vector elements are *(a), *(a + 1), and *(a + 2), in other words a[0], a[1], and a[2].

  • However, if our stride is 4, then our elements are at *(a), *(a + 4), and *(a + 8) or in other words a[0], a[4], and a[8].

To summarize in C code, to access the ith element of a vector int *a with stride s, we use *(a + i * s), or a[i * s]. We leave it up to you to translate this memory access into RISC-V.

For a closer look at strides in vectors/arrays, see this Wikipedia page.
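To make the stride arithmetic concrete, here is a small Python sketch mirroring the C expression above. The `strided` helper is illustrative, not part of the project code.

```python
# Strided access sketch: the i-th element of a stride-s view over a flat
# array sits at index i * s, mirroring the C expression *(a + i * s).

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]

def strided(a, length, stride, base=0):
    return [a[base + i * stride] for i in range(length)]

print(strided(a, 3, 1))   # [1, 2, 3]  -- a[0], a[1], a[2]
print(strided(a, 3, 4))   # [1, 5, 9]  -- a[0], a[4], a[8]
```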

Task 1: ReLU

In relu.s, implement our relu function to apply the mathematical ReLU function on every element of the input array. This ReLU function is defined as

ReLU(a)=max(a,0)

and applying it elementwise on our matrix is equivalent to setting every negative value equal to 0.

Additionally, notice that our relu function operates on a 1D vector, not a 2D matrix. We can do this because we’re applying the function individually on every element of the matrix, and our 2D matrix is stored in memory as a row-major 1D vector.
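For reference, the intended behavior can be sketched in Python. This is only a model of the function you must write in RISC-V, not project code.

```python
# Elementwise ReLU over the flattened, row-major matrix: every negative
# value becomes 0, everything else is unchanged.

def relu(v):
    return [max(x, 0) for x in v]

print(relu([-1, 5, 0, -7, 3]))   # [0, 5, 0, 0, 3]
```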

Testing: ReLU
We’ve provided test_relu.s for you to test your relu function. In it, you can set the values and dimensions of a matrix in static memory. Running the file will print that matrix before and after calling your relu function on it.
You can first work on the web version of Venus for coding and debugging.
For testing, however, we need multiple files (test_relu.s will call relu.s), which is not convenient in the web interface.
So run venus.jar on your local machine, and copy your code into the local relu.s file to test:

$java -jar venus.jar test_files/test_relu.s

Task 2: ArgMax

Near the end of our neural net, we’ll be provided with scores for every possible classification. For MNIST, we’ll be given a vector of length 10 containing scores for every digit ranging from 0 to 9. The larger the score for a digit, the more confident we are that our handwritten input image contained that digit. Thus, to classify our handwritten image, we pick the digit with the highest score.

The score for the digit i is stored in the i-th element of the array, so to pick the digit with the highest score we find the array index with the highest value. In argmax.s, implement the argmax function to return the index of the largest element in the array. If there are multiple largest elements, return the smallest index.

Additionally, note that just like relu, this function takes in a 1D vector and not a 2D matrix. The index you’re expected to return is the index of the largest element in this 1D vector.
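A Python sketch of the required behavior, including the tie-breaking rule (again only a model of the RISC-V function you must write): using a strict "greater than" comparison keeps the earliest maximum for free.

```python
# Argmax with the smallest-index tie-breaking rule described above.

def argmax(v):
    best = 0
    for i in range(1, len(v)):
        if v[i] > v[best]:   # strictly greater, so earlier ties are kept
            best = i
    return best

print(argmax([3, 9, 9, 1]))   # 1, not 2: first of the tied maxima
```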

Testing: Argmax

We’ve provided test_argmax.s for you to test your argmax function. You can edit it to set a static vector v0 along with its length, and then run the file to print the output returned by running your function, which should be the index of the largest element in v0.
$java -jar venus.jar test_files/test_argmax.s

Task 3: Matrix Multiplication

Task 3.1: Dot Product

In dot.s, implement the dot function to compute the dot product of two integer vectors. The dot product of two vectors a and b is defined as

dot(a,b)=∑ai ∗ bi = a0∗b0+a1∗b1+⋯+an−1∗bn−1

where ai is the i-th element of a.

Notice that this function takes in a stride as a variable for each of the two vectors; make sure you’re considering this when calculating your memory addresses. We’ve described strides in more detail in the background section above, which also contains a detailed example of how stride affects the memory addresses of vector elements.

Also note that we do not expect you to handle overflow when multiplying. This means you won’t need to use the mulh instruction.

For a closer look at dot products, see this Wikipedia page.

Testing: Dot Product

This time, you’ll need to fill out test_dot.s, using the starter code and comments we’ve provided.

$java -jar venus.jar test_files/test_dot.s

Overall, this test should call your dot product on two vectors in static memory and print the result. Feel free to look at test_argmax.s and test_relu.s for reference. The test functions themselves won’t be graded, but we only provide a basic sanity test on the autograder, and the other tests will be hidden from you until after grades are published, so testing your code yourself is crucial.

By default, in the starter code we’ve provided, v0 and v1 point to the start of an array of the integers 1 to 9, contiguous in memory. Let’s assume we set the length and stride of both vectors to 9 and 1, respectively. We should get the following:

v0 = [1, 2, 3, 4, 5, 6, 7, 8, 9]

v1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]

dot(v0, v1) = 1 * 1 + 2 * 2 + ... + 9 * 9 = 285

What if we changed the length to 3 and the stride of the second vector v1 to 2, without changing the values in static memory? Now, the vectors contain the following:

v0 = [1, 2, 3]

v1 = [1, 3, 5]

dot(v0, v1) = 1 * 1 + 2 * 3 + 3 * 5 = 22

Note that v1 now has stride 2, so we skip over elements in memory when calculating the dot product. However, the pointer v1 still points to the same place as before: the start of the sequence of integers 1 to 9 in memory.
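The two worked examples above can be cross-checked with a short Python sketch of a strided dot product (a model of the RISC-V `dot` you will write, with an argument order chosen for illustration):

```python
# Strided dot product: element i of each vector lives at index i * stride.

def dot(a, b, length, stride_a, stride_b):
    return sum(a[i * stride_a] * b[i * stride_b] for i in range(length))

v = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(dot(v, v, 9, 1, 1))   # 285: 1*1 + 2*2 + ... + 9*9
print(dot(v, v, 3, 1, 2))   # 22:  1*1 + 2*3 + 3*5
```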

Task 3.2: Matrix Multiplication

Now that we have a dot product function that can take in varying strides, we can use it to do matrix multiplication. In matmul.s, implement the matmul function to compute the matrix multiplication of two matrices.

The matrix multiplication of two matrices A and B results in the output matrix

C=AB,

where C[i][j] is equal to the dot product of the i-th row of A and the j-th column of B. Note that if the dimensions of A are (n × m) and the dimensions of B are (m × k), then the dimensions of C must be (n × k). Additionally, unlike integer multiplication, matrix multiplication is not commutative: in general, AB != BA.

Documentation on the function has been provided in the comments. A pointer to the output matrix is passed in as an argument, so you don’t need to allocate any memory for it in this function. Additionally, note that m0 is the left matrix, and m1 is the right matrix.

Note that since we’re taking the dot product of the rows of m0 with the columns of m1, the lengths of the two must be the same. If this is not the case, you should exit the program with exit code 2. This code has been provided for you under the mismatched_dimensions label at the bottom of the file, so all you need to do is jump to that label when a row of m0 and a column of m1 do not have the same length.

A critical part of this function, apart from having a correctly implemented dot product, is passing in the correct row and column vectors to the dot product function, along with their corresponding strides. Since our matrices are in row-major order, all the elements of any single row are contiguous to each other in memory, and have stride 1. However, this will not be the case for columns. You will need to figure out the starting element and stride of each column in the right matrix.

For a closer look at matrix multiplication, see this Wikipedia page.
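Putting the pieces together, here is a Python sketch of the whole computation. The argument order (matrix, rows, cols) is illustrative and may differ from the signature documented in matmul.s; the key idea is that rows of m0 have stride 1, while column j of m1 starts at offset j and has stride k (m1's row length).

```python
# Matrix multiplication built on a strided dot product, over flat
# row-major matrices: m0 is n x m, m1 is m2 x k, output is n x k.

def dot(a, b, length, stride_a, stride_b):
    return sum(a[i * stride_a] * b[i * stride_b] for i in range(length))

def matmul(m0, n, m, m1, m2, k):
    assert m == m2, "inner dimensions must match (exit code 2 in matmul.s)"
    out = [0] * (n * k)
    for i in range(n):
        for j in range(k):
            row = m0[i * m:]        # i-th row of m0, stride 1
            col = m1[j:]            # j-th column of m1, stride k
            out[i * k + j] = dot(row, col, m, 1, k)
    return out

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]    # 3 x 3
print(matmul(a, 3, 3, a, 3, 3))    # [30, 36, 42, 66, 81, 96, 102, 126, 150]
```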

Testing: Matrix Multiplication

Fill out the starter code in test_matmul.s to test your matrix multiplication function. The completed test file should let you set the values and dimensions of two matrices in .data as 1D vectors in row-major order. When run, it should print the result of your matrix multiplication.
$java -jar venus.jar test_files/test_matmul.s

Note that you’ll need to allocate space for an output matrix as well. The starter code does this by creating a third matrix in static memory.

For testing complicated inputs, you can use any available tool to help you calculate the expected output (numpy, wolfram, online matrix calculator).

m0 = [1, 2, 3,
      4, 5, 6,
      7, 8, 9]

m1 = [1, 2, 3,
      4, 5, 6,
      7, 8, 9]

matmul(m0, m1) =
    [ 30,  36,  42,
      66,  81,  96,
     102, 126, 150]

Testing Your Neural Network

First, we’ve provided several smaller input networks for you to run your main function on. simple0, simple1, and simple2 are all smaller inputs that will be easier to debug.

To test on the first input in simple0 for example, run the following:
$java -jar venus.jar main.s -ms 10000000 inputs/simple0/bin/m0.bin inputs/simple0/bin/m1.bin inputs/simple0/bin/inputs/input0.bin output.bin

You can then convert the written file to plaintext and check that its values are correct, and that the printed integer is indeed the index of the largest element in the output file.
$python convert.py --to-ascii output.bin output.txt

For verifying that the output file itself is correct, you can run the inputs through a matrix multiplication calculator, which allows you to click “insert” and copy/paste directly from your plaintext matrix file. Make sure you manually set negative values to zero for the ReLU step.

Note that these files cover a variety of dimensions. For example, the simple2 inputs have more than one column, meaning that your “scores” matrix will also have more than one column. Your code should still work as expected in this case, writing the matrix of “scores” to a file and printing a single integer: the row-major index of the largest element of that matrix.

MNIST Hand-written number

All the files for testing the mnist network are contained in inputs/mnist. There are both binary and plaintext versions of m0, m1, and 9 input files.

To test on the first input file for example, run the following:
$java -jar venus.jar main.s -ms 10000000 inputs/mnist/bin/m0.bin inputs/mnist/bin/m1.bin inputs/mnist/bin/inputs/mnist_input0.bin output.bin

This should write a binary matrix file output.bin containing your scores for each digit, and print out the digit with the highest score. You can check the digit printed by main against the plaintext label for each input file in the mnist/txt/labels folder; for the first input, compare against inputs/mnist/txt/labels/label0.txt.

We’ve also included a script inputs/mnist/txt/print_mnist.py, which will allow you to view the actual image for every mnist input. For example, you can run the following command from the directory inputs/mnist/txt to print the actual image for mnist_input8 as ASCII art alongside the true label.

$python print_mnist.py 8

Your own hand-written number

Just for fun, you can also draw your own handwritten digits and pass them to the neural net. First, open up any basic drawing program like Microsoft Paint. Next, resize the image to 28x28 pixels, draw your digit, and save it as a .bmp file in the directory /inputs/mnist/student_inputs/.

Inside that directory, we’ve provided bmp_to_bin.py to turn this .bmp file into a .bin file for the neural net, as well as an example.bmp file. To convert it, run the following from inside the /inputs/mnist/student_inputs directory:

$python bmp_to_bin.py example

This will read in the example.bmp file, and create an example.bin file. We can then input it into our neural net, alongside the provided m0 and m1 matrices.

$java -jar venus.jar main.s -ms 10000000 -it inputs/mnist/bin/m0.bin inputs/mnist/bin/m1.bin inputs/mnist/student_inputs/example.bin output.bin

You can convert and run your own .bmp files in the same way. You should be able to achieve a reasonable accuracy with your own input images.

Submission
Zip the entire project, including the git log:

$zip <MyName>.zip -r * .git
Then upload <MyName>.zip to the LMS homework submission.