Lab 5

Let's Learn Some Stuff with These Machines

Extra credit date (+3%): September 20, 2023

Due date: September 24, 2023

Pre-Lab / Motivation

As some quick inspiration, let's check out what machine learning can do and how easy modern frameworks make things. We'll use TensorFlow to do some digit recognition on the MNIST dataset.

Note: This section should be turnkey. It should not take more than ten minutes.

Setup: Install TensorFlow

First, we'll need to get TensorFlow on the Jetson:

$ sudo apt-get install libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev liblapack-dev libblas-dev gfortran

$ sudo apt-get install python3-pip

$ sudo pip3 install -U pip

$ sudo pip3 install -U pip testresources setuptools numpy==1.16.1 future==0.17.1 mock==3.0.5 h5py==2.9.0 keras_preprocessing==1.0.5 keras_applications==1.0.8 gast==0.2.2 futures protobuf pybind11

$ sudo pip3 install --pre --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v44 tensorflow==2.3.1+nv20.12

Run TensorFlow

Now, download an example implementation:

You should be able to simply run this example:

$ python3 tf_mnist_example.py

Note: You will need network access the first time this runs, as it will download the MNIST dataset.

Take a quick peek inside the tf_mnist_example.py file and see if you can get a sense of what it's doing.

For the rest of the lab, we're going to implement (a tiny corner of) what TensorFlow does for you.

Lab: Learning a Circuit

Today, we'll be looking into how (performant) machine learning is implemented.

It will be a lot of work for a very simple network: we are going to teach a computer to learn how to implement an AND gate :).

Background and Resources

Here are intro slides for today's lab.

Here is the lab workbook. The beginning of the workbook explains all of the key concepts needed to implement Batch Gradient Descent. Read through this lightly once; it's a bit dense. Once you get to the Implementation section, we start filling in code. You'll probably need to re-read the relevant sections of the workbook once you start trying to implement each function.

Building a neural network

First, download the starter code: https://classroom.github.com/a/EqFGU4-R

We dropped a fair amount of starter code on you. Take a moment to read through everything to get a sense of what this code is trying to do. By the time we're all done, you'll have implemented:


sigmoid_activation.cu::sigmoidActivationForward()

sigmoid_activation.cu::sigmoidActivationBackprop()

mse_cost.cu::meanSquaredErrorCost()

mse_cost.cu::dMeanSquaredErrorCost()

linear_layer.cu::linearLayerForward()

linear_layer.cu::linearLayerBackprop()

linear_layer.cu::linearLayerUpdateWeights()

linear_layer.cu::linearLayerUpdateBias()


main.cu:: // TODO network structure

    nn.addLayer(...);

    nn.addLayer(...);

    nn.addLayer(...);

    nn.addLayer(...);

mse_cost.cu

The first thing we'll implement is the cost function. We'll need both the cost itself and the derivative of the cost.

Both functions start by computing an index into the input matrices. Nothing special here; it's the same index computation you've been doing with CUDA for a while:

int index = blockIdx.x * blockDim.x + threadIdx.x;

You also need to handle the case where the input dimensions don't line up with the thread-grid structure. That is, sometimes the index will be out of range, so you'll want to wrap everything with:


if (index < size) {

    ...

}


After that, the functions differ.

For cost, you'll want to first compute the partial cost at this particular index. Then add that (mind the hint in the source!) to the single, shared, global total cost.

The derivative is simpler, as it's element-wise: each thread writes one result into eA[index].
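If you get stuck, here is a minimal sketch of what the two kernels might look like. The parameter names (predictions, target, cost) and the 1/size scaling are illustrative assumptions; match them to the signatures and hints in the starter code.

__global__ void meanSquaredErrorCost(float* predictions, float* target,
                                     int size, float* cost) {
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < size) {
        // Partial cost contributed by this element.
        float diff = predictions[index] - target[index];
        // Many threads add into the one shared total, so the add must be atomic.
        atomicAdd(cost, diff * diff / size);
    }
}

__global__ void dMeanSquaredErrorCost(float* predictions, float* target,
                                      float* eA, int size) {
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < size) {
        // Element-wise derivative of the squared error w.r.t. the prediction.
        eA[index] = 2.0f * (predictions[index] - target[index]) / size;
    }
}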

sigmoid_activation.cu

Next up is the activation function. We'll need this to go both forward (inference) and backwards (training feedback).

The structure of these methods will look a lot like your MSE methods (though, when is index valid now?). The forward path is quite direct: you simply need to compute the sigmoid (this function is given already) for each input. Back-propagation requires a bit more work, but can still be a one-liner.
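For reference, here is a sketch of what the pair might look like. It assumes the provided sigmoid() device helper and that Z, A, dA, and dZ each hold Z_x_dim * Z_y_dim elements; the names are assumptions, so adapt them to the starter-code signatures.

__global__ void sigmoidActivationForward(float* Z, float* A,
                                         int Z_x_dim, int Z_y_dim) {
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < Z_x_dim * Z_y_dim) {
        // Element-wise forward pass: A = sigmoid(Z).
        A[index] = sigmoid(Z[index]);
    }
}

__global__ void sigmoidActivationBackprop(float* Z, float* dA, float* dZ,
                                          int Z_x_dim, int Z_y_dim) {
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < Z_x_dim * Z_y_dim) {
        // Chain rule: dZ = dA * sigmoid(Z) * (1 - sigmoid(Z)).
        float s = sigmoid(Z[index]);
        dZ[index] = dA[index] * s * (1.0f - s);
    }
}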

linear_layer.cu

Here's where we'll write the most code. We'll need to implement all the methods that make a linear layer work.

Start by computing the forward path:


// No math needed to compute these... just some reasoning.

int output_rows = ... ;

int output_cols = ... ;
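For reference, here is one way the forward pass Z = W*A + b could look. It assumes row-major storage, that *_x_dim counts columns and *_y_dim counts rows, and a 2D thread grid; these names and conventions are assumptions, so check them against the starter code.

__global__ void linearLayerForward(float* W, float* A, float* Z, float* b,
                                   int W_x_dim, int W_y_dim,
                                   int A_x_dim, int A_y_dim) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;

    // The output shape follows from the matrix product.
    int output_rows = W_y_dim;   // one row per neuron in this layer
    int output_cols = A_x_dim;   // one column per sample in the batch

    if (row < output_rows && col < output_cols) {
        float value = 0.0f;
        for (int i = 0; i < W_x_dim; i++) {
            value += W[row * W_x_dim + i] * A[i * A_x_dim + col];
        }
        Z[row * output_cols + col] = value + b[row];
    }
}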

Next, implement backprop – this will look a lot like the forward path.
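Concretely, this step computes dA = W^T * dZ. A sketch under the same assumed conventions:

__global__ void linearLayerBackprop(float* W, float* dZ, float* dA,
                                    int W_x_dim, int W_y_dim,
                                    int dZ_x_dim, int dZ_y_dim) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;

    // dA = W^T * dZ
    int dA_rows = W_x_dim;    // one row per input to this layer
    int dA_cols = dZ_x_dim;   // one column per sample in the batch

    if (row < dA_rows && col < dA_cols) {
        float value = 0.0f;
        for (int i = 0; i < W_y_dim; i++) {
            value += W[i * W_x_dim + row] * dZ[i * dZ_x_dim + col];
        }
        dA[row * dA_cols + col] = value;
    }
}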

Up next is weight updates. Again, the control flow looks pretty similar.
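The update applies dW = (1/m) * dZ * A^T, where m is the batch size, and then steps W against the gradient. A sketch with an assumed learning_rate parameter:

__global__ void linearLayerUpdateWeights(float* dZ, float* A, float* W,
                                         int dZ_x_dim, int dZ_y_dim,
                                         int A_x_dim, int A_y_dim,
                                         float learning_rate) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;

    // W has one row per neuron and one column per input.
    int W_rows = dZ_y_dim;
    int W_cols = A_y_dim;

    if (row < W_rows && col < W_cols) {
        // dW[row][col] = (1/m) * sum over the batch of dZ[row][i] * A[col][i].
        float dW_value = 0.0f;
        for (int i = 0; i < dZ_x_dim; i++) {
            dW_value += dZ[row * dZ_x_dim + i] * A[col * A_x_dim + i];
        }
        W[row * W_cols + col] -= learning_rate * (dW_value / A_x_dim);
    }
}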

Last thing to do is update biases. This one looks a little different, and our old friend atomicAdd will probably need to make an appearance.
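One thread per element of dZ works well here: every element in a row contributes to the same bias entry, which is why the accumulation has to be atomic. A sketch with assumed names:

__global__ void linearLayerUpdateBias(float* dZ, float* b,
                                      int dZ_x_dim, int dZ_y_dim,
                                      float learning_rate) {
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < dZ_x_dim * dZ_y_dim) {
        int dZ_row = index / dZ_x_dim;   // which neuron, i.e. which bias entry
        // Average the gradient over the batch and step against it;
        // many threads touch the same b[dZ_row], so the add must be atomic.
        atomicAdd(&b[dZ_row], -learning_rate * (dZ[index] / dZ_x_dim));
    }
}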

main.cu

Once everything is implemented, we need to hook it all together!

We'll put together a pretty simple network: two linear layers, each followed by our simple sigmoid activation function.

It's useful to keep named references to the two linear layers, as we can print their values to debug things. Here's the whole network:


LinearLayer ll1 = LinearLayer("linear_1", Shape(2, 2));

nn.addLayer(&ll1);

nn.addLayer(new SigmoidActivation("sigmoid_1"));

LinearLayer ll2 = LinearLayer("linear_2", Shape(2, 1));

nn.addLayer(&ll2);

nn.addLayer(new SigmoidActivation("sigmoid_2"));

And you're done!

Training

Once you have everything implemented and the network architecture created, it's time to run training.

In dataset.cu, comment out XNOR_GATE to create the AND gate dataset, and uncomment it to create the XNOR dataset.
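For example (illustrative only; the exact line in dataset.cu may look a little different):

// In dataset.cu:
#define XNOR_GATE   // leave this in for the XNOR dataset,
                    // comment it out for the AND gate dataset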

Deliverables

Assignment

Our final assignment is open-ended. We would like you to explore a little bit and try to build something fun of your own interest (that uses a GPU, of course!).

There are two options, and you are welcome to do whichever you prefer.

Option A: Jetson AI Certification

NVIDIA will certify you as a "Jetson AI Specialist" if you complete a small, original project using the Jetson. This might be a fun additional certification to add to your collection if it is something you are interested in.

The details of the program are here.

For this option, you must submit to us the same materials you will submit to NVIDIA for certification (see the "Hands-On, Project-Based Assessment" section at the bottom of the page).

You won't hear back from NVIDIA until after the end of the term, so don't worry about that; we will grade your project independently.

Option B: Research Replication

Modern machine learning research has been rapidly improving its artifact dissemination and documentation practices since the rise in attention to the "ML replication crisis". As a result, most interesting new papers have datasets and processing pipelines available in public repositories.

For this option, pick a recent (say, within the last 5-10 years) paper and perform a replication study. That is, you should run their experiments on your hardware and compare the results. Unless they happened to run on the same TX2 hardware, the absolute performance will likely be different, but the relative performance and trends should largely hold.

In your writeup, explain briefly the goal of the study that you chose to reproduce, as well as any obstacles you had to overcome to get their artifact running in your environment. Discuss how your results compare to the original results and try to explain when things diverge.

What to Submit

Prepare a report document with answers for each of the Report Deliverables above.

Lab (Neural Network)

Assignment

Option 1

For the Jetson AI Certification, we will be evaluating the same criteria that NVIDIA will be reviewing. The requirements are reproduced below for your convenience.

Option 2