This assignment is an introduction to parallel programming using a shared memory model.
In this assignment, we will be parallelizing a toy particle simulation (similar simulations are used in mechanics, biology, and astronomy). In our simulation, particles interact by repelling one another. A run of our simulation is shown here:
The particles repel one another, but only when they are closer than a cutoff distance, which is shown in grey around one particle.
If we were to naively compute the forces on the particles by iterating through every pair of particles, then we would expect the asymptotic complexity of our simulation to be O(n^2).
However, in our simulation, we have chosen a particle density low enough that with n particles we expect only O(n) interactions. An efficient implementation can reach this time complexity. The first part of your assignment is to implement this linear-time solution in serial code, starting from the provided naive O(n^2) implementation.
Suppose we have a code that runs in time T = O(n) on a single processor. Then we'd hope to run close to time T/p when using p processors. After implementing an efficient serial O(n) solution, you will attempt to reach this speedup using OpenMP.
Dear remote students, we are thrilled to be a part of your parallel computing learning experience and to share these resources with you! To avoid confusion, please note that the assignment instructions, deadlines, and other assignment details posted here were designed for the local students. You should check with your local instruction team about submission, deadlines, job-running details, etc. and utilize Moodle for questions. With that in mind, the problem statement, source code, and references should still help you get started (just beware of institution-specific instructions). Best of luck and we hope you enjoy the assignment!
You are responsible for finding a group. You may work in groups of 2 or 3. If you choose to work in a group of 3, one person in your group should be a non-EECS/CS student; teams of 2 have no such restriction. After you have chosen a group, please sign up for it on bCourses (use the HW2 group set). Note that you may work individually or in a team for this assignment, but either way you must sign up for a group on bCourses.
The starter code is available on Bitbucket at https://bitbucket.org/Berkeley-CS267/hw2-1.git and should work out of the box. To get started, we recommend that you log in to Cori and clone the first part of the assignment. This will look something like the following:
student@local:~> ssh demmel@cori.nersc.gov
student@cori04:~> git clone https://bitbucket.org/Berkeley-CS267/hw2-1.git
student@cori04:~> cd hw2-1
student@cori04:~/hw2-1> ls
CMakeLists.txt common.h job-openmp job-serial main.cpp openmp.cpp serial.cpp
There are seven files in the base repository. Their purposes are as follows:
CMakeLists.txt
The build system that manages compiling your code.
main.cpp
A driver program that runs your code.
common.h
A header file with shared declarations.
job-openmp
A sample job script to run the OpenMP executable.
job-serial
A sample job script to run the serial executable.
serial.cpp (you may modify this file)
A simple O(n^2) particle simulation algorithm. It is your job to write an O(n) serial algorithm within the simulate_one_step function.
openmp.cpp (you may modify this file)
A skeleton file in which you will implement your OpenMP simulation algorithm. It is your job to write an algorithm within the simulate_one_step function.
Please do not modify any of the files besides serial.cpp and openmp.cpp.
First, we need to make sure that the CMake module is loaded and that the GNU compiler is selected.
student@cori04:~/hw2-1> module load cmake
student@cori04:~/hw2-1> module swap PrgEnv-intel PrgEnv-gnu
You should put these commands in your ~/.bash_profile.ext file to avoid typing them every time you log in.
Next, let's build the code. CMake prefers out-of-tree builds, so we start by creating a build directory.
student@cori04:~/hw2-1> mkdir build
student@cori04:~/hw2-1> cd build
student@cori04:~/hw2-1/build>
Next, we have to configure our build. We can build our code in either Debug mode or Release mode. In Debug mode, optimizations are disabled and debug symbols are embedded in the binary for easier debugging with GDB. In Release mode, optimizations are enabled and debug symbols are omitted. For example:
student@cori04:~/hw2-1/build> cmake -DCMAKE_BUILD_TYPE=Release ..
-- The C compiler identification is GNU 8.3.0
...
-- Configuring done
-- Generating done
-- Build files have been written to: /global/homes/s/student/hw2-1/build
Once our build is configured, we may actually execute the build:
student@cori04:~/hw2-1/build> make
Scanning dependencies of target serial
[ 16%] Building CXX object CMakeFiles/serial.dir/main.cpp.o
[ 33%] Building CXX object CMakeFiles/serial.dir/serial.cpp.o
[ 50%] Linking CXX executable serial
[ 50%] Built target serial
Scanning dependencies of target openmp
[ 66%] Building CXX object CMakeFiles/openmp.dir/main.cpp.o
[ 83%] Building CXX object CMakeFiles/openmp.dir/openmp.cpp.o
[100%] Linking CXX executable openmp
[100%] Built target openmp
student@cori04:~/hw2-1/build> ls
CMakeCache.txt CMakeFiles cmake_install.cmake Makefile openmp serial
We now have two binaries (openmp and serial) and two job scripts (job-openmp and job-serial).
Both executables have the same command-line interface. Without loss of generality, we discuss how to operate the serial program here. The program can be run simply as:
student@cori04:~/hw2-1/build> ./serial
Simulation Time = 1.43277 seconds for 1000 particles.
By default, the program runs with 1000 particles. The number of particles can be changed with the "-n" command line parameter:
student@cori04:~/hw2-1/build> ./serial -n 10000
Simulation Time = 195.029 seconds for 10000 particles.
If we rerun the program, the initial positions and velocities of the particles will be randomized, because by default the particle seed is unspecified. The seed can be set with the "-s" command line parameter:
student@cori04:~/hw2-1/build> ./serial -s 150
Simulation Time = 1.45459 seconds for 1000 particles.
This sets the particle seed to 150, which initializes the particles in a reproducible way. We will test the correctness of your code by randomly selecting several particle seeds and checking the particle positions printed with the "-o" command line parameter. You can print the particle positions to a file specified with the "-o" parameter:
student@cori04:~/hw2-1/build> ./serial -o serial.parts.out
Simulation Time = 1.78357 seconds for 1000 particles.
This will create a serial.parts.out file listing the particle positions after each step. You can use the hw2-rendering tool to convert this file into a .gif animation of your particles. See the section on Rendering Output below for more information.
You can use the "-h" command line parameter to print the help menu summarizing the parameter options:
student@cori04:~/hw2-1/build> ./serial -h
Options:
-h: see this help
-n <int>: set number of particles
-o <filename>: set the output file name
-s <int>: set particle initialization seed
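Two types of scaling will be tested for your parallel codes: strong scaling, in which the number of particles is held fixed while the number of processors increases, and weak scaling, in which the number of particles grows in proportion to the number of processors so that the work per processor stays constant.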
While the scripts we provide use a small number of particles (1000) so that the O(n^2) algorithm can finish executing, your final codes should be tested with much larger values (50,000 to 1,000,000 particles) to better see their performance.
We will grade your assignment by reviewing your assignment write-up, measuring the scaling of both the openmp and serial implementations, and benchmarking your code's raw performance. To benchmark your code, we will compile it with the exact process detailed above, with the GNU compiler. We will run your submissions on Cori's KNL processors.
Supposing your custom group name is XYZ, follow these steps to create an appropriate submission archive:
student@cori04:~/hw2-1/build> cmake -DGROUP_NAME=XYZ ..
student@cori04:~/hw2-1/build> make package
This second command will fail if the PDF is not present.
student@cori04:~/hw2-1/build> tar tfz cs267XYZ_hw2_1.tar.gz
cs267XYZ_hw2_1/cs267XYZ_hw2_1.pdf
cs267XYZ_hw2_1/serial.cpp
cs267XYZ_hw2_1/openmp.cpp
Write-up Details
Notes:
The output files produced by running the program with the "-o" command line parameter can be fed into the provided hw2-rendering tool to convert them into .gif files. These animations are a useful debugging aid. To get started, clone the hw2-rendering repo:
student@cori04:~> git clone https://bitbucket.org/Berkeley-CS267/hw2-rendering.git
This tool uses Python, which can be loaded on Cori with the following command:
student@cori04:~> module load python/3.6-anaconda-5.2
We can then convert the output files to gifs with the following command:
student@cori04:~/hw2-1/build> ~/hw2-rendering/render.py serial.parts.out particles.gif 0.01
Here serial.parts.out is an output file produced with the "-o" command line parameter. You should then find a particles.gif file in your directory. The number 0.01 is the cutoff distance, which will be drawn around each particle.
The output files produced by running the program with the "-o" command line parameter can also be fed into the provided hw2-correctness tool to perform a correctness check. This is the same correctness check we will perform when grading the homework; however, we will randomly select the particle seeds. To get started, clone the hw2-correctness repo:
student@cori04:~> git clone https://bitbucket.org/Berkeley-CS267/hw2-correctness.git
This tool uses Python, which can be loaded on Cori with the following command:
student@cori04:~> module load python/3.6-anaconda-5.2
We can then test the output files for correctness with the following command:
student@cori04:~/hw2-1/build> ~/hw2-correctness/correctness-check.py serial.parts.out correct.parts.out
If the program prints an error, your output is incorrect. Here serial.parts.out is the output file produced by your code with the "-o" command line parameter; you can substitute any output file whose correctness you wish to test. The correct.parts.out file can be generated from the provided O(n^2) serial implementation. Remember to specify a particle seed with "-s" so that both output files solve the same problem. The hw2-correctness repo also provides the "verf.out" file, which is the correct output for particle seed 1 ("-s 1").