5.1 Install the MPI benchmark
https://github.com/LLNL/mpiBench
and measure the performance of collective operations as follows:
Run the default set of tests (on your VM cluster, repeat with 2, 4, and 8 processes):
mpirun -n 2 ./mpiBench
Run the benchmark over the given message-size range and iteration count for Alltoall, Scatter, Bcast, Allreduce, Allgather, and Barrier, saving the output for later analysis:
mpirun -n 2 ./mpiBench -b 32 -e 2K -i 100 Alltoall Scatter Bcast Allreduce Allgather Barrier > out.txt
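To automate the three cluster sizes, a small driver along these lines can run the benchmark once per size and keep one output file per run (a sketch: it assumes the mpiBench binary is in the current directory and reuses the flags above; the out.<n>.txt file names are illustrative):

# run_mpibench.py -- sketch: run mpiBench for 2, 4 and 8 processes
# and save one output file per run for crunch_mpiBench.
import subprocess

ops = ["Alltoall", "Scatter", "Bcast", "Allreduce", "Allgather", "Barrier"]
for n in (2, 4, 8):
    cmd = ["mpirun", "-n", str(n), "./mpiBench",
           "-b", "32", "-e", "2K", "-i", "100"] + ops
    with open(f"out.{n}.txt", "w") as f:   # e.g. out.2.txt, out.4.txt, out.8.txt
        subprocess.run(cmd, stdout=f, check=True)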
Use crunch_mpiBench to summarize the data:
crunch_mpiBench -op Alltoall,Scatter,Bcast,Allreduce,Allgather,Barrier out.txt
For each operation, plot the bandwidth for every tested buffer size, with the VM cluster size (2, 4, 8) on the x-axis.
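A plotting sketch (assuming matplotlib is available; the zero values are placeholders to be replaced with the bandwidths reported by crunch_mpiBench for each VM size and buffer size):

# plot_bw.py -- sketch: bandwidth vs. VM cluster size, one figure per operation.
import matplotlib.pyplot as plt

vm_sizes = [2, 4, 8]
# bw[operation][buffer size in bytes] -> one bandwidth value per VM size (MB/s).
# All zeros below are placeholders for your measured results.
bw = {
    "Alltoall": {32: [0, 0, 0], 2048: [0, 0, 0]},
    "Bcast":    {32: [0, 0, 0], 2048: [0, 0, 0]},
    # ... Scatter, Allreduce, Allgather (and time for Barrier) likewise
}

for op, series in bw.items():
    plt.figure()
    for size, values in series.items():
        plt.plot(vm_sizes, values, marker="o", label=f"{size} B")
    plt.xlabel("VM cluster size (processes)")
    plt.ylabel("Bandwidth (MB/s)")
    plt.title(op)
    plt.legend()
    plt.savefig(f"{op}_bw.png")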
5.2 Consider the matrix multiplication example below. How is the communication set up?
Modify the code to use block-style matrix multiplication as in the lecture.
Divide A, B, and C into blocks of size N/4 x N/4,
distribute the A and B blocks across the nodes, and let each process compute its block of C from the A and B blocks it currently holds.
The A and B blocks are then circulated. Assume the matrices are 4K x 4K (N = 4096) and filled with random numbers (NumPy arrays).
(https://edoras.sdsu.edu/~mthomas/sp17.605/lectures/MPI-MatMatMult.pdf)
Compare the running time with the original mpi4py example.
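A minimal sketch of the compute-then-circulate pattern, simplified to a 1D row-block ring: each of the P processes holds an N/P x N strip of A and B, multiplies the matching column slice of its A strip with the B strip it currently holds, then passes the B strip around the ring. Adapt the block shape to the N/4 x N/4 scheme from the lecture; the local random initialization and all names here are illustrative assumptions.

# block_matmul.py -- 1D ring block matrix multiplication sketch (mpi4py + NumPy).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
P = comm.Get_size()            # assumed to divide N evenly

N = 4096                       # 4K x 4K matrices, as in the assignment
nb = N // P                    # rows per block

# Each process creates its own row blocks of A and B with random numbers.
# (Alternatively, generate the full matrices on rank 0 and Scatter them.)
A_local = np.random.rand(nb, N)
B_block = np.random.rand(nb, N)
C_local = np.zeros((nb, N))

prev_rank = (rank - 1) % P     # where the circulating B block is sent
next_rank = (rank + 1) % P     # where the next B block comes from

t0 = MPI.Wtime()
for step in range(P):
    blk = (rank + step) % P    # index of the B block this process holds now
    # Multiply the matching column slice of A_local with the current B block.
    C_local += A_local[:, blk * nb:(blk + 1) * nb] @ B_block
    if step < P - 1:
        # Circulate: pass the B block backwards, receive the next one.
        comm.Sendrecv_replace(B_block, dest=prev_rank, source=next_rank)
elapsed = MPI.Wtime() - t0

if rank == 0:
    print(f"P={P}, N={N}, time={elapsed:.3f} s")

Timing the loop with MPI.Wtime(), as above, gives the number to compare against the original example.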
- Running on a cluster
Run sudo apt-get install -y python-mpi4py (python3-mpi4py on newer releases)
on all nodes.
Test the installation: mpiexec -n 5 python -m mpi4py.bench helloworld
Create a machinefile in ~/
with the IP addresses of the nodes:
farmer@192.168.17.11
farmer@192.168.17.12
farmer@192.168.17.13
farmer@192.168.17.14
Run:
mpirun -n 4 -machinefile ~/machinefile python -m mpi4py.bench helloworld
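If python -m mpi4py.bench is unavailable in the installed version, an equivalent minimal script (hypothetical name helloworld.py, launched the same way with python helloworld.py) serves the same purpose:

# helloworld.py -- minimal mpi4py check: each rank reports itself.
from mpi4py import MPI

comm = MPI.COMM_WORLD
print(f"Hello from rank {comm.Get_rank()} of {comm.Get_size()} "
      f"on {MPI.Get_processor_name()}")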
Resources:
- https://mpi4py.readthedocs.io/en/stable/tutorial.html#running-python-scripts-with-mpi
- https://github.com/happy-labs/mpi/blob/master/multi_process_multiplier.py