linux openmpi multicore

OpenMPI on CentOS 5

Necessary steps before setting up OpenMPI

1) write working MPI code and test it on a working cluster. An example program which demonstrates use of MPI is:

program simp
  implicit none
  include 'mpif.h'
  integer :: ierr, rank, num_CPUs

  ! defaults, overwritten by the MPI calls below
  rank = 0
  num_CPUs = 1

  call mpi_init(ierr)
  ! total number of processes in the job, and this process's rank within it
  call mpi_comm_size(MPI_COMM_WORLD, num_CPUs, ierr)
  call mpi_comm_rank(MPI_COMM_WORLD, rank, ierr)

  write(*,*) 'hello from CPU', rank, 'of', num_CPUs

  call mpi_finalize(ierr)
end program simp

This program runs on an arbitrary number of CPUs
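When launched on two processes, the output should resemble the following (the order of the lines is not deterministic, since each process writes independently):

hello from CPU 0 of 2

hello from CPU 1 of 2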

2) install CentOS on two or more computers.

http://www.howtoforge.com/installation-guide-centos5.1-desktop

I chose to install Desktop-Gnome and Clustering.

3) Optional: apply all updates; reboot as needed

OpenMPI on a single machine

http://dvbmonkey.wordpress.com/2009/02/27/getting-started-with-open-mpi-on-fedora/

http://cs.ucsb.edu/~hnielsen/cs140/openmpi-install.html

http://na-inet.jp/na/pccluster/centos_x86_64-en.html

http://techtinkering.com/2009/12/02/setting-up-a-beowulf-cluster-using-open-mpi-on-linux/

In summary, the steps below boil down to:

yum install gcc compat-gcc* gcc-gfortran openmpi openmpi-devel glibc-devel

source ~/.bashrc

The individual steps follow, with the problem each one solves.

1) serial compilers gcc and gfortran are not installed by default, so install them first:

su

yum install gcc

yum install compat-gcc*

yum install gcc-gfortran

2) install MPI

su

yum install openmpi

yum install openmpi-devel

3) set MPI environment variables.

This only applies to the user you are running as, so you probably want to log out of root first and back into your normal account.

First we'll see which serial compiler the MPI environment is tied to:

mpi-selector --query

There is nothing returned because we have not set it yet. We need to determine which compilers we can use:

mpi-selector --list

openmpi-1.4-gcc-i386

openmpi-1.4-gcc-x86_64

We will use the first choice:

mpi-selector --set openmpi-1.4-gcc-i386

However, mpif90 is still not a valid command

4) to make mpif90 available as a command, log out and log back in

OR just reload the shell to get an updated list of commands:

source ~/.bashrc

5) verify MPI commands are available

mpirun

mpirun --version

mpicc

mpif77

mpif90

6) compile working parallel code from step 1

mpif90 example.mpi.f90

However, this fails because the C development headers and libraries (glibc-devel) are not yet installed.

7) To correct the library problem, install glibc-devel

su

yum install glibc-devel

8) Compile the code again (repeating step 6); this time it works. Run it:

mpif90 example.mpi.f90

./a.out

mpirun -np 1 ./a.out

mpirun -np 2 ./a.out

9) Although these run, there is a complaint about InfiniBand

http://www.open-mpi.org/community/lists/users/2010/12/14992.php

To correct this, suppress use of InfiniBand:

mpirun -mca btl ^openib -np 2 ./a.out
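To avoid retyping this flag on every run, the same setting can live in Open MPI's per-user MCA parameter file, ~/.openmpi/mca-params.conf (the file does not exist by default; create it):

gedit ~/.openmpi/mca-params.conf

# disable the InfiniBand byte-transfer layer
btl = ^openib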

10) For proper use of MPI, create a host file which specifies the resources available.

http://cs.calvin.edu/curriculum/cs/374/homework/MPI/01/multicoreHostFiles.html

gedit .mpi_hostfile

# hostfile for OpenMPI

# The master node has 'slots=2' because it is a dual-processor machine

localhost slots=2

11) Now run the program, suppressing InfiniBand usage and using the host file:

mpirun -mca btl ^openib -np 2 --hostfile .mpi_hostfile ./a.out

We have now successfully set up OpenMPI on a single computer

Necessary setup for connecting to remote machines

OpenMPI uses SSH to authenticate between the master node and the slave nodes. Thus we need to set up SSH between nodes. However, we don't want to specify the password for each node every time we run an MPI program. Therefore we set up SSH without passwords.

passwordless SSH authentication

http://www.ctkn.net/2011/11/passwordless-ssh-public-key-authentication-for-centos-5-centos-6-ubuntu-debian-linux-in-general/

1) on the master computer, generate an SSH key pair. Press Enter at the passphrase prompt to leave it empty; a passphrase would defeat the purpose of passwordless login. [You can also do this on the slave nodes; I'm not sure it is necessary.]

ssh-keygen -t dsa

2) copy the public key from the master to the remote slave computer. The command below assumes ~/.ssh folder already exists on the remote slave computer.

cat ~/.ssh/id_dsa.pub | ssh ubuntuServer 'cat >> ~/.ssh/authorized_keys'

You will need to change "ubuntuServer" to the appropriate username and computer name combination for the slave node. You will need to repeat this for all the slave nodes.
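If the ssh-copy-id helper script is available (it ships with the OpenSSH client package on most distributions), it accomplishes the same thing and creates the remote ~/.ssh directory if needed:

ssh-copy-id -i ~/.ssh/id_dsa.pub ubuntuServer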

3) SELinux is strict about permissions; if login fails without the correction below, the error is recorded in /var/log/secure on the slave node. On the slave node, tighten the permissions:

chmod 0600 ~/.ssh/authorized_keys

4) the host file cannot specify both the SSH user name and the system name. To get around this, create an SSH config file on the master computer that maps a nickname to a user and host

http://www.saltycrane.com/blog/2008/11/creating-remote-server-nicknames-sshconfig/

One entry per slave node

gedit ~/.ssh/config

Host nicknameslavenode1

User usernamehere

HostName computer1name.domain.com

Host nicknameslavenode2

User usernamehere

HostName computer2name.domain.com

chmod 0600 ~/.ssh/config

5) test to verify you can access the remote machines without a password. From the master computer,

ssh nicknameslavenode1
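With several slave nodes, a quick shell loop checks them all (substitute your own nicknames):

for node in nicknameslavenode1 nicknameslavenode2; do ssh $node hostname; done

Each node should print its host name without prompting for a password.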

MPI uses arbitrary ports for connections after SSH authentication

http://www.open-mpi.org/community/lists/users/2009/02/7991.php

http://stackoverflow.com/questions/2495198/unable-to-run-openmpi-across-more-than-two-machines

http://wiki.centos.org/HowTos/Network/IPTables

http://www.howtogeek.com/wiki/Using_Iptables_on_Linux#Allowing_All_Traffic_from_an_IP_Address

http://www.cyberciti.biz/faq/rhel-fedorta-linux-iptables-firewall-configuration-tutorial/

On the master computer, we need to allow incoming connections from the slave nodes

1) To view the current iptables rule set,

/sbin/iptables --line-numbers -n -L

Observe that the "INPUT" chain in the default CentOS iptables rule set hands traffic to the "RH-Firewall-1-INPUT" chain, so that is the chain to modify.

2) We want to allow the IP address of each slave node, so first determine it. On each slave node, run:

/sbin/ifconfig

3) Add the IP address of the slave node to the iptables of the master node:

/sbin/iptables -I RH-Firewall-1-INPUT 8 -s 192.143.21.3 -j ACCEPT

Here "-I" is for insert, and "8" is the position to insert into. The IP address "192.143.21.3" should be the one you found in step 2.

4) save and restart the iptables

/sbin/service iptables save

/sbin/service iptables restart

5) As a check that this works, from the slave node run a port scan of the master

su

yum install nmap

nmap <IP address of the master node>

(The target here is the master's address; do not reuse the slave address from step 3.)

Add remote boxes to hostfile

1) Now that we can ssh from the master to the slaves with the nickname shortcuts, and we have allowed incoming connections from the slave nodes to the master, we can add the slave nodes to the list of accessible computers.

gedit .mpi_hostfile

# hostfile for OpenMPI

# The master node has 'slots=2' because it is a dual-processor machine

localhost slots=2

nicknameslavenode1 slots=2

nicknameslavenode2 slots=2

If you want to limit the number of CPUs for a given computer, use

nicknameslavenode1 slots=2 max_slots=2

in the host file

2) when running the MPI command

mpirun -mca btl ^openib -np 4 --hostfile .mpi_hostfile ./a.out

a copy of the executable needs to be available on each slave node; copy it over with scp:

scp a.out usernamehere@computer1name.domain.com:~
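With several slave nodes, a loop saves typing; scp honors the nicknames from ~/.ssh/config:

for node in nicknameslavenode1 nicknameslavenode2; do scp a.out $node:~; done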

Troubleshooting

Run a compiled MPI program on four CPUs using a host file

mpirun -mca btl ^openib -np 4 --hostfile .mpi_hostfile ./a.out

mpirun --debug-daemons -mca btl ^openib -np 4 --hostfile .mpi_hostfile ./a.out

mpirun --debug-daemons -mca btl ^openib -H localhost -np 2 ./a.out

Check to see whether the nodes respond properly (verifies SSH, the host file, and iptables are all properly configured; independent of the program you wrote)

mpirun --mca plm_base_verbose 5 --hostfile .mpi_hostfile -pernode hostname
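On a healthy setup each node prints its host name exactly once, for example (host names here are illustrative):

master.domain.com

computer1name.domain.com

computer2name.domain.com

A hang or a password prompt at this point means SSH, the host file, or iptables still needs attention.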

Set up shared folder

http://linuxwave.blogspot.com/2008/08/nfs-howto-for-centos-5.html

http://www.cyberciti.biz/tips/linux-more-on-user-id-password-and-group-management.html

server = 192.168.0.1

client = 192.168.0.3

On the server

mkdir /home/<username>/share

su

# gedit /etc/exports

/home/<username>/share 192.168.0.3(rw,sync)

# gedit /etc/hosts.allow

portmap: 192.168.0.3

# /etc/init.d/nfs start

# /etc/init.d/portmap start
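To verify the directory is actually being exported, list the server's active exports (exportfs with no arguments prints them):

# /usr/sbin/exportfs

/home/<username>/share 192.168.0.3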

An aside: to open an interactive shell on a remote node in a specific directory, use

ssh -t nickname "cd /tmp ; bash"

On the client

# /etc/init.d/portmap start

$ mkdir /home/<username>/share

# mount 192.168.0.1:/home/<username>/share /home/<username>/share

$ cd /home/<username>/share

$ ls

[optional] Check /var/log/messages for any error that might occur

# tailf /var/log/messages

[optional] Use mount to check if the folder is mounted properly

# mount

192.168.0.1:/home/<username>/share on /home/<username>/share type nfs (rw,addr=192.168.0.1)

[optional] Edit /etc/fstab to mount the shared folder on boot

# vi /etc/fstab

192.168.0.1:/home/<username>/share /home/<username>/share nfs rw,hard,intr 0 0

On the server

$ cd /home/<username>/share

$ mpif90 example.mpi.f90

Because the share is mounted at the same path on every node, there is no need to scp the executable around:

$ mpirun -mca btl ^openib -np 4 --hostfile ~/.mpi_hostfile ./a.out

Torque installation

http://www.adaptivecomputing.com/products/open-source/torque/

http://www.adaptivecomputing.com/resources/downloads/torque/

http://en.wikipedia.org/wiki/TORQUE_Resource_Manager

All steps below are done on the master node

1) Download the latest version

wget http://www.adaptivecomputing.com/resources/downloads/torque/torque-4.1.0.tar.gz

2) Extract downloaded file

tar -zxvf torque-4.1.0.tar.gz

cd torque-4.1.0

./configure

3) configure fails with "configure: error: TORQUE needs lib openssl-devel and libxml2-devel in order to build." Install them and re-run ./configure (read the "INSTALL" directions)

su

yum install openssl-devel libxml2-devel

4) now ready for "make", which compiles the source for the local architecture (read the "INSTALL" directions)

make

su

make install

5) Torque is installed. Next, start its authorization daemon:

su

cd contrib/init.d/

./trqauthd start

6) initialize the Torque server. The torque.setup script is in the top level of the source tree, so move back up first; the argument is the user who becomes the initial queue administrator

cd ../..

./torque.setup usernamehere
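As a sanity check (these are standard Torque client commands), confirm the server responds and submit a trivial job:

qmgr -c 'print server'

pbsnodes -a

echo "sleep 30" | qsub

qstat

Note that pbsnodes will only show compute nodes after pbs_mom has been configured and started on them; until then the test job simply sits in the queue.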

http://ubuntuforums.org/showthread.php?t=289767

https://twiki.cern.ch/twiki/bin/view/Main/PbsInstallationAndConfiguration

Alternatively, Torque can be installed from packages:

yum install torque-server torque-client

If you plan to use the sample FIFO scheduler, then also install torque-scheduler (Maui and Moab are more advanced schedulers). There are other packages too; see 'yum list torque\*'.

Todo list:

how to set up a diskless cluster

MPI on Ubuntu

Here I assume a single multicore system; no remote boxes are used.

source: http://heather.cs.ucdavis.edu/~matloff/MPI/NotesLAM.NM.html

$ sudo apt-get install gfortran

See which packages are recommended for mpif90:

$ mpif90

$ sudo apt-get install <whatever was recommended>

for example

$ sudo apt-get install libopenmpi-dev libmpich1.0-dev libmpich-mpd1.0-dev

Next, you'll need to create a list of computers where MPI will run. Since we are on one computer with multiple CPUs, use

$ gedit ~/lamboothosts

localhost cpu=4

$ lamboot -v ~/lamboothosts

$ mpif90 filename.mpi.f90

$ mpirun -np 4 a.out
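When you are done, shut down the LAM run-time environment that lamboot started:

$ lamhalt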

mpiexec versus mpirun

QUESTION

What is the difference between mpiexec and mpirun?

ANSWER

On some systems these are the same, and this will be explicitly stated in

man mpirun

On other systems, there is a difference:

mpiexec is used to initialize a parallel job from within a PBS batch/interactive environment. It uses the task manager library of PBS to spawn copies of the executable on all the nodes in a PBS allocation. Always use mpiexec in the PBS environment.

mpirun is used outside of the PBS environment. For more information, see "man mpiexec", "man mpirun", and "man MPI".
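For illustration, a minimal PBS batch script using mpiexec might look like the sketch below (the job name, node counts, and paths are placeholders to adapt):

#!/bin/bash
#PBS -N simp
#PBS -l nodes=2:ppn=2
#PBS -j oe
# run from the directory the job was submitted from
cd $PBS_O_WORKDIR
# mpiexec obtains the node list and process count from PBS
mpiexec ./a.out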