Welcome. This page helps new users of the NAL cluster with basic operations. It assumes you have some experience with Linux. If you are not a regular Linux user, please take a look at the Essential Linux Guide for a quick reference to common commands and the basic Linux file system.
If you're a Windows user, you will need an SSH client in order to use the cluster. PuTTY is free and available for download here: http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
The cluster is based on the Gentoo Linux distribution. There are 56 nodes in the cluster, each with 2 dual-core processors, effectively 4 processors per blade. The kernel has been patched with openMosix, a tool that automatically distributes workload between nodes. This system is not perfect, as process migration doesn't work for all applications (for example, Java programs). Several scripts are available to help make better use of the nodes for applications that can't be migrated.
The /home folder resides on the master node and is shared via NFS with all other nodes. This means that anything you put in your home folder is also available in your home folder on every other node. In general, you should log in to the master node first before logging in to other nodes, but this is not mandatory.
The /usr folder is also shared via NFS with all other nodes. This means that any program installed globally on the master node is available for execution on all nodes. If you would like to install applications on the cluster for all users, please contact one of the administrators (me, Khash or Ali) to get root access. Applications should be installed using Gentoo's built-in emerge tool, a package management system that maintains dependencies between packages.
The cluster is separated from the internet by a single gateway computer. You must first log in to the gateway machine before logging in to the master node. The gateway machine is available at passport1.comm.utoronto.ca. Since the gateway machine is not part of the cluster, it has a separate username and password. When you get an account on the cluster, it is important to set your passwords on both the gateway and the master node.
To set your password on the gateway, use the passwd command:
userme@gateway ~ $ passwd
(You will be prompted to change the password)
Setting your password on the cluster is different because password management is handled by an NIS server, which centralizes passwords across the nodes. To set your password on a node in the cluster, use the yppasswd command:
userme@master ~ $ yppasswd
(You will be prompted to change the password. It is very important that you do not use passwd!)
With SSH key-based authentication, you do not have to enter a password when SSHing between nodes. Start by logging in to your account and executing the following commands:
userme@master ~ $ ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/home/userme/.ssh/id_dsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/userme/.ssh/id_dsa.
Your public key has been saved in /home/userme/.ssh/id_dsa.pub.
The key fingerprint is:
(When prompted for anything, just hit Enter. The passphrase should be empty as well.)
userme@master ~ $ cp ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
(This will create the authorized_keys file in the shared home folder, overwriting any existing one. And you're done!)
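If you already have an authorized_keys file with other keys in it, the cp command above replaces them. As a minimal sketch of a gentler alternative, the function below appends the new public key only if it is not already listed. The function name add_authorized_key is made up for this example and is not one of the cluster scripts.

```shell
# Append a public key to an authorized_keys file unless it is already listed.
# add_authorized_key is a hypothetical helper, not part of the cluster tools.
# Both paths are parameters so you can try it on scratch files first.
add_authorized_key() {
    pubkey_file=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    # -q: quiet, -x: match the whole line, -F: treat the key as a fixed string
    if ! grep -qxF -e "$(cat "$pubkey_file")" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
    # keep the file private to its owner
    chmod 600 "$auth_file"
}

# Typical call on the master node, using the file names from the steps above:
# add_authorized_key ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
```

Running it twice with the same key leaves the file unchanged the second time.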
You can SSH to another computer using the following command:
userme@master ~ $ ssh <yourusername>@<hostname>
(You will be prompted for a password, unless SSH key based authentication is being used)
You can also use SSH to execute commands remotely. This is particularly useful if you want to run a program on several nodes of the cluster. The syntax is as follows:
userme@master ~ $ ssh userme@node04 java MyThread
This will run the Java program MyThread on node04 as userme. Since userme is already logged in to the master node under the same username, it is not necessary to specify the user when logging in to node04; "ssh node04 java MyThread" does the same thing.
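To run the same command on a handful of nodes by hand, you can wrap the ssh call in a loop. The following is only a sketch: run_on_nodes and the DRY_RUN switch are names invented for this example, and node05 is assumed to exist because it follows the node04 naming pattern.

```shell
# Run one command on each listed node, one node after another.
# run_on_nodes and DRY_RUN are hypothetical names used only in this example.
run_on_nodes() {
    cmd=$1
    shift
    for node in "$@"; do
        if [ -n "${DRY_RUN:-}" ]; then
            # dry run: only print what would be executed
            echo "ssh $node $cmd"
        else
            # -n redirects ssh's stdin from /dev/null so it cannot eat input
            # that belongs to the calling script
            ssh -n "$node" "$cmd"
        fi
    done
}

# Example (node names assumed):
# run_on_nodes "java MyThread" node04 node05
```

Setting DRY_RUN=1 before the call prints the ssh commands without contacting any node, which is a cheap way to check the node list first.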
To copy a file from one computer to another, you should use SCP.
userme@master ~ $ scp userme@gateway:~/myfile.txt ~/.
(This will copy the file "myfile.txt" from the home folder on the gateway machine to the home folder on the master node.)
In general, you can scp files to or from a machine. To copy a file to the gateway machine:
userme@master ~ $ scp myfile.txt userme@gateway:~/.
Remember that the ~ simply means the home folder of the current user. Absolute paths may of course be specified.
Some helpful scripts have been written to make working on the cluster smoother. These scripts are available in the /usr/bin/node folder.
The nodesCopy command allows the user to copy a file from one node (generally the master node) to all the other nodes. In general, you will not need this command as the /home folder is shared. You will only need to use this command if settings files need to be updated (files that are outside of the home folder).
userme@master ~ $ nodesCopy /etc/settings.conf /etc
This will copy the settings.conf file to the /etc folder on all nodes.
The nodesPing command will let you know if any nodes are down. If any nodes are down, then they may need to be restarted.
userme@master ~ $ nodesPing
All nodes are up (all responded to ping).
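The nodesPing script itself is not reproduced on this page. As a rough illustration of the idea only, a check like this can be written as a loop that pings each node once and collects the ones that do not answer. The function name check_nodes, the PING variable, and the second output message are assumptions made for this sketch.

```shell
# Hypothetical stand-in for a ping check; the real nodesPing may differ.
# PING can be overridden, for example to add a timeout option.
PING=${PING:-"ping -c 1"}

check_nodes() {
    down=""
    for node in "$@"; do
        # a non-zero exit status from ping means the node did not answer
        if ! $PING "$node" >/dev/null 2>&1; then
            down="$down $node"
        fi
    done
    if [ -z "$down" ]; then
        echo "All nodes are up (all responded to ping)."
    else
        echo "Nodes not responding:$down"
    fi
}

# Example (node names assumed): check_nodes node01 node02 node03
```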
The nodesRunParallel command allows you to execute a command on all the nodes in parallel. This is very useful if you would like to run a program on all the nodes at once (for instance, a distributed simulator). You should only use this command if the order in which the nodes execute the command does not matter.
userme@master ~ $ nodesRunParallel java Newscast
This will run the Java program Newscast on all the nodes at once.
The nodesRunSeries command allows you to execute a command on all the nodes, one node after the next. This is actually a very slow command because it takes time to log in to a node, execute the command, and log out again. If you can use nodesRunParallel, then you should.
userme@master ~ $ nodesRunSeries ls
This will list the files in the home folder of every node, one after the next.
Sometimes nodesRunSeries is not suitable at all. For the Newscast example above, nodesRunSeries would not be effective because it waits until the program terminates on one node before starting it on the next. This is not the behaviour we want: the Newscast program should be running on all the nodes at the same time, so it must be run with nodesRunParallel. For the ls example, on the other hand, we would probably want nodesRunSeries so that the listing is in order, starting from node01.
You can also check the load of the various nodes in the cluster. The mosmon tool shows a text-based display of the load on each node.
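The difference between the two commands comes down to whether each remote call is waited for before the next one starts. The sketch below shows that difference with plain shell job control. It is not the source of the scripts in /usr/bin/node; run_series, run_parallel, NODES and REMOTE are hypothetical names, and the node list is an assumption.

```shell
# Hypothetical stand-ins for nodesRunSeries / nodesRunParallel; the real
# scripts may work differently.
NODES=${NODES:-"node01 node02 node03"}   # assumed node names
REMOTE=${REMOTE:-ssh}                    # program used to reach a node

run_series() {
    for node in $NODES; do
        # each call blocks until the command has finished on that node,
        # so the output arrives in node order
        $REMOTE "$node" "$@"
    done
}

run_parallel() {
    for node in $NODES; do
        # '&' starts the command in the background and moves on immediately,
        # so all nodes work at the same time and output order is arbitrary
        $REMOTE "$node" "$@" &
    done
    # block until every background job has finished
    wait
}

# Examples:
# run_series ls
# run_parallel java Newscast
```

With a long-running program, run_series would not reach the second node until the first one exits, which is exactly the problem described for Newscast.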