Installing Debian Linux, DRBL and the Sun Grid Engine (SGE)

NOTE: This page from 2011 is severely outdated, but I'm leaving it here for the sake of posterity because it is still consistently visited

This page documents how I installed Debian Linux, DRBL (Diskless Remote Boot in Linux) and the Sun Grid Engine (SGE) on my homemade "Helmer" Linux cluster in January, 2011. Before I begin, let me give my warm thanks to Padraig Kitterick for his awesome post on configuring a diskless SGE cluster. I must have visited his page a hundred times while setting up my own cluster. The information provided by Padraig is much too valuable to be lost in case this guy (or girl?) ever decides to stop blogging and his site disappears, so any info that I borrowed from Padraig, I will repeat here on this page for the sake of posterity.

Note that I had almost zero admin experience with Linux prior to this, and went through a lot of trial and error -- an important skill for survival in the wilderness of Geekdom -- before I got everything working. This means there is a lot of "this-worked-but-don't-ask-me-why" kind of stuff here, so please don't try to spank me if you find something stupid, but do kindly send me any suggestions/comments (then again, I may not mind the "spanking" part if you are or look like Kate Winslet). I first started with just two motherboards, using one as a master host and the other as an execution host. Once I figured everything out, I bought the rest of the equipment to put the whole cluster together.

I. Installing Debian Linux (Lenny) on a RAID0 Array

I chose Debian Linux for its stability. My master host has no CD/DVD-ROM or floppy disk drive, and I intend to keep it that way. To install Linux, I used UNetbootin to create a bootable USB flash drive, and booted my master host from the flash drive. The installation instructions were pretty straightforward, and NFS could be conveniently included in the installation.

However, one problem I encountered was that the installation program kept installing the kernel onto my flash drive instead of the local hard drive. I managed to work around this by returning to the main menu at the disk partition stage, unplugging my USB flash drive, then starting over from the "detect disk" stage. At this point, the installer had no choice but to install the kernel on my hard drive, since there was nowhere else to install it to!

When I did this, however, the installer designated my disks as "sdb" and "sdc" instead of "sda" and "sdb", which caused a problem later on, but it could be corrected. My disks were partitioned as follows:

RAID0 device #0 - 480.1 GB software RAID device
     #1           480.1 GB     F  ext3   /

scsi6 (0,0,0) (sdb) - 250.1 GB ATA ST3250318AS
     #1  primary   10.0 GB  B  F  ext2   /boot
     #5  logical  240.1 GB     K  raid

scsi7 (0,0,0) (sdc) - 250.1 GB ATA ST3250410AS
     #6  logical  240.1 GB     K  raid
     #5  logical   10.0 GB     F  swap   swap

Following some advice found on the Web, I set aside 10 GB in the first disk to store the boot information. The remaining 240 GB was used in conjunction with the other drive as a RAID0 array. The drive space needs to be equal for both drives in the RAID0 array, so I used the remaining 10 GB in the second drive as swap space. I don't know if this is a good way to partition Linux drives; I just read about someone who did this, and tried the same, and it worked, so there!
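As a quick sanity check after installation (not strictly necessary, but reassuring), the standard Linux md tools will show whether the RAID0 array actually came up:

cat /proc/mdstat              # quick overview of the active md arrays
mdadm --detail /dev/md0       # detailed status of the RAID0 device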

Another thing to note: For some reason, the "/boot" directory had to be installed on the first disk (sdb), not the second disk (sdc), else the system wouldn't boot, regardless of the BIOS setting.

Now for the "sdb" instead of "sda" problem. Upon booting the system, I would keep getting an error like

fsck.ext2: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Could this be a zero-length partition?

The OS would still boot and give me a command prompt, so I could easily fix this by correcting the /etc/fstab file. Essentially, the contents of "/etc/fstab" should be consistent with the output of "fdisk -l". On my system, "fdisk -l" gives me:

Disk /dev/sda: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x518a5189

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        1216     9767488+  83  Linux
/dev/sda2            1217       30401   234428512+   5  Extended
/dev/sda5            1217       30401   234428481   fd  Linux raid autodetect

Disk /dev/sdb: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00027204

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1       30401   244196001    5  Extended
/dev/sdb5               1        1216     9767457   82  Linux swap / Solaris
/dev/sdb6            1217       30401   234428481   fd  Linux raid autodetect

Disk /dev/md0: 480.1 GB, 480109395968 bytes
2 heads, 4 sectors/track, 117214208 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0x00000000

Disk /dev/md0 doesn't contain a valid partition table

I edited "/etc/fstab" so that it now gives me:

# /etc/fstab: static file system information.
#
proc            /proc           proc    defaults           0  0
/dev/md0        /               ext3    errors=remount-ro  0  1
/dev/sda1       /boot           ext2    defaults           0  2
/dev/sdb5       swap            swap    sw                 0  0
/dev/sdc1       /mnt/usb        auto    noauto,user,ro     0  0

I also added the entry for "/dev/sdc1" above for mounting USB hard drives.

II. Enable SSH Access and SAMBA for Windows File Sharing

Installing SSH on the master host is as easy as pie. Logged in as root, I just did

apt-get install ssh

Next, I added myself as a samba user by:

smbpasswd -a woojay

I also edited the "/etc/samba/smb.conf" file so that I can write files on my home directory from Windows:

[homes]
   read only = no
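If you want to check the edit before restarting, testparm will parse the config and complain about any syntax errors:

testparm /etc/samba/smb.conf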

Restarted samba by:

/etc/init.d/samba restart

Now, for some reason, I found that I had to configure samba in order to be able to SSH to my master host using its machine name instead of IP address. Don't ask me why. Just don't.

With samba enabled, I can right-click on "My Computer" on my Windows XP machine and select the "map network drive" option to map a drive to my home directory on the master host. With SSH enabled, I can do everything from this point on using a PuTTY terminal window on my Windows machine. No more sitting in the closet for me!

III. Setting up the Network Interface

The master host has two network interfaces: eth0, which is connected to the home network router (and through it to my Windows PC and the Internet), and eth1, which is connected to the Gigabit switch and the six execution hosts.

I assigned 192.168.1.x to be all the stuff on my home network, and 192.168.2.x to be the internal network used by the Linux cluster. I set the address of eth0 to 192.168.1.200, for which the gateway is my router, 192.168.1.1. The address of eth1 is set to 192.168.2.1, and the host itself is also the gateway for this address range. All this is done by editing the "/etc/network/interfaces" file, which looks like this:

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
allow-hotplug eth0
#iface eth0 inet dhcp
iface eth0 inet static
    address 192.168.1.200
    netmask 255.255.255.0
    broadcast 192.168.1.255
    gateway 192.168.1.1
    dns-domain lan
    dns-nameservers 192.168.1.1

allow-hotplug eth1
iface eth1 inet static
    address 192.168.2.1
    netmask 255.255.255.0
    gateway 192.168.2.1
    metric 100   # prevent eth1 from being first default gateway

The trick above is to set the "metric" variable of eth1 to some high number so that it never becomes the default gateway over eth0. When I type "route" as root, I get the following output:


Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.2.0     *               255.255.255.0   U     0      0        0 eth1
192.168.1.0     *               255.255.255.0   U     0      0        0 eth0
default         192.168.1.1     0.0.0.0         UG    0      0        0 eth0
default         helmer-eth1     0.0.0.0         UG    100    0        0 eth1

Without the metric variable set, helmer-eth1 (or 192.168.2.1) sometimes appears above 192.168.1.1 in the list above. In that case, the master host becomes unable to access the Internet because the OS thinks that 192.168.2.1 -- which is just connected to the internal switch -- is also the gateway to the Internet.
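If the wrong default route does show up, it can also be removed on the fly as a temporary fix (the interfaces file is still the proper place to sort it out):

route del default gw 192.168.2.1 dev eth1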

IV. Installing DRBL

Nothing too complicated here; I just followed the instructions on the DRBL website, choosing the Single System Image (SSI) mode as instructed by Padraig. Here's what my "/etc/hosts" file looked like after installation:

127.0.0.1       localhost
192.168.1.200   helmer

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

192.168.2.1     helmer-eth1
192.168.2.11    helmer111
192.168.2.12    helmer112
192.168.2.13    helmer113
192.168.2.14    helmer114
192.168.2.15    helmer115
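For reference, the DRBL server setup itself boils down to two interactive scripts run as root (assuming the default /opt/drbl install location used elsewhere on this page); the DRBL website documents the prompts in detail:

/opt/drbl/sbin/drblsrv -i     # install the kernel and packages the clients will need
/opt/drbl/sbin/drblpush -i    # interactively configure and deploy the diskless client setup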

V. Installing the Sun Grid Engine

I basically followed the installation instructions on the Grid Engine website to install qmaster via "./inst_sge -m". I used Padraig's specifications, which I am going to quote here:

  • Install as root user: You don't have to do this, but it simplifies the process of getting SGE to run on the DRBL nodes. Please note that in recommending this, I am presuming that your cluster network is private, and the nodes won’t be accessible by non-privileged users (i.e. not you).

  • Do opt to verify file permissions

  • Select to use BerkeleyDB, but without a spool server

  • Use the ID range suggested in the manual: 20000-20100

  • Accept to install startup scripts

  • Accept to load a file which contains the hostnames of your nodes. Here you enter the full path to the file you created before running the install script (a sketch of what this file might look like follows below).

  • Use normal scheduling

I will just add that if the installer asks if you want to enable a JMX MBean server, you can answer no.
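The file of hostnames mentioned above is just a plain text list, one host per line; mine would have looked something like this (the filename and location are arbitrary):

helmer
helmer111
helmer112
helmer113
helmer114
helmer115
helmer116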

After installation, I ran:

source /opt/oge/default/common/settings.sh

to configure various environment variables. I also added this command to my .bashrc file.
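The line added to .bashrc is just the same source command, so every new shell picks up the SGE environment (SGE_ROOT, PATH, and so on):

# appended to ~/.bashrc
source /opt/oge/default/common/settings.sh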

The "qconf -sh" command gave me:

helmer
helmer111
helmer112
helmer113
helmer114
helmer115
helmer116

Above, "helmer" is my master host, and "helmer111" to "helmer116" are the dedicated execution hosts.

I then installed an execution host on the master host using "./inst_sge -x". Again, I used the following specifications from Padraig:

  • Specify a local spool dir: /var/tmp/spool

  • Accept to install startup scripts

The next step was to configure the default "all.q" queue, allotting 2 slots to "helmer", and 4 slots each to "helmer111"~"helmer116". I also wanted the default shell for jobs to be bash. This was done by executing "qconf -mq all.q" and editing the following lines:

slots                 26,[helmer=2],[helmer111=4],[helmer112=4],[helmer113=4], \
                      [helmer114=4],[helmer115=4],[helmer116=4]
tmpdir                /tmp
shell                 /bin/bash

I don't remember if I had to set all the execution hosts as administrative hosts, but here's the command for it anyway (do the same for helmer112 ~ helmer116):

qconf -ah helmer111

Each execution host could be given the same SGE host configuration as the master host. I simply ran the following commands (Padraig actually provides some scripts to automate this process, but I just ran them manually because they were simple enough):

qconf -sconf helmer > helmer.conf
cp helmer.conf helmer111
cp helmer.conf helmer112
cp helmer.conf helmer113
cp helmer.conf helmer114
cp helmer.conf helmer115
qconf -Aconf helmer111
qconf -Aconf helmer112
qconf -Aconf helmer113
qconf -Aconf helmer114
qconf -Aconf helmer115

I designated the master host "helmer" as a submission host by:

qconf -as helmer
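You can confirm the list of submission hosts afterwards with:

qconf -ss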

Next, I configured DRBL so that each execution host will run the SGE service when booting:

cp /opt/drbl/conf/client-extra-service.example /opt/drbl/conf/client-extra-service
vi /opt/drbl/conf/client-extra-service

I edited "/opt/drbl/conf/client-extra-service" as follows:

# You can put the extra service you want client to run, put them in one line.
# The necessary services (nfs, firstboot, xfs...) for DRBL client are already
# specified in the drblpush, so you do not have to add them.
# Example:
# service_extra_added="webmin apmd"
service_extra_added="sgeexecd.helmer"

I also added all the host names to the @allhosts group by running

qconf -mhgrp @allhosts

and making the following edit:

group_name @allhosts
hostlist helmer helmer111 helmer112 helmer113 helmer114 helmer115

Now for the final stage, baby! Update DRBL using

/opt/drbl/sbin/drblpush -i

or:

/opt/drbl/sbin/drblpush -c /etc/drbl/drblpush.conf

Then reboot everything, AND WITNESS YOUR SPANKING NEW 26-CORE SGE CLUSTER IN ITS FULL GLORY. You can check that everything is up and running using the "qstat -f" and "qhost" commands.
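As one more smoke test beyond "qstat -f" and "qhost", you can throw a trivial job at the queue; qsub reads the job script from stdin if you don't give it a file:

echo hostname | qsub -cwd -j y -o smoke_test.out
qstat -f                 # watch the job get scheduled onto one of the hosts
cat smoke_test.out       # once the job finishes, this should contain the name of whichever host ran it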

VI. Logging into the Execution Hosts

To log in directly to each execution host (for whatever reason) from the master host, I had to add some options to the "ssh" command. I aliased the whole command as "ssho" to avoid having to type it every time:

alias ssho='ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'

To SSH to "helmer111", I would then do

ssho helmer111

VII. Testing the Network Speed

I wanted to test the speed of the internal network for the cluster, and see how close to "Gigabit" the "Gigabit Ethernet" really is. This could be done by first installing "iperf":

apt-get install iperf

I SSHed to "helmer112" and ran

iperf -s

then from "helmer", I ran

iperf -c helmer112

I got the following output:

Client connecting to helmer112, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.2.1 port 37762 connected with 192.168.2.12 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   963 MBytes   807 Mbits/sec

807 Mbits/sec seems close enough to a Gigabit.

VIII. Allowing External SSH Access

To my pleasant surprise, allowing access to the master host via SSH from outside my home network turned out to be quite simple. All I needed to do was configure my router so that port 22 gets forwarded to my master host's eth0 address, 192.168.1.200, and then set up a DynDNS account to associate my home IP address with a named address. I entered the DynDNS login information into my router so that it can update the IP address linked to the named address whenever the IP address changes (since it is dynamic, not static).
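After that, logging in from anywhere is just a matter of SSHing to the DynDNS name (the hostname below is a made-up example, not my real address):

ssh woojay@myhelmer.dyndns.org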

IX. Turning On and Shutting Down the Execution Hosts Remotely

Another pleasant surprise was how easy it is to remotely turn on and turn off the execution hosts using WOL (Wake On Lan) functionality. I quickly discovered that my execution hosts often run idle for long periods of time, wasting electricity, if I depend only on the physical power switches to turn them on and off. The WOL function already seemed to be built into the execution hosts' boot image, and I didn't even need to change the BIOS settings on my ASUS P8-H67M LE mobos.

First, I needed to obtain the MAC addresses of each execution host. This could be done by logging into each execution host, running "ifconfig", and looking for the "HWaddr" variable under "eth0".
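For example, on one of the nodes:

ifconfig eth0 | grep HWaddr    # the HWaddr field is the MAC address needed for WOL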

I gathered all the MAC addresses and created the following script file "wakeall.sh" (note that the MAC addresses below are fake):

wakeonlan -i 192.168.2.255 ab:23:45:67:89:01
wakeonlan -i 192.168.2.255 ab:23:45:67:89:02
wakeonlan -i 192.168.2.255 ab:23:45:67:89:03
wakeonlan -i 192.168.2.255 ab:23:45:67:89:04
wakeonlan -i 192.168.2.255 ab:23:45:67:89:05
wakeonlan -i 192.168.2.255 ab:23:45:67:89:06

The trick above was to include the "-i 192.168.2.255" option (which I figured out after some trial and error) instead of using the default broadcast address "255.255.255.255". With the default address, the "magic packets" did not seem to reach the execution hosts at all; my guess is that they were being sent out via eth0 (192.168.1.200), which is connected to the home network. By specifying 192.168.2.255 as the broadcast address, the packets go out through eth1 (192.168.2.1), which is connected to the internal switch.

Now all I need to do whenever I want to wake up the execution hosts is to simply run "./wakeall.sh". It's just so cool to sit at my desk and turn my head toward the closet (which is three feet away) to see all six LED's on my Helmer light up all at once when I enter this single command. Of course, I could always buy a $5 set of Christmas lights to obtain the same effect, but that wouldn't be much fun, would it?

To turn off the execution hosts, I created another script file "shutdownall.sh":

ssho helmer111 "shutdown -h now" ssho helmer112 "shutdown -h now" ssho helmer113 "shutdown -h now" ssho helmer114 "shutdown -h now" ssho helmer115 "shutdown -h now" ssho helmer116 "shutdown -h now"

While logged in as root, I can simply do "source shutdownall.sh" and all the commands get executed without prompting me for a login password. Now I can turn my execution hosts on and off whenever I want, from wherever I want, allowing me to have them powered on only when I actually need them. Not only is my Helmer Linux cluster red, it's also green!

X. Other

The rest of the tinkering involved installing Python, numeric/scientific Python with ATLAS and LAPACK, then writing some scripts for submitting jobs -- especially array jobs -- to the SGE cluster, but this is another story. Building scientific Python from source was a bit of a pain, but still doable.
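For the curious, an SGE array job boils down to a tiny wrapper script along these lines (a generic sketch, not the actual scripts I ended up writing; "myscript.py" is a placeholder):

#!/bin/bash
#$ -cwd            # run from the submission directory
#$ -j y            # merge stderr into stdout
#$ -t 1-100        # run 100 tasks; each task sees its own SGE_TASK_ID
python myscript.py $SGE_TASK_ID

Submitting it with "qsub myarray.sh" spreads the tasks over the available slots, and "qstat" then shows the individual task IDs.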

Any comments, questions, and suggestions are welcome!

-----------------------------------------------

This page last updated Feb 15, 2011