Installing Debian Linux, DRBL and the Sun Grid Engine (SGE)

This page documents how I installed Debian Linux, DRBL (Diskless Remote Boot in Linux) and the Sun Grid Engine (SGE) on my homemade "Helmer" Linux cluster in January, 2011. Before I begin, let me give my warm thanks to Padraig Kitterick for his awesome post on configuring a diskless SGE cluster. I must have visited his page a hundred times while setting up my own cluster. The information provided by Padraig is much too valuable to be lost in case this guy (or girl?) ever decides to stop blogging and his site disappears, so any info that I borrowed from Padraig, I will repeat here on this page for the sake of posterity.

Note that I had almost zero admin experience with Linux prior to this, and went through a lot of trial and error -- an important skill for survival in the wilderness of Geekdom -- before I got everything working. This means there is a lot of "this-worked-but-don't-ask-me-why" kind of stuff here, so please don't try to spank me if you find something stupid, but do kindly send me any suggestions/comments (then again, I may not mind the "spanking" part if you are or look like Kate Winslet). I first started with just two motherboards, using one as a master host and the other as an execution host. Once I figured everything out, I bought the rest of the equipment to put the whole cluster together. 


I chose Debian Linux for its stability. My master host has no CD/DVD-ROM or floppy disk drive, and I intend to keep it that way. To install Linux, I used UNetbootin to create a bootable USB flash drive, and booted my master host from the flash drive. The installation instructions were pretty straightforward, and NFS could be conveniently included in the installation.

However, one problem I encountered was that the installation program kept installing the kernel onto my flash drive instead of the local hard drive. I managed to work around this by returning to the main menu at the disk partition stage, unplugging my USB flash drive, then starting over from the "detect disk" stage. At this point, the installer had no choice but to install the kernel on my hard drive, since there was nowhere else to install it to!

When I did this, however, the installer designated my disks as "sdb" and "sdc" instead of "sda" and "sdb", which caused a problem later on (it was easy to correct, as described below). My disks were partitioned as follows:
   RAID0 device #0 - 480.1 GB software RAID device
   #1            480.1 GB     F   ext3 /
   scsi6 (0,0,0) (sdb) - 250.1 GB ATA ST3250318AS
   #1 primary     10.0 GB  B  F   ext2 /boot
   #5 logical    240.1 GB     K   raid
   scsi7 (0,0,0) (sdc) - 250.1 GB ATA ST3250410AS
   #6 logical    240.1 GB     K   raid
   #5 logical     10.0 GB     F   swap swap
Following some advice found on the Web, I set aside 10 GB in the first disk to store the boot information. The remaining 240 GB was used in conjunction with the other drive as a RAID0 array. The drive space needs to be equal for both drives in the RAID0 array, so I used the remaining 10 GB in the second drive as swap space. I don't know if this is a good way to partition Linux drives; I just read about someone who did this, and tried the same, and it worked, so there!

Another thing to note: For some reason, the "/boot" directory had to be installed on the first disk (sdb), not the second disk (sdc), else the system wouldn't boot, regardless of the BIOS setting.

Now for the "sdb" instead of "sda" problem. Upon booting the system, I would keep getting an error like
   fsck.ext2: attempt to read block from filesystem resulted in short read 
   while trying to open /dev/sdb1 could this be a zero-length partition?
The OS would still boot and give me a command prompt, so I could easily fix this by correcting the /etc/fstab file. Essentially, the contents of "/etc/fstab" should be consistent with the output of "fdisk -l". On my system, "fdisk -l" gives me:
   Disk /dev/sda: 250.0 GB, 250059350016 bytes
   255 heads, 63 sectors/track, 30401 cylinders
   Units = cylinders of 16065 * 512 = 8225280 bytes
   Disk identifier: 0x518a5189

      Device Boot      Start         End      Blocks   Id  System
   /dev/sda1   *           1        1216     9767488+  83  Linux
   /dev/sda2            1217       30401   234428512+   5  Extended
   /dev/sda5            1217       30401   234428481   fd  Linux raid autodetect

   Disk /dev/sdb: 250.0 GB, 250059350016 bytes
   255 heads, 63 sectors/track, 30401 cylinders
   Units = cylinders of 16065 * 512 = 8225280 bytes
   Disk identifier: 0x00027204

      Device Boot      Start         End      Blocks   Id  System
   /dev/sdb1               1       30401   244196001    5  Extended
   /dev/sdb5               1        1216     9767457   82  Linux swap / Solaris
   /dev/sdb6            1217       30401   234428481   fd  Linux raid autodetect

   Disk /dev/md0: 480.1 GB, 480109395968 bytes
   2 heads, 4 sectors/track, 117214208 cylinders
   Units = cylinders of 8 * 512 = 4096 bytes
   Disk identifier: 0x00000000

   Disk /dev/md0 doesn't contain a valid partition table
I edited "/etc/fstab" so that it now gives me:
   # /etc/fstab: static file system information.
   #
   # <file system> <mount point>   <type>  <options>       <dump>  <pass>
   proc            /proc           proc    defaults        0       0
   /dev/md0        /               ext3    errors=remount-ro 0       1
   /dev/sda1       /boot           ext2    defaults        0       2
   /dev/sdb5       swap            swap    sw              0       0
   /dev/sdc1       /mnt/usb        auto    noauto,user,ro  0       0
I also added the entry for "/dev/sdc1" above for mounting USB hard drives.
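Since the fix boils down to making "/etc/fstab" agree with the device names the kernel actually assigned, a quick sanity check along the following lines might catch a mismatch before the next reboot. This is only a sketch: the function name "check_fstab_devices" is made up for illustration, and all it does is test whether each "/dev/..." entry in the file exists.

```shell
# List /dev/... entries in an fstab-style file whose device node is missing.
# Hypothetical helper; usage: check_fstab_devices /etc/fstab
check_fstab_devices() {
    fstab=${1:-/etc/fstab}
    # Skip comment lines; keep only first fields that start with /dev/.
    awk '!/^[ \t]*#/ && $1 ~ /^\/dev\// { print $1 }' "$fstab" |
    while read -r dev; do
        [ -e "$dev" ] || echo "missing: $dev"
    done
}
```

With the fstab shown above, the "/dev/sdc1" entry would be reported as missing whenever no USB drive is plugged in, which is expected given its "noauto" option.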


Installing SSH on the master host is as easy as pie. Logged in as root, I just did
   apt-get install ssh
Next, I added myself as a samba user by:
   smbpasswd -a woojay
I also edited the "/etc/samba/smb.conf" file so that I can write files on my home directory from Windows:
   [homes]
      read only = no
Restarted samba by:
   /etc/init.d/samba restart
Now, for some reason, I found that I had to configure samba in order to be able to SSH to my master host using its machine name instead of IP address. Don't ask me why. Just don't.

With samba enabled, I can right-click on "My Computer" on my Windows XP machine and select the "map network drive" option to map a drive to my home directory on the master host. With SSH enabled, I can do everything from this point on using a PuTTY terminal window on my Windows machine. No more sitting in the closet for me!


The master host has two network interfaces: eth0, which is connected to the home network router (and, through it, to my Windows PC and the Internet), and eth1, which is connected to the Gigabit switch and the 6 execution hosts.

I assigned 192.168.1.x to be all the stuff on my home network, and 192.168.2.x to be the internal network used by the Linux cluster. I set the address of eth0 to 192.168.1.200, for which the gateway is my router, 192.168.1.1. The address of eth1 is set to 192.168.2.1, and the host itself is also the gateway for this address range. All this is done by editing the "/etc/network/interfaces" file, which looks like this:
   # This file describes the network interfaces available on your system
   # and how to activate them. For more information, see interfaces(5).

   # The loopback network interface
   auto lo
   iface lo inet loopback

   # The primary network interface
   allow-hotplug eth0
   #iface eth0 inet dhcp
   iface eth0 inet static
    address 192.168.1.200
    netmask 255.255.255.0
    broadcast 192.168.1.255
    gateway 192.168.1.1
    dns-domain lan
    dns-nameservers 192.168.1.1

   allow-hotplug eth1
   iface eth1 inet static
    address 192.168.2.1
    netmask 255.255.255.0
    gateway 192.168.2.1
    metric 100 # prevent eth1 from being first default gateway
The trick above is to set the "metric" variable of eth1 to some high number so that it never becomes the default gateway over eth0. When I type "route" as root, I get the following output:
   Kernel IP routing table
   Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
   192.168.2.0     *               255.255.255.0   U     0      0        0 eth1
   192.168.1.0     *               255.255.255.0   U     0      0        0 eth0
   default         192.168.1.1     0.0.0.0         UG    0      0        0 eth0
   default         helmer-eth1     0.0.0.0         UG    100    0        0 eth1
Without the metric variable set, helmer-eth1 (or 192.168.2.1) sometimes appears above 192.168.1.1 in the list above. In that case, the master host becomes unable to access the Internet because the OS thinks that 192.168.2.1 -- which is just connected to the internal switch -- is also the gateway to the Internet.
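When two default routes exist, the kernel prefers the one with the lower metric. The sketch below applies that rule to "route"-style output read from standard input, so the winner can be checked at a glance. The function name "preferred_default_route" is hypothetical, and the column positions are assumed to match the listing above.

```shell
# Read `route`-style output on stdin and print "<gateway> <iface>" for the
# default route with the lowest metric (column 5). Ties keep the first seen.
# Accepts both "default" and the "0.0.0.0" form printed by `route -n`.
preferred_default_route() {
    awk '$1 == "default" || $1 == "0.0.0.0" {
             if (!found || $5 + 0 < best) {
                 best = $5 + 0; gw = $2; ifc = $8; found = 1
             }
         }
         END { if (found) print gw, ifc }'
}
```

Running "route | preferred_default_route" on the master host should then report the router on eth0 if the metric setting has taken effect.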


Nothing too complicated here; I just followed the instructions on the DRBL website, choosing the Single System Image (SSI) mode as instructed by Padraig. Here's what my "/etc/hosts" file looked like after installation:
   127.0.0.1       localhost
   192.168.1.200   helmer
   # The following lines are desirable for IPv6 capable hosts
   ::1     localhost ip6-localhost ip6-loopback
   fe00::0 ip6-localnet
   ff00::0 ip6-mcastprefix
   ff02::1 ip6-allnodes
   ff02::2 ip6-allrouters
   ff02::3 ip6-allhosts
   192.168.2.1 helmer-eth1
   192.168.2.11 helmer111
   192.168.2.12 helmer112
   192.168.2.13 helmer113
   192.168.2.14 helmer114
   192.168.2.15 helmer115


I basically followed the installation instructions on the Grid Engine website to install qmaster via "./inst_sge -m". I used Padraig's specifications, which I am going to quote here:
  • Install as root user: You don't have to do this, but it simplifies the process of getting SGE to run on the DRBL nodes. Please note that in recommending this, I am presuming that your cluster network is private, and the nodes won’t be accessible by non-privileged users (i.e. not you).
  • Do opt to verify file permissions
  • Select to use Berkeley DB, but without a spool server
  • Use the ID range suggested in the manual: 20000-20100
  • Accept to install startup scripts
  • Accept to load a file which contains the hostnames of your nodes. Here you enter the full path to the file you created before running the install script.
  • Use normal scheduling
I will just add that if the installer asks if you want to enable a JMX MBean server, you can answer no.

After installation, I ran:
   source /opt/oge/default/common/settings.sh
to configure various environment variables. I also added this command to my .bashrc file.

The "qconf -sh" command gave me:
   helmer
   helmer111
   helmer112
   helmer113
   helmer114
   helmer115
   helmer116
Above, "helmer" is my master host, and "helmer111" to "helmer116" are the dedicated execution hosts.

I then installed an execution host on the master host using "./inst_sge -x". Again, I used the following specifications from Padraig:
  • Specify a local spool dir: /var/tmp/spool
  • Accept to install startup scripts
The next step was to configure the default "all.q" queue, allotting 2 slots to "helmer", and 4 slots each to "helmer111"~"helmer116". I also wanted the default shell for jobs to be bash. This was done by executing "qconf -mq all.q" and editing the following lines:
   slots    26,[helmer=2],[helmer111=4],[helmer112=4],[helmer113=4], \
            [helmer114=4],[helmer115=4],[helmer116=4]
   tmpdir   /tmp
   shell    /bin/bash
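The 26 in the "slots" line matches the sum of the per-host values (2 + 6 x 4). The sketch below adds up the bracketed entries of such a value so the arithmetic can be re-checked after editing the queue. The helper name "total_slots" is made up. As far as I can tell from queue_conf(5), the leading number acts as the default for hosts that have no bracketed entry rather than as a cluster-wide total, so the helper ignores it.

```shell
# Sum the per-host overrides in an SGE "slots" value such as
# "26,[helmer=2],[helmer111=4]". The leading number is not counted.
total_slots() {
    printf '%s\n' "$1" |
    grep -o '=[0-9][0-9]*' |     # pick out each "=N" from the [host=N] entries
    tr -d '=' |
    awk '{ sum += $1 } END { print sum + 0 }'
}
```

For example, total_slots "26,[helmer=2],[helmer111=4],[helmer112=4]" prints 10.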
I don't remember if I had to set all the execution hosts as administrative hosts, but here's the command for it anyway (do the same for helmer112 ~ helmer116):
   qconf -ah helmer111
Each execution host could be configured the same way as the master host. I simply ran the following commands (Padraig actually provides some scripts to automate this process, but I just ran them manually because they were simple enough):
   qconf -sconf helmer > helmer.conf
   cp helmer.conf helmer111
   cp helmer.conf helmer112
   cp helmer.conf helmer113
   cp helmer.conf helmer114
   cp helmer.conf helmer115
   qconf -Aconf helmer111
   qconf -Aconf helmer112
   qconf -Aconf helmer113
   qconf -Aconf helmer114
   qconf -Aconf helmer115
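The repeated "cp" and "qconf -Aconf" commands above could also be written as a loop. What follows is a minimal sketch, not Padraig's script: the names "run" and "clone_sge_conf" and the DRY_RUN variable are invented here, and the only SGE call used is the "qconf -Aconf" shown above.

```shell
# Print the command when DRY_RUN is set; otherwise execute it.
run() {
    if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

# Copy a saved host configuration to one file per execution host and
# register each file with qconf, mirroring the manual commands above.
clone_sge_conf() {
    template=$1
    shift
    for host in "$@"; do
        run cp "$template" "$host"    # file is named after the host, as above
        run qconf -Aconf "$host"
    done
}
```

Setting DRY_RUN=1 first and calling, say, "clone_sge_conf helmer.conf helmer111 helmer112" prints the commands without touching the cluster, which is a cheap way to review them before running them for real.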
I designated the master host "helmer" as a submission host by:
   qconf -as helmer
Next, I configured DRBL so that each execution host will run the SGE service when booting:
   cp /opt/drbl/conf/client-extra-service.example /opt/drbl/conf/client-extra-service
   vi /opt/drbl/conf/client-extra-service
I edited "/opt/drbl/conf/client-extra-service" as follows:
   # You can put the extra service you want client to run, put them in one line.
   # The necessary services (nfs, firstboot, xfs...) for DRBL client are already
   # specified in the drblpush, so you do not have to add them.
   # Example:
   # service_extra_added="webmin apmd"
   #
   service_extra_added="sgeexecd.helmer"
I also added all the host names to the @allhosts group by running
   qconf -mhgrp @allhosts
and making the following edit:
   group_name @allhosts
   hostlist helmer helmer111 helmer112 helmer113 helmer114 helmer115
Now for the final stage, baby! Update DRBL using
   /opt/drbl/sbin/drblpush -i
or:
   /opt/drbl/sbin/drblpush -c /etc/drbl/drblpush.conf
Then reboot everything, AND WITNESS YOUR SPANKING NEW 26-CORE SGE CLUSTER IN ITS FULL GLORY. You can check that everything is up and running using the "qstat -f" and "qhost" commands.


To directly login to each execution host (for whatever reason) from the master host, I had to add some options to the "ssh" command. I aliased the whole command as "ssho" to avoid having to type it every time:
   alias ssho='ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'
To SSH to "helmer111", I would then do
   ssho helmer111

I wanted to test the speed of the internal network for the cluster, and see how close to "Gigabit" the "Gigabit Ethernet" really is. This could be done by first installing "iperf":
   apt-get install iperf
I SSHed to "helmer112" and ran
   iperf -s
then from "helmer", I ran
   iperf -c helmer112
I got the following output:
   Client connecting to helmer112, TCP port 5001
   TCP window size: 16.0 KByte (default)
   ------------------------------------------------------------
   [  3] local 192.168.2.1 port 37762 connected with 192.168.2.12 port 5001
   [ ID] Interval       Transfer     Bandwidth
   [  3]  0.0-10.0 sec    963 MBytes    807 Mbits/sec
807 Mbits/sec, or roughly 80% of the nominal 1 Gbit/sec, seems close enough to "Gigabit" for me.
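The Transfer and Bandwidth columns can be cross-checked with a little arithmetic. The sketch below assumes iperf's usual units, where MBytes are counted in multiples of 2^20 bytes and Mbits in multiples of 10^6 bits. The function name is made up for illustration.

```shell
# Convert a transfer in MBytes (2^20 bytes each) over a duration in seconds
# to Mbits/sec (10^6 bits each), printed to one decimal place.
mbytes_to_mbits_per_sec() {
    awk -v mb="$1" -v secs="$2" \
        'BEGIN { printf "%.1f\n", mb * 1048576 * 8 / secs / 1000000 }'
}
```

Calling "mbytes_to_mbits_per_sec 963 10" gives 807.8, in line with the reported figure. The small difference is presumably down to rounding in the displayed transfer size and interval.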


To my pleasant surprise, allowing access to the master host via SSH from outside my home network turned out to be quite simple. All I needed to do was to configure my router so that port 22 gets forwarded to my master host's eth0, 192.168.1.200, and then set up a DynDNS account to associate my home IP address with a named address. I entered the DynDNS login information into my router so that it can update the IP address linked to the named address whenever the IP address changes (since it is dynamic, not static).


Another pleasant surprise was how easy it is to remotely turn on and turn off the execution hosts using WOL (Wake On Lan) functionality. I quickly discovered that my execution hosts often run idle for long periods of time, wasting electricity, if I depend only on the physical power switches to turn them on and off. The WOL function already seemed to be built into the execution hosts' boot image, and I didn't even need to change the BIOS settings on my ASUS P8-H67M LE mobos.

First, I needed to obtain the MAC address of each execution host. This could be done by logging into each execution host, running "ifconfig", and looking for the "HWaddr" field under "eth0".
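Picking the address out by eye gets tedious with six hosts, so here is a small sketch that extracts it from "ifconfig" output. It assumes the older output format in which the interface line contains "HWaddr" followed by the address. The function name "hwaddr_of" is hypothetical.

```shell
# Print the MAC address of the given interface from old-style ifconfig
# output on stdin (the value after "HWaddr"). Default interface: eth0.
hwaddr_of() {
    awk -v ifc="${1:-eth0}" '$1 == ifc {
        for (i = 2; i < NF; i++)
            if ($i == "HWaddr") print $(i + 1)
    }'
}
```

On an execution host, "ifconfig | hwaddr_of eth0" should print just the address.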

I gathered all the MAC addresses and created the following script file "wakeall.sh" (note that the MAC addresses below are fake):
   wakeonlan -i 192.168.2.255 ab:23:45:67:89:01
   wakeonlan -i 192.168.2.255 ab:23:45:67:89:02
   wakeonlan -i 192.168.2.255 ab:23:45:67:89:03
   wakeonlan -i 192.168.2.255 ab:23:45:67:89:04
   wakeonlan -i 192.168.2.255 ab:23:45:67:89:05
   wakeonlan -i 192.168.2.255 ab:23:45:67:89:06
The trick above was to include the "-i 192.168.2.255" option, which I figured out after some trial and error, instead of using the default broadcast address "255.255.255.255". When using the default address, the "magic packets" did not seem to reach the execution hosts at all, and my guess is that the packets were sent out through eth0 (192.168.1.200), which is connected to the home network. By specifying 192.168.2.255 as the address, the packets are now correctly sent out through eth1 (192.168.2.1), which is connected to the internal switch.

Now all I need to do whenever I want to wake up the execution hosts is to simply run "./wakeall.sh". It's just so cool to sit at my desk and turn my head toward the closet (which is three feet away) to see all six LEDs on my Helmer light up all at once when I enter this single command. Of course, I could always buy a $5 set of Christmas lights to obtain the same effect, but that wouldn't be much fun, would it?
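If the list of hosts changes, keeping the MAC addresses in a separate file and looping over them may be easier to maintain than one line per host. This is a sketch under a few assumptions: the file of addresses (one per line) is hypothetical, as are the function name "wake_all" and the WOL variable, which exists only so that the command can be swapped out for testing.

```shell
# Send a Wake-on-LAN packet to each MAC address listed one per line in a
# file, using the cluster network's broadcast address by default.
wake_all() {
    macfile=$1
    bcast=${2:-192.168.2.255}
    while read -r mac; do
        case $mac in ''|'#'*) continue ;; esac   # skip blank and comment lines
        ${WOL:-wakeonlan} -i "$bcast" "$mac"
    done < "$macfile"
}
```

Setting WOL=echo before calling "wake_all" prints the arguments that would be passed to "wakeonlan" instead of sending any packets.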

To turn off the execution hosts, I created another script file "shutdownall.sh":
   ssho helmer111 "shutdown -h now"
   ssho helmer112 "shutdown -h now"
   ssho helmer113 "shutdown -h now"
   ssho helmer114 "shutdown -h now"
   ssho helmer115 "shutdown -h now"
   ssho helmer116 "shutdown -h now"
While logged in as root, I can simply do "source shutdownall.sh" and all the commands get executed without prompting me for a login password. Now I can turn my execution hosts on and off whenever I want, from wherever I want, allowing me to have them powered on only when I actually need them. Not only is my Helmer Linux cluster red, it's also green!
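The script has to be sourced because bash does not expand aliases such as "ssho" in non-interactive scripts by default. One way around this, sketched below, is to wrap the same ssh options in a shell function and loop over the hosts. The names "ssh_node" and "shutdown_all" are made up for this example, and the function is deliberately not called "ssho" so that it does not clash with the existing alias.

```shell
# Same ssh options as the "ssho" alias, written as a function so that it
# also works inside scripts.
ssh_node() {
    ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no "$@"
}

# Halt each listed host in turn. A failure on one host is reported on
# stderr and does not stop the loop.
shutdown_all() {
    for host in "$@"; do
        ssh_node "$host" "shutdown -h now" || echo "failed: $host" >&2
    done
}
```

With these two functions saved in a script followed by a line such as "shutdown_all helmer111 helmer112", the script can be run directly instead of being sourced.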


The rest of the tinkering involved installing Python, numeric/scientific Python with ATLAS and LAPACK, then writing some scripts for submitting jobs -- especially array jobs -- to the SGE cluster, but this is another story. Building scientific Python from source was a bit of a pain, but still doable.

Any comments, questions, and suggestions are welcome!

-----------------------------------------------
This page last updated Feb 15, 2011