HA NFS

<UNDER CONSTRUCTION!!!>

Notes on implementing HA NFS with Debian-based Xen paravirtual clients running under OpenSuSE 11.0.

I wanted to model an HA NFS solution using my OpenSuSE 11.0 system with Debian Xen paravirtual clients. As my home system's CPU lacks hardware virtualization support, I'm forced to use paravirtualization. The challenge was getting a Debian Xen client. I settled on installing Debian 4.0 i386 into a VMware instance and adding the Xen kernel and initrd files. I then copied the installation over to my OpenSuSE system.

On the OpenSuSE system, I created a file-based loop-back filesystem and copied the Debian root into it. I then created a file-based swap file and hand-wrote the Xen configuration file. I also copied the Debian Xen kernel and initrd files to the host filesystem. My directory structure looks like:

    • /opt/xen/images

    • /opt/xen/images/deb01 --> this is the top of my Debian Xen client directory tree

    • /opt/xen/images/deb01/root.img --> this is the client root filesystem (6GB)

    • /opt/xen/images/deb01/swap.img --> this is the client swap file (1GB)

    • /opt/xen/images/deb01/opt.img --> Work space (2GB)

    • /opt/xen/images/deb01/initrd.img-2.6.18-6-xen-686 --> Debian Xen initrd

    • /opt/xen/images/deb01/vmlinuz-2.6.18-6-xen-686 --> Debian Xen kernel
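The image setup above can be sketched as commands. This is a minimal, hedged version using sparse files; the /tmp/xen-demo path is only for illustration (the notes use /opt/xen/images/deb01):

```shell
# Scratch directory stands in for /opt/xen/images/deb01.
mkdir -p /tmp/xen-demo && cd /tmp/xen-demo

# Sparse files: count=0 with seek sets the size without writing data blocks.
dd if=/dev/zero of=root.img bs=1M count=0 seek=6144 2>/dev/null   # 6GB root
dd if=/dev/zero of=swap.img bs=1M count=0 seek=1024 2>/dev/null   # 1GB swap
dd if=/dev/zero of=opt.img  bs=1M count=0 seek=2048 2>/dev/null   # 2GB work space

mkfs.ext3 -F -q root.img    # -F: allow mkfs to run on a regular file
mkfs.ext3 -F -q opt.img
mkswap swap.img             # write a swap signature into the file
```

The root filesystem is then populated by loop-mounting root.img (mount -o loop root.img /mnt) and copying the Debian tree into it.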

Here is my Xen client configuration file, named "deb01":

memory = "256"
name = "deb01"
vif = [ 'mac=00:16:3E:17:35:50' ]
disk = [ 'file:/opt/xen/images/deb01/root.img,sda1,w', 'file:/opt/xen/images/deb01/swap.img,sdb1,w', 'file:/opt/xen/images/deb01/opt.img,sdd1,w' ]
root = "/dev/sda1 ro"
kernel = "/opt/xen/images/deb01/vmlinuz-2.6.18-6-xen-686"
ramdisk = "/opt/xen/images/deb01/initrd.img-2.6.18-6-xen-686"
boot = "n"

This client is named "deb01" and is used to create my other clients. The xen-tools package was installed on deb01 and used to create the client images. My host system uses the directory /opt/xen to hold my Xen clients and support files. This directory (/opt/xen) was NFS-exported and mounted on deb01 at /opt/xen.
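Assuming the configuration file is placed where the Xen tools can find it (e.g. copied to /etc/xen/deb01), the client is started and managed with the standard xm commands:

```shell
xm create /etc/xen/deb01 -c    # boot the domU and attach to its console
xm list                        # verify deb01 shows as running
xm shutdown deb01              # clean shutdown when done
```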

The architecture for an HA NFS system is as follows:

    • Base Debian 4.0 Xen paravirtual client with:

      • LVM2

      • Heartbeat

      • DRBD

      • NFS Kernel Server

      • ntpdate & ntpd

    • 2 Systems:

      • The names and addresses for these machines:

        • nfs1.tsand.org, 192.168.224.190/24

        • nfs2.tsand.org, 192.168.224.191/24

      • A floating address is needed to represent the NFS service. Its address is:

        • nfsc.tsand.org, 192.168.224.199/24

      • Both systems will have the following disk arrangement:

        • /dev/sda1 --> root filesystem

        • /dev/sda2 --> swap partition

        • /dev/sdb1 --> DRBD meta-disk

        • /dev/sdb2 --> LVM2 data disk 1

        • /dev/sdb3 --> LVM2 data disk 2

The DRBD software needed to be compiled against the running kernel. This requires installing the kernel headers for the running kernel, in addition to the usual kernel module build tools. I went through the module build process once, installed the drbd module, and pre-configured the NFS HA system. I then shut down the instance and tarballed the root filesystem. The xen-tools package contains the xen-create-image program, which will pre-provision an image based on the contents of a given tarball. This greatly simplifies client configuration, as you can pass system-specific data, such as the hostname and IP address, to the program and have those settings automatically applied to the newly created client image.
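On Etch, the one-time module build described above can be sketched with module-assistant. The drbd0.7 package names are an assumption (they match the 0.7-style incon-degr-cmd syntax used in the drbd.conf later in these notes), so check what your release actually ships:

```shell
# Kernel headers for the running Xen kernel, plus module build tools
apt-get install linux-headers-$(uname -r) build-essential module-assistant
apt-get install drbd0.7-module-source drbd0.7-utils

m-a prepare                      # point module-assistant at the headers
m-a auto-install drbd0.7-module  # build and install drbd.ko for this kernel
modprobe drbd                    # verify the module loads
```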

Here's my simple script to create an HA NFS Xen client image (saved as /opt/xen/bin/new_node):

#! /bin/bash
# vi:set nu ai ap aw smd showmatch tabstop=4 shiftwidth=4:
# Expected arguments:
#   $1 == FQDN of client
#   $2 == IP Address of client
#
PNAME=$(basename $0)
FQDN=${1?Must declare FQDN}
HSTNAME=${FQDN%%.*}
IP=${2?Must declare IP Address}
TMP1=$(mktemp /tmp/${PNAME}.XXXXXX)

echo ">> Execute xen-create-image"
xen-create-image \
    --force \
    --verbose \
    --arch=i386 \
    --dist=etch \
    --fs=ext3 \
    --image=sparse \
    --initrd=/opt/xen/kernel/initrd.img-2.6.18-6-xen-686 \
    --kernel=/opt/xen/kernel/vmlinuz-2.6.18-6-xen-686 \
    --memory=128Mb \
    --size=1Gb \
    --swap=128Mb \
    --tar=/opt/xen/tar/deb4_minimal.tgz \
    --gateway=192.168.224.1 \
    --netmask=255.255.255.0 \
    --ip=$IP \
    --dir=/opt/xen \
    --role=builder \
    --hostname=$FQDN
#######
cp /etc/xen/${FQDN}.cfg /opt/xen/domains/${FQDN}/${HSTNAME}.cfg
cd /opt/xen/domains/$FQDN
sed "s/^name .*\$/name = '$HSTNAME'/g" ${HSTNAME}.cfg > $TMP1
sed "s/^vif .*\$/vif = [ 'mac=$(/opt/xen/bin/new_mac),ip=$IP' ]/g" $TMP1 > ${HSTNAME}.cfg

exit 0
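The hostname derivation and the sed rewrite at the end of the script can be exercised on their own; the sample cfg content below is a made-up stand-in for what xen-create-image writes:

```shell
FQDN=nfs1.tsand.org
HSTNAME=${FQDN%%.*}                # strips everything from the first dot on

# Hypothetical miniature of /etc/xen/<fqdn>.cfg
cat > /tmp/sample.cfg <<'EOF'
name    = 'nfs1.tsand.org'
vif     = [ '' ]
EOF

TMP1=$(mktemp /tmp/new_node.XXXXXX)
sed "s/^name .*\$/name = '$HSTNAME'/g" /tmp/sample.cfg > $TMP1
mv $TMP1 /tmp/sample.cfg
grep '^name' /tmp/sample.cfg       # now reads: name = 'nfs1'
```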

I also used a simple bash script that calls python to generate a MAC address conforming to the Xen MAC address rules. Here is that script, located in /opt/xen/bin and called new_mac:

#! /bin/bash
python -c 'import random; r=random.randint; print "00:16:3E:%02X:%02X:%02X" % (r(0, 0x7f), r(0, 0xff), r(0, 0xff))'
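The one-liner above targets Python 2 (print statement). A quick sanity check that the output stays inside 00:16:3E, the OUI assigned to Xen, written here with print() so it also runs under Python 3:

```shell
# Same generator as new_mac, in Python 3 syntax
MAC=$(python3 -c 'import random; r=random.randint; print("00:16:3E:%02X:%02X:%02X" % (r(0, 0x7f), r(0, 0xff), r(0, 0xff)))')
# Validate: Xen OUI prefix plus three uppercase hex octets
echo "$MAC" | grep -Eq '^00:16:3E(:[0-9A-F]{2}){3}$' && echo "OK: $MAC"
```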

I mentioned before that I tarballed the deb01 instance's root disk. This tarball is passed to the xen-create-image script, which speeds up the creation of new clients. It also reduces configuration time, since the drbd module is already compiled and installed. The only thing left to do once the script finishes is to start up the new client.

My configuration for DRBD is:

    • /etc/drbd.conf

resource r0 {
    protocol C;
    incon-degr-cmd "halt -f";

    disk {
        on-io-error panic;
    }

    syncer {
        rate 10M; # Note: 'M' is MegaBytes, not MegaBits
    }

    on nfs1.tsand.org {
        device    /dev/drbd0;
        disk      /dev/vg0/lv1;
        address   192.168.224.190:7789;
        meta-disk /dev/sdb1[0];
    }

    on nfs2.tsand.org {
        device    /dev/drbd0;
        disk      /dev/vg0/lv1;
        address   192.168.224.191:7789;
        meta-disk /dev/sdb1[0];
    }
}
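drbd.conf alone does not bring the device up; a first-bring-up sketch follows. The exact commands vary by DRBD version: the `--do-what-I-say` force-primary flag is 0.7-era (8.x renamed it `--overwrite-data-of-peer` and also requires `drbdadm create-md r0` on both nodes first), so treat this as an assumption to check against drbdadm(8):

```shell
# On both nodes:
/etc/init.d/drbd start          # load the module, attach and connect r0

# On nfs1 only: force primary once so the initial full sync runs nfs1 -> nfs2
drbdadm -- --do-what-I-say primary r0

cat /proc/drbd                  # watch connection state and sync progress
```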

    • I used LVM2 as the backing store for /dev/drbd0 because I wanted to be able to resize the filesystem.

    • The steps I used to create the LVM2 volume:

      1. pvcreate /dev/sdb2

      2. vgcreate vg0 /dev/sdb2

      3. lvcreate -L 128M -n lv1 vg0

    • I can always add the other physical volume (/dev/sdb3) at a later time with:

      • vgextend vg0 /dev/sdb3

    • Notice that I put the meta-disk data for DRBD onto a separate disk. I wanted to be able to resize the lv1 volume, and if I had specified "internal" for the meta-disk, the meta-data would have been placed at the end (the last 128MB) of lv1. Since lv1 will hold a filesystem, I didn't want to risk corrupting it or dealing with filesystem rebuilds. To keep things simple, I just used a separate meta-disk.

    • As for the filesystem on lv1, I needed something that allows online resizing. Some filesystems allow online resizing; ext3 is NOT one of them. Correction: I've found that with various Linux releases, some support online resize with ext3 while others do not. It could have been related to the Debian Etch release (4.0) I was running. Reiserfs, however, does allow online grow, though not shrink. I find that acceptable, as you will typically not shrink a filesystem. I want to emphasize that this is a nice feature: I will be NFS-exporting the /dev/vg0/lv1 volume, and I can grow the filesystem WHILE NFS clients have it mounted, and they will pick up the new filesystem size on the fly. This is important, since I am building a highly available NFS server.

    • Create a reiserfs filesystem on the volume. Since DRBD is active, you put the filesystem on the /dev/drbd0 device, as follows:

      • mkreiserfs /dev/drbd0
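The online grow described above then becomes a three-step sketch, run on the current primary (the LV must also be grown on the peer node so it can hold the replicated data):

```shell
lvextend -L +128M /dev/vg0/lv1    # grow the backing LV (repeat on the peer)
drbdadm resize r0                 # let DRBD adopt the new device size
resize_reiserfs /dev/drbd0        # grow reiserfs online; NFS clients follow along
```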

    • Now for the heartbeat software, here are my configuration files:

    • /etc/ha.d/ha.cf:

logfacility local0
keepalive 2
deadtime 10
bcast eth0
node nfs1.tsand.org nfs2.tsand.org
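Heartbeat also requires /etc/ha.d/authkeys (not shown in these notes) and refuses to start without it; a minimal example follows, with a placeholder secret. The file must be identical on both nodes and mode 600:

```
auth 1
1 sha1 ReplaceWithASharedSecret
```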

    • /etc/ha.d/haresources:

nfs1.tsand.org IPaddr::192.168.224.199/24/eth0 \
    drbddisk::r0 \
    Filesystem::/dev/drbd0::/data::reiserfs \
    killnfsd \
    nfs-common \
    nfs-kernel-server \
    Notice \
    Delay::3::0

    • NOTE: Some of the online tutorials I found that outline this process put the IPaddr entry at the end of the haresources line. This causes problems for portmap, and you'll see errors in syslog like "error -5 cannot contact portmap". The IPaddr entry must come first in the list of take-over tasks for heartbeat.

    • /etc/ha.d/resource.d/killnfsd:

killall -9 nfsd ; exit 0

    • Since heartbeat now starts and stops the NFS daemons, we need to remove the NFS services from the normal boot sequence:

    • update-rc.d -f nfs-kernel-server remove

    • update-rc.d -f nfs-common remove

    • We need to update STATDOPTS in /etc/default/nfs-common to set the name for the NFS service. We'll use the hostname of the floating address (nfsc) for this:

    • /etc/default/nfs-common:

# If you do not set values for the NEED_ options, they will be attempted
# autodetected; this should be sufficient for most people. Valid alternatives
# for the NEED_ options are "yes" and "no".

# Options for rpc.statd.
# Should rpc.statd listen on a specific port? This is especially useful
# when you have a port-based firewall. To use a fixed port, set this
# variable to a statd argument like: "--port 4000 --outgoing-port 4001".
# For more information, see rpc.statd(8) or http://wiki.debian.org/?SecuringNFS
STATDOPTS="-n nfsc"

# Some kernels need a separate lockd daemon; most don't. Set this if you
# want to force an explicit choice for some reason.
NEED_LOCKD=

# Do you want to start the idmapd daemon? It is only needed for NFSv4.
NEED_IDMAPD=

# Do you want to start the gssd daemon? It is required for Kerberos mounts.
NEED_GSSD=
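One piece these notes don't show yet is /etc/exports, which must be kept identical on both nodes. A hedged sketch for the /data mount point (the subnet is inferred from the addresses above; the fsid= option pins the NFS file handle so clients don't get stale handles after a take-over):

```
/data 192.168.224.0/24(rw,sync,fsid=1,no_subtree_check)
```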