Clustering with Pacemaker, Corosync, and DRBD on Ubuntu 10.04

Requirements

  • Root access to the server.
  • Access to an up-to-date Ubuntu repository.
  • Source code that has been patched for pengine issues.

Section 1: Set up Crossover Cable for Pacemaker and DRBD traffic

For both nodes:

  1. On the second NIC of each cluster machine, connect a crossover cable.
  2. If the second NIC is not already configured, add something similar to this to the bottom of /etc/network/interfaces (use a different address on each node):
    auto eth1
    iface eth1 inet static
            address 192.168.200.3
            netmask 255.255.255.0
      
  3. Restart the networking service: /etc/init.d/networking restart
  4. Check to see if the new interface is up: ifconfig -a
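
The stanza from step 2 can also be generated per node, which makes it harder to give both ends of the crossover link the same address by accident. This is a minimal sketch, not part of the original procedure: crossover_stanza is a hypothetical helper name, and 192.168.200.4 for the second node is an assumed value.

```shell
# Print an /etc/network/interfaces stanza for the crossover NIC.
# crossover_stanza is a hypothetical helper, not part of any package.
# Usage: crossover_stanza <address> [interface] [netmask]
crossover_stanza() {
    cs_addr="$1"
    cs_iface="${2:-eth1}"
    cs_mask="${3:-255.255.255.0}"
    printf 'auto %s\n' "$cs_iface"
    printf 'iface %s inet static\n' "$cs_iface"
    printf '        address %s\n' "$cs_addr"
    printf '        netmask %s\n' "$cs_mask"
}
```

For example, review the output of crossover_stanza 192.168.200.3 on the first node and crossover_stanza 192.168.200.4 on the second, then append it to /etc/network/interfaces by hand.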

Section 2: Pacemaker, Corosync Install and Initial Configuration

For both nodes:

  1. Update the repository DB: aptitude update
  2. Install the packages: aptitude install build-essential pacemaker corosync
  3. Pacemaker version 1.0.8+hg15494-2ubuntu2 has pengine flaws that can cause the cluster to crash.  Check which version is installed and, if needed, follow the instructions to create a Debian package from the patched source.
  4. Create /etc/corosync/corosync.conf.  (See attached example.)
  5. Make sure that the crossover connection is the first connection in the config.  This helps prevent two problems: the connection failing to come up, and both nodes being assigned the same node number based on the NIC IP address.
  6. Create /etc/corosync/authkey on one node with corosync-keygen, then copy it to all other cluster nodes.
  7. Verify that /etc/corosync/authkey has owner root, group root, and mode 400 on all nodes.
  8. Put this line in /etc/rc.local:  /usr/sbin/corosync-cfgtool -r
  9. Change the entry in /etc/default/corosync from no to yes (START=yes) to allow corosync to start at boot.
  10. Restart server: sync;sync;init 6
  11. Check cluster status: crm_mon
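
The ownership and mode check in step 7 can be scripted so it is run the same way on every node. This is a sketch only: check_authkey is a hypothetical helper name, it assumes GNU stat (as shipped with Ubuntu), and it compares numeric IDs, where 0 and 0 correspond to root and root.

```shell
# Verify owner, group and mode of the corosync key file.
# check_authkey is a hypothetical helper; it assumes GNU stat.
# Usage: check_authkey [path] [uid] [gid]
check_authkey() {
    ck_path="${1:-/etc/corosync/authkey}"
    ck_uid="${2:-0}"    # 0 = root
    ck_gid="${3:-0}"    # 0 = root
    if [ ! -f "$ck_path" ]; then
        echo "missing: $ck_path"
        return 1
    fi
    # %u = numeric owner, %g = numeric group, %a = octal mode
    ck_actual=$(stat -c '%u %g %a' "$ck_path")
    if [ "$ck_actual" = "$ck_uid $ck_gid 400" ]; then
        echo "ok: $ck_path"
    else
        echo "unexpected: $ck_path is '$ck_actual', wanted '$ck_uid $ck_gid 400'"
        return 1
    fi
}
```

Running check_authkey with no arguments on each node checks /etc/corosync/authkey against root, root, and 400.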

Section 3: DRBD and Filesystem Configuration

  1. Create a partition, leaving space at the end of the disk for metadata: cfdisk /dev/<disk>
  2. Update the repository DB: aptitude update
  3. Install Linux headers: aptitude install linux-headers-<kernel-rev-server>
  4. Install DRBD: aptitude install drbd8-utils
  5. Edit the DRBD config: vi /etc/drbd.conf
  6. Update the resources ("r0", "r1", etc.) with the required information: the hostname of each node, the IP address assigned to each end of the crossover link, and the device that is to be mirrored.  Make sure that the metadata type is correct.
  7. Create the metadata for the resource - both nodes: drbdadm create-md <resource_name or all>
  8. Attach the resource - both nodes: drbdadm attach <resource_name or all>
  9. Connect the resources - both nodes: drbdadm connect <resource_name or all>
  10. chgrp haclient /sbin/drbdsetup 
  11. chmod o-x /sbin/drbdsetup 
  12. chmod u+s /sbin/drbdsetup 
  13. chgrp haclient /sbin/drbdmeta 
  14. chmod o-x /sbin/drbdmeta 
  15. chmod u+s /sbin/drbdmeta
  16. Decide which node is primary - this will be the one holding current data that needs to be retained, if any.
  17. Start synchronization - primary node only: drbdadm -- --overwrite-data-of-peer primary <resource_name or all>
  18. Check synchronization - both nodes: while [ 1 ]; do cat /proc/drbd; sleep 5; done
  19. On one node make the disk set(s) available: drbdadm primary all
  20. Create the filesystem on that node: mkfs -t ext4 /dev/drbd<x>
  21. Create the mount point: mkdir </mountpoint>
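
Instead of watching /proc/drbd by eye in step 18, the wait can be scripted. The sketch below assumes the DRBD 8.x status layout, in which each resource line contains a cs: field and a ds: field, and treats ds:UpToDate/UpToDate on every resource line as "synchronized". drbd_all_synced is a hypothetical helper name.

```shell
# Succeed only if the status file has at least one resource line and
# every resource line reports ds:UpToDate/UpToDate.
# drbd_all_synced is a hypothetical helper; the /proc/drbd layout it
# expects (one " <n>: cs:... ds:..." line per resource) is an assumption.
# Usage: drbd_all_synced [status_file]
drbd_all_synced() {
    awk '
        / cs:/ {
            seen = 1
            if ($0 !~ /ds:UpToDate\/UpToDate/) bad = 1
        }
        END { exit (seen && !bad) ? 0 : 1 }
    ' "${1:-/proc/drbd}"
}
```

A wait loop could then read: until drbd_all_synced; do sleep 5; done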

Section 4: Configure IPMI (via ILO on HP DL360 G6s Running Ubuntu)

  1. Choose an IP address for the ILO interface.
  2. Connect the ILO interface to the network.
  3. While the server is booting, press a key to see the boot messages when prompted.
  4. Press F8 to enter the ILO configuration when prompted.
  5. Arrow over to the User menu; create a new user and password with full access, then save.
  6. Arrow over to Network, then to DNS/DHCP.
  7. Change DHCP Enable to OFF by pressing the spacebar.
  8. Exit ILO, allow server to boot.
  9. Reboot server - when server is coming back up, go back to the ILO configuration.
  10. Select network, put in the IP address, netmask, and gateway.
  11. Exit ILO, allow server to boot.
  12. Insert the HP Agent CD-ROM.
  13. mount /dev/cdrom /cdrom/
  14. apt-cdrom -m -d=/cdrom add
  15. aptitude update
  16. Install HP Agents: aptitude install hpacucli hp-health hponcfg hp-snmp-agents hpsmh
  17. Install IPMI packages: aptitude install ipmitool openipmi openhpi-plugin-ipmidirect openhpi-plugin-ipmi libopenipmi0
  18. If you see this error, follow the next step:
    ipmievd: using pidfile /var/run/ipmievd.pid0
    Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
    Unable to open interface
    invoke-rc.d: initscript ipmievd, action "start" failed.
    Unable to start ipmievd during installation.  Trying to disable.
  19. Download ipmi.init.basic (available on this page), and run it: ./ipmi.init.basic
  20. Test: /usr/bin/ipmitool -I lanplus -U <ILO_USERNAME> -H <IP_to_ILO> -a chassis power status
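
The test in step 20 can be turned into a scripted check by reading the status line that ipmitool prints. This is a sketch under the assumption that the command reports "Chassis Power is on" or "Chassis Power is off". power_state is a hypothetical helper name.

```shell
# Read "chassis power status" output on stdin and print on, off or unknown.
# power_state is a hypothetical helper; the two status strings matched
# below are the ones ipmitool is assumed to print.
power_state() {
    while IFS= read -r ps_line; do
        case "$ps_line" in
            *"Chassis Power is on"*)  echo on;  return 0 ;;
            *"Chassis Power is off"*) echo off; return 0 ;;
        esac
    done
    echo unknown
    return 1
}
```

Usage: /usr/bin/ipmitool -I lanplus -U <ILO_USERNAME> -H <IP_to_ILO> -a chassis power status | power_state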

Section 5: Create Pacemaker Resources

Configure STONITH Resources


This is an example for IPMI over HP ILO.
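  1. Start the CRM shell and enter configuration mode:
    crm
    configure
  2. Define the STONITH primitive.  This is a single command, continued across lines with backslashes:
    primitive <stonith_server-1> stonith:external/ipmi \
            params hostname="<LNX-SERVER-1>" ipaddr="<191.168.1>" userid="<stonith>" passwd="<yourpass>" interface="lanplus" \
            op monitor interval="15" timeout="15"
  3. Allow the resource to run only on <LNX-SERVER-2>:
    location <stonith_server-1_loc> <stonith_server-1> rule -inf: \#uname ne <LNX-SERVER-2>
  4. If there are no errors, exit and answer yes when asked to commit the changes.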
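
The example above covers one server. A two-node cluster would presumably need a mirrored definition for the second server as well; that is an assumption, since the source shows only the first. The sketch below prints the crm configure lines for one target node so that both definitions can be generated consistently and then entered at the configure prompt. stonith_stanza and the stonith_<node> naming scheme are hypothetical.

```shell
# Print crm configure lines for a STONITH resource that fences <target>
# and is only allowed to run on <peer>.
# stonith_stanza and the resource names it builds are hypothetical.
# Usage: stonith_stanza <target_node> <peer_node> <ilo_ip> <userid> <passwd>
stonith_stanza() {
    st_target="$1"; st_peer="$2"; st_ip="$3"; st_user="$4"; st_pass="$5"
    # In an unquoted here-document, \\ yields one literal backslash.
    cat <<EOF
primitive stonith_${st_target} stonith:external/ipmi \\
        params hostname="${st_target}" ipaddr="${st_ip}" userid="${st_user}" passwd="${st_pass}" interface="lanplus" \\
        op monitor interval="15" timeout="15"
location stonith_${st_target}_loc stonith_${st_target} rule -inf: \\#uname ne ${st_peer}
EOF
}
```

Call it once per node with the arguments swapped, for example stonith_stanza <LNX-SERVER-1> <LNX-SERVER-2> <IP_to_ILO> <stonith> <yourpass>, and review the output before entering it.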

Configure 2 Node Cluster to Fail Over Properly via CRM shell
  1. crm
  2. configure
  3. property no-quorum-policy=ignore 
  4. commit

Configure DRBD Resource
  1. Remove /etc/rc scripts for DRBD: update-rc.d -f drbd remove
  2. Start the CRM shell and enter configuration mode:
    crm
    configure
  3. Define the DRBD primitive.  This is a single command, continued across lines with backslashes:
    primitive <drbd0_rsc> ocf:linbit:drbd \
            params drbd_resource="<filesystem_name>" \
            op monitor interval="15s" \
            op start timeout="240" \
            op stop timeout="100"
  4. Define the master/slave resource:
    ms <ms-drbd0> <drbd0_rsc> \
            meta master-max="1" master-node-max="1" \
            clone-max="2" clone-node-max="1" \
            notify="true"
  5. Prefer <SERVER-1> for the master role:
    location <ms-drbd0-master-on-SERVER-1> <ms-drbd0> \
            rule $id="<ms-drbd0-master-on-SERVER-1-rule>" $role="master" 100: #uname eq <SERVER-1>
  6. If there are no errors, exit and answer yes when asked to commit the changes.


Attachments

  • corosync.conf (2k), Cubicle Graffiti, May 9, 2011, 1:36 PM
  • ipmi.init.basic (0k), Cubicle Graffiti, Sep 13, 2011, 12:53 PM