Software Appliances

<under construction>

A software appliance is essentially a system image designed to run on a virtual machine. There are already several web sites offering pre-configured images built to run under VMware, Xen, and VirtualBox. The objective is to make setting up a system as fast and as simple as possible. I've created several "software appliance" images, each with a particular purpose. For example:

    • subversion and cvs server

    • web (apache2) cluster

    • nfs HA server

    • mysql HA server

    • configuration management HA server (puppet)

    • HA load balancer (pound)

What's particularly nice about a software appliance is that it can easily be imaged onto a physical server. This fits well with the idea of turning a dedicated-purpose computer into an appliance, something like being able to buy a "web-server appliance" from your local computer store. Cisco is already looking into creating a general-purpose "appliance". My goal is to go a step further and create a physical appliance to host virtual appliances. Imagine being able to buy an off-the-shelf "box" that has a CompactFlash or USB slot. You purchase the "personality image" you want (e.g. web-cluster) on a CF card and just plug it into your "box". Neat, huh?

So the first step is to create the software appliance. The openSUSE group has created an exceptional tool for doing just that: check out the SuSE Studio website, http://susestudio.com/. A couple of years back, SuSE introduced the concept of software "stacks", which is essentially the same thing. Red Hat is just starting to look at this, but is far behind what SuSE offers with SuSE Studio. You can create an image that is virtual (e.g. Xen), physical (hard drive), or fits on a USB stick. You can select which packages you want, and you can customize it with your own banners and images in addition to the software configuration. And, as a bonus, if you create a virtual image, SuSE Studio can "host" and run the image for you; nice for evaluation and testing.

But I want more than just a simple software appliance running on a single server. I need clustering and high availability. So I've been working with Heartbeat2, glusterfs, drbd8 and Debian under Xen to create some models of what I consider useful appliances.

I'll be adding more information about software appliances, but here's a diagram of a simple web cluster, with a MySQL server, Pound load balancer, and Puppet configuration manager, all using glusterfs as a RAID-1 n-way mirrored cluster filesystem.

Basic Design:

    • 3 virtual nodes.

    • Each VM is Debian 5.0 (lenny), with the "experimental" repository added for the latest version of puppetmaster.

    • Each VM is a Xen Paravirtual image.

    • Each image has 4 disks: disk.img (2GB, root), sdb1.img & sdb2.img (LVM2 PVs, 2GB each), and swap.img (128MB swap).

    • Each VM has 256MB ram.

    • Each VM has 1 NIC. I also installed 802.1Q VLAN/trunking support. The network architecture above is flat; my latest design separates the traffic using VLANs, with 4 VLANs configured.

    • The base VMs are n1.tsand.org, n2.tsand.org & n3.tsand.org

    • The base VMs run a heartbeat2 cloned service, apache2, which means apache2 is expected to run on all VMs (a configuration sketch follows this list).

    • Three floating (singleton) services move around the cluster: puppetmaster, pound & mysql.

    • Heartbeat2 handles service failover & monitoring. There's also a nice heartbeat2-gui available for GUI-based management.

    • The cluster filesystem (glusterfs) is a RAID-1 with 3-way mirroring. The backing store is LVM2, which I chose for ease of growing filesystems. That's the nice thing about glusterfs: it sits on top of your existing filesystems and disk structures. Here's a nice article about glusterfs in Linux Magazine: http://www.linux-mag.com/id/7833/

    • I've put the apache2, mysql, pound, and puppet configurations on gluster filesystems, so every node can see every configuration and I have no need to "fail over" a filesystem. Nice! All gluster filesystems mount on all nodes, always, regardless of which services run on a given node.

    • I also installed dsh (dancer shell/distributed shell) across all nodes and exchanged root public keys. Ya, I know, root ssh. The key to securing that is to use /etc/security/access.conf to restrict exactly where root can ssh from, which helps tighten up security. As long as root can only ssh between the virtual nodes, all is good. Then you just create an admin account that can ssh in from outside and su to root. Dsh is nice because it executes commands across a cluster of systems; excellent for installing packages, etc.

    • I also installed webmin and created a special webmin "admin" account for external web-based management. My goal is to create an appliance, and most users are accustomed to using a web browser to configure their home wifi, etc. Webmin isn't a perfect solution, but it gets me closer to secure web-based administration.

    • I also installed monit for monitoring. I still need to work out the details of exactly what to monitor; I don't want to monitor a service (such as mysql) on a node that isn't running it. Still working on that task.

    • And finally, I'm able to run all this on a little piece-of-junk eMachines Sempron 3000+ with only 2GB of RAM and a 500GB IDE disk. Not real fast, but I don't have clock-drift issues, and performance is reasonable for prototyping & proof-of-concept.
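To make the heartbeat2 items above a bit more concrete, here is a rough sketch of the two halves of that configuration: the cluster membership file (/etc/ha.d/ha.cf) and a cloned apache2 resource in the CIB. This is an illustration of the general shape only; the resource IDs, timeouts and attribute values below are made up, not lifted from my running cluster.

# /etc/ha.d/ha.cf (sketch)
autojoin none
bcast eth0
node n1.tsand.org n2.tsand.org n3.tsand.org
crm on

<!-- cloned apache2 resource, for the resources section of the CIB -->
<clone id="apache2_clone">
    <instance_attributes id="apache2_clone_attrs">
        <attributes>
            <nvpair id="apache2_clone_max" name="clone_max" value="3"/>
            <nvpair id="apache2_clone_node_max" name="clone_node_max" value="1"/>
        </attributes>
    </instance_attributes>
    <primitive id="apache2" class="ocf" provider="heartbeat" type="apache">
        <operations>
            <op id="apache2_monitor" name="monitor" interval="30s" timeout="20s"/>
        </operations>
    </primitive>
</clone>

Something like this can be loaded with cibadmin (e.g. cibadmin -C -o resources -x apache2_clone.xml) or built interactively through the heartbeat2-gui.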

Next Version:

This next diagram depicts my further design changes to the web-cluster architecture:

This diagram represents a view of the 3 virtual nodes used to implement the cluster:

This diagram shows the disks allocated on the 3 virtual nodes:

The following is the /etc/glusterfs/glusterfsd.vol file, which is identical on all three servers:

# /etc/glusterfs/glusterfsd.vol

# vi:set nu ai ap aw smd showmatch tabstop=4 shiftwidth=4:

#########################################################

## vsftpd volume

volume vsftpd_posix

type storage/posix

option directory /data/vsftpd

end-volume

volume vsftpd_locks

type features/locks

subvolumes vsftpd_posix

end-volume

volume vsftpd_brick

type performance/io-threads

option thread-count 8

subvolumes vsftpd_locks

end-volume

#########################################################

#########################################################

## phpmyadmin volume

volume phpmyadmin_posix

type storage/posix

option directory /data/phpmyadmin

end-volume

volume phpmyadmin_locks

type features/locks

subvolumes phpmyadmin_posix

end-volume

volume phpmyadmin_brick

type performance/io-threads

option thread-count 8

subvolumes phpmyadmin_locks

end-volume

#########################################################

#########################################################

## dbdata volume

volume dbdata_posix

type storage/posix

option directory /data/dbdata

end-volume

volume dbdata_locks

type features/locks

subvolumes dbdata_posix

end-volume

volume dbdata_brick

type performance/io-threads

option thread-count 8

subvolumes dbdata_locks

end-volume

#########################################################

#########################################################

## mysql volume

volume mysql_posix

type storage/posix

option directory /data/mysql

end-volume

volume mysql_locks

type features/locks

subvolumes mysql_posix

end-volume

volume mysql_brick

type performance/io-threads

option thread-count 8

subvolumes mysql_locks

end-volume

#########################################################

#########################################################

## global volume

volume global_posix

type storage/posix

option directory /data/global

end-volume

volume global_locks

type features/locks

subvolumes global_posix

end-volume

volume global_brick

type performance/io-threads

option thread-count 8

subvolumes global_locks

end-volume

#########################################################

#########################################################

## pound volume

volume pound_posix

type storage/posix

option directory /data/pound

end-volume

volume pound_locks

type features/locks

subvolumes pound_posix

end-volume

volume pound_brick

type performance/io-threads

option thread-count 8

subvolumes pound_locks

end-volume

#########################################################

#########################################################

## www volume

volume www_posix

type storage/posix

option directory /data/www

end-volume

volume www_locks

type features/locks

subvolumes www_posix

end-volume

volume www_brick

type performance/io-threads

option thread-count 8

subvolumes www_locks

end-volume

#########################################################

#########################################################

## apache2 volume

volume apache2_posix

type storage/posix

option directory /data/apache2

end-volume

volume apache2_locks

type features/locks

subvolumes apache2_posix

end-volume

volume apache2_brick

type performance/io-threads

option thread-count 8

subvolumes apache2_locks

end-volume

#########################################################

#########################################################

## puppet volume

volume puppet_posix

type storage/posix

option directory /data/puppet

end-volume

volume puppet_locks

type features/locks

subvolumes puppet_posix

end-volume

volume puppet_brick

type performance/io-threads

option thread-count 8

subvolumes puppet_locks

end-volume

#########################################################

#########################################################

## nginx volume

volume nginx_posix

type storage/posix

option directory /data/nginx

end-volume

volume nginx_locks

type features/locks

subvolumes nginx_posix

end-volume

volume nginx_brick

type performance/io-threads

option thread-count 8

subvolumes nginx_locks

end-volume

#########################################################

volume server

type protocol/server

option transport-type tcp

option auth.addr.puppet_brick.allow 192.168.224.*

option auth.addr.nginx_brick.allow 192.168.224.*

option auth.addr.apache2_brick.allow 192.168.224.*

option auth.addr.www_brick.allow 192.168.224.*

option auth.addr.pound_brick.allow 192.168.224.*

option auth.addr.global_brick.allow 192.168.224.*

option auth.addr.mysql_brick.allow 192.168.224.*

option auth.addr.dbdata_brick.allow 192.168.224.*

option auth.addr.phpmyadmin_brick.allow 192.168.224.*

option auth.addr.vsftpd_brick.allow 192.168.224.*

subvolumes puppet_brick nginx_brick apache2_brick www_brick pound_brick global_brick mysql_brick dbdata_brick phpmyadmin_brick vsftpd_brick

end-volume

The following is /etc/glusterfs/glusterfs_apache2.vol. This file is the same for all volumes (puppet, www, dbdata, mysql, global, pound, phpmyadmin, nginx, vsftpd, apache2), with the volume name substituted:

# /etc/glusterfs/glusterfs_apache2.vol

# vi:set nu ai ap aw smd showmatch tabstop=4 shiftwidth=4:

volume remote1

type protocol/client

option transport-type tcp

option remote-host n1.tsand.org

option remote-subvolume apache2_brick

end-volume

volume remote2

type protocol/client

option transport-type tcp

option remote-host n2.tsand.org

option remote-subvolume apache2_brick

end-volume

volume remote3

type protocol/client

option transport-type tcp

option remote-host n3.tsand.org

option remote-subvolume apache2_brick

end-volume

volume replicate

type cluster/replicate

subvolumes remote1 remote2 remote3

end-volume

volume writebehind

type performance/write-behind

option window-size 1MB

subvolumes replicate

end-volume
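Before wiring these into /etc/fstab (shown next), it's easy to sanity-check a client volfile by mounting it by hand; for example, for the apache2 volume:

mount -t glusterfs /etc/glusterfs/glusterfs_apache2.vol /etc/apache2
df -h /etc/apache2
umount /etc/apache2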

Here's the /etc/fstab, which is the same on all cluster nodes:

# /etc/fstab: static file system information.

#

# <file system> <mount point> <type> <options> <dump> <pass>

proc /proc proc defaults 0 0

/dev/sda1 none swap sw 0 0

/dev/sda2 / ext3 noatime,nodiratime,errors=remount-ro 0 1

/dev/n1_vg0/puppet /data/puppet ext3 defaults 0 1

/dev/n1_vg0/nginx /data/nginx ext3 defaults 0 1

/dev/n1_vg0/apache2 /data/apache2 ext3 defaults 0 1

/dev/n1_vg0/www /data/www ext3 defaults 0 1

/dev/n1_vg0/pound /data/pound ext3 defaults 0 1

/dev/n1_vg0/global /data/global ext3 defaults 0 1

/dev/n1_vg0/mysql /data/mysql ext3 defaults 0 1

/dev/n1_vg0/dbdata /data/dbdata ext3 defaults 0 1

/dev/n1_vg0/phpmyadmin /data/phpmyadmin ext3 defaults 0 1

/dev/n1_vg0/vsftpd /data/vsftpd ext3 defaults 0 1

/etc/glusterfs/glusterfs_puppet.vol /etc/puppet glusterfs defaults,direct-io-mode=disable 0 0

/etc/glusterfs/glusterfs_nginx.vol /etc/nginx glusterfs defaults,direct-io-mode=disable 0 0

/etc/glusterfs/glusterfs_apache2.vol /etc/apache2 glusterfs defaults,direct-io-mode=disable 0 0

/etc/glusterfs/glusterfs_www.vol /var/www glusterfs defaults,direct-io-mode=disable 0 0

/etc/glusterfs/glusterfs_pound.vol /etc/pound glusterfs defaults,direct-io-mode=disable 0 0

/etc/glusterfs/glusterfs_global.vol /usr/global glusterfs defaults,direct-io-mode=disable 0 0

/etc/glusterfs/glusterfs_mysql.vol /etc/mysql glusterfs defaults,direct-io-mode=disable 0 0

/etc/glusterfs/glusterfs_dbdata.vol /var/lib/mysql glusterfs defaults,direct-io-mode=disable 0 0

/etc/glusterfs/glusterfs_phpmyadmin.vol /etc/phpmyadmin glusterfs defaults,direct-io-mode=disable 0 0

/etc/glusterfs/glusterfs_vsftpd.vol /var/vsftpd glusterfs defaults,direct-io-mode=disable 0 0
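The /dev/n1_vg0/* devices above are plain LVM2 logical volumes carved out of the two PV images (sdb1.img & sdb2.img, seen as sdb1 and sdb2 inside the guest). Roughly, the backing store was built along these lines; the LV sizes here are illustrative, not the exact values I used:

pvcreate /dev/sdb1 /dev/sdb2
vgcreate n1_vg0 /dev/sdb1 /dev/sdb2

# one small LV per gluster backing directory, formatted ext3
for lv in puppet nginx apache2 www pound global mysql dbdata phpmyadmin vsftpd; do
    lvcreate -L 256M -n $lv n1_vg0
    mkfs.ext3 /dev/n1_vg0/$lv
    mkdir -p /data/$lv
done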

Here are some scripts I use to automatically generate the glusterfs volume files. The input to the main program is a file called "input"; here is its content:

global /data/global 192.168.224.* /usr/global n1.tsand.org n2.tsand.org n3.tsand.org

puppet /data/puppet 192.168.224.* /etc/puppet n1.tsand.org n2.tsand.org n3.tsand.org

mysql /data/mysql 192.168.224.* /etc/mysql n1.tsand.org n2.tsand.org n3.tsand.org

dbdata /data/dbdata 192.168.224.* /var/lib/mysql n1.tsand.org n2.tsand.org n3.tsand.org

vsftpd /data/vsftpd 192.168.224.* /etc/vsftpd n1.tsand.org n2.tsand.org n3.tsand.org

www /data/www 192.168.224.* /var/www n1.tsand.org n2.tsand.org n3.tsand.org

pound /data/pound 192.168.224.* /etc/pound n1.tsand.org n2.tsand.org n3.tsand.org

apache2 /data/apache2 192.168.224.* /etc/apache2 n1.tsand.org n2.tsand.org n3.tsand.org

The structure is column-based:

volume-name source-directory-mount authorized-ipaddr dest-directory-mount cluster-host(s) ...

The program is called "doit" (okay, I'm not creative with the names of my utilities).

It's a Ruby program that uses ERB templates to generate /etc/glusterfs/glusterfsd.vol and all of the associated /etc/glusterfs/glusterfs_<name>.vol files.

Here's the program:

#! /usr/bin/ruby
# vi:set nu ai aw ap smd showmatch tabstop=4 shiftwidth=4:

require "erb"

# Server-side template: generates /etc/glusterfs/glusterfsd.vol from the
# VOL/DIR/IP arrays filled in below.
template = %q{# /etc/glusterfs/glusterfsd.vol
# vi:set nu ai ap aw smd showmatch tabstop=4 shiftwidth=4:
##### automatically generated by Ruby ERB ########
##### created on <% time = Time.new; out = time.inspect %><%= out %>
<% i = 0; while ( i < VOL.length ); %>
############################################
## <%= VOL[i] %> volume
volume <%= VOL[i] %>_posix
type storage/posix
option directory <%= DIR[i] %>
end-volume
volume <%= VOL[i] %>_locks
type features/locks
subvolumes <%= VOL[i] %>_posix
end-volume
volume <%= VOL[i] %>_brick
type performance/io-threads
option thread-count 8
subvolumes <%= VOL[i] %>_locks
end-volume
############################################
<% i += 1 %><% end %>
volume server
type protocol/server
option transport-type tcp<% i = 0; while ( i < VOL.length ); %>
option auth.addr.<%= VOL[i] %>_brick.allow <%= IP[i] %><% i += 1 %><% end %>
subvolumes <% VOL.each do |vol| %><%= vol %>_brick <% end %>
end-volume
}.gsub(/^ /, '')

message = ERB.new(template, 0, "%<>")

# Read the "input" file: one volume per line, columns as described above.
count = 0
VOL = []
DIR = []
IP = []
MOUNT = []
HOSTS = []
input = File.new("input", "r")
while (line = input.gets)
    entry = line.split(" ")
    VOL[count] = entry[0]
    DIR[count] = entry[1]
    IP[count] = entry[2]
    MOUNT[count] = entry[3]
    HOSTS[count] = entry[4...entry.size]
    count += 1
end

# Write the server volfile.
fileout = File.new("glusterfsd.vol", "w")
fileout.write(message.result())
fileout.close()

### now create client files
template = %q{# /etc/glusterfs/glusterfs_<%= $VOLNAME %>.vol
# vi:set nu ai ap aw smd showmatch tabstop=4 shiftwidth=4:
##### automatically generated by Ruby ERB ########
##### created on <% time = Time.new; out = time.inspect %><%= out %>
##############################################
<% $RHOST.each do |rhost| %>
volume <%= rhost %>
type protocol/client
option transport-type tcp
option remote-host <%= rhost %>
option remote-subvolume <%= $VOLNAME %>_brick
end-volume
<% end %>
volume replicate
type cluster/replicate
subvolumes <% $RHOST.each do |rhost| %><%= rhost %> <% end %>
end-volume
volume writebehind
type performance/write-behind
option window-size 1MB
subvolumes replicate
end-volume
##############################################
}.gsub(/^ /, '')

message = ERB.new(template, 0, "%<>")

# One client volfile per volume, replicated across that volume's hosts.
i = 0
while ( i < VOL.length )
    fname = "glusterfs_" + VOL[i] + ".vol"
    fileout = File.new(fname, "w")
    $VOLNAME = VOL[i]
    $RHOST = HOSTS[i]
    fileout.write(message.result())
    fileout.close()
    i += 1
end

This program creates all of the glusterfs volume files in the current directory. The "input" file is also expected to be in the current directory.
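Running it looks something like this; the working directory is just a scratch directory containing "doit" and "input", and the scp/dsh distribution step is only one way to push the results out (it assumes dsh knows about all three nodes):

cd ~/glusterfs-gen
./doit
for n in n1.tsand.org n2.tsand.org n3.tsand.org; do
    scp glusterfsd.vol glusterfs_*.vol $n:/etc/glusterfs/
done
dsh -a -- /etc/init.d/glusterfs-server restart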

All cluster filesystems mount at boot time. There is no failover action on clustered filesystems. This reduces failover time, and makes it possible to edit filesystem content on any node, regardless of the associated service running on that node.

The versions of glusterfs server and client (and fuse module) are:

n1:~# dpkg -l |grep gluster

ii glusterfs-client 3.0.5-1 clustered file-system (client package)

ii glusterfs-server 3.0.5-1 clustered file-system (server package)

ii libglusterfs0 3.0.5-1 GlusterFS libraries and translator modules

n1:~# dpkg -l |grep fuse

ii fuse-utils 2.7.4-1.1+lenny1 Filesystem in USErspace (utilities)

ii libfuse2 2.7.4-1.1+lenny1 Filesystem in USErspace library

n1:~#

The gluster and fuse packages were obtained from the Debian Lenny testing distribution.

These are installed on all three nodes.

The /etc/modules file contains the following:

n1:~# cat /etc/modules

# /etc/modules: kernel modules to load at boot time.

#

# This file contains the names of kernel modules that should be loaded

# at boot time, one per line. Lines beginning with "#" are ignored.

# Parameters can be specified after the module name.

fuse

8021q
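The 8021q module goes hand-in-hand with the Debian vlan package; tagged sub-interfaces are then declared in /etc/network/interfaces. A sketch of what one tagged interface might look like (the VLAN ID and addressing here are made up for illustration):

# /etc/network/interfaces (fragment)
auto eth0.224
iface eth0.224 inet static
    address 192.168.224.11
    netmask 255.255.255.0
    vlan-raw-device eth0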

Update: 12/15/2010

I've been rather pleased with how the cluster architecture has been going. One item that I've been thinking about is how to handle the disaster recovery (DR) site. Since we can't extend the cluster filesystem over the WAN, I needed a way of "replicating" the volumes to the DR site. I also needed that replication to keep running regardless of which hosts in the cluster are functional.

I solved the problem by configuring another heartbeat service, called rsyncd. It's a singleton service and binds to a floating IP address, which is also managed by heartbeat. The client systems (DR hosts) pull data (replicate) from the floating rsyncd service running on the cluster.

I only had to install the rsync package on all cluster hosts. Since the rsyncd service can run on any cluster node, I put the rsyncd.conf file into a cluster directory, /usr/global/etc/rsyncd.conf. This ensures that wherever the rsyncd service runs, it gets the same configuration file.
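In the CIB, the rsyncd service ends up looking like a small resource group: a floating IP (the same address that the rsync daemon binds to in /etc/default/rsync below) followed by the rsync init script as an LSB resource. A rough sketch, with illustrative IDs and timings:

<group id="rsyncd_group">
    <primitive id="rsyncd_ip" class="ocf" provider="heartbeat" type="IPaddr2">
        <instance_attributes id="rsyncd_ip_attrs">
            <attributes>
                <nvpair id="rsyncd_ip_addr" name="ip" value="192.168.224.171"/>
            </attributes>
        </instance_attributes>
    </primitive>
    <primitive id="rsyncd_daemon" class="lsb" type="rsync">
        <operations>
            <op id="rsyncd_daemon_mon" name="monitor" interval="60s" timeout="30s"/>
        </operations>
    </primitive>
</group>

Because group members start in order and stop in reverse, the address is always up before the daemon starts.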

On Debian, we have a default configuration file for the rsync RC script. Here is my /etc/default/rsync configuration file:

# defaults file for rsync daemon mode

# start rsync in daemon mode from init.d script?

# only allowed values are "true", "false", and "inetd"

# Use "inetd" if you want to start the rsyncd from inetd,

# all this does is prevent the init.d script from printing a message

# about not starting rsyncd (you still need to modify inetd's config yourself).

#RSYNC_ENABLE=false

RSYNC_ENABLE=true

# which file should be used as the configuration file for rsync.

# This file is used instead of the default /etc/rsyncd.conf

# Warning: This option has no effect if the daemon is accessed

# using a remote shell. When using a different file for

# rsync you might want to symlink /etc/rsyncd.conf to

# that file.

# RSYNC_CONFIG_FILE=

RSYNC_CONFIG_FILE=/usr/global/etc/rsyncd.conf

# what extra options to give rsync --daemon?

# that excludes the --daemon; that's always done in the init.d script

# Possibilities are:

# --address=123.45.67.89 (bind to a specific IP address)

# --port=8730 (bind to specified port; default 873)

RSYNC_OPTS='--address=192.168.224.171'

# run rsyncd at a nice level?

# the rsync daemon can impact performance due to much I/O and CPU usage,

# so you may want to run it at a nicer priority than the default priority.

# Allowed values are 0 - 19 inclusive; 10 is a reasonable value.

RSYNC_NICE='10'

# run rsyncd with ionice?

# "ionice" does for IO load what "nice" does for CPU load.

# As rsync is often used for backups which aren't all that time-critical,

# reducing the rsync IO priority will benefit the rest of the system.

# See the manpage for ionice for allowed options.

# -c3 is recommended, this will run rsync IO at "idle" priority. Uncomment

# the next line to activate this.

# RSYNC_IONICE='-c3'

RSYNC_IONICE='-c3'

# Don't forget to create an appropriate config file,

# else the daemon will not start.

Since this file lives in /etc/default, and we do not put /etc/default on a cluster volume, I use puppet to ensure this rsync configuration file is consistent across all cluster nodes.
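The puppet side of that is just a file resource that pushes the same /etc/default/rsync to every node. A minimal sketch; the module name and source path are mine, made up for illustration:

# in a hypothetical "rsync" module on the puppetmaster
file { "/etc/default/rsync":
    ensure => file,
    owner  => "root",
    group  => "root",
    mode   => "644",
    source => "puppet:///modules/rsync/rsync.default",
}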

And here is the /usr/global/etc/rsyncd.conf file:

# sample rsyncd.conf configuration file

# GLOBAL OPTIONS

#motd file=/etc/motd

log file=/var/log/rsyncd

# for pid file, do not use /var/run/rsync.pid if

# you are going to run rsync out of the init.d script.

pid file=/var/run/rsyncd.pid

syslog facility=daemon

#socket options=

# MODULE OPTIONS

[www]

read only = yes

hosts allow = beast2.tsand.org

hosts deny = *

dont compress = *.gz *.tgz *.zip *.z *.rpm *.deb *.iso *.bz2 *.tbz

ignore errors = no

ignore nonreadable = yes

timeout = 600

comment = web sites

path = /var/www

use chroot = yes

max connections=10

lock file = /var/lock/rsyncd

I also needed to add an account on the cluster nodes for the DR client to use when it initiates an rsync. I called the account "rsync" and left it locked, with its home directory set to /usr/global/home/rsync.

In the /usr/global/home/rsync directory, I created a .ssh sub-directory, populated its authorized_keys file with the root public key from the DR client machine, and set the permissions appropriately. The rsync account stays secure: no one can log in to it, and only root from the DR client machine can connect to it.
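For reference, setting that up boils down to something like the following on one node; the adduser flags and the key file name are a sketch, and since /usr/global is a shared gluster volume the .ssh contents only need to be created once (the account itself still has to be added on the other two nodes, or managed by puppet):

adduser --home /usr/global/home/rsync --no-create-home --disabled-password --gecos "DR rsync" rsync
mkdir -p /usr/global/home/rsync/.ssh
cat dr-client-root-key.pub >> /usr/global/home/rsync/.ssh/authorized_keys
chown -R rsync: /usr/global/home/rsync/.ssh
chmod 700 /usr/global/home/rsync/.ssh
chmod 600 /usr/global/home/rsync/.ssh/authorized_keys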

So on the DR client machine, we have a cron entry (under the root account) that performs the following:

rsync -avz --delete rsyncd.tsand.org::www /var/www

You'll notice my use of "rsyncd.tsand.org::www", which refers to an rsync module rather than a filesystem path. This allows me to change the actual path of the source directory without ever having to update the client's rsync operation. Of course, I intend to create a simple little script on the DR clients that wraps the rsync operation, checks for errors, and reports them appropriately.
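That wrapper doesn't exist yet, but here is the rough shape I have in mind; the log location and syslog tag are illustrative:

#! /bin/sh
# pull the www module from the cluster's floating rsyncd service, log the result
SRC="rsyncd.tsand.org::www"
DEST="/var/www"
LOG="/var/log/dr-rsync-www.log"

rsync -avz --delete "$SRC" "$DEST" >"$LOG" 2>&1
rc=$?
if [ $rc -ne 0 ]; then
    logger -t dr-rsync "replication of $SRC to $DEST failed (rc=$rc), see $LOG"
fi
exit $rc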

<more later>

Thanks!

Tom