Cluster Load
I wanted a script that would display the load averages of all the nodes in a cluster.
I used a combination of tools:
glusterfs: used to share the programs. Cluster directory is /usr/global/bin.
xinetd: runs on all nodes and serves a custom network service on port 450.
The program is called "busy". Original, huh? ;-)
Here's the script (/usr/global/bin/busy):
#! /bin/bash
# vi:set nu ai ap aw smd showmatch tabstop=4 shiftwidth=4:
#
# Name: busy
# Author: Tom Sandholm (tom.sandholm@gmail.com)
# Summary: Program to determine load on remote system
# Release: 1.0
# Date: Sun Jan 23 16:32:04 UTC 2011
# Description: This program relies on a custom xinetd service
# called "load".
# the list of remote machines
SRC=/etc/dsh/machines.list
# port number that the load service is on
port=450
# save our IFS
OIFS=$IFS
# declare IFS to be newline
IFS='
'
# loop thru each line in the remote machines list
for x in $(<"$SRC")
do
    # strip off just the hostname part from the FQDN.
    thost=${x%%.*}
    # use netcat to access the remote service
    out=$(nc "$x" "$port")
    # print the results & pipe to sort
    echo "${thost}: $out"
done | sort -n -k2 -k3 -k4
# restore the original IFS
IFS=$OIFS
exit 0
Here's my /etc/dsh/machines.list file (this file belongs to the dsh [dancer's shell / distributed shell] package, which I installed from Debian):
n1.tsand.org
n2.tsand.org
n3.tsand.org
Here's the xinetd.d file/service (/etc/xinetd.d/load):
# /etc/xinetd.d/load
# vi:set nu ai ap aw smd showmatch tabstop=4 shiftwidth=4:
service load
{
    socket_type = stream
    protocol    = tcp
    wait        = no
    user        = root
    server      = /usr/global/bin/load
    instances   = 20
}
Here's the load program that the xinetd service will call (/usr/global/bin/load):
#! /bin/bash
# vi:set nu ai ap aw smd showmatch tabstop=4 shiftwidth=4:
uptime | \
sed -n -e 's/.*load average: \(.*\), \(.*\), \(.*\)$/\1 \2 \3/gp'
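To see what the sed expression pulls out, it can be run against a canned line instead of live uptime output. This is only a sketch: the sample line below is made up for illustration, and the function name extract_load is my own, not part of the original setup.

```shell
# Hypothetical sample of what uptime prints on a Linux box (illustrative only).
sample=' 16:32:04 up 10 days,  3:02,  2 users,  load average: 0.00, 0.01, 0.05'

# extract_load: hypothetical wrapper around the same sed expression used in
# /usr/global/bin/load; it keeps only the three load averages.
extract_load() {
    sed -n -e 's/.*load average: \(.*\), \(.*\), \(.*\)$/\1 \2 \3/gp'
}

# Prints the three numbers separated by single spaces.
printf '%s\n' "$sample" | extract_load
```

If the uptime on a given system words its output differently, the expression matches nothing and the service returns an empty line, so it is worth running this check once per platform.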
Here are the /etc/services entries you will need so that xinetd can map the service name "load" to port 450:
load 450/tcp # TFS system load
load 450/udp # TFS system load
And finally, here's a sample of the output:
n1:~# busy
n1: 0.00 0.00 0.00
n3: 0.01 0.01 0.00
n2: 0.01 0.03 0.00
n1:~#
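The ordering in the sample can be reproduced without a cluster by feeding the same three lines, shuffled, through the sort command from the script. This is a small sketch; the LC_ALL=C setting is my addition (an assumption that the numbers use "." as the decimal point), not something the original script sets.

```shell
# Three lines in the format busy prints, deliberately out of order.
sample='n2: 0.01 0.03 0.00
n1: 0.00 0.00 0.00
n3: 0.01 0.01 0.00'

# Same sort options as the busy script. LC_ALL=C is added here so the
# numeric comparison does not depend on the caller's locale.
sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort -n -k2 -k3 -k4)

# Show the sorted listing, least busy node first.
printf '%s\n' "$sorted"
```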
I ordered the listing so that the first line shows the least busy server. The sort compares the three load averages in turn: the 1-minute value first, then the 5-minute, then the 15-minute (fields 2, 3, and 4 of each output line).
So, if I want a quick tool to pick the least-busy node to spawn some service on, all I need to extract is the first field of the first line: in this case, node "n1".
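That extraction could look something like the sketch below. The helper name least_busy is hypothetical, and the sample lines stand in for real busy output so the snippet can be tried on any machine.

```shell
# least_busy: hypothetical helper. Reads busy-style lines ("host: l1 l5 l15")
# on stdin, assumes they are already sorted least busy first, and prints the
# host name from the first line with the trailing colon removed.
least_busy() {
    head -n 1 | cut -d: -f1
}

# Stand-in for the output of busy, taken from the sample listing.
sample='n1: 0.00 0.00 0.00
n3: 0.01 0.01 0.00
n2: 0.01 0.03 0.00'

# Prints the name of the least busy node.
printf '%s\n' "$sample" | least_busy
```

On the cluster itself the same helper would be fed directly from the script, for example node=$(busy | least_busy).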