Cluster Load

I wanted a script that would display the load averages of all the nodes in a cluster.

I used a combination of tools:

    1. glusterfs: used to share the programs. Cluster directory is /usr/global/bin.

    2. xinetd: serves a small custom network service on port 450. Runs on all nodes.

The program is called "busy". Original, huh? ;-)

Here's the script (/usr/global/bin/busy):

#! /bin/bash
# vi:set nu ai ap aw smd showmatch tabstop=4 shiftwidth=4:
#
# Name: busy
# Author: Tom Sandholm (tom.sandholm@gmail.com)
# Summary: Program to determine load on remote system
# Release: 1.0
# Date: Sun Jan 23 16:32:04 UTC 2011
# Description: This program relies on a custom xinetd service
# called "load".

# the list of remote machines
SRC=/etc/dsh/machines.list

# port number that the load service is on
port=450

# save our IFS
OIFS=$IFS

# declare IFS to be newline
IFS='
'

# loop thru each line in the remote machines list
for x in $(<$SRC)
do
    # strip off just the hostname part from the FQDN.
    thost=${x%%.*}

    # use netcat to access the remote service
    out=$(nc "$x" "$port")

    # print the results & pipe to sort
    echo "${thost}: $out"
done | sort -n -k2 -k3 -k4

# restore the original IFS
IFS=$OIFS

exit 0
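One thing to watch in the loop above: if a node is down, nc returns nothing and the line becomes just "n2: ". Because sort -n treats a missing number as zero, a dead node can end up looking like the least busy one. Here is a minimal sketch of one way to guard against that. The helper name "report" is my own invention for illustration and is not part of the original busy script.

```shell
#!/bin/bash
# report: hypothetical helper, not in the original busy script.
# $1 = FQDN from machines.list, $2 = whatever the load service returned.
# Prints "host: load" only when the node actually answered.
report() {
    local fqdn=$1 out=$2
    # an empty reply means the node gave us nothing: skip it
    [ -n "$out" ] || return 0
    echo "${fqdn%%.*}: $out"
}

report n1.tsand.org '0.00 0.00 0.00'   # node answered, line is printed
report n2.tsand.org ''                 # no answer, nothing is printed
```

Inside the loop this would stand in for the echo, e.g. report "$x" "$(nc "$x" "$port")". Most netcat variants also accept a -w option to give up after a number of seconds; check the man page for the nc you have installed.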

Here's my /etc/dsh/machines.list file. It belongs to the dsh (dancer's shell / distributed shell) package, which I installed from Debian:

n1.tsand.org
n2.tsand.org
n3.tsand.org

Here's the xinetd.d file/service (/etc/xinetd.d/load):

# /etc/xinetd.d/load
# vi:set nu ai ap aw smd showmatch tabstop=4 shiftwidth=4:
service load
{
    socket_type = stream
    protocol = tcp
    wait = no
    user = root
    server = /usr/global/bin/load
    instances = 20
}

Here's the load program that the xinetd service will call (/usr/global/bin/load):

#! /bin/bash
# vi:set nu ai ap aw smd showmatch tabstop=4 shiftwidth=4:
uptime | \
    sed -n -e 's/.*load average: \(.*\), \(.*\), \(.*\)$/\1 \2 \3/gp'
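To see what that sed expression does without setting up the service, here is a small sketch that feeds it a single line. The function name "parse_load" and the sample uptime line are made up for illustration; real uptime output varies from system to system.

```shell
#!/bin/bash
# parse_load: hypothetical wrapper around the sed expression used in
# /usr/global/bin/load, reading an uptime-style line from stdin.
parse_load() {
    sed -n -e 's/.*load average: \(.*\), \(.*\), \(.*\)$/\1 \2 \3/gp'
}

# Made-up uptime line, for illustration only.
sample=' 16:32:04 up 12 days,  3:41,  2 users,  load average: 0.01, 0.03, 0.00'

# Prints the three load averages separated by spaces: 0.01 0.03 0.00
printf '%s\n' "$sample" | parse_load
```

Everything up to "load average: " is thrown away, and the three comma-separated numbers come back space-separated, which is the shape the sort in busy expects. On Linux the same three numbers are also the first three fields of /proc/loadavg, if you would rather not parse uptime.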

Here's the /etc/services entry you will need (for service on port 450):

load 450/tcp # TFS system load
load 450/udp # TFS system load

And finally, here's a sample of the output:

n1:~# busy
n1: 0.00 0.00 0.00
n3: 0.01 0.01 0.00
n2: 0.01 0.03 0.00
n1:~#

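I ordered the listing so that the first line shows the least busy server. The sort keys are the three load averages (1-, 5- and 15-minute), which are fields 2, 3 and 4 of each output line; ties on the first are broken by the second, then the third.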
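The ordering is easy to check by hand with the same sort options the script uses. This is only a sketch: the function name "sort_by_load" is hypothetical, and forcing LC_ALL=C is my own assumption so that "." is always read as the decimal point.

```shell
#!/bin/bash
# sort_by_load: hypothetical wrapper around the sort call from busy.
# LC_ALL=C is an added assumption so the decimal point is always ".".
sort_by_load() {
    LC_ALL=C sort -n -k2 -k3 -k4
}

# The sample lines from above, deliberately shuffled.
printf '%s\n' \
    'n2: 0.01 0.03 0.00' \
    'n1: 0.00 0.00 0.00' \
    'n3: 0.01 0.01 0.00' | sort_by_load
# prints n1, then n3, then n2
```

n2 and n3 tie on the 1-minute average (0.01), so the 5-minute average decides it and n3 (0.01) lands ahead of n2 (0.03).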

So, if I want a quick tool to pick the least-busy node to spawn some service on, all I need to extract is the first field of the first line: in this case, node "n1".
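A minimal sketch of that extraction, assuming the output format shown in the sample. The helper name "least_busy" is hypothetical, and the printf here only stands in for a real run of busy.

```shell
#!/bin/bash
# least_busy: hypothetical helper, not part of the original scripts.
# Reads "host: l1 l5 l15" lines (already sorted by busy) on stdin
# and prints the host name from the first line.
least_busy() {
    head -n 1 | cut -d: -f1
}

# Stand-in for real busy output, copied from the sample listing.
printf '%s\n' \
    'n1: 0.00 0.00 0.00' \
    'n3: 0.01 0.01 0.00' \
    'n2: 0.01 0.03 0.00' | least_busy
# prints n1
```

On the cluster itself this would be something like node=$(busy | least_busy), after which $node can be handed to ssh or dsh.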