Linux Shell Scripts

Monitoring Linux with Nagios

How to parse SNMP traps

Example 1

This example shows how to configure snmptrapd to parse MIBs from a third-party vendor.

By default, Net-SNMP clients and servers only understand a set of default MIBs.

  1. Identify the MIBs for your platform/product.

  2. As an example, take the Tripwire Enterprise MIB (TRIPWIRE-ENTERPRISE-MIB), and in particular its alertTrap block:

    alertTrap TRAP-TYPE
        ENTERPRISE tripwireEnterprise
        VARIABLES { twNodeName, twRuleName, twElementName, twSeverity,
                    twNodeId, twRuleId, twElementId, twChangeType }
        DESCRIPTION
            "This event is generated when Tripwire detected a
            change in an element. The event includes the origin
            node, the rule that detected the change, the element
            for which the change was detected, and the severity
            of the change as indicated by the rule."
        ::= 0

  3. Install them into your $MIBDIR, usually $PREFIX/share/snmp/mibs/, for example:

     /usr/share/snmp/mibs/TRIPWIRE-ENTERPRISE.MIB.txt

  4. Update the snmptrapd configuration file (/etc/snmp/snmptrapd.conf) with a traphandle line:

     traphandle TRIPWIRE-ENTERPRISE-MIB::tripwireEnterprise /usr/sbin/mparser.sh

  5. Restart the snmptrapd service.
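Before wiring up the trap handler, it can help to confirm that Net-SNMP actually resolves names from the newly installed MIB. The sketch below is one way to do that, assuming the Net-SNMP command line utilities are installed; mib_resolves is a hypothetical helper name, not part of Net-SNMP.

```shell
#!/bin/bash
# Hypothetical helper: succeed if Net-SNMP can translate the symbolic
# name $2 after loading the MIB module $1 in addition to the defaults.
mib_resolves() {
    local module="$1" name="$2"
    snmptranslate -m "+${module}" -On "${module}::${name}" >/dev/null 2>&1
}

if ! command -v snmptranslate >/dev/null 2>&1; then
    echo "snmptranslate not found: install the Net-SNMP utilities first"
elif mib_resolves TRIPWIRE-ENTERPRISE-MIB tripwireEnterprise; then
    echo "TRIPWIRE-ENTERPRISE-MIB is installed and parseable"
else
    echo "TRIPWIRE-ENTERPRISE-MIB not found: check the MIB directory"
fi
```

If the last message appears, the MIB file is probably not in a directory that Net-SNMP searches, or it fails to parse.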

Here's the skeleton of the bash script /usr/sbin/mparser.sh:

#!/bin/bash

DATE_VERBOSE="$(date +"%Y.%m.%d %H:%M:%S (%a)")"

# snmptrapd feeds the trap to the handler on stdin, one "OID value" pair per line
while read -r oid val; do
    case "$oid" in
        TRIPWIRE-ENTERPRISE-MIB::*)
            # store the value in a variable named after the object (e.g. twNodeName);
            # printf -v avoids running eval on data received from the network
            printf -v "${oid/*::/}" '%s' "$val" ;;
        SNMP-COMMUNITY-MIB::snmpTrapAddress.0)
            snmpTrapAddress="$val" ;;
        *) ;;
    esac
done

# remove the leading string "Wrong Type (should be IpAddress):"
[[ "$twNodeName" =~ ^Wrong\ Type ]] && twNodeName="${twNodeName/*\ /}"

(
    echo -e "\
[$DATE_VERBOSE - MIB alertTrap]
snmpTrapAddress\t$snmpTrapAddress"
    for snmptrapvar in \
        twNodeName twRuleName twElementName twSeverity twChangeType; do
        echo -en "$snmptrapvar\t"; echo "${!snmptrapvar}"
    done
    echo
) >> /var/log/mibparser.log

In the log you'll find a list of blocks similar to the following one:

[2012.08.31 22:15:30 (Fri) - MIB alertTrap]
snmpTrapAddress 192.168.1.100
twNodeName      "server01.domain.com"
twRuleName      "RHEL System Configuration Files"
twElementName   "/etc/passwd"
twSeverity      100
twChangeType    2
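To try the parsing logic without sending a real trap, the handler's input can be simulated. snmptrapd passes the trap to a traphandle program on standard input: the sender's host name, then its transport address, then one "OID value" pair per line. The following sketch mirrors the case statement of mparser.sh but prints name=value pairs instead of writing to the log; the function names and the sample values are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: parse "OID value" lines the way mparser.sh does.
parse_trap() {
    local oid val
    while read -r oid val; do
        case "$oid" in
            TRIPWIRE-ENTERPRISE-MIB::*)
                echo "${oid/*::/}=$val" ;;
            SNMP-COMMUNITY-MIB::snmpTrapAddress.0)
                echo "snmpTrapAddress=$val" ;;
            *) ;;   # host name, transport address and other varbinds are ignored
        esac
    done
}

# Sample handler input; all values are invented for this illustration.
sample_trap() {
    cat <<'EOF'
server01.domain.com
UDP: [192.168.1.100]:32768->[192.168.1.1]:162
DISMAN-EVENT-MIB::sysUpTimeInstance 0:0:12:34.56
TRIPWIRE-ENTERPRISE-MIB::twNodeName "server01.domain.com"
TRIPWIRE-ENTERPRISE-MIB::twSeverity 100
SNMP-COMMUNITY-MIB::snmpTrapAddress.0 192.168.1.100
EOF
}

sample_trap | parse_trap
# prints:
#   twNodeName="server01.domain.com"
#   twSeverity=100
#   snmpTrapAddress=192.168.1.100
```

The same here-document can be piped into the real handler to check what ends up in the log file.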

Example 2

This example explains how to configure the Linux snmptrapd daemon (net-snmp) to catch a specific SNMP trap that a QRadar cluster sends when the master role switches from one node to another.

The relevant QRadar MIB definitions follow.

systemTrapsGroup NOTIFICATION-GROUP
    NOTIFICATIONS { xcbLogin, xcbLoginFailure, xcbLogout, xcbConfigChange,
                    xcbAlert, xcbError, xcbBackupFailed, xcbArchiveFailed,
                    xcbDBError, xcbLimitReached, xcbHaNodeChanged,
                    xcbTimestampError, xcbTimeSyncLost, xcbRaidStatus,
                    xcbHWError, xcbFirmwareTainted, xcbDiskFull }
    STATUS current
    DESCRIPTION "System related traps"
    ::= { groups 1 }

xcbHaNodeChanged NOTIFICATION-TYPE
    OBJECTS { description, node }
    STATUS current
    DESCRIPTION "HA node state changed"
    ::= { traps 12 }

One line has to be added to the snmptrapd configuration file (/etc/snmp/snmptrapd.conf):

traphandle XCB-SNMP-MIB::xcbHaNodeChanged /usr/sbin/qradar_trap_handler.sh

A shell script is then executed each time the snmptrapd daemon catches this SNMP trap, and Nagios is notified through an external command. Here's the script.

#!/bin/bash
# Send a notification to Nagios when an SNMP trap
# "XCB-SNMP-MIB::xcbHaNodeChanged" is received.
# Copyright (C) 2014 Davide Madrisan <davide.madrisan@gmail.com>

[ $(id -u) -eq 0 ] || {
    echo "${0##*/}: This script requires root. Aborting..." 1>&2
    exit 1
}

NOW=`date +%s 2>/dev/null`
NAGIOS_CMDFILE="/var/spool/nagios/cmd/nagios.cmd"
LOGFILE="/var/log/snmptrapd_qradar_node_changed.log"

HOST_NAME="QRADAR_VIP"
SERVICE_NAME="SNMPTRAP_QRADARP_NODE_CHANGED"

RETCODE="2"
MSG="CRITICAL: Node has changed"

# for debug only - this block can be removed
while test -n "$1"; do
    case "$1" in
        --debug|-d)
            RETCODE="0"
            MSG="OK: No SNMP traps caught"
            ;;
    esac
    shift
done

# the return code comes from RETCODE, so that --debug can reset the service to OK
(
/usr/bin/printf "\
[%lu] PROCESS_SERVICE_CHECK_RESULT;$HOST_NAME;$SERVICE_NAME;$RETCODE;$MSG\n" $NOW \
    > $NAGIOS_CMDFILE
) >> $LOGFILE 2>&1

See the Nagios documentation on external commands for more information.
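The external command written to the Nagios command file is a single line: a timestamp in square brackets, the command name, then the host, the service description, the return code and the plugin output, separated by semicolons. A minimal sketch of a function that builds such a line is shown here; passive_check_cmd is a hypothetical name, and the sketch only prints the line instead of writing it to the command file.

```shell
#!/bin/bash
# Hypothetical helper: build a PROCESS_SERVICE_CHECK_RESULT command line.
# Arguments: timestamp, host, service, return code (0-3), plugin output.
passive_check_cmd() {
    local ts="$1" host="$2" service="$3" code="$4" output="$5"
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$ts" "$host" "$service" "$code" "$output"
}

passive_check_cmd 1395307951 QRADAR_VIP SNMPTRAP_QRADARP_NODE_CHANGED 2 \
    "CRITICAL: Node has changed"
# prints:
# [1395307951] PROCESS_SERVICE_CHECK_RESULT;QRADAR_VIP;SNMPTRAP_QRADARP_NODE_CHANGED;2;CRITICAL: Node has changed
```

Passing the values as printf arguments, rather than embedding them in the format string, keeps a stray percent sign in the message from being interpreted as a format directive.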

We can simulate such an SNMP trap by executing the following command on the Nagios poller:

/usr/bin/snmptrap -v 2c -c public 127.0.0.1 '' XCB-SNMP-MIB::xcbHaNodeChanged

The Nagios log will then report a couple of lines:

[1395307951] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;\
  QRADAR_VIP;SNMPTRAP_QRADARP_NODE_CHANGED;0;CRITICAL: Node has changed
[1395307956] PASSIVE SERVICE CHECK: QRADAR_VIP;SNMPTRAP_QRADARP_NODE_CHANGED;\
  2;CRITICAL: Node has changed

Nagios/NRPE plugin skeleton

#!/bin/bash

# check_something.sh: Check something...

# Copyright (C) 2014 ... <...@...>

PROGNAME=`/bin/basename $0`

PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`

REVISION="<SCRIPT-VERSION>"

. $PROGPATH/utils.sh

print_usage() {

echo "Usage: $PROGNAME <...plugin-options...>"

echo " $PROGNAME --help"

echo " $PROGNAME --version"

echo

}

print_help() {

echo "$PROGNAME v$REVISION"

echo "Nagios Plugin: check something..."

echo "Copyright (C) 2012 ... <...@...>"

echo ""

print_usage

}

# Make sure the correct number of command line

# arguments have been supplied

if [ $# -lt 1 ]; then

print_usage

exit $STATE_UNKNOWN

fi

# Set default values

SCRIPT_OPTION=

EXIT_STATUS=$STATE_UNKNOWN #default

# Grab the command line arguments

while test -n "$1"; do

case "$1" in

--help|-h)

print_help

exit $STATE_OK

;;

--version|-V)

echo "$PROGNAME v$REVISION"

exit $STATE_OK

;;

--<long-option>|-<short-option>)

SCRIPT_OPTION="$2"

shift

;;

*)

echo "Unknown argument: $1"

print_usage

exit $STATE_UNKNOWN

;;

esac

shift

done

# <...SCRIPT-CODE...>

# if OK:

echo "OK"; EXIT_STATUS=$STATE_OK

# if WARNING:

echo "WARNING: <WARNING-MESSAGE>"; EXIT_STATUS=$STATE_WARNING

# if CRITICAL:

echo "CRITICAL: <CRITICAL-MESSAGE>"; EXIT_STATUS=$STATE_CRITICAL

exit $EXIT_STATUS
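The skeleton relies on utils.sh, shipped with the Nagios plugins, for the STATE_* variables. For reference, the sketch below lists the exit statuses Nagios expects (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) as fallback definitions and adds a small label lookup; state_name is a hypothetical helper, not something utils.sh provides.

```shell
#!/bin/bash
# Fallback definitions, used only when utils.sh has not set them already.
: "${STATE_OK:=0}" "${STATE_WARNING:=1}" "${STATE_CRITICAL:=2}" "${STATE_UNKNOWN:=3}"

# Hypothetical helper: translate an exit status into its Nagios label.
state_name() {
    case "$1" in
        "$STATE_OK")       echo "OK" ;;
        "$STATE_WARNING")  echo "WARNING" ;;
        "$STATE_CRITICAL") echo "CRITICAL" ;;
        *)                 echo "UNKNOWN" ;;
    esac
}

for code in 0 1 2 3; do
    echo "$code -> $(state_name "$code")"
done
# prints:
#   0 -> OK
#   1 -> WARNING
#   2 -> CRITICAL
#   3 -> UNKNOWN
```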

Nagios/NSCA plugin skeleton

#!/bin/bash

# monitor_something.sh: Send alerts to Nagios ...

# Copyright (C) 2014 ... <...@...>

NSCA_HOSTNAME="SERVER_NAME"

NSCA_TARGET_IP="192.168.1.1" # this is the nagios IP address

NSCA_SERVICE_NAME="NAME-OF-THE-NAGIOS-PASSIVE-SERVICE"

NSCA_MESSAGE_OK="this is an OK message"

NSCA_MESSAGE_CRITICAL="this is a CRITICAL message"

LOG="/var/log/nsca_logging.log"

NSCA_STATE_OK=0

NSCA_STATE_CRITICAL=2

# <code-that-do-something-and-set-RET>

if [ "$RET" = 0 ]; then

echo -e "\

$NSCA_HOSTNAME\t$NSCA_SERVICE_NAME\t$NSCA_STATE_OK\tOK: $NSCA_MESSAGE_OK" | \

/usr/sbin/send_nsca $NSCA_TARGET_IP -c /etc/nagios/send_nsca.cfg \

>> $LOG 2>&1

else

echo -e "\

$NSCA_HOSTNAME\t$NSCA_SERVICE_NAME\t$NSCA_STATE_CRITICAL\tCRITICAL: $NSCA_MESSAGE_CRITICAL" | \

/usr/sbin/send_nsca $NSCA_TARGET_IP -c /etc/nagios/send_nsca.cfg \

>> $LOG 2>&1

fi
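The two branches of the skeleton differ only in the state and the message, so the line piped to send_nsca can be produced by one function. send_nsca reads one check result per line, with host, service description, return code and plugin output separated by tabs. The sketch below only formats that line; nsca_line is a hypothetical name, and the tabs are turned into '|' purely to make them visible in the demonstration.

```shell
#!/bin/bash
# Hypothetical helper: format one passive service check result for send_nsca.
nsca_line() {
    local host="$1" service="$2" state="$3" message="$4"
    printf '%s\t%s\t%s\t%s\n' "$host" "$service" "$state" "$message"
}

# Show the fields, replacing each tab with '|' for readability.
nsca_line SERVER_NAME NAME-OF-THE-NAGIOS-PASSIVE-SERVICE 2 \
    "CRITICAL: this is a CRITICAL message" | tr '\t' '|'
# prints:
# SERVER_NAME|NAME-OF-THE-NAGIOS-PASSIVE-SERVICE|2|CRITICAL: this is a CRITICAL message
```

In the skeleton, the output of such a function would be piped to /usr/sbin/send_nsca exactly as the echo -e lines are.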

Monitoring Linux boxes via SNMP

#!/bin/bash

# check the health of a Linux appliance

# Copyright (C) 2013-2014 Davide Madrisan <davide.madrisan@gmail.com>

PROGNAME=`/bin/basename $0`

PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`

REVISION=

. $PROGPATH/utils.sh

print_usage() {

echo "Usage: $PROGNAME -H <hostname> -C <snmp-community> --check <action>"

echo " $PROGNAME --help"

echo " $PROGNAME --version"

echo " $PROGNAME -H targetsrv -C public --check uptime"

echo " $PROGNAME -H targetsrv -C public --check disk:37"

echo " $PROGNAME -H targetsrv -C public --check netif:1"

echo

echo "List of known checks:"

echo " disk"

echo " load15"

echo " memory"

echo " netif"

echo " swap"

echo " uptime"

}

print_help() {

echo "$PROGNAME v$REVISION"

echo "Nagios Plugin: check the health of a Linux appliance"

echo "Copyright (C) 2013 Davide Madrisan <davide.madrisan@gmail.com>"

echo

print_usage

}

EXIT_STATUS=$STATE_UNKNOWN #default

WARN=0

CRIT=0

# Grab the command line arguments

while test -n "$1"; do

case "$1" in

--help|-h)

print_help

exit $STATE_OK

;;

--version|-V)

echo "$PROGNAME v$REVISION"

exit $STATE_OK

;;

--host|-H)

HOST="$2"; shift

;;

--community|-C)

COMMUNITY="$2"; shift

;;

--check)

SNMP_CHECK="$2"; shift

;;

--warning|-w)

WARN="$2"; shift

;;

--critical|-c)

CRIT="$2"; shift

;;

*)

echo "Unknown argument: $1"

print_usage

exit $STATE_UNKNOWN

;;

esac

shift

done

[ "$SNMP_CHECK" ] || { print_usage; exit $STATE_UNKNOWN; }

die() {

echo "CRITICAL: ${1:-SNMPWALK error while querying $HOST}"

exit $STATE_CRITICAL

}

snmp_helper() {

local OID="$1" snmp_opts="Ov"

[ "$2" = "alltokens" ] || snmp_opts="${snmp_opts}q"

snmp_output="$(\

LC_ALL=C \

/usr/bin/snmpwalk -v2c -c $COMMUNITY -$snmp_opts $HOST $OID 2>/dev/null)"

case "$snmp_output" in

"No Such Instance currently exists at this OID"|"") die ;;

*) echo "$snmp_output" ;;

esac

}

round0() { printf "%.0f" "$1"; }

case "$SNMP_CHECK" in

disk*)

[ "${SNMP_CHECK/*:}" = "$SNMP_CHECK" ] &&

{ echo "UNKNOWN: no filesystem set! (configuration BUG)";

exit $STATE_UNKNOWN; }

snmp_diskid="${SNMP_CHECK/*:}"

mountpoint="$(\

snmp_helper HOST-RESOURCES-MIB::hrStorageDescr.${snmp_diskid})" || die

disk_size_total="$(\

snmp_helper HOST-RESOURCES-MIB::hrStorageSize.${snmp_diskid})" || die

disk_size_used="$(\

snmp_helper HOST-RESOURCES-MIB::hrStorageUsed.${snmp_diskid})" || die

disk_used_perc=$(( (100 * $disk_size_used) / $disk_size_total ))

if [ $CRIT -ne 0 -a $CRIT -le $disk_used_perc ]; then

EXIT_STATUS=$STATE_CRITICAL

echo -n "CRITICAL: "

elif [ $WARN -ne 0 -a $WARN -le $disk_used_perc ]; then

EXIT_STATUS=$STATE_WARNING

echo -n "WARNING: "

else

EXIT_STATUS=$STATE_OK

echo -n "OK: "

fi

echo -n "\

Filesystem $mountpoint: $disk_used_perc% used \

($disk_size_used / $disk_size_total) | $mountpoint=$disk_size_used;"

[ "$WARN" -eq 0 ] && echo -n ";" || \

echo -n "$(($WARN * $disk_size_total / 100));"

[ "$CRIT" -eq 0 ] && echo -n ";" || \

echo -n "$(($CRIT * $disk_size_total / 100));"

echo "0;$disk_size_total"

;;

load15)

load15="$(snmp_helper UCD-SNMP-MIB::laLoad.3)" || die

load15_rounded="$(round0 $load15)"

if [ $CRIT -ne 0 -a $CRIT -le $load15_rounded ]; then

EXIT_STATUS=$STATE_CRITICAL

echo -n "CRITICAL: "

elif [ $WARN -ne 0 -a $WARN -le $load15_rounded ]; then

EXIT_STATUS=$STATE_WARNING

echo -n "WARNING: "

else

EXIT_STATUS=$STATE_OK

echo -n "OK: "

fi

echo -n "load average (15min): $(round0 $load15) | load_15min=$load15_rounded;"

[ "$WARN" -eq 0 ] && echo -n ";" || echo -n "$WARN;"

[ "$CRIT" -eq 0 ] && echo -n ";" || echo -n "$CRIT;"

echo

;;

memory)

ram_total="$(snmp_helper UCD-SNMP-MIB::memTotalReal.0)" || die

ram_free="$(snmp_helper UCD-SNMP-MIB::memAvailReal.0)" || die

ram_cache="$(snmp_helper UCD-SNMP-MIB::memCached.0)" || die

ram_used=$(( $ram_total - ($ram_free + $ram_cache) ))

ram_used_perc=$(( (100 * $ram_used) / $ram_total ))

if [ $CRIT -ne 0 -a $CRIT -le $ram_used_perc ]; then

EXIT_STATUS=$STATE_CRITICAL

echo -n "CRITICAL: "

elif [ $WARN -ne 0 -a $WARN -le $ram_used_perc ]; then

EXIT_STATUS=$STATE_WARNING

echo -n "WARNING: "

else

EXIT_STATUS=$STATE_OK

echo -n "OK: "

fi

echo -n "\

RAM used: $ram_used_perc% ($ram_used / $ram_total) | ram_used=$ram_used;"

[ "$WARN" -eq 0 ] && echo -n ";" || \

echo -n "$(( $WARN * $ram_total / 100 ));"

[ "$CRIT" -eq 0 ] && echo -n ";" || \

echo -n "$(( $CRIT * $ram_total / 100 ));"

echo "0;$ram_total"

;;

netif*)

[ "${SNMP_CHECK/*:}" = "$SNMP_CHECK" ] &&

{ echo "UNKNOWN: no network interface set! (configuration BUG?)"

exit $STATE_UNKNOWN; }

snmp_netifid="${SNMP_CHECK/*:}"

netif_ifName="$(snmp_helper IF-MIB::ifName.${snmp_netifid})" || die

netif_ifOperStatus="$(snmp_helper IF-MIB::ifOperStatus.${snmp_netifid})" \

|| die

netif_ifInOctets="$(snmp_helper IF-MIB::ifInOctets.${snmp_netifid})" \

|| die

netif_ifInErrors="$(snmp_helper IF-MIB::ifInErrors.${snmp_netifid})" \

|| die

netif_ifOutOctets="$(snmp_helper IF-MIB::ifOutOctets.${snmp_netifid})" \

|| die

netif_ifOutErrors="$(snmp_helper IF-MIB::ifOutErrors.${snmp_netifid})" \

|| die

case "$netif_ifOperStatus" in

down*)

EXIT_STATUS=$STATE_CRITICAL

echo -n "CRITICAL: network interface '$netif_ifName' is DOWN"

;;

up*)

EXIT_STATUS=$STATE_OK

echo -n "OK: network interface '$netif_ifName' is UP"

;;

*)

echo "UNKNOWN: unexpected SNMP output: $netif_ifOperStatus"

exit $STATE_UNKNOWN

;;

esac

echo " | \

${netif_ifName}_ifInOctets=$netif_ifInOctets \

${netif_ifName}_ifInErrors=$netif_ifInErrors \

${netif_ifName}_ifOutOctets=$netif_ifOutOctets \

${netif_ifName}_ifOutErrors=$netif_ifOutErrors"

;;

swap)

swap_total="$(snmp_helper UCD-SNMP-MIB::memTotalSwap.0)" || die

swap_free="$(snmp_helper UCD-SNMP-MIB::memAvailSwap.0)" || die

if [ $swap_total -eq 0 ]; then

swap_used=0

swap_used_perc=0

else

swap_used=$(( $swap_total - $swap_free ))

swap_used_perc=$(( (100 * $swap_used) / $swap_total ))

fi

if [ $CRIT -ne 0 -a $CRIT -le $swap_used_perc ]; then

EXIT_STATUS=$STATE_CRITICAL

echo -n "CRITICAL: "

elif [ $WARN -ne 0 -a $WARN -le $swap_used_perc ]; then

EXIT_STATUS=$STATE_WARNING

echo -n "WARNING: "

else

EXIT_STATUS=$STATE_OK

echo -n "OK: "

fi

echo -n "SWAP used: $swap_used_perc% ($swap_used / $swap_total) | swap_used=$swap_used;"

[ "$WARN" -eq 0 ] && echo -n ";" || \

echo -n "$(( $WARN * $swap_total / 100 ));"

[ "$CRIT" -eq 0 ] && echo -n ";" || \

echo -n "$(( $CRIT * $swap_total / 100 ));"

echo "0;$swap_total"

;;

uptime)

uptimemsg="$(snmp_helper DISMAN-EVENT-MIB::sysUpTimeInstance alltokens)" \

|| die

echo "UPTIME: $uptimemsg"

EXIT_STATUS=$STATE_OK

;;

*)

echo "UNKNOWN: unsupported SNMP check: $SNMP_CHECK"

exit $STATE_UNKNOWN

;;

esac

exit $EXIT_STATUS
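The disk, load15, memory and swap branches all repeat the same comparison: a threshold of 0 disables the check, CRITICAL is tested before WARNING, and a value equal to the threshold already triggers it. A minimal sketch of that logic as one function follows; threshold_state is a hypothetical helper and the STATE_* values are defined locally so that the example runs without utils.sh.

```shell
#!/bin/bash
STATE_OK=0; STATE_WARNING=1; STATE_CRITICAL=2

# Hypothetical helper: compare an integer value with the thresholds.
# A threshold of 0 means "disabled", as in the script above.
threshold_state() {
    local value="$1" warn="$2" crit="$3"
    if [ "$crit" -ne 0 ] && [ "$crit" -le "$value" ]; then
        echo "CRITICAL"; return $STATE_CRITICAL
    elif [ "$warn" -ne 0 ] && [ "$warn" -le "$value" ]; then
        echo "WARNING"; return $STATE_WARNING
    fi
    echo "OK"; return $STATE_OK
}

status=0
label="$(threshold_state 85 80 90)" || status=$?
echo "$label: 85% used (exit status $status)"
# prints: WARNING: 85% used (exit status 1)
```

The function prints the label and returns the matching exit status, so a caller can use both the text and the code.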

IP Subnet Calculator

#!/bin/bash

# NETCMPT (netcompute) v2.1.2

# Copyright (C) 2001-2004 Davide Madrisan <davide.madrisan@gmail.com>

# NETCMPT is free software; you can redistribute it and/or modify it under the

# terms of the GNU General Public License as published by the Free Software

# Foundation; either version 2 of the License, or (at your option) any later

# version. This program is distributed in the hope that it will be useful,

# but WITHOUT ANY WARRANTY; without even the implied warranty of

# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

# See the GNU General Public License for more details.

script_name="netcmpt"

version_num="2.1.2"

function usage() {

echo "

$(echo $script_name | tr a-z A-Z) (NETcompute), version $version_num

Computes mask, wildcard mask, network and broadcast addresses

Copyright (C) 2004 Davide Madrisan <davide.madrisan@gmail.com>

Usage: $script_name <ipv4_address ipv4_mask>

$script_name <ipv4_address/bitmask>

$script_name --help

Examples are:

$script_name 192.168.229.161 255.255.248.0

$script_name 192.168.229.161/21

For bugs and suggestions, please contact me by e-mail"

exit 0

}

# Check $1 and $2 (if not empty) for syntax errors.

# The first or the first two args must contain a valid ipv4 address.

# If no errors are detected the function returns 0 and sets

# 'ip', 'mask_bits', and 'mask'. Otherwise it returns 1.

function check_args() {

# is $1 == "N1.N2.N3.N4" or == "N1.N2.N3.N4/M"?

if ! echo $1 | grep -Eq "^([0-9]+\.){3}[0-9]+(/[0-9]+){0,1}$"; then

return 1

fi

ip=$(echo $1 | cut -f1 -d/)

for i in 1 2 3 4; do

local byte=$(echo $ip | cut -f$i -d.)

[ $byte -le 255 ] || return 1

done

mask_bits=$(echo $1 | cut -f2 -s -d/)

if [ $mask_bits ]; then # the bitmask has been provided

[ $mask_bits -le 32 ] || return 1

# fill `mask' with the extended version of the given mask

# i.e. mask_bits = 24 --> mask = 255.255.255.0

local mb=$mask_bits

for i in 1 2 3 4; do

if [ $mb -ge 8 ]; then

mask=$mask"255"

let "mb -= 8"

else

mask=$mask$[256 - (2 << (7-$mb))]

[ $mb -gt 0 ] && let mb=0

fi

[ $i -lt 4 ] && mask=$mask"."

done

else

if ! echo $2 | grep -Eq "^([0-9]+\.){3}[0-9]+$"; then

return 1

fi

mask=$2

local prev_byte=255

local prev_bit=1

# calculate `mask_bits' using `mask'

let "mask_bits = 0"

for i in 1 2 3 4; do

local byte=$(echo $mask | cut -f$i -d.)

[ $byte -gt 255 ] && return 1

# the previous byte != '255'? --> the current one must be '0'

[ $prev_byte -lt 255 -a $byte -ne 0 ] && return 1

for j in $(seq 7 -1 0); do

local bit_value=$[($byte >> j) & 1]

# the previous bit = '0'? --> the current one cannot be set to '1'

[ $prev_bit -eq 0 -a $bit_value -eq 1 ] && return 1

[ $bit_value -eq 1 ] && let mask_bits+=1

prev_bit=$bit_value

done

prev_byte=$byte

done

fi

return 0

}

[ "$1" = "--help" -o "$1" = "-h" -o -z "$1" ] && usage

check_args $1 $2 ||

{ echo "\

ERROR: illegal IP address or mask

enter \`$script_name --help' if you need help"; exit 1; }

# calculate wildcard mask, broadcast and network addresses

for i in 1 2 3 4; do

ip_byte=$(echo $ip | cut -f$i -d.) # select the byte number 'i' from 'ip'

mask_byte=$(echo $mask | cut -f$i -d.) # same for 'mask'

let "ip_and_mask = $(($ip_byte & $mask_byte))"

let "mask_cplm = (255 - $mask_byte)"

# create the strings 'wildcard', 'broadcast', and 'network'

wildcard=$wildcard$mask_cplm

network=$network$ip_and_mask

broadcast=$broadcast$[ip_and_mask + mask_cplm]

if [ $i -lt 4 ]; then

network=$network"."

broadcast=$broadcast"."

wildcard=$wildcard"."

fi

done

echo "\

IP and mask $ip $mask (/$mask_bits)

wildcard mask $wildcard

network $network

broadcast $broadcast"
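The script works byte by byte on the dotted strings. The same results can be cross-checked with 32-bit integer arithmetic, which is a different technique from the one used in netcmpt. The sketch below does so for the address from the usage text, 192.168.229.161/21; all function names are hypothetical, and octets are assumed to be written without leading zeros (bash would otherwise read them as octal).

```shell
#!/bin/bash
# Hypothetical helpers, not part of netcmpt.

# Convert a dotted quad to a 32-bit integer.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Convert a 32-bit integer back to a dotted quad.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Build the netmask (as an integer) for a prefix length between 0 and 32.
prefix_to_mask_int() {
    echo $(( (0xFFFFFFFF << (32 - $1)) & 0xFFFFFFFF ))
}

ip=$(ip_to_int 192.168.229.161)
mask=$(prefix_to_mask_int 21)
wildcard=$(( ~mask & 0xFFFFFFFF ))     # bitwise complement of the mask
network=$(( ip & mask ))               # host bits cleared
broadcast=$(( network | wildcard ))    # host bits set

echo "mask      $(int_to_ip "$mask")"
echo "wildcard  $(int_to_ip "$wildcard")"
echo "network   $(int_to_ip "$network")"
echo "broadcast $(int_to_ip "$broadcast")"
# prints:
#   mask      255.255.248.0
#   wildcard  0.0.7.255
#   network   192.168.224.0
#   broadcast 192.168.231.255
```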