Monitoring Linux Operating System

Software

Nagios Plugins for Linux

A suite of Nagios plugins for monitoring Linux servers and appliances

Name: nagios-plugins-linux

Tags: Utilities/Monitoring

License: GPL v3+

Operating System: Linux

Implementation: C (C99)

This package is known to compile with the toolchains listed below, but should compile on all relatively recent Linux distributions:

gcc 4.1.2 (Red Hat Enterprise Linux 5, CentOS 5),

gcc 4.4 (Red Hat Enterprise Linux 6, CentOS 6),

gcc 4.8.2 (Red Hat Enterprise Linux 7, CentOS 7),

gcc 4.9.0-4.9.2 and clang 3.1 and 3.5.1 (openmamba GNU/Linux 2.90+).

List of the Linux kernels that have been successfully tested: 2.6.18, 2.6.32, 3.10, 3.14.

Last stable version: version 17 - "400th (git) commit"

Available plugins:

check_clock

check_cpu - improved in version 15 and 16

check_cpufreq - new plugin in version 16

check_cswch

check_fc - new plugin in version 17

check_ifmountfs

check_intr

check_iowait

check_load

check_memory - improved in version 15

check_multipath - improved in version 16

check_nbprocs

check_network

check_paging

check_readonlyfs

check_swap

check_tcpcount

check_temperature

check_uptime

check_users

Development: GitHub

Documentation: GitHub/README

Nagios Exchange Page: Nagios Plugins Linux

Download the latest stable source archive here

How to build the source code

This package uses the GNU autotools for configuration and installation.

If you have cloned the git repository then you will need to run autoreconf to generate the required files.

Run ./configure --help to see a list of available install options.

The plugins are installed by default into LIBEXECDIR.

You will most likely want to customise this location to suit your needs, e.g.:

./configure --libexecdir=/usr/lib/nagios/plugins

After ./configure has completed successfully, run

make install

(as root) and you're done!

Better still, for advanced users: build a package in the format supported by your Linux distribution and install that instead.

Available Nagios Plugins

A Linux server or appliance can be fully monitored by the Nagios/NRPE services listed below.

All the binaries are provided by the nagios-plugins-linux software.

--- Monitoring Time and System Uptime ---

LNX_CLOCK - returns the number of seconds elapsed between local time and Nagios time

[ /etc/nrpe.d/check_clock ]

command[check_clock]=/usr/lib/nagios/plugins/check_clock --refclock $ARG1$ -w 60 -c 120

where $ARG1$ is the number of seconds since the "Epoch"

(1970-01-01 00:00:00 UTC) -- $(date '+%s') -- provided by the Nagios poller.

Usage note

This check is intended for alerting when the clock difference between the Nagios poller and the monitored server exceeds a given threshold (60 seconds for the warning state and 120 seconds for the critical state, in the example above).

The clock of the Nagios server needs, of course, to be synchronized to an NTP server.
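The comparison described in this usage note can be sketched as follows. This is a minimal illustration, not the plugin's code: the function name clock_check, the use of an absolute delta and the strict "greater than" comparison are assumptions.

```python
import time


def clock_check(refclock, local_time, warning=60, critical=120):
    """Compare two epoch timestamps and build a Nagios-style status line."""
    # Assumption: the delta is treated as an absolute value, so a clock that
    # runs ahead and one that lags behind are reported the same way.
    delta = abs(int(local_time) - int(refclock))
    if delta > critical:
        status = "CRITICAL"
    elif delta > warning:
        status = "WARNING"
    else:
        status = "OK"
    return status, f"clock {status} - time delta {delta}s | clock_delta={delta}"


if __name__ == "__main__":
    # Hypothetical reference value; in production the poller supplies $ARG1$.
    refclock = int(time.time()) - 39
    print(clock_check(refclock, time.time())[1])
```

Because the reference time travels from the poller to the monitored host before it is compared, network and scheduling latency add to the measured delta, which is one reason to keep the thresholds in the range of tens of seconds.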

This plugin returns the number of seconds elapsed between

the host local time and Nagios time.

Copyright (C) 2014 Davide Madrisan <davide.madrisan@gmail.com>

Usage:

check_clock [-w COUNTER] [-c COUNTER] --refclock TIME

Options:

-r, --refclock COUNTER the clock reference (in seconds since the Epoch)

-w, --warning COUNTER warning threshold

-c, --critical COUNTER critical threshold

-v, --verbose show details for command-line debugging

(Nagios may truncate output)

-h, --help display this help and exit

-V, --version output version information and exit

Examples:

check_clock -w 60 -c 120 --refclock $ARG1$

# where $ARG1$ is the number of seconds since the Epoch: "$(date '+%s')"

# provided by the Nagios poller

Example of output

clock OK - time delta 39s | clock_delta=39

Performance data

clock_delta

LNX_UPTIME - check how long the system has been running

[ /etc/nrpe.d/check_uptime ]

command[check_uptime]=/usr/lib/nagios/plugins/check_uptime

command[check_uptime_notify]=/usr/lib/nagios/plugins/check_uptime --critical 30:

Usage note

In the example above, Nagios sends a notification when the uptime of the monitored server is less than 30 minutes. This catches, for instance, an unexpected reboot of a server caused by a non-maskable interrupt (a signal of a non-recoverable hardware error).
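The "30:" notation is a Nagios threshold range: an alert is raised when the value falls outside the range, and a trailing colon means "from 30 up to infinity". The sketch below shows how such ranges can be interpreted according to the Nagios plugin development guidelines. The helper names parse_range and alerts are hypothetical and the plugin's own parser may differ in detail.

```python
import math


def parse_range(spec):
    """Parse a Nagios threshold range into (start, end, inside)."""
    # A leading "@" inverts the test: alert when the value is INSIDE the range.
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        start_text, end_text = spec.split(":", 1)
    else:
        # A bare number N is shorthand for the range 0..N.
        start_text, end_text = "0", spec
    if start_text == "~":
        start = -math.inf
    else:
        start = float(start_text) if start_text else 0.0
    end = float(end_text) if end_text else math.inf
    return start, end, inside


def alerts(spec, value):
    """Return True when the value should raise an alert for this range."""
    start, end, inside = parse_range(spec)
    within = start <= value <= end
    return within if inside else not within
```

With these helpers, alerts("30:", 12) is true (the host rebooted 12 minutes ago) while alerts("30:", 1436) is false.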

A note on the implementation of "check_uptime" provided by nagios-plugins 2.0+

This new Nagios plugin is based on the POSIX function clock_gettime() associated with the clock monotonic option (CLOCK_MONOTONIC).

According to the POSIX specifications "the value returned by clock_gettime() represents the amount of time (in seconds and nanoseconds) since an unspecified point in the past (for example, system start-time, or the Epoch)".

Recent Linux kernels return a value that is related to the system start time but can differ from the output of the uptime command (procps) and from the first value of /proc/uptime.

$ /usr/bin/uptime

18:45:00 up 8:46, 7 users, load average: 0.67, 1.79, 2.49

$ awk '{printf("%02d:%02d\n",($1/60/60%24),($1/60%60))}' /proc/uptime

08:46

$ ./clock_monotonic

4 hours 37 min

(On OpenBSD 5.0, the monotonic clock returns the same value as uptime, which confirms that this behaviour is platform dependent.)

The implementation followed by nagios-plugins-linux is compatible with uptime and /proc/uptime.
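The two time sources can be compared on any Linux host with a short script such as the one below. It is only a sketch: the helper names are hypothetical, and the explanation that the gap comes from time spent suspended (which CLOCK_MONOTONIC does not count on Linux) is an assumption, since the text above does not state the cause.

```python
import time


def parse_proc_uptime(text):
    """Return the uptime in seconds from the content of /proc/uptime."""
    # The first field is the uptime, the second the cumulative idle time.
    return float(text.split()[0])


def hours_minutes(seconds):
    """Format a number of seconds as HH:MM, like the awk one-liner above."""
    minutes = int(seconds) // 60
    return f"{minutes // 60 % 24:02d}:{minutes % 60:02d}"


if __name__ == "__main__":
    try:
        with open("/proc/uptime") as f:
            print("/proc/uptime    :", hours_minutes(parse_proc_uptime(f.read())))
    except OSError:
        print("/proc/uptime is not available on this system")
    # time.clock_gettime() is only available on Unix-like platforms.
    monotonic = time.clock_gettime(time.CLOCK_MONOTONIC)
    print("CLOCK_MONOTONIC :", hours_minutes(monotonic))
```

On a machine that has never been suspended the two lines are expected to agree to within a minute.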

This plugin checks how long the system has been running.

Copyright (C) 2010,2012-2014 Davide Madrisan <davide.madrisan@gmail.com>

Usage:

check_uptime [OPTION]

Options:

-m, --clock-monotonic use the monotonic clock for retrieving the time

-w, --warning PERCENT warning threshold

-c, --critical PERCENT critical threshold

-h, --help display this help and exit

-V, --version output version information and exit

Examples:

check_uptime

check_uptime --critical 15: --warning 30:

check_uptime --clock-monotonic -c 15: -w 30:

See the Nagios Developer Guidelines for range format:

<https://nagios-plugins.org/doc/guidelines.html#THRESHOLDFORMAT>

Example of output

uptime OK: 23 hours 56 min | uptime=1436

Performance data

uptime (in minutes)

--- Monitoring CPU and System Load ---

LNX_CPU - check the CPU (user mode) utilization

[ /etc/nrpe.d/check_cpu ]

command[check_cpu]=/usr/lib/nagios/plugins/check_cpu -f -w 85% -c 95%

This plugin checks the CPU (user mode) utilization

Copyright (C) 2014 Davide Madrisan <davide.madrisan@gmail.com>

Usage:

check_cpu [-v] [-m] [-p] [-w PERC] [-c PERC] [delay [count]]

check_cpu --cpuinfo

Options:

-m, --no-cpu-model do not display the CPU model in the output message

-p, --per-cpu display the utilization of each CPU

-w, --warning PERCENT warning threshold

-c, --critical PERCENT critical threshold

-v, --verbose show details for command-line debugging

(Nagios may truncate output)

-i, --cpuinfo show the CPU characteristics (for debugging)

-h, --help display this help and exit

-V, --version output version information and exit

delay is the delay between updates in seconds (default: 1sec)

count is the number of updates (default: 2)

1 means the percentages of total CPU time from boottime.

Examples:

check_cpu -m -p -w 85% -c 95%

check_cpu -w 85% -c 95% 1 2

check_cpu --cpuinfo

Example of output

cpu CPU: OK - cpu user 79.5% | cpu_user=79.5% cpu_system=20.5% cpu_idle=0.0% cpu_iowait=3% cpu_steal=0%

Performance data

cpu_user

cpu_system

cpu_idle

cpu_iowait

cpu_steal
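Percentages of this kind are usually derived from the jiffies counters on the aggregate "cpu" line of /proc/stat, sampled twice, "delay" seconds apart. The sketch below illustrates the idea. The function names are hypothetical, and how the plugin folds the nice, irq and softirq columns into its five values is not documented here, so this version simply leaves them out of the report while still counting them in the total.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return the jiffies counters of the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:1 + len(FIELDS)]]
            # Older kernels expose fewer columns; missing ones count as zero.
            values += [0] * (len(FIELDS) - len(values))
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before, after):
    """Turn two counter samples into percentages of the elapsed CPU time."""
    delta = {name: after[name] - before[name] for name in FIELDS}
    total = sum(delta.values()) or 1  # avoid a division by zero
    reported = ("user", "system", "idle", "iowait", "steal")
    return {f"cpu_{name}": round(100.0 * delta[name] / total, 1) for name in reported}
```

Passing a "before" sample made of zeros yields the percentages since boot, which corresponds to the "count = 1" case described in the help text.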

This plugin can also display some CPU information ("check_cpu --cpuinfo")

-= CPU Characteristics =-

Architecture: i686

CPU op-mode(s): 32-bit

Byte Order: Little Endian

CPU(s): 2

Thread(s) per core: 2

Core(s) per socket: 1

Socket(s): 1

Vendor ID: GenuineIntel

CPU Family: 6

Model: 28

Model name: Intel(R) Atom(TM) CPU N270 @ 1.60GHz

-CPU0-

CPU is Hot Pluggable: no

Maximum Transition Latency: 10.0us

Current CPU Frequency: 1.07GHz

Available CPU Frequencies: 1.60GHz 1.33GHz 1.07GHz 800MHz

Hardware Limits: 800MHz - 1.60GHz

CPU freq Current Governor: ondemand

CPU freq Available Governors: ondemand userspace

CPU freq Driver: acpi-cpufreq

-CPU1-

CPU is Hot Pluggable: yes (online)

Maximum Transition Latency: 10.0us

Current CPU frequency: 800MHz

Available CPU Frequencies: 1.60GHz 1.33GHz 1.07GHz 800MHz

Hardware Limits: 800MHz - 1.60GHz

CPU freq Current Governor: ondemand

CPU freq Available Governors: ondemand userspace

CPU freq Driver: acpi-cpufreq


LNX_CPUFREQ - displays the CPU frequency characteristics

[ /etc/nrpe.d/check_cpufreq ]

command[check_cpufreq]=/usr/lib/nagios/plugins/check_cpufreq

This plugin displays the CPU frequency characteristics.

Copyright (C) 2014 Davide Madrisan <davide.madrisan@gmail.com>

Usage:

check_cpufreq [-m] [-w PERC] [-c PERC]

Options:

-m, --no-cpu-model do not display the CPU model in the output message

-w, --warning PERCENT warning threshold

-c, --critical PERCENT critical threshold

-h, --help display this help and exit

-V, --version output version information and exit

Examples:

check_cpufreq -m -w 800000

Example of output

cpufreq CPU: | cpu0_freq=800000Hz;;;800000;1600000 cpu1_freq=1600000Hz;;;800000;1600000

Performance data

cpu0_freq

cpu1_freq

...
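The text after the "|" in the plugin output is Nagios performance data, laid out as label=value[UOM];warn;crit;min;max. A consumer such as a graphing tool could split it as sketched below. The function names are hypothetical, and quoted labels containing spaces, as well as the comma-separated variant printed by some plugins, are deliberately not handled.

```python
import re

# label=value[UOM];[warn];[crit];[min];[max]
PERFDATA_RE = re.compile(
    r"^(?P<label>[^=]+)=(?P<value>-?\d+(?:\.\d+)?)(?P<uom>[^;]*)(?:;(?P<rest>.*))?$"
)


def parse_perfdata_item(item):
    """Split one performance data item into its components."""
    match = PERFDATA_RE.match(item)
    if match is None:
        raise ValueError(f"not a performance data item: {item!r}")
    # Pad the optional warn;crit;min;max fields so that all four always exist.
    rest = (match.group("rest") or "").split(";")
    warn, crit, minimum, maximum = (rest + [""] * 4)[:4]
    return {
        "label": match.group("label"),
        "value": float(match.group("value")),
        "uom": match.group("uom"),
        "warn": warn or None,  # kept as text: thresholds may be ranges
        "crit": crit or None,
        "min": float(minimum) if minimum else None,
        "max": float(maximum) if maximum else None,
    }


def parse_plugin_output(line):
    """Return the human-readable text and the list of parsed perfdata items."""
    text, _, perfdata = line.partition("|")
    return text.strip(), [parse_perfdata_item(item) for item in perfdata.split()]
```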

LNX_CSWCH - monitors the total number of context switches across all CPUs

[ /etc/nrpe.d/check_cswch ]

command[check_cswch]=/usr/lib/nagios/plugins/check_cswch 1 2

Example of output

cswch OK - number of context switches/s 1317 | cswch/s=1317

Performance data

cswch/s

☛ Documentation

LINFO (The Linux Information Project) - Context Switch
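The kernel exposes a cumulative context-switch counter on the "ctxt" line of /proc/stat, so a per-second rate can be obtained from two samples taken "delay" seconds apart. The sketch below assumes that this is the data source and that the two trailing arguments of check_cswch mean delay and count, as they do for check_cpu. The function names are hypothetical.

```python
import time


def read_ctxt(stat_text):
    """Return the cumulative context-switch counter from /proc/stat content."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no 'ctxt' line found")


def per_second(first, second, delay):
    """Convert two cumulative counter values into an integer rate."""
    return (second - first) // delay


def sample_cswch(delay=1, path="/proc/stat"):
    """Take two samples 'delay' seconds apart and return the rate."""
    with open(path) as f:
        first = read_ctxt(f.read())
    time.sleep(delay)
    with open(path) as f:
        second = read_ctxt(f.read())
    return per_second(first, second, delay)
```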

LNX_INTERRUPTS - monitors the total number of system interrupts

[ /etc/nrpe.d/check_intr ]

command[check_intr]=/usr/lib/nagios/plugins/check_intr 1 2

Example of output

intr OK - number of interrupts/s 9318 | intr/s=9318 intr_cpu0/s=1157 intr_cpu1/s=1724 intr_cpu2/s=2862 intr_cpu3/s=3579

Performance data

intr/s

intr_cpu0/s

intr_cpu1/s

...

Usage note

The variable intr reports the total number of interrupts, summed over each of the possible system interrupts, including unnumbered architecture-specific ones.

The performance data intr_cpuN report the number of interrupts per CPU per I/O device.

Since Linux 2.6.24, for the i386 and x86_64 architectures at least, this also includes interrupts internal to the system (that is, not associated with a device as such).
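One way to obtain per-CPU totals comparable to intr_cpuN is to sum the per-CPU columns of /proc/interrupts. Whether the plugin reads this file or another source is an assumption, and the function name below is hypothetical.

```python
def per_cpu_interrupts(interrupts_text):
    """Sum the counters of /proc/interrupts column by column (one per CPU)."""
    lines = interrupts_text.splitlines()
    if not lines:
        return []
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = [0] * ncpu
    for line in lines[1:]:
        # Skip the leading "NN:" label and keep one column per CPU.
        counts = line.split()[1:1 + ncpu]
        # Rows such as ERR: and MIS: carry a single global counter on
        # multi-CPU systems and are ignored here.
        if len(counts) == ncpu and all(c.isdigit() for c in counts):
            for cpu, count in enumerate(counts):
                totals[cpu] += int(count)
    return totals
```

As with the other counters, these totals are cumulative since boot. A per-second figure comes from the difference between two samples divided by the delay.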

LNX_IOWAIT - monitors I/O wait bottlenecks

[ /etc/nrpe.d/check_iowait ]

command[check_iowait]=/usr/lib/nagios/plugins/check_iowait -m -w 20% -c 30%

Example of output

iowait OK - cpu iowait 0% | cpu_user=31% cpu_system=8% cpu_idle=61% cpu_iowait=0% cpu_steal=0% cpu_freq=1600MHz

Performance data

see LNX_CPU

LNX_LOAD - check the current system load average

[ /etc/nrpe.d/check_load ]

command[check_load]=/usr/lib/nagios/plugins/check_load -r --load15=1.5,3.0

This plugin checks the current system load average.

Copyright (C) 2014 Davide Madrisan <davide.madrisan@gmail.com>

Usage:

check_load [-r] [--load1=w,c] [--load5=w,c] [--load15=w,c]

Options:

-r, --percpu divide the load averages by the number of CPUs

-1, --load1=WLOAD1,CLOAD1 warning and critical thresholds for load1

-5, --load5=WLOAD5,CLOAD5 warning and critical thresholds for load5

-L, --load15=WLOAD15,CLOAD15 warning and critical thresholds for load15

-h, --help display this help and exit

-V, --version output version information and exit

Examples:

check_load -r --load1=2,3 --load15=1.5,2.5

Example of output

load OK - average: 2.66, 2.95, 2.01 | load1=2.660;0.000;0.000;0, load5=2.950;0.000;0.000;0, load15=2.010;0.000;0.000;0

Performance data

load1

load5

load15
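The effect of the --percpu option can be illustrated with a few lines of Python: each load average is divided by the number of CPUs before it is compared with its (warning, critical) pair. This is a sketch under stated assumptions. The function name and the strict "greater than" comparison are not taken from the plugin.

```python
import os

SEVERITY = ("OK", "WARNING", "CRITICAL")


def load_status(loads, thresholds, ncpu=1):
    """Evaluate load averages against per-interval (warning, critical) pairs."""
    names = ("load1", "load5", "load15")
    scaled = dict(zip(names, (value / ncpu for value in loads)))
    worst = 0
    for name, (warning, critical) in thresholds.items():
        if scaled[name] > critical:
            worst = max(worst, 2)
        elif scaled[name] > warning:
            worst = max(worst, 1)
    return SEVERITY[worst], scaled


if __name__ == "__main__":
    # Mirrors "check_load -r --load15=1.5,3.0" on the local machine.
    status, scaled = load_status(
        os.getloadavg(), {"load15": (1.5, 3.0)}, ncpu=os.cpu_count() or 1
    )
    print(status, scaled)
```

Dividing by the CPU count lets the same thresholds be reused across machines of different sizes: a load15 of 2.01 is a warning on a single-CPU host but unremarkable on a dual-CPU one.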

--- Monitoring Filesystems and Disks ---

LNX_DISK - You can use the official Nagios Plugins (check_disk)

LNX_IFMOUNTFS - check whether the given filesystems are mounted

[ /etc/nrpe.d/check_ifmountfs ]

command[check_ifmountfs]=/usr/lib/nagios/plugins/check_ifmountfs /mnt/nfs-data,/dev/cdrom

LNX_MULTIPATH - check the multipath topology status

[ /etc/nrpe.d/check_multipath ]

command[check_multipath]=/usr/bin/sudo /usr/lib/nagios/plugins/check_multipath

LNX_READONLYFS - check for read-only filesystems

[ /etc/nrpe.d/check_readonlyfs ]

command[check_rofs]=/usr/lib/nagios/plugins/check_readonlyfs -l -X cgroup -X tmpfs
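A check of this kind can be sketched by scanning /proc/mounts for entries whose option list contains "ro", skipping the filesystem types given with -X. Reading /proc/mounts is an assumption about the data source, the function name is hypothetical, and the -l option is not modelled.

```python
def readonly_mounts(mounts_text, excluded_types=()):
    """Return the mount points that are mounted read-only."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type in excluded_types:
            continue
        # Options are comma separated; "ro" must match as a whole word.
        if "ro" in options.split(","):
            result.append(mount_point)
    return result
```

A filesystem that the kernel has remounted read-only after an I/O error would show up in this list, which is the typical situation such a check is meant to catch.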

--- Monitoring Memory, Swap and Paging ---

LNX_MEMORY - check the memory usage

[ /etc/nrpe.d/check_memory ]

command[check_memory]=/usr/lib/nagios/plugins/check_memory -b -w 85% -c 95%

This plugin checks the system memory utilization.

Copyright (C) 2014 Davide Madrisan <davide.madrisan@gmail.com>

Usage:

check_memory [-a] [-b,-k,-m,-g] -s -w PERC -c PERC

Options:

-a, --available display the free/available memory

-b,-k,-m,-g show output in bytes; KB (the default), MB, or GB

-s, --vmstats display the virtual memory perfdata

-h, --help display this help and exit

-V, --version output version information and exit

Examples:

check_memory --available -w 20%: -c 10%:

check_memory --vmstats -w 80% -c 90%

Example of output

memory OK: 26.08% (266580 kB) used | mem_total=1023312kB,\

mem_used=266580kB, mem_free=171548kB, mem_shared=51244kB,\

mem_buffers=34744kB, mem_cached=550440kB, mem_available=674712kB,\

mem_active=325136kB, mem_anonpages=240464kB,\

mem_committed=1704152kB, mem_dirty=604kB, mem_inactive=468904kB,\

vmem_pageins/s=128, vmem_pageouts/s=0, vmem_pgmajfaults/s=0

Performance data

mem_total Total usable physical RAM

mem_used Total amount of physical RAM used by the system

mem_free Amount of RAM that is currently unused

mem_shared Now always zero; not calculated

mem_buffers Amount of physical RAM used for file buffers

mem_cached In-memory cache for files read from the disk (the page cache)

mem_available kernel >= 2.6.27: memory available for starting new applications, without swapping

mem_available kernel < 2.6.27: same as 'mem_free'

mem_active Memory that has been used more recently

mem_anonpages Non-file backed pages mapped into user-space page tables

mem_committed The amount of memory presently allocated on the system

mem_dirty Memory which is waiting to get written back to the disk

mem_inactive Memory which has been less recently used

vmem_pageins, vmem_pageouts The number of memory pages the system has written in and out to disk

vmem_pgmajfault The number of memory major pagefaults
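The figures in the sample output above satisfy mem_used = mem_total - mem_free - mem_buffers - mem_cached (1023312 - 171548 - 34744 - 550440 = 266580 kB). The sketch below applies that relation to the content of /proc/meminfo. Whether the plugin computes the value in exactly this way is an assumption, and the function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def memory_used(meminfo):
    """Return (used_kb, used_percent), excluding buffers and page cache."""
    total = meminfo["MemTotal"]
    used = total - meminfo["MemFree"] - meminfo["Buffers"] - meminfo["Cached"]
    return used, 100.0 * used / total
```

Excluding buffers and the page cache matters for alerting: both are reclaimed by the kernel on demand, so counting them as "used" would trigger warnings on healthy machines with a warm cache.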


LNX_PAGING - check the memory and swap paging

[ /etc/nrpe.d/check_paging ]

command[check_paging]=/usr/lib/nagios/plugins/check_paging --swapping -w 10 -c 25

LNX_SWAP - check the swap usage

[ /etc/nrpe.d/check_swap ]

command[check_swap]=/usr/lib/nagios/plugins/check_swap -b -w 50% -c 80%

--- Hardware Monitoring ---

LNX_TEMPERATURE - monitors the hardware's temperature

[ /etc/nrpe.d/check_temperature ]

command[check_temp_zone0]=/usr/lib/nagios/plugins/check_temperature -t thermal_zone0 -w 80 -c 90

This plugin monitors the hardware's temperature.

Copyright (C) 2014 Davide Madrisan <davide.madrisan@gmail.com>

Usage:

check_temperature [-f|-k] [-t <thermal_zone>] [-w COUNTER] [-c COUNTER]

Options:

-f, --fahrenheit use fahrenheit as the temperature unit

-k, --kelvin use kelvin as the temperature unit

-t, --thermal_zone only consider a specific thermal zone

-w, --warning COUNTER warning threshold

-c, --critical COUNTER critical threshold

-h, --help display this help and exit

-V, --version output version information and exit

Examples:

check_temperature -w 80 -c 90

check_temperature -t 0 -w 78 -c 83

Example of output

temperature OK - 65.5 degrees C (thermal zone: 0, type: "acpitz") | temp=41C;0;85

Performance data

temp

Usage note

This plugin monitors the hardware temperature reported by the Linux kernel in /sys/class/thermal/.

Unless a thermal zone is specified on the command line with the option '-t', all the values reported by sysfs are taken into account and the plugin selects the highest temperature.
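The selection logic described above can be sketched as follows. The kernel reports each zone's temperature in millidegrees Celsius in a file named temp. The function names and the "base" parameter (added so the code can be pointed at a test directory) are hypothetical and not part of the plugin.

```python
import glob
import os


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone_name: degrees Celsius} for every readable thermal zone."""
    zones = {}
    for path in sorted(glob.glob(os.path.join(base, "thermal_zone*", "temp"))):
        name = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as f:
                zones[name] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # some zones cannot be read; skip them
    return zones


def hottest(zones):
    """Return the (zone_name, temperature) pair with the highest reading."""
    return max(zones.items(), key=lambda item: item[1]) if zones else None


def convert(celsius, unit="C"):
    """Convert to Fahrenheit ("F") or kelvin ("K"); Celsius is the default."""
    if unit == "F":
        return celsius * 9.0 / 5.0 + 32.0
    if unit == "K":
        return celsius + 273.15
    return celsius
```

Calling hottest(read_thermal_zones()) corresponds to running the check without '-t', while looking up a single key of the returned dictionary corresponds to selecting one zone.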

Documentation

Official Linux kernel documentation: sysfs-api

--- Monitoring Processes and Threads ---

LNX_NBPROCS - displays the number of running processes per user

[ /etc/nrpe.d/check_nbprocs ]

command[check_nbprocs]=/usr/lib/nagios/plugins/check_nbprocs --threads -w 1500 -c 2000

--- Monitoring Network Interfaces Statistics and Connections ---

LNX_NETWORK - displays some network interfaces statistics

[ /etc/nrpe.d/check_network ]

command[check_network]=/usr/lib/nagios/plugins/check_network

LNX_TCP_COUNT - check the TCP network usage (established TCP connections)

[ /etc/nrpe.d/check_tcpcount ]

command[check_tcp4count]=/usr/lib/nagios/plugins/check_tcpcount -w 1500 -c 2000

command[check_tcp6count]=/usr/lib/nagios/plugins/check_tcpcount --tcp6 -w 1500 -c 2000

command[check_tcpcount]=/usr/lib/nagios/plugins/check_tcpcount --tcp --tcp6 -w 1500 -c 2000
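On Linux, the state of every TCP socket is listed in /proc/net/tcp (IPv4) and /proc/net/tcp6 (IPv6), where the fourth column holds the state as a hexadecimal code and 01 means ESTABLISHED. The sketch below counts such entries. That the plugin reads these files is an assumption, and the function name is hypothetical.

```python
ESTABLISHED = "01"  # TCP state code used by the kernel in /proc/net/tcp*


def count_established(proc_net_tcp_text):
    """Count the sockets in ESTABLISHED state in /proc/net/tcp content."""
    count = 0
    # The first line is a column header and is skipped.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) > 3 and fields[3] == ESTABLISHED:
            count += 1
    return count
```

Adding the result for /proc/net/tcp to the result for /proc/net/tcp6 gives a combined figure comparable to the "--tcp --tcp6" invocation shown above.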

--- Monitoring connected Users ---

LNX_USERS - display the number of users that are currently logged on

[ /etc/nrpe.d/check_users ]

command[check_users]=/usr/lib/nagios/plugins/check_users -w 1

This plugin displays the number of users that are currently logged on.

Copyright (C) 2014 Davide Madrisan <davide.madrisan@gmail.com>

Usage:

check_users [-w COUNTER] [-c COUNTER]

Options:

-w, --warning COUNTER warning threshold

-c, --critical COUNTER critical threshold

-v, --verbose show details for command-line debugging

(Nagios may truncate output)

-h, --help display this help and exit

-V, --version output version information and exit

Examples:

check_users -w 1

Example of output

users WARNING - 2 users logged on | logged_users=2

--- Checks planned but not implemented (yet) ---

LNX_REPORTIO - Not Available

N/A