Monitoring Linux Operating System
Software
Nagios Plugins for Linux
A suite of Nagios plugins for monitoring Linux servers and appliances
Name: nagios-plugins-linux
Tags: Utilities/Monitoring
License: GPL v3+
Operating System: Linux
Implementation: C (C99)
This package is known to compile with:
● gcc 4.1.2 (Red Hat Enterprise Linux 5, CentOS 5),
● gcc 4.4 (Red Hat Enterprise Linux 6, CentOS 6),
● gcc 4.8.2 (Red Hat Enterprise Linux 7, CentOS 7),
● gcc 4.9.0-4.9.2 and clang 3.1 and 3.5.1 (openmamba GNU/Linux 2.90+),
but should compile on all relatively recent Linux distributions.
List of the Linux kernels that have been successfully tested: 2.6.18, 2.6.32, 3.10, 3.14.
Last stable version: version 17 - "400th (git) commit"
Available plugins:
check_clock
check_cpu - improved in version 15 and 16
check_cpufreq - new plugin in version 16
check_cswch
check_fc - new plugin in version 17
check_ifmountfs
check_intr
check_iowait
check_load
check_memory - improved in version 15
check_multipath - improved in version 16
check_nbprocs
check_network
check_paging
check_readonlyfs
check_swap
check_tcpcount
check_temperature
check_uptime
check_users
Development : GitHub
Documentation : GitHub/README
Nagios Exchange Page : Nagios Plugins Linux
Download the latest stable source archive here
How to build the source code
This package uses the GNU autotools for configuration and installation.
If you have cloned the git repository then you will need to run autoreconf to generate the required files.
Run ./configure --help to see a list of available install options.
The plugins are installed by default into LIBEXECDIR.
You will very likely want to change this location to suit your setup, for example:
./configure --libexecdir=/usr/lib/nagios/plugins
After ./configure has completed successfully run
make install
(as root) and you're done!
Better still, for advanced users: build a package in the format supported by your Linux distribution and install that instead.
Available Nagios Plugins
A Linux server or appliance can be fully monitored by the Nagios/NRPE services listed below.
All the binaries are provided by the nagios-plugins-linux software.
--- Monitoring Time and System Uptime ---
LNX_CLOCK - returns the number of seconds elapsed between local time and Nagios time
[ /etc/nrpe.d/check_clock ]
command[check_clock]=/usr/lib/nagios/plugins/check_clock --refclock $ARG1$ -w 60 -c 120
where $ARG1$ is the number of seconds since the "Epoch"
(1970-01-01 00:00:00 UTC) -- $(date '+%s') -- provided by the Nagios poller.
☛Usage note
This check is intended for alerting when the number of seconds elapsed between the Nagios poller and the monitored server exceeds a given threshold (60 seconds for the warning state, and 120 seconds for a critical notification, in the example above).
The clock of the Nagios server needs, of course, to be synchronized to an NTP server.
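The comparison behind this check can be sketched in a few lines of Python. This is only an illustration of the idea, not the plugin's C code: the function name `clock_status` is hypothetical, and treating the delta as an absolute value (so that a clock running ahead and one running behind are handled alike) is an assumption.

```python
import time


def clock_status(refclock, warning=60, critical=120, now=None):
    """Compare a reference clock (seconds since the Epoch) with local time.

    Assumption: the delta is an absolute value and a threshold is
    exceeded only when the delta is strictly greater than it.
    """
    if now is None:
        now = time.time()
    delta = abs(int(now) - int(refclock))
    if delta > critical:
        state = "CRITICAL"
    elif delta > warning:
        state = "WARNING"
    else:
        state = "OK"
    message = "clock %s - time delta %ds | clock_delta=%d" % (state, delta, delta)
    return state, message
```

Calling `clock_status(1000, now=1039)` yields the OK state with a delta of 39 seconds, in the same shape as the sample output shown further down.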
This plugin returns the number of seconds elapsed between
the host local time and Nagios time.
Copyright (C) 2014 Davide Madrisan <davide.madrisan@gmail.com>
Usage:
check_clock [-w COUNTER] [-c COUNTER] --refclock TIME
Options:
-r, --refclock COUNTER the clock reference (in seconds since the Epoch)
-w, --warning COUNTER warning threshold
-c, --critical COUNTER critical threshold
-v, --verbose show details for command-line debugging
(Nagios may truncate output)
-h, --help display this help and exit
-V, --version output version information and exit
Examples:
check_clock -w 60 -c 120 --refclock $ARG1$
# where $ARG1$ is the number of seconds since the Epoch: "$(date '+%s')"
# provided by the Nagios poller
Example of output
clock OK - time delta 39s | clock_delta=39
Performance data
clock_delta
LNX_UPTIME - check how long the system has been running
[ /etc/nrpe.d/check_uptime ]
command[check_uptime]=/usr/lib/nagios/plugins/check_uptime
command[check_uptime_notify]=/usr/lib/nagios/plugins/check_uptime --critical 30:
☛ Usage note
In the example above, Nagios sends a notification when the uptime of the monitored server is less than 30 minutes. This catches, for instance, an unexpected reboot of a server caused by a non-maskable interrupt (the signal of a non-recoverable hardware error).
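The threshold "30:" follows the range format of the Nagios plugin guidelines, where "N:" alerts when the value drops below N. The sketch below shows how such range strings can be interpreted. It is an illustration written for this page, not code taken from the plugin, and the names `parse_range` and `should_alert` are hypothetical.

```python
def parse_range(spec):
    """Return (start, end, inside) for a Nagios threshold range string.

    "10"     alert if value < 0 or > 10
    "10:"    alert if value < 10
    "~:10"   alert if value > 10
    "10:20"  alert if value < 10 or > 20
    "@10:20" alert if 10 <= value <= 20
    """
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        low, high = spec.split(":", 1)
    else:
        low, high = "0", spec
    if low == "~":
        start = float("-inf")
    elif low == "":
        start = 0.0
    else:
        start = float(low)
    end = float(high) if high else float("inf")
    if start > end:
        raise ValueError("range start is greater than range end: %r" % spec)
    return start, end, inside


def should_alert(value, spec):
    """True when the value violates the given range."""
    start, end, inside = parse_range(spec)
    in_range = start <= value <= end
    return in_range if inside else not in_range
```

With this reading, `should_alert(12, "30:")` is true (an uptime of 12 minutes is below 30) and `should_alert(45, "30:")` is false.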
☛ A note on the implementation of "check_uptime" provided by nagios-plugins 2.0+
This new Nagios plugin is based on the POSIX function clock_gettime() associated with the clock monotonic option (CLOCK_MONOTONIC).
According to the POSIX specifications "the value returned by clock_gettime() represents the amount of time (in seconds and nanoseconds) since an unspecified point in the past (for example, system start-time, or the Epoch)".
Recent Linux kernels return a value that is related to the system start time but can differ from the output of the uptime command (procps) and from the first value in /proc/uptime.
$ /usr/bin/uptime
18:45:00 up 8:46, 7 users, load average: 0.67, 1.79, 2.49
$ awk '{printf("%02d:%02d\n",($1/60/60%24),($1/60%60))}' /proc/uptime
08:46
$ ./clock_monotonic
4 hours 37 min
(On OpenBSD 5.0, the monotonic clock returns the same value as uptime, which confirms that this behaviour is platform dependent.)
The implementation followed by nagios-plugins-linux is compatible with uptime and /proc/uptime.
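The two time sources discussed above can be compared with a short Python sketch. It is an illustration only: the helper names are hypothetical, and the message format simply mirrors the sample output on this page (hour counts above 23 are not folded into days here).

```python
import time


def uptime_from_proc(text):
    """Return the uptime in seconds from the content of /proc/uptime."""
    return float(text.split()[0])


def monotonic_seconds():
    """Seconds reported by the POSIX monotonic clock (Unix only)."""
    return time.clock_gettime(time.CLOCK_MONOTONIC)


def format_uptime(seconds):
    """Render an uptime the way the sample output on this page does."""
    minutes = int(seconds // 60)
    hours, mins = divmod(minutes, 60)
    return "uptime OK: %d hours %d min | uptime=%d" % (hours, mins, minutes)
```

On a Linux host, printing both `uptime_from_proc(open("/proc/uptime").read())` and `monotonic_seconds()` side by side shows whether the two sources agree on that particular kernel.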
This plugin checks how long the system has been running.
Copyright (C) 2010,2012-2014 Davide Madrisan <davide.madrisan@gmail.com>
Usage:
check_uptime [OPTION]
Options:
-m, --clock-monotonic use the monotonic clock for retrieving the time
-w, --warning PERCENT warning threshold
-c, --critical PERCENT critical threshold
-h, --help display this help and exit
-V, --version output version information and exit
Examples:
check_uptime
check_uptime --critical 15: --warning 30:
check_uptime --clock-monotonic -c 15: -w 30:
See the Nagios Developer Guidelines for range format:
<https://nagios-plugins.org/doc/guidelines.html#THRESHOLDFORMAT>
Example of output
uptime OK: 23 hours 56 min | uptime=1436
Performance data
uptime (in minutes)
--- Monitoring CPU and System Load ---
LNX_CPU - check the CPU (user mode) utilization
[ /etc/nrpe.d/check_cpu ]
command[check_cpu]=/usr/lib/nagios/plugins/check_cpu -f -w 85% -c 95%
This plugin checks the CPU (user mode) utilization
Copyright (C) 2014 Davide Madrisan <davide.madrisan@gmail.com>
Usage:
check_cpu [-v] [-m] [-p] [-w PERC] [-c PERC] [delay [count]]
check_cpu --cpuinfo
Options:
-m, --no-cpu-model do not display the CPU model in the output message
-p, --per-cpu display the utilization of each CPU
-w, --warning PERCENT warning threshold
-c, --critical PERCENT critical threshold
-v, --verbose show details for command-line debugging
(Nagios may truncate output)
-i, --cpuinfo show the CPU characteristics (for debugging)
-h, --help display this help and exit
-V, --version output version information and exit
delay is the delay between updates in seconds (default: 1sec)
count is the number of updates (default: 2)
1 means the percentages of total CPU time from boottime.
Examples:
check_cpu -m -p -w 85% -c 95%
check_cpu -w 85% -c 95% 1 2
check_cpu --cpuinfo
Example of output
cpu CPU: OK - cpu user 79.5% | cpu_user=79.5% cpu_system=20.5% cpu_idle=0.0% cpu_iowait=3% cpu_steal=0%
Performance data
cpu_user
cpu_system
cpu_idle
cpu_iowait
cpu_steal
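The delay and count arguments imply that utilization is derived from successive samples of the kernel CPU counters. A minimal Python sketch of that idea follows, assuming the counters come from the aggregate "cpu" line of /proc/stat. The function names are hypothetical, and how the plugin groups the nice, irq and softirq counters into its perfdata is not documented on this page, so the sketch reports each raw field separately.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of jiffies."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("not an aggregate cpu line: %r" % line)
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels expose fewer columns; pad the missing ones with zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_percentages(prev, curr):
    """Share of each counter in the time elapsed between two samples."""
    deltas = {name: curr[name] - prev[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}
```

Taking one sample, sleeping for the delay, and taking a second sample gives the two dictionaries to pass to `cpu_percentages`. A single sample compared against all-zero counters corresponds to the "since boot" case mentioned in the help text.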
This plugin can also display some CPU information ("check_cpu --cpuinfo")
-= CPU Characteristics =-
Architecture: i686
CPU op-mode(s): 32-bit
Byte Order: Little Endian
CPU(s): 2
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
Vendor ID: GenuineIntel
CPU Family: 6
Model: 28
Model name: Intel(R) Atom(TM) CPU N270 @ 1.60GHz
-CPU0-
CPU is Hot Pluggable: no
Maximum Transition Latency: 10.0us
Current CPU Frequency: 1.07GHz
Available CPU Frequencies: 1.60GHz 1.33GHz 1.07GHz 800MHz
Hardware Limits: 800MHz - 1.60GHz
CPU freq Current Governor: ondemand
CPU freq Available Governors: ondemand userspace
CPU freq Driver: acpi-cpufreq
-CPU1-
CPU is Hot Pluggable: yes (online)
Maximum Transition Latency: 10.0us
Current CPU frequency: 800MHz
Available CPU Frequencies: 1.60GHz 1.33GHz 1.07GHz 800MHz
Hardware Limits: 800MHz - 1.60GHz
CPU freq Current Governor: ondemand
CPU freq Available Governors: ondemand userspace
CPU freq Driver: acpi-cpufreq
☛ Documentation
Here are a few interesting internet links where you can find some stuff related to cpu and sysfs:
LNX_CPUFREQ - displays the CPU frequency characteristics
[ /etc/nrpe.d/check_cpufreq ]
command[check_cpufreq]=/usr/lib/nagios/plugins/check_cpufreq
This plugin displays the CPU frequency characteristics.
Copyright (C) 2014 Davide Madrisan <davide.madrisan@gmail.com>
Usage:
check_cpufreq [-m] [-w PERC] [-c PERC]
Options:
-m, --no-cpu-model do not display the CPU model in the output message
-w, --warning PERCENT warning threshold
-c, --critical PERCENT critical threshold
-h, --help display this help and exit
-V, --version output version information and exit
Examples:
check_cpufreq -m -w 800000
Example of output
cpufreq CPU: | cpu0_freq=800000Hz;;;800000;1600000 cpu1_freq=1600000Hz;;;800000;1600000
Performance data
cpu0_freq
cpu1_freq
...
LNX_CSWCH - monitors the total number of context switches across all CPUs
[ /etc/nrpe.d/check_cswch ]
command[check_cswch]=/usr/lib/nagios/plugins/check_cswch 1 2
Example of output
cswch OK - number of context switches/s 1317 | cswch/s=1317
Performance data
cswch/s
☛ Documentation
LINFO (The Linux Information Project) - Context Switch
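A per-second rate such as cswch/s can be obtained by reading a cumulative kernel counter twice and dividing the difference by the delay. The Python sketch below assumes the counter is the "ctxt" line of /proc/stat. The helper names are hypothetical and this is not the plugin's own code.

```python
def read_counter(stat_text, name="ctxt"):
    """Return the first numeric value of the named line in /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) > 1 and fields[0] == name:
            return int(fields[1])
    raise KeyError(name)


def rate_per_second(first, second, delay):
    """Average number of events per second between two counter samples."""
    if delay <= 0:
        raise ValueError("delay must be positive")
    return (second - first) / delay
```

The same helper works for the "intr" line of /proc/stat, whose first value is the total number of interrupts serviced since boot.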
LNX_INTERRUPTS - monitors the total number of system interrupts
[ /etc/nrpe.d/check_intr ]
command[check_intr]=/usr/lib/nagios/plugins/check_intr 1 2
Example of output
intr OK - number of interrupts/s 9318 | intr/s=9318 intr_cpu0/s=1157 intr_cpu1/s=1724 intr_cpu2/s=2862 intr_cpu3/s=3579
Performance data
intr/s
intr_cpu0/s
intr_cpu1/s
...
☛ Usage note
The variable intr reports the total number of interrupts, summed over all the possible system interrupts, including unnumbered architecture-specific ones.
The performance data intr_cpuN report the number of interrupts per CPU per I/O device.
Since Linux 2.6.24, for the i386 and x86_64 architectures at least, this also includes interrupts internal to the system (that is, not associated with a device as such).
LNX_IOWAIT - monitors I/O wait bottlenecks
[ /etc/nrpe.d/check_iowait ]
command[check_iowait]=/usr/lib/nagios/plugins/check_iowait -m -w 20% -c 30%
Example of output
iowait OK - cpu iowait 0% | cpu_user=31% cpu_system=8% cpu_idle=61% cpu_iowait=0% cpu_steal=0% cpu_freq=1600MHz
Performance data
see LNX_CPU
LNX_LOAD - check the current system load average
[ /etc/nrpe.d/check_load ]
command[check_load]=/usr/lib/nagios/plugins/check_load -r --load15=1.5,3.0
This plugin checks the current system load average.
Copyright (C) 2014 Davide Madrisan <davide.madrisan@gmail.com>
Usage:
check_load [-r] [--load1=w,c] [--load5=w,c] [--load15=w,c]
Options:
-r, --percpu divide the load averages by the number of CPUs
-1, --load1=WLOAD1,CLOAD1 warning and critical thresholds for load1
-5, --load5=WLOAD5,CLOAD5 warning and critical thresholds for load5
-L, --load15=WLOAD15,CLOAD15 warning and critical thresholds for load15
-h, --help display this help and exit
-V, --version output version information and exit
Examples:
check_load -r --load1=2,3 --load15=1.5,2.5
Example of output
load OK - average: 2.66, 2.95, 2.01 | load1=2.660;0.000;0.000;0, load5=2.950;0.000;0.000;0, load15=2.010;0.000;0.000;0
Performance data
load1
load5
load15
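The effect of the -r option and of the per-average thresholds can be illustrated with a small Python sketch. It assumes the three averages are the first fields of /proc/loadavg and that a threshold is violated when the (possibly per-CPU) value is strictly greater than it. The function name `load_status` and the dictionary layout are hypothetical.

```python
def load_status(loadavg_text, ncpus=1, thresholds=None):
    """Evaluate load averages against optional (warning, critical) pairs.

    thresholds maps "load1", "load5" or "load15" to a (warning, critical)
    tuple; averages without an entry are not checked.
    """
    names = ("load1", "load5", "load15")
    values = [float(v) / ncpus for v in loadavg_text.split()[:3]]
    loads = dict(zip(names, values))
    state = "OK"
    for name, (warn, crit) in (thresholds or {}).items():
        if loads[name] > crit:
            state = "CRITICAL"
        elif loads[name] > warn and state == "OK":
            state = "WARNING"
    return state, loads
```

With the averages 2.66, 2.95 and 2.01 and the thresholds "--load15=1.5,3.0", the result is a WARNING on a single-CPU host but OK on a two-CPU host once the values are divided by the number of CPUs.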
--- Monitoring Filesystems and Disks ---
LNX_DISK - You can use the official Nagios Plugins (check_disk)
LNX_IFMOUNTFS - check whether the given filesystems are mounted
[ /etc/nrpe.d/check_ifmountfs ]
command[check_ifmountfs]=/usr/lib/nagios/plugins/check_ifmountfs /mnt/nfs-data,/dev/cdrom
LNX_MULTIPATH - check the multipath topology status
[ /etc/nrpe.d/check_multipath ]
command[check_multipath]=/usr/bin/sudo /usr/lib/nagios/plugins/check_multipath
LNX_READONLYFS - check for read-only filesystems
[ /etc/nrpe.d/check_readonlyfs ]
command[check_rofs]=/usr/lib/nagios/plugins/check_readonlyfs -l -X cgroup -X tmpfs
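As an illustration of what such a check involves, the Python sketch below scans text in the /proc/mounts format for filesystems mounted with the "ro" option, skipping excluded filesystem types in the spirit of the -X option above. The function name is hypothetical, and whether the plugin itself reads /proc/mounts is an assumption.

```python
def readonly_mounts(mounts_text, exclude_types=()):
    """Return the mount points that carry the 'ro' mount option."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed or empty line
        _device, mountpoint, fstype, options = fields[:4]
        if fstype in exclude_types:
            continue
        if "ro" in options.split(","):
            found.append(mountpoint)
    return found
```

A non-empty result would map to a CRITICAL state, an empty one to OK.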
--- Monitoring Memory, Swap and Paging ---
LNX_MEMORY - check the memory usage
[ /etc/nrpe.d/check_memory ]
command[check_memory]=/usr/lib/nagios/plugins/check_memory -b -w 85% -c 95%
This plugin checks the system memory utilization.
Copyright (C) 2014 Davide Madrisan <davide.madrisan@gmail.com>
Usage:
check_memory [-a] [-b,-k,-m,-g] -s -w PERC -c PERC
Options:
-a, --available display the free/available memory
-b,-k,-m,-g show output in bytes; KB (the default), MB, or GB
-s, --vmstats display the virtual memory perfdata
-h, --help display this help and exit
-V, --version output version information and exit
Examples:
check_memory --available -w 20%: -c 10%:
check_memory --vmstats -w 80% -c 90%
Example of output
memory OK: 26.08% (266580 kB) used | mem_total=1023312kB,\
mem_used=266580kB, mem_free=171548kB, mem_shared=51244kB,\
mem_buffers=34744kB, mem_cached=550440kB, mem_available=674712kB,\
mem_active=325136kB, mem_anonpages=240464kB,\
mem_committed=1704152kB, mem_dirty=604kB, mem_inactive=468904kB,\
vmem_pageins/s=128, vmem_pageouts/s=0, vmem_pgmajfaults/s=0
Performance data
mem_total Total usable physical RAM
mem_used Total amount of physical RAM used by the system
mem_free Amount of RAM that is currently unused
mem_shared Now always zero; not calculated
mem_buffers Amount of physical RAM used for file buffers
mem_cached In-memory cache for files read from the disk
(the page cache)
mem_available kernel >= 2.6.27: memory available for starting new
applications, without swapping
mem_available kernel < 2.6.27: same as 'mem_free'
mem_active Memory that has been used more recently
mem_anonpages Non-file backed pages mapped into user-space page tables
mem_committed The amount of memory presently allocated on the system
mem_dirty Memory which is waiting to get written back to the disk
mem_inactive Memory which has been less recently used
vmem_pageins
vmem_pageouts The number of memory pages the system has written in
and out to disk
vmem_pgmajfault The number of memory major pagefaults
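The relation between these counters can be checked with a short Python sketch. It assumes that the used memory is the total minus the free, buffer and cache figures taken from /proc/meminfo. That formula reproduces the mem_used=266580kB value of the sample output above, but it is an inference from those numbers rather than a statement about the plugin's source, and the function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integers (kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[key.strip()] = int(parts[0])
    return info


def memory_used(info):
    """Return (used_kB, used_percent) from parsed /proc/meminfo values."""
    used = info["MemTotal"] - info["MemFree"] - info["Buffers"] - info["Cached"]
    percent = 100.0 * used / info["MemTotal"]
    return used, percent
```

Feeding the sample values (MemTotal 1023312, MemFree 171548, Buffers 34744, Cached 550440) into `memory_used` gives 266580 kB used.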
Internet documentation on this topic:
A good article on the subject - Understanding and optimizing Memory utilization
The Qt4 Memory Monitor
LNX_PAGING - check the memory and swap paging
[ /etc/nrpe.d/check_paging ]
command[check_paging]=/usr/lib/nagios/plugins/check_paging --swapping -w 10 -c 25
LNX_SWAP - check the swap usage
[ /etc/nrpe.d/check_swap ]
command[check_swap]=/usr/lib/nagios/plugins/check_swap -b -w 50% -c 80%
--- Hardware Monitoring ---
LNX_TEMPERATURE - monitors the hardware's temperature
[ /etc/nrpe.d/check_temperature ]
command[check_temp_zone0]=/usr/lib/nagios/plugins/check_temperature -t thermal_zone0 -w 80 -c 90
This plugin monitors the hardware's temperature.
Copyright (C) 2014 Davide Madrisan <davide.madrisan@gmail.com>
Usage:
check_temperature [-f|-k] [-t <thermal_zone>] [-w COUNTER] [-c COUNTER]
Options:
-f, --fahrenheit use fahrenheit as the temperature unit
-k, --kelvin use kelvin as the temperature unit
-t, --thermal_zone only consider a specific thermal zone
-w, --warning COUNTER warning threshold
-c, --critical COUNTER critical threshold
-h, --help display this help and exit
-V, --version output version information and exit
Examples:
check_temperature -w 80 -c 90
check_temperature -t 0 -w 78 -c 83
Example of output
temperature OK - 65.5 degrees C (thermal zone: 0, type: "acpitz") | temp=41C;0;85
Performance data
temp
☛ Usage note
This plugin monitors the hardware temperature reported by the Linux kernel in /sys/class/thermal/.
Unless a thermal zone is specified at command line, by using the option '-t', all the values reported by sysfs are taken into account and the highest temperature is selected by the plugin.
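The selection of the highest temperature can be sketched in Python as follows. The sketch relies on the kernel convention that each thermal_zone*/temp file holds a value in millidegrees Celsius. The function names are hypothetical, and the base directory is a parameter only so that the code can be tried against a test directory.

```python
import glob
import os


def highest_temperature(base="/sys/class/thermal"):
    """Return (zone_name, degrees_celsius) for the hottest zone, or None."""
    best = None
    pattern = os.path.join(base, "thermal_zone*", "temp")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as handle:
                celsius = int(handle.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # unreadable or malformed zone: skip it
        zone = os.path.basename(os.path.dirname(path))
        if best is None or celsius > best[1]:
            best = (zone, celsius)
    return best


def convert(celsius, unit="C"):
    """Convert degrees Celsius to Fahrenheit ("F") or kelvin ("K")."""
    if unit == "F":
        return celsius * 9.0 / 5.0 + 32.0
    if unit == "K":
        return celsius + 273.15
    return celsius
```

The result of `highest_temperature()` can then be compared against the warning and critical thresholds, after `convert` if another unit was requested.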
☛ Documentation
Official Linux kernel documentation: sysfs-api
--- Monitoring Processes and Threads ---
LNX_NBPROCS - displays the number of running processes per user
[ /etc/nrpe.d/check_nbprocs ]
command[check_nbprocs]=/usr/lib/nagios/plugins/check_nbprocs --threads -w 1500 -c 2000
--- Monitoring Network Interfaces Statistics and Connections ---
LNX_NETWORK - displays some network interfaces statistics
[ /etc/nrpe.d/check_network ]
command[check_network]=/usr/lib/nagios/plugins/check_network
LNX_TCP_COUNT - check the TCP network usage (established TCP connections)
[ /etc/nrpe.d/check_tcpcount ]
command[check_tcp4count]=/usr/lib/nagios/plugins/check_tcpcount -w 1500 -c 2000
command[check_tcp6count]=/usr/lib/nagios/plugins/check_tcpcount --tcp6 -w 1500 -c 2000
command[check_tcpcount]=/usr/lib/nagios/plugins/check_tcpcount --tcp --tcp6 -w 1500 -c 2000
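Counting established connections can be illustrated with the Python sketch below. It assumes the data comes from /proc/net/tcp (or /proc/net/tcp6 for IPv6), where the fourth column holds the socket state in hexadecimal and 01 stands for ESTABLISHED. The function name is hypothetical, and whether the plugin reads these files is an assumption.

```python
TCP_ESTABLISHED = "01"  # state code used by the Linux kernel


def count_established(proc_net_tcp_text):
    """Count the sockets in the ESTABLISHED state in /proc/net/tcp content."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) > 3 and fields[3] == TCP_ESTABLISHED:
            count += 1
    return count
```

Summing the results for /proc/net/tcp and /proc/net/tcp6 would give a combined IPv4 and IPv6 total.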
--- Monitoring connected Users ---
LNX_USERS - display the number of users that are currently logged on
[ /etc/nrpe.d/check_users ]
command[check_users]=/usr/lib/nagios/plugins/check_users -w 1
This plugin displays the number of users that are currently logged on.
Copyright (C) 2014 Davide Madrisan <davide.madrisan@gmail.com>
Usage:
check_users [-w COUNTER] [-c COUNTER]
Options:
-w, --warning COUNTER warning threshold
-c, --critical COUNTER critical threshold
-v, --verbose show details for command-line debugging
(Nagios may truncate output)
-h, --help display this help and exit
-V, --version output version information and exit
Examples:
check_users -w 1
Example of output
users WARNING - 2 users logged on | logged_users=2
--- Checks planned but not implemented (yet) ---
LNX_REPORTIO - Not Available
N/A