Monitoring Linux Operating System

Software 

Nagios Plugins for Linux
A suite of Nagios plugins for monitoring Linux servers and appliances

Name: nagios-plugins-linux
Tags: Utilities/Monitoring
License: GPL v3+
Operating System: Linux

Implementation: C (C99)
This package is known to compile with:
  ● gcc 4.1.2 (Red Hat Enterprise Linux 5, CentOS 5),
  ● gcc 4.4 (Red Hat Enterprise Linux 6, CentOS 6),
  ● gcc 4.8.2 (Red Hat Enterprise Linux 7, CentOS 7),
  ● gcc 4.9.0-4.9.2 and clang 3.1 and 3.5.1 (openmamba GNU/Linux 2.90+),
but should compile on all relatively recent Linux distributions.
List of the Linux kernels that have been successfully tested: 2.6.18, 2.6.32, 3.10, 3.14.

Last stable version: version 17 - "400th (git) commit"

Available plugins:

check_clock
check_cpu - improved in versions 15 and 16
check_cpufreq - new plugin in version 16
check_cswch
check_fc - new plugin in version 17
check_ifmountfs
check_intr
check_iowait
check_load
check_memory - improved in version 15
check_multipath - improved in version 16
check_nbprocs
check_network
check_paging
check_readonlyfs
check_swap
check_tcpcount
check_temperature
check_uptime
check_users

Development: GitHub
Documentation: GitHub/README
Nagios Exchange Page: Nagios Plugins Linux

    Download
    the latest stable source archive here

    How to build the source code

    This package uses the GNU autotools for configuration and installation.

    If you have cloned the git repository then you will need to run autoreconf to generate the required files.

    Run ./configure --help to see a list of available install options.
    The plugin will be installed by default into LIBEXECDIR.
    You will most likely want to customise this location to suit your needs, for example:

    ./configure --libexecdir=/usr/lib/nagios/plugins

    After ./configure has completed successfully, run

    make install

    (as root) and you're done!
    Better still, advanced users can build a package in the format supported by their Linux distribution and install that instead.

    Available Nagios Plugins

    A Linux server or appliance can be fully monitored by the Nagios/NRPE services listed below.
    All the binaries are provided by the nagios-plugins-linux software.

     
    --- Monitoring Time and System Uptime ---

    LNX_CLOCK - returns the number of seconds elapsed between local time and Nagios time

    [ /etc/nrpe.d/check_clock ]
    command[check_clock]=/usr/lib/nagios/plugins/check_clock --refclock $ARG1$ -w 60 -c 120

    where $ARG1$ is the number of seconds since the "Epoch" (1970-01-01 00:00:00 UTC), that is $(date '+%s'), provided by the Nagios poller.

    Usage note

    This check is intended for alerting when the number of seconds elapsed between the Nagios poller and the monitored server exceeds a given threshold (60 seconds for the warning state, and 120 seconds for a critical notification, in the example above).
    The clock of the Nagios server needs, of course, to be synchronized to an NTP server.
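
The comparison performed by this check can be sketched as follows. This is a minimal Python illustration, not the plugin's C source: the names `clock_delta` and `nagios_status` are hypothetical, and taking the absolute difference and alerting when the delta strictly exceeds a threshold are assumptions.

```python
import time

OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios exit codes


def clock_delta(refclock, local_now=None):
    """Return the absolute difference, in seconds, between the reference
    clock (seconds since the Epoch, sent by the poller) and the local time."""
    if local_now is None:
        local_now = int(time.time())
    return abs(local_now - refclock)


def nagios_status(delta, warning=60, critical=120):
    """Map a delta to a Nagios state (assumption: alert when delta > threshold)."""
    if delta > critical:
        return CRITICAL
    if delta > warning:
        return WARNING
    return OK


if __name__ == "__main__":
    # Simulate a poller whose clock is roughly 39 seconds behind the local one.
    delta = clock_delta(refclock=int(time.time()) - 39)
    print(f"status={nagios_status(delta)} | clock_delta={delta}")
```

With the thresholds of the NRPE example above, a delta of 39 seconds maps to OK, anything above 60 to WARNING, and anything above 120 to CRITICAL.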


    This plugin returns the number of seconds elapsed between
    the host local time and Nagios time.
    Copyright (C) 2014 Davide Madrisan <davide.madrisan@gmail.com>

    Usage:
      check_clock [-w COUNTER] [-c COUNTER] --refclock TIME

    Options:
      -r, --refclock COUNTER  the clock reference (in seconds since the Epoch)
      -w, --warning COUNTER   warning threshold
      -c, --critical COUNTER  critical threshold
      -v, --verbose   show details for command-line debugging
                      (Nagios may truncate output)
      -h, --help      display this help and exit
      -V, --version   output version information and exit

    Examples:
      check_clock -w 60 -c 120 --refclock $ARG1$

      # where $ARG1$ is the number of seconds since the Epoch: "$(date '+%s')"
      # provided by the Nagios poller


    Example of output
    clock OK - time delta 39s | clock_delta=39

    Performance data
    clock_delta

    LNX_UPTIME - check how long the system has been running

    [ /etc/nrpe.d/check_uptime ]
    command[check_uptime]=/usr/lib/nagios/plugins/check_uptime
    command[check_uptime_notify]=/usr/lib/nagios/plugins/check_uptime --critical 30:

     Usage note

    In the example above, Nagios sends a notification when the uptime of the monitored server is less than 30 minutes. This catches, for instance, an unexpected reboot of a server caused by a non-maskable interrupt (a signal of a non-recoverable hardware error).
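
The threshold "30:" follows the Nagios range format, in which a trailing colon means "alert when the value is below 30". A minimal Python sketch of that format is given here for illustration only; `parse_range` and `should_alert` are hypothetical names, not functions of the plugin.

```python
import math


def parse_range(spec):
    """Parse a Nagios threshold range into (low, high, inside).

    "10"     -> alert if value < 0 or > 10
    "10:"    -> alert if value < 10
    "~:10"   -> alert if value > 10
    "10:20"  -> alert if value is outside 10..20
    "@10:20" -> alert if value is inside 10..20
    """
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        low_s, high_s = spec.split(":", 1)
    else:
        low_s, high_s = "0", spec
    low = -math.inf if low_s == "~" else float(low_s or 0)
    high = math.inf if high_s == "" else float(high_s)
    return low, high, inside


def should_alert(value, spec):
    """Return True when the value triggers the given range."""
    low, high, inside = parse_range(spec)
    in_range = low <= value <= high
    return in_range if inside else not in_range
```

For example, `should_alert(12, "30:")` is true (the server has been up for only 12 minutes), while `should_alert(1436, "30:")` is false.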

    ☛ A note on the implementation of "check_uptime" provided by nagios-plugins 2.0+

    The check_uptime plugin shipped with nagios-plugins 2.0+ is based on the POSIX function clock_gettime() used with the monotonic clock (CLOCK_MONOTONIC).
    According to the POSIX specifications "the value returned by clock_gettime() represents the amount of time (in seconds and nanoseconds) since an unspecified point in the past (for example, system start-time, or the Epoch)".
    Recent Linux kernels return a value that is related to the system start time but can differ from the output of the
    uptime command (procps) and from the first value of /proc/uptime.

    $ /usr/bin/uptime
    18:45:00 up  8:46,  7 users,  load average: 0.67, 1.79, 2.49

    $ awk '{printf("%02d:%02d\n",($1/60/60%24),($1/60%60))}' /proc/uptime
    08:46

    $ ./clock_monotonic
    4 hours 37 min


    (On OpenBSD 5.0, the monotonic clock returns the same value as uptime, which confirms that this behaviour is platform dependent.)

    The implementation followed by nagios-plugins-linux is compatible with uptime and /proc/uptime.
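
The two time sources can be compared on a given machine with a short script. This is a Python sketch for experimentation, not the plugin's C code; `parse_proc_uptime` and `format_uptime` are hypothetical helper names, and the demo assumes a Linux host with /proc mounted.

```python
import os
import time


def parse_proc_uptime(text):
    """Return the uptime in seconds from the content of /proc/uptime
    (first field: seconds since boot, second field: aggregate idle time)."""
    return float(text.split()[0])


def format_uptime(seconds):
    """Format a duration as '[D days ][H hours ]M min'."""
    minutes = int(seconds // 60)
    days, rem = divmod(minutes, 24 * 60)
    hours, mins = divmod(rem, 60)
    parts = []
    if days:
        parts.append(f"{days} days")
    if hours:
        parts.append(f"{hours} hours")
    parts.append(f"{mins} min")
    return " ".join(parts)


if __name__ == "__main__":
    # Only meaningful on Linux; skipped silently elsewhere.
    if os.path.exists("/proc/uptime") and hasattr(time, "CLOCK_MONOTONIC"):
        with open("/proc/uptime") as f:
            up = parse_proc_uptime(f.read())
        mono = time.clock_gettime(time.CLOCK_MONOTONIC)
        print("/proc/uptime   :", format_uptime(up))
        print("CLOCK_MONOTONIC:", format_uptime(mono))
```

If the two printed values differ, the host shows the discrepancy described above.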


    This plugin checks how long the system has been running.
    Copyright (C) 2010,2012-2014 Davide Madrisan <davide.madrisan@gmail.com>

    Usage:
      check_uptime [OPTION]

    Options:
      -m, --clock-monotonic  use the monotonic clock for retrieving the time
      -w, --warning PERCENT   warning threshold
      -c, --critical PERCENT   critical threshold
      -h, --help      display this help and exit
      -V, --version   output version information and exit

    Examples:
      check_uptime
      check_uptime --critical 15: --warning 30:
      check_uptime --clock-monotonic -c 15: -w 30:

    See the Nagios Developer Guidelines for range format:
    <https://nagios-plugins.org/doc/guidelines.html#THRESHOLDFORMAT>


    Example of output
    uptime OK: 23 hours 56 min | uptime=1436

    Performance data
    uptime (in minutes)


    --- Monitoring CPU and System Load ---

    LNX_CPU - check the CPU (user mode) utilization

    [ /etc/nrpe.d/check_cpu ]
    command[check_cpu]=/usr/lib/nagios/plugins/check_cpu -f -w 85% -c 95%


    This plugin checks the CPU (user mode) utilization
    Copyright (C) 2014 Davide Madrisan <davide.madrisan@gmail.com>

    Usage:
      check_cpu [-v] [-m] [-p] [-w PERC] [-c PERC] [delay [count]]
      check_cpu --cpuinfo

    Options:
      -m, --no-cpu-model  do not display the CPU model in the output message
      -p, --per-cpu   display the utilization of each CPU
      -w, --warning PERCENT   warning threshold
      -c, --critical PERCENT   critical threshold
      -v, --verbose   show details for command-line debugging
                      (Nagios may truncate output)
      -i, --cpuinfo   show the CPU characteristics (for debugging)
      -h, --help      display this help and exit
      -V, --version   output version information and exit
      delay is the delay between updates in seconds (default: 1sec)
      count is the number of updates (default: 2)
            1 means the percentages of total CPU time from boottime.

    Examples:
      check_cpu -m -p -w 85% -c 95%
      check_cpu -w 85% -c 95% 1 2
      check_cpu --cpuinfo


    Example of output
    cpu CPU: OK - cpu user 79.5% | cpu_user=79.5% cpu_system=20.5% cpu_idle=0.0% cpu_iowait=3% cpu_steal=0% 

    Performance data
    cpu_user
    cpu_system
    cpu_idle
    cpu_iowait
    cpu_steal
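
Percentages of this kind can be derived from two samples of the aggregate "cpu" line of /proc/stat taken `delay` seconds apart. The Python sketch below illustrates the arithmetic under that assumption; the helper names are hypothetical, and how the plugin itself groups the nice, irq and softirq counters is not stated here.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of jiffies."""
    names = ("user", "nice", "system", "idle",
             "iowait", "irq", "softirq", "steal")
    fields = line.split()
    values = [int(v) for v in fields[1:1 + len(names)]]
    # Older kernels expose fewer columns; missing ones count as zero.
    values += [0] * (len(names) - len(values))
    return dict(zip(names, values))


def cpu_percentages(before, after):
    """Return the share of each counter in the time elapsed between samples."""
    deltas = {key: after[key] - before[key] for key in before}
    total = sum(deltas.values())
    if total == 0:
        return {key: 0.0 for key in deltas}
    return {key: 100.0 * value / total for key, value in deltas.items()}
```

Passing an all-zero dict as `before` gives the percentages of total CPU time since boot, which corresponds to the `count` value of 1 described in the help text.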

    This plugin can also display some CPU information ("check_cpu --cpuinfo"):


    -= CPU Characteristics =-
    Architecture:                 i686
    CPU op-mode(s):               32-bit
    Byte Order:                   Little Endian
    CPU(s):                       2
    Thread(s) per core:           2
    Core(s) per socket:           1
    Socket(s):                    1
    Vendor ID:                    GenuineIntel
    CPU Family:                   6
    Model:                        28
    Model name:                   Intel(R) Atom(TM) CPU N270   @ 1.60GHz
    -CPU0-
    CPU is Hot Pluggable:         no
    Maximum Transition Latency:   10.0us
    Current CPU Frequency:        1.07GHz
    Available CPU Frequencies:    1.60GHz 1.33GHz 1.07GHz 800MHz
    Hardware Limits:              800MHz - 1.60GHz
    CPU freq Current Governor:    ondemand
    CPU freq Available Governors: ondemand userspace 
    CPU freq Driver:              acpi-cpufreq
    -CPU1-
    CPU is Hot Pluggable:         yes (online)
    Maximum Transition Latency:   10.0us
    Current CPU frequency:        800MHz
    Available CPU Frequencies:    1.60GHz 1.33GHz 1.07GHz 800MHz
    Hardware Limits:              800MHz - 1.60GHz
    CPU freq Current Governor:    ondemand
    CPU freq Available Governors: ondemand userspace
    CPU freq Driver:              acpi-cpufreq


     Documentation
    Here are a few interesting internet links where you can find some stuff related to cpu and sysfs:

    LNX_CPUFREQ - displays the CPU frequency characteristics

    [ /etc/nrpe.d/check_cpufreq ]
    command[check_cpufreq]=/usr/lib/nagios/plugins/check_cpufreq



    This plugin displays the CPU frequency characteristics.
    Copyright (C) 2014 Davide Madrisan <davide.madrisan@gmail.com>

    Usage:
      check_cpufreq [-m] [-w PERC] [-c PERC]

    Options:
      -m, --no-cpu-model  do not display the CPU model in the output message
      -w, --warning PERCENT   warning threshold
      -c, --critical PERCENT   critical threshold
      -h, --help      display this help and exit
      -V, --version   output version information and exit

    Examples:
      check_cpufreq -m -w 800000


    Example of output
    cpufreq CPU: cpu0_freq=800000Hz;;;800000;1600000 cpu1_freq=1600000Hz;;;800000;1600000

    Performance data
    cpu0_freq
    cpu1_freq
    ...

    LNX_CSWCH - monitors the total number of context switches across all CPUs

    [ /etc/nrpe.d/check_cswch ]
    command[check_cswch]=/usr/lib/nagios/plugins/check_cswch 1 2

    Example of output

    cswch OK - number of context switches/s 1317 | cswch/s=1317

    Performance data
    cswch/s
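
A per-second rate like cswch/s can be obtained by reading a cumulative kernel counter twice and dividing the difference by the delay. The Python sketch below assumes the counter comes from the "ctxt" line of /proc/stat, which is an assumption about the data source rather than a description of the plugin's code; `read_counter` and `rate_per_second` are hypothetical names.

```python
def read_counter(stat_text, key):
    """Return the first numeric field of the /proc/stat line named `key`.

    For "ctxt" this is the number of context switches since boot; for
    "intr" it is the total number of interrupts serviced since boot.
    """
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == key:
            return int(fields[1])
    raise KeyError(key)


def rate_per_second(first, second, delay):
    """Average number of events per second between two samples."""
    if delay <= 0:
        raise ValueError("delay must be positive")
    return (second - first) // delay
```

With a delay of 1 second and 2 samples (the "1 2" arguments of the NRPE command above), the rate is simply the difference between the two readings.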

     Documentation
    LINFO (The Linux Information Project) - Context Switch

    LNX_INTERRUPTS - monitors the total number of system interrupts

    [ /etc/nrpe.d/check_intr ]
    command[check_intr]=/usr/lib/nagios/plugins/check_intr 1 2

    Example of output

    intr OK - number of interrupts/s 9318 | intr/s=9318 intr_cpu0/s=1157 intr_cpu1/s=1724 intr_cpu2/s=2862 intr_cpu3/s=3579

    Performance data
    intr/s
    intr_cpu0/s
    intr_cpu1/s
    ...


     Usage note
    The variable intr reports the total number of interrupts across all possible system interrupts, including unnumbered architecture-specific ones.
    The performance data intr_cpuN report the number of interrupts per CPU per I/O device.
    Since Linux 2.6.24, for the i386 and x86_64 architectures at least, this also includes interrupts internal to the system (that is, not associated with a device as such).

    LNX_IOWAIT - monitors the I/O wait bottlenecks

    [ /etc/nrpe.d/check_iowait ]
    command[check_iowait]=/usr/lib/nagios/plugins/check_iowait -m -w 20% -c 30%

    Example of output
    iowait OK - cpu iowait 0% | cpu_user=31% cpu_system=8% cpu_idle=61% cpu_iowait=0% cpu_steal=0% cpu_freq=1600MHz

    Performance data
    see LNX_CPU

    LNX_LOAD - check the current system load average

    [ /etc/nrpe.d/check_load ]
    command[check_load]=/usr/lib/nagios/plugins/check_load -r --load15=1.5,3.0


    This plugin checks the current system load average.
    Copyright (C) 2014 Davide Madrisan <davide.madrisan@gmail.com>

    Usage:
      check_load [-r] [--load1=w,c] [--load5=w,c] [--load15=w,c]

    Options:
      -r, --percpu    divide the load averages by the number of CPUs
      -1, --load1=WLOAD1,CLOAD1   warning and critical thresholds for load1
      -5, --load5=WLOAD5,CLOAD5   warning and critical thresholds for load5
      -L, --load15=WLOAD15,CLOAD15  warning and critical thresholds for load15
      -h, --help      display this help and exit
      -V, --version   output version information and exit

    Examples:
      check_load -r --load1=2,3 --load15=1.5,2.5


    Example of output
    load OK - average: 2.66, 2.95, 2.01 | load1=2.660;0.000;0.000;0, load5=2.950;0.000;0.000;0, load15=2.010;0.000;0.000;0

    Performance data
    load1
    load5
    load15
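
The effect of the -r (--percpu) option can be illustrated with a few lines of Python. This is only a sketch: `per_cpu_load` and `load_state` are hypothetical names, and raising a state when the value strictly exceeds the threshold is an assumption.

```python
import os


def per_cpu_load(loadavg, ncpus):
    """Divide each load average by the number of CPUs."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    return tuple(value / ncpus for value in loadavg)


def load_state(value, warning, critical):
    """Return 'CRITICAL', 'WARNING' or 'OK' for one load figure.

    Assumption: a state is raised when the value exceeds its threshold.
    """
    if value > critical:
        return "CRITICAL"
    if value > warning:
        return "WARNING"
    return "OK"


if __name__ == "__main__" and hasattr(os, "getloadavg"):
    # Mirrors "check_load -r --load15=1.5,3.0" from the NRPE example.
    load1, load5, load15 = per_cpu_load(os.getloadavg(), os.cpu_count() or 1)
    print(f"load {load_state(load15, 1.5, 3.0)} - average: "
          f"{load1:.2f}, {load5:.2f}, {load15:.2f}")
```

Dividing by the CPU count lets the same thresholds be reused across machines with different numbers of cores.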


    --- Monitoring Filesystems and Disks ---

    LNX_DISK - You can use the official Nagios Plugins (check_disk)

    LNX_IFMOUNTFS - check whether the given filesystems are mounted

    [ /etc/nrpe.d/check_ifmountfs ]
    command[check_ifmountfs]=/usr/lib/nagios/plugins/check_ifmountfs /mnt/nfs-data,/dev/cdrom

    LNX_MULTIPATH - check the multipath topology status

    [ /etc/nrpe.d/check_multipath ]
    command[check_multipath]=/usr/bin/sudo /usr/lib/nagios/plugins/check_multipath

    LNX_READONLYFS - check for readonly filesystems

    [ /etc/nrpe.d/check_readonlyfs ]
    command[check_rofs]=/usr/lib/nagios/plugins/check_readonlyfs -l -X cgroup -X tmpfs
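
One way to detect read-only filesystems is to scan the mount options listed in /proc/mounts. The Python sketch below illustrates that idea; it is not the plugin's implementation, `readonly_mounts` is a hypothetical name, and treating -X as "exclude this filesystem type" is an assumption based on the example command above.

```python
def readonly_mounts(mounts_text, exclude_types=()):
    """Return the mount points that carry the 'ro' option.

    `mounts_text` is the content of /proc/mounts; each line has the form
    "device mountpoint fstype options dump pass".
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mountpoint, fstype, options = fields[1], fields[2], fields[3]
        if fstype in exclude_types:
            continue
        if "ro" in options.split(","):
            result.append(mountpoint)
    return result
```

A caller would read /proc/mounts, pass its content together with the excluded types (for instance "cgroup" and "tmpfs"), and raise an alert if the returned list is not empty.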


    --- Monitoring Memory, Swap and Paging ---

    LNX_MEMORY - check the memory usage

    [ /etc/nrpe.d/check_memory ]
    command[check_memory]=/usr/lib/nagios/plugins/check_memory -b -w 85% -c 95%


    This plugin checks the system memory utilization.
    Copyright (C) 2014 Davide Madrisan <davide.madrisan@gmail.com>

    Usage:
      check_memory [-a] [-b,-k,-m,-g] -s -w PERC -c PERC

    Options:
      -a, --available display the free/available memory
      -b,-k,-m,-g     show output in bytes; KB (the default), MB, or GB
      -s, --vmstats   display the virtual memory perfdata
      -h, --help      display this help and exit
      -V, --version   output version information and exit

    Examples:
      check_memory --available -w 20%: -c 10%:
      check_memory --vmstats -w 80% -c 90%


    Example of output
    memory OK: 26.08% (266580 kB) used | mem_total=1023312kB,\
    mem_used=266580kB, mem_free=171548kB, mem_shared=51244kB,\
    mem_buffers=34744kB, mem_cached=550440kB, mem_available=674712kB,\
    mem_active=325136kB, mem_anonpages=240464kB,\
    mem_committed=1704152kB, mem_dirty=604kB, mem_inactive=468904kB,\
    vmem_pageins/s=128, vmem_pageouts/s=0, vmem_pgmajfaults/s=0

    Performance data
    mem_total        Total usable physical RAM
    mem_used         Total amount of physical RAM used by the system
    mem_free         Amount of RAM that is currently unused
    mem_shared       Now always zero; not calculated
    mem_buffers      Amount of physical RAM used for file buffers
    mem_cached       In-memory cache for files read from the disk (the page cache)
    mem_available    kernel >= 2.6.27: memory available for starting new applications, without swapping
                     kernel < 2.6.27: same as 'mem_free'
    mem_active       Memory that has been used more recently
    mem_anonpages    Non-file backed pages mapped into user-space page tables
    mem_committed    The amount of memory presently allocated on the system
    mem_dirty        Memory which is waiting to get written back to the disk
    mem_inactive     Memory which has been less recently used
    vmem_pageins,
    vmem_pageouts    The number of memory pages the system has written in and out to disk
    vmem_pgmajfault  The number of memory major pagefaults
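
The figures in the example output are consistent with mem_used = mem_total - mem_free - mem_buffers - mem_cached (1023312 - 171548 - 34744 - 550440 = 266580 kB). The Python sketch below applies that formula to /proc/meminfo; whether the plugin computes the value in exactly this way is an assumption, and the function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse the content of /proc/meminfo into a dict of integer values
    (kB for the fields that carry a unit)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_used(info):
    """Return (used_kB, used_percent), with used = total - free - buffers - cached."""
    used = info["MemTotal"] - info["MemFree"] - info["Buffers"] - info["Cached"]
    percent = 100.0 * used / info["MemTotal"]
    return used, percent
```

The resulting percentage is the kind of value that would be compared against the -w and -c thresholds.
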
    Internet documentation on this topic:

    LNX_PAGING - check the memory and swap paging

    [ /etc/nrpe.d/check_paging ]
    command[check_paging]=/usr/lib/nagios/plugins/check_paging --swapping -w 10 -c 25

    LNX_SWAP - check the swap usage

    [ /etc/nrpe.d/check_swap ]
    command[check_swap]=/usr/lib/nagios/plugins/check_swap -b -w 50% -c 80%


    --- Hardware Monitoring ---

    LNX_TEMPERATURE - monitors the hardware's temperature

    [ /etc/nrpe.d/check_temperature ]
    command[check_temp_zone0]=/usr/lib/nagios/plugins/check_temperature -t thermal_zone0 -w 80 -c 90


    This plugin monitors the hardware's temperature.
    Copyright (C) 2014 Davide Madrisan <davide.madrisan@gmail.com>

    Usage:
      check_temperature [-f|-k] [-t <thermal_zone>] [-w COUNTER] [-c COUNTER]

    Options:
      -f, --fahrenheit  use fahrenheit as the temperature unit
      -k, --kelvin    use kelvin as the temperature unit
      -t, --thermal_zone    only consider a specific thermal zone
      -w, --warning COUNTER   warning threshold
      -c, --critical COUNTER   critical threshold
      -h, --help      display this help and exit
      -V, --version   output version information and exit

    Examples:
      check_temperature -w 80 -c 90
      check_temperature -t 0 -w 78 -c 83


    Example of output
    temperature OK - 65.5 degrees C (thermal zone: 0, type: "acpitz") | temp=41C;0;85

    Performance data
    temp

     Usage note
    This plugin monitors the hardware temperature reported by the Linux kernel in /sys/class/thermal/.
    Unless a thermal zone is specified on the command line with the option '-t', all the values reported by sysfs are taken into account and the plugin selects the highest temperature.
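
The selection of the highest temperature can be sketched in Python as follows. The kernel exposes each zone's temperature in millidegrees Celsius in /sys/class/thermal/thermal_zone*/temp; the function names below are hypothetical and the code is an illustration, not the plugin's C implementation.

```python
import glob
import os


def read_zone_temps(base="/sys/class/thermal"):
    """Return {zone_name: degrees Celsius} for every readable thermal zone."""
    temps = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "temp")) as f:
                # sysfs reports millidegrees Celsius
                temps[os.path.basename(zone)] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # zone without a readable temperature
    return temps


def hottest(temps):
    """Return the (zone_name, celsius) pair with the highest temperature."""
    return max(temps.items(), key=lambda item: item[1])


def convert(celsius, unit="C"):
    """Convert degrees Celsius to Fahrenheit ('F') or kelvin ('K')."""
    if unit == "F":
        return celsius * 9 / 5 + 32
    if unit == "K":
        return celsius + 273.15
    return celsius


if __name__ == "__main__":
    zones = read_zone_temps()
    if zones:
        name, value = hottest(zones)
        print(f"{value:.1f} degrees C (thermal zone: {name})")
```

Restricting the check to one zone, as the '-t' option does, amounts to looking up a single key in the dictionary instead of taking the maximum.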

     Documentation
    Official Linux kernel documentation: sysfs-api


    --- Monitoring Processes and Threads ---

    LNX_NBPROCS - displays the number of running processes per user

    [ /etc/nrpe.d/check_nbprocs ]
    command[check_nbprocs]=/usr/lib/nagios/plugins/check_nbprocs --threads -w 1500 -c 2000


    --- Monitoring Network Interfaces Statistics and Connections ---

    LNX_NETWORK - displays some network interfaces statistics

    [ /etc/nrpe.d/check_network ]
    command[check_network]=/usr/lib/nagios/plugins/check_network

    LNX_TCP_COUNT - check the tcp network usage (tcp established connections)

    [ /etc/nrpe.d/check_tcpcount ]
    command[check_tcp4count]=/usr/lib/nagios/plugins/check_tcpcount -w 1500 -c 2000
    command[check_tcp6count]=/usr/lib/nagios/plugins/check_tcpcount --tcp6 -w 1500 -c 2000
    command[check_tcpcount]=/usr/lib/nagios/plugins/check_tcpcount --tcp --tcp6 -w 1500 -c 2000
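
Established connections can be counted by scanning /proc/net/tcp (and /proc/net/tcp6 for IPv6), where the fourth column holds the socket state in hexadecimal and 01 means ESTABLISHED. The Python sketch below shows the idea; that the plugin reads these files is an assumption, and `count_established` is a hypothetical name.

```python
TCP_ESTABLISHED = 0x01  # kernel state code for an established connection


def count_established(proc_net_tcp_text):
    """Count the sockets in ESTABLISHED state in the content of
    /proc/net/tcp or /proc/net/tcp6."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) > 3 and int(fields[3], 16) == TCP_ESTABLISHED:
            count += 1
    return count
```

Summing the results for both files gives a combined figure comparable to the "--tcp --tcp6" variant of the command above.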


    --- Monitoring connected Users ---

    LNX_USERS - display the number of users that are currently logged on

    [ /etc/nrpe.d/check_users ]
    command[check_users]=/usr/lib/nagios/plugins/check_users -w 1


    This plugin displays the number of users that are currently logged on.
    Copyright (C) 2014 Davide Madrisan <davide.madrisan@gmail.com>

    Usage:
      check_users [-w COUNTER] [-c COUNTER]

    Options:
      -w, --warning COUNTER   warning threshold
      -c, --critical COUNTER   critical threshold
      -v, --verbose   show details for command-line debugging
                      (Nagios may truncate output)
      -h, --help      display this help and exit
      -V, --version   output version information and exit

    Examples:
      check_users -w 1


    Example of output
    users WARNING - 2 users logged on | logged_users=2



    --- Checks planned but not implemented (yet) ---

    LNX_REPORTIO Not Available

    N/A
