Monitoring

System and application monitoring is necessary to satisfy the ever-increasing SLA requirements imposed by the business. There are various FOSS and proprietary monitoring solutions, each with its own advantages and limitations. If cost is a factor, FOSS is an attractive option. If support is a factor, proprietary systems generally include service agreements, at a cost.

Monitoring a system or application usually involves running an agent on the monitored (target) system. The agent may or may not run under the host operating system of that machine. An example of the latter is IPMI (Intelligent Platform Management Interface), which provides hardware-level (platform) management and monitoring of a target system. Most hardware vendors provide IPMI access, and it can be configured to generate an alert (for example, an email) should an over-temperature or fan fault occur. The IPMI service is provided by firmware in the machine hardware; it runs outside of the operating system. The Linux package ipmitool provides a CLI for accessing IPMI.
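As a rough illustration of reading platform sensors from the operating system side, the sketch below shells out to `ipmitool sdr list` and parses its pipe-separated output. It assumes ipmitool is installed and that the output follows the usual `name | reading | status` layout. The function names are hypothetical, and the sample text is made up for illustration rather than captured from real hardware.

```python
import subprocess


def parse_sdr(text):
    """Parse 'name | reading | status' lines into a list of dicts."""
    sensors = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, reading, status = parts
        sensors.append({"name": name, "reading": reading, "status": status})
    return sensors


def faulty_sensors(sensors):
    """Return sensors whose status is neither 'ok' nor 'ns' (no reading)."""
    return [s for s in sensors if s["status"].lower() not in ("ok", "ns")]


def read_local_sensors():
    """Run ipmitool against the local BMC (usually requires root)."""
    result = subprocess.run(
        ["ipmitool", "sdr", "list"],
        capture_output=True, text=True, check=True,
    )
    return parse_sdr(result.stdout)


# Illustrative sample only; real sensor names vary by vendor.
SAMPLE = """\
CPU Temp         | 45 degrees C      | ok
FAN1             | 0 RPM             | cr
PSU2 Status      | no reading        | ns
"""

if __name__ == "__main__":
    for sensor in faulty_sensors(parse_sdr(SAMPLE)):
        print(f"ALERT: {sensor['name']} = {sensor['reading']} ({sensor['status']})")
```

On a real host, `read_local_sensors()` would replace the sample text, and the alert line could feed an email or a monitoring system instead of standard output.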

SNMP (Simple Network Management Protocol) is an Internet-standard protocol for managing and monitoring devices on an IP network. All of the major monitoring vendors support SNMP in their products. SNMP exposes management data in the form of variables. These variables can be queried and, in some cases, set. The SNMP agent generally runs as a daemon on the target operating system.
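To make the idea of querying a variable concrete, here is a minimal sketch that wraps the Net-SNMP `snmpget` command and splits its `OID = TYPE: value` output. It assumes the Net-SNMP command-line tools are installed and that the target accepts SNMP v2c with the community string shown; the helper names, the community string, and the host address in the usage note are placeholders.

```python
import subprocess


def parse_snmp_line(line):
    """Split 'OID = TYPE: value' into an (oid, type, value) tuple.

    Error replies without a ': ' separator come back with an empty value.
    """
    oid, _, rest = line.partition(" = ")
    value_type, _, value = rest.partition(": ")
    return oid.strip(), value_type.strip(), value.strip()


def snmp_get(host, oid, community="public"):
    """Query one variable with Net-SNMP's snmpget over SNMP v2c.

    -On asks snmpget to print OIDs numerically.
    """
    result = subprocess.run(
        ["snmpget", "-v2c", "-c", community, "-On", host, oid],
        capture_output=True, text=True, check=True,
    )
    return parse_snmp_line(result.stdout.strip())
```

A call such as `snmp_get("192.0.2.10", "1.3.6.1.2.1.1.5.0")` would ask the agent at that (documentation-range) address for its system name.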

An SNMP-managed network consists of three key components:

    • Managed device

    • Agent — software which runs on managed devices

    • Network management system (NMS) — software which runs on the manager

A managed device is a network node that implements an SNMP interface that allows unidirectional (read-only) or bidirectional access to node-specific information. Managed devices exchange node-specific information with the NMSs. Sometimes called network elements, the managed devices can be any type of device, including, but not limited to, routers, access servers, switches, bridges, hubs, IP telephones, IP video cameras, computer hosts, and printers.

An agent is a network-management software module that resides on a managed device. An agent has local knowledge of management information and translates that information to or from an SNMP specific form.

A network management system (NMS) executes applications that monitor and control managed devices. NMSs provide the bulk of the processing and memory resources required for network management. One or more NMSs may exist on any managed network.
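The relationship between the three components can be sketched with a toy in-memory model. This is not real SNMP (there is no network or protocol encoding here); the class names and variables are hypothetical and only illustrate how an agent fronts a device's data, how read-only and read-write access differ, and how a manager polls several agents.

```python
class Agent:
    """Toy agent: holds a device's variables and enforces access rights."""

    def __init__(self, variables, writable=()):
        self._variables = dict(variables)
        self._writable = set(writable)

    def get(self, name):
        return self._variables[name]

    def set(self, name, value):
        if name not in self._writable:
            raise PermissionError(f"{name} is read-only")
        self._variables[name] = value


class NMS:
    """Toy manager: polls every registered agent for one variable."""

    def __init__(self):
        self._agents = {}

    def register(self, device_name, agent):
        self._agents[device_name] = agent

    def poll(self, variable):
        return {name: agent.get(variable) for name, agent in self._agents.items()}


if __name__ == "__main__":
    router = Agent({"sysName": "router1", "sysUpTime": 1200}, writable=["sysName"])
    printer = Agent({"sysName": "printer1", "sysUpTime": 86400})
    nms = NMS()
    nms.register("router", router)
    nms.register("printer", printer)
    print(nms.poll("sysUpTime"))
    router.set("sysName", "edge-router")  # allowed: variable is writable
```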

SNMP does not define the variables that should be made available. Rather, it uses an extensible design in which the data is defined by a MIB (Management Information Base). MIBs describe the structure of the management data of a device subsystem; they use a hierarchical namespace containing object identifiers (OIDs). Each OID identifies a variable that can be read or set via SNMP. MIBs are written in a subset of ASN.1 notation. Referring to SNMP objects (variables) by OID can make the SNMP CLI tools cumbersome to use. Also be aware that MIBs are not available for every service or application that may be running on your target host.
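The hierarchical OID namespace is easier to see with a small example. The sketch below hard-codes a handful of well-known OIDs from the standard `system` group and resolves a numeric OID to a symbolic name by longest-prefix match. It is a hand-written lookup table for illustration, not a MIB parser, and the function names are hypothetical.

```python
# A tiny hand-written slice of the standard OID tree (not a real MIB parser).
OID_NAMES = {
    "1.3.6.1.2.1": "mib-2",
    "1.3.6.1.2.1.1": "system",
    "1.3.6.1.2.1.1.1": "sysDescr",
    "1.3.6.1.2.1.1.3": "sysUpTime",
    "1.3.6.1.2.1.1.5": "sysName",
}


def oid_to_name(oid):
    """Replace the longest known numeric prefix with its symbolic name."""
    parts = oid.strip(".").split(".")
    for end in range(len(parts), 0, -1):
        prefix = ".".join(parts[:end])
        if prefix in OID_NAMES:
            suffix = parts[end:]
            return ".".join([OID_NAMES[prefix]] + suffix)
    return oid  # no MIB knowledge for this OID; leave it numeric


def children(oid):
    """List known OIDs exactly one level below the given node."""
    depth = oid.count(".") + 1
    return sorted(
        o for o in OID_NAMES
        if o.startswith(oid + ".") and o.count(".") == depth
    )
```

For instance, `oid_to_name("1.3.6.1.2.1.1.5.0")` resolves to `sysName.0`, while an OID with no entry in the table is returned unchanged, which mirrors what happens when no MIB is available for a device.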

There are also packages that go beyond IPMI and SNMP. Nagios is a good example. It provides customizable monitoring of IPMI and SNMP events, and you can build your own checks to monitor virtually anything you want. With all this flexibility comes complexity.
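A custom check can be quite small. Nagios treats a check as any executable that prints one line of status text and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). The sketch below follows that convention for disk usage; the thresholds, function names, and message wording are assumptions chosen for illustration, not part of any shipped plugin.

```python
import shutil
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage onto a plugin status code."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (status, message) for the filesystem containing path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    percent = 100.0 * usage.used / usage.total
    status = evaluate(percent, warn, crit)
    # Text after '|' is performance data: label=value;warn;crit
    message = (f"DISK {LABELS[status]} - {percent:.1f}% used on {path}"
               f" | used={percent:.1f}%;{warn};{crit}")
    return status, message


if __name__ == "__main__":
    status, message = check_disk()
    print(message)
    sys.exit(status)
```

Keeping the threshold logic in `evaluate` separate from the measurement makes the rule easy to test without touching a real filesystem.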

Here are some links to my discussions on monitoring: