Monitoring and Measurement tools
Zenoss -- Open Source Enterprise Monitoring. Zenoss Core is an enterprise-grade network and systems monitoring product that delivers the functionality IT operations teams need to effectively manage the health and performance of their entire infrastructure through a single, integrated package.
For far too long, robust IT infrastructure monitoring was out of reach for most organizations because of the cost and complexity of the proprietary systems that offered the required functionality. Zenoss has changed the game by offering a complete, easy-to-use solution as a free (i.e. no money), downloadable, open source software product.
Big Brother -- well-known service/host monitoring system. Big Brother monitors system and network-delivered services for availability. Your current network status is displayed on a color-coded web page in near-real time. When problems are detected, you're immediately notified by e-mail, pager, or text messaging.
Nagios -- Nagios® is a host and service monitor designed to inform you of network problems before your clients, end-users or managers do. It has been designed to run under the Linux operating system, but works fine under most *NIX variants as well. The monitoring daemon runs intermittent checks on hosts and services you specify using external "plugins" which return status information to Nagios. When problems are encountered, the daemon can send notifications out to administrative contacts in a variety of different ways (email, instant message, SMS, etc.). Current status information, historical logs, and reports can all be accessed via a web browser.
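The plugin mechanism described above can be illustrated with a minimal sketch of a disk-usage check. It assumes the usual plugin convention of one status line on standard output plus an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The function names and the 80%/90% thresholds are hypothetical and chosen only for illustration.

```python
import shutil

# Exit codes conventionally used by Nagios-style plugins.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(used_percent, warn=80.0, crit=90.0):
    """Map a disk-usage percentage to an (exit_code, label) pair."""
    if used_percent >= crit:
        return CRITICAL, "CRITICAL"
    if used_percent >= warn:
        return WARNING, "WARNING"
    return OK, "OK"


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (exit_code, status_line) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero capacity"
    used_percent = 100.0 * usage.used / usage.total
    code, label = classify(used_percent, warn, crit)
    return code, f"DISK {label} - {used_percent:.1f}% used on {path}"


def main(argv):
    """Print the status line and return the exit code for the daemon."""
    code, line = check_disk(argv[1] if len(argv) > 1 else "/")
    print(line)
    return code

# To use this as a standalone plugin, end the script with:
#     import sys; sys.exit(main(sys.argv))
```

The monitoring daemon only needs the exit code and the first line of output, which is why the check itself can be written in any language.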
Ganglia -- Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on over 500 clusters around the world. It has been used to link clusters across university campuses and around the world and can scale to handle clusters with 2000 nodes.
Cricket -- is a high performance, extremely flexible system for monitoring trends in time-series data. Cricket was expressly developed to help network managers visualize and understand the traffic on their networks, but it can be used for all kinds of other jobs as well.
Cricket has two components, a collector and a grapher. The collector runs from cron every 5 minutes (or at a different rate, if you want), and stores data in a data structure managed by RRDtool. Later, when you want to check on the data you have collected, you can use a web-based interface to view graphs of the data.
Cricket reads a set of config files called a config tree. The config tree expresses everything Cricket needs to know about the types of data to be collected, how to get it, and from which targets it should collect data. The config tree is designed to minimize redundant information, making it compact and easy to manage, and preventing silly mistakes from occurring due to copy-and-paste errors. Cricket is written entirely in Perl and is distributed under the GNU General Public License.
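The collector described above stores its samples in round-robin storage. The following is a minimal sketch of that idea only: a fixed number of slots where each new sample evicts the oldest one. It does not use RRDtool or Cricket code; the class name and the simulated readings are hypothetical.

```python
from collections import deque


class RoundRobinArchive:
    """Fixed-size store: once full, each new sample evicts the oldest."""

    def __init__(self, slots):
        self.samples = deque(maxlen=slots)

    def update(self, timestamp, value):
        """Record one (timestamp, value) sample."""
        self.samples.append((timestamp, value))

    def average(self):
        """Mean of the values currently held, or None if empty."""
        if not self.samples:
            return None
        return sum(value for _, value in self.samples) / len(self.samples)


# One day of 5-minute samples fits in 288 slots.
STEP_SECONDS = 300
archive = RoundRobinArchive(slots=24 * 3600 // STEP_SECONDS)

# Simulate two days of collector runs with made-up readings.
for run in range(2 * 288):
    archive.update(run * STEP_SECONDS, float(run % 10))

# Only the most recent day is retained, so storage never grows.
print(len(archive.samples), archive.samples[0][0])
```

Because the storage size is fixed when the archive is created, the collector can run indefinitely without any cleanup job.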
Zabbix -- ZABBIX is software for monitoring your applications, network and servers. ZABBIX supports both polling and trapping techniques to collect data from monitored hosts. A flexible notification mechanism makes it easy to quickly configure different types of notifications for pre-defined events. ZABBIX offers advanced monitoring, alerting and visualisation features today which are missing in other monitoring systems, even some of the best commercial ones. Use of industry standards makes integration of ZABBIX into existing infrastructure trouble-free.
MonALISA -- MONitoring Agents using a Large Integrated Services Architecture. The MonALISA framework is a fully distributed service system with no single point of failure and it provides:
Distributed Registration and Discovery for Services and Applications.
Monitoring all aspects of complex systems:
System information for computer nodes and clusters.
Network information (traffic, flows, connectivity, topology) for WAN and LAN.
Monitoring the performance of Applications, Jobs or services.
End User Systems, and End To End performance measurements.
Can interact with any other services to provide customized information in near real time, based on monitoring information.
Secure, remote administration for services and applications.
Agents to supervise applications, to restart or reconfigure them, and to notify other services when certain conditions are detected.
The Agent system can be used to develop higher level decision services, implemented as a distributed network of communicating agents, to perform global optimization tasks.
Graphical User Interfaces to visualize complex information.
Global monitoring repositories for distributed Virtual Organizations.
MonALISA is currently used in several large-scale distributed systems and has proved to be a reliable and scalable system.
R-GMA -- Relational Grid Monitoring Architecture, a monitoring and information system that presents Grid data through a relational model. R-GMA is in wide use in Grid-like distributed systems.
Test harness and reporting framework -- Inca is a flexible framework for the automated testing, benchmarking and monitoring of Grid systems. It includes mechanisms to schedule the execution of information gathering scripts and to collect, archive, publish, and display data.
Originally developed for the TeraGrid project, Inca is a general framework that can be adapted and used by other Grids. Inca offers a diverse set of use cases including:
Software Stack Validation & Verification
Network Bandwidth Measurements
Grid Benchmarking
ManageEngine -- professional monitoring/management tool. Integrated Applications Management, Server Management, and Database Monitoring Software.
Lemon RRD framework -- The Lemon RRD framework (LRF) is part of the Lemon project at CERN (http://cern.ch/lemon). It retrieves metric information from the MR (Monitoring Repository) and stores it in time-series data structures that age out old data and are kept as RRD files on disk. These structures are an integral part of the RRDtool project (http://www.rrdtool.org), which the framework uses. The data is then passed to the web interface for visualization. The framework is generic enough to allow data sources other than the MR. LRF supports grouping of machines (objects) into groups (clusters, racks, hardware models, ...) and provides a summary or average overview of each group independently, even if certain machines belong to several of these groupings. All of this is computed while the information is being gathered from the Monitoring Repository. An overview of Lemon is available here, and an overview of the Lemon RRD framework is here.
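The per-group overview described above, where one machine may belong to several groupings, can be sketched as follows. This is not LRF code: the function name, the machine names, the group names and the load values are all hypothetical.

```python
from collections import defaultdict


def group_averages(metrics, memberships):
    """Average one metric per group; a machine may sit in several groups."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for machine, value in metrics.items():
        # A machine contributes its value to every group it belongs to.
        for group in memberships.get(machine, ()):
            totals[group] += value
            counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Made-up load values and groupings, for illustration only.
load = {"node01": 0.5, "node02": 1.5, "node03": 4.0}
groups = {
    "node01": ["cluster-a", "rack-1"],
    "node02": ["cluster-a", "rack-2"],
    "node03": ["cluster-b", "rack-2"],
}
averages = group_averages(load, groups)
for name in sorted(averages):
    print(name, averages[name])
```

Each group's average is computed independently in a single pass over the machines, so overlapping groupings (a cluster and a rack, for example) do not interfere with each other.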
sysstat package -- news, information, documentation and links for the sysstat utilities created for Linux. The sysstat utilities are a collection of performance monitoring tools for Linux. These include the sar, sadf, mpstat, iostat and sa tools.
Measurement tools
Bonnie++ -- is a benchmark suite that is aimed at performing a number of simple tests of hard drive and file system performance. Then you can decide which test is important and decide how to compare different systems after running it. The main program tests database type access to a single file (or a set of files if you wish to test more than 1G of storage), and it tests creation, reading, and deleting of small files which can simulate the usage of programs such as Squid, INN, or Maildir format email.
Bonnie++ experimental - This version starts the re-write of Bonnie++! I will make it totally threaded (the new code does not use fork()). It will also support testing with a specified number of threads doing the same test, this will allow you to really thrash those RAID arrays!
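The small-file part of the test described above (creation, reading and deletion of many small files) can be sketched in a few lines. This is not Bonnie++ itself, only an illustration of the access pattern; the function name and default sizes are hypothetical, and on a modern system the timings mostly reflect the page cache rather than the disk.

```python
import os
import tempfile
import time


def small_file_test(count=200, size=512):
    """Time creating, reading and deleting `count` files of `size` bytes."""
    payload = b"x" * size
    timings = {}
    with tempfile.TemporaryDirectory() as workdir:
        paths = [os.path.join(workdir, f"f{i:06d}") for i in range(count)]

        start = time.perf_counter()
        for path in paths:
            with open(path, "wb") as handle:
                handle.write(payload)
        timings["create"] = time.perf_counter() - start

        start = time.perf_counter()
        total_read = 0
        for path in paths:
            with open(path, "rb") as handle:
                total_read += len(handle.read())
        timings["read"] = time.perf_counter() - start

        start = time.perf_counter()
        for path in paths:
            os.remove(path)
        timings["delete"] = time.perf_counter() - start
    return total_read, timings


total_read, timings = small_file_test()
for phase, seconds in timings.items():
    print(f"{phase}: {seconds:.4f} s")
```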
IO500 - The benchmark consists of multiple subcomponents. There are bandwidth subcomponents, metadata subcomponents, and namespace searching subcomponents. The bandwidth subcomponents and metadata components use the IOR and mdtest benchmarks respectively. They both require that users submit a 'hero' number in which users can configure and tune the system and the command line arguments to maximize performance. They also require that users measure with a more challenging set of command line arguments. The namespace traversal and search can use a supplied MPI-based namespace traversal or can use custom tools.
NetLogger -- Anyone who has ever tried to debug or do performance analysis of complex distributed applications knows that it can be a very difficult task. Problems may lie in many different software components, hardware components, networks, OS's, etc.
NetLogger is designed to make this easier. NetLogger is both a methodology for analyzing distributed systems, and a set of tools to help implement the methodology. In fact, you can use the NetLogger methodology without using any of the LBNL provided tools.
Iperf -- well-known tool for network measurement. Iperf is a tool to measure maximum TCP bandwidth, allowing the tuning of various parameters and UDP characteristics. Iperf reports bandwidth, delay jitter, and datagram loss.
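How the three reported quantities relate to raw datagram timestamps can be sketched as below. The jitter estimate uses the smoothed interarrival formula from the RTP specification (RFC 3550); that this matches Iperf's own calculation in every detail is an assumption. The function name and the sample packets are hypothetical.

```python
def udp_stats(packets, sent_count, payload_bytes):
    """Summarize a UDP test.

    packets: list of (sequence, send_time, receive_time) tuples, in arrival
    order, for the datagrams that actually arrived (times in seconds).
    """
    if not packets:
        return {"bandwidth_bps": 0.0, "jitter_s": 0.0, "loss_fraction": 1.0}

    jitter = 0.0
    previous_transit = None
    for _, sent, received in packets:
        transit = received - sent
        if previous_transit is not None:
            # RFC 3550 style smoothing: J += (|D| - J) / 16
            jitter += (abs(transit - previous_transit) - jitter) / 16.0
        previous_transit = transit

    # Bytes delivered after the first arrival, over the arrival time span.
    duration = packets[-1][2] - packets[0][2]
    bits = 8 * payload_bytes * (len(packets) - 1)
    return {
        "bandwidth_bps": bits / duration if duration > 0 else 0.0,
        "jitter_s": jitter,
        "loss_fraction": 1.0 - len(packets) / sent_count,
    }


# Four datagrams sent, sequence number 2 lost, constant 10 ms transit time.
sample = [(0, 0.0, 0.010), (1, 0.1, 0.110), (3, 0.3, 0.310)]
stats = udp_stats(sample, sent_count=4, payload_bytes=1000)
print(stats)
```

With a constant transit time the jitter stays at zero; only variation in delay between consecutive datagrams raises it.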
NetPerf is a benchmark that can be used to measure the performance of many different types of networking.
Network Monitoring tools -- large list of available monitoring/measuring tools (SLAC.STANFORD.EDU)
IOzone -- good benchmark tool for file systems. IOzone is a filesystem benchmark tool. The benchmark generates and measures a variety of file operations. IOzone has been ported to many machines and runs under many operating systems. IOzone is useful for performing a broad filesystem analysis of a vendor's computer platform. Benchmark Features:
ANSI C source
POSIX async I/O
Mmap() file I/O
Normal file I/O
Single stream measurement
Multiple stream measurement
Distributed fileserver measurements (Cluster)
POSIX pthreads
Multi-process measurement
Excel importable output for graph generation
Latency plots
64bit compatible source
Large file compatible
Stonewalling in throughput tests to eliminate straggler effects
Processor cache size configurable
Selectable measurements with fsync, O_SYNC
Builds for: AIX, BSDI, HP-UX, IRIX, FreeBSD, Linux, OpenBSD, NetBSD, OSFV3, OSFV4, OSFV5, SCO OpenServer, Solaris, Windows95/98/NT
Internet End-to-end Performance Monitoring -- a decent introduction to the subject
Distributed Systems Department at LBL (a good source of information on networking)
Various measurement/taxonomy tools The CAIDA Tools site contains CAIDA tools and software as well as a taxonomy of available research and visualization tools.
The list of measurement tools (SPEC, bonnie, TPC, a range of kernel tools, etc.)
Disk benchmarks -- the list of different tools to do measurement on disk I/O.
FIO is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user. fio takes a number of global parameters, each inherited by every thread unless that thread is given a parameter overriding the setting. The typical use of fio is to write a job file matching the I/O load one wants to simulate.
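The inheritance rule described above (global parameters apply to every job unless the job overrides them) can be illustrated with a small sketch that resolves the effective options of each job in an INI-style job file. It is a simplification and not fio's own parser: it assumes a single `[global]` section, and the job names `reader` and `writer` are hypothetical. The options `rw`, `bs` and `size` are ordinary fio job options.

```python
import configparser

# A small job file: two jobs sharing settings from [global].
JOB_FILE = """
[global]
rw=randread
bs=4k
size=128m

[reader]

[writer]
rw=randwrite
bs=64k
"""


def effective_jobs(text):
    """Return {job_name: options} with global options applied first."""
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(text)
    global_options = {}
    if parser.has_section("global"):
        global_options = dict(parser["global"])
    jobs = {}
    for name in parser.sections():
        if name == "global":
            continue
        # Job-level settings override the inherited global ones.
        jobs[name] = {**global_options, **dict(parser[name])}
    return jobs


jobs = effective_jobs(JOB_FILE)
for name, options in jobs.items():
    print(name, options)
```

Here `reader` runs entirely on the global settings, while `writer` overrides the access pattern and block size but still inherits `size`.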
Memtest86 -- A Stand-alone Memory Diagnostic. Memtest86 is a thorough, stand-alone memory test for x86 architecture computers. BIOS-based memory tests are a quick, cursory check and often miss many of the failures that are detected by Memtest86.
BenchMarkHQ -- a fairly large collection of benchmark utilities (English and Russian)
SysBench: a system performance benchmark. SysBench is a modular, cross-platform and multi-threaded benchmark tool for evaluating OS parameters that are important for a system running a database under intensive load.
The idea of this benchmark suite is to quickly get an impression of system performance without setting up complex database benchmarks or even installing a database at all. Current features allow testing of the following system parameters:
file I/O performance
scheduler performance
memory allocation and transfer speed
POSIX threads implementation performance
database server performance (OLTP benchmark)
CPU/Memory/Disk/System Tests -- many testing tools for different parts of the system.
HPCToolkit - HPCToolkit is an integrated suite of tools for measurement and analysis of program performance on computers ranging from multicore desktop systems to the nation's largest supercomputers. HPCToolkit provides accurate measurements of a program's work, resource consumption, and inefficiency, correlates these metrics with the program's source code, works with multilingual, fully optimized binaries, has very low measurement overhead, and scales to large parallel systems. HPCToolkit's measurements provide support for analyzing a program's execution cost, inefficiency, and scaling characteristics both within and across nodes of a parallel system.
XDD - is a command-line based tool for measuring and characterizing disk subsystem I/O on single systems and clusters of systems. It is designed to provide consistent and reproducible performance measurements of disk I/O traffic.
© 2009-2025
Andrey Ye. Shevel