To deploy an Uhoh Server, first download the Uhoh distribution from the Uhoh GitHub repository as a .zip file and unpack it into a folder of your choice. Within this folder you will see:
The Uhoh Server is initialised using a single configuration file called server.properties. In order to start your Uhoh Server for the first time, you will need to specify the IP broadcast address for your Uhoh Server host's network interface. Uhoh Servers advertise their presence to Uhoh Clients using UDP broadcast messages and these messages are sent to the address specified in the server.properties file.
The directive where this is set in server.properties is:
udp_broadcast_address
for example:
udp_broadcast_address: 192.168.1.255
(Use ifconfig -a or ipconfig /all to find an interface broadcast address for your host. Alternatively, set udp_broadcast_address to localhost in environments which don't support UDP broadcast.)
Once the above directive has been set, start the server as follows:
java -cp uhoh.jar com.uhoh.Server server.properties
The Uhoh Server will log start-up messages, similar to the following, to STDOUT and to the Log Stream file (server.log):
Wed May 31 21:45:53 BST 2017 [S] [main]: Client/Server UDP ports: 8888 / 8889
Wed May 31 21:45:53 BST 2017 [S] [main]: Server broadcast address: 192.168.1.255
Wed May 31 21:45:53 BST 2017 [S] [main]: Web UI server TCP port: 7777
Wed May 31 21:45:53 BST 2017 [S] [main]: Client inactivity timeout: 180000 ms
Wed May 31 21:45:53 BST 2017 [S] [main]: Server will sent heartbeats every: 5000 ms
Wed May 31 21:45:53 BST 2017 [S] [main]: Server will log events to: server.log
Wed May 31 21:45:53 BST 2017 [S] [main]: Event log will roll after it reaches: 100000000 bytes
Wed May 31 21:45:53 BST 2017 [S] [main]: Dead client alert tags are: DEAD_CLIENT,RED
Wed May 31 21:45:53 BST 2017 [S] [main]: UI: Log file events held for: 600000 ms
Wed May 31 21:45:53 BST 2017 [S] [main]: UI: Process events held for: 65000 ms
Wed May 31 21:45:53 BST 2017 [S] [main]: UI: Idle client events held for: 75000 ms
Wed May 31 21:45:53 BST 2017 [S] [main]: UI: Other events held for: 125000 ms
Wed May 31 21:45:53 BST 2017 [S] [main]: Starting server on UDP port 8889
Wed May 31 21:45:53 BST 2017 [S] [main]: Broadcasting heartbeat messages to: 192.168.1.255
Listening for incoming UDP messages
Starting a REST server on: 7777
The Uhoh Server will log the following line approximately every sixty seconds:
Sun Jun 11 19:34:22 BST 2017 [S] [main]: Uhoh
When alerts start appearing in the Uhoh Server's Log Stream they will be marked with the string ALERT in the format:
<date/time> [S|C] [main]: ALERT%%<source_host>%%<category>%%<utime_ms>%%<source_type>%%<tag_list>%%<description>
... for example:
Sun Jun 04 20:08:12 BST 2017 [S] [main]: ALERT%%TheBigMac%%PROCESS%%1496603292820%%CLIENT%%LOGIN,GREEN%%Too few instances of login running
Sun Jun 04 20:08:18 BST 2017 [S] [main]: ALERT%%TheBigMac%%FILE%%1496603298706%%CLIENT%%INACTIVE_0%%test.log: Idle !!
Sun Jun 04 20:08:18 BST 2017 [S] [main]: ALERT%%TheBigMac%%FILE%%1496603298711%%CLIENT%%RANGE_3_6%%test.log: 0
Sun Jun 04 20:08:19 BST 2017 [S] [main]: ALERT%%TheBigMac%%FILE%%1496603299741%%CLIENT%%TOTAL_ALERT_RG%%test.log: Total outside of five to ten
Sun Jun 04 20:08:19 BST 2017 [S] [main]: ALERT%%TheBigMac%%FILE%%1496603299741%%CLIENT%%TOTAL_ALERT_MN%%test.log: Total less than ten
Sun Jun 04 20:08:19 BST 2017 [S] [main]: ALERT%%TheBigMac%%FILE%%1496603299741%%CLIENT%%TOTAL_ALERT_MT%%test.log: 0
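The ALERT format above lends itself to simple parsing. The following is a minimal sketch of how a Log Stream consumer might split an ALERT line into its fields; the Alert class and parse_alert function are illustrative names, not part of the Uhoh distribution, and the sketch assumes that only the description field may contain further %% sequences.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    source_host: str
    category: str
    utime_ms: int
    source_type: str
    tags: list
    description: str


def parse_alert(line):
    """Return an Alert for an ALERT line, or None for any other line."""
    marker = "ALERT%%"
    start = line.find(marker)
    if start == -1:
        return None
    # Split into at most six fields so that a description containing
    # '%%' is kept intact (an assumption of this sketch).
    fields = line[start + len(marker):].rstrip("\n").split("%%", 5)
    if len(fields) != 6:
        return None
    host, category, utime_ms, source_type, tag_list, description = fields
    if not utime_ms.isdigit():
        return None
    return Alert(host, category, int(utime_ms), source_type,
                 tag_list.split(","), description)


example = ("Sun Jun 04 20:08:12 BST 2017 [S] [main]: "
           "ALERT%%TheBigMac%%PROCESS%%1496603292820%%CLIENT%%"
           "LOGIN,GREEN%%Too few instances of login running")
alert = parse_alert(example)
print(alert.source_host, alert.category, alert.tags)
```

Lines that do not contain the ALERT marker, such as the periodic "Uhoh" line, yield None and can simply be skipped.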
Log Stream Handler programs consume the Uhoh Server's Log Stream, parsing ALERT lines and taking actions such as:
Each Uhoh Server contains an embedded web server. Pointing your browser at this web server will allow you to access the Uhoh Server Fault Management View which displays currently active alarms ordered by priority and arrival time. Alerts appear in the Fault Management View if they are tagged as follows:
(For a detailed description of what alert tagging means, see: Set up a Client.)
The most recently received alarm from each category is displayed at the top of each category's section. The image below shows how alarms appear in the Fault Management View. Each alarm appears as: <date/time of arrival>: <host alarm was raised on>: <text describing alarm>
The Uhoh Server Fault Management View can be accessed by pointing a web browser at the host on which the Uhoh Server is running, on the port specified by the tcp_port_number directive in the Uhoh Server server.properties file (by default port 7777), for example:
http://localhost:7777/
If no alarms are currently active, the Fault Management View will display a single green band. If the Uhoh Server is closed down, your browser will display an error.
It is also possible to filter the alarms displayed in the Fault Management View by adding tags to the end of the Fault Management View URL. For example:
http://localhost:7777/?EXIT
... will only display alarms tagged as EXIT.
http://localhost:7777/?INFO,EXIT
... will only display alarms tagged with both INFO and EXIT (an alarm carrying only one of the two tags is not shown).
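The "all tags must match" behaviour described above can be expressed as a subset test. The sketch below only illustrates the documented semantics; it is not the server's own code, and the function names are hypothetical.

```python
def filter_from_query(query):
    """Turn the part of the URL after '?' into a list of tags."""
    return [tag for tag in query.split(",") if tag]


def matches_filter(alarm_tags, filter_tags):
    """True only if the alarm carries every tag named in the filter."""
    return set(filter_tags) <= set(alarm_tags)


wanted = filter_from_query("INFO,EXIT")
print(matches_filter(["INFO", "EXIT", "AMBER"], wanted))
print(matches_filter(["INFO", "AMBER"], wanted))
```

An empty filter matches every alarm, which corresponds to the unfiltered URL.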
The Uhoh Server Fault Management View can also be embedded into an Operations Centre Service Assurance user interface via an HTML iframe, allowing it to sit side-by-side with other assurance views. The Fault Management View uses the MooTools JavaScript library for AJAX and rendering.
The Uhoh Server Fault Management View fetches its alarm list from the Uhoh Server as a JSON document. This means that other assurance tools can use the same API to fetch a view of the currently active alarm list. The URL used to fetch the JSON representation of the active alarm list is the same as for the Fault Management View, with /ui appended, for example:
http://localhost:7777/ui
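The JSON document returned contains five sections: status, red, amber, green and purple.
Each alarm is represented as a list containing two elements: the alarm text (<date/time of arrival>: <host alarm was raised on>: <text describing alarm>) and the alarm's comma-separated tag list.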
A sample JSON document containing Fault Management View information is shown below:
{
  "status":"ok",
  "red":[
    [
      "Sat Jun 03 09:08:02 BST 2017: TheBigMac: Clump press B needs attention", "RED"
    ],
    [
      "Sat Jun 03 09:08:02 BST 2017: TheBigMac: Apache HTTPD is not running", "WEB_SRV,RED"
    ]
  ],
  "amber":[
    [
      "Sat Jun 03 09:08:03 BST 2017: TheBigMac: test.log: Test file is inactive", "TEST_INACTIVE_1,AMBER"
    ],
    [
      "Sat Jun 03 09:08:03 BST 2017: TheBigMac: test.log: 0", "TEST_INACTIVE_2,AMBER"
    ],
    [
      "Sat Jun 03 09:08:02 BST 2017: TheBigMac: Too few instances of autoscale.sh running", "AMBER,CPRESS"
    ]
  ],
  "green":[
  ],
  "purple":[
  ]
}
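Another assurance tool might consume this document along the following lines. This is a sketch only: fetch_alarms and summarise_alarms are hypothetical helper names, and treating any status other than "ok" as an error is an assumption, since "ok" is the only value shown in the sample.

```python
import json
import urllib.request

SEVERITIES = ("red", "amber", "green", "purple")


def fetch_alarms(base_url):
    """Fetch the raw JSON text from <base_url>/ui."""
    url = base_url.rstrip("/") + "/ui"
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def summarise_alarms(document):
    """Group alarms by colour, splitting each tag list into tags."""
    data = json.loads(document)
    # Assumption: only "ok" indicates a usable response.
    if data.get("status") != "ok":
        raise ValueError("unexpected status: %r" % data.get("status"))
    summary = {}
    for severity in SEVERITIES:
        summary[severity] = [
            {"text": text, "tags": tag_list.split(",")}
            for text, tag_list in data.get(severity, [])
        ]
    return summary


sample = ('{"status":"ok","red":[["Sat Jun 03 09:08:02 BST 2017: '
          'TheBigMac: Apache HTTPD is not running","WEB_SRV,RED"]],'
          '"amber":[],"green":[],"purple":[]}')
for severity, alarms in summarise_alarms(sample).items():
    print(severity, len(alarms))
```

Against a running server, the same summary could be produced with summarise_alarms(fetch_alarms("http://localhost:7777")).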
Uhoh Clients can be configured to generate and record metric data. If the alerts containing metric data are explicitly tagged as metrics (using the METRIC_ tag prefix; see further details on the Set up a Client page), then the Uhoh Server will record the metrics to disk and make them available for viewing in a web browser or via a REST/JSON API.
When Uhoh Performance Management metrics are viewed in a web browser, the display is rendered using Google Charts and MooTools. An example of a metric view is shown below:
Metrics are viewed using the URL:
/metric/<date>/<metric>
where:
For example, in order to display a metric called COUNT_1 (from an alert tagged as METRIC_COUNT_1) for the current day, from an Uhoh Server running on your local machine, the URL would look like this:
http://localhost:7777/metric/TODAY/COUNT_1
and to view the same metric from 29th December 2015, the URL would be:
http://localhost:7777/metric/2015-12-29/COUNT_1
Metric charts are annotated with:
A y-axis label can be set by appending it to the metric URL. For example:
http://localhost:7777/metric/TODAY/COUNT_1/Delay%20(sec)
... will set the y-axis label to 'Delay (sec)'.
At the bottom left-hand side of a metric graph are a set of controls:
These controls allow you to view the metric for the previous day (<), view the metric for the next day (>), or return to the metric for the current day (O).
Applications can also request the data for a specific metric on a specific day from the Uhoh Server via a REST/JSON API call. To do this, the caller needs to use a URL of the form:
/mdata/<date>/<metric>
For example:
http://localhost:7777/mdata/2015-12-29/COUNT_1
Again, the date can be specified as TODAY to fetch the data set for the current day.
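The /metric and /mdata URL forms described above can be assembled programmatically. The helper below is a sketch under the assumption that dates are always written as YYYY-MM-DD (as in the examples) and that parentheses in a y-axis label may be left unescaped, as in the Delay%20(sec) example; metric_path is a hypothetical name.

```python
from datetime import date
from urllib.parse import quote


def metric_path(kind, metric, day=None, y_label=None):
    """Build a /metric or /mdata path; day=None selects TODAY."""
    if kind not in ("metric", "mdata"):
        raise ValueError("kind must be 'metric' or 'mdata'")
    day_part = "TODAY" if day is None else day.isoformat()
    path = "/%s/%s/%s" % (kind, day_part, quote(metric, safe=""))
    if y_label is not None:
        if kind != "metric":
            raise ValueError("a y-axis label only applies to /metric")
        # Keep '(' and ')' literal, encode spaces as %20.
        path += "/" + quote(y_label, safe="()")
    return path


base = "http://localhost:7777"
print(base + metric_path("metric", "COUNT_1"))
print(base + metric_path("mdata", "COUNT_1", date(2015, 12, 29)))
print(base + metric_path("metric", "COUNT_1", y_label="Delay (sec)"))
```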
Alongside a status field, the JSON response contains a single section called items, which holds a list of metrics. Each metric is itself a fixed-length list containing the following elements:
An example JSON response document is shown below:
{
  "status":"ok",
  "items":
  [
    [ 2015,12,29,23,3,7,0 ],
    [ 2015,12,29,23,4,7,1 ],
    [ 2015,12,29,23,5,8,0 ],
    [ 2015,12,29,23,6,8,0 ],
    [ 2015,12,29,23,7,8,0 ],
    [ 2015,12,29,23,8,8,0 ],
    [ 2015,12,29,23,9,8,0 ]
  ]
}
It is often useful to provide your Operations Centre with a basic view of the whole of your ecosystem, broken down into inter-dependent layers of service elements. With a small amount of configuration, the Uhoh Server is able to deliver Service Map Views - these views then use the Uhoh Server Fault Management REST/JSON API to indicate where issues have been detected with your services.
An example Service Map View is shown below, with service elements divided into four layers:
In order to create a Service Map View, you will need to:
Service Map definition files describe the layers and service elements. For example, for the Service Map shown above, the configuration will look like this:
layer: name=Activity
element: tag=MAP_COMPARE name=Compare
element: tag=MAP_REGISTER name=Register
element: tag=MAP_PURCHASE name=Purchase
element: tag=MAP_UPG name=Upgrade
element: tag=MAP_LEAVE name=Retire
layer: name=Services
element: tag=MAP_DEAL name=Deal Calculator
element: tag=MAP_CREDIT name=Credit Check
element: tag=MAP_ACTIVATE name=Activate
element: tag=MAP_DEACTIVATE name=Deactivate
layer: name=Platforms
element: tag=MAP_WEB name=Web Site
element: tag=MAP_CRM name=CRM Platform
element: tag=MAP_ERP name=ERP Platform
element: tag=MAP_PAYMENTS name=Payments Platform
element: tag=MAP_GEO name=Location Platform
layer: name=Infrastructure
element: tag=MAP_CRM_HOSTS name=CRM Hosts
element: tag=MAP_CRM_DB name=CRM Databases
element: tag=MAP_CRM_AS name=CRM App Servers
element: tag=MAP_ERP_HOSTS name=ERP Hosts
element: tag=MAP_ERP_DB name=ERP Databases
element: tag=MAP_ERP_AS name=ERP App Servers
element: tag=MAP_PAYMENTS_ROUTER name=Payments Router
layer: name=NONE
element: tag=MAP_GEO_API name=Location API
element: tag=MAP_CDN name=Content Delivery Network
element: tag=MAP_CMS name=Content Management System
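To make the structure of a Service Map definition file concrete, the sketch below reads the layer and element lines shown above into a list of layers and then looks up which elements a set of alert tags refers to. It is an illustration of the file format only, not the server's parser; the function names are hypothetical, and the layer named NONE is treated like any other layer because its special meaning is not assumed here.

```python
import re

LAYER_RE = re.compile(r"^layer:\s*name=(.+)$")
ELEMENT_RE = re.compile(r"^element:\s*tag=(\S+)\s+name=(.+)$")


def parse_service_map(text):
    """Return [(layer_name, [(tag, element_name), ...]), ...]."""
    layers = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = LAYER_RE.match(line)
        if match:
            layers.append((match.group(1).strip(), []))
            continue
        match = ELEMENT_RE.match(line)
        if match:
            if not layers:
                raise ValueError("element before any layer: %r" % line)
            layers[-1][1].append((match.group(1), match.group(2).strip()))
            continue
        raise ValueError("unrecognised line: %r" % line)
    return layers


def affected_elements(layers, alert_tags):
    """Map layer name to the element names matched by alert_tags."""
    wanted = set(alert_tags)
    hits = {}
    for layer_name, elements in layers:
        names = [name for tag, name in elements if tag in wanted]
        if names:
            hits[layer_name] = names
    return hits


definition = """layer: name=Platforms
element: tag=MAP_CRM name=CRM Platform
layer: name=Infrastructure
element: tag=MAP_CRM_HOSTS name=CRM Hosts
"""
layers = parse_service_map(definition)
print(affected_elements(layers, "AMBER,MAP_CRM_HOSTS,MAP_CRM".split(",")))
```

Tags that do not correspond to an element, such as AMBER, are simply ignored by the lookup.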
Any alert that is tagged to appear as either RED, AMBER or GREEN, and so will appear in the Uhoh Server Fault Management View, can also be tagged to indicate an issue within a Service Map.
For example, with the Service Map definition given above, an alert tagged with AMBER,MAP_CRM_HOSTS,MAP_CRM,MAP_ACTIVATE,MAP_REGISTER:
Alternatively, an alert tagged with just AMBER,MAP_CRM_HOSTS,MAP_CRM would convey the presence of an issue which doesn't affect the Services and Activity layers, i.e. it would indicate a problem which isn't currently affecting customer service.
If you wish to highlight a service issue within a Service Map View, but don't want a corresponding alert to appear in the Fault Management View, tag the alert as PURPLE. For example: PURPLE,MAP_CRM_HOSTS,MAP_CRM,MAP_ACTIVATE,MAP_REGISTER. This will cause the alert to appear in the Fault Management REST/JSON API, but not in the browser-based Fault Management View.
Uhoh Server Service Map Views are accessed via a URL of the format: http://<uhoh_server_host>:<uhoh_server_port>/service/<service_map_name>, for example:
http://localhost:7777/service/example_service_view
Although the default directives for the Uhoh Server will be sufficient in most cases, occasionally custom configuration may be required. Here we will describe each of the directives which are used within the Uhoh server.properties file so that you can select the best possible values for your environment.
Uhoh Servers and Uhoh Clients communicate with each other on a pair of UDP ports. The udp_port_number directive is used to set the UDP port used for messaging from Uhoh Servers to Uhoh Clients. Messaging from Uhoh Clients back to Uhoh Servers uses the port specified by udp_port_number + 1. For example:
udp_port_number: 8888
... indicates that Uhoh Servers will message Uhoh Clients on port 8888 and Uhoh Clients will message Uhoh Servers on port 8889.
Messages from Uhoh Servers include:
Messages from Uhoh Clients include:
The udp_port_number directive is mandatory.
The next directive, tcp_port_number, specifies the TCP port that the Uhoh Server will listen on in order to serve:
The directive appears as follows:
tcp_port_number: 7777
The tcp_port_number directive is mandatory.
The udp_broadcast_address directive (also mandatory) is described in the section above.
The client_timeout directive specifies the time, in milliseconds, that an Uhoh Server should wait for an update from an Uhoh Client before it declares that the Uhoh Client is dead (has closed down, or its host has become disconnected from the network):
client_timeout: 180000
Once an Uhoh Client has first contacted an Uhoh Server to request its monitoring configuration, the Uhoh Server registers the Uhoh Client as present and expects an update from the Uhoh Client before the time specified by client_timeout expires. Uhoh Clients send heartbeat updates to Uhoh Servers every 67 seconds if they are not sending any other events, so the client_timeout directive should normally be set to approximately three times this value.
The client_timeout directive is mandatory.
The client_remove_time directive specifies the time, in milliseconds, after which the Uhoh Server will remove an Uhoh Client from its internal watchlist:
client_remove_time: 300000
Based on the values of client_timeout and client_remove_time, the Uhoh Server will behave as follows:
The client_remove_time directive is mandatory.
Uhoh Servers send heartbeat messages out to every Uhoh Client advertising their location. The heartbeat_interval directive specifies the time interval, in milliseconds, between each message being sent:
heartbeat_interval: 5000
The heartbeat_interval directive is mandatory.
Each Uhoh Server writes information to a log file (known as the Uhoh Server Log Stream). The Log Stream can then be consumed by Log Stream Handler programs which perform actions such as:
The name of the Log Stream file is defined using the disk_log_name directive as follows:
disk_log_name: server.log
The disk_log_name directive is mandatory.
When the Uhoh Server Log Stream file reaches a certain size, it will be renamed to append '.1' to the file name, and a new Log Stream with the original name will be created. The size of file which triggers the roll-over to a new Log Stream file is defined using the disk_log_size directive as follows:
disk_log_size: 100000000
The size of the Log Stream file is given in bytes and the disk_log_size directive is mandatory.
The Uhoh Server Fault Management View is designed to provide a window into the current health of your system. Therefore, alerts that are displayed will only appear for a set amount of time before being purged from view. Alerts will, of course, remain in the Uhoh Server Log Stream. Four Uhoh Server configuration directives specify the time, in milliseconds, that different categories of alert will remain in view.
The ui_display_time_FILE directive specifies the time that alerts derived from log file monitoring will remain in view:
ui_display_time_FILE: 600000
For process monitoring, the time alerts will remain in view is specified by the ui_display_time_PROCESS directive:
ui_display_time_PROCESS: 65000
The time for which idle Uhoh Client alerts are retained in the view is defined by the ui_display_time_IDLE directive. Idle Uhoh Client alerts occur when:
The ui_display_time_IDLE directive is specified as follows:
ui_display_time_IDLE: 75000
Note that the Uhoh Server will continuously log messages for Uhoh Clients which have closed down or lack an Uhoh Client configuration file.
For all other alert types, the ui_display_time_ALL directive specifies the retention time for these alerts in the Uhoh Fault Management View. This directive is used as follows:
ui_display_time_ALL: 125000
All of the ui_display_time_XXX directives are mandatory.
If an Uhoh Client closes down, the Uhoh Server will generate alerts. The tags which are attached to these alerts are specified by the dead_client_tags directive as follows:
dead_client_tags: DEAD_CLIENT,RED
The dead_client_tags directive is mandatory.
If a new Uhoh Client is started, but the Uhoh Server it contacts to fetch its Client configuration doesn't have a suitable configuration file available in its clientconfigs folder, then the Uhoh Server will log an alert to its Log Stream containing the description 'No configuration available'. The tags attached to this alert are specified by the no_config_tags directive:
no_config_tags: NO_CLIENT_CONFIG,AMBER
The no_config_tags directive is mandatory.
Although notification of an Uhoh Server's location to Uhoh Clients is normally facilitated via UDP broadcast messaging, occasionally this method may not be appropriate. For example:
Uhoh Clients can be provided with a list of Uhoh Servers on start-up to avoid the use of UDP broadcast and this method is the preferred way of dealing with the UDP broadcast issue (see Set up a Client). However, it is also possible to configure the Uhoh Server with knowledge of the IP address locations of Uhoh Clients. This is done using the udp_unicast_address directive as follows:
udp_unicast_address: 10.1.1.8,10.1.1.9,10.1.1.10
The parameter for this directive is a comma-separated list of Uhoh Client host IP addresses. The udp_unicast_address directive is optional.
Normally, for resilience, a pair of Uhoh Servers will be set up to monitor a particular domain. However, as each Uhoh Server is a fully independent entity, alerts from Uhoh Clients will appear in both Uhoh Servers' Log Streams. For programs which consume the Log Stream and trigger activity elsewhere (such as notifying a Slack channel), two Uhoh Servers would result in duplicate processing of alerts. A directive called secondary_server is available to allow one of the Uhoh Servers within a resilient pair to declare itself as a Fault Tolerant Secondary to programs consuming its Log Stream. The programs can then suppress processing of alerts from the secondary Uhoh Server Log Stream whilst the primary Uhoh Server is active.
The secondary_server directive is added to the primary Uhoh Server's server.properties file to indicate the host location of the secondary Uhoh Server as follows:
secondary_server: 192.168.1.210
The secondary_server directive is optional.
The Uhoh Server may also be configured to run a sub-process on start-up. This can be used to start up the Uhoh Client plus any Uhoh Log-Stream handlers at the same time as the Uhoh Server is started by instructing the Uhoh Server to run a script containing a set of start-up commands.
A sub-process is run by using the launch directive, as follows:
launch: scripts/run_all_processes.sh
The launch directive is optional.
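Before starting a server with a customised server.properties, it can be useful to check that every mandatory directive described above is present. The sketch below assumes the simple 'name: value' line layout used in the examples and treats lines starting with # as comments (an assumption); it is not how the Uhoh Server itself reads the file, and the function names are hypothetical.

```python
MANDATORY = (
    "udp_port_number", "tcp_port_number", "udp_broadcast_address",
    "client_timeout", "client_remove_time", "heartbeat_interval",
    "disk_log_name", "disk_log_size",
    "ui_display_time_FILE", "ui_display_time_PROCESS",
    "ui_display_time_IDLE", "ui_display_time_ALL",
    "dead_client_tags", "no_config_tags",
)


def parse_properties(text):
    """Parse 'name: value' lines into a dict of strings."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError("not a 'name: value' line: %r" % line)
        props[key.strip()] = value.strip()
    return props


def missing_directives(props):
    """List the mandatory directives absent from props."""
    return [name for name in MANDATORY if name not in props]


sample = """udp_port_number: 8888
tcp_port_number: 7777
udp_broadcast_address: 192.168.1.255
"""
props = parse_properties(sample)
print(missing_directives(props))
# Clients message servers on udp_port_number + 1.
print(int(props["udp_port_number"]) + 1)
```

The optional directives (udp_unicast_address, secondary_server and launch) are deliberately left out of the check.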
In order to implement resilience, it is normal practice to run two Uhoh Servers for a domain. Uhoh Servers are completely self-contained and the two servers in a domain don't share any data whilst running. However:
For the second point above, we can designate one of the Uhoh Servers in a resilient pair as primary and the other as secondary. The secondary instance will periodically log a message informing Log Stream consumers to suppress processing. If the primary instance fails, then the secondary instance will stop logging these messages and its Log Stream consumers can resume processing.
To configure primary and secondary servers, the server.properties for the primary server needs to contain the secondary_server directive (as described in the section above). This directive contains the IP address of the secondary server. When the primary server is running, it sends periodic messages to the secondary server which cause the secondary server to log messages containing the string FT_SECONDARY to its Log Stream. Consumer programs should suppress processing if they have recently consumed an FT_SECONDARY message from the Log Stream.
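A Log Stream consumer could implement this suppression rule as sketched below. How recent "recently" should be is not specified here, so the quiet_period_s value of 60 seconds is purely an assumption to be tuned for your environment; SecondarySuppressor is a hypothetical name.

```python
import time


class SecondarySuppressor:
    """Tracks FT_SECONDARY markers seen in a Log Stream."""

    def __init__(self, quiet_period_s=60.0):
        # Assumed window; choose a value longer than the interval
        # between FT_SECONDARY messages in your deployment.
        self.quiet_period_s = quiet_period_s
        self.last_marker = None

    def observe(self, line, now=None):
        """Record a marker line; return True if the line was one."""
        if "FT_SECONDARY" in line:
            self.last_marker = time.monotonic() if now is None else now
            return True
        return False

    def suppressed(self, now=None):
        """True while a marker has been seen within the window."""
        if self.last_marker is None:
            return False
        current = time.monotonic() if now is None else now
        return current - self.last_marker < self.quiet_period_s


suppressor = SecondarySuppressor()
suppressor.observe("... [S] [main]: FT_SECONDARY")
print(suppressor.suppressed())
```

A consumer would call observe for every line it reads and only act on ALERT lines while suppressed returns False.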
When there are two Uhoh Server instances running in a domain, Uhoh Clients will be aware of the location of both Uhoh Servers and will send event data to both servers. This means that both Uhoh Servers will receive all events and no sharing of data needs to occur between Uhoh Servers.
For environments which don't permit the use of UDP broadcast messaging (for example Amazon Web Services), Uhoh Clients are provided with the IP addresses of both Uhoh Servers when first started. As above, each event from an Uhoh Client is sent to both Uhoh Servers.
It is also possible to configure Dashboard Views on the Uhoh Server. Dashboards are simple combinations of Fault Management Views, Performance Management Views (Charts) and Service Map Views, arranged to be viewable on a single browser page. An example of an Uhoh Server Dashboard View is shown below:
Dashboard Views are defined using configuration files placed in the dashboards folder in the working folder of the Uhoh Server. An example Dashboard View configuration file is given below. This dashboard has three rows:
background: colour=#ffffff
row: size=30% url=/service/example_service_view
title: name=Performance Information
row: size=30% url=/metric/TODAY/A_MEDIUM/ url2=/metric/TODAY/A_LOW/Delay%20(sec)
row: size=40% url=/
Dashboard View configuration files contain row directives each of which defines a row of elements to include on the dashboard. Parameters used with the row directive are:
Dashboard View configuration files can also contain title directives which are used to insert text into Dashboards. The text to insert is specified via the name parameter.
The background directive is used to set the colour of the Dashboard View background.
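The row, title and background directives share a 'directive: key=value ...' layout, which the sketch below reads into a list of (directive, parameters) pairs. It is an illustration of the file format as it appears in the example, not the server's own parser; parse_dashboard is a hypothetical name, and the rule that a value runs until the next key= token or the end of the line is an assumption that lets values such as 'Performance Information' contain spaces.

```python
import re

# A value extends up to the next " key=" token or the end of the line.
PARAM_RE = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def parse_dashboard(text):
    """Return [(directive, {param: value, ...}), ...] in file order."""
    directives = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        kind, sep, rest = line.partition(":")
        if not sep:
            raise ValueError("not a directive line: %r" % line)
        params = {key: value.strip()
                  for key, value in PARAM_RE.findall(rest)}
        directives.append((kind.strip(), params))
    return directives


config = """background: colour=#ffffff
row: size=30% url=/service/example_service_view
title: name=Performance Information
row: size=30% url=/metric/TODAY/A_MEDIUM/ url2=/metric/TODAY/A_LOW/Delay%20(sec)
row: size=40% url=/
"""
for kind, params in parse_dashboard(config):
    print(kind, params)
```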
Dashboards are accessed via a URL of the form: http://<uhoh_server_host>:<uhoh_server_port>/dashboard/<dashboard_name>, for example:
http://localhost:7777/dashboard/example_dashboard