Link Quality Microservice

To measure the quality of the connection between the nodes and the operator, we implement a microservice that monitors the heartbeats each node sends. This page details how the raw heartbeat timings are translated into a link quality metric (which can be displayed in the OEO).

Each node sends a heartbeat at 10 Hz (every 0.1 s). The following method translates the timings of these heartbeats into a readable metric.

For visual reference, the figures on this page use a randomly generated sample data set containing heartbeats at approximately 100%, 75%, 50%, 25%, and 0% quality. 100% means all the heartbeats arrive exactly 0.1 s apart, whereas 0% means none of the heartbeats arrive. 75% means 25% of the heartbeats are missing. Because our system does not lose packets, this data set simulates a missing heartbeat by delaying it until the next heartbeat that arrives successfully, so the two arrive together.

The dataset has been generated in the following way:

x=0 to 50: the dataset has 100% quality

x=50 to 100: the dataset has 75% quality

x=100 to 150: the dataset has 50% quality

x=150 to 200: the dataset has 25% quality

x=200 to 250: the dataset has 0% quality

x=250 to 300: the dataset has 100% quality
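The delayed-delivery behaviour described above can be sketched as follows. This is a hypothetical generator written for illustration, not the script that produced the figures: the function name simulate_arrivals, the (duration, quality) segment format, and the fixed seed are all assumptions.

```python
import random

HEARTBEAT_PERIOD_S = 0.1  # 10 Hz, as stated on this page


def simulate_arrivals(segments, seed=0):
    """Return one arrival timestamp (in seconds) per delivered heartbeat.

    segments: list of (duration_s, quality) pairs with quality in [0, 1].
    A heartbeat that is "missed" is not dropped. It is held back and
    delivered together with the next heartbeat that arrives successfully.
    """
    rng = random.Random(seed)
    arrivals = []
    pending = 0  # heartbeats sent but not yet delivered
    tick = 0
    for duration_s, quality in segments:
        for _ in range(round(duration_s / HEARTBEAT_PERIOD_S)):
            tick += 1
            pending += 1
            if rng.random() < quality:
                now = tick * HEARTBEAT_PERIOD_S
                # Deliver this heartbeat plus everything held back so far.
                arrivals.extend([now] * pending)
                pending = 0
    return arrivals


# The six segments listed above: (duration in seconds, quality).
SEGMENTS = [(50, 1.0), (50, 0.75), (50, 0.5), (50, 0.25), (50, 0.0), (50, 1.0)]
arrival_times = simulate_arrivals(SEGMENTS)
```

Because the final segment has 100% quality, every heartbeat held back during the 0% segment is delivered at the first tick after x=250, which is what produces the outlier in the figures.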

Figure 1: Graphs the time vs. the # of heartbeats that arrived at that time

Note: Figure 1 has been truncated for readability due to the outlier at x=250, when 50 heartbeats arrive at once.

Figure 2: Graphs the heartbeat arrival time vs. the calculated inter_sample_delay

First, the delay between consecutive heartbeats, denoted inter_sample_delay, is measured. The expected inter_sample_delay is 0.1 s because heartbeats are sent at 10 Hz. To measure how far inter_sample_delay deviates from the expected value, the absolute difference is used: |inter_sample_delay - 0.1|.
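As a minimal sketch of this step, assuming the arrival timestamps are available as a list of seconds (the function name delay_deviations is hypothetical):

```python
EXPECTED_DELAY_S = 0.1  # heartbeats are sent at 10 Hz


def delay_deviations(arrival_times):
    """Return |inter_sample_delay - 0.1| for each consecutive pair of arrivals."""
    deviations = []
    for previous, current in zip(arrival_times, arrival_times[1:]):
        inter_sample_delay = current - previous
        deviations.append(abs(inter_sample_delay - EXPECTED_DELAY_S))
    return deviations


# Two on-time heartbeats, then one that arrives 0.3 s after its predecessor.
deviations = delay_deviations([0.0, 0.1, 0.2, 0.5])
```

Note that two heartbeats arriving at the same instant have an inter_sample_delay of 0, so each heartbeat in a burst contributes a deviation of 0.1.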

Note: Figure 2 is truncated for readability due to the outlier at x=250, where the inter_sample_delay is 50 s because no heartbeats arrived during the 50 seconds of 0% quality.

To reduce the noise in these data points, exponential smoothing is used. Unlike a simple average, which assigns equal weight to every data point, exponential smoothing assigns exponentially decreasing weights to older data points. This allows our metric to adapt quickly to changes in link quality while still maintaining an average. It also requires very little computation and memory, which makes it well suited to analyzing time-series data in real time.


Exponential smoothing is given by the following formula, where t is the heartbeat number, x[t] is the observation point (in our case, this is |inter_sample_delay - 0.1|), and s[t] is the smoothed data point:


s[0] = x[0]

s[t] = α*s[t-1] + (1-α)*x[t]

0 < α < 1

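α is the smoothing factor. In the form above, α is the weight kept from the previous smoothed value and (1-α) is the weight given to the most recent data point, so a larger α produces a smoother but slower-reacting result.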
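The recurrence above can be sketched in a few lines. The value of α used in the example (0.5) is an assumption chosen to make the arithmetic easy to follow; this page does not state the value used by the microservice.

```python
def exponential_smoothing(observations, alpha):
    """Apply s[0] = x[0], s[t] = alpha*s[t-1] + (1-alpha)*x[t]."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    smoothed = []
    for x in observations:
        if not smoothed:
            smoothed.append(x)  # s[0] = x[0]
        else:
            smoothed.append(alpha * smoothed[-1] + (1 - alpha) * x)
    return smoothed


# With alpha = 0.5 each new value is the midpoint of the previous
# smoothed value and the new observation.
print(exponential_smoothing([0.0, 1.0, 1.0], alpha=0.5))  # [0.0, 0.5, 0.75]
```

Only the previous smoothed value has to be kept between heartbeats, which is why the method needs so little memory.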


For more information, see Exponential Smoothing Models. TCP uses the same technique to estimate round-trip time: TCP Timers.

Figure 3: Graphs the heartbeat arrival time vs. the exponentially smoothed absolute difference of inter_sample_delay: s[t]

However, there is still a lot of noise in the data (as shown above in Figure 3). Therefore, a second smoothing pass is applied to the already smoothed data, as in Brown's linear (double) exponential smoothing, using the following formula:

s'[0] = s[0]

s'[t] = α*s'[t-1] + (1-α)*s[t]

This provides us with a smoothed data set to measure the current link quality accurately. 
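A minimal sketch of the two passes, assuming the same α is used for both (as the formulas above suggest) and an illustrative α of 0.5; the helper names are hypothetical:

```python
def smooth(values, alpha):
    """One exponential smoothing pass: out[0] = v[0], then the recurrence."""
    out = []
    for v in values:
        out.append(v if not out else alpha * out[-1] + (1 - alpha) * v)
    return out


def double_smooth(observations, alpha):
    """Return s'[t]: the smoothed series s[t], smoothed a second time."""
    first_pass = smooth(observations, alpha)  # s[t]
    return smooth(first_pass, alpha)          # s'[t]


print(double_smooth([0.0, 1.0, 1.0], alpha=0.5))  # [0.0, 0.25, 0.5]
```

The second pass reacts more slowly than the first: after the same three observations, s[t] has reached 0.75 while s'[t] has reached 0.5.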

Figure 4: Graphs the heartbeat arrival time vs. the double exponentially smoothed absolute difference of inter_sample_delay: s'[t]

To translate s'[t] into a user-readable metric (expressed as a percentage), the following regression line was calculated: link_quality = 100 - 500 * s'[t]. The result is clamped at 0 so that the percentage is never negative, which happens whenever s'[t] exceeds 0.2.

A link quality of 100% means that the heartbeats arrive exactly 0.1 seconds apart, with no delay or bursts, while a link quality of 0% means that no heartbeats are received.
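The mapping from s'[t] to a percentage can be sketched as below; the function name link_quality is hypothetical, while the constants come from the regression line given on this page.

```python
def link_quality(double_smoothed_deviation):
    """Map s'[t] (in seconds) to a percentage, clamped so it never goes below 0."""
    return max(0.0, 100.0 - 500.0 * double_smoothed_deviation)


for s2 in (0.0, 0.1, 0.2, 1.0):
    print(s2, link_quality(s2))
```

A deviation of 0 maps to 100%, a deviation of 0.1 s maps to 50%, and any deviation of 0.2 s or more maps to 0%.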

Figure 5: Graphs the heartbeat arrival time vs. the calculated link quality metric

Because the link quality is only updated when a heartbeat is received, we implement a timeout protocol for the worst case. If no heartbeat has been received for 1.0 s, the link quality is automatically set to 0%.
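One way to sketch the timeout is to check the age of the last heartbeat whenever the quality is read. The class and method names here are hypothetical, timestamps are passed in explicitly to keep the example testable, and reporting 0% before the first heartbeat is an assumption rather than documented behaviour.

```python
TIMEOUT_S = 1.0  # from the text above


class TimedLinkQuality:
    """Holds the latest link quality and applies the 1.0 s timeout on read."""

    def __init__(self):
        self._quality = 0.0
        self._last_heartbeat = None  # no heartbeat seen yet

    def update(self, quality, now):
        """Record the quality computed when a heartbeat arrives at time `now`."""
        self._quality = quality
        self._last_heartbeat = now

    def read(self, now):
        """Return the current quality, or 0.0 if the last heartbeat is too old."""
        if self._last_heartbeat is None:
            return 0.0  # assumption: no heartbeat yet is reported as 0%
        if now - self._last_heartbeat >= TIMEOUT_S:
            return 0.0
        return self._quality


monitor = TimedLinkQuality()
monitor.update(87.5, now=10.0)
print(monitor.read(now=10.5))  # 87.5
print(monitor.read(now=11.0))  # 0.0
```

In a running service, `now` would typically come from a monotonic clock such as Python's time.monotonic(), so that wall-clock adjustments cannot trigger or mask a timeout.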


The link quality can be viewed on the OEO via the following command: add computer/heartbeat_quality