Workshop on Predictive Maintenance

August 28 @ QONFEST 2021

“An ounce of prevention is worth a pound of cure.”

Benjamin Franklin

About

This interdisciplinary workshop brings together experts
from different areas linked to predictive maintenance.

Predictive maintenance (PM) develops optimal schedules of maintenance tasks in complex repairable systems. "Optimal" means scheduling maintenance actions at the most cost-efficient point in time. This point usually lies just before a critical system component fails, so that the component is replaced preventively rather than after it has failed. Predicting failure points also makes it possible to move maintenance actions into idle periods, when system load is low and financial losses are minimised.
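The idea of replacing a component at the most cost-efficient moment can be illustrated with the classic age-replacement model from reliability theory. The sketch below is not a method presented at the workshop. It assumes Weibull-distributed lifetimes, a planned replacement cost `c_prev` and a higher failure cost `c_fail`, and it minimises the long-run cost per unit time over a grid of candidate replacement ages. All numbers are hypothetical.

```python
import math


def weibull_reliability(t: float, shape: float, scale: float) -> float:
    """Probability that a component survives beyond age t (Weibull lifetime)."""
    return math.exp(-((t / scale) ** shape))


def cost_rate(T: float, c_prev: float, c_fail: float,
              shape: float, scale: float, steps: int = 1000) -> float:
    """Long-run expected cost per unit time of the age-replacement policy:
    replace preventively at age T (cost c_prev) or on failure (cost c_fail)."""
    # Expected cycle length is the integral of R(t) over [0, T] (trapezoidal rule).
    h = T / steps
    area = 0.5 * (weibull_reliability(0.0, shape, scale)
                  + weibull_reliability(T, shape, scale))
    for i in range(1, steps):
        area += weibull_reliability(i * h, shape, scale)
    area *= h
    # Expected cycle cost: preventive if the component survives to T, corrective otherwise.
    survive = weibull_reliability(T, shape, scale)
    expected_cost = c_prev * survive + c_fail * (1.0 - survive)
    return expected_cost / area


def optimal_replacement_age(c_prev, c_fail, shape, scale, candidates):
    """Grid search for the replacement age with the lowest cost rate."""
    return min(candidates, key=lambda T: cost_rate(T, c_prev, c_fail, shape, scale))


# Hypothetical numbers: a failure costs 10x a planned replacement,
# lifetimes are Weibull(shape=2.5, scale=1000 h).
best = optimal_replacement_age(1.0, 10.0, 2.5, 1000.0, range(50, 3001, 50))
print(best, cost_rate(best, 1.0, 10.0, 2.5, 1000.0))
```

With an increasing failure rate (shape > 1) and failures that cost much more than planned replacements, the optimum falls well before the characteristic life of 1000 h. With shape ≤ 1, preventive replacement brings no benefit in this model.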

But PM is an industrial-strength practice that goes beyond maintenance scheduling. Its predictions also serve to design recovery protocols, which are the backbone of fault-tolerant systems such as satellites and nuclear power plants. PM is thus one of the most promising approaches to implementing safety-critical systems, allowing them to run without unexpected failures that would otherwise lead to dangerously unsafe situations and financial losses.

This clear industrial interest has led numerous companies to implement PM. Moreover, the questions posed by PM problems are of great scientific interest. In particular, many PM applications involve stochastic behaviour, which arises, e.g., from the failure rates of system components. Analysis methods from the area of quantitative model checking are therefore a great fit.
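As a small illustration of the kind of quantity quantitative model checking computes, the sketch below derives the probability that a system has failed within a time bound, for a continuous-time Markov chain (CTMC) built from component failure rates. It uses uniformization, a standard transient-analysis technique. The two-component model and its failure rate are assumptions for illustration, and probabilistic model checkers such as PRISM or Storm answer such time-bounded reachability queries with far more robust numerics.

```python
import math


def transient_distribution(Q, init, t, eps=1e-12, max_terms=100_000):
    """State distribution of a CTMC at time t, computed by uniformization.

    Q is the generator matrix (rows sum to zero), init the initial distribution.
    Only suitable for small models: exp(-q*t) underflows when q*t is large
    (roughly above 700).
    """
    n = len(Q)
    q = max(-Q[i][i] for i in range(n))  # uniformization rate
    if q == 0.0:
        return list(init)
    # Embedded DTMC of the uniformized chain: P = I + Q / q.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(n)]
         for i in range(n)]
    weight = math.exp(-q * t)      # Poisson probability of 0 jumps
    vec = list(init)               # init * P^k, starting with k = 0
    result = [weight * x for x in vec]
    cumulative = weight
    k = 0
    while cumulative < 1.0 - eps and k < max_terms:
        k += 1
        vec = [sum(vec[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= q * t / k        # Poisson probability of k jumps
        cumulative += weight
        result = [r + weight * v for r, v in zip(result, vec)]
    return result


# Two redundant components, each failing at rate lam, no repair.
# States: 0 = both up, 1 = one up, 2 = system failed (absorbing).
lam = 1e-3  # failures per hour (illustrative)
Q = [[-2 * lam, 2 * lam, 0.0],
     [0.0, -lam, lam],
     [0.0, 0.0, 0.0]]
dist = transient_distribution(Q, [1.0, 0.0, 0.0], t=1000.0)
print(f"P(system failed by t=1000 h) = {dist[2]:.5f}")
```

For this small model the answer can be cross-checked against the closed form (1 - e^(-λt))^2, because the system fails exactly when both independently failing components are down.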

On top of that, several systems to which PM is applied are physical entities with hybrid behaviour: discrete system parts (embedded controllers) control continuous quantities (water level, voltage, etc.). As a result, modelling PM applications in many real-world scenarios requires methods capable of analysing hybrid automata and other high-complexity formalisms.
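The interplay between a discrete controller and a continuous quantity can be sketched as a simple hybrid automaton. The water tank below is a hypothetical example, and its thresholds, flow rates and step size are assumptions. The controller switches between `fill` and `drain` modes when guard conditions on the level hold, and the level evolves continuously within each mode. Simulation like this explores only one trajectory; analysing all behaviours formally requires the hybrid-automata methods mentioned above.

```python
from dataclasses import dataclass


@dataclass
class TankController:
    """Discrete part of a hypothetical water-tank hybrid automaton."""
    low: float = 2.0    # guard: switch to "fill" at or below this level
    high: float = 10.0  # guard: switch to "drain" at or above this level
    mode: str = "fill"

    def update(self, level: float) -> str:
        # Discrete transitions fire when their guard on the continuous level holds.
        if self.mode == "fill" and level >= self.high:
            self.mode = "drain"
        elif self.mode == "drain" and level <= self.low:
            self.mode = "fill"
        return self.mode


def simulate(duration: float, dt: float = 0.01, inflow: float = 1.0,
             outflow: float = 2.0, level: float = 5.0):
    """Alternate discrete mode updates with continuous flow (explicit Euler)."""
    controller = TankController()
    trace = []
    for _ in range(int(round(duration / dt))):
        mode = controller.update(level)
        rate = inflow if mode == "fill" else -outflow  # flow of the current mode
        level += rate * dt
        trace.append((mode, level))
    return trace


trace = simulate(60.0)
levels = [lvl for _, lvl in trace]
print(f"level stayed within [{min(levels):.2f}, {max(levels):.2f}]")
```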

The field of formal methods thus provides automated, formally correct tools that serve to implement PM in practice. This workshop gathers contributions from various areas in or close to formal methods that describe or advance the state of the art of PM, including:

  • model checking methods, e.g. to find optimal component-replacement policies for a range of PM models,

  • methods based on machine learning, e.g. describing procedures to learn the failure rates of critical components in the system,

  • case studies from successful applications of formal methods to PM in practice.
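A minimal example of learning a failure rate from data is the maximum-likelihood estimate under a constant-rate (exponential) lifetime model, which also accounts for units that have not failed yet (right-censored observations). This is a deliberately simple statistical baseline rather than a machine-learning method from the workshop; the field data and helper names are hypothetical.

```python
import math
import random


def exponential_rate_mle(observations):
    """Maximum-likelihood failure rate under a constant-rate (exponential) model.

    observations: (time, failed) pairs; failed=False marks a right-censored unit
    that was still working when observation stopped. The estimate is
    number of failures / total exposure time.
    """
    failures = sum(1 for _, failed in observations if failed)
    exposure = sum(t for t, _ in observations)
    if exposure <= 0:
        raise ValueError("total exposure time must be positive")
    return failures / exposure


def simulate_censored(rate, n, censor_time, seed=0):
    """Draw exponential lifetimes and censor them at a fixed inspection time."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        life = rng.expovariate(rate)
        data.append((min(life, censor_time), life <= censor_time))
    return data


# Field data: three failures and two units still running (hypothetical numbers).
field = [(1200.0, True), (800.0, True), (2000.0, False), (1500.0, False), (500.0, True)]
print(exponential_rate_mle(field))  # 3 failures / 6000 h of exposure
```

Counting the running time of censored units as exposure, but not as failures, keeps the estimate from being biased upwards, as it would be if the surviving units were simply ignored.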

The PM 2021 workshop touches upon all of these topics, with highlight talks by domain experts and panel discussions.

Program

The event was divided into three sessions, all taking place on Saturday, August 28.
All times are CEST.

Session 1: PM modelling and deployment on large-stack systems
14:00–15:15

Abstract: Predictive maintenance is a technique that promises to improve the maintenance performance of complex industrial systems. It does so by indicating when and how to carry out maintenance, with the goal of optimizing costs, ensuring system reliability and maximizing system availability. One of the main steps within predictive maintenance is forecasting future system performance, better known as prognostics. In this presentation, I will provide an overview of important concepts in prognostics, with special emphasis on system-level applications, and of some of the main associated challenges.
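To make the notion of prognostics concrete, the sketch below computes a naive remaining-useful-life (RUL) estimate: it fits a straight line to a degrading health indicator and extrapolates it to a failure threshold. Linear degradation, the health indicator and the threshold are illustrative assumptions; the abstract does not state which prognostic models the talk covers, and system-level prognostics typically needs considerably richer models.

```python
import math


def estimate_rul(times, health, threshold):
    """Naive remaining-useful-life estimate: fit a least-squares line to a
    decreasing health indicator and extrapolate it to the failure threshold."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = sum(times) / n
    mean_h = sum(health) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("observation times must not all be equal")
    sxy = sum((t - mean_t) * (h - mean_h) for t, h in zip(times, health))
    slope = sxy / sxx
    intercept = mean_h - slope * mean_t
    if slope >= 0:
        return math.inf  # no degradation trend observed
    crossing_time = (threshold - intercept) / slope
    return max(0.0, crossing_time - times[-1])


times = list(range(11))                 # e.g. hundreds of operating hours
health = [100 - 2 * t for t in times]   # hypothetical, perfectly linear degradation
print(estimate_rul(times, health, threshold=60))
```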

One of these challenges stems from the need for efficient model learning and data-driven knowledge discovery to infer system-level models. I will present results of our algorithm FT-MOEA, which uses multi-objective evolutionary algorithms to automatically infer fault tree models. Based on synthetic failure data sets, we successfully learned efficient fault tree structures. We also identified a set of limitations that merit further research.
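The sketch below illustrates the ingredients of fault tree inference from failure data: a fault tree of AND/OR gates over basic events, its evaluation against failure records, and two objectives that a multi-objective search could trade off (misclassified records and tree size), compared via Pareto dominance. These objectives, the data and all names are assumptions for illustration; FT-MOEA's actual encoding, objectives and evolutionary operators are not described in this abstract.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Gate:
    kind: str                    # "AND" or "OR"
    children: Tuple["Node", ...]


Node = Union[str, Gate]          # a string names a basic event


def evaluate(node: Node, state: Dict[str, bool]) -> bool:
    """Does the (sub)tree's top event occur for these basic-event states?"""
    if isinstance(node, str):
        return state[node]
    values = [evaluate(child, state) for child in node.children]
    return all(values) if node.kind == "AND" else any(values)


def size(node: Node) -> int:
    """Number of gates plus basic-event leaves."""
    if isinstance(node, str):
        return 1
    return 1 + sum(size(child) for child in node.children)


def objectives(tree: Node, dataset: List[Tuple[Dict[str, bool], bool]]) -> Tuple[int, int]:
    """(misclassified records, tree size): both to be minimised."""
    errors = sum(1 for state, top in dataset if evaluate(tree, state) != top)
    return errors, size(tree)


def dominates(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """Pareto dominance for minimisation."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


# Synthetic failure data generated from a hidden "true" tree: A OR (B AND C).
truth = Gate("OR", ("A", Gate("AND", ("B", "C"))))
dataset = []
for a, b, c in product([False, True], repeat=3):
    state = {"A": a, "B": b, "C": c}
    dataset.append((state, evaluate(truth, state)))

for candidate in (truth, "A", Gate("OR", ("A", "B"))):
    print(candidate, objectives(candidate, dataset))
```

Here the true tree is the only candidate with zero error, the single basic event `A` is smaller but misclassifies one record, and `A OR B` is dominated by `A`; a search would keep the first two on its Pareto front and discard the third.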

The last part of this presentation concerns the outlook of my research. One direction is an extension of our FT-MOEA algorithm, in which we plan to harness symmetry, a feature present in many real-world complex systems. Another aspect that needs further research to improve system-level prognostics is uncertainty quantification (UQ). Here we envision a concept based on survival analysis that aims to make UQ more accurate by considering contextual features.
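Survival analysis estimates how long components keep working, using both failed and still-running (censored) units. The Kaplan–Meier estimator below is a standard non-parametric building block of survival analysis, shown only as an illustration; the concept envisioned in the talk additionally considers contextual features, which this sketch does not model. The lifetimes are hypothetical.

```python
def kaplan_meier(observations):
    """Kaplan-Meier estimate of the survival function S(t).

    observations: (time, failed) pairs; failed=False marks right censoring.
    Returns (time, S(time)) at each distinct failure time.
    """
    data = sorted(observations)
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = removed = 0
        # Group all records that share this time point.
        while i < len(data) and data[i][0] == t:
            failures += int(data[i][1])
            removed += 1
            i += 1
        if failures:
            survival *= 1.0 - failures / at_risk
            curve.append((t, survival))
        at_risk -= removed  # failed and censored units leave the risk set
    return curve


# Hypothetical component lifetimes in months; False = still running at last check.
lifetimes = [(2, True), (3, False), (5, True), (5, True), (8, False), (9, True)]
for t, s in kaplan_meier(lifetimes):
    print(t, round(s, 3))
```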

Abstract: NS (Dutch Railways) is the main passenger train operating company in the Netherlands. HVAC (heating, ventilation, air conditioning) units are used to keep passengers comfortable during cold and hot weather periods. However, these HVAC units sometimes show cooling malfunctions, which affect passenger comfort in summer.
In my talk, I will present NS’s current approach to detecting and repairing HVAC cooling malfunctions through Real Time Monitoring, with specific attention to the role of sensor measurements and diagnostic messages.
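As an illustration of rule-based detection from sensor measurements, the sketch below flags a cooling malfunction when the compressor runs but the supply air is not sufficiently cooler than the return air for several consecutive samples. The signals, thresholds and rule are hypothetical assumptions; the abstract does not describe NS’s actual monitoring logic or diagnostic messages.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    compressor_on: bool
    return_air_c: float  # hypothetical signal: air temperature entering the unit, °C
    supply_air_c: float  # hypothetical signal: air temperature blown into the coach, °C


def detect_cooling_malfunction(samples: List[Sample], min_delta_c: float = 5.0,
                               min_consecutive: int = 3) -> Optional[int]:
    """Return the index at which an alarm is raised, or None.

    Hypothetical rule: while the compressor runs, the supply air should be at
    least min_delta_c cooler than the return air; an alarm requires
    min_consecutive violating samples in a row, to suppress transient noise.
    """
    run = 0
    for idx, s in enumerate(samples):
        if s.compressor_on and (s.return_air_c - s.supply_air_c) < min_delta_c:
            run += 1
            if run >= min_consecutive:
                return idx
        else:
            run = 0  # a healthy (or idle) sample resets the streak
    return None


healthy = Sample(True, 26.0, 16.0)
weak = Sample(True, 26.0, 24.0)
idle = Sample(False, 26.0, 26.0)
print(detect_cooling_malfunction([healthy, healthy, weak, weak, weak]))
```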

Session 2: urgent vs. long-term applications of PM
15:45–16:45

Abstract: The Corona pandemic may be seen as the largest predictive maintenance problem ever. Here, the overall purpose of maintenance is to reduce the number of people dying, largely by ensuring that the capacity of health systems is not exceeded. The maintenance itself consists of a range of measures, including lock-down of schools, regions and workplaces, testing, contact-tracing apps, vaccination, etc. What needs to be optimized is the cost and impact on society in terms of loss of jobs and income, reduced social life, etc.

The talk will highlight intensive work carried out over the last 1½ years in developing and applying a framework in the tool UPPAAL for modelling and analyzing large-scale agent-based models of the COVID-19 epidemic in Denmark. In particular, we have applied the framework to analyze the series of lock-down events in Northern Jutland in the autumn of 2020. The agent-based model is based on national registers concerning the number of individuals at schools and workplaces, commuting patterns, living addresses, etc. In the model, each of the more than 500,000 citizens gives rise to two stochastic timed automata: one reflecting the health status of the person (as a SEIRH model) and one reflecting daily routines between home, workplace, school and leisure. Using statistical model checking, we have estimated the effect of various measures locking down borders between municipalities, as well as the trade-off between increased testing and more widespread use of a contact-tracing app.
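Statistical model checking estimates probabilities from repeated stochastic simulations, with the number of runs chosen to give a guaranteed error bound. The sketch below shows this loop on a heavily simplified stand-in: a homogeneous-mixing, discrete-time SEIR model of 100 agents (no hospitalisation state, no daily routines), with the Okamoto (Chernoff–Hoeffding) bound fixing the number of runs. All parameters, including the hypothetical `capacity` threshold, are assumptions, and this is not the UPPAAL model described in the talk.

```python
import math
import random


def required_runs(epsilon: float, delta: float) -> int:
    """Okamoto (Chernoff-Hoeffding) bound: this many independent runs give an
    estimate within +/- epsilon of the true probability with confidence 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def simulate_peak(rng, population=100, initial_infected=3, beta=0.3,
                  incubation_p=0.2, recovery_p=0.1, days=60):
    """One run of a toy discrete-time stochastic SEIR model (homogeneous mixing);
    returns the peak number of simultaneously infectious agents."""
    states = ["I"] * initial_infected + ["S"] * (population - initial_infected)
    peak = initial_infected
    for _ in range(days):
        infectious = states.count("I")
        p_infect = 1.0 - math.exp(-beta * infectious / population)
        next_states = []
        for s in states:
            u = rng.random()
            if s == "S" and u < p_infect:
                next_states.append("E")
            elif s == "E" and u < incubation_p:
                next_states.append("I")
            elif s == "I" and u < recovery_p:
                next_states.append("R")
            else:
                next_states.append(s)
        states = next_states
        peak = max(peak, states.count("I"))
    return peak


def estimate_overload_probability(capacity=20, epsilon=0.1, delta=0.05, seed=1):
    """Estimate P(peak infectious > capacity) by repeated simulation."""
    rng = random.Random(seed)
    runs = required_runs(epsilon, delta)
    hits = sum(simulate_peak(rng) > capacity for _ in range(runs))
    return hits / runs, runs


p, runs = estimate_overload_probability()
print(f"P(peak > capacity) ~ {p:.2f} from {runs} runs")
```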

    • Jaap van Ekris (Delta Pi, NL)
      Maintaining sleeping giants, using quantitative data to optimise system maintenance

Abstract: Many safety systems are designed never to be used: they are safety nets, waiting to avert a disaster that hopefully will never happen. An example is the Dutch storm surge barriers: dormant giants waiting to be used, but only really active once a year or even once every ten years. How do you know that such a system is there when you need it? Is the giant safely asleep, or did he die without us noticing? How do you design and optimise its maintenance when its biggest problem is dormant failure? In this talk I address the approach used for these systems.
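Dormant failures are typically controlled by periodic proof tests. A common back-of-the-envelope measure is the average probability of failure on demand (PFDavg) of a single channel with a constant dangerous-undetected failure rate λ_DU, perfectly proof-tested every τ hours: exactly 1 - (1 - e^(-λτ))/(λτ), and approximately λτ/2 as used in functional-safety practice such as IEC 61508. The sketch below computes both, plus the longest test interval meeting a target. The rates and the target are illustrative, and the approach presented in the talk may differ.

```python
import math


def pfd_avg_exact(lambda_du: float, test_interval: float) -> float:
    """Time-averaged probability that a single dormant channel has failed
    undetected, with constant rate lambda_du and a perfect proof test every
    test_interval: 1 - (1 - exp(-x)) / x, where x = lambda_du * test_interval."""
    x = lambda_du * test_interval
    return 1.0 + math.expm1(-x) / x


def pfd_avg_approx(lambda_du: float, test_interval: float) -> float:
    """Common linear approximation, valid when lambda_du * test_interval << 1."""
    return lambda_du * test_interval / 2.0


def max_test_interval(lambda_du: float, target_pfd: float) -> float:
    """Longest proof-test interval meeting target_pfd under the linear approximation."""
    return 2.0 * target_pfd / lambda_du


lam = 1e-5      # dangerous undetected failures per hour (illustrative)
year = 8760.0   # hours
print(pfd_avg_approx(lam, year), pfd_avg_exact(lam, year))
print(max_test_interval(lam, 1e-2))  # hours between proof tests
```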

Session 3: boundaries and new promises of PM
17:00–18:15

Abstract: Moore's law has held at such a pace that neither industrial nor military standards have been updated fast enough to account for the growing susceptibility of the most advanced CMOS integrated circuits to environmental and operating conditions. The procedures defined in these standards for determining the failure rate of the latest generation of integrated circuits either consider temperature as the only factor accelerating the failure rate of such devices, or take into account only a single dominant mechanism.
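Temperature-only procedures typically rest on the Arrhenius model, in which the acceleration factor between a stress temperature and a use temperature depends on an activation energy E_a. The sketch below computes that factor and extrapolates a stress-test failure rate to use conditions. The activation energy, temperatures and rate (in FIT, failures per 10^9 device-hours) are illustrative assumptions, not values from the talk.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_af(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """Arrhenius acceleration factor between stress and use temperatures (°C)."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use_k - 1.0 / t_stress_k))


def use_failure_rate(lambda_stress: float, ea_ev: float,
                     t_use_c: float, t_stress_c: float) -> float:
    """Extrapolate a failure rate measured under thermal stress to use conditions,
    assuming temperature is the only accelerating factor."""
    return lambda_stress / arrhenius_af(ea_ev, t_use_c, t_stress_c)


af = arrhenius_af(0.7, 55.0, 125.0)  # Ea = 0.7 eV is an illustrative value
rate_use = use_failure_rate(1000.0, 0.7, 55.0, 125.0)
print(f"AF = {af:.1f}; 1000 FIT at 125 °C -> {rate_use:.1f} FIT at 55 °C")
```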

Many authors have proposed alternative techniques that take into account, in addition to the temperature dependence, the dependence on the device supply voltages and all known acceleration mechanisms. The failure rate (λ) that the device will exhibit during its lifetime is thus determined by accelerated stress tests, and the result is then extrapolated to normal use conditions by means of models linking environmental and operating conditions to λ. Since a different accelerating mechanism dominates in each part of the wide range of environmental and operating conditions under which the device works, models representing all known acceleration mechanisms are used for this purpose.

However, a problem remains: when extrapolating back to real use conditions, a single fixed reference value is used for both the voltage and the operating temperature, namely the value the device is expected to withstand during its useful life. This yields only a coarse approximation of reality, in which temperature and voltage generally fluctuate randomly over wide ranges during the lifetime of the device.

As a solution to this problem, it is proposed to replace the fixed values of the device operating parameters (temperature, voltage) by values drawn from their probability distributions, and to compute the final failure rate by summing the contributions of all mechanisms and averaging the result over time by means of a Monte Carlo process.
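The proposal can be sketched as follows, under explicit assumptions: two hypothetical failure mechanisms, each with an Arrhenius temperature term and an exponential voltage term relative to a reference point; temperature and voltage drawn from normal distributions; and the time-averaged failure rate approximated by the mean of the summed mechanism rates over Monte Carlo samples. The mechanism parameters and distributions are invented for illustration and do not represent any particular CMOS technology.

```python
import math
import random
from dataclasses import dataclass

K_B = 8.617333262e-5  # Boltzmann constant, eV/K


@dataclass(frozen=True)
class Mechanism:
    lambda_ref: float   # failure rate (FIT) at the reference conditions
    ea_ev: float        # activation energy, eV
    gamma_per_v: float  # voltage acceleration coefficient, 1/V


def mechanism_rate(m: Mechanism, temp_k: float, volt: float,
                   t_ref_k: float, v_ref: float) -> float:
    """Rate of one mechanism: Arrhenius term times exponential voltage term."""
    thermal = math.exp(m.ea_ev / K_B * (1.0 / t_ref_k - 1.0 / temp_k))
    electrical = math.exp(m.gamma_per_v * (volt - v_ref))
    return m.lambda_ref * thermal * electrical


def total_rate(mechs, temp_k, volt, t_ref_k, v_ref):
    """Sum of the contributions of all mechanisms at one operating point."""
    return sum(mechanism_rate(m, temp_k, volt, t_ref_k, v_ref) for m in mechs)


def monte_carlo_rate(mechs, t_mean_k, t_sd_k, v_mean, v_sd,
                     t_ref_k, v_ref, n=20000, seed=0):
    """Average the summed rate over randomly sampled operating conditions."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        temp = rng.gauss(t_mean_k, t_sd_k)
        volt = rng.gauss(v_mean, v_sd)
        acc += total_rate(mechs, temp, volt, t_ref_k, v_ref)
    return acc / n


mechanisms = (Mechanism(5.0, 0.7, 2.0), Mechanism(3.0, 0.3, 6.0))  # hypothetical
T_REF, V_REF = 328.15, 1.0
fixed = total_rate(mechanisms, T_REF, V_REF, T_REF, V_REF)
mc = monte_carlo_rate(mechanisms, T_REF, 10.0, V_REF, 0.05, T_REF, V_REF)
print(f"fixed conditions: {fixed:.2f} FIT, Monte Carlo average: {mc:.2f} FIT")
```

In this toy setting the Monte Carlo average exceeds the rate computed at the fixed mean conditions, because both acceleration terms are convex in their variable over this range, so fluctuations raise the expected failure rate rather than cancelling out.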

Registration

Attendance is free of charge, but participants must register for the PM workshop through the QONFEST platform.

Acknowledgements

The Workshop on Predictive Maintenance is funded by:

  • the NWO under grant NWA.1160.18.238 "PrimaVera",

  • the EU (ERC) under project 864075 "CAESAR".