Towards A Statistic Ontology for Data Analysis in Smart Manufacturing

Zhuoxun Zheng, Baifan Zhou, Dongzhuoran Zhou, Akif Quddus Khan, Ahmet Soylu, and Evgeny Kharlamov

YouTube link for 1 min Talk: https://youtu.be/pY26PkPOx7Q

Motivation and Challenges

Statistical analyses have always played a crucial role in modern industry. However, especially for practitioners who have not received extensive training in data science, statistical analytics in industry suffers from a lack of transparency, formal descriptions and reusability.

Currently, a few studies [1, 2] partially discuss the modelling of statistical analytics pipelines, but they do not sufficiently study the procedures of these approaches and thus cannot address the challenges in industrial practice.


We propose a statistical ontology, StatsOnto, designed from a procedure-oriented angle. In particular, StatsOnto fulfils the following requirements:

R1. Procedure-Orientation: StatsOnto should reflect the statistical analytics procedure, allowing the sequence of statistical tasks in a data pipeline to be described.
R2. Transparency: StatsOnto should improve the transparency of the statistical analytics it represents in industry.
R3. Knowledge Coverage: StatsOnto should cover the knowledge and practice of statistical analytics.
R4. Purpose Coverage: StatsOnto should cover the four types of tasks: data inspection (e.g., finding data with a certain property), statistical modelling (e.g., building the distribution of the data), data denoising (e.g., detecting and removing outliers) and data analysis (e.g., interpolation, subsampling).

Our Approach

Industrial Scenario: We limit our scope in this poster paper to smart welding manufacturing at Bosch. StatsOnto aims to help engineers at Bosch gain insights from the data collected during welding production and monitor the quality of the welding operations.

Ontology Engineering Process: We broadly follow the routine of Ontology Development 101 [3]. The whole process can be divided into four steps:

Step 1: Domain Analysis, where common statistical analytics at Bosch are discussed, and common and important terms of statistical tasks are enumerated and classified.
Step 2: Concepts Formalisation, where the enumerated basic concepts are formalised as classes and relationships between them.
Step 3: Mechanism Investigation, where we investigate how StatsOnto can serve as the basis for generating knowledge graphs (KGs) that represent concrete statistical analytics pipelines. This step reflects the Procedure-Orientation requirement of StatsOnto.
Step 4: System Deployment, where StatsOnto will be deployed in manufacturing and user feedback is collected continuously for iterative refinement and further improvement.
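
To make Step 3 more concrete, the following minimal Python sketch shows one way an ordered list of statistical tasks could be turned into KG triples and then read back as a sequence. The `ex:` prefix, the class names and the properties `hasTask` and `hasNextTask` are hypothetical placeholders, not StatsOnto's actual vocabulary, and a strictly linear pipeline is assumed.

```python
# Hypothetical prefix and property names; StatsOnto's real vocabulary may differ.
PREFIX = "ex:"


def pipeline_to_triples(pipeline_name, tasks):
    """Turn an ordered list of (task_id, task_class) pairs into KG triples."""
    pipeline = PREFIX + pipeline_name
    triples = [(pipeline, "rdf:type", PREFIX + "Pipeline")]
    previous = None
    for task_id, task_class in tasks:
        node = PREFIX + task_id
        triples.append((node, "rdf:type", PREFIX + task_class))
        triples.append((pipeline, PREFIX + "hasTask", node))
        if previous is not None:
            # Link consecutive tasks so the procedure order is explicit.
            triples.append((previous, PREFIX + "hasNextTask", node))
        previous = node
    return triples


def task_order(triples):
    """Recover the task sequence by following hasNextTask links."""
    next_of = {s: o for s, p, o in triples if p == PREFIX + "hasNextTask"}
    targets = set(next_of.values())
    tasks = [o for s, p, o in triples if p == PREFIX + "hasTask"]
    # The first task is the one that no other task points to.
    current = next(t for t in tasks if t not in targets)
    order = [current]
    while current in next_of:
        current = next_of[current]
        order.append(current)
    return order


# Illustrative three-task pipeline (names are assumptions).
triples = pipeline_to_triples(
    "Denoising",
    [
        ("t1", "TrendExtraction"),
        ("t2", "ScatteringCalculation"),
        ("t3", "OutlierDetection"),
    ],
)
order = task_order(triples)
```

Storing the order as explicit links between task nodes, rather than as list positions, is what would let a graph query retrieve the procedure; branching pipelines would need a richer traversal than this sketch provides.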

Summary

This poster paper presents our ongoing research on a statistical ontology that is easy to understand and covers most statistical analytics in industrial applications. Additionally, it is practice-oriented, meaning that the ontology also emphasises general knowledge of statistical analytics pipelines. The evaluation shows that the ontology indeed improves the transparency of statistical analysis and covers most statistical practice in industry.


Evaluation

R1 Procedure Orientation: We use a data denoising example (Fig. 2) to demonstrate the procedure orientation. Given a Q-Value array as input, the pipeline first extracts its trend by calculating the median value over a sliding window, then calculates the scattering as the difference between the trend and the Q-Value. Points with large scattering (large deviation from the trend) are detected as outliers. This shows that StatsOnto is capable of representing the procedure of statistical analytics.
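
The denoising steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the window size, the rule for what counts as "large" scattering (here, more than `k` times the median absolute scattering) and the sample Q-Values are all made up for the example and are not taken from the Bosch pipeline.

```python
from statistics import median


def sliding_median(values, window=5):
    """Trend: median over a centred window, truncated at the array edges."""
    half = window // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def detect_outliers(values, window=5, k=3.0):
    """Return indices of points whose scattering is large.

    Assumption: "large" means more than k times the median absolute
    scattering; the source does not specify the actual threshold.
    """
    trend = sliding_median(values, window)
    scattering = [v - t for v, t in zip(values, trend)]
    threshold = k * median(abs(s) for s in scattering)
    return [i for i, s in enumerate(scattering) if abs(s) > threshold]


# Made-up Q-Values, for illustration only.
q_values = [1.00, 1.02, 0.98, 1.01, 1.60, 0.99, 1.03, 1.00, 0.97, 1.02]
outliers = detect_outliers(q_values)
cleaned = [v for i, v in enumerate(q_values) if i not in outliers]
```

With these sample values only the spike at index 4 is flagged. Note that if more than half of the points lie exactly on the trend, the threshold becomes zero and every deviation is flagged, so a real pipeline would likely need a more robust rule.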

R2 Transparency: We organised a workshop at Bosch and collected 28 reports from experts of different backgrounds. The users first performed statistical analytics tasks with or without our method and then answered several single-choice questions; the correctness of their answers reflects their understanding of these analytics methods.

R3 Knowledge Coverage: We select two example competency questions (CQs), written in SPARQL, covering two aspects: a knowledge query (Fig. 2c) and an analysis procedure query (Fig. 2d).
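
As a rough illustration of the difference between the two kinds of query, the Python sketch below uses a simple wildcard pattern matcher over triples as a stand-in for SPARQL. It does not reproduce the queries in Fig. 2; all class, instance and property names are hypothetical.

```python
# Hypothetical triples; the names do not come from StatsOnto itself.
TRIPLES = [
    ("ex:MedianFilter", "rdfs:subClassOf", "ex:TrendExtraction"),
    ("ex:MeanFilter", "rdfs:subClassOf", "ex:TrendExtraction"),
    ("ex:t1", "rdf:type", "ex:MedianFilter"),
    ("ex:t2", "rdf:type", "ex:ScatteringCalculation"),
    ("ex:t1", "ex:hasNextTask", "ex:t2"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Knowledge query: which methods are kinds of trend extraction?
trend_methods = sorted(
    t[0] for t in match(TRIPLES, p="rdfs:subClassOf", o="ex:TrendExtraction")
)

# Procedure query: which kind of task follows a median-filter task?
followers = [
    cls
    for (task, _, _) in match(TRIPLES, p="rdf:type", o="ex:MedianFilter")
    for (_, _, nxt) in match(TRIPLES, s=task, p="ex:hasNextTask")
    for (_, _, cls) in match(TRIPLES, s=nxt, p="rdf:type")
]
```

The knowledge query touches only class-level statements, whereas the procedure query joins instance-level task nodes through their ordering links.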

R4 Purpose Coverage: After extensive discussion among the expert users in the workshop, we categorised most statistical analytics tasks in our project into the four types of purposes and found that, according to our empirical cases, most of the categories can be covered (above 80%).

References

[1] A. Salatino, et al., The computer science ontology: a large-scale taxonomy of research areas, in: International Semantic Web Conference, Springer, 2018, pp. 187–205.
[2] K. Kotis, A. Papasalouros, Statistics ontology, http://stato-ontology.org/ (2018).