Towards Executable Knowledge Graph Translation
Dongzhuoran Zhou, Baifan Zhou, Zhuoxun Zheng, Zhipeng Tan, Egor V. Kostylev and Evgeny Kharlamov
YouTube link for 1 min Talk: https://youtu.be/Efiia86vEu8
Poster DOI: TBD
Data analytics is vital in manufacturing for extracting insights from production data and optimising processes. However, transparency and explainability of analytics remain challenging due to the distinct backgrounds of the experts involved. Our project experience reveals that experts with distinct backgrounds, e.g., engineers, data scientists and managers, spent excessive time on discussion, only to find that they had misunderstood the problem and the ML solutions.
Semantic technologies, including knowledge graphs (KGs), have proved beneficial for addressing these transparency and explainability challenges by offering standardised means to describe manufacturing domains, data, analytical tasks and solutions.
In particular, we study how knowledge graphs can facilitate analytics by addressing two issues:
how to encode data pipelines as KGs; we refer to such knowledge graphs as executable knowledge graphs (ExeKGs);
how to automatically compute executable pipelines from such ExeKGs.
Framework
We propose a framework for executable KGs (ExeKGs) that represent ML solutions to ML questions. The framework allows ExeKGs to be translated into executable scripts and to be built in a reusable, modular fashion.
We first define data, methods and tasks in this framework.
Data 𝒟 is a set of facts, which can be in the form of numbers or other values, typically stored in relational tables or an RDF database.
Method ℱ is a function in the form of a language-dependent script (e.g., in C++ or Python). A method takes as input some data that fulfils certain Constraints 𝒞ℱ and outputs specific data: 𝒟𝑜𝑢𝑡 = ℱ(𝒟𝑖𝑛).
Task 𝒯 is the process of invoking a method by feeding it some data that meets certain Constraints, thereby obtaining some other data. Formally, 𝒯⟨𝒟𝑖𝑛, ℱ⟩ = ℱ(𝒟𝑖𝑛) = 𝒟𝑜𝑢𝑡, if 𝒞ℱ(𝒟𝑖𝑛) = 𝑇𝑟𝑢𝑒.
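The Task formalism above can be sketched in Python. This is a minimal, hypothetical illustration; the class and names are ours, not part of the ExeKG framework:

```python
from dataclasses import dataclass
from typing import Any, Callable

# A task wraps a method F together with its input constraint C_F, and
# only invokes F when the constraint holds on the input data.
@dataclass
class Task:
    method: Callable[[Any], Any]        # the method F
    constraint: Callable[[Any], bool]   # the constraint C_F

    def run(self, d_in):
        if not self.constraint(d_in):
            raise ValueError("input data violates the method's constraint")
        return self.method(d_in)        # D_out = F(D_in)

# Example: a method that averages a list of numbers, constrained to
# non-empty numeric input.
mean_task = Task(
    method=lambda xs: sum(xs) / len(xs),
    constraint=lambda xs: len(xs) > 0
    and all(isinstance(x, (int, float)) for x in xs),
)
```

Invoking `mean_task.run([1, 2, 3])` yields `2.0`, while an empty input fails the constraint check before the method is ever called.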
Pipeline 𝒯𝑝: Some tasks are more complex; we refer to such a complex task as a pipeline 𝒯𝑝 with input data 𝒟𝑖𝑛 yielding 𝒟𝑜𝑢𝑡. Formally, 𝒯𝑝⟨𝒟𝑖𝑛, ℱ⟩ = 𝒟𝑜𝑢𝑡. This formalism can be unfolded into the sequence {𝒯1, 𝒯2, ..., 𝒯𝑛}, where:
𝒯1⟨𝒟𝑖𝑛1, ℱ1⟩ = 𝒟𝑜𝑢𝑡1, 𝒟𝑖𝑛1 ⊂ 𝒟𝑖𝑛, 𝒞ℱ1(𝒟𝑖𝑛1) = 𝑇𝑟𝑢𝑒; ...
𝒯𝑛⟨𝒟𝑖𝑛𝑛, ℱ𝑛⟩ = 𝒟𝑜𝑢𝑡𝑛, 𝒟𝑖𝑛𝑛 ⊂ ⋃𝑖∈{1,...,𝑛−1} 𝒟𝑜𝑢𝑡𝑖 ∪ 𝒟𝑖𝑛, 𝒞ℱ𝑛(𝒟𝑖𝑛𝑛) = 𝑇𝑟𝑢𝑒;
thus 𝒟𝑜𝑢𝑡 ∈ ⋃𝑖∈{1,...,𝑛} 𝒟𝑜𝑢𝑡𝑖 and 𝒞ℱ = ⋂𝑖∈{1,...,𝑛} 𝒞ℱ𝑖(𝒟𝑖𝑛𝑖).
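The unfolding above can be sketched as a small Python interpreter. This is a hypothetical illustration under our own naming conventions: each task draws its input from the pool of available data (the original input plus all earlier outputs) and adds its own output back to the pool.

```python
# Run a pipeline as a sequence of tasks T_1..T_n.
# Each task is (name, method F_i, constraint C_Fi, input keys, output key).
def run_pipeline(tasks, d_in):
    pool = dict(d_in)  # available data: D_in plus earlier D_out_i
    for name, method, constraint, inputs, output in tasks:
        d_in_i = {k: pool[k] for k in inputs}  # D_in_i drawn from the pool
        if not constraint(d_in_i):
            raise ValueError(f"constraint of task {name} violated")
        pool[output] = method(d_in_i)          # D_out_i joins the pool
    return pool

# Toy two-task pipeline: scale raw values, then sum the scaled values.
tasks = [
    ("scale", lambda d: [x / 10 for x in d["raw"]],
     lambda d: len(d["raw"]) > 0, ["raw"], "scaled"),
    ("sum",   lambda d: sum(d["scaled"]),
     lambda d: True, ["scaled"], "total"),
]
result = run_pipeline(tasks, {"raw": [10, 20, 30]})  # result["total"] == 6.0
```

The second task reads the first task's output from the pool, mirroring 𝒟𝑖𝑛𝑛 being drawn from the union of earlier outputs and the pipeline input.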
Verification
We use Boolean queries and the axioms in the KG to verify the constraints of the translated executable data analytics.
An example is the axiom that every Task has at least one output data, which is a DataEntity:
∀𝑥.Task(𝑥) → ∃𝑦.(hasOutput(𝑥, 𝑦) ∧ DataEntity(𝑦))
Query: 𝑄(𝑥) ← Task(𝑥) ∧ ¬∃𝑦.(hasOutput(𝑥, 𝑦) ∧ DataEntity(𝑦))
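The query 𝑄(𝑥) can be evaluated over a toy triple store in a few lines of Python. This is a simplified sketch of the check, not the project's actual query engine; the triples and identifiers are invented for illustration:

```python
# Toy KG as subject-predicate-object triples; task2 violates the axiom
# because it has no output that is a DataEntity.
triples = [
    ("task1", "rdf:type", "Task"),
    ("task1", "hasOutput", "data1"),
    ("data1", "rdf:type", "DataEntity"),
    ("task2", "rdf:type", "Task"),
]

def violating_tasks(kg):
    """Return the tasks matched by Q(x): Task with no DataEntity output."""
    tasks = {s for (s, p, o) in kg if p == "rdf:type" and o == "Task"}
    entities = {s for (s, p, o) in kg if p == "rdf:type" and o == "DataEntity"}
    bad = set()
    for t in tasks:
        outs = {o for (s, p, o) in kg if s == t and p == "hasOutput"}
        if not (outs & entities):  # no output typed as DataEntity
            bad.add(t)
    return bad
```

An empty answer to 𝑄(𝑥) means the axiom holds; here `violating_tasks(triples)` flags `task2`.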
Translation and Execution
The translation can be discussed with two structures of executable KGs:
Sequential: here each executable KG is in the form of a Pipeline 𝒯𝑝, which consists of a series of Tasks 𝒯 connected in a sequential structure via the object property hasNextTask. Thus, the translation of an executable KG invokes the Python function scripts with the inputs/outputs and parameters given by the DataEntity and datatype properties of the KG, in the order defined by hasNextTask.
Parallel: when merging two parallel structures, the translator searches for preceding dependencies via hasNextTask, until no preceding Task 𝒯 is found.
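The sequential case can be sketched as follows: given the hasNextTask edges, the translator finds the task with no predecessor and walks the chain to obtain the execution order. This is a hypothetical simplification; task names are illustrative:

```python
# Order tasks by following hasNextTask edges from the first task
# (the one that is nobody's successor) to the last.
def order_tasks(has_next):
    # has_next: dict mapping each task to its successor
    successors = set(has_next.values())
    start = next(t for t in has_next if t not in successors)
    order = [start]
    while order[-1] in has_next:
        order.append(has_next[order[-1]])
    return order

edges = {"LoadData": "Normalize", "Normalize": "LRRegression"}
# order_tasks(edges) -> ["LoadData", "Normalize", "LRRegression"]
```

The translator would then emit one Python function call per task, in this order, wiring each call's inputs and outputs from the KG's DataEntity individuals.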
Example
The following diagram illustrates an example ML pipeline KG. It takes TimeSeries and SingleFeatures as input data and performs LRRegression to predict the Q-Value. Users can change the input data, output data and method of the pipeline simply by changing the named individuals; e.g., a user can delete TimeSeries if their data does not include the sensor curves (time series), because sensor curves are costly to collect, or change the ML method from LRRegression to MLP (multilayer perceptron).
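The kind of edit described above, swapping the method individual from LRRegression to MLP, amounts to replacing a single triple in the pipeline KG. The triples and property names below are illustrative, not the project's exact vocabulary:

```python
# A miniature pipeline KG as triples.
pipeline_kg = [
    ("PredictQValue", "hasInput", "SingleFeatures"),
    ("PredictQValue", "hasInput", "TimeSeries"),
    ("PredictQValue", "hasMethod", "LRRegression"),
    ("PredictQValue", "hasOutput", "QValue"),
]

def replace_individual(kg, old, new):
    """Swap one named individual for another throughout the KG."""
    return [(s, p, new if o == old else o) for (s, p, o) in kg]

# Switch the ML method without touching the rest of the pipeline.
kg2 = replace_individual(pipeline_kg, "LRRegression", "MLP")
```

Because the pipeline is data, not code, such a change requires no edits to the generated Python scripts; the translator simply picks up the new method individual.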
In this poster we present our ongoing research on representing data analytics pipelines as KGs and transforming (also called "translating") such KGs into executable analytics pipelines. This helps experts with distinct backgrounds spend less time on discussion and arrive at a shared view of the problem and the ML solutions.
We discussed the framework, verification, translation and execution within our scope of welding monitoring in a Bosch use case, and evaluated our approach with real industrial data and users from the Bosch case, which shows promising results.