SemAI 2022: First Workshop on Semantic AI

September 13, 2022, Vienna, Austria. Co-located with SEMANTiCS Conference 2022

Introduction

AI approaches based on machine learning have become increasingly popular across all sectors. However, experience shows that AI initiatives often fail due to a lack of appropriate data or low data quality. Furthermore, state-of-the-art AI models are largely opaque and suffer from a lack of transparency and explainability. Semantic AI approaches combine methodology from statistical AI and symbolic AI based on semantic technologies such as knowledge graphs and natural language processing, while incorporating mechanisms for explainable AI. Semantic AI requires technical and organizational measures that are implemented along the whole data lifecycle. While the individual aspects of semantic AI are being studied in their respective research communities, a dedicated community focusing on their combination is yet to be established.

Program

Session 1 - 14:00

  • 14:00-14:15: Welcome and Introduction [slides]
    Torsten Priebe, St. Pölten University of Applied Sciences

  • 14:15-15:00: Keynote
    Victor de Boer, Vrije Universiteit Amsterdam

In many modern statistical approaches to AI, raw data is the preferred input for machine learning models. In some areas and in some cases, however, we struggle to find this raw form of data. One such area involves heterogeneous knowledge: entities, their attributes and internal relations. The Semantic Web community has invested decades of work on just this problem: how to use graphs to represent knowledge, in various domains, in as raw and as usable a form as possible, satisfying many use cases. To build scalable, transparent and explainable AI in domains where such heterogeneous knowledge is available, we need to collaborate with domain experts to develop a) relevant and high-quality knowledge graphs as well as b) appropriate data science and ML methods to constantly enrich and analyse these graphs. In this talk, I will give an overview of current collaborative work on this continuous enrichment of knowledge graphs, specifically in the domains of Digital Humanities and IoT, and will discuss some of the promises, challenges and solutions identified.

Victor de Boer is an Associate Professor of User-Centric Data Science at Vrije Universiteit Amsterdam and a Senior Research Fellow at the Netherlands Institute for Sound and Vision. His research focuses on data integration, semantic data enrichment and knowledge sharing using Linked Data technologies in various domains. These domains include Cultural Heritage, Digital Humanities and ICT for Development, where he collaborates with domain experts in interdisciplinary teams. He is currently involved in the European projects InTaVia and InterConnect as well as the Dutch national projects Pressing Matter, Clariah and Hybrid Intelligence.

  • 15:00-15:30: Coffee Break

Session 2 - 15:30

  • 15:30-16:00: Towards a Standardized Description of Semantic Web Machine Learning Systems [slides][paper]
    Fajar J. Ekaputra, Laura Waltersdorfer, Anna Breit and Marta Sabou

In this paper, we report on our proposed approach towards a standardized description for systems combining machine learning (ML) components with techniques developed by the Semantic Web (SW) community (SWeMLS), which is one of the lessons learned from our large-scale survey (476 papers) on the topic. We elaborate on the key information that should be described for a SWeMLS and on selected methods to support its documentation.

  • 16:00-16:45: Keynote: Weak Supervision for Information Extraction from Text
    Benjamin Roth, University of Vienna

Weak supervision (WS) is an alternative to standard supervised learning that overcomes the need for manually annotated training data. WS uses heuristics, knowledge repositories, or high-level constraints to obtain (lower quality) labels more efficiently or at higher levels of abstraction. This talk will give an overview of applications of WS to information extraction from text. Potential challenges of these approaches, and algorithmic solutions for overcoming them, will be highlighted. The talk will conclude with a high-level characterization of the software framework Knodle for weakly supervised learning with arbitrary neural networks.
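
As a toy illustration of the labelling-heuristics idea described in the abstract (not the API of Knodle or any other framework; all function names, labels and rules below are invented), the following Python sketch lets a few keyword rules and a tiny gazetteer vote on unlabelled sentences and aggregates the votes into noisy training labels:

```python
# Toy weak-supervision sketch: heuristic labelling functions vote on unlabelled
# text, and a simple majority aggregation turns the votes into noisy labels.
# All names, labels and rules are illustrative only.
from collections import Counter
from typing import Callable, List, Optional

ABSTAIN = None  # a labelling function may decline to vote on an example

def lf_birthplace(text: str) -> Optional[str]:
    # Keyword heuristic for a birthplace relation.
    return "BORN_IN" if "born in" in text.lower() else ABSTAIN

def lf_employer(text: str) -> Optional[str]:
    # Keyword heuristic for an employment relation.
    return "WORKS_FOR" if "works for" in text.lower() else ABSTAIN

def lf_org_gazetteer(text: str) -> Optional[str]:
    # A knowledge repository (here a tiny gazetteer of organisations) as supervision.
    orgs = {"university of vienna", "vrije universiteit"}
    return "WORKS_FOR" if any(o in text.lower() for o in orgs) else ABSTAIN

LABELLING_FUNCTIONS: List[Callable[[str], Optional[str]]] = [
    lf_birthplace, lf_employer, lf_org_gazetteer,
]

def weak_label(text: str) -> Optional[str]:
    """Aggregate labelling-function votes by majority; abstain on ties or no votes."""
    votes = Counter(lf(text) for lf in LABELLING_FUNCTIONS)
    votes.pop(ABSTAIN, None)
    if not votes:
        return ABSTAIN
    ranked = votes.most_common(2)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting evidence
    return ranked[0][0]

unlabelled = [
    "Benjamin Roth works for the University of Vienna.",
    "Victor de Boer was born in the Netherlands.",
]
training_data = [(t, lbl) for t in unlabelled if (lbl := weak_label(t)) is not ABSTAIN]
print(training_data)  # noisy labels that could train a downstream neural classifier
```

In practice, such noisy labels would then be denoised and used to train a neural model; frameworks such as Knodle address this integration with arbitrary network architectures.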

Benjamin Roth is a professor in the area of deep learning & statistical NLP, leading the WWTF Vienna Research Group for Young Investigators "Knowledge-Infused Deep Learning for Natural Language Processing". Prior to this, he was an interim professor at LMU Munich. He obtained his PhD from Saarland University and did a postdoc at UMass, Amherst. His research interests are the extraction of knowledge from text with statistical methods and knowledge-supervised learning.

  • 16:45-17:15: Market Needs/Real-World Experiences with Weak Supervision in a No-Code NLP Platform [slides]
    Peter Kesch, Head of Business Development, SentiSquare

Topics of Interest

Topics relevant to this workshop include – but are not limited to – the following aspects of semantic AI:

  • Classification of types of emerging semantic AI approaches, e.g., those listed below

  • Novel design patterns for semantic AI systems

  • System development life-cycle stages when creating semantic AI systems

  • Neuro-symbolic AI, i.e., methods combining statistical AI based on machine learning with symbolic AI based on semantic technologies

  • Machine teaching and other uses of knowledge graphs in supervised machine learning to improve model quality and robustness

  • Semi-supervised learning, i.e., the combination of supervised and unsupervised machine learning, e.g., in natural language processing

  • Distant supervision, i.e., use of semantic rules and other heuristics for data labelling

  • Few- and zero-shot learning, i.e., adapting to unseen classes given only a few or no examples at all

  • Semantics-based approaches to explainable AI and human-in-the-loop AI to improve trustworthiness

  • Knowledge graphs and other semantic technologies for automated data quality management

  • Reports on benefits/limitations of combining machine learning and semantic technologies, possibly from the perspective of concrete application domains and tasks

Submission Guidelines

We welcome the following types of contributions:

  • Research papers (max. 8 pages)

  • Best practices and lessons learned (max. 4 pages)

All submissions must be written in English and adhere to the CEUR-ART style. Please use the following Overleaf template for LaTeX: https://www.overleaf.com/read/mwrbbrhrpwzv.

We follow a single-blind process with at least two reviewers per paper. Papers will be evaluated according to their significance, originality, technical content, style, clarity, and relevance to the workshop.

Please submit your contributions electronically in PDF format via the EasyChair system: https://easychair.org/conferences/?conf=semai2022

For any enquiries, please send an email to semai2022@easychair.org.

Important Dates

  • Submission Deadline: July 4, 2022

  • Notification of Acceptance: July 30, 2022

  • Camera-ready versions of accepted contributions: August 15, 2022

  • Workshop date: Tuesday, September 13, 2022

Proceedings

Accepted contributions will be presented at the workshop and included in the CEUR workshop proceedings. At least one author of each article is expected to register for the workshop and attend to present their contribution.

Organising Committee

  • Sebastian Neumaier, St. Pölten University of Applied Sciences

  • Martin Kaltenböck, Semantic Web Company

  • Marta Sabou, Vienna University of Economics and Business

Programme Committee

  • Anna Breit, Semantic Web Company

  • Tomáš Brychcín, SentiSquare

  • Fajar Juang Ekaputra, Semantic Systems Research Lab, TU Vienna

  • Heiko Paulheim, Data and Web Science Group, University of Mannheim, Germany

  • Torsten Priebe, Data Intelligence Research Group, St. Pölten University of Applied Sciences

  • Laura Waltersdorfer, Semantic Systems Research Lab, TU Vienna

Objectives and Goals

AI approaches based on machine learning (ML) have become increasingly popular across all sectors. However, experience shows that AI initiatives often fail due to a lack of appropriate data or low data quality. Furthermore, state-of-the-art AI models are largely opaque and suffer from a lack of transparency and explainability. This means that even if the underlying mathematical principles of these methods are understood, it is often unclear why a particular prediction has been made and whether meaningful and grounded patterns have led to it. Thus, there is a risk that the AI learns biases from the data or makes its decisions based on wrong or ambiguous information.

Semantic AI approaches combine methodology from ML-based AI and symbolic AI based on semantic technologies such as knowledge graphs (KG) as well as natural language processing (NLP), while incorporating mechanisms for explainable AI (XAI). Semantic AI requires technical and organizational measures that are implemented along the whole data lifecycle. KGs, being one of the core elements of symbolic AI, provide a human-understandable and machine-processable way to model and reason over complex relationships between entities of interest. They also provide means for more automated data quality management.
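
As a small, concrete example of the last two points, the sketch below uses the rdflib Python library with an invented example vocabulary: a handful of facts are modelled as triples, and a SPARQL query acts as a simple automated data quality check that flags cities missing a country statement (a real setting would typically use a constraint language such as SHACL):

```python
# Toy sketch: a knowledge graph as triples plus a simple, automated data
# quality check. Vocabulary and data are invented for illustration.
from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.org/")
g = Graph()

# Human-readable, machine-processable facts about entities of interest.
g.add((EX.Vienna, RDF.type, EX.City))
g.add((EX.Vienna, EX.locatedIn, EX.Austria))
g.add((EX.Graz, RDF.type, EX.City))  # note: country statement deliberately missing

# Completeness expectation: every city should state the country it is located in.
violations = g.query(
    """
    SELECT ?city WHERE {
        ?city a <http://example.org/City> .
        FILTER NOT EXISTS { ?city <http://example.org/locatedIn> ?country }
    }
    """
)
for row in violations:
    print(f"Data quality issue: {row.city} has no locatedIn statement")
```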

The interaction of ML-based and KG-based approaches can manifest in different forms, including the provision of rich background information for the input data by interlinking it with existing external KGs, using the real-world domain knowledge in the KG to guide the training process of ML models, and enhancing causal inference to verify and/or assist predictions. This close interconnection is not only capable of increasing the prediction performance of the AI model, but also of improving its robustness. Furthermore, the interaction between ML and KG leads to more understandable and interpretable predictions: the ML approaches can be exploited to symbolise their intrinsic knowledge by generating new entities and relations within the KG and thus extending it, while the KG can be leveraged to generate human-understandable explanations for automatically produced predictions.
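
The first of these interaction forms, enriching input data with background knowledge from a KG, can be sketched as follows. This is a minimal Python illustration with a hard-coded, invented graph and made-up entity identifiers; a real system would link entity mentions to an external KG such as Wikidata and query it:

```python
# Sketch of KG-based input enrichment: entity mentions in the raw input are
# interlinked with a (hypothetical, hard-coded) knowledge graph, and the
# retrieved background facts become additional features for an ML model.
from typing import Dict, List

# Stand-in for an external knowledge graph, keyed by invented entity IDs.
KG: Dict[str, Dict[str, str]] = {
    "vienna":    {"type": "city",    "locatedIn": "austria"},
    "amsterdam": {"type": "city",    "locatedIn": "netherlands"},
    "austria":   {"type": "country", "continent": "europe"},
}

def enrich(record: Dict[str, str]) -> Dict[str, str]:
    """Add KG background knowledge for the record's linked entity as extra features."""
    facts = KG.get(record.get("entity_id", ""), {})
    enriched = dict(record)
    for prop, value in facts.items():
        enriched[f"kg_{prop}"] = value  # e.g. kg_type, kg_locatedIn
    return enriched

raw_examples: List[Dict[str, str]] = [
    {"text": "Flooding reported downtown.", "entity_id": "vienna"},
    {"text": "Airport closed due to storm.", "entity_id": "amsterdam"},
]

enriched_examples = [enrich(r) for r in raw_examples]
# The enriched feature dictionaries can now be vectorised (e.g. one-hot encoded)
# and passed to any standard classifier, giving it access to domain knowledge
# that is not present in the raw text alone.
print(enriched_examples[0])
```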

While the neuro-symbolic nature of semantic AI systems supports the understandability and interpretability of the produced predictions, more focused approaches originating from the field of XAI enable truly explainable decisions. XAI focuses on integrating explainability mechanisms into complex black-box models, either post-hoc (on already trained models) or during training (self-learned explainability). The integration and close interaction of these mechanisms with the neuro-symbolic system is intended to overcome existing limitations of state-of-the-art XAI methods and to provide explanations that are directly formulated in a domain-specific language.