SemAI 2022: First Workshop on Semantic AI
September 13, 2022, Vienna, Austria. Co-located with SEMANTiCS Conference 2022
Introduction
AI approaches based on machine learning have become increasingly popular across all sectors. However, experience shows that AI initiatives often fail due to the lack of appropriate data or low data quality. Furthermore, state-of-the-art AI models are largely opaque and suffer from a lack of transparency and explainability. Semantic AI approaches combine methodology from statistical AI and symbolic AI based on semantic technologies such as knowledge graphs as well as natural language processing, while incorporating mechanisms for explainable AI. Semantic AI requires technical and organizational measures, which are implemented along the whole data lifecycle. While the individual aspects of semantic AI are being studied in their respective research communities, a dedicated community focusing on their combination is yet to be established.
Program
Session 1 - 14:00
14:00-14:15: Welcome and Introduction [slides]
Torsten Priebe, St. Pölten University of Applied Sciences
14:15-15:00: Keynote: Knowledge Graphs for impactful Data Science [slides][victordeboer.com]
Victor de Boer, Vrije Universiteit Amsterdam
In many modern statistical approaches to AI, raw data is the preferred input for (Machine Learning) models. In some areas and in some cases, however, we struggle to find this raw form of data. One such area involves heterogeneous knowledge: entities, their attributes and internal relations. The Semantic Web community has invested decades of work on just this problem: how to use graphs to represent knowledge, in various domains, in as raw and as usable a form as possible, satisfying many use cases. To build scalable, transparent and explainable AI in various domains where such heterogeneous knowledge is available, we need to collaborate with domain experts to develop a) relevant and high-quality knowledge graphs as well as b) appropriate data science and ML methods to constantly enrich and analyse these graphs. In this talk, I will give an overview of current collaborative work on this continuous enrichment of Knowledge Graphs, specifically in the domains of Digital Humanities and IoT, and will discuss some promises, challenges and solutions identified.
Victor de Boer is an Associate Professor of User-Centric Data Science at Vrije Universiteit Amsterdam and a Senior Research Fellow at the Netherlands Institute for Sound and Vision. His research focuses on data integration, semantic data enrichment and knowledge sharing using Linked Data technologies in various domains. These domains include Cultural Heritage, Digital Humanities and ICT for Development, where he collaborates with domain experts in interdisciplinary teams. He is currently involved in the European projects InTaVia and InterConnect as well as the Dutch national projects Pressing Matter, Clariah and Hybrid Intelligence.
15:00-15:30: Coffee Break
Session 2 - 15:30
15:30-16:00: Towards a Standardized Description of Semantic Web Machine Learning Systems [slides][paper]
Fajar J. Ekaputra, Laura Waltersdorfer, Anna Breit and Marta Sabou
In this paper, we report on our proposed approach towards a standardized description for systems combining machine learning (ML) components with techniques developed by the Semantic Web (SW) community (SWeMLS), which is one of the lessons learned from our large-scale survey (476 papers) on the topic. We elaborate on the key information that should be described for SWeMLS and on selected methods to support their documentation.
16:00-16:45: Keynote: Weak Supervision for Information Extraction from Text
Benjamin Roth, University of Vienna
Weak supervision (WS) is an alternative to standard supervised learning that overcomes the need for manually annotated training data. WS uses heuristics, knowledge repositories, or high-level constraints to obtain (lower quality) labels more efficiently or at higher levels of abstraction. This talk will give an overview of applications of WS to information extraction from text. Potential challenges of these approaches, and algorithmic solutions for overcoming them, will be highlighted. The talk will conclude with a high-level characterization of the software framework Knodle for weakly supervised learning with arbitrary neural networks.
Benjamin Roth is a professor in the area of deep learning & statistical NLP, leading the WWTF Vienna Research Group for Young Investigators "Knowledge-Infused Deep Learning for Natural Language Processing". Prior to this, he was an interim professor at LMU Munich. He obtained his PhD from Saarland University and did a postdoc at UMass, Amherst. His research interests are the extraction of knowledge from text with statistical methods and knowledge-supervised learning.
16:45-17:15: Market Needs/Real-World Experiences with Weak Supervision in a No-Code NLP Platform [slides]
Peter Kesch, Head of Business Development, SentiSquare
Topics of Interest
Topics relevant to this workshop include – but are not limited to – the following aspects of semantic AI:
Classification of types of emerging semantic AI approaches, e.g. those listed below
Novel design patterns for semantic AI systems
System development life-cycle stages when creating semantic AI systems
Neuro-symbolic AI, i.e., methods combining statistical AI based on machine learning and symbolic AI based on semantic technologies
Machine teaching and other use of knowledge graphs in supervised machine learning to improve model quality and robustness
Semi-supervised learning, i.e., combination of supervised and unsupervised machine learning, e.g., in natural language processing
Distant supervision, i.e., use of semantic rules and other heuristics for data labelling
Few- and zero-shot learning, i.e., adapting to unseen classes given only a few or no examples at all
Semantics-based approaches to explainable AI, human in the loop in AI to improve trustworthiness
Knowledge graphs and other semantic technologies for automated data quality management
Reports on benefits/limitations of combining machine learning and semantic technologies, possibly from the perspective of concrete application domains and tasks
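As an illustration of the distant supervision topic listed above (heuristics used for data labelling), the following minimal Python sketch combines a few keyword rules by majority vote. The rule names, the keywords, the label set and the majority-vote strategy are all assumptions made for this example and are not taken from any workshop contribution or from a specific framework.

```python
from collections import Counter

ABSTAIN = None  # a heuristic returns this when it has no opinion

# Hypothetical keyword heuristics acting as labelling functions.
def lf_positive_words(text):
    words = ("great", "excellent")
    return "positive" if any(w in text.lower() for w in words) else ABSTAIN

def lf_negative_words(text):
    words = ("poor", "terrible")
    return "negative" if any(w in text.lower() for w in words) else ABSTAIN

def lf_negation(text):
    return "negative" if "not good" in text.lower() else ABSTAIN

LABELLING_FUNCTIONS = [lf_positive_words, lf_negative_words, lf_negation]

def weak_label(text):
    """Return the majority vote of all non-abstaining heuristics.

    Returns ABSTAIN when no heuristic fires or when the vote is tied.
    """
    votes = [lf(text) for lf in LABELLING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN  # tie between labels: leave the example unlabelled
    return counts[0][0]

if __name__ == "__main__":
    for sentence in ("An excellent talk", "Terrible and not good", "Great but poor"):
        print(sentence, "->", weak_label(sentence))
```

Labels obtained this way are noisy by construction; in practice they would typically serve as training data for a downstream model rather than as final predictions.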
Submission Guidelines
We welcome the following types of contributions:
Research papers (max. 8 pages)
Best practices and lessons learned (max. 4 pages)
All submissions must be written in English and adhere to the CEUR-ART style. Please use the following Overleaf template for LaTeX: https://www.overleaf.com/read/mwrbbrhrpwzv.
We follow a single-blind process with at least two reviewers per paper. Papers will be evaluated according to their significance, originality, technical content, style, clarity, and relevance to the workshop.
Please submit your contributions electronically in PDF format via the EasyChair system: https://easychair.org/conferences/?conf=semai2022
For any enquiries, please send an email to semai2022@easychair.org.
Important Dates
Submission Deadline: July 4, 2022
Notification of Acceptance: July 30, 2022
Camera-ready versions of accepted contributions: August 15, 2022
Workshop date: Tuesday, September 13, 2022
Proceedings
Accepted contributions will be presented at the workshop and included in the CEUR workshop proceedings. At least one author of each article is expected to register for the workshop and attend to present their contribution.
Organising Committee
Sebastian Neumaier, St. Pölten University of Applied Sciences
Martin Kaltenböck, Semantic Web Company
Marta Sabou, Vienna University of Economics and Business
Programme Committee
Anna Breit, Semantic Web Company
Tomáš Brychcín, SentiSquare
Fajar Juang Ekaputra, Semantic Systems Research Lab, TU Vienna
Heiko Paulheim, Data and Web Science Group, University of Mannheim, Germany
Torsten Priebe, Data Intelligence Research Group, St. Pölten University of Applied Sciences
Laura Waltersdorfer, Semantic Systems Research Lab, TU Vienna
Objectives and Goals
AI approaches based on machine learning (ML) have become increasingly popular across all sectors. However, experience shows that AI initiatives often fail due to the lack of appropriate data or low data quality. Furthermore, state-of-the-art AI models are largely opaque and suffer from a lack of transparency and explainability. This means that even if the underlying mathematical principles of these methods are understood, it is often unclear why a particular prediction has been made and whether meaningful and grounded patterns have led to this prediction. Thus, there is a risk that the AI learns biases from the data or makes its decisions based on wrong or ambiguous information.
Semantic AI approaches combine methodology from ML-based AI and symbolic AI based on semantic technologies such as knowledge graphs (KG) as well as natural language processing (NLP), while incorporating mechanisms for explainable AI (XAI). Semantic AI requires technical and organizational measures, which are implemented along the whole data lifecycle. KGs, being one of the core elements of symbolic AI, provide a human-understandable and machine-processable way to model and reason over complex relationships between entities of interest. They also provide a means for more automated data quality management.
The interaction of ML-based and KG-based approaches can manifest in different forms, including the provision of rich background information for the input data by interlinking to existing external KGs, using the real-world domain knowledge in the KG to guide the training process of ML models, and enhancing causal inference to verify and/or assist the prediction. This close interconnection is not only capable of increasing the prediction performance of the AI model, but also of improving its robustness. Furthermore, the interaction between ML and KG leads to more understandable and interpretable predictions, as the ML approaches can be exploited to symbolise their intrinsic knowledge by generating new entities and relations within the KG and thus extending it, and the KG can be leveraged to generate human-understandable explanations for automatically produced predictions.
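To make two of these interaction forms concrete, the following minimal Python sketch stores a toy KG as a list of triples, looks up background facts for an input entity, and returns a path of triples that could serve as a human-readable explanation. The triples, the relation names and the helper functions `background` and `explain` are hypothetical and chosen only for illustration; a real system would query an existing external KG instead.

```python
from collections import deque

# Toy knowledge graph as (subject, predicate, object) triples (illustrative only).
TRIPLES = [
    ("Vienna", "capital_of", "Austria"),
    ("Austria", "part_of", "Europe"),
    ("St. Pölten", "located_in", "Austria"),
]

def background(entity):
    """Return all triples whose subject is the given entity."""
    return [(s, p, o) for s, p, o in TRIPLES if s == entity]

def explain(source, target):
    """Breadth-first search along outgoing edges.

    Returns the list of triples linking source to target,
    an empty list if both are the same entity, or None if no path exists.
    """
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for s, p, o in background(node):
            if o not in seen:
                seen.add(o)
                queue.append((o, path + [(s, p, o)]))
    return None

if __name__ == "__main__":
    print(background("Vienna"))
    print(explain("Vienna", "Europe"))
```

The returned path can be rendered as a sentence chain, which is one simple way a KG can back a prediction with an explanation a domain expert can check.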
While the neuro-symbolic nature of semantic AI systems supports the understandability and interpretability of the produced predictions, more focused approaches originating from the field of XAI enable truly explainable decisions. XAI focuses on integrating explainability mechanisms in complex black-box models, either post-hoc (on already trained models) or during training (self-learned explainability). The integration and close interaction of these mechanisms with the neuro-symbolic system shall overcome existing limitations of state-of-the-art XAI methods and provide explanations which are directly formulated in a domain-specific language.