May 25th, ESWC 2014 - Anissaras/Hersonissou, Crete


Question Answering (QA) approaches over Linked Data provide principled and multidisciplinary solutions to cope with the growing semantic heterogeneity of databases, merging elements from natural language processing, information retrieval and databases. The study of existing approaches for QA over Linked Data can provide key insights on how future applications and users will cope with increasing data variety. In this tutorial we will provide a comprehensive introduction to, and description of, the state of the art in QA approaches over Linked Data. The tutorial will cover emerging challenges & trends in the database landscape (Big Data & Linked Data), describing how QA principles and methods can be used to address these challenges. The discussion will be grounded in existing systems. The methodology and existing test collections for evaluating QA over Linked Data systems will also be covered.

Description

The demand to access large amounts of heterogeneous structured data is emerging as a solid trend for many applications and users on the Web. However, the effort involved in finding and querying heterogeneous and distributed third-party data sources on the Web can create barriers for data consumers. At the core of this problem is the vocabulary gap between the way users express their information needs and the data representation. Question Answering (QA) approaches over Linked Data provide principled and multidisciplinary solutions to cope with the growing semantic heterogeneity of databases, merging elements from natural language processing, information retrieval and databases. The study of existing approaches for QA over Linked Data can provide key insights on how future applications and users will cope with increasing data variety.

In this tutorial we will provide a comprehensive introduction to, and description of, the state of the art in Question Answering (QA) approaches over Linked Data. The tutorial will cover emerging challenges & trends in the database landscape (Big Data & Linked Data), describing how QA principles and methods can be used to address these challenges. The core terminology and concepts in the area, and the components of QA approaches, will be introduced. The semantic assumptions behind QA systems will be examined from the Computational Linguistics perspective and the Semantic Web perspective, followed by a description of state-of-the-art approaches for QA over Linked Data (Lopez et al. (2006), Cimiano et al. (2007), Damljanovic et al. (2010), Freitas et al. (2012), Unger et al. (2012), Cabrio et al. (2012), Pradel et al. (2013)). IBM Watson (Ferrucci et al. (2010)) will also be analyzed as a case study. The methodologies and existing test collections for evaluating QA over Linked Data systems will be covered. After the analysis of existing approaches, we will analyze the semantic patterns used in the construction of QA systems that can be reused in other application scenarios. Finally, a roadmap for research in the field, based on the analysis of the literature, will be presented.

Audience

This half-day tutorial provides a timely discussion of modern data management challenges, in particular of how to support users and applications querying heterogeneous databases using state-of-the-art techniques in computational semantics. The principles, techniques and multidisciplinary perspective of applied computational semantics present in existing QA approaches over Linked Data can be applied to other areas such as data integration, flexible database queries, semantic search, human-computer interaction and social network querying, among others.

This tutorial targets Computer Science researchers and professionals who are seeking a comprehensive introduction to Question Answering over Linked Data, as well as specialists who want an in-depth understanding of the state of the art in this field. The tutorial topic is of particular (but not exclusive) interest to researchers and professionals in the following areas: Semantic Web, Linked Data, Social Networks, Information Retrieval, Databases and Natural Language Processing.

The audience is expected to have a Computer Science or Computational Linguistics background. Familiarity with basic Semantic Web concepts will facilitate the understanding of the covered topics.

Presenters

Christina Unger is a post-doctoral researcher at the Cognitive Interaction Technology cluster of excellence (CITEC) at Bielefeld University in Germany. She received her PhD in 2010 from the Utrecht Institute of Linguistics (UiL-OTS) in the Netherlands. Christina’s main expertise is in computational semantics. Since joining the Semantic Computing group in Bielefeld, her main research interests have been the ontology-based interpretation of natural language and automatic grammar induction. Christina has several years of experience in developing ontology-based question answering systems over linked data, which has given her much insight into the challenges involved, and she is a co-organizer of the QALD open challenge on question answering over linked data.

André Freitas is a PhD Candidate at the Insight Center for Data Analytics at the National University of Ireland, Galway (formerly the Digital Enterprise Research Institute (DERI)) and is a partner and co-founder at Amtera Semantic Technologies. André holds a BSc. in Computer Science from the Federal University of Rio de Janeiro (UFRJ), Brazil (2005). His main research areas include Natural Language Query Mechanisms over Linked Data, Semantic Search, Vocabulary-independent Query Mechanisms, Distributional Semantics, Semantic Relatedness, Approximate Reasoning and Provenance. Before joining DERI, André worked as a research assistant (trainee) at Siemens Corporate Research, Princeton, USA. He has also worked as a software engineer, product designer and project manager in different industries including Oil & Gas Exploration, IT Security, Medical, Healthcare, Banking, Mining and Telecom.

Topics

Below is a tentative Table of Contents for the tutorial:

– Motivation & Context
    • Why Question Answering?
    • QA vs IR Perspectives
    • QA vs Databases Perspectives
    • QA Vision
    • QA: Reality (Watson, Siri, FB Graph Search)

– Linked Data & QA
    • Linked Data: Structured queries
    • Query/Search Spectrum
    • The Vocabulary Problem for Linked Data
    • Are Natural Language Interfaces (NLIs) useful for Databases? (Kaufmann & Bernstein 2010) 
    • NLI & QAs
    • Comparative study
    • QA for Linked Data Requirements

– The Anatomy of a QA System
    • Basic Concepts & Taxonomy
    • Question Phrase
    • Question Type
    • Answer Type
    • Question Focus & Topic
    • Data Source Type
    • Domain Type
    • Answer Format
    • Answer Quality Criteria
    • Answer Assessment
    • Answer Processing
    • Complexity of the QA Task
    • Main Components

– Case Studies
    • AquaLog & PowerAqua (Lopez et al. 2006)
    • ORAKEL (Cimiano et al. 2007)
    • QuestIO & FREyA (Damljanovic et al. 2010)
    • Treo (Freitas et al. 2012)
    • Unger et al. (2012)
    • QAKIS (Cabrio et al. 2012)
    • Swip (Pradel et al. 2013)
    • IBM Watson (Ferrucci et al. 2010)

– Evaluation of QA Systems
    • Evaluation Campaigns
    • Evaluation metrics
    • Test Collections
    • QALD-1, ESWC (2011)
    • QALD-2, ESWC (2012)
    • QALD-3, CLEF (2013)
    • QALD-4, CLEF (2014)
    • INEX Linked Data Track (2013)
    • SemSearch Challenge
    • BioASQ

– Roadmap & Research Topics for QA Systems

Tutorial Material

The material for the tutorial will be available on April 25th, 2014.

Recommended References

E. Kaufmann, A. Bernstein, Evaluating the usability of natural language query languages and interfaces to Semantic Web knowledge bases, J. Web Semantics: Science, Services and Agents on the World Wide Web 8, 377-393, 2010.

1st Workshop on Question Answering over Linked Data (QALD-1), http://www.sc.cit-ec.uni-bielefeld.de/qald-1, 2011.

A. Freitas, E. Curry, J. G. Oliveira, S. O'Riain, A Distributional Structured Semantic Space for Querying RDF Graph Data. International Journal of Semantic Computing (IJSC), 2012.

V. Lopez, E. Motta, V. Uren, PowerAqua: Fishing the Semantic Web, Proc. 3rd European Semantic Web Conference (ESWC), Vol. 4011, p. 393-410, 2006.

D. Damljanovic, M. Agatonovic, and H. Cunningham, FREyA: An Interactive Way of Querying Linked Data Using Natural Language, Proc. 1st Workshop on Question Answering over Linked Data (QALD-1), 8th Extended Semantic Web Conf. (ESWC), 2011.

C. Unger, L. Buehmann, J. Lehmann, A.-C. Ngonga Ngomo, D. Gerber and P. Cimiano, Template-based Question Answering over RDF Data, in Proceedings of the 21st International World Wide Web Conference (WWW'12), Lyon, France, April 2012.

M. Yahya, K. Berberich, S. Elbassuoni, M. Ramanath, V. Tresp and G. Weikum, Natural Language Questions for the Web of Data, EMNLP 2012.

E. Cabrio, J. Cojan, A. P. Aprosio, B. Magnini, A. Lavelli, and F. Gandon, QAKIS: An Open Domain QA System based on Relational Patterns, in Proceedings of the 11th International Semantic Web Conference (ISWC), 2012.

P. Cimiano, P. Haase, J. Heizmann, M. Mantel, R. Studer, Towards portable natural language interfaces to knowledge bases: The Case of the ORAKEL system, Data & Knowledge Engineering (DKE), 65 (2), p. 325-354, 2008.

M. Franklin, A. Halevy, D. Maier, From databases to dataspaces: a new abstraction for information management, ACM SIGMOD Record, v. 34, p. 27-33, 2005.

D. Ferrucci et al., Building Watson: An Overview of the DeepQA Project, AI Magazine, 2010.

C. Pradel, O. Haemmerle, N. Hernandez, Swip, a Semantic Web Interface using Patterns, International Semantic Web Conference (Posters & Demos), pp. 73-76, 2013.