QuestIO

QuestIO accepts a text-based query as input and transforms it into a set of ranked formal-language queries (SeRQL or SPARQL). The top-ranked queries are executed in turn until a non-empty set of results is found, which is then returned to the user. In what follows we give a brief overview of the system (see Figure 1).
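The "execute until a non-empty result is found" strategy can be illustrated with a minimal Python sketch. The function name first_non_empty, the query labels and the in-memory fake_store are assumptions made for illustration; they stand in for a real SeRQL/SPARQL endpoint and are not part of QuestIO.

```python
from typing import Callable, Iterable, List, Sequence


def first_non_empty(ranked_queries: Iterable[str],
                    execute: Callable[[str], Sequence]) -> List:
    """Run queries in rank order and return the first non-empty result set."""
    for query in ranked_queries:
        results = list(execute(query))
        if results:
            return results
    # No candidate query produced any results.
    return []


# Hypothetical stand-in for a query endpoint: maps query text to result rows.
fake_store = {
    "Q1": [],                    # best-ranked query, but no results
    "Q2": ["France", "Spain"],   # first query with results
    "Q3": ["Europe"],            # never executed
}

answer = first_non_empty(["Q1", "Q2", "Q3"], lambda q: fake_store.get(q, []))
print(answer)
```

Lower-ranked queries are only executed when all better-ranked ones come back empty, so "Q3" is never run in this example.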

Figure 1. QuestIO Workflow

Initialisation of the system

To initialise QuestIO automatically, we preprocess the ontology resources (e.g., classes, instances, properties and property values) and extract the lemmas of all human-understandable lexicalisations. These lemmas are stored in a dynamic gazetteer list, which the subsequent components use during query interpretation. The gazetteer list must be updated on the fly so that it stays in sync with the ontology as the latter changes over time.
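As a rough illustration of this initialisation step, the following Python sketch builds a small gazetteer keyed by lemma and supports updates as the ontology changes. It is not the GATE implementation: the class DynamicGazetteer, the crude toy_lemma function and the ex: identifiers are all hypothetical stand-ins for the real morphological analyser and ontology.

```python
from collections import defaultdict


def toy_lemma(token: str) -> str:
    """Crude stand-in for a real morphological analyser (assumption)."""
    token = token.lower()
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def lemmatise(text: str) -> str:
    """Lemmatise every whitespace-separated token of a label or phrase."""
    return " ".join(toy_lemma(t) for t in text.split())


class DynamicGazetteer:
    """Maps lemmatised lexicalisations to (resource URI, resource kind)."""

    def __init__(self):
        self._entries = defaultdict(set)

    def add(self, uri: str, kind: str, label: str) -> None:
        self._entries[lemmatise(label)].add((uri, kind))

    def remove(self, uri: str) -> None:
        # Drop every entry of a resource that was deleted from the ontology.
        for lemma in list(self._entries):
            kept = {e for e in self._entries[lemma] if e[0] != uri}
            if kept:
                self._entries[lemma] = kept
            else:
                del self._entries[lemma]

    def lookup(self, text: str) -> set:
        return set(self._entries.get(lemmatise(text), set()))


gaz = DynamicGazetteer()
gaz.add("ex:Country", "class", "Country")
gaz.add("ex:Europe", "instance", "Europe")
print(gaz.lookup("countries"))   # matched through the shared lemma "country"
gaz.remove("ex:Europe")          # ontology changed: keep the gazetteer in sync
print(gaz.lookup("Europe"))
```

Because both the ontology labels and the lookup text pass through the same lemmatise function, inflected forms such as "countries" resolve to the entry stored for "Country".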

Figure 2. QuestIO diagram

Key components of QuestIO are shown in Figure 2. Each query is interpreted using the Query Interpreter in the User Interface. It is then analysed by two components, each of which is a separate GATE pipeline application. Firstly, the Key Concept Identification Tool (KCIT) identifies key concepts in the query. Key concepts are mentions of ontology resources such as instances, classes, properties or property values. The query is processed with the same language processing resources that were used to extract lemmas in the initialisation phase, so that the lemmas extracted from the ontology resources can be matched against the lemmas from the query. In this way, all morphological inflections of the relevant terms are matched. Secondly, the Context Collector collects all words from the query that are not recognised by KCIT but could be useful when generating the formal query, e.g. keywords.

To give an example, in the query "What are the countries located in Europe?", KCIT annotates "countries" as a mention of the class Country, and "Europe" as an instance of the class Continent. "What are" is a key phrase and "in" is a keyword. Both are annotated by the Context Collector, as they could be used later to help disambiguate the formal queries or to filter results. For instance, the result for the query "What are the countries located in Europe?" would be the list of countries, whereas for the query "How many countries are located in Europe?" it would be the number of countries. Additionally, the Context Collector extracts the text segments that lie between the identified key concepts; these segments are called chunks.
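The example can be traced with a short Python sketch that separates key concepts, context words and chunks. This is a simplification under stated assumptions: the GAZETTEER dictionary, the ex: identifiers, the toy_lemma function and the analyse function are hypothetical, and every unmatched word is kept as context without the key-phrase and keyword distinction that the Context Collector makes.

```python
import re

# Hypothetical gazetteer: lemma -> (resource URI, resource kind).
GAZETTEER = {
    "country": ("ex:Country", "class"),
    "europe": ("ex:Europe", "instance"),
}


def toy_lemma(token: str) -> str:
    """Crude stand-in for a real morphological analyser (assumption)."""
    token = token.lower()
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def analyse(query: str):
    """Return (key concepts, context words, chunks between key concepts)."""
    tokens = re.findall(r"[A-Za-z]+", query)
    concepts, context, chunks = [], [], []
    pending = []           # words seen since the last key concept
    seen_concept = False
    for token in tokens:
        entry = GAZETTEER.get(toy_lemma(token))
        if entry is None:
            context.append(token)
            pending.append(token)
            continue
        # Text lying between two key concepts forms a chunk.
        if seen_concept and pending:
            chunks.append(" ".join(pending))
        concepts.append((token, *entry))
        seen_concept = True
        pending = []
    return concepts, context, chunks


concepts, context, chunks = analyse("What are the countries located in Europe?")
print(concepts)
print(context)
print(chunks)
```

For this query the only chunk is "located in", the text between the mentions "countries" and "Europe"; the leading words "What are the" are kept as context only.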

Next, the Query Analyser (see Figure 3) uses the key concepts identified by KCIT to infer any potential relations that are defined between these concepts. These relations are then ranked by their string similarity to the relevant chunks and by their position in the ontology: relations that are more specific, and whose domain and range classes are more specific, are ranked higher than others. The ranked relations are then transformed into a set of ranked SeRQL/SPARQL queries, which are executed, and the result is sent back to the user.
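One possible way to combine the two ranking signals is sketched below in Python. The scoring formula is an assumption made for illustration, not QuestIO's actual metric: string similarity is approximated with difflib.SequenceMatcher from the standard library, and specificity with hypothetical depth values for the property and its domain and range classes. The Relation class, the max_depth parameter and the ex: identifiers are likewise invented for the example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List


@dataclass
class Relation:
    uri: str
    label: str            # human-readable lexicalisation of the property
    property_depth: int   # depth in the property hierarchy (deeper = more specific)
    domain_depth: int     # depth of the domain class in the class hierarchy
    range_depth: int      # depth of the range class in the class hierarchy


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score(relation: Relation, chunk: str, max_depth: int = 10) -> float:
    """Assumed scoring: chunk similarity plus normalised specificity."""
    depth_sum = (relation.property_depth
                 + relation.domain_depth
                 + relation.range_depth)
    specificity = depth_sum / (3 * max_depth)
    return similarity(relation.label, chunk) + specificity


def rank(relations: List[Relation], chunk: str) -> List[Relation]:
    """Order candidate relations from best to worst score."""
    return sorted(relations, key=lambda r: score(r, chunk), reverse=True)


candidates = [
    Relation("ex:relatedTo", "related to", 1, 1, 1),
    Relation("ex:locatedIn", "located in", 2, 3, 3),
]
ranked = rank(candidates, "located in")
print([r.uri for r in ranked])
```

Here the hypothetical ex:locatedIn relation ranks first because its label matches the chunk "located in" exactly and it sits deeper in the assumed hierarchies. Each ranked relation would then be turned into a formal query in the same order.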

Figure 3. Query Analyser