Datasets

This page describes and links to the datasets used in the evaluation experiments with FREyA.

MusicBrainz and DBpedia datasets, available from the QALD-1 challenge website and used in our QALD-1 paper [Damljanovic et al., 2011].
Geography of the United States dataset, which we used in our ESWC paper [Damljanovic et al., 2010]:
- The Mooney GeoQuery dataset originates from a system called Geobase, an NLI for a simple geography database that was included with a commercial Prolog for PCs (Turbo Prolog 2.0, Borland International, 1988). The Geobase data covers information about the United States: population, area, capital cities, states, rivers, and the highest and lowest points with their elevations. Mooney and colleagues have used this dataset extensively and have published both the dataset and a demo, which can be found here. The dataset has been used to evaluate various NLIDBs and, more recently, NLIs to ontologies. The file below is from the University of Zurich. If you use any of these datasets, please acknowledge their original sources appropriately.
  - OWL file
  - 250 questions
- Identification of the question focus and answer type. We generated this data as a gold standard for evaluating the answer type identification algorithm described in our LREC paper [Damljanovic et al., 2010].