Alberto Pérez García-Plaza
Wikipedia Animal Dataset
The Wikipedia Animal Dataset was created during December 2010 and January 2011 from data retrieved from Wikipedia. It is available for research purposes.
Statistics
This dataset is made up of 498 unique URLs corresponding to articles about animals. For each animal, the article was collected in English, Finnish and Spanish, and each language version was required to contain at least 100 words. The dataset includes both the HTML and the MediaWiki content of these articles.
If you want to know more about the dataset generation process, please read the paper referenced at the end of this page.
Legal Information
By downloading and using this dataset you acknowledge that:
The data has been compiled exclusively for scientific research purposes.
The copyright holders retain ownership and reserve all rights.
Download
There are two different packages available for download, depending on your research interests:
wad_en-es_html_20110111.tar.gz (6 MB): HTML content of the English and Spanish Wikipedia articles in the dataset.
wad_en-es-fi_20110111.tar.gz (17.3 MB): HTML and MediaWiki content of the English, Finnish and Spanish Wikipedia articles in the dataset.
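Once a package has been downloaded, it can be unpacked with any tool that reads gzip-compressed tar archives. The following is a minimal Python sketch using only the standard library. The helper name extract_archive and the output directory name are illustrative assumptions, and the internal layout of the archives is not described on this page, so the sketch simply lists whatever files it finds after extraction.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the extracted files.

    Hypothetical helper for illustration; it makes no assumption about
    how the files inside the archive are organised.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    with tarfile.open(archive_path, "r:gz") as tar:
        # Refuse members that would be written outside dest_dir.
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
    # Return every regular file found under dest_dir, in sorted order.
    return sorted(p for p in dest.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Assumes the package was saved in the current working directory.
    archive = Path("wad_en-es_html_20110111.tar.gz")
    if archive.exists():
        files = extract_archive(archive, "wad_en-es_html")
        print(f"{len(files)} files extracted")
    else:
        print(f"{archive} not found; download it first")
```

The same function should work for wad_en-es-fi_20110111.tar.gz by changing the archive name and the destination directory.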
Reference
Please cite the following paper if you use this dataset in your research:
Mari-Sanna Paukkeri, Alberto Pérez García-Plaza, Víctor Fresno, Raquel Martínez and Timo Honkela. Learning a taxonomy from a set of text documents. Applied Soft Computing. Volume 12, Issue 3, Pages 1138-1148, March 2012.