Alberto Pérez García-Plaza


Wikipedia Animal Dataset

Wikipedia Animal Dataset is a dataset created between December 2010 and January 2011 from data retrieved from Wikipedia. It is available for research purposes.


This dataset is made up of 498 unique URLs corresponding to articles about animals. For each animal, the article was collected in English, Finnish and Spanish, and each of these articles was required to contain at least 100 words. The dataset includes both the HTML and the MediaWiki content of these articles.
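The selection criterion above can be pictured as a simple filter. The following is a minimal Python sketch for illustration only, not the procedure actually used to build the dataset. It assumes that each animal's articles are available as plain text keyed by hypothetical language codes ("en", "fi", "es") and that a "word" is a whitespace-separated token; the paper's exact word-counting rule is not stated on this page.

```python
# Hypothetical sketch of the dataset's selection rule; not the authors' code.
# Assumptions: articles are plain text keyed by the hypothetical codes below,
# and a "word" is a whitespace-separated token.
MIN_WORDS = 100
LANGUAGES = ("en", "fi", "es")  # English, Finnish, Spanish


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (assumed definition of a word)."""
    return len(text.split())


def keep_animal(articles: dict[str, str]) -> bool:
    """Return True if the animal has an article in every language and
    each of those articles has at least MIN_WORDS words."""
    return all(
        lang in articles and word_count(articles[lang]) >= MIN_WORDS
        for lang in LANGUAGES
    )


if __name__ == "__main__":
    long_text = "word " * 120
    short_text = "word " * 50
    print(keep_animal({"en": long_text, "fi": long_text, "es": long_text}))  # True
    print(keep_animal({"en": long_text, "fi": short_text, "es": long_text}))  # False
    print(keep_animal({"en": long_text, "es": long_text}))  # False (no Finnish article)
```

An animal is discarded as soon as any one of the three language versions is missing or too short, which matches the requirement that all three articles exist and meet the length threshold.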

If you want to learn more about the dataset generation process, please read the paper referenced at the end of this page.

Legal Information

By downloading and using this dataset you acknowledge that:


There are two different packages available for download, depending on your research interests:


Please cite the following paper if you use this dataset in your research:

Mari-Sanna Paukkeri, Alberto Pérez García-Plaza, Víctor Fresno, Raquel Martínez and Timo Honkela. Learning a taxonomy from a set of text documents. Applied Soft Computing, Volume 12, Issue 3, pages 1138-1148, March 2012.