Alberto Pérez García-Plaza

Wikipedia Animal Dataset

The Wikipedia Animal Dataset was created in December 2010 and January 2011 from data retrieved from Wikipedia. It is available for research purposes.


This dataset is made up of 498 unique URLs corresponding to articles about animals. For each animal, the article was collected in English, Finnish and Spanish, and an animal was included only if each of the three versions has at least 100 words. The dataset contains both the HTML and the MediaWiki content of these articles.
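The inclusion criterion above can be illustrated with a minimal sketch. This is not the authors' code: the function names, the language codes `en`, `fi` and `es`, and the use of whitespace splitting to count words are all assumptions made for illustration, and the referenced paper may count words differently.

```python
# Hypothetical sketch of the "at least 100 words in every language" rule.
# Names, language codes and the tokenization method are assumptions.

LANGUAGES = ("en", "fi", "es")  # English, Finnish, Spanish
MIN_WORDS = 100


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (a simplifying assumption)."""
    return len(text.split())


def meets_length_requirement(versions: dict[str, str]) -> bool:
    """Return True if every required language is present and long enough.

    `versions` maps a language code to the plain text of the article
    in that language.
    """
    return all(
        lang in versions and word_count(versions[lang]) >= MIN_WORDS
        for lang in LANGUAGES
    )
```

An animal whose Finnish article has only 99 words, or which has no Spanish article at all, would be rejected under this sketch.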

To learn more about the dataset generation process, please read the paper referenced at the end of this page.

Legal Information

By downloading and using this dataset you acknowledge that:

  • The data has been compiled exclusively for scientific research purposes.
  • The copyright holders retain ownership and reserve all rights.


There are two different packages available for download, depending on your research interests:


Please cite the following paper if you use this dataset in your research:

Mari-Sanna Paukkeri, Alberto Pérez García-Plaza, Víctor Fresno, Raquel Martínez and Timo Honkela. Learning a taxonomy from a set of text documents. Applied Soft Computing, Volume 12, Issue 3, Pages 1138-1148, March 2012.