Web Spam 2011 Datasets

The lack of a benchmark collection of Arabic Web pages is still considered one of the main problems hindering research on Arabic Web spam filtering.

The following three Web spam datasets were used in:

Wahsheh H., Abu Doush I., Al-Kabi M., Alsmadi I. and Al-Shawakfa E. (2012), Using Machine Learning Algorithms to Detect Content-based Arabic Web Spam, International Journal of Information Assurance and Security (JIAS), 7 (1): 14-24.

Extended Arabic Web Spam 2011 Dataset

WEBSPAM-UK2007 Dataset

UK-2011 Web Spam Dataset

Please cite our paper if you use the Web Spam 2011 Datasets in your publications.