UK-2011 Web Spam Dataset

Because some of the Web sites listed in WEBSPAM-UK2007 are no longer available, and because of the urgent need to compute new features, we have built a new dataset called UK-2011. It is derived from the WEBSPAM-UK2007 dataset and is intended to serve as an alternative to it.

Starting from the spam Web sites listed in WEBSPAM-UK2007 that are still operational, we recollected the UK spam pages by extracting them from those sites. The new dataset consists of around 3,700 Web pages.

Please cite our paper if you use the Web Spam 2011 dataset in your publication:

Wahsheh H., Abu Doush I., Al-Kabi M., Alsmadi I. and Al-Shawakfa E. (2012), Using Machine Learning Algorithms to Detect Content-based Arabic Web Spam, International Journal of Information Assurance and Security (JIAS), 7 (1): 14-24.