Spam Users Identification in Wikipedia via Editing Behavior

by Thomas Green and Francesca Spezzano @ ICWSM 2017

In this paper, we address the problem of identifying spam users on Wikipedia and present our preliminary results. We formulate the problem as a binary classification task and propose a set of features based on user editing behavior to separate spammers from benign users. We tested our system on a new dataset that we built, consisting of 4.2K users (half spammers and half benign) and 75.6K edits. Experimental results show that our approach reaches 80.8% classification accuracy and 0.88 mean average precision. We compared our system against ORES, the most recent tool developed by Wikimedia, which assigns a damaging score to each edit, and we show that our system outperforms ORES in spam user detection. Moreover, combining our features with ORES increases classification accuracy to 82.1%. We also show that our system performs well in a more realistic, unbalanced setting, i.e., when spammers are greatly outnumbered by benign users, achieving an AUROC of 0.84 (which increases to 0.86 when combined with ORES).
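For readers unfamiliar with the two headline metrics, the sketch below shows how classification accuracy and AUROC can be computed from per-user spam scores. It is only an illustration: the function names, the 0.5 threshold, and the toy labels and scores are assumptions made for this example, not the paper's features, classifier, or data.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of users whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def auroc(labels, scores):
    """Probability that a random spammer scores higher than a random
    benign user, counting ties as half (pairwise definition of AUROC)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one user of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = spammer, 0 = benign user.
# The classes are deliberately unbalanced (3 spammers, 5 benign users).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]

if __name__ == "__main__":
    print("accuracy:", accuracy(toy_labels, toy_scores))
    print("AUROC:", auroc(toy_labels, toy_scores))
```

Accuracy depends on the chosen threshold and on the class balance, whereas AUROC only depends on how the scores rank spammers relative to benign users, which is one reason it is commonly reported for unbalanced settings.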

Contacts: francescaspezzano@boisestate.edu

    Paper PDF

Cite our paper as:

@inproceedings{GreenS2017,
    author = {Green, Thomas and Spezzano, Francesca},
    title = {Spam Users Identification in Wikipedia via Editing Behavior},
    booktitle = {11th AAAI International Conference on Web and Social Media (ICWSM)},
    pages = {532--535},
    year = {2017}
}

Download our spam dataset below:

SPAM_DATASET.zip (3813k), Francesca Spezzano, 12 May 2017, 11:04