Does Paraphrasing Affect Bot Detection?

Nikhil Kumar Bharti and Koustav Rudra

IIT(ISM) Dhanbad and IIT Kharagpur

Introduction

Bots, or automated programs, are increasingly common on the internet. On social media in particular, they degrade the user experience in many ways: they may spread false information, fake news, and spam, interfere with political campaigns, or cause other kinds of nuisance. Some bots are created for malicious purposes, such as stealing user information.

For these reasons, developers and moderators rely on bot detectors to limit the harm that bots cause. In response, bot creators use various tactics to evade bot-detection algorithms. One such tactic, intended to mimic human behaviour, is paraphrasing: the text is rephrased using synonyms, a change of voice, or a different sentence structure, while its overall meaning is kept intact. This can make it harder for bot-detection algorithms to correctly identify whether a tweet was written by a human or a bot. In this project, we analyse whether paraphrasing affects bot-detection algorithms. We consider a dataset containing tweets written by both humans and bots.
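As an illustration of the simplest kind of paraphrasing mentioned above, the following minimal sketch rewrites a tweet by synonym substitution. The synonym table, the function name `paraphrase`, and the example tweet are hypothetical and chosen only for illustration; this is not the paraphrasing method used in this work, and real paraphrasers would also change voice and sentence structure.

```python
import re

# Hypothetical synonym table, for illustration only; not taken from the paper.
SYNONYMS = {
    "buy": "purchase",
    "cheap": "inexpensive",
    "now": "immediately",
    "great": "excellent",
}

def paraphrase(text: str, synonyms: dict[str, str] = SYNONYMS) -> str:
    """Replace each known word with its synonym; leave all other text untouched."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        replacement = synonyms.get(word.lower())
        if replacement is None:
            return word
        # Preserve a leading capital letter from the original word.
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", swap, text)

original = "Buy cheap followers now, great deal!"
rewritten = paraphrase(original)
print(rewritten)
```

The meaning of the tweet is preserved while its surface form changes, which is the property a text-based detector may be sensitive to.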

Analysis

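We develop an LSTM-based approach to identify whether a tweet is written by a human or a bot. We consider different setups, which use as input either the tweet text alone (text) or the tweet text together with metadata (text+metadata).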
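To make the idea of an LSTM-based classifier concrete, here is a minimal sketch of how such a model reads a tweet token by token and produces a bot probability. It is an assumption-laden toy: the class name `TinyLSTMClassifier`, the dimensions, the whitespace tokenizer, and the random untrained weights are all hypothetical, and the architecture actually used in this work is not specified here. In practice one would train such a model with a deep learning library.

```python
import math
import random

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

class TinyLSTMClassifier:
    """Untrained single-layer LSTM followed by a logistic output unit (toy sketch)."""

    def __init__(self, vocab, embed_dim=4, hidden_dim=3, seed=0):
        rng = random.Random(seed)
        rand = lambda n: [rng.uniform(-0.5, 0.5) for _ in range(n)]
        self.hidden_dim = hidden_dim
        # One embedding vector per known token, plus one shared by unknown tokens.
        self.embed = {tok: rand(embed_dim) for tok in vocab}
        self.unk = rand(embed_dim)
        # One weight matrix and bias per gate: input (i), forget (f),
        # output (o), and candidate cell state (g).
        in_dim = embed_dim + hidden_dim
        self.W = {k: [rand(in_dim) for _ in range(hidden_dim)] for k in "ifog"}
        self.b = {k: [0.0] * hidden_dim for k in "ifog"}
        self.w_out = rand(hidden_dim)

    def step(self, x, h, c):
        """Apply one LSTM cell update and return the new (h, c)."""
        z = x + h  # concatenate the token embedding and the previous hidden state
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])]
               for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def predict_proba(self, tweet: str) -> float:
        """Return an (untrained) probability that the tweet was written by a bot."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for token in tweet.lower().split():
            h, c = self.step(self.embed.get(token, self.unk), h, c)
        return sigmoid(sum(w * v for w, v in zip(self.w_out, h)))

model = TinyLSTMClassifier(vocab=["buy", "cheap", "followers", "now"])
p_bot = model.predict_proba("Buy cheap followers now")
print(f"P(bot) = {p_bot:.3f}")
```

Because the final hidden state depends on every token in order, replacing words or reordering a sentence changes the input the classifier sees, which is why paraphrasing may shift its predictions.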

Figure 1: Comparison of the bot detection model's performance on bot-written tweets under Setup 1 (text). Performance dropped by 1.1% due to paraphrasing.

Figure 2: Comparison of the bot detection model's performance on bot-written tweets under Setup 2 (text+metadata). Performance dropped by 1.55% due to paraphrasing.

Figure 3: Comparison of the bot detection model's performance on bot-written tweets under Setup 3 (text). Performance increased by 0.32% due to paraphrasing.

Figure 4: Comparison of the bot detection model's performance on bot-written tweets under Setup 4 (text+metadata). Performance dropped by 0.52% due to paraphrasing.

Figure 5: Comparison of the bot detection model's performance on bot-written tweets under Setup 5 (text). Performance dropped by 1.42% due to paraphrasing.

Figure 6: Comparison of the bot detection model's performance on bot-written tweets under Setup 6 (text+metadata). Performance dropped by 1.03% due to paraphrasing.
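The comparisons in the figures above rest on standard classification metrics. The sketch below shows how Precision, Recall, and F1-Score can be computed for the bot class and compared before and after paraphrasing. The labels and predictions are made up for illustration and are not the data behind the figures; the captions do not state whether the reported changes are absolute or relative, so reporting the change in percentage points here is an assumption.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Made-up labels for illustration only (1 = bot, 0 = human); not the paper's data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
pred_original = [1, 1, 1, 1, 0, 0, 0, 1]     # predictions on the original tweets
pred_paraphrased = [1, 1, 1, 0, 0, 0, 0, 1]  # predictions after paraphrasing

before = precision_recall_f1(y_true, pred_original)
after = precision_recall_f1(y_true, pred_paraphrased)

for name, b, a in zip(("Precision", "Recall", "F1-Score"), before, after):
    # Assumption: the change is reported as an absolute difference in percentage points.
    print(f"{name}: {b:.4f} -> {a:.4f} (change {100 * (a - b):+.2f} points)")
```

In this toy case one bot tweet is missed after paraphrasing, which lowers Recall and, through it, F1-Score.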

As a general trend, we observe a decrease in Precision, Recall, and F1-Score after paraphrasing (Setup 3, where performance increased slightly, is the exception). This trend can have the following reasons:

Conclusion

In this project, we analysed the effect of paraphrasing on bot detection. In most setups, paraphrasing reduced detection performance (by up to 1.55%), which suggests that it can be used to fool bot detection models. Further research and better models are needed to ensure a hassle-free user experience and better access to information.

Acknowledgement

This work is supported by the Science and Engineering Research Board, Department of Science and Technology, Government of India, under Project SRG/2022/001548.