Anomaly Detection with Joint Representation Learning of Content and Connection

Abstract

Social media sites are becoming a key factor in politics. These platforms are easy to manipulate for the purpose of distorting the information space to confuse and distract voters. Past work on identifying disruptive patterns has mostly focused on analyzing the content of tweets. In this study, we jointly embed information from both user-posted content and each user's follower network to detect groups of densely connected users in an unsupervised fashion. We then investigate these dense sub-blocks of users to flag anomalous behavior. In our experiments, we study tweets related to the upcoming 2019 Canadian Elections and observe a set of densely connected users who engage in local politics in different provinces and exhibit troll-like behavior.

Contributions

(1) We create a joint-autoencoder-based solution by formulating Information Operation detection as dense sub-block detection on a binary attributed graph that encodes both the content of tweets and the connections in the Twitter follower network. The dense sub-blocks are detected using density-based clustering on the learned node embeddings of the graph.


(2) We design an adaptive hyperparameter selection method based on generating task-specific synthetic data, which addresses the lack of an objective evaluation standard for this unsupervised anomaly detection task.


(3) We demonstrate the application of our solution to real-world data by identifying a sub-block of suspicious and tightly connected users, as well as a suspicious account exhibiting behaviors related to Information Operations.
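
To make contributions (1) and (2) concrete, the following is a minimal sketch, not the authors' implementation. It assumes DBSCAN as the density-based clustering algorithm (the text above does not name one), uses hand-written 2-D points as hypothetical stand-ins for learned node embeddings, and picks the neighborhood radius with an illustrative `recovery_score` against a planted dense group. The point values, the candidate radii, and the scoring rule are all assumptions made for illustration.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point, with -1 meaning noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels

# Hypothetical embeddings: five tightly grouped users plus four scattered ones.
embeddings = [
    (0.00, 0.00), (0.05, 0.02), (0.03, -0.04), (-0.02, 0.05), (0.04, 0.04),
    (1.0, 1.0), (-1.2, 0.8), (0.9, -1.1), (-0.8, -1.0),
]
# Ground truth is known because the dense group was planted by construction.
planted = [1] * 5 + [0] * 4

def recovery_score(labels, planted):
    """Fraction of points whose clustered/noise status matches the planted block."""
    hits = sum((lab != -1) == bool(p) for lab, p in zip(labels, planted))
    return hits / len(planted)

# Illustrative hyperparameter selection: keep the eps that best recovers the plant.
candidates = [0.01, 0.15, 3.0]
best_eps = max(candidates,
               key=lambda e: recovery_score(dbscan(embeddings, e, 4), planted))
labels = dbscan(embeddings, best_eps, min_pts=4)
```

In this toy setting a radius that is too small marks every user as noise, and one that is too large merges everyone into a single cluster. Scoring against planted synthetic structure gives an objective way to choose between such settings when real labels are unavailable.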