
Twitter Bot Detection

1. Indiana University's Truthy project

Publications:
"Researchers at Indiana University’s Bloomington School of Informatics and Computing have released a tool designed to detect if a Twitter account is being operated by an automated “bot” system or a real person in their continued effort to raise awareness about the potential for such accounts to be abused for misinformation campaigns.

The BotOrNot tool’s development was funded by the U.S. Department of Defense and the National Science Foundation, and it analyzes more than a thousand variables related to a Twitter account’s network, content and temporal information, all in real time, then calculates the probability that the account is controlled by automated software.

“We have applied a statistical learning framework to analyze Twitter data, but the ‘secret sauce’ is in the set of more than one thousand predictive features able to discriminate between human users and social bots, based on content and timing of their tweets, and the structure of their networks,” said Alessandro Flammini, principal investigator on the project. “The demo that we’ve made available illustrates some of these features and how they contribute to the overall ‘bot or not’ score of a Twitter account.”

The military’s support for the project centers on concerns that modern social media platforms, combined with the proliferation of mobile information technology, could negatively impact national security if leveraged to conduct large-scale misinformation campaigns.

BotOrNot scores about 0.95 on the AUROC evaluation measure, where 1.0 is a perfect score, and the researchers hope the tool will be useful in surveying the Twittersphere to determine how many accounts are actually controlled by bots, and which of those may be malicious in nature.

“Part of the motivation of our research is that we don’t really know how bad the problem is in quantitative terms,” said Fil Menczer, director of IU’s Center for Complex Networks and Systems Research. “Are there thousands of social bots? Millions? We know there are lots of bots out there, and many are totally benign. But we also found examples of nasty bots used to mislead, exploit and manipulate discourse with rumors, spam, malware, misinformation, political astroturf and slander.”"

[http://archive.news.indiana.edu/releases/iu/2014/05/twitter-botornot.shtml]

BLOOMINGTON, Ind. -- Complex networks researchers at Indiana University have developed a tool that helps anyone determine whether a Twitter account is operated by a human or an automated software application known as a social bot. The new analysis tool stems from research at the IU Bloomington School of Informatics and Computing funded by the U.S. Department of Defense to counter technology-based misinformation and deception campaigns.

BotOrNot analyzes over 1,000 features from a user's friendship network, their Twitter content and temporal information, all in real time. It then calculates the likelihood that the account is a bot. The National Science Foundation and the U.S. military are funding the research after recognizing that increased information flow -- blogs, social networking sites, media-sharing technology -- along with an accelerated proliferation of mobile technology is changing the way communication and possibly misinformation campaigns are conducted.

As network science is applied to the task of uncovering deception, it leverages the structure of social and information diffusion networks, along with linguistic cues, temporal patterns and sentiment data mined from content spreading through social media. Each of these feature classes is analyzed with BotOrNot.
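To make those feature classes concrete, here is a minimal, purely illustrative Python sketch that computes a few toy content, temporal and network features for one account. It is not BotOrNot's actual feature set (which comprises more than 1,000 features); the input format, field names and the particular features are assumptions chosen only to mirror the categories described above.

from collections import Counter
from statistics import pstdev

def toy_account_features(tweets, followers, friends):
    # tweets: list of dicts with 'text' and 'hour' (0-23) keys (assumed format)
    # followers / friends: follower and followee counts for the account
    texts = [t["text"] for t in tweets]
    hours = [t["hour"] for t in tweets]

    # Content features: hashtag and URL usage per tweet
    hashtags_per_tweet = sum(w.startswith("#") for txt in texts for w in txt.split()) / max(len(texts), 1)
    urls_per_tweet = sum("http" in txt for txt in texts) / max(len(texts), 1)

    # Temporal feature: how evenly activity is spread over the 24-hour day
    # (a flat, round-the-clock posting pattern is one possible bot signal)
    hour_counts = Counter(hours)
    hour_spread = pstdev([hour_counts.get(h, 0) for h in range(24)])

    # Network feature: followers-to-followees ratio
    follower_ratio = followers / max(friends, 1)

    return {
        "hashtags_per_tweet": hashtags_per_tweet,
        "urls_per_tweet": urls_per_tweet,
        "hour_spread": hour_spread,
        "follower_ratio": follower_ratio,
    }

A real system would compute vectors like this for many accounts and feed them into the statistical models described below.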

“We have applied a statistical learning framework to analyze Twitter data, but the ‘secret sauce’ is in the set of more than one thousand predictive features able to discriminate between human users and social bots, based on content and timing of their tweets, and the structure of their networks,” said Alessandro Flammini, an associate professor of informatics and principal investigator on the project. “The demo that we’ve made available illustrates some of these features and how they contribute to the overall ‘bot or not’ score of a Twitter account.”

Through use of these features and examples of Twitter bots provided by Texas A&M University professor James Caverlee's infolab, the researchers are able to train statistical models to discriminate between social bots and humans; according to Flammini, the system is quite accurate. Using an evaluation measure called AUROC, BotOrNot is scoring 0.95 with 1.0 being perfect accuracy.
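The release does not name the specific model, but the general recipe it describes (train a supervised classifier on labeled examples of bot and human accounts, then evaluate with AUROC) can be sketched as follows. The random arrays are placeholders standing in for real account features and labels, such as the bot examples from the Caverlee infolab; the choice of a random forest is an assumption, not the published method.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Placeholder data: one row of features per account, label 1 = bot, 0 = human
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = rng.integers(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Train a classifier on the labeled accounts
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Score held-out accounts and compute the area under the ROC curve
scores = clf.predict_proba(X_test)[:, 1]
print("AUROC:", roc_auc_score(y_test, scores))

An AUROC of 0.95, as reported for BotOrNot, means that about 95% of the time a randomly chosen bot receives a higher bot score than a randomly chosen human.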

“Part of the motivation of our research is that we don't really know how bad the problem is in quantitative terms,” said Fil Menczer, the informatics and computer science professor who directs IU’s Center for Complex Networks and Systems Research, where the new work is being conducted as part of the information diffusion research project called Truthy. “Are there thousands of social bots? Millions? We know there are lots of bots out there, and many are totally benign. But we also found examples of nasty bots used to mislead, exploit and manipulate discourse with rumors, spam, malware, misinformation, political astroturf and slander.”

Flammini and Menczer said it’s their belief that these kinds of social bots could be dangerous for democracy, cause panic during an emergency, affect the stock market, facilitate cybercrime and hinder advancement of public policy. The goal is to support human efforts to counter misinformation with truthful information.

The use of social bots has gained widespread attention in mass media. Menczer has been interviewed by The New York Times on the use of social bots to sway elections and was sought out to consult on the topic by writers of the network television series "The Good Wife."

The team received just over $2 million in 2012 for a proposal called “Detecting Early Signature of Persuasion in Information Cascades” and last month presented results about BotOrNot and other aspects of the project at a Department of Defense meeting in Arlington, Va.


2. DARPA Twitter Bot Challenge

The US military has enlisted academics to fight a new enemy: Twitter bots. 

The Defense Advanced Research Projects Agency (DARPA) held a special contest last year to identify so-called "influence bots"  — "realistic, automated identities that illicitly shape discussion on sites like Twitter and Facebook."

The fascinating 4-week competition, called the DARPA Twitter Bot Challenge, was detailed in a paper published this week.

The paper minces no words about how dangerous it is that human-like bots on social media can accelerate recruitment to organizations like ISIS, or grant governments the ability to spread misinformation to their people. Proven uses of influence bots in the wild are rare, the paper notes, but the threat is real.

The contest

And so, the surprisingly simple test. DARPA placed "39 pro-vaccination influence bots" onto a fake, Twitter-like social network. Importantly, competing teams didn't know how many influence bots there were in total.

Teams from the University of Southern California, Indiana University, Georgia Tech, Sentimetrix, IBM, and Boston Fusion worked over the four weeks to find them all. 

With 8.5% of all Twitter users being bots, per the company's own metrics, it's important to weed out the bots that go beyond just trying to sell you weight-loss plans and work-at-home methods and cross the line into politics.

But actually making that distinction can be a challenge, as the paper notes.

Sentimetrix technically won the challenge, reporting 39 correct guesses and one false positive, a full six days before the end of the four-week contest period. But USC was the most accurate, going 39 for 39. 

How to detect a robot

DARPA combined the teams' various approaches into a three-step process, all of which will need improved software support to become better and faster going forward (a rough sketch of how the steps might fit together follows the list):

  1. Initial bot detection: You can spot likely bots with language analysis, looking for statistically unnatural, bot-generated words and phrases. Using many hashtags in a post can also be a flag, and if an account posts heavily, and consistently over the full 24-hour day, the chances it's a bot go up.
  2. Clustering, outliers, and network analysis: That first step may only identify a few bots. But bots tend to follow bots, so you can use the initial findings to work outward through the network and get a good statistical sense of bot social circles.
  3. Classification/outlier analysis: The more positives you find with the first two steps, the easier it is to extrapolate and flag the rest in bulk.
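As the list suggests, the three steps chain together: content flags seed the search, the follow graph spreads it, and a final classification pass sweeps up the rest. Here is a rough, purely illustrative Python sketch of that chaining; the account fields, thresholds and the trivial final rule are assumptions made for illustration, not the competing teams' actual methods (which combined trained models with human review).

def seed_bot_candidates(accounts):
    # Step 1: flag accounts whose content or timing looks unnatural
    seeds = set()
    for acct in accounts:
        suspicious = (
            acct["avg_hashtags_per_tweet"] > 3      # heavy hashtag use
            or acct["tweets_per_day"] > 100         # very high posting volume
            or acct["active_hours_per_day"] > 20    # posts around the clock
        )
        if suspicious:
            seeds.add(acct["id"])
    return seeds

def expand_via_network(candidates, follows):
    # Step 2: bots tend to follow bots, so grow the candidate set along
    # follow edges that point mostly at already-flagged accounts
    candidates = set(candidates)
    for acct_id, followed in follows.items():
        if followed and len(followed & candidates) / len(followed) > 0.5:
            candidates.add(acct_id)
    return candidates

def classify_remaining(accounts, candidates):
    # Step 3: score everything else against the flagged cluster; a trained
    # classifier would normally replace this trivial outlier rule
    flagged = [a for a in accounts if a["id"] in candidates]
    if flagged:
        mean_rate = sum(a["tweets_per_day"] for a in flagged) / len(flagged)
        for acct in accounts:
            if acct["id"] not in candidates and acct["tweets_per_day"] > mean_rate:
                candidates.add(acct["id"])
    return candidates

accounts = [
    {"id": "a1", "avg_hashtags_per_tweet": 5, "tweets_per_day": 150, "active_hours_per_day": 22},
    {"id": "a2", "avg_hashtags_per_tweet": 1, "tweets_per_day": 12, "active_hours_per_day": 8},
    {"id": "a3", "avg_hashtags_per_tweet": 1, "tweets_per_day": 10, "active_hours_per_day": 9},
]
follows = {"a2": {"a3"}, "a3": {"a2"}}
print(classify_remaining(accounts, expand_via_network(seed_bot_candidates(accounts), follows)))  # {'a1'}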

A key finding from the DARPA paper, and one very important to note, is that all of this required human interaction: computers alone can't yet reliably tell a real human from an influence bot.


The good news, say the authors in their paper, is that these methods can also be used to find human-run propaganda and misinformation campaigns.

The bad news is that you can expect a lot more evil propaganda bots on Twitter in the years to come.

"Bot developers are becoming increasingly sophisticated. Over the next few years, we can expect a proliferation of social media influence bots as advertisers, criminals, politicians, nation states, terrorists, and others try to influence populations," says the paper.


3. Some Twitter bot detection service websites: