Motivation and Background

Motivation

In today's world, where thousands of tweets are posted every second, tweets strongly influence a wide range of events and opinions. These range from day-to-day activities and stories, to reviews or ratings of a product, company or individual, to perceptions of public bodies such as a political party or movement.

Automated accounts, known as social-media bots, play a widespread role in creating influence or spreading a message or perception through content generated without direct human involvement. Identifying such bots and the tweets they generate is therefore important, so that readers are not swayed by the agendas or perceptions these automated accounts promote.

Problem Statement

This project aims to detect Twitter bots, i.e. automated Twitter accounts, using the tweet data these accounts generate. The project will use live Twitter data made available through the Twitter developer API to analyze a user's tweeting patterns and classify that user as either a human or a Twitter bot.

The project will take a supervised approach to classification, using a pre-labelled dataset in which users have already been identified as human users or Twitter bots. These labels, together with the features extracted from the users' raw tweets, will be used to train classification models that identify Twitter bot users. Feature extraction will also apply Natural Language Processing techniques to perform sentiment analysis and topic modeling on the tweet data.
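As a rough illustration of the supervised setup described above, the sketch below aggregates a few per-user features from raw tweets and trains a simple classifier on labelled users. It is a minimal sketch under stated assumptions: the `Tweet` record, the four features (retweet share, URL share, mentions per tweet, share of distinct texts) and the helper names are all hypothetical, and a nearest-centroid classifier stands in for the classification models the project actually evaluates. The sentiment and topic features mentioned above are not shown.

```python
import math
import re
from dataclasses import dataclass
from statistics import mean

# Hypothetical minimal tweet record; real API payloads carry many more fields.
@dataclass
class Tweet:
    text: str
    is_retweet: bool

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")

def extract_features(tweets):
    """Aggregate one feature vector per user from that user's raw tweets.

    The feature set is an illustrative assumption, not the project's own.
    """
    n = len(tweets)
    if n == 0:
        return [0.0, 0.0, 0.0, 0.0]
    retweet_frac = sum(t.is_retweet for t in tweets) / n
    url_frac = sum(bool(URL_RE.search(t.text)) for t in tweets) / n
    mentions_per_tweet = mean(len(MENTION_RE.findall(t.text)) for t in tweets)
    # Share of distinct texts: repeated, identical content lowers this value.
    distinct_frac = len({t.text for t in tweets}) / n
    return [retweet_frac, url_frac, mentions_per_tweet, distinct_frac]

def train_centroids(X, y):
    """Nearest-centroid classifier: the mean feature vector for each label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Return the label whose centroid is closest (Euclidean) to x."""
    return min(centroids, key=lambda lab: math.dist(x, centroids[lab]))

# Toy pre-labelled users (made up for illustration only).
human = [Tweet("Lunch with friends today", False),
         Tweet("Great game last night @sam", False)]
bot = [Tweet("Buy now https://x.co/a", True),
       Tweet("Buy now https://x.co/a", True)]

model = train_centroids([extract_features(human), extract_features(bot)],
                        ["human", "bot"])
print(predict(model, extract_features([Tweet("Buy now https://x.co/b", True)])))
```

With these toy users, the unseen account that only retweets links lands closest to the bot centroid and is labelled `bot`. In practice the feature vectors would be built from the collected tweets, and a trained model such as logistic regression or a random forest would replace the centroid rule.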

Approach

The approach adopted for this project followed the iterative cycle that was etched into our brains throughout the semester as the Data Science Process: framing the problem statement, getting the data, exploring the data, modeling the data, evaluating the results, and communicating and visualizing the results.

Data Science Process

This website is organized in the same sequence:

  • Project statement - contains the motivation, the problem statement and the approach followed
  • Data - explains the source of the data, the data collection process, and the data cleansing and wrangling performed as part of feature extraction
  • EDA - explains the analysis and assessment performed on the extracted features as part of the Exploratory Data Analysis process
  • Modeling Approach - explains the modeling approach we adopted, the models evaluated, the steps followed to identify the optimal parameters for each model, and how we arrived at the final model
  • Results - reports the prediction accuracy scores achieved by the evaluated models on the train, validation and test sets, and the accuracy achieved by the final model
  • Conclusion - summarizes the findings from the approach followed and the results obtained, and gives pointers on next steps to improve prediction and model performance, along with other modeling approaches that could be evaluated
  • References - lists the references and literature we used to build the understanding and knowledge needed to execute this project
  • Team - introduces the team whose collaboration made this project happen, along with guidance from our valuable project guide
  • Acknowledgements - conveys our gratitude to the wonderful teaching team that made all of this possible