Data Collection for the Integrative Cyberinfrastructure for Online Abuse Research Project (ICOAR)

Ethan Anderson

Authors: Ethan Anderson, Taran Kavuru, Mohammed Aldeen, and Dr. Long Cheng

Faculty Mentor: Dr. Long Cheng

College: College of Engineering, Computing, and Applied Sciences


ABSTRACT

As social media sites continue to grow, so does the prevalence of online harassment and abuse. Consequently, many research efforts are turning to artificial intelligence (AI) and machine learning tools to rapidly detect, stop, and analyze instances of online abuse. Unfortunately, the social and behavioral scientists engaged in this research often lack the means to use these tools successfully. As a solution, ICOAR aims to offer an easy-to-use interface for collecting data from various social media sites, annotating it with pre-trained sentiment and toxicity models, training and validating new models, and visualizing results.


This research focused on building ICOAR’s dynamic data collection system so that it can pull from prominent social media sites such as Twitter, Reddit, Instagram, and TikTok, giving researchers the ability to run highly targeted searches and apply different techniques to sort and classify data. Because frequent changes to social media platforms and their application programming interfaces (APIs) often interrupt data collection, a variety of collection methods, including scraping, crawling, and official API requests, were implemented to keep collection consistent.
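One way to picture how several collection methods can keep data flowing when a platform changes is an ordered fallback: try the official API first and, if it fails, move on to a scraper or crawler. The Python sketch below illustrates that general idea only; it is not ICOAR's actual implementation. Every name in it (Post, CollectionError, collect_with_fallback, official_api, scraper) is hypothetical, and the two collection methods are stand-ins that make no network requests.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Post:
    """A collected post, reduced to two fields for illustration."""
    platform: str
    text: str


class CollectionError(Exception):
    """Raised when a collection method cannot return data."""


# A collection method takes a search query and returns matching posts.
Collector = Callable[[str], List[Post]]


def collect_with_fallback(
    query: str, methods: Sequence[Tuple[str, Collector]]
) -> Tuple[str, List[Post]]:
    """Try each named method in order and return the first that succeeds."""
    errors = []
    for name, method in methods:
        try:
            return name, method(query)
        except CollectionError as exc:
            # Record the failure and fall through to the next method.
            errors.append(f"{name}: {exc}")
    raise CollectionError("all methods failed: " + "; ".join(errors))


# Hypothetical stand-ins: a real method would call a platform API or parse pages.
def official_api(query: str) -> List[Post]:
    # Simulates an API request broken by a platform change.
    raise CollectionError("endpoint changed")


def scraper(query: str) -> List[Post]:
    # Simulates a scraper that still works after the API change.
    return [Post(platform="example", text=f"post mentioning {query}")]


if __name__ == "__main__":
    used, posts = collect_with_fallback(
        "harassment", [("api", official_api), ("scrape", scraper)]
    )
    print(used, len(posts))
```

In this sketch the order of the list encodes the preference among methods, so a researcher-facing tool could put official API requests ahead of scraping without changing any other code.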

Video Introduction

Ethan Anderson 2023 Undergraduate Poster Forum