ICWSM Data Challenge: Safety

June 8, 2020


ICWSM Data Challenge Program

Workshop Registration and Attendance

There is no registration fee to attend the virtual workshop, which will be held via the Zoom web conferencing service.

Please fill in the following form to attend the workshop: https://forms.gle/41bCKmqreVgkwjd1A [Registrations closed]

We will send the Zoom connection details to registrants.

Invited Keynote Speakers

1. "Social Cybersecurity and the Pandemic" by Prof. Kathleen Carley

Abstract: This talk introduces the emerging field of social cybersecurity, and then applies new techniques developed in this area to social media data related to COVID-19. Key issues discussed include the role of disinformation and bots, and the type of maneuvers used to influence people.

Bio: Dr. Carley is a Professor of Computer Science in the Institute for Software Research, an IEEE Fellow, Director of the Center for Computational Analysis of Social and Organizational Systems (CASOS), and Director of the Center for Informed DEmocracy And Social-cybersecurity (IDeaS) at Carnegie Mellon University. She joined Carnegie Mellon in 1984 as Assistant Professor of Sociology and Information Systems. In 1990 she became Associate Professor of Sociology and Organizations, in 1998 Professor of Sociology, Organizations, and Information Technology, and in 2002 attained her current role as Professor of Computation, Organization, and Society. She is also the CEO of Carley Technologies Inc., aka Netanomics.

2. "Beyond Content Classification: An Informal Talk about Computational Approaches to Online Abuse and Misinformation" by Prof. Mor Naaman

Bio: Mor Naaman is a professor of Information Science at the Jacobs Institute at Cornell Tech. Mor leads a research group focused on topics related to the intersection of technology, media and democracy. The group applies multidisciplinary techniques - from machine learning to qualitative social science - to study our information ecosystem and its challenges. Previously, Mor was on the faculty at the Rutgers School of Communication and Information, led a research team at Yahoo! Research Berkeley, received a Ph.D. in Computer Science from the Stanford University InfoLab, and played professional basketball for Hapoel Tel Aviv. His research is widely recognized, including with an NSF Early Faculty CAREER Award, research awards and grants from numerous corporations, and multiple best paper awards.

Accepted Papers

An RNN-based Classifier to detect Misinformation in News Articles: Brendan Cunha and Lydia Manikonda

On Analyzing Annotation Consistency in Online Abusive Behavior Datasets: Md Rabiul Awal, Rui Cao, Roy Ka-Wei Lee and Sandra Mitrovic

Do All Good Actors Look The Same? Exploring News Veracity Detection Across The U.S. and The U.K.: Benjamin Horne, Maurício Gruppi and Sibel Adali

Enhanced Offensive Language Detection Through Data Augmentation: Ruibo Liu, Guangxuan Xu and Soroush Vosoughi

Examining Racial Bias in an Online Abuse Corpus with Structural Topic Modeling: Thomas Davidson and Debasmita Bhattacharya

"To Target or Not to Target": Identification and Analysis of Abusive Text Using Ensemble of Classifiers: Gaurav Verma, Niyati Chhaya and Vishwa Vinay

Intersectional Bias in Hate Speech and Abusive Language Datasets: Jae Yeon Kim, Carlos Ortiz, Sarah Nam, Sarah Santiago and Vivek Datta

Implicit Crowdsourcing for Identifying Abusive Behavior in Online Social Networks: Abiola Osho, Ethan Tucker and George Amariucai

Call For Participation

ICWSM 2020 is hosting the first ICWSM data challenge to bring together researchers from across disciplines to solve societally relevant problems as a community, by fostering collaboration and the exchange of ideas in a structured setting. This year’s data challenge theme is Safety. We invite participants to work on two pertinent datasets in the areas of misinformation and abusive behavior in social media.

We invite papers that model or help explain misinformation or abusive behavior based on the datasets we provide, or that identify other important related dimensions along which to study those two datasets. We welcome submissions on topics including, but not limited to, computational models, theories, and insights concerning misinformation or abusive behavior.

The two datasets were selected after extensive deliberation to meet our four key criteria:

  1. The dataset should address societally relevant topics of interest to researchers and practitioners from multiple disciplines,

  2. The dataset should have the ability to answer multiple interesting questions,

  3. The dataset should contain a large quantity of rich, high-quality data, and

  4. The dataset should be relatively new.

The ICWSM Data Challenge is a full-day workshop taking place on June 8th, 2020, in conjunction with ICWSM 2020, Atlanta, USA. Challenge participants will have the opportunity to present their work and discuss it with other participants at the workshop.

Workshop URL: https://sites.google.com/view/icwsm2020datachallenge

Submission Site: https://easychair.org/conferences/?conf=icwsm2020dc

Important Dates:

  • Data Challenge opens: Feb 21st, 2020

  • Paper Submission deadline: May 15th, 2020

  • Data Challenge notification: May 25th, 2020

  • ICWSM Data Challenge full-day Workshop: June 8th, 2020

Task Description

Task 1: The Study of Misinformation in News Articles

This dataset contains 713k articles collected between 02/2018 and 11/2018 directly from 194 news and media outlets, including mainstream, hyper-partisan, and conspiracy sources. It also includes ground truth ratings of the sources, collected from 8 different assessment sites and covering multiple dimensions of veracity, including reliability, bias, transparency, adherence to journalistic standards, and consumer trust.

In this task, you are free to apply the dataset to a research problem of your choice. Some example questions are: What tactics are used by news producers publishing false, misleading, or propaganda news? How does false news change over time? Can we build better machine learning algorithms to detect misinformation? How can models built on this dataset be generalized to other new articles? We invite papers investigating any related themes, as well as descriptions of running projects and ongoing work in this space.

Dataset link: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/ULHLCB

Dataset Paper Reference: Nørregaard, Jeppe, Benjamin D. Horne, and Sibel Adalı. "NELA-GT-2018: A large multi-labelled news dataset for the study of misinformation in news articles." In Proceedings of the International AAAI Conference on Web and Social Media, 2019.

Task 2: Twitter Abusive Behavior Identification

This dataset consists of 100k annotated tweets labeled as inappropriate speech (abusive or hateful), normal interactions, or spam. Online social media suffers from many kinds of abusive behavior, such as hate speech, bullying, racism, and sexism. Identifying abusive behavior will help protect users from harmful content. This crowdsourced dataset is the end result of an 8-month study of abusive behavior on Twitter.

In this task, you are free to apply the dataset to investigate a research problem of your choice. Some example applications are: building better models to identify abusive behavior, providing insights and research directions in this area, identifying how other users respond and react to abusive content online, and identifying intervention mechanisms to detect and mitigate abusive behavior in online social media. We invite papers investigating any related themes, as well as descriptions of running projects and ongoing work in this space.

Dataset link: https://www.dropbox.com/sh/4mapojr85a6sc76/AABYMkjLVG-HhueAgd0qM9kwa?dl=0

Dataset Paper Reference: Founta, Antigoni Maria, Constantinos Djouvas, Despoina Chatzakou, Ilias Leontiadis, Jeremy Blackburn, Gianluca Stringhini, Athena Vakali, Michael Sirivianos, and Nicolas Kourtellis. "Large scale crowdsourcing and characterization of twitter abusive behavior." In Proceedings of the International AAAI Conference on Web and Social Media, 2018.


The data challenge is open to everyone.

Submission instructions

Submissions should be made via EasyChair and must follow the formatting guidelines for ICWSM-2020. All submissions must be anonymous and conform to AAAI standards for double-blind review. Both short papers (4 pages including references) and posters (2 pages including references) that adhere to the 2-column AAAI format will be considered for review.

Submission Site: https://easychair.org/conferences/?conf=icwsm2020dc

Data Challenge Chairs