ICWSM 2023 Data Challenge:

Temporal Social Data

Colocated with ICWSM 2023

June 5th (Limassol, Cyprus)

Location: Hybrid (Limassol, Cyprus and online)

Important Dates

Data challenge announcement: February 15, 2023

Paper submission deadline: April 17, 2023 (AoE; extended from April 1)

Paper notifications: May 1, 2023 (AoE)

Camera-ready: May 6, 2023 (AoE)

ICWSM 2023 Data Challenge Workshop day: June 5, 2023

In the 4th ICWSM 2023 Data Challenge, we invite papers that model and help explain the temporal and social dynamics of the social tasks in the provided datasets, or that identify other important dimensions for studying temporal effects on those datasets (Call for Papers).

Program Schedule

5th of June

All times are in Limassol, Cyprus local time (GMT+3)

Location: Limassol, Cyprus, St. Raphael Resort and Marina, Room: Phoenix

The online stream is shared with registered attendees. Contact us for any further information: data.challenge@icwsm.org

Call for Participation 

We welcome submissions on various topics that address the temporal shift of data. The data challenge includes three tracks: 1) Time-aware models and social trends, 2) Temporal dataset, and 3) Non-archival track.

Track 1: Time-Aware Models and Social Trends (archival)

We encourage submissions that characterize the incorporation of temporal data in different social-based tasks, at both the data and model levels. Tasks may include, but are not limited to, modeling temporal characteristics of hate speech, using different versions of edited Wikipedia articles, enhanced tone detection, the effectiveness of temporal data in detecting the veracity of rumors, and stance detection dynamics. Submitted papers may focus on evaluating models' performance when the time variable is taken into account, studying the evolution of specific phenomena, examining the distribution shift between the training data and live update data, or focusing on specific concept shifts.

We especially welcome contributions that examine transitions in communities or content based on social data (i.e., temporal networks or content) to analyze social phenomena. For example, contextual or static embeddings can be used to analyze content across time and communities, where the main goal could be to study changes in descriptions of genders and ethnic groups, the representation of people using contextualized semantics, the factors that cause the formation and persistence of trends, the dynamics of sentiment and topics, or the manifestation of different social phenomena across communities.


For this data challenge, we ask authors to use datasets of their choice, selected from those released as ICWSM Dataset Papers between 2009 and 2022. You can find a list of these papers/datasets, categorized by topic, in the table below. The work should comply with the FAIR Data Principles: submitted work can use one of these datasets as a baseline and collect the temporal data using open-resource tools such as archive.org or any other archival repository.

Participants are also welcome to use their own or other open datasets, such as Pushshift Reddit (2005-2019), the Wikipedia historical archive, and Diachronic Language Models from Twitter (2019-2022). It is also possible to construct a temporal dataset for your topic of interest using the features provided by various platforms' APIs. In general, the datasets need to comply with the platform API regulations and follow the general principles that guarantee the transparency of datasets, such as datasheets for datasets.

The set of proposed datasets for this data challenge can be accessed [here].

ICWSM 2023 data_Challenge

Track 2: Temporal Dataset (archival)

Participants are also welcome to submit their own temporal datasets; accepted dataset papers will be part of the full proceedings. The work should comply with the FAIR Data Principles.

Datasets and metadata must be published using a dataset sharing service (e.g., Zenodo, datorium, Dataverse, or any other dataset sharing service that indexes your dataset and metadata and increases the re-findability of the data) that provides a DOI for the dataset, which should be included in the dataset paper submission. Ethical considerations must be discussed, and datasets need to comply with the platform API regulations and follow the general principles that guarantee the transparency of datasets, such as datasheets for datasets.

Track 3: Non-Archival 

The non-archival track seeks recently accepted or published work as well as work in progress. Submissions do not need to be anonymized and will not go through the review process. The submission should clearly indicate the original venue and will be accepted. Non-archival submissions will not be included in the proceedings; instead, they will be given a presentation or poster slot based on the organizers' evaluation and the alignment of the work with the data challenge's main theme.

If the work has been published or is under submission, submit the PDF version of the work (i.e., there is no need to reformat it in the data challenge format). Follow the submission website and make sure you select the non-archival track when uploading your PDF file.


The data challenge is open to everyone.

Submission Instructions

Submissions should be made via EasyChair and must follow the formatting guidelines for ICWSM-2023. All archival submissions will go through a double-blind review process.

All archival submissions must be anonymous and conform to AAAI standards for double-blind review. Both short papers (4 pages including references) and posters (2 pages including references) that adhere to the 2-column AAAI format will be considered for review. All papers must be submitted as high-resolution PDF files, formatted in AAAI two-column, camera-ready style, for US Letter (8.5" x 11") paper, using Type 1 or TrueType fonts (available templates: AAAI 2023 Author Kit on Overleaf or AAAI 2023 Author Kit.zip [Word | LaTeX]).

Submission Website: https://easychair.org/conferences/?conf=icwsm-dc2023 

The top-ranked papers presented at the Data Challenge will be awarded Distinguished Paper Awards and will be included in the ICWSM Workshop Proceedings, published openly online (TBA).

Contact us: data.challenge@icwsm.org

Keynote Speakers

Arkaitz Zubiaga

The impact of time on social media research and classification

Classification algorithms are frequently used to support the organisation, filtering and moderation of social media content, including for tasks like disinformation detection, abusive language detection and sentiment analysis. In this talk, I will discuss the impact of time on social media classification tasks. Social media content and metadata are bound to change and evolve, which challenges the performance stability of social media classification models over different time periods, i.e. a classification model trained on year Y may not be as effective and accurate on year Y+5. I will cover work assessing the impact of time on social media classification from two different angles: (i) social media data can be deleted and sometimes altered, which challenges the ability to rehydrate social media datasets with their original properties and in turn to conduct reproducible social media research, and (ii) social media content changes over time, due to changes in platforms leading to different posting conventions as well as societal evolution leading to changes in language use, among others, which ultimately poses a challenge to the development of temporally persistent social media classifiers. I will discuss these two points based on insights drawn from a series of longitudinal social media datasets.

Arkaitz Zubiaga is a senior lecturer (associate professor) at Queen Mary University of London, where he leads the Social Data Science lab. His research revolves around Social Data Science, interdisciplinary research bridging Computational Social Science and Natural Language Processing. He is particularly interested in linking online data with events in the real world, among other things to tackle problematic issues on the Web and social media that can have a damaging effect on individuals or society at large, such as hate speech, misinformation, inequality, biases and other forms of online harm. He serves on the editorial boards of 7 journals, is a regular SPC member of top conferences in computational social science and NLP, and has published over 130 peer-reviewed papers, including 50+ journal articles.

Website: http://www.zubiaga.org/

Francesco Barbieri

Updating and Evaluating Language Models Over Time

Advances in language modeling have led to remarkable accuracy on several NLP tasks, but most benchmarks used for evaluation are static, ignoring the practical setting under which training data from the past and present must be used for generalizing to future data. Consequently, training paradigms also ignore the time sensitivity of language and essentially treat all text as if it was written at a single point in time. Recent studies have shown that in a dynamic setting, where the test data is drawn from a different time period than the training data, the accuracy of such static models degrades as the gap between the two periods increases. The lack of diachronic specialization is especially concerning in contexts such as social media, where topics of discussion and new terms change rapidly. This talk will focus on evaluating and updating language models in the domain of Social Media.

Francesco is a Senior Research Scientist at Snap Research, and he is interested in understanding social media communications. His current research focuses on developing NLP tools to represent and evaluate social media text, with special attention to temporal shifts.

The Venue

This workshop is colocated with ICWSM 2023