Arkaitz Zubiaga 

The impact of time on social media research and classification

Classification algorithms are frequently used to support the organisation, filtering and moderation of social media content, including for tasks like disinformation detection, abusive language detection and sentiment analysis. In this talk, I will discuss the impact of time on social media classification tasks. Social media content and metadata are bound to change and evolve, which challenges the performance stability of social media classification models across time periods, i.e. a classification model trained on data from year Y may not be as effective or accurate on data from year Y+5. I will cover work assessing the impact of time on social media classification from two different angles: (i) social media data can be deleted and sometimes altered, which challenges the ability to rehydrate social media datasets with their original properties and, in turn, to conduct reproducible social media research; and (ii) social media content changes over time, owing to platform changes that lead to different posting conventions and to societal evolution that alters language use, among other factors, which ultimately challenges the development of temporally persistent social media classifiers. I will discuss these two points based on insights drawn from a series of longitudinal social media datasets.
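
The kind of temporal degradation described above can be probed with a simple experiment: train a classifier on posts from one year and evaluate it on each subsequent year. Below is a minimal sketch of that setup, assuming a hypothetical CSV of labelled posts with text, label and year columns and a TF-IDF plus logistic regression pipeline; it illustrates the evaluation idea, not the experimental setup used in the work presented.

    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.pipeline import make_pipeline

    # Hypothetical dataset of labelled social media posts with a 'year' column.
    posts = pd.read_csv("posts.csv")

    # Train on a single early year, as in the "trained on year Y" scenario.
    train = posts[posts.year == 2015]
    model = make_pipeline(TfidfVectorizer(min_df=2),
                          LogisticRegression(max_iter=1000))
    model.fit(train.text, train.label)

    # Test on each later year; macro-F1 typically drops as the gap grows.
    for year in range(2016, 2021):
        test = posts[posts.year == year]
        print(year, round(f1_score(test.label, model.predict(test.text),
                                   average="macro"), 3))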

Arkaitz Zubiaga is a senior lecturer (associate professor) at Queen Mary University of London, where he leads the Social Data Science lab. His research revolves around Social Data Science, interdisciplinary research bridging Computational Social Science and Natural Language Processing. He is particularly interested in linking online data with events in the real world, among other applications to tackle problematic issues on the Web and social media that can have a damaging effect on individuals or society at large, such as hate speech, misinformation, inequality, biases and other forms of online harm. He serves on the editorial boards of 7 journals, is a regular SPC member of top conferences in computational social science and NLP, and has published over 130 peer-reviewed papers, including 50+ journal articles.

Website: http://www.zubiaga.org/

Francesco Barbieri

Updating and Evaluating Language Models Over Time

Advances in language modeling have led to remarkable accuracy on several NLP tasks, but most benchmarks used for evaluation are static, ignoring the practical setting in which training data from the past and present must be used to generalize to future data. Consequently, training paradigms also ignore the time sensitivity of language and essentially treat all text as if it were written at a single point in time. Recent studies have shown that in a dynamic setting, where the test data is drawn from a different time period than the training data, the accuracy of such static models degrades as the gap between the two periods increases. The lack of diachronic specialization is especially concerning in contexts such as social media, where topics of discussion and new terms change rapidly. This talk will focus on evaluating and updating language models in the domain of social media.
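
One simple way to quantify the degradation the abstract refers to is to measure a static language model's perplexity on text drawn from different time periods: perplexity tends to rise on text written after the model's training data was collected. The sketch below illustrates this, assuming a HuggingFace causal language model (gpt2 used purely as a placeholder) and hypothetical time-bucketed posts; it is an illustration of the evaluation idea, not the speaker's actual protocol.

    import math
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")   # placeholder model
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    # Hypothetical corpora of posts grouped by the period they were written in.
    buckets = {
        "2019": ["sample post written in 2019 ...", "another 2019 post ..."],
        "2021": ["sample post written in 2021 ...", "another 2021 post ..."],
    }

    @torch.no_grad()
    def perplexity(texts):
        # Average per-token cross-entropy over the bucket, exponentiated.
        losses = [model(ids, labels=ids).loss.item()
                  for ids in (tok(t, return_tensors="pt").input_ids
                              for t in texts)]
        return math.exp(sum(losses) / len(losses))

    for period, texts in buckets.items():
        print(period, round(perplexity(texts), 2))

Updating the model would then amount to continued pretraining on the newer buckets, after which the same evaluation can check that perplexity on recent periods drops without older periods regressing.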

Francesco is a Senior Research Scientist at Snap Research, interested in understanding social media communications. His current research focuses on developing NLP tools to represent and evaluate social media text, with special attention to temporal shifts.

Website: https://scholar.google.com/citations?user=B10uzI4AAAAJ&hl=en&oi=ao