Integrity in Social Networks and Media

Integrity 2024, the fifth edition of the Integrity Workshop, is co-located with the WSDM conference and takes place on March 8th, 2024 in Mérida, Yucatán (MX) (GMT-6). Integrity 2024 aims to repeat the success of the previous editions: Integrity 2020, Integrity 2021, Integrity 2022, and Integrity 2023.

Workshop Description

In the past decade, social networks and social media sites, such as Facebook and Twitter, have become the default channels of communication and information. The popularity of these online portals has exposed a collection of integrity issues: cases where the content produced and exchanged compromises the quality, operation, and eventually the integrity of the platform. Examples include misinformation, low-quality and abusive content and behaviors, and polarization and opinion extremism. There is an urgent need to detect and mitigate the effects of these integrity issues in a timely, efficient, and unbiased manner.

This workshop aims to bring together top researchers and practitioners from academia and industry to engage in a discussion about algorithmic and system aspects of integrity challenges. The WSDM Conference, which combines Data Mining and Machine Learning with research on Web and Information Retrieval, offers the ideal forum for such a discussion, and we expect the workshop to be of interest to everyone in the community. The topic of the workshop is also interdisciplinary, as it overlaps with psychology, sociology, and economics, while also raising legal and ethical questions, so we expect it to attract a broader audience.

Workshop topics:

Agenda

All times below are in Mérida, MX timezone (GMT-6), on March 8th 2024.


Speaker and Talk details

Ioannis Kompatsiaris

Bio: Dr. Ioannis (Yiannis) Kompatsiaris is the Director of the Information Technologies Institute, Research Director at CERTH-ITI and the Head of Multimedia Knowledge and Social Media Analytics Laboratory. His research interests include AI/Machine Learning for multimedia analysis, Semantics (multimedia ontologies and reasoning), Social Media and Big Data Analytics, Multimodal and Sensors Data Analysis, Human Computer Interfaces, e-Health, Arts and Cultural, Media/Journalism, Environmental and Security applications. He is the co-author of 178 papers in refereed journals, 63 book chapters, 8 patents and 560 papers in international conferences. Since 2001, Dr. Kompatsiaris has participated in 88 National and European research programs, in 31 of which he has been the Project Coordinator. He has also been the PI in 15 contracts from the industry. He has been the co-chair of various international conferences and workshops, including the 13th IEEE Image, Video, and Multidimensional Signal Processing (IVMSP 2018) Workshop, and has served as a regular reviewer, associate and guest editor for a number of journals and conferences; he is currently an associate editor of IEEE Transactions on Image Processing. He is a member of the National Ethics and Technoethics Committee, the Scientific Advisory Board of the CHIST-ERA funding programme and an elected member of the IEEE Image, Video and Multidimensional Signal Processing - Technical Committee (IVMSP - TC).

Kiran Garimella

Title: Unveiling Biases in Televised Debates: A Multimedia and Social Media Analysis

Abstract: This talk presents a comprehensive analysis of television shows and their interplay with social media, focusing on a prime-time news debate show in India. We introduce a novel toolkit employing advanced computer vision and speech-to-text techniques for large-scale multimedia analysis of YouTube videos of these debates, which are pivotal in Indian media but criticized for compromised journalistic integrity. Our methodology transcends traditional text analysis, capturing the essence of debates through text, audio, and video frames, providing concrete metrics to assess bias and incivility. Simultaneously, we explore the show's relationship with social media, particularly Twitter. By examining the hashtags used to promote the show and corresponding social media data, we assess the audience composition and the biases present in the show. Our findings indicate a significant bias towards the ruling Bharatiya Janata Party (BJP), with a reciprocal flow of information between the TV show and social media, suggesting a dynamic interplay between traditional and online platforms. This dual analysis reveals alarming levels of bias and incivility in televised debates and their amplification through social media. Our work offers profound implications for public discourse, democratic debate, and the evolving relationship between traditional and digital media platforms. The toolkit and findings provide valuable resources for future research in multimedia analysis and the influence of social media on television content.

Bio: Kiran Garimella is the Michael Hammer postdoctoral researcher at the Institute for Data, Systems, and Society at MIT. Before joining MIT, he was a postdoc at EPFL, Switzerland. His research focuses on using digital data for social good, including areas like polarization, misinformation, and human migration. His work on studying and mitigating polarization on social media won the best student paper awards at WSDM 2017 and WebScience 2017. Kiran received his PhD at Aalto University, Finland, and Masters & Bachelors from IIIT Hyderabad, India. Prior to his PhD, he worked as a Research Engineer at Yahoo Research, Barcelona, and Qatar Computing Research Institute, Doha. More info: https://users.ics.aalto.fi/kiran/

Beth Goldberg

Title: Exploiting GenAI Tools for Disinformation: Ethnographic Insights from Creators in Brazil and the US

Abstract: Generative AI (GenAI) has sparked concerns about detecting and discerning AI-generated content from human-generated content. However, existing literature often assumes a binary divide between 'expert' creators and 'ordinary' consumers. Based on longitudinal ethnographic research between 2022 and 2023, we instead find that GenAI supports 'ordinary' people to create content to meet their individual needs: emotional, social, and, increasingly, financial. We argue for shifting analysis from the public as consumers of AI content to producers who use GenAI creatively, often without a detailed understanding of its underlying technology. Time spent with these creators revealed three key findings: First, GenAI is primarily used for content creation by disinformation producers, rather than information discovery. Second, GenAI lowers the barrier to entry for content creation, enticing new creators and significantly increasing existing creators' output across modalities. Third, a spreading 'influencer millionaire' narrative motivates 'ordinary' individuals to use GenAI to build a brand for financial gain. We analyze how these emergent uses of GenAI yield new or accelerated harms, and the implications they have for platform policies on GenAI disclosure and labeling.

Bio: Beth Goldberg is the Head of Research & Development at Jigsaw, a Google unit that explores threats to open societies. She leads an interdisciplinary team of researchers who investigate online threats from disinformation to violent extremism, and then test novel mitigation strategies. Beth works closely with academics, civil society, and technologists to develop evidence-based interventions that reduce online harms in products at Google and beyond. Beth is also a Senior Fellow and Lecturer at Yale's Jackson School of Global Affairs. She holds graduate degrees from Yale University and a BS from Georgetown University's School of Foreign Service.

Oscar Rodriguez

Title: How Authenticity uplevels Integrity for online platforms

Abstract: In the digital age, authenticity is not just a value; it's the bedrock of trust and credibility. This presentation delves into the transformative role of verification in bolstering authenticity on LinkedIn, a leading professional networking platform. We'll explore the multifaceted challenges verification seeks to address, from combating misinformation to enhancing user trust. By examining LinkedIn's journey, we underscore the tangible benefits that have emerged, including improved platform integrity and user engagement. The discussion extends beyond LinkedIn, advocating for the adoption of authenticity verification as a pivotal strategy in digital identity management across industries. Join us to uncover why verification is not merely a feature but a fundamental shift towards more genuine and secure online interactions.

Bio: Oscar Rodriguez is a seasoned leader in the internet safety space, with more than ten years dedicated to safeguarding users on some of the world's leading online platforms, including Google, Facebook, and Twitter. As a VP of Product at LinkedIn, he leads the Trust team, focusing on critical initiatives such as identity verification and platform moderation to keep LinkedIn trusted and professional. Throughout his career, he has led teams combating spam, scams, misinformation, threats to election integrity, and security threats like malware and phishing.

Mike Plumpe


Bio: Mike Plumpe is an Engineering Director leading the Facebook Groups engineering team. He graduated from MIT with a Master's degree in Electrical Engineering focusing on Speech and Speaker Recognition. He worked for many years in speech synthesis and speech recognition at Microsoft prior to moving to Facebook/Meta. At Meta, Mike has led teams across Facebook Feed Experience, Instagram Feed & Stories Experience and Ranking, and Facebook Groups.

Behrouz Behmardi

Title: New Technique for Identifying Violating Facebook Groups

Abstract: This paper presents a technique for identifying violating Facebook Groups using group embedding similarity. The authors discuss the challenges of detecting problematic groups, including high modality and multiple entities with different modalities, as well as noisy labels and multiple perceptions of policy by readers. They propose a solution using collaborative filtering and content-based filtering techniques to identify similar groups based on user behavior and attributes. The authors use group embeddings to represent groups in a high-dimensional space and identify similar groups based on patterns of features or characteristics. They then narrow down the list of similar groups by looking at which groups have been co-visited or co-engaged by the same users. The proposed method improves recall by 40% while maintaining precision above 85%, outperforming the current classifier in production.

Bio: Behrouz Behmardi is an Engineering Manager on the Facebook Groups recommendation team. He graduated from Oregon State University with a Ph.D. in ML and has over ten years of experience in research, development, and management in ML infra and ML product. He has hands-on experience building and developing large-scale recommendation and ranking systems across various verticals, including Ads, Trust and Safety, and Marketplace.

Angelica Liguori

Title: Leveraging a Self-Supervised Deep Learning Approach for Detecting Fake News Across Various Domains

Abstract: With the widespread use of web-based platforms and social media, the sharing of news has become a global phenomenon. However, the information propagated through these channels is often unverified and subject to individual interpretation, making them potential vehicles for the dissemination of misleading or false news. This phenomenon poses a significant challenge in the identification of deceptive information, especially considering the diverse range of topics that fake news may encompass.

Traditional detection models, tailored for specific domains, frequently perform poorly when used in different contexts. To address this challenge, our work introduces a novel deep learning-based architecture designed to enhance cross-domain detection capabilities by generating high-level features. Our approach aims to mitigate the impact of fake news across diverse domains.

To evaluate the effectiveness of our proposed solution, we conducted initial experiments on two benchmark datasets. Preliminary results demonstrate promising outcomes, showing the potential of our approach in addressing the complex task of identifying and combating fake news across various domains.

Bio: Angelica Liguori received the Ph.D. degree in Information and Communication Technologies (ICT) from the University of Calabria, Italy, in 2024. Her research interests include machine and deep learning. She is particularly interested in developing solutions in the area of Anomaly Detection/Generation in data sets.

Punyajoy Saha

Title: Echoes of Fear: Unraveling the Presence of Fear Speech in Social Media Platforms

Abstract: Social media has become an integral part of our everyday life. It allows the rapid dissemination of information and opinions. However, sometimes people use such media to express a range of phenomena that often overlap and intersect, and include a variety of types of speech that cause different harms. Such speech is collectively known as harmful speech. In this talk, we will focus on an important form of harmful speech: fear speech. We will explore the prevalence of fear speech across two different platforms, Gab and WhatsApp, and two different countries, the US and India. Our analysis further compares fear speech with the well-known concept of hate speech. Our findings necessitate the creation of better moderation policies that can handle such complex phenomena as well.

Bio: Punyajoy Saha is a PMRF (Prime Minister's Research Fellow) research scholar in the Department of Computer Science and Engineering at IIT Kharagpur, West Bengal. Currently, he is doing research with Prof. Animesh Mukherjee as his supervisor. He is also a member of the research group CNeRG. His current research interests lie in the intersection of computational social science and natural language processing. He is working on developing better human-in-the-loop mitigation strategies for hate speech and other forms of harmful speech like fear speech. He has published several papers in venues such as AACL, AAAI, ICWSM, IJCAI, Web Conference, Hypertext, WebSci, CSCW, NeurIPS, PNAS, COLING-LREC, EACL and EMNLP. He was part of three tutorials at ICWSM 2021, AAAI 2022 and WSDM 2023 and has given several invited talks. More about him can be found on his website: https://punyajoy.github.io/.

Leif Sigerson

Title: The Field Guide to Non-Engagement Signals

Abstract: We know that optimizing purely for user engagement can promote low-quality, harmful content. However, there are significant technical barriers to incorporating "Non-Engagement" signals (e.g., surveys, human evaluation) in content ranking. To address these barriers, we published a Field Guide to Non-Engagement signals, based on a daylong workshop with industry professionals from 8 social media platforms. In this talk, I'll introduce the Field Guide and dive deep on two applications of the guide: optimizing for user wellbeing, and using GenAI to scale content quality signals.

Bio: Leif Sigerson, PhD, is a senior data scientist at Pinterest, where he works on LLMs, surveys, and data labeling. Prior to Pinterest, he did his PhD in social psychology, focusing on wellbeing and social connection on Twitter.

Jonathan Stray

Title: Can We Design Recommenders To Reduce Polarization?

Abstract: It's now commonplace to say that ranking algorithms used by major social media and news platforms are tearing us apart, but what does this mean, what is the evidence, and what could we do differently? We'll look at theories of how social media ranking algorithms can affect conflict and polarization, and dig up the data which might clarify what is actually happening. Then we'll ask, what would it mean to do better, and how could we design algorithms to get there?

Bio: Jonathan Stray is a Senior Scientist at the Center for Human Compatible AI at UC Berkeley, where he works on the design of AI-driven media with a particular interest in well-being and conflict. Previously, he taught the dual master's degree in computer science and journalism at Columbia University, worked as an editor at the Associated Press, and built document mining software for investigative journalism.

[Closed] Call for Papers

The Integrity workshop is accepting technical manuscripts and talk proposals to be presented during the event and included in the Integrity Workshop proceedings. Relevant dates:

Link for the submission and further Call-For-Papers instructions: https://easychair.org/cfp/Integrity24

Organizers

Supporters