Integrity in Social Networks and Media
Integrity 2024, the fifth edition of the Integrity Workshop, is an event co-located with the WSDM conference, taking place on March 8th, 2024 in Mérida, Yucatán (MX) (GMT-6). Integrity 2024 aims to repeat the success of the previous editions: Integrity 2020, Integrity 2021, Integrity 2022, and Integrity 2023.
Workshop Description
In the past decade, social networks and social media sites, such as Facebook and Twitter, have become the default channels for communication and information. The popularity of these online portals has exposed a collection of integrity issues: cases where the content produced and exchanged compromises the quality, operation, and eventually the integrity of the platform. Examples include misinformation, low-quality and abusive content and behavior, and polarization and opinion extremism. There is an urgent need to detect and mitigate the effects of these integrity issues in a timely, efficient, and unbiased manner.
This workshop aims to bring together top researchers and practitioners from academia and industry to engage in a discussion about algorithmic and system aspects of integrity challenges. The WSDM Conference, which combines Data Mining and Machine Learning with research on Web and Information Retrieval, offers the ideal forum for such a discussion, and we expect the workshop to be of interest to everyone in the community. The topic of the workshop is also interdisciplinary: it overlaps with psychology, sociology, and economics, while also raising legal and ethical questions, so we expect it to attract a broader audience.
Workshop topics:
Emerging Threat Vectors: Emerging integrity risks in social media, especially in the context of generative AI, sensitive populations, and elections
Detection & Mitigation of Generative AI Powered Risks: Integrity defense against hyper-realistic misinformation, sophisticated evasion techniques, large scale attacks, leveraging behavioral and content anomaly detection, red teaming
Foundational Models for Integrity: Generative AI for content moderation, (open sourcing) oracle models for integrity, training datasets
Fairness and Responsible AI: Mitigating biases in sensitive attributes such as gender, race, sexual orientation, and political affiliation, overcoming biases in training corpora, preventing discrimination in moderation systems
Scaling to Low Resource Languages: Creative approaches for scaling LLMs to low resource languages, performance guarantees, data needs
Evaluation Best Practices: Best practices and practical approaches for evaluating risks and measuring progress on integrity issues
Addressing Over-enforcement, Subjectivity and Regulatory Needs: Balancing objective content standards, subjective perceptions of harm, and regional regulatory needs
Multimodal Harmful Content Understanding: Challenges, opportunities and successful techniques for multimodal content understanding, dealing with scale for video content - efficient architectures, hardware acceleration, practical heuristics
Human In The Loop for Integrity: Quality, efficiency and challenges for human-in-the-loop processes - content labeling, malicious behavioral patterns, tooling, LLM-assisted labeling
Virality and Creator-Audience Dynamics: Detection and mitigation of subscriber-base growth via low-quality, integrity-violating, or GenAI-produced content; audience expectations and the impact on original creators
Agenda
All times below are in Mérida, MX timezone (GMT-6), on March 8th 2024.
8:30 am -- AI against disinformation and why it is not enough, by Ioannis Kompatsiaris (CERTH-ITI, Greece)
9:15 am -- Unveiling Biases in Televised Debates: A Multimedia and Social Media Analysis, by Kiran Garimella (MIT, USA)
10:30 am -- Exploiting GenAI Tools for Disinformation: Ethnographic Insights from Creators in Brazil and the US, by Beth Goldberg (Google, USA)
11:15 am -- How Authenticity uplevels Integrity for online platforms, by Oscar Rodriguez (LinkedIn, USA)
1:30 pm -- Identifying Violating Facebook Groups, by Behrouz Behmardi and Mike Plumpe (Meta, USA)
2:10 pm -- Leveraging a Self-Supervised Deep Learning Approach for Detecting Fake News Across Various Domains, by Angelica Liguori (ICAR-CNR, Italy)
2:35 pm -- Echoes of Fear: Unraveling the Presence of Fear Speech in Social Media Platforms, by Punyajoy Saha (IIT Kharagpur, India)
3:30 pm -- The Field Guide to Non-Engagement Signals, by Leif Sigerson (Pinterest, USA)
Speaker and Talk details
Ioannis Kompatsiaris
Title: AI against disinformation and why it is not enough
Abstract: In recent years, the proliferation of generative AI technology has revolutionized the landscape of media content creation, enabling even the average user to fabricate convincing videos, images, text, and audio. However, this advancement has also exacerbated the issue of online disinformation, which is spiraling out of control due to the vast reach of social media platforms, sophisticated campaigns, and the proliferation of deepfakes. After an introduction covering the significant impact on key societal values such as Democracy, Public Health and Peace, the talk focuses on techniques to detect visual disinformation: manipulated photos/video, deepfakes, and visuals out of context. While AI technologies offer promising avenues for addressing disinformation, it is clear that they alone are not sufficient to address this complex and multifaceted problem. Limitations of current AI approaches will be discussed, along with broader human-behaviour, societal, and financial challenges that must be addressed to effectively combat online disinformation. Finally, a holistic approach will be presented that encompasses technological, regulatory, and educational interventions and the development of critical thinking.
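To make one of the abstract's detection tasks concrete, here is a minimal sketch (not the speaker's actual system) of an "out of context" check: scoring image-caption consistency with a pretrained CLIP model and flagging low-similarity pairs for human review. The model checkpoint and the threshold are illustrative assumptions.

```python
# A minimal sketch of image-caption consistency scoring with CLIP.
# Not the speaker's method; checkpoint and threshold are assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def context_consistency(image_path: str, claimed_caption: str) -> float:
    """Cosine similarity between an image and the caption it is shared with;
    low scores suggest the visual may be presented out of context."""
    image = Image.open(image_path)
    inputs = processor(text=[claimed_caption], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img @ txt.T).item())

# Hypothetical usage; 0.2 is a placeholder threshold tuned on validation data.
if context_consistency("photo.jpg", "Flood in Venice, March 2024") < 0.2:
    print("Low image-caption consistency; route to fact-checkers.")
```

In practice such a score would only be one signal among many, since legitimate captions can also be loosely related to their images.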
Bio: Dr. Ioannis (Yiannis) Kompatsiaris is the Director of CERTH-ITI and the Head of the Multimedia Knowledge and Social Media Analytics Laboratory (MKLab). His research interests include AI/ML for Multimedia, Semantics, Multimodal and Sensors Data Analysis, Human Computer Interfaces, e-Health, Cultural, Media/Journalism and Security applications. He is the co-author of 222 papers in refereed journals, 69 book chapters, 10 patents and 641 papers in international conferences. Dr. Kompatsiaris has participated (as PI and Project Coordinator) in numerous National and European research programs and direct collaborations with the industry. Currently, he is coordinating the “AI4Media: Artificial Intelligence for the Society and the Media Industry” NoE. He has been the co-organizer of various international conferences and workshops, including the ACM International Conference on Multimedia Retrieval (ACM ICMR) in 2023, and has served as a regular reviewer, associate and guest editor for a number of journals and conferences. He has been an associate editor of IEEE Transactions on Image Processing (2018–2023). He is a member of the National Ethics and Technoethics Committee and of the Scientific Advisory Board of the CHIST-ERA funding programme, and has been an elected member of the IEEE Image, Video and Multidimensional Signal Processing Technical Committee (IVMSP-TC). He is a Senior Member of IEEE and ACM. He is the co-founder of two spin-off companies: Infalia, focusing on data-intensive web services and applications, and CDXi, creating AI and Multimodal Data Fusion solutions for Green and Digital Transformation.
Kiran Garimella
Title: Unveiling Biases in Televised Debates: A Multimedia and Social Media Analysis
Abstract: This talk presents a comprehensive analysis of television shows and their interplay with social media, focusing on a prime-time news debate show in India. We introduce a novel toolkit employing advanced computer vision and speech-to-text techniques for large-scale multimedia analysis of YouTube videos of these debates, which are pivotal in Indian media but criticized for compromised journalistic integrity. Our methodology transcends traditional text analysis, capturing the essence of debates through text, audio, and video frames, providing concrete metrics to assess bias and incivility. Simultaneously, we explore the show's relationship with social media, particularly Twitter. By examining the hashtags used to promote the show and corresponding social media data, we assess the audience composition and the biases present in the show. Our findings indicate a significant bias towards the ruling Bharatiya Janata Party (BJP), with a reciprocal flow of information between the TV show and social media, suggesting a dynamic interplay between traditional and online platforms. This dual analysis reveals alarming levels of bias and incivility in televised debates and their amplification through social media. Our work offers profound implications for public discourse, democratic debate, and the evolving relationship between traditional and digital media platforms. The toolkit and findings provide valuable resources for future research in multimedia analysis and the influence of social media on television content.
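The building blocks of such a multimedia pipeline are easy to illustrate. Below is a rough sketch, not the authors' toolkit: sampling frames from a debate video with OpenCV (for downstream vision models, e.g. estimating per-speaker screen time) and transcribing the audio with the open-source Whisper model (for text-based bias and incivility metrics). The file name and sampling rate are placeholders.

```python
# A rough sketch of frame sampling + speech-to-text for debate videos.
# Not the authors' toolkit; file name and parameters are placeholders.
import cv2
import whisper  # pip install openai-whisper

def sample_frames(video_path: str, every_n_seconds: int = 5):
    """Yield (timestamp_sec, frame) pairs, one frame every N seconds,
    for downstream vision models such as face detection."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25
    step = int(fps * every_n_seconds)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            yield idx / fps, frame
        idx += 1
    cap.release()

model = whisper.load_model("base")
result = model.transcribe("debate_episode.mp4")  # hypothetical file
transcript = result["text"]  # input to text-based bias/incivility metrics
```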
Bio: Kiran Garimella is the Michael Hammer postdoctoral researcher at the Institute for Data, Systems, and Society at MIT. Before joining MIT, he was a postdoc at EPFL, Switzerland. His research focuses on using digital data for social good, including areas like polarization, misinformation and human migration. His work on studying and mitigating polarization on social media won the best student paper awards at WSDM 2017 and WebScience 2017. Kiran received his PhD at Aalto University, Finland, and Masters & Bachelors from IIIT Hyderabad, India. Prior to his PhD, he worked as a Research Engineer at Yahoo Research, Barcelona, and Qatar Computing Research Institute, Doha. More info: https://users.ics.aalto.fi/kiran/
Beth Goldberg
Title: Exploiting GenAI Tools for Disinformation: Ethnographic Insights from Creators in Brazil and the US
Abstract: Generative AI (GenAI) has sparked concerns about detecting and discerning AI-generated content from human-generated content. However, existing literature often assumes a binary divide between 'expert' creators and 'ordinary' consumers. Based on longitudinal ethnographic research conducted between 2022 and 2023, we instead find that GenAI enables 'ordinary' people to create content that meets their individual needs - emotional, social, and, increasingly, financial. We argue for shifting analysis from the public as consumers of AI content to producers who use GenAI creatively, often without a detailed understanding of its underlying technology. Time spent with these creators revealed three key findings: First, GenAI is primarily used by disinformation producers for content creation, rather than information discovery. Second, GenAI lowers the barrier to entry for content creation, enticing new creators and significantly increasing existing creators' output across modalities. Third, a spreading 'influencer millionaire' narrative motivates 'ordinary' individuals to use GenAI to build a brand for financial gain. We analyze how these emergent uses of GenAI yield new or accelerated harms, and the implications they have for platform policies on GenAI disclosure and labeling.
Bio: Beth Goldberg is the Head of Research & Development at Jigsaw, a Google unit that explores threats to open societies. She leads an interdisciplinary team of researchers who investigate online threats from disinformation to violent extremism, and then test novel mitigation strategies. Beth works closely with academics, civil society, and technologists to develop evidence-based interventions that reduce online harms in products at Google and beyond. Beth is also a Senior Fellow and Lecturer at Yale’s Jackson School of Global Affairs. She holds graduate degrees from Yale University and a BS from Georgetown University's School of Foreign Service.
Oscar Rodriguez
Title: How Authenticity uplevels Integrity for online platforms
Abstract: In the digital age, authenticity is not just a value; it's the bedrock of trust and credibility. This presentation delves into the transformative role of verification in bolstering authenticity on LinkedIn, a leading professional networking platform. We'll explore the multifaceted challenges verification seeks to address, from combating misinformation to enhancing user trust. By examining LinkedIn's journey, we underscore the tangible benefits that have emerged, including improved platform integrity and user engagement. The discussion extends beyond LinkedIn, advocating for the adoption of authenticity verification as a pivotal strategy in digital identity management across industries. Join us to uncover why verification is not merely a feature but a fundamental shift towards more genuine and secure online interactions.
Bio: Oscar Rodriguez is a seasoned leader in the internet safety space, with more than ten years dedicated to safeguarding users on some of the world's leading online platforms, including Google, Facebook, and Twitter. As a VP of Product at LinkedIn, he leads the Trust team, focusing on critical initiatives such as identity verification and platform moderation to keep LinkedIn trusted and professional. Throughout his career, he has led teams combating spam, scams, misinformation, election integrity issues, and security threats like malware and phishing.
Behrouz Behmardi and Mike Plumpe
Title: Identifying Violating Facebook Groups
Abstract: This paper presents a technique for identifying violating Facebook Groups using group embedding similarity. The authors discuss the challenges of detecting problematic groups, including high modality and multiple entities with different modalities, as well as noisy labels and differing perceptions of policy among readers. They propose a solution using collaborative filtering and content-based filtering techniques to identify similar groups based on user behavior and attributes. The authors use group embeddings to represent groups in a high-dimensional space and identify similar groups based on patterns of features or characteristics. They then narrow down the list of similar groups by looking at which groups have been co-visited or co-engaged by the same users. The proposed method improves recall by 40% while maintaining precision above 85%, outperforming the current classifier in production.
Bio: Behrouz Behmardi is an Engineering Manager on the Facebook Groups recommendation team. He graduated from Oregon State University with a Ph.D. in ML and has over ten years of experience in research, development, and management across ML infrastructure and ML products. He has hands-on experience building large-scale recommendation and ranking systems across various verticals, including Ads, Trust and Safety, and Marketplace.
Bio: Mike Plumpe is an Engineering Director leading the Facebook Groups engineering team. He graduated from MIT with a Master's degree in Electrical Engineering focusing on Speech and Speaker Recognition. He worked for many years in speech synthesis and speech recognition at Microsoft prior to moving to Facebook/Meta. At Meta, Mike has led teams across Facebook Feed Experience, Instagram Feed & Stories Experience and Ranking, and Facebook Groups.
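The two-stage idea in the abstract above, embedding similarity followed by a co-engagement filter, lends itself to a compact illustration. Below is a minimal sketch, not Meta's production system, assuming precomputed group embeddings and per-group sets of engaged users; all names and the cutoff k are illustrative.

```python
# A minimal sketch of embedding-similarity candidate generation followed by
# a co-engagement filter. Not Meta's production system; inputs are assumed.
import numpy as np

def candidate_violating_groups(embeddings, violating_ids, engaged_users, k=50):
    """embeddings: {group_id: np.ndarray}; violating_ids: known-bad seeds;
    engaged_users: {group_id: set(user_id)} of users who engaged with it."""
    ids = list(embeddings)
    mat = np.stack([embeddings[g] for g in ids])
    mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)
    candidates = set()
    for v in violating_ids:
        q = embeddings[v] / np.linalg.norm(embeddings[v])
        sims = mat @ q                       # cosine similarity to the seed
        top = [ids[i] for i in np.argsort(-sims)[:k] if ids[i] != v]
        # Second stage: keep only groups co-engaged by the seed's users.
        for g in top:
            if engaged_users.get(g, set()) & engaged_users.get(v, set()):
                candidates.add(g)
    return candidates
```

The co-engagement stage is what trades a little recall from the embedding stage for the precision a production classifier needs.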
Angelica Liguori
Title: Leveraging a Self-Supervised Deep Learning Approach for Detecting Fake News Across Various Domains
Abstract: With the widespread use of web-based platforms and social media, the sharing of news has become a global phenomenon. However, the information propagated through these channels is often unverified and subject to individual interpretation, making them potential vehicles for the dissemination of misleading or false news. This phenomenon poses a significant challenge in the identification of deceptive information, especially considering the diverse range of topics that fake news may encompass.
Traditional detection models, tailored for specific domains, frequently perform poorly when used in different contexts. To address this challenge, our work introduces a novel deep learning-based architecture designed to enhance cross-domain detection capabilities by generating high-level features. Our approach aims to mitigate the impact of fake news across diverse domains.
To evaluate the effectiveness of our proposed solution, we conducted initial experiments on two benchmark datasets. Preliminary results demonstrate promising outcomes, showing the potential of our approach in addressing the complex task of identifying and combating fake news across various domains.
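The paper's exact architecture is not given here; the following is a generic sketch of the underlying idea only: learn high-level features with a self-supervised reconstruction objective, then reuse the encoder for fake-news classification across domains. All dimensions, layer choices, and names are illustrative.

```python
# A generic sketch of self-supervised feature learning for cross-domain
# fake-news detection. Not the paper's architecture; all sizes are assumed.
import torch
import torch.nn as nn

class FeatureEncoder(nn.Module):
    def __init__(self, in_dim=768, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden))
        self.decoder = nn.Linear(hidden, in_dim)

    def forward(self, x):
        z = self.encoder(x)           # high-level, domain-general features
        return z, self.decoder(z)     # reconstruction for the SSL objective

encoder = FeatureEncoder()
clf = nn.Linear(256, 2)  # fake-vs-real head, trained on labeled source data

x = torch.randn(32, 768)                       # e.g., article embeddings
z, x_hat = encoder(x)
ssl_loss = nn.functional.mse_loss(x_hat, x)    # no labels needed
logits = clf(z)                                # supervised classification step
```

The point of the split is that the reconstruction loss can be optimized on unlabeled articles from any domain, so the encoder is not tied to the labeled source domain the classifier head sees.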
Bio: Angelica Liguori received the Ph.D. degree in Information and Communication Technologies (ICT) from the University of Calabria, Italy, in 2024. Her research interests include machine and deep learning. She is particularly interested in developing solutions for anomaly detection and generation in datasets.
Punyajoy Saha
Title: Echoes of Fear: Unraveling the Presence of Fear Speech in Social Media Platforms
Abstract: Social media has become an integral part of our everyday life. It allows the rapid dissemination of information and opinions. However, sometimes people use such media to express a range of phenomena that often overlap and intersect, and include a variety of types of speech that cause different harms. Such speech is collectively known as harmful speech. In this talk, we will focus on an important form of harmful speech: fear speech. We explore the prevalence of fear speech across two different platforms, Gab and WhatsApp, and two different countries, the US and India. Our analysis further compares fear speech with the well-known concept of hate speech. Our findings necessitate the creation of better moderation policies that can handle such complex phenomena.
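Fear speech detection is typically framed as text classification. Below is a hedged sketch of that framing with a fine-tuned transformer; the checkpoint, label set, and input are illustrative placeholders, not the speaker's actual models or data, which are described in his publications.

```python
# A sketch of fear-speech detection as transformer text classification.
# Checkpoint, labels, and input are placeholders, not the speaker's setup.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=3)  # normal / hate / fear

texts = ["example post from Gab or WhatsApp"]       # placeholder input
batch = tokenizer(texts, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    probs = model(**batch).logits.softmax(dim=-1)   # head untrained: demo only
```

A multilingual base model is a natural starting point here because the study spans English and Indian-language content.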
Bio: Punyajoy Saha is a PMRF (Prime Minister's Research Fellow) research scholar in the Department of Computer Science and Engineering at IIT Kharagpur, West Bengal. Currently, he is doing research with Prof. Animesh Mukherjee as his supervisor. He is also a member of the research group CNeRG. His current research interests lie at the intersection of computational social science and natural language processing. He works on developing better human-in-the-loop mitigation strategies for hate speech and other forms of harmful speech like fear speech. He has published several papers in conferences like AACL, AAAI, ICWSM, IJCAI, The Web Conference, Hypertext, WebSci, CSCW, NeurIPS, PNAS, COLING-LREC, EACL and EMNLP. He was part of three tutorials at ICWSM 2021, AAAI 2022 and WSDM 2023 and has given several invited talks. More about him can be found on his website: https://punyajoy.github.io/.
Leif Sigerson
Title: The Field Guide to Non-Engagement Signals
Abstract: We know that optimizing purely for user engagement can promote low-quality, harmful content. However, there are significant technical barriers to incorporating "Non-Engagement" signals (e.g., surveys, human evaluation) in content ranking. To address these barriers, we published a Field Guide to Non-Engagement Signals, based on a daylong workshop with industry professionals from 8 social media platforms. In this talk, I'll introduce the Field Guide and dive deep into two applications of the guide: optimizing for user wellbeing, and using GenAI to scale content quality signals.
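For intuition, here is a minimal sketch of the general idea of blending engagement predictions with a non-engagement quality signal in a final ranking score. The weights and signal names are illustrative assumptions, not the Field Guide's prescription.

```python
# A toy blend of engagement predictions with a survey-based quality signal.
# Weights and signal names are illustrative, not the Field Guide's advice.
def final_score(p_click: float, p_save: float, survey_quality: float,
                w_engage: float = 0.7, w_quality: float = 0.3) -> float:
    engagement = 0.5 * p_click + 0.5 * p_save   # model-predicted engagement
    return w_engage * engagement + w_quality * survey_quality

# Two items with identical engagement get reordered by surveyed quality:
print(final_score(0.9, 0.4, 0.2))   # engaging but low-quality
print(final_score(0.9, 0.4, 0.8))   # equally engaging, higher quality
```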
Bio: Leif Sigerson, PhD, is a senior data scientist at Pinterest, where he works on LLMs, surveys, and data labeling. Prior to Pinterest, he did his PhD in social psychology, focusing on wellbeing and social connection on Twitter.
[Closed] Call for Papers
The Integrity workshop accepted technical manuscripts and talk proposals to be presented during the event and included in the Integrity Workshop proceedings. Relevant dates:
Paper submission: 22 Jan 2024
Paper notification: 1 Feb 2024
Workshop date: 8 March 2024
Link for the submission and further Call-For-Papers instructions: https://easychair.org/cfp/Integrity24
Organizers
Lluis Garcia-Pueyo, Meta
Symeon Papadopoulos, CERTH-ITI, Greece
Prathyusha Senthil Kumar, Meta
Aristides Gionis, KTH Royal Institute of Technology, Sweden
Panayiotis Tsaparas, University of Ioannina, Greece
Vasilis Verroios, Meta
Giuseppe Manco, ICAR-CNR, Italy
Anton Andryeyev, Meta
Stefano Cresci, IIT-CNR, Italy
Timos Sellis, Archimedes / Athena Research Center, Greece
Anthony McCosker, Swinburne University