Social media is emerging as the predominant communication platform. Globally, the number of active users is estimated to be 4.62 billion (58.4% of the total population). As a consequence, the need for automatic processing, understanding and monitoring of communication patterns has significantly increased. Detection of misinformation, polarization and malicious communities is also a crucial step to identify hoaxes and monitor online content. To address these issues, the research community has proposed social media analysis algorithms, which, so far, have been primarily based on graph and network methods.
In this context, online content is becoming increasingly available in mixed modalities (text, images, videos, etc.). The convergence of Computer Vision (CV) and Natural Language Processing (NLP) has made it possible to improve textual and image understanding and, more recently, to link textual and visual information to enable multi-modal understanding and retrieval. However, these recent efforts have seldom been applied to social media analysis, which therefore misses the benefit of approaches that jointly understand text and other modalities and provide an effective understanding of communication patterns. Indeed, there is no unified social media analysis methodology that seamlessly integrates network analysis with multimodal processing of visual and textual data.
The MUSMA project will bring about a radical change by investigating and developing innovative analysis models that jointly process and understand textual and visual content. MUSMA will enable:
processing, understanding, selection and monitoring of online content, to extract relevant information on a set of specific topics or subjects;
detection of misinformation and manipulated content, through the development of AI techniques specifically designed to identify network-wide phenomena (e.g. emerging communities and viral content) and to assess the credibility of online content;
analysis of the main drivers of information consumption and of the dynamics of information and misinformation flow, and ranking of information sources according to their topical influence.
At the core of the project lies a new unifying synergy between Network Science, NLP and CV, using supervised neural networks (going beyond convolutional autoencoders, Transformer-based NNs, Capsules and graph-based networks) and symbolic representations.
This 2-year project brings together the research experience and expertise of three internationally recognized research teams: the AILAB of the University of Udine, the Data and Complexity for Society Lab at Roma Sapienza, and AImageLab at UNIMORE, encompassing NLP, vision and network science. The project proposes foundational research with direct practical and industrial exploitation. We foresee an enormous potential benefit for society, as well as for paving the way to new research directions in several areas of AI.
MUSMA lies at the intersection of three central fields of AI, namely Network Science, Natural Language Processing and Computer Vision. The project aims at providing a unified approach for integrating multimodal understanding in social network analysis tasks, with the final goal of creating novel tools for addressing social media challenges like topic modeling, polarization and misinformation detection. The project brings together algorithms and innovations for empowering network and community analysis with the automatic understanding of data shared on social media, which is increasingly available in mixed visual (images, videos) and textual (sentences, tags) types. We address this challenge by putting forward three overarching goals: 1) the extraction and automatic understanding of relevant content from social media, through multi-modal integration and joint processing of text, images and videos; 2) the usage of such information for detecting polarization patterns in user behavior as well as misinformation and content manipulation; 3) the integration of data extracted from multiple modalities and their analysis over time, to enable monitoring and understanding of the information flow.
Objective 1) deals with the extraction of relevant information from social media and the joint understanding of multiple modalities. The goal is to design algorithms capable of extracting and fully comprehending multi-modal content shared on social media, by jointly taking into account text, images and videos. We will develop dedicated algorithms for extracting relevant information from social media data, including the detection of concepts, topics and entities in text, and the extraction of semantic information and text from images and videos. We aim to go beyond the current state of the art (SOTA), addressing challenges which are specific to the social network context.
Objective 2) will employ and test the developed algorithms in two social network analysis tasks of fundamental importance in today's society: detecting polarization patterns in online communities, and detecting content manipulation and the spread of misinformation. In particular, misinformative content will be recognized using the outcomes of multi-modal integration together with signals coming from a selection of trusted sources. Advanced quantitative metrics will be applied to explore how individuals consume news and what drives the spread of (mis)information. This will allow us to rank information sources according to their topical influence and to assess their informative action, with significant social impact.
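As an illustration of what ranking sources by topical influence could look like, the following is a minimal sketch and not the project's metric. It assumes, purely for the example, that influence on a topic can be approximated by a source's share of the total engagement on that topic; the function name, the record layout and the sample data are all hypothetical.

```python
from collections import defaultdict


def rank_sources_by_topic(interactions):
    """Rank sources per topic by their share of engagement.

    `interactions` is an iterable of (source, topic, engagement_count)
    tuples. This layout is an assumption made for the sketch.
    """
    # Accumulate engagement per topic and per source.
    totals = defaultdict(lambda: defaultdict(int))
    for source, topic, count in interactions:
        totals[topic][source] += count

    rankings = {}
    for topic, per_source in totals.items():
        topic_total = sum(per_source.values())
        if topic_total == 0:
            # No engagement recorded: nothing to rank for this topic.
            rankings[topic] = []
            continue
        shares = [(src, cnt / topic_total) for src, cnt in per_source.items()]
        # Highest share first; ties broken alphabetically for determinism.
        shares.sort(key=lambda item: (-item[1], item[0]))
        rankings[topic] = shares
    return rankings


# Hypothetical toy data: (source, topic, number of reshares).
sample = [
    ("source_a", "health", 120),
    ("source_b", "health", 40),
    ("source_a", "economy", 10),
    ("source_c", "economy", 30),
    ("source_b", "health", 40),
]
rankings = rank_sources_by_topic(sample)
for topic, ranked in sorted(rankings.items()):
    print(topic, ranked)
```

A share-of-engagement score is only a stand-in: it ignores network structure and content credibility, both of which the project intends to take into account.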
Finally, Objective 3) deals with the integration of data and features extracted from different modalities, linking information extracted from text, images and videos either through shared embedding spaces or by translating one modality into another (e.g. by translating images to tags or sentences). The construction of such multimodal information, and its analysis over time, will allow the monitoring and understanding of information flows coming from social media, together with the analysis of network-level phenomena such as the emergence of novel communities and echo chambers and the spread of viral content across different platforms. Notably, such phenomena have so far been studied by focusing on network patterns alone, rather than by analyzing the shared content.
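To make the idea of linking modalities through a shared embedding space more concrete, here is a minimal sketch under stated assumptions. It presumes that a text encoder and an image encoder already map their inputs to vectors of the same dimension; the three-dimensional vectors, the image identifiers and the function names below are hand-made and hypothetical, standing in for real encoder outputs. Linking is then reduced to ranking images by cosine similarity to the text vector.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # Convention for zero vectors in this sketch.
    return dot / (norm_u * norm_v)


def link_text_to_images(text_vec, image_vecs):
    """Return (image_id, similarity) pairs, most similar first."""
    scored = [(image_id, cosine(text_vec, vec))
              for image_id, vec in image_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


# Hypothetical embeddings; real ones would come from trained encoders.
post_text = [0.9, 0.1, 0.0]
images = {
    "img_protest": [0.8, 0.2, 0.1],
    "img_cat": [0.0, 0.1, 0.9],
    "img_chart": [0.3, 0.9, 0.1],
}
ranking = link_text_to_images(post_text, images)
best_image = ranking[0][0]
print(best_image, ranking)
```

The alternative mentioned in the text, translating one modality into another, would instead convert images to tags or sentences first and compare them in the textual domain.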