Research

Information vs noise in complex systems

A complex system is a large collection of units that interact locally and, without explicit coordination, give rise to emergent macroscopic behaviour that may be hard to grasp from the behaviour of the individual units. A parsimonious description of a complex system can therefore be invaluable for connecting local interactions with emergent properties. This can often be achieved with methods that filter out noise and identify the interactions that are genuinely informative, such as random matrix theory and the statistical validation of networks.

Random Matrix Theory

As surprising as it may sound, matrices with random entries turn out to be useful in a vast range of applications. The idea dates back to John Wishart and Eugene Wigner, who independently realized (in very different contexts) that the spectral properties of large matrices can be well reproduced by those of suitably chosen ensembles of random matrices. The differences between the average spectral properties of an ensemble of random matrices and those of a (large) target matrix of interest can help to identify the statistically relevant properties of the latter.
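As a minimal illustration of this idea (a generic textbook sketch, not taken from the papers below), the following Python snippet compares the eigenvalue spectrum of an empirical correlation matrix against the Marchenko-Pastur bulk predicted for pure noise; eigenvalues above the upper bulk edge would signal genuine, non-noise structure.

```python
import numpy as np

def marchenko_pastur_edges(n_vars, n_obs, sigma2=1.0):
    """Bulk edges of the Marchenko-Pastur law for the correlation matrix
    of n_vars pure-noise variables estimated from n_obs observations."""
    q = n_vars / n_obs
    lam_min = sigma2 * (1 - np.sqrt(q)) ** 2
    lam_max = sigma2 * (1 + np.sqrt(q)) ** 2
    return lam_min, lam_max

rng = np.random.default_rng(0)
n_vars, n_obs = 100, 500
X = rng.standard_normal((n_obs, n_vars))   # i.i.d. noise: no true correlations
C = np.corrcoef(X, rowvar=False)           # empirical correlation matrix
eigvals = np.linalg.eigvalsh(C)

lam_min, lam_max = marchenko_pastur_edges(n_vars, n_obs)
# Eigenvalues above lam_max would indicate statistically relevant structure;
# for pure noise, (almost) the entire spectrum sits inside the bulk.
outliers = eigvals[eigvals > lam_max]
```

Run on real data (e.g., stock returns), the same comparison typically reveals a handful of large eigenvalues well outside the bulk, which carry the genuinely informative correlations.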

The core of my PhD thesis was devoted to developing the theory of products of rectangular random matrices with Gaussian entries (PRE 2010, APPB 2011), which are used extensively in wireless telecommunication applications (MIMO channels), and I showed in a case study on financial data how this can be used to filter out the noise in the non-symmetric correlation matrices between variables belonging to two distinct statistical systems (EPJB 2012). I contributed to a number of applied Random Matrix Theory projects devoted to the study of financial correlation matrices (PRE 2011, JSTAT 2012, AEL 2014), and to a few theoretical papers devoted to quantum cavities (APPB 2011), isotropic matrices (PRE 2013), and sums of random matrices (JSTAT 2015).

I have published a short introductory book to Random Matrix Theory (Springer, 2018) with Pierpaolo Vivo and Marcel Novaes. A free version of the book is available here.

Statistically validated networks

Complex networks can be huge: several real-world networks (e.g., the Internet) consist of millions or even billions of nodes and links. This poses a number of computational challenges, as analysing — or even visualising — such networks can be a daunting task.

The statistical validation of networks is a set of techniques that seek to identify the links in a weighted network of interactions that are statistically significant against a null hypothesis of partially random interactions, and to filter out the remaining ones. This idea was pioneered by Rosario Mantegna’s group in Palermo (Italy).

My main contribution to this stream of literature is a methodology — named the Pólya filter after the Hungarian mathematician who came up with the combinatorial model it is based on — that allows one to tune the intensity of the filtering applied to a network (Nature Comm, 2019). The Pólya filter can be applied either as a moderate filter, leading to very rich “backbones” (i.e., the set of links validated by the method), or as a greedy filter, leading to very sparse backbones containing only a few anomalous links.
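To give a concrete, deliberately simplified flavour of link validation, the sketch below tests each of a node's link weights against a plain binomial null, in which the node's total strength is allocated uniformly at random across its links. This is only an illustrative special case written for this page; the actual Pólya filter replaces this null with a Pólya urn scheme whose reinforcement parameter tunes the strictness of the filtering.

```python
import numpy as np
from scipy.stats import binom

def binomial_link_pvalues(weights):
    """P-value of each (integer) link weight of a single node under a null
    where the node's total strength s is split uniformly at random among
    its k links: p = P(Binomial(s, 1/k) >= w)."""
    w = np.asarray(weights)
    s, k = int(w.sum()), len(w)
    return binom.sf(w - 1, s, 1.0 / k)

# A node carrying most of its strength on one link out of five:
pvals = binomial_link_pvalues([40, 2, 3, 1, 4])
backbone = pvals < 0.01   # only the dominant link survives the filtering
```

Lowering the significance threshold makes the filter greedier, i.e., yields sparser backbones containing only the most anomalous links.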

I have applied the Pólya filter and other statistical validation methods in a variety of contexts, e.g., to detect anomalous patterns of citations between journals (which helps to identify so-called "citation cartels") (Sci Rep, 2021), and to detect interlinkages between topics in "knowledge networks" inferred from news articles (ESWA, 2021).

Computational social science

The increased availability of data detailing all aspects of our social interactions has led to a dramatic change in the Social Sciences, making it possible to quantitatively test and validate social theories that were previously developed only with qualitative methods. Broadly speaking, the data-driven modelling of social systems is referred to as Computational Social Science. Within this broad stream of research, my research interests focus mainly on the "Science of Science" (i.e., the quantitative analysis of scientific research and publications), on trust and reputation in the online domain, and on opinion dynamics.

Science of Science

Why do certain scientific careers take off? What are the determinants of scientific impact? Can we predict the scientific success of a researcher and their work? These questions have been asked for decades by researchers in Bibliometrics and, more recently, in the so-called "Science of Science" (the data-driven investigation of scientific research and publications). The answers to such questions keep being refined over the years — thanks to the ever increasing availability of data detailing all aspects of academic life — and their relevance only keeps growing, given the competitiveness of today's academia.

My main focus in this area is relating the success of scientific careers and publications to the social network of relationships that underpins the production of science (e.g., collaborations and coauthorships). In a recent study, my colleagues and I have demonstrated that a researcher’s chances of becoming a top-cited scientist in their field heavily depend on their early career network of collaborators. In fact, we found that coauthoring even a single paper with a top scientist during the first three career years is a very strong predictor of future success, even twenty years later (Nature Comm, 2019).

Network analysis can also be leveraged to investigate citation patterns. In particular, it can help us to measure the response of the academic community to the ever increasing emphasis being placed on bibliometric indicators (such as the h-index) to quantify scientific impact. For instance, we have observed a steady increase of reciprocity in author-author citation networks over several decades (EPJ Data Science, 2019), and we have detected anomalous citation patterns — often resulting in "citation cartels" aimed at boosting impact factors — in journal-journal citation networks (Sci Rep, 2021).
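As a toy example of the kind of quantity involved, the sketch below computes a basic global reciprocity measure: the fraction of directed links whose reverse link also exists. (The actual studies use more refined, null-model-corrected measures; this is just a self-contained illustration.)

```python
def reciprocity(edges):
    """Fraction of directed edges (i, j) whose reverse (j, i) also exists."""
    edge_set = set(edges)
    mutual = sum((j, i) in edge_set for (i, j) in edge_set)
    return mutual / len(edge_set)

# Toy author-author citation network: A and B cite each other, A cites C.
r = reciprocity([("A", "B"), ("B", "A"), ("A", "C")])
# r == 2/3: two of the three directed links are reciprocated
```

Tracking such a quantity year by year, and comparing it against what a random null model would produce, is what reveals a genuine trend rather than a mechanical artefact of network growth.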

Online trust & reputation

The Digital Economy has increasingly transformed into a "platform society" where individuals exchange knowledge, goods, and resources on a peer-to-peer (P2P) basis, with no central authority. For example, nowadays many people look for accommodation on Airbnb rather than booking a hotel, use ride hailing apps such as Uber and Lyft instead of calling a cab, or choose to outsource small jobs and tasks through freelance marketplaces such as TaskRabbit. How is it possible that so many people trust complete strangers to the point of staying in their home or getting in their car? The key is reputation. Most platforms rely on peer-review mechanisms that allow users to develop a digital reputation score. For example, after an Airbnb stay hosts and guests both have the opportunity to review each other, leaving feedback that everyone else on the platform can read.

Understanding and modelling the determinants of online reputation was the main theme of my EPSRC Fellowship, and it remains one of the core subjects of my ongoing research. I am investigating this from a number of perspectives. First, we have shown how the P2P paradigm of bidirectional ratings (as in the Airbnb example of hosts and guests rating each other) allows users to easily inflate their reputation scores through the exchange of ratings, a phenomenon we refer to as the "reciprocity bias" (Sci Rep, 2017). Second, I am collaborating with a group of psychologists at UCL to run a series of online experiments aimed at understanding the information consumption patterns of platform users. Our findings show that users are rarely capable of incorporating the wealth of information typically available on a profile (e.g., star ratings, number of reviews, text reviews, or ID verification) into their decision-making, and instead rely on a very small set of cues (PLoS ONE, 2018), which they nevertheless use to accurately discern the diagnostic information in a peer’s profile (Frontiers, 2021).

I am also conducting a number of data-driven studies aimed at detecting and measuring other types of biases in online platforms, most importantly whether all participants receive the same opportunities, or, instead, some users are discriminated based, e.g., on their race or gender (EPJ Data Science, 2019).

Opinion dynamics

The last few years have been marked by the rise of "fake news" and online misinformation, leading to an age of increased opinion polarisation even on well-established facts (e.g., the effectiveness of vaccines). Understanding these phenomena falls under the purview of opinion dynamics — an interdisciplinary area of research at the interface between the social sciences, computer science, and physics. The goal of opinion dynamics is to develop quantitative models that explain how individuals and societies form opinions based on the information available to them, and how such information is shared across social networks (both online and offline).

My work in opinion dynamics combines agent-based modelling and data-driven approaches. Recently, I have focused on determining the conditions under which individuals in a social network hold accurate beliefs about true vs false statements (Sci Rep, 2020), on how the topology of social networks may influence the diversity of opinions in a population (RSOS, 2021), and on how networks of influence between media outlets can explain their agenda-setting (ANS, 2020).
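For a flavour of the agent-based side, here is a minimal voter-model sketch on a toy network — a standard textbook model, not one of the specific models used in the papers above. At each step a randomly chosen agent copies the opinion of a random neighbour; on a finite connected network the population eventually reaches consensus.

```python
import random

def voter_step(opinions, neighbors, rng):
    """One asynchronous voter-model update: a randomly chosen agent
    copies the current opinion of one of its random neighbours."""
    i = rng.choice(list(opinions))
    opinions[i] = opinions[rng.choice(neighbors[i])]

# Toy social network: a ring of 6 agents holding binary opinions.
rng = random.Random(42)
neighbors = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
opinions = {i: i % 2 for i in range(6)}

steps = 0
while len(set(opinions.values())) > 1 and steps < 100_000:
    voter_step(opinions, neighbors, rng)
    steps += 1
```

Varying the network topology in such models — and measuring how long opinion diversity survives before consensus — is precisely the kind of question the topology/diversity line of work addresses.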

Economic complexity

Aggregating the interactions of large numbers of "microscopic" economic agents into large-scale macroeconomic laws represents a major challenge. Mainstream Economics often circumvents this issue by resorting to ad hoc aggregation schemes, which, however, fail to take into account the heterogeneity of real-world economic agents. Computationally oriented research instead favours the development of agent-based models that simulate real-world complexity, but these are often beyond the reach of analytical description and effectively become "black boxes".

The Statistical Physics of disordered systems provides a compromise solution between these two approaches. Indeed, in many cases the large-scale properties of an interacting system are well approximated, when the system's size is large, by the statistical properties of a system with suitably chosen random interactions. Following this approach we have derived theoretical results which demonstrate that the increasing complexity of modern supply chains can lead to the collapse of industrial production in economies where firms compete to provide consumers with technologically sophisticated goods (JSTAT 2017). In a previous paper, we have shown that arbitrage opportunities (i.e., risk-free profit opportunities) might emerge in markets whose participants employ exceedingly different models to price financial instruments (JSTAT 2013).