Research

"Probabilistic preference learning with the Mallows rank model" (with V. Vitelli, Ø. Sorensen, A. Frigessi, E. Arjas) [2018, Journal of Machine Learning Research]

Abstract: Ranking and comparing items is crucial for collecting information about preferences in many areas, from marketing to politics. The Mallows rank model is among the most successful approaches to analyse rank data, but its computational complexity has limited its use to a particular form based on Kendall distance. We develop new computationally tractable methods for Bayesian inference in Mallows models that work with any right-invariant distance. Our method performs inference on the consensus ranking of the items, also when based on partial rankings, such as top-k items or pairwise comparisons. We prove that items that none of the assessors has ranked do not influence the maximum a posteriori consensus ranking, and can therefore be ignored. When assessors are many or heterogeneous, we propose a mixture model for clustering them in homogeneous subgroups, with cluster-specific consensus rankings. We develop approximate stochastic algorithms that allow a fully probabilistic analysis, leading to coherent quantifications of uncertainties. We make probabilistic predictions on the class membership of assessors based on their ranking of just some items, and predict missing individual preferences, as needed in recommendation systems. We test our approach using several experimental and benchmark datasets.


"A Bayesian Mallows approach to non-transitive pair comparison data: how human are sounds?" (with V. Vitelli, E. Arjas, N. Barrett and A. Frigessi) [2019, The Annals of Applied Statistics]

Abstract: We propose a Bayesian probabilistic method to learn preferences from non-transitive pairwise comparison data, as happens when one (or more) individual preferences in the data contradict what is implied by the others. Lack of transitivity easily arises when the items compared are perceived as rather similar and when the pairwise comparisons are presented sequentially without allowing for consistency checks. We build an extension of the Bayesian Mallows model (Vitelli et al., 2017) in order to handle non-transitive data, by adding a latent layer of uncertainty which captures the generation of preference misreporting. We then develop a mixture extension of the Mallows model, able to learn individual preferences in a heterogeneous population, which is particularly important in applications.

We are interested in learning how listeners perceive sounds as having human origins. An experiment was performed with a series of electronically synthesized sounds, and listeners were asked to compare them in pairs. The result of our analysis is relevant for composers and sound designers whose aim is to understand how computer-generated sounds can sound more human.


"The Impact of 3-D Sound Spatialisation on Listeners’ Understanding of Human Agency in Acousmatic Music" (with N. Barrett) [2018, Journal of New Music Research]

Abstract: Commonly we hear sound without experiencing the associated visual activity to tell us how the sounds were made or from where they originate. In the larger context of electroacoustic music, and more specifically in acousmatic music where causal visual information is removed, it is interesting to investigate how a sense of human agency may be evoked by sound alone. An intuitive starting point would be to assume that the listener identifies source and cause through a sound’s close proximity to a known archetype, yet composers often work with materials offering less obvious source clues, while expanding the spatial image over many loudspeakers. New technologies allow us to accurately control the movement of sound in space, and because we understand our own bodies in spatial terms, it is natural to ask how a sound’s spatial behaviour influences our listening understanding. This paper investigates two research questions: can listeners identify human agency in 3-D sound, and if so, what are the most salient features involved in this process? Besides being of interest to electroacoustic composers, the topic is also relevant to the audio industry utilising high density loudspeaker arrays in cinema and other spaces where the projection of high resolution spatial imagery is possible.


"Model-based learning from preference data" (with Q. Liu, I. Scheel, V. Vitelli, A. Frigessi) [2019, Annual Review of Statistics and Its Application]

Abstract: Preference data occurs when assessors express comparative opinions about a set of items, by rating, ranking, pair comparing, liking or clicking. The purpose of preference learning is to (i) infer on the shared consensus preference of a group of users, sometimes called rank aggregation; or (ii) estimate for each user her individual ranking of the items, when the user indicates only incomplete preferences; this is an important part of recommender systems. We provide an overview of probabilistic approaches to preference learning, including the Mallows, Plackett-Luce, Bradley-Terry models and collaborative filtering, and some of their variations. We illustrate, compare and discuss the use of these models by means of an experiment in which assessors rank potatoes, and with a simulation. The purpose of this paper is not to recommend the use of one best method, but to present a palette of different possibilities for different questions and different types of data. 


"Dependence properties and Bayesian inference for asymmetric multivariate copulas" (with J. Arbel and S. Girard) [2019, Journal of Multivariate Analysis]

Abstract: We study a broad class of asymmetric copulas introduced by Liebscher (2008) as a combination of multiple, usually symmetric, copulas. The main thrust of the paper is to provide new theoretical properties, including exact tail dependence expressions and stability properties. A subclass of Liebscher copulas obtained by combining Fréchet copulas is studied in more detail. We establish further dependence properties for copulas of this class and show that they are characterized by an arbitrary number of singular components. Furthermore, we introduce a novel iterative representation for general Liebscher copulas which de facto ensures uniform margins, thus relaxing a constraint of Liebscher’s original construction. We also show that this iterative construction proves useful for inference by developing an Approximate Bayesian Computation sampling scheme. This inferential procedure is demonstrated on simulated data. [Main codes available here]


"BayesMallows: An R Package for the Bayesian Mallows model" (with Ø. Sorensen, Q. Liu, and V. Vitelli) [2020, The R Journal]

Abstract: BayesMallows is an R package for analyzing the Mallows model and its finite mixture extension in a Bayesian probabilistic framework. The Mallows model is a well-known model for learning from data in the form of rankings. It is grounded on the idea that the probability density of an observed ranking decreases exponentially fast as its distance to the location parameter increases. The distances supported in BayesMallows are Footrule, Spearman, Kendall, Cayley, Hamming and Ulam, allowing users to fully exploit the rich expressiveness of the Mallows model. This is made possible also thanks to the implementation of approximations of the partition function of the model. Although developed for computing the posterior distribution of the model, these partition function approximations may be of interest in their own right. The package is capable of handling non-standard data: partial rankings and pair comparisons, even in cases including non-transitive patterns. The advantage of the Bayesian paradigm in this context comes from its ability to coherently quantify posterior uncertainties of estimates of any quantity of interest, which are made fully available to the user, also in the form of tools for posterior visualization. [The BayesMallows R package is available on CRAN]
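
As a rough illustration of the idea in the abstract above (this is not code from the BayesMallows package), the sketch below evaluates a Mallows distribution by brute force for a handful of items. It assumes the common parametrization in which the probability of a ranking r is proportional to exp(-(alpha / n) * d(r, rho)), with rho the location parameter and alpha a scale parameter; the function names are hypothetical. The partition function is computed by summing over all n! permutations, which is only feasible for small n and is exactly why approximations are needed in practice.

```python
from itertools import permutations
from math import exp


def footrule(r, rho):
    """Footrule distance: sum of absolute rank differences."""
    return sum(abs(a - b) for a, b in zip(r, rho))


def spearman(r, rho):
    """Spearman distance: sum of squared rank differences."""
    return sum((a - b) ** 2 for a, b in zip(r, rho))


def kendall(r, rho):
    """Kendall distance: number of item pairs ordered differently."""
    n = len(r)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if (r[i] - r[j]) * (rho[i] - rho[j]) < 0
    )


def mallows_pmf(rho, alpha, distance):
    """Exact Mallows probabilities over all rankings of len(rho) items.

    Assumed parametrization: P(r) proportional to
    exp(-(alpha / n) * distance(r, rho)). Brute force, small n only.
    """
    n = len(rho)
    weights = {
        r: exp(-alpha / n * distance(r, rho))
        for r in permutations(range(1, n + 1))
    }
    z = sum(weights.values())  # partition function by full enumeration
    return {r: w / z for r, w in weights.items()}


if __name__ == "__main__":
    rho = (1, 2, 3, 4)  # rank vector of the location parameter
    pmf = mallows_pmf(rho, alpha=2.0, distance=kendall)
    for r, p in sorted(pmf.items(), key=lambda kv: -kv[1])[:3]:
        print(r, round(p, 4))
```

Rankings are represented here as tuples of ranks, so rho must be a tuple for the lookup pmf[rho] to work. The most probable ranking is rho itself, and the probability falls off as the chosen distance from rho grows.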


"Understanding dependency patterns in structural and functional brain connectivity through fMRI and DTI data" (with S. Ranciati, S. D’Angelo, and A. Mira) [2019, Contributions to Neural Data Science, Springer volume “Proceedings in Mathematics & Statistics”]

Abstract: Neuroscience and neuroimaging have been providing new challenges for statisticians and quantitative researchers in general. As datasets of increasing complexity and dimension become available, the need for statistical techniques to analyze brain-related phenomena becomes prominent. In this paper, we delve into data coming from functional Magnetic Resonance Imaging (fMRI) and Diffusion Tensor Imaging (DTI). The aim is to combine information from both sources in order to learn possible patterns of dependencies among regions of interest (ROIs) of the brain. First, we infer positions of these regions in a latent space, using the observed structural connectivity provided by the DTI data, to understand if physical spatial coordinates suitably reflect how ROIs are effectively interconnected. Secondly, we inspect Granger causality in the fMRI data in order to capture patterns of activations between ROIs. Then, we compare results from the analysis on these datasets, to find a link between functional and structural connectivity. Preliminary findings show that latent space positions well reflect hemisphere separation of the brain but are not perfectly connected to all the other structural partitions (that is, lobe, cortex, etc.); furthermore, activations of ROIs inferred from fMRI data are tied to observed structural connections derived from DTI scans.


"Informative priors for the consensus ranking in the Mallows model" (with I. Antoniano-Villalobos) [2022, Adv. Publ. Bayesian Analysis]

Abstract: The aim of this work is to study the problem of prior elicitation for the consensus ranking in the Mallows model with Spearman's distance, a popular distance-based model for rankings or permutation data. Previous Bayesian inference for such a model has been limited to the use of the uniform prior over the space of permutations. We present a novel strategy to elicit informative prior beliefs on the location parameter of the model, discussing the interpretation of hyper-parameters and the implication of prior choices for the posterior analysis. [Main codes available upon request]


"The Role of Majority Status in Close Election Studies" (with M. Alpino) [2023, Political Analysis]

Abstract: Many studies exploit close elections in a regression discontinuity framework to identify partisan effects, that is, the effect of having a given party in office on some outcome. We argue that, when conducted on single-member districts, such design may identify a compound effect: the partisan effect, plus the majority status effect, that is, the effect of being represented by a member of the legislative majority. We provide a simple strategy to disentangle the two, and test it with simulations. Finally, we show the empirical relevance of this issue using real data. 

"Efficient and accurate inference for mixtures of Mallows models with Spearman distance" (with C. Mollica, V. Astuti, L. Tardella) [to appear, Statistics and Computing]

Abstract: The Mallows model occupies a central role in parametric modelling of ranking data to learn preferences of a population of judges. Despite the wide range of metrics for rankings that can be considered in the model specification, the choice is typically limited to the Kendall, Cayley or Hamming distances, due to the closed-form expression of the related model normalizing constant. This work instead focuses on the Mallows model with Spearman distance. An efficient and accurate EM algorithm for estimating finite mixtures of Mallows models with Spearman distance is developed, by relying on a twofold data augmentation strategy aimed at i) enlarging the applicability of Mallows models to samples drawn from heterogeneous populations; ii) dealing with partial rankings affected by diverse forms of censoring. Additionally, a novel approximation of the model normalizing constant is introduced to support the challenging model-based clustering of rankings with a large number of items. The inferential ability of the EM scheme and the effectiveness of the approximation are assessed by extensive simulation studies. Finally, we show that the application to three real-world datasets endorses our proposals also in the comparison with competing mixtures of ranking models. 


Ongoing Work 

"A mixture of experts Mallows model" (with L. Modugno and C. Mollica) [in preparation]

"A Hierarchical Bradley-Terry model" (with A. Frigessi