ASCAI (ANR-DFG PRCI)

Active and batch Segmentation, Clustering, and seriation: toward unified foundations in AI

The ASCAI project

Unsupervised learning is one of the most fundamental problems of machine learning and, more generally, of artificial intelligence. In a broad sense, it amounts to learning some unobserved latent structure over data. This structure may be of interest per se, or may serve as an important stepping stone in a complex data analysis pipeline - since large amounts of unlabeled data are more common than costly labeled data. Arguably, one of the cornerstones of unsupervised learning is clustering, where the aim is to recover a partition of the data into homogeneous groups. Besides vanilla clustering, unsupervised learning encompasses a large variety of other related problems, such as hierarchical clustering, where the group structure is more complex and reveals both the backbone and the fine-grained organization of the data; segmentation, where the shape of the clusters is constrained by side information; and ranking or seriation problems, where no actual cluster structure exists but there is some implicit ordering of the data. All these problems have already found countless applications, and interest in these methods keeps growing with the amount of available unlabeled data. One example is crowdsourcing - where individuals answer a subset of questions, and where, depending on the context, one might want to, e.g., cluster them according to their field of expertise, rank them according to their performance, or seriate them according to their affinities. Such problems are extremely relevant for recommender systems - where individuals are users and questions are items - and for social network analysis.

The analysis of unsupervised learning procedures has a long history, with roots in both the computer science and the mathematics communities. Recent bridges between these two communities have led to groundbreaking advances in the theoretical foundations of vanilla clustering. We believe that, because of the pervasive nature of clustering, these advances hold the key to a deep impact on the broader field of unsupervised learning. In this proposal, we first aim at propagating these recent advances in vanilla clustering to problems where the latent structure is either more complex or more constrained. We will consider problems of increasing latent structure complexity - starting from hierarchical clustering and heading toward ranking, seriation, and segmentation - and propose new algorithms that build on each other, focusing on the interfaces between these problems. As a result, we expect to provide new methods that are valid under weaker assumptions than those usually made - e.g. parametric assumptions - while also being able to adapt to the unknown intrinsic difficulty of the problem.

Moreover, many modern unsupervised learning applications are essentially of an online nature - and sometimes decisions have to be made sequentially on top of that. For instance, consider a recommender system that sequentially recommends items to users. In this context, where sequential, active recommendations are made, it is important to leverage the latent structure underlying the individuals. While the fields of unsupervised learning and of sequential, active learning are both thriving, research at their crossroads has been conducted mostly separately by each community - leading to procedures that can be improved. A second aim of this proposal is therefore to bring together the fields of unsupervised learning and active learning, in order to propose new algorithms that leverage the unknown latent structure more efficiently in a sequential manner. We will consider the same unsupervised learning problems as in the batch learning side of the proposal. We will focus on developing algorithms that take full advantage of new advances in clustering and of our own future work in batch learning.

ASCAI is an ANR-DFG PRCI project that spans four research institutes (see below). The speakers are Nicolas Verzelen (French side) and Alexandra Carpentier (German side).

Events

ASCAI Kick-Off Workshop in Montpellier: March 1-2, 2022 at INRAE Montpellier, Institut Agro campus, 2 place Pierre Viala. We will meet in Building 11, Room 204 (see the campus map). Here is the program.

2nd ASCAI Workshop in Munich: February 28 - March 2, 2023 at the Technical University of Munich, School of Computation, Information and Technology (Garching campus). Here is the program (including location information).

3rd ASCAI Workshop in Potsdam: February 20-21, 2024 at the University of Potsdam. More details here (including program and location information).

Four places

INRAE Montpellier

Nicolas Verzelen (speaker)

Julie Josse

Etienne Roquain

Isabelle Sanchez

Emmanuel Pilliat  

Universität Potsdam

Alexandra Carpentier

Pierre Menard 


Université Paris-Saclay

Elisabeth Gassiat

Christophe Giraud

Gilles Blanchard

Zacharie Naulet

Guillermo Durand


Technische Universität München

Debarghya Ghoshdastidar

Satyaki Mukherjee

Leena Vankadara


!!!There will be postdoc job openings for this project!!!

Please contact us if you are interested - see the email addresses on our webpages.

Acknowledgements: The ASCAI project is funded by the ANR (Agence Nationale de la Recherche) and the DFG (Deutsche Forschungsgemeinschaft) through the PRCI program AAPG 2021, and runs from Feb. 2022 to Jan. 2025.