Session of 13 February 2023
Session organized by Estelle Kuhn and Marie-Luce Taupin
Venue: IHP, Hermite lecture hall
14.00: Tabea Rebafka (Sorbonne Université, LPSM)
Title: Model-based clustering of multiple networks with a hierarchical algorithm
Abstract: We consider the problem of clustering multiple networks that do not share the same set of vertices into groups of networks with similar topology. A statistical model-based approach relying on a finite mixture of stochastic block models (SBMs) is proposed. A clustering is obtained by maximizing the integrated classification likelihood (ICL) criterion. This is done by a hierarchical agglomerative algorithm that starts from singleton clusters and successively merges clusters of networks. The algorithm thus computes a sequence of nested clusterings, which can be represented by a dendrogram that provides valuable insight into the collection of networks. In a Bayesian framework, model selection is automatic, since the algorithm stops when the best number of clusters is reached. When carefully implemented, the algorithm is computationally efficient. Merging groups of networks requires overcoming the label-switching problem of the SBM, that is, matching the block labels across graphs. To address this problem, a new tool is proposed based on a comparison of the graphons of the associated SBMs. The clustering approach is assessed on synthetic data, and an application to a collection of ecological networks illustrates the interpretability of the results.
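The label-matching step can be illustrated with a simplified, hypothetical sketch (not the tool presented in the talk). It assumes each SBM is given by block proportions `pi` and a connectivity matrix `theta`; once its blocks are laid out on [0, 1] in some order, an SBM defines a step-function graphon on [0, 1]². The sketch computes the squared L2 distance between two such step functions on their common refinement, then tries every ordering of the second model's blocks and keeps the closest one. Function names are illustrative, and brute force over permutations is only viable for a handful of blocks.

```python
from itertools import permutations

import numpy as np


def step_graphon_sq_l2(pi1, theta1, pi2, theta2):
    """Squared L2 distance on [0, 1]^2 between the step-function graphons of
    two SBMs, with blocks laid out on [0, 1] in the order given."""
    pi1, pi2 = np.asarray(pi1, float), np.asarray(pi2, float)
    theta1, theta2 = np.asarray(theta1, float), np.asarray(theta2, float)
    # Breakpoints of the two partitions of [0, 1] induced by the block proportions
    b1 = np.concatenate(([0.0], np.cumsum(pi1)))
    b2 = np.concatenate(([0.0], np.cumsum(pi2)))
    # Common refinement: both graphons are constant on each cell of this grid
    grid = np.unique(np.concatenate((b1, b2)))
    widths = np.diff(grid)
    mids = (grid[:-1] + grid[1:]) / 2
    # Block of each cell under each model (clipped to guard against rounding)
    k1 = np.clip(np.searchsorted(b1, mids, side="right") - 1, 0, len(pi1) - 1)
    k2 = np.clip(np.searchsorted(b2, mids, side="right") - 1, 0, len(pi2) - 1)
    diff = theta1[np.ix_(k1, k1)] - theta2[np.ix_(k2, k2)]
    return float(np.sum(np.outer(widths, widths) * diff**2))


def match_blocks(pi1, theta1, pi2, theta2):
    """Brute-force search for the ordering of SBM 2's blocks whose graphon is
    closest to SBM 1's; best_perm[i] is the original label of the SBM 2 block
    placed in position i."""
    pi2 = np.asarray(pi2, float)
    theta2 = np.asarray(theta2, float)
    best_perm, best_dist = None, np.inf
    for perm in permutations(range(len(pi2))):
        p = list(perm)
        d = step_graphon_sq_l2(pi1, theta1, pi2[p], theta2[np.ix_(p, p)])
        if d < best_dist:
            best_perm, best_dist = p, d
    return best_perm, best_dist


# The second model is the first one with its two block labels swapped
perm, dist = match_blocks([0.3, 0.7], [[0.8, 0.1], [0.1, 0.4]],
                          [0.7, 0.3], [[0.4, 0.1], [0.1, 0.8]])
print(perm, dist)  # [1, 0] 0.0
```

Comparing graphons rather than raw label vectors makes the comparison insensitive to how each SBM happened to number its blocks; the distance function also accepts models with different numbers of blocks.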
15.00: Vincent Runge (UEVE - Université Paris-Saclay, LaMME)
Title: Fast Change-Point Detection in Multivariate Time Series Using Functional Pruning Methods
Abstract: We consider the problem of detecting multiple change-points in time series. In recent years, many efficient dynamic programming algorithms have been proposed for a wide class of time-series models. The speed-up from quadratic to close-to-linear complexity (observed empirically in simulation studies) is made possible by pruning methods. Most of these algorithms rely on the idea of functional pruning. The functional cost is particularly easy to update when it depends on a single parameter, a case that corresponds, roughly, to univariate time series.
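For readers less familiar with pruned dynamic programming, here is a minimal univariate sketch. It implements optimal partitioning for changes in mean with a squared-error cost and a penalty `beta` per change, pruned with the inequality-based rule of PELT rather than the functional pruning discussed in the talk; both return the exact optimum but discard candidates differently. The function name and interface are illustrative.

```python
import numpy as np


def pelt_mean(y, beta):
    """Optimal partitioning for changes in mean (squared-error cost, penalty
    beta per change) with PELT's inequality-based pruning.
    Returns the change-points (start index of each new segment) and F(n)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Prefix sums give each segment cost in O(1)
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y**2)))

    def cost(a, b):
        # Sum of squared deviations from the segment mean on y[a:b]
        return s2[b] - s2[a] - (s1[b] - s1[a]) ** 2 / (b - a)

    F = np.empty(n + 1)
    F[0] = -beta  # so that a segmentation with k changes pays k * beta
    last = np.zeros(n + 1, dtype=int)
    candidates = [0]  # possible positions of the last change before t
    for t in range(1, n + 1):
        vals = [F[tau] + cost(tau, t) + beta for tau in candidates]
        best = int(np.argmin(vals))
        F[t] = vals[best]
        last[t] = candidates[best]
        # PELT rule (K = 0 for this cost): keep tau only if F(tau) + C(tau, t) <= F(t)
        candidates = [tau for tau, v in zip(candidates, vals) if v - beta <= F[t]]
        candidates.append(t)
    # Backtrack through the stored last change-points
    cps, t = [], int(last[n])
    while t > 0:
        cps.append(t)
        t = int(last[t])
    return cps[::-1], float(F[n])


cps, total = pelt_mean([0.0] * 50 + [5.0] * 50, beta=10.0)
print(cps)  # [50]
```

Without the pruning line the loop scans all previous positions and costs O(n²); in favourable cases the candidate set stays small, which is the kind of acceleration the abstract refers to.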
In this talk, we explore the pruning capacity of dynamic programming in the multivariate setting, which is a challenging problem in computational geometry. We focus in particular on change-point problems whose dynamic programming formulation involves a two-dimensional parameter space. Many simple statistical problems fall into this setting: changes in bivariate independent time series, changes in simple linear regression, changes in mean and variance… We propose a new class of pruning rules based on updating a set of points of interest in the 3D space of the functional cost. These points are solutions of constrained optimization problems. This is ongoing work on this type of pruning in the multidimensional setting. We will evaluate different models and pruning strategies with respect to pruning efficiency and time complexity.
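To make the two-dimensional setting concrete, the standard functional-pruning recursion can be written for the simplest case listed above, changes in mean of a bivariate independent series. The notation below is ours and assumes a squared-error cost, a penalty β per change and the convention F(0) = -β, so that Q_1(θ) = γ(y_1, θ). Each candidate cost q_t^τ is a paraboloid over ℝ², so the graph of Q_t is presumably the surface in 3D space that the abstract refers to.

```latex
\begin{aligned}
\gamma(y_t,\theta) &= \bigl(y_t^{(1)}-\theta_1\bigr)^2 + \bigl(y_t^{(2)}-\theta_2\bigr)^2,
  \qquad \theta = (\theta_1,\theta_2)\in\mathbb{R}^2,\\
q_t^{\tau}(\theta) &= F(\tau) + \beta + \sum_{i=\tau+1}^{t} \gamma(y_i,\theta),
  \qquad Q_t(\theta) = \min_{\tau\in\mathcal{C}_t} q_t^{\tau}(\theta),\\
Q_t(\theta) &= \min\bigl\{\,Q_{t-1}(\theta),\; F(t-1)+\beta\,\bigr\} + \gamma(y_t,\theta),
  \qquad F(t) = \min_{\theta\in\mathbb{R}^2} Q_t(\theta),\\
\tau \text{ is pruned at time } t \;&\Longleftrightarrow\;
  \bigl\{\theta\in\mathbb{R}^2 : q_t^{\tau}(\theta) = Q_t(\theta)\bigr\} = \emptyset .
\end{aligned}
```

In the univariate case the set where a candidate is optimal is a union of intervals that is cheap to track; in ℝ² it is a region bounded by arcs of conics, which is where the geometric difficulty comes from.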
16.00: Mathilde Mougeot (ENSIIE & ENS Paris-Saclay/Centre Borelli)
Title: Transfer learning in the industry
Abstract: In industrial settings, the databases available in research and development or in production are rarely large, which raises the question of whether it is reasonable, in this context, to develop powerful tools based on machine learning techniques. This talk presents research on transfer learning, which uses knowledge from related application domains to build efficient models with less data. Several industrial collaborations will also be presented in which these models were successfully used to design machine learning for small-data industrial regimes and to develop powerful decision-support tools, even when the initial data volume is limited. The Python library ADAPT now provides a bridge from transfer learning theory to practice.
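As a concrete instance of reusing source-domain knowledge when target data are scarce, here is a minimal sketch of one classical parameter-based transfer technique: ridge regression shrunk toward coefficients `w_source` fitted on a related source domain, instead of toward zero. It is not taken from the talk and does not use the ADAPT API; `regular_transfer_ridge`, `w_source` and `lam` are illustrative names, and the source coefficients are assumed to be given.

```python
import numpy as np


def regular_transfer_ridge(X, y, w_source, lam):
    """Fit a linear model on a small target sample while shrinking its
    coefficients toward a model trained on a related source domain.

    Solves   min_w ||y - X w||^2 + lam * ||w - w_source||^2,
    whose closed form is (X^T X + lam I)^{-1} (X^T y + lam w_source).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w_source = np.asarray(w_source, dtype=float)
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    b = X.T @ y + lam * w_source
    # Solve the linear system directly rather than forming an explicit inverse
    return np.linalg.solve(A, b)


# One target sample, two features: ordinary least squares is undetermined,
# but the source prior makes the problem well posed.
w = regular_transfer_ridge([[1.0, 1.0]], [2.0], w_source=[0.0, 0.0], lam=1.0)
# both coefficients are approximately 2/3
```

`lam` sets how much the target model trusts the source: `lam = 0` recovers least squares on the target data alone, while a very large `lam` simply reuses `w_source`.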