Session of March 9, 2015
Session organized by Erwan Le Pennec and Joseph Salmon.
Location: IHP, Amphithéâtre Darboux.
14h00: Balazs Kégl (LAL, CNRS and Université Paris-Sud)
Title: Learning to discover: signal/background separation and the Higgs boson challenge
Abstract: Classification algorithms have been routinely used since the 1990s in high-energy physics to separate signal and background in particle detectors. The goal of the classifier is to maximize the sensitivity of a counting test in a selection region, an objective that is similar in spirit to, but formally different from, the classical objectives of minimizing misclassification error or maximizing AUC. We start the talk by motivating the problem with the ongoing example of detecting the Higgs boson in the tau-tau decay channel in the ATLAS detector of the LHC. We formalize the problem, then describe the usual analysis chain and explain some of the choices physicists make when designing a classifier to optimize the discovery significance. We derive different surrogates that capture this goal and show some simple techniques to optimize them, raising questions on both the statistical and the algorithmic side. We end the talk by presenting a data challenge we organized to draw the attention of the machine learning and statistics communities to this important application and to improve the techniques used to optimize the discovery significance.
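As a concrete reference point, here is a minimal Python sketch assuming the approximate median significance (AMS) objective used in the Higgs boson challenge, paired with a naive threshold sweep to pick a selection region. The function names and the sweep are illustrative, not the speaker's actual pipeline.

    import numpy as np

    def ams(s, b, b_reg=10.0):
        # Approximate median significance of the counting test: s and b are
        # the expected (weighted) numbers of signal and background events in
        # the selection region; b_reg regularizes the small-background regime.
        return np.sqrt(2.0 * ((s + b + b_reg) * np.log(1.0 + s / (b + b_reg)) - s))

    def best_threshold(scores, labels, weights, thresholds):
        # Sweep a cut on classifier scores and keep the one that maximizes
        # the significance -- the "selection region" step from the abstract.
        best_t, best_val = None, -np.inf
        for t in thresholds:
            sel = scores >= t
            s = weights[sel & (labels == 1)].sum()
            b = weights[sel & (labels == 0)].sum()
            val = ams(s, b)
            if val > best_val:
                best_t, best_val = t, val
        return best_t, best_val

Note how this objective depends on the classifier only through the selection region, which is what makes it formally different from misclassification error or AUC.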
15h00: Aurélien Bellet (Télécom ParisTech, TSI, LTCI CNRS 5141)
Title: The Frank-Wolfe Algorithm: Recent Results and Applications to High-Dimensional Similarity Learning and Distributed Optimization
Abstract: The topic of this talk is the Frank-Wolfe (FW) algorithm, a greedy procedure for minimizing a convex and differentiable function over a compact convex set. FW has its roots in the 1950s but has recently regained considerable interest in machine learning and related communities. In the first part of the talk, I will introduce the FW algorithm and review some recent results that motivate its appeal in the context of large-scale learning problems. In the second part, I will describe two applications of FW in my own work: (i) learning a similarity/distance function for sparse high-dimensional data, and (ii) learning sparse combinations of elements that are distributed over a network.
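As an illustration of the procedure (a sketch, not code from the talk), the snippet below runs Frank-Wolfe on an ℓ1-ball-constrained least-squares problem. The linear minimization oracle over the ℓ1 ball has a closed form that touches a single coordinate per step, so iterates stay sparse, which is one reason FW is attractive for large-scale sparse learning. All names and the toy problem are illustrative.

    import numpy as np

    def frank_wolfe_l1(grad, x0, radius=1.0, n_iters=200):
        # Frank-Wolfe for min f(x) over the l1 ball {x : ||x||_1 <= radius}.
        # grad(x) returns the gradient of f at x. The linear subproblem
        # argmin_{||s||_1 <= radius} <g, s> is solved by putting +/- radius
        # on the coordinate with the largest |g_i|, so each iteration adds
        # at most one new nonzero coordinate.
        x = x0.copy()
        for k in range(n_iters):
            g = grad(x)
            i = np.argmax(np.abs(g))
            s = np.zeros_like(x)
            s[i] = -radius * np.sign(g[i])
            gamma = 2.0 / (k + 2.0)      # standard FW step size
            x = (1.0 - gamma) * x + gamma * s
        return x

    # Toy usage: least squares ||Ax - b||^2 constrained to the l1 ball.
    rng = np.random.default_rng(0)
    A, b = rng.normal(size=(50, 20)), rng.normal(size=50)
    x_hat = frank_wolfe_l1(lambda x: 2.0 * A.T @ (A @ x - b), np.zeros(20))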
16h00: Florence d'Alché-Buc (Télécom ParisTech, TSI, LTCI CNRS 5141)
Title: Learning of nonparametric dynamical models with operator-valued kernels
Abstract: Recent years have witnessed a surge of interest in dynamical systems, especially in scientific fields such as molecular biology and climate science where data are now collected at large scale. In parallel, the abundance of sensor data, together with the emergence of the Internet of Things, provides a wealth of time series to analyze and exploit. Among the various problems of interest in dynamical systems, network inference, the search for (causal) relationships between state variables, has attracted our attention. Motivated by this learning task, we develop a general framework for nonparametric dynamical modeling based on matrix-valued kernels and propose a generic procedure to infer the underlying network using the Jacobian of the estimated model. Our framework can be applied to vector autoregressive (VAR) modeling as well as to ordinary differential equations. Matrix-valued kernels, an instance of operator-valued kernels, make it possible to build vector-valued functions. Like scalar-valued kernels, operator-valued kernels can be used to define reproducing kernel Hilbert spaces, which enjoy representer theorems and thus provide a solid basis for penalized regression. Within this framework, we propose new kernels adapted to the target task (network inference) and derive a learning algorithm based on proximal gradient methods when sparsity or structured sparsity is imposed. Interestingly, the kernels themselves can also be learned with similar methods in order to discover the network. After being demonstrated on vector autoregressive models, the approach is briefly presented in the context of nonparametric estimation of ordinary differential equations using a gradient matching approach. Numerical results on various datasets illustrate the very good performance of the approach.
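To make the framework concrete, here is a minimal Python sketch of the decomposable special case K(x, x') = k(x, x')B, a Gaussian scalar kernel times a fixed positive semidefinite matrix B, fit by closed-form vector-valued kernel ridge regression, together with the Jacobian of the learned model from which a network can be read off. The talk's kernels, sparsity-inducing penalties, and proximal-gradient solver go well beyond this ridge version; all names are illustrative.

    import numpy as np

    def gauss(X, Z, sigma):
        # Scalar Gaussian kernel matrix between the rows of X and Z.
        d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    def fit_ovk(X, Y, B, sigma=1.0, lam=1e-2):
        # Vector-valued kernel ridge regression with the decomposable
        # operator-valued kernel K(x, x') = k(x, x') B. For a VAR-style
        # model, X holds states x_t and Y the next states x_{t+1}.
        # The learned model is f(x) = sum_i k(x, x_i) B c_i; the rows c_i
        # solve the block (Kronecker) linear system below.
        n, d = Y.shape
        G = np.kron(gauss(X, X, sigma), B)           # nd x nd Gram matrix
        c = np.linalg.solve(G + lam * np.eye(n * d), Y.reshape(-1))
        return c.reshape(n, d)

    def jacobian(x, X, C, B, sigma=1.0):
        # Jacobian of f at x; entry (j, l) measures the influence of
        # variable l on variable j. Averaging |J| over sample points
        # gives a score matrix from which the network is inferred.
        k = gauss(x[None, :], X, sigma)[0]                    # (n,)
        grads = -(x[None, :] - X) / sigma ** 2 * k[:, None]   # rows: grad_x k(x, x_i)
        return (B @ C.T) @ grads                              # d x d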