Séance du 9 février 2009

Lundi 09 février 2009

Organisateurs: Vincent Rivoirard et Liliane Bel

14h00 Tristan Mary-Huard (Agroparistech)

Variable clustering and variable aggregation for high dimension data

Abstract: We first consider the problem where the goal is to detect groups of highly correlated variables in a set of ordered variables $V_1,...,V_p$. A clustering algorithm is proposed, where the variables are grouped according to a compression criterion based on latent variable modeling. Two algorithms are proposed to find the optimal compression and applied to identification of chromosomal domains of gene expression. We then move to the problem of variable aggregation in supervised aggregation. When dealing with large dimension data, building an interpretable classifier is difficult, due to the presence of many irrelevant variables, and to a high level of redundancy between the variables. Variable selection is the classical strategy to discard both redundant and/or uninformative variables. While removing uninformative variables is both theoretically and practically motivated, removing redundant variables makes the selection process unstable. As a solution, an aggregation step can be performed before the selection step. In the aggregation step, variables that share common information are classified into a same cluster and summed up by a summarized variable, and in the selection step informative summarized variables are selected. Compared with the first problem of variable clustering, the difficulty is now to propose an aggregation method that is consistent with the selection method to build an efficient classifier. We propose a general framework to develop aggregation methods, where the aggregation specifically depends on the chosen classification algorithm to ensure the coherence between the selection and aggregation steps. The strategy is illustrated with the CART and $k$NN algorithms, and we show on real data that the aggregation step leads to more stable and interpretable results.

15h00 Gabor Lugosi (Pompeu Fabra University)

Combinatorial problems in randomized sequential prediction

(joint work with Nicolò Cesa-Bianchi)

Abstract: We discuss sequential prediction problems in which a decision maker repeatedly chooses an action and suffers a loss related to the chosen action. The goal of the decision maker is to accumulate a total loss that is not much larger than the best fixed action that could have been chosen if all losses had been known in advance. This problem and its variants have been thoroughly studied in statistics, game theory, learning theory, and information theory and various well-performing randomized strategies exist. We consider cases in which the set of actions is very large and has some combinatorial structure. We study multi-armed bandit problems, that is, when the decision maker only observes the loss of the selected action. We present a general strategy that allows one to achieve near-optimal performance in many cases. Computing randomized strategies with small loss often raises nontrivial algorithmic problems. We present various examples.

16h00 Ana Karina Fermin (Université Paris 10 Nanterre)

Inegalités oracles pour les methodes iteratives des problèmes inverses

Résumé: Dans ce travail nous utilisons les méthodes de régularisation itératives pour la résolution de problèmes inverses mal posés, que l'on rencontre dans de nombreux domaines : économie, imagerie médicale, traitement du signal, géophysique. La principale difficulté dans l'application d'une méthode de régularisation itérative est la détermination du point d'arrêt de l'algorithme. L'approche choisie dans ce travail s'appuie sur des inégalités de concentration et des outils de la théorie de sélection de modèles.