Session of 13 October 2014

Session organized by Thanh Mai Pham Ngoc and Judith Rousseau

Venue: IHP, Amphithéâtre Darboux.

14h00: Yohann de Castro (Orsay)

Title: L1 geometry and locally linear processes: a direct proof of the null space property using Stephen O. Rice's method

Abstract: Minimizing an L1 norm has become an important tool in high-dimensional statistics, owing both to its practical performance and to its theoretical guarantees. The latter all rest on a very simple necessary and sufficient condition: the null space property. Although this property lies at the very heart of L1 minimization, very few direct proofs of it exist. In this talk, we will draw a connection between the null space property and the maximum of a locally linear process. Under the Gaussian assumption, we will use this link to prove the null space property by means of Stephen O. Rice's method.

15h00: Sofia Olhede (University College London)

Title: Graph limits and network data

Abstract: Network data correspond to observations of relationships between variables. By their very nature these objects are strongly combinatorial, and not naturally linked to the notion of smoothing or to tools from classical nonparametric function estimation. The past decade has seen a flurry of results in combinatorics developing the theory of "graph limits". This theory associates a function with the generation of a network, and naturally handles the invariance of an intrinsically unlabelled network to permutation by modelling the network observations as exchangeable random variables. An infinite network can be sparsified by percolation, and can thus be made to possess realistic levels of network connectivity, comparable to those of real data. I will discuss how a graph limit model can be estimated consistently from data, and how tools from nonparametric function regression can be placed in this context.

16h00: Stéphane Robin (INRA)

Title: Identifiability in Hidden Markov Models with Applications to Genomics

Abstract: Like most models involving hidden variables, hidden Markov models raise identifiability issues. A common way to circumvent these issues is to restrict the shape of the emission distribution. Indeed, an independent mixture model with a fully non-parametric emission distribution is not identifiable. We will present a general result showing that hidden Markov models are identifiable under general assumptions. This result allows us to consider a broad class of emission distributions, such as mixtures or non-parametric ones.

Hidden Markov models are very popular in genomics, as many genomic data are collected along the genome, which can be seen as a time line. In the second part of the talk, we will introduce the problem of the joint detection of copy number variations (CNV) and loss of heterozygosity (LOH). We will show that this analysis can be tackled using an HMM with mixtures as emission distributions. We will briefly discuss the balance between the flexibility offered by the preceding identifiability result and the behavior of the EM algorithm, often used for HMM inference.

Gassiat, E., Cleynen, A., Robin, S. (2013). Finite state space non parametric Hidden Markov Models are in general identifiable. arXiv:1306.4657