Session of 13 May 2013

Monday, 13 May 2013

Organizers: Estelle Kuhn and Mathilde Mougeot

14:00 Aurélie Fischer (Université Paris Diderot, LPMA)

Title: COBRA: a nonlinear aggregation strategy

Abstract: In this talk, we introduce a new method for combining several estimators of the regression function. Instead of building a linear combination of these estimators, we use them as indicators of the distance between the available data and the new observation whose response we wish to estimate. From a theoretical point of view, the main result is that the resulting estimator is asymptotically at least as good, in the L2 sense, as the best estimator in the list. From a practical point of view, we present some results obtained with this method on real and simulated data. Implemented in the R package COBRA (COmBined Regression Alternative), the method performs very well, particularly in terms of computation time.
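To make the "estimators as distance indicators" idea concrete, here is a minimal Python sketch of a COBRA-style predictor for one-dimensional inputs. It is not the code or the API of the R package COBRA. It assumes the basic estimators ("machines") have already been fitted on one part of the sample and that a separate calibration sample (x_cal, y_cal) is available. A calibration point is treated as close to the new observation x when every machine's prediction at that point lies within a threshold eps of the same machine's prediction at x (a unanimity rule), and the combined estimate is the average of the retained responses. The function name `cobra_predict`, the unanimity rule, the threshold eps and the fallback value used when no point is retained are all assumptions of this sketch.

```python
from typing import Callable, Sequence

# A "machine" is any already-fitted regression estimator mapping an input to a prediction.
Machine = Callable[[float], float]


def cobra_predict(
    machines: Sequence[Machine],
    x_cal: Sequence[float],
    y_cal: Sequence[float],
    x: float,
    eps: float,
) -> float:
    """Hypothetical COBRA-style combined prediction at x (illustrative sketch).

    A calibration point x_cal[i] is retained when, for every machine r,
    |r(x_cal[i]) - r(x)| <= eps (unanimity rule, an assumption of this sketch).
    The prediction is the plain average of the retained responses y_cal[i];
    the machines are never combined linearly, they only select neighbours.
    """
    preds_at_x = [r(x) for r in machines]
    retained = [
        y
        for xi, y in zip(x_cal, y_cal)
        if all(abs(r(xi) - px) <= eps for r, px in zip(machines, preds_at_x))
    ]
    if not retained:
        # Assumed fallback when no calibration point is close enough.
        return 0.0
    return sum(retained) / len(retained)
```

For example, with the two machines `lambda t: t` and `lambda t: t + 0.5`, calibration inputs `[0.0, 1.0, 2.0, 3.0]` with responses `[0.1, 0.9, 2.1, 2.9]`, a new point `x = 1.1` and `eps = 0.2`, only the calibration point 1.0 is retained and the prediction is 0.9. The threshold eps plays the role of a bandwidth: larger values retain more points and average more responses.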

15:00 Nicolas Verzelen (INRA, MIA, Montpellier)

Title: Community Detection in Random Networks

(joint work with Ery Arias-Castro)

Abstract: In recent years, the problem of detecting communities in networks has received a great deal of attention, with important applications in the social and biological sciences, among others. The vast majority of this extensive literature focuses on developing realistic models of (random) networks, on designing methods for extracting communities from such networks, and on fitting models to network data. For example, in a social network, a node would represent an individual, and an edge between two nodes would represent a friendship or kinship of some sort between these two individuals. Almost all of the methodology has concentrated on devising graph-partitioning methods, with the end goal of clustering the nodes into groups with strong internal connectivity and weak connectivity between groups. Amid this activity, perhaps the most basic problem, that of detecting the presence of a community in an otherwise homogeneous network, has been overlooked. From a practical standpoint, this sort of problem could arise in a dynamic setting where a network grows over time and is monitored for clustering. From a mathematical perspective, probing the limits of detection (i.e., hypothesis testing) often offers insight into what is possible in terms of extraction (i.e., estimation).

We formalize the problem of detecting a community as testing whether a given (random) graph contains a subgraph that is unusually dense. We observe an undirected, unweighted graph on N nodes. Under the null hypothesis, the graph is a realization of an Erdős-Rényi graph with edge probability p0. Under the (composite) alternative, there is a subgraph of n nodes within which the probability of connection is p1 > p0. We derive a lower bound for detecting such a subgraph in terms of N, n, p0 and p1, and exhibit a test that achieves this lower bound. We do this both when p0 is known and when it is unknown. We also consider the problem of testing in polynomial time.
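In symbols, writing E for the edge set of the observed graph G and G(N, p0) for the Erdős-Rényi model, the testing problem can be transcribed as below. That edges are independent under both hypotheses, and that pairs not both inside S keep probability p0, is a reading of the abstract rather than something it states explicitly.

```latex
\begin{aligned}
H_0 &:\quad G \sim \mathcal{G}(N, p_0), \\
H_1 &:\quad \exists\, S \subset \{1, \dots, N\},\ |S| = n, \text{ such that }
\mathbb{P}(\{i, j\} \in E) =
\begin{cases}
p_1 & \text{if } i, j \in S, \\
p_0 & \text{otherwise},
\end{cases}
\qquad p_1 > p_0 .
\end{aligned}
```

The alternative is composite because the location of S is unknown: a test must have power against every one of the binom(N, n) possible placements of the dense subgraph.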

As an aside, we consider the problem of detecting a clique, which is intimately related to the planted clique problem.
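As a toy illustration of the testing problem, the Python sketch below samples a graph with a planted n-node subgraph and computes two statistics commonly used for this kind of problem: the standardized total number of edges (which needs p0) and the scan statistic, i.e. the largest number of edges inside any n-node subset. A clique of size n is present exactly when the scan statistic reaches n(n-1)/2, which links the scan to the clique question above. The function names are hypothetical, the sketch is not claimed to be the optimal test from the talk, and the brute-force scan enumerates every n-subset, so its cost is exponential in general; this is why testing in polynomial time is a separate question.

```python
import itertools
import math
import random


def sample_graph(N, n, p0, p1, rng):
    """Sample an undirected graph on nodes 0..N-1 as a set of pairs (i, j), i < j.

    Nodes 0..n-1 form the planted set S: pairs inside S are edges with
    probability p1, all other pairs with probability p0 (assumed model).
    """
    edges = set()
    for i, j in itertools.combinations(range(N), 2):
        p = p1 if j < n else p0  # since i < j, j < n means both endpoints lie in S
        if rng.random() < p:
            edges.add((i, j))
    return edges


def total_edges_zscore(N, p0, edges):
    """Standardized edge count; under H0 it is Binomial(C(N, 2), p0). Needs 0 < p0 < 1."""
    m = math.comb(N, 2)
    return (len(edges) - m * p0) / math.sqrt(m * p0 * (1 - p0))


def scan_statistic(N, n, edges):
    """Maximum number of edges inside any n-node subset (brute force over all subsets)."""
    best = 0
    for subset in itertools.combinations(range(N), n):
        # combinations of a sorted tuple yield pairs (i, j) with i < j, matching storage
        count = sum((i, j) in edges for i, j in itertools.combinations(subset, 2))
        best = max(best, count)
    return best


def contains_clique(N, n, edges):
    """True iff some n-node subset is fully connected."""
    return scan_statistic(N, n, edges) == math.comb(n, 2)
```

For instance, `sample_graph(12, 4, 0.1, 1.0, random.Random(0))` plants a clique on nodes 0 to 3, so `contains_clique(12, 4, g)` returns True whatever the background edges are. Large positive values of either statistic point towards the alternative; the edge count is cheap but only sees global excess density, whereas the scan statistic targets the dense subset directly.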

16:00 Romain Guy (INRA, MIA, Jouy-en-Josas)

Title: Inference for epidemic models using partially and discretely observed diffusions with small diffusion coefficient

Abstract: Epidemic data are often partially observed and temporally aggregated, and such processes are difficult to handle in large populations. In this context, diffusions with a small diffusion coefficient appear well suited both to describing the epidemic dynamics precisely and to providing efficient estimators of the key epidemic parameters. We therefore consider a multidimensional diffusion process (of dimension p) with a small diffusion coefficient. Its sample path is observed at discrete times on a fixed interval, and only its first l coordinates are observed (l < p). We build a contrast-based estimation procedure that provides consistent and asymptotically normal estimators of both the drift and diffusion parameters, in the asymptotic regime where both the sampling interval and the diffusion coefficient go to zero. The performance of these estimators is assessed on simulated and real epidemic data.
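The procedure in the talk deals with a multidimensional process of which only l < p coordinates are observed; the Python sketch below deliberately drops that difficulty and only shows the basic ingredients of contrast-based estimation in the simplest fully observed, one-dimensional case. It assumes a small-noise Ornstein-Uhlenbeck model dX_t = -θ X_t dt + ε σ dW_t on [0, 1] with ε known, observed at n equally spaced times (step Δ = 1/n). The drift parameter θ is estimated by minimizing the least-squares contrast Σ_k (X_k - X_{k-1} + θ Δ X_{k-1})², which has a closed-form minimizer, and σ² by the residual sum of squares divided by n ε² Δ. The model, the parameter values and the function names are illustrative assumptions, not the speaker's setting.

```python
import math
import random


def simulate_ou(theta, sigma, eps, x0, n, rng):
    """Euler scheme on [0, 1] with step 1/n for dX = -theta X dt + eps sigma dW."""
    delta = 1.0 / n
    xs = [x0]
    for _ in range(n):
        x = xs[-1]
        noise = eps * sigma * math.sqrt(delta) * rng.gauss(0.0, 1.0)
        xs.append(x - theta * x * delta + noise)
    return xs


def contrast_estimates(xs, eps):
    """Least-squares contrast estimates (theta_hat, sigma2_hat) from observations on [0, 1].

    theta_hat minimizes sum_k (X_k - X_{k-1} + theta * delta * X_{k-1})^2;
    sigma2_hat is the residual sum of squares normalized by n * eps^2 * delta.
    """
    n = len(xs) - 1
    delta = 1.0 / n
    pairs = list(zip(xs[:-1], xs[1:]))
    num = sum(prev * (cur - prev) for prev, cur in pairs)
    den = delta * sum(prev * prev for prev, _ in pairs)
    theta_hat = -num / den
    residuals = [cur - prev + theta_hat * prev * delta for prev, cur in pairs]
    sigma2_hat = sum(r * r for r in residuals) / (n * eps ** 2 * delta)
    return theta_hat, sigma2_hat
```

With, for example, θ = 2, σ = 0.5, ε = 0.01 and n = 1000, `contrast_estimates(simulate_ou(2.0, 0.5, 0.01, 1.0, 1000, random.Random(1)), 0.01)` should return values close to (2, 0.25). The small ε is what makes the drift contrast accurate even though the observation interval is fixed, while the number of observations n drives the accuracy of the diffusion estimate.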