09/27/2013

Post date: Sep 28, 2013 9:42:53 PM

Speaker: Fang Han, Department of Biostatistics, JHU

ENAR student award paper presentation

Topic: A Family of Optimal Robust Sparse PCA Methods in Big Data

Abstract

Big Data bring new opportunities to modern society and new challenges to data scientists. On one hand, Big Data hold great promise for discovering subtle population patterns that cannot be detected with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including the handling of large-scale, very complex, and noisy datasets. These challenges are distinctive and require new computational and statistical paradigms. In this talk we will focus on one prevailing data analysis tool, (sparse) principal component analysis, and introduce a family of optimal and robust methods that we recently proposed for analyzing Big Data. The main contributions are threefold: (i) Model-wise, we exploit new semiparametric techniques for modeling complex datasets, allowing for heavy tails and tail dependence; (ii) Methodology-wise, we propose new scalable nonparametric methods that exploit marginal rank, multivariate rank, and quantile-based statistics; (iii) Theoretically, we establish a series of exponential concentration results, showing that, although the proposed methods apply to distribution families much larger than the Gaussian and are robust with high breakdown points, they still achieve the parametric or nearly parametric rate of convergence in parameter estimation.
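To give a rough feel for the marginal-rank idea in (ii), here is a minimal Python sketch. It is not the speaker's method: it assumes a standard stand-in, namely a correlation matrix estimated entrywise as sin(pi/2 * tau) from Kendall's tau, followed by a truncated power iteration for a sparse leading direction. All function names and the sparsity parameter `k` are hypothetical and chosen only for illustration.

```python
import math


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            if prod > 0:
                score += 1
            elif prod < 0:
                score -= 1
    return score / (n * (n - 1) / 2)


def rank_correlation_matrix(columns):
    """Assumed estimator: R[a][b] = sin(pi/2 * tau(a, b)), with unit diagonal."""
    d = len(columns)
    R = [[1.0] * d for _ in range(d)]
    for a in range(d):
        for b in range(a + 1, d):
            r = math.sin(math.pi / 2 * kendall_tau(columns[a], columns[b]))
            R[a][b] = R[b][a] = r
    return R


def sparse_leading_vector(R, k, iters=200):
    """Truncated power iteration: keep the k largest-magnitude entries per step."""
    d = len(R)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(d)) for i in range(d)]
        keep = sorted(range(d), key=lambda i: abs(w[i]), reverse=True)[:k]
        w = [w[i] if i in keep else 0.0 for i in range(d)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:  # degenerate case: stop and return the last iterate
            break
        v = [t / norm for t in w]
    return v
```

Because Kendall's tau depends only on the ordering of the observations, the estimate is unchanged when any variable is passed through a strictly increasing transformation, which is one reason rank-based statistics are attractive for heavy-tailed data. The pairwise loop costs O(n^2) per variable pair, so this sketch is meant for small examples rather than large-scale use.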