A Review of Distributed Algorithms for Principal Component Analysis

Authors

Xiaoxiao Wu (Arizona State University, USA)

Hoi-To Wai (Arizona State University, USA)

Li Lin (Arizona State University, USA)

Anna Scaglione (Arizona State University, USA)

Abstract

Principal Component Analysis (PCA) is a fundamental primitive in many data analysis, array processing, and machine learning methods. In applications that involve extremely large arrays of data, particularly in distributed data acquisition systems, distributed PCA algorithms can harness local communications and network connectivity to avoid the need to communicate and access the entire array at any single node. A key feature of distributed PCA algorithms is that they defy the conventional notion that the first step towards computing the principal vectors is to form a sample covariance. This paper surveys the methodologies for performing distributed PCA on different datasets, their performance, and their applications in the context of distributed data acquisition systems.
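
To illustrate the remark about not forming a sample covariance, the following is a minimal sketch of a power iteration over data whose samples (rows) are split across nodes. It is not taken from any specific algorithm in the survey. The function name `distributed_power_iteration`, the row-wise partitioning, the assumption of zero-mean data, and the use of a plain in-process sum as a stand-in for network aggregation (for example, average consensus) are all assumptions made for illustration.

```python
import numpy as np

def distributed_power_iteration(local_blocks, num_iters=200, seed=0):
    """Estimate the top principal vector from row-partitioned data blocks.

    local_blocks: list of arrays, each of shape (n_i, d), one per
    hypothetical node. The data are assumed to be zero-mean.
    The d x d sample covariance is never formed explicitly.
    """
    rng = np.random.default_rng(seed)
    dim = local_blocks[0].shape[1]
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        # Each node uses only its own samples: X_i^T (X_i v).
        local_products = [X.T @ (X @ v) for X in local_blocks]
        # Stand-in for in-network aggregation; here a plain sum
        # computed in a single process.
        w = np.sum(local_products, axis=0)
        v = w / np.linalg.norm(w)
    return v
```

Each node only ever multiplies its own block by a vector, so the per-iteration exchange is a single length-d vector rather than the raw data or a d-by-d matrix. How that sum is actually computed over a network is left open in this sketch.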