A Linearly Convergent Algorithm for Distributed Principal Component Analysis

Gang, Arpita; Bajwa, Waheed U.

doi:10.1016/j.sigpro.2021.108408

Computer Science > Machine Learning

arXiv:2101.01300 (cs)

[Submitted on 5 Jan 2021 (v1), last revised 28 Nov 2021 (this version, v4)]

Abstract: Principal Component Analysis (PCA) is the workhorse tool for dimensionality reduction in this era of big data. While often overlooked, the purpose of PCA is not only to reduce data dimensionality, but also to yield features that are uncorrelated. Furthermore, the ever-increasing volume of data in the modern world often requires storage of data samples across multiple machines, which precludes the use of centralized PCA algorithms. This paper focuses on the dual objective of PCA, namely, dimensionality reduction and decorrelation of features, but in a distributed setting. This requires estimating the eigenvectors of the data covariance matrix, as opposed to only estimating the subspace spanned by the eigenvectors, when data is distributed across a network of machines. Although a few distributed solutions to the PCA problem have been proposed recently, convergence guarantees and/or communications overhead of these solutions remain a concern. With an eye towards communications efficiency, this paper introduces a feedforward neural network-based one time-scale distributed PCA algorithm termed Distributed Sanger's Algorithm (DSA) that estimates the eigenvectors of the data covariance matrix when data is distributed across an undirected and arbitrarily connected network of machines. Furthermore, the proposed algorithm is shown to converge linearly to a neighborhood of the true solution. Numerical results are also provided to demonstrate the efficacy of the proposed solution.
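The abstract describes DSA only at a high level. As a rough illustration, the sketch below pairs the classical Sanger (generalized Hebbian) direction C X - X triu(X^T C X) with a consensus-averaging step over a doubly stochastic mixing matrix W, in the style of distributed gradient descent. The authors' exact recursion, step-size rule, and initialization are not stated here, so this combine-then-adapt form, the ring topology with 1/3 weights, the synthetic zero-mean data, and the names `sanger_direction`, `distributed_sanger_sketch`, and `ring_weights` are all assumptions made for illustration. With a constant step size, iterations of this kind typically settle in a neighborhood of the exact eigenvectors rather than on them, which matches the abstract's phrasing.

```python
import numpy as np


def sanger_direction(C, X):
    """Batched Sanger (generalized Hebbian) direction C X - X triu(X^T C X).

    C: (N, d, d) local covariances, X: (N, d, K) local estimates.
    np.triu acts on the last two axes, giving one correction per node.
    """
    XtCX = np.swapaxes(X, 1, 2) @ C @ X
    return C @ X - X @ np.triu(XtCX)


def distributed_sanger_sketch(local_covs, W, K, alpha=0.02, iters=5000, seed=0):
    """Hypothetical combine-then-adapt loop (assumption, not the paper's exact
    recursion): average neighbors' estimates with W, then take a local Sanger step."""
    N, d, _ = local_covs.shape
    rng = np.random.default_rng(seed)
    X0, _ = np.linalg.qr(rng.standard_normal((d, K)))  # shared random orthonormal start
    X = np.repeat(X0[None, :, :], N, axis=0)
    for _ in range(iters):
        mixed = np.einsum("ij,jdk->idk", W, X)  # consensus step over the graph
        X = mixed + alpha * sanger_direction(local_covs, X)
    return X


def ring_weights(N):
    """Doubly stochastic 1/3 weights on a ring of N >= 3 nodes (hypothetical topology)."""
    W = np.zeros((N, N))
    for i in range(N):
        for j in (i, (i - 1) % N, (i + 1) % N):
            W[i, j] = 1.0 / 3.0
    return W


# Synthetic zero-mean data with a known spectrum, split evenly across N nodes.
rng = np.random.default_rng(1)
d, K, N, n = 10, 3, 5, 500
eigs = np.array([5.0, 4.0, 3.0, 1.0, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2])
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
local_covs = []
for _ in range(N):
    A = Q @ (np.sqrt(eigs)[:, None] * rng.standard_normal((d, n)))
    local_covs.append(A @ A.T / n)
local_covs = np.stack(local_covs)

X = distributed_sanger_sketch(local_covs, ring_weights(N), K)

# Compare every node's columns with the top-K eigenvectors of the global covariance.
C_global = local_covs.mean(axis=0)
_, vecs = np.linalg.eigh(C_global)  # eigenvalues come back in ascending order
top = vecs[:, ::-1][:, :K]
cos = np.abs(np.einsum("dk,ndk->nk", top, X)) / np.linalg.norm(X, axis=1)
print(f"worst |cos| over nodes and components: {cos.min():.4f}")

# Decorrelation: projected features at node 0 should have a near-diagonal covariance.
F = X[0].T @ C_global @ X[0]
corr = F / np.sqrt(np.outer(np.diag(F), np.diag(F)))
offdiag = np.max(np.abs(corr - np.diag(np.diag(corr))))
print(f"largest off-diagonal correlation: {offdiag:.4f}")
```

Each node only uses its own covariance and its neighbors' estimates. The absolute cosine (eigenvectors are defined only up to sign) checks the eigenvector-estimation goal, and the near-diagonal projected covariance checks the decorrelation goal the abstract emphasizes. In this sketch, shrinking `alpha` should tighten the neighborhood at the cost of more iterations.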

Comments:	34 pages; final version of journal paper accepted for publication in a special issue of EURASIP J. Signal Processing
Subjects:	Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA); Signal Processing (eess.SP); Machine Learning (stat.ML)
Cite as:	arXiv:2101.01300 [cs.LG]
	(or arXiv:2101.01300v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2101.01300
Related DOI:	https://doi.org/10.1016/j.sigpro.2021.108408

Submission history

From: Waheed Bajwa
[v1] Tue, 5 Jan 2021 00:51:14 UTC (1,540 KB)
[v2] Fri, 28 May 2021 20:04:10 UTC (1,527 KB)
[v3] Fri, 17 Sep 2021 02:22:03 UTC (2,272 KB)
[v4] Sun, 28 Nov 2021 18:15:57 UTC (2,272 KB)