Optimal $\ell_1$ Column Subset Selection and a Fast PTAS for Low Rank Approximation

Mahankali, Arvind V.; Woodruff, David P.

arXiv:2007.10307 (cs)

[Submitted on 20 Jul 2020 (v1), last revised 16 Nov 2020 (this version, v2)]

Authors: Arvind V. Mahankali (1), David P. Woodruff (1) ((1) Carnegie Mellon University)

Abstract: We study the problem of entrywise $\ell_1$ low rank approximation. We give the first polynomial time column subset selection-based $\ell_1$ low rank approximation algorithm sampling $\tilde{O}(k)$ columns and achieving an $\tilde{O}(k^{1/2})$-approximation for any $k$, improving upon the previous best $\tilde{O}(k)$-approximation and matching a prior lower bound for column subset selection-based $\ell_1$-low rank approximation which holds for any $\text{poly}(k)$ number of columns. We extend our results to obtain tight upper and lower bounds for column subset selection-based $\ell_p$ low rank approximation for any $1 < p < 2$, closing a long line of work on this problem.
We next give a $(1 + \varepsilon)$-approximation algorithm for entrywise $\ell_p$ low rank approximation, for $1 \leq p < 2$, that is not a column subset selection algorithm. First, we obtain an algorithm which, given a matrix $A \in \mathbb{R}^{n \times d}$, returns a rank-$k$ matrix $\hat{A}$ in $2^{\text{poly}(k/\varepsilon)} + \text{poly}(nd)$ running time such that: $$\|A - \hat{A}\|_p \leq (1 + \varepsilon) \cdot OPT + \frac{\varepsilon}{\text{poly}(k)}\|A\|_p$$ where $OPT = \min_{A_k \text{ rank }k} \|A - A_k\|_p$. Using this algorithm, in the same running time we give an algorithm which obtains error at most $(1 + \varepsilon) \cdot OPT$ and outputs a matrix of rank at most $3k$. These algorithms significantly improve upon all previous $(1 + \varepsilon)$- and $O(1)$-approximation algorithms for the $\ell_p$ low rank approximation problem, which required at least $n^{\text{poly}(k/\varepsilon)}$ or $n^{\text{poly}(k)}$ running time, and either required strong bit complexity assumptions (our algorithms do not) or had bicriteria rank $3k$. Finally, we show hardness results which nearly match our $2^{\text{poly}(k)} + \text{poly}(nd)$ running time and the above additive error guarantee.
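
To make the column subset selection setting concrete, the following is a minimal illustrative sketch for the special case $k = 1$. It is not the algorithm of the paper: it brute-forces over every single column instead of sampling $\tilde{O}(k)$ columns, and it fits each column of $A$ to the chosen column by an exact one-dimensional $\ell_1$ regression (a weighted median). The names `l1_fit_scalar` and `best_single_column` are hypothetical and chosen only for this example.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # row-major storage: A[i][j]


def l1_fit_scalar(c: List[float], y: List[float]) -> Tuple[float, float]:
    """Minimize sum_i |y[i] - x * c[i]| over a scalar x.

    The objective equals sum_i |c[i]| * |y[i]/c[i] - x| over rows with
    c[i] != 0, so a weighted median of the ratios y[i]/c[i] (weights
    |c[i]|) is a minimizer. Rows with c[i] == 0 cost |y[i]| for every x.
    Returns (x, l1 residual).
    """
    pts = sorted((y_i / c_i, abs(c_i)) for c_i, y_i in zip(c, y) if c_i != 0.0)
    x = 0.0
    if pts:
        half = sum(w for _, w in pts) / 2.0
        acc = 0.0
        for ratio, w in pts:
            acc += w
            if acc >= half:  # first ratio whose cumulative weight reaches half
                x = ratio
                break
    residual = sum(abs(y_i - x * c_i) for c_i, y_i in zip(c, y))
    return x, residual


def best_single_column(A: Matrix) -> Tuple[int, float]:
    """Brute-force rank-1 column subset selection under entrywise l1 error.

    For each candidate column s, every column of A is approximated by a
    scalar multiple of column s. Returns (index of best column, total
    entrywise l1 error of the resulting rank-1 approximation).
    """
    n, d = len(A), len(A[0])
    cols = [[A[i][j] for i in range(n)] for j in range(d)]
    best_idx, best_err = -1, float("inf")
    for s in range(d):
        err = sum(l1_fit_scalar(cols[s], cols[j])[1] for j in range(d))
        if err < best_err:
            best_idx, best_err = s, err
    return best_idx, best_err
```

Calling `best_single_column(A)` on a small matrix returns the index of the column whose span best fits all columns of `A` in entrywise $\ell_1$ error, together with that error. The brute-force search costs time polynomial in $n$ and $d$ only because $k = 1$; for general $k$ an exhaustive search over column subsets is not practical, which is the regime the sampling-based results above address.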

Comments: To appear in SODA 2021. Changes: (1) Fixed errors in hardness proof for constrained $\ell_1$ low rank approximation. (2) Simplified analysis of column subset selection algorithm. (3) Improved runtime of $\text{poly}(k)$-approximation algorithm with output rank $k$ from $2^{O(k\log k)} + \text{poly}(nd)$ to $\text{poly}(nd)$. Results are unchanged aside from (3).
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:2007.10307 [cs.DS] (or arXiv:2007.10307v2 [cs.DS] for this version)
DOI: https://doi.org/10.48550/arXiv.2007.10307

Submission history

From: Arvind Mahankali
[v1] Mon, 20 Jul 2020 17:50:30 UTC (60 KB)
[v2] Mon, 16 Nov 2020 07:22:43 UTC (72 KB)
