Search | arXiv e-print repository

Implicit Regularization of Infinitesimally-perturbed Gradient Descent Toward Low-dimensional Solutions

Authors: Jianhao Ma, Geyu Liang, Salar Fattahi

Abstract: Implicit regularization refers to the phenomenon where local search algorithms converge to low-dimensional solutions, even when such structures are neither explicitly specified nor encoded in the optimization problem. While widely observed, this phenomenon remains theoretically underexplored, particularly in modern over-parameterized problems. In this paper, we study the conditions that enable imp… ▽ More Implicit regularization refers to the phenomenon where local search algorithms converge to low-dimensional solutions, even when such structures are neither explicitly specified nor encoded in the optimization problem. While widely observed, this phenomenon remains theoretically underexplored, particularly in modern over-parameterized problems. In this paper, we study the conditions that enable implicit regularization by investigating when gradient-based methods converge to second-order stationary points (SOSPs) within an implicit low-dimensional region of a smooth, possibly nonconvex function. We show that successful implicit regularization hinges on two key conditions: $(i)$ the ability to efficiently escape strict saddle points, while $(ii)$ maintaining proximity to the implicit region. Existing analyses enabling the convergence of gradient descent (GD) to SOSPs often rely on injecting large perturbations to escape strict saddle points. However, this comes at the cost of deviating from the implicit region. The central premise of this paper is that it is possible to achieve the best of both worlds: efficiently escaping strict saddle points using infinitesimal perturbations, while controlling deviation from the implicit region via a small deviation rate. We show that infinitesimally perturbed gradient descent (IPGD), which can be interpreted as GD with inherent ``round-off errors'', can provably satisfy both conditions. We apply our framework to the problem of over-parameterized matrix sensing, where we establish formal guarantees for the implicit regularization behavior of IPGD. We further demonstrate through extensive experiments that these insights extend to a broader class of learning problems. △ Less

Submitted 22 May, 2025; originally announced May 2025.

arXiv:2504.09708 [pdf, ps, other]

Preconditioned Gradient Descent for Over-Parameterized Nonconvex Matrix Factorization

Authors: Gavin Zhang, Salar Fattahi, Richard Y. Zhang

Abstract: In practical instances of nonconvex matrix factorization, the rank of the true solution $r^{\star}$ is often unknown, so the rank $r$ of the model can be overspecified as $r>r^{\star}$. This over-parameterized regime of matrix factorization significantly slows down the convergence of local search algorithms, from a linear rate with $r=r^{\star}$ to a sublinear rate when $r>r^{\star}$. We propose a… ▽ More In practical instances of nonconvex matrix factorization, the rank of the true solution $r^{\star}$ is often unknown, so the rank $r$ of the model can be overspecified as $r>r^{\star}$. This over-parameterized regime of matrix factorization significantly slows down the convergence of local search algorithms, from a linear rate with $r=r^{\star}$ to a sublinear rate when $r>r^{\star}$. We propose an inexpensive preconditioner for the matrix sensing variant of nonconvex matrix factorization that restores the convergence rate of gradient descent back to linear, even in the over-parameterized case, while also making it agnostic to possible ill-conditioning in the ground truth. Classical gradient descent in a neighborhood of the solution slows down due to the need for the model matrix factor to become singular. Our key result is that this singularity can be corrected by $\ell_{2}$ regularization with a specific range of values for the damping parameter. In fact, a good damping parameter can be inexpensively estimated from the current iterate. The resulting algorithm, which we call preconditioned gradient descent or PrecGD, is stable under noise, and converges linearly to an information theoretically optimal error bound. Our numerical experiments find that PrecGD works equally well in restoring the linear convergence of other variants of nonconvex matrix factorization in the over-parameterized regime. △ Less

Submitted 13 April, 2025; originally announced April 2025.

Comments: NeurIPS 2021. See also https://proceedings.neurips.cc/paper/2021/hash/2f2cd5c753d3cee48e47dbb5bbaed331-Abstract.html

arXiv:2504.09648 [pdf, other]

RANSAC Revisited: An Improved Algorithm for Robust Subspace Recovery under Adversarial and Noisy Corruptions

Authors: Guixian Chen, Jianhao Ma, Salar Fattahi

Abstract: In this paper, we study the problem of robust subspace recovery (RSR) in the presence of both strong adversarial corruptions and Gaussian noise. Specifically, given a limited number of noisy samples -- some of which are tampered by an adaptive and strong adversary -- we aim to recover a low-dimensional subspace that approximately contains a significant fraction of the uncorrupted samples, up to an… ▽ More In this paper, we study the problem of robust subspace recovery (RSR) in the presence of both strong adversarial corruptions and Gaussian noise. Specifically, given a limited number of noisy samples -- some of which are tampered by an adaptive and strong adversary -- we aim to recover a low-dimensional subspace that approximately contains a significant fraction of the uncorrupted samples, up to an error that scales with the Gaussian noise. Existing approaches to this problem often suffer from high computational costs or rely on restrictive distributional assumptions, limiting their applicability in truly adversarial settings. To address these challenges, we revisit the classical random sample consensus (RANSAC) algorithm, which offers strong robustness to adversarial outliers, but sacrifices efficiency and robustness against Gaussian noise and model misspecification in the process. We propose a two-stage algorithm, RANSAC+, that precisely pinpoints and remedies the failure modes of standard RANSAC. Our method is provably robust to both Gaussian and adversarial corruptions, achieves near-optimal sample complexity without requiring prior knowledge of the subspace dimension, and is more efficient than existing RANSAC-type methods. △ Less

Submitted 13 April, 2025; originally announced April 2025.

arXiv:2502.06775 [pdf, other]

Enhancing Performance of Explainable AI Models with Constrained Concept Refinement

Authors: Geyu Liang, Senne Michielssen, Salar Fattahi

Abstract: The trade-off between accuracy and interpretability has long been a challenge in machine learning (ML). This tension is particularly significant for emerging interpretable-by-design methods, which aim to redesign ML algorithms for trustworthy interpretability but often sacrifice accuracy in the process. In this paper, we address this gap by investigating the impact of deviations in concept represe… ▽ More The trade-off between accuracy and interpretability has long been a challenge in machine learning (ML). This tension is particularly significant for emerging interpretable-by-design methods, which aim to redesign ML algorithms for trustworthy interpretability but often sacrifice accuracy in the process. In this paper, we address this gap by investigating the impact of deviations in concept representations-an essential component of interpretable models-on prediction performance and propose a novel framework to mitigate these effects. The framework builds on the principle of optimizing concept embeddings under constraints that preserve interpretability. Using a generative model as a test-bed, we rigorously prove that our algorithm achieves zero loss while progressively enhancing the interpretability of the resulting model. Additionally, we evaluate the practical performance of our proposed framework in generating explainable predictions for image classification tasks across various benchmarks. Compared to existing explainable methods, our approach not only improves prediction accuracy while preserving model interpretability across various large-scale benchmarks but also achieves this with significantly lower computational cost. △ Less

Submitted 27 May, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

arXiv:2404.08178 [pdf, other]

A Parametric Approach for Solving Convex Quadratic Optimization with Indicators Over Trees

Authors: Aaresh Bhathena, Salar Fattahi, Andrés Gómez, Simge Küçükyavuz

Abstract: This paper investigates convex quadratic optimization problems involving $n$ indicator variables, each associated with a continuous variable, particularly focusing on scenarios where the matrix $Q$ defining the quadratic term is positive definite and its sparsity pattern corresponds to the adjacency matrix of a tree graph. We introduce a graph-based dynamic programming algorithm that solves this p… ▽ More This paper investigates convex quadratic optimization problems involving $n$ indicator variables, each associated with a continuous variable, particularly focusing on scenarios where the matrix $Q$ defining the quadratic term is positive definite and its sparsity pattern corresponds to the adjacency matrix of a tree graph. We introduce a graph-based dynamic programming algorithm that solves this problem in time and memory complexity of $\mathcal{O}(n^2)$. Central to our algorithm is a precise parametric characterization of the cost function across various nodes of the graph corresponding to distinct variables. Our computational experiments conducted on both synthetic and real-world datasets demonstrate the superior performance of our proposed algorithm compared to existing algorithms and state-of-the-art mixed-integer optimization solvers. An important application of our algorithm is in the real-time inference of Gaussian hidden Markov models from data affected by outlier noise. Using a real on-body accelerometer dataset, we solve instances of this problem with over 30,000 variables in under a minute, and its online variant within milliseconds on a standard computer. A Python implementation of our algorithm is available at https://github.com/aareshfb/Tree-Parametric-Algorithm.git. △ Less

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.07955 [pdf, other]

Triple Component Matrix Factorization: Untangling Global, Local, and Noisy Components

Authors: Naichen Shi, Salar Fattahi, Raed Al Kontar

Abstract: In this work, we study the problem of common and unique feature extraction from noisy data. When we have N observation matrices from N different and associated sources corrupted by sparse and potentially gross noise, can we recover the common and unique components from these noisy observations? This is a challenging task as the number of parameters to estimate is approximately thrice the number of… ▽ More In this work, we study the problem of common and unique feature extraction from noisy data. When we have N observation matrices from N different and associated sources corrupted by sparse and potentially gross noise, can we recover the common and unique components from these noisy observations? This is a challenging task as the number of parameters to estimate is approximately thrice the number of observations. Despite the difficulty, we propose an intuitive alternating minimization algorithm called triple component matrix factorization (TCMF) to recover the three components exactly. TCMF is distinguished from existing works in literature thanks to two salient features. First, TCMF is a principled method to separate the three components given noisy observations provably. Second, the bulk of the computation in TCMF can be distributed. On the technical side, we formulate the problem as a constrained nonconvex nonsmooth optimization problem. Despite the intricate nature of the problem, we provide a Taylor series characterization of its solution by solving the corresponding Karush-Kuhn-Tucker conditions. Using this characterization, we can show that the alternating minimization algorithm makes significant progress at each iteration and converges into the ground truth at a linear rate. Numerical experiments in video segmentation and anomaly detection highlight the superior feature extraction abilities of TCMF. △ Less

Submitted 8 November, 2024; v1 submitted 21 March, 2024; originally announced April 2024.

Journal ref: Journal of Machine Learning Research, 2024

arXiv:2402.06756 [pdf, other]

Convergence of Gradient Descent with Small Initialization for Unregularized Matrix Completion

Authors: Jianhao Ma, Salar Fattahi

Abstract: We study the problem of symmetric matrix completion, where the goal is to reconstruct a positive semidefinite matrix $\rm{X}^\star \in \mathbb{R}^{d\times d}$ of rank-$r$, parameterized by $\rm{U}\rm{U}^{\top}$, from only a subset of its observed entries. We show that the vanilla gradient descent (GD) with small initialization provably converges to the ground truth $\rm{X}^\star$ without requiring… ▽ More We study the problem of symmetric matrix completion, where the goal is to reconstruct a positive semidefinite matrix $\rm{X}^\star \in \mathbb{R}^{d\times d}$ of rank-$r$, parameterized by $\rm{U}\rm{U}^{\top}$, from only a subset of its observed entries. We show that the vanilla gradient descent (GD) with small initialization provably converges to the ground truth $\rm{X}^\star$ without requiring any explicit regularization. This convergence result holds true even in the over-parameterized scenario, where the true rank $r$ is unknown and conservatively over-estimated by a search rank $r'\gg r$. The existing results for this problem either require explicit regularization, a sufficiently accurate initial point, or exact knowledge of the true rank $r$. In the over-parameterized regime where $r'\geq r$, we show that, with $\widetildeΩ(dr^9)$ observations, GD with an initial point $\|\rm{U}_0\| \leq ε$ converges near-linearly to an $ε$-neighborhood of $\rm{X}^\star$. Consequently, smaller initial points result in increasingly accurate solutions. Surprisingly, neither the convergence rate nor the final accuracy depends on the over-parameterized search rank $r'$, and they are only governed by the true rank $r$. In the exactly-parameterized regime where $r'=r$, we further enhance this result by proving that GD converges at a faster rate to achieve an arbitrarily small accuracy $ε>0$, provided the initial point satisfies $\|\rm{U}_0\| = O(1/d)$. At the crux of our method lies a novel weakly-coupled leave-one-out analysis, which allows us to establish the global convergence of GD, extending beyond what was previously possible using the classical leave-one-out analysis. △ Less

Submitted 9 February, 2024; originally announced February 2024.

arXiv:2307.13750 [pdf, other]

Solution Path of Time-varying Markov Random Fields with Discrete Regularization

Authors: Salar Fattahi, Andres Gomez

Abstract: We study the problem of inferring sparse time-varying Markov random fields (MRFs) with different discrete and temporal regularizations on the parameters. Due to the intractability of discrete regularization, most approaches for solving this problem rely on the so-called maximum-likelihood estimation (MLE) with relaxed regularization, which neither results in ideal statistical properties nor scale… ▽ More We study the problem of inferring sparse time-varying Markov random fields (MRFs) with different discrete and temporal regularizations on the parameters. Due to the intractability of discrete regularization, most approaches for solving this problem rely on the so-called maximum-likelihood estimation (MLE) with relaxed regularization, which neither results in ideal statistical properties nor scale to the dimensions encountered in realistic settings. In this paper, we address these challenges by departing from the MLE paradigm and resorting to a new class of constrained optimization problems with exact, discrete regularization to promote sparsity in the estimated parameters. Despite the nonconvex and discrete nature of our formulation, we show that it can be solved efficiently and parametrically for all sparsity levels. More specifically, we show that the entire solution path of the time-varying MRF for all sparsity levels can be obtained in $\mathcal{O}(pT^3)$, where $T$ is the number of time steps and $p$ is the number of unknown parameters at any given time. The efficient and parametric characterization of the solution path renders our approach highly suitable for cross-validation, where parameter estimation is required for varying regularization values. Despite its simplicity and efficiency, we show that our proposed approach achieves provably small estimation error for different classes of time-varying MRFs, namely Gaussian and discrete MRFs, with as few as one sample per time. Utilizing our algorithm, we can recover the complete solution path for instances of time-varying MRFs featuring over 30 million variables in less than 12 minutes on a standard laptop computer. Our code is available at \url{https://sites.google.com/usc.edu/gomez/data}. △ Less

Submitted 25 July, 2023; originally announced July 2023.

arXiv:2305.17744 [pdf, other]

Heterogeneous Matrix Factorization: When Features Differ by Datasets

Authors: Naichen Shi, Raed Al Kontar, Salar Fattahi

Abstract: In myriad statistical applications, data are collected from related but heterogeneous sources. These sources share some commonalities while containing idiosyncratic characteristics. One of the most fundamental challenges in such scenarios is to recover the shared and source-specific factors. Despite the existence of a few heuristic approaches, a generic algorithm with theoretical guarantees has ye… ▽ More In myriad statistical applications, data are collected from related but heterogeneous sources. These sources share some commonalities while containing idiosyncratic characteristics. One of the most fundamental challenges in such scenarios is to recover the shared and source-specific factors. Despite the existence of a few heuristic approaches, a generic algorithm with theoretical guarantees has yet to be established. In this paper, we tackle the problem by proposing a method called Heterogeneous Matrix Factorization to separate the shared and unique factors for a class of problems. HMF maintains the orthogonality between the shared and unique factors by leveraging an invariance property in the objective. The algorithm is easy to implement and intrinsically distributed. On the theoretic side, we show that for the square error loss, HMF will converge into the optimal solutions, which are close to the ground truth. HMF can be integrated auto-encoders to learn nonlinear feature mappings. Through a variety of case studies, we showcase HMF's benefits and applicability in video segmentation, time-series feature extraction, and recommender systems. △ Less

Submitted 27 March, 2024; v1 submitted 28 May, 2023; originally announced May 2023.

arXiv:2305.15311 [pdf, other]

Personalized Dictionary Learning for Heterogeneous Datasets

Authors: Geyu Liang, Naichen Shi, Raed Al Kontar, Salar Fattahi

Abstract: We introduce a relevant yet challenging problem named Personalized Dictionary Learning (PerDL), where the goal is to learn sparse linear representations from heterogeneous datasets that share some commonality. In PerDL, we model each dataset's shared and unique features as global and local dictionaries. Challenges for PerDL not only are inherited from classical dictionary learning (DL), but also a… ▽ More We introduce a relevant yet challenging problem named Personalized Dictionary Learning (PerDL), where the goal is to learn sparse linear representations from heterogeneous datasets that share some commonality. In PerDL, we model each dataset's shared and unique features as global and local dictionaries. Challenges for PerDL not only are inherited from classical dictionary learning (DL), but also arise due to the unknown nature of the shared and unique features. In this paper, we rigorously formulate this problem and provide conditions under which the global and local dictionaries can be provably disentangled. Under these conditions, we provide a meta-algorithm called Personalized Matching and Averaging (PerMA) that can recover both global and local dictionaries from heterogeneous datasets. PerMA is highly efficient; it converges to the ground truth at a linear rate under suitable conditions. Moreover, it automatically borrows strength from strong learners to improve the prediction of weak learners. As a general framework for extracting global and local dictionaries, we show the application of PerDL in different learning tasks, such as training with imbalanced datasets and video surveillance. △ Less

Submitted 24 May, 2023; originally announced May 2023.

arXiv:2305.15276 [pdf, other]

Robust Sparse Mean Estimation via Incremental Learning

Authors: Jianhao Ma, Rui Ray Chen, Yinghui He, Salar Fattahi, Wei Hu

Abstract: In this paper, we study the problem of robust sparse mean estimation, where the goal is to estimate a $k$-sparse mean from a collection of partially corrupted samples drawn from a heavy-tailed distribution. Existing estimators face two critical challenges in this setting. First, they are limited by a conjectured computational-statistical tradeoff, implying that any computationally efficient algori… ▽ More In this paper, we study the problem of robust sparse mean estimation, where the goal is to estimate a $k$-sparse mean from a collection of partially corrupted samples drawn from a heavy-tailed distribution. Existing estimators face two critical challenges in this setting. First, they are limited by a conjectured computational-statistical tradeoff, implying that any computationally efficient algorithm needs $\tildeΩ(k^2)$ samples, while its statistically-optimal counterpart only requires $\tilde O(k)$ samples. Second, the existing estimators fall short of practical use as they scale poorly with the ambient dimension. This paper presents a simple mean estimator that overcomes both challenges under moderate conditions: it runs in near-linear time and memory (both with respect to the ambient dimension) while requiring only $\tilde O(k)$ samples to recover the true mean. At the core of our method lies an incremental learning phenomenon: we introduce a simple nonconvex framework that can incrementally learn the top-$k$ nonzero elements of the mean while keeping the zero elements arbitrarily small. Unlike existing estimators, our method does not need any prior knowledge of the sparsity level $k$. We prove the optimality of our estimator by providing a matching information-theoretic lower bound. Finally, we conduct a series of simulations to corroborate our theoretical findings. Our code is available at https://github.com/huihui0902/Robust_mean_estimation. △ Less

Submitted 24 May, 2023; originally announced May 2023.

arXiv:2302.10963 [pdf, other]

Can Learning Be Explained By Local Optimality In Robust Low-rank Matrix Recovery?

Authors: Jianhao Ma, Salar Fattahi

Abstract: We explore the local landscape of low-rank matrix recovery, focusing on reconstructing a $d_1\times d_2$ matrix $X^\star$ with rank $r$ from $m$ linear measurements, some potentially noisy. When the noise is distributed according to an outlier model, minimizing a nonsmooth $\ell_1$-loss with a simple sub-gradient method can often perfectly recover the ground truth matrix $X^\star$. Given this, a n… ▽ More We explore the local landscape of low-rank matrix recovery, focusing on reconstructing a $d_1\times d_2$ matrix $X^\star$ with rank $r$ from $m$ linear measurements, some potentially noisy. When the noise is distributed according to an outlier model, minimizing a nonsmooth $\ell_1$-loss with a simple sub-gradient method can often perfectly recover the ground truth matrix $X^\star$. Given this, a natural question is what optimization property (if any) enables such learning behavior. The most plausible answer is that the ground truth $X^\star$ manifests as a local optimum of the loss function. In this paper, we provide a strong negative answer to this question, showing that, under moderate assumptions, the true solutions corresponding to $X^\star$ do not emerge as local optima, but rather as strict saddle points -- critical points with strictly negative curvature in at least one direction. Our findings challenge the conventional belief that all strict saddle points are undesirable and should be avoided. △ Less

Submitted 4 April, 2025; v1 submitted 21 February, 2023; originally announced February 2023.

arXiv:2210.12816 [pdf, other]

Simple Alternating Minimization Provably Solves Complete Dictionary Learning

Authors: Geyu Liang, Gavin Zhang, Salar Fattahi, Richard Y. Zhang

Abstract: This paper focuses on the noiseless complete dictionary learning problem, where the goal is to represent a set of given signals as linear combinations of a small number of atoms from a learned dictionary. There are two main challenges faced by theoretical and practical studies of dictionary learning: the lack of theoretical guarantees for practically-used heuristic algorithms and their poor scalab… ▽ More This paper focuses on the noiseless complete dictionary learning problem, where the goal is to represent a set of given signals as linear combinations of a small number of atoms from a learned dictionary. There are two main challenges faced by theoretical and practical studies of dictionary learning: the lack of theoretical guarantees for practically-used heuristic algorithms and their poor scalability when dealing with huge-scale datasets. Towards addressing these issues, we propose a simple and efficient algorithm that provably recovers the ground truth when applied to the nonconvex and discrete formulation of the problem in the noiseless setting. We also extend our proposed method to mini-batch and online settings where the data is huge-scale or arrives continuously over time. At the core of our proposed method lies an efficient preconditioning technique that transforms the unknown dictionary to a near-orthonormal one, for which we prove a simple alternating minimization technique converges linearly to the ground truth under minimal conditions. Our numerical experiments on synthetic and real datasets showcase the superiority of our method compared with the existing techniques. △ Less

Submitted 4 March, 2025; v1 submitted 23 October, 2022; originally announced October 2022.

arXiv:2210.00346 [pdf, other]

Behind the Scenes of Gradient Descent: A Trajectory Analysis via Basis Function Decomposition

Authors: Jianhao Ma, Lingjun Guo, Salar Fattahi

Abstract: This work analyzes the solution trajectory of gradient-based algorithms via a novel basis function decomposition. We show that, although solution trajectories of gradient-based algorithms may vary depending on the learning task, they behave almost monotonically when projected onto an appropriate orthonormal function basis. Such projection gives rise to a basis function decomposition of the solutio… ▽ More This work analyzes the solution trajectory of gradient-based algorithms via a novel basis function decomposition. We show that, although solution trajectories of gradient-based algorithms may vary depending on the learning task, they behave almost monotonically when projected onto an appropriate orthonormal function basis. Such projection gives rise to a basis function decomposition of the solution trajectory. Theoretically, we use our proposed basis function decomposition to establish the convergence of gradient descent (GD) on several representative learning tasks. In particular, we improve the convergence of GD on symmetric matrix factorization and provide a completely new convergence result for the orthogonal symmetric tensor decomposition. Empirically, we illustrate the promise of our proposed framework on realistic deep neural networks (DNNs) across different architectures, gradient-based solvers, and datasets. Our key finding is that gradient-based algorithms monotonically learn the coefficients of a particular orthonormal function basis of DNNs defined as the eigenvectors of the conjugate kernel after training. Our code is available at https://github.com/jianhaoma/function-basis-decomposition. △ Less

Submitted 3 October, 2022; v1 submitted 1 October, 2022; originally announced October 2022.

arXiv:2207.07612 [pdf, other]

Blessing of Nonconvexity in Deep Linear Models: Depth Flattens the Optimization Landscape Around the True Solution

Authors: Jianhao Ma, Salar Fattahi

Abstract: This work characterizes the effect of depth on the optimization landscape of linear regression, showing that, despite their nonconvexity, deeper models have more desirable optimization landscape. We consider a robust and over-parameterized setting, where a subset of measurements are grossly corrupted with noise and the true linear model is captured via an $N$-layer linear neural network. On the ne… ▽ More This work characterizes the effect of depth on the optimization landscape of linear regression, showing that, despite their nonconvexity, deeper models have more desirable optimization landscape. We consider a robust and over-parameterized setting, where a subset of measurements are grossly corrupted with noise and the true linear model is captured via an $N$-layer linear neural network. On the negative side, we show that this problem \textit{does not} have a benign landscape: given any $N\geq 1$, with constant probability, there exists a solution corresponding to the ground truth that is neither local nor global minimum. However, on the positive side, we prove that, for any $N$-layer model with $N\geq 2$, a simple sub-gradient method becomes oblivious to such ``problematic'' solutions; instead, it converges to a balanced solution that is not only close to the ground truth but also enjoys a flat local landscape, thereby eschewing the need for "early stopping". Lastly, we empirically verify that the desirable optimization landscape of deeper models extends to other robust learning tasks, including deep matrix recovery and deep ReLU networks with $\ell_1$-loss. △ Less

Submitted 15 July, 2022; originally announced July 2022.

arXiv:2206.10174 [pdf, other]

Efficient Inference of Spatially-varying Gaussian Markov Random Fields with Applications in Gene Regulatory Networks

Authors: Visweswaran Ravikumar, Tong Xu, Wajd N. Al-Holou, Salar Fattahi, Arvind Rao

Abstract: In this paper, we study the problem of inferring spatially-varying Gaussian Markov random fields (SV-GMRF) where the goal is to learn a network of sparse, context-specific GMRFs representing network relationships between genes. An important application of SV-GMRFs is in inference of gene regulatory networks from spatially-resolved transcriptomics datasets. The current work on inference of SV-GMRFs… ▽ More In this paper, we study the problem of inferring spatially-varying Gaussian Markov random fields (SV-GMRF) where the goal is to learn a network of sparse, context-specific GMRFs representing network relationships between genes. An important application of SV-GMRFs is in inference of gene regulatory networks from spatially-resolved transcriptomics datasets. The current work on inference of SV-GMRFs are based on the regularized maximum likelihood estimation (MLE) and suffer from overwhelmingly high computational cost due to their highly nonlinear nature. To alleviate this challenge, we propose a simple and efficient optimization problem in lieu of MLE that comes equipped with strong statistical and computational guarantees. Our proposed optimization problem is extremely efficient in practice: we can solve instances of SV-GMRFs with more than 2 million variables in less than 2 minutes. We apply the developed framework to study how gene regulatory networks in Glioblastoma are spatially rewired within tissue, and identify prominent activity of the transcription factor HES4 and ribosomal proteins as characterizing the gene expression network in the tumor peri-vascular niche that is known to harbor treatment resistant stem cells. △ Less

Submitted 21 June, 2022; originally announced June 2022.

arXiv:2206.03345 [pdf, other]

Preconditioned Gradient Descent for Overparameterized Nonconvex Burer--Monteiro Factorization with Global Optimality Certification

Authors: Gavin Zhang, Salar Fattahi, Richard Y. Zhang

Abstract: We consider using gradient descent to minimize the nonconvex function $f(X)=φ(XX^{T})$ over an $n\times r$ factor matrix $X$, in which $φ$ is an underlying smooth convex cost function defined over $n\times n$ matrices. While only a second-order stationary point $X$ can be provably found in reasonable time, if $X$ is additionally rank deficient, then its rank deficiency certifies it as being global… ▽ More We consider using gradient descent to minimize the nonconvex function $f(X)=φ(XX^{T})$ over an $n\times r$ factor matrix $X$, in which $φ$ is an underlying smooth convex cost function defined over $n\times n$ matrices. While only a second-order stationary point $X$ can be provably found in reasonable time, if $X$ is additionally rank deficient, then its rank deficiency certifies it as being globally optimal. This way of certifying global optimality necessarily requires the search rank $r$ of the current iterate $X$ to be overparameterized with respect to the rank $r^{\star}$ of the global minimizer $X^{\star}$. Unfortunately, overparameterization significantly slows down the convergence of gradient descent, from a linear rate with $r=r^{\star}$ to a sublinear rate when $r>r^{\star}$, even when $φ$ is strongly convex. In this paper, we propose an inexpensive preconditioner that restores the convergence rate of gradient descent back to linear in the overparameterized case, while also making it agnostic to possible ill-conditioning in the global minimizer $X^{\star}$. △ Less

Submitted 21 April, 2025; v1 submitted 7 June, 2022; originally announced June 2022.

Comments: v2: accepted at JMLR. v3: minor correction in proof of Lemma 27

arXiv:2202.08788 [pdf, other]

Global Convergence of Sub-gradient Method for Robust Matrix Recovery: Small Initialization, Noisy Measurements, and Over-parameterization

Authors: Jianhao Ma, Salar Fattahi

Abstract: In this work, we study the performance of sub-gradient method (SubGM) on a natural nonconvex and nonsmooth formulation of low-rank matrix recovery with $\ell_1$-loss, where the goal is to recover a low-rank matrix from a limited number of measurements, a subset of which may be grossly corrupted with noise. We study a scenario where the rank of the true solution is unknown and over-estimated instea… ▽ More In this work, we study the performance of sub-gradient method (SubGM) on a natural nonconvex and nonsmooth formulation of low-rank matrix recovery with $\ell_1$-loss, where the goal is to recover a low-rank matrix from a limited number of measurements, a subset of which may be grossly corrupted with noise. We study a scenario where the rank of the true solution is unknown and over-estimated instead. The over-estimation of the rank gives rise to an over-parameterized model in which there are more degrees of freedom than needed. Such over-parameterization may lead to overfitting, or adversely affect the performance of the algorithm. We prove that a simple SubGM with small initialization is agnostic to both over-parameterization and noise in the measurements. In particular, we show that small initialization nullifies the effect of over-parameterization on the performance of SubGM, leading to an exponential improvement in its convergence rate. Moreover, we provide the first unifying framework for analyzing the behavior of SubGM under both outlier and Gaussian noise models, showing that SubGM converges to the true solution, even under arbitrarily large and arbitrarily dense noise values, and--perhaps surprisingly--even if the globally optimal solutions do not correspond to the ground truth. At the core of our results is a robust variant of restricted isometry property, called Sign-RIP, which controls the deviation of the sub-differential of the $\ell_1$-loss from that of an ideal, expected loss. As a byproduct of our results, we consider a subclass of robust low-rank matrix recovery with Gaussian measurements, and show that the number of required samples to guarantee the global convergence of SubGM is independent of the over-parameterized rank. △ Less

Submitted 17 February, 2022; originally announced February 2022.

arXiv:2110.12547 [pdf, other]

A Graph-based Decomposition Method for Convex Quadratic Optimization with Indicators

Authors: Peijing Liu, Salar Fattahi, Andrés Gómez, Simge Küçükyavuz

Abstract: In this paper, we consider convex quadratic optimization problems with indicator variables when the matrix $Q$ defining the quadratic term in the objective is sparse. We use a graphical representation of the support of $Q$, and show that if this graph is a path, then we can solve the associated problem in polynomial time. This enables us to construct a compact extended formulation for the closure… ▽ More In this paper, we consider convex quadratic optimization problems with indicator variables when the matrix $Q$ defining the quadratic term in the objective is sparse. We use a graphical representation of the support of $Q$, and show that if this graph is a path, then we can solve the associated problem in polynomial time. This enables us to construct a compact extended formulation for the closure of the convex hull of the epigraph of the mixed-integer convex problem. Furthermore, we propose a novel decomposition method for general (sparse) $Q$, which leverages the efficient algorithm for the path case. Our computational experiments demonstrate the effectiveness of the proposed method compared to state-of-the-art mixed-integer optimization solvers. △ Less

Submitted 24 October, 2021; originally announced October 2021.

arXiv:2102.03585 [pdf, other]

Scalable Inference of Sparsely-changing Markov Random Fields with Strong Statistical Guarantees

Authors: Salar Fattahi, Andres Gomez

Abstract: In this paper, we study the problem of inferring time-varying Markov random fields (MRF), where the underlying graphical model is both sparse and changes sparsely over time. Most of the existing methods for the inference of time-varying MRFs rely on the regularized maximum likelihood estimation (MLE), that typically suffer from weak statistical guarantees and high computational time. Instead, we i… ▽ More In this paper, we study the problem of inferring time-varying Markov random fields (MRF), where the underlying graphical model is both sparse and changes sparsely over time. Most of the existing methods for the inference of time-varying MRFs rely on the regularized maximum likelihood estimation (MLE), that typically suffer from weak statistical guarantees and high computational time. Instead, we introduce a new class of constrained optimization problems for the inference of sparsely-changing MRFs. The proposed optimization problem is formulated based on the exact $\ell_0$ regularization, and can be solved in near-linear time and memory. Moreover, we show that the proposed estimator enjoys a provably small estimation error. As a special case, we derive sharp statistical guarantees for the inference of sparsely-changing Gaussian MRFs (GMRF) in the high-dimensional regime, showing that such problems can be learned with as few as one sample per time. Our proposed method is extremely efficient in practice: it can accurately estimate sparsely-changing graphical models with more than 500 million variables in less than one hour. △ Less

Submitted 6 February, 2021; originally announced February 2021.

arXiv:2102.02969 [pdf, other]

Sign-RIP: A Robust Restricted Isometry Property for Low-rank Matrix Recovery

Authors: Jianhao Ma, Salar Fattahi

Abstract: Restricted isometry property (RIP), essentially stating that the linear measurements are approximately norm-preserving, plays a crucial role in studying low-rank matrix recovery problem. However, RIP fails in the robust setting, when a subset of the measurements are grossly corrupted with noise. In this work, we propose a robust restricted isometry property, called Sign-RIP, and show its broad app… ▽ More Restricted isometry property (RIP), essentially stating that the linear measurements are approximately norm-preserving, plays a crucial role in studying low-rank matrix recovery problem. However, RIP fails in the robust setting, when a subset of the measurements are grossly corrupted with noise. In this work, we propose a robust restricted isometry property, called Sign-RIP, and show its broad applications in robust low-rank matrix recovery. In particular, we show that Sign-RIP can guarantee the uniform convergence of the subdifferentials of the robust matrix recovery with nonsmooth loss function, even at the presence of arbitrarily dense and arbitrarily large outliers. Based on Sign-RIP, we characterize the location of the critical points in the robust rank-1 matrix recovery, and prove that they are either close to the true solution, or have small norm. Moreover, in the over-parameterized regime, where the rank of the true solution is over-estimated, we show that subgradient method converges to the true solution at a (nearly) dimension-free rate. Finally, we show that sign-RIP enjoys almost the same complexity as its classical counterparts, but provides significantly better robustness against noise. △ Less

Submitted 28 September, 2021; v1 submitted 4 February, 2021; originally announced February 2021.

arXiv:2010.04015 [pdf, other]

Learning Partially Observed Linear Dynamical Systems from Logarithmic Number of Samples

Authors: Salar Fattahi

Abstract: In this work, we study the problem of learning partially observed linear dynamical systems from a single sample trajectory. A major practical challenge in the existing system identification methods is the undesirable dependency of their required sample size on the system dimension: roughly speaking, they presume and rely on sample sizes that scale linearly with respect to the system dimension. Evi… ▽ More In this work, we study the problem of learning partially observed linear dynamical systems from a single sample trajectory. A major practical challenge in the existing system identification methods is the undesirable dependency of their required sample size on the system dimension: roughly speaking, they presume and rely on sample sizes that scale linearly with respect to the system dimension. Evidently, in high-dimensional regime where the system dimension is large, it may be costly, if not impossible, to collect as many samples from the unknown system. In this paper, we will remedy this undesirable dependency on the system dimension by introducing an $\ell_1$-regularized estimation method that can accurately estimate the Markov parameters of the system, provided that the number of samples scale logarithmically with the system dimension. Our result significantly improves the sample complexity of learning partially observed linear dynamical systems: it shows that the Markov parameters of the system can be learned in the high-dimensional setting, where the number of samples is significantly smaller than the system dimension. Traditionally, the $\ell_1$-regularized estimators have been used to promote sparsity in the estimated parameters. By resorting to the notion of "weak sparsity", we show that, irrespective of the true sparsity of the system, a similar regularized estimator can be used to reduce the sample complexity of learning partially observed linear systems, provided that the true system is inherently stable. △ Less

Submitted 8 October, 2020; originally announced October 2020.

arXiv:1909.09895 [pdf, other]

Efficient Learning of Distributed Linear-Quadratic Controllers

Authors: Salar Fattahi, Nikolai Matni, Somayeh Sojoudi

Abstract: In this work, we propose a robust approach to design distributed controllers for unknown-but-sparse linear and time-invariant systems. By leveraging modern techniques in distributed controller synthesis and structured linear inverse problems as applied to system identification, we show that near-optimal distributed controllers can be learned with sub-linear sample complexity and computed with near… ▽ More In this work, we propose a robust approach to design distributed controllers for unknown-but-sparse linear and time-invariant systems. By leveraging modern techniques in distributed controller synthesis and structured linear inverse problems as applied to system identification, we show that near-optimal distributed controllers can be learned with sub-linear sample complexity and computed with near-linear time complexity, both measured with respect to the dimension of the system. In particular, we provide sharp end-to-end guarantees on the stability and the performance of the designed distributed controller and prove that for sparse systems, the number of samples needed to guarantee robust and near optimal performance of the designed controller can be significantly smaller than the dimension of the system. Finally, we show that the proposed optimization problem can be solved to global optimality with near-linear time complexity by iteratively solving a series of small quadratic programs. △ Less

Submitted 10 October, 2019; v1 submitted 21 September, 2019; originally announced September 2019.

arXiv:1905.09937 [pdf, other]

On the Absence of Spurious Local Trajectories in Time-varying Nonconvex Optimization

Authors: S. Fattahi, C. Josz, Y. Ding, R. Mohammadi, J. Lavaei, S. Sojoudi

Abstract: In this paper, we study the landscape of an online nonconvex optimization problem, for which the input data vary over time and the solution is a trajectory rather than a single point. To understand the complexity of finding a global solution of this problem, we introduce the notion of \textit{spurious (i.e., non-global) local trajectory} as a generalization to the notion of spurious local solution… ▽ More In this paper, we study the landscape of an online nonconvex optimization problem, for which the input data vary over time and the solution is a trajectory rather than a single point. To understand the complexity of finding a global solution of this problem, we introduce the notion of \textit{spurious (i.e., non-global) local trajectory} as a generalization to the notion of spurious local solution in nonconvex (time-invariant) optimization. We develop an ordinary differential equation (ODE) associated with a time-varying nonlinear dynamical system which, at limit, characterizes the spurious local solutions of the time-varying optimization problem. We prove that the absence of spurious local trajectory is closely related to the transient behavior of the developed system. In particular, we show that if the problem is time-varying, the data variation may force all of the ODE trajectories initialized at arbitrary local minima at the initial time to gradually converge to the global solution trajectory. We study the Jacobian of the dynamical system along a local minimum trajectory and show how its eigenvalues are manipulated by the natural data variation in the problem, which may consequently trigger escaping poor local minima over time. △ Less

Submitted 30 October, 2020; v1 submitted 23 May, 2019; originally announced May 2019.

arXiv:1904.09396 [pdf, ps, other]

Learning Sparse Dynamical Systems from a Single Sample Trajectory

Authors: Salar Fattahi, Nikolai Matni, Somayeh Sojoudi

Abstract: This paper addresses the problem of identifying sparse linear time-invariant (LTI) systems from a single sample trajectory generated by the system dynamics. We introduce a Lasso-like estimator for the parameters of the system, taking into account their sparse nature. Assuming that the system is stable, or that it is equipped with an initial stabilizing controller, we provide sharp finite-time guar… ▽ More This paper addresses the problem of identifying sparse linear time-invariant (LTI) systems from a single sample trajectory generated by the system dynamics. We introduce a Lasso-like estimator for the parameters of the system, taking into account their sparse nature. Assuming that the system is stable, or that it is equipped with an initial stabilizing controller, we provide sharp finite-time guarantees on the accurate recovery of both the sparsity structure and the parameter values of the system. In particular, we show that the proposed estimator can correctly identify the sparsity pattern of the system matrices with high probability, provided that the length of the sample trajectory exceeds a threshold. Furthermore, we show that this threshold scales polynomially in the number of nonzero elements in the system matrices, but logarithmically in the system dimensions --- this improves on existing sample complexity bounds for the sparse system identification problem. We further extend these results to obtain sharp bounds on the $\ell_{\infty}$-norm of the estimation error and show how different properties of the system---such as its stability level and \textit{mutual incoherency}---affect this bound. Finally, an extensive case study on power systems is presented to illustrate the performance of the proposed estimation method. △ Less

Submitted 19 April, 2019; originally announced April 2019.

arXiv:1812.11466 [pdf, other]

Exact Guarantees on the Absence of Spurious Local Minima for Non-negative Rank-1 Robust Principal Component Analysis

Authors: Salar Fattahi, Somayeh Sojoudi

Abstract: This work is concerned with the non-negative rank-1 robust principal component analysis (RPCA), where the goal is to recover the dominant non-negative principal components of a data matrix precisely, where a number of measurements could be grossly corrupted with sparse and arbitrary large noise. Most of the known techniques for solving the RPCA rely on convex relaxation methods by lifting the prob… ▽ More This work is concerned with the non-negative rank-1 robust principal component analysis (RPCA), where the goal is to recover the dominant non-negative principal components of a data matrix precisely, where a number of measurements could be grossly corrupted with sparse and arbitrary large noise. Most of the known techniques for solving the RPCA rely on convex relaxation methods by lifting the problem to a higher dimension, which significantly increase the number of variables. As an alternative, the well-known Burer-Monteiro approach can be used to cast the RPCA as a non-convex and non-smooth $\ell_1$ optimization problem with a significantly smaller number of variables. In this work, we show that the low-dimensional formulation of the symmetric and asymmetric positive rank-1 RPCA based on the Burer-Monteiro approach has benign landscape, i.e., 1) it does not have any spurious local solution, 2) has a unique global solution, and 3) its unique global solution coincides with the true components. An implication of this result is that simple local search algorithms are guaranteed to achieve a zero global optimality gap when directly applied to the low-dimensional formulation. Furthermore, we provide strong deterministic and probabilistic guarantees for the exact recovery of the true principal components. In particular, it is shown that a constant fraction of the measurements could be grossly corrupted and yet they would not create any spurious local solution. △ Less

Submitted 3 September, 2019; v1 submitted 29 December, 2018; originally announced December 2018.

arXiv:1803.07753 [pdf, ps, other]

Sample Complexity of Sparse System Identification Problem

Authors: Salar Fattahi, Somayeh Sojoudi

Abstract: In this paper, we study the system identification problem for sparse linear time-invariant systems. We propose a sparsity promoting block-regularized estimator to identify the dynamics of the system with only a limited number of input-state data samples. We characterize the properties of this estimator under high-dimensional scaling, where the growth rate of the system dimension is comparable to o… ▽ More In this paper, we study the system identification problem for sparse linear time-invariant systems. We propose a sparsity promoting block-regularized estimator to identify the dynamics of the system with only a limited number of input-state data samples. We characterize the properties of this estimator under high-dimensional scaling, where the growth rate of the system dimension is comparable to or even faster than that of the number of available sample trajectories. In particular, using contemporary results on high-dimensional statistics, we show that the proposed estimator results in a small element-wise error, provided that the number of sample trajectories is above a threshold. This threshold depends polynomially on the size of each block and the number of nonzero elements at different rows of input and state matrices, but only logarithmically on the system dimension. A by-product of this result is that the number of sample trajectories required for sparse system identification is significantly smaller than the dimension of the system. Furthermore, we show that, unlike the recently celebrated least-squares estimators for system identification problems, the method developed in this work is capable of \textit{exact recovery} of the underlying sparsity structure of the system with the aforementioned number of data samples. Extensive case studies on synthetically generated systems, physical mass-spring networks, and multi-agent systems are offered to demonstrate the effectiveness of the proposed method. △ Less

Submitted 26 August, 2018; v1 submitted 21 March, 2018; originally announced March 2018.

arXiv:1802.04911 [pdf, ps, other]

Large-Scale Sparse Inverse Covariance Estimation via Thresholding and Max-Det Matrix Completion

Authors: Richard Y. Zhang, Salar Fattahi, Somayeh Sojoudi

Abstract: The sparse inverse covariance estimation problem is commonly solved using an $\ell_{1}$-regularized Gaussian maximum likelihood estimator known as "graphical lasso", but its computational cost becomes prohibitive for large data sets. A recent line of results showed--under mild assumptions--that the graphical lasso estimator can be retrieved by soft-thresholding the sample covariance matrix and sol… ▽ More The sparse inverse covariance estimation problem is commonly solved using an $\ell_{1}$-regularized Gaussian maximum likelihood estimator known as "graphical lasso", but its computational cost becomes prohibitive for large data sets. A recent line of results showed--under mild assumptions--that the graphical lasso estimator can be retrieved by soft-thresholding the sample covariance matrix and solving a maximum determinant matrix completion (MDMC) problem. This paper proves an extension of this result, and describes a Newton-CG algorithm to efficiently solve the MDMC problem. Assuming that the thresholded sample covariance matrix is sparse with a sparse Cholesky factorization, we prove that the algorithm converges to an $ε$-accurate solution in $O(n\log(1/ε))$ time and $O(n)$ memory. The algorithm is highly efficient in practice: we solve the associated MDMC problems with as many as 200,000 variables to 7-9 digits of accuracy in less than an hour on a standard laptop computer running MATLAB. △ Less

Submitted 6 June, 2018; v1 submitted 13 February, 2018; originally announced February 2018.

Comments: 35-th International Conference on Machine Learning (ICML 2018)

arXiv:1711.10428 [pdf, other]

A Bound Strengthening Method for Optimal Transmission Switching in Power Systems

Authors: Salar Fattahi, Javad Lavaei, Alper Atamturk

Abstract: This paper studies the optimal transmission switching (OTS) problem for power systems, where certain lines are fixed (uncontrollable) and the remaining ones are controllable via on/off switches. The goal is to identify a topology of the power grid that minimizes the cost of the system operation while satisfying the physical and operational constraints. Most of the existing methods for the problem… ▽ More This paper studies the optimal transmission switching (OTS) problem for power systems, where certain lines are fixed (uncontrollable) and the remaining ones are controllable via on/off switches. The goal is to identify a topology of the power grid that minimizes the cost of the system operation while satisfying the physical and operational constraints. Most of the existing methods for the problem are based on first converting the OTS into a mixed-integer linear program (MILP) or mixed-integer quadratic program (MIQP), and then iteratively solving a series of its convex relaxations. The performance of these methods depends heavily on the strength of the MILP or MIQP formulations. In this paper, it is shown that finding the strongest variable upper and lower bounds to be used in an MILP or MIQP formulation of the OTS based on the big-$M$ or McCormick inequalities is NP-hard. Furthermore, it is proven that unless P=NP, there is no constant-factor approximation algorithm for constructing these variable bounds. Despite the inherent difficulty of obtaining the strongest bounds in general, a simple bound strengthening method is presented to strengthen the convex relaxation of the problem when there exists a connected spanning subnetwork of the system with fixed lines. The proposed method can be treated as a preprocessing step that is independent of the solver to be later used for numerical calculations and can be carried out offline before initiating the solver. A remarkable speedup in the runtime of the mixed-integer solvers is obtained using the proposed bound strengthening method for medium- and large-scale real-world systems. △ Less

Submitted 28 November, 2017; originally announced November 2017.

Report number: BCOL Research Report 17.06, IEOR, University of California-Berkeley

arXiv:1711.09131 [pdf, ps, other]

Sparse Inverse Covariance Estimation for Chordal Structures

Authors: Salar Fattahi, Richard Y. Zhang, Somayeh Sojoudi

Abstract: In this paper, we consider the Graphical Lasso (GL), a popular optimization problem for learning the sparse representations of high-dimensional datasets, which is well-known to be computationally expensive for large-scale problems. Recently, we have shown that the sparsity pattern of the optimal solution of GL is equivalent to the one obtained from simply thresholding the sample covariance matrix,… ▽ More In this paper, we consider the Graphical Lasso (GL), a popular optimization problem for learning the sparse representations of high-dimensional datasets, which is well-known to be computationally expensive for large-scale problems. Recently, we have shown that the sparsity pattern of the optimal solution of GL is equivalent to the one obtained from simply thresholding the sample covariance matrix, for sparse graphs under different conditions. We have also derived a closed-form solution that is optimal when the thresholded sample covariance matrix has an acyclic structure. As a major generalization of the previous result, in this paper we derive a closed-form solution for the GL for graphs with chordal structures. We show that the GL and thresholding equivalence conditions can significantly be simplified and are expected to hold for high-dimensional problems if the thresholded sample covariance matrix has a chordal structure. We then show that the GL and thresholding equivalence is enough to reduce the GL to a maximum determinant matrix completion problem and drive a recursive closed-form solution for the GL when the thresholded sample covariance matrix has a chordal structure. For large-scale problems with up to 450 million variables, the proposed method can solve the GL problem in less than 2 minutes, while the state-of-the-art methods converge in more than 2 hours. △ Less

Submitted 24 November, 2017; originally announced November 2017.

arXiv:1708.09479 [pdf, ps, other]

Graphical Lasso and Thresholding: Equivalence and Closed-form Solutions

Authors: Salar Fattahi, Somayeh Sojoudi

Abstract: Graphical Lasso (GL) is a popular method for learning the structure of an undirected graphical model, which is based on an $l_1$ regularization technique. The objective of this paper is to compare the computationally-heavy GL technique with a numerically-cheap heuristic method that is based on simply thresholding the sample covariance matrix. To this end, two notions of sign-consistent and inverse… ▽ More Graphical Lasso (GL) is a popular method for learning the structure of an undirected graphical model, which is based on an $l_1$ regularization technique. The objective of this paper is to compare the computationally-heavy GL technique with a numerically-cheap heuristic method that is based on simply thresholding the sample covariance matrix. To this end, two notions of sign-consistent and inverse-consistent matrices are developed, and then it is shown that the thresholding and GL methods are equivalent if: (i) the thresholded sample covariance matrix is both sign-consistent and inverse-consistent, and (ii) the gap between the largest thresholded and the smallest un-thresholded entries of the sample covariance matrix is not too small. By building upon this result, it is proved that the GL method---as a conic optimization problem---has an explicit closed-form solution if the thresholded sample covariance matrix has an acyclic structure. This result is then generalized to arbitrary sparse support graphs, where a formula is found to obtain an approximate solution of GL. Furthermore, it is shown that the approximation error of the derived explicit formula decreases exponentially fast with respect to the length of the minimum-length cycle of the sparsity graph. The developed results are demonstrated on synthetic data, functional MRI data, traffic flows for transportation networks, and massive randomly generated data sets. We show that the proposed method can obtain an accurate approximation of the GL for instances with the sizes as large as $80,000\times 80,000$ (more than 3.2 billion variables) in less than 30 minutes on a standard laptop computer running MATLAB, while other state-of-the-art methods do not converge within 4 hours. △ Less

Submitted 28 June, 2019; v1 submitted 30 August, 2017; originally announced August 2017.

Showing 1–31 of 31 results for author: Fattahi, S