Search | arXiv e-print repository

Gradient Methods with Online Scaling Part I. Theoretical Foundations

Authors: Wenzhi Gao, Ya-Chi Chu, Yinyu Ye, Madeleine Udell

Abstract: This paper establishes the theoretical foundations of the online scaled gradient methods (OSGM), a framework that utilizes online learning to adapt stepsizes and provably accelerate first-order methods. OSGM quantifies the effectiveness of a stepsize by a feedback function motivated from a convergence measure and uses the feedback to adjust the stepsize through an online learning algorithm. Conseq… ▽ More This paper establishes the theoretical foundations of the online scaled gradient methods (OSGM), a framework that utilizes online learning to adapt stepsizes and provably accelerate first-order methods. OSGM quantifies the effectiveness of a stepsize by a feedback function motivated from a convergence measure and uses the feedback to adjust the stepsize through an online learning algorithm. Consequently, instantiations of OSGM achieve convergence rates that are asymptotically no worse than the optimal stepsize. OSGM yields desirable convergence guarantees on smooth convex problems, including 1) trajectory-dependent global convergence on smooth convex objectives; 2) an improved complexity result on smooth strongly convex problems, and 3) local superlinear convergence. Notably, OSGM constitutes a new family of first-order methods with non-asymptotic superlinear convergence, joining the celebrated quasi-Newton methods. Finally, OSGM explains the empirical success of the popular hypergradient-descent heuristic in optimization for machine learning. △ Less

Submitted 29 May, 2025; originally announced May 2025.

Comments: Extension of arXiv:2411.01803 and arXiv:2502.11229

arXiv:2505.13723 [pdf, ps, other]

Turbocharging Gaussian Process Inference with Approximate Sketch-and-Project

Authors: Pratik Rathore, Zachary Frangella, Sachin Garg, Shaghayegh Fazliani, Michał Dereziński, Madeleine Udell

Abstract: Gaussian processes (GPs) play an essential role in biostatistics, scientific machine learning, and Bayesian optimization for their ability to provide probabilistic predictions and model uncertainty. However, GP inference struggles to scale to large datasets (which are common in modern applications), since it requires the solution of a linear system whose size scales quadratically with the number o… ▽ More Gaussian processes (GPs) play an essential role in biostatistics, scientific machine learning, and Bayesian optimization for their ability to provide probabilistic predictions and model uncertainty. However, GP inference struggles to scale to large datasets (which are common in modern applications), since it requires the solution of a linear system whose size scales quadratically with the number of samples in the dataset. We propose an approximate, distributed, accelerated sketch-and-project algorithm ($\texttt{ADASAP}$) for solving these linear systems, which improves scalability. We use the theory of determinantal point processes to show that the posterior mean induced by sketch-and-project rapidly converges to the true posterior mean. In particular, this yields the first efficient, condition number-free algorithm for estimating the posterior mean along the top spectral basis functions, showing that our approach is principled for GP inference. $\texttt{ADASAP}$ outperforms state-of-the-art solvers based on conjugate gradient and coordinate descent across several benchmark datasets and a large-scale Bayesian optimization task. Moreover, $\texttt{ADASAP}$ scales to a dataset with $> 3 \cdot 10^8$ samples, a feat which has not been accomplished in the literature. △ Less

Submitted 19 May, 2025; originally announced May 2025.

Comments: 28 pages, 6 figures, 2 tables

arXiv:2502.16380 [pdf, ps, other]

Understanding Fixed Predictions via Confined Regions

Authors: Connor Lawless, Tsui-Wei Weng, Berk Ustun, Madeleine Udell

Abstract: Machine learning models can assign fixed predictions that preclude individuals from changing their outcome. Existing approaches to audit fixed predictions do so on a pointwise basis, which requires access to an existing dataset of individuals and may fail to anticipate fixed predictions in out-of-sample data. This work presents a new paradigm to identify fixed predictions by finding confined regio… ▽ More Machine learning models can assign fixed predictions that preclude individuals from changing their outcome. Existing approaches to audit fixed predictions do so on a pointwise basis, which requires access to an existing dataset of individuals and may fail to anticipate fixed predictions in out-of-sample data. This work presents a new paradigm to identify fixed predictions by finding confined regions of the feature space in which all individuals receive fixed predictions. This paradigm enables the certification of recourse for out-of-sample data, works in settings without representative datasets, and provides interpretable descriptions of individuals with fixed predictions. We develop a fast method to discover confined regions for linear classifiers using mixed-integer quadratically constrained programming. We conduct a comprehensive empirical study of confined regions across diverse applications. Our results highlight that existing pointwise verification methods fail to anticipate future individuals with fixed predictions, while our method both identifies them and provides an interpretable description. △ Less

Submitted 8 July, 2025; v1 submitted 22 February, 2025; originally announced February 2025.

arXiv:2502.11229 [pdf, other]

Provable and Practical Online Learning Rate Adaptation with Hypergradient Descent

Authors: Ya-Chi Chu, Wenzhi Gao, Yinyu Ye, Madeleine Udell

Abstract: This paper investigates the convergence properties of the hypergradient descent method (HDM), a 25-year-old heuristic originally proposed for adaptive stepsize selection in stochastic first-order methods. We provide the first rigorous convergence analysis of HDM using the online learning framework of [Gao24] and apply this analysis to develop new state-of-the-art adaptive gradient methods with emp… ▽ More This paper investigates the convergence properties of the hypergradient descent method (HDM), a 25-year-old heuristic originally proposed for adaptive stepsize selection in stochastic first-order methods. We provide the first rigorous convergence analysis of HDM using the online learning framework of [Gao24] and apply this analysis to develop new state-of-the-art adaptive gradient methods with empirical and theoretical support. Notably, HDM automatically identifies the optimal stepsize for the local optimization landscape and achieves local superlinear convergence. Our analysis explains the instability of HDM reported in the literature and proposes efficient strategies to address it. We also develop two HDM variants with heavy-ball and Nesterov momentum. Experiments on deterministic convex problems show HDM with heavy-ball momentum (HDM-HB) exhibits robust performance and significantly outperforms other adaptive first-order methods. Moreover, HDM-HB often matches the performance of L-BFGS, an efficient and practical quasi-Newton method, using less memory and cheaper iterations. △ Less

Submitted 16 March, 2025; v1 submitted 16 February, 2025; originally announced February 2025.

arXiv:2501.04972 [pdf, ps, other]

Algebraic characterization of equivalence between optimization algorithms

Authors: Laurent Lessard, Madeleine Udell

Abstract: When are two algorithms the same? How can we be sure a recently proposed algorithm is novel, and not a minor twist on an existing method? In this paper, we present a framework for reasoning about equivalence between a broad class of iterative algorithms, with a focus on algorithms designed for convex optimization. We propose several notions of what it means for two algorithms to be equivalent, and… ▽ More When are two algorithms the same? How can we be sure a recently proposed algorithm is novel, and not a minor twist on an existing method? In this paper, we present a framework for reasoning about equivalence between a broad class of iterative algorithms, with a focus on algorithms designed for convex optimization. We propose several notions of what it means for two algorithms to be equivalent, and provide computationally tractable means to detect equivalence. Our main definition, oracle equivalence, states that two algorithms are equivalent if they result in the same sequence of calls to the function oracles (for suitable initialization). Borrowing from control theory, we use state-space realizations to represent algorithms and characterize algorithm equivalence via transfer functions. Our framework can also identify and characterize equivalence between algorithms that use different oracles that are related via a linear fractional transformation. Prominent examples include linear transformations and function conjugation. △ Less

Submitted 9 January, 2025; originally announced January 2025.

Comments: This paper generalizes and provides new analysis and examples compared to arxiv:2105.04684

arXiv:2411.16015 [pdf, other]

When Does Primal Interior Point Method Beat Primal-dual in Linear Optimization?

Authors: Wenzhi Gao, Huikang Liu, Yinyu Ye, Madeleine Udell

Abstract: The primal-dual interior point method (IPM) is widely regarded as the most efficient IPM variant for linear optimization. In this paper, we demonstrate that the improved stability of the pure primal IPM can allow speedups relative to a primal-dual solver, particularly as the IPM approaches convergence. The stability of the primal scaling matrix makes it possible to accelerate each primal IPM step… ▽ More The primal-dual interior point method (IPM) is widely regarded as the most efficient IPM variant for linear optimization. In this paper, we demonstrate that the improved stability of the pure primal IPM can allow speedups relative to a primal-dual solver, particularly as the IPM approaches convergence. The stability of the primal scaling matrix makes it possible to accelerate each primal IPM step using fast preconditioned iterative solvers for the normal equations. Crucially, we identify properties of the central path that make it possible to stabilize the normal equations. Experiments on benchmark datasets demonstrate the efficiency of primal IPM and showcase its potential for practical applications in linear optimization and beyond. △ Less

Submitted 24 November, 2024; originally announced November 2024.

arXiv:2411.01803 [pdf, other]

Gradient Methods with Online Scaling

Authors: Wenzhi Gao, Ya-Chi Chu, Yinyu Ye, Madeleine Udell

Abstract: We introduce a framework to accelerate the convergence of gradient-based methods with online learning. The framework learns to scale the gradient at each iteration through an online learning algorithm and provably accelerates gradient-based methods asymptotically. In contrast with previous literature, where convergence is established based on worst-case analysis, our framework provides a strong co… ▽ More We introduce a framework to accelerate the convergence of gradient-based methods with online learning. The framework learns to scale the gradient at each iteration through an online learning algorithm and provably accelerates gradient-based methods asymptotically. In contrast with previous literature, where convergence is established based on worst-case analysis, our framework provides a strong convergence guarantee with respect to the optimal scaling matrix for the iteration trajectory. For smooth strongly convex optimization, our results provide an $O(κ^\star \log(1/\varepsilon)$) complexity result, where $κ^\star$ is the condition number achievable by the optimal preconditioner, improving on the previous $O(\sqrt{n}κ^\star \log(1/\varepsilon))$ result. In particular, a variant of our method achieves superlinear convergence on convex quadratics. For smooth convex optimization, we show for the first time that the widely-used hypergradient descent heuristic improves on the convergence of gradient descent. △ Less

Submitted 5 November, 2024; v1 submitted 4 November, 2024; originally announced November 2024.

arXiv:2407.10070 [pdf, other]

Have ASkotch: A Neat Solution for Large-scale Kernel Ridge Regression

Authors: Pratik Rathore, Zachary Frangella, Jiaming Yang, Michał Dereziński, Madeleine Udell

Abstract: Kernel ridge regression (KRR) is a fundamental computational tool, appearing in problems that range from computational chemistry to health analytics, with a particular interest due to its starring role in Gaussian process regression. However, full KRR solvers are challenging to scale to large datasets: both direct (i.e., Cholesky decomposition) and iterative methods (i.e., PCG) incur prohibitive c… ▽ More Kernel ridge regression (KRR) is a fundamental computational tool, appearing in problems that range from computational chemistry to health analytics, with a particular interest due to its starring role in Gaussian process regression. However, full KRR solvers are challenging to scale to large datasets: both direct (i.e., Cholesky decomposition) and iterative methods (i.e., PCG) incur prohibitive computational and storage costs. The standard approach to scale KRR to large datasets chooses a set of inducing points and solves an approximate version of the problem, inducing points KRR. However, the resulting solution tends to have worse predictive performance than the full KRR solution. In this work, we introduce a new solver, ASkotch, for full KRR that provides better solutions faster than state-of-the-art solvers for full and inducing points KRR. ASkotch is a scalable, accelerated, iterative method for full KRR that provably obtains linear convergence. Under appropriate conditions, we show that ASkotch obtains condition-number-free linear convergence. This convergence analysis rests on the theory of ridge leverage scores and determinantal point processes. ASkotch outperforms state-of-the-art KRR solvers on a testbed of 23 large-scale KRR regression and classification tasks derived from a wide range of application domains, demonstrating the superiority of full KRR over inducing points KRR. Our work opens up the possibility of as-yet-unimagined applications of full KRR across a number of disciplines. △ Less

Submitted 21 February, 2025; v1 submitted 14 July, 2024; originally announced July 2024.

Comments: 64 pages (including appendices), 16 figures, 5 tables

MSC Class: 65F10; 68W20; 90C06

arXiv:2404.14524 [pdf, other]

Randomized Nyström Preconditioned Interior Point-Proximal Method of Multipliers

Authors: Ya-Chi Chu, Luiz-Rafael Santos, Madeleine Udell

Abstract: We present a new algorithm for convex separable quadratic programming (QP) called Nys-IP-PMM, a regularized interior-point solver that uses low-rank structure to accelerate solution of the Newton system. The algorithm combines the interior point proximal method of multipliers (IP-PMM) with the randomized Nyström preconditioned conjugate gradient method as the inner linear system solver. Our algori… ▽ More We present a new algorithm for convex separable quadratic programming (QP) called Nys-IP-PMM, a regularized interior-point solver that uses low-rank structure to accelerate solution of the Newton system. The algorithm combines the interior point proximal method of multipliers (IP-PMM) with the randomized Nyström preconditioned conjugate gradient method as the inner linear system solver. Our algorithm is matrix-free: it accesses the input matrices solely through matrix-vector products, as opposed to methods involving matrix factorization. It works particularly well for separable QP instances with dense constraint matrices. We establish convergence of Nys-IP-PMM. Numerical experiments demonstrate its superior performance in terms of wallclock time compared to previous matrix-free IPM-based approaches. △ Less

Submitted 13 January, 2025; v1 submitted 22 April, 2024; originally announced April 2024.

MSC Class: 90C06; 90C20; 90C51; 65F08

arXiv:2402.01868 [pdf, other]

Challenges in Training PINNs: A Loss Landscape Perspective

Authors: Pratik Rathore, Weimu Lei, Zachary Frangella, Lu Lu, Madeleine Udell

Abstract: This paper explores challenges in training Physics-Informed Neural Networks (PINNs), emphasizing the role of the loss landscape in the training process. We examine difficulties in minimizing the PINN loss function, particularly due to ill-conditioning caused by differential operators in the residual term. We compare gradient-based optimizers Adam, L-BFGS, and their combination Adam+L-BFGS, showing… ▽ More This paper explores challenges in training Physics-Informed Neural Networks (PINNs), emphasizing the role of the loss landscape in the training process. We examine difficulties in minimizing the PINN loss function, particularly due to ill-conditioning caused by differential operators in the residual term. We compare gradient-based optimizers Adam, L-BFGS, and their combination Adam+L-BFGS, showing the superiority of Adam+L-BFGS, and introduce a novel second-order optimizer, NysNewton-CG (NNCG), which significantly improves PINN performance. Theoretically, our work elucidates the connection between ill-conditioned differential operators and ill-conditioning in the PINN loss and shows the benefits of combining first- and second-order optimization methods. Our work presents valuable insights and more powerful optimization strategies for training PINNs, which could improve the utility of PINNs for solving difficult partial differential equations. △ Less

Submitted 3 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: ICML 2024 Oral; 33 pages (including appendices), 10 figures, 3 tables

arXiv:2312.15594 [pdf, other]

Scalable Approximate Optimal Diagonal Preconditioning

Authors: Wenzhi Gao, Zhaonan Qu, Madeleine Udell, Yinyu Ye

Abstract: We consider the problem of finding the optimal diagonal preconditioner for a positive definite matrix. Although this problem has been shown to be solvable and various methods have been proposed, none of the existing approaches are scalable to matrices of large dimension, or when access is limited to black-box matrix-vector products, thereby significantly limiting their practical application. In vi… ▽ More We consider the problem of finding the optimal diagonal preconditioner for a positive definite matrix. Although this problem has been shown to be solvable and various methods have been proposed, none of the existing approaches are scalable to matrices of large dimension, or when access is limited to black-box matrix-vector products, thereby significantly limiting their practical application. In view of these challenges, we propose practical algorithms applicable to finding approximate optimal diagonal preconditioners of large sparse systems. Our approach is based on the idea of dimension reduction, and combines techniques from semi-definite programming (SDP), random projection, semi-infinite programming (SIP), and column generation. Numerical experiments demonstrate that our method scales to sparse matrices of size greater than $10^7$. Notably, our approach is efficient and implementable using only black-box matrix-vector product operations, making it highly practical for a wide variety of applications. △ Less

Submitted 5 November, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

arXiv:2310.08333 [pdf, other]

GeNIOS: an (almost) second-order operator-splitting solver for large-scale convex optimization

Authors: Theo Diamandis, Zachary Frangella, Shipu Zhao, Bartolomeo Stellato, Madeleine Udell

Abstract: We introduce the GEneralized Newton Inexact Operator Splitting solver (GeNIOS) for large-scale convex optimization. GeNIOS speeds up ADMM by approximately solving approximate subproblems: it uses a second-order approximation to the most challenging ADMM subproblem and solves it inexactly with a fast randomized solver. Despite these approximations, GeNIOS retains the convergence rate of classic ADM… ▽ More We introduce the GEneralized Newton Inexact Operator Splitting solver (GeNIOS) for large-scale convex optimization. GeNIOS speeds up ADMM by approximately solving approximate subproblems: it uses a second-order approximation to the most challenging ADMM subproblem and solves it inexactly with a fast randomized solver. Despite these approximations, GeNIOS retains the convergence rate of classic ADMM and can detect primal and dual infeasibility from the algorithm iterates. At each iteration, the algorithm solves a positive-definite linear system that arises from a second-order approximation of the first subproblem and computes an approximate proximal operator. GeNIOS solves the linear system using an indirect solver with a randomized preconditioner, making it particularly useful for large-scale problems with dense data. Our high-performance open-source implementation in Julia allows users to specify convex optimization problems directly (with or without conic reformulation) and allows extensive customization. We illustrate GeNIOS's performance on a variety of problem types. Notably, GeNIOS is more than ten times faster than existing solvers on large-scale, dense problems. △ Less

Submitted 12 October, 2023; originally announced October 2023.

arXiv:2309.02014 [pdf, other]

PROMISE: Preconditioned Stochastic Optimization Methods by Incorporating Scalable Curvature Estimates

Authors: Zachary Frangella, Pratik Rathore, Shipu Zhao, Madeleine Udell

Abstract: This paper introduces PROMISE ($\textbf{Pr}$econditioned Stochastic $\textbf{O}$ptimization $\textbf{M}$ethods by $\textbf{I}$ncorporating $\textbf{S}$calable Curvature $\textbf{E}$stimates), a suite of sketching-based preconditioned stochastic gradient algorithms for solving large-scale convex optimization problems arising in machine learning. PROMISE includes preconditioned versions of SVRG, SAG… ▽ More This paper introduces PROMISE ($\textbf{Pr}$econditioned Stochastic $\textbf{O}$ptimization $\textbf{M}$ethods by $\textbf{I}$ncorporating $\textbf{S}$calable Curvature $\textbf{E}$stimates), a suite of sketching-based preconditioned stochastic gradient algorithms for solving large-scale convex optimization problems arising in machine learning. PROMISE includes preconditioned versions of SVRG, SAGA, and Katyusha; each algorithm comes with a strong theoretical analysis and effective default hyperparameter values. In contrast, traditional stochastic gradient methods require careful hyperparameter tuning to succeed, and degrade in the presence of ill-conditioning, a ubiquitous phenomenon in machine learning. Empirically, we verify the superiority of the proposed algorithms by showing that, using default hyperparameter values, they outperform or match popular tuned stochastic gradient optimizers on a test bed of $51$ ridge and logistic regression problems assembled from benchmark machine learning repositories. On the theoretical side, this paper introduces the notion of quadratic regularity in order to establish linear convergence of all proposed methods even when the preconditioner is updated infrequently. The speed of linear convergence is determined by the quadratic regularity ratio, which often provides a tighter bound on the convergence rate compared to the condition number, both in theory and in practice, and explains the fast global linear convergence of the proposed methods. △ Less

Submitted 13 March, 2024; v1 submitted 5 September, 2023; originally announced September 2023.

Comments: 52 pages, 9 Figures

arXiv:2302.03863 [pdf, ps, other]

On the (linear) convergence of Generalized Newton Inexact ADMM

Authors: Zachary Frangella, Theo Diamandis, Bartolomeo Stellato, Madeleine Udell

Abstract: This paper presents GeNI-ADMM, a framework for large-scale composite convex optimization that facilitates theoretical analysis of both existing and new approximate ADMM schemes. GeNI-ADMM encompasses any ADMM algorithm that solves a first- or second-order approximation to the ADMM subproblem inexactly. GeNI-ADMM exhibits the usual $\mathcal{O} (1/t)$-convergence rate under standard hypotheses and… ▽ More This paper presents GeNI-ADMM, a framework for large-scale composite convex optimization that facilitates theoretical analysis of both existing and new approximate ADMM schemes. GeNI-ADMM encompasses any ADMM algorithm that solves a first- or second-order approximation to the ADMM subproblem inexactly. GeNI-ADMM exhibits the usual $\mathcal{O} (1/t)$-convergence rate under standard hypotheses and converges linearly under additional hypotheses such as strong convexity. Further, the GeNI-ADMM framework provides explicit convergence rates for ADMM variants accelerated with randomized linear algebra, such as NysADMM and sketch-and-solve ADMM, resolving an important open question on the convergence of these methods. This analysis quantifies the benefit of improved approximations and can aid in the design of new ADMM variants with faster convergence. △ Less

Submitted 18 June, 2025; v1 submitted 7 February, 2023; originally announced February 2023.

Comments: 31 pages, 4 figures, 2 tables

arXiv:2211.08597 [pdf, other]

SketchySGD: Reliable Stochastic Optimization via Randomized Curvature Estimates

Authors: Zachary Frangella, Pratik Rathore, Shipu Zhao, Madeleine Udell

Abstract: SketchySGD improves upon existing stochastic gradient methods in machine learning by using randomized low-rank approximations to the subsampled Hessian and by introducing an automated stepsize that works well across a wide range of convex machine learning problems. We show theoretically that SketchySGD with a fixed stepsize converges linearly to a small ball around the optimum. Further, in the ill… ▽ More SketchySGD improves upon existing stochastic gradient methods in machine learning by using randomized low-rank approximations to the subsampled Hessian and by introducing an automated stepsize that works well across a wide range of convex machine learning problems. We show theoretically that SketchySGD with a fixed stepsize converges linearly to a small ball around the optimum. Further, in the ill-conditioned setting we show SketchySGD converges at a faster rate than SGD for least-squares problems. We validate this improvement empirically with ridge regression experiments on real data. Numerical experiments on both ridge and logistic regression problems with dense and sparse data, show that SketchySGD equipped with its default hyperparameters can achieve comparable or better results than popular stochastic gradient methods, even when they have been tuned to yield their best performance. In particular, SketchySGD is able to solve an ill-conditioned logistic regression problem with a data matrix that takes more than $840$GB RAM to store, while its competitors, even when tuned, are unable to make any progress. SketchySGD's ability to work out-of-the box with its default hyperparameters and excel on ill-conditioned problems is an advantage over other stochastic gradient methods, most of which require careful hyperparameter tuning (especially of the learning rate) to obtain good performance and degrade in the presence of ill-conditioning. △ Less

Submitted 20 February, 2024; v1 submitted 15 November, 2022; originally announced November 2022.

Comments: 65 pages, 43 figures, 8 tables

arXiv:2202.11599 [pdf, other]

NysADMM: faster composite convex optimization via low-rank approximation

Authors: Shipu Zhao, Zachary Frangella, Madeleine Udell

Abstract: This paper develops a scalable new algorithm, called NysADMM, to minimize a smooth convex loss function with a convex regularizer. NysADMM accelerates the inexact Alternating Direction Method of Multipliers (ADMM) by constructing a preconditioner for the ADMM subproblem from a randomized low-rank Nyström approximation. NysADMM comes with strong theoretical guarantees: it solves the ADMM subproblem… ▽ More This paper develops a scalable new algorithm, called NysADMM, to minimize a smooth convex loss function with a convex regularizer. NysADMM accelerates the inexact Alternating Direction Method of Multipliers (ADMM) by constructing a preconditioner for the ADMM subproblem from a randomized low-rank Nyström approximation. NysADMM comes with strong theoretical guarantees: it solves the ADMM subproblem in a constant number of iterations when the rank of the Nyström approximation is the effective dimension of the subproblem regularized Gram matrix. In practice, ranks much smaller than the effective dimension can succeed, so NysADMM uses an adaptive strategy to choose the rank that enjoys analogous guarantees. Numerical experiments on real-world datasets demonstrate that NysADMM can solve important applications, such as the lasso, logistic regression, and support vector machines, in half the time (or less) required by standard solvers. The breadth of problems on which NysADMM beats standard solvers is a surprise: it suggests that ADMM is a dominant paradigm for numerical optimization across a wide range of statistical learning problems that are usually solved with bespoke methods. △ Less

Submitted 2 July, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

arXiv:2110.02820 [pdf, other]

Randomized Nyström Preconditioning

Authors: Zachary Frangella, Joel A. Tropp, Madeleine Udell

Abstract: This paper introduces the Nyström PCG algorithm for solving a symmetric positive-definite linear system. The algorithm applies the randomized Nyström method to form a low-rank approximation of the matrix, which leads to an efficient preconditioner that can be deployed with the conjugate gradient algorithm. Theoretical analysis shows that preconditioned system has constant condition number as soon… ▽ More This paper introduces the Nyström PCG algorithm for solving a symmetric positive-definite linear system. The algorithm applies the randomized Nyström method to form a low-rank approximation of the matrix, which leads to an efficient preconditioner that can be deployed with the conjugate gradient algorithm. Theoretical analysis shows that preconditioned system has constant condition number as soon as the rank of the approximation is comparable with the number of effective degrees of freedom in the matrix. The paper also develops adaptive methods that provably achieve similar performance without knowledge of the effective dimension. Numerical tests show that Nyström PCG can rapidly solve large linear systems that arise in data analysis problems, and it surpasses several competing methods from the literature. △ Less

Submitted 17 December, 2021; v1 submitted 6 October, 2021; originally announced October 2021.

Comments: 37 pages, 3 figures

MSC Class: 65F08; 68W20; 65F55; 65F22

arXiv:2105.04684 [pdf, other]

An automatic system to detect equivalence between iterative algorithms

Authors: Shipu Zhao, Laurent Lessard, Madeleine Udell

Abstract: When are two algorithms the same? How can we be sure a recently proposed algorithm is novel, and not a minor twist on an existing method? In this paper, we present a framework for reasoning about equivalence between a broad class of iterative algorithms, with a focus on algorithms designed for convex optimization. We propose several notions of what it means for two algorithms to be equivalent, and… ▽ More When are two algorithms the same? How can we be sure a recently proposed algorithm is novel, and not a minor twist on an existing method? In this paper, we present a framework for reasoning about equivalence between a broad class of iterative algorithms, with a focus on algorithms designed for convex optimization. We propose several notions of what it means for two algorithms to be equivalent, and provide computationally tractable means to detect equivalence. Our main definition, oracle equivalence, states that two algorithms are equivalent if they result in the same sequence of calls to the function oracles (for suitable initialization). Borrowing from control theory, we use state-space realizations to represent algorithms and characterize algorithm equivalence via transfer functions. Our framework can also identify and characterize some algorithm transformations including permutations of the update equations, repetition of the iteration, and conjugation of some of the function oracles in the algorithm. To support the paper, we have developed a software package named Linnaeus that implements the framework to identify other iterative algorithms that are equivalent to an input algorithm. More broadly, this framework and software advances the goal of making mathematics searchable. △ Less

Submitted 9 January, 2025; v1 submitted 10 May, 2021; originally announced May 2021.

Comments: This paper documents a software system for identifying equivalence between optimization algorithms. The analysis in this paper has been improved in arxiv:2501.04972

arXiv:2105.00105 [pdf, other]

Tensor Random Projection for Low Memory Dimension Reduction

Authors: Yiming Sun, Yang Guo, Joel A. Tropp, Madeleine Udell

Abstract: Random projections reduce the dimension of a set of vectors while preserving structural information, such as distances between vectors in the set. This paper proposes a novel use of row-product random matrices in random projection, where we call it Tensor Random Projection (TRP). It requires substantially less memory than existing dimension reduction maps. The TRP map is formed as the Khatri-Rao p… ▽ More Random projections reduce the dimension of a set of vectors while preserving structural information, such as distances between vectors in the set. This paper proposes a novel use of row-product random matrices in random projection, where we call it Tensor Random Projection (TRP). It requires substantially less memory than existing dimension reduction maps. The TRP map is formed as the Khatri-Rao product of several smaller random projections, and is compatible with any base random projection including sparse maps, which enable dimension reduction with very low query cost and no floating point operations. We also develop a reduced variance extension. We provide a theoretical analysis of the bias and variance of the TRP, and a non-asymptotic error analysis for a TRP composed of two smaller maps. Experiments on both synthetic and MNIST data show that our method performs as well as conventional methods with substantially less storage. △ Less

Submitted 30 April, 2021; originally announced May 2021.

Comments: In NeurIPS Workshop on Relational Representation Learning (2018)

arXiv:2101.00323 [pdf, other]

TenIPS: Inverse Propensity Sampling for Tensor Completion

Authors: Chengrun Yang, Lijun Ding, Ziyang Wu, Madeleine Udell

Abstract: Tensors are widely used to represent multiway arrays of data. The recovery of missing entries in a tensor has been extensively studied, generally under the assumption that entries are missing completely at random (MCAR). However, in most practical settings, observations are missing not at random (MNAR): the probability that a given entry is observed (also called the propensity) may depend on other… ▽ More Tensors are widely used to represent multiway arrays of data. The recovery of missing entries in a tensor has been extensively studied, generally under the assumption that entries are missing completely at random (MCAR). However, in most practical settings, observations are missing not at random (MNAR): the probability that a given entry is observed (also called the propensity) may depend on other entries in the tensor or even on the value of the missing entry. In this paper, we study the problem of completing a partially observed tensor with MNAR observations, without prior information about the propensities. To complete the tensor, we assume that both the original tensor and the tensor of propensities have low multilinear rank. The algorithm first estimates the propensities using a convex relaxation and then predicts missing values using a higher-order SVD approach, reweighting the observed tensor by the inverse propensities. We provide finite-sample error bounds on the resulting complete tensor. Numerical experiments demonstrate the effectiveness of our approach. △ Less

Submitted 22 April, 2021; v1 submitted 1 January, 2021; originally announced January 2021.

Comments: AISTATS 2021

arXiv:2012.00183 [pdf, other]

A Strict Complementarity Approach to Error Bound and Sensitivity of Solution of Conic Programs

Authors: Lijun Ding, Madeleine Udell

Abstract: In this paper, we provide an elementary, geometric, and unified framework to analyze conic programs that we call the strict complementarity approach. This framework allows us to establish error bounds and quantify the sensitivity of the solution. The framework uses three classical ideas from convex geometry and linear algebra: linear regularity of convex sets, facial reduction, and orthogonal deco… ▽ More In this paper, we provide an elementary, geometric, and unified framework to analyze conic programs that we call the strict complementarity approach. This framework allows us to establish error bounds and quantify the sensitivity of the solution. The framework uses three classical ideas from convex geometry and linear algebra: linear regularity of convex sets, facial reduction, and orthogonal decomposition. We show how to use this framework to derive error bounds for linear programming (LP), second order cone programming (SOCP), and semidefinite programming (SDP). △ Less

Submitted 16 September, 2022; v1 submitted 30 November, 2020; originally announced December 2020.

Comments: 23 pages, 2 figures. In this revision, we added an approach based on conic decomposition. See Section D for details

arXiv:2006.16142 [pdf, other]

$k$FW: A Frank-Wolfe style algorithm with stronger subproblem oracles

Authors: Lijun Ding, Jicong Fan, Madeleine Udell

Abstract: This paper proposes a new variant of Frank-Wolfe (FW), called $k$FW. Standard FW suffers from slow convergence: iterates often zig-zag as update directions oscillate around extreme points of the constraint set. The new variant, $k$FW, overcomes this problem by using two stronger subproblem oracles in each iteration. The first is a $k$ linear optimization oracle ($k$LOO) that computes the $k$ best… ▽ More This paper proposes a new variant of Frank-Wolfe (FW), called $k$FW. Standard FW suffers from slow convergence: iterates often zig-zag as update directions oscillate around extreme points of the constraint set. The new variant, $k$FW, overcomes this problem by using two stronger subproblem oracles in each iteration. The first is a $k$ linear optimization oracle ($k$LOO) that computes the $k$ best update directions (rather than just one). The second is a $k$ direction search ($k$DS) that minimizes the objective over a constraint set represented by the $k$ best update directions and the previous iterate. When the problem solution admits a sparse representation, both oracles are easy to compute, and $k$FW converges quickly for smooth convex objectives and several interesting constraint sets: $k$FW achieves finite $\frac{4L_f^3D^4}{γδ^2}$ convergence on polytopes and group norm balls, and linear convergence on spectrahedra and nuclear norm balls. Numerical experiments validate the effectiveness of $k$FW and demonstrate an order-of-magnitude speedup over existing approaches. △ Less

Submitted 15 November, 2021; v1 submitted 29 June, 2020; originally announced June 2020.

Comments: 20 pages main text, 10 figures

arXiv:2002.10673 [pdf, other]

On the simplicity and conditioning of low rank semidefinite programs

Authors: Lijun Ding, Madeleine Udell

Abstract: Low rank matrix recovery problems appear widely in statistics, combinatorics, and imaging. One celebrated method for solving these problems is to formulate and solve a semidefinite program (SDP). It is often known that the exact solution to the SDP with perfect data recovers the solution to the original low rank matrix recovery problem. It is more challenging to show that an approximate solution t… ▽ More Low rank matrix recovery problems appear widely in statistics, combinatorics, and imaging. One celebrated method for solving these problems is to formulate and solve a semidefinite program (SDP). It is often known that the exact solution to the SDP with perfect data recovers the solution to the original low rank matrix recovery problem. It is more challenging to show that an approximate solution to the SDP formulated with noisy problem data acceptably solves the original problem; arguments are usually ad hoc for each problem setting, and can be complex. In this note, we identify a set of conditions that we call simplicity that limit the error due to noisy problem data or incomplete convergence. In this sense, simple SDPs are robust: simple SDPs can be (approximately) solved efficiently at scale; and the resulting approximate solutions, even with noisy data, can be trusted. Moreover, we show that simplicity holds generically, and also for many structured low rank matrix recovery problems, including the stochastic block model, $\mathbb{Z}_2$ synchronization, and matrix completion. Formally, we call an SDP simple if it has a surjective constraint map, admits a unique primal and dual solution pair, and satisfies strong duality and strict complementarity. However, simplicity is not a panacea: we show the Burer-Monteiro formulation of the SDP may have spurious second-order critical points, even for a simple SDP with a rank 1 solution. △ Less

Submitted 22 July, 2021; v1 submitted 25 February, 2020; originally announced February 2020.

Comments: 24 pages, 1 figure, and 1 table

arXiv:1912.02949 [pdf, other]

doi 10.1137/19M1305045

Scalable Semidefinite Programming

Authors: Alp Yurtsever, Joel A. Tropp, Olivier Fercoq, Madeleine Udell, Volkan Cevher

Abstract: Semidefinite programming (SDP) is a powerful framework from convex optimization that has striking potential for data science applications. This paper develops a provably correct randomized algorithm for solving large, weakly constrained SDP problems by economizing on the storage and arithmetic costs. Numerical evidence shows that the method is effective for a range of applications, including relax… ▽ More Semidefinite programming (SDP) is a powerful framework from convex optimization that has striking potential for data science applications. This paper develops a provably correct randomized algorithm for solving large, weakly constrained SDP problems by economizing on the storage and arithmetic costs. Numerical evidence shows that the method is effective for a range of applications, including relaxations of MaxCut, abstract phase retrieval, and quadratic assignment. Running on a laptop equivalent, the algorithm can handle SDP instances where the matrix variable has over $10^{14}$ entries. △ Less

Submitted 25 March, 2021; v1 submitted 5 December, 2019; originally announced December 2019.

MSC Class: 90C22; 65K05 (Primary); 65F99 (Secondary)

Journal ref: SIAM Journal on Mathematics of Data Science, vol. 3, num. 1, pp. 171-200, Feb. 2021

arXiv:1904.10951 [pdf, other]

Low-Rank Tucker Approximation of a Tensor From Streaming Data

Authors: Yiming Sun, Yang Guo, Charlene Luo, Joel Tropp, Madeleine Udell

Abstract: This paper describes a new algorithm for computing a low-Tucker-rank approximation of a tensor. The method applies a randomized linear map to the tensor to obtain a sketch that captures the important directions within each mode, as well as the interactions among the modes. The sketch can be extracted from streaming or distributed data or with a single pass over the tensor, and it uses storage prop… ▽ More This paper describes a new algorithm for computing a low-Tucker-rank approximation of a tensor. The method applies a randomized linear map to the tensor to obtain a sketch that captures the important directions within each mode, as well as the interactions among the modes. The sketch can be extracted from streaming or distributed data or with a single pass over the tensor, and it uses storage proportional to the degrees of freedom in the output Tucker approximation. The algorithm does not require a second pass over the tensor, although it can exploit another view to compute a superior approximation. The paper provides a rigorous theoretical guarantee on the approximation error. Extensive numerical experiments show that that the algorithm produces useful results that improve on the state of the art for streaming Tucker decomposition. △ Less

Submitted 30 April, 2021; v1 submitted 24 April, 2019; originally announced April 2019.

Comments: Appendix includes supplement from published version

Journal ref: SIAM Journal on Mathematics of Data Science 2.4 (2020): 1123-1150

arXiv:1902.08651 [pdf, other]

Streaming Low-Rank Matrix Approximation with an Application to Scientific Simulation

Authors: Joel A. Tropp, Alp Yurtsever, Madeleine Udell, Volkan Cevher

Abstract: This paper argues that randomized linear sketching is a natural tool for on-the-fly compression of data matrices that arise from large-scale scientific simulations and data collection. The technical contribution consists in a new algorithm for constructing an accurate low-rank approximation of a matrix from streaming data. This method is accompanied by an a priori analysis that allows the user to… ▽ More This paper argues that randomized linear sketching is a natural tool for on-the-fly compression of data matrices that arise from large-scale scientific simulations and data collection. The technical contribution consists in a new algorithm for constructing an accurate low-rank approximation of a matrix from streaming data. This method is accompanied by an a priori analysis that allows the user to set algorithm parameters with confidence and an a posteriori error estimator that allows the user to validate the quality of the reconstructed matrix. In comparison to previous techniques, the new method achieves smaller relative approximation errors and is less sensitive to parameter choices. As concrete applications, the paper outlines how the algorithm can be used to compress a Navier--Stokes simulation and a sea surface temperature dataset. △ Less

Submitted 22 February, 2019; originally announced February 2019.

Comments: 70 pages, 33 figures

MSC Class: 65F30; 68W20

arXiv:1902.03373 [pdf, other]

An Optimal-Storage Approach to Semidefinite Programming using Approximate Complementarity

Authors: Lijun Ding, Alp Yurtsever, Volkan Cevher, Joel A. Tropp, Madeleine Udell

Abstract: This paper develops a new storage-optimal algorithm that provably solves generic semidefinite programs (SDPs) in standard form. This method is particularly effective for weakly constrained SDPs. The key idea is to formulate an approximate complementarity principle: Given an approximate solution to the dual SDP, the primal SDP has an approximate solution whose range is contained in the eigenspace w… ▽ More This paper develops a new storage-optimal algorithm that provably solves generic semidefinite programs (SDPs) in standard form. This method is particularly effective for weakly constrained SDPs. The key idea is to formulate an approximate complementarity principle: Given an approximate solution to the dual SDP, the primal SDP has an approximate solution whose range is contained in the eigenspace with small eigenvalues of the dual slack matrix. For weakly constrained SDPs, this eigenspace has very low dimension, so this observation significantly reduces the search space for the primal solution. This result suggests an algorithmic strategy that can be implemented with minimal storage: (1) Solve the dual SDP approximately; (2) compress the primal SDP to the eigenspace with small eigenvalues of the dual slack matrix; (3) solve the compressed primal SDP. The paper also provides numerical experiments showing that this approach is successful for a range of interesting large-scale SDPs. △ Less

Submitted 17 June, 2020; v1 submitted 9 February, 2019; originally announced February 2019.

Comments: 35 pages, 24 pages of main text, 17 figures

arXiv:1808.05274 [pdf, other]

Frank-Wolfe Style Algorithms for Large Scale Optimization

Authors: Lijun Ding, Madeleine Udell

Abstract: We introduce a few variants on Frank-Wolfe style algorithms suitable for large scale optimization. We show how to modify the standard Frank-Wolfe algorithm using stochastic gradients, approximate subproblem solutions, and sketched decision variables in order to scale to enormous problems while preserving (up to constants) the optimal convergence rate $\mathcal{O}(\frac{1}{k})$. We introduce a few variants on Frank-Wolfe style algorithms suitable for large scale optimization. We show how to modify the standard Frank-Wolfe algorithm using stochastic gradients, approximate subproblem solutions, and sketched decision variables in order to scale to enormous problems while preserving (up to constants) the optimal convergence rate $\mathcal{O}(\frac{1}{k})$. △ Less

Submitted 15 August, 2018; originally announced August 2018.

Comments: 28 pages, 5 figures, a chapter of the book "Large-Scale and Distributed Optimization", Springer's Lecture Notes in Mathematics Series, volume 2227, https://www.springer.com/us/book/9783319974774

arXiv:1807.07531 [pdf, other]

Limited Memory Kelley's Method Converges for Composite Convex and Submodular Objectives

Authors: Song Zhou, Swati Gupta, Madeleine Udell

Abstract: The original simplicial method (OSM), a variant of the classic Kelley's cutting plane method, has been shown to converge to the minimizer of a composite convex and submodular objective, though no rate of convergence for this method was known. Moreover, OSM is required to solve subproblems in each iteration whose size grows linearly in the number of iterations. We propose a limited memory version o… ▽ More The original simplicial method (OSM), a variant of the classic Kelley's cutting plane method, has been shown to converge to the minimizer of a composite convex and submodular objective, though no rate of convergence for this method was known. Moreover, OSM is required to solve subproblems in each iteration whose size grows linearly in the number of iterations. We propose a limited memory version of Kelley's method (L-KM) and os OSM that requires limited memory (at most n + 1 constraints for an n-dimensional problem) independent of the iteration. We prove convergence for L-KM when the convex part of the objective (g) is strongly convex and show it converges linearly when g is also smooth. Our analysis relies on duality between minimization of the composite objective and minimization of a convex function over the corresponding submodular base polytope. We introduce a limited memory version, L-FCFW, of the Fully-Corrective Frank-Wolfe (FCFW) method with approximate correction, to solve the dual problem. We show that L-FCFW and L-KM are dual algorithms that produce the same sequence of iterates; hence both converge linearly (when g is smooth and strongly convex) and with limited memory. We propose L-KM to minimize composite convex and submodular objectives; however, our results on L-FCFW hold for general polytopes and may be of independent interest. △ Less

Submitted 17 December, 2018; v1 submitted 19 July, 2018; originally announced July 2018.

Comments: 15 pages, 3 figures with 12 sub-figures

MSC Class: 90C25; 90C27; 90C30

arXiv:1706.05736 [pdf, other]

Fixed-Rank Approximation of a Positive-Semidefinite Matrix from Streaming Data

Authors: Joel A. Tropp, Alp Yurtsever, Madeleine Udell, Volkan Cevher

Abstract: Several important applications, such as streaming PCA and semidefinite programming, involve a large-scale positive-semidefinite (psd) matrix that is presented as a sequence of linear updates. Because of storage limitations, it may only be possible to retain a sketch of the psd matrix. This paper develops a new algorithm for fixed-rank psd approximation from a sketch. The approach combines the Nyst… ▽ More Several important applications, such as streaming PCA and semidefinite programming, involve a large-scale positive-semidefinite (psd) matrix that is presented as a sequence of linear updates. Because of storage limitations, it may only be possible to retain a sketch of the psd matrix. This paper develops a new algorithm for fixed-rank psd approximation from a sketch. The approach combines the Nystrom approximation with a novel mechanism for rank truncation. Theoretical analysis establishes that the proposed method can achieve any prescribed relative error in the Schatten 1-norm and that it exploits the spectral decay of the input matrix. Computer experiments show that the proposed method dominates alternative techniques for fixed-rank psd matrix approximation across a wide range of examples. △ Less

Submitted 18 June, 2017; originally announced June 2017.

arXiv:1702.06838 [pdf, other]

Sketchy Decisions: Convex Low-Rank Matrix Optimization with Optimal Storage

Authors: Alp Yurtsever, Madeleine Udell, Joel A. Tropp, Volkan Cevher

Abstract: This paper concerns a fundamental class of convex matrix optimization problems. It presents the first algorithm that uses optimal storage and provably computes a low-rank approximation of a solution. In particular, when all solutions have low rank, the algorithm converges to a solution. This algorithm, SketchyCGM, modifies a standard convex optimization scheme, the conditional gradient method, to… ▽ More This paper concerns a fundamental class of convex matrix optimization problems. It presents the first algorithm that uses optimal storage and provably computes a low-rank approximation of a solution. In particular, when all solutions have low rank, the algorithm converges to a solution. This algorithm, SketchyCGM, modifies a standard convex optimization scheme, the conditional gradient method, to store only a small randomized sketch of the matrix variable. After the optimization terminates, the algorithm extracts a low-rank approximation of the solution from the sketch. In contrast to nonconvex heuristics, the guarantees for SketchyCGM do not rely on statistical models for the problem data. Numerical work demonstrates the benefits of SketchyCGM over heuristics. △ Less

Submitted 22 February, 2017; originally announced February 2017.

arXiv:1610.05604 [pdf, other]

Dynamic Assortment Personalization in High Dimensions

Authors: Nathan Kallus, Madeleine Udell

Abstract: We study the problem of dynamic assortment personalization with large, heterogeneous populations and wide arrays of products, and demonstrate the importance of structural priors for effective, efficient large-scale personalization. Assortment personalization is the problem of choosing, for each individual (type), a best assortment of products, ads, or other offerings (items) so as to maximize reve… ▽ More We study the problem of dynamic assortment personalization with large, heterogeneous populations and wide arrays of products, and demonstrate the importance of structural priors for effective, efficient large-scale personalization. Assortment personalization is the problem of choosing, for each individual (type), a best assortment of products, ads, or other offerings (items) so as to maximize revenue. This problem is central to revenue management in e-commerce and online advertising where both items and types can number in the millions. We formulate the dynamic assortment personalization problem as a discrete-contextual bandit with $m$ contexts (types) and exponentially many arms (assortments of the $n$ items). We assume that each type's preferences follow a simple parametric model with $n$ parameters. In all, there are $mn$ parameters, and existing literature suggests that order optimal regret scales as $mn$. However, the data required to estimate so many parameters is orders of magnitude larger than the data available in most revenue management applications; and the optimal regret under these models is unacceptably high. In this paper, we impose a natural structure on the problem -- a small latent dimension, or low rank. In the static setting, we show that this model can be efficiently learned from surprisingly few interactions, using a time- and memory-efficient optimization algorithm that converges globally whenever the model is learnable. In the dynamic setting, we show that structure-aware dynamic assortment personalization can have regret that is an order of magnitude smaller than structure-ignorant approaches. We validate our theoretical results empirically. △ Less

Submitted 2 May, 2019; v1 submitted 18 October, 2016; originally announced October 2016.

arXiv:1609.03285 [pdf, other]

Disciplined Multi-Convex Programming

Authors: Xinyue Shen, Steven Diamond, Madeleine Udell, Yuantao Gu, Stephen Boyd

Abstract: A multi-convex optimization problem is one in which the variables can be partitioned into sets over which the problem is convex when the other variables are fixed. Multi-convex problems are generally solved approximately using variations on alternating or cyclic minimization. Multi-convex problems arise in many applications, such as nonnegative matrix factorization, generalized low rank models, an… ▽ More A multi-convex optimization problem is one in which the variables can be partitioned into sets over which the problem is convex when the other variables are fixed. Multi-convex problems are generally solved approximately using variations on alternating or cyclic minimization. Multi-convex problems arise in many applications, such as nonnegative matrix factorization, generalized low rank models, and structured control synthesis, to name just a few. In most applications to date the multi-convexity is simple to verify by hand. In this paper we study the automatic detection and verification of multi-convexity using the ideas of disciplined convex programming. We describe an implementation of our proposed method that detects and verifies multi-convexity, and then invokes one of the general solution methods. △ Less

Submitted 8 October, 2016; v1 submitted 12 September, 2016; originally announced September 2016.

arXiv:1609.00048 [pdf, other]

doi 10.1137/17M1111590

Practical sketching algorithms for low-rank matrix approximation

Authors: Joel A. Tropp, Alp Yurtsever, Madeleine Udell, Volkan Cevher

Abstract: This paper describes a suite of algorithms for constructing low-rank approximations of an input matrix from a random linear image of the matrix, called a sketch. These methods can preserve structural properties of the input matrix, such as positive-semidefiniteness, and they can produce approximations with a user-specified rank. The algorithms are simple, accurate, numerically stable, and provably… ▽ More This paper describes a suite of algorithms for constructing low-rank approximations of an input matrix from a random linear image of the matrix, called a sketch. These methods can preserve structural properties of the input matrix, such as positive-semidefiniteness, and they can produce approximations with a user-specified rank. The algorithms are simple, accurate, numerically stable, and provably correct. Moreover, each method is accompanied by an informative error bound that allows users to select parameters a priori to achieve a given approximation quality. These claims are supported by numerical experiments with real and synthetic data. △ Less

Submitted 2 January, 2018; v1 submitted 31 August, 2016; originally announced September 2016.

MSC Class: Primary 65F30; Secondary 68W20

Journal ref: SIAM J. Matrix Analysis and Applications, Vol. 38, num. 4, pp. 1454-1485, Dec. 2017

arXiv:1606.02338 [pdf, other]

The Sound of APALM Clapping: Faster Nonsmooth Nonconvex Optimization with Stochastic Asynchronous PALM

Authors: Damek Davis, Brent Edmunds, Madeleine Udell

Abstract: We introduce the Stochastic Asynchronous Proximal Alternating Linearized Minimization (SAPALM) method, a block coordinate stochastic proximal-gradient method for solving nonconvex, nonsmooth optimization problems. SAPALM is the first asynchronous parallel optimization method that provably converges on a large class of nonconvex, nonsmooth problems. We prove that SAPALM matches the best known rates… ▽ More We introduce the Stochastic Asynchronous Proximal Alternating Linearized Minimization (SAPALM) method, a block coordinate stochastic proximal-gradient method for solving nonconvex, nonsmooth optimization problems. SAPALM is the first asynchronous parallel optimization method that provably converges on a large class of nonconvex, nonsmooth problems. We prove that SAPALM matches the best known rates of convergence --- among synchronous or asynchronous methods --- on this problem class. We provide upper bounds on the number of workers for which we can expect to see a linear speedup, which match the best bounds known for less complex problems, and show that in practice SAPALM achieves this linear speedup. We demonstrate state-of-the-art performance on several matrix factorization problems. △ Less

Submitted 7 June, 2016; originally announced June 2016.

MSC Class: 90C15; 65K05; 90C26; 90C30

arXiv:1509.05113 [pdf, other]

Revealed Preference at Scale: Learning Personalized Preferences from Assortment Choices

Authors: Nathan Kallus, Madeleine Udell

Abstract: We consider the problem of learning the preferences of a heterogeneous population by observing choices from an assortment of products, ads, or other offerings. Our observation model takes a form common in assortment planning applications: each arriving customer is offered an assortment consisting of a subset of all possible offerings; we observe only the assortment and the customer's single choice… ▽ More We consider the problem of learning the preferences of a heterogeneous population by observing choices from an assortment of products, ads, or other offerings. Our observation model takes a form common in assortment planning applications: each arriving customer is offered an assortment consisting of a subset of all possible offerings; we observe only the assortment and the customer's single choice. In this paper we propose a mixture choice model with a natural underlying low-dimensional structure, and show how to estimate its parameters. In our model, the preferences of each customer or segment follow a separate parametric choice model, but the underlying structure of these parameters over all the models has low dimension. We show that a nuclear-norm regularized maximum likelihood estimator can learn the preferences of all customers using a number of observations much smaller than the number of item-customer combinations. This result shows the potential for structural assumptions to speed up learning and improve revenues in assortment planning and customization. We provide a specialized factored gradient descent algorithm and study the success of the approach empirically. △ Less

Submitted 7 June, 2016; v1 submitted 16 September, 2015; originally announced September 2015.

arXiv:1410.4821 [pdf, ps, other]

Convex Optimization in Julia

Authors: Madeleine Udell, Karanveer Mohan, David Zeng, Jenny Hong, Steven Diamond, Stephen Boyd

Abstract: This paper describes Convex, a convex optimization modeling framework in Julia. Convex translates problems from a user-friendly functional language into an abstract syntax tree describing the problem. This concise representation of the global structure of the problem allows Convex to infer whether the problem complies with the rules of disciplined convex programming (DCP), and to pass the problem… ▽ More This paper describes Convex, a convex optimization modeling framework in Julia. Convex translates problems from a user-friendly functional language into an abstract syntax tree describing the problem. This concise representation of the global structure of the problem allows Convex to infer whether the problem complies with the rules of disciplined convex programming (DCP), and to pass the problem to a suitable solver. These operations are carried out in Julia using multiple dispatch, which dramatically reduces the time required to verify DCP compliance and to parse a problem into conic form. Convex then automatically chooses an appropriate backend solver to solve the conic form problem. △ Less

Submitted 17 October, 2014; originally announced October 2014.

Comments: To appear in Proceedings of the Workshop on High Performance Technical Computing in Dynamic Languages (HPTCDL) 2014

arXiv:1410.4158 [pdf, ps, other]

Bounding Duality Gap for Separable Problems with Linear Constraints

Authors: Madeleine Udell, Stephen Boyd

Abstract: We consider the problem of minimizing a sum of non-convex functions over a compact domain, subject to linear inequality and equality constraints. Approximate solutions can be found by solving a convexified version of the problem, in which each function in the objective is replaced by its convex envelope. We propose a randomized algorithm to solve the convexified problem which finds an $ε$-suboptim… ▽ More We consider the problem of minimizing a sum of non-convex functions over a compact domain, subject to linear inequality and equality constraints. Approximate solutions can be found by solving a convexified version of the problem, in which each function in the objective is replaced by its convex envelope. We propose a randomized algorithm to solve the convexified problem which finds an $ε$-suboptimal solution to the original problem. With probability one, $ε$ is bounded by a term proportional to the maximal number of active constraints in the problem. The bound does not depend on the number of variables in the problem or the number of terms in the objective. In contrast to previous related work, our proof is constructive, self-contained, and gives a bound that is tight. △ Less

Submitted 8 January, 2016; v1 submitted 15 October, 2014; originally announced October 2014.

arXiv:1410.0342 [pdf, other]

Generalized Low Rank Models

Authors: Madeleine Udell, Corinne Horn, Reza Zadeh, Stephen Boyd

Abstract: Principal components analysis (PCA) is a well-known technique for approximating a tabular data set by a low rank matrix. Here, we extend the idea of PCA to handle arbitrary data sets consisting of numerical, Boolean, categorical, ordinal, and other data types. This framework encompasses many well known techniques in data analysis, such as nonnegative matrix factorization, matrix completion, sparse… ▽ More Principal components analysis (PCA) is a well-known technique for approximating a tabular data set by a low rank matrix. Here, we extend the idea of PCA to handle arbitrary data sets consisting of numerical, Boolean, categorical, ordinal, and other data types. This framework encompasses many well known techniques in data analysis, such as nonnegative matrix factorization, matrix completion, sparse and robust PCA, $k$-means, $k$-SVD, and maximum margin matrix factorization. The method handles heterogeneous data sets, and leads to coherent schemes for compressing, denoising, and imputing missing entries across all data types simultaneously. It also admits a number of interesting interpretations of the low rank factors, which allow clustering of examples or of features. We propose several parallel algorithms for fitting generalized low rank models, and describe implementations and numerical results. △ Less

Submitted 5 May, 2015; v1 submitted 1 October, 2014; originally announced October 2014.

Comments: 84 pages, 19 figures

Showing 1–39 of 39 results for author: Udell, M