Showing 1–50 of 71 results for author: Drusvyatskiy, D

  1. arXiv:2505.08277  [pdf, ps, other]

    stat.ML cs.LG math.OC math.ST

    Iteratively reweighted kernel machines efficiently learn sparse functions

    Authors: Libin Zhu, Damek Davis, Dmitriy Drusvyatskiy, Maryam Fazel

    Abstract: The impressive practical performance of neural networks is often attributed to their ability to learn low-dimensional data representations and hierarchical structure directly from data. In this work, we argue that these two phenomena are not unique to neural networks, and can be elicited from classical kernel methods. Namely, we show that the derivative of the kernel predictor can detect the influ…

    Submitted 13 May, 2025; originally announced May 2025.

  2. arXiv:2504.03148  [pdf, other]

    math.PR

    Spectral norm bound for the product of random Fourier-Walsh matrices

    Authors: Libin Zhu, Damek Davis, Dmitriy Drusvyatskiy, Maryam Fazel

    Abstract: We consider matrix products of the form $A_1(A_2A_2^\top)\ldots(A_{m}A_{m}^\top)A_{m+1}$, where $A_i$ are normalized random Fourier-Walsh matrices. We identify an interesting polynomial scaling regime when the operator norm of the expected matrix product tends to zero as the dimension tends to infinity.

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: 18 pages, 2 figures

  3. arXiv:2502.05305  [pdf, other]

    stat.ML cs.LG math.OC

    Online Covariance Estimation in Nonsmooth Stochastic Approximation

    Authors: Liwei Jiang, Abhishek Roy, Krishna Balasubramanian, Damek Davis, Dmitriy Drusvyatskiy, Sen Na

    Abstract: We consider applying stochastic approximation (SA) methods to solve nonsmooth variational inclusion problems. Existing studies have shown that the averaged iterates of SA methods exhibit asymptotic normality, with an optimal limiting covariance matrix in the local minimax sense of Hájek and Le Cam. However, no methods have been proposed to estimate this covariance matrix in a nonsmooth and potenti…

    Submitted 7 February, 2025; originally announced February 2025.

    Comments: 46 pages, 1 figure

  4. arXiv:2502.01886  [pdf, other]

    math.OC math.RT stat.ML

    Invariant Kernels: Rank Stabilization and Generalization Across Dimensions

    Authors: Mateo Díaz, Dmitriy Drusvyatskiy, Jack Kendrick, Rekha R. Thomas

    Abstract: Symmetry arises often when learning from high dimensional data. For example, data sets consisting of point clouds, graphs, and unordered sets appear routinely in contemporary applications, and exhibit rich underlying symmetries. Understanding the benefits of symmetry on the statistical and numerical efficiency of learning algorithms is an active area of research. In this work, we show that symmetr…

    Submitted 3 February, 2025; originally announced February 2025.

  5. arXiv:2409.19791  [pdf, other]

    math.OC cs.LG

    Gradient descent with adaptive stepsize converges (nearly) linearly under fourth-order growth

    Authors: Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang

    Abstract: A prevalent belief among optimization specialists is that linear convergence of gradient descent is contingent on the function growing quadratically away from its minimizers. In this work, we argue that this belief is inaccurate. We show that gradient descent with an adaptive stepsize converges at a local (nearly) linear rate on any smooth function that merely exhibits fourth-order growth away fro…

    Submitted 29 September, 2024; originally announced September 2024.

    Comments: 58 pages, 5 figures

    MSC Class: 65K05; 65K10; 90C30; 90C06

  6. arXiv:2405.09676  [pdf, ps, other]

    math.ST math.OC stat.ML

    The radius of statistical efficiency

    Authors: Joshua Cutler, Mateo Díaz, Dmitriy Drusvyatskiy

    Abstract: Classical results in asymptotic statistics show that the Fisher information matrix controls the difficulty of estimating a statistical model from observed data. In this work, we introduce a companion measure of robustness of an estimation problem: the radius of statistical efficiency (RSE) is the size of the smallest perturbation to the problem data that renders the Fisher information matrix singu…

    Submitted 15 May, 2024; originally announced May 2024.

    MSC Class: 90C15; 49K40; 62F12; 90C31

  7. arXiv:2401.04553  [pdf, other]

    stat.ML cs.LG

    Linear Recursive Feature Machines provably recover low-rank matrices

    Authors: Adityanarayanan Radhakrishnan, Mikhail Belkin, Dmitriy Drusvyatskiy

    Abstract: A fundamental problem in machine learning is to understand how neural networks make accurate predictions, while seemingly bypassing the curse of dimensionality. A possible explanation is that common training algorithms for neural networks implicitly perform dimensionality reduction - a process called feature learning. Recent work posited that the effects of feature learning can be elicited from a…

    Submitted 9 January, 2024; originally announced January 2024.

  8. arXiv:2306.02601  [pdf, other]

    cs.LG math.OC stat.ML

    Aiming towards the minimizers: fast convergence of SGD for overparametrized problems

    Authors: Chaoyue Liu, Dmitriy Drusvyatskiy, Mikhail Belkin, Damek Davis, Yi-An Ma

    Abstract: Modern machine learning paradigms, such as deep learning, occur in or close to the interpolation regime, wherein the number of model parameters is much larger than the number of data samples. In this work, we propose a regularity condition within the interpolation regime which endows the stochastic gradient method with the same worst-case iteration complexity as the deterministic gradient method,…

    Submitted 5 June, 2023; originally announced June 2023.

  9. arXiv:2303.16277  [pdf, ps, other]

    math.OC

    The slope robustly determines convex functions

    Authors: Aris Daniilidis, Dmitriy Drusvyatskiy

    Abstract: We show that the deviation between the slopes of two convex functions controls the deviation between the functions themselves. This result reveals that the slope -- a one dimensional construct -- robustly determines convex functions, up to a constant of integration.

    Submitted 28 March, 2023; originally announced March 2023.

    MSC Class: 26B25; 49K40; 37C10; 49J52

  10. arXiv:2301.06632  [pdf, other]

    math.OC math.ST stat.ML

    Asymptotic normality and optimality in nonsmooth stochastic approximation

    Authors: Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang

    Abstract: In their seminal work, Polyak and Juditsky showed that stochastic approximation algorithms for solving smooth equations enjoy a central limit theorem. Moreover, it has since been argued that the asymptotic covariance of the method is best possible among any estimation procedure in a local minimax sense of Hájek and Le Cam. A long-standing open question in this line of work is whether similar guara…

    Submitted 16 January, 2023; originally announced January 2023.

    Comments: The arxiv report arXiv:2108.11832 has been split into two parts. This is Part 2 of the original submission, augmented by some new results and a reworked exposition

    MSC Class: 65K05; 65K10; 90C15; 90C30; 90C06

  11. arXiv:2207.04173  [pdf, other]

    math.OC cs.LG stat.ML

    Stochastic Approximation with Decision-Dependent Distributions: Asymptotic Normality and Optimality

    Authors: Joshua Cutler, Mateo Díaz, Dmitriy Drusvyatskiy

    Abstract: We analyze a stochastic approximation algorithm for decision-dependent problems, wherein the data distribution used by the algorithm evolves along the iterate sequence. The primary examples of such problems appear in performative prediction and its multiplayer extensions. We show that under mild assumptions, the deviation between the average iterate of the algorithm and the solution is asymptotica…

    Submitted 13 March, 2024; v1 submitted 8 July, 2022; originally announced July 2022.

    Comments: 49 pages, 1 figure. v2: revised asymptotic optimality results and reworked exposition. v3: minor updates

    MSC Class: 90C15; 90C25

    Journal ref: Journal of Machine Learning Research, 25(90):1-49, 2024

  12. arXiv:2204.08281  [pdf, other]

    math.OC cs.LG stat.ML

    Decision-Dependent Risk Minimization in Geometrically Decaying Dynamic Environments

    Authors: Mitas Ray, Dmitriy Drusvyatskiy, Maryam Fazel, Lillian J. Ratliff

    Abstract: This paper studies the problem of expected loss minimization given a data distribution that is dependent on the decision-maker's action and evolves dynamically in time according to a geometric decay process. Novel algorithms for both the information setting in which the decision-maker has a first order gradient oracle and the setting in which they have simply a loss function oracle are introduced.…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: Accepted at AAAI 2022

  13. arXiv:2203.03756  [pdf, other]

    cs.LG math.OC stat.ML

    Flat minima generalize for low-rank matrix recovery

    Authors: Lijun Ding, Dmitriy Drusvyatskiy, Maryam Fazel, Zaid Harchaoui

    Abstract: Empirical evidence suggests that for a variety of overparameterized nonlinear models, most notably in neural network training, the growth of the loss around a minimizer strongly impacts its performance. Flat minima -- those around which the loss grows slowly -- appear to generalize well. This work takes a step towards understanding this phenomenon by focusing on the simplest class of overparameter…

    Submitted 17 February, 2023; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: 36 pages

  14. arXiv:2201.03398  [pdf, other]

    cs.GT cs.LG math.OC

    Multiplayer Performative Prediction: Learning in Decision-Dependent Games

    Authors: Adhyyan Narang, Evan Faulkner, Dmitriy Drusvyatskiy, Maryam Fazel, Lillian J. Ratliff

    Abstract: Learning problems commonly exhibit an interesting feedback mechanism wherein the population data reacts to competing decision makers' actions. This paper formulates a new game theoretic framework for this phenomenon, called "multi-player performative prediction". We focus on two distinct solution concepts, namely (i) performatively stable equilibria and (ii) Nash equilibria of the game. The latter…

    Submitted 6 April, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

  15. arXiv:2112.06969  [pdf, ps, other]

    math.OC

    A gradient sampling method with complexity guarantees for Lipschitz functions in high and low dimensions

    Authors: Damek Davis, Dmitriy Drusvyatskiy, Yin Tat Lee, Swati Padmanabhan, Guanghao Ye

    Abstract: Zhang et al. introduced a novel modification of Goldstein's classical subgradient method, with an efficiency guarantee of $O(\varepsilon^{-4})$ for minimizing Lipschitz functions. Their work, however, makes use of a nonstandard subgradient oracle model and requires the function to be directionally differentiable. In this paper, we show that both of these assumptions can be dropped by simply adding…

    Submitted 15 February, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

    Comments: 14 pages

    MSC Class: 65K05; 65K10; 90C15; 90C30

  16. arXiv:2111.09456  [pdf, ps, other]

    cs.GT

    Improved Rates for Derivative Free Gradient Play in Strongly Monotone Games

    Authors: Dmitriy Drusvyatskiy, Maryam Fazel, Lillian J Ratliff

    Abstract: The influential work of Bravo et al. 2018 shows that derivative free play in strongly monotone games has complexity $O(d^2/\varepsilon^3)$, where $\varepsilon$ is the target accuracy on the expected squared distance to the solution. This note shows that the efficiency estimate is actually $O(d^2/\varepsilon^2)$, which reduces to the known efficiency guarantee for the method in unconstrained optimi…

    Submitted 6 April, 2022; v1 submitted 17 November, 2021; originally announced November 2021.

  17. arXiv:2108.11832  [pdf, other]

    math.OC cs.LG

    Active manifolds, stratifications, and convergence to local minima in nonsmooth optimization

    Authors: Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang

    Abstract: We show that the subgradient method converges only to local minimizers when applied to generic Lipschitz continuous and subdifferentially regular functions that are definable in an o-minimal structure. At a high level, the argument we present is appealingly transparent: we interpret the nonsmooth dynamics as an approximate Riemannian gradient method on a certain distinguished submanifold that capt…

    Submitted 9 January, 2023; v1 submitted 26 August, 2021; originally announced August 2021.

    Comments: Version 1 of the arxiv report has been split into two parts. Version 2 of the arxiv report is Part 1 of the original submission. Part 2 will appear as a separate arxiv submission

    MSC Class: 65K05; 65K10; 90C15; 90C30; 90C06

  18. arXiv:2108.07356  [pdf, other]

    math.OC cs.LG

    Stochastic Optimization under Distributional Drift

    Authors: Joshua Cutler, Dmitriy Drusvyatskiy, Zaid Harchaoui

    Abstract: We consider the problem of minimizing a convex function that is evolving according to unknown and possibly stochastic dynamics, which may depend jointly on time and on the decision variable itself. Such problems abound in the machine learning and signal processing literature, under the names of concept drift, stochastic tracking, and performative prediction. We provide novel non-asymptotic converg…

    Submitted 26 May, 2023; v1 submitted 16 August, 2021; originally announced August 2021.

    Comments: 56 pages, 7 figures. v2: unified analysis of time- and decision-dependent settings; updated numerical experiments. v3: added references and updated exposition. v4: minor updates to match the version published in JMLR

    MSC Class: 90C15; 90C25

    Journal ref: Journal of Machine Learning Research, 24(147):1-56, 2023

  19. arXiv:2106.09815  [pdf, other]

    math.OC cs.LG stat.ML

    Escaping strict saddle points of the Moreau envelope in nonsmooth optimization

    Authors: Damek Davis, Mateo Díaz, Dmitriy Drusvyatskiy

    Abstract: Recent work has shown that stochastically perturbed gradient methods can efficiently escape strict saddle points of smooth functions. We extend this body of work to nonsmooth optimization, by analyzing an inexact analogue of a stochastically perturbed gradient method applied to the Moreau envelope. The main conclusion is that a variety of algorithms for nonsmooth optimization can escape strict sad…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: 29 pages, 1 figure

    MSC Class: 65K05; 65K10; 90C15; 90C30; 90C06

  20. arXiv:2102.08484  [pdf, ps, other]

    math.OC

    Conservative and semismooth derivatives are equivalent for semialgebraic maps

    Authors: Damek Davis, Dmitriy Drusvyatskiy

    Abstract: Subgradient and Newton algorithms for nonsmooth optimization require generalized derivatives to satisfy subtle approximation properties: conservativity for the former and semismoothness for the latter. Though these two properties originate in entirely different contexts, we show that in the semi-algebraic setting they are equivalent. Both properties for a generalized derivative simply require it t…

    Submitted 16 February, 2021; originally announced February 2021.

    Comments: 12 pages

    MSC Class: Primary: 49J53; 49J52; Secondary: 32B20; 14P15

  21. arXiv:2011.11173  [pdf, other]

    math.OC

    Stochastic optimization with decision-dependent distributions

    Authors: Dmitriy Drusvyatskiy, Lin Xiao

    Abstract: Stochastic optimization problems often involve data distributions that change in reaction to the decision variables. This is the case for example when members of the population respond to a deployed classifier by manipulating their features so as to improve the likelihood of being positively labeled. Recent works on performative prediction have identified an intriguing solution concept for such pr…

    Submitted 13 December, 2020; v1 submitted 22 November, 2020; originally announced November 2020.

    Comments: 60 pages

    MSC Class: 90C15; 90C25

  22. Stochastic optimization over proximally smooth sets

    Authors: Damek Davis, Dmitriy Drusvyatskiy, Zhan Shi

    Abstract: We introduce a class of stochastic algorithms for minimizing weakly convex functions over proximally smooth sets. As their main building blocks, the algorithms use simplified models of the objective function and the constraint set, along with a retraction operation to restore feasibility. All the proposed methods come equipped with a finite time efficiency guarantee in terms of a natural stationar…

    Submitted 14 February, 2020; originally announced February 2020.

    MSC Class: 65K05; 65K10; 90C15; 90C30

  23. arXiv:1912.07146  [pdf, other]

    math.OC cs.LG stat.ML

    Proximal methods avoid active strict saddles of weakly convex functions

    Authors: Damek Davis, Dmitriy Drusvyatskiy

    Abstract: We introduce a geometrically transparent strict saddle property for nonsmooth functions. This property guarantees that simple proximal algorithms on weakly convex problems converge only to local minimizers, when randomly initialized. We argue that the strict saddle property may be a realistic assumption in applications, since it provably holds for generic semi-algebraic optimization problems.

    Submitted 16 February, 2021; v1 submitted 15 December, 2019; originally announced December 2019.

    Comments: 43 pages, 2 figures

    MSC Class: 65K05; 65K10; 90C15; 90C30; 90C06

  24. arXiv:1910.13604  [pdf, other]

    math.OC

    Pathological subgradient dynamics

    Authors: Aris Daniilidis, Dmitriy Drusvyatskiy

    Abstract: We construct examples of Lipschitz continuous functions, with pathological subgradient dynamics both in continuous and discrete time. In both settings, the iterates generate bounded trajectories, and yet fail to detect any (generalized) critical points of the function.

    Submitted 29 October, 2019; originally announced October 2019.

    Comments: 14 pages, 1 figure

    MSC Class: 90C30; 49J52; 65K10

  25. arXiv:1908.07615  [pdf, other]

    math.OC

    Iterative Linearized Control: Stable Algorithms and Complexity Guarantees

    Authors: Vincent Roulet, Siddhartha Srinivasa, Dmitriy Drusvyatskiy, Zaid Harchaoui

    Abstract: We examine popular gradient-based algorithms for nonlinear control in the light of the modern complexity analysis of first-order optimization algorithms. The examination reveals that the complexity bounds can be clearly stated in terms of calls to a computational oracle related to dynamic programming and implementable by gradient back-propagation using machine learning software libraries such as P…

    Submitted 20 August, 2019; originally announced August 2019.

    Comments: Short version appeared in International Conference on Machine Learning (ICML) 2019

  26. arXiv:1907.13307  [pdf, ps, other]

    math.OC cs.LG stat.ML

    From low probability to high confidence in stochastic convex optimization

    Authors: Damek Davis, Dmitriy Drusvyatskiy, Lin Xiao, Junyu Zhang

    Abstract: Standard results in stochastic convex optimization bound the number of samples that an algorithm needs to generate a point with small function value in expectation. More nuanced high probability guarantees are rare, and typically either rely on "light-tail" noise assumptions or exhibit worse sample complexity. In this work, we show that a wide class of stochastic optimization algorithms for strong…

    Submitted 16 October, 2019; v1 submitted 31 July, 2019; originally announced July 2019.

    Comments: 37 pages

    MSC Class: 65K05; 65K10; 90C15; 90C25

  27. arXiv:1907.09547  [pdf, other]

    math.OC cs.LG stat.ML

    Stochastic algorithms with geometric step decay converge linearly on sharp functions

    Authors: Damek Davis, Dmitriy Drusvyatskiy, Vasileios Charisopoulos

    Abstract: Stochastic (sub)gradient methods require step size schedule tuning to perform well in practice. Classical tuning strategies decay the step size polynomially and lead to optimal sublinear rates on (strongly) convex problems. An alternative schedule, popular in nonconvex optimization, is called \emph{geometric step decay} and proceeds by halving the step size after every few epochs. In recent work,…

    Submitted 22 July, 2019; originally announced July 2019.

    MSC Class: 65K05; 65K10; 90C15; 90C30; 90C06
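
The schedule named in the abstract above, halving the step size after every few epochs, can be sketched as follows. This is a minimal illustration and not the authors' algorithm: the function name `geometric_step_decay` and the parameters `epochs_per_stage` and `factor` are hypothetical, and the abstract does not say how many epochs make up a stage.

```python
def geometric_step_decay(initial_step, epoch, epochs_per_stage=5, factor=0.5):
    """Return the step size used at `epoch` (0-indexed).

    The step is multiplied by `factor` once every `epochs_per_stage` epochs.
    Both parameters are illustrative assumptions, not values from the paper.
    """
    if epochs_per_stage <= 0:
        raise ValueError("epochs_per_stage must be positive")
    stage = epoch // epochs_per_stage  # number of decays applied so far
    return initial_step * factor ** stage


# Halve the step every 2 epochs, starting from a step size of 1.0.
schedule = [geometric_step_decay(1.0, e, epochs_per_stage=2) for e in range(6)]
print(schedule)  # [1.0, 1.0, 0.5, 0.5, 0.25, 0.25]
```

By contrast, a polynomially decaying schedule would shrink the step a little at every iteration instead of holding it constant within a stage.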

  28. arXiv:1904.10020  [pdf, other]

    math.OC cs.LG

    Low-rank matrix recovery with composite optimization: good conditioning and rapid convergence

    Authors: Vasileios Charisopoulos, Yudong Chen, Damek Davis, Mateo Díaz, Lijun Ding, Dmitriy Drusvyatskiy

    Abstract: The task of recovering a low-rank matrix from its noisy linear measurements plays a central role in computational science. Smooth formulations of the problem often exhibit an undesirable phenomenon: the condition number, classically defined, scales poorly with the dimension of the ambient space. In contrast, we here show that in a variety of concrete circumstances, nonsmooth penalty formulations d…

    Submitted 22 April, 2019; originally announced April 2019.

    Comments: 80 pages

    MSC Class: 65K10; 90C06

  29. arXiv:1901.01624  [pdf, other]

    math.OC cs.LG math.ST

    Composite optimization for robust blind deconvolution

    Authors: Vasileios Charisopoulos, Damek Davis, Mateo Díaz, Dmitriy Drusvyatskiy

    Abstract: The blind deconvolution problem seeks to recover a pair of vectors from a set of rank one bilinear measurements. We consider a natural nonsmooth formulation of the problem and show that under standard statistical assumptions, its moduli of weak convexity, sharpness, and Lipschitz continuity are all dimension independent. This phenomenon persists even when up to half of the measurements are corrupt…

    Submitted 18 January, 2019; v1 submitted 6 January, 2019; originally announced January 2019.

    Comments: 60 pages, 14 figures

    MSC Class: 65K10; 90C06

  30. arXiv:1811.01298  [pdf, ps, other]

    math.OC

    Inexact alternating projections on nonconvex sets

    Authors: Dmitriy Drusvyatskiy, Adrian S. Lewis

    Abstract: Given two arbitrary closed sets in Euclidean space, a simple transversality condition guarantees that the method of alternating projections converges locally, at linear rate, to a point in the intersection. Exact projection onto nonconvex sets is typically intractable, but we show that computationally-cheap inexact projections may suffice instead. In particular, if one set is defined by sufficient…

    Submitted 3 November, 2018; originally announced November 2018.

    MSC Class: 49M20; 65K10; 90C30
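
The method of alternating projections mentioned in the abstract above can be illustrated on a toy pair of sets. This sketch uses exact projections (the paper itself concerns inexact ones) onto two sets chosen purely for illustration: the unit circle, which is nonconvex, and the horizontal line y = 0.5. All function names are hypothetical.

```python
import math


def project_onto_unit_circle(point):
    """Nearest point on the unit circle; undefined at the origin."""
    x, y = point
    norm = math.hypot(x, y)
    if norm == 0.0:
        raise ValueError("the origin has no unique nearest point on the circle")
    return (x / norm, y / norm)


def project_onto_line(point, height=0.5):
    """Nearest point on the horizontal line y = height."""
    return (point[0], height)


def alternating_projections(start, iterations=50):
    """Alternate between the two projections, ending on the line."""
    point = start
    for _ in range(iterations):
        point = project_onto_line(project_onto_unit_circle(point))
    return point


limit = alternating_projections((2.0, 2.0))
# The circle and the line cross at (sqrt(0.75), 0.5) on the right-hand side.
error = abs(limit[0] - math.sqrt(0.75))
```

In this example the two sets cross at a nonzero angle, and the distance to the intersection point shrinks by a roughly constant factor per sweep, which is the locally linear behaviour the abstract refers to.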

  31. arXiv:1810.07590  [pdf, ps, other]

    math.OC cs.LG

    Graphical Convergence of Subgradients in Nonconvex Optimization and Learning

    Authors: Damek Davis, Dmitriy Drusvyatskiy

    Abstract: We investigate the stochastic optimization problem of minimizing population risk, where the loss defining the risk is assumed to be weakly convex. Compositions of Lipschitz convex functions with smooth maps are the primary examples of such losses. We analyze the estimation quality of such nonsmooth and nonconvex problems by their sample average approximations. Our main results establish dimension-…

    Submitted 17 December, 2018; v1 submitted 17 October, 2018; originally announced October 2018.

    Comments: 36 pages

    MSC Class: 65K10; 90C15; 68Q32

  32. arXiv:1807.00255  [pdf, ps, other]

    math.OC cs.LG

    Stochastic model-based minimization under high-order growth

    Authors: Damek Davis, Dmitriy Drusvyatskiy, Kellie J. MacPhee

    Abstract: Given a nonsmooth, nonconvex minimization problem, we consider algorithms that iteratively sample and minimize stochastic convex models of the objective function. Assuming that the one-sided approximation quality and the variation of the models is controlled by a Bregman divergence, we show that the scheme drives a natural stationarity measure to zero at the rate $O(k^{-1/4})$. Under additional co…

    Submitted 30 June, 2018; originally announced July 2018.

    Comments: 30 pages

    MSC Class: 65K05; 65K10; 90C15; 90C30

  33. arXiv:1804.07795  [pdf, other]

    math.OC cs.LG

    Stochastic subgradient method converges on tame functions

    Authors: Damek Davis, Dmitriy Drusvyatskiy, Sham Kakade, Jason D. Lee

    Abstract: This work considers the question: what convergence guarantees does the stochastic subgradient method have in the absence of smoothness and convexity? We prove that the stochastic subgradient method, on any semialgebraic locally Lipschitz function, produces limit points that are all first-order stationary. More generally, our result applies to any function with a Whitney stratifiable graph. In part…

    Submitted 25 May, 2018; v1 submitted 20 April, 2018; originally announced April 2018.

    Comments: 32 pages, 1 figure

    MSC Class: 65K05; 65K10; 90C15; 90C30

  34. arXiv:1803.06523  [pdf, other]

    math.OC cs.LG

    Stochastic model-based minimization of weakly convex functions

    Authors: Damek Davis, Dmitriy Drusvyatskiy

    Abstract: We consider a family of algorithms that successively sample and minimize simple stochastic models of the objective function. We show that under reasonable conditions on approximation quality and regularity of the models, any such algorithm drives a natural stationarity measure to zero at the rate $O(k^{-1/4})$. As a consequence, we obtain the first complexity guarantees for the stochastic proximal…

    Submitted 26 August, 2018; v1 submitted 17 March, 2018; originally announced March 2018.

    Comments: 33 pages, 4 figures

    MSC Class: 65K05; 65K10; 90C15; 90C30

  35. arXiv:1803.02461  [pdf, other]

    math.OC

    Subgradient methods for sharp weakly convex functions

    Authors: Damek Davis, Dmitriy Drusvyatskiy, Kellie J. MacPhee, Courtney Paquette

    Abstract: Subgradient methods converge linearly on a convex function that grows sharply away from its solution set. In this work, we show that the same is true for sharp functions that are only weakly convex, provided that the subgradient methods are initialized within a fixed tube around the solution set. A variety of statistical and signal processing tasks come equipped with good initialization, and prova…

    Submitted 6 March, 2018; originally announced March 2018.

    Comments: 16 pages, 3 figures

    MSC Class: 65K05; 65K10; 90C15; 90C30

  36. arXiv:1802.08556  [pdf, ps, other]

    math.OC

    Complexity of finding near-stationary points of convex functions stochastically

    Authors: Damek Davis, Dmitriy Drusvyatskiy

    Abstract: In a recent paper, we showed that the stochastic subgradient method applied to a weakly convex problem, drives the gradient of the Moreau envelope to zero at the rate $O(k^{-1/4})$. In this supplementary note, we present a stochastic subgradient method for minimizing a convex function, with the improved rate $\widetilde O(k^{-1/2})$.

    Submitted 21 February, 2018; originally announced February 2018.

    Comments: 9 pages

    MSC Class: 65K05; 65K10; 90C15; 90C30

  37. arXiv:1802.02988  [pdf, ps, other]

    math.OC cs.LG

    Stochastic subgradient method converges at the rate $O(k^{-1/4})$ on weakly convex functions

    Authors: Damek Davis, Dmitriy Drusvyatskiy

    Abstract: We prove that the proximal stochastic subgradient method, applied to a weakly convex problem, drives the gradient of the Moreau envelope to zero at the rate $O(k^{-1/4})$. As a consequence, we resolve an open question on the convergence rate of the proximal stochastic gradient method for minimizing the sum of a smooth nonconvex function and a convex proximable function.

    Submitted 19 February, 2018; v1 submitted 8 February, 2018; originally announced February 2018.

    Comments: 12 pages

    MSC Class: 65K05; 65K10; 90C15; 90C30
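
For reference, the stationarity measure named in the abstract above rests on the standard definition of the Moreau envelope. The symbols $f$, $\lambda$, and $\rho$ are notation assumed here for illustration; they do not appear in the listing.

```latex
f_\lambda(x) = \min_{y} \Big\{ f(y) + \frac{1}{2\lambda}\|y - x\|^2 \Big\},
\qquad
\operatorname{prox}_{\lambda f}(x) = \operatorname*{argmin}_{y} \Big\{ f(y) + \frac{1}{2\lambda}\|y - x\|^2 \Big\},
\qquad
\nabla f_\lambda(x) = \lambda^{-1}\big(x - \operatorname{prox}_{\lambda f}(x)\big).
```

When $f$ is $\rho$-weakly convex and $0 < \lambda < 1/\rho$, the envelope $f_\lambda$ is continuously differentiable, so a small value of $\|\nabla f_\lambda(x)\|$ is a meaningful notion of near-stationarity even though $f$ itself may be nonsmooth.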

  38. arXiv:1712.06038  [pdf, ps, other]

    math.OC

    The proximal point method revisited

    Authors: Dmitriy Drusvyatskiy

    Abstract: In this short survey, I revisit the role of the proximal point method in large scale optimization. I focus on three recent examples: a proximally guided subgradient method for weakly convex stochastic approximation, the prox-linear algorithm for minimizing compositions of convex functions and smooth maps, and Catalyst generic acceleration for regularized Empirical Risk Minimization.

    Submitted 16 December, 2017; originally announced December 2017.

    Comments: 11 pages, submitted to SIAG/OPT Views and News

    MSC Class: 65K05; 90C06; 90C25; 90C30

  39. arXiv:1711.03247  [pdf, other]

    math.OC

    The nonsmooth landscape of phase retrieval

    Authors: Damek Davis, Dmitriy Drusvyatskiy, Courtney Paquette

    Abstract: We consider a popular nonsmooth formulation of the real phase retrieval problem. We show that under standard statistical assumptions, a simple subgradient method converges linearly when initialized within a constant relative distance of an optimal solution. Seeking to understand the distribution of the stationary points of the problem, we complete the paper by proving that as the number of Gaussia…

    Submitted 6 January, 2018; v1 submitted 8 November, 2017; originally announced November 2017.

    Comments: 42 pages, 15 figures

    MSC Class: 65K10; 90C06

  40. arXiv:1706.03705  [pdf, other]

    math.OC

    The many faces of degeneracy in conic optimization

    Authors: Dmitriy Drusvyatskiy, Henry Wolkowicz

    Abstract: Slater's condition -- existence of a "strictly feasible solution" -- is a common assumption in conic optimization. Without strict feasibility, first-order optimality conditions may be meaningless, the dual problem may yield little information about the primal, and small changes in the data may render the problem infeasible. Hence, failure of strict feasibility can negatively impact off-the-shelf n…

    Submitted 12 June, 2017; originally announced June 2017.

    Comments: 99 pages, 5 figures, 2 tables

  41. arXiv:1703.10993  [pdf, other]

    stat.ML math.OC

    Catalyst Acceleration for Gradient-Based Non-Convex Optimization

    Authors: Courtney Paquette, Hongzhou Lin, Dmitriy Drusvyatskiy, Julien Mairal, Zaid Harchaoui

    Abstract: We introduce a generic scheme to solve nonconvex optimization problems using gradient-based algorithms originally designed for minimizing convex functions. Even though these methods may originally require convexity to operate, the proposed approach allows one to use them on weakly convex objectives, which covers a large class of non-convex functions typically appearing in machine learning and sign…

    Submitted 31 December, 2018; v1 submitted 31 March, 2017; originally announced March 2017.

  42. arXiv:1702.08649  [pdf, other]

    math.OC

    Foundations of gauge and perspective duality

    Authors: Alexandre Y. Aravkin, James V. Burke, Dmitriy Drusvyatskiy, Michael P. Friedlander, Kellie MacPhee

    Abstract: We revisit the foundations of gauge duality and demonstrate that it can be explained using a modern approach to duality based on a perturbation framework. We therefore put gauge duality and Fenchel-Rockafellar duality on equal footing, including explaining gauge dual variables as sensitivity measures, and showing how to recover primal solutions from those of the gauge dual. This vantage point allo…

    Submitted 18 June, 2018; v1 submitted 28 February, 2017; originally announced February 2017.

    Comments: 29 pages

  43. arXiv:1610.03446  [pdf, ps, other]

    math.OC

    Nonsmooth optimization using Taylor-like models: error bounds, convergence, and termination criteria

    Authors: Dmitriy Drusvyatskiy, Alexander D. Ioffe, Adrian S. Lewis

    Abstract: We consider optimization algorithms that successively minimize simple Taylor-like models of the objective function. Methods of Gauss-Newton type for minimizing the composition of a convex function and a smooth map are common examples. Our main result is an explicit relationship between the step-size of any such algorithm and the slope of the function at a nearby point. Consequently, we (1) show th…

    Submitted 11 October, 2016; originally announced October 2016.

    Comments: 23 pages

    MSC Class: 65K05; 90C30; 49M37; 65K10

  44. arXiv:1606.02395  [pdf, ps, other]

    math.OC

    Efficient quadratic penalization through the partial minimization technique

    Authors: Aleksandr Y. Aravkin, Dmitriy Drusvyatskiy, Tristan van Leeuwen

    Abstract: Common computational problems, such as parameter estimation in dynamic models and PDE constrained optimization, require data fitting over a set of auxiliary parameters subject to physical constraints over an underlying state. Naive quadratically penalized formulations, commonly used in practice, suffer from inherent ill-conditioning. We show that surprisingly the partial minimization technique reg…

    Submitted 17 September, 2017; v1 submitted 8 June, 2016; originally announced June 2016.

    Comments: 8 pages, 9 figures

    MSC Class: 65K05; 65K10; 86-08

  45. arXiv:1605.00125  [pdf, ps, other]

    math.OC

    Efficiency of minimizing compositions of convex functions and smooth maps

    Authors: Dmitriy Drusvyatskiy, Courtney Paquette

    Abstract: We consider global efficiency of algorithms for minimizing a sum of a convex function and a composition of a Lipschitz convex function with a smooth map. The basic algorithm we rely on is the prox-linear method, which in each iteration solves a regularized subproblem formed by linearizing the smooth map. When the subproblems are solved exactly, the method has efficiency…

    Submitted 14 August, 2017; v1 submitted 30 April, 2016; originally announced May 2016.

    MSC Class: 97N60; 90C25; 90C06; 90C30

  46. arXiv:1604.06543  [pdf, other]

    math.OC

    An optimal first order method based on optimal quadratic averaging

    Authors: Dmitriy Drusvyatskiy, Maryam Fazel, Scott Roy

    Abstract: In a recent paper, Bubeck, Lee, and Singh introduced a new first order method for minimizing smooth strongly convex functions. Their geometric descent algorithm, largely inspired by the ellipsoid method, enjoys the optimal linear rate of convergence. We show that the same iterate sequence is generated by a scheme that in each iteration computes an optimal average of quadratic lower-models of the f…

    Submitted 28 February, 2017; v1 submitted 22 April, 2016; originally announced April 2016.

    Comments: 23 pages

    MSC Class: 90C25; 90C06

  47. arXiv:1602.06661  [pdf, ps, other]

    math.OC

    Error bounds, quadratic growth, and linear convergence of proximal methods

    Authors: Dmitriy Drusvyatskiy, Adrian S. Lewis

    Abstract: The proximal gradient algorithm for minimizing the sum of a smooth and a nonsmooth convex function often converges linearly even without strong convexity. One common reason is that a multiple of the step length at each iteration may linearly bound the "error" -- the distance to the solution set. We explain the observed linear convergence intuitively by proving the equivalence of such an error boun…

    Submitted 27 June, 2016; v1 submitted 22 February, 2016; originally announced February 2016.

    Comments: 35 pages

    MSC Class: 90C25; 90C31; 90C55; 65K10

  48. arXiv:1602.01506  [pdf, other]

    math.OC math.NA

    Level-set methods for convex optimization

    Authors: Aleksandr Y. Aravkin, James V. Burke, Dmitriy Drusvyatskiy, Michael P. Friedlander, Scott Roy

    Abstract: Convex optimization problems arising in applications often have favorable objective functions and complicated constraints, thereby precluding first-order methods from being immediately applicable. We describe an approach that exchanges the roles of the objective and constraint functions, and instead approximately solves a sequence of parametric level-set problems. A zero-finding procedure, based o…

    Submitted 3 February, 2016; originally announced February 2016.

    Comments: 38 pages

  49. arXiv:1601.07210  [pdf, ps, other]

    math.OC

    The Euclidean Distance Degree of Orthogonally Invariant Matrix Varieties

    Authors: Dmitriy Drusvyatskiy, Hon-Leung Lee, Giorgio Ottaviani, Rekha R. Thomas

    Abstract: We show that the Euclidean distance degree of a real orthogonally invariant matrix variety equals the Euclidean distance degree of its restriction to diagonal matrices. We illustrate how this result can greatly simplify calculations in concrete circumstances.

    Submitted 26 January, 2016; originally announced January 2016.

    Comments: 18 pages

    MSC Class: 90C26; 15A18; 14R20; 14N10

  50. arXiv:1506.05170  [pdf, ps, other]

    math.OC

    Variational analysis of spectral functions simplified

    Authors: D. Drusvyatskiy, C. Kempton

    Abstract: Spectral functions of symmetric matrices -- those depending on matrices only through their eigenvalues -- appear often in optimization. A cornerstone variational analytic tool for studying such functions is a formula relating their subdifferentials to the subdifferentials of their diagonal restrictions. This paper presents a new, short, and revealing derivation of this result. We then round off th…

    Submitted 22 July, 2015; v1 submitted 16 June, 2015; originally announced June 2015.

    Comments: 17 pages

    MSC Class: 49J52; 15A18; 49J53; 49R05; 58D19