Showing 1–24 of 24 results for author: Drusvyatskiy, D

Searching in archive cs.
  1. arXiv:2505.08277  [pdf, ps, other]

    stat.ML cs.LG math.OC math.ST

    Iteratively reweighted kernel machines efficiently learn sparse functions

    Authors: Libin Zhu, Damek Davis, Dmitriy Drusvyatskiy, Maryam Fazel

    Abstract: The impressive practical performance of neural networks is often attributed to their ability to learn low-dimensional data representations and hierarchical structure directly from data. In this work, we argue that these two phenomena are not unique to neural networks, and can be elicited from classical kernel methods. Namely, we show that the derivative of the kernel predictor can detect the influ…

    Submitted 13 May, 2025; originally announced May 2025.

  2. arXiv:2502.05305  [pdf, other]

    stat.ML cs.LG math.OC

    Online Covariance Estimation in Nonsmooth Stochastic Approximation

    Authors: Liwei Jiang, Abhishek Roy, Krishna Balasubramanian, Damek Davis, Dmitriy Drusvyatskiy, Sen Na

    Abstract: We consider applying stochastic approximation (SA) methods to solve nonsmooth variational inclusion problems. Existing studies have shown that the averaged iterates of SA methods exhibit asymptotic normality, with an optimal limiting covariance matrix in the local minimax sense of Hájek and Le Cam. However, no methods have been proposed to estimate this covariance matrix in a nonsmooth and potenti…

    Submitted 7 February, 2025; originally announced February 2025.

    Comments: 46 pages, 1 figure

  3. arXiv:2409.19791  [pdf, other]

    math.OC cs.LG

    Gradient descent with adaptive stepsize converges (nearly) linearly under fourth-order growth

    Authors: Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang

    Abstract: A prevalent belief among optimization specialists is that linear convergence of gradient descent is contingent on the function growing quadratically away from its minimizers. In this work, we argue that this belief is inaccurate. We show that gradient descent with an adaptive stepsize converges at a local (nearly) linear rate on any smooth function that merely exhibits fourth-order growth away fro…

    Submitted 29 September, 2024; originally announced September 2024.

    Comments: 58 pages, 5 figures

    MSC Class: 65K05; 65K10; 90C30; 90C06

  4. arXiv:2401.04553  [pdf, other]

    stat.ML cs.LG

    Linear Recursive Feature Machines provably recover low-rank matrices

    Authors: Adityanarayanan Radhakrishnan, Mikhail Belkin, Dmitriy Drusvyatskiy

    Abstract: A fundamental problem in machine learning is to understand how neural networks make accurate predictions, while seemingly bypassing the curse of dimensionality. A possible explanation is that common training algorithms for neural networks implicitly perform dimensionality reduction - a process called feature learning. Recent work posited that the effects of feature learning can be elicited from a…

    Submitted 9 January, 2024; originally announced January 2024.

  5. arXiv:2306.02601  [pdf, other]

    cs.LG math.OC stat.ML

    Aiming towards the minimizers: fast convergence of SGD for overparametrized problems

    Authors: Chaoyue Liu, Dmitriy Drusvyatskiy, Mikhail Belkin, Damek Davis, Yi-An Ma

    Abstract: Modern machine learning paradigms, such as deep learning, occur in or close to the interpolation regime, wherein the number of model parameters is much larger than the number of data samples. In this work, we propose a regularity condition within the interpolation regime which endows the stochastic gradient method with the same worst-case iteration complexity as the deterministic gradient method,…

    Submitted 5 June, 2023; originally announced June 2023.

  6. arXiv:2207.04173  [pdf, other]

    math.OC cs.LG stat.ML

    Stochastic Approximation with Decision-Dependent Distributions: Asymptotic Normality and Optimality

    Authors: Joshua Cutler, Mateo Díaz, Dmitriy Drusvyatskiy

    Abstract: We analyze a stochastic approximation algorithm for decision-dependent problems, wherein the data distribution used by the algorithm evolves along the iterate sequence. The primary examples of such problems appear in performative prediction and its multiplayer extensions. We show that under mild assumptions, the deviation between the average iterate of the algorithm and the solution is asymptotica…

    Submitted 13 March, 2024; v1 submitted 8 July, 2022; originally announced July 2022.

    Comments: 49 pages, 1 figure. v2: revised asymptotic optimality results and reworked exposition. v3: minor updates

    MSC Class: 90C15; 90C25

    Journal ref: Journal of Machine Learning Research, 25(90):1-49, 2024

  7. arXiv:2204.08281  [pdf, other]

    math.OC cs.LG stat.ML

    Decision-Dependent Risk Minimization in Geometrically Decaying Dynamic Environments

    Authors: Mitas Ray, Dmitriy Drusvyatskiy, Maryam Fazel, Lillian J. Ratliff

    Abstract: This paper studies the problem of expected loss minimization given a data distribution that is dependent on the decision-maker's action and evolves dynamically in time according to a geometric decay process. Novel algorithms for both the information setting in which the decision-maker has a first order gradient oracle and the setting in which they have simply a loss function oracle are introduced.…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: Accepted at AAAI 2022

  8. arXiv:2203.03756  [pdf, other]

    cs.LG math.OC stat.ML

    Flat minima generalize for low-rank matrix recovery

    Authors: Lijun Ding, Dmitriy Drusvyatskiy, Maryam Fazel, Zaid Harchaoui

    Abstract: Empirical evidence suggests that for a variety of overparameterized nonlinear models, most notably in neural network training, the growth of the loss around a minimizer strongly impacts its performance. Flat minima -- those around which the loss grows slowly -- appear to generalize well. This work takes a step towards understanding this phenomenon by focusing on the simplest class of overparameter…

    Submitted 17 February, 2023; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: 36 pages

  9. arXiv:2201.03398  [pdf, other]

    cs.GT cs.LG math.OC

    Multiplayer Performative Prediction: Learning in Decision-Dependent Games

    Authors: Adhyyan Narang, Evan Faulkner, Dmitriy Drusvyatskiy, Maryam Fazel, Lillian J. Ratliff

    Abstract: Learning problems commonly exhibit an interesting feedback mechanism wherein the population data reacts to competing decision makers' actions. This paper formulates a new game theoretic framework for this phenomenon, called "multi-player performative prediction". We focus on two distinct solution concepts, namely (i) performatively stable equilibria and (ii) Nash equilibria of the game. The latter…

    Submitted 6 April, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

  10. arXiv:2111.09456  [pdf, ps, other]

    cs.GT

    Improved Rates for Derivative Free Gradient Play in Strongly Monotone Games

    Authors: Dmitriy Drusvyatskiy, Maryam Fazel, Lillian J Ratliff

    Abstract: The influential work of Bravo et al. 2018 shows that derivative free play in strongly monotone games has complexity $O(d^2/\varepsilon^3)$, where $\varepsilon$ is the target accuracy on the expected squared distance to the solution. This note shows that the efficiency estimate is actually $O(d^2/\varepsilon^2)$, which reduces to the known efficiency guarantee for the method in unconstrained optimi…

    Submitted 6 April, 2022; v1 submitted 17 November, 2021; originally announced November 2021.

  11. arXiv:2108.11832  [pdf, other]

    math.OC cs.LG

    Active manifolds, stratifications, and convergence to local minima in nonsmooth optimization

    Authors: Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang

    Abstract: We show that the subgradient method converges only to local minimizers when applied to generic Lipschitz continuous and subdifferentially regular functions that are definable in an o-minimal structure. At a high level, the argument we present is appealingly transparent: we interpret the nonsmooth dynamics as an approximate Riemannian gradient method on a certain distinguished submanifold that capt…

    Submitted 9 January, 2023; v1 submitted 26 August, 2021; originally announced August 2021.

    Comments: Version 1 of the arxiv report has been split into two parts. Version 2 of the arxiv report is Part 1 of the original submission. Part 2 will appear as a separate arxiv submission

    MSC Class: 65K05; 65K10; 90C15; 90C30; 90C06

  12. arXiv:2108.07356  [pdf, other]

    math.OC cs.LG

    Stochastic Optimization under Distributional Drift

    Authors: Joshua Cutler, Dmitriy Drusvyatskiy, Zaid Harchaoui

    Abstract: We consider the problem of minimizing a convex function that is evolving according to unknown and possibly stochastic dynamics, which may depend jointly on time and on the decision variable itself. Such problems abound in the machine learning and signal processing literature, under the names of concept drift, stochastic tracking, and performative prediction. We provide novel non-asymptotic converg…

    Submitted 26 May, 2023; v1 submitted 16 August, 2021; originally announced August 2021.

    Comments: 56 pages, 7 figures. v2: unified analysis of time- and decision-dependent settings; updated numerical experiments. v3: added references and updated exposition. v4: minor updates to match the version published in JMLR

    MSC Class: 90C15; 90C25

    Journal ref: Journal of Machine Learning Research, 24(147):1-56, 2023

  13. arXiv:2106.09815  [pdf, other]

    math.OC cs.LG stat.ML

    Escaping strict saddle points of the Moreau envelope in nonsmooth optimization

    Authors: Damek Davis, Mateo Díaz, Dmitriy Drusvyatskiy

    Abstract: Recent work has shown that stochastically perturbed gradient methods can efficiently escape strict saddle points of smooth functions. We extend this body of work to nonsmooth optimization, by analyzing an inexact analogue of a stochastically perturbed gradient method applied to the Moreau envelope. The main conclusion is that a variety of algorithms for nonsmooth optimization can escape strict sad…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: 29 pages, 1 figure

    MSC Class: 65K05; 65K10; 90C15; 90C30; 90C06

  14. arXiv:1912.07146  [pdf, other]

    math.OC cs.LG stat.ML

    Proximal methods avoid active strict saddles of weakly convex functions

    Authors: Damek Davis, Dmitriy Drusvyatskiy

    Abstract: We introduce a geometrically transparent strict saddle property for nonsmooth functions. This property guarantees that simple proximal algorithms on weakly convex problems converge only to local minimizers, when randomly initialized. We argue that the strict saddle property may be a realistic assumption in applications, since it provably holds for generic semi-algebraic optimization problems.

    Submitted 16 February, 2021; v1 submitted 15 December, 2019; originally announced December 2019.

    Comments: 43 pages, 2 figures

    MSC Class: 65K05; 65K10; 90C15; 90C30; 90C06

  15. arXiv:1907.13307  [pdf, ps, other]

    math.OC cs.LG stat.ML

    From low probability to high confidence in stochastic convex optimization

    Authors: Damek Davis, Dmitriy Drusvyatskiy, Lin Xiao, Junyu Zhang

    Abstract: Standard results in stochastic convex optimization bound the number of samples that an algorithm needs to generate a point with small function value in expectation. More nuanced high probability guarantees are rare, and typically either rely on "light-tail" noise assumptions or exhibit worse sample complexity. In this work, we show that a wide class of stochastic optimization algorithms for strong…

    Submitted 16 October, 2019; v1 submitted 31 July, 2019; originally announced July 2019.

    Comments: 37 pages

    MSC Class: 65K05; 65K10; 90C15; 90C25

  16. arXiv:1907.09547  [pdf, other]

    math.OC cs.LG stat.ML

    Stochastic algorithms with geometric step decay converge linearly on sharp functions

    Authors: Damek Davis, Dmitriy Drusvyatskiy, Vasileios Charisopoulos

    Abstract: Stochastic (sub)gradient methods require step size schedule tuning to perform well in practice. Classical tuning strategies decay the step size polynomially and lead to optimal sublinear rates on (strongly) convex problems. An alternative schedule, popular in nonconvex optimization, is called \emph{geometric step decay} and proceeds by halving the step size after every few epochs. In recent work,…

    Submitted 22 July, 2019; originally announced July 2019.

    MSC Class: 65K05; 65K10; 90C15; 90C30; 90C06

  17. arXiv:1904.10020  [pdf, other]

    math.OC cs.LG

    Low-rank matrix recovery with composite optimization: good conditioning and rapid convergence

    Authors: Vasileios Charisopoulos, Yudong Chen, Damek Davis, Mateo Díaz, Lijun Ding, Dmitriy Drusvyatskiy

    Abstract: The task of recovering a low-rank matrix from its noisy linear measurements plays a central role in computational science. Smooth formulations of the problem often exhibit an undesirable phenomenon: the condition number, classically defined, scales poorly with the dimension of the ambient space. In contrast, we here show that in a variety of concrete circumstances, nonsmooth penalty formulations d…

    Submitted 22 April, 2019; originally announced April 2019.

    Comments: 80 pages

    MSC Class: 65K10; 90C06

  18. arXiv:1901.01624  [pdf, other]

    math.OC cs.LG math.ST

    Composite optimization for robust blind deconvolution

    Authors: Vasileios Charisopoulos, Damek Davis, Mateo Díaz, Dmitriy Drusvyatskiy

    Abstract: The blind deconvolution problem seeks to recover a pair of vectors from a set of rank one bilinear measurements. We consider a natural nonsmooth formulation of the problem and show that under standard statistical assumptions, its moduli of weak convexity, sharpness, and Lipschitz continuity are all dimension independent. This phenomenon persists even when up to half of the measurements are corrupt…

    Submitted 18 January, 2019; v1 submitted 6 January, 2019; originally announced January 2019.

    Comments: 60 pages, 14 figures

    MSC Class: 65K10; 90C06

  19. arXiv:1810.07590  [pdf, ps, other]

    math.OC cs.LG

    Graphical Convergence of Subgradients in Nonconvex Optimization and Learning

    Authors: Damek Davis, Dmitriy Drusvyatskiy

    Abstract: We investigate the stochastic optimization problem of minimizing population risk, where the loss defining the risk is assumed to be weakly convex. Compositions of Lipschitz convex functions with smooth maps are the primary examples of such losses. We analyze the estimation quality of such nonsmooth and nonconvex problems by their sample average approximations. Our main results establish dimension-…

    Submitted 17 December, 2018; v1 submitted 17 October, 2018; originally announced October 2018.

    Comments: 36 pages

    MSC Class: 65K10; 90C15; 68Q32

  20. arXiv:1807.00255  [pdf, ps, other]

    math.OC cs.LG

    Stochastic model-based minimization under high-order growth

    Authors: Damek Davis, Dmitriy Drusvyatskiy, Kellie J. MacPhee

    Abstract: Given a nonsmooth, nonconvex minimization problem, we consider algorithms that iteratively sample and minimize stochastic convex models of the objective function. Assuming that the one-sided approximation quality and the variation of the models is controlled by a Bregman divergence, we show that the scheme drives a natural stationarity measure to zero at the rate $O(k^{-1/4})$. Under additional co…

    Submitted 30 June, 2018; originally announced July 2018.

    Comments: 30 pages

    MSC Class: 65K05; 65K10; 90C15; 90C30

  21. arXiv:1804.07795  [pdf, other]

    math.OC cs.LG

    Stochastic subgradient method converges on tame functions

    Authors: Damek Davis, Dmitriy Drusvyatskiy, Sham Kakade, Jason D. Lee

    Abstract: This work considers the question: what convergence guarantees does the stochastic subgradient method have in the absence of smoothness and convexity? We prove that the stochastic subgradient method, on any semialgebraic locally Lipschitz function, produces limit points that are all first-order stationary. More generally, our result applies to any function with a Whitney stratifiable graph. In part…

    Submitted 25 May, 2018; v1 submitted 20 April, 2018; originally announced April 2018.

    Comments: 32 pages, 1 figure

    MSC Class: 65K05; 65K10; 90C15; 90C30

  22. arXiv:1803.06523  [pdf, other]

    math.OC cs.LG

    Stochastic model-based minimization of weakly convex functions

    Authors: Damek Davis, Dmitriy Drusvyatskiy

    Abstract: We consider a family of algorithms that successively sample and minimize simple stochastic models of the objective function. We show that under reasonable conditions on approximation quality and regularity of the models, any such algorithm drives a natural stationarity measure to zero at the rate $O(k^{-1/4})$. As a consequence, we obtain the first complexity guarantees for the stochastic proximal…

    Submitted 26 August, 2018; v1 submitted 17 March, 2018; originally announced March 2018.

    Comments: 33 pages, 4 figures

    MSC Class: 65K05; 65K10; 90C15; 90C30

  23. arXiv:1802.02988  [pdf, ps, other]

    math.OC cs.LG

    Stochastic subgradient method converges at the rate $O(k^{-1/4})$ on weakly convex functions

    Authors: Damek Davis, Dmitriy Drusvyatskiy

    Abstract: We prove that the proximal stochastic subgradient method, applied to a weakly convex problem, drives the gradient of the Moreau envelope to zero at the rate $O(k^{-1/4})$. As a consequence, we resolve an open question on the convergence rate of the proximal stochastic gradient method for minimizing the sum of a smooth nonconvex function and a convex proximable function.

    Submitted 19 February, 2018; v1 submitted 8 February, 2018; originally announced February 2018.

    Comments: 12 pages

    MSC Class: 65K05; 65K10; 90C15; 90C30

  24. arXiv:1108.4336  [pdf, other]

    cs.CG math.CO

    Complexity of a Single Face in an Arrangement of s-Intersecting Curves

    Authors: Boris Aronov, Dmitriy Drusvyatskiy

    Abstract: Consider a face F in an arrangement of n Jordan curves in the plane, no two of which intersect more than s times. We prove that the combinatorial complexity of F is O(λ_s(n)), O(λ_{s+1}(n)), and O(λ_{s+2}(n)), when the curves are bi-infinite, semi-infinite, or bounded, respectively; λ_k(n) is the maximum length of a Davenport-Schinzel sequence of order k on an alphabet of n symbols. Our bounds a…

    Submitted 22 August, 2011; originally announced August 2011.

    Comments: 9 pages, 5 figures

    MSC Class: 52C30; 52C45

    ACM Class: I.3.5