Showing 1–32 of 32 results for author: Wright, S J

Searching in archive cs.
  1. arXiv:2504.09951

    math.OC cs.LG stat.ML

    Towards Weaker Variance Assumptions for Stochastic Optimization

    Authors: Ahmet Alacaoglu, Yura Malitsky, Stephen J. Wright

    Abstract: We revisit a classical assumption for analyzing stochastic gradient algorithms where the squared norm of the stochastic subgradient (or the variance for smooth problems) is allowed to grow as fast as the squared norm of the optimization variable. We contextualize this assumption in view of its inception in the 1960s, its seemingly independent appearance in the recent literature, its relationship t…

    Submitted 14 April, 2025; originally announced April 2025.

  2. arXiv:2502.03701

    math.OC cs.LG

    First-ish Order Methods: Hessian-aware Scalings of Gradient Descent

    Authors: Oscar Smee, Fred Roosta, Stephen J. Wright

    Abstract: Gradient descent is the primary workhorse for optimizing large-scale problems in machine learning. However, its performance is highly sensitive to the choice of the learning rate. A key limitation of gradient descent is its lack of natural scaling, which often necessitates expensive line searches or heuristic tuning to determine an appropriate step size. In this paper, we address this limitation b…

    Submitted 2 June, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

    MSC Class: 49

  3. arXiv:2412.11003

    cs.LG math.OC stat.ML

    Optimal Rates for Robust Stochastic Convex Optimization

    Authors: Changyu Gao, Andrew Lowy, Xingyu Zhou, Stephen J. Wright

    Abstract: Machine learning algorithms in high-dimensional settings are highly susceptible to the influence of even a small fraction of structured outliers, making robust optimization techniques essential. In particular, within the $ε$-contamination model, where an adversary can inspect and replace up to an $ε$-fraction of the samples, a fundamental open problem is determining the optimal rates for robust st…

    Submitted 23 April, 2025; v1 submitted 14 December, 2024; originally announced December 2024.

    Comments: The 6th annual Symposium on Foundations of Responsible Computing (FORC 2025)

  4. arXiv:2407.09690

    cs.LG cs.CR math.OC

    Private Heterogeneous Federated Learning Without a Trusted Server Revisited: Error-Optimal and Communication-Efficient Algorithms for Convex Losses

    Authors: Changyu Gao, Andrew Lowy, Xingyu Zhou, Stephen J. Wright

    Abstract: We revisit the problem of federated learning (FL) with private data from people who do not trust the server or other silos/clients. In this context, every silo (e.g. hospital) has data from several people (e.g. patients) and needs to protect the privacy of each person's data (e.g. health records), even if the server and/or other silos try to uncover this data. Inter-Silo Record-Level Differential…

    Submitted 6 September, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: The 41st International Conference on Machine Learning (ICML 2024)

  5. arXiv:2403.10547

    math.OC cs.AI cs.DS cs.LG

    Robust Second-Order Nonconvex Optimization and Its Application to Low Rank Matrix Sensing

    Authors: Shuyao Li, Yu Cheng, Ilias Diakonikolas, Jelena Diakonikolas, Rong Ge, Stephen J. Wright

    Abstract: Finding an approximate second-order stationary point (SOSP) is a well-studied and fundamental problem in stochastic nonconvex optimization with many applications in machine learning. However, this problem is poorly understood in the presence of outliers, limiting the use of existing nonconvex algorithms in adversarial settings. In this paper, we study the problem of finding SOSPs in the strong c…

    Submitted 11 March, 2024; originally announced March 2024.

  6. arXiv:2402.11173

    cs.LG cs.CR math.OC

    How to Make the Gradients Small Privately: Improved Rates for Differentially Private Non-Convex Optimization

    Authors: Andrew Lowy, Jonathan Ullman, Stephen J. Wright

    Abstract: We provide a simple and flexible framework for designing differentially private algorithms to find approximate stationary points of non-convex loss functions. Our framework is based on using a private approximate risk minimizer to "warm start" another private algorithm for finding stationary points. We use this framework to obtain improved, and sometimes optimal, rates for several classes of non-c…

    Submitted 19 August, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: ICML 2024

  7. arXiv:2402.05071

    math.OC cs.LG stat.ML

    Revisiting Inexact Fixed-Point Iterations for Min-Max Problems: Stochasticity and Structured Nonconvexity

    Authors: Ahmet Alacaoglu, Donghwan Kim, Stephen J. Wright

    Abstract: We focus on constrained, $L$-smooth, potentially stochastic and nonconvex-nonconcave min-max problems either satisfying $ρ$-cohypomonotonicity or admitting a solution to the $ρ$-weakly Minty Variational Inequality (MVI), where larger values of the parameter $ρ>0$ correspond to a greater degree of nonconvexity. These problem classes include examples in two player reinforcement learning, interaction…

    Submitted 12 August, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Journal ref: Proceedings of the International Conference on Machine Learning (ICML) 2024

  8. arXiv:2311.00678

    math.OC cs.LG stat.ML

    Complexity of Single Loop Algorithms for Nonlinear Programming with Stochastic Objective and Constraints

    Authors: Ahmet Alacaoglu, Stephen J. Wright

    Abstract: We analyze the complexity of single-loop quadratic penalty and augmented Lagrangian algorithms for solving nonconvex optimization problems with functional equality constraints. We consider three cases, in all of which the objective is stochastic and smooth, that is, an expectation over an unknown distribution that is accessed by sampling. The nature of the equality constraints differs among the th…

    Submitted 1 November, 2023; originally announced November 2023.

  9. arXiv:2310.18841

    math.OC cs.LG

    A randomized algorithm for nonconvex minimization with inexact evaluations and complexity guarantees

    Authors: Shuyao Li, Stephen J. Wright

    Abstract: We consider minimization of a smooth nonconvex function with inexact oracle access to gradient and Hessian (without assuming access to the function value) to achieve approximate second-order optimality. A novel feature of our method is that if an approximate direction of negative curvature is chosen as the step, we choose its sense to be positive or negative with equal probability. We allow gradie…

    Submitted 26 March, 2024; v1 submitted 28 October, 2023; originally announced October 2023.

  10. arXiv:2310.04006

    math.OC cs.LG

    Accelerating optimization over the space of probability measures

    Authors: Shi Chen, Qin Li, Oliver Tse, Stephen J. Wright

    Abstract: The acceleration of gradient-based optimization methods is a subject of significant practical and theoretical importance, particularly within machine learning applications. While much attention has been directed towards optimizing within Euclidean space, the need to optimize over spaces of probability measures in machine learning motivates exploration of accelerated gradient methods in this contex…

    Submitted 10 November, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

  11. arXiv:2306.02192

    cs.LG math.NA

    Correcting auto-differentiation in neural-ODE training

    Authors: Yewei Xu, Shi Chen, Qin Li, Stephen J. Wright

    Abstract: Does the use of auto-differentiation yield reasonable updates to deep neural networks that represent neural ODEs? Through mathematical analysis and numerical evidence, we find that when the neural network employs high-order forms to approximate the underlying ODE flows (such as the Linear Multistep Method (LMM)), brute-force computation using auto-differentiation often produces non-converging arti…

    Submitted 3 June, 2023; originally announced June 2023.

  12. arXiv:2302.04972

    cs.LG cs.CR math.OC stat.ML

    Differentially Private Optimization for Smooth Nonconvex ERM

    Authors: Changyu Gao, Stephen J. Wright

    Abstract: We develop simple differentially private optimization algorithms that move along directions of (expected) descent to find an approximate second-order solution for nonconvex ERM. We use line search, mini-batching, and a two-phase strategy to improve the speed and practicality of the algorithm. Numerical experiments demonstrate the effectiveness of these approaches.

    Submitted 9 June, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

  13. arXiv:2301.07831

    math.NA cs.MS stat.CO

    Multi-output multilevel best linear unbiased estimators via semidefinite programming

    Authors: M. Croci, K. E. Willcox, S. J. Wright

    Abstract: Multifidelity forward uncertainty quantification (UQ) problems often involve multiple quantities of interest and heterogeneous models (e.g., different grids, equations, dimensions, physics, surrogate and reduced-order models). While computational efficiency is key in this context, multi-output strategies in multilevel/multifidelity methods are either sub-optimal or non-existent. In this paper we e…

    Submitted 15 May, 2023; v1 submitted 18 January, 2023; originally announced January 2023.

    Comments: 22 pages, 5 figures, 3 tables

  14. arXiv:2212.05088

    math.OC cs.LG

    Cyclic Block Coordinate Descent With Variance Reduction for Composite Nonconvex Optimization

    Authors: Xufeng Cai, Chaobing Song, Stephen J. Wright, Jelena Diakonikolas

    Abstract: Nonconvex optimization is central in solving many machine learning problems, in which block-wise structure is commonly encountered. In this work, we propose cyclic block coordinate methods for nonconvex optimization problems with non-asymptotic gradient norm guarantees. Our convergence analysis is based on a gradient Lipschitz condition with respect to a Mahalanobis norm, inspired by a recent prog…

    Submitted 27 January, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

  15. arXiv:2201.07684

    math.OC cs.LG stat.ML

    On the Complexity of a Practical Primal-Dual Coordinate Method

    Authors: Ahmet Alacaoglu, Volkan Cevher, Stephen J. Wright

    Abstract: We prove complexity bounds for the primal-dual algorithm with random extrapolation and coordinate descent (PURE-CD), which has been shown to obtain good practical performance for solving convex-concave min-max problems with bilinear coupling. Our complexity bounds either match or improve the best-known results in the literature for both dense and sparse (strongly)-convex-(strongly)-concave problem…

    Submitted 19 January, 2022; originally announced January 2022.

  16. arXiv:2111.01842

    math.OC cs.LG

    Coordinate Linear Variance Reduction for Generalized Linear Programming

    Authors: Chaobing Song, Cheuk Yin Lin, Stephen J. Wright, Jelena Diakonikolas

    Abstract: We study a class of generalized linear programs (GLP) in a large-scale setting, which includes simple, possibly nonsmooth convex regularizer and simple convex set constraints. By reformulating (GLP) as an equivalent convex-concave min-max problem, we show that the linear structure in the problem can be used to design an efficient, scalable first-order algorithm, to which we give the name \emph{Coo…

    Submitted 6 April, 2023; v1 submitted 2 November, 2021; originally announced November 2021.

    Comments: 39 pages, NeurIPS 2022

  17. arXiv:2104.11079

    cs.AI cs.CE

    Randomized Algorithms for Scientific Computing (RASC)

    Authors: Aydin Buluc, Tamara G. Kolda, Stefan M. Wild, Mihai Anitescu, Anthony DeGennaro, John Jakeman, Chandrika Kamath, Ramakrishnan Kannan, Miles E. Lopes, Per-Gunnar Martinsson, Kary Myers, Jelani Nelson, Juan M. Restrepo, C. Seshadhri, Draguna Vrabie, Brendt Wohlberg, Stephen J. Wright, Chao Yang, Peter Zwart

    Abstract: Randomized algorithms have propelled advances in artificial intelligence and represent a foundational research area in advancing AI for Science. Future advancements in DOE Office of Science priority areas such as climate science, astrophysics, fusion, advanced materials, combustion, and quantum computing all require randomized algorithms for surmounting challenges of complexity, robustness, and sc…

    Submitted 21 March, 2022; v1 submitted 19 April, 2021; originally announced April 2021.

  18. arXiv:2102.13643

    math.OC cs.LG math.NA

    Variance Reduction via Primal-Dual Accelerated Dual Averaging for Nonsmooth Convex Finite-Sums

    Authors: Chaobing Song, Stephen J. Wright, Jelena Diakonikolas

    Abstract: We study structured nonsmooth convex finite-sum optimization that appears widely in machine learning applications, including support vector machines and least absolute deviation. For the primal-dual formulation of this problem, we propose a novel algorithm called \emph{Variance Reduction via Primal-Dual Accelerated Dual Averaging (\vrpda)}. In the nonsmooth and general convex setting, \vrpda~has t…

    Submitted 7 April, 2021; v1 submitted 26 February, 2021; originally announced February 2021.

    Comments: 33 pages, 18 figures

  19. arXiv:2010.11366

    stat.ML cs.LG

    Random Coordinate Underdamped Langevin Monte Carlo

    Authors: Zhiyan Ding, Qin Li, Jianfeng Lu, Stephen J. Wright

    Abstract: The Underdamped Langevin Monte Carlo (ULMC) is a popular Markov chain Monte Carlo sampling method. It requires the computation of the full gradient of the log-density at each iteration, an expensive operation if the dimension of the problem is high. We propose a sampling method called Random Coordinate ULMC (RC-ULMC), which selects a single coordinate at each iteration to be updated and leaves the…

    Submitted 21 October, 2020; originally announced October 2020.

  20. arXiv:2010.01405

    stat.ML cs.LG

    Random Coordinate Langevin Monte Carlo

    Authors: Zhiyan Ding, Qin Li, Jianfeng Lu, Stephen J. Wright

    Abstract: Langevin Monte Carlo (LMC) is a popular Markov chain Monte Carlo sampling method. One drawback is that it requires the computation of the full gradient at each iteration, an expensive operation if the dimension of the problem is high. We propose a new sampling method: Random Coordinate LMC (RC-LMC). At each iteration, a single coordinate is randomly selected to be updated by a multiple of the part…

    Submitted 3 October, 2020; originally announced October 2020.

  21. arXiv:2005.13815

    cs.LG math.OC stat.ML

    Adversarial Classification via Distributional Robustness with Wasserstein Ambiguity

    Authors: Nam Ho-Nguyen, Stephen J. Wright

    Abstract: We study a model for adversarial classification based on distributionally robust chance constraints. We show that under Wasserstein ambiguity, the model aims to minimize the conditional value-at-risk of the distance to misclassification, and we explore links to adversarial classification models proposed earlier and to maximum-margin classifiers. We also provide a reformulation of the distributiona…

    Submitted 3 November, 2021; v1 submitted 28 May, 2020; originally announced May 2020.

    Comments: 32 pages

  22. arXiv:1912.08756

    cs.LG cs.IR stat.ML

    Interleaved Composite Quantization for High-Dimensional Similarity Search

    Authors: Soroosh Khoram, Stephen J Wright, Jing Li

    Abstract: Similarity search retrieves the nearest neighbors of a query vector from a dataset of high-dimensional vectors. As the size of the dataset grows, the cost of performing the distance computations needed to implement a query can become prohibitive. A method often used to reduce this computational cost is quantization of the vector space and location-based encoding of the dataset vectors. These encod…

    Submitted 18 December, 2019; v1 submitted 18 December, 2019; originally announced December 2019.

  23. arXiv:1912.06508

    cs.LG math.OC stat.ML

    A Distributed Quasi-Newton Algorithm for Primal and Dual Regularized Empirical Risk Minimization

    Authors: Ching-pei Lee, Cong Han Lim, Stephen J. Wright

    Abstract: We propose a communication- and computation-efficient distributed optimization algorithm using second-order information for solving empirical risk minimization (ERM) problems with a nonsmooth regularization term. Our algorithm is applicable to both the primal and the dual ERM problem. Current second-order and quasi-Newton methods for this problem either do not work well in the distributed setting…

    Submitted 12 December, 2019; originally announced December 2019.

    Comments: arXiv admin note: text overlap with arXiv:1803.01370

  24. arXiv:1803.01370

    math.OC cs.LG stat.ML

    A Distributed Quasi-Newton Algorithm for Empirical Risk Minimization with Nonsmooth Regularization

    Authors: Ching-pei Lee, Cong Han Lim, Stephen J. Wright

    Abstract: We propose a communication- and computation-efficient distributed optimization algorithm using second-order information for solving ERM problems with a nonsmooth regularization term. Current second-order and quasi-Newton methods for this problem either do not work well in the distributed setting or work only for specific regularizers. Our algorithm uses successive quadratic approximations, and we…

    Submitted 26 May, 2018; v1 submitted 4 March, 2018; originally announced March 2018.

    Comments: In the proceedings of The 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018

  25. arXiv:1801.08019

    cs.LG stat.ML

    Training Set Debugging Using Trusted Items

    Authors: Xuezhou Zhang, Xiaojin Zhu, Stephen J. Wright

    Abstract: Training set bugs are flaws in the data that adversely affect machine learning. The training set is usually too large for manual inspection, but one may have the resources to verify a few trusted items. The set of trusted items may not by itself be adequate for learning, so we propose an algorithm that uses these items to identify bugs in the training set and thus improves learning. Specifical…

    Submitted 24 January, 2018; originally announced January 2018.

    Comments: AAAI 2018

  26. arXiv:1710.05916

    cs.CE

    Using Neural Networks to Detect Line Outages from PMU Data

    Authors: Ching-pei Lee, Stephen J. Wright

    Abstract: We propose an approach based on neural networks and the AC power flow equations to identify single- and double-line outages in a power grid using the information from phasor measurement unit sensors (PMUs) placed on only a subset of the buses. Rather than inferring the outage from the sensor data by inverting the physical model, our approach uses the AC model to simulate sensor responses to all ou…

    Submitted 27 March, 2018; v1 submitted 16 October, 2017; originally announced October 2017.

  27. arXiv:1309.6964

    cs.CV

    Online Algorithms for Factorization-Based Structure from Motion

    Authors: Ryan Kennedy, Laura Balzano, Stephen J. Wright, Camillo J. Taylor

    Abstract: We present a family of online algorithms for real-time factorization-based structure from motion, leveraging a relationship between incremental singular value decomposition and recently proposed methods for online matrix completion. Our methods are orders of magnitude faster than previous state of the art, can handle missing data and a variable number of feature points, and are robust to noise and…

    Submitted 16 July, 2016; v1 submitted 26 September, 2013; originally announced September 2013.

  28. arXiv:1307.5494

    math.NA cs.LG stat.ML

    On GROUSE and Incremental SVD

    Authors: Laura Balzano, Stephen J. Wright

    Abstract: GROUSE (Grassmannian Rank-One Update Subspace Estimation) is an incremental algorithm for identifying a subspace of $\mathbb{R}^n$ from a sequence of vectors in this subspace, where only a subset of components of each vector is revealed at each iteration. Recent analysis has shown that GROUSE converges locally at an expected linear rate, under certain assumptions. GROUSE has a similar flavor to the incrementa…

    Submitted 20 July, 2013; originally announced July 2013.

  29. arXiv:1207.0577

    stat.ML cs.LG

    Robust Dequantized Compressive Sensing

    Authors: Ji Liu, Stephen J. Wright

    Abstract: We consider the reconstruction problem in compressed sensing in which the observations are recorded in a finite number of bits. They may thus contain quantization errors (from being rounded to the nearest representable value) and saturation errors (from being outside the range of representable values). Our formulation has an objective of weighted $\ell_2$-$\ell_1$ type, along with constraints that…

    Submitted 10 October, 2013; v1 submitted 3 July, 2012; originally announced July 2012.

  30. arXiv:1111.0432

    cs.LG cs.AI

    Approximate Stochastic Subgradient Estimation Training for Support Vector Machines

    Authors: Sangkyun Lee, Stephen J. Wright

    Abstract: Subgradient algorithms for training support vector machines have been quite successful for solving large-scale and online learning problems. However, they have been restricted to linear kernels and strongly convex formulations. This paper describes efficient subgradient approaches without such limitations. Our approaches make use of randomized low-dimensional approximations to nonlinear kernels, a…

    Submitted 3 November, 2011; v1 submitted 2 November, 2011; originally announced November 2011.

    Comments: An extended version of the ICPRAM 2012 paper

  31. arXiv:1106.5730

    math.OC cs.LG

    HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent

    Authors: Feng Niu, Benjamin Recht, Christopher Re, Stephen J. Wright

    Abstract: Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show using novel theoretical analysis, algorithms, and implementation that SGD can be impleme…

    Submitted 11 November, 2011; v1 submitted 28 June, 2011; originally announced June 2011.

    Comments: 22 pages, 10 figures

  32. arXiv:1104.4385

    cs.CV stat.ML

    Convex Approaches to Model Wavelet Sparsity Patterns

    Authors: Nikhil S Rao, Robert D. Nowak, Stephen J. Wright, Nick G. Kingsbury

    Abstract: Statistical dependencies among wavelet coefficients are commonly represented by graphical models such as hidden Markov trees (HMTs). However, in linear inverse problems such as deconvolution, tomography, and compressed sensing, the presence of a sensing or observation matrix produces a linear mixing of the simple Markovian dependency structure. This leads to reconstruction problems that are non-con…

    Submitted 22 April, 2011; originally announced April 2011.