Showing 1–33 of 33 results for author: Berahas, A S

Searching in archive math.
  1. arXiv:2506.03358

    math.OC

    A line search framework with restarting for noisy optimization problems

    Authors: Albert S. Berahas, Michael J. O'Neill, Clément W. Royer

    Abstract: Nonlinear optimization methods are typically iterative and make use of gradient information to determine a direction of improvement and function information to effectively check for progress. When this information is corrupted by noise, designing a convergent and practical algorithmic process becomes challenging, as care must be taken to avoid taking bad steps due to erroneous information. For thi…

    Submitted 3 June, 2025; originally announced June 2025.

  2. arXiv:2505.19382

    math.OC

    Retrospective Approximation Sequential Quadratic Programming for Stochastic Optimization with General Deterministic Nonlinear Constraints

    Authors: Albert S. Berahas, Raghu Bollapragada, Shagun Gupta

    Abstract: In this paper, we propose a framework based on the Retrospective Approximation (RA) paradigm to solve optimization problems with a stochastic objective function and general nonlinear deterministic constraints. This framework sequentially constructs increasingly accurate approximations of the true problems which are solved to a specified accuracy via a deterministic solver, thereby decoupling the u…

    Submitted 25 May, 2025; originally announced May 2025.

    Comments: 63 pages, 9 figures

    MSC Class: 49M05; 49M37; 65K05; 90C06; 90C25; 90C30; 90C35

  3. arXiv:2503.06702

    math.OC

    Optimistic Noise-Aware Sequential Quadratic Programming for Equality Constrained Optimization with Rank-Deficient Jacobians

    Authors: Albert S. Berahas, Jiahao Shi, Baoyu Zhou

    Abstract: We propose and analyze a sequential quadratic programming algorithm for minimizing a noisy nonlinear smooth function subject to noisy nonlinear smooth equality constraints. The algorithm uses a step decomposition strategy and, as a result, is robust to potential rank-deficiency in the constraints, allows for two different step size strategies, and has an early stopping mechanism. Under the linear…

    Submitted 9 March, 2025; originally announced March 2025.

  4. arXiv:2411.10378

    math.OC

    Exploiting Negative Curvature in Conjunction with Adaptive Sampling: Theoretical Results and a Practical Algorithm

    Authors: Albert S. Berahas, Raghu Bollapragada, Wanping Dong

    Abstract: In this paper, we propose algorithms that exploit negative curvature for solving noisy nonlinear nonconvex unconstrained optimization problems. We consider both deterministic and stochastic inexact settings, and develop two-step algorithms that combine directions of negative curvature and descent directions to update the iterates. Under reasonable assumptions, we prove second-order convergence res…

    Submitted 15 November, 2024; originally announced November 2024.

    Comments: 39 pages, 6 figures

  5. arXiv:2406.11144

    math.OC

    Modified Line Search Sequential Quadratic Methods for Equality-Constrained Optimization with Unified Global and Local Convergence Guarantees

    Authors: Albert S. Berahas, Raghu Bollapragada, Jiahao Shi

    Abstract: In this paper, we propose a method that has foundations in the line search sequential quadratic programming paradigm for solving general nonlinear equality constrained optimization problems. The method employs a carefully designed modified line search strategy that utilizes second-order information of both the objective and constraint functions, as required, to mitigate the Maratos effect. Contrar…

    Submitted 26 July, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  6. arXiv:2404.14758

    math.OC cs.LG stat.ML

    Second-order Information Promotes Mini-Batch Robustness in Variance-Reduced Gradients

    Authors: Sachin Garg, Albert S. Berahas, Michał Dereziński

    Abstract: We show that, for finite-sum minimization problems, incorporating partial second-order information of the objective function can dramatically improve the robustness to mini-batch size of variance-reduced stochastic gradient methods, making them more scalable while retaining their benefits over traditional Newton-type approaches. We demonstrate this phenomenon on a prototypical stochastic second-or…

    Submitted 23 April, 2024; originally announced April 2024.

    MSC Class: 65K05; 90C06; 90C30

  7. arXiv:2312.06814

    math.OC

    A Flexible Gradient Tracking Algorithmic Framework for Decentralized Optimization

    Authors: Albert S. Berahas, Raghu Bollapragada, Shagun Gupta

    Abstract: In decentralized optimization over networks, each node in the network has a portion of the global objective function and the aim is to collectively optimize this function. Gradient tracking methods have emerged as a popular alternative for solving such problems due to their strong theoretical guarantees and robust empirical performance. These methods perform two operations (steps) at each iteratio…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: 36 pages, 7 figures, 1 table

    MSC Class: 49M05; 49M37; 65K05; 90C06; 90C25; 90C30; 90C35

  8. arXiv:2311.08615

    math.OC cs.LG

    Non-Uniform Smoothness for Gradient Descent

    Authors: Albert S. Berahas, Lindon Roberts, Fred Roosta

    Abstract: The analysis of gradient descent-type methods typically relies on the Lipschitz continuity of the objective gradient. This generally requires an expensive hyperparameter tuning process to appropriately calibrate a stepsize for a given problem. In this work we introduce a local first-order smoothness oracle (LFSO) which generalizes the Lipschitz continuous gradients smoothness condition and is appl…

    Submitted 14 November, 2023; originally announced November 2023.

    MSC Class: 65K05; 90C30

  9. arXiv:2309.02626

    math.OC stat.ML

    Adaptive Consensus: A network pruning approach for decentralized optimization

    Authors: Suhail M. Shah, Albert S. Berahas, Raghu Bollapragada

    Abstract: We consider network-based decentralized optimization problems, where each node in the network possesses a local function and the objective is to collectively attain a consensus solution that minimizes the sum of all the local functions. A major challenge in decentralized optimization is the reliance on communication which remains a considerable bottleneck in many applications. To address this chal…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: 35 pages, 3 figures

  10. arXiv:2303.14289

    math.OC

    Balancing Communication and Computation in Gradient Tracking Algorithms for Decentralized Optimization

    Authors: Albert S. Berahas, Raghu Bollapragada, Shagun Gupta

    Abstract: Gradient tracking methods have emerged as one of the most popular approaches for solving decentralized optimization problems over networks. In this setting, each node in the network has a portion of the global objective function, and the goal is to collectively optimize this function. At every iteration, gradient tracking methods perform two operations (steps): $(1)$ compute local gradients, and…

    Submitted 24 November, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

    Comments: 37 pages, 4 figures, 1 table

  11. arXiv:2301.00477

    math.OC stat.ML

    A Sequential Quadratic Programming Method with High Probability Complexity Bounds for Nonlinear Equality Constrained Stochastic Optimization

    Authors: Albert S. Berahas, Miaolan Xie, Baoyu Zhou

    Abstract: A step-search sequential quadratic programming method is proposed for solving nonlinear equality constrained stochastic optimization problems. It is assumed that constraint function values and derivatives are available, but only stochastic approximations of the objective function and its associated derivatives can be computed via inexact probabilistic zeroth- and first-order oracles. Under reasona…

    Submitted 5 October, 2024; v1 submitted 1 January, 2023; originally announced January 2023.

    Comments: 29 pages, 2 figures

  12. arXiv:2210.02418

    math.OC

    Gradient Descent in the Absence of Global Lipschitz Continuity of the Gradients

    Authors: Vivak Patel, Albert S. Berahas

    Abstract: Gradient descent (GD) is a collection of continuous optimization methods that have achieved immeasurable success in practice. Owing to data science applications, GD with diminishing step sizes has become a prominent variant. While this variant of GD has been well-studied in the literature for objectives with globally Lipschitz continuous gradients or by requiring bounded iterates, objectives from…

    Submitted 24 June, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: 32 pages, 1 figure, 1 Table

    MSC Class: 90C26; 68T99; 68W40

  13. arXiv:2206.00712

    math.OC

    An Adaptive Sampling Sequential Quadratic Programming Method for Equality Constrained Stochastic Optimization

    Authors: Albert S. Berahas, Raghu Bollapragada, Baoyu Zhou

    Abstract: This paper presents a methodology for using varying sample sizes in sequential quadratic programming (SQP) methods for solving equality constrained stochastic optimization problems. The first part of the paper deals with the delicate issue of dynamic sample selection in the evaluation of the gradient in conjunction with inexact solutions to the SQP subproblems. Under reasonable assumptions on the…

    Submitted 21 March, 2023; v1 submitted 1 June, 2022; originally announced June 2022.

    Comments: 58 pages, 9 figures, 1 table

  14. arXiv:2205.03667

    math.OC

    First- and Second-Order High Probability Complexity Bounds for Trust-Region Methods with Noisy Oracles

    Authors: Liyuan Cao, Albert S. Berahas, Katya Scheinberg

    Abstract: In this paper, we present convergence guarantees for a modified trust-region method designed for minimizing objective functions whose value and gradient and Hessian estimates are computed with noise. These estimates are produced by generic stochastic oracles, which are not assumed to be unbiased or consistent. We introduce these oracles and show that they are more general and have more relaxed ass…

    Submitted 1 July, 2023; v1 submitted 7 May, 2022; originally announced May 2022.

    Comments: 42 pages, 5 figures

  15. arXiv:2204.04161

    math.OC

    Accelerating Stochastic Sequential Quadratic Programming for Equality Constrained Optimization using Predictive Variance Reduction

    Authors: Albert S. Berahas, Jiahao Shi, Zihong Yi, Baoyu Zhou

    Abstract: In this paper, we propose a stochastic method for solving equality constrained optimization problems that utilizes predictive variance reduction. Specifically, we develop a method based on the sequential quadratic programming paradigm that employs variance reduction in the gradient approximations. Under reasonable assumptions, we prove that a measure of first-order stationarity evaluated at the it…

    Submitted 24 March, 2023; v1 submitted 8 April, 2022; originally announced April 2022.

    Comments: 42 pages, 5 figures, 4 tables

  16. arXiv:2107.11908

    math.OC

    Full-low evaluation methods for derivative-free optimization

    Authors: Albert S. Berahas, Oumaima Sohab, Luis Nunes Vicente

    Abstract: We propose a new class of rigorous methods for derivative-free optimization with the aim of delivering efficient and robust numerical performance for functions of all types, from smooth to non-smooth, and under different noise regimes. To this end, we have developed Full-Low Evaluation methods, organized around two main types of iterations. The first iteration type is expensive in function evaluat…

    Submitted 28 October, 2022; v1 submitted 25 July, 2021; originally announced July 2021.

  17. arXiv:2106.13015

    math.OC stat.ML

    A Stochastic Sequential Quadratic Optimization Algorithm for Nonlinear Equality Constrained Optimization with Rank-Deficient Jacobians

    Authors: Albert S. Berahas, Frank E. Curtis, Michael J. O'Neill, Daniel P. Robinson

    Abstract: A sequential quadratic optimization algorithm is proposed for solving smooth nonlinear equality constrained optimization problems in which the objective function is defined by an expectation of a stochastic function. The algorithmic structure of the proposed method is based on a step decomposition strategy that is known in the literature to be widely effective in practice, wherein each search dire…

    Submitted 16 March, 2023; v1 submitted 24 June, 2021; originally announced June 2021.

    Report number: Lehigh ISE Technical Report 21T-013-R1

  18. arXiv:2006.03949

    math.OC stat.ML

    SONIA: A Symmetric Blockwise Truncated Optimization Algorithm

    Authors: Majid Jahani, Mohammadreza Nazari, Rachael Tappenden, Albert S. Berahas, Martin Takáč

    Abstract: This work presents a new algorithm for empirical risk minimization. The algorithm bridges the gap between first- and second-order methods by computing a search direction that uses a second-order-type update in one subspace, coupled with a scaled steepest descent step in the orthogonal complement. To this end, partial curvature information is incorporated to help with ill-conditioning, while simult…

    Submitted 6 June, 2020; originally announced June 2020.

    Comments: 38 pages, 74 figures

  19. arXiv:2006.01892

    stat.ML cs.LG math.DS

    Finite Difference Neural Networks: Fast Prediction of Partial Differential Equations

    Authors: Zheng Shi, Nur Sila Gulgec, Albert S. Berahas, Shamim N. Pakzad, Martin Takáč

    Abstract: Discovering the underlying behavior of complex systems is an important topic in many science and engineering disciplines. In this paper, we propose a novel neural network framework, finite difference neural networks (FDNet), to learn partial differential equations from data. Specifically, our proposed finite difference inspired network is designed to learn the underlying governing partial differen…

    Submitted 2 June, 2020; originally announced June 2020.

    Comments: 38 pages, 48 figures

  20. On the Convergence of Nested Decentralized Gradient Methods with Multiple Consensus and Gradient Steps

    Authors: Albert S. Berahas, Raghu Bollapragada, Ermin Wei

    Abstract: In this paper, we consider minimizing a sum of local convex objective functions in a distributed setting, where the cost of communication and/or computation can be expensive. We extend and generalize the analysis for a class of nested gradient-based distributed algorithms (NEAR-DGD; Berahas, Bollapragada, Keskar and Wei, 2018) to account for multiple gradient steps at every iteration. We show the…

    Submitted 7 July, 2021; v1 submitted 31 May, 2020; originally announced June 2020.

    Comments: 12 pages, 4 figures. arXiv admin note: text overlap with arXiv:1903.08149

  21. arXiv:1910.04055

    math.OC

    Global Convergence Rate Analysis of a Generic Line Search Algorithm with Noise

    Authors: Albert S. Berahas, Liyuan Cao, Katya Scheinberg

    Abstract: In this paper, we develop convergence analysis of a modified line search method for objective functions whose value is computed with noise and whose gradient estimates are inexact and possibly random. The noise is assumed to be bounded in absolute value without any additional assumptions. We extend the framework based on stochastic methods from [Cartis and Scheinberg, 2018] which was developed to…

    Submitted 3 March, 2021; v1 submitted 7 October, 2019; originally announced October 2019.

    Comments: 30 pages. arXiv admin note: text overlap with arXiv:1905.01332

  22. arXiv:1905.13096

    math.OC stat.ML

    Scaling Up Quasi-Newton Algorithms: Communication Efficient Distributed SR1

    Authors: Majid Jahani, Mohammadreza Nazari, Sergey Rusakov, Albert S. Berahas, Martin Takáč

    Abstract: In this paper, we present a scalable distributed implementation of the Sampled Limited-memory Symmetric Rank-1 (S-LSR1) algorithm. First, we show that a naive distributed implementation of S-LSR1 requires multiple rounds of expensive communications at every iteration and thus is inefficient. We then propose DS-LSR1, a communication-efficient variant that: (i) drastically reduces the amount of data…

    Submitted 13 May, 2020; v1 submitted 30 May, 2019; originally announced May 2019.

    Comments: 24 pages, 14 figures, 6 tables

  23. arXiv:1905.13043

    math.OC stat.ML

    Linear interpolation gives better gradients than Gaussian smoothing in derivative-free optimization

    Authors: Albert S Berahas, Liyuan Cao, Krzysztof Choromanski, Katya Scheinberg

    Abstract: In this paper, we consider derivative-free optimization problems, where the objective function is smooth but is computed with some amount of noise, the function evaluations are expensive and no derivative information is available. We are motivated by policy optimization problems in reinforcement learning that have recently become popular [Choromanski et al. 2018; Fazel et al. 2018; Salimans et al.…

    Submitted 2 June, 2019; v1 submitted 28 May, 2019; originally announced May 2019.

    Comments: 14 pages, 2 figures. arXiv admin note: text overlap with arXiv:1905.01332

  24. arXiv:1905.01332

    math.OC

    A Theoretical and Empirical Comparison of Gradient Approximations in Derivative-Free Optimization

    Authors: Albert S. Berahas, Liyuan Cao, Krzysztof Choromanski, Katya Scheinberg

    Abstract: In this paper, we analyze several methods for approximating gradients of noisy functions using only function values. These methods include finite differences, linear interpolation, Gaussian smoothing and smoothing on a sphere. The methods differ in the number of functions sampled, the choice of the sample points, and the way in which the gradient approximations are derived. For each method, we der…

    Submitted 25 March, 2021; v1 submitted 3 May, 2019; originally announced May 2019.

    Comments: 42 pages, 7 figures, 4 tables

  25. arXiv:1903.08149

    math.OC

    Nested Distributed Gradient Methods with Adaptive Quantized Communication

    Authors: Albert S. Berahas, Charikleia Iakovidou, Ermin Wei

    Abstract: In this paper, we consider minimizing a sum of local convex objective functions in a distributed setting, where communication can be costly. We propose and analyze a class of nested distributed gradient methods with adaptive quantized communication (NEAR-DGD+Q). We show the effect of performing multiple quantized communication steps on the rate of convergence and on the size of the neighborhood of…

    Submitted 26 August, 2019; v1 submitted 18 March, 2019; originally announced March 2019.

    Comments: 9 pages, 2 figures. arXiv admin note: text overlap with arXiv:1709.02999

  26. arXiv:1903.03471

    math.OC

    Limited-Memory BFGS with Displacement Aggregation

    Authors: Albert S. Berahas, Frank E. Curtis, Baoyu Zhou

    Abstract: A displacement aggregation strategy is proposed for the curvature pairs stored in a limited-memory BFGS (a.k.a. L-BFGS) method such that the resulting (inverse) Hessian approximations are equal to those that would be derived from a full-memory BFGS method. This means that, if a sufficiently large number of pairs are stored, then an optimization algorithm employing the limited-memory method can ach…

    Submitted 25 August, 2020; v1 submitted 8 March, 2019; originally announced March 2019.

    Report number: Lehigh University ISE/COR@L Technical Report 19T-001

  27. arXiv:1901.09997

    math.OC cs.LG stat.ML

    Quasi-Newton Methods for Machine Learning: Forget the Past, Just Sample

    Authors: Albert S. Berahas, Majid Jahani, Peter Richtárik, Martin Takáč

    Abstract: We present two sampled quasi-Newton methods (sampled LBFGS and sampled LSR1) for solving empirical risk minimization problems that arise in machine learning. Contrary to the classical variants of these methods that sequentially build Hessian or inverse Hessian approximations as the optimization progresses, our proposed methods sample points randomly around the current iterate at every iteration to…

    Submitted 27 July, 2021; v1 submitted 28 January, 2019; originally announced January 2019.

    Comments: 50 pages, 33 figures

  28. arXiv:1803.10173

    math.OC

    Derivative-Free Optimization of Noisy Functions via Quasi-Newton Methods

    Authors: Albert S. Berahas, Richard H. Byrd, Jorge Nocedal

    Abstract: This paper presents a finite difference quasi-Newton method for the minimization of noisy functions. The method takes advantage of the scalability and power of BFGS updating, and employs an adaptive procedure for choosing the differencing interval $h$ based on the noise estimation techniques of Hamming (2012) and Moré and Wild (2011). This noise estimation procedure and the selection of $h$ are in…

    Submitted 8 January, 2019; v1 submitted 27 March, 2018; originally announced March 2018.

    Comments: 26 pages, 9 figures

  29. arXiv:1709.02999

    math.OC

    Balancing Communication and Computation in Distributed Optimization

    Authors: Albert S. Berahas, Raghu Bollapragada, Nitish Shirish Keskar, Ermin Wei

    Abstract: Methods for distributed optimization have received significant attention in recent years owing to their wide applicability in various domains. A distributed optimization method typically consists of two key components: communication and computation. More specifically, at every iteration (or every several iterations) of a distributed algorithm, each node in the network requires some form of informa…

    Submitted 31 May, 2018; v1 submitted 9 September, 2017; originally announced September 2017.

    Comments: 16 pages, 4 figures. Accepted to IEEE Transactions on Automatic Control

  30. arXiv:1707.08552

    math.OC cs.LG stat.ML

    A Robust Multi-Batch L-BFGS Method for Machine Learning

    Authors: Albert S. Berahas, Martin Takáč

    Abstract: This paper describes an implementation of the L-BFGS method designed to deal with two adversarial situations. The first occurs in distributed computing environments where some of the computational nodes devoted to the evaluation of the function and gradient are unable to return results on time. A similar challenge occurs in a multi-batch approach in which the data points used to compute function a…

    Submitted 27 August, 2019; v1 submitted 26 July, 2017; originally announced July 2017.

    Comments: 50 pages, 33 figures. Extension of NIPS 2016 paper: arXiv:1605.06049

  31. arXiv:1705.06211

    math.OC cs.LG stat.ML

    An Investigation of Newton-Sketch and Subsampled Newton Methods

    Authors: Albert S. Berahas, Raghu Bollapragada, Jorge Nocedal

    Abstract: Sketching, a dimensionality reduction technique, has received much attention in the statistics community. In this paper, we study sketching in the context of Newton's method for solving finite-sum optimization problems in which the number of variables and data points are both large. We study two forms of sketching that perform dimensionality reduction in data space: Hessian subsampling and randomi…

    Submitted 30 May, 2019; v1 submitted 17 May, 2017; originally announced May 2017.

    Comments: 36 pages, 22 figures

  32. arXiv:1605.06049

    math.OC cs.LG stat.ML

    A Multi-Batch L-BFGS Method for Machine Learning

    Authors: Albert S. Berahas, Jorge Nocedal, Martin Takáč

    Abstract: The question of how to parallelize the stochastic gradient descent (SGD) method has received much attention in the literature. In this paper, we focus instead on batch methods that use a sizeable fraction of the training set at each iteration to facilitate parallelism, and that employ second-order information. In order to improve the learning process, we follow a multi-batch approach in which the…

    Submitted 23 October, 2016; v1 submitted 19 May, 2016; originally announced May 2016.

    Comments: NIPS 2016. 31 pages, 22 figures

  33. arXiv:1511.01169

    cs.LG math.OC stat.ML

    adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs

    Authors: Nitish Shirish Keskar, Albert S. Berahas

    Abstract: Recurrent Neural Networks (RNNs) are powerful models that achieve exceptional performance on several pattern recognition problems. However, the training of RNNs is a computationally difficult task owing to the well-known "vanishing/exploding" gradient problem. Algorithms proposed for training RNNs either exploit no (or limited) curvature information and have cheap per-iteration complexity, or atte…

    Submitted 23 February, 2016; v1 submitted 3 November, 2015; originally announced November 2015.
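
Entries 7 and 10 (arXiv:2312.06814, arXiv:2303.14289) describe gradient tracking methods, where each iteration pairs a communication step with a local gradient computation. The following is a minimal NumPy sketch of a generic gradient-tracking iteration of the DIGing type. It rests on several assumptions: a doubly stochastic mixing matrix `W`, a constant step size, and exactly one communication step and one gradient step per iteration. It is not the flexible framework those papers propose, which varies how many communication and computation steps are taken. The function name `gradient_tracking` and the toy ring network are hypothetical and chosen only for illustration.

```python
import numpy as np


def gradient_tracking(W, grads, x0, alpha, iters):
    """Generic gradient-tracking iteration (illustrative sketch).

    W     : (n, n) doubly stochastic mixing matrix (assumed)
    grads : list of n callables; grads[i](x) returns the gradient of node i's f_i
    x0    : (n, d) array; row i is node i's starting point
    """
    n = W.shape[0]
    x = np.array(x0, dtype=float)
    g = np.stack([grads[i](x[i]) for i in range(n)])
    y = g.copy()  # y[i] tracks the network-average gradient
    for _ in range(iters):
        # Communication (mix neighbours' iterates) followed by a local step.
        x_new = W @ x - alpha * y
        # Computation: fresh local gradients at the new iterates.
        g_new = np.stack([grads[i](x_new[i]) for i in range(n)])
        # Tracking update: mix the trackers and add the change in local gradients.
        y = W @ y + g_new - g
        x, g = x_new, g_new
    return x


# Hypothetical toy problem: 4 nodes on a ring, f_i(x) = 0.5 * ||x - b_i||^2.
W = np.array([[1/3, 1/3, 0, 1/3],
              [1/3, 1/3, 1/3, 0],
              [0, 1/3, 1/3, 1/3],
              [1/3, 0, 1/3, 1/3]])
b = np.array([[1.0, 0.0], [2.0, 1.0], [0.0, 3.0], [5.0, -1.0]])
grads = [lambda x, bi=bi: x - bi for bi in b]
x_final = gradient_tracking(W, grads, np.zeros((4, 2)), alpha=0.1, iters=500)
print(x_final)
```

Here the sum of the local functions is minimized at the average of the b_i, so every row of `x_final` should approach `b.mean(axis=0)`. Because `W` is doubly stochastic, the average of the trackers `y` always equals the average of the current local gradients. That invariant is what lets a constant step size reach the exact consensus minimizer.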
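
Entry 21 (arXiv:1910.04055) analyzes a line search method for objectives whose values carry noise bounded in absolute value. One common device in this setting is to loosen the Armijo sufficient-decrease test by 2ε_f. With that slack, a step is not rejected just because noise in the two function evaluations hides a real decrease. The sketch below implements that generic idea under stated assumptions. The function name `relaxed_armijo`, the defaults (`c1 = 1e-4`, shrink factor `tau = 0.5`), and the noisy quadratic are all illustrative choices. It is not presented as the algorithm or the experiments of the paper.

```python
import numpy as np


def relaxed_armijo(f_noisy, x, g, d, eps_f, alpha0=1.0, c1=1e-4, tau=0.5,
                   max_backtracks=50):
    """Backtracking line search with a noise-relaxed Armijo test (sketch).

    Accepts the first alpha satisfying
        f~(x + alpha*d) <= f~(x) + c1*alpha*g^T d + 2*eps_f,
    where f~ is a noisy evaluation with |f~ - f| <= eps_f (assumed).
    Returns None if no trial step is accepted.
    """
    fx = f_noisy(x)
    slope = float(g @ d)  # negative when d is a descent direction
    alpha = alpha0
    for _ in range(max_backtracks):
        if f_noisy(x + alpha * d) <= fx + c1 * alpha * slope + 2.0 * eps_f:
            return alpha
        alpha *= tau
    return None


# Hypothetical example: f(x) = 0.5 * ||x||^2 observed with uniform noise in [-eps_f, eps_f].
rng = np.random.default_rng(0)
eps_f = 1e-3
f_noisy = lambda z: 0.5 * float(z @ z) + rng.uniform(-eps_f, eps_f)
x = np.array([1.0, 1.0])
g = x.copy()  # exact gradient of 0.5 * ||x||^2, used here for simplicity
alpha = relaxed_armijo(f_noisy, x, g, -g, eps_f, alpha0=2.0, c1=0.1)
print(alpha)  # 1.0 for this example
```

The example uses the larger `c1 = 0.1` only so that the accepted step does not depend on the noise draw. The trial step `alpha = 2` fails by a margin far larger than the noise, and `alpha = 1` passes by a similar margin. When `eps_f = 0`, the test reduces to the ordinary Armijo condition.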
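
Entries 24 and 28 (arXiv:1905.01332, arXiv:1803.10173) deal with estimating gradients of noisy functions from function values alone. The sketch below gives textbook forms of two estimators named in entry 24: forward finite differences and Gaussian smoothing. To choose the finite-difference interval it uses an elementary bound. If the gradient is L-Lipschitz and the noise is bounded by ε_f, the per-component error is at most Lh/2 + 2ε_f/h, which is minimized at h = 2√(ε_f/L). This rule is assumed here only for illustration. It is not the noise-estimation procedure of entry 28, and every function name below is hypothetical.

```python
import numpy as np


def fd_interval(eps_f, L):
    """Interval minimising the forward-difference error bound L*h/2 + 2*eps_f/h."""
    return 2.0 * np.sqrt(eps_f / L)


def forward_difference_gradient(f, x, h):
    """Forward-difference gradient estimate; costs n + 1 function values."""
    fx = f(x)
    g = np.empty(x.size)
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = h
        g[i] = (f(x + e) - fx) / h
    return g


def gaussian_smoothing_gradient(f, x, sigma, num_samples, rng):
    """Monte Carlo estimate of the gradient of the Gaussian-smoothed f:
    (1 / (N * sigma)) * sum_j (f(x + sigma * u_j) - f(x)) * u_j, u_j ~ N(0, I)."""
    fx = f(x)
    U = rng.standard_normal((num_samples, x.size))
    diffs = np.array([f(x + sigma * u) - fx for u in U])
    return diffs @ U / (sigma * num_samples)


# Hypothetical test function: 0.5 * ||x||^2 (so L = 1) plus uniform noise bounded by eps_f.
rng = np.random.default_rng(0)
eps_f = 1e-8
f_noisy = lambda z: 0.5 * float(z @ z) + rng.uniform(-eps_f, eps_f)
x = np.array([1.0, -2.0, 0.5])

h = fd_interval(eps_f, L=1.0)  # 2 * sqrt(1e-8) = 2e-4
g_fd = forward_difference_gradient(f_noisy, x, h)
g_gs = gaussian_smoothing_gradient(f_noisy, x, sigma=1e-2, num_samples=2000, rng=rng)
print(np.abs(g_fd - x).max(), np.abs(g_gs - x).max())  # the true gradient is x
```

With this interval, the finite-difference error per component is bounded by 2√(Lε_f), here 2e-4, and the estimate costs n + 1 evaluations. The smoothing estimate is unbiased for the gradient of the smoothed function, which for this quadratic equals the true gradient. Its sampling error, however, shrinks only like 1/√N, so it needs many more evaluations to reach a comparable accuracy.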
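
Entry 26 (arXiv:1903.03471) proposes a way to modify the curvature pairs that L-BFGS stores. For context on how those pairs are normally used, the sketch below implements the standard L-BFGS two-loop recursion. The recursion applies the limited-memory inverse-Hessian approximation, built from stored pairs (s_i, y_i) and an initial matrix gamma * I, to a vector. Displacement aggregation itself is not implemented here. The function name and the default scaling gamma = s^T y / y^T y (a common choice, taken from the newest pair) are assumptions of this sketch.

```python
import numpy as np


def lbfgs_two_loop(g, s_list, y_list, gamma=None):
    """Return H @ g, where H is the L-BFGS inverse-Hessian approximation
    defined by curvature pairs (s_i, y_i), stored oldest first.
    Assumes y_i^T s_i > 0 for every pair."""
    q = np.array(g, dtype=float)
    rhos = [1.0 / float(y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest.
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        a = rho * float(s @ q)
        alphas.append(a)
        q -= a * y
    # Initial matrix H0 = gamma * I; default scaling uses the newest pair.
    if gamma is None:
        gamma = (float(s_list[-1] @ y_list[-1]) / float(y_list[-1] @ y_list[-1])
                 if s_list else 1.0)
    r = gamma * q
    # Second loop: oldest pair to newest, reusing the alphas in reverse order.
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        b = rho * float(y @ r)
        r += (a - b) * s
    return r


# Hypothetical check on a quadratic with Hessian A: for the newest pair the
# result satisfies the secant equation H y = s.
A = np.diag([1.0, 2.0, 3.0])
s = np.array([1.0, 1.0, 1.0])
y = A @ s
print(lbfgs_two_loop(y, [s], [y]))  # approximately [1. 1. 1.]
```

In a minimization loop the search direction would be `-lbfgs_two_loop(grad, S, Y)`, and the lists would be truncated to the chosen memory size. The secant property H y_k = s_k holds for the most recent pair no matter which gamma is used, so it makes a convenient sanity check.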