-
Large-scale Optimization with Linear Equality Constraints using Reduced Compact Representation
Authors:
Johannes J. Brust,
Roummel F. Marcia,
Cosmin G. Petra,
Michael A. Saunders
Abstract:
For optimization problems with linear equality constraints, we prove that the (1,1) block of the inverse KKT matrix remains unchanged when projected onto the nullspace of the constraint matrix. We develop reduced compact representations of the limited-memory inverse BFGS Hessian to compute search directions efficiently when the constraint Jacobian is sparse. Orthogonal projections are implemented by a sparse QR factorization or a preconditioned LSQR iteration. In numerical experiments two proposed trust-region algorithms improve in computation times, often significantly, compared to previous implementations of related algorithms and compared to IPOPT.
Submitted 23 August, 2021; v1 submitted 26 January, 2021;
originally announced January 2021.
-
Quasi-Newton Optimization Methods For Deep Learning Applications
Authors:
Jacob Rafati,
Roummel F. Marcia
Abstract:
Deep learning algorithms often require solving a highly nonlinear and nonconvex unconstrained optimization problem. Methods for solving optimization problems in large-scale machine learning, such as deep learning and deep reinforcement learning (RL), are generally restricted to the class of first-order algorithms, like stochastic gradient descent (SGD). While SGD iterates are inexpensive to compute, they have slow theoretical convergence rates. Furthermore, they require exhaustive trial-and-error to fine-tune many learning parameters. Using second-order curvature information to find search directions can help with more robust convergence for nonconvex optimization problems. However, computing Hessian matrices for large-scale problems is not computationally practical. Alternatively, quasi-Newton methods construct an approximation of the Hessian matrix to build a quadratic model of the objective function. Quasi-Newton methods, like SGD, require only first-order gradient information, but they can result in superlinear convergence, which makes them attractive alternatives to SGD. The limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) approach is one of the most popular quasi-Newton methods that construct positive definite Hessian approximations. In this chapter, we propose efficient optimization methods based on L-BFGS quasi-Newton methods using line search and trust-region strategies. Our methods bridge the disparity between first- and second-order methods by using gradient information to calculate low-rank updates to Hessian approximations. We provide formal convergence analysis of these methods as well as empirical results on deep learning applications, such as image classification tasks and deep reinforcement learning on a set of ATARI 2600 video games. Our results show a robust convergence with preferred generalization characteristics as well as fast training time.
Submitted 4 September, 2019;
originally announced September 2019.
-
Deep Reinforcement Learning via L-BFGS Optimization
Authors:
Jacob Rafati,
Roummel F. Marcia
Abstract:
Reinforcement Learning (RL) algorithms allow artificial agents to improve their action selections so as to increase rewarding experiences in their environments. Deep Reinforcement Learning algorithms require solving a nonconvex and nonlinear unconstrained optimization problem. Methods for solving the optimization problems in deep RL are restricted to the class of first-order algorithms, such as stochastic gradient descent (SGD). The major drawback of the SGD methods is that they have the undesirable effect of not escaping saddle points and their performance can be seriously obstructed by ill-conditioning. Furthermore, SGD methods require exhaustive trial and error to fine-tune many learning parameters. Using second derivative information can result in improved convergence properties, but computing the Hessian matrix for large-scale problems is not practical. Quasi-Newton methods require only first-order gradient information, like SGD, but they can construct a low-rank approximation of the Hessian matrix and result in superlinear convergence. The limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) approach is one of the most popular quasi-Newton methods that construct positive definite Hessian approximations. In this paper, we introduce an efficient optimization method, based on the limited-memory BFGS quasi-Newton method using a line search strategy, as an alternative to SGD methods. Our method bridges the disparity between first-order methods and second-order methods by continuing to use gradient information to calculate low-rank Hessian approximations. We provide formal convergence analysis as well as empirical results on a subset of the classic ATARI 2600 games. Our results show a robust convergence with preferred generalization characteristics, as well as fast training time and no need for the experience replaying mechanism.
Submitted 16 April, 2019; v1 submitted 6 November, 2018;
originally announced November 2018.
-
Trust-Region Algorithms for Training Responses: Machine Learning Methods Using Indefinite Hessian Approximations
Authors:
Jennifer B. Erway,
Joshua Griffin,
Roummel F. Marcia,
Riadh Omheni
Abstract:
Machine learning (ML) problems are often posed as highly nonlinear and nonconvex unconstrained optimization problems. Methods for solving ML problems based on stochastic gradient descent are easily scaled for very large problems but may involve fine-tuning many hyper-parameters. Quasi-Newton approaches based on the limited-memory Broyden-Fletcher-Goldfarb-Shanno (BFGS) update typically do not require manually tuning hyper-parameters but suffer from approximating a potentially indefinite Hessian with a positive-definite matrix. Hessian-free methods leverage the ability to perform Hessian-vector multiplication without needing the entire Hessian matrix, but each iteration's complexity is significantly greater than that of quasi-Newton methods. In this paper we propose an alternative approach for solving ML problems based on a quasi-Newton trust-region framework for solving large-scale optimization problems that allows for indefinite Hessian approximations. Numerical experiments on a standard testing data set show that with a fixed computational time budget, the proposed methods achieve better results than the traditional limited-memory BFGS and the Hessian-free methods.
Submitted 22 May, 2019; v1 submitted 30 June, 2018;
originally announced July 2018.
-
Compressive Coded Aperture Keyed Exposure Imaging with Optical Flow Reconstruction
Authors:
Zachary T. Harmany,
Roummel F. Marcia,
Rebecca M. Willett
Abstract:
This paper describes a coded aperture and keyed exposure approach to compressive video measurement which admits a small physical platform, high photon efficiency, high temporal resolution, and fast reconstruction algorithms. The proposed projections satisfy the Restricted Isometry Property (RIP), and hence compressed sensing theory provides theoretical guarantees on the video reconstruction quality. Moreover, the projections can be easily implemented using existing optical elements such as spatial light modulators (SLMs). We extend these coded mask designs to novel dual-scale masks (DSMs) which enable the recovery of a coarse-resolution estimate of the scene with negligible computational cost. We develop fast numerical algorithms which utilize both temporal correlations and optical flow in the video sequence as well as the innovative structure of the projections. Our numerical experiments demonstrate the efficacy of the proposed approach on short-wave infrared data.
Submitted 26 June, 2013;
originally announced June 2013.
-
Spatio-temporal Compressed Sensing with Coded Apertures and Keyed Exposures
Authors:
Zachary T. Harmany,
Roummel F. Marcia,
Rebecca M. Willett
Abstract:
Optical systems which measure independent random projections of a scene according to compressed sensing (CS) theory face a myriad of practical challenges related to the size of the physical platform, photon efficiency, the need for high temporal resolution, and fast reconstruction in video settings. This paper describes a coded aperture and keyed exposure approach to compressive measurement in optical systems. The proposed projections satisfy the Restricted Isometry Property for sufficiently sparse scenes, and hence are compatible with theoretical guarantees on the video reconstruction quality. These concepts can be implemented in both space and time via either amplitude modulation or phase shifting, and this paper describes the relative merits of the two approaches in terms of theoretical performance, noise and hardware considerations, and experimental results. Fast numerical algorithms which account for the nonnegativity of the projections and temporal correlations in a video sequence are developed and applied to microscopy and short-wave infrared data.
Submitted 6 January, 2012; v1 submitted 30 November, 2011;
originally announced November 2011.
-
This is SPIRAL-TAP: Sparse Poisson Intensity Reconstruction ALgorithms - Theory and Practice
Authors:
Zachary T. Harmany,
Roummel F. Marcia,
Rebecca M. Willett
Abstract:
The observations in many applications consist of counts of discrete events, such as photons hitting a detector, which cannot be effectively modeled using an additive bounded or Gaussian noise model, and instead require a Poisson noise model. As a result, accurate reconstruction of a spatially or temporally distributed phenomenon (f*) from Poisson data (y) cannot be effectively accomplished by minimizing a conventional penalized least-squares objective function. The problem addressed in this paper is the estimation of f* from y in an inverse problem setting, where (a) the number of unknowns may potentially be larger than the number of observations and (b) f* admits a sparse approximation. The optimization formulation considered in this paper uses a penalized negative Poisson log-likelihood objective function with nonnegativity constraints (since Poisson intensities are naturally nonnegative). In particular, the proposed approach incorporates key ideas of using separable quadratic approximations to the objective function at each iteration and penalization terms related to l1 norms of coefficient vectors, total variation seminorms, and partition-based multiscale estimation methods.
Submitted 12 October, 2011; v1 submitted 24 May, 2010;
originally announced May 2010.