Showing 1–9 of 9 results for author: Shi, H M

Searching in archive math.
  1. arXiv:2309.06497  [pdf, other]

    cs.LG cs.DC cs.MS math.OC

    A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale

    Authors: Hao-Jun Michael Shi, Tsung-Hsien Lee, Shintaro Iwasaki, Jose Gallego-Posada, Zhijing Li, Kaushik Rangadurai, Dheevatsa Mudigere, Michael Rabbat

    Abstract: Shampoo is an online and stochastic optimization algorithm belonging to the AdaGrad family of methods for training neural networks. It constructs a block-diagonal preconditioner where each block consists of a coarse Kronecker product approximation to full-matrix AdaGrad for each parameter of the neural network. In this work, we provide a complete description of the algorithm as well as the perform…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: 38 pages, 8 figures, 5 tables

  2. arXiv:2110.06380  [pdf, other]

    math.OC

    Adaptive Finite-Difference Interval Estimation for Noisy Derivative-Free Optimization

    Authors: Hao-Jun Michael Shi, Yuchen Xie, Melody Qiming Xuan, Jorge Nocedal

    Abstract: A common approach for minimizing a smooth nonlinear function is to employ finite-difference approximations to the gradient. While this can be easily performed when no error is present within the function evaluations, when the function is noisy, the optimal choice requires information about the noise level and higher-order derivatives of the function, which is often unavailable. Given the noise lev…

    Submitted 22 March, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: 39 pages, 20 tables, 6 figures

  3. arXiv:2102.09762  [pdf, other]

    math.OC

    On the Numerical Performance of Derivative-Free Optimization Methods Based on Finite-Difference Approximations

    Authors: Hao-Jun Michael Shi, Melody Qiming Xuan, Figen Oztoprak, Jorge Nocedal

    Abstract: The goal of this paper is to investigate an approach for derivative-free optimization that has not received sufficient attention in the literature and is yet one of the simplest to implement and parallelize. It consists of computing gradients of a smoothed approximation of the objective function (and constraints), and employing them within established codes. These gradient approximations are calcu…

    Submitted 19 February, 2021; originally announced February 2021.

    Comments: 82 pages, 38 tables, 29 figures

  4. arXiv:2010.04352  [pdf, other]

    math.OC

    A Noise-Tolerant Quasi-Newton Algorithm for Unconstrained Optimization

    Authors: Hao-Jun Michael Shi, Yuchen Xie, Richard Byrd, Jorge Nocedal

    Abstract: This paper describes an extension of the BFGS and L-BFGS methods for the minimization of a nonlinear function subject to errors. This work is motivated by applications that contain computational noise, employ low-precision arithmetic, or are subject to statistical noise. The classical BFGS and L-BFGS methods can fail in such circumstances because the updating procedure can be corrupted and the lin…

    Submitted 8 September, 2021; v1 submitted 8 October, 2020; originally announced October 2020.

    Comments: 27 pages, 13 figures, 2 tables

  5. arXiv:2008.09653  [pdf, ps, other]

    math.OC cs.GT econ.TH math.PR

    Search for a moving target in a competitive environment

    Authors: Benoit Duvocelle, János Flesch, Hui Min Shi, Dries Vermeulen

    Abstract: We consider a discrete-time dynamic search game in which a number of players compete to find an invisible object that is moving according to a time-varying Markov chain. We examine the subgame perfect equilibria of these games. The main result of the paper is that the set of subgame perfect equilibria is exactly the set of greedy strategy profiles, i.e. those strategy profiles in which the players…

    Submitted 25 August, 2020; v1 submitted 21 August, 2020; originally announced August 2020.

    Comments: 14 pages, 0 figures

  6. arXiv:1802.05374  [pdf, other]

    math.OC cs.LG stat.ML

    A Progressive Batching L-BFGS Method for Machine Learning

    Authors: Raghu Bollapragada, Dheevatsa Mudigere, Jorge Nocedal, Hao-Jun Michael Shi, Ping Tak Peter Tang

    Abstract: The standard L-BFGS method relies on gradient approximations that are not dominated by noise, so that search directions are descent directions, the line search is reliable, and quasi-Newton updating yields useful quadratic models of the objective function. All of this appears to call for a full batch approach, but since small batch sizes give rise to faster algorithms with better generalization pr…

    Submitted 30 May, 2018; v1 submitted 14 February, 2018; originally announced February 2018.

    Comments: ICML 2018. 25 pages, 17 figures, 2 tables

  7. arXiv:1610.00040  [pdf, other]

    math.OC stat.ML

    A Primer on Coordinate Descent Algorithms

    Authors: Hao-Jun Michael Shi, Shenyinying Tu, Yangyang Xu, Wotao Yin

    Abstract: This monograph presents a class of algorithms called coordinate descent algorithms for mathematicians, statisticians, and engineers outside the field of optimization. This particular class of algorithms has recently gained popularity due to their effectiveness in solving large-scale optimization problems in machine learning, compressed sensing, image processing, and computational statistics. Coord…

    Submitted 12 January, 2017; v1 submitted 30 September, 2016; originally announced October 2016.

    Report number: UCLA CAM Report 16-67

  8. arXiv:1601.00062  [pdf, other]

    stat.ML cs.LG math.OC

    Practical Algorithms for Learning Near-Isometric Linear Embeddings

    Authors: Jerry Luo, Kayla Shapiro, Hao-Jun Michael Shi, Qi Yang, Kan Zhu

    Abstract: We propose two practical non-convex approaches for learning near-isometric, linear embeddings of finite sets of data points. Given a set of training points $\mathcal{X}$, we consider the secant set $S(\mathcal{X})$ that consists of all pairwise difference vectors of $\mathcal{X}$, normalized to lie on the unit sphere. The problem can be formulated as finding a symmetric and positive semi-definite…

    Submitted 22 April, 2016; v1 submitted 1 January, 2016; originally announced January 2016.

    MSC Class: 90C90

  9. arXiv:1512.09184  [pdf, other]

    cs.IT math.NA

    Methods for Quantized Compressed Sensing

    Authors: Hao-Jun Michael Shi, Mindy Case, Xiaoyi Gu, Shenyinying Tu, Deanna Needell

    Abstract: In this paper, we compare and catalog the performance of various greedy quantized compressed sensing algorithms that reconstruct sparse signals from quantized compressed measurements. We also introduce two new greedy approaches for reconstruction: Quantized Compressed Sampling Matching Pursuit (QCoSaMP) and Adaptive Outlier Pursuit for Quantized Iterative Hard Thresholding (AOP-QIHT). We compare t…

    Submitted 30 December, 2015; originally announced December 2015.

    MSC Class: 94A12; 60D05; 90C25