Showing 1–9 of 9 results for author: De Sa, C

  1. arXiv:2406.05033

    cs.LG math.OC

    Gradient Descent on Logistic Regression with Non-Separable Data and Large Step Sizes

    Authors: Si Yi Meng, Antonio Orvieto, Daniel Yiming Cao, Christopher De Sa

    Abstract: We study gradient descent (GD) dynamics on logistic regression problems with large, constant step sizes. For linearly-separable data, it is known that GD converges to the minimizer with arbitrarily large step sizes, a property which no longer holds when the problem is not separable. In fact, the behaviour can be much more complex -- a sequence of period-doubling bifurcations begins at the critical…

    Submitted 4 November, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

  2. arXiv:2311.05580

    cs.DS cs.AI cs.CC math.PR

    Inference for Probabilistic Dependency Graphs

    Authors: Oliver E. Richardson, Joseph Y. Halpern, Christopher De Sa

    Abstract: Probabilistic dependency graphs (PDGs) are a flexible class of probabilistic graphical models, subsuming Bayesian Networks and Factor Graphs. They can also capture inconsistent beliefs, and provide a way of measuring the degree of this inconsistency. We present the first tractable inference algorithm for PDGs with discrete variables, making the asymptotic complexity of PDG inference similar to that o…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: extended version of the paper with corrected reduction proof

    Journal ref: PMLR 216:1741-1751, 2023

  3. arXiv:2302.00845

    cs.LG cs.DC math.OC

    Coordinating Distributed Example Orders for Provably Accelerated Training

    Authors: A. Feder Cooper, Wentao Guo, Khiem Pham, Tiancheng Yuan, Charlie F. Ruan, Yucheng Lu, Christopher De Sa

    Abstract: Recent research on online Gradient Balancing (GraB) has revealed that there exist permutation-based example orderings for SGD that are guaranteed to outperform random reshuffling (RR). Whereas RR arbitrarily permutes training examples, GraB leverages stale gradients from prior epochs to order examples -- achieving a provably faster convergence rate than RR. However, GraB is limited by design: whil…

    Submitted 21 December, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: NeurIPS 2023

  4. arXiv:2210.06705

    cs.LG cs.AI math.OC

    From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent

    Authors: Satyen Kale, Jason D. Lee, Chris De Sa, Ayush Sekhari, Karthik Sridharan

    Abstract: Stochastic Gradient Descent (SGD) has been the method of choice for learning large-scale non-convex models. While a general analysis of when SGD works has been elusive, there has been a lot of recent progress in understanding the convergence of Gradient Flow (GF) on the population loss, partly due to the simplicity that a continuous-time analysis buys us. An overarching theme of our paper is provi…

    Submitted 12 October, 2022; originally announced October 2022.

  5. arXiv:2107.08596

    stat.ML cs.LG math.DG

    Equivariant Manifold Flows

    Authors: Isay Katsman, Aaron Lou, Derek Lim, Qingxuan Jiang, Ser-Nam Lim, Christopher De Sa

    Abstract: Tractably modelling distributions over manifolds has long been an important goal in the natural sciences. Recent work has focused on developing general machine learning models to learn such distributions. However, for many applications these distributions must respect manifold symmetries -- a trait which most previous models disregard. In this paper, we lay the theoretical foundations for learning…

    Submitted 27 January, 2022; v1 submitted 18 July, 2021; originally announced July 2021.

    Comments: Published at NeurIPS 2021

  6. arXiv:2006.10254

    stat.ML cs.LG math.DG

    Neural Manifold Ordinary Differential Equations

    Authors: Aaron Lou, Derek Lim, Isay Katsman, Leo Huang, Qingxuan Jiang, Ser-Nam Lim, Christopher De Sa

    Abstract: To better conform to data geometry, recent deep generative modelling techniques adapt Euclidean constructions to non-Euclidean spaces. In this paper, we study normalizing flows on manifolds. Previous work has developed flow models for specific cases; however, these advancements hand craft layers on a manifold-by-manifold basis, restricting generality and inducing cumbersome design constraints. We…

    Submitted 17 June, 2020; originally announced June 2020.

    Comments: Submitted to NeurIPS 2020

  7. arXiv:1707.02670

    math.OC cs.DS cs.LG math.NA stat.ML

    Accelerated Stochastic Power Iteration

    Authors: Christopher De Sa, Bryan He, Ioannis Mitliagkas, Christopher Ré, Peng Xu

    Abstract: Principal component analysis (PCA) is one of the most powerful tools in machine learning. The simplest method for PCA, the power iteration, requires $\mathcal{O}(1/\Delta)$ full-data passes to recover the principal component of a matrix with eigen-gap $\Delta$. Lanczos, a significantly more complex method, achieves an accelerated rate of $\mathcal{O}(1/\sqrt{\Delta})$ passes. Modern applications, however, motivate m…

    Submitted 9 July, 2017; originally announced July 2017.

    Comments: 37 pages, 5 figures

  8. arXiv:1506.06438

    cs.LG math.OC stat.ML

    Taming the Wild: A Unified Analysis of Hogwild!-Style Algorithms

    Authors: Christopher De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré

    Abstract: Stochastic gradient descent (SGD) is a ubiquitous algorithm for a variety of machine learning problems. Researchers and industry have developed several techniques to optimize SGD's runtime performance, including asynchronous execution and reduced precision. Our main result is a martingale-based analysis that enables us to capture the rich noise models that may arise from such techniques. Specifica…

    Submitted 2 October, 2015; v1 submitted 21 June, 2015; originally announced June 2015.

  9. arXiv:1411.1134

    cs.LG math.OC stat.ML

    Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems

    Authors: Christopher De Sa, Kunle Olukotun, Christopher Ré

    Abstract: Stochastic gradient descent (SGD) on a low-rank factorization is commonly employed to speed up matrix problems including matrix completion, subspace tracking, and SDP relaxation. In this paper, we exhibit a step size scheme for SGD on a low-rank least-squares problem, and we prove that, under broad sampling conditions, our method converges globally from a random starting point within…

    Submitted 10 February, 2015; v1 submitted 4 November, 2014; originally announced November 2014.