
Showing 1–7 of 7 results for author: Glasgow, M

  1. arXiv:2504.13110  [pdf, other]

    stat.ML cs.LG

    Propagation of Chaos in One-hidden-layer Neural Networks beyond Logarithmic Time

    Authors: Margalit Glasgow, Denny Wu, Joan Bruna

    Abstract: We study the approximation gap between the dynamics of a polynomial-width neural network and its infinite-width counterpart, both trained using projected gradient descent in the mean-field scaling regime. We demonstrate how to tightly bound this approximation gap through a differential equation governed by the mean-field dynamics. A key factor influencing the growth of this ODE is the local Hessia…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: 70 pages

  2. arXiv:2405.11667  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication

    Authors: Kumar Kshitij Patel, Margalit Glasgow, Ali Zindari, Lingxiao Wang, Sebastian U. Stich, Ziheng Cheng, Nirmit Joshi, Nathan Srebro

    Abstract: Local SGD is a popular optimization method in distributed learning, often outperforming other algorithms in practice, including mini-batch SGD. Despite this success, theoretically proving the dominance of local SGD in settings with reasonable data heterogeneity has been difficult, creating a significant gap between theory and practice. In this paper, we provide new lower bounds for local SGD under…

    Submitted 19 May, 2024; originally announced May 2024.

  3. arXiv:2309.15111  [pdf, ps, other]

    cs.LG stat.ML

    SGD Finds then Tunes Features in Two-Layer Neural Networks with near-Optimal Sample Complexity: A Case Study in the XOR problem

    Authors: Margalit Glasgow

    Abstract: In this work, we consider the optimization process of minibatch stochastic gradient descent (SGD) on a 2-layer neural network with data separated by a quadratic ground truth function. We prove that with data drawn from the $d$-dimensional Boolean hypercube labeled by the quadratic ``XOR'' function $y = -x_ix_j$, it is possible to train to a population error $o(1)$ with $d \:\text{polylog}(d)$ samp…

    Submitted 2 October, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

  4. arXiv:2303.03327  [pdf, ps, other]

    cs.LG stat.ML

    Tight Bounds for $γ$-Regret via the Decision-Estimation Coefficient

    Authors: Margalit Glasgow, Alexander Rakhlin

    Abstract: In this work, we give a statistical characterization of the $γ$-regret for arbitrary structured bandit problems, the regret which arises when comparing against a benchmark that is $γ$ times the optimal solution. The $γ$-regret emerges in structured bandit problems over a function class $\mathcal{F}$ where finding an exact optimum of $f \in \mathcal{F}$ is intractable. Our characterization is given…

    Submitted 21 July, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

  5. arXiv:2111.03741  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    Sharp Bounds for Federated Averaging (Local SGD) and Continuous Perspective

    Authors: Margalit Glasgow, Honglin Yuan, Tengyu Ma

    Abstract: Federated Averaging (FedAvg), also known as Local SGD, is one of the most popular algorithms in Federated Learning (FL). Despite its simplicity and popularity, the convergence rate of FedAvg has thus far been undetermined. Even under the simplest assumptions (convex, smooth, homogeneous, and bounded covariance), the best-known upper and lower bounds do not match, and it is not clear whether the ex…

    Submitted 11 February, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

    Comments: Accepted to AISTATS 2022. The first two authors contributed equally

  6. arXiv:2009.10717  [pdf, other]

    cs.LG cs.DC stat.ML

    Asynchronous Distributed Optimization with Stochastic Delays

    Authors: Margalit Glasgow, Mary Wootters

    Abstract: We study asynchronous finite sum minimization in a distributed-data setting with a central parameter server. While asynchrony is well understood in parallel settings where the data is accessible by all machines -- e.g., modifications of variance-reduced gradient algorithms like SAGA work well -- little is known for the distributed-data setting. We develop an algorithm ADSAGA based on SAGA for the…

    Submitted 10 March, 2021; v1 submitted 22 September, 2020; originally announced September 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:2006.09638

  7. arXiv:2006.09638  [pdf, other]

    stat.ML cs.DC cs.LG

    Approximate Gradient Coding with Optimal Decoding

    Authors: Margalit Glasgow, Mary Wootters

    Abstract: In distributed optimization problems, a technique called gradient coding, which involves replicating data points, has been used to mitigate the effect of straggling machines. Recent work has studied approximate gradient coding, which concerns coding schemes where the replication factor of the data is too low to recover the full gradient exactly. Our work is motivated by the challenge of creating a…

    Submitted 6 August, 2021; v1 submitted 16 June, 2020; originally announced June 2020.