
Showing 1–7 of 7 results for author: Patel, K K

  1. arXiv:2405.11667  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication

    Authors: Kumar Kshitij Patel, Margalit Glasgow, Ali Zindari, Lingxiao Wang, Sebastian U. Stich, Ziheng Cheng, Nirmit Joshi, Nathan Srebro

    Abstract: Local SGD is a popular optimization method in distributed learning, often outperforming other algorithms in practice, including mini-batch SGD. Despite this success, theoretically proving the dominance of local SGD in settings with reasonable data heterogeneity has been difficult, creating a significant gap between theory and practice. In this paper, we provide new lower bounds for local SGD under…

    Submitted 19 May, 2024; originally announced May 2024.

  2. arXiv:2311.17586  [pdf, other]

    cs.LG math.OC stat.ML

    Federated Online and Bandit Convex Optimization

    Authors: Kumar Kshitij Patel, Lingxiao Wang, Aadirupa Saha, Nati Srebro

    Abstract: We study the problems of distributed online and bandit convex optimization against an adaptive adversary. We aim to minimize the average regret on $M$ machines working in parallel over $T$ rounds with $R$ intermittent communications. Assuming the underlying cost functions are convex and can be generated adaptively, our results show that collaboration is not beneficial when the machines have access…

    Submitted 29 November, 2023; originally announced November 2023.

  3. arXiv:2110.02954  [pdf, other]

    math.OC cs.LG stat.ML

    A Stochastic Newton Algorithm for Distributed Convex Optimization

    Authors: Brian Bullins, Kumar Kshitij Patel, Ohad Shamir, Nathan Srebro, Blake Woodworth

    Abstract: We propose and analyze a stochastic Newton algorithm for homogeneous distributed stochastic convex optimization, where each machine can calculate stochastic gradients of the same population objective, as well as stochastic Hessian-vector products (products of an independent unbiased estimator of the Hessian of the population objective with arbitrary vectors), with many such stochastic computations…

    Submitted 7 October, 2021; originally announced October 2021.

  4. arXiv:2006.04735  [pdf, other]

    cs.LG math.OC stat.ML

    Minibatch vs Local SGD for Heterogeneous Distributed Learning

    Authors: Blake Woodworth, Kumar Kshitij Patel, Nathan Srebro

    Abstract: We analyze Local SGD (aka parallel or federated SGD) and Minibatch SGD in the heterogeneous distributed setting, where each machine has access to stochastic gradient estimates for a different, machine-specific, convex objective; the goal is to optimize w.r.t. the average objective; and machines can only communicate intermittently. We argue that, (i) Minibatch SGD (even without acceleration) domina…

    Submitted 1 March, 2022; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: 34 pages

  5. arXiv:2002.07839  [pdf, other]

    cs.LG math.OC stat.ML

    Is Local SGD Better than Minibatch SGD?

    Authors: Blake Woodworth, Kumar Kshitij Patel, Sebastian U. Stich, Zhen Dai, Brian Bullins, H. Brendan McMahan, Ohad Shamir, Nathan Srebro

    Abstract: We study local SGD (also known as parallel SGD and federated averaging), a natural and frequently used stochastic distributed optimization method. Its theoretical foundations are currently lacking and we highlight how all existing error guarantees in the convex setting are dominated by a simple baseline, minibatch SGD. (1) For quadratic objectives we prove that local SGD strictly dominates minibat…

    Submitted 20 July, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

    Comments: 29 pages

  6. arXiv:1904.11325  [pdf, other]

    cs.LG math.OC stat.ML

    Communication trade-offs for synchronized distributed SGD with large step size

    Authors: Kumar Kshitij Patel, Aymeric Dieuleveut

    Abstract: Synchronous mini-batch SGD is state-of-the-art for large-scale distributed machine learning. However, in practice, its convergence is bottlenecked by slow communication rounds between worker nodes. A natural solution to reduce communication is to use the 'local-SGD' model in which the workers train their model independently and synchronize every once in a while. This algorithm improves the…

    Submitted 25 April, 2019; originally announced April 2019.

  7. arXiv:1808.07217  [pdf, other]

    cs.LG stat.ML

    Don't Use Large Mini-Batches, Use Local SGD

    Authors: Tao Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi

    Abstract: Mini-batch stochastic gradient methods (SGD) are state of the art for distributed training of deep neural networks. Drastic increases in the mini-batch sizes have led to key efficiency and scalability gains in recent years. However, progress faces a major roadblock, as models trained with large batches often do not generalize well, i.e. they do not show good accuracy on new data. As a remedy, we…

    Submitted 17 February, 2020; v1 submitted 22 August, 2018; originally announced August 2018.

    Comments: To appear in ICLR 2020