Showing 1–17 of 17 results for author: Charles, Z

Searching in archive stat.
  1. arXiv:2201.02664

    cs.LG cs.DC cs.IT stat.ML

    Optimizing the Communication-Accuracy Trade-off in Federated Learning with Rate-Distortion Theory

    Authors: Nicole Mitchell, Johannes Ballé, Zachary Charles, Jakub Konečný

    Abstract: A significant bottleneck in federated learning (FL) is the network communication cost of sending model updates from client devices to the central server. We present a comprehensive empirical study of the statistics of model updates in FL, as well as the role and benefits of various compression techniques. Motivated by these observations, we propose a novel method to reduce the average communicatio…

    Submitted 19 May, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

  2. arXiv:2106.02305

    cs.LG cs.DC stat.ML

    Local Adaptivity in Federated Learning: Convergence and Consistency

    Authors: Jianyu Wang, Zheng Xu, Zachary Garrett, Zachary Charles, Luyang Liu, Gauri Joshi

    Abstract: The federated learning (FL) framework trains a machine learning model using decentralized data stored at edge client devices by periodically aggregating locally trained models. Popular optimization algorithms of FL use vanilla (stochastic) gradient descent for both local updates at clients and global updates at the aggregating server. Recently, adaptive optimization methods such as AdaGrad have be…

    Submitted 4 June, 2021; originally announced June 2021.

  3. arXiv:2103.05032

    cs.LG cs.DC math.OC stat.ML

    Convergence and Accuracy Trade-Offs in Federated Learning and Meta-Learning

    Authors: Zachary Charles, Jakub Konečný

    Abstract: We study a family of algorithms, which we refer to as local update methods, generalizing many federated and meta-learning algorithms. We prove that for quadratic models, local update methods are equivalent to first-order optimization on a surrogate loss we exactly characterize. Moreover, fundamental algorithmic choices (such as learning rates) explicitly govern a trade-off between the condition nu…

    Submitted 8 March, 2021; originally announced March 2021.

    Journal ref: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021. PMLR: Volume 130

  4. arXiv:2007.00878

    cs.LG math.OC stat.ML

    On the Outsized Importance of Learning Rates in Local Update Methods

    Authors: Zachary Charles, Jakub Konečný

    Abstract: We study a family of algorithms, which we refer to as local update methods, that generalize many federated learning and meta-learning algorithms. We prove that for quadratic objectives, local update methods perform stochastic gradient descent on a surrogate loss function which we exactly characterize. We show that the choice of client learning rate controls the condition number of that surrogate l…

    Submitted 2 July, 2020; originally announced July 2020.

  5. arXiv:2003.00295

    cs.LG cs.DC math.OC stat.ML

    Adaptive Federated Optimization

    Authors: Sashank Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečný, Sanjiv Kumar, H. Brendan McMahan

    Abstract: Federated learning is a distributed machine learning paradigm in which a large number of clients coordinate with a central server to learn a model without sharing their own training data. Standard federated optimization methods such as Federated Averaging (FedAvg) are often difficult to tune and exhibit unfavorable convergence behavior. In non-federated settings, adaptive optimization methods have…

    Submitted 8 September, 2021; v1 submitted 29 February, 2020; originally announced March 2020.

    Comments: Published as a conference paper at ICLR 2021

  6. arXiv:1912.04977

    cs.LG cs.CR stat.ML

    Advances and Open Problems in Federated Learning

    Authors: Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D'Oliveira, Hubert Eichner, Salim El Rouayheb, David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaid Harchaoui, Chaoyang He, Lie He, Zhouyuan Huo, Ben Hutchinson, et al. (34 additional authors not shown)

    Abstract: Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs re…

    Submitted 8 March, 2021; v1 submitted 10 December, 2019; originally announced December 2019.

    Comments: Published in Foundations and Trends in Machine Learning Vol 14 Issue 1-2. See: https://www.nowpublishers.com/article/Details/MAL-083

  7. arXiv:1907.12205

    cs.LG cs.DC stat.ML

    DETOX: A Redundancy-based Framework for Faster and More Robust Gradient Aggregation

    Authors: Shashank Rajput, Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos

    Abstract: To improve the resilience of distributed training to worst-case, or Byzantine node failures, several recent approaches have replaced gradient averaging with robust aggregation methods. Such techniques can have high computational costs, often quadratic in the number of compute nodes, and only have limited robustness guarantees. Other methods have instead used redundancy to guarantee robustness, but…

    Submitted 7 March, 2020; v1 submitted 29 July, 2019; originally announced July 2019.

  8. arXiv:1905.09209

    cs.LG math.OC stat.ML

    Convergence and Margin of Adversarial Training on Separable Data

    Authors: Zachary Charles, Shashank Rajput, Stephen Wright, Dimitris Papailiopoulos

    Abstract: Adversarial training is a technique for training robust machine learning models. To encourage robustness, it iteratively computes adversarial examples for the model, and then re-trains on these examples via some update rule. This work analyzes the performance of adversarial training on linearly separable data, and provides bounds on the number of iterations required for large margin. We show that…

    Submitted 22 May, 2019; originally announced May 2019.

  9. arXiv:1905.03177

    cs.LG stat.ML

    Does Data Augmentation Lead to Positive Margin?

    Authors: Shashank Rajput, Zhili Feng, Zachary Charles, Po-Ling Loh, Dimitris Papailiopoulos

    Abstract: Data augmentation (DA) is commonly used during model training, as it significantly improves test error and model robustness. DA artificially expands the training set by applying random noise, rotations, crops, or even adversarial perturbations to the input data. Although DA is widely used, its capacity to provably improve robustness is not fully understood. In this work, we analyze the robustness…

    Submitted 8 May, 2019; originally announced May 2019.

    Comments: ICML 2019

  10. arXiv:1901.09671

    cs.LG cs.DC cs.IT math.OC stat.ML

    ErasureHead: Distributed Gradient Descent without Delays Using Approximate Gradient Coding

    Authors: Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos

    Abstract: We present ErasureHead, a new approach for distributed gradient descent (GD) that mitigates system delays by employing approximate gradient coding. Gradient coded distributed GD uses redundancy to exactly recover the gradient at each iteration from a subset of compute nodes. ErasureHead instead uses approximate gradient codes to recover an inexact gradient at each iteration, but with higher delay…

    Submitted 28 January, 2019; originally announced January 2019.

  11. arXiv:1811.03531

    cs.LG stat.ML

    A Geometric Perspective on the Transferability of Adversarial Directions

    Authors: Zachary Charles, Harrison Rosenberg, Dimitris Papailiopoulos

    Abstract: State-of-the-art machine learning models frequently misclassify inputs that have been perturbed in an adversarial manner. Adversarial perturbations generated for a given input and a specific classifier often seem to be effective on other inputs and even different classifiers. In other words, adversarial perturbations seem to transfer between different inputs, models, and even different neural netw…

    Submitted 8 November, 2018; originally announced November 2018.

  12. arXiv:1806.04090

    stat.ML cs.DC cs.LG

    ATOMO: Communication-efficient Learning via Atomic Sparsification

    Authors: Hongyi Wang, Scott Sievert, Zachary Charles, Shengchao Liu, Stephen Wright, Dimitris Papailiopoulos

    Abstract: Distributed model training suffers from communication overheads due to frequent gradient updates transmitted between compute nodes. To mitigate these overheads, several studies propose the use of sparsified stochastic gradients. We argue that these are facets of a general sparsification method that can operate on any possible atomic decomposition. Notable examples include element-wise, singular va…

    Submitted 8 November, 2018; v1 submitted 11 June, 2018; originally announced June 2018.

  13. arXiv:1805.10378

    stat.ML cs.DC cs.IT cs.LG stat.CO

    Gradient Coding via the Stochastic Block Model

    Authors: Zachary Charles, Dimitris Papailiopoulos

    Abstract: Gradient descent and its many variants, including mini-batch stochastic gradient descent, form the algorithmic foundation of modern large-scale machine learning. Due to the size and scale of modern data, gradient computations are often distributed across multiple compute nodes. Unfortunately, such distributed implementations can face significant delays caused by straggler nodes, i.e., nodes that a…

    Submitted 25 May, 2018; originally announced May 2018.

  14. arXiv:1803.09877

    stat.ML cs.DC cs.IT cs.LG cs.NE

    DRACO: Byzantine-resilient Distributed Training via Redundant Gradients

    Authors: Lingjiao Chen, Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos

    Abstract: Distributed model training is vulnerable to byzantine system failures and adversarial compute nodes, i.e., nodes that use malicious updates to corrupt the global model stored at a parameter server (PS). To guarantee some form of robustness, recent work suggests using variants of the geometric median as an aggregation rule, in place of gradient averaging. Unfortunately, median-based rules can incur…

    Submitted 21 June, 2018; v1 submitted 26 March, 2018; originally announced March 2018.

    Comments: Accepted by ICML 2018

  15. arXiv:1711.06771

    stat.ML cs.DC cs.IT cs.LG stat.CO

    Approximate Gradient Coding via Sparse Random Graphs

    Authors: Zachary Charles, Dimitris Papailiopoulos, Jordan Ellenberg

    Abstract: Distributed algorithms are often beset by the straggler effect, where the slowest compute nodes in the system dictate the overall running time. Coding-theoretic techniques have been recently proposed to mitigate stragglers via algorithmic redundancy. Prior work in coded computation and gradient coding has mainly focused on exact recovery of the desired output. However, slightly inexact solutions c…

    Submitted 17 November, 2017; originally announced November 2017.

  16. arXiv:1710.08402

    stat.ML cs.IT cs.LG math.OC

    Stability and Generalization of Learning Algorithms that Converge to Global Optima

    Authors: Zachary Charles, Dimitris Papailiopoulos

    Abstract: We establish novel generalization bounds for learning algorithms that converge to global minima. We do so by deriving black-box stability results that only depend on the convergence of a learning algorithm and the geometry around the minimizers of the loss function. The results are shown for nonconvex loss functions satisfying the Polyak-Łojasiewicz (PL) and the quadratic growth (QG) conditions. W…

    Submitted 23 October, 2017; originally announced October 2017.

    Comments: 27 pages, 5 figures

  17. arXiv:1707.02461

    stat.ML

    Subspace Clustering with Missing and Corrupted Data

    Authors: Zachary Charles, Amin Jalali, Rebecca Willett

    Abstract: Given full or partial information about a collection of points that lie close to a union of several subspaces, subspace clustering refers to the process of clustering the points according to their subspace and identifying the subspaces. One popular approach, sparse subspace clustering (SSC), represents each sample as a weighted combination of the other samples, with weights of minimal $\ell_1$ nor…

    Submitted 15 January, 2018; v1 submitted 8 July, 2017; originally announced July 2017.

    Comments: 31 pages, 2 figures