Showing 1–17 of 17 results for author: Hendrikx, H

  1. arXiv:2410.10418  [pdf, other]

    math.OC stat.ML

    Unified Breakdown Analysis for Byzantine Robust Gossip

    Authors: Renaud Gaucher, Aymeric Dieuleveut, Hadrien Hendrikx

    Abstract: In decentralized machine learning, different devices communicate in a peer-to-peer manner to collaboratively learn from each other's data. Such approaches are vulnerable to misbehaving (or Byzantine) devices. We introduce $\mathrm{F}\text{-}\mathrm{RG}$, a general framework for building robust decentralized algorithms with guarantees arising from robust-sum-like aggregation rules $\mathrm{F}$. We then…

    Submitted 3 February, 2025; v1 submitted 14 October, 2024; originally announced October 2024.

  2. arXiv:2404.12213  [pdf, ps, other]

    math.OC

    Investigating Variance Definitions for Mirror Descent with Relative Smoothness

    Authors: Hadrien Hendrikx

    Abstract: Mirror Descent is a popular algorithm that extends Gradient Descent (GD) beyond the Euclidean geometry. One of its benefits is to enable strong convergence guarantees through smooth-like analyses, even for objectives with exploding or vanishing curvature. This is achieved through the introduction of the notion of relative smoothness, which holds in many of the common use-cases of Mirror Descent.…

    Submitted 18 April, 2024; originally announced April 2024.

  3. arXiv:2308.15250  [pdf, other]

    cs.LG cs.CR math.OC

    The Relative Gaussian Mechanism and its Application to Private Gradient Descent

    Authors: Hadrien Hendrikx, Paul Mangold, Aurélien Bellet

    Abstract: The Gaussian Mechanism (GM), which consists of adding Gaussian noise to a vector-valued query before releasing it, is a standard privacy protection mechanism. In particular, given that the query respects some L2 sensitivity property (the L2 distance between outputs on any two neighboring inputs is bounded), GM guarantees Rényi Differential Privacy (RDP). Unfortunately, precisely bounding the L2 se…

    Submitted 19 March, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

  4. arXiv:2305.01588  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    Revisiting Gradient Clipping: Stochastic bias and tight convergence guarantees

    Authors: Anastasia Koloskova, Hadrien Hendrikx, Sebastian U. Stich

    Abstract: Gradient clipping is a popular modification to standard (stochastic) gradient descent that limits the gradient norm to a certain value $c > 0$ at every iteration. It is widely used, for example, to stabilize the training of deep learning models (Goodfellow et al., 2016) or to enforce differential privacy (Abadi et al., 2016). Despite the popularity and simplicity of the clipping mechanism, its conve…

    Submitted 9 November, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

  5. arXiv:2301.02151  [pdf, other]

    cs.LG cs.DC math.OC

    Beyond spectral gap (extended): The role of the topology in decentralized learning

    Authors: Thijs Vogels, Hadrien Hendrikx, Martin Jaggi

    Abstract: In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model: more accurate gradients allow them to use larger learning rates and optimize faster. In the decentralized setting, in which workers communicate over a sparse graph, current theory fails to capture important aspects of real-world behavior. First, the 'spectral gap' of the communica…

    Submitted 5 January, 2023; originally announced January 2023.

    Comments: Extended version of the other paper (with the same name) that includes, among other things, theory for the heterogeneous case. arXiv admin note: substantial text overlap with arXiv:2206.03093

  6. arXiv:2206.03093  [pdf, other]

    cs.LG math.OC stat.ML

    Beyond spectral gap: The role of the topology in decentralized learning

    Authors: Thijs Vogels, Hadrien Hendrikx, Martin Jaggi

    Abstract: In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model: more accurate gradients allow them to use larger learning rates and optimize faster. We consider the setting in which all workers sample from the same dataset, and communicate over a sparse graph (decentralized). In this setting, current theory fails to capture important aspects o…

    Submitted 8 November, 2022; v1 submitted 7 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022

  7. arXiv:2205.15015  [pdf, other]

    math.OC cs.DC

    A principled framework for the design and analysis of token algorithms

    Authors: Hadrien Hendrikx

    Abstract: We consider a decentralized optimization problem, in which $n$ nodes collaborate to optimize a global objective function using local communications only. While many decentralized algorithms focus on gossip communications (pairwise averaging), we consider a different scheme, in which a "token" that contains the current estimate of the model performs a random walk over the network, and upda…

    Submitted 30 May, 2022; originally announced May 2022.

  8. arXiv:2106.07644  [pdf, other]

    math.OC cs.LG cs.MA math.PR stat.ML

    A Continuized View on Nesterov Acceleration for Stochastic Gradient Descent and Randomized Gossip

    Authors: Mathieu Even, Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Hadrien Hendrikx, Laurent Massoulié, Adrien Taylor

    Abstract: We introduce the continuized Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process, o…

    Submitted 27 October, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2102.06035

  9. arXiv:2106.03585  [pdf, other]

    math.OC cs.MA math.PR stat.ML

    Asynchronous speedup in decentralized optimization

    Authors: Mathieu Even, Hadrien Hendrikx, Laurent Massoulié

    Abstract: In decentralized optimization, nodes of a communication network each possess a local objective function, and communicate using gossip-based methods in order to minimize the average of these per-node functions. While synchronous algorithms are heavily impacted by a few slow nodes or edges in the graph (the straggler problem), their asynchronous counterparts are notoriously harder to parametr…

    Submitted 1 September, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

  10. arXiv:2104.09813  [pdf, other]

    math.OC

    Fast Stochastic Bregman Gradient Methods: Sharp Analysis and Variance Reduction

    Authors: Radu-Alexandru Dragomir, Mathieu Even, Hadrien Hendrikx

    Abstract: We study the problem of minimizing a relatively smooth convex function using stochastic Bregman gradient methods. We first prove the convergence of Bregman Stochastic Gradient Descent (BSGD) to a region that depends on the noise (magnitude of the gradients) at the optimum. In particular, BSGD with a constant step-size converges to the exact minimizer when this noise is zero (interpolation s…

    Submitted 20 April, 2021; originally announced April 2021.

  11. arXiv:2011.02379  [pdf, other]

    cs.DC cs.MA math.OC

    Asynchrony and Acceleration in Gossip Algorithms

    Authors: Mathieu Even, Hadrien Hendrikx, Laurent Massoulié

    Abstract: This paper considers the minimization of a sum of smooth and strongly convex functions dispatched over the nodes of a communication network. Previous works on the subject either focus on synchronous algorithms, which can be heavily slowed down by a few slow nodes (the straggler problem), or consider a model of asynchronous operation (Boyd et al., 2006) in which adjacent nodes communicate at the in…

    Submitted 7 February, 2021; v1 submitted 4 November, 2020; originally announced November 2020.

    MSC Class: 68Q87; 60G55; 90-10

  12. arXiv:2006.14384  [pdf, other]

    math.OC cs.DC

    Dual-Free Stochastic Decentralized Optimization with Variance Reduction

    Authors: Hadrien Hendrikx, Francis Bach, Laurent Massoulié

    Abstract: We consider the problem of training machine learning models on distributed data in a decentralized way. For finite-sum problems, fast single-machine algorithms for large datasets rely on stochastic updates combined with variance reduction. Yet, existing decentralized stochastic algorithms either do not obtain the full speedup allowed by stochastic updates, or require oracles that are more expensiv…

    Submitted 25 June, 2020; originally announced June 2020.

  13. arXiv:2005.10675  [pdf, other]

    math.OC cs.DC

    An Optimal Algorithm for Decentralized Finite Sum Optimization

    Authors: Hadrien Hendrikx, Francis Bach, Laurent Massoulié

    Abstract: Modern large-scale finite-sum optimization relies on two key aspects: distribution and stochastic updates. For smooth and strongly convex problems, existing decentralized algorithms are slower than modern accelerated variance-reduced stochastic algorithms when run on a single machine, and are therefore not efficient. Centralized algorithms are fast, but their scaling is limited by global aggregati…

    Submitted 20 May, 2020; originally announced May 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:1905.11394

  14. arXiv:2002.10726  [pdf, other]

    math.OC cs.DC

    Statistically Preconditioned Accelerated Gradient Method for Distributed Optimization

    Authors: Hadrien Hendrikx, Lin Xiao, Sébastien Bubeck, Francis Bach, Laurent Massoulié

    Abstract: We consider the setting of distributed empirical risk minimization where multiple machines compute the gradients in parallel and a centralized server updates the model parameters. In order to reduce the number of communications required to reach a given accuracy, we propose a preconditioned accelerated gradient method where the preconditioning is done by solving a local optimization problem…

    Submitted 25 February, 2020; originally announced February 2020.

  15. arXiv:1905.11394  [pdf, other]

    math.OC cs.DC

    An Accelerated Decentralized Stochastic Proximal Algorithm for Finite Sums

    Authors: Hadrien Hendrikx, Francis Bach, Laurent Massoulié

    Abstract: Modern large-scale finite-sum optimization relies on two key aspects: distribution and stochastic updates. For smooth and strongly convex problems, existing decentralized algorithms are slower than modern accelerated variance-reduced stochastic algorithms when run on a single machine, and are therefore not efficient. Centralized algorithms are fast, but their scaling is limited by global aggregati…

    Submitted 12 June, 2019; v1 submitted 27 May, 2019; originally announced May 2019.

    Comments: Code available in source files. arXiv admin note: substantial text overlap with arXiv:1901.09865

  16. arXiv:1901.09865  [pdf, other]

    math.OC cs.DC

    Asynchronous Accelerated Proximal Stochastic Gradient for Strongly Convex Distributed Finite Sums

    Authors: Hadrien Hendrikx, Francis Bach, Laurent Massoulié

    Abstract: In this work, we study the problem of minimizing the sum of strongly convex functions split over a network of $n$ nodes. We propose the decentralized and asynchronous algorithm ADFS to tackle the case when local functions are themselves finite sums with $m$ components. ADFS converges linearly when local functions are smooth, and matches the rates of the best known finite sum algorithms when execut…

    Submitted 17 July, 2019; v1 submitted 28 January, 2019; originally announced January 2019.

  17. arXiv:1810.02660  [pdf, other]

    math.OC cs.DC cs.LG

    Accelerated Decentralized Optimization with Local Updates for Smooth and Strongly Convex Objectives

    Authors: Hadrien Hendrikx, Francis Bach, Laurent Massoulié

    Abstract: In this paper, we study the problem of minimizing a sum of smooth and strongly convex functions split over the nodes of a network in a decentralized fashion. We propose the algorithm ESDACD, a decentralized accelerated algorithm that only requires local synchrony. Its rate depends on the condition number $\kappa$ of the local functions as well as the network topology and delays. Under mild assumption…

    Submitted 22 February, 2019; v1 submitted 5 October, 2018; originally announced October 2018.