-
ByzFL: Research Framework for Robust Federated Learning
Authors:
Marc González,
Rachid Guerraoui,
Rafael Pinot,
Geovani Rizk,
John Stephan,
François Taïani
Abstract:
We present ByzFL, an open-source Python library for developing and benchmarking robust federated learning (FL) algorithms. ByzFL provides a unified and extensible framework that includes implementations of state-of-the-art robust aggregators, a suite of configurable attacks, and tools for simulating a variety of FL scenarios, including heterogeneous data distributions, multiple training algorithms…
▽ More
We present ByzFL, an open-source Python library for developing and benchmarking robust federated learning (FL) algorithms. ByzFL provides a unified and extensible framework that includes implementations of state-of-the-art robust aggregators, a suite of configurable attacks, and tools for simulating a variety of FL scenarios, including heterogeneous data distributions, multiple training algorithms, and adversarial threat models. The library enables systematic experimentation via a single JSON-based configuration file and includes built-in utilities for result visualization. Compatible with PyTorch tensors and NumPy arrays, ByzFL is designed to facilitate reproducible research and rapid prototyping of robust FL solutions. ByzFL is available at https://byzfl.epfl.ch/, with source code hosted on GitHub: https://github.com/LPD-EPFL/byzfl.
△ Less
Submitted 30 May, 2025;
originally announced May 2025.
-
Towards Trustworthy Federated Learning with Untrusted Participants
Authors:
Youssef Allouah,
Rachid Guerraoui,
John Stephan
Abstract:
Resilience against malicious participants and data privacy are essential for trustworthy federated learning, yet achieving both with good utility typically requires the strong assumption of a trusted central server. This paper shows that a significantly weaker assumption suffices: each pair of participants shares a randomness seed unknown to others. In a setting where malicious participants may co…
▽ More
Resilience against malicious participants and data privacy are essential for trustworthy federated learning, yet achieving both with good utility typically requires the strong assumption of a trusted central server. This paper shows that a significantly weaker assumption suffices: each pair of participants shares a randomness seed unknown to others. In a setting where malicious participants may collude with an untrusted server, we propose CafCor, an algorithm that integrates robust gradient aggregation with correlated noise injection, using shared randomness between participants. We prove that CafCor achieves strong privacy-utility trade-offs, significantly outperforming local differential privacy (DP) methods, which do not make any trust assumption, while approaching central DP utility, where the server is fully trusted. Empirical results on standard benchmarks validate CafCor's practicality, showing that privacy and robustness can coexist in distributed systems without sacrificing utility or trusting the server.
△ Less
Submitted 4 June, 2025; v1 submitted 3 May, 2025;
originally announced May 2025.
-
CrowdProve: Community Proving for ZK Rollups
Authors:
John Stephan,
Matej Pavlovic,
Antonio Locascio,
Benjamin Livshits
Abstract:
Zero-Knowledge (ZK) rollups have become a popular solution for scaling blockchain systems, offering improved transaction throughput and reduced costs by aggregating Layer 2 transactions and submitting them as a single batch to a Layer 1 blockchain. However, the computational burden of generating validity proofs, a key feature of ZK rollups, presents significant challenges in terms of performance a…
▽ More
Zero-Knowledge (ZK) rollups have become a popular solution for scaling blockchain systems, offering improved transaction throughput and reduced costs by aggregating Layer 2 transactions and submitting them as a single batch to a Layer 1 blockchain. However, the computational burden of generating validity proofs, a key feature of ZK rollups, presents significant challenges in terms of performance and decentralization. Current solutions rely on centralized infrastructure to handle the computational tasks, limiting the scalability and decentralization of rollup systems.
This paper proposes CrowdProve, a prover orchestration layer for outsourcing computation to unreliable commodity hardware run by a broad community of small provers. We apply CrowdProve to proving transaction batches for a popular ZK rollup.
Through our experimental evaluation, we demonstrate that community proving can achieve performance comparable to, and in some cases better than, existing centralized deployments. Our results show that even systems utilizing modest hardware configurations can match the performance of centralized solutions, making community-based proof generation a viable and cost-effective alternative. CrowdProve allows both the rollup operator and community participants to benefit: the operator reduces infrastructure costs by leveraging idle community hardware, while community provers are compensated for their contributions.
△ Less
Submitted 6 January, 2025;
originally announced January 2025.
-
Overcoming the Challenges of Batch Normalization in Federated Learning
Authors:
Rachid Guerraoui,
Rafael Pinot,
Geovani Rizk,
John Stephan,
François Taiani
Abstract:
Batch normalization has proven to be a very beneficial mechanism to accelerate the training and improve the accuracy of deep neural networks in centralized environments. Yet, the scheme faces significant challenges in federated learning, especially under high data heterogeneity. Essentially, the main challenges arise from external covariate shifts and inconsistent statistics across clients. We int…
▽ More
Batch normalization has proven to be a very beneficial mechanism to accelerate the training and improve the accuracy of deep neural networks in centralized environments. Yet, the scheme faces significant challenges in federated learning, especially under high data heterogeneity. Essentially, the main challenges arise from external covariate shifts and inconsistent statistics across clients. We introduce in this paper Federated BatchNorm (FBN), a novel scheme that restores the benefits of batch normalization in federated learning. Essentially, FBN ensures that the batch normalization during training is consistent with what would be achieved in a centralized execution, hence preserving the distribution of the data, and providing running statistics that accurately approximate the global statistics. FBN thereby reduces the external covariate shift and matches the evaluation performance of the centralized setting. We also show that, with a slight increase in complexity, we can robustify FBN to mitigate erroneous statistics and potentially adversarial attacks.
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
Adaptive Gradient Clipping for Robust Federated Learning
Authors:
Youssef Allouah,
Rachid Guerraoui,
Nirupam Gupta,
Ahmed Jellouli,
Geovani Rizk,
John Stephan
Abstract:
Robust federated learning aims to maintain reliable performance despite the presence of adversarial or misbehaving workers. While state-of-the-art (SOTA) robust distributed gradient descent (Robust-DGD) methods were proven theoretically optimal, their empirical success has often relied on pre-aggregation gradient clipping. However, existing static clipping strategies yield inconsistent results: en…
▽ More
Robust federated learning aims to maintain reliable performance despite the presence of adversarial or misbehaving workers. While state-of-the-art (SOTA) robust distributed gradient descent (Robust-DGD) methods were proven theoretically optimal, their empirical success has often relied on pre-aggregation gradient clipping. However, existing static clipping strategies yield inconsistent results: enhancing robustness against some attacks while being ineffective or even detrimental against others. To address this limitation, we propose a principled adaptive clipping strategy, Adaptive Robust Clipping (ARC), which dynamically adjusts clipping thresholds based on the input gradients. We prove that ARC not only preserves the theoretical robustness guarantees of SOTA Robust-DGD methods but also provably improves asymptotic convergence when the model is well-initialized. Extensive experiments on benchmark image classification tasks confirm these theoretical insights, demonstrating that ARC significantly enhances robustness, particularly in highly heterogeneous and adversarial settings.
△ Less
Submitted 9 May, 2025; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Causal Forecasting for Pricing
Authors:
Douglas Schultz,
Johannes Stephan,
Julian Sieber,
Trudie Yeh,
Manuel Kunz,
Patrick Doupe,
Tim Januschowski
Abstract:
This paper proposes a novel method for demand forecasting in a pricing context. Here, modeling the causal relationship between price as an input variable to demand is crucial because retailers aim to set prices in a (profit) optimal manner in a downstream decision making problem. Our methods bring together the Double Machine Learning methodology for causal inference and state-of-the-art transforme…
▽ More
This paper proposes a novel method for demand forecasting in a pricing context. Here, modeling the causal relationship between price as an input variable to demand is crucial because retailers aim to set prices in a (profit) optimal manner in a downstream decision making problem. Our methods bring together the Double Machine Learning methodology for causal inference and state-of-the-art transformer-based forecasting models. In extensive empirical experiments, we show on the one hand that our method estimates the causal effect better in a fully controlled setting via synthetic, yet realistic data. On the other hand, we demonstrate on real-world data that our method outperforms forecasting methods in off-policy settings (i.e., when there's a change in the pricing policy) while only slightly trailing in the on-policy setting.
△ Less
Submitted 30 January, 2024; v1 submitted 23 December, 2023;
originally announced December 2023.
-
Robustness, Efficiency, or Privacy: Pick Two in Machine Learning
Authors:
Youssef Allouah,
Rachid Guerraoui,
John Stephan
Abstract:
The success of machine learning (ML) applications relies on vast datasets and distributed architectures which, as they grow, present major challenges. In real-world scenarios, where data often contains sensitive information, issues like data poisoning and hardware failures are common. Ensuring privacy and robustness is vital for the broad adoption of ML in public life. This paper examines the cost…
▽ More
The success of machine learning (ML) applications relies on vast datasets and distributed architectures which, as they grow, present major challenges. In real-world scenarios, where data often contains sensitive information, issues like data poisoning and hardware failures are common. Ensuring privacy and robustness is vital for the broad adoption of ML in public life. This paper examines the costs associated with achieving these objectives in distributed ML architectures, from both theoretical and empirical perspectives. We overview the meanings of privacy and robustness in distributed ML, and clarify how they can be achieved efficiently in isolation. However, we contend that the integration of these two objectives entails a notable compromise in computational efficiency. In short, traditional noise injection hurts accuracy by concealing poisoned inputs, while cryptographic methods clash with poisoning defenses due to their non-linear nature. However, we outline future research directions aimed at reconciling this compromise with efficiency by considering weaker threat models.
△ Less
Submitted 11 March, 2024; v1 submitted 22 December, 2023;
originally announced December 2023.
-
SABLE: Secure And Byzantine robust LEarning
Authors:
Antoine Choffrut,
Rachid Guerraoui,
Rafael Pinot,
Renaud Sirdey,
John Stephan,
Martin Zuber
Abstract:
Due to the widespread availability of data, machine learning (ML) algorithms are increasingly being implemented in distributed topologies, wherein various nodes collaborate to train ML models via the coordination of a central server. However, distributed learning approaches face significant vulnerabilities, primarily stemming from two potential threats. Firstly, the presence of Byzantine nodes pos…
▽ More
Due to the widespread availability of data, machine learning (ML) algorithms are increasingly being implemented in distributed topologies, wherein various nodes collaborate to train ML models via the coordination of a central server. However, distributed learning approaches face significant vulnerabilities, primarily stemming from two potential threats. Firstly, the presence of Byzantine nodes poses a risk of corrupting the learning process by transmitting inaccurate information to the server. Secondly, a curious server may compromise the privacy of individual nodes, sometimes reconstructing the entirety of the nodes' data. Homomorphic encryption (HE) has emerged as a leading security measure to preserve privacy in distributed learning under non-Byzantine scenarios. However, the extensive computational demands of HE, particularly for high-dimensional ML models, have deterred attempts to design purely homomorphic operators for non-linear robust aggregators. This paper introduces SABLE, the first homomorphic and Byzantine robust distributed learning algorithm. SABLE leverages HTS, a novel and efficient homomorphic operator implementing the prominent coordinate-wise trimmed mean robust aggregator. Designing HTS enables us to implement HMED, a novel homomorphic median aggregator. Extensive experiments on standard ML tasks demonstrate that SABLE achieves practical execution times while maintaining an ML accuracy comparable to its non-private counterpart.
△ Less
Submitted 14 December, 2023; v1 submitted 11 September, 2023;
originally announced September 2023.
-
Deep Learning based Forecasting: a case study from the online fashion industry
Authors:
Manuel Kunz,
Stefan Birr,
Mones Raslan,
Lei Ma,
Zhen Li,
Adele Gouttes,
Mateusz Koren,
Tofigh Naghibi,
Johannes Stephan,
Mariia Bulycheva,
Matthias Grzeschik,
Armin Kekić,
Michael Narodovitch,
Kashif Rasul,
Julian Sieber,
Tim Januschowski
Abstract:
Demand forecasting in the online fashion industry is particularly amendable to global, data-driven forecasting models because of the industry's set of particular challenges. These include the volume of data, the irregularity, the high amount of turn-over in the catalog and the fixed inventory assumption. While standard deep learning forecasting approaches cater for many of these, the fixed invento…
▽ More
Demand forecasting in the online fashion industry is particularly amendable to global, data-driven forecasting models because of the industry's set of particular challenges. These include the volume of data, the irregularity, the high amount of turn-over in the catalog and the fixed inventory assumption. While standard deep learning forecasting approaches cater for many of these, the fixed inventory assumption requires a special treatment via controlling the relationship between price and demand closely. In this case study, we describe the data and our modelling approach for this forecasting problem in detail and present empirical results that highlight the effectiveness of our approach.
△ Less
Submitted 23 May, 2023;
originally announced May 2023.
-
On the Privacy-Robustness-Utility Trilemma in Distributed Learning
Authors:
Youssef Allouah,
Rachid Guerraoui,
Nirupam Gupta,
Rafael Pinot,
John Stephan
Abstract:
The ubiquity of distributed machine learning (ML) in sensitive public domain applications calls for algorithms that protect data privacy, while being robust to faults and adversarial behaviors. Although privacy and robustness have been extensively studied independently in distributed ML, their synthesis remains poorly understood. We present the first tight analysis of the error incurred by any alg…
▽ More
The ubiquity of distributed machine learning (ML) in sensitive public domain applications calls for algorithms that protect data privacy, while being robust to faults and adversarial behaviors. Although privacy and robustness have been extensively studied independently in distributed ML, their synthesis remains poorly understood. We present the first tight analysis of the error incurred by any algorithm ensuring robustness against a fraction of adversarial machines, as well as differential privacy (DP) for honest machines' data against any other curious entity. Our analysis exhibits a fundamental trade-off between privacy, robustness, and utility. To prove our lower bound, we consider the case of mean estimation, subject to distributed DP and robustness constraints, and devise reductions to centralized estimation of one-way marginals. We prove our matching upper bound by presenting a new distributed ML algorithm using a high-dimensional robust aggregation rule. The latter amortizes the dependence on the dimension in the error (caused by adversarial workers and DP), while being agnostic to the statistical properties of the data.
△ Less
Submitted 29 May, 2023; v1 submitted 9 February, 2023;
originally announced February 2023.
-
Fixing by Mixing: A Recipe for Optimal Byzantine ML under Heterogeneity
Authors:
Youssef Allouah,
Sadegh Farhadkhani,
Rachid Guerraoui,
Nirupam Gupta,
Rafael Pinot,
John Stephan
Abstract:
Byzantine machine learning (ML) aims to ensure the resilience of distributed learning algorithms to misbehaving (or Byzantine) machines. Although this problem received significant attention, prior works often assume the data held by the machines to be homogeneous, which is seldom true in practical settings. Data heterogeneity makes Byzantine ML considerably more challenging, since a Byzantine mach…
▽ More
Byzantine machine learning (ML) aims to ensure the resilience of distributed learning algorithms to misbehaving (or Byzantine) machines. Although this problem received significant attention, prior works often assume the data held by the machines to be homogeneous, which is seldom true in practical settings. Data heterogeneity makes Byzantine ML considerably more challenging, since a Byzantine machine can hardly be distinguished from a non-Byzantine outlier. A few solutions have been proposed to tackle this issue, but these provide suboptimal probabilistic guarantees and fare poorly in practice. This paper closes the theoretical gap, achieving optimality and inducing good empirical results. In fact, we show how to automatically adapt existing solutions for (homogeneous) Byzantine ML to the heterogeneous setting through a powerful mechanism, we call nearest neighbor mixing (NNM), which boosts any standard robust distributed gradient descent variant to yield optimal Byzantine resilience under heterogeneity. We obtain similar guarantees (in expectation) by plugging NNM in the distributed stochastic heavy ball method, a practical substitute to distributed gradient descent. We obtain empirical results that significantly outperform state-of-the-art Byzantine ML solutions.
△ Less
Submitted 3 February, 2023;
originally announced February 2023.
-
On the Impossible Safety of Large AI Models
Authors:
El-Mahdi El-Mhamdi,
Sadegh Farhadkhani,
Rachid Guerraoui,
Nirupam Gupta,
Lê-Nguyên Hoang,
Rafael Pinot,
Sébastien Rouault,
John Stephan
Abstract:
Large AI Models (LAIMs), of which large language models are the most prominent recent example, showcase some impressive performance. However they have been empirically found to pose serious security issues. This paper systematizes our knowledge about the fundamental impossibility of building arbitrarily accurate and secure machine learning models. More precisely, we identify key challenging featur…
▽ More
Large AI Models (LAIMs), of which large language models are the most prominent recent example, showcase some impressive performance. However they have been empirically found to pose serious security issues. This paper systematizes our knowledge about the fundamental impossibility of building arbitrarily accurate and secure machine learning models. More precisely, we identify key challenging features of many of today's machine learning settings. Namely, high accuracy seems to require memorizing large training datasets, which are often user-generated and highly heterogeneous, with both sensitive information and fake users. We then survey statistical lower bounds that, we argue, constitute a compelling case against the possibility of designing high-accuracy LAIMs with strong security guarantees.
△ Less
Submitted 9 May, 2023; v1 submitted 30 September, 2022;
originally announced September 2022.
-
Robust Collaborative Learning with Linear Gradient Overhead
Authors:
Sadegh Farhadkhani,
Rachid Guerraoui,
Nirupam Gupta,
Lê Nguyên Hoang,
Rafael Pinot,
John Stephan
Abstract:
Collaborative learning algorithms, such as distributed SGD (or D-SGD), are prone to faulty machines that may deviate from their prescribed algorithm because of software or hardware bugs, poisoned data or malicious behaviors. While many solutions have been proposed to enhance the robustness of D-SGD to such machines, previous works either resort to strong assumptions (trusted server, homogeneous da…
▽ More
Collaborative learning algorithms, such as distributed SGD (or D-SGD), are prone to faulty machines that may deviate from their prescribed algorithm because of software or hardware bugs, poisoned data or malicious behaviors. While many solutions have been proposed to enhance the robustness of D-SGD to such machines, previous works either resort to strong assumptions (trusted server, homogeneous data, specific noise model) or impose a gradient computational cost that is several orders of magnitude higher than that of D-SGD. We present MoNNA, a new algorithm that (a) is provably robust under standard assumptions and (b) has a gradient computation overhead that is linear in the fraction of faulty machines, which is conjectured to be tight. Essentially, MoNNA uses Polyak's momentum of local gradients for local updates and nearest-neighbor averaging (NNA) for global mixing, respectively. While MoNNA is rather simple to implement, its analysis has been more challenging and relies on two key elements that may be of independent interest. Specifically, we introduce the mixing criterion of $(α, λ)$-reduction to analyze the non-linear mixing of non-faulty machines, and present a way to control the tension between the momentum and the model drifts. We validate our theory by experiments on image classification and make our code available at https://github.com/LPD-EPFL/robust-collaborative-learning.
△ Less
Submitted 3 June, 2023; v1 submitted 22 September, 2022;
originally announced September 2022.
-
Application Experiences on a GPU-Accelerated Arm-based HPC Testbed
Authors:
Wael Elwasif,
William Godoy,
Nick Hagerty,
J. Austin Harris,
Oscar Hernandez,
Balint Joo,
Paul Kent,
Damien Lebrun-Grandie,
Elijah Maccarthy,
Veronica G. Melesse Vergara,
Bronson Messer,
Ross Miller,
Sarp Opal,
Sergei Bastrakov,
Michael Bussmann,
Alexander Debus,
Klaus Steinger,
Jan Stephan,
Rene Widera,
Spencer H. Bryngelson,
Henry Le Berre,
Anand Radhakrishnan,
Jefferey Young,
Sunita Chandrasekaran,
Florina Ciorba
, et al. (6 additional authors not shown)
Abstract:
This paper assesses and reports the experience of ten teams working to port,validate, and benchmark several High Performance Computing applications on a novel GPU-accelerated Arm testbed system. The testbed consists of eight NVIDIA Arm HPC Developer Kit systems built by GIGABYTE, each one equipped with a server-class Arm CPU from Ampere Computing and A100 data center GPU from NVIDIA Corp. The syst…
▽ More
This paper assesses and reports the experience of ten teams working to port,validate, and benchmark several High Performance Computing applications on a novel GPU-accelerated Arm testbed system. The testbed consists of eight NVIDIA Arm HPC Developer Kit systems built by GIGABYTE, each one equipped with a server-class Arm CPU from Ampere Computing and A100 data center GPU from NVIDIA Corp. The systems are connected together using Infiniband high-bandwidth low-latency interconnect. The selected applications and mini-apps are written using several programming languages and use multiple accelerator-based programming models for GPUs such as CUDA, OpenACC, and OpenMP offloading. Working on application porting requires a robust and easy-to-access programming environment, including a variety of compilers and optimized scientific libraries. The goal of this work is to evaluate platform readiness and assess the effort required from developers to deploy well-established scientific workloads on current and future generation Arm-based GPU-accelerated HPC systems. The reported case studies demonstrate that the current level of maturity and diversity of software and tools is already adequate for large-scale production deployments.
△ Less
Submitted 19 December, 2022; v1 submitted 20 September, 2022;
originally announced September 2022.
-
Byzantine Machine Learning Made Easy by Resilient Averaging of Momentums
Authors:
Sadegh Farhadkhani,
Rachid Guerraoui,
Nirupam Gupta,
Rafael Pinot,
John Stephan
Abstract:
Byzantine resilience emerged as a prominent topic within the distributed machine learning community. Essentially, the goal is to enhance distributed optimization algorithms, such as distributed SGD, in a way that guarantees convergence despite the presence of some misbehaving (a.k.a., {\em Byzantine}) workers. Although a myriad of techniques addressing the problem have been proposed, the field arg…
▽ More
Byzantine resilience emerged as a prominent topic within the distributed machine learning community. Essentially, the goal is to enhance distributed optimization algorithms, such as distributed SGD, in a way that guarantees convergence despite the presence of some misbehaving (a.k.a., {\em Byzantine}) workers. Although a myriad of techniques addressing the problem have been proposed, the field arguably rests on fragile foundations. These techniques are hard to prove correct and rely on assumptions that are (a) quite unrealistic, i.e., often violated in practice, and (b) heterogeneous, i.e., making it difficult to compare approaches.
We present \emph{RESAM (RESilient Averaging of Momentums)}, a unified framework that makes it simple to establish optimal Byzantine resilience, relying only on standard machine learning assumptions. Our framework is mainly composed of two operators: \emph{resilient averaging} at the server and \emph{distributed momentum} at the workers. We prove a general theorem stating the convergence of distributed SGD under RESAM. Interestingly, demonstrating and comparing the convergence of many existing techniques become direct corollaries of our theorem, without resorting to stringent assumptions. We also present an empirical evaluation of the practical relevance of RESAM.
△ Less
Submitted 24 May, 2022;
originally announced May 2022.
-
Challenges Porting a C++ Template-Metaprogramming Abstraction Layer to Directive-based Offloading
Authors:
Jeffrey Kelling,
Sergei Bastrakov,
Alexander Debus,
Thomas Kluge,
Matt Leinhauser,
Richard Pausch,
Klaus Steiniger,
Jan Stephan,
René Widera,
Jeff Young,
Michael Bussmann,
Sunita Chandrasekaran,
Guido Juckeland
Abstract:
HPC systems employ a growing variety of compute accelerators with different architectures and from different vendors. Large scientific applications are required to run efficiently across these systems but need to retain a single code-base in order to not stifle development. Directive-based offloading programming models set out to provide the required portability, but, to existing codes, they thems…
▽ More
HPC systems employ a growing variety of compute accelerators with different architectures and from different vendors. Large scientific applications are required to run efficiently across these systems but need to retain a single code-base in order to not stifle development. Directive-based offloading programming models set out to provide the required portability, but, to existing codes, they themselves represent yet another API to port to. Here, we present our approach of porting the GPU-accelerated particle-in-cell code PIConGPU to OpenACC and OpenMP target by adding two new backends to its existing C++-template metaprogramming-based offloading abstraction layer alpaka and avoiding other modifications to the application code. We introduce our approach in the face of conflicts between requirements and available features in the standards as well as practical hurdles posed by immature compiler support.
△ Less
Submitted 24 January, 2022; v1 submitted 16 October, 2021;
originally announced October 2021.
-
Combining Differential Privacy and Byzantine Resilience in Distributed SGD
Authors:
Rachid Guerraoui,
Nirupam Gupta,
Rafael Pinot,
Sebastien Rouault,
John Stephan
Abstract:
Privacy and Byzantine resilience (BR) are two crucial requirements of modern-day distributed machine learning. The two concepts have been extensively studied individually but the question of how to combine them effectively remains unanswered. This paper contributes to addressing this question by studying the extent to which the distributed SGD algorithm, in the standard parameter-server architectu…
▽ More
Privacy and Byzantine resilience (BR) are two crucial requirements of modern-day distributed machine learning. The two concepts have been extensively studied individually but the question of how to combine them effectively remains unanswered. This paper contributes to addressing this question by studying the extent to which the distributed SGD algorithm, in the standard parameter-server architecture, can learn an accurate model despite (a) a fraction of the workers being malicious (Byzantine), and (b) the other fraction, whilst being honest, providing noisy information to the server to ensure differential privacy (DP). We first observe that the integration of standard practices in DP and BR is not straightforward. In fact, we show that many existing results on the convergence of distributed SGD under Byzantine faults, especially those relying on $(α,f)$-Byzantine resilience, are rendered invalid when honest workers enforce DP. To circumvent this shortcoming, we revisit the theory of $(α,f)$-BR to obtain an approximate convergence guarantee. Our analysis provides key insights on how to improve this guarantee through hyperparameter optimization. Essentially, our theoretical and empirical results show that (1) an imprudent combination of standard approaches to DP and BR might be fruitless, but (2) by carefully re-tuning the learning algorithm, we can obtain reasonable learning accuracy while simultaneously guaranteeing DP and BR.
△ Less
Submitted 5 October, 2023; v1 submitted 8 October, 2021;
originally announced October 2021.
-
Probabilistic Time Series Forecasting with Implicit Quantile Networks
Authors:
Adèle Gouttes,
Kashif Rasul,
Mateusz Koren,
Johannes Stephan,
Tofigh Naghibi
Abstract:
Here, we propose a general method for probabilistic time series forecasting. We combine an autoregressive recurrent neural network to model temporal dynamics with Implicit Quantile Networks to learn a large class of distributions over a time-series target. When compared to other probabilistic neural forecasting models on real- and simulated data, our approach is favorable in terms of point-wise pr…
▽ More
Here, we propose a general method for probabilistic time series forecasting. We combine an autoregressive recurrent neural network to model temporal dynamics with Implicit Quantile Networks to learn a large class of distributions over a time-series target. When compared to other probabilistic neural forecasting models on real- and simulated data, our approach is favorable in terms of point-wise prediction accuracy as well as on estimating the underlying temporal distribution.
△ Less
Submitted 8 July, 2021;
originally announced July 2021.
-
Differential Privacy and Byzantine Resilience in SGD: Do They Add Up?
Authors:
Rachid Guerraoui,
Nirupam Gupta,
Rafaël Pinot,
Sébastien Rouault,
John Stephan
Abstract:
This paper addresses the problem of combining Byzantine resilience with privacy in machine learning (ML). Specifically, we study if a distributed implementation of the renowned Stochastic Gradient Descent (SGD) learning algorithm is feasible with both differential privacy (DP) and $(α,f)$-Byzantine resilience. To the best of our knowledge, this is the first work to tackle this problem from a theor…
▽ More
This paper addresses the problem of combining Byzantine resilience with privacy in machine learning (ML). Specifically, we study if a distributed implementation of the renowned Stochastic Gradient Descent (SGD) learning algorithm is feasible with both differential privacy (DP) and $(α,f)$-Byzantine resilience. To the best of our knowledge, this is the first work to tackle this problem from a theoretical point of view. A key finding of our analyses is that the classical approaches to these two (seemingly) orthogonal issues are incompatible. More precisely, we show that a direct composition of these techniques makes the guarantees of the resulting SGD algorithm depend unfavourably upon the number of parameters of the ML model, making the training of large models practically infeasible. We validate our theoretical results through numerical experiments on publicly-available datasets; showing that it is impractical to ensure DP and Byzantine resilience simultaneously.
△ Less
Submitted 24 June, 2021; v1 submitted 16 February, 2021;
originally announced February 2021.