-
Efficient Machine Unlearning by Model Splitting and Core Sample Selection
Authors:
Maximilian Egger,
Rawad Bitar,
Rüdiger Urbanke
Abstract:
Machine unlearning is essential for meeting legal obligations such as the right to be forgotten, which requires the removal of specific data from machine learning models upon request. While several approaches to unlearning have been proposed, existing solutions often struggle with efficiency and, more critically, with the verification of unlearning, particularly in the case of weak unlearning guarantees, where verification remains an open challenge. We introduce a generalized variant of the standard unlearning metric that enables more efficient and precise unlearning strategies. We also present an unlearning-aware training procedure that, in many cases, allows for exact unlearning. We term our approach MaxRR. When exact unlearning is not feasible, MaxRR still supports efficient unlearning with properties closely matching those achieved through full retraining.
Submitted 11 May, 2025;
originally announced May 2025.
-
Multi-Terminal Remote Generation and Estimation Over a Broadcast Channel With Correlated Priors
Authors:
Maximilian Egger,
Rawad Bitar,
Antonia Wachter-Zeh,
Nir Weinberger,
Deniz Gündüz
Abstract:
We study the multi-terminal remote estimation problem under a rate constraint, in which the goal of the encoder is to help each decoder estimate a function over a certain distribution. The distribution is known only to the encoder, whereas the function to be estimated is known only to the decoders and may differ from decoder to decoder. The decoders can observe correlated samples from prior distributions, instantiated through shared randomness with the encoder. To achieve this goal, we employ remote generation, where the encoder helps the decoders generate samples from the underlying distribution by using the samples from the prior through importance sampling. While methods such as minimal random coding can be used to efficiently transmit samples to each decoder individually using their importance scores, it is unknown whether the correlation among the samples from the priors can reduce the communication cost when a broadcast link is available. We propose a hierarchical importance sampling strategy that, when the priors of the decoders have non-zero Gács-Körner common information, facilitates a common sampling step that leverages the broadcast channel. This is followed by a refinement step for the individual decoders. We present upper bounds on the bias and the estimation error for unicast transmission, which are of independent interest. We then introduce a method that splits into two phases, dedicated to broadcast and unicast transmission, respectively, and show the resulting reduction in communication cost.
Submitted 11 May, 2025;
originally announced May 2025.
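For readers unfamiliar with remote generation, the following is a minimal sketch of unicast minimal random coding with importance sampling, as the technique is commonly described: encoder and decoder regenerate the same candidates from a shared seed, and only the index of an importance-weighted choice is sent. The Gaussian prior and target, the candidate count, and all function names are illustrative assumptions; the hierarchical broadcast step proposed in the paper is not modeled.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2); only ratios of densities are used below."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def shared_candidates(prior_sampler, num_candidates, shared_seed):
    """Encoder and decoder draw identical prior samples from the shared seed."""
    rng = random.Random(shared_seed)
    return [prior_sampler(rng) for _ in range(num_candidates)]


def mrc_encode(target_pdf, prior_pdf, prior_sampler, num_candidates, shared_seed, rng):
    """Encoder: pick a candidate index with probability proportional to its
    importance weight target(x) / prior(x); only the index is transmitted."""
    candidates = shared_candidates(prior_sampler, num_candidates, shared_seed)
    weights = [target_pdf(x) / prior_pdf(x) for x in candidates]
    return rng.choices(range(num_candidates), weights=weights, k=1)[0]


def mrc_decode(index, prior_sampler, num_candidates, shared_seed):
    """Decoder: regenerate the candidates and output the indexed one."""
    return shared_candidates(prior_sampler, num_candidates, shared_seed)[index]


if __name__ == "__main__":
    prior_sampler = lambda r: r.gauss(0.0, 1.0)        # prior N(0, 1)
    prior_pdf = lambda x: gaussian_pdf(x, 0.0, 1.0)
    target_pdf = lambda x: gaussian_pdf(x, 1.0, 0.5)   # target N(1, 0.25)
    decoded = []
    for t in range(2000):
        idx = mrc_encode(target_pdf, prior_pdf, prior_sampler, 64, t, random.Random(10**6 + t))
        decoded.append(mrc_decode(idx, prior_sampler, 64, t))
    print("mean of decoded samples:", sum(decoded) / len(decoded))  # close to 1
```

Sending the index costs about log2(num_candidates) bits; in the usual analysis, the candidate count must grow roughly exponentially in the KL divergence between target and prior for the decoded sample to be close in distribution to the target.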
-
Source Anonymity for Private Random Walk Decentralized Learning
Authors:
Maximilian Egger,
Svenja Lage,
Rawad Bitar,
Antonia Wachter-Zeh
Abstract:
This paper considers random walk-based decentralized learning, where at each iteration of the learning process, one user updates the model and sends it to a randomly chosen neighbor until a convergence criterion is met. Preserving data privacy is a central concern and open problem in decentralized learning. We propose a privacy-preserving algorithm based on public-key cryptography and anonymization. In this algorithm, the user updates the model and encrypts the result using a distant user's public key. The encrypted result is then transmitted through the network with the goal of reaching that specific user. The key idea is to hide the source's identity so that, when the destination user decrypts the result, it does not know who the source was. The challenge is to design a network-dependent probability distribution (at the source) over the potential destinations such that, from the receiver's perspective, all users have a similar likelihood of being the source. We introduce the problem and construct a scheme that provides anonymity with theoretical guarantees. We focus on random regular graphs to establish rigorous guarantees.
Submitted 11 May, 2025;
originally announced May 2025.
-
Federated One-Shot Learning with Data Privacy and Objective-Hiding
Authors:
Maximilian Egger,
Rüdiger Urbanke,
Rawad Bitar
Abstract:
Privacy in federated learning is crucial, encompassing two key aspects: safeguarding the privacy of clients' data and maintaining the privacy of the federator's objective from the clients. While the first aspect has been extensively studied, the second has received much less attention.
We present a novel approach that addresses both concerns simultaneously, drawing inspiration from techniques in knowledge distillation and private information retrieval to provide strong information-theoretic privacy guarantees.
Traditional private function computation methods could be used here; however, they are typically limited to linear or polynomial functions. To overcome these constraints, our approach unfolds in three stages. In stage 0, clients perform the necessary computations locally. In stage 1, these results are shared among the clients, and in stage 2, the federator retrieves its desired objective without compromising the privacy of the clients' data. The crux of the method is a carefully designed protocol that combines secret-sharing-based multi-party computation and a graph-based private information retrieval scheme. We show that our method outperforms existing tools from the literature when properly adapted to this setting.
Submitted 29 April, 2025;
originally announced April 2025.
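The protocol's first ingredient, secret-sharing-based multi-party computation, rests on a simple primitive. As a hedged illustration only, the sketch below shows textbook Shamir secret sharing over a prime field and its linearity, which lets parties add shared values without seeing them. The field size, threshold, and function names are assumptions, and the graph-based private information retrieval step is not modeled.

```python
import random

PRIME = 2**61 - 1  # a Mersenne prime; the field size is an illustrative choice


def share(secret, num_shares, threshold, rng):
    """Shamir sharing: evaluate a random degree-(threshold-1) polynomial with
    constant term `secret` at x = 1..num_shares. Any threshold - 1 shares are
    uniformly distributed and reveal nothing about the secret."""
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def add_shares(a, b):
    """Shares (taken at the same x values) are linear: adding them pointwise
    yields shares of the sum, without any party learning either summand."""
    return [(x, (ya + yb) % PRIME) for (x, ya), (_, yb) in zip(a, b)]


def reconstruct(shares):
    """Lagrange interpolation at x = 0 from at least `threshold` shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

For example, `reconstruct(share(42, 5, 3, random.Random(0))[:3])` returns 42; the modular inverse via `pow(den, -1, PRIME)` requires Python 3.8 or later.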
-
BICompFL: Stochastic Federated Learning with Bi-Directional Compression
Authors:
Maximilian Egger,
Rawad Bitar,
Antonia Wachter-Zeh,
Nir Weinberger,
Deniz Gündüz
Abstract:
We address the prominent communication bottleneck in federated learning (FL). We specifically consider stochastic FL, in which models or compressed model updates are specified by distributions rather than deterministic parameters. Stochastic FL offers a principled approach to compression, and has been shown to reduce the communication load under perfect downlink transmission from the federator to the clients. However, in practice, both the uplink and downlink communications are constrained. We show that bi-directional compression for stochastic FL has inherent challenges, which we address by introducing BICompFL. BICompFL is experimentally shown to reduce the communication cost by an order of magnitude compared to multiple benchmarks, while maintaining state-of-the-art accuracies. Theoretically, we study the communication cost of BICompFL through a new analysis of an importance-sampling-based technique, which exposes the interplay between uplink and downlink communication costs.
Submitted 31 January, 2025;
originally announced February 2025.
-
Byzantine-Resilient Zero-Order Optimization for Communication-Efficient Heterogeneous Federated Learning
Authors:
Maximilian Egger,
Mayank Bakshi,
Rawad Bitar
Abstract:
We introduce CyBeR-0, a Byzantine-resilient federated zero-order optimization method that is robust under Byzantine attacks and provides significant savings in uplink and downlink communication costs. We introduce transformed robust aggregation to give convergence guarantees for general non-convex objectives under client data heterogeneity. Empirical evaluations for standard learning tasks and fine-tuning large language models show that CyBeR-0 exhibits stable performance with a per-round communication cost of only a few scalars and reduced memory requirements.
Submitted 31 January, 2025;
originally announced February 2025.
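The few-scalars-per-round communication can be illustrated with a hypothetical sketch of seeded two-point zero-order estimation: all parties regenerate the same random direction from a shared seed, each client sends only a finite-difference scalar, and the server aggregates the scalars robustly. A plain median is swapped in here in place of the paper's transformed robust aggregation; the quadratic losses, step size, and function names are assumptions.

```python
import random
import statistics


def direction(seed, dim):
    """Random Gaussian direction that every party can regenerate from the seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def client_scalar(loss, params, seed, mu=1e-3):
    """Client: two-point finite difference of its local loss along the shared
    direction. This single scalar is the whole uplink message."""
    z = direction(seed, len(params))
    plus = [p + mu * zi for p, zi in zip(params, z)]
    minus = [p - mu * zi for p, zi in zip(params, z)]
    return (loss(plus) - loss(minus)) / (2 * mu)


def server_update(params, scalars, seed, lr):
    """Server: robustly aggregate the scalars (plain median here, not the
    paper's transformed robust aggregation) and step along the regenerated
    direction. Broadcasting the aggregated scalar lets clients apply the same
    step, so the downlink is a single scalar as well."""
    g = statistics.median(scalars)
    z = direction(seed, len(params))
    return [p - lr * g * zi for p, zi in zip(params, z)]
```

With at most a minority of Byzantine clients, the median of the scalars stays within the range of honest values, which is why a single arbitrary scalar cannot derail the update in this toy setting.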
-
Between Close Enough to Reveal and Far Enough to Protect: a New Privacy Region for Correlated Data
Authors:
Luis Maßny,
Rawad Bitar,
Fangwei Ye,
Salim El Rouayheb
Abstract:
When users make personal privacy choices, correlation between their data can cause inadvertent leakage: users who share their data may reveal information about users who do not want to share theirs. As a solution, we consider local redaction mechanisms. Since prior works proposed data-independent privatization mechanisms, we study the family of data-independent local redaction mechanisms and upper-bound their utility when data correlation is modeled by a stationary Markov process. In contrast, we derive a novel data-dependent mechanism, which improves the utility by leveraging a data-dependent leakage measure.
Submitted 24 January, 2025;
originally announced January 2025.
-
CAT and DOG: Improved Codes for Private Distributed Matrix Multiplication
Authors:
Christoph Hofmeister,
Rawad Bitar,
Antonia Wachter-Zeh
Abstract:
We present novel constructions of polynomial codes for private distributed matrix multiplication (PDMM/SDMM) using outer product partitioning (OPP). We extend the degree table framework from the literature to cyclic-addition degree tables (CATs). By using roots of unity as evaluation points, we enable modulo-addition in the table. Based on CATs, we present an explicit construction, called CATx, that requires fewer workers than existing schemes in the low-privacy regime. Additionally, we present new families of schemes based on conventional degree tables, called GASPrs and DOGrs, that outperform the state of the art for a wide range of parameters.
Submitted 1 March, 2025; v1 submitted 21 January, 2025;
originally announced January 2025.
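The degree table bookkeeping behind such constructions can be sketched as follows, under our reading of the framework: each wanted product A_i B_j must occupy an exponent of the product polynomial that no other term shares, and the number of workers is the number of distinct exponents. The function, its parameters, and the `modulus` option (a stand-in for the wrap-around that roots of unity induce) are hypothetical; the sketch does not check T-privacy or conditions on the evaluation points and is not the CATx, GASPrs, or DOGrs construction.

```python
from itertools import product


def degree_table_workers(alpha, beta, K, L, modulus=None):
    """Count the distinct exponents of the product polynomial f(x) g(x), or
    return None if some wanted product A_i B_j is not recoverable.

    alpha: exponents in f; the first K multiply data blocks A_1..A_K, the rest
           multiply random (privacy) blocks.
    beta:  exponents in g; the first L multiply data blocks B_1..B_L.
    modulus: if given, exponents add modulo it (cyclic addition).
    Privacy conditions and the choice of evaluation points are NOT checked.
    """
    def add(a, b):
        return (a + b) % modulus if modulus is not None else a + b

    wanted = [add(alpha[i], beta[j]) for i in range(K) for j in range(L)]
    others = [add(alpha[i], beta[j])
              for i, j in product(range(len(alpha)), range(len(beta)))
              if not (i < K and j < L)]
    # Each wanted product needs its own exponent, shared with no other term.
    if len(set(wanted)) != len(wanted) or set(wanted) & set(others):
        return None
    # One evaluation (worker) per distinct exponent of the product polynomial.
    return len(set(wanted) | set(others))
```

For instance, `degree_table_workers([0, 1, 4], [0, 2, 4], 2, 2)` describes K = L = 2 data blocks with one random block per matrix and returns 8; these exponents are our own toy choice, not taken from any published scheme. With `modulus=7`, the random-times-random exponent 8 wraps to 1 and collides with a wanted product, so the function returns None.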
-
Scalable and Reliable Over-the-Air Federated Edge Learning
Authors:
Maximilian Egger,
Christoph Hofmeister,
Cem Kaya,
Rawad Bitar,
Antonia Wachter-Zeh
Abstract:
Federated edge learning (FEEL) has emerged as a core paradigm for large-scale optimization. However, FEEL still suffers from a communication bottleneck due to the transmission of high-dimensional model updates from the clients to the federator. Over-the-air computation (AirComp) leverages the additive property of multiple-access channels by aggregating the clients' updates over the channel to save communication resources. While analog uncoded transmission can benefit from the increased signal-to-noise ratio (SNR) due to the simultaneous transmission of many clients, potential errors may severely harm the learning process at low SNRs. To alleviate this problem, channel coding approaches were recently proposed for AirComp in FEEL. However, their error-correction capability degrades as the number of clients increases. We propose a digital lattice-based code construction whose error-correction capability remains constant in the number of clients, and compare it to nested-lattice codes, which are well known for their optimal rate and power efficiency in the point-to-point AWGN channel.
Submitted 16 July, 2024;
originally announced July 2024.
-
Self-Regulating Random Walks for Resilient Decentralized Learning on Graphs
Authors:
Maximilian Egger,
Rawad Bitar,
Ghadir Ayache,
Antonia Wachter-Zeh,
Salim El Rouayheb
Abstract:
Consider the setting of multiple random walks (RWs) on a graph executing a certain computational task. For instance, in decentralized learning via RWs, a model is updated at each iteration based on the local data of the visited node and then passed to a randomly chosen neighbor. RWs can fail due to node or link failures. The goal is to maintain a desired number of RWs to ensure failure resilience. Achieving this is challenging due to the lack of a central entity that tracks which RWs have failed and replaces them with new ones by forking (duplicating) surviving ones. Without duplications, the number of RWs will eventually go to zero, causing a catastrophic failure of the system. We propose two decentralized algorithms, called DecAFork and DecAFork+, that can maintain the number of RWs in the graph around a desired value even in the presence of arbitrary RW failures. Nodes continuously estimate the number of surviving RWs by estimating their return time distribution and fork the RWs when failures are likely to happen. DecAFork+ additionally allows RWs to be terminated, so that forking does not overload the network with too many RWs. We present extensive numerical simulations that show the performance of DecAFork and DecAFork+ in terms of fast detection of and reaction to failures compared to a baseline, and establish theoretical guarantees on the performance of both algorithms.
Submitted 10 February, 2025; v1 submitted 16 July, 2024;
originally announced July 2024.
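The estimation step in DecAFork relies on the return-time distribution. As a deliberately simplified, hypothetical stand-in, the sketch below uses only the mean visit rate: by Kac's lemma, a stationary RW on a connected regular graph with n nodes returns to a given node every n steps on average, so a node can turn its recent visit count into an estimate of the number of surviving RWs and fork when that estimate drops below the target. The complete-graph simulation, window, tolerance, and function names are assumptions.

```python
import random


def simulate_visits(num_walks, num_nodes, steps, node, rng):
    """Run independent RWs on the complete graph K_n (each step moves to a
    uniformly random other node) and record the times `node` is visited."""
    positions = [rng.randrange(num_nodes) for _ in range(num_walks)]
    visits = []
    for t in range(steps):
        for w in range(num_walks):
            nxt = rng.randrange(num_nodes - 1)
            # Skip over the current position to move to a different node.
            positions[w] = nxt if nxt < positions[w] else nxt + 1
            if positions[w] == node:
                visits.append(t)
    return visits


def estimate_num_walks(visit_times, now, num_nodes, window):
    """k stationary walks on a regular graph produce about
    k * window / num_nodes visits to a fixed node within a window."""
    recent = sum(1 for t in visit_times if now - window < t <= now)
    return recent * num_nodes / window


def should_fork(visit_times, now, num_nodes, window, desired, tolerance):
    """Hypothetical rule: fork when the estimate falls clearly below target."""
    return estimate_num_walks(visit_times, now, num_nodes, window) < desired - tolerance
```

A mean-rate rule like this reacts slowly, because it needs a long window to average out randomness; using the full return-time distribution, as the abstract describes, is what allows faster detection.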
-
Capacity-Maximizing Input Symbol Selection for Discrete Memoryless Channels
Authors:
Maximilian Egger,
Rawad Bitar,
Antonia Wachter-Zeh,
Deniz Gündüz,
Nir Weinberger
Abstract:
Motivated by communication systems with constrained complexity, we consider the problem of input symbol selection for discrete memoryless channels (DMCs). Given a DMC, the goal is to find a subset of its input alphabet such that the capacity achieved by the optimal input distribution supported only on these symbols is the largest among all subsets of the same size (or smaller). We observe that the resulting optimization problem is non-concave and non-submodular, so generic methods for such problems do not come with theoretical guarantees. We derive an analytical upper bound on the capacity loss incurred by selecting a subset of input symbols, based only on the properties of the channel's transition matrix. We propose a selection algorithm based on input-symbol clustering and an appropriate choice of representatives for each cluster, which uses the theoretical bound as a surrogate objective function. We provide numerical experiments to support the findings.
Submitted 1 July, 2024;
originally announced July 2024.
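Because the selection problem above is stated in terms of subset capacities, a small sketch may help: the capacity of the channel restricted to a subset of inputs can be computed with the standard Blahut-Arimoto iteration, and for tiny alphabets the best subset can be found by exhaustive search. The exhaustive search is only a baseline assumed for illustration; it is not the clustering-based algorithm or the analytical bound from the paper, and the function names are hypothetical.

```python
import math
from itertools import combinations


def capacity_bits(W, max_iters=5000, tol=1e-12):
    """Blahut-Arimoto algorithm: capacity in bits of the DMC whose row x is the
    output distribution W[x] for input symbol x."""
    nx, ny = len(W), len(W[0])
    p = [1.0 / nx] * nx
    lower = 0.0
    for _ in range(max_iters):
        q = [sum(p[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        # d[x] = KL divergence (nats) between W[x] and the output distribution q
        d = [sum(w * math.log(w / q[y]) for y, w in enumerate(W[x]) if w > 0)
             for x in range(nx)]
        r = [p[x] * math.exp(d[x]) for x in range(nx)]
        total = sum(r)
        lower, upper = math.log(total), max(d)  # lower <= capacity <= upper
        p = [v / total for v in r]
        if upper - lower < tol:
            break
    return lower / math.log(2)


def best_input_subset(W, k):
    """Exhaustive search over all k-subsets of input symbols. This is an
    exponential-time baseline only, not the clustering algorithm of the paper."""
    best = max(combinations(range(len(W)), k),
               key=lambda S: capacity_bits([W[x] for x in S]))
    return best, capacity_bits([W[x] for x in best])
```

For example, on the three-input channel `[[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]`, the first two inputs are indistinguishable, so any good 2-subset must contain input 2, and its capacity equals that of a binary symmetric channel with crossover probability 0.1.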
-
Communication-Efficient Byzantine-Resilient Federated Zero-Order Optimization
Authors:
Afonso de Sá Delgado Neto,
Maximilian Egger,
Mayank Bakshi,
Rawad Bitar
Abstract:
We introduce CYBER-0, the first zero-order optimization algorithm for memory- and communication-efficient Federated Learning that is resilient to Byzantine faults. We show through extensive numerical experiments on the MNIST dataset and fine-tuning of RoBERTa-Large that CYBER-0 outperforms state-of-the-art algorithms in terms of communication and memory efficiency while reaching similar accuracy. We provide theoretical guarantees on its convergence for convex loss functions.
Submitted 20 June, 2024;
originally announced June 2024.
-
LoByITFL: Low Communication Secure and Private Federated Learning
Authors:
Yue Xia,
Christoph Hofmeister,
Maximilian Egger,
Rawad Bitar
Abstract:
Federated Learning (FL) faces several challenges, such as the privacy of the clients' data and security against Byzantine clients. Existing works that treat privacy and security jointly make sacrifices on the privacy guarantee. In this work, we introduce LoByITFL, the first communication-efficient Information-Theoretic (IT) private and secure FL scheme that makes no sacrifices on the privacy guarantees while ensuring security against Byzantine adversaries. The key ingredients are a small and representative dataset available to the federator, a careful transformation of the FLTrust algorithm, and the use of a trusted third party only in a one-time preprocessing phase before the start of the learning algorithm. We provide theoretical guarantees on privacy and Byzantine resilience, as well as a convergence guarantee and experimental results validating our theoretical findings.
Submitted 29 May, 2024;
originally announced May 2024.
-
Byzantine-Resilient Secure Aggregation for Federated Learning Without Privacy Compromises
Authors:
Yue Xia,
Christoph Hofmeister,
Maximilian Egger,
Rawad Bitar
Abstract:
Federated learning (FL) shows great promise in large-scale machine learning, but brings new risks in terms of privacy and security. We propose ByITFL, a novel scheme for FL that provides resilience against Byzantine users while keeping the users' data private from both the federator and the other users. The scheme builds on the preexisting non-private FLTrust scheme, which tolerates malicious users through trust scores (TS) that attenuate or amplify the users' gradients. The trust scores are based on the ReLU function, which we approximate by a polynomial. The distributed and privacy-preserving computation in ByITFL is designed using a combination of Lagrange coded computing, verifiable secret sharing, and re-randomization steps. ByITFL is the first Byzantine-resilient scheme for FL with full information-theoretic privacy.
Submitted 8 July, 2024; v1 submitted 14 May, 2024;
originally announced May 2024.
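The trust-score mechanism that ByITFL inherits from FLTrust can be illustrated without any privacy machinery. The sketch below follows FLTrust's published aggregation rule as we understand it (ReLU-clipped cosine similarity to the federator's own gradient, rescaling each client gradient to the federator gradient's norm, trust-weighted averaging) and shows one possible polynomial surrogate for ReLU. The quadratic, the example vectors, and the function names are assumptions; ByITFL's actual polynomial and its secret-shared evaluation are not reproduced.

```python
import math


def relu(x):
    return max(0.0, x)


def relu_quadratic(x):
    """Least-squares quadratic fit of ReLU on [-1, 1]: ReLU(x) = (x + |x|) / 2,
    with |x| approximated by 3/16 + (15/16) x^2. Max error 3/32, at x = 0."""
    return 3 / 32 + x / 2 + 15 * x * x / 32


def norm(v):
    return math.sqrt(sum(a * a for a in v))


def fltrust_aggregate(client_grads, server_grad, trust_fn=relu):
    """FLTrust-style aggregation in the clear: trust score = trust_fn(cosine
    similarity to the federator's gradient); each client gradient is rescaled
    to the federator gradient's norm; the result is the trust-weighted mean."""
    n0 = norm(server_grad)
    acc = [0.0] * len(server_grad)
    total = 0.0
    for g in client_grads:
        ng = norm(g)
        cos = sum(a * b for a, b in zip(g, server_grad)) / (ng * n0)
        ts = trust_fn(cos)
        acc = [s + ts * (n0 / ng) * v for s, v in zip(acc, g)]
        total += ts
    return [s / total for s in acc] if total > 0 else [0.0] * len(server_grad)
```

Cosine similarities always lie in [-1, 1], which is why a polynomial fitted on that interval is a natural surrogate; the trade-off is that a gradient pointing exactly opposite to the federator's receives a small positive weight (1/16 here) instead of zero.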
-
Secure Storage using Maximally Recoverable Locally Repairable Codes
Authors:
Tim Janz,
Hedongliang Liu,
Rawad Bitar,
Frank R. Kschischang
Abstract:
This paper considers data secrecy in distributed storage systems (DSSs) using maximally recoverable locally repairable codes (MR-LRCs). Conventional MR-LRCs are in general not secure against eavesdroppers who can observe the transmitted data during a global repair operation. This work enables a nonzero secrecy dimension for DSSs encoded by MR-LRCs through a new repair framework. The key idea is to associate each local group with a central processing unit (CPU), which aggregates the contributions from the intact nodes of its group and transmits them to the CPU of a group needing a global repair. The aggregation is enabled by so-called local polynomials that can be generated independently in each group. Two different schemes, direct repair and forwarded repair, are considered, and their secrecy dimension using MR-LRCs is derived. A positive secrecy dimension is enabled for several parameter regimes.
Submitted 9 May, 2024;
originally announced May 2024.
-
Interactive Byzantine-Resilient Gradient Coding for General Data Assignments
Authors:
Shreyas Jain,
Luis Maßny,
Christoph Hofmeister,
Eitan Yaakobi,
Rawad Bitar
Abstract:
We tackle the problem of Byzantine errors in distributed gradient descent within the Byzantine-resilient gradient coding framework. Our proposed solution can recover the exact full gradient in the presence of $s$ malicious workers with a data replication factor of only $s+1$. It generalizes previous solutions to any data assignment scheme that has a regular replication over all data samples. The scheme detects malicious workers through additional interactive communication and a small number of local computations at the main node, leveraging group-wise comparisons between workers with a provably optimal grouping strategy. The scheme requires at most $s$ interactive rounds that incur a total communication cost logarithmic in the number of data samples.
Submitted 30 January, 2024;
originally announced January 2024.
-
Achieving DNA Labeling Capacity with Minimum Labels through Extremal de Bruijn Subgraphs
Authors:
Christoph Hofmeister,
Anina Gruica,
Dganit Hanania,
Rawad Bitar,
Eitan Yaakobi
Abstract:
DNA labeling is a tool in molecular biology and biotechnology to visualize, detect, and study DNA at the molecular level. In this process, a DNA molecule is labeled by a set of specific patterns, referred to as labels, and is then imaged. The resulting image is modeled as an $(\ell+1)$-ary sequence, where $\ell$ is the number of labels, in which any non-zero symbol indicates the appearance of the corresponding label in the DNA molecule. The labeling capacity refers to the maximum information rate that can be achieved by the labeling process for any given set of labels. The main goal of this paper is to study the minimum number of labels of the same length required to achieve the maximum labeling capacity of 2 for DNA sequences, or $\log_2 q$ for an arbitrary alphabet of size $q$. Solving this problem requires studying the path-unique subgraphs of the de Bruijn graph with the largest number of edges, and we provide upper and lower bounds on this value.
Submitted 28 January, 2024;
originally announced January 2024.
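The image model described above can be made concrete with a small sketch. We assume (our convention, not necessarily the paper's) that a non-zero symbol is placed at the position where a label starts, and we measure, for short sequences, how many bits per nucleotide can be distinguished from the images. The function names are hypothetical, and the finite-length rate is only an illustration of the capacity notion, which is defined as a limit over growing sequence lengths.

```python
import math
from itertools import product


def label_image(dna, labels):
    """Model of the imaged molecule: position i gets symbol j (1-based) if
    labels[j-1] starts at i, and 0 otherwise. Assumes distinct labels of equal
    length, so at most one label can start at any position."""
    m = len(labels[0])
    assert all(len(lab) == m for lab in labels) and len(set(labels)) == len(labels)
    index = {lab: j + 1 for j, lab in enumerate(labels)}
    return tuple(index.get(dna[i:i + m], 0) for i in range(len(dna)))


def empirical_rate(labels, n, alphabet="ACGT"):
    """Bits per symbol distinguishable from the images of all length-n
    sequences: log2(#distinct images) / n. A finite-n illustration only."""
    images = {label_image("".join(s), labels) for s in product(alphabet, repeat=n)}
    return math.log2(len(images)) / n
```

For example, `label_image("ACGTAC", ["AC", "GT"])` returns `(1, 0, 2, 0, 1, 0)`. With single-nucleotide labels, the three labels `["A", "C", "G"]` already reach the maximum rate of 2, since an unlabeled position must be a `T`; the paper addresses this minimum-label question for labels of general length.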
-
Maximal-Capacity Discrete Memoryless Channel Identification
Authors:
Maximilian Egger,
Rawad Bitar,
Antonia Wachter-Zeh,
Deniz Gündüz,
Nir Weinberger
Abstract:
The problem of identifying the channel with the highest capacity among several discrete memoryless channels (DMCs) is considered. The problem is cast as a pure-exploration multi-armed bandit problem, which mirrors the practical use of training sequences to sense the communication channel statistics. A capacity estimator is proposed and tight confidence bounds on the estimator error are derived. Based on this capacity estimator, a gap-elimination algorithm termed BestChanID is proposed, which is oblivious to the capacity-achieving input distribution and is guaranteed to output the DMC with the largest capacity, with a desired confidence. Furthermore, two additional algorithms, NaiveChanSel and MedianChanEl, which output with a certain confidence a DMC with capacity close to the maximal, are introduced. Each of these algorithms is beneficial in a different regime and can be used as a subroutine in BestChanID. The sample complexity of all algorithms is analyzed as a function of the desired confidence parameter, the number of channels, and the channels' input and output alphabet sizes. The cost of best channel identification is shown to scale quadratically with the alphabet size, and a fundamental lower bound for the required number of channel senses to identify the best channel with a certain confidence is derived.
Submitted 18 January, 2024;
originally announced January 2024.
-
Byzantine-Resilient Gradient Coding through Local Gradient Computations
Authors:
Christoph Hofmeister,
Luis Maßny,
Eitan Yaakobi,
Rawad Bitar
Abstract:
We consider gradient coding in the presence of an adversary controlling so-called malicious workers trying to corrupt the computations. Previous works propose the use of MDS codes to treat the responses from malicious workers as errors and correct them using the error-correction properties of the code. This comes at the expense of increasing the replication, i.e., the number of workers each partial gradient is computed by. In this work, we propose a way to reduce the replication to $s+1$ instead of $2s+1$ in the presence of $s$ malicious workers. Our method detects erroneous inputs from the malicious workers, transforming them into erasures. This comes at the expense of $s$ additional local computations at the main node and additional rounds of light communication between the main node and the workers. We define a general framework and give fundamental limits for fractional repetition data allocations. Our scheme is optimal in terms of replication and local computation and incurs a communication cost that is asymptotically, in the size of the dataset, a multiplicative factor away from the derived bound. We furthermore show how additional redundancy can be exploited to reduce the number of local computations and communication cost, or, alternatively, tolerate straggling workers.
Submitted 5 January, 2024; v1 submitted 4 January, 2024;
originally announced January 2024.
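To make the replication argument concrete (replication s + 1 suffices for detection, since every group of s + 1 workers must contain an honest one), here is a hypothetical sketch of a fractional repetition assignment with group-wise comparison. Resolving a disputed group by recomputing the whole part at the main node is our simplification; the paper's scheme instead uses additional rounds of light communication and is designed to be optimal in replication and local computation. Function names and data are assumptions.

```python
def fractional_repetition(num_workers, s):
    """Split the workers into groups of s + 1; each group shares one data part,
    so every partial gradient is replicated s + 1 times."""
    assert num_workers % (s + 1) == 0
    return [list(range(g, g + s + 1)) for g in range(0, num_workers, s + 1)]


def robust_full_gradient(groups, responses, recompute):
    """responses[w]: worker w's partial gradient (a tuple) for its group's part.
    With at most s malicious workers, every group contains an honest worker, so
    unanimous groups are correct and any disagreement exposes tampering. As a
    simplification, the main node then recomputes that part itself."""
    total, local = None, 0
    for part, group in enumerate(groups):
        answers = {responses[w] for w in group}
        if len(answers) == 1:
            value = answers.pop()
        else:
            value, local = recompute(part), local + 1
        total = value if total is None else tuple(a + b for a, b in zip(total, value))
    return total, local
```

Each disputed group contains at least one malicious worker, so the number of local recomputations in this sketch never exceeds s, matching the count of additional local computations stated in the abstract.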
-
Private Inference in Quantized Models
Authors:
Zirui Deng,
Vinayak Ramkumar,
Rawad Bitar,
Netanel Raviv
Abstract:
A typical setup in many machine learning scenarios involves a server that holds a model and a user that possesses data, and the challenge is to perform inference while safeguarding the privacy of both parties. Private inference has been extensively explored in recent years, mainly from a cryptographic standpoint via techniques like homomorphic encryption and multiparty computation. These approaches often come with high computational overhead and may degrade the accuracy of the model. In our work, we take a different approach inspired by the Private Information Retrieval literature. We view private inference as the task of retrieving inner products of parameter vectors with the data, a fundamental operation in many machine learning models. We introduce schemes that enable such retrieval of inner products for models with quantized (i.e., restricted to a finite set) weights; such models are extensively used in practice due to a wide range of benefits. In addition, our schemes uncover a fundamental tradeoff between user and server privacy. Our information-theoretic approach is applicable to a wide range of problems and provides robust privacy guarantees for both the user and the server.
Submitted 22 November, 2023;
originally announced November 2023.
-
Sparsity and Privacy in Secret Sharing: A Fundamental Trade-Off
Authors:
Rawad Bitar,
Maximilian Egger,
Antonia Wachter-Zeh,
Marvin Xhemrishi
Abstract:
This work investigates the design of sparse secret sharing schemes that encode a sparse private matrix into sparse shares. This investigation is motivated by distributed computing, where the multiplication of sparse and private matrices is moved from a computationally weak main node to untrusted worker machines. Classical secret-sharing schemes produce dense shares. However, sparsity can help speed up the computation. We show that, for matrices with i.i.d. entries, sparsity in the shares comes at a fundamental cost of weaker privacy. We derive a fundamental tradeoff between sparsity and privacy and construct optimal sparse secret sharing schemes that produce shares that leak the minimum amount of information for a desired sparsity of the shares. We apply our schemes to distributed sparse and private matrix multiplication schemes with no colluding workers while tolerating stragglers. For the setting of two non-communicating clusters of workers, we design a sparse one-time pad so that no private information is leaked to a cluster of untrusted and colluding workers, and the shares with bounded but non-zero leakage are assigned to a cluster of partially trusted workers. We conclude by discussing the necessity of using permutations for matrices with correlated entries.
Submitted 11 August, 2023;
originally announced August 2023.
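Why classical secret sharing destroys sparsity can be seen with the simplest scheme, a two-share one-time pad over $\mathbb{Z}_q$. Each share is uniformly distributed, so it reveals nothing on its own, but this uniformity also means only about a $1/q$ fraction of its entries are zero, however sparse the secret is. The sketch below assumes a hypothetical prime alphabet size `q = 257` and a synthetic secret with 90% zeros. It illustrates the dense baseline only, not the paper's sparse construction.

```python
import random


def additive_shares(secret, q, rng):
    """Two-share one-time pad over Z_q: shares (r, x - r mod q) with r uniform."""
    pad = [rng.randrange(q) for _ in secret]
    masked = [(x - r) % q for x, r in zip(secret, pad)]
    return pad, masked


def reconstruct(share1, share2, q):
    """Adding both shares mod q recovers the secret."""
    return [(a + b) % q for a, b in zip(share1, share2)]


def zero_fraction(vec):
    return sum(1 for x in vec if x == 0) / len(vec)


q = 257  # hypothetical prime alphabet size
rng = random.Random(1)
secret = [0] * 900 + [rng.randrange(1, q) for _ in range(100)]  # 90% zeros
pad, masked = additive_shares(secret, q, rng)
```

Here `zero_fraction(secret)` is `0.9`, while both shares have only a handful of zeros out of 1000 entries, which is the density cost that the sparsity/privacy trade-off quantifies.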
-
Sparse and Private Distributed Matrix Multiplication with Straggler Tolerance
Authors:
Maximilian Egger,
Marvin Xhemrishi,
Antonia Wachter-Zeh,
Rawad Bitar
Abstract:
This paper considers the problem of outsourcing the multiplication of two private and sparse matrices to untrusted workers. Secret sharing schemes can be used to tolerate stragglers and guarantee information-theoretic privacy of the matrices. However, traditional secret sharing schemes destroy all sparsity in the offloaded computational tasks. Since exploiting the sparse nature of matrices was shown to speed up the multiplication process, preserving the sparsity of the input matrices in the computational tasks sent to the workers is desirable. It was recently shown that sparsity can be guaranteed at the expense of a weaker privacy guarantee, but such sparse secret sharing schemes were constructed only for two output shares. In this work, we construct sparse secret sharing schemes that generalize Shamir's secret sharing schemes for a fixed threshold $t=2$ and an arbitrarily large number of shares. We design our schemes to provide the strongest privacy guarantee given a desired sparsity of the shares under some mild assumptions. We show that increasing the number of shares, i.e., increasing straggler tolerance, incurs a degradation of the privacy guarantee. However, this degradation is negligible when the number of shares is small compared to the cardinality of the input alphabet.
Submitted 26 June, 2023;
originally announced June 2023.
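For reference, the classical (dense) Shamir scheme with threshold $t=2$ that the paper generalizes encodes a secret as a random line $f(x) = s + a x \bmod p$ and hands worker $i$ the point $(i, f(i))$. Any single share is uniform, since $a$ is uniform and $i$ is invertible mod $p$; any two shares determine the line and hence $f(0)=s$. The sketch below uses a hypothetical prime modulus $p = 2^{31}-1$ and is not the paper's sparse variant.

```python
import random

P = 2**31 - 1  # hypothetical prime modulus; any prime larger than the secrets works


def shamir_shares(secret, n, rng, p=P):
    """Threshold-2 Shamir: f(x) = secret + a*x mod p; share i is (i, f(i))."""
    a = rng.randrange(p)
    return [(i, (secret + a * i) % p) for i in range(1, n + 1)]


def shamir_reconstruct(share1, share2, p=P):
    """Lagrange interpolation of the line through two shares, evaluated at x = 0."""
    (x1, y1), (x2, y2) = share1, share2
    # f(0) = y1 * x2 / (x2 - x1) + y2 * x1 / (x1 - x2)  (mod p)
    l1 = x2 * pow((x2 - x1) % p, -1, p) % p
    l2 = x1 * pow((x1 - x2) % p, -1, p) % p
    return (y1 * l1 + y2 * l2) % p


shares = shamir_shares(123456, 5, random.Random(0))
```

Any pair from `shares` reconstructs `123456`; `pow(x, -1, p)` (Python 3.8+) computes the modular inverse.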
-
Private Aggregation in Hierarchical Wireless Federated Learning with Partial and Full Collusion
Authors:
Maximilian Egger,
Christoph Hofmeister,
Antonia Wachter-Zeh,
Rawad Bitar
Abstract:
In federated learning, a federator coordinates the training of a model, e.g., a neural network, on privately owned data held by several participating clients. The gradient descent algorithm, a well-known and popular iterative optimization procedure, is run to train the model. Every client computes partial gradients based on their local data and sends them to the federator, which aggregates the results and updates the model. Privacy of the clients' data is a major concern. In fact, it is shown that observing the partial gradients can be enough to reveal the clients' data. Existing literature focuses on private aggregation schemes that tackle the privacy problem in federated learning in settings where all users are connected to each other and to the federator. In this paper, we consider a hierarchical wireless system architecture in which the clients are connected to base stations; the base stations are connected to the federator either directly or through relays. We examine settings with and without relays, and derive fundamental limits on the communication cost under information-theoretic privacy with different collusion assumptions. We introduce suitable private aggregation schemes tailored for these settings whose communication costs are multiplicative factors away from the derived bounds.
Submitted 18 July, 2024; v1 submitted 24 June, 2023;
originally announced June 2023.
-
Fast and Straggler-Tolerant Distributed SGD with Reduced Computation Load
Authors:
Maximilian Egger,
Serge Kas Hanna,
Rawad Bitar
Abstract:
In distributed machine learning, a central node outsources computationally expensive calculations to external worker nodes. The properties of optimization procedures like stochastic gradient descent (SGD) can be leveraged to mitigate the effect of unresponsive or slow workers, called stragglers, which otherwise degrade the benefit of outsourcing the computation. This can be done by only waiting for a subset of the workers to finish their computation at each iteration of the algorithm. Previous works proposed adapting the number of workers to wait for as the algorithm evolves in order to optimize the speed of convergence. In contrast, we model the communication and computation times using independent random variables. Considering this model, we construct a novel scheme that adapts both the number of workers and the computation load throughout the run-time of the algorithm. Consequently, we improve the convergence speed of distributed SGD while significantly reducing the computation load, at the expense of a slight increase in communication load.
Submitted 17 April, 2023;
originally announced April 2023.
-
Trading Communication for Computation in Byzantine-Resilient Gradient Coding
Authors:
Christoph Hofmeister,
Luis Maßny,
Eitan Yaakobi,
Rawad Bitar
Abstract:
We consider gradient coding in the presence of an adversary controlling so-called malicious workers trying to corrupt the computations. Previous works propose the use of MDS codes to treat the inputs of the malicious workers as errors and correct them using the error-correction properties of the code. This comes at the expense of increasing the replication, i.e., the number of workers each partial gradient is computed by. In this work, we reduce replication by proposing a method that detects the erroneous inputs from the malicious workers, hence transforming them into erasures. For $s$ malicious workers, our solution can reduce the replication to $s+1$ instead of $2s+1$ for each partial gradient at the expense of only $s$ additional computations at the main node and additional rounds of light communication between the main node and the workers. We give fundamental limits of the general framework for fractional repetition data allocations. Our scheme is optimal in terms of replication and local computation but incurs a communication cost that is asymptotically, in the size of the dataset, a multiplicative factor away from the derived bound.
Submitted 5 June, 2023; v1 submitted 23 March, 2023;
originally announced March 2023.
-
Nested Gradient Codes for Straggler Mitigation in Distributed Machine Learning
Authors:
Luis Maßny,
Christoph Hofmeister,
Maximilian Egger,
Rawad Bitar,
Antonia Wachter-Zeh
Abstract:
We consider distributed learning in the presence of slow and unresponsive worker nodes, referred to as stragglers. In order to mitigate the effect of stragglers, gradient coding redundantly assigns partial computations to the workers such that the overall result can be recovered from only the non-straggling workers. Gradient codes are designed to tolerate a fixed number of stragglers. Since the number of stragglers in practice is random and unknown a priori, tolerating a fixed number of stragglers can yield a sub-optimal computation load and can result in higher latency. We propose a gradient coding scheme that can tolerate a flexible number of stragglers by carefully concatenating gradient codes for different straggler tolerances. By proper task scheduling and small additional signaling, our scheme adapts the computation load of the workers to the actual number of stragglers. We analyze the latency of our proposed scheme and show that it has a significantly lower latency than gradient codes.
Submitted 16 December, 2022;
originally announced December 2022.
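As background for the gradient codes being concatenated, the classical fractional repetition gradient code tolerates a fixed number $s$ of stragglers. The $n$ workers are split into $n/(s+1)$ groups; every worker in a group returns the sum of that group's data partitions, so any $s$ stragglers leave at least one responder per group. The sketch below assumes one data partition per worker, $(s+1) \mid n$, and scalar partial gradients; it shows this fixed-tolerance baseline, not the nested scheme.

```python
def fr_assignment(n, s):
    """Fractional repetition: worker w belongs to group w // (s+1)."""
    if n % (s + 1) != 0:
        raise ValueError("this simple construction needs (s+1) | n")
    return [w // (s + 1) for w in range(n)]


def worker_result(w, group_of, partials, n_groups):
    """What worker w sends back: the sum of its group's data partitions."""
    g = group_of[w]
    size = len(partials) // n_groups
    return sum(partials[g * size:(g + 1) * size])


def decode(results, group_of, n_groups):
    """results maps non-straggling workers to their values; one per group suffices."""
    per_group = {}
    for w, value in results.items():
        per_group.setdefault(group_of[w], value)
    if len(per_group) < n_groups:
        raise ValueError("a whole group straggled; cannot decode")
    return sum(per_group.values())
```

With $n=6$, $s=2$, and partial gradients `[1, 2, 3, 4, 5, 6]`, workers 0 and 1 may straggle and `decode` still returns the full sum `21` from workers 2 to 5. The computation load is fixed at $s+1$ partitions per worker regardless of how many stragglers actually occur, which is the inefficiency the nested construction targets.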
-
Equivalence of Insertion/Deletion Correcting Codes for $d$-dimensional Arrays
Authors:
Evagoras Stylianou,
Lorenz Welter,
Rawad Bitar,
Antonia Wachter-Zeh,
Eitan Yaakobi
Abstract:
We consider the problem of correcting insertion and deletion errors in the $d$-dimensional space. This problem is well understood for vectors (one-dimensional space) and was recently studied for arrays (two-dimensional space). For vectors and arrays, the problem is motivated by several practical applications such as DNA-based storage and racetrack memories. From a theoretical perspective, it is interesting to know whether the same properties of insertion/deletion correcting codes generalize to the $d$-dimensional space. In this work, we show that the equivalence between insertion and deletion correcting codes generalizes to the $d$-dimensional space. As a particular result, we show the following missing equivalence for arrays: a code that can correct $t_\mathrm{r}$ and $t_\mathrm{c}$ row/column deletions can correct any combination of $t_\mathrm{r}^{\mathrm{ins}}+t_\mathrm{r}^{\mathrm{del}}=t_\mathrm{r}$ and $t_\mathrm{c}^{\mathrm{ins}}+t_\mathrm{c}^{\mathrm{del}}=t_\mathrm{c}$ row/column insertions and deletions. The fundamental limit on the redundancy and a construction of insertion/deletion correcting codes in the $d$-dimensional space remain open for future work.
Submitted 10 August, 2022;
originally announced August 2022.
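The well-understood one-dimensional case can be illustrated with the classical Varshamov-Tenengolts (VT) code, which corrects a single deletion (and, by the insertion/deletion equivalence, a single insertion). A word $x \in \{0,1\}^n$ is in $\mathrm{VT}_a(n)$ if $\sum_i i\,x_i \equiv a \pmod{n+1}$. The sketch below implements Levenshtein's single-deletion decoder for that code. It is textbook background for vectors, not a construction from this paper.

```python
from itertools import product


def vt_syndrome(x):
    """Checksum sum_i i * x_i (positions 1..n) modulo n + 1."""
    return sum(i * b for i, b in enumerate(x, start=1)) % (len(x) + 1)


def vt_codewords(n, a=0):
    """All binary words of length n in the VT code VT_a(n)."""
    return [list(x) for x in product((0, 1), repeat=n) if vt_syndrome(x) == a]


def vt_decode_deletion(y, n, a=0):
    """Recover x in VT_a(n) from y, which is x with one bit deleted."""
    w = sum(y)  # number of ones in y
    d = (a - sum(i * b for i, b in enumerate(y, start=1))) % (n + 1)
    if d <= w:
        # A 0 was deleted: reinsert it so that exactly d ones lie to its right.
        for p in range(len(y) + 1):
            if sum(y[p:]) == d:
                return y[:p] + [0] + y[p:]
    else:
        # A 1 was deleted: reinsert it so that exactly d - w - 1 zeros lie to its left.
        target = d - w - 1
        for p in range(len(y) + 1):
            if p - sum(y[:p]) == target:
                return y[:p] + [1] + y[p:]
    raise ValueError("input is not a single deletion of a VT codeword")
```

For example, deleting the second bit of the codeword `[1, 0, 0, 1]` in $\mathrm{VT}_0(4)$ gives `[1, 0, 1]`, and `vt_decode_deletion([1, 0, 1], 4)` returns `[1, 0, 0, 1]`.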
-
Adaptive Stochastic Gradient Descent for Fast and Communication-Efficient Distributed Learning
Authors:
Serge Kas Hanna,
Rawad Bitar,
Parimal Parag,
Venkat Dasari,
Salim El Rouayheb
Abstract:
We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers, each having a subset of the data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or unresponsive workers who cause delays. One solution studied in the literature is to wait at each iteration for the responses of the fastest $k<n$ workers before updating the model, where $k$ is a fixed parameter. The choice of the value of $k$ presents a trade-off between the runtime (i.e., convergence rate) of SGD and the error of the model. Towards optimizing the error-runtime trade-off, we investigate distributed SGD with adaptive~$k$, i.e., varying $k$ throughout the runtime of the algorithm. We first design an adaptive policy for varying $k$ that optimizes this trade-off based on an upper bound on the error as a function of the wall-clock time that we derive. Then, we propose and implement an algorithm for adaptive distributed SGD that is based on a statistical heuristic. Our results show that the adaptive version of distributed SGD can reach lower error values in less time compared to non-adaptive implementations. Moreover, the results also show that the adaptive version is communication-efficient, where the amount of communication required between the master and the workers is less than that of non-adaptive versions.
Submitted 4 August, 2022;
originally announced August 2022.
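The runtime side of the fastest-$k$ trade-off can be made concrete under a common modeling assumption, used here only for illustration: i.i.d. exponential worker response times with rate $\mu$ and no communication delay. By memorylessness, the gap between the $i$-th and $(i+1)$-th responses is exponential with rate $(n-i)\mu$, so the expected per-iteration wait is $\mathbb{E}[T_{(k:n)}] = \sum_{i=0}^{k-1} \frac{1}{(n-i)\mu}$. Waiting for larger $k$ costs more time per iteration but averages more gradients. The Monte Carlo helper is a sanity check only.

```python
import random


def expected_kth_fastest(n, k, rate=1.0):
    """E[T_(k:n)] for n i.i.d. Exp(rate) response times: sum_{i<k} 1/((n-i)*rate)."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    return sum(1.0 / ((n - i) * rate) for i in range(k))


def simulate_kth_fastest(n, k, rate=1.0, trials=20000, seed=0):
    """Monte Carlo estimate of the same expectation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        times = sorted(rng.expovariate(rate) for _ in range(n))
        total += times[k - 1]  # time at which the k-th response arrives
    return total / trials
```

For $n=2$ and $k=2$ the formula gives $1/2 + 1 = 1.5$; for $n=10$ the expected wait grows with $k$, which is why an adaptive policy starts with small $k$ and increases it as the error plateaus.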
-
Efficient Private Storage of Sparse Machine Learning Data
Authors:
Marvin Xhemrishi,
Maximilian Egger,
Rawad Bitar
Abstract:
We consider the problem of maintaining sparsity in private distributed storage of confidential machine learning data. In many applications, e.g., face recognition, the data used in machine learning algorithms is represented by sparse matrices which can be stored and processed efficiently. However, mechanisms maintaining perfect information-theoretic privacy require encoding the sparse matrices into randomized dense matrices. It has been shown that, under some restrictions on the storage nodes, sparsity can be maintained at the expense of relaxing the perfect information-theoretic privacy requirement, i.e., allowing some information leakage. In this work, we lift the restrictions imposed on the storage nodes and show that there exists a trade-off between sparsity and the achievable privacy guarantees. We focus on the setting of non-colluding nodes and construct a coding scheme that encodes the sparse input matrices into matrices with the desired sparsity level while limiting the information leakage.
Submitted 14 June, 2022;
originally announced June 2022.
-
Distributed Matrix-Vector Multiplication with Sparsity and Privacy Guarantees
Authors:
Marvin Xhemrishi,
Rawad Bitar,
Antonia Wachter-Zeh
Abstract:
We consider the problem of designing a coding scheme that allows both sparsity and privacy for distributed matrix-vector multiplication. Perfect information-theoretic privacy requires encoding the input sparse matrices into matrices distributed uniformly at random from the considered alphabet, thus destroying the sparsity. Computing matrix-vector multiplication for sparse matrices is known to be fast. Distributing the computation over the non-sparse encoded matrices maintains privacy, but introduces artificial computing delays. In this work, we relax the privacy constraint and show that a certain level of sparsity can be maintained in the encoded matrices. We consider the chief/worker setting while assuming the presence of two clusters of workers: one is completely untrusted, in which all workers collude to eavesdrop on the input matrix and perfect privacy must be satisfied; the other is partly trusted, in which at most $z$ workers may collude and revealing a small amount of information about the input matrix is allowed. We design a scheme that trades sparsity for privacy while achieving the desired constraints. We use cyclic task assignments of the encoded matrices to tolerate partial and full stragglers.
Submitted 3 March, 2022;
originally announced March 2022.
-
Cost-Efficient Distributed Learning via Combinatorial Multi-Armed Bandits
Authors:
Maximilian Egger,
Rawad Bitar,
Antonia Wachter-Zeh,
Deniz Gündüz
Abstract:
We consider the distributed SGD problem, where a main node distributes gradient calculations among $n$ workers. By assigning tasks to all the workers and waiting only for the $k$ fastest ones, the main node can trade off the algorithm's error with its runtime by gradually increasing $k$ as the algorithm evolves. However, this strategy, referred to as adaptive $k$-sync, neglects the cost of unused computations and of communicating models to workers that exhibit straggling behavior. We propose a cost-efficient scheme that assigns tasks only to $k$ workers, and gradually increases $k$. We introduce the use of a combinatorial multi-armed bandit model to learn which workers are the fastest while assigning gradient calculations. Assuming workers with exponentially distributed response times parameterized by different means, we give empirical and theoretical guarantees on the regret of our strategy, i.e., the extra time spent to learn the mean response times of the workers. Furthermore, we propose and analyze a strategy applicable to a large class of response time distributions. Compared to adaptive $k$-sync, our scheme achieves significantly lower errors with the same computational efforts and less downlink communication while being inferior in terms of speed.
Submitted 28 June, 2022; v1 submitted 16 February, 2022;
originally announced February 2022.
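To make the bandit view concrete, the main node must pick $k$ workers per round while learning their mean response times. A standard way to balance exploration and exploitation when *minimizing* time is to rank workers by a lower confidence bound (LCB) on their mean. The sketch below uses a generic UCB-style bonus $\sqrt{2\ln t / N_i}$. `select_workers` and `update` are hypothetical helpers, and this policy is an assumption for illustration, not necessarily the strategy analyzed in the paper.

```python
import math


def select_workers(counts, mean_times, t, k):
    """Return the k workers with the smallest lower confidence bound (LCB) on
    their mean response time; untried workers get -inf so each is tried once."""
    def lcb(i):
        if counts[i] == 0:
            return -math.inf
        return mean_times[i] - math.sqrt(2.0 * math.log(t) / counts[i])
    return sorted(range(len(counts)), key=lcb)[:k]


def update(counts, mean_times, i, observed_time):
    """Incremental running-mean update after observing worker i's response time."""
    counts[i] += 1
    mean_times[i] += (observed_time - mean_times[i]) / counts[i]
```

In round $t \ge 1$, the main node calls `select_workers`, assigns gradient tasks to the returned workers only, and calls `update` with each observed response time. Increasing `k` over the rounds mirrors the "gradually increases $k$" idea.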
-
Secure Private and Adaptive Matrix Multiplication Beyond the Singleton Bound
Authors:
Christoph Hofmeister,
Rawad Bitar,
Marvin Xhemrishi,
Antonia Wachter-Zeh
Abstract:
We consider the problem of designing secure and private codes for distributed matrix-matrix multiplication. A master server owns two private matrices and hires worker nodes to help compute their product. The matrices should remain information-theoretically private from the workers. Some of the workers are malicious and return corrupted results to the master. We design a framework for security against malicious workers in private matrix-matrix multiplication. The main idea is a careful use of Freivalds' algorithm to detect erroneous matrix multiplications. Our main goal is to apply this security framework to schemes with adaptive rates. Adaptive schemes divide the workers into clusters and thus provide flexibility in trading decoding complexity for efficiency. Our new scheme, SRPM3, provides a computationally efficient security check per cluster that detects the presence of one or more malicious workers with high probability. An additional per-worker check is used to identify the malicious nodes. SRPM3 can tolerate the presence of an arbitrary number of malicious workers. We provide theoretical guarantees on the complexity of the security checks and simulation results on both the missed detection rate and the time needed for the integrity check.
Submitted 14 February, 2022; v1 submitted 12 August, 2021;
originally announced August 2021.
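Freivalds' algorithm, which the security check builds on, verifies a claimed product $C = AB$ without recomputing it. Draw a random vector $r \in \{0,1\}^n$ and compare $A(Br)$ with $Cr$. Each trial costs only matrix-vector products, and a wrong $C$ passes a single trial with probability at most $1/2$, so $m$ independent trials miss an error with probability at most $2^{-m}$. The sketch below is the plain algorithm on Python lists, not the per-cluster SRPM3 check.

```python
import random


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def freivalds(A, B, C, trials, rng):
    """Probabilistically check A @ B == C (A: m x p, B: p x n, C: m x n)."""
    n = len(C[0])
    for _ in range(trials):
        r = [rng.randrange(2) for _ in range(n)]
        if matvec(A, matvec(B, r)) != matvec(C, r):
            return False  # a mismatch proves C is wrong
    return True  # C is correct with probability at least 1 - 2**-trials


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C_good = [[19, 22], [43, 50]]
C_bad = [[19, 22], [43, 51]]  # one corrupted entry, as a malicious worker might return
```

With 40 trials, `C_bad` is rejected except with probability at most $2^{-40}$, while `C_good` is always accepted.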
-
Optimal Codes Correcting Localized Deletions
Authors:
Rawad Bitar,
Serge Kas Hanna,
Nikita Polyanskii,
Ilya Vorobyev
Abstract:
We consider the problem of constructing codes that can correct deletions that are localized within a certain part of the codeword that is unknown a priori. Namely, the model that we study is when at most $k$ deletions occur in a window of size $k$, where the positions of the deletions within this window are not necessarily consecutive. Localized deletions are thus a generalization of burst deletions that occur in consecutive positions. We present novel explicit codes that are efficiently encodable and decodable and can correct up to $k$ localized deletions. Furthermore, these codes have $\log n+\mathcal{O}(k \log^2 (k\log n))$ redundancy, where $n$ is the length of the information message, which is asymptotically optimal in $n$ for $k=o(\log n/(\log \log n)^2)$.
Submitted 5 May, 2021;
originally announced May 2021.
-
Detecting Deletions and Insertions in Concatenated Strings with Optimal Redundancy
Authors:
Serge Kas Hanna,
Rawad Bitar
Abstract:
We study codes that can detect the exact number of deletions and insertions in concatenated binary strings. We construct optimal codes for the case of detecting up to $\delta$ deletions. We prove the optimality of these codes by deriving a converse result which shows that the redundancy of our codes is asymptotically optimal in $\delta$ among all families of deletion detecting codes, and particularly optimal among all block-by-block decodable codes. For the case of insertions, we construct codes that can detect up to $2$ insertions in each concatenated binary string.
Submitted 1 May, 2021;
originally announced May 2021.
-
Network Coding with Myopic Adversaries
Authors:
Sijie Li,
Rawad Bitar,
Sidharth Jaggi,
Yihan Zhang
Abstract:
We consider the problem of reliable communication over a network containing a hidden myopic adversary who can eavesdrop on some $z_{ro}$ links, jam some $z_{wo}$ links, and do both on some $z_{rw}$ links. We provide the first information-theoretically tight characterization of the optimal rate of communication possible under all possible settings of the tuple $(z_{ro},z_{wo},z_{rw})$ by providing a novel coding scheme/analysis for a subset of parameter regimes. In particular, our vanishing-error schemes bypass the Network Singleton Bound (which requires a zero-error recovery criterion) in a certain parameter regime where the capacity had been heretofore open. As a direct corollary, we also obtain the capacity of the corresponding problem where information-theoretic secrecy against eavesdropping is required in addition to reliable communication.
Submitted 19 February, 2021;
originally announced February 2021.
-
Function-Correcting Codes
Authors:
Andreas Lenz,
Rawad Bitar,
Antonia Wachter-Zeh,
Eitan Yaakobi
Abstract:
In this paper, we study function-correcting codes (FCCs), a new class of codes designed to protect the function evaluation of a message against errors. We show that FCCs are equivalent to irregular-distance codes, i.e., codes that obey some given distance requirement between each pair of codewords. Using these connections, we study irregular-distance codes and derive general upper and lower bounds on their optimal redundancy. Since these bounds heavily depend on the specific function, we provide simplified, suboptimal bounds that are easier to evaluate. We further apply our general results to specific functions of interest and compare them to standard error-correcting codes, which protect the whole message.
Submitted 22 May, 2023; v1 submitted 5 February, 2021;
originally announced February 2021.
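The defining requirement of a function-correcting code for $t$ errors can be checked directly. Encodings of two messages only need distance at least $2t+1$ when their function values differ, which is the kind of pairwise, irregular distance requirement the abstract refers to. The sketch below checks this condition by brute force. The toy function $f(u) = u_1$ (the first bit) and the encoder that appends two copies of it are hypothetical choices for illustration only.

```python
from itertools import combinations


def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))


def is_function_correcting(messages, encode, f, t):
    """Encodings of messages with different f-values must be >= 2t+1 apart."""
    for u, v in combinations(messages, 2):
        if f(u) != f(v) and hamming(encode(u), encode(v)) < 2 * t + 1:
            return False
    return True


messages = [(a, b) for a in (0, 1) for b in (0, 1)]
first_bit = lambda u: u[0]                       # the function to protect
encode_fcc = lambda u: u + (u[0], u[0])          # systematic: 2 redundancy bits
```

Here two redundancy bits suffice to protect $f$ against one error. Protecting the whole 2-bit message against one error would need three redundancy bits, since the Hamming bound rules out a single-error-correcting code with four codewords of length 4.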
-
Multiple Criss-Cross Insertion and Deletion Correcting Codes
Authors:
Lorenz Welter,
Rawad Bitar,
Antonia Wachter-Zeh,
Eitan Yaakobi
Abstract:
This paper investigates the problem of correcting multiple criss-cross insertions and deletions in arrays. More precisely, we study the unique recovery of $n \times n$ arrays affected by $t$-criss-cross deletions, defined as any combination of $t_r$ row and $t_c$ column deletions such that $t_r + t_c = t$ for a given $t$. We show an equivalence between correcting $t$-criss-cross deletions and $t$-criss-cross insertions and show that a code correcting $t$-criss-cross insertions/deletions has redundancy at least $tn + t \log n - \log(t!)$. Then, we present an existential construction of $t$-criss-cross insertion/deletion correcting codes with redundancy bounded from above by $tn + \mathcal{O}(t^2 \log^2 n)$. The main ingredients of the presented code construction are systematic binary $t$-deletion correcting codes and Gabidulin codes. The first ingredient helps locate the indices of the inserted/deleted rows and columns, thus transforming the insertion/deletion-correction problem into a row/column erasure-correction problem, which is then solved using the second ingredient.
Submitted 15 November, 2021; v1 submitted 4 February, 2021;
originally announced February 2021.
-
Adaptive Private Distributed Matrix Multiplication
Authors:
Rawad Bitar,
Marvin Xhemrishi,
Antonia Wachter-Zeh
Abstract:
We consider the problem of designing codes with flexible rate (referred to as rateless codes) for private distributed matrix-matrix multiplication. A master server owns two private matrices $\mathbf{A}$ and $\mathbf{B}$ and hires worker nodes to help compute their product. The matrices should remain information-theoretically private from the workers. Codes with fixed rate require the master to assign tasks to the workers and then wait for a predetermined number of workers to finish their assigned tasks. The size of the tasks, hence the rate of the scheme, depends on the number of workers that the master waits for. We design a rateless private matrix-matrix multiplication scheme, called RPM3. In contrast to fixed-rate schemes, our scheme fixes the size of the tasks and allows the master to send multiple tasks to the workers. The master keeps sending tasks and receiving results until it can decode the product, rendering the scheme flexible and adaptive to heterogeneous environments. Despite resulting in a smaller rate than known straggler-tolerant schemes, RPM3 provides a smaller mean waiting time of the master by leveraging the heterogeneity of the workers. The waiting time is studied under two different models for the workers' service time. We provide upper bounds for the mean waiting time under both models. In addition, we provide lower bounds on the mean waiting time under the worker-dependent fixed service time model.
Submitted 14 January, 2021;
originally announced January 2021.
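How a rateless scheme exploits heterogeneity can be seen in a simple reading of the worker-dependent fixed service time model. Worker $i$ finishes one fixed-size task every $c_i$ time units and immediately receives another, and the master stops once it holds enough results to decode. The sketch below computes that stopping time with an event queue. `completion_time` is a hypothetical helper, and the model deliberately ignores communication, encoding/decoding cost, and privacy.

```python
import heapq


def completion_time(service_times, tasks_needed):
    """Time at which the tasks_needed-th result arrives when worker i returns a
    result every service_times[i] time units and is immediately given a new task."""
    if tasks_needed < 1:
        raise ValueError("need at least one task")
    # Heap of (time the worker's current task finishes, worker index).
    heap = [(c, i) for i, c in enumerate(service_times)]
    heapq.heapify(heap)
    for _ in range(tasks_needed - 1):
        t, i = heapq.heappop(heap)  # next result to arrive
        heapq.heappush(heap, (t + service_times[i], i))
    return heap[0][0]  # arrival time of the last needed result
```

For example, with service times `[1, 1, 10]` and 4 results needed, the two fast workers deliver everything by time `2`. The master never waits for the slow worker, whereas a fixed-rate scheme that waits for a fixed subset of workers could be held up by it.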
-
Criss-Cross Insertion and Deletion Correcting Codes
Authors:
Rawad Bitar,
Lorenz Welter,
Ilia Smagloy,
Antonia Wachter-Zeh,
Eitan Yaakobi
Abstract:
This paper studies the problem of constructing codes correcting deletions in arrays. Under this model, it is assumed that an $n\times n$ array can experience deletions of rows and columns. These deletion errors are referred to as $(t_r,t_c)$-criss-cross deletions if $t_r$ rows and $t_c$ columns are deleted, while a code correcting these deletion patterns is called a $(t_r,t_c)$-criss-cross deletion correction code. The definitions for criss-cross insertions are similar.
It is first shown that when $t_r=t_c$ the problems of correcting criss-cross deletions and criss-cross insertions are equivalent. The focus of this paper lies on the case of $(1,1)$-criss-cross deletions. A non-asymptotic upper bound on the cardinality of $(1,1)$-criss-cross deletion correction codes is derived, which implies that the redundancy is at least $2n-3+2\log n$ bits. A code construction with an existential encoder and an explicit decoding algorithm is presented; its redundancy is at most $2n+4 \log n + 7 +2 \log e$ bits. A construction with an explicit encoder and decoder is also presented; the explicit encoder adds an extra $5\log n + 5$ bits of redundancy to the construction.
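A small sketch of the channel and of the stated redundancy bounds may help fix ideas. It assumes base-2 logarithms (redundancy is measured in bits) and uses hypothetical helper names; it only models a $(1,1)$-criss-cross deletion and evaluates the two expressions from the abstract, not the code construction itself.

```python
import math
from typing import List

Array = List[List[int]]


def criss_cross_delete(arr: Array, row: int, col: int) -> Array:
    """Apply a (1,1)-criss-cross deletion: an n x n array loses one row and
    one column and becomes an (n-1) x (n-1) array."""
    return [
        [v for j, v in enumerate(r) if j != col]
        for i, r in enumerate(arr)
        if i != row
    ]


def redundancy_lower_bound(n: int) -> float:
    """Lower bound on the redundancy in bits: 2n - 3 + 2 log n (log base 2 assumed)."""
    return 2 * n - 3 + 2 * math.log2(n)


def construction_redundancy(n: int) -> float:
    """Upper bound on the construction's redundancy: 2n + 4 log n + 7 + 2 log e."""
    return 2 * n + 4 * math.log2(n) + 7 + 2 * math.log2(math.e)


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(criss_cross_delete(a, 1, 2))  # [[1, 2], [7, 8]]
    n = 16
    print(redundancy_lower_bound(n), construction_redundancy(n))
```

Subtracting the two expressions shows that the gap between the construction and the lower bound is $2\log n + 10 + 2\log e$ bits, i.e. it grows only logarithmically in $n$.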
Submitted 1 June, 2021; v1 submitted 30 April, 2020;
originally announced April 2020.
-
Rateless Codes for Private Distributed Matrix-Matrix Multiplication
Authors:
Rawad Bitar,
Marvin Xhemrishi,
Antonia Wachter-Zeh
Abstract:
We consider the problem of designing rateless coded private distributed matrix-matrix multiplication. A master server owns two private matrices $\mathbf{A}$ and $\mathbf{B}$ and wants to hire worker nodes to help compute their product. The matrices should remain private from the workers in an information-theoretic sense. This problem has been considered in the literature, and codes with a predesigned threshold have been constructed. More precisely, the master assigns tasks to the workers and waits for a predetermined number of workers to finish their assigned tasks. The size of the tasks assigned to the workers depends on the designed threshold. We are interested in settings where the size of the tasks must be small and independent of the designed threshold. We design a rateless private matrix-matrix multiplication scheme, called RPM3. Our scheme fixes the size of the tasks and allows the master to send multiple tasks to the workers. The master keeps receiving results until it can decode the multiplication. Two main applications require this property: i) leveraging the possible heterogeneity in the system by assigning more tasks to faster workers; and ii) assigning tasks adaptively to account for a possibly time-varying system.
Submitted 27 April, 2020;
originally announced April 2020.
-
Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in the Presence of Stragglers
Authors:
Serge Kas Hanna,
Rawad Bitar,
Parimal Parag,
Venkat Dasari,
Salim El Rouayheb
Abstract:
We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers, each having a subset of the data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or unresponsive workers who cause delays. One solution studied in the literature is to wait at each iteration for the responses of the fastest $k<n$ workers before updating the model, where $k$ is a fixed parameter. The choice of $k$ presents a trade-off between the runtime (i.e., convergence rate) of SGD and the error of the model. Towards optimizing the error-runtime trade-off, we investigate distributed SGD with adaptive $k$. We first design an adaptive policy for varying $k$ that optimizes this trade-off based on an upper bound, which we derive, on the error as a function of the wall-clock time. Then, we propose an algorithm for adaptive distributed SGD that is based on a statistical heuristic. We implement our algorithm and provide numerical simulations that confirm our intuition and theoretical analysis.
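The fastest-$k$ mechanism can be sketched on a toy least-squares problem. Everything here is an assumption for illustration: i.i.d. exponential response times as the straggler model, a one-parameter model $y = wx$, and a hypothetical step schedule that increases $k$ over time in place of the policy derived in the paper.

```python
import random
from typing import Callable, List, Sequence, Tuple

Data = List[Tuple[float, float]]


def local_gradient(w: float, data: Data) -> float:
    """Gradient of the mean of 0.5 * (w*x - y)**2 over a worker's local data."""
    return sum((w * x - y) * x for x, y in data) / len(data)


def fastest_k_sgd(
    partitions: Sequence[Data],
    k_schedule: Callable[[int], int],
    lr: float = 0.3,
    iters: int = 100,
    seed: int = 0,
) -> Tuple[float, float]:
    """Distributed SGD that waits for the k fastest workers at each iteration.

    Returns the final model and the total simulated wall-clock time.
    """
    rng = random.Random(seed)
    w, clock = 0.0, 0.0
    for t in range(iters):
        k = k_schedule(t)
        # Assumed straggler model: i.i.d. Exp(1) response times per iteration.
        times = [rng.expovariate(1.0) for _ in partitions]
        fastest = sorted(range(len(partitions)), key=times.__getitem__)[:k]
        clock += times[fastest[-1]]  # the master waits for the k-th fastest
        grad = sum(local_gradient(w, partitions[i]) for i in fastest) / k
        w -= lr * grad
    return w, clock


if __name__ == "__main__":
    # Five workers holding noiseless samples of y = 3x (toy data).
    parts = [[(x, 3 * x) for x in (0.5 + 0.25 * i, 1.0 + 0.25 * i)] for i in range(5)]

    def step_k(t: int) -> int:
        # Hypothetical schedule: small k early (fast), larger k later (accurate).
        return 1 if t < 30 else (3 if t < 60 else 5)

    print(fastest_k_sgd(parts, step_k))
```

Because the same seed produces the same response times regardless of $k$, running with $k=1$ and $k=n$ isolates the runtime cost of waiting for more workers.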
Submitted 25 February, 2020;
originally announced February 2020.
-
Communication Efficient Secret Sharing in the Presence of Malicious Adversary
Authors:
Rawad Bitar,
Sidharth Jaggi
Abstract:
Consider the communication efficient secret sharing problem. A dealer wants to share a secret with $n$ parties such that any $k\leq n$ parties can reconstruct the secret and any $z<k$ parties eavesdropping on their shares obtain no information about the secret. In addition, a legitimate user contacting any $d$, $k\leq d \leq n$, parties to decode the secret can do so by reading and downloading the minimum amount of information needed. We are interested in communication efficient secret sharing schemes that tolerate the presence of malicious parties actively corrupting their shares and the data delivered to the users. The knowledge of the malicious parties about the secret is restricted to the shares they obtain. We characterize the capacity, i.e., the maximum size of the secret that can be shared. We derive the minimum amount of information needed to be read and communicated to a legitimate user to decode the secret from $d$ parties, $k\leq d \leq n$. Error-correcting codes do not achieve capacity in this setting. We construct codes that achieve capacity and achieve minimum read and communication costs for all possible values of $d$. Our codes are based on Staircase codes, previously introduced for communication efficient secret sharing, and on a pairwise hashing scheme, used in distributed data storage and network coding settings, to detect errors inserted by a limited-knowledge adversary.
Submitted 9 February, 2020;
originally announced February 2020.
-
Private and Rateless Adaptive Coded Matrix-Vector Multiplication
Authors:
Rawad Bitar,
Yuxuan Xing,
Yasaman Keshtkarjahromi,
Venkat Dasari,
Salim El Rouayheb,
Hulya Seferoglu
Abstract:
Edge computing is emerging as a new paradigm to allow processing data near the edge of the network, where the data is typically generated and collected. This enables critical computations at the edge in applications such as Internet of Things (IoT), in which an increasing number of devices (sensors, cameras, health monitoring devices, etc.) collect data that needs to be processed through computationally intensive algorithms with stringent reliability, security and latency constraints.
Our key tool is the theory of coded computation, which advocates mixing data in computationally intensive tasks by employing erasure codes and offloading these tasks to other devices for computation. Coded computation has recently been gaining interest thanks to its higher reliability, smaller delay, and lower communication costs. In this paper, we develop a private and rateless adaptive coded computation (PRAC) algorithm for distributed matrix-vector multiplication by taking into account (i) the privacy requirements of IoT applications and devices, and (ii) the heterogeneous and time-varying resources of edge devices. We show that PRAC outperforms known secure coded computing methods when resources are heterogeneous. We provide theoretical guarantees on the performance of PRAC and compare it with baselines. Moreover, we confirm our theoretical results through simulations and implementations on Android-based smartphones.
Submitted 27 September, 2019;
originally announced September 2019.
-
Secure Coded Cooperative Computation at the Heterogeneous Edge against Byzantine Attacks
Authors:
Yasaman Keshtkarjahromi,
Rawad Bitar,
Venkat Dasari,
Salim El Rouayheb,
Hulya Seferoglu
Abstract:
Edge computing is emerging as a new paradigm to allow processing data at the edge of the network, where data is typically generated and collected, by exploiting multiple devices at the edge collectively. However, offloading tasks to other devices leaves edge computing applications at the complete mercy of an attacker. One such attack, and the focus of this work, is the Byzantine attack, in which one or more devices corrupt the offloaded tasks. Furthermore, exploiting the potential of edge computing is challenging, mainly due to the heterogeneous and time-varying nature of the devices at the edge. In this paper, we develop a secure coded cooperative computation mechanism (SC3) that provides both security and computation efficiency guarantees by gracefully combining homomorphic hash functions and coded cooperative computation. Homomorphic hash functions are used against Byzantine attacks, and coded cooperative computation is used to improve computation efficiency when edge resources are heterogeneous and time-varying. Simulation results show that SC3 significantly improves task completion delay.
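The property that lets a homomorphic hash catch a corrupted inner product can be shown with a toy exponentiation hash $h(v) = g^v \bmod p$, which satisfies $h(a+b) = h(a)\,h(b)$. This is a generic illustration, not SC3's hash: the modulus $p = 2^{61}-1$ and base $g = 3$ are arbitrary assumptions, entries are assumed to be nonnegative integers, and the parameters are not chosen for cryptographic security.

```python
from typing import Sequence

P = 2**61 - 1  # Mersenne prime modulus (assumed, not a secure parameter choice)
G = 3          # fixed base (assumed)


def hash_value(v: int) -> int:
    """Toy homomorphic hash h(v) = G**v mod P, so h(a + b) = h(a) * h(b) mod P."""
    return pow(G, v, P)


def expected_hash(row: Sequence[int], x: Sequence[int]) -> int:
    """Hash of the inner product row . x, built from per-entry hashes only:
    prod_j h(a_j)**x_j = G**(sum_j a_j * x_j) mod P."""
    acc = 1
    for a_j, x_j in zip(row, x):
        acc = acc * pow(hash_value(a_j), x_j, P) % P
    return acc


def verify(row: Sequence[int], x: Sequence[int], claimed: int) -> bool:
    """Accept a worker's claimed inner product only if the hashes agree."""
    return hash_value(claimed) == expected_hash(row, x)


if __name__ == "__main__":
    row, x = [2, 0, 5], [1, 4, 3]
    print(verify(row, x, 17))  # True: honest result 2*1 + 0*4 + 5*3
    print(verify(row, x, 18))  # False: corrupted result
```

The sketch only demonstrates the homomorphic check; it says nothing about how SC3 amortizes the cost of hashing or schedules tasks across heterogeneous devices.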
Submitted 14 August, 2019;
originally announced August 2019.
-
Stochastic Gradient Coding for Straggler Mitigation in Distributed Learning
Authors:
Rawad Bitar,
Mary Wootters,
Salim El Rouayheb
Abstract:
We consider distributed gradient descent in the presence of stragglers. Recent work on gradient coding and approximate gradient coding has shown how to add redundancy in distributed gradient descent to guarantee convergence even if some workers are stragglers, that is, slow or non-responsive. In this work we propose an approximate gradient coding scheme called Stochastic Gradient Coding (SGC), which works when the stragglers are random. SGC distributes data points redundantly to workers according to a pair-wise balanced design, and then simply ignores the stragglers. We prove that the convergence rate of SGC mirrors that of batched Stochastic Gradient Descent (SGD) for the $\ell_2$ loss function, and show how the convergence rate can improve with the redundancy. We also provide bounds for more general convex loss functions. We show empirically that SGC requires a small amount of redundancy to handle a large number of stragglers and that it can outperform existing approximate gradient codes when the number of stragglers is large.
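The "replicate, then ignore stragglers" idea can be sketched with scalar per-point gradients. Two simplifications are assumptions of this sketch: each point goes to $d$ workers chosen uniformly at random instead of following a pair-wise balanced design, and each worker straggles independently with probability `straggle_prob`. Rescaling by $1/(d(1-p))$ makes the estimate of the full gradient sum unbiased, since each point is counted $d$ times and each copy survives with probability $1-p$.

```python
import random
from typing import List, Sequence


def assign(num_points: int, num_workers: int, d: int,
           rng: random.Random) -> List[List[int]]:
    """Give each data point to d distinct workers chosen uniformly at random
    (a simplification of the pair-wise balanced design)."""
    loads: List[List[int]] = [[] for _ in range(num_workers)]
    for p in range(num_points):
        for w in rng.sample(range(num_workers), d):
            loads[w].append(p)
    return loads


def sgc_gradient(point_grads: Sequence[float], loads: List[List[int]], d: int,
                 straggle_prob: float, rng: random.Random) -> float:
    """Sum the partial gradients of non-stragglers and rescale so that the
    estimate of sum(point_grads) is unbiased."""
    total = 0.0
    for load in loads:
        if rng.random() < straggle_prob:
            continue  # straggler: its result is simply ignored
        total += sum(point_grads[p] for p in load)
    return total / (d * (1.0 - straggle_prob))


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [1.0, 2.0, 3.0, 4.0]  # toy per-point gradients
    loads = assign(len(grads), num_workers=5, d=2, rng=rng)
    print(sgc_gradient(grads, loads, d=2, straggle_prob=0.3, rng=rng))
```

Increasing $d$ lowers the variance of this estimate for a fixed straggler probability, which mirrors the abstract's claim that the convergence rate improves with redundancy.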
Submitted 14 May, 2019;
originally announced May 2019.
-
Staircase-PIR: Universally Robust Private Information Retrieval
Authors:
Rawad Bitar,
Salim El Rouayheb
Abstract:
We consider the problem of designing private information retrieval (PIR) schemes on data of $m$ files replicated on $n$ servers that can possibly collude. We focus on devising robust PIR schemes that can tolerate stragglers, i.e., slow or unresponsive servers. In many settings, the number of stragglers is not known a priori or may change with time. We define universally robust PIR schemes as those that achieve PIR capacity asymptotically in $m$ and simultaneously for any number of stragglers up to a given threshold. We introduce Staircase-PIR schemes and prove that they are universally robust. Towards that end, we establish an equivalence between robust PIR and communication efficient secret sharing.
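For readers new to PIR, the basic idea can be seen in the classic two-server XOR scheme for non-colluding, replicated servers. This is not Staircase-PIR and has none of its robustness or capacity guarantees; files are modeled as integers and the function names are hypothetical. Both servers must answer, so a single straggler blocks retrieval, which is exactly the situation robust PIR is designed to handle.

```python
import random
from functools import reduce
from typing import Sequence, Set


def xor_files(files: Sequence[int], subset: Set[int]) -> int:
    """Server answer: XOR of all files whose indices are in the query."""
    return reduce(lambda acc, i: acc ^ files[i], subset, 0)


def retrieve(files: Sequence[int], want: int, rng: random.Random) -> int:
    """Privately retrieve files[want] from two servers holding the same files."""
    m = len(files)
    # Query to server 1: a uniformly random subset of file indices.
    q1 = {i for i in range(m) if rng.random() < 0.5}
    # Query to server 2: the same subset with the wanted index toggled.
    q2 = q1 ^ {want}
    # Each query on its own is a uniformly random subset, so neither
    # (non-colluding) server learns `want`; XOR-ing the answers cancels
    # every file except files[want].
    return xor_files(files, q1) ^ xor_files(files, q2)


if __name__ == "__main__":
    files = [0xA1, 0xB2, 0xC3, 0xD4]
    print(hex(retrieve(files, 2, random.Random(7))))  # 0xc3
```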
Submitted 5 September, 2018; v1 submitted 22 June, 2018;
originally announced June 2018.
-
Minimizing Latency for Secure Coded Computing Using Secret Sharing via Staircase Codes
Authors:
Rawad Bitar,
Parimal Parag,
Salim El Rouayheb
Abstract:
We consider the setting of a Master server, M, who possesses confidential data (e.g., personal, genomic or medical data) and wants to run intensive computations on it, as part of a machine learning algorithm for example. The Master wants to distribute these computations to untrusted workers who have volunteered or are incentivized to help with this task. However, the data must be kept private and not revealed to the individual workers. Some of the workers may be stragglers, i.e., slow or busy, and will take a random time to finish the task assigned to them. We are interested in reducing the delays experienced by the Master. We focus on linear computations as an essential operation in many iterative algorithms such as principal component analysis, support vector machines and other gradient-descent based algorithms. A classical solution is to use a linear secret sharing scheme, such as Shamir's scheme, to divide the data into secret shares on which the workers can perform linear computations. However, classical codes mitigate stragglers by designing for a worst-case scenario with a fixed number of stragglers. We propose a solution based on new secure codes, called Staircase codes, introduced previously by two of the authors. Staircase codes allow flexibility in the number of stragglers up to a given maximum, and universally achieve the information-theoretic limit on the download cost by the Master, leading to latency reduction. Under the shifted exponential model, we find upper and lower bounds on the Master's mean waiting time. We derive the distribution of the Master's waiting time, and its mean, for systems with up to two stragglers. For systems with any number of stragglers, we derive an expression that can give the exact distribution, and the mean, of the waiting time of the Master. We show that Staircase codes always outperform classical secret sharing codes.
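The classical baseline mentioned above, Shamir secret sharing with workers computing a linear function directly on their shares, can be sketched as follows. The field size $P = 2^{31}-1$, the evaluation points, and all function names are assumptions for illustration. A worker's result is itself a degree-$z$ polynomial in its evaluation point whose constant term is the desired inner product, so the master needs any fixed number $z+1$ of responses, which is the rigidity that Staircase codes relax.

```python
import random
from typing import List, Sequence, Tuple

P = 2_147_483_647  # prime field GF(P), P = 2**31 - 1 (assumed field size)


def shamir_share(secret: Sequence[int], z: int, alphas: Sequence[int],
                 rng: random.Random) -> List[List[int]]:
    """Encode a data vector with a degree-z polynomial per coordinate.

    The share at alpha is f(alpha) = secret + r_1*alpha + ... + r_z*alpha**z,
    so any z shares reveal nothing and any z + 1 shares determine the secret.
    """
    coeffs = [list(secret)] + [[rng.randrange(P) for _ in secret] for _ in range(z)]
    return [
        [sum(c[j] * pow(a, deg, P) for deg, c in enumerate(coeffs)) % P
         for j in range(len(secret))]
        for a in alphas
    ]


def worker_task(weights: Sequence[int], share: Sequence[int]) -> int:
    """A linear computation (inner product) performed directly on a share."""
    return sum(w * s for w, s in zip(weights, share)) % P


def decode_at_zero(points: Sequence[Tuple[int, int]]) -> int:
    """Lagrange interpolation at 0 over GF(P) from (alpha, result) pairs."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


if __name__ == "__main__":
    rng = random.Random(1)
    alphas = [1, 2, 3, 4, 5]  # one evaluation point per worker (assumed)
    shares = shamir_share([3, 5, 7], z=1, alphas=alphas, rng=rng)
    results = [(a, worker_task([1, 2, 3], s)) for a, s in zip(alphas, shares)]
    print(decode_at_zero([results[1], results[3]]))  # 34 = 1*3 + 2*5 + 3*7
```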
Submitted 7 February, 2018;
originally announced February 2018.
-
Minimizing Latency for Secure Distributed Computing
Authors:
Rawad Bitar,
Parimal Parag,
Salim El Rouayheb
Abstract:
We consider the setting of a master server who possesses confidential data (genomic, medical data, etc.) and wants to run intensive computations on it, as part of a machine learning algorithm for example. The master wants to distribute these computations to untrusted workers who have volunteered or are incentivized to help with this task. However, the data must be kept private (in an information-theoretic sense) and not revealed to the individual workers. The workers may be busy, or even unresponsive, and will take a random time to finish the task assigned to them. We are interested in reducing the aggregate delay experienced by the master. We focus on linear computations as an essential operation in many iterative algorithms. A known solution is to use a linear secret sharing scheme to divide the data into secret shares on which the workers can compute. We propose to use instead new secure codes, called Staircase codes, introduced previously by two of the authors. We study the delay induced by Staircase codes, which is always less than that of secret sharing. The reason is that secret sharing schemes need to wait for the responses of a fixed fraction of the workers, whereas Staircase codes offer more flexibility in this respect. For instance, for codes with rate $R=1/2$, Staircase codes can lead to up to a $40\%$ reduction in delay compared to secret sharing.
Submitted 4 March, 2017;
originally announced March 2017.
-
Staircase Codes for Secret Sharing with Optimal Communication and Read Overheads
Authors:
Rawad Bitar,
Salim El Rouayheb
Abstract:
We study the communication efficient secret sharing (CESS) problem introduced by Huang, Langberg, Kliewer and Bruck. A classical threshold secret sharing scheme randomly encodes a secret into $n$ shares given to $n$ parties, such that any set of at least $t$, $t<n$, parties can reconstruct the secret, and any set of at most $z$, $z<t$, parties cannot obtain any information about the secret. Recently, Huang et al. characterized the minimum achievable communication overhead (CO) necessary for a legitimate user to decode the secret when contacting $d\geq t$ parties, and presented explicit code constructions achieving minimum CO for $d=n$. The intuition behind the possible savings on CO is that the user is only interested in decoding the secret and does not have to decode the random keys involved in the encoding process. In this paper, we introduce a new class of linear CESS codes, called Staircase Codes, over any field $GF(q)$ for any prime power $q> n$. We describe two explicit constructions of Staircase codes that achieve minimum communication and read overheads, one for a fixed $d$ and the other universally for all possible values of $d$, $t\leq d\leq n$.
Submitted 3 November, 2016; v1 submitted 9 December, 2015;
originally announced December 2015.
-
Securing Data against Limited-Knowledge Adversaries in Distributed Storage Systems
Authors:
Rawad Bitar,
Salim El Rouayheb
Abstract:
We study the problem of constructing secure regenerating codes that protect data integrity in distributed storage systems (DSS) in which some nodes may be compromised by a malicious adversary. The adversary can corrupt the data stored on and transmitted by the nodes under its control. The "damage" incurred by the actions of the adversary depends on how much information it knows about the data in the whole DSS. We focus on the limited-knowledge model in which the adversary knows only the data on the nodes under its control. The only secure capacity-achieving codes known in the literature for this model are for the bandwidth-limited regime and repair degree $d=n-1$, i.e., when a node fails in a DSS with $n$ nodes, all the remaining $n-1$ nodes are contacted for repair. We extend these results to the more general case of $d\leq n-1$ in the bandwidth-limited regime. Our capacity-achieving scheme is based on product-matrix codes with special hashing functions and allows the identification of the compromised nodes and their elimination from the DSS while preserving data integrity.
Submitted 22 April, 2015;
originally announced April 2015.