Heterogeneity Matters even More in Distributed Learning: Study from Generalization Perspective
Authors:
Masoud Kavian,
Romain Chor,
Milad Sefidgaran,
Abdellatif Zaidi
Abstract:
In this paper, we investigate how data heterogeneity across clients affects the performance of distributed learning systems, specifically one-round Federated Learning, as measured by the generalization error. In our setting, $K$ clients each have $n$ training samples generated independently according to possibly different data distributions, and the models they individually choose are aggregated by a central server. We study how the discrepancy between the clients' data distributions affects the generalization error of the aggregated model. First, we establish in-expectation and tail upper bounds on the generalization error in terms of these distributions. In part, these bounds extend the popular Conditional Mutual Information (CMI) bound, originally developed for the centralized learning setting ($K=1$), to the distributed learning setting with an arbitrary number of clients $K \geq 1$. Then, we draw on information-theoretic rate-distortion theory to derive possibly tighter \textit{lossy} versions of these bounds. Next, we apply our lossy bounds to study the effect of data heterogeneity on the generalization error in distributed classification where each client uses Support Vector Machines (DSVM). In this case, we establish generalization error bounds that depend explicitly on the degree of data heterogeneity. The bound decreases as the degree of heterogeneity across clients increases, which suggests that DSVM generalizes better when the clients' training samples are more dissimilar. This finding, which extends beyond DSVM, is validated through several experiments.
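For concreteness, the quantities in this setting can be written out under one common formulation of distributed learning, which is assumed here and may differ from the paper's exact notation. Client $k \in \{1,\dots,K\}$ draws $S_k = (Z_{k,1},\dots,Z_{k,n})$ i.i.d. from its own distribution $\mu_k$ (heterogeneity means the $\mu_k$ differ), chooses a local model $W_k$, and the server forms $W = \mathsf{A}(W_1,\dots,W_K)$ with some aggregation rule $\mathsf{A}$ (a hypothetical symbol; parameter averaging is one example). With loss $\ell$:

$$
\mathcal{L}(W) = \frac{1}{K}\sum_{k=1}^{K} \mathbb{E}_{Z\sim\mu_k}\big[\ell(Z,W)\big], \qquad
\hat{\mathcal{L}}(S,W) = \frac{1}{nK}\sum_{k=1}^{K}\sum_{i=1}^{n} \ell(Z_{k,i},W),
$$

$$
\operatorname{gen}(S,W) = \mathcal{L}(W) - \hat{\mathcal{L}}(S,W), \qquad S = (S_1,\dots,S_K).
$$

Setting $K=1$ recovers the usual centralized generalization gap, which is the case the CMI bound was originally developed for.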
Submitted 20 May, 2025; v1 submitted 3 March, 2025;
originally announced March 2025.
Output Statistics of Random Binning: Tsallis Divergence and Its Applications
Authors:
Masoud Kavian,
Mohammad Mahdi Mojahedian,
Mohammad Hossein Yassaee,
Mahtab Mirmohseni,
Mohammad Reza Aref
Abstract:
Random binning is a widely used technique in information theory with diverse applications. In this paper, we study the output statistics of random binning (OSRB) under the Tsallis divergence $T_\alpha$. We analyze all values of $\alpha \in (0, \infty)\cup\{\infty\}$ and consider three scenarios: (i) the binned sequence is generated i.i.d., (ii) the sequence is randomly chosen from an $\epsilon$-typical set, and (iii) the sequence originates from an $\epsilon$-typical set and is passed through a non-memoryless virtual channel. Our proofs cover both achievability and converse results. To address the unbounded nature of $T_\infty$, we extend the OSRB framework using the Rényi divergence of order infinity, denoted $D_\infty$. Along the way, we analyze a specific form of Rényi conditional entropy and its properties. Additionally, we apply this framework to derive achievability results for the wiretap channel, with the Tsallis divergence serving as the security measure. The secrecy rate obtained through the OSRB analysis matches the secrecy capacity for $\alpha \in (0, 2]\cup\{\infty\}$ and is a potential candidate for the secrecy capacity when $\alpha \in (2, \infty)$.
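To make the divergences above concrete, here is a minimal Python sketch for finite alphabets. It assumes the common definitions $T_\alpha(P\|Q) = \frac{1}{\alpha-1}\big(\sum_x P(x)^\alpha Q(x)^{1-\alpha} - 1\big)$ (with the Kullback-Leibler divergence as the $\alpha = 1$ limit), $D_\alpha(P\|Q) = \frac{1}{\alpha-1}\log \sum_x P(x)^\alpha Q(x)^{1-\alpha}$, and $D_\infty(P\|Q) = \log \max_x P(x)/Q(x)$, all in nats. The paper's conventions (e.g., log base) may differ, and the function names are illustrative only.

```python
import math
from typing import Sequence


def _power_sum(p: Sequence[float], q: Sequence[float], alpha: float) -> float:
    """Return sum_x P(x)^alpha * Q(x)^(1 - alpha), skipping symbols with P(x) = 0."""
    return sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q) if pi > 0)


def tsallis_divergence(p: Sequence[float], q: Sequence[float], alpha: float) -> float:
    """Tsallis divergence T_alpha(P||Q) for finite distributions (assumed definition).

    alpha == 1 returns the KL divergence in nats (the alpha -> 1 limit).
    Assumes alpha > 0 and Q(x) > 0 wherever P(x) > 0.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if alpha == 1:
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (_power_sum(p, q, alpha) - 1.0) / (alpha - 1.0)


def renyi_divergence(p: Sequence[float], q: Sequence[float], alpha: float) -> float:
    """Renyi divergence D_alpha(P||Q) in nats for finite alpha > 0, alpha != 1."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log(_power_sum(p, q, alpha)) / (alpha - 1.0)


def renyi_divergence_inf(p: Sequence[float], q: Sequence[float]) -> float:
    """Renyi divergence of order infinity: D_inf(P||Q) = log max_x P(x)/Q(x), in nats."""
    return math.log(max(pi / qi for pi, qi in zip(p, q) if pi > 0))


if __name__ == "__main__":
    P = [0.5, 0.5]
    Q = [0.25, 0.75]
    # T_alpha grows quickly with alpha when P != Q ...
    for a in (0.5, 1, 2, 10, 50):
        print(f"alpha={a}: T_alpha={tsallis_divergence(P, Q, a):.4g}")
    # ... whereas D_alpha stays below the finite limit D_inf.
    print("D_50  =", renyi_divergence(P, Q, 50))
    print("D_inf =", renyi_divergence_inf(P, Q))
```

For this example pair, $T_2$ is about $1/3$, $T_{50}$ already exceeds $10^{12}$, and $D_{50}$ stays just below $D_\infty = \log 2$, which illustrates why $T_\infty$ is unbounded while $D_\infty$ remains finite. The two families are linked by the identity $T_\alpha = \frac{e^{(\alpha-1)D_\alpha} - 1}{\alpha - 1}$, which is why both are computed from the same power sum here.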
Submitted 22 November, 2024; v1 submitted 25 April, 2023;
originally announced April 2023.