-
On Global Convergence Rates for Federated Policy Gradient under Heterogeneous Environment
Authors:
Safwan Labbi,
Paul Mangold,
Daniil Tiapkin,
Eric Moulines
Abstract:
Ensuring convergence of policy gradient methods in federated reinforcement learning (FRL) under environment heterogeneity remains a major challenge. In this work, we first establish that heterogeneity, perhaps counter-intuitively, can necessitate optimal policies to be non-deterministic or even time-varying, even in tabular environments. Subsequently, we prove global convergence results for federa…
▽ More
Ensuring convergence of policy gradient methods in federated reinforcement learning (FRL) under environment heterogeneity remains a major challenge. In this work, we first establish that heterogeneity, perhaps counter-intuitively, can necessitate optimal policies to be non-deterministic or even time-varying, even in tabular environments. Subsequently, we prove global convergence results for federated policy gradient (FedPG) algorithms employing local updates, under a Łojasiewicz condition that holds only for each individual agent, in both entropy-regularized and non-regularized scenarios. Crucially, our theoretical analysis shows that FedPG attains linear speed-up with respect to the number of agents, a property central to efficient federated learning. Leveraging insights from our theoretical findings, we introduce b-RS-FedPG, a novel policy gradient method that employs a carefully constructed softmax-inspired parameterization coupled with an appropriate regularization scheme. We further demonstrate explicit convergence rates for b-RS-FedPG toward near-optimal stationary policies. Finally, we demonstrate that empirically both FedPG and b-RS-FedPG consistently outperform federated Q-learning on heterogeneous settings.
△ Less
Submitted 29 May, 2025;
originally announced May 2025.
-
Federated UCBVI: Communication-Efficient Federated Regret Minimization with Heterogeneous Agents
Authors:
Safwan Labbi,
Daniil Tiapkin,
Lorenzo Mancini,
Paul Mangold,
Eric Moulines
Abstract:
In this paper, we present the Federated Upper Confidence Bound Value Iteration algorithm ($\texttt{Fed-UCBVI}$), a novel extension of the $\texttt{UCBVI}$ algorithm (Azar et al., 2017) tailored for the federated learning framework. We prove that the regret of $\texttt{Fed-UCBVI}$ scales as $\tilde{\mathcal{O}}(\sqrt{H^3 |\mathcal{S}| |\mathcal{A}| T / M})$, with a small additional term due to hete…
▽ More
In this paper, we present the Federated Upper Confidence Bound Value Iteration algorithm ($\texttt{Fed-UCBVI}$), a novel extension of the $\texttt{UCBVI}$ algorithm (Azar et al., 2017) tailored for the federated learning framework. We prove that the regret of $\texttt{Fed-UCBVI}$ scales as $\tilde{\mathcal{O}}(\sqrt{H^3 |\mathcal{S}| |\mathcal{A}| T / M})$, with a small additional term due to heterogeneity, where $|\mathcal{S}|$ is the number of states, $|\mathcal{A}|$ is the number of actions, $H$ is the episode length, $M$ is the number of agents, and $T$ is the number of episodes. Notably, in the single-agent setting, this upper bound matches the minimax lower bound up to polylogarithmic factors, while in the multi-agent scenario, $\texttt{Fed-UCBVI}$ has linear speed-up. To conduct our analysis, we introduce a new measure of heterogeneity, which may hold independent theoretical interest. Furthermore, we show that, unlike existing federated reinforcement learning approaches, $\texttt{Fed-UCBVI}$'s communication complexity only marginally increases with the number of agents.
△ Less
Submitted 30 October, 2024;
originally announced October 2024.
-
Joint Channel Selection using FedDRL in V2X
Authors:
Lorenzo Mancini,
Safwan Labbi,
Karim Abed Meraim,
Fouzi Boukhalfa,
Alain Durmus,
Paul Mangold,
Eric Moulines
Abstract:
Vehicle-to-everything (V2X) communication technology is revolutionizing transportation by enabling interactions between vehicles, devices, and infrastructures. This connectivity enhances road safety, transportation efficiency, and driver assistance systems. V2X benefits from Machine Learning, enabling real-time data analysis, better decision-making, and improved traffic predictions, making transpo…
▽ More
Vehicle-to-everything (V2X) communication technology is revolutionizing transportation by enabling interactions between vehicles, devices, and infrastructures. This connectivity enhances road safety, transportation efficiency, and driver assistance systems. V2X benefits from Machine Learning, enabling real-time data analysis, better decision-making, and improved traffic predictions, making transportation safer and more efficient. In this paper, we study the problem of joint channel selection, where vehicles with different technologies choose one or more Access Points (APs) to transmit messages in a network. In this problem, vehicles must learn a strategy for channel selection, based on observations that incorporate vehicles' information (position and speed), network and communication data (Signal-to-Interference-plus-Noise Ratio from past communications), and environmental data (road type). We propose an approach based on Federated Deep Reinforcement Learning (FedDRL), which enables each vehicle to benefit from other vehicles' experiences. Specifically, we apply the federated Proximal Policy Optimization (FedPPO) algorithm to this task. We show that this method improves communication reliability while minimizing transmission costs and channel switches. The efficiency of the proposed solution is assessed via realistic simulations, highlighting the potential of FedDRL to advance V2X technology.
△ Less
Submitted 3 October, 2024;
originally announced October 2024.
-
SCAFFLSA: Taming Heterogeneity in Federated Linear Stochastic Approximation and TD Learning
Authors:
Paul Mangold,
Sergey Samsonov,
Safwan Labbi,
Ilya Levin,
Reda Alami,
Alexey Naumov,
Eric Moulines
Abstract:
In this paper, we analyze the sample and communication complexity of the federated linear stochastic approximation (FedLSA) algorithm. We explicitly quantify the effects of local training with agent heterogeneity. We show that the communication complexity of FedLSA scales polynomially with the inverse of the desired accuracy $ε$. To overcome this, we propose SCAFFLSA a new variant of FedLSA that u…
▽ More
In this paper, we analyze the sample and communication complexity of the federated linear stochastic approximation (FedLSA) algorithm. We explicitly quantify the effects of local training with agent heterogeneity. We show that the communication complexity of FedLSA scales polynomially with the inverse of the desired accuracy $ε$. To overcome this, we propose SCAFFLSA a new variant of FedLSA that uses control variates to correct for client drift, and establish its sample and communication complexities. We show that for statistically heterogeneous agents, its communication complexity scales logarithmically with the desired accuracy, similar to Scaffnew. An important finding is that, compared to the existing results for Scaffnew, the sample complexity scales with the inverse of the number of agents, a property referred to as linear speed-up. Achieving this linear speed-up requires completely new theoretical arguments. We apply the proposed method to federated temporal difference learning with linear function approximation and analyze the corresponding complexity improvements.
△ Less
Submitted 27 May, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.