Search | arXiv e-print repository

Contextual Integrity in LLMs via Reasoning and Reinforcement Learning

Authors: Guangchen Lan, Huseyin A. Inan, Sahar Abdelnabi, Janardhan Kulkarni, Lukas Wutschitz, Reza Shokri, Christopher G. Brinton, Robert Sim

Abstract: As the era of autonomous agents making decisions on behalf of users unfolds, ensuring contextual integrity (CI) -- what is the appropriate information to share while carrying out a certain task -- becomes a central question to the field. We posit that CI demands a form of reasoning where the agent needs to reason about the context in which it is operating. To test this, we first prompt LLMs to rea… ▽ More As the era of autonomous agents making decisions on behalf of users unfolds, ensuring contextual integrity (CI) -- what is the appropriate information to share while carrying out a certain task -- becomes a central question to the field. We posit that CI demands a form of reasoning where the agent needs to reason about the context in which it is operating. To test this, we first prompt LLMs to reason explicitly about CI when deciding what information to disclose. We then extend this approach by developing a reinforcement learning (RL) framework that further instills in models the reasoning necessary to achieve CI. Using a synthetic, automatically created, dataset of only $\sim700$ examples but with diverse contexts and information disclosure norms, we show that our method substantially reduces inappropriate information disclosure while maintaining task performance across multiple model sizes and families. Importantly, improvements transfer from this synthetic dataset to established CI benchmarks such as PrivacyLens that has human annotations and evaluates privacy leakage of AI assistants in actions and tool calls. △ Less

Submitted 29 May, 2025; originally announced June 2025.

ACM Class: I.2.6; I.2.7

arXiv:2506.00374 [pdf, ps, other]

Physics-based Generative Models for Geometrically Consistent and Interpretable Wireless Channel Synthesis

Authors: Satyavrat Wagle, Akshay Malhotra, Shahab Hamidi-Rad, Aditya Sant, David J. Love, Christopher G. Brinton

Abstract: In recent years, machine learning (ML) methods have become increasingly popular in wireless communication systems for several applications. A critical bottleneck for designing ML systems for wireless communications is the availability of realistic wireless channel datasets, which are extremely resource-intensive to produce. To this end, the generation of realistic wireless channels plays a key rol… ▽ More In recent years, machine learning (ML) methods have become increasingly popular in wireless communication systems for several applications. A critical bottleneck for designing ML systems for wireless communications is the availability of realistic wireless channel datasets, which are extremely resource-intensive to produce. To this end, the generation of realistic wireless channels plays a key role in the subsequent design of effective ML algorithms for wireless communication systems. Generative models have been proposed to synthesize channel matrices, but outputs produced by such methods may not correspond to geometrically viable channels and do not provide any insight into the scenario being generated. In this work, we aim to address both these issues by integrating established parametric, physics-based geometric channel (PPGC) modeling frameworks with generative methods to produce realistic channel matrices with interpretable representations in the parameter domain. We show that generative models converge to prohibitively suboptimal stationary points when learning the underlying prior directly over the parameters due to the non-convex PPGC model. To address this limitation, we propose a linearized reformulation of the problem to ensure smooth gradient flow during generative model training, while also providing insights into the underlying physical environment. We evaluate our model against prior baselines by comparing the generated, scenario-specific samples in terms of the 2-Wasserstein distance and through its utility when used for downstream compression tasks. △ Less

Submitted 31 May, 2025; originally announced June 2025.

arXiv:2505.24149 [pdf, ps, other]

RCCDA: Adaptive Model Updates in the Presence of Concept Drift under a Constrained Resource Budget

Authors: Adam Piaseczny, Md Kamran Chowdhury Shisher, Shiqiang Wang, Christopher G. Brinton

Abstract: Machine learning (ML) algorithms deployed in real-world environments are often faced with the challenge of adapting models to concept drift, where the task data distributions are shifting over time. The problem becomes even more difficult when model performance must be maintained under adherence to strict resource constraints. Existing solutions often depend on drift-detection methods that produce… ▽ More Machine learning (ML) algorithms deployed in real-world environments are often faced with the challenge of adapting models to concept drift, where the task data distributions are shifting over time. The problem becomes even more difficult when model performance must be maintained under adherence to strict resource constraints. Existing solutions often depend on drift-detection methods that produce high computational overhead for resource-constrained environments, and fail to provide strict guarantees on resource usage or theoretical performance assurances. To address these shortcomings, we propose RCCDA: a dynamic model update policy that optimizes ML training dynamics while ensuring strict compliance to predefined resource constraints, utilizing only past loss information and a tunable drift threshold. In developing our policy, we analytically characterize the evolution of model loss under concept drift with arbitrary training update decisions. Integrating these results into a Lyapunov drift-plus-penalty framework produces a lightweight policy based on a measurable accumulated loss threshold that provably limits update frequency and cost. Experimental results on three domain generalization datasets demonstrate that our policy outperforms baseline methods in inference accuracy while adhering to strict resource constraints under several schedules of concept drift, making our solution uniquely suited for real-time ML deployments. △ Less

Submitted 29 May, 2025; originally announced May 2025.

arXiv:2505.04873 [pdf, other]

doi 10.1109/COMST.2025.3570288

Federated Learning for Cyber Physical Systems: A Comprehensive Survey

Authors: Minh K. Quan, Pubudu N. Pathirana, Mayuri Wijayasundara, Sujeeva Setunge, Dinh C. Nguyen, Christopher G. Brinton, David J. Love, H. Vincent Poor

Abstract: The integration of machine learning (ML) in cyber physical systems (CPS) is a complex task due to the challenges that arise in terms of real-time decision making, safety, reliability, device heterogeneity, and data privacy. There are also open research questions that must be addressed in order to fully realize the potential of ML in CPS. Federated learning (FL), a distributed approach to ML, has b… ▽ More The integration of machine learning (ML) in cyber physical systems (CPS) is a complex task due to the challenges that arise in terms of real-time decision making, safety, reliability, device heterogeneity, and data privacy. There are also open research questions that must be addressed in order to fully realize the potential of ML in CPS. Federated learning (FL), a distributed approach to ML, has become increasingly popular in recent years. It allows models to be trained using data from decentralized sources. This approach has been gaining popularity in the CPS field, as it integrates computer, communication, and physical processes. Therefore, the purpose of this work is to provide a comprehensive analysis of the most recent developments of FL-CPS, including the numerous application areas, system topologies, and algorithms developed in recent years. The paper starts by discussing recent advances in both FL and CPS, followed by their integration. Then, the paper compares the application of FL in CPS with its applications in the internet of things (IoT) in further depth to show their connections and distinctions. Furthermore, the article scrutinizes how FL is utilized in critical CPS applications, e.g., intelligent transportation systems, cybersecurity services, smart cities, and smart healthcare solutions. The study also includes critical insights and lessons learned from various FL-CPS implementations. The paper's concluding section delves into significant concerns and suggests avenues for further research in this fast-paced and dynamic era. △ Less

Submitted 7 May, 2025; originally announced May 2025.

Comments: This work has been accepted by IEEE Communications Surveys & Tutorials

Report number: 1553-877X

Journal ref: IEEE Communications Surveys & Tutorials 2025

arXiv:2504.15514 [pdf, other]

Learning-Based Two-Way Communications: Algorithmic Framework and Comparative Analysis

Authors: David R. Nickel, Anindya Bijoy Das, David J. Love, Christopher G. Brinton

Abstract: Machine learning (ML)-based feedback channel coding has garnered significant research interest in the past few years. However, there has been limited research exploring ML approaches in the so-called "two-way" setting where two users jointly encode messages and feedback for each other over a shared channel. In this work, we present a general architecture for ML-based two-way feedback coding, and s… ▽ More Machine learning (ML)-based feedback channel coding has garnered significant research interest in the past few years. However, there has been limited research exploring ML approaches in the so-called "two-way" setting where two users jointly encode messages and feedback for each other over a shared channel. In this work, we present a general architecture for ML-based two-way feedback coding, and show how several popular one-way schemes can be converted to the two-way setting through our framework. We compare such schemes against their one-way counterparts, revealing error-rate benefits of ML-based two-way coding in certain signal-to-noise ratio (SNR) regimes. We then analyze the tradeoffs between error performance and computational overhead for three state-of-the-art neural network coding models instantiated in the two-way paradigm. △ Less

Submitted 21 April, 2025; originally announced April 2025.

Comments: Currently under review for IEEE Communications Letters. 5 pages

arXiv:2504.08135 [pdf, other]

Communication-Efficient Cooperative Localization: A Graph Neural Network Approach

Authors: Yinan Zou, Christopher G. Brinton, Vishrant Tripathi

Abstract: Cooperative localization leverages noisy inter-node distance measurements and exchanged wireless messages to estimate node positions in a wireless network. In communication-constrained environments, however, transmitting large messages becomes problematic. In this paper, we propose an approach for communication-efficient cooperative localization that addresses two main challenges. First, cooperati… ▽ More Cooperative localization leverages noisy inter-node distance measurements and exchanged wireless messages to estimate node positions in a wireless network. In communication-constrained environments, however, transmitting large messages becomes problematic. In this paper, we propose an approach for communication-efficient cooperative localization that addresses two main challenges. First, cooperative localization often needs to be performed over wireless networks with loopy graph topologies. Second is the need for designing an algorithm that has low localization error while simultaneously requiring a much lower communication overhead. Existing methods fall short of addressing these two challenges concurrently. To achieve this, we propose a vector quantized message passing neural network (VQ-MPNN) for cooperative localization. Through end-to-end neural network training, VQ-MPNN enables the co-design of node localization and message compression. Specifically, VQ-MPNN treats prior node positions and distance measurements as node and edge features, respectively, which are encoded as node and edge states using a graph neural network. To find an efficient representation for the node state, we construct a vector quantized codebook for all node states such that instead of sending long messages, each node only needs to transmit a codeword index. Numerical evaluations demonstrates that our proposed VQ-MPNN approach can deliver localization errors that are similar to existing approaches while reducing the overall communication overhead by an order of magnitude. △ Less

Submitted 10 April, 2025; originally announced April 2025.

arXiv:2504.06235 [pdf, other]

Decentralized Federated Domain Generalization with Style Sharing: A Formal Modeling and Convergence Analysis

Authors: Shahryar Zehtabi, Dong-Jun Han, Seyyedali Hosseinalipour, Christopher G. Brinton

Abstract: Much of the federated learning (FL) literature focuses on settings where local dataset statistics remain the same between training and testing time. Recent advances in domain generalization (DG) aim to use data from source (training) domains to train a model that generalizes well to data from unseen target (testing) domains. In this paper, we are motivated by two major gaps in existing work on FL… ▽ More Much of the federated learning (FL) literature focuses on settings where local dataset statistics remain the same between training and testing time. Recent advances in domain generalization (DG) aim to use data from source (training) domains to train a model that generalizes well to data from unseen target (testing) domains. In this paper, we are motivated by two major gaps in existing work on FL and DG: (1) the lack of formal mathematical analysis of DG objectives and training processes; and (2) DG research in FL being limited to the conventional star-topology architecture. Addressing the second gap, we develop $\textit{Decentralized Federated Domain Generalization with Style Sharing}$ ($\texttt{StyleDDG}$), a fully decentralized DG algorithm designed to allow devices in a peer-to-peer network to achieve DG based on sharing style information inferred from their datasets. Additionally, we fill the first gap by providing the first systematic approach to mathematically analyzing style-based DG training optimization. We cast existing centralized DG algorithms within our framework, and employ their formalisms to model $\texttt{StyleDDG}$. Based on this, we obtain analytical conditions under which a sub-linear convergence rate of $\texttt{StyleDDG}$ can be obtained. Through experiments on two popular DG datasets, we demonstrate that $\texttt{StyleDDG}$ can obtain significant improvements in accuracy across target domains with minimal added communication overhead compared to decentralized gradient methods that do not employ style sharing. △ Less

Submitted 17 April, 2025; v1 submitted 8 April, 2025; originally announced April 2025.

arXiv:2504.00849 [pdf, other]

Timely Trajectory Reconstruction in Finite Buffer Remote Tracking Systems

Authors: Sunjung Kang, Vishrant Tripathi, Christopher G. Brinton

Abstract: Remote tracking systems play a critical role in applications such as IoT, monitoring, surveillance and healthcare. In such systems, maintaining both real-time state awareness (for online decision making) and accurate reconstruction of historical trajectories (for offline post-processing) are essential. While the Age of Information (AoI) metric has been extensively studied as a measure of freshness… ▽ More Remote tracking systems play a critical role in applications such as IoT, monitoring, surveillance and healthcare. In such systems, maintaining both real-time state awareness (for online decision making) and accurate reconstruction of historical trajectories (for offline post-processing) are essential. While the Age of Information (AoI) metric has been extensively studied as a measure of freshness, it does not capture the accuracy with which past trajectories can be reconstructed. In this work, we investigate reconstruction error as a complementary metric to AoI, addressing the trade-off between timely updates and historical accuracy. Specifically, we consider three policies, each prioritizing different aspects of information management: Keep-Old, Keep-Fresh, and our proposed Inter-arrival-Aware dropping policy. We compare these policies in terms of impact on both AoI and reconstruction error in a remote tracking system with a finite buffer. Through theoretical analysis and numerical simulations of queueing behavior, we demonstrate that while the Keep-Fresh policy minimizes AoI, it does not necessarily minimize reconstruction error. In contrast, our proposed Inter-arrival-Aware dropping policy dynamically adjusts packet retention decisions based on generation times, achieving a balance between AoI and reconstruction error. Our results provide key insights into the design of efficient buffer management policies for resource-constrained IoT networks. △ Less

Submitted 18 May, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

arXiv:2503.23218 [pdf, other]

Multi-Agent Reinforcement Learning for Graph Discovery in D2D-Enabled Federated Learning

Authors: Satyavrat Wagle, Anindya Bijoy Das, David J. Love, Christopher G. Brinton

Abstract: Augmenting federated learning (FL) with device-to-device (D2D) communications can help improve convergence speed and reduce model bias through local information exchange. However, data privacy concerns, trust constraints between devices, and unreliable wireless channels each pose challenges in finding an effective yet resource efficient D2D graph structure. In this paper, we develop a decentralize… ▽ More Augmenting federated learning (FL) with device-to-device (D2D) communications can help improve convergence speed and reduce model bias through local information exchange. However, data privacy concerns, trust constraints between devices, and unreliable wireless channels each pose challenges in finding an effective yet resource efficient D2D graph structure. In this paper, we develop a decentralized reinforcement learning (RL) method for D2D graph discovery that promotes communication of impactful datapoints over reliable links for multiple learning paradigms, while following both data and device-specific trust constraints. An independent RL agent at each device trains a policy to predict the impact of incoming links in a decentralized manner without exposure of local data or significant communication overhead. For supervised settings, the D2D graph aims to improve device-specific label diversity without compromising system-level performance. For semi-supervised settings, we enable this by incorporating distributed label propagation. For unsupervised settings, we develop a variation-based diversity metric which estimates data diversity in terms of occupied latent space. Numerical experiments on five widely used datasets confirm that the data diversity improvements induced by our method increase convergence speed by up to 3 times while reducing energy consumption by up to 5 times. They also show that our method is resilient to stragglers and changes in the aggregation interval. Finally, we show that our method offers scalability benefits for larger system sizes without increases in relative overhead, and adaptability to various downstream FL architectures and to dynamic wireless environments. △ Less

Submitted 5 April, 2025; v1 submitted 29 March, 2025; originally announced March 2025.

arXiv:2503.05988 [pdf, other]

Physics-Informed Generative Approaches for Wireless Channel Modeling

Authors: Satyavrat Wagle, Akshay Malhotra, Shahab Hamidi-Rad, Aditya Sant, David J. Love, Christopher G. Brinton

Abstract: In recent years, machine learning (ML) methods have become increasingly popular in wireless communication systems for several applications. A critical bottleneck for designing ML systems for wireless communications is the availability of realistic wireless channel datasets, which are extremely resource intensive to produce. To this end, the generation of realistic wireless channels plays a key rol… ▽ More In recent years, machine learning (ML) methods have become increasingly popular in wireless communication systems for several applications. A critical bottleneck for designing ML systems for wireless communications is the availability of realistic wireless channel datasets, which are extremely resource intensive to produce. To this end, the generation of realistic wireless channels plays a key role in the subsequent design of effective ML algorithms for wireless communication systems. Generative models have been proposed to synthesize channel matrices, but outputs produced by such methods may not correspond to geometrically viable channels and do not provide any insight into the scenario of interest. In this work, we aim to address both these issues by integrating a parametric, physics-based geometric channel (PBGC) modeling framework with generative methods. To address limitations with gradient flow through the PBGC model, a linearized reformulation is presented, which ensures smooth gradient flow during generative model training, while also capturing insights about the underlying physical environment. We evaluate our model against prior baselines by comparing the generated samples in terms of the 2-Wasserstein distance and through the utility of generated data when used for downstream compression tasks. △ Less

Submitted 26 March, 2025; v1 submitted 7 March, 2025; originally announced March 2025.

arXiv:2502.20565 [pdf, other]

DPZV: Elevating the Tradeoff between Privacy and Utility in Zeroth-Order Vertical Federated Learning

Authors: Jianing Zhang, Evan Chen, Chaoyue Liu, Christopher G. Brinton

Abstract: Vertical Federated Learning (VFL) enables collaborative training with feature-partitioned data, yet remains vulnerable to privacy leakage through gradient transmissions. Standard differential privacy (DP) techniques such as DP-SGD are difficult to apply in this setting due to VFL's distributed nature and the high variance incurred by vector-valued noise. On the other hand, zeroth-order (ZO) optimi… ▽ More Vertical Federated Learning (VFL) enables collaborative training with feature-partitioned data, yet remains vulnerable to privacy leakage through gradient transmissions. Standard differential privacy (DP) techniques such as DP-SGD are difficult to apply in this setting due to VFL's distributed nature and the high variance incurred by vector-valued noise. On the other hand, zeroth-order (ZO) optimization techniques can avoid explicit gradient exposure but lack formal privacy guarantees. In this work, we propose DPZV, the first ZO optimization framework for VFL that achieves tunable DP with performance guarantees. DPZV overcomes these limitations by injecting low-variance scalar noise at the server, enabling controllable privacy with reduced memory overhead. We conduct a comprehensive theoretical analysis showing that DPZV matches the convergence rate of first-order optimization methods while satisfying formal ($ε, δ$)-DP guarantees. Experiments on image and language benchmarks demonstrate that DPZV outperforms several baselines in terms of accuracy under a wide range of privacy constraints ($ε\le 10$), thereby elevating the privacy-utility tradeoff in VFL. △ Less

Submitted 19 May, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

arXiv:2502.11007 [pdf, other]

Local-Cloud Inference Offloading for LLMs in Multi-Modal, Multi-Task, Multi-Dialogue Settings

Authors: Liangqi Yuan, Dong-Jun Han, Shiqiang Wang, Christopher G. Brinton

Abstract: Compared to traditional machine learning models, recent large language models (LLMs) can exhibit multi-task-solving capabilities through multiple dialogues and multi-modal data sources. These unique characteristics of LLMs, together with their large model size, make their deployment more challenging. Specifically, (i) deploying LLMs on local devices faces computational, memory, and energy resource… ▽ More Compared to traditional machine learning models, recent large language models (LLMs) can exhibit multi-task-solving capabilities through multiple dialogues and multi-modal data sources. These unique characteristics of LLMs, together with their large model size, make their deployment more challenging. Specifically, (i) deploying LLMs on local devices faces computational, memory, and energy resource issues, while (ii) deploying them in the cloud cannot guarantee real-time service and incurs communication/usage costs. In this paper, we design TMO, a local-cloud LLM inference system with Three-M Offloading: Multi-modal, Multi-task, and Multi-dialogue. TMO incorporates (i) a lightweight local LLM that can process simple tasks at high speed and (ii) a large-scale cloud LLM that can handle multi-modal data sources. We develop a resource-constrained reinforcement learning (RCRL) strategy for TMO that optimizes the inference location (i.e., local vs. cloud) and multi-modal data sources to use for each task/dialogue, aiming to maximize the long-term reward (response quality, latency, and usage cost) while adhering to resource constraints. We also contribute M4A1, a new dataset we curated that contains reward and cost metrics across multiple modality, task, dialogue, and LLM configurations, enabling evaluation of offloading decisions. We demonstrate the effectiveness of TMO compared to several exploration-decision and LLM-as-Agent baselines, showing significant improvements in latency, cost, and response quality. △ Less

Submitted 7 April, 2025; v1 submitted 16 February, 2025; originally announced February 2025.

arXiv:2502.02877 [pdf, other]

Differentially-Private Multi-Tier Federated Learning: A Formal Analysis and Evaluation

Authors: Evan Chen, Frank Po-Chen Lin, Dong-Jun Han, Christopher G. Brinton

Abstract: While federated learning (FL) eliminates the transmission of raw data over a network, it is still vulnerable to privacy breaches from the communicated model parameters. Differential privacy (DP) is often employed to address such issues. However, the impact of DP on FL in multi-tier networks -- where hierarchical aggregations couple noise injection decisions at different tiers, and trust models are… ▽ More While federated learning (FL) eliminates the transmission of raw data over a network, it is still vulnerable to privacy breaches from the communicated model parameters. Differential privacy (DP) is often employed to address such issues. However, the impact of DP on FL in multi-tier networks -- where hierarchical aggregations couple noise injection decisions at different tiers, and trust models are heterogeneous across subnetworks -- is not well understood. To fill this gap, we develop \underline{M}ulti-Tier \underline{F}ederated Learning with \underline{M}ulti-Tier \underline{D}ifferential \underline{P}rivacy ({\tt M$^2$FDP}), a DP-enhanced FL methodology for jointly optimizing privacy and performance over such networks. One of the key principles of {\tt M$^2$FDP} is to adapt DP noise injection across the established edge/fog computing hierarchy (e.g., edge devices, intermediate nodes, and other tiers up to cloud servers) according to the trust models in different subnetworks. We conduct a comprehensive analysis of the convergence behavior of {\tt M$^2$FDP} under non-convex problem settings, revealing conditions on parameter tuning under which the training process converges sublinearly to a finite stationarity gap that depends on the network hierarchy, trust model, and target privacy level. We show how these relationships can be employed to develop an adaptive control algorithm for {\tt M$^2$FDP} that tunes properties of local model training to minimize energy, latency, and the stationarity gap while meeting desired convergence and privacy criterion. Subsequent numerical evaluations demonstrate that {\tt M$^2$FDP} obtains substantial improvements in these metrics over baselines for different privacy budgets and system configurations. △ Less

Submitted 4 February, 2025; originally announced February 2025.

Comments: This paper is under review in IEEE/ACM Transactions on Networking Special Issue on AI and Networking

arXiv:2501.19389 [pdf, other]

Federated Sketching LoRA: On-Device Collaborative Fine-Tuning of Large Language Models

Authors: Wenzhi Fang, Dong-Jun Han, Liangqi Yuan, Seyyedali Hosseinalipour, Christopher G. Brinton

Abstract: Fine-tuning large language models (LLMs) on devices remains a challenging problem. Recent works have fused low-rank adaptation (LoRA) techniques with federated fine-tuning to mitigate challenges associated with device model sizes and data scarcity. Still, the heterogeneity of resources remains a critical bottleneck: while higher-rank modules generally enhance performance, varying device capabiliti… ▽ More Fine-tuning large language models (LLMs) on devices remains a challenging problem. Recent works have fused low-rank adaptation (LoRA) techniques with federated fine-tuning to mitigate challenges associated with device model sizes and data scarcity. Still, the heterogeneity of resources remains a critical bottleneck: while higher-rank modules generally enhance performance, varying device capabilities constrain LoRA's feasible rank range. Existing approaches attempting to resolve this issue either lack analytical justification or impose additional computational overhead, leaving a wide gap for efficient and theoretically-grounded solutions. To address these challenges, we propose federated sketching LoRA (FSLoRA), which leverages a sketching mechanism to enable devices to selectively update submatrices of global LoRA modules maintained by the server. By adjusting the sketching ratios, which determine the ranks of the submatrices on the devices, FSLoRA flexibly adapts to device-specific communication and computational constraints. We provide a rigorous convergence analysis of FSLoRA that characterizes how the sketching ratios affect the convergence rate. Through comprehensive experiments on multiple datasets and LLM models, we demonstrate FSLoRA's performance improvements compared to various baselines. The code is available at https://github.com/wenzhifang/Federated-Sketching-LoRA-Implementation. △ Less

Submitted 17 May, 2025; v1 submitted 31 January, 2025; originally announced January 2025.

Comments: 27 pages

arXiv:2501.14205 [pdf, other]

Serving Long-Context LLMs at the Mobile Edge: Test-Time Reinforcement Learning-based Model Caching and Inference Offloading

Authors: Minrui Xu, Dusit Niyato, Christopher G. Brinton

Abstract: Large Language Models (LLMs) can perform zero-shot learning on unseen tasks and few-shot learning on complex reasoning tasks. However, resource-limited mobile edge networks struggle to support long-context LLM serving for LLM agents during multi-round interactions with users. Unlike stateless computation offloading and static service offloading in edge computing, optimizing LLM serving at edge ser… ▽ More Large Language Models (LLMs) can perform zero-shot learning on unseen tasks and few-shot learning on complex reasoning tasks. However, resource-limited mobile edge networks struggle to support long-context LLM serving for LLM agents during multi-round interactions with users. Unlike stateless computation offloading and static service offloading in edge computing, optimizing LLM serving at edge servers is challenging because LLMs continuously learn from context which raises accuracy, latency, and resource consumption dynamics. In this paper, we propose a joint model caching and inference offloading framework that utilizes test-time deep reinforcement learning (T2DRL) to optimize deployment and execution strategies for long-context LLM serving. In this framework, we analyze the performance convergence and design an optimization problem considering the utilization of context windows in LLMs. Furthermore, the T2DRL algorithm can learn in both the training phase and the testing phase to proactively manage cached models and service requests and adapt to context changes and usage patterns during execution. To further enhance resource allocation efficiency, we propose a double Dutch auction (DDA) mechanism, which dynamically matches supply and demand while maximizing social welfare. Finally, experimental results demonstrate that the T2DRL algorithm can reduce system costs by at least 30% compared to baselines while guaranteeing the performance of LLM agents in real-world perception and reasoning tasks. △ Less

Submitted 23 January, 2025; originally announced January 2025.

arXiv:2501.09320 [pdf, other]

Cooperative Decentralized Backdoor Attacks on Vertical Federated Learning

Authors: Seohyun Lee, Wenzhi Fang, Anindya Bijoy Das, Seyyedali Hosseinalipour, David J. Love, Christopher G. Brinton

Abstract: Federated learning (FL) is vulnerable to backdoor attacks, where adversaries alter model behavior on target classification labels by embedding triggers into data samples. While these attacks have received considerable attention in horizontal FL, they are less understood for vertical FL (VFL), where devices hold different features of the samples, and only the server holds the labels. In this work,… ▽ More Federated learning (FL) is vulnerable to backdoor attacks, where adversaries alter model behavior on target classification labels by embedding triggers into data samples. While these attacks have received considerable attention in horizontal FL, they are less understood for vertical FL (VFL), where devices hold different features of the samples, and only the server holds the labels. In this work, we propose a novel backdoor attack on VFL which (i) does not rely on gradient information from the server and (ii) considers potential collusion among multiple adversaries for sample selection and trigger embedding. Our label inference model augments variational autoencoders with metric learning, which adversaries can train locally. A consensus process over the adversary graph topology determines which datapoints to poison. We further propose methods for trigger splitting across the adversaries, with an intensity-based implantation scheme skewing the server towards the trigger. Our convergence analysis reveals the impact of backdoor perturbations on VFL indicated by a stationarity gap for the trained model, which we verify empirically as well. We conduct experiments comparing our attack with recent backdoor VFL approaches, finding that ours obtains significantly higher success rates for the same main task performance despite not using server information. Additionally, our results verify the impact of collusion on attack performance. △ Less

Submitted 16 January, 2025; originally announced January 2025.

Comments: This paper is currently under review in the IEEE/ACM Transactions on Networking Special Issue on AI and Networking

arXiv:2501.04231 [pdf, other]

Computation and Communication Co-scheduling for Timely Multi-Task Inference at the Wireless Edge

Authors: Md Kamran Chowdhury Shisher, Adam Piaseczny, Yin Sun, Christopher G. Brinton

Abstract: In multi-task remote inference systems, an intelligent receiver (e.g., command center) performs multiple inference tasks (e.g., target detection) using data features received from several remote sources (e.g., edge sensors). Key challenges to facilitating timely inference in these systems arise from (i) limited computational power of the sources to produce features from their inputs, and (ii) limi… ▽ More In multi-task remote inference systems, an intelligent receiver (e.g., command center) performs multiple inference tasks (e.g., target detection) using data features received from several remote sources (e.g., edge sensors). Key challenges to facilitating timely inference in these systems arise from (i) limited computational power of the sources to produce features from their inputs, and (ii) limited communication resources of the channels to carry simultaneous feature transmissions to the receiver. We develop a novel computation and communication co-scheduling methodology which determines feature generation and transmission scheduling to minimize inference errors subject to these resource constraints. Specifically, we formulate the co-scheduling problem as a weakly-coupled Markov decision process with Age of Information (AoI)-based timeliness gauging the inference errors. To overcome its PSPACE-hard complexity, we analyze a Lagrangian relaxation of the problem, which yields gain indices assessing the improvement in inference error for each potential feature generation-transmission scheduling action. Based on this, we develop a maximum gain first (MGF) policy which we show is asymptotically optimal for the original problem as the number of inference tasks increases. Experiments demonstrate that MGF obtains significant improvements over baseline policies for varying tasks, channels, and sources. △ Less

Submitted 7 January, 2025; originally announced January 2025.

arXiv:2412.07029 [pdf, other]

Key Focus Areas and Enabling Technologies for 6G

Authors: Christopher G. Brinton, Mung Chiang, Kwang Taik Kim, David J. Love, Michael Beesley, Morris Repeta, John Roese, Per Beming, Erik Ekudden, Clara Li, Geng Wu, Nishant Batra, Amitava Ghosh, Volker Ziegler, Tingfang Ji, Rajat Prakash, John Smee

Abstract: We provide a taxonomy of a dozen enabling network architectures, protocols, and technologies that will define the evolution from 5G to 6G. These technologies span the network protocol stack, different target deployment environments, and various perceived levels of technical maturity. We outline four areas of societal focus that will be impacted by these technologies, and overview several research… ▽ More We provide a taxonomy of a dozen enabling network architectures, protocols, and technologies that will define the evolution from 5G to 6G. These technologies span the network protocol stack, different target deployment environments, and various perceived levels of technical maturity. We outline four areas of societal focus that will be impacted by these technologies, and overview several research directions that hold the potential to address the problems in these important focus areas. △ Less

Submitted 16 December, 2024; v1 submitted 9 December, 2024; originally announced December 2024.

Comments: This paper has been accepted for publication in the IEEE Communications Magazine. Portions were released online as a report titled 6G Roadmap: A Global Taxonomy in November 2023

arXiv:2411.06618 [pdf, other]

Using Diffusion Models as Generative Replay in Continual Federated Learning -- What will Happen?

Authors: Yongsheng Mei, Liangqi Yuan, Dong-Jun Han, Kevin S. Chan, Christopher G. Brinton, Tian Lan

Abstract: Federated learning (FL) has become a cornerstone in decentralized learning, where, in many scenarios, the incoming data distribution will change dynamically over time, introducing continuous learning (CL) problems. This continual federated learning (CFL) task presents unique challenges, particularly regarding catastrophic forgetting and non-IID input data. Existing solutions include using a replay… ▽ More Federated learning (FL) has become a cornerstone in decentralized learning, where, in many scenarios, the incoming data distribution will change dynamically over time, introducing continuous learning (CL) problems. This continual federated learning (CFL) task presents unique challenges, particularly regarding catastrophic forgetting and non-IID input data. Existing solutions include using a replay buffer to store historical data or leveraging generative adversarial networks. Nevertheless, motivated by recent advancements in the diffusion model for generative tasks, this paper introduces DCFL, a novel framework tailored to address the challenges of CFL in dynamic distributed learning environments. Our approach harnesses the power of the conditional diffusion model to generate synthetic historical data at each local device during communication, effectively mitigating latent shifts in dynamic data distribution inputs. We provide the convergence bound for the proposed CFL framework and demonstrate its promising performance across multiple datasets, showcasing its effectiveness in tackling the complexities of CFL tasks. △ Less

Submitted 10 November, 2024; originally announced November 2024.

arXiv:2411.03365 [pdf, other]

Enhanced Real-Time Threat Detection in 5G Networks: A Self-Attention RNN Autoencoder Approach for Spectral Intrusion Analysis

Authors: Mohammadreza Kouchaki, Minglong Zhang, Aly S. Abdalla, Guangchen Lan, Christopher G. Brinton, Vuk Marojevic

Abstract: In the rapidly evolving landscape of 5G technology, safeguarding Radio Frequency (RF) environments against sophisticated intrusions is paramount, especially in dynamic spectrum access and management. This paper presents an enhanced experimental model that integrates a self-attention mechanism with a Recurrent Neural Network (RNN)-based autoencoder for the detection of anomalous spectral activities… ▽ More In the rapidly evolving landscape of 5G technology, safeguarding Radio Frequency (RF) environments against sophisticated intrusions is paramount, especially in dynamic spectrum access and management. This paper presents an enhanced experimental model that integrates a self-attention mechanism with a Recurrent Neural Network (RNN)-based autoencoder for the detection of anomalous spectral activities in 5G networks at the waveform level. Our approach, grounded in time-series analysis, processes in-phase and quadrature (I/Q) samples to identify irregularities that could indicate potential jamming attacks. The model's architecture, augmented with a self-attention layer, extends the capabilities of RNN autoencoders, enabling a more nuanced understanding of temporal dependencies and contextual relationships within the RF spectrum. Utilizing a simulated 5G Radio Access Network (RAN) test-bed constructed with srsRAN 5G and Software Defined Radios (SDRs), we generated a comprehensive stream of data that reflects real-world RF spectrum conditions and attack scenarios. The model is trained to reconstruct standard signal behavior, establishing a normative baseline against which deviations, indicative of security threats, are identified. The proposed architecture is designed to balance between detection precision and computational efficiency, so the LSTM network, enriched with self-attention, continues to optimize for minimal execution latency and power consumption. Conducted on a real-world SDR-based testbed, our results demonstrate the model's improved performance and accuracy in threat detection. Keywords: self-attention, real-time intrusion detection, RNN autoencoder, Transformer architecture, LSTM, time series anomaly detection, 5G Security, spectrum access security. △ Less

Submitted 5 November, 2024; originally announced November 2024.

Comments: This article has been accepted for publication in WiOpt 2024

arXiv:2410.05662 [pdf, ps, other]

Federated Learning with Dynamic Client Arrival and Departure: Convergence and Rapid Adaptation via Initial Model Construction

Authors: Zhan-Lun Chang, Dong-Jun Han, Seyyedali Hosseinalipour, Mung Chiang, Christopher G. Brinton

Abstract: Most federated learning (FL) approaches assume a fixed client set. However, real-world scenarios often involve clients dynamically joining or leaving the system based on their needs or interest in specific tasks. This dynamic setting introduces unique challenges: (1) the optimization objective evolves with the active client set, unlike traditional FL with a static objective; and (2) the current gl… ▽ More Most federated learning (FL) approaches assume a fixed client set. However, real-world scenarios often involve clients dynamically joining or leaving the system based on their needs or interest in specific tasks. This dynamic setting introduces unique challenges: (1) the optimization objective evolves with the active client set, unlike traditional FL with a static objective; and (2) the current global model may no longer serve as an effective initialization for subsequent rounds, potentially hindering adaptation. To address these challenges, we first provide a convergence analysis under a non-convex loss with a dynamic client set, accounting for factors such as gradient noise, local training iterations, and data heterogeneity. Building on this analysis, we propose a model initialization algorithm that enables rapid adaptation to new client sets whenever clients join or leave the system. Our key idea is to compute a weighted average of previous global models, guided by gradient similarity, to prioritize models trained on data distributions that closely align with the current client set, thereby accelerating recovery from distribution shifts. This plug-and-play algorithm is designed to integrate seamlessly with existing FL methods, offering broad applicability in practice. Experimental results on diverse datasets including both image and text domains, varied label distributions, and multiple FL algorithms demonstrate the effectiveness of the proposed approach across a range of scenarios. △ Less

Submitted 30 May, 2025; v1 submitted 7 October, 2024; originally announced October 2024.

arXiv:2409.18448 [pdf, other]

Hierarchical Federated Learning with Multi-Timescale Gradient Correction

Authors: Wenzhi Fang, Dong-Jun Han, Evan Chen, Shiqiang Wang, Christopher G. Brinton

Abstract: While traditional federated learning (FL) typically focuses on a star topology where clients are directly connected to a central server, real-world distributed systems often exhibit hierarchical architectures. Hierarchical FL (HFL) has emerged as a promising solution to bridge this gap, leveraging aggregation points at multiple levels of the system. However, existing algorithms for HFL encounter c… ▽ More While traditional federated learning (FL) typically focuses on a star topology where clients are directly connected to a central server, real-world distributed systems often exhibit hierarchical architectures. Hierarchical FL (HFL) has emerged as a promising solution to bridge this gap, leveraging aggregation points at multiple levels of the system. However, existing algorithms for HFL encounter challenges in dealing with multi-timescale model drift, i.e., model drift occurring across hierarchical levels of data heterogeneity. In this paper, we propose a multi-timescale gradient correction (MTGC) methodology to resolve this issue. Our key idea is to introduce distinct control variables to (i) correct the client gradient towards the group gradient, i.e., to reduce client model drift caused by local updates based on individual datasets, and (ii) correct the group gradient towards the global gradient, i.e., to reduce group model drift caused by FL over clients within the group. We analytically characterize the convergence behavior of MTGC under general non-convex settings, overcoming challenges associated with couplings between correction terms. We show that our convergence bound is immune to the extent of data heterogeneity, confirming the stability of the proposed algorithm against multi-level non-i.i.d. data. Through extensive experiments on various datasets and models, we validate the effectiveness of MTGC in diverse HFL settings. The code for this project is available at \href{https://github.com/wenzhifang/MTGC}{https://github.com/wenzhifang/MTGC}. △ Less

Submitted 16 December, 2024; v1 submitted 27 September, 2024; originally announced September 2024.

Comments: Accepted to NeurIPS 2024

arXiv:2409.17430 [pdf, other]

A Hierarchical Gradient Tracking Algorithm for Mitigating Subnet-Drift in Fog Learning Networks

Authors: Evan Chen, Shiqiang Wang, Christopher G. Brinton

Abstract: Federated learning (FL) encounters scalability challenges when implemented over fog networks that do not follow FL's conventional star topology architecture. Semi-decentralized FL (SD-FL) has proposed a solution for device-to-device (D2D) enabled networks that divides model cooperation into two stages: at the lower stage, D2D communications is employed for local model aggregations within subnetwor… ▽ More Federated learning (FL) encounters scalability challenges when implemented over fog networks that do not follow FL's conventional star topology architecture. Semi-decentralized FL (SD-FL) has proposed a solution for device-to-device (D2D) enabled networks that divides model cooperation into two stages: at the lower stage, D2D communications is employed for local model aggregations within subnetworks (subnets), while the upper stage handles device-server (DS) communications for global model aggregations. However, existing SD-FL schemes are based on gradient diversity assumptions that become performance bottlenecks as data distributions become more heterogeneous. In this work, we develop semi-decentralized gradient tracking (SD-GT), the first SD-FL methodology that removes the need for such assumptions by incorporating tracking terms into device updates for each communication layer. Our analytical characterization of SD-GT reveals upper bounds on convergence for non-convex, convex, and strongly-convex problems. We show how the bounds enable the development of an optimization algorithm that navigates the performance-efficiency trade-off by tuning subnet sampling rate and D2D rounds for each global training interval. Our subsequent numerical evaluations demonstrate that SD-GT obtains substantial improvements in trained model quality and communication cost relative to baselines in SD-FL and gradient tracking on several datasets. △ Less

Submitted 25 September, 2024; originally announced September 2024.

Comments: This paper is under review in IEEE/ACM Transactions on Networking

arXiv:2408.09522 [pdf, other]

Orchestrating Federated Learning in Space-Air-Ground Integrated Networks: Adaptive Data Offloading and Seamless Handover

Authors: Dong-Jun Han, Wenzhi Fang, Seyyedali Hosseinalipour, Mung Chiang, Christopher G. Brinton

Abstract: Devices located in remote regions often lack coverage from well-developed terrestrial communication infrastructure. This not only prevents them from experiencing high quality communication services but also hinders the delivery of machine learning services in remote regions. In this paper, we propose a new federated learning (FL) methodology tailored to space-air-ground integrated networks (SAGINs… ▽ More Devices located in remote regions often lack coverage from well-developed terrestrial communication infrastructure. This not only prevents them from experiencing high quality communication services but also hinders the delivery of machine learning services in remote regions. In this paper, we propose a new federated learning (FL) methodology tailored to space-air-ground integrated networks (SAGINs) to tackle this issue. Our approach strategically leverages the nodes within space and air layers as both (i) edge computing units and (ii) model aggregators during the FL process, addressing the challenges that arise from the limited computation powers of ground devices and the absence of terrestrial base stations in the target region. The key idea behind our methodology is the adaptive data offloading and handover procedures that incorporate various network dynamics in SAGINs, including the mobility, heterogeneous computation powers, and inconsistent coverage times of incoming satellites. We analyze the latency of our scheme and develop an adaptive data offloading optimizer, and also characterize the theoretical convergence bound of our proposed algorithm. Experimental results confirm the advantage of our SAGIN-assisted FL methodology in terms of training time and test accuracy compared with various baselines. △ Less

Submitted 18 August, 2024; originally announced August 2024.

Comments: This paper is accepted for publication in IEEE Journal on Selected Areas in Communications (JSAC)

arXiv:2408.05152 [pdf, other]

Sparsity-Preserving Encodings for Straggler-Optimal Distributed Matrix Computations at the Edge

Authors: Anindya Bijoy Das, Aditya Ramamoorthy, David J. Love, Christopher G. Brinton

Abstract: Matrix computations are a fundamental building-block of edge computing systems, with a major recent uptick in demand due to their use in AI/ML training and inference procedures. Existing approaches for distributing matrix computations involve allocating coded combinations of submatrices to worker nodes, to build resilience to slower nodes, called stragglers. In the edge learning context, however,… ▽ More Matrix computations are a fundamental building-block of edge computing systems, with a major recent uptick in demand due to their use in AI/ML training and inference procedures. Existing approaches for distributing matrix computations involve allocating coded combinations of submatrices to worker nodes, to build resilience to slower nodes, called stragglers. In the edge learning context, however, these approaches will compromise sparsity properties that are often present in the original matrices found at the edge server. In this study, we consider the challenge of augmenting such approaches to preserve input sparsity when distributing the task across edge devices, thereby retaining the associated computational efficiency enhancements. First, we find a lower bound on the weight of coding, i.e., the number of submatrices to be combined to obtain coded submatrices, to provide the resilience to the maximum possible number of straggler devices (for given number of devices and their storage constraints). Next we propose distributed matrix computation schemes which meet the exact lower bound on the weight of the coding. Numerical experiments conducted in Amazon Web Services (AWS) validate our assertions regarding straggler mitigation and computation speed for sparse matrices. △ Less

Submitted 9 August, 2024; originally announced August 2024.

Comments: arXiv admin note: text overlap with arXiv:2308.04331

arXiv:2404.15374 [pdf, other]

doi 10.1109/JSAC.2024.3413977

Minimum Description Feature Selection for Complexity Reduction in Machine Learning-based Wireless Positioning

Authors: Myeung Suk Oh, Anindya Bijoy Das, Taejoon Kim, David J. Love, Christopher G. Brinton

Abstract: Recently, deep learning approaches have provided solutions to difficult problems in wireless positioning (WP). Although these WP algorithms have attained excellent and consistent performance against complex channel environments, the computational complexity coming from processing high-dimensional features can be prohibitive for mobile applications. In this work, we design a novel positioning neura… ▽ More Recently, deep learning approaches have provided solutions to difficult problems in wireless positioning (WP). Although these WP algorithms have attained excellent and consistent performance against complex channel environments, the computational complexity coming from processing high-dimensional features can be prohibitive for mobile applications. In this work, we design a novel positioning neural network (P-NN) that utilizes the minimum description features to substantially reduce the complexity of deep learning-based WP. P-NN's feature selection strategy is based on maximum power measurements and their temporal locations to convey information needed to conduct WP. We improve P-NN's learning ability by intelligently processing two different types of inputs: sparse image and measurement matrices. Specifically, we implement a self-attention layer to reinforce the training ability of our network. We also develop a technique to adapt feature space size, optimizing over the expected information gain and the classification capability quantified with information-theoretic measures on signal bin selection. Numerical results show that P-NN achieves a significant advantage in performance-complexity tradeoff over deep learning baselines that leverage the full power delay profile (PDP). In particular, we find that P-NN achieves a large improvement in performance for low SNR, as unnecessary measurements are discarded in our minimum description features. △ Less

Submitted 18 August, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

Comments: This paper has been accepted for the publication in IEEE Journal on Selected Areas in Communications. arXiv admin note: text overlap with arXiv:2402.09580

arXiv:2404.14319

Multi-Agent Hybrid SAC for Joint SS-DSA in CRNs

Authors: David R. Nickel, Anindya Bijoy Das, David J. Love, Christopher G. Brinton

Abstract: Opportunistic spectrum access has the potential to increase the efficiency of spectrum utilization in cognitive radio networks (CRNs). In CRNs, both spectrum sensing and resource allocation (SSRA) are critical to maximizing system throughput while minimizing collisions of secondary users with the primary network. However, many works in dynamic spectrum access do not consider the impact of imperfec… ▽ More Opportunistic spectrum access has the potential to increase the efficiency of spectrum utilization in cognitive radio networks (CRNs). In CRNs, both spectrum sensing and resource allocation (SSRA) are critical to maximizing system throughput while minimizing collisions of secondary users with the primary network. However, many works in dynamic spectrum access do not consider the impact of imperfect sensing information such as mis-detected channels, which the additional information available in joint SSRA can help remediate. In this work, we examine joint SSRA as an optimization which seeks to maximize a CRN's net communication rate subject to constraints on channel sensing, channel access, and transmit power. Given the non-trivial nature of the problem, we leverage multi-agent reinforcement learning to enable a network of secondary users to dynamically access unoccupied spectrum via only local test statistics, formulated under the energy detection paradigm of spectrum sensing. In doing so, we develop a novel multi-agent implementation of hybrid soft actor critic, MHSAC, based on the QMIX mixing scheme. Through experiments, we find that our SSRA algorithm, HySSRA, is successful in maximizing the CRN's utilization of spectrum resources while also limiting its interference with the primary network, and outperforms the current state-of-the-art by a wide margin. We also explore the impact of wireless variations such as coherence time on the efficacy of the system. △ Less

Submitted 9 December, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: Upon further exploration, model is not converging as expected under current formulation. We are working to update the inputs and objective so that it performs in an expected manner

arXiv:2404.09861 [pdf, other]

Unsupervised Federated Optimization at the Edge: D2D-Enabled Learning without Labels

Authors: Satyavrat Wagle, Seyyedali Hosseinalipour, Naji Khosravan, Christopher G. Brinton

Abstract: Federated learning (FL) is a popular solution for distributed machine learning (ML). While FL has traditionally been studied for supervised ML tasks, in many applications, it is impractical to assume availability of labeled data across devices. To this end, we develop Cooperative Federated unsupervised Contrastive Learning ({\tt CF-CL)} to facilitate FL across edge devices with unlabeled datasets.… ▽ More Federated learning (FL) is a popular solution for distributed machine learning (ML). While FL has traditionally been studied for supervised ML tasks, in many applications, it is impractical to assume availability of labeled data across devices. To this end, we develop Cooperative Federated unsupervised Contrastive Learning ({\tt CF-CL)} to facilitate FL across edge devices with unlabeled datasets. {\tt CF-CL} employs local device cooperation where either explicit (i.e., raw data) or implicit (i.e., embeddings) information is exchanged through device-to-device (D2D) communications to improve local diversity. Specifically, we introduce a \textit{smart information push-pull} methodology for data/embedding exchange tailored to FL settings with either soft or strict data privacy restrictions. Information sharing is conducted through a probabilistic importance sampling technique at receivers leveraging a carefully crafted reserve dataset provided by transmitters. In the implicit case, embedding exchange is further integrated into the local ML training at the devices via a regularization term incorporated into the contrastive loss, augmented with a dynamic contrastive margin to adjust the volume of latent space explored. Numerical evaluations demonstrate that {\tt CF-CL} leads to alignment of latent spaces learned across devices, results in faster and more efficient global model training, and is effective in extreme non-i.i.d. data distribution settings across devices. △ Less

Submitted 15 April, 2024; originally announced April 2024.

Comments: 16 pages, 11 figures

arXiv:2404.08003 [pdf, other]

Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis

Authors: Guangchen Lan, Dong-Jun Han, Abolfazl Hashemi, Vaneet Aggarwal, Christopher G. Brinton

Abstract: To improve the efficiency of reinforcement learning (RL), we propose a novel asynchronous federated reinforcement learning (FedRL) framework termed AFedPG, which constructs a global model through collaboration among $N$ agents using policy gradient (PG) updates. To address the challenge of lagged policies in asynchronous settings, we design a delay-adaptive lookahead technique \textit{specifically… ▽ More To improve the efficiency of reinforcement learning (RL), we propose a novel asynchronous federated reinforcement learning (FedRL) framework termed AFedPG, which constructs a global model through collaboration among $N$ agents using policy gradient (PG) updates. To address the challenge of lagged policies in asynchronous settings, we design a delay-adaptive lookahead technique \textit{specifically for FedRL} that can effectively handle heterogeneous arrival times of policy gradients. We analyze the theoretical global convergence bound of AFedPG, and characterize the advantage of the proposed algorithm in terms of both the sample complexity and time complexity. Specifically, our AFedPG method achieves $O(\frac{ε^{-2.5}}{N})$ sample complexity for global convergence at each agent on average. Compared to the single agent setting with $O(ε^{-2.5})$ sample complexity, it enjoys a linear speedup with respect to the number of agents. Moreover, compared to synchronous FedPG, AFedPG improves the time complexity from $O(\frac{t_{\max}}{N})$ to $O({\sum_{i=1}^{N} \frac{1}{t_{i}}})^{-1}$, where $t_{i}$ denotes the time consumption in each iteration at agent $i$, and $t_{\max}$ is the largest one. The latter complexity $O({\sum_{i=1}^{N} \frac{1}{t_{i}}})^{-1}$ is always smaller than the former one, and this improvement becomes significant in large-scale federated settings with heterogeneous computing powers ($t_{\max}\gg t_{\min}$). Finally, we empirically verify the improved performance of AFedPG in four widely used MuJoCo environments with varying numbers of agents. We also demonstrate the advantages of AFedPG in various computing heterogeneity scenarios. △ Less

Submitted 23 January, 2025; v1 submitted 9 April, 2024; originally announced April 2024.

Comments: Published as a conference paper at ICLR 2025

ACM Class: I.2.6; I.2.11

arXiv:2402.09629 [pdf, other]

Smart Information Exchange for Unsupervised Federated Learning via Reinforcement Learning

Authors: Seohyun Lee, Anindya Bijoy Das, Satyavrat Wagle, Christopher G. Brinton

Abstract: One of the main challenges of decentralized machine learning paradigms such as Federated Learning (FL) is the presence of local non-i.i.d. datasets. Device-to-device transfers (D2D) between distributed devices has been shown to be an effective tool for dealing with this problem and robust to stragglers. In an unsupervised case, however, it is not obvious how data exchanges should take place due to… ▽ More One of the main challenges of decentralized machine learning paradigms such as Federated Learning (FL) is the presence of local non-i.i.d. datasets. Device-to-device transfers (D2D) between distributed devices has been shown to be an effective tool for dealing with this problem and robust to stragglers. In an unsupervised case, however, it is not obvious how data exchanges should take place due to the absence of labels. In this paper, we propose an approach to create an optimal graph for data transfer using Reinforcement Learning. The goal is to form links that will provide the most benefit considering the environment's constraints and improve convergence speed in an unsupervised FL environment. Numerical analysis shows the advantages in terms of convergence speed and straggler resilience of the proposed method to different available FL schemes and benchmark datasets. △ Less

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2402.09580 [pdf, other]

doi 10.1109/ICC51166.2024.10622841

Complexity Reduction in Machine Learning-Based Wireless Positioning: Minimum Description Features

Authors: Myeung Suk Oh, Anindya Bijoy Das, Taejoon Kim, David J. Love, Christopher G. Brinton

Abstract: A recent line of research has been investigating deep learning approaches to wireless positioning (WP). Although these WP algorithms have demonstrated high accuracy and robust performance against diverse channel conditions, they also have a major drawback: they require processing high-dimensional features, which can be prohibitive for mobile applications. In this work, we design a positioning neur… ▽ More A recent line of research has been investigating deep learning approaches to wireless positioning (WP). Although these WP algorithms have demonstrated high accuracy and robust performance against diverse channel conditions, they also have a major drawback: they require processing high-dimensional features, which can be prohibitive for mobile applications. In this work, we design a positioning neural network (P-NN) that substantially reduces the complexity of deep learning-based WP through carefully crafted minimum description features. Our feature selection is based on maximum power measurements and their temporal locations to convey information needed to conduct WP. We also develop a novel methodology for adaptively selecting the size of feature space, which optimizes over balancing the expected amount of useful information and classification capability, quantified using information-theoretic measures on the signal bin selection. Numerical results show that P-NN achieves a significant advantage in performance-complexity tradeoff over deep learning baselines that leverage the full power delay profile (PDP). △ Less

Submitted 14 February, 2024; originally announced February 2024.

Comments: This paper has been accepted in IEEE International Conference on Communications (ICC) 2024

arXiv:2402.03448 [pdf, other]

Decentralized Sporadic Federated Learning: A Unified Algorithmic Framework with Convergence Guarantees

Authors: Shahryar Zehtabi, Dong-Jun Han, Rohit Parasnis, Seyyedali Hosseinalipour, Christopher G. Brinton

Abstract: Decentralized federated learning (DFL) captures FL settings where both (i) model updates and (ii) model aggregations are exclusively carried out by the clients without a central server. Existing DFL works have mostly focused on settings where clients conduct a fixed number of local updates between local model exchanges, overlooking heterogeneity and dynamics in communication and computation capabi… ▽ More Decentralized federated learning (DFL) captures FL settings where both (i) model updates and (ii) model aggregations are exclusively carried out by the clients without a central server. Existing DFL works have mostly focused on settings where clients conduct a fixed number of local updates between local model exchanges, overlooking heterogeneity and dynamics in communication and computation capabilities. In this work, we propose Decentralized Sporadic Federated Learning ($\texttt{DSpodFL}$), a DFL methodology built on a generalized notion of $\textit{sporadicity}$ in both local gradient and aggregation processes. $\texttt{DSpodFL}$ subsumes many existing decentralized optimization methods under a unified algorithmic framework by modeling the per-iteration (i) occurrence of gradient descent at each client and (ii) exchange of models between client pairs as arbitrary indicator random variables, thus capturing $\textit{heterogeneous and time-varying}$ computation/communication scenarios. We analytically characterize the convergence behavior of $\texttt{DSpodFL}$ for both convex and non-convex models and for both constant and diminishing learning rates, under mild assumptions on the communication graph connectivity, data heterogeneity across clients, and gradient noises. We show how our bounds recover existing results from decentralized gradient descent as special cases. Experiments demonstrate that $\texttt{DSpodFL}$ consistently achieves improved training speeds compared with baselines under various system settings. △ Less

Submitted 6 March, 2025; v1 submitted 5 February, 2024; originally announced February 2024.

arXiv:2402.02225 [pdf, other]

Rethinking the Starting Point: Collaborative Pre-Training for Federated Downstream Tasks

Authors: Yun-Wei Chu, Dong-Jun Han, Seyyedali Hosseinalipour, Christopher G. Brinton

Abstract: A few recent studies have demonstrated that leveraging centrally pre-trained models can offer advantageous initializations for federated learning (FL). However, existing pre-training methods do not generalize well when faced with an arbitrary set of downstream FL tasks. Specifically, they often (i) achieve limited average accuracy, particularly when there are unseen downstream labels, and (ii) res… ▽ More A few recent studies have demonstrated that leveraging centrally pre-trained models can offer advantageous initializations for federated learning (FL). However, existing pre-training methods do not generalize well when faced with an arbitrary set of downstream FL tasks. Specifically, they often (i) achieve limited average accuracy, particularly when there are unseen downstream labels, and (ii) result in significant accuracy variance, failing to provide a balanced performance across clients. To address these challenges, we propose CoPreFL, a collaborative/distributed pre-training approach which provides a robust initialization for downstream FL tasks. The key idea of CoPreFL is a model-agnostic meta-learning (MAML) procedure that tailors the global model to closely mimic heterogeneous and unseen FL scenarios, resulting in a pre-trained model that is rapidly adaptable to arbitrary FL tasks. Our MAML procedure incorporates performance variance into the meta-objective function, balancing performance across clients rather than solely optimizing for accuracy. Through extensive experiments, we demonstrate that CoPreFL obtains significant improvements in both average accuracy and variance across arbitrary downstream FL tasks with unseen/seen labels, compared with various pre-training baselines. We also show how CoPreFL is compatible with different well-known FL algorithms applied by the downstream tasks, enhancing performance in each case. △ Less

Submitted 11 December, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

Comments: AAAI 2025

arXiv:2401.16685 [pdf, other]

Communication-Efficient Multimodal Federated Learning: Joint Modality and Client Selection

Authors: Liangqi Yuan, Dong-Jun Han, Su Wang, Devesh Upadhyay, Christopher G. Brinton

Abstract: Multimodal federated learning (FL) aims to enrich model training in FL settings where clients are collecting measurements across multiple modalities. However, key challenges to multimodal FL remain unaddressed, particularly in heterogeneous network settings where: (i) the set of modalities collected by each client will be diverse, and (ii) communication limitations prevent clients from uploading a… ▽ More Multimodal federated learning (FL) aims to enrich model training in FL settings where clients are collecting measurements across multiple modalities. However, key challenges to multimodal FL remain unaddressed, particularly in heterogeneous network settings where: (i) the set of modalities collected by each client will be diverse, and (ii) communication limitations prevent clients from uploading all their locally trained modality models to the server. In this paper, we propose multimodal Federated learning with joint Modality and Client selection (mmFedMC), a new FL methodology that can tackle the above-mentioned challenges in multimodal settings. The joint selection algorithm incorporates two main components: (a) A modality selection methodology for each client, which weighs (i) the impact of the modality, gauged by Shapley value analysis, (ii) the modality model size as a gauge of communication overhead, against (iii) the frequency of modality model updates, denoted recency, to enhance generalizability. (b) A client selection strategy for the server based on the local loss of modality model at each client. Experiments on five real-world datasets demonstrate the ability of mmFedMC to achieve comparable accuracy to several baselines while reducing the communication overhead by over 20x. A demo video of our methodology is available at https://liangqiy.com/mmfedmc/. △ Less

Submitted 29 January, 2024; originally announced January 2024.

Comments: arXiv admin note: text overlap with arXiv:2310.07048

arXiv:2401.11592 [pdf, other]

Differentially-Private Multi-Tier Federated Learning

Authors: Evan Chen, Frank Po-Chen Lin, Dong-Jun Han, Christopher G. Brinton

Abstract: While federated learning (FL) eliminates the transmission of raw data over a network, it is still vulnerable to privacy breaches from the communicated model parameters. In this work, we propose Multi-Tier Federated Learning with Multi-Tier Differential Privacy (M^2FDP), a DP-enhanced FL methodology for jointly optimizing privacy and performance in hierarchical networks. One of the key concepts of… ▽ More While federated learning (FL) eliminates the transmission of raw data over a network, it is still vulnerable to privacy breaches from the communicated model parameters. In this work, we propose Multi-Tier Federated Learning with Multi-Tier Differential Privacy (M^2FDP), a DP-enhanced FL methodology for jointly optimizing privacy and performance in hierarchical networks. One of the key concepts of M^2FDP is to extend the concept of HDP towards Multi-Tier Differential Privacy (MDP), while also adapting DP noise injection at different layers of an established FL hierarchy -- edge devices, edge servers, and cloud servers -- according to the trust models within particular subnetworks. We conduct a comprehensive analysis of the convergence behavior of M^2FDP, revealing conditions on parameter tuning under which the training process converges sublinearly to a finite stationarity gap that depends on the network hierarchy, trust model, and target privacy level. Subsequent numerical evaluations demonstrate that M^2FDP obtains substantial improvements in these metrics over baselines for different privacy budgets, and validate the impact of different system configurations. △ Less

Submitted 7 November, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

arXiv:2401.07456 [pdf, other]

Only Send What You Need: Learning to Communicate Efficiently in Federated Multilingual Machine Translation

Authors: Yun-Wei Chu, Dong-Jun Han, Christopher G. Brinton

Abstract: Federated learning (FL) is a promising distributed machine learning paradigm that enables multiple clients to collaboratively train a global model. In this paper, we focus on a practical federated multilingual learning setup where clients with their own language-specific data aim to collaboratively construct a high-quality neural machine translation (NMT) model. However, communication constraints… ▽ More Federated learning (FL) is a promising distributed machine learning paradigm that enables multiple clients to collaboratively train a global model. In this paper, we focus on a practical federated multilingual learning setup where clients with their own language-specific data aim to collaboratively construct a high-quality neural machine translation (NMT) model. However, communication constraints in practical network systems present challenges for exchanging large-scale NMT engines between FL parties. We propose a meta-learning-based adaptive parameter selection methodology, MetaSend, that improves the communication efficiency of model transmissions from clients during FL-based multilingual NMT training. Our approach learns a dynamic threshold for filtering parameters prior to transmission without compromising the NMT model quality, based on the tensor deviations of clients between different FL rounds. Through experiments on two NMT datasets with different language distributions, we demonstrate that MetaSend obtains substantial improvements over baselines in translation quality in the presence of a limited communication budget. △ Less

Submitted 18 April, 2025; v1 submitted 14 January, 2024; originally announced January 2024.

arXiv:2401.00477 [pdf, other]

Coding for Gaussian Two-Way Channels: Linear and Learning-Based Approaches

Authors: Junghoon Kim, Taejoon Kim, Anindya Bijoy Das, Seyyedali Hosseinalipour, David J. Love, Christopher G. Brinton

Abstract: Although user cooperation cannot improve the capacity of Gaussian two-way channels (GTWCs) with independent noises, it can improve communication reliability. In this work, we aim to enhance and balance the communication reliability in GTWCs by minimizing the sum of error probabilities via joint design of encoders and decoders at the users. We first formulate general encoding/decoding functions, wh… ▽ More Although user cooperation cannot improve the capacity of Gaussian two-way channels (GTWCs) with independent noises, it can improve communication reliability. In this work, we aim to enhance and balance the communication reliability in GTWCs by minimizing the sum of error probabilities via joint design of encoders and decoders at the users. We first formulate general encoding/decoding functions, where the user cooperation is captured by the coupling of user encoding processes. The coupling effect renders the encoder/decoder design non-trivial, requiring effective decoding to capture this effect, as well as efficient power management at the encoders within power constraints. To address these challenges, we propose two different two-way coding strategies: linear coding and learning-based coding. For linear coding, we propose optimal linear decoding and discuss new insights on encoding regarding user cooperation to balance reliability. We then propose an efficient algorithm for joint encoder/decoder design. For learning-based coding, we introduce a novel recurrent neural network (RNN)-based coding architecture, where we propose interactive RNNs and a power control layer for encoding, and we incorporate bi-directional RNNs with an attention mechanism for decoding. Through simulations, we show that our two-way coding methodologies outperform conventional channel coding schemes (that do not utilize user cooperation) significantly in sum-error performance. We also demonstrate that our linear coding excels at high signal-to-noise ratios (SNRs), while our RNN-based coding performs best at low SNRs. We further investigate our two-way coding strategies in terms of power distribution, two-way coding benefit, different coding rates, and block-length gain. △ Less

Submitted 23 April, 2025; v1 submitted 31 December, 2023; originally announced January 2024.

Comments: This work has been accepted for publication in the IEEE Transactions on Information Theory

arXiv:2312.16638 [pdf, other]

Robust Collaborative Inference with Vertically Split Data Over Dynamic Device Environments

Authors: Surojit Ganguli, Zeyu Zhou, Christopher G. Brinton, David I. Inouye

Abstract: When each edge device of a network only perceives a local part of the environment, collaborative inference across multiple devices is often needed to predict global properties of the environment. In safety-critical applications, collaborative inference must be robust to significant network failures caused by environmental disruptions or extreme weather. Existing collaborative learning approaches,… ▽ More When each edge device of a network only perceives a local part of the environment, collaborative inference across multiple devices is often needed to predict global properties of the environment. In safety-critical applications, collaborative inference must be robust to significant network failures caused by environmental disruptions or extreme weather. Existing collaborative learning approaches, such as privacy-focused Vertical Federated Learning (VFL), typically assume a centralized setup or that one device never fails. However, these assumptions make prior approaches susceptible to significant network failures. To address this problem, we first formalize the problem of robust collaborative inference over a dynamic network of devices that could experience significant network faults. Then, we develop a minimalistic yet impactful method called Multiple Aggregation with Gossip Rounds and Simulated Faults (MAGS) that synthesizes simulated faults via dropout, replication, and gossiping to significantly improve robustness over baselines. We also theoretically analyze our proposed approach to explain why each component enhances robustness. Extensive empirical results validate that MAGS is robust across a range of fault rates-including extreme fault rates. △ Less

Submitted 25 April, 2025; v1 submitted 27 December, 2023; originally announced December 2023.

arXiv:2312.15361 [pdf, other]

Cooperative Federated Learning over Ground-to-Satellite Integrated Networks: Joint Local Computation and Data Offloading

Authors: Dong-Jun Han, Seyyedali Hosseinalipour, David J. Love, Mung Chiang, Christopher G. Brinton

Abstract: While network coverage maps continue to expand, many devices located in remote areas remain unconnected to terrestrial communication infrastructures, preventing them from getting access to the associated data-driven services. In this paper, we propose a ground-to-satellite cooperative federated learning (FL) methodology to facilitate machine learning service management over remote regions. Our met… ▽ More While network coverage maps continue to expand, many devices located in remote areas remain unconnected to terrestrial communication infrastructures, preventing them from getting access to the associated data-driven services. In this paper, we propose a ground-to-satellite cooperative federated learning (FL) methodology to facilitate machine learning service management over remote regions. Our methodology orchestrates satellite constellations to provide the following key functions during FL: (i) processing data offloaded from ground devices, (ii) aggregating models within device clusters, and (iii) relaying models/data to other satellites via inter-satellite links (ISLs). Due to the limited coverage time of each satellite over a particular remote area, we facilitate satellite transmission of trained models and acquired data to neighboring satellites via ISL, so that the incoming satellite can continue conducting FL for the region. We theoretically analyze the convergence behavior of our algorithm, and develop a training latency minimizer which optimizes over satellite-specific network resources, including the amount of data to be offloaded from ground devices to satellites and satellites' computation speeds. Through experiments on three datasets, we show that our methodology can significantly speed up the convergence of FL compared with terrestrial-only and other satellite baseline approaches. △ Less

Submitted 23 December, 2023; originally announced December 2023.

Comments: This paper is accepted for publication in IEEE Journal on Selected Areas in Communications (JSAC)

arXiv:2312.04728 [pdf, other]

Taming Subnet-Drift in D2D-Enabled Fog Learning: A Hierarchical Gradient Tracking Approach

Authors: Evan Chen, Shiqiang Wang, Christopher G. Brinton

Abstract: Federated learning (FL) encounters scalability challenges when implemented over fog networks. Semi-decentralized FL (SD-FL) proposes a solution that divides model cooperation into two stages: at the lower stage, device-to-device (D2D) communications is employed for local model aggregations within subnetworks (subnets), while the upper stage handles device-server (DS) communications for global mode… ▽ More Federated learning (FL) encounters scalability challenges when implemented over fog networks. Semi-decentralized FL (SD-FL) proposes a solution that divides model cooperation into two stages: at the lower stage, device-to-device (D2D) communications is employed for local model aggregations within subnetworks (subnets), while the upper stage handles device-server (DS) communications for global model aggregations. However, existing SD-FL schemes are based on gradient diversity assumptions that become performance bottlenecks as data distributions become more heterogeneous. In this work, we develop semi-decentralized gradient tracking (SD-GT), the first SD-FL methodology that removes the need for such assumptions by incorporating tracking terms into device updates for each communication layer. Analytical characterization of SD-GT reveals convergence upper bounds for both non-convex and strongly-convex problems, for a suitable choice of step size. We employ the resulting bounds in the development of a co-optimization algorithm for optimizing subnet sampling rates and D2D rounds according to a performance-efficiency trade-off. Our subsequent numerical evaluations demonstrate that SD-GT obtains substantial improvements in trained model quality and communication cost relative to baselines in SD-FL and gradient tracking on several datasets. △ Less

Submitted 9 January, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

Comments: This paper is accepted for publication in the proceedings of 2024 IEEE International Conference on Computer Communications (INFOCOM)

arXiv:2311.07946 [pdf, other]

The Impact of Adversarial Node Placement in Decentralized Federated Learning Networks

Authors: Adam Piaseczny, Eric Ruzomberka, Rohit Parasnis, Christopher G. Brinton

Abstract: As Federated Learning (FL) grows in popularity, new decentralized frameworks are becoming widespread. These frameworks leverage the benefits of decentralized environments to enable fast and energy-efficient inter-device communication. However, this growing popularity also intensifies the need for robust security measures. While existing research has explored various aspects of FL security, the rol… ▽ More As Federated Learning (FL) grows in popularity, new decentralized frameworks are becoming widespread. These frameworks leverage the benefits of decentralized environments to enable fast and energy-efficient inter-device communication. However, this growing popularity also intensifies the need for robust security measures. While existing research has explored various aspects of FL security, the role of adversarial node placement in decentralized networks remains largely unexplored. This paper addresses this gap by analyzing the performance of decentralized FL for various adversarial placement strategies when adversaries can jointly coordinate their placement within a network. We establish two baseline strategies for placing adversarial node: random placement and network centrality-based placement. Building on this foundation, we propose a novel attack algorithm that prioritizes adversarial spread over adversarial centrality by maximizing the average network distance between adversaries. We show that the new attack algorithm significantly impacts key performance metrics such as testing accuracy, outperforming the baseline frameworks by between $9\%$ and $66.5\%$ for the considered setups. Our findings provide valuable insights into the vulnerabilities of decentralized FL systems, setting the stage for future research aimed at developing more secure and robust decentralized FL frameworks. △ Less

Submitted 19 March, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

Comments: Accepted to ICC 2024 conference

arXiv:2311.04350 [pdf, other]

Device Sampling and Resource Optimization for Federated Learning in Cooperative Edge Networks

Authors: Su Wang, Roberto Morabito, Seyyedali Hosseinalipour, Mung Chiang, Christopher G. Brinton

Abstract: The conventional federated learning (FedL) architecture distributes machine learning (ML) across worker devices by having them train local models that are periodically aggregated by a server. FedL ignores two important characteristics of contemporary wireless networks, however: (i) the network may contain heterogeneous communication/computation resources, and (ii) there may be significant overlaps… ▽ More The conventional federated learning (FedL) architecture distributes machine learning (ML) across worker devices by having them train local models that are periodically aggregated by a server. FedL ignores two important characteristics of contemporary wireless networks, however: (i) the network may contain heterogeneous communication/computation resources, and (ii) there may be significant overlaps in devices' local data distributions. In this work, we develop a novel optimization methodology that jointly accounts for these factors via intelligent device sampling complemented by device-to-device (D2D) offloading. Our optimization methodology aims to select the best combination of sampled nodes and data offloading configuration to maximize FedL training accuracy while minimizing data processing and D2D communication resource consumption subject to realistic constraints on the network topology and device capabilities. Theoretical analysis of the D2D offloading subproblem leads to new FedL convergence bounds and an efficient sequential convex optimizer. Using these results, we develop a sampling methodology based on graph convolutional networks (GCNs) which learns the relationship between network attributes, sampled nodes, and D2D data offloading to maximize FedL accuracy. Through evaluation on popular datasets and real-world network measurements from our edge testbed, we find that our methodology outperforms popular device sampling methodologies from literature in terms of ML model performance, data processing overhead, and energy consumption. △ Less

Submitted 19 August, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

Comments: Published in IEEE/ACM Transactions on Networking. arXiv admin note: substantial text overlap with arXiv:2101.00787

arXiv:2311.00227 [pdf, other]

StableFDG: Style and Attention Based Learning for Federated Domain Generalization

Authors: Jungwuk Park, Dong-Jun Han, Jinho Kim, Shiqiang Wang, Christopher G. Brinton, Jaekyun Moon

Abstract: Traditional federated learning (FL) algorithms operate under the assumption that the data distributions at training (source domains) and testing (target domain) are the same. The fact that domain shifts often occur in practice necessitates equipping FL methods with a domain generalization (DG) capability. However, existing DG algorithms face fundamental challenges in FL setups due to the lack of s… ▽ More Traditional federated learning (FL) algorithms operate under the assumption that the data distributions at training (source domains) and testing (target domain) are the same. The fact that domain shifts often occur in practice necessitates equipping FL methods with a domain generalization (DG) capability. However, existing DG algorithms face fundamental challenges in FL setups due to the lack of samples/domains in each client's local dataset. In this paper, we propose StableFDG, a style and attention based learning strategy for accomplishing federated domain generalization, introducing two key contributions. The first is style-based learning, which enables each client to explore novel styles beyond the original source domains in its local dataset, improving domain diversity based on the proposed style sharing, shifting, and exploration strategies. Our second contribution is an attention-based feature highlighter, which captures the similarities between the features of data samples in the same class, and emphasizes the important/common characteristics to better learn the domain-invariant characteristics of each class in data-poor FL scenarios. Experimental results show that StableFDG outperforms existing baselines on various DG benchmark datasets, demonstrating its efficacy. △ Less

Submitted 31 October, 2023; originally announced November 2023.

Comments: Accepted at NeurIPS 2023, 19 pages

arXiv:2310.17890 [pdf, other]

Federated Learning over Hierarchical Wireless Networks: Training Latency Minimization via Submodel Partitioning

Authors: Wenzhi Fang, Dong-Jun Han, Christopher G. Brinton

Abstract: Hierarchical federated learning (HFL) has demonstrated promising scalability advantages over the traditional "star-topology" architecture-based federated learning (FL). However, HFL still imposes significant computation, communication, and storage burdens on the edge, especially when training a large-scale model over resource-constrained wireless devices. In this paper, we propose hierarchical ind… ▽ More Hierarchical federated learning (HFL) has demonstrated promising scalability advantages over the traditional "star-topology" architecture-based federated learning (FL). However, HFL still imposes significant computation, communication, and storage burdens on the edge, especially when training a large-scale model over resource-constrained wireless devices. In this paper, we propose hierarchical independent submodel training (HIST), a new FL methodology that aims to address these issues in hierarchical cloud-edge-client networks. The key idea behind HIST is to divide the global model into disjoint partitions (or submodels) per round so that each group of clients (i.e., cells) is responsible for training only one partition of the model. We characterize the convergence behavior of HIST under mild assumptions, showing the impacts of several key attributes (e.g., submodel sizes, number of cells, edge and global aggregation frequencies) on the rate and stationarity gap. Building upon the theoretical results, we propose a submodel partitioning strategy to minimize the training latency depending on network resource availability and a target learning performance guarantee. We then demonstrate how HIST can be augmented with over-the-air computation (AirComp) to further enhance the efficiency of the model aggregation over the edge cells. Through numerical evaluations, we verify that HIST is able to save training time and communication costs by wide margins while achieving comparable accuracy as conventional HFL. Moreover, our experiments demonstrate that AirComp-assisted HIST provides further improvements in training latency. △ Less

Submitted 26 January, 2025; v1 submitted 27 October, 2023; originally announced October 2023.

Comments: 26 pages, 9 figures, 3 tables

arXiv:2310.10804 [pdf, other]

Constant Modulus Waveform Design with Block-Level Interference Exploitation for DFRC Systems

Authors: Byunghyun Lee, Anindya Bijoy Das, David J. Love, Christopher G. Brinton, James V. Krogmeier

Abstract: Dual-functional radar-communication (DFRC) is a promising technology where radar and communication functions operate on the same spectrum and hardware. In this paper, we propose an algorithm for designing constant modulus waveforms for DFRC systems. Particularly, we jointly optimize the correlation properties and the spatial beam pattern. For communication, we employ constructive interference-base… ▽ More Dual-functional radar-communication (DFRC) is a promising technology where radar and communication functions operate on the same spectrum and hardware. In this paper, we propose an algorithm for designing constant modulus waveforms for DFRC systems. Particularly, we jointly optimize the correlation properties and the spatial beam pattern. For communication, we employ constructive interference-based block-level precoding (CI-BLP) to exploit distortion due to multi-user and radar transmission. We propose a majorization-minimization (MM)-based solution to the formulated problem. To accelerate convergence, we propose an improved majorizing function that leverages a novel diagonal matrix structure. We then evaluate the performance of the proposed algorithm through rigorous simulations. Simulation results demonstrate the effectiveness of the proposed approach and the proposed majorizer. △ Less

Submitted 6 April, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: Accepted to IEEE International Conference on Communications (ICC) 2024

arXiv:2310.07048 [pdf, other]

FedMFS: Federated Multimodal Fusion Learning with Selective Modality Communication

Authors: Liangqi Yuan, Dong-Jun Han, Vishnu Pandi Chellapandi, Stanislaw H. Żak, Christopher G. Brinton

Abstract: Multimodal federated learning (FL) aims to enrich model training in FL settings where devices are collecting measurements across multiple modalities (e.g., sensors measuring pressure, motion, and other types of data). However, key challenges to multimodal FL remain unaddressed, particularly in heterogeneous network settings: (i) the set of modalities collected by each device will be diverse, and (… ▽ More Multimodal federated learning (FL) aims to enrich model training in FL settings where devices are collecting measurements across multiple modalities (e.g., sensors measuring pressure, motion, and other types of data). However, key challenges to multimodal FL remain unaddressed, particularly in heterogeneous network settings: (i) the set of modalities collected by each device will be diverse, and (ii) communication limitations prevent devices from uploading all their locally trained modality models to the server. In this paper, we propose Federated Multimodal Fusion learning with Selective modality communication (FedMFS), a new multimodal fusion FL methodology that can tackle the above mentioned challenges. The key idea is the introduction of a modality selection criterion for each device, which weighs (i) the impact of the modality, gauged by Shapley value analysis, against (ii) the modality model size as a gauge for communication overhead. This enables FedMFS to flexibly balance performance against communication costs, depending on resource constraints and application requirements. Experiments on the real-world ActionSense dataset demonstrate the ability of FedMFS to achieve comparable accuracy to several baselines while reducing the communication overhead by over 4x. △ Less

Submitted 19 August, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

Comments: ICC 2024

arXiv:2310.03178 [pdf, other]

Digital Ethics in Federated Learning

Authors: Liangqi Yuan, Ziran Wang, Christopher G. Brinton

Abstract: The Internet of Things (IoT) consistently generates vast amounts of data, sparking increasing concern over the protection of data privacy and the limitation of data misuse. Federated learning (FL) facilitates collaborative capabilities among multiple parties by sharing machine learning (ML) model parameters instead of raw user data, and it has recently gained significant attention for its potentia… ▽ More The Internet of Things (IoT) consistently generates vast amounts of data, sparking increasing concern over the protection of data privacy and the limitation of data misuse. Federated learning (FL) facilitates collaborative capabilities among multiple parties by sharing machine learning (ML) model parameters instead of raw user data, and it has recently gained significant attention for its potential in privacy preservation and learning efficiency enhancement. In this paper, we highlight the digital ethics concerns that arise when human-centric devices serve as clients in FL. More specifically, challenges of game dynamics, fairness, incentive, and continuity arise in FL due to differences in perspectives and objectives between clients and the server. We analyze these challenges and their solutions from the perspectives of both the client and the server, and through the viewpoints of centralized and decentralized FL. Finally, we explore the opportunities in FL for human-centric IoT as directions for future development. △ Less

Submitted 18 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

arXiv:2310.00098 [pdf, ps, other]

Enabling Differentially Private Federated Learning for Speech Recognition: Benchmarks, Adaptive Optimizers and Gradient Clipping

Authors: Martin Pelikan, Sheikh Shams Azam, Vitaly Feldman, Jan "Honza" Silovsky, Kunal Talwar, Christopher G. Brinton, Tatiana Likhomanenko

Abstract: While federated learning (FL) and differential privacy (DP) have been extensively studied, their application to automatic speech recognition (ASR) remains largely unexplored due to the challenges in training large transformer models. Specifically, large models further exacerbate issues in FL as they are particularly susceptible to gradient heterogeneity across layers, unlike the relatively uniform… ▽ More While federated learning (FL) and differential privacy (DP) have been extensively studied, their application to automatic speech recognition (ASR) remains largely unexplored due to the challenges in training large transformer models. Specifically, large models further exacerbate issues in FL as they are particularly susceptible to gradient heterogeneity across layers, unlike the relatively uniform gradient behavior observed in shallow models. As a result, prior works struggle to converge with standard optimization techniques, even in the absence of DP mechanisms. To the best of our knowledge, no existing work establishes a competitive, practical recipe for FL with DP in the context of ASR. To address this gap, we establish \textbf{the first benchmark for FL with DP in end-to-end ASR}. Our approach centers on per-layer clipping and layer-wise gradient normalization: theoretical analysis reveals that these techniques together mitigate clipping bias and gradient heterogeneity across layers in deeper models. Consistent with these theoretical insights, our empirical results show that FL with DP is viable under strong privacy guarantees, provided a population of at least several million users. Specifically, we achieve user-level (7.2, $10^{-9}$)-DP (resp. (4.5, $10^{-9}$)-DP) with only a 1.3% (resp. 4.6%) absolute drop in word error rate when extrapolating to high (resp. low) population scales for FL with DP in ASR. Although our experiments focus on ASR, the underlying principles we uncover - particularly those concerning gradient heterogeneity and layer-wise gradient normalization - offer broader guidance for designing scalable, privacy-preserving FL algorithms for large models across domains. △ Less

Submitted 29 May, 2025; v1 submitted 29 September, 2023; originally announced October 2023.

Comments: Under review

arXiv:2308.10407 [pdf, other]

Federated Learning for Connected and Automated Vehicles: A Survey of Existing Approaches and Challenges

Authors: Vishnu Pandi Chellapandi, Liangqi Yuan, Christopher G. Brinton, Stanislaw H Zak, Ziran Wang

Abstract: Machine learning (ML) is widely used for key tasks in Connected and Automated Vehicles (CAV), including perception, planning, and control. However, its reliance on vehicular data for model training presents significant challenges related to in-vehicle user privacy and communication overhead generated by massive data volumes. Federated learning (FL) is a decentralized ML approach that enables multi… ▽ More Machine learning (ML) is widely used for key tasks in Connected and Automated Vehicles (CAV), including perception, planning, and control. However, its reliance on vehicular data for model training presents significant challenges related to in-vehicle user privacy and communication overhead generated by massive data volumes. Federated learning (FL) is a decentralized ML approach that enables multiple vehicles to collaboratively develop models, broadening learning from various driving environments, enhancing overall performance, and simultaneously securing local vehicle data privacy and security. This survey paper presents a review of the advancements made in the application of FL for CAV (FL4CAV). First, centralized and decentralized frameworks of FL are analyzed, highlighting their key characteristics and methodologies. Second, diverse data sources, models, and data security techniques relevant to FL in CAVs are reviewed, emphasizing their significance in ensuring privacy and confidentiality. Third, specific applications of FL are explored, providing insight into the base models and datasets employed for each application. Finally, existing challenges for FL4CAV are listed and potential directions for future investigation to further enhance the effectiveness and efficiency of FL in the context of CAV are discussed. △ Less

Submitted 11 November, 2023; v1 submitted 20 August, 2023; originally announced August 2023.

Comments: IEEE Transactions on Intelligent Vehicles

arXiv:2308.04331 [pdf, ps, other]

Preserving Sparsity and Privacy in Straggler-Resilient Distributed Matrix Computations

Authors: Anindya Bijoy Das, Aditya Ramamoorthy, David J. Love, Christopher G. Brinton

Abstract: Existing approaches to distributed matrix computations involve allocating coded combinations of submatrices to worker nodes, to build resilience to stragglers and/or enhance privacy. In this study, we consider the challenge of preserving input sparsity in such approaches to retain the associated computational efficiency enhancements. First, we find a lower bound on the weight of coding, i.e., the… ▽ More Existing approaches to distributed matrix computations involve allocating coded combinations of submatrices to worker nodes, to build resilience to stragglers and/or enhance privacy. In this study, we consider the challenge of preserving input sparsity in such approaches to retain the associated computational efficiency enhancements. First, we find a lower bound on the weight of coding, i.e., the number of submatrices to be combined to obtain coded submatrices to provide the resilience to the maximum possible number of stragglers (for given number of nodes and their storage constraints). Next we propose a distributed matrix computation scheme which meets this exact lower bound on the weight of the coding. Further, we develop controllable trade-off between worker computation time and the privacy constraint for sparse input matrices in settings where the worker nodes are honest but curious. Numerical experiments conducted in Amazon Web Services (AWS) validate our assertions regarding straggler mitigation and computation speed for sparse matrices. △ Less

Submitted 8 August, 2023; originally announced August 2023.

Showing 1–50 of 109 results for author: Brinton, C G