CLoVE: Personalized Federated Learning through Clustering of Loss Vector Embeddings

Authors: Randeep Bhatia, Nikos Papadis, Murali Kodialam, TV Lakshman, Sayak Chakrabarty

Abstract: We propose CLoVE (Clustering of Loss Vector Embeddings), a novel algorithm for Clustered Federated Learning (CFL). In CFL, clients are naturally grouped into clusters based on their data distribution. However, identifying these clusters is challenging, as client assignments are unknown. CLoVE utilizes client embeddings derived from model losses on client data, and leverages the insight that clients in the same cluster share similar loss values, while those in different clusters exhibit distinct loss patterns. Based on these embeddings, CLoVE is able to iteratively identify and separate clients from different clusters and optimize cluster-specific models through federated aggregation. Key advantages of CLoVE over existing CFL algorithms are (1) its simplicity, (2) its applicability to both supervised and unsupervised settings, and (3) the fact that it eliminates the need for near-optimal model initialization, which makes it more robust and better suited for real-world applications. We establish theoretical convergence bounds, showing that CLoVE can recover clusters accurately with high probability in a single round and converges exponentially fast to optimal models in a linear setting. Our comprehensive experiments comparing with a variety of both CFL and generic Personalized Federated Learning (PFL) algorithms on different types of datasets and an extensive array of non-IID settings demonstrate that CLoVE achieves highly accurate cluster recovery in just a few rounds of training, along with state-of-the-art model accuracy, across a variety of both supervised and unsupervised PFL tasks.

Submitted 27 June, 2025; originally announced June 2025.

Comments: 31 pages, 4 figures
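
The idea in the abstract above, that clients in the same cluster produce similar loss values under a common set of models, can be illustrated with a toy sketch. This is not the authors' algorithm: the scalar linear models, the synthetic data, the initial model values, and the small k-means routine used to group the loss vectors are all assumptions made for illustration, and every function name is hypothetical.

```python
import random

def make_client(true_w, rng, n=50, noise=0.05):
    """Synthetic client data: y = true_w * x + Gaussian noise (assumed setup)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, true_w * x + rng.gauss(0.0, noise)) for x in xs]

def loss_vector(data, models):
    """Embedding of one client: mean squared error of each model on its data."""
    return [sum((y - w * x) ** 2 for x, y in data) / len(data) for w in models]

def dist2(a, b):
    """Squared Euclidean distance between two loss vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(points, k, iters=10):
    """Tiny k-means with farthest-point initialization (a stand-in clustering step)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

rng = random.Random(42)
true_weights = [2.0, 2.0, 2.0, -1.0, -1.0, -1.0]  # two hidden clusters of clients
clients = [make_client(w, rng) for w in true_weights]
models = [1.5, -0.5]  # deliberately imperfect initial models (assumption)

embeddings = [loss_vector(c, models) for c in clients]
labels = kmeans(embeddings, k=2)
print(labels)
```

In this toy setting the loss vectors of the first three clients sit close together and far from those of the last three, so even a crude clustering step separates them without knowing the true weights.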

arXiv:2504.20246

Tree embedding based mapping system for low-latency mobile applications in multi-access networks

Authors: Yu Mi, Randeep Bhatia, Fang Hao, An Wang, Steve Benno, Tv Lakshman

Abstract: Low-latency applications like AR/VR and online gaming need fast, stable connections. New technologies such as V2X, LEO satellites, and 6G bring unique challenges in mobility management. Traditional solutions based on centralized or distributed anchors often fall short in supporting rapid mobility due to inefficient routing, low versatility, and insufficient multi-access support. In this paper, we design a new end-to-end system for tracking multi-connected mobile devices at scale and optimizing performance for latency-sensitive, highly dynamic applications. Our system, based on the locator/ID separation principle, extends to multi-access networks without requiring specialized routers or caching. Using a novel tree embedding-based overlay, we enable fast session setup while allowing endpoints to directly handle mobility between them. Evaluation with real network data shows our solution cuts connection latency to 7.42% inflation over the shortest path, compared to LISP's 359% due to cache misses. It also significantly reduces location update overhead and disruption time during mobility.

Submitted 28 April, 2025; originally announced April 2025.

Comments: Accepted by IEEE INFOCOM 2025-IEEE Conference on Computer Communications

arXiv:2305.03165

Understanding the Benefits of Hardware-Accelerated Communication in Model-Serving Applications

Authors: Walid A. Hanafy, Limin Wang, Hyunseok Chang, Sarit Mukherjee, T. V. Lakshman, Prashant Shenoy

Abstract: It is commonly assumed that the end-to-end networking performance of edge offloading is purely dictated by that of the network connectivity between end devices and edge computing facilities, where ongoing innovation in 5G/6G networking can help. However, with the growing complexity of edge-offloaded computation and dynamic load balancing requirements, an offloaded task often goes through a multi-stage pipeline that spans across multiple compute nodes and proxies interconnected via a dedicated network fabric within a given edge computing facility. As the latest hardware-accelerated transport technologies such as RDMA and GPUDirect RDMA are adopted to build such network fabric, there is a need for good understanding of the full potential of these technologies in the context of computation offload and the effect of different factors such as GPU scheduling and characteristics of computation on the net performance gain achievable by these technologies. This paper unveils detailed insights into the latency overhead in typical machine learning (ML)-based computation pipelines and analyzes the potential benefits of adopting hardware-accelerated communication. To this end, we build a model-serving framework that supports various communication mechanisms. Using the framework, we identify performance bottlenecks in state-of-the-art model-serving pipelines and show how hardware-accelerated communication can alleviate them. For example, we show that GPUDirect RDMA can save 15–50% of model-serving latency, which amounts to 70–160 ms.

Submitted 10 July, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

arXiv:2008.00905

Learning Based Methods for Traffic Matrix Estimation from Link Measurements

Authors: Shenghe Xu, Murali Kodialam, T. V. Lakshman, Shivendra Panwar

Abstract: Network traffic demand matrix is a critical input for capacity planning, anomaly detection and many other network management related tasks. The demand matrix is often computed from link load measurements. The traffic matrix (TM) estimation problem is the determination of the traffic demand matrix from link load measurements. The relationship between the link loads and the traffic matrix that generated the link load can be modeled as an under-determined linear system and has multiple feasible solutions. Therefore, prior knowledge of the traffic demand pattern has to be used in order to find a potentially feasible demand matrix. In this paper, we consider the TM estimation problem where we have information about the distribution of the demand sizes. This information can be obtained from the analysis of a few traffic matrices measured in the past or from operator experience. We develop an iterative projection based algorithm for the solution of this problem. If a large number of past traffic matrices are accessible, we propose a Generative Adversarial Network (GAN) based approach for solving the problem. We compare the strengths of the two approaches and evaluate their performance for several networks using varying amounts of past data.

Submitted 3 August, 2020; originally announced August 2020.

Comments: 10 pages
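
The claim in the abstract above that link loads give an under-determined linear system with multiple feasible demand matrices can be checked on a very small example. The sketch below assumes a hypothetical three-node line network A-B-C with two links and three flows; the routing matrix, the candidate demand vectors, and the prior are invented for illustration. Picking the candidate nearest to a prior is only a stand-in for using prior knowledge, not the paper's iterative projection or GAN method.

```python
def link_loads(routing, demands):
    """Compute link loads y = R x for a 0/1 routing matrix R and demand vector x."""
    return [sum(r * d for r, d in zip(row, demands)) for row in routing]

# Columns are flows A->B, B->C, A->C (hypothetical topology).
routing = [
    [1, 0, 1],  # link A-B carries flows A->B and A->C
    [0, 1, 1],  # link B-C carries flows B->C and A->C
]

# Three different demand vectors that all explain the same measurements.
candidates = [[3, 4, 2], [5, 6, 0], [1, 2, 4]]
loads = [link_loads(routing, d) for d in candidates]
print(loads)  # every entry is [5, 6]

# Prior knowledge breaks the tie: keep the candidate closest to an assumed prior.
prior = [1, 2, 3]  # hypothetical expected demand sizes
best = min(candidates, key=lambda d: sum((a - b) ** 2 for a, b in zip(d, prior)))
print(best)  # [1, 2, 4]
```

Two link measurements cannot pin down three unknown demands, which is why some form of prior information is needed to select among the feasible solutions.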

arXiv:2007.09521

Tomography Based Learning for Load Distribution through Opaque Networks

Authors: Shenghe Xu, Murali Kodialam, T. V. Lakshman, Shivendra S. Panwar

Abstract: Applications such as virtual reality and online gaming require low delays for acceptable user experience. A key task for over-the-top (OTT) service providers who provide these applications is sending traffic through the networks to minimize delays. OTT traffic is typically generated from multiple data centers which are multi-homed to several network ingresses. However, information about the path characteristics of the underlying network from the ingresses to destinations is not explicitly available to OTT services. These can only be inferred from external probing. In this paper, we combine network tomography with machine learning to minimize delays. We consider this problem in a general setting where traffic sources can choose a set of ingresses through which their traffic enters a black box network. The problem in this setting can be viewed as a reinforcement learning problem with constraints on a continuous action space, which to the best of our knowledge has not been investigated by the machine learning community. Key technical challenges to solving this problem include the high dimensionality of the problem and handling constraints that are intrinsic to networks. Evaluation results show that our methods achieve up to 60% delay reductions in comparison to standard heuristics. Moreover, the methods we develop can be used in a centralized manner or in a distributed manner by multiple independent agents.

Submitted 18 July, 2020; originally announced July 2020.

Comments: 12 pages, 15 figures, submitted to JSAC

Showing 1–5 of 5 results for author: Lakshman, T