-
Dynamically Weighted Federated k-Means
Authors:
Patrick Holzer,
Tania Jacob,
Shubham Kavane
Abstract:
Federated clustering, an integral aspect of federated machine learning, enables multiple data sources to collaboratively cluster their data, maintaining decentralization and preserving privacy. In this paper, we introduce a novel federated clustering algorithm named Dynamically Weighted Federated k-means (DWF k-means) based on Lloyd's method for k-means clustering, to address the challenges associated with distributed data sources and heterogeneous data. Our proposed algorithm combines the benefits of traditional clustering techniques with the privacy and scalability benefits offered by federated learning. The algorithm facilitates collaborative clustering among multiple data owners, allowing them to cluster their local data collectively while exchanging minimal information with the central coordinator. The algorithm optimizes the clustering process by adaptively aggregating cluster assignments and centroids from each data source, thereby learning a global clustering solution that reflects the collective knowledge of the entire federated network. We address the issue of empty clusters, which commonly arises in the context of federated clustering. We conduct experiments on multiple datasets and data distribution settings to evaluate the performance of our algorithm in terms of clustering score, accuracy, and v-measure. The results demonstrate that our approach can match the performance of the centralized classical k-means baseline, and outperform existing federated clustering methods like k-FED in realistic scenarios.
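The aggregation idea can be illustrated with a minimal sketch. It assumes that clients share only per-cluster coordinate sums and point counts, and that the server takes count-weighted means. This is a plain federated version of Lloyd's method, not the DWF k-means rule itself. The abstract does not specify the dynamic weights or the empty-cluster strategy, so the names `local_stats`, `aggregate`, and `federated_kmeans` and the keep-the-old-centroid rule for empty clusters are hypothetical.

```python
# Minimal federated k-means sketch: each client runs one Lloyd assignment step
# locally and shares only per-cluster sums and counts; the server forms
# count-weighted means. NOT the DWF k-means update, whose weights are not given here.

def local_stats(points, centroids):
    """Assign each local point to its nearest centroid; return per-cluster sums and counts."""
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for p in points:
        # index of the centroid with the smallest squared Euclidean distance
        j = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        counts[j] += 1
        for i in range(d):
            sums[j][i] += p[i]
    return sums, counts


def aggregate(client_stats, old_centroids):
    """Server step: weight each client's contribution by its local cluster size."""
    k, d = len(old_centroids), len(old_centroids[0])
    new_centroids = []
    for j in range(k):
        total = sum(counts[j] for _, counts in client_stats)
        if total == 0:
            # Placeholder empty-cluster rule (assumption): keep the previous centroid.
            new_centroids.append(list(old_centroids[j]))
        else:
            new_centroids.append(
                [sum(sums[j][i] for sums, _ in client_stats) / total for i in range(d)]
            )
    return new_centroids


def federated_kmeans(clients, centroids, rounds=10):
    """Run a fixed number of communication rounds between clients and server."""
    for _ in range(rounds):
        stats = [local_stats(points, centroids) for points in clients]
        centroids = aggregate(stats, centroids)
    return centroids


clients = [[[0, 0], [2, 0]], [[10, 10], [10, 12]]]
print(federated_kmeans(clients, [[0, 0], [10, 10], [100, 100]]))
# -> [[1.0, 0.0], [10.0, 11.0], [100, 100]]
```

With count weighting, each round produces exactly the centroid update that a centralized Lloyd iteration would compute on the pooled data. The third centroid never receives points and shows where an empty-cluster rule has to step in.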
Submitted 17 November, 2023; v1 submitted 23 October, 2023;
originally announced October 2023.
-
Combining Datasets with Different Label Sets for Improved Nucleus Segmentation and Classification
Authors:
Amruta Parulekar,
Utkarsh Kanwat,
Ravi Kant Gupta,
Medha Chippa,
Thomas Jacob,
Tripti Bameta,
Swapnil Rane,
Amit Sethi
Abstract:
Segmentation and classification of cell nuclei in histopathology images using deep neural networks (DNNs) can save pathologists' time in diagnosing various diseases, including cancers, by automating cell counting and morphometric assessments. It is now well known that the accuracy of DNNs increases with the size of the annotated datasets available for training. Although multiple datasets of histopathology images with nuclear annotations and class labels have been made publicly available, their sets of class labels differ. We propose a method to train DNNs for instance segmentation and classification on multiple datasets whose class sets are related but not identical. Specifically, our method is designed to utilize a coarse-to-fine class hierarchy, where the classes labeled and annotated in a dataset can be at any level of the hierarchy, as long as they are mutually exclusive. Within a dataset, the classes need not even be at the same level of the class hierarchy tree. Our results demonstrate that segmentation and classification metrics for the class set used by the test split of a dataset can improve by pre-training on another dataset, even one with a different set of classes, owing to the expansion of the training set enabled by our method. Furthermore, generalization to previously unseen datasets also improves when multiple other datasets with different class sets are combined for training. The improvement is both qualitative and quantitative. The proposed method can be adapted to various loss functions, DNN architectures, and application domains.
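One common way to train on labels drawn from different levels of a mutually exclusive class hierarchy is to have the network predict over the finest classes and to score a coarser label by the summed probability of its descendants. The sketch below assumes that formulation. The abstract does not state which loss the authors use, and the class names, `HIERARCHY`, and `hierarchical_nll` are illustrative.

```python
import math

# Hypothetical two-level hierarchy: each coarse class maps to the indices of the
# fine classes beneath it. Class names are illustrative, not taken from the paper.
FINE_CLASSES = ["lymphocyte", "neutrophil", "benign epithelial", "malignant epithelial"]
HIERARCHY = {"inflammatory": [0, 1], "epithelial": [2, 3]}


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_nll(logits, label, hierarchy=HIERARCHY):
    """Negative log-likelihood for a label at any level of the class tree.

    The network always predicts over the finest classes. A fine label (int index)
    is scored directly; a coarse label (str) is scored by the summed probability
    of all fine classes under that node.
    """
    probs = softmax(logits)
    if isinstance(label, int):
        p = probs[label]
    else:
        p = sum(probs[i] for i in hierarchy[label])
    return -math.log(p)


logits = [2.0, 1.0, 0.1, -1.0]
print(hierarchical_nll(logits, 0))               # fine-level label from one dataset
print(hierarchical_nll(logits, "inflammatory"))  # coarse label from another dataset
```

Because the fine classes are mutually exclusive, their probabilities can be summed, so a coarse label never costs more than any one of its fine children. A dataset that annotates only "inflammatory" still contributes a useful gradient without forcing a choice between its sub-classes.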
Submitted 5 October, 2023;
originally announced October 2023.
-
Artificial Intelligence-based Eosinophil Counting in Gastrointestinal Biopsies
Authors:
Harsh Shah,
Thomas Jacob,
Amruta Parulekar,
Anjali Amarapurkar,
Amit Sethi
Abstract:
Eosinophils are normally present in the gastrointestinal (GI) tract of healthy individuals. When eosinophils in the GI tract increase beyond their usual numbers, patients experience varied symptoms. Clinicians find it difficult to diagnose this condition, called eosinophilia, and early diagnosis can help in treating patients. Histopathology is the gold standard for diagnosing this condition. Because the condition is under-diagnosed, counting eosinophils in GI tract biopsies is important. In this study, we trained and tested a deep neural network based on UNet to detect and count eosinophils in GI tract biopsies. We used connected component analysis to extract individual eosinophils. We studied the correlation between the eosinophilic infiltration counted by AI and a manual count. GI tract biopsy slides were stained with hematoxylin and eosin (H&E). Slides were scanned using a camera attached to a microscope, and five high-power field images were taken per slide. The Pearson correlation coefficient between the machine-detected and manual eosinophil counts was 0.85 on 300 held-out (test) images.
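The counting step after segmentation can be sketched as follows, assuming the UNet output has already been thresholded into a binary mask (a list of rows of 0/1 values). Components are found with 4-connectivity. The `min_size` filter for discarding tiny blobs is a hypothetical parameter not mentioned in the abstract. A Pearson correlation helper is included for comparing machine and manual counts.

```python
from collections import deque


def count_components(mask, min_size=1):
    """Count 4-connected foreground blobs of at least min_size pixels in a binary mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # breadth-first flood fill of one component
                size = 0
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if size >= min_size:
                    count += 1
    return count


def pearson(xs, ys):
    """Pearson correlation coefficient; both inputs must have non-zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


mask = [
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
]
print(count_components(mask))              # -> 3
print(count_components(mask, min_size=2))  # -> 2
print(pearson([1, 2, 3], [2, 4, 6]))       # -> 1.0
```

In a real pipeline, the per-image counts from `count_components` and the pathologist's counts would be the two lists passed to `pearson`.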
Submitted 25 November, 2022;
originally announced November 2022.
-
A Letter on Progress Made on Husky Carbon: A Legged-Aerial, Multi-modal Platform
Authors:
Adarsh Salagame,
Shoghair Manjikian,
Chenghao Wang,
Kaushik Venkatesh Krishnamurthy,
Shreyansh Pitroda,
Bibek Gupta,
Tobias Jacob,
Benjamin Mottis,
Eric Sihite,
Milad Ramezani,
Alireza Ramezani
Abstract:
Animals, such as birds, widely use multi-modal locomotion by combining legged and aerial mobility with dominant inertial effects. The robotic biomimicry of this multi-modal locomotion feat can yield ultra-flexible systems in terms of their ability to negotiate their task spaces. The main objective of this paper is to discuss the challenges in achieving multi-modal locomotion, and to report our progress in developing the Husky Carbon, our quadrupedal robot capable of multi-modal (legged and aerial) locomotion. We report the mechanical and electrical components utilized in our robot, in addition to the simulation and experimentation done toward our goal of developing a versatile multi-modal robotic platform.
Submitted 25 July, 2022;
originally announced July 2022.
-
Fed-DART and FACT: A solution for Federated Learning in a production environment
Authors:
Nico Weber,
Patrick Holzer,
Tania Jacob,
Enislay Ramentol
Abstract:
Federated Learning, as a decentralized artificial intelligence (AI) solution, solves a variety of problems in industrial applications. It enables a continuously self-improving AI that can be deployed anywhere at the edge. However, bringing AI to production to generate real business impact is a challenging task. Especially in the case of Federated Learning, expertise and resources from multiple domains are required to realize its full potential. With this in mind, we have developed an innovative Federated Learning framework, FACT, based on Fed-DART, which enables easy and scalable deployment and helps users fully leverage the potential of their private and decentralized data.
Submitted 23 May, 2022;
originally announced May 2022.
-
Marine vessel tracking using a monocular camera
Authors:
Tobias Jacob,
Raffaele Galliera,
Muddasar Ali,
Sikha Bagui
Abstract:
In this paper, a new technique for camera calibration using only GPS data is presented. Objects moving on a plane are tracked in video by using the location and size of their bounding boxes to estimate distance, achieving an average prediction error of 5.55 m per 100 m of distance from the camera. The solution can run in real time at the edge, achieving efficient inference in a low-powered IoT environment while tracking multiple different vessels.
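The abstract does not give the distance model, so the following only sketches two standard monocular cues, assuming a pinhole camera whose focal length f is measured in pixels. The first cue is apparent size: an object of known height H spanning h pixels lies at roughly f * H / h. The second is image position on a flat water plane: for a level camera at height h_cam, a point v pixels below the horizon row lies at f * h_cam / v. All function names are hypothetical. The normalization in `error_per_100m` is one possible reading of "metres of error per 100 m of distance", not necessarily the paper's.

```python
def distance_from_box_height(focal_px, object_height_m, box_height_px):
    """Pinhole model: an object of real height H at distance Z spans h = f * H / Z pixels."""
    if box_height_px <= 0:
        raise ValueError("box height must be positive")
    return focal_px * object_height_m / box_height_px


def distance_from_ground_row(focal_px, camera_height_m, rows_below_horizon):
    """Flat-plane model with a level camera: a waterline point v pixels below the
    horizon row lies at distance Z = f * h_cam / v."""
    if rows_below_horizon <= 0:
        raise ValueError("point must lie below the horizon")
    return focal_px * camera_height_m / rows_below_horizon


def error_per_100m(predicted, actual):
    """Mean absolute error divided by true distance, expressed in metres per 100 m."""
    return 100 * sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)


print(distance_from_box_height(1000, 10, 50))    # 10 m tall vessel, 50 px box -> 200.0
print(distance_from_ground_row(1000, 5, 50))     # camera 5 m above water -> 100.0
print(error_per_100m([105, 190], [100, 200]))    # 5% relative error on both -> about 5.0
```

Both cues degrade at long range, where a one-pixel change in box height or row index corresponds to many metres. This is one reason errors are naturally quoted relative to distance.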
Submitted 23 August, 2021;
originally announced August 2021.
-
The Exact Rate Memory Tradeoff for Small Caches with Coded Placement
Authors:
Vijith Kumar K P,
Brijesh Kumar Rai,
Tony Jacob
Abstract:
The idea of coded caching was introduced by Maddah-Ali and Niesen, who demonstrated the advantages of coding in caching problems. To capture the essence of the problem, they introduced the $(N, K)$ canonical cache network, in which $K$ users with independent caches of size $M$ request files from a server that has $N$ files. Among other results, the caching scheme and lower bounds they proposed led to a characterization of the exact rate memory tradeoff when $M\geq \frac{N}{K}(K-1)$. These lower bounds, along with the caching scheme proposed by Chen et al., led to a characterization of the exact rate memory tradeoff when $M\leq \frac{1}{K}$. In this paper, we focus on small caches where $M\in \left[0,\frac{N}{K}\right]$ and derive new lower bounds. For the case when $\big\lceil\frac{K+1}{2}\big\rceil\leq N \leq K$ and $M\in \big[\frac{1}{K},\frac{N}{K(N-1)}\big]$, our lower bounds demonstrate that the caching scheme introduced by Gómez-Vilardebó is optimal, and thus extend the characterization of the exact rate memory tradeoff. For the case $1\leq N\leq \big\lceil\frac{K+1}{2}\big\rceil$, we show that the new lower bounds improve upon the previously known lower bounds.
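To make the parameter regimes in this abstract concrete, the helper below classifies a cache size M for an (N, K) network according to the intervals the abstract names. It only does bookkeeping, with hypothetical function names, and computes no rates. "Not characterized" means only that none of the results listed in this abstract covers that point.

```python
from fractions import Fraction


def ceil_half(K):
    """ceil((K + 1) / 2) computed with integer arithmetic."""
    return (K + 2) // 2


def known_regime(N, K, M):
    """Classify cache size M of an (N, K) network by the regimes named in the abstract."""
    M = Fraction(M)
    if M >= Fraction(N * (K - 1), K):
        return "exact: Maddah-Ali and Niesen scheme and lower bounds"
    if M <= Fraction(1, K):
        return "exact: Chen et al. scheme with Maddah-Ali and Niesen lower bounds"
    if N >= 2 and ceil_half(K) <= N <= K and M <= Fraction(N, K * (N - 1)):
        return "exact: Gómez-Vilardebó scheme shown optimal by the new lower bounds"
    return "not characterized by the results listed here"


for M in (Fraction(1, 5), Fraction(1, 3), Fraction(1, 2), Fraction(3)):
    print(M, "->", known_regime(3, 4, M))
```

For (N, K) = (3, 4) the new interval is [1/4, 3/8], so M = 1/3 falls inside it while M = 1/2 does not.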
Submitted 9 February, 2021;
originally announced February 2021.
-
The Exact Rate Memory Tradeoff for Large Caches with Coded Placement
Authors:
Vijith Kumar K P,
Brijesh Kumar Rai,
Tony Jacob
Abstract:
The idea of coded caching for content distribution networks was introduced by Maddah-Ali and Niesen, who considered the canonical $(N, K)$ cache network in which a server with $N$ files satisfies the demands of $K$ users (each equipped with an independent cache of size $M$). Among other results, their work provided a characterization of the exact rate memory tradeoff for the problem when $M\geq\frac{N}{K}(K-1)$. In this paper, we improve this result for large caches with $M\geq \frac{N}{K}(K-2)$. For the case $\big\lceil\frac{K+1}{2}\big\rceil\leq N \leq K$, we propose a new coded caching scheme and derive a matching lower bound to show that the proposed scheme is optimal. This extends the characterization of the exact rate memory tradeoff to the case $M\geq \frac{N}{K}\Big(K-2+\frac{K-2+1/N}{K-1}\Big)$. For the case $1\leq N\leq \big\lceil\frac{K+1}{2}\big\rceil$, we derive a new lower bound, which demonstrates that the scheme proposed by Yu et al. is optimal and thus extends the characterization of the exact rate memory tradeoff to the case $M\geq \frac{N}{K}(K-2)$.
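For orientation, the corner points of the Yu et al. scheme can be computed exactly. The rate expression used below, R_t = [C(K, t+1) - C(K - min(N, K), t+1)] / C(K, t) at M = tN/K, is recalled from Yu et al.'s work on uncoded prefetching rather than stated in this abstract. The function names are hypothetical. The second function evaluates the memory threshold quoted above for the case $\lceil (K+1)/2\rceil \le N \le K$.

```python
from fractions import Fraction
from math import comb


def yu_corner_point(N, K, t):
    """Memory-rate corner point (t*N/K, R_t) of the Yu et al. scheme, for t = 0..K.

    math.comb returns 0 when the lower index exceeds the upper one, which
    handles the boundary terms such as C(K - N, t + 1) for small K - N.
    """
    memory = Fraction(t * N, K)
    rate = Fraction(comb(K, t + 1) - comb(K - min(N, K), t + 1), comb(K, t))
    return memory, rate


def large_cache_threshold(N, K):
    """N/K * (K - 2 + (K - 2 + 1/N) / (K - 1)), the memory above which the
    abstract reports an exact characterization when ceil((K+1)/2) <= N <= K."""
    return Fraction(N, K) * (K - 2 + (K - 2 + Fraction(1, N)) / (K - 1))


N, K = 3, 4
for t in (K - 2, K - 1):
    memory, rate = yu_corner_point(N, K, t)
    print(f"t={t}: M={memory}, R={rate}")
print("threshold:", large_cache_threshold(N, K))
```

For (N, K) = (3, 4) the corner points are (3/2, 2/3) and (9/4, 1/4), and the threshold 25/12 lies between their memories. Setting N = K, the threshold simplifies algebraically to K - 1 - 1/K.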
Submitted 24 January, 2021;
originally announced January 2021.
-
Fundamental Limits of Coded Caching: The Memory Rate Pair (K-1-1/K, 1/(K-1))
Authors:
Vijith Kumar K P,
Brijesh Kumar Rai,
Tony Jacob
Abstract:
Maddah-Ali and Niesen, in a seminal paper, introduced the notion of coded caching. The exact nature of the fundamental limits in this context has remained elusive, even as several approximate characterizations have been found. A new optimal scheme for the (3, 3) cache network, operating at the memory rate pair (5/3, 1/2) for the demand in which all users request distinct files, was recently introduced to partially address this issue. In this paper, an extension of this scheme to the general (K, K) cache network, operating at the memory rate pair (K-1-1/K, 1/(K-1)), is proposed. A new lower bound is also derived, which demonstrates the optimality of the proposed scheme for the demand in which all users request distinct files.
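The following is a quick check of where the proposed pair sits. It is our own arithmetic, not a claim made in the abstract. It compares the pair with memory sharing between the Maddah-Ali and Niesen corner points (K-2, 2/(K-1)) and (K-1, 1/K) of a (K, K) network, which follow from their rate (K - t)/(t + 1) at M = t. The function names are hypothetical.

```python
from fractions import Fraction


def proposed_pair(K):
    """Memory-rate pair (K - 1 - 1/K, 1/(K - 1)) from the abstract, for N = K."""
    return Fraction(K - 1) - Fraction(1, K), Fraction(1, K - 1)


def memory_sharing_rate(K, M):
    """Rate from memory sharing between the corner points (K-2, 2/(K-1)) and
    (K-1, 1/K) of a (K, K) network, valid for K-2 <= M <= K-1."""
    m0, r0 = Fraction(K - 2), Fraction(2, K - 1)
    m1, r1 = Fraction(K - 1), Fraction(1, K)
    lam = (Fraction(M) - m0) / (m1 - m0)  # memory-sharing weight on the second point
    return (1 - lam) * r0 + lam * r1


for K in range(2, 7):
    M, R = proposed_pair(K)
    print(K, M, R, memory_sharing_rate(K, M) - R)
```

For K = 3 this reproduces the (5/3, 1/2) point, and memory sharing would give 5/9 there. The printed gap equals 1/(K^2 (K-1)) for every K, so the pair lies strictly below that memory-sharing line.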
Submitted 21 May, 2019;
originally announced May 2019.