Search | arXiv e-print repository

Dynamically Weighted Federated k-Means

Authors: Patrick Holzer, Tania Jacob, Shubham Kavane

Abstract: Federated clustering, an integral aspect of federated machine learning, enables multiple data sources to collaboratively cluster their data, maintaining decentralization and preserving privacy. In this paper, we introduce a novel federated clustering algorithm named Dynamically Weighted Federated k-means (DWF k-means) based on Lloyd's method for k-means clustering, to address the challenges associated with distributed data sources and heterogeneous data. Our proposed algorithm combines the benefits of traditional clustering techniques with the privacy and scalability benefits offered by federated learning. The algorithm facilitates collaborative clustering among multiple data owners, allowing them to cluster their local data collectively while exchanging minimal information with the central coordinator. The algorithm optimizes the clustering process by adaptively aggregating cluster assignments and centroids from each data source, thereby learning a global clustering solution that reflects the collective knowledge of the entire federated network. We address the issue of empty clusters, which commonly arises in the context of federated clustering. We conduct experiments on multiple datasets and data distribution settings to evaluate the performance of our algorithm in terms of clustering score, accuracy, and v-measure. The results demonstrate that our approach can match the performance of the centralized classical k-means baseline, and outperform existing federated clustering methods like k-FED in realistic scenarios.

Submitted 17 November, 2023; v1 submitted 23 October, 2023; originally announced October 2023.
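
The aggregation step mentioned in the abstract above can be pictured with a minimal sketch. It assumes the coordinator weights each client's centroid by the number of local points assigned to that cluster, and keeps the previous global centroid when a cluster is empty on every client. Both choices are illustrative assumptions; the abstract does not specify the weighting or the empty-cluster strategy used by DWF k-means, and the name `aggregate_centroids` is hypothetical.

```python
def aggregate_centroids(client_centroids, client_counts, previous_global):
    """Size-weighted average of per-client centroids, one cluster at a time.

    client_centroids[c][j] is client c's centroid for cluster j (list of floats).
    client_counts[c][j] is how many local points client c assigned to cluster j.
    previous_global[j] is the global centroid from the previous round.
    """
    k = len(previous_global)
    dim = len(previous_global[0])
    new_global = []
    for j in range(k):
        total = sum(counts[j] for counts in client_counts)
        if total == 0:
            # Cluster j is empty on every client: keep the old centroid.
            # This fallback is an assumption, not the paper's strategy.
            new_global.append(list(previous_global[j]))
            continue
        centroid = [
            sum(
                counts[j] * centroids[j][d]
                for centroids, counts in zip(client_centroids, client_counts)
            ) / total
            for d in range(dim)
        ]
        new_global.append(centroid)
    return new_global


# Two clients, two clusters, 2-D data; cluster 1 is empty everywhere.
example = aggregate_centroids(
    client_centroids=[[[0.0, 0.0], [5.0, 5.0]], [[2.0, 2.0], [9.0, 9.0]]],
    client_counts=[[1, 0], [3, 0]],
    previous_global=[[1.0, 1.0], [7.0, 7.0]],
)
print(example)
```

Only centroids and counts cross the network in this sketch, which matches the abstract's point about exchanging minimal information with the central coordinator.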

arXiv:2310.03346 [pdf, other]

Combining Datasets with Different Label Sets for Improved Nucleus Segmentation and Classification

Authors: Amruta Parulekar, Utkarsh Kanwat, Ravi Kant Gupta, Medha Chippa, Thomas Jacob, Tripti Bameta, Swapnil Rane, Amit Sethi

Abstract: Segmentation and classification of cell nuclei in histopathology images using deep neural networks (DNNs) can save pathologists' time for diagnosing various diseases, including cancers, by automating cell counting and morphometric assessments. It is now well-known that the accuracy of DNNs increases with the sizes of annotated datasets available for training. Although multiple datasets of histopathology images with nuclear annotations and class labels have been made publicly available, the set of class labels differ across these datasets. We propose a method to train DNNs for instance segmentation and classification on multiple datasets where the set of classes across the datasets are related but not the same. Specifically, our method is designed to utilize a coarse-to-fine class hierarchy, where the set of classes labeled and annotated in a dataset can be at any level of the hierarchy, as long as the classes are mutually exclusive. Within a dataset, the set of classes need not even be at the same level of the class hierarchy tree. Our results demonstrate that segmentation and classification metrics for the class set used by the test split of a dataset can improve by pre-training on another dataset that may even have a different set of classes due to the expansion of the training set enabled by our method. Furthermore, generalization to previously unseen datasets also improves by combining multiple other datasets with different sets of classes for training. The improvement is both qualitative and quantitative. The proposed method can be adapted for various loss functions, DNN architectures, and application domains.

Submitted 5 October, 2023; originally announced October 2023.

arXiv:2211.15667 [pdf, other]

Artificial Intelligence-based Eosinophil Counting in Gastrointestinal Biopsies

Authors: Harsh Shah, Thomas Jacob, Amruta Parulekar, Anjali Amarapurkar, Amit Sethi

Abstract: Normally eosinophils are present in the gastrointestinal (GI) tract of healthy individuals. When the eosinophils increase beyond their usual amount in the GI tract, a patient gets varied symptoms. Clinicians find it difficult to diagnose this condition called eosinophilia. Early diagnosis can help in treating patients. Histopathology is the gold standard in the diagnosis for this condition. As this is an under-diagnosed condition, counting eosinophils in the GI tract biopsies is important. In this study, we trained and tested a deep neural network based on UNet to detect and count eosinophils in GI tract biopsies. We used connected component analysis to extract the eosinophils. We studied correlation of eosinophilic infiltration counted by AI with a manual count. GI tract biopsy slides were stained with H&E stain. Slides were scanned using a camera attached to a microscope and five high-power field images were taken per slide. Pearson correlation coefficient was 85% between the machine-detected and manual eosinophil counts on 300 held-out (test) images.

Submitted 25 November, 2022; originally announced November 2022.

Comments: 4 pages, 2 figures
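
The counting step described in the abstract above (connected component analysis on a segmentation output) can be sketched as follows. This is an illustrative stand-in, assuming a binary mask given as nested lists and 4-connectivity; the abstract does not state the connectivity, any size filtering, or the library the authors used, and `count_components` is a hypothetical name.

```python
from collections import deque


def count_components(mask):
    """Count 4-connected groups of truthy pixels in a 2-D binary mask."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # New, unvisited foreground pixel: flood-fill its component.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


# Toy mask with three separate blobs.
toy_mask = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
print(count_components(toy_mask))  # prints 3
```

With 8-connectivity instead, diagonal neighbours would merge, so the count depends on that assumption.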

arXiv:2207.12254 [pdf, other]

A Letter on Progress Made on Husky Carbon: A Legged-Aerial, Multi-modal Platform

Authors: Adarsh Salagame, Shoghair Manjikian, Chenghao Wang, Kaushik Venkatesh Krishnamurthy, Shreyansh Pitroda, Bibek Gupta, Tobias Jacob, Benjamin Mottis, Eric Sihite, Milad Ramezani, Alireza Ramezani

Abstract: Animals, such as birds, widely use multi-modal locomotion by combining legged and aerial mobility with dominant inertial effects. The robotic biomimicry of this multi-modal locomotion feat can yield ultra-flexible systems in terms of their ability to negotiate their task spaces. The main objective of this paper is to discuss the challenges in achieving multi-modal locomotion, and to report our progress in developing our quadrupedal robot capable of multi-modal locomotion (legged and aerial locomotion), the Husky Carbon. We report the mechanical and electrical components utilized in our robot, in addition to the simulation and experimentation done to achieve our goal in developing a versatile multi-modal robotic platform.

Submitted 25 July, 2022; originally announced July 2022.

Comments: arXiv admin note: text overlap with arXiv:2104.05834, arXiv:2205.06392

arXiv:2205.11267 [pdf, other]

Fed-DART and FACT: A solution for Federated Learning in a production environment

Authors: Nico Weber, Patrick Holzer, Tania Jacob, Enislay Ramentol

Abstract: Federated Learning as a decentralized artificial intelligence (AI) solution solves a variety of problems in industrial applications. It enables a continuously self-improving AI, which can be deployed everywhere at the edge. However, bringing AI to production for generating a real business impact is a challenging task. Especially in the case of Federated Learning, expertise and resources from multiple domains are required to realize its full potential. Having this in mind we have developed an innovative Federated Learning framework FACT based on Fed-DART, enabling an easy and scalable deployment, helping the user to fully leverage the potential of their private and decentralized data.

Submitted 23 May, 2022; originally announced May 2022.

arXiv:2108.10367 [pdf, ps, other]

doi 10.5220/0010516000170028

Marine vessel tracking using a monocular camera

Authors: Tobias Jacob, Raffaele Galliera, Muddasar Ali, Sikha Bagui

Abstract: In this paper, a new technique for camera calibration using only GPS data is presented. A new way of tracking objects that move on a plane in a video is achieved by using the location and size of the bounding box to estimate the distance, achieving an average prediction error of 5.55m per 100m distance from the camera. This solution can be run in real-time at the edge, achieving efficient inference in a low-powered IoT environment while also being able to track multiple different vessels.

Submitted 23 August, 2021; originally announced August 2021.

Comments: 12 pages, 9 figures. The paper is based on the same team's submission to the AI Tracks at Sea challenge, which took 3rd place in the competition. Included in the DeLTA 2021 conference proceedings, published in the SCITEPRESS Digital Library and available at https://www.scitepress.org/PublicationsDetail.aspx?ID=yzZS+b/VkZ4=&t=1

arXiv:2102.04797 [pdf, other]

The Exact Rate Memory Tradeoff for Small Caches with Coded Placement

Authors: Vijith Kumar K P, Brijesh Kumar Rai, Tony Jacob

Abstract: The idea of coded caching was introduced by Maddah-Ali and Niesen who demonstrated the advantages of coding in caching problems. To capture the essence of the problem, they introduced the $(N, K)$ canonical cache network in which $K$ users with independent caches of size $M$ request files from a server that has $N$ files. Among other results, the caching scheme and lower bounds proposed by them led to a characterization of the exact rate memory tradeoff when $M\geq \frac{N}{K}(K-1)$. These lower bounds along with the caching scheme proposed by Chen et al. led to a characterization of the exact rate memory tradeoff when $M\leq \frac{1}{K}$. In this paper we focus on small caches where $M\in \left[0,\frac{N}{K}\right]$ and derive new lower bounds. For the case when $\big\lceil\frac{K+1}{2}\big\rceil\leq N \leq K$ and $M\in \big[\frac{1}{K},\frac{N}{K(N-1)}\big]$, our lower bounds demonstrate that the caching scheme introduced by Gómez-Vilardebó is optimal and thus extend the characterization of the exact rate memory tradeoff. For the case $1\leq N\leq \big\lceil\frac{K+1}{2}\big\rceil$, we show that the new lower bounds improve upon the previously known lower bounds.

Submitted 9 February, 2021; originally announced February 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2101.09785

arXiv:2101.09785 [pdf, other]

The Exact Rate Memory Tradeoff for Large Caches with Coded Placement

Authors: Vijith Kumar K P, Brijesh Kumar Rai, Tony Jacob

Abstract: The idea of coded caching for content distribution networks was introduced by Maddah-Ali and Niesen, who considered the canonical $(N, K)$ cache network in which a server with $N$ files satisfy the demands of $K$ users (equipped with independent caches of size $M$ each). Among other results, their work provided a characterization of the exact rate memory tradeoff for the problem when $M\geq\frac{N}{K}(K-1)$. In this paper, we improve this result for large caches with $M\geq \frac{N}{K}(K-2)$. For the case $\big\lceil\frac{K+1}{2}\big\rceil\leq N \leq K$, we propose a new coded caching scheme, and derive a matching lower bound to show that the proposed scheme is optimal. This extends the characterization of the exact rate memory tradeoff to the case $M\geq \frac{N}{K}\Big(K-2+\frac{(K-2+1/N)}{(K-1)}\Big)$. For the case $1\leq N\leq \big\lceil\frac{K+1}{2}\big\rceil$, we derive a new lower bound, which demonstrates that the scheme proposed by Yu et al. is optimal and thus extend the characterization of the exact rate memory tradeoff to the case $M\geq \frac{N}{K}(K-2)$.

Submitted 24 January, 2021; originally announced January 2021.
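
To make the two cache-size thresholds in the abstract above concrete, the following sketch evaluates them with exact fractions for the case $\lceil (K+1)/2 \rceil \leq N \leq K$. It only transcribes the two expressions quoted in the abstract; the function name `large_cache_thresholds` is a hypothetical helper and nothing here reproduces the paper's scheme or bounds.

```python
from fractions import Fraction


def large_cache_thresholds(N, K):
    """Return (earlier, extended) lower limits on the cache size M.

    earlier  = (N/K) * (K - 1)
    extended = (N/K) * (K - 2 + (K - 2 + 1/N) / (K - 1))
    Valid only for ceil((K+1)/2) <= N <= K, as stated in the abstract.
    """
    if K < 2 or not ((K + 2) // 2 <= N <= K):
        raise ValueError("requires K >= 2 and ceil((K+1)/2) <= N <= K")
    base = Fraction(N, K)
    earlier = base * (K - 1)
    extended = base * (K - 2 + (K - 2 + Fraction(1, N)) / (K - 1))
    return earlier, extended


# For the (3, 3) cache network the limit drops from 2 to 5/3.
print(large_cache_thresholds(3, 3))
```

A smaller threshold means the exact tradeoff is characterized over a wider range of cache sizes.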

arXiv:1905.08471 [pdf, ps, other]

Fundamental Limits of Coded Caching: The Memory Rate Pair (K-1-1/K, 1/(K-1))

Authors: Vijith Kumar K P, Brijesh Kumar Rai, Tony Jacob

Abstract: Maddah-Ali and Niesen, in a seminal paper, introduced the notion of coded caching. The exact nature of the fundamental limits in this context has remained elusive even as several approximate characterizations have been found. A new optimal scheme for the (3, 3) cache network, operating at the memory rate pair (5/3, 1/2) for the demand where all the users request for distinct files, was introduced recently to partially address this issue. In this paper, an extension of this scheme to the general (K, K) cache network, operating at the memory rate pair (K-1-1/K, 1/(K-1)), is proposed. A new lower bound is also derived which demonstrates the optimality of the proposed scheme for the demand where all the users request for distinct files.

Submitted 21 May, 2019; originally announced May 2019.

Comments: 5 pages, 2 figures. To be presented in ISIT 2019
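
As a quick arithmetic check of the abstract above, the general memory rate pair (K-1-1/K, 1/(K-1)) should reduce to the (5/3, 1/2) point quoted for the (3, 3) cache network. The sketch below evaluates the pair with exact fractions; `memory_rate_pair` is a hypothetical helper name, and the code checks only the formula, not the caching scheme itself.

```python
from fractions import Fraction


def memory_rate_pair(K):
    """Return (memory, rate) = (K - 1 - 1/K, 1/(K - 1)) for a (K, K) network."""
    if K < 2:
        raise ValueError("K must be at least 2 so that 1/(K-1) is defined")
    memory = K - 1 - Fraction(1, K)
    rate = Fraction(1, K - 1)
    return memory, rate


for K in (3, 4, 5):
    memory, rate = memory_rate_pair(K)
    print(f"K={K}: memory={memory}, rate={rate}")
```

For K = 3 this gives memory 5/3 and rate 1/2, matching the pair stated for the (3, 3) network.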

Showing 1–9 of 9 results for author: Jacob, T