-
Recognize Any Surgical Object: Unleashing the Power of Weakly-Supervised Data
Authors:
Jiajie Li,
Brian R Quaranto,
Chenhui Xu,
Ishan Mishra,
Ruiyang Qin,
Dancheng Liu,
Peter C W Kim,
Jinjun Xiong
Abstract:
We present RASO, a foundation model designed to Recognize Any Surgical Object, offering robust open-set recognition capabilities across a broad range of surgical procedures and object classes, in both surgical images and videos. RASO leverages a novel weakly-supervised learning framework that generates tag-image-text pairs automatically from large-scale unannotated surgical lecture videos, significantly reducing the need for manual annotations. Our scalable data generation pipeline gathers 2,200 surgical procedures and produces 3.6 million tag annotations across 2,066 unique surgical tags. Our experiments show that RASO achieves improvements of 2.9 mAP, 4.5 mAP, 10.6 mAP, and 7.2 mAP on four standard surgical benchmarks, respectively, in zero-shot settings, and surpasses state-of-the-art models in supervised surgical action recognition tasks. Code, model, and demo are available at https://ntlm1686.github.io/raso.
Submitted 5 May, 2025; v1 submitted 25 January, 2025;
originally announced January 2025.
-
Client Contribution Normalization for Enhanced Federated Learning
Authors:
Mayank Kumar Kundalwal,
Anurag Saraswat,
Ishan Mishra,
Deepak Mishra
Abstract:
Mobile devices, including smartphones and laptops, generate decentralized and heterogeneous data, presenting significant challenges for traditional centralized machine learning models due to substantial communication costs and privacy risks. Federated Learning (FL) offers a promising alternative by enabling collaborative training of a global model across decentralized devices without data sharing. However, FL faces challenges due to statistical heterogeneity among clients, where non-independent and identically distributed (non-IID) data impedes model convergence and performance. This paper focuses on data-dependent heterogeneity in FL and proposes a novel approach leveraging mean latent representations extracted from locally trained models. The proposed method normalizes client contributions based on these representations, allowing the central server to estimate and adjust for heterogeneity during aggregation. This normalization enhances the global model's generalization and mitigates the limitations of conventional federated averaging methods. The main contributions include introducing a normalization scheme using mean latent representations to handle statistical heterogeneity in FL, demonstrating the seamless integration with existing FL algorithms to improve performance in non-IID settings, and validating the approach through extensive experiments on diverse datasets. Results show significant improvements in model accuracy and consistency across skewed distributions. Our experiments with six FL schemes: FedAvg, FedProx, FedBABU, FedNova, SCAFFOLD, and SGDM highlight the robustness of our approach. This research advances FL by providing a practical and computationally efficient solution for statistical heterogeneity, contributing to the development of more reliable and generalized machine learning models.
Submitted 9 November, 2024;
originally announced November 2024.
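For reference, the conventional federated averaging step that the abstract above contrasts against can be sketched as follows. This is only the plain size-weighted FedAvg baseline; the paper's normalization based on mean latent representations is not specified in the abstract and is not reproduced here. The function name `fedavg` and the flat-list parameter layout are illustrative assumptions.

```python
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Plain FedAvg: average client parameters, weighted by local dataset size.

    Assumption: each client's model is flattened into a list of floats of
    equal length. This is the baseline aggregation only, not the paper's
    normalized variant.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical clients holding 1 and 3 samples respectively.
global_weights = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Because the weights depend only on dataset size, a client with skewed (non-IID) data pulls the average as strongly as any other client of the same size, which is the limitation the abstract describes.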
-
Light-weight Deep Extreme Multilabel Classification
Authors:
Istasis Mishra,
Arpan Dasgupta,
Pratik Jawanpuria,
Bamdev Mishra,
Pawan Kumar
Abstract:
Extreme multi-label (XML) classification refers to the task of supervised multi-label learning that involves a large number of labels. Hence, scalability of the classifier with increasing label dimension is an important consideration. In this paper, we develop a method called LightDXML, which modifies the recently developed deep learning based XML framework by using label embeddings instead of feature embeddings for negative sampling and iterating cyclically through three major phases: (1) proxy training of label embeddings, (2) shortlisting of labels for negative sampling, and (3) final classifier training using the negative samples. Consequently, LightDXML also removes the requirement of a re-ranker module, thereby leading to further savings in time and memory requirements. The proposed method achieves the best of both worlds: while the training time, model size and prediction times are on par with or better than those of the tree-based methods, it attains much better prediction accuracy that is on par with the deep learning based methods. Moreover, the proposed approach achieves the best tail-label prediction accuracy over most state-of-the-art XML methods on some of the large datasets. Accepted at IJCNN 2023; partial funding from MAPG grant and IIIT Seed grant at IIIT, Hyderabad, India. Code: https://github.com/misterpawan/LightDXML
Submitted 20 April, 2023;
originally announced April 2023.
-
Distilling Calibrated Student from an Uncalibrated Teacher
Authors:
Ishan Mishra,
Sethu Vamsi Krishna,
Deepak Mishra
Abstract:
Knowledge distillation is a common technique for improving the performance of a shallow student network by transferring information from a teacher network, which, in general, is comparatively large and deep. These teacher networks are pre-trained and often uncalibrated, as no calibration technique is applied to the teacher model during training. Calibration of a network measures the probability of correctness for any of its predictions, which is critical in high-risk domains. In this paper, we study how to obtain a calibrated student from an uncalibrated teacher. Our approach relies on the fusion of data-augmentation techniques, including but not limited to cutout, mixup, and CutMix, with knowledge distillation. We extend our approach beyond traditional knowledge distillation and find it suitable for Relational Knowledge Distillation and Contrastive Representation Distillation as well. The novelty of the work is that it provides a framework to distill a calibrated student from an uncalibrated teacher model without compromising the accuracy of the distilled student. We perform extensive experiments to validate our approach on various datasets, including CIFAR-10, CIFAR-100, CINIC-10 and TinyImageNet, and obtain calibrated student models. We also observe robust performance of our approach when evaluating it on corrupted CIFAR-100C data.
Submitted 22 February, 2023;
originally announced February 2023.
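The two standard ingredients named in the abstract above, mixup and temperature-scaled knowledge distillation, can be sketched in a few lines. The abstract does not say how the paper combines them, so this shows only the textbook building blocks; the function names, the list-based inputs, and the default temperature are illustrative assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over logits divided by a temperature."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixup(x1: Sequence[float], x2: Sequence[float], lam: float) -> List[float]:
    """Standard mixup: convex combination of two inputs with weight lam."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]


def kd_loss(student_logits: Sequence[float],
            teacher_logits: Sequence[float],
            temperature: float = 4.0) -> float:
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl


# Hypothetical usage: mix two inputs, then compare student and teacher
# logits that would be produced for the mixed input.
mixed = mixup([1.0, 0.0], [0.0, 1.0], lam=0.25)
loss = kd_loss(student_logits=[0.0, 1.0], teacher_logits=[1.0, 0.0])
```

In a training loop, both networks would be evaluated on the mixed input and `kd_loss` applied to their logits; whether the paper does exactly this is not stated in the abstract.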
-
Probabilistic Trust Intervals for Out of Distribution Detection
Authors:
Gagandeep Singh,
Ishan Mishra,
Deepak Mishra
Abstract:
The ability of a deep learning network to distinguish between in-distribution (ID) and out-of-distribution (OOD) inputs is crucial for ensuring the reliability and trustworthiness of AI systems. Existing OOD detection methods often involve complex architectural innovations, such as ensemble models, which, while enhancing detection accuracy, significantly increase model complexity and training time. Other methods utilize surrogate samples to simulate OOD inputs, but these may not generalize well across different types of OOD data. In this paper, we propose a straightforward yet novel technique to enhance OOD detection in pre-trained networks without altering their original parameters. Our approach defines probabilistic trust intervals for each network weight, determined using in-distribution data. During inference, additional weight values are sampled, and the resulting disagreements among outputs are utilized for OOD detection. We propose a metric to quantify this disagreement and validate its effectiveness with empirical evidence. Our method significantly outperforms various baseline methods across multiple OOD datasets without requiring actual or surrogate OOD samples. We evaluate our approach on MNIST, Fashion-MNIST, CIFAR-10, CIFAR-100 and CIFAR-10-C (a corruption-augmented version of CIFAR-10), across various neural network architectures (e.g., VGG-16, ResNet-20, DenseNet-100). On the MNIST-FashionMNIST setup, our method achieves a False Positive Rate (FPR) of 12.46% at 95% True Positive Rate (TPR), compared to 27.09% achieved by the best baseline. On adversarial and corrupted datasets such as CIFAR-10-C, our proposed method easily differentiates between clean and noisy inputs. These results demonstrate the robustness of our approach in identifying corrupted and adversarial inputs, all without requiring OOD samples during training.
Submitted 23 December, 2024; v1 submitted 2 February, 2021;
originally announced February 2021.
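The "FPR at 95% TPR" figure quoted in the abstract above is a standard OOD metric and can be computed as in the sketch below. It assumes that a higher score means "more in-distribution" and that ID samples are the positive class; the paper's own disagreement metric is not reproduced, and the function name is illustrative.

```python
import math
from typing import Sequence


def fpr_at_tpr(id_scores: Sequence[float],
               ood_scores: Sequence[float],
               tpr: float = 0.95) -> float:
    """Fraction of OOD samples accepted as ID when at least `tpr` of ID samples are accepted.

    Assumption: a sample is accepted as in-distribution when its score is
    greater than or equal to the threshold.
    """
    ranked = sorted(id_scores, reverse=True)
    # Number of ID samples that must pass the threshold to reach the target TPR.
    k = math.ceil(tpr * len(ranked))
    threshold = ranked[k - 1]
    false_positives = sum(1 for s in ood_scores if s >= threshold)
    return false_positives / len(ood_scores)


# Hypothetical scores: 20 ID samples scored 1..20 and 4 OOD samples.
example_fpr = fpr_at_tpr(list(range(1, 21)), [0, 1, 2, 3])
```

Here the threshold lands at 2, so 19 of the 20 ID samples are accepted (95% TPR) and two of the four OOD samples pass it, giving an FPR of 0.5.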
-
DeepSWIR: A Deep Learning Based Approach for the Synthesis of Short-Wave InfraRed Band using Multi-Sensor Concurrent Datasets
Authors:
Litu Rout,
Yatharath Bhateja,
Ankur Garg,
Indranil Mishra,
S Manthira Moorthi,
Debjyoti Dhar
Abstract:
Convolutional Neural Network (CNN) is achieving remarkable progress in various computer vision tasks. In the past few years, the remote sensing community has observed Deep Neural Network (DNN) finally taking off in several challenging fields. In this study, we propose a DNN to generate a predefined High Resolution (HR) synthetic spectral band using an ensemble of concurrent Low Resolution (LR) bands and existing HR bands. Of particular interest, the proposed network, namely DeepSWIR, synthesizes Short-Wave InfraRed (SWIR) band at 5m Ground Sampling Distance (GSD) using Green (G), Red (R) and Near InfraRed (NIR) bands at both 24m and 5m GSD, and SWIR band at 24m GSD. To our knowledge, the highest spatial resolution of commercially deliverable SWIR band is at 7.5m GSD. Also, we propose a Gaussian feathering based image stitching approach in light of processing large satellite imagery. To experimentally validate the synthesized HR SWIR band, we critically analyse the qualitative and quantitative results produced by DeepSWIR using state-of-the-art evaluation metrics. Further, we convert the synthesized DN values to Top Of Atmosphere (TOA) reflectance and compare with the corresponding band of Sentinel-2B. Finally, we show one real world application of the synthesized band by using it to map wetland resources over our region of interest.
Submitted 7 May, 2019;
originally announced May 2019.