Showing 1–16 of 16 results for author: Sikdar, A

Searching in archive cs.
  1. arXiv:2506.03709  [pdf, other]

    cs.CV

    AetherVision-Bench: An Open-Vocabulary RGB-Infrared Benchmark for Multi-Angle Segmentation across Aerial and Ground Perspectives

    Authors: Aniruddh Sikdar, Aditya Gandhamal, Suresh Sundaram

    Abstract: Open-vocabulary semantic segmentation (OVSS) involves assigning labels to each pixel in an image based on textual descriptions, leveraging world models like CLIP. However, they encounter significant challenges in cross-domain generalization, hindering their practical efficacy in real-world applications. Embodied AI systems are transforming autonomous navigation for ground vehicles and drones by en…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Accepted at Workshop on Foundation Models Meet Embodied Agents at CVPR 2025 (Non-archival Track)

  2. arXiv:2506.03706  [pdf, other]

    cs.CV

    OV-COAST: Cost Aggregation with Optimal Transport for Open-Vocabulary Semantic Segmentation

    Authors: Aditya Gandhamal, Aniruddh Sikdar, Suresh Sundaram

    Abstract: Open-vocabulary semantic segmentation (OVSS) entails assigning semantic labels to each pixel in an image using textual descriptions, typically leveraging world models such as CLIP. To enhance out-of-domain generalization, we propose Cost Aggregation with Optimal Transport (OV-COAST) for open-vocabulary semantic segmentation. To align visual-language features within the framework of optimal transpo…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Accepted at CVPR 2025 Workshop on Transformers for Vision (Non-archival track)

  3. arXiv:2504.15728  [pdf, other]

    cs.CV

    SAGA: Semantic-Aware Gray color Augmentation for Visible-to-Thermal Domain Adaptation across Multi-View Drone and Ground-Based Vision Systems

    Authors: Manjunath D, Aniruddh Sikdar, Prajwal Gurunath, Sumanth Udupa, Suresh Sundaram

    Abstract: Domain-adaptive thermal object detection plays a key role in facilitating visible (RGB)-to-thermal (IR) adaptation by reducing the need for co-registered image pairs and minimizing reliance on large annotated IR datasets. However, inherent limitations of IR images, such as the lack of color and texture cues, pose challenges for RGB-trained models, leading to increased false positives and poor-qual…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: Accepted at CVPR-W PBVS 2025

  4. arXiv:2410.20953  [pdf, other]

    cs.CV

    IndraEye: Infrared Electro-Optical UAV-based Perception Dataset for Robust Downstream Tasks

    Authors: Manjunath D, Prajwal Gurunath, Sumanth Udupa, Aditya Gandhamal, Shrikar Madhu, Aniruddh Sikdar, Suresh Sundaram

    Abstract: Deep neural networks (DNNs) have shown exceptional performance when trained on well-illuminated images captured by Electro-Optical (EO) cameras, which provide rich texture details. However, in critical applications like aerial perception, it is essential for DNNs to maintain consistent reliability across all conditions, including low-light scenarios where EO cameras often struggle to capture suffi…

    Submitted 28 October, 2024; originally announced October 2024.

    Comments: 9 pages, 2 figures

  5. arXiv:2410.12953  [pdf, other]

    cs.LG cs.CV eess.IV

    Syn2Real Domain Generalization for Underwater Mine-like Object Detection Using Side-Scan Sonar

    Authors: Aayush Agrawal, Aniruddh Sikdar, Rajini Makam, Suresh Sundaram, Suresh Kumar Besai, Mahesh Gopi

    Abstract: Underwater mine detection with deep learning suffers from limitations due to the scarcity of real-world data. This scarcity leads to overfitting, where models perform well on training data but poorly on unseen data. This paper proposes a Syn2Real (Synthetic to Real) domain generalization approach using diffusion models to address this challenge. We demonstrate that synthetic data generated with…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: 7 pages, 4 figures and 3 tables

  6. arXiv:2409.11206  [pdf, other]

    cs.CV

    High-Order Evolving Graphs for Enhanced Representation of Traffic Dynamics

    Authors: Aditya Humnabadkar, Arindam Sikdar, Benjamin Cave, Huaizhong Zhang, Paul Bakaki, Ardhendu Behera

    Abstract: We present an innovative framework for traffic dynamics analysis using High-Order Evolving Graphs, designed to improve spatio-temporal representations in autonomous driving contexts. Our approach constructs temporal bidirectional bipartite graphs that effectively model the complex interactions within traffic scenes in real-time. By integrating Graph Neural Networks (GNNs) with high-order multi-agg…

    Submitted 18 September, 2024; v1 submitted 17 September, 2024; originally announced September 2024.

    Comments: Accepted manuscript - 2nd Workshop on Vision-Centric Autonomous Driving (VCAD) as part of European Conference on Computer Vision (ECCV) 2024

  7. arXiv:2408.08182  [pdf, ps, other]

    cs.CV cs.AI

    Your Turn: At Home Turning Angle Estimation for Parkinson's Disease Severity Assessment

    Authors: Qiushuo Cheng, Catherine Morgan, Arindam Sikdar, Alessandro Masullo, Alan Whone, Majid Mirmehdi

    Abstract: People with Parkinson's Disease (PD) often experience progressively worsening gait, including changes in how they turn around, as the disease progresses. Existing clinical rating tools are not capable of capturing hour-by-hour variations of PD symptoms, as they are confined to brief assessments within clinic settings. Measuring gait turning angles continuously and passively is a component step tow…

    Submitted 4 June, 2025; v1 submitted 15 August, 2024; originally announced August 2024.

  8. arXiv:2408.01843  [pdf, other]

    cs.CV

    Supervised Image Translation from Visible to Infrared Domain for Object Detection

    Authors: Prahlad Anand, Qiranul Saadiyean, Aniruddh Sikdar, Nalini N, Suresh Sundaram

    Abstract: This study aims to learn a translation from visible to infrared imagery, bridging the domain gap between the two modalities so as to improve accuracy on downstream tasks including object detection. Previous approaches attempt to perform bi-domain feature fusion through iterative optimization or end-to-end deep convolutional networks. However, we pose the problem as similar to that of image transla…

    Submitted 3 August, 2024; originally announced August 2024.

  9. arXiv:2312.02240  [pdf, other]

    cs.CV cs.AI

    Contrastive Learning-Based Spectral Knowledge Distillation for Multi-Modality and Missing Modality Scenarios in Semantic Segmentation

    Authors: Aniruddh Sikdar, Jayant Teotia, Suresh Sundaram

    Abstract: Improving the performance of semantic segmentation models using multispectral information is crucial, especially for environments with low-light and adverse conditions. Multi-modal fusion techniques pursue either the learning of cross-modality features to generate a fused image or engage in knowledge distillation but address multimodal and missing modality scenarios as distinct issues, which is no…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: 10 pages, 6 figures

  10. arXiv:2311.18331  [pdf, other]

    cs.CV cs.AI

    MRFP: Learning Generalizable Semantic Segmentation from Sim-2-Real with Multi-Resolution Feature Perturbation

    Authors: Sumanth Udupa, Prajwal Gurunath, Aniruddh Sikdar, Suresh Sundaram

    Abstract: Deep neural networks have shown exemplary performance on semantic scene understanding tasks on source domains, but due to the absence of style diversity during training, enhancing performance on unseen target domains using only single source domain data remains a challenging task. Generation of simulated data is a feasible alternative to retrieving large style-diverse real-world datasets as it is…

    Submitted 28 March, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Accepted to CVPR 2024

  11. arXiv:2212.07146  [pdf, other]

    cs.CV

    Fully complex-valued deep learning model for visual perception

    Authors: Aniruddh Sikdar, Sumanth Udupa, Suresh Sundaram

    Abstract: Deep learning models operating in the complex domain are used due to their rich representation capacity. However, most of these models are either restricted to the first quadrant of the complex plane or project the complex-valued data into the real domain, causing a loss of information. This paper proposes that operating entirely in the complex domain increases the overall performance of complex-v…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: 6 pages, 2 figures

  12. arXiv:2212.07084  [pdf, other]

    cs.CV eess.IV

    Fully Complex-valued Fully Convolutional Multi-feature Fusion Network (FC2MFN) for Building Segmentation of InSAR images

    Authors: Aniruddh Sikdar, Sumanth Udupa, Suresh Sundaram, Narasimhan Sundararajan

    Abstract: Building segmentation in high-resolution InSAR images is a challenging task that can be useful for large-scale surveillance. Although complex-valued deep learning networks perform better than their real-valued counterparts for complex-valued SAR data, phase information is not retained throughout the network, which causes a loss of information. This paper proposes a Fully Complex-valued, Fully Conv…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: Accepted for publication in IEEE Symposium Series On Computational Intelligence 2022, 8 pages, 6 figures

  13. arXiv:2212.07039  [pdf, other]

    cs.CV

    Multi-Modal Domain Fusion for Multi-modal Aerial View Object Classification

    Authors: Sumanth Udupa, Aniruddh Sikdar, Suresh Sundaram

    Abstract: Object detection and classification using aerial images is a challenging task, as information regarding targets is not abundant. Synthetic Aperture Radar (SAR) images can be used for Automatic Target Recognition (ATR) systems, as SAR can operate in all-weather conditions and in low-light settings. However, SAR images contain salt-and-pepper noise (speckle noise) that causes hindrance for the deep learni…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: 7 pages, 2 figures

  14. arXiv:2008.01297  [pdf, other]

    cs.CL cs.DS

    An improved Bayesian TRIE based model for SMS text normalization

    Authors: Abhinava Sikdar, Niladri Chatterjee

    Abstract: Normalization of SMS text, commonly known as texting language, has been pursued for more than a decade. A probabilistic approach based on the Trie data structure was proposed in the literature and was found to perform better than the HMM-based approaches proposed earlier in predicting the correct alternative for an out-of-lexicon word. However, success of the Trie-based approach depends largely on…

    Submitted 18 November, 2020; v1 submitted 3 August, 2020; originally announced August 2020.

    Comments: 7 pages, 8 figures, under review at Pattern Recognition Letters

  15. A Stochastic Game Framework for Efficient Energy Management in Microgrid Networks

    Authors: Shravan Nayak, Chanakya Ajit Ekbote, Annanya Pratap Singh Chauhan, Raghuram Bharadwaj Diddigi, Prishita Ray, Abhinava Sikdar, Sai Koti Reddy Danda, Shalabh Bhatnagar

    Abstract: We consider the problem of energy management in microgrid networks. A microgrid is capable of generating a limited amount of energy from a renewable resource and is responsible for handling the demands of its dedicated customers. Owing to the variable nature of renewable generation and the demands of the customers, it becomes imperative that each microgrid optimally manages its energy. This involv…

    Submitted 15 November, 2020; v1 submitted 5 February, 2020; originally announced February 2020.

  16. arXiv:1906.00705  [pdf, other]

    cs.CV

    An Adaptive Training-less System for Anomaly Detection in Crowd Scenes

    Authors: Arindam Sikdar, Ananda S. Chowdhury

    Abstract: Anomaly detection in crowd videos has become a popular area of research for the computer vision community. Several existing methods generally perform prior training on the scene with or without the use of labeled data. However, it is difficult to always guarantee the availability of prior data, especially for scenarios like remote area surveillance. To address this challenge, we propose an a…

    Submitted 3 June, 2019; originally announced June 2019.

    Comments: 29 pages, 13 figures