Showing 1–50 of 88 results for author: Banerjee, B

Searching in archive cs.
  1. arXiv:2507.02139  [pdf, ps, other]

    cs.IR cs.AI cs.DL

    When LLMs Disagree: Diagnosing Relevance Filtering Bias and Retrieval Divergence in SDG Search

    Authors: William A. Ingram, Bipasha Banerjee, Edward A. Fox

    Abstract: Large language models (LLMs) are increasingly used to assign document relevance labels in information retrieval pipelines, especially in domains lacking human-labeled data. However, different models often disagree on borderline cases, raising concerns about how such disagreement affects downstream retrieval. This study examines labeling disagreement between two open-weight LLMs, LLaMA and Qwen, on…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: Presented at LLM4Eval Workshop, SIGIR 2025 Padova, Italy, July 17, 2025

  2. arXiv:2506.21260  [pdf, ps, other]

    cs.CV

    DuET: Dual Incremental Object Detection via Exemplar-Free Task Arithmetic

    Authors: Munish Monga, Vishal Chudasama, Pankaj Wasnik, Biplab Banerjee

    Abstract: Real-world object detection systems, such as those in autonomous driving and surveillance, must continuously learn new object categories and simultaneously adapt to changing environmental conditions. Existing approaches, Class Incremental Object Detection (CIOD) and Domain Incremental Object Detection (DIOD), only address one aspect of this challenge. CIOD struggles in unseen domains, while DIOD su…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Accepted at ICCV 2025

  3. arXiv:2506.20464  [pdf]

    cs.CV

    A Deep Learning Approach to Identify Rock Bolts in Complex 3D Point Clouds of Underground Mines Captured Using Mobile Laser Scanners

    Authors: Dibyayan Patra, Pasindu Ranasinghe, Bikram Banerjee, Simit Raval

    Abstract: Rock bolts are crucial components of the subterranean support systems in underground mines that provide adequate structural reinforcement to the rock mass to prevent unforeseen hazards like rockfalls. This makes frequent assessments of such bolts critical for maintaining rock mass stability and minimising risks in underground mining operations. Where manual surveying of rock bolts is challenging d…

    Submitted 25 June, 2025; originally announced June 2025.

    ACM Class: I.4.9

  4. arXiv:2504.20860  [pdf, other]

    cs.CV

    FedMVP: Federated Multi-modal Visual Prompt Tuning for Vision-Language Models

    Authors: Mainak Singha, Subhankar Roy, Sarthak Mehrotra, Ankit Jha, Moloud Abdar, Biplab Banerjee, Elisa Ricci

    Abstract: Textual prompt tuning adapts Vision-Language Models (e.g., CLIP) in federated learning by tuning lightweight input tokens (or prompts) on local client data, while keeping network weights frozen. Post training, only the prompts are shared by the clients with the central server for aggregation. However, textual prompt tuning often struggles with overfitting to known concepts and may be overly relian…

    Submitted 29 April, 2025; originally announced April 2025.

  5. arXiv:2504.16433  [pdf, other]

    cs.CV

    FrogDogNet: Fourier frequency Retained visual prompt Output Guidance for Domain Generalization of CLIP in Remote Sensing

    Authors: Hariseetharam Gunduboina, Muhammad Haris Khan, Biplab Banerjee

    Abstract: In recent years, large-scale vision-language models (VLMs) like CLIP have gained attention for their zero-shot inference using instructional text prompts. While these models excel in general computer vision, their potential for domain generalization in remote sensing (RS) remains underexplored. Existing approaches enhance prompt learning by generating visual prompt tokens but rely on full-image fe…

    Submitted 23 April, 2025; originally announced April 2025.

  6. arXiv:2504.09203  [pdf, other]

    cs.CV cs.AI

    AerOSeg: Harnessing SAM for Open-Vocabulary Segmentation in Remote Sensing Images

    Authors: Saikat Dutta, Akhil Vasim, Siddhant Gole, Hamid Rezatofighi, Biplab Banerjee

    Abstract: Image segmentation beyond predefined categories is a key challenge in remote sensing, where novel and unseen classes often emerge during inference. Open-vocabulary image Segmentation addresses these generalization issues in traditional supervised segmentation models while reducing reliance on extensive per-pixel annotations, which are both expensive and labor-intensive to obtain. Most Open-Vocabul…

    Submitted 12 April, 2025; originally announced April 2025.

    Comments: Accepted at EarthVision workshop, CVPR 2025

  7. arXiv:2504.03181  [pdf, other]

    cs.CV

    MIMRS: A Survey on Masked Image Modeling in Remote Sensing

    Authors: Shabnam Choudhury, Akhil Vasim, Michael Schmitt, Biplab Banerjee

    Abstract: Masked Image Modeling (MIM) is a self-supervised learning technique that involves masking portions of an image, such as pixels, patches, or latent representations, and training models to predict the missing information using the visible context. This approach has emerged as a cornerstone in self-supervised learning, unlocking new possibilities in visual understanding by leveraging unannotated data…

    Submitted 7 April, 2025; v1 submitted 4 April, 2025; originally announced April 2025.

    Comments: 6 pages

  8. arXiv:2504.03169  [pdf, ps, other]

    cs.CV

    REJEPA: A Novel Joint-Embedding Predictive Architecture for Efficient Remote Sensing Image Retrieval

    Authors: Shabnam Choudhury, Yash Salunkhe, Sarthak Mehrotra, Biplab Banerjee

    Abstract: The rapid expansion of remote sensing image archives demands the development of strong and efficient techniques for content-based image retrieval (RS-CBIR). This paper presents REJEPA (Retrieval with Joint-Embedding Predictive Architecture), an innovative self-supervised framework designed for unimodal RS-CBIR. REJEPA utilises spatially distributed context token encoding to forecast abstract repre…

    Submitted 30 May, 2025; v1 submitted 4 April, 2025; originally announced April 2025.

    Comments: 14 pages

  9. arXiv:2503.16106  [pdf, other]

    cs.CV

    OSLoPrompt: Bridging Low-Supervision Challenges and Open-Set Domain Generalization in CLIP

    Authors: Mohamad Hassan N C, Divyam Gupta, Mainak Singha, Sai Bhargav Rongali, Ankit Jha, Muhammad Haris Khan, Biplab Banerjee

    Abstract: We introduce Low-Shot Open-Set Domain Generalization (LSOSDG), a novel paradigm unifying low-shot learning with open-set domain generalization (ODG). While prompt-based methods using models like CLIP have advanced DG, they falter in low-data regimes (e.g., 1-shot) and lack precision in detecting open-set samples with fine-grained semantics related to training classes. To address these challenges,…

    Submitted 20 March, 2025; originally announced March 2025.

    Comments: Accepted to CVPR 2025

  10. arXiv:2503.14897  [pdf, other]

    cs.CV

    When Domain Generalization meets Generalized Category Discovery: An Adaptive Task-Arithmetic Driven Approach

    Authors: Vaibhav Rathore, Shubhranil B, Saikat Dutta, Sarthak Mehrotra, Zsolt Kira, Biplab Banerjee

    Abstract: Generalized Class Discovery (GCD) clusters base and novel classes in a target domain using supervision from a source domain with only base classes. Current methods often falter with distribution shifts and typically require access to target data during training, which can sometimes be impractical. To address this issue, we introduce the novel paradigm of Domain Generalization in GCD (DG-GCD), wher…

    Submitted 21 March, 2025; v1 submitted 19 March, 2025; originally announced March 2025.

    Comments: Accepted at CVPR 2025 (Main Conference)

  11. arXiv:2503.12575  [pdf, other]

    cs.CV cs.AI

    BalancedDPO: Adaptive Multi-Metric Alignment

    Authors: Dipesh Tamboli, Souradip Chakraborty, Aditya Malusare, Biplab Banerjee, Amrit Singh Bedi, Vaneet Aggarwal

    Abstract: Text-to-image (T2I) diffusion models have made remarkable advancements, yet aligning them with diverse preferences remains a persistent challenge. Current methods often optimize single metrics or depend on narrowly curated datasets, leading to overfitting and limited generalization across key visual quality metrics. We present BalancedDPO, a novel extension of Direct Preference Optimization (DPO)…

    Submitted 16 March, 2025; originally announced March 2025.

  12. arXiv:2501.09878  [pdf, other]

    cs.CV cs.AI

    ASTRA: A Scene-aware TRAnsformer-based model for trajectory prediction

    Authors: Izzeddin Teeti, Aniket Thomas, Munish Monga, Sachin Kumar, Uddeshya Singh, Andrew Bradley, Biplab Banerjee, Fabio Cuzzolin

    Abstract: We present ASTRA (A Scene-aware TRAnsformer-based model for trajectory prediction), a light-weight pedestrian trajectory forecasting model that integrates the scene context, spatial dynamics, social inter-agent interactions and temporal progressions for precise forecasting. We utilised a U-Net-based feature extractor, via its latent vector representation, to capture scene representations and a gr…

    Submitted 16 January, 2025; originally announced January 2025.

  13. arXiv:2412.20057  [pdf, ps, other]

    cs.CL

    "My life is miserable, have to sign 500 autographs everyday": Exposing Humblebragging, the Brags in Disguise

    Authors: Sharath Naganna, Saprativa Bhattacharjee, Biplab Banerjee, Pushpak Bhattacharyya

    Abstract: Humblebragging is a phenomenon in which individuals present self-promotional statements under the guise of modesty or complaints. For example, a statement like, "Ugh, I can't believe I got promoted to lead the entire team. So stressful!", subtly highlights an achievement while pretending to be complaining. Detecting humblebragging is important for machines to better understand the nuances of human…

    Submitted 1 June, 2025; v1 submitted 28 December, 2024; originally announced December 2024.

    Comments: Accepted to ACL 2025 Findings

  14. arXiv:2412.09230  [pdf, other]

    cs.CV cs.AI

    Foundation Models and Adaptive Feature Selection: A Synergistic Approach to Video Question Answering

    Authors: Sai Bhargav Rongali, Mohamad Hassan N C, Ankit Jha, Neha Bhargava, Saurabh Prasad, Biplab Banerjee

    Abstract: This paper tackles the intricate challenge of video question-answering (VideoQA). Despite notable progress, current methods fall short of effectively integrating questions with video frames and semantic object-level abstractions to create question-aware video representations. We introduce Local-Global Question Aware Video Embedding (LGQAVE), which incorporates three major innovations to integrate…

    Submitted 12 December, 2024; originally announced December 2024.

    Journal ref: WACV2025

  15. arXiv:2412.07539  [pdf, ps, other]

    cs.LG

    Anomaly detection using Diffusion-based methods

    Authors: Aryan Bhosale, Samrat Mukherjee, Biplab Banerjee, Fabio Cuzzolin

    Abstract: This paper explores the utility of diffusion-based models for anomaly detection, focusing on their efficacy in identifying deviations in both compact and high-resolution datasets. Diffusion-based architectures, including Denoising Diffusion Probabilistic Models (DDPMs) and Diffusion Transformers (DiTs), are evaluated for their performance using reconstruction objectives. By leveraging the strength…

    Submitted 10 December, 2024; originally announced December 2024.

  16. arXiv:2412.00860  [pdf, other]

    cs.LG cs.AI stat.ML

    Deep evolving semi-supervised anomaly detection

    Authors: Jack Belham, Aryan Bhosale, Samrat Mukherjee, Biplab Banerjee, Fabio Cuzzolin

    Abstract: The aim of this paper is to formalise the task of continual semi-supervised anomaly detection (CSAD), with the aim of highlighting the importance of such a problem formulation which assumes as close to real-world conditions as possible. After an overview of the relevant definitions of continual semi-supervised learning, its components, anomaly detection extension, and the training protocols; the p…

    Submitted 1 December, 2024; originally announced December 2024.

  17. Automating Chapter-Level Classification for Electronic Theses and Dissertations

    Authors: Bipasha Banerjee, William A. Ingram, Edward A. Fox

    Abstract: Traditional archival practices for describing electronic theses and dissertations (ETDs) rely on broad, high-level metadata schemes that fail to capture the depth, complexity, and interdisciplinary nature of these long scholarly works. The lack of detailed, chapter-level content descriptions impedes researchers' ability to locate specific sections or themes, thereby reducing discoverability and ov…

    Submitted 26 November, 2024; originally announced November 2024.

  18. arXiv:2411.17600  [pdf, other]

    cs.DL cs.AI cs.IR

    Making History Readable

    Authors: Bipasha Banerjee, Jennifer Goyne, William A. Ingram

    Abstract: The Virginia Tech University Libraries (VTUL) Digital Library Platform (DLP) hosts digital collections that offer our users access to a wide variety of documents of historical and cultural importance. These collections are not only of academic importance but also provide our users with a glance at local historical events. Our DLP contains collections comprising digital objects featuring complex la…

    Submitted 26 November, 2024; originally announced November 2024.

  19. Agentic AI for Improving Precision in Identifying Contributions to Sustainable Development Goals

    Authors: William A. Ingram, Bipasha Banerjee, Edward A. Fox

    Abstract: As research institutions increasingly commit to supporting the United Nations' Sustainable Development Goals (SDGs), there is a pressing need to accurately assess their research output against these goals. Current approaches, primarily reliant on keyword-based Boolean search queries, conflate incidental keyword matches with genuine contributions, reducing retrieval precision and complicating bench…

    Submitted 26 November, 2024; originally announced November 2024.

  20. arXiv:2411.14202  [pdf, other]

    cs.LG cs.CV

    Revised Regularization for Efficient Continual Learning through Correlation-Based Parameter Update in Bayesian Neural Networks

    Authors: Sanchar Palit, Biplab Banerjee, Subhasis Chaudhuri

    Abstract: We propose a Bayesian neural network-based continual learning algorithm using Variational Inference, aiming to overcome several drawbacks of existing methods. Specifically, in continual learning scenarios, storing network parameters at each step to retain knowledge poses challenges. This is compounded by the crucial need to mitigate catastrophic forgetting, particularly given the limited access to…

    Submitted 21 November, 2024; originally announced November 2024.

    Comments: at ICVGIP 2024

  21. arXiv:2411.02074  [pdf, other]

    cs.CV

    GraphVL: Graph-Enhanced Semantic Modeling via Vision-Language Models for Generalized Class Discovery

    Authors: Bhupendra Solanki, Ashwin Nair, Mainak Singha, Souradeep Mukhopadhyay, Ankit Jha, Biplab Banerjee

    Abstract: Generalized Category Discovery (GCD) aims to cluster unlabeled images into known and novel categories using labeled images from known classes. To address the challenge of transferring features from known to unknown classes while mitigating model bias, we introduce GraphVL, a novel approach for vision-language modeling in GCD, leveraging CLIP. Our method integrates a graph convolutional network (GC…

    Submitted 16 November, 2024; v1 submitted 4 November, 2024; originally announced November 2024.

    Comments: Accepted in ACM ICVGIP 2024

  22. arXiv:2409.17459  [pdf, other]

    cs.CV

    TFS-NeRF: Template-Free NeRF for Semantic 3D Reconstruction of Dynamic Scene

    Authors: Sandika Biswas, Qianyi Wu, Biplab Banerjee, Hamid Rezatofighi

    Abstract: Despite advancements in Neural Implicit models for 3D surface reconstruction, handling dynamic environments with interactions between arbitrary rigid, non-rigid, or deformable entities remains challenging. The generic reconstruction methods adaptable to such dynamic scenes often require additional inputs like depth or optical flow or rely on pre-trained image features for reasonable outcomes. Thes…

    Submitted 4 December, 2024; v1 submitted 25 September, 2024; originally announced September 2024.

    Comments: Accepted in NeurIPS 2024 https://github.com/sbsws88/TFS-NeRF

  23. arXiv:2409.08724  [pdf, other]

    cs.LG cs.AI

    Quasimetric Value Functions with Dense Rewards

    Authors: Khadichabonu Valieva, Bikramjit Banerjee

    Abstract: As a generalization of reinforcement learning (RL) to parametrizable goals, goal conditioned RL (GCRL) has a broad range of applications, particularly in challenging tasks in robotics. Recent work has established that the optimal value function of GCRL $Q^\ast(s,a,g)$ has a quasimetric structure, leading to targeted neural architectures that respect such structure. However, the relevant analyses…

    Submitted 13 September, 2024; originally announced September 2024.

  24. arXiv:2409.00530  [pdf, other]

    cs.CV

    Incremental Open-set Domain Adaptation

    Authors: Sayan Rakshit, Hmrishav Bandyopadhyay, Nibaran Das, Biplab Banerjee

    Abstract: Catastrophic forgetting makes neural network models unstable when learning visual domains consecutively. The neural network model drifts to catastrophic forgetting-induced low performance of previously learnt domains when training with new domains. We illuminate this current neural network model weakness and develop a forgetting-resistant incremental learning strategy. Here, we propose a new unsup…

    Submitted 31 August, 2024; originally announced September 2024.

  25. arXiv:2409.00397  [pdf, other]

    cs.CV

    COSMo: CLIP Talks on Open-Set Multi-Target Domain Adaptation

    Authors: Munish Monga, Sachin Kumar Giroh, Ankit Jha, Mainak Singha, Biplab Banerjee, Jocelyn Chanussot

    Abstract: Multi-Target Domain Adaptation (MTDA) entails learning domain-invariant information from a single source domain and applying it to multiple unlabeled target domains. Yet, existing MTDA methods predominantly focus on addressing domain shifts within visual features, often overlooking semantic features and struggling to handle unknown classes, resulting in what is known as Open-Set (OS) MTDA. While l…

    Submitted 16 December, 2024; v1 submitted 31 August, 2024; originally announced September 2024.

    Comments: Accepted in BMVC 2024

  26. arXiv:2407.05145  [pdf, other]

    stat.ML cs.LG

    On high-dimensional modifications of the nearest neighbor classifier

    Authors: Annesha Ghosh, Deep Ghoshal, Bilol Banerjee, Anil K. Ghosh

    Abstract: Nearest neighbor classifier is arguably the most simple and popular nonparametric classifier available in the literature. However, due to the concentration of pairwise distances and the violation of the neighborhood structure, this classifier often suffers in high-dimension, low-sample size (HDLSS) situations, especially when the scale difference between the competing classes dominates their locat…

    Submitted 24 October, 2024; v1 submitted 6 July, 2024; originally announced July 2024.

  27. arXiv:2407.04207  [pdf, other]

    cs.CV

    Elevating All Zero-Shot Sketch-Based Image Retrieval Through Multimodal Prompt Learning

    Authors: Mainak Singha, Ankit Jha, Divyam Gupta, Pranav Singla, Biplab Banerjee

    Abstract: We address the challenges inherent in sketch-based image retrieval (SBIR) across various settings, including zero-shot SBIR, generalized zero-shot SBIR, and fine-grained zero-shot SBIR, by leveraging the vision-language foundation model CLIP. While recent endeavors have employed CLIP to enhance SBIR, these approaches predominantly follow uni-modal prompt processing and overlook to exploit CLIP's i…

    Submitted 22 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted in ECCV 2024

  28. arXiv:2405.13559  [pdf]

    cs.CE

    Identification of microstructure from macroscopic measurement using inverse multiscale analysis

    Authors: Anjan Mukherjee, Biswanth Banerjee

    Abstract: Most of the tailored materials are heterogeneous at the ingredient level. Analysis of those heterogeneous structures requires the knowledge of microstructure. With the knowledge of microstructure, multiscale analysis is carried out with homogenization at the micro level. Second-order homogenization is carried out whenever the ingredient size is comparable to the structure size. Therefore, knowledg…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Structural Engineering Convention SEC 2023

  29. arXiv:2405.13384  [pdf, other]

    cs.CE

    Elastic-gap free strain gradient crystal plasticity model that effectively account for plastic slip gradient and grain boundary dissipation

    Authors: Anjan Mukherjee, Biswanath Banerjee

    Abstract: This paper proposes an elastic-gap free strain gradient crystal plasticity model that addresses dissipation caused by plastic slip gradient and grain boundary (GB) Burger tensor. The model involves splitting plastic slip gradient and GB Burger tensor into energetic dissipative quantities. Unlike conventional models, the bulk and GB defect energy are considered to be a quadratic functional of the e…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Submitted in Journal of the Mechanics and Physics of Solids

  30. arXiv:2405.01040  [pdf, other]

    cs.CV cs.CL eess.IV

    Few Shot Class Incremental Learning using Vision-Language models

    Authors: Anurag Kumar, Chinmay Bharti, Saikat Dutta, Srikrishna Karanam, Biplab Banerjee

    Abstract: Recent advancements in deep learning have demonstrated remarkable performance comparable to human capabilities across various supervised computer vision tasks. However, the prevalent assumption of having an extensive pool of training data encompassing all classes prior to model training often diverges from real-world scenarios, where limited data availability for novel classes is the norm. The cha…

    Submitted 15 August, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  31. arXiv:2404.05366  [pdf, other]

    cs.CV

    CDAD-Net: Bridging Domain Gaps in Generalized Category Discovery

    Authors: Sai Bhargav Rongali, Sarthak Mehrotra, Ankit Jha, Mohamad Hassan N C, Shirsha Bose, Tanisha Gupta, Mainak Singha, Biplab Banerjee

    Abstract: In Generalized Category Discovery (GCD), we cluster unlabeled samples of known and novel classes, leveraging a training dataset of known classes. A salient challenge arises due to domain shifts between these datasets. To address this, we present a novel setting: Across Domain Generalized Category Discovery (AD-GCD) and bring forth CDAD-NET (Class Discoverer Across Domains) as a remedy. CDAD-NET is…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted in L3D-IVU, CVPR Workshop, 2024

  32. arXiv:2404.00710  [pdf, other]

    cs.CV

    Unknown Prompt, the only Lacuna: Unveiling CLIP's Potential for Open Domain Generalization

    Authors: Mainak Singha, Ankit Jha, Shirsha Bose, Ashwin Nair, Moloud Abdar, Biplab Banerjee

    Abstract: We delve into Open Domain Generalization (ODG), marked by domain and category shifts between training's labeled source and testing's unlabeled target domains. Existing solutions to ODG face limitations due to constrained generalizations of traditional CNN backbones and errors in detecting target open samples in the absence of prior knowledge. Addressing these pitfalls, we introduce ODG-CLIP, harne…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: Accepted in CVPR 2024

  33. arXiv:2403.18454  [pdf, other]

    cs.CV

    Scaling Vision-and-Language Navigation With Offline RL

    Authors: Valay Bundele, Mahesh Bhupati, Biplab Banerjee, Aditya Grover

    Abstract: The study of vision-and-language navigation (VLN) has typically relied on expert trajectories, which may not always be available in real-world situations due to the significant effort required to collect them. On the other hand, existing approaches to training VLN agents that go beyond available expert data involve data augmentations or online exploration which can be tedious and risky. In contras…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Published in Transactions on Machine Learning Research (04/2024)

  34. Comparative Evaluation of Traditional and Deep Learning-Based Segmentation Methods for Spoil Pile Delineation Using UAV Images

    Authors: Sureka Thiruchittampalam, Bikram P. Banerjee, Nancy F. Glenn, Simit Raval

    Abstract: The stability of mine dumps is contingent upon the precise arrangement of spoil piles, taking into account their geological and geotechnical attributes. Yet, on-site characterisation of individual piles poses a formidable challenge. The utilisation of image-based techniques for spoil pile characterisation, employing remotely acquired data through unmanned aerial systems, is a promising complementa…

    Submitted 31 January, 2024; originally announced February 2024.

  35. arXiv:2311.15812  [pdf, other]

    cs.CV

    C-SAW: Self-Supervised Prompt Learning for Image Generalization in Remote Sensing

    Authors: Avigyan Bhattacharya, Mainak Singha, Ankit Jha, Biplab Banerjee

    Abstract: We focus on domain and class generalization problems in analyzing optical remote sensing images, using the large-scale pre-trained vision-language model (VLM), CLIP. While contrastively trained VLMs show impressive zero-shot generalization performance, their effectiveness is limited when dealing with diverse domains during training and testing. Existing prompt learning techniques overlook the impo…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Accepted in ACM ICVGIP 2023

  36. arXiv:2311.02599  [pdf, other]

    cs.CV

    Learning Class and Domain Augmentations for Single-Source Open-Domain Generalization

    Authors: Prathmesh Bele, Valay Bundele, Avigyan Bhattacharya, Ankit Jha, Gemma Roig, Biplab Banerjee

    Abstract: Single-source open-domain generalization (SS-ODG) addresses the challenge of labeled source domains with supervision during training and unlabeled novel target domains during testing. The target domain includes both known classes from the source domain and samples from previously unseen classes. Existing techniques for SS-ODG primarily focus on calibrating source-domain classifiers to identify ope…

    Submitted 5 November, 2023; originally announced November 2023.

    Comments: 11 pages, WACV 2024

  37. arXiv:2310.00828  [pdf, ps, other]

    cs.CY

    A Model for Calculating Cost of Applying Electronic Governance and Robotic Process Automation to a Distributed Management System

    Authors: Bonny Banerjee, Saurabh Pahune

    Abstract: Electronic Governance (eGov) and Robotic Process Automation (RPA) are two technological advancements that have the potential to revolutionize the way organizations manage their operations. When applied to Distributed Management (DM), these technologies can further enhance organizational efficiency and effectiveness. In this brief article, we present a mathematical model for calculating the cost of…

    Submitted 1 October, 2023; originally announced October 2023.

  38. arXiv:2309.13470  [pdf, other]

    cs.CV

    HAVE-Net: Hallucinated Audio-Visual Embeddings for Few-Shot Classification with Unimodal Cues

    Authors: Ankit Jha, Debabrata Pal, Mainak Singha, Naman Agarwal, Biplab Banerjee

    Abstract: Recognition of remote sensing (RS) or aerial images is currently of great interest, and advancements in deep learning algorithms added flavor to it in recent years. Occlusion, intra-class variance, lighting, etc., might arise while training neural networks using unimodal RS visual input. Even though joint training of audio-visual modalities improves classification performance in a low-data regime,…

    Submitted 23 September, 2023; originally announced September 2023.

    Comments: 8 Page, 2 Figures, 2 Tables, Accepted in Adapting to Change: Reliable Multimodal Learning Across Domains Workshop, ECML PKDD 2023

  39. arXiv:2309.12814  [pdf, other]

    cs.CV

    Domain Adaptive Few-Shot Open-Set Learning

    Authors: Debabrata Pal, Deeptej More, Sai Bhargav, Dipesh Tamboli, Vaneet Aggarwal, Biplab Banerjee

    Abstract: Few-shot learning has made impressive strides in addressing the crucial challenges of recognizing unknown samples from novel classes in target query sets and managing visual shifts between domains. However, existing techniques fall short when it comes to identifying target outliers under domain shifts by learning to reject pseudo-outliers from the source domain, resulting in an incomplete solution…

    Submitted 22 September, 2023; originally announced September 2023.

    Journal ref: ICCV 2023

  40. arXiv:2309.01050  [pdf, other]

    cs.CV

    Efficient Curriculum based Continual Learning with Informative Subset Selection for Remote Sensing Scene Classification

    Authors: S Divakar Bhat, Biplab Banerjee, Subhasis Chaudhuri, Avik Bhattacharya

    Abstract: We tackle the problem of class incremental learning (CIL) in the realm of landcover classification from optical remote sensing (RS) images in this paper. The paradigm of CIL has recently gained much prominence given the fact that data are generally obtained in a sequential manner for real-world phenomena. However, CIL has not been extensively considered yet in the domain of RS irrespective of the…

    Submitted 2 September, 2023; originally announced September 2023.

  41. arXiv:2308.11605  [pdf, other]

    cs.CV

    GOPro: Generate and Optimize Prompts in CLIP using Self-Supervised Learning

    Authors: Mainak Singha, Ankit Jha, Biplab Banerjee

    Abstract: Large-scale foundation models, such as CLIP, have demonstrated remarkable success in visual recognition tasks by embedding images in a semantically rich space. Self-supervised learning (SSL) has also shown promise in improving visual recognition by learning invariant features. However, the combination of CLIP with SSL is found to face challenges due to the multi-task framework that blends CLIP's c…

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: Accepted at BMVC 2023

  42. arXiv:2308.05659  [pdf, other]

    cs.CV

    AD-CLIP: Adapting Domains in Prompt Space Using CLIP

    Authors: Mainak Singha, Harsh Pal, Ankit Jha, Biplab Banerjee

    Abstract: Although deep learning models have shown impressive performance on supervised learning tasks, they often struggle to generalize well when the training (source) and test (target) domains differ. Unsupervised domain adaptation (DA) has emerged as a popular solution to this problem. However, current DA techniques rely on visual backbones, which may lack semantic richness. Despite the potential of lar…

    Submitted 16 September, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

    Comments: 10 pages, 8 figures, 4 tables. Accepted at OOD-CV, ICCV Workshop, 2023

  43. arXiv:2308.04589  [pdf, other]

    cs.CV cs.AI

    Temporal DINO: A Self-supervised Video Strategy to Enhance Action Prediction

    Authors: Izzeddin Teeti, Rongali Sai Bhargav, Vivek Singh, Andrew Bradley, Biplab Banerjee, Fabio Cuzzolin

    Abstract: The emerging field of action prediction plays a vital role in various computer vision applications such as autonomous driving, activity analysis and human-computer interaction. Despite significant advancements, accurately predicting future actions remains a challenging problem due to high dimensionality, complex dynamics and uncertainties inherent in video data. Traditional supervised approaches r…

    Submitted 20 August, 2023; v1 submitted 8 August, 2023; originally announced August 2023.

  44. arXiv:2307.14570  [pdf, other]

    cs.CV cs.RO

    Physically Plausible 3D Human-Scene Reconstruction from Monocular RGB Image using an Adversarial Learning Approach

    Authors: Sandika Biswas, Kejie Li, Biplab Banerjee, Subhasis Chaudhuri, Hamid Rezatofighi

    Abstract: Holistic 3D human-scene reconstruction is a crucial and emerging research area in robot perception. A key challenge in holistic 3D human-scene reconstruction is to generate a physically plausible 3D scene from a single monocular RGB image. The existing research mainly proposes optimization-based approaches for reconstructing the scene from a sequence of RGB frames with explicitly defined physical…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: Accepted in RAL 2023

  45. arXiv:2306.14264  [pdf, other]

    cs.CV cs.CL cs.LG

    Visual Question Answering in Remote Sensing with Cross-Attention and Multimodal Information Bottleneck

    Authors: Jayesh Songara, Shivam Pande, Shabnam Choudhury, Biplab Banerjee, Rajbabu Velmurugan

    Abstract: In this research, we deal with the problem of visual question answering (VQA) in remote sensing. While remotely sensed images contain information significant for the task of identification and object detection, they pose a great challenge in their processing because of high dimensionality, volume and redundancy. Furthermore, processing image information jointly with language features adds addition…

    Submitted 25 June, 2023; originally announced June 2023.

  46. arXiv:2306.10955  [pdf, other]

    cs.CV

    Semi-Supervised Learning for hyperspectral images by non parametrically predicting view assignment

    Authors: Shivam Pande, Nassim Ait Ali Braham, Yi Wang, Conrad M Albrecht, Biplab Banerjee, Xiao Xiang Zhu

    Abstract: Hyperspectral image (HSI) classification is gaining a lot of momentum at present because of the high inherent spectral information within the images. However, these images suffer from the curse of dimensionality and usually require a large number of samples for tasks such as classification, especially in the supervised setting. Recently, to effectively train the deep learning models with mini…

    Submitted 19 June, 2023; originally announced June 2023.

    Comments: The paper was submitted in IGARSS, 2023 conference and is not accepted to appear in the proceedings. The page requirement is 4 pages, including references

  47. arXiv:2306.06717  [pdf, other]

    cs.CV cs.GR

    PWR-Align: Leveraging Part-Whole Relationships for Part-wise Rigid Point Cloud Registration in Mixed Reality Applications

    Authors: Manorama Jha, Bhaskar Banerjee

    Abstract: We present an efficient and robust point cloud registration (PCR) workflow for part-wise rigid point cloud alignment using the Microsoft HoloLens 2. Point Cloud Registration (PCR) is an important problem in Augmented and Mixed Reality use cases, and we present a study for a special class of non-rigid transformations. Many commonly encountered objects are composed of rigid parts that move relative…

    Submitted 11 June, 2023; originally announced June 2023.

    Comments: Accepted for presentation at WiCV @ CVPR 2023

  48. arXiv:2305.17520  [pdf, other]

    cs.CV cs.AI

    USIM-DAL: Uncertainty-aware Statistical Image Modeling-based Dense Active Learning for Super-resolution

    Authors: Vikrant Rangnekar, Uddeshya Upadhyay, Zeynep Akata, Biplab Banerjee

    Abstract: Dense regression is a widely used approach in computer vision for tasks such as image super-resolution, enhancement, depth estimation, etc. However, the high cost of annotation and labeling makes it challenging to achieve accurate results. We propose incorporating active learning into dense regression models to address this problem. Active learning allows models to select the most informative samp…

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: Accepted at UAI 2023

  49. arXiv:2305.05159  [pdf, other]

    cs.LG cs.AI cs.MA

    Latent Interactive A2C for Improved RL in Open Many-Agent Systems

    Authors: Keyang He, Prashant Doshi, Bikramjit Banerjee

    Abstract: There is a prevalence of multiagent reinforcement learning (MARL) methods that engage in centralized training. But, these methods involve obtaining various types of information from the other agents, which may not be feasible in competitive or adversarial settings. A recent method, the interactive advantage actor critic (IA2C), engages in decentralized training coupled with decentralized execution…

    Submitted 9 May, 2023; originally announced May 2023.

  50. arXiv:2304.05995  [pdf, other]

    cs.CV

    APPLeNet: Visual Attention Parameterized Prompt Learning for Few-Shot Remote Sensing Image Generalization using CLIP

    Authors: Mainak Singha, Ankit Jha, Bhupendra Solanki, Shirsha Bose, Biplab Banerjee

    Abstract: In recent years, the success of large-scale vision-language models (VLMs) such as CLIP has led to their increased usage in various computer vision tasks. These models enable zero-shot inference through carefully crafted instructional text prompts without task-specific supervision. However, the potential of VLMs for generalization tasks in remote sensing (RS) has not been fully realized. To address…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: 11 Pages, 6 figures, 8 tables, Accepted in Earth Vision (CVPR 2023)