
Showing 1–50 of 87 results for author: Roy-Chowdhury, A K

Searching in archive cs.
  1. arXiv:2506.15065  [pdf, ps, other]

    cs.LG cs.RO

    HEAL: An Empirical Study on Hallucinations in Embodied Agents Driven by Large Language Models

    Authors: Trishna Chakraborty, Udita Ghosh, Xiaopan Zhang, Fahim Faisal Niloy, Yue Dong, Jiachen Li, Amit K. Roy-Chowdhury, Chengyu Song

    Abstract: Large language models (LLMs) are increasingly being adopted as the cognitive core of embodied agents. However, inherited hallucinations, which stem from failures to ground user instructions in the observed physical environment, can lead to navigation errors, such as searching for a refrigerator that does not exist. In this paper, we present the first systematic study of hallucinations in LLM-based…

    Submitted 17 June, 2025; originally announced June 2025.

  2. arXiv:2506.04453  [pdf, other]

    eess.IV cs.CR cs.CV cs.LG

    Gradient Inversion Attacks on Parameter-Efficient Fine-Tuning

    Authors: Hasin Us Sami, Swapneel Sen, Amit K. Roy-Chowdhury, Srikanth V. Krishnamurthy, Basak Guler

    Abstract: Federated learning (FL) allows multiple data-owners to collaboratively train machine learning models by exchanging local gradients, while keeping their private data on-device. To simultaneously enhance privacy and training efficiency, recently parameter-efficient fine-tuning (PEFT) of large-scale pretrained models has gained substantial attention in FL. While keeping a pretrained (backbone) model…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025)

  3. arXiv:2504.05789  [pdf, other]

    cs.CV

    Leveraging Synthetic Adult Datasets for Unsupervised Infant Pose Estimation

    Authors: Sarosij Bose, Hannah Dela Cruz, Arindam Dutta, Elena Kokkoni, Konstantinos Karydis, Amit K. Roy-Chowdhury

    Abstract: Human pose estimation is a critical tool across a variety of healthcare applications. Despite significant progress in pose estimation algorithms targeting adults, such developments for infants remain limited. Existing algorithms for infant pose estimation, despite achieving commendable performance, depend on fully supervised approaches that require large amounts of labeled data. These algorithms a…

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: Accepted at ABAW@CVPR 2025

  4. arXiv:2504.03629  [pdf, other]

    cs.RO

    SeGuE: Semantic Guided Exploration for Mobile Robots

    Authors: Cody Simons, Aritra Samanta, Amit K. Roy-Chowdhury, Konstantinos Karydis

    Abstract: The rise of embodied AI applications has enabled robots to perform complex tasks which require a sophisticated understanding of their environment. To enable successful robot operation in such settings, maps must be constructed so that they include semantic information, in addition to geometric information. In this paper, we address the novel problem of semantic exploration, whereby a mobile robot…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: 6 pages, 4 figures, 3 tables

  5. arXiv:2503.15867  [pdf, other]

    cs.CV cs.AI

    TruthLens: Explainable DeepFake Detection for Face Manipulated and Fully Synthetic Data

    Authors: Rohit Kundu, Athula Balachandran, Amit K. Roy-Chowdhury

    Abstract: Detecting DeepFakes has become a crucial research area as the widespread use of AI image generators enables the effortless creation of face-manipulated and fully synthetic content, yet existing methods are often limited to binary classification (real vs. fake) and lack interpretability. To address these challenges, we propose TruthLens, a novel and highly generalizable framework for DeepFake detec…

    Submitted 20 March, 2025; originally announced March 2025.

  6. arXiv:2503.15671  [pdf, other]

    cs.CV

    CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image

    Authors: Arindam Dutta, Meng Zheng, Zhongpai Gao, Benjamin Planche, Anwesha Choudhuri, Terrence Chen, Amit K. Roy-Chowdhury, Ziyan Wu

    Abstract: Reconstructing clothed humans from a single image is a fundamental task in computer vision with wide-ranging applications. Although existing monocular clothed human reconstruction solutions have shown promising results, they often rely on the assumption that the human subject is in an occlusion-free environment. Thus, when encountering in-the-wild occluded images, these algorithms produce multivie…

    Submitted 19 March, 2025; originally announced March 2025.

  7. arXiv:2503.02102  [pdf, other]

    cs.CL cs.AI

    Provable Benefits of Task-Specific Prompts for In-context Learning

    Authors: Xiangyu Chang, Yingcong Li, Muti Kara, Samet Oymak, Amit K. Roy-Chowdhury

    Abstract: The in-context learning capabilities of modern language models have motivated a deeper mathematical understanding of sequence models. A line of recent work has shown that linear attention models can emulate projected gradient descent iterations to implicitly learn the task vector from the data provided in the context window. In this work, we consider a novel setting where the global task distribut…

    Submitted 5 March, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

    Comments: Proceedings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS) 2025

  8. arXiv:2501.02773  [pdf, other]

    cs.CV

    Unsupervised Domain Adaptation for Occlusion Resilient Human Pose Estimation

    Authors: Arindam Dutta, Sarosij Bose, Saketh Bachu, Calvin-Khang Ta, Konstantinos Karydis, Amit K. Roy-Chowdhury

    Abstract: Occlusions are a significant challenge to human pose estimation algorithms, often resulting in inaccurate and anatomically implausible poses. Although current occlusion-robust human pose estimation algorithms exhibit impressive performance on existing datasets, their success is largely attributed to supervised training and the availability of additional information, such as multiple views or tempo…

    Submitted 6 January, 2025; originally announced January 2025.

    Comments: 9 pages, 7 figures

  9. arXiv:2412.12278  [pdf, other]

    cs.CV

    Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content

    Authors: Rohit Kundu, Hao Xiong, Vishal Mohanty, Athula Balachandran, Amit K. Roy-Chowdhury

    Abstract: Existing DeepFake detection techniques primarily focus on facial manipulations, such as face-swapping or lip-syncing. However, advancements in text-to-video (T2V) and image-to-video (I2V) generative models now allow fully AI-generated synthetic content and seamless background alterations, challenging face-centric detection methods and demanding more versatile approaches. To address this, we intr…

    Submitted 16 December, 2024; originally announced December 2024.

  10. arXiv:2411.04291  [pdf, other]

    cs.CL cs.CV

    Unfair Alignment: Examining Safety Alignment Across Vision Encoder Layers in Vision-Language Models

    Authors: Saketh Bachu, Erfan Shayegani, Trishna Chakraborty, Rohit Lal, Arindam Dutta, Chengyu Song, Yue Dong, Nael Abu-Ghazaleh, Amit K. Roy-Chowdhury

    Abstract: Vision-language models (VLMs) have improved significantly in multi-modal tasks, but their more complex architecture makes their safety alignment more challenging than the alignment of large language models (LLMs). In this paper, we reveal an unfair distribution of safety across the layers of VLM's vision encoder, with earlier and middle layers being disproportionately vulnerable to malicious input…

    Submitted 6 November, 2024; originally announced November 2024.

    Comments: Preprint, Under Review

  11. arXiv:2410.20621  [pdf, other]

    cs.CV

    Egocentric and Exocentric Methods: A Short Survey

    Authors: Anirudh Thatipelli, Shao-Yuan Lo, Amit K. Roy-Chowdhury

    Abstract: Egocentric vision captures the scene from the point of view of the camera wearer, while exocentric vision captures the overall scene context. Jointly modeling ego and exo views is crucial to developing next-generation AI agents. The community has regained interest in the field of egocentric vision. While the third-person view and first-person have been thoroughly investigated, very few works aim t…

    Submitted 8 May, 2025; v1 submitted 27 October, 2024; originally announced October 2024.

    Comments: Accepted in Computer Vision and Image Understanding (CVIU), 2025

  12. arXiv:2410.03626  [pdf, other]

    cs.LG

    Robust Offline Imitation Learning from Diverse Auxiliary Data

    Authors: Udita Ghosh, Dripta S. Raychaudhuri, Jiachen Li, Konstantinos Karydis, Amit K. Roy-Chowdhury

    Abstract: Offline imitation learning enables learning a policy solely from a set of expert demonstrations, without any environment interaction. To alleviate the issue of distribution shift arising due to the small amount of expert data, recent works incorporate large numbers of auxiliary demonstrations alongside the expert data. However, the performance of these approaches relies on assumptions about the qual…

    Submitted 22 May, 2025; v1 submitted 4 October, 2024; originally announced October 2024.

    Comments: Accepted at TMLR

  13. arXiv:2409.19459  [pdf, other]

    cs.RO cs.CV

    Language-guided Robust Navigation for Mobile Robots in Dynamically-changing Environments

    Authors: Cody Simons, Zhichao Liu, Brandon Marcus, Amit K. Roy-Chowdhury, Konstantinos Karydis

    Abstract: In this paper, we develop an embodied AI system for human-in-the-loop navigation with a wheeled mobile robot. We propose a direct yet effective method of monitoring the robot's current plan to detect changes in the environment that impact the intended trajectory of the robot significantly and then query a human for feedback. We also develop a means to parse human feedback expressed in natural lang…

    Submitted 28 September, 2024; originally announced September 2024.

  14. arXiv:2407.17460  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG eess.SY

    SoNIC: Safe Social Navigation with Adaptive Conformal Inference and Constrained Reinforcement Learning

    Authors: Jianpeng Yao, Xiaopan Zhang, Yu Xia, Zejin Wang, Amit K. Roy-Chowdhury, Jiachen Li

    Abstract: Reinforcement learning (RL) enables social robots to generate trajectories without relying on human-designed rules or interventions, making it generally more effective than rule-based systems in adapting to complex, dynamic real-world scenarios. However, social navigation is a safety-critical task that requires robots to avoid collisions with pedestrians, whereas existing RL-based solutions often…

    Submitted 6 February, 2025; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: Project website: https://sonic-social-nav.github.io/; 16 pages

  15. arXiv:2407.03549  [pdf, other]

    cs.CV

    POSTURE: Pose Guided Unsupervised Domain Adaptation for Human Body Part Segmentation

    Authors: Arindam Dutta, Rohit Lal, Yash Garg, Calvin-Khang Ta, Dripta S. Raychaudhuri, Hannah Dela Cruz, Amit K. Roy-Chowdhury

    Abstract: Existing algorithms for human body part segmentation have shown promising results on challenging datasets, primarily relying on end-to-end supervision. However, these algorithms exhibit severe performance drops in the face of domain shifts, leading to inaccurate segmentation masks. To tackle this issue, we introduce POSTURE: Pose Guided Unsupervised Domain Adapt…

    Submitted 22 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  16. arXiv:2406.02575  [pdf, other]

    cs.CL cs.CR cs.LG

    Cross-Modal Safety Alignment: Is textual unlearning all you need?

    Authors: Trishna Chakraborty, Erfan Shayegani, Zikui Cai, Nael Abu-Ghazaleh, M. Salman Asif, Yue Dong, Amit K. Roy-Chowdhury, Chengyu Song

    Abstract: Recent studies reveal that integrating new modalities into Large Language Models (LLMs), such as Vision-Language Models (VLMs), creates a new attack surface that bypasses existing safety training techniques like Supervised Fine-tuning (SFT) and Reinforcement Learning with Human Feedback (RLHF). While further SFT and RLHF-based safety training can be conducted in multi-modal settings, collecting mu…

    Submitted 27 May, 2024; originally announced June 2024.

  17. arXiv:2402.08769  [pdf, other]

    cs.LG cs.DC

    FLASH: Federated Learning Across Simultaneous Heterogeneities

    Authors: Xiangyu Chang, Sk Miraj Ahmed, Srikanth V. Krishnamurthy, Basak Guler, Ananthram Swami, Samet Oymak, Amit K. Roy-Chowdhury

    Abstract: The key premise of federated learning (FL) is to train ML models across a diverse set of data-owners (clients), without exchanging local data. An overarching challenge to this date is client heterogeneity, which may arise not only from variations in data distribution, but also in data quality, as well as compute/communication latency. An integrated view of these diverse and concurrent sources of h…

    Submitted 13 February, 2024; originally announced February 2024.

  18. arXiv:2401.04130  [pdf, other]

    cs.LG cs.AI

    Plug-and-Play Transformer Modules for Test-Time Adaptation

    Authors: Xiangyu Chang, Sk Miraj Ahmed, Srikanth V. Krishnamurthy, Basak Guler, Ananthram Swami, Samet Oymak, Amit K. Roy-Chowdhury

    Abstract: Parameter-efficient tuning (PET) methods such as LoRA, Adapter, and Visual Prompt Tuning (VPT) have found success in enabling adaptation to new domains by tuning small modules within a transformer model. However, the number of domains encountered during test time can be very large, and the data is usually unlabeled. Thus, adaptation to new domains is challenging; it is also impractical to generate…

    Submitted 8 February, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

  19. arXiv:2401.02561  [pdf, other]

    cs.LG

    CONTRAST: Continual Multi-source Adaptation to Dynamic Distributions

    Authors: Sk Miraj Ahmed, Fahim Faisal Niloy, Xiangyu Chang, Dripta S. Raychaudhuri, Samet Oymak, Amit K. Roy-Chowdhury

    Abstract: Adapting to dynamic data distributions is a practical yet challenging task. One effective strategy is to use a model ensemble, which leverages the diverse expertise of different models to transfer knowledge to evolving data distributions. However, this approach faces difficulties when the dynamic test distribution is available only in small batches and without access to the original source data. T…

    Submitted 6 November, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

    Comments: NeurIPS 2024

  20. arXiv:2312.16221  [pdf, other]

    cs.CV

    STRIDE: Single-video based Temporally Continuous Occlusion-Robust 3D Pose Estimation

    Authors: Rohit Lal, Saketh Bachu, Yash Garg, Arindam Dutta, Calvin-Khang Ta, Dripta S. Raychaudhuri, Hannah Dela Cruz, M. Salman Asif, Amit K. Roy-Chowdhury

    Abstract: The capability to accurately estimate 3D human poses is crucial for diverse fields such as action recognition, gait recognition, and virtual/augmented reality. However, a persistent and significant challenge within this field is the accurate prediction of human poses under conditions of severe occlusion. Traditional image-based estimators struggle with heavy occlusions due to a lack of temporal co…

    Submitted 4 December, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

    Comments: Paper accepted at IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)-2025

  21. arXiv:2312.05407  [pdf, other]

    cs.CV

    ODES: Domain Adaptation with Expert Guidance for Online Medical Image Segmentation

    Authors: Md Shazid Islam, Sayak Nag, Arindam Dutta, Miraj Ahmed, Fahim Faisal Niloy, Amit K. Roy-Chowdhury

    Abstract: Unsupervised domain adaptive segmentation typically relies on self-training using pseudo labels predicted by a pre-trained network on an unlabeled target dataset. However, the noisy nature of such pseudo-labels presents a major bottleneck in adapting a network to the distribution shift between source and target datasets. This challenge is exaggerated when the network encounters an incoming data st…

    Submitted 15 October, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

  22. arXiv:2312.02420  [pdf, other]

    cs.CV

    Repurposing SAM for User-Defined Semantics Aware Segmentation

    Authors: Rohit Kundu, Sudipta Paul, Arindam Dutta, Amit K. Roy-Chowdhury

    Abstract: The Segment Anything Model (SAM) excels at generating precise object masks from input prompts but lacks semantic awareness, failing to associate its generated masks with specific object categories. To address this limitation, we propose U-SAM, a novel framework that imbibes semantic awareness into SAM, enabling it to generate targeted masks for user-specified object categories. Given only object c…

    Submitted 2 April, 2025; v1 submitted 4 December, 2023; originally announced December 2023.

  23. arXiv:2311.05077  [pdf, other]

    cs.CV

    POISE: Pose Guided Human Silhouette Extraction under Occlusions

    Authors: Arindam Dutta, Rohit Lal, Dripta S. Raychaudhuri, Calvin Khang Ta, Amit K. Roy-Chowdhury

    Abstract: Human silhouette extraction is a fundamental task in computer vision with applications in various downstream tasks. However, occlusions pose a significant challenge, leading to incomplete and distorted silhouettes. To address this challenge, we introduce POISE: Pose Guided Human Silhouette Extraction under Occlusions, a novel self-supervised fusion framework that enhances accuracy and robustness i…

    Submitted 8 November, 2023; originally announced November 2023.

    Journal ref: Winter Conference on Applications of Computer Vision, 2024

  24. arXiv:2311.04991  [pdf, other]

    cs.LG cs.CV

    Effective Restoration of Source Knowledge in Continual Test Time Adaptation

    Authors: Fahim Faisal Niloy, Sk Miraj Ahmed, Dripta S. Raychaudhuri, Samet Oymak, Amit K. Roy-Chowdhury

    Abstract: Traditional test-time adaptation (TTA) methods face significant challenges in adapting to dynamic environments characterized by continuously changing long-term target distributions. These challenges primarily stem from two factors: catastrophic forgetting of previously learned valuable source knowledge and gradual error accumulation caused by miscalibrated pseudo labels. To address these issues, t…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: WACV 2024

  25. arXiv:2309.11157  [pdf, other]

    cs.CV

    Learning Deformable 3D Graph Similarity to Track Plant Cells in Unregistered Time Lapse Images

    Authors: Md Shazid Islam, Arindam Dutta, Calvin-Khang Ta, Kevin Rodriguez, Christian Michael, Mark Alber, G. Venugopala Reddy, Amit K. Roy-Chowdhury

    Abstract: Tracking of plant cells in images obtained by microscope is a challenging problem due to biological phenomena such as the large number of cells, non-uniform growth of different layers of the tightly packed plant cells, and cell division. Moreover, noisy images in deeper layers of the tissue and unavoidable systemic errors inherent in the imaging process further complicate the problem. In this pa…

    Submitted 21 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

  26. arXiv:2308.13954  [pdf, other]

    cs.CV

    Prior-guided Source-free Domain Adaptation for Human Pose Estimation

    Authors: Dripta S. Raychaudhuri, Calvin-Khang Ta, Arindam Dutta, Rohit Lal, Amit K. Roy-Chowdhury

    Abstract: Domain adaptation methods for 2D human pose estimation typically require continuous access to the source data during adaptation, which can be challenging due to privacy, memory, or computational constraints. To address this limitation, we focus on the task of source-free domain adaptation for pose estimation, where a source model must adapt to a new target domain using only unlabeled target data.…

    Submitted 26 August, 2023; originally announced August 2023.

    Comments: Accepted at ICCV 2023

  27. arXiv:2308.11880  [pdf, other]

    cs.CV cs.LG

    SUMMIT: Source-Free Adaptation of Uni-Modal Models to Multi-Modal Targets

    Authors: Cody Simons, Dripta S. Raychaudhuri, Sk Miraj Ahmed, Suya You, Konstantinos Karydis, Amit K. Roy-Chowdhury

    Abstract: Scene understanding using multi-modal data is necessary in many applications, e.g., autonomous navigation. To achieve this in a variety of situations, existing models must be able to adapt to shifting data distributions without arduous data annotation. Current approaches assume that the source data is available during adaptation and that the source consists of paired multi-modal data. Both these a…

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: 12 pages, 5 figures, 9 tables, ICCV 2023

  28. arXiv:2308.11744  [pdf, other]

    cs.CV

    Efficient Controllable Multi-Task Architectures

    Authors: Abhishek Aich, Samuel Schulter, Amit K. Roy-Chowdhury, Manmohan Chandraker, Yumin Suh

    Abstract: We aim to train a multi-task model such that users can adjust the desired compute budget and relative importance of task performances after deployment, without retraining. This enables optimizing performance for dynamically varying user needs, without heavy computational overhead to train and save models for various scenarios. To this end, we propose a multi-task model consisting of a shared encod…

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  29. arXiv:2307.04905  [pdf, other]

    cs.LG cs.DC

    FedYolo: Augmenting Federated Learning with Pretrained Transformers

    Authors: Xuechen Zhang, Mingchen Li, Xiangyu Chang, Jiasi Chen, Amit K. Roy-Chowdhury, Ananda Theertha Suresh, Samet Oymak

    Abstract: The growth and diversity of machine learning applications motivate a rethinking of learning with mobile and edge devices. How can we address diverse client goals and learn with scarce heterogeneous data? While federated learning aims to address these issues, it has challenges hindering a unified solution. Large transformer models have been shown to work across a variety of tasks achieving remarkab…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: 20 pages, 18 figures

  30. Collaborative Multi-Agent Video Fast-Forwarding

    Authors: Shuyue Lan, Zhilu Wang, Ermin Wei, Amit K. Roy-Chowdhury, Qi Zhu

    Abstract: Multi-agent applications have recently gained significant popularity. In many computer vision tasks, a network of agents, such as a team of robots with cameras, could work collaboratively to perceive the environment for efficient and accurate situation awareness. However, these agents often have limited computation, communication, and storage resources. Thus, reducing resource consumption while st…

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: IEEE Transactions on Multimedia, 2023. arXiv admin note: text overlap with arXiv:2008.04437

  31. arXiv:2212.07010  [pdf, other]

    cs.CV

    Cross-Domain Video Anomaly Detection without Target Domain Adaptation

    Authors: Abhishek Aich, Kuan-Chuan Peng, Amit K. Roy-Chowdhury

    Abstract: Most cross-domain unsupervised Video Anomaly Detection (VAD) works assume that at least a few task-relevant target domain training data are available for adaptation from the source to the target domain. However, this requires laborious model-tuning by the end-user who may prefer to have a system that works "out-of-the-box." To address such practical scenarios, we identify a novel target domain (inf…

    Submitted 13 December, 2022; originally announced December 2022.

    Comments: Accepted at WACV 2023; Includes Supplementary Material

  32. arXiv:2210.07940  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments

    Authors: Sudipta Paul, Amit K. Roy-Chowdhury, Anoop Cherian

    Abstract: Recent years have seen embodied visual navigation advance in two distinct directions: (i) in equipping the AI agent to follow natural language instructions, and (ii) in making the navigable world multimodal, e.g., audio-visual navigation. However, the real world is not only multimodal, but also often complex, and thus in spite of these advances, agents still need to understand the uncertainty in t…

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: Accepted at NeurIPS 2022

  33. arXiv:2210.01298  [pdf, other]

    cs.CV cs.RO

    Centroid Distance Keypoint Detector for Colored Point Clouds

    Authors: Hanzhe Teng, Dimitrios Chatziparaschis, Xinyue Kan, Amit K. Roy-Chowdhury, Konstantinos Karydis

    Abstract: Keypoint detection serves as the basis for many computer vision and robotics applications. Despite the fact that colored point clouds can be readily obtained, most existing keypoint detectors extract only geometry-salient keypoints, which can impede the overall performance of systems that intend to (or have the potential to) leverage color information. To promote advances in such systems, we propo…

    Submitted 15 June, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: Accepted to IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023; copyright will be transferred to IEEE upon publication

  34. arXiv:2209.09883  [pdf, other]

    cs.CV

    Leveraging Local Patch Differences in Multi-Object Scenes for Generative Adversarial Attacks

    Authors: Abhishek Aich, Shasha Li, Chengyu Song, M. Salman Asif, Srikanth V. Krishnamurthy, Amit K. Roy-Chowdhury

    Abstract: State-of-the-art generative model-based attacks against image classifiers overwhelmingly focus on single-object (i.e., single dominant object) images. Different from such settings, we tackle a more practical problem of generating adversarial perturbations using multi-object (i.e., multiple dominant objects) images as they are representative of most real-world scenes. Our goal is to design an attac…

    Submitted 3 October, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

    Comments: Accepted at WACV 2023 (Round 1), camera-ready version

  35. arXiv:2209.09502  [pdf, other]

    cs.CV

    GAMA: Generative Adversarial Multi-Object Scene Attacks

    Authors: Abhishek Aich, Calvin-Khang Ta, Akash Gupta, Chengyu Song, Srikanth V. Krishnamurthy, M. Salman Asif, Amit K. Roy-Chowdhury

    Abstract: The majority of methods for crafting adversarial attacks have focused on scenes with a single dominant object (e.g., images from ImageNet). On the other hand, natural scenes include multiple dominant objects that are semantically related. Thus, it is crucial to explore designing attack strategies that look beyond learning on single-object scenes or attack single-object victim classifiers. Due to t…

    Submitted 15 October, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

    Comments: Accepted at NeurIPS 2022; First two authors contributed equally; Includes Supplementary Material

  36. arXiv:2209.04027  [pdf, other]

    cs.CV

    Cross-Modal Knowledge Transfer Without Task-Relevant Source Data

    Authors: Sk Miraj Ahmed, Suhas Lohit, Kuan-Chuan Peng, Michael J. Jones, Amit K. Roy-Chowdhury

    Abstract: Cost-effective depth and infrared sensors as alternatives to usual RGB sensors are now a reality, and have some advantages over RGB in domains like autonomous navigation and remote sensing. As such, building computer vision and deep learning systems for depth and infrared data is crucial. However, large labeled datasets for these modalities are still lacking. In such cases, transferring knowledge…

    Submitted 8 September, 2022; originally announced September 2022.

  37. Poisson2Sparse: Self-Supervised Poisson Denoising From a Single Image

    Authors: Calvin-Khang Ta, Abhishek Aich, Akash Gupta, Amit K. Roy-Chowdhury

    Abstract: Image enhancement approaches often assume that the noise is signal independent, and approximate the degradation model as zero-mean additive Gaussian. However, this assumption does not hold for biomedical imaging systems where sensor-based sources of noise are proportional to signal strengths, and the noise is better represented as a Poisson process. In this work, we explore a sparsity and dictiona…

    Submitted 27 June, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

    Comments: Accepted to MICCAI 2022

  38. arXiv:2204.00942  [pdf, other]

    cs.CV

    A-ACT: Action Anticipation through Cycle Transformations

    Authors: Akash Gupta, Jingen Liu, Liefeng Bo, Amit K. Roy-Chowdhury, Tao Mei

    Abstract: While action anticipation has garnered a lot of research interest recently, most of the works focus on anticipating future action directly through observed visual cues only. In this work, we take a step back to analyze how the human capability to anticipate the future can be transferred to machine learning algorithms. To incorporate this ability in intelligent systems a question worth pondering up…

    Submitted 2 April, 2022; originally announced April 2022.

  39. arXiv:2203.15230  [pdf, other]

    cs.CV cs.CR cs.LG

    Zero-Query Transfer Attacks on Context-Aware Object Detectors

    Authors: Zikui Cai, Shantanu Rane, Alejandro E. Brito, Chengyu Song, Srikanth V. Krishnamurthy, Amit K. Roy-Chowdhury, M. Salman Asif

    Abstract: Adversarial attacks perturb images such that a deep neural network produces incorrect classification results. A promising approach to defend against adversarial attacks on natural multi-object scenes is to impose a context-consistency check, wherein, if the detected objects are not consistent with an appropriately defined context, then an attack is suspected. Stronger attacks are needed to fool su…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: CVPR 2022 Accepted

  40. arXiv:2203.14949  [pdf, other]

    cs.CV cs.LG

    Controllable Dynamic Multi-Task Architectures

    Authors: Dripta S. Raychaudhuri, Yumin Suh, Samuel Schulter, Xiang Yu, Masoud Faraki, Amit K. Roy-Chowdhury, Manmohan Chandraker

    Abstract: Multi-task learning commonly encounters competition for resources among tasks, specifically when model capacity is limited. This challenge motivates models which allow control over the relative importance of tasks and total compute cost during inference time. In this work, we propose such a controllable multi-task network that dynamically adjusts its architecture and weights to match the desired t…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: Accepted at CVPR 2022

  41. arXiv:2112.03223  [pdf, other]

    cs.CV cs.AI cs.LG

    Context-Aware Transfer Attacks for Object Detection

    Authors: Zikui Cai, Xinxin Xie, Shasha Li, Mingjun Yin, Chengyu Song, Srikanth V. Krishnamurthy, Amit K. Roy-Chowdhury, M. Salman Asif

    Abstract: Blackbox transfer attacks for image classifiers have been extensively studied in recent years. In contrast, little progress has been made on transfer attacks for object detectors. Object detectors take a holistic view of the image and the detection of one object (or lack thereof) often depends on other objects in the scene. This makes such detectors inherently context-aware and adversarial attacks…

    Submitted 6 December, 2021; originally announced December 2021.

    Comments: accepted to AAAI 2022

  42. arXiv:2110.12321  [pdf, other]

    cs.CV cs.LG

    ADC: Adversarial attacks against object Detection that evade Context consistency checks

    Authors: Mingjun Yin, Shasha Li, Chengyu Song, M. Salman Asif, Amit K. Roy-Chowdhury, Srikanth V. Krishnamurthy

    Abstract: Deep Neural Networks (DNNs) have been shown to be vulnerable to adversarial examples, which are slightly perturbed input images which lead DNNs to make wrong predictions. To protect from such examples, various defense strategies have been proposed. A very recent defense strategy for detecting adversarial examples, that has been shown to be robust to current attacks, is to check for intrinsic conte…

    Submitted 23 October, 2021; originally announced October 2021.

    Comments: WCAV'22 Accepted

  43. arXiv:2110.01823  [pdf, other]

    cs.CV

    Adversarial Attacks on Black Box Video Classifiers: Leveraging the Power of Geometric Transformations

    Authors: Shasha Li, Abhishek Aich, Shitong Zhu, M. Salman Asif, Chengyu Song, Amit K. Roy-Chowdhury, Srikanth V. Krishnamurthy

    Abstract: When compared to the image classification models, black-box adversarial attacks against video classification models have been largely understudied. This could be possible because, with video, the temporal dimension poses significant additional challenges in gradient estimation. Query-efficient black-box attacks rely on effectively estimated gradients towards maximizing the probability of misclassi…

    Submitted 26 October, 2021; v1 submitted 5 October, 2021; originally announced October 2021.

    Comments: Accepted at NeurIPS 2021; First two authors contributed equally; Includes Supplementary Material

  44. arXiv:2108.09891  [pdf, other]

    cs.CV

    Multi-Expert Adversarial Attack Detection in Person Re-identification Using Context Inconsistency

    Authors: Xueping Wang, Shasha Li, Min Liu, Yaonan Wang, Amit K. Roy-Chowdhury

    Abstract: The success of deep neural networks (DNNs) has promoted the widespread applications of person re-identification (ReID). However, ReID systems inherit the vulnerability of DNNs to malicious attacks of visually inconspicuous adversarial perturbations. Detection of adversarial attacks is, therefore, a fundamental requirement for robust ReID systems. In this work, we propose a Multi-Expert Adversarial…

    Submitted 31 March, 2022; v1 submitted 22 August, 2021; originally announced August 2021.

    Comments: Accepted at IEEE ICCV 2021

  45. arXiv:2108.08421  [pdf, other]

    cs.CV cs.LG

    Exploiting Multi-Object Relationships for Detecting Adversarial Attacks in Complex Scenes

    Authors: Mingjun Yin, Shasha Li, Zikui Cai, Chengyu Song, M. Salman Asif, Amit K. Roy-Chowdhury, Srikanth V. Krishnamurthy

    Abstract: Vision systems that deploy Deep Neural Networks (DNNs) are known to be vulnerable to adversarial examples. Recent research has shown that checking the intrinsic consistencies in the input data is a promising way to detect adversarial attacks (e.g., by checking the object co-occurrence relationships in complex scenes). However, existing approaches are tied to specific models and do not offer genera…

    Submitted 18 August, 2021; originally announced August 2021.

    Comments: ICCV'21 Accepted

  46. arXiv:2108.02832  [pdf, other]

    eess.IV cs.CV

    Ada-VSR: Adaptive Video Super-Resolution with Meta-Learning

    Authors: Akash Gupta, Padmaja Jonnalagedda, Bir Bhanu, Amit K. Roy-Chowdhury

    Abstract: Most of the existing works in supervised spatio-temporal video super-resolution (STVSR) heavily rely on a large-scale external dataset consisting of paired low-resolution low-frame-rate (LR-LFR) and high-resolution high-frame-rate (HR-HFR) videos. Despite their remarkable performance, these methods make a prior assumption that the low-resolution video is obtained by down-scaling the high-resolution…

    Submitted 5 August, 2021; originally announced August 2021.

  47. arXiv:2108.00340  [pdf, other]

    cs.CV

    Reconstruction guided Meta-learning for Few Shot Open Set Recognition

    Authors: Sayak Nag, Dripta S. Raychaudhuri, Sujoy Paul, Amit K. Roy-Chowdhury

    Abstract: In many applications, we are constrained to learn classifiers from very limited data (few-shot classification). The task becomes even more challenging if it is also required to identify samples from unknown categories (open-set classification). Learning a good abstraction for a class with very few samples is extremely difficult, especially under open-set settings. As a result, open-set recognition…

    Submitted 30 September, 2023; v1 submitted 31 July, 2021; originally announced August 2021.

    Comments: Accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

  48. arXiv:2107.14368  [pdf, other]

    eess.IV cs.LG

    Deep Quantized Representation for Enhanced Reconstruction

    Authors: Akash Gupta, Abhishek Aich, Kevin Rodriguez, G. Venugopala Reddy, Amit K. Roy-Chowdhury

    Abstract: While machine learning approaches have shown remarkable performance in biomedical image analysis, most of these methods rely on high-quality and accurate imaging data. However, collecting such data requires intensive and careful manual effort. One of the major challenges in imaging the Shoot Apical Meristem (SAM) of Arabidopsis thaliana is that the deeper slices in the z-stack suffer from differe…

    Submitted 29 July, 2021; originally announced July 2021.

    Comments: Accepted to ISBI Workshop, 2020

  49. arXiv:2107.11878  [pdf, other]

    cs.CV

    Spatio-Temporal Representation Factorization for Video-based Person Re-Identification

    Authors: Abhishek Aich, Meng Zheng, Srikrishna Karanam, Terrence Chen, Amit K. Roy-Chowdhury, Ziyan Wu

    Abstract: Despite much recent progress in video-based person re-identification (re-ID), the current state-of-the-art still suffers from common real-world challenges such as appearance similarity among various people, occlusions, and frame misalignment. To alleviate these problems, we propose Spatio-Temporal Representation Factorization (STRF), a flexible new computational unit that can be used in conjunctio…

    Submitted 14 August, 2021; v1 submitted 25 July, 2021; originally announced July 2021.

    Comments: Accepted at IEEE ICCV 2021, Includes Supplementary Material

  50. arXiv:2105.10037  [pdf, other]

    cs.LG cs.AI

    Cross-domain Imitation from Observations

    Authors: Dripta S. Raychaudhuri, Sujoy Paul, Jeroen van Baar, Amit K. Roy-Chowdhury

    Abstract: Imitation learning seeks to circumvent the difficulty in designing proper reward functions for training agents by utilizing expert behavior. With environments modeled as Markov Decision Processes (MDP), most of the existing imitation algorithms are contingent on the availability of expert demonstrations in the same MDP as the one in which a new imitation policy is to be learned. In this paper, we…

    Submitted 20 May, 2021; originally announced May 2021.

    Comments: Accepted at ICML 2021 as a long presentation