
Showing 1–50 of 79 results for author: Hayat, M

  1. arXiv:2506.00477  [pdf, ps, other]

    cs.LG cs.CV stat.ML

    Flashbacks to Harmonize Stability and Plasticity in Continual Learning

    Authors: Leila Mahmoodi, Peyman Moghadam, Munawar Hayat, Christian Simon, Mehrtash Harandi

    Abstract: We introduce Flashback Learning (FL), a novel method designed to harmonize the stability and plasticity of models in Continual Learning (CL). Unlike prior approaches that primarily focus on regularizing model updates to preserve old information while learning new concepts, FL explicitly balances this trade-off through a bidirectional form of regularization. This approach effectively guides the mod… ▽ More

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: Manuscript submitted to Neural Networks (Elsevier) in August 2024 and accepted for publication in May 2025. This version is the author-accepted manuscript before copyediting and typesetting. The code for this article will be available at https://github.com/csiro-robotics/Flashback-Learning

  2. arXiv:2504.13206  [pdf, other]

    cs.GR

    DuoLoRA: Cycle-consistent and Rank-disentangled Content-Style Personalization

    Authors: Aniket Roy, Shubhankar Borse, Shreya Kadambi, Debasmit Das, Shweta Mahajan, Risheek Garrepalli, Hyojin Park, Ankita Nayak, Rama Chellappa, Munawar Hayat, Fatih Porikli

    Abstract: We tackle the challenge of jointly personalizing content and style from a few examples. A promising approach is to train separate Low-Rank Adapters (LoRA) and merge them effectively, preserving both content and style. Existing methods, such as ZipLoRA, treat content and style as independent entities, merging them by learning masks in LoRA's output dimensions. However, content and style are intertw… ▽ More

    Submitted 15 April, 2025; originally announced April 2025.

  3. arXiv:2504.11491  [pdf, other]

    eess.IV cs.CV cs.LG cs.MM

    Attention GhostUNet++: Enhanced Segmentation of Adipose Tissue and Liver in CT Images

    Authors: Mansoor Hayat, Supavadee Aramvith, Subrata Bhattacharjee, Nouman Ahmad

    Abstract: Accurate segmentation of abdominal adipose tissue, including subcutaneous (SAT) and visceral adipose tissue (VAT), along with liver segmentation, is essential for understanding body composition and associated health risks such as type 2 diabetes and cardiovascular disease. This study proposes Attention GhostUNet++, a novel deep learning model incorporating Channel, Spatial, and Depth Attention mec… ▽ More

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted for presentation in the 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2025)

  4. arXiv:2503.18244  [pdf, other]

    cs.CV

    CustomKD: Customizing Large Vision Foundation for Edge Model Improvement via Knowledge Distillation

    Authors: Jungsoo Lee, Debasmit Das, Munawar Hayat, Sungha Choi, Kyuwoong Hwang, Fatih Porikli

    Abstract: We propose a novel knowledge distillation approach, CustomKD, that effectively leverages large vision foundation models (LVFMs) to enhance the performance of edge models (e.g., MobileNetV3). Despite recent advancements in LVFMs, such as DINOv2 and CLIP, their potential in knowledge distillation for enhancing edge models remains underexplored. While knowledge distillation is a promising approach fo… ▽ More

    Submitted 23 March, 2025; originally announced March 2025.

    Comments: Accepted to CVPR 2025

  5. arXiv:2502.19673  [pdf, other]

    cs.CV

    SubZero: Composing Subject, Style, and Action via Zero-Shot Personalization

    Authors: Shubhankar Borse, Kartikeya Bhardwaj, Mohammad Reza Karimi Dastjerdi, Hyojin Park, Shreya Kadambi, Shobitha Shivakumar, Prathamesh Mandke, Ankita Nayak, Harris Teague, Munawar Hayat, Fatih Porikli

    Abstract: Diffusion models are increasingly popular for generative tasks, including personalized composition of subjects and styles. While diffusion models can generate user-specified subjects performing text-guided actions in custom styles, they require fine-tuning and are not feasible for personalization on mobile devices. Hence, tuning-free personalization methods such as IP-Adapters have progressively g… ▽ More

    Submitted 26 February, 2025; originally announced February 2025.

  6. arXiv:2412.17040  [pdf, other]

    cs.LG

    HyperNet Fields: Efficiently Training Hypernetworks without Ground Truth by Learning Weight Trajectories

    Authors: Eric Hedlin, Munawar Hayat, Fatih Porikli, Kwang Moo Yi, Shweta Mahajan

    Abstract: To efficiently adapt large models or to train generative models of neural representations, Hypernetworks have drawn interest. While hypernetworks work well, training them is cumbersome, and often requires ground truth optimized weights for each sample. However, obtaining each of these weights is a training problem of its own: one needs to train, e.g., adaptation weights or even an entire neural fie… ▽ More

    Submitted 19 May, 2025; v1 submitted 22 December, 2024; originally announced December 2024.

  7. arXiv:2410.11971  [pdf, other]

    cs.LG cs.AI cs.CV

    DDIL: Diversity Enhancing Diffusion Distillation With Imitation Learning

    Authors: Risheek Garrepalli, Shweta Mahajan, Munawar Hayat, Fatih Porikli

    Abstract: Diffusion models excel at generative modeling (e.g., text-to-image) but sampling requires multiple denoising network passes, limiting practicality. Efforts such as progressive distillation or consistency distillation have shown promise by reducing the number of passes at the expense of quality of the generated samples. In this work we identify covariate shift as one of the reasons for poor performance… ▽ More

    Submitted 28 March, 2025; v1 submitted 15 October, 2024; originally announced October 2024.

  8. arXiv:2409.06991  [pdf, other]

    cs.CV

    1M-Deepfakes Detection Challenge

    Authors: Zhixi Cai, Abhinav Dhall, Shreya Ghosh, Munawar Hayat, Dimitrios Kollias, Kalin Stefanov, Usman Tariq

    Abstract: The detection and localization of deepfake content, particularly when small fake segments are seamlessly mixed with real videos, remains a significant challenge in the field of digital media security. Based on the recently released AV-Deepfake1M dataset, which contains more than 1 million manipulated videos across more than 2,000 subjects, we introduce the 1M-Deepfakes Detection Challenge. This ch… ▽ More

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: ACM MM 2024. Challenge webpage: https://deepfakes1m.github.io/

  9. arXiv:2406.08798  [pdf, other]

    cs.CV

    FouRA: Fourier Low Rank Adaptation

    Authors: Shubhankar Borse, Shreya Kadambi, Nilesh Prasad Pandey, Kartikeya Bhardwaj, Viswanath Ganapathy, Sweta Priyadarshi, Risheek Garrepalli, Rafael Esteves, Munawar Hayat, Fatih Porikli

    Abstract: While Low-Rank Adaptation (LoRA) has proven beneficial for efficiently fine-tuning large models, LoRA fine-tuned text-to-image diffusion models lack diversity in the generated images, as the model tends to copy data from the observed training samples. This effect becomes more pronounced at higher values of adapter strength and for adapters with higher ranks which are fine-tuned on smaller datasets… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  10. Conditional Distribution Modelling for Few-Shot Image Synthesis with Diffusion Models

    Authors: Parul Gupta, Munawar Hayat, Abhinav Dhall, Thanh-Toan Do

    Abstract: Few-shot image synthesis entails generating diverse and realistic images of novel categories using only a few example images. While multiple recent efforts in this direction have achieved impressive results, the existing approaches are dependent only upon the few novel samples available at test time in order to generate new images, which restricts the diversity of the generated images. To overcome… ▽ More

    Submitted 28 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  11. arXiv:2404.13330  [pdf, other]

    eess.IV cs.CV

    SEGSRNet for Stereo-Endoscopic Image Super-Resolution and Surgical Instrument Segmentation

    Authors: Mansoor Hayat, Supavadee Aramvith, Titipat Achakulvisut

    Abstract: SEGSRNet addresses the challenge of precisely identifying surgical instruments in low-resolution stereo endoscopic images, a common issue in medical imaging and robotic surgery. Our innovative framework enhances image clarity and segmentation accuracy by applying state-of-the-art super-resolution techniques before segmentation. This ensures higher-quality inputs for more precise segmentation. SEGS… ▽ More

    Submitted 26 April, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

    Comments: Paper accepted for Presentation in 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), Orlando, Florida, USA (Camera Ready Version)

  12. arXiv:2403.18092  [pdf, other]

    cs.CV

    OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation

    Authors: Jisoo Jeong, Hong Cai, Risheek Garrepalli, Jamie Menjay Lin, Munawar Hayat, Fatih Porikli

    Abstract: The scarcity of ground-truth labels poses one major challenge in developing optical flow estimation models that are both generalizable and robust. While current methods rely on data augmentation, they have yet to fully exploit the rich information available in labeled video sequences. We propose OCAI, a method that supports robust frame interpolation by generating intermediate video frames alongsi… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  13. arXiv:2403.09620  [pdf, other]

    cs.CV

    PosSAM: Panoptic Open-vocabulary Segment Anything

    Authors: Vibashan VS, Shubhankar Borse, Hyojin Park, Debasmit Das, Vishal Patel, Munawar Hayat, Fatih Porikli

    Abstract: In this paper, we introduce an open-vocabulary panoptic segmentation model that effectively unifies the strengths of the Segment Anything Model (SAM) with the vision-language CLIP model in an end-to-end framework. While SAM excels in generating spatially-aware masks, its decoder falls short in recognizing object class information and tends to oversegment without additional guidance. Existing appr… ▽ More

    Submitted 14 March, 2024; originally announced March 2024.

  14. arXiv:2401.05779  [pdf, other]

    cs.CV

    Erasing Undesirable Influence in Diffusion Models

    Authors: Jing Wu, Trung Le, Munawar Hayat, Mehrtash Harandi

    Abstract: Diffusion models are highly effective at generating high-quality images but pose risks, such as the unintentional generation of NSFW (not safe for work) content. Although various techniques have been proposed to mitigate unwanted influences in diffusion models while preserving overall performance, achieving a balance between these goals remains challenging. In this work, we introduce EraseDiff, an… ▽ More

    Submitted 20 November, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: Diffusion Model, Machine Unlearning

  15. DiffAugment: Diffusion based Long-Tailed Visual Relationship Recognition

    Authors: Parul Gupta, Tuan Nguyen, Abhinav Dhall, Munawar Hayat, Trung Le, Thanh-Toan Do

    Abstract: The task of Visual Relationship Recognition (VRR) aims to identify relationships between two interacting objects in an image and is particularly challenging due to the widely-spread and highly imbalanced distribution of <subject, relation, object> triplets. To overcome the resultant performance bias in existing VRR approaches, we introduce DiffAugment -- a method which first augments the tail clas… ▽ More

    Submitted 1 March, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

  16. arXiv:2311.15308  [pdf, other]

    cs.CV

    AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset

    Authors: Zhixi Cai, Shreya Ghosh, Aman Pankaj Adatia, Munawar Hayat, Abhinav Dhall, Tom Gedeon, Kalin Stefanov

    Abstract: The detection and localization of highly realistic deepfake audio-visual content are challenging even for the most advanced state-of-the-art methods. While most of the research efforts in this domain are focused on detecting high-quality deepfake images and videos, only a few works address the problem of the localization of small segments of audio-visual manipulations embedded in real videos. In t… ▽ More

    Submitted 29 July, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

    Comments: Accepted by ACM MM 2024

  17. Combined Channel and Spatial Attention-based Stereo Endoscopic Image Super-Resolution

    Authors: Mansoor Hayat, Supavadee Aramvith, Titipat Achakulvisut

    Abstract: Stereo Imaging technology integration into medical diagnostics and surgeries brings a great revolution in the field of medical sciences. Now, surgeons and physicians have better insight into the anatomy of patients' organs. Like other technologies, stereo cameras have limitations, e.g., low resolution (LR) and blurry output images. Currently, most of the proposed techniques for super-resolution fo… ▽ More

    Submitted 15 November, 2023; originally announced November 2023.

    Journal ref: TENCON 2023 - 2023 IEEE Region 10 Conference (TENCON)

  18. Classification of three-family flavoured DFSZ axion models that have no domain wall problem

    Authors: Peter Cox, Matthew J. Dolan, Maaz Hayat, Andrea Thamm, Raymond R. Volkas

    Abstract: We provide an exhaustive classification of three-family DFSZ axion models that have no cosmological domain wall problem. This result is obtained by making the Peccei-Quinn symmetry flavour dependent in certain specific ways, thus reinforcing a possible connection between the strong CP problem and the flavour puzzle. Known DFSZ flavour variants such as the top-specific model emerge as special cases… ▽ More

    Submitted 5 February, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: 20 pages, 3 tables, v2: references added, journal version

    Journal ref: J. High Energ. Phys. 2024, 11 (2024)

  19. arXiv:2307.08238  [pdf, other]

    cs.CV

    Unified Open-Vocabulary Dense Visual Prediction

    Authors: Hengcan Shi, Munawar Hayat, Jianfei Cai

    Abstract: In recent years, open-vocabulary (OV) dense visual prediction (such as OV object detection, semantic, instance and panoptic segmentations) has attracted increasing research attention. However, most existing approaches are task-specific and individually tackle each task. In this paper, we propose a Unified Open-Vocabulary Network (UOVN) to jointly address four common dense prediction tasks. Comp… ▽ More

    Submitted 18 August, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

  20. arXiv:2307.03339  [pdf, other]

    cs.CV

    Open-Vocabulary Object Detection via Scene Graph Discovery

    Authors: Hengcan Shi, Munawar Hayat, Jianfei Cai

    Abstract: In recent years, open-vocabulary (OV) object detection has attracted increasing research attention. Unlike traditional detection, which only recognizes fixed-category objects, OV detection aims to detect objects in an open category set. Previous works often leverage vision-language (VL) training data (e.g., referring grounding data) to recognize OV objects. However, they only use pairs of nouns an… ▽ More

    Submitted 6 July, 2023; originally announced July 2023.

  21. arXiv:2305.05255  [pdf, other]

    cs.HC

    Emolysis: A Multimodal Open-Source Group Emotion Analysis and Visualization Toolkit

    Authors: Shreya Ghosh, Zhixi Cai, Parul Gupta, Garima Sharma, Abhinav Dhall, Munawar Hayat, Tom Gedeon

    Abstract: Automatic group emotion recognition plays an important role in understanding complex human-human interaction. This paper introduces Emolysis, a Python-based, standalone open-source group emotion analysis toolkit for use in different social situations upon getting consent from the users. Given any input video, Emolysis processes synchronized multimodal input and maps it to group level emotion, val… ▽ More

    Submitted 6 August, 2024; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: Accepted by ACII Demo 2024. Both Shreya Ghosh and Zhixi Cai contributed equally to this research

  22. arXiv:2305.01979  [pdf, other]

    cs.CV

    Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization

    Authors: Zhixi Cai, Shreya Ghosh, Abhinav Dhall, Tom Gedeon, Kalin Stefanov, Munawar Hayat

    Abstract: Most deepfake detection methods focus on detecting spatial and/or spatio-temporal changes in facial attributes and are centered around the binary classification task of detecting whether a video is real or fake. This is because available benchmark datasets contain mostly visual-only modifications present in the entirety of the video. However, a sophisticated deepfake may include small segments of… ▽ More

    Submitted 16 July, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: The paper is under consideration/review at Computer Vision and Image Understanding Journal

  23. arXiv:2304.05678  [pdf, other]

    cs.CV

    Real-time Trajectory-based Social Group Detection

    Authors: Simindokht Jahangard, Munawar Hayat, Hamid Rezatofighi

    Abstract: Social group detection is a crucial aspect of various robotic applications, including robot navigation and human-robot interactions. To date, a range of model-based techniques have been employed to address this challenge, such as the F-formation and trajectory similarity frameworks. However, these approaches often fail to provide reliable results in crowded and dynamic scenarios. Recent advancemen… ▽ More

    Submitted 12 April, 2023; originally announced April 2023.

  24. arXiv:2303.13556  [pdf, other]

    cs.CV

    ProtoCon: Pseudo-label Refinement via Online Clustering and Prototypical Consistency for Efficient Semi-supervised Learning

    Authors: Islam Nassar, Munawar Hayat, Ehsan Abbasnejad, Hamid Rezatofighi, Gholamreza Haffari

    Abstract: Confidence-based pseudo-labeling is among the dominant approaches in semi-supervised learning (SSL). It relies on including high-confidence predictions made on unlabeled data as additional targets to train the model. We propose ProtoCon, a novel SSL method aimed at the less-explored label-scarce SSL where such methods usually underperform. ProtoCon refines the pseudo-labels by leveraging their nea… ▽ More

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: Accepted in CVPR2023 (highlight)

  25. arXiv:2211.06627  [pdf, other]

    cs.CV

    MARLIN: Masked Autoencoder for facial video Representation LearnINg

    Authors: Zhixi Cai, Shreya Ghosh, Kalin Stefanov, Abhinav Dhall, Jianfei Cai, Hamid Rezatofighi, Reza Haffari, Munawar Hayat

    Abstract: This paper proposes a self-supervised approach to learn universal facial representations from videos, that can transfer across a variety of facial analysis tasks such as Facial Attribute Recognition (FAR), Facial Expression Recognition (FER), DeepFake Detection (DFD), and Lip Synchronization (LS). Our proposed framework, named MARLIN, is a facial video masked autoencoder, that learns highly robust… ▽ More

    Submitted 22 March, 2023; v1 submitted 12 November, 2022; originally announced November 2022.

    Comments: CVPR 2023

  26. arXiv:2210.10317  [pdf, other]

    cs.CV

    LAVA: Label-efficient Visual Learning and Adaptation

    Authors: Islam Nassar, Munawar Hayat, Ehsan Abbasnejad, Hamid Rezatofighi, Mehrtash Harandi, Gholamreza Haffari

    Abstract: We present LAVA, a simple yet effective method for multi-domain visual transfer learning with limited data. LAVA builds on a few recent innovations to enable adapting to partially labelled datasets with class and domain shifts. First, LAVA learns self-supervised visual representations on the source dataset and grounds them using class label semantics to overcome transfer collapse problems associate… ▽ More

    Submitted 19 October, 2022; originally announced October 2022.

    Comments: Accepted in WACV2023

  27. arXiv:2210.03114  [pdf, other]

    cs.CV

    CLIP model is an Efficient Continual Learner

    Authors: Vishal Thengane, Salman Khan, Munawar Hayat, Fahad Khan

    Abstract: The continual learning setting aims to learn new tasks over time without forgetting the previous ones. The literature reports several significant efforts to tackle this problem with limited or no access to previous task data. Among such efforts, typical solutions offer sophisticated techniques involving memory replay, knowledge distillation, model regularization, and dynamic network expansion. The… ▽ More

    Submitted 6 October, 2022; originally announced October 2022.

  28. arXiv:2209.07704  [pdf, other]

    eess.IV cs.CV

    Hybrid Window Attention Based Transformer Architecture for Brain Tumor Segmentation

    Authors: Himashi Peiris, Munawar Hayat, Zhaolin Chen, Gary Egan, Mehrtash Harandi

    Abstract: As intensities of MRI volumes are inconsistent across institutes, it is essential to extract universal features of multi-modal MRIs to precisely segment brain tumors. In this context, we propose a volumetric vision transformer that follows two windowing strategies in attention for extracting fine features and local distributional smoothness (LDS) during model training inspired by virtual adversari… ▽ More

    Submitted 15 September, 2022; originally announced September 2022.

  29. arXiv:2209.05724  [pdf, other]

    cs.LG cs.CR cs.CV

    Concealing Sensitive Samples against Gradient Leakage in Federated Learning

    Authors: Jing Wu, Munawar Hayat, Mingyi Zhou, Mehrtash Harandi

    Abstract: Federated Learning (FL) is a distributed learning paradigm that enhances users' privacy by eliminating the need for clients to share raw, private data with the server. Despite the success, recent studies expose the vulnerability of FL to model inversion attacks, where adversaries reconstruct users' private data via eavesdropping on the shared gradient information. We hypothesize that a key factor in… ▽ More

    Submitted 14 December, 2023; v1 submitted 13 September, 2022; originally announced September 2022.

    Comments: Defence against model inversion attack in federated learning

  30. arXiv:2208.01840  [pdf, other]

    cs.CV

    'Labelling the Gaps': A Weakly Supervised Automatic Eye Gaze Estimation

    Authors: Shreya Ghosh, Abhinav Dhall, Jarrod Knibbe, Munawar Hayat

    Abstract: Over the past few years, there has been an increasing interest in interpreting gaze direction in an unconstrained environment with limited supervision. Owing to data curation and annotation issues, replicating a gaze estimation method on other platforms, such as unconstrained outdoor or AR/VR, might lead to a significant drop in performance due to insufficient availability of accurately annotated data fo… ▽ More

    Submitted 12 August, 2022; v1 submitted 3 August, 2022; originally announced August 2022.

  31. arXiv:2207.03048  [pdf, other]

    cs.CV

    AV-Gaze: A Study on the Effectiveness of Audio Guided Visual Attention Estimation for Non-Profilic Faces

    Authors: Shreya Ghosh, Abhinav Dhall, Munawar Hayat, Jarrod Knibbe

    Abstract: In challenging real-life conditions such as extreme head-pose, occlusions, and low-resolution images where the visual information fails to estimate visual attention/gaze direction, audio signals could provide important and complementary information. In this paper, we explore if audio-guided coarse head-pose can further enhance visual attention estimation performance for non-prolific faces. Since i… ▽ More

    Submitted 11 August, 2022; v1 submitted 6 July, 2022; originally announced July 2022.

  32. arXiv:2205.07056  [pdf, other]

    cs.CV

    Transformer Scale Gate for Semantic Segmentation

    Authors: Hengcan Shi, Munawar Hayat, Jianfei Cai

    Abstract: Effectively encoding multi-scale contextual information is crucial for accurate semantic segmentation. Existing transformer-based segmentation models combine features across scales without any selection, where features on sub-optimal scales may degrade segmentation outcomes. Leveraging from the inherent properties of Vision Transformers, we propose a simple yet effective module, Transformer Scale… ▽ More

    Submitted 14 May, 2022; originally announced May 2022.

  33. arXiv:2205.05675  [pdf, other]

    cs.CV eess.IV

    NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

    Authors: Yawei Li, Kai Zhang, Radu Timofte, Luc Van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, et al. (86 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of ×4 based on pairs of low and corresponding high resolution images. The aim was to design a network for single image super-resolution that achieved improvement of e… ▽ More

    Submitted 11 May, 2022; originally announced May 2022.

    Comments: Validation code of the baseline model is available at https://github.com/ofsoundof/IMDN. Validation of all submitted models is available at https://github.com/ofsoundof/NTIRE2022_ESR

  34. arXiv:2205.01649  [pdf, other]

    eess.IV cs.CV

    Learning Enriched Features for Fast Image Restoration and Enhancement

    Authors: Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

    Abstract: Given a degraded input image, image restoration aims to recover the missing high-quality image content. Numerous applications demand effective image restoration, e.g., computational photography, surveillance, autonomous vehicles, and remote sensing. Significant advances in image restoration have been made in recent years, dominated by convolutional neural networks (CNNs). The widely-used CNN-based… ▽ More

    Submitted 19 April, 2022; originally announced May 2022.

    Comments: This article supersedes arXiv:2003.06792. Accepted for publication in TPAMI

  35. arXiv:2204.06228  [pdf, other]

    cs.CV

    Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization

    Authors: Zhixi Cai, Kalin Stefanov, Abhinav Dhall, Munawar Hayat

    Abstract: Due to its high societal impact, deepfake detection is getting active attention in the computer vision community. Most deepfake detection methods rely on identity, facial attributes, and adversarial perturbation-based spatio-temporal modifications at the whole video or random locations while keeping the meaning of the content intact. However, a sophisticated deepfake may contain only a small segme… ▽ More

    Submitted 3 May, 2023; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: DICTA 2022

  36. arXiv:2204.00822  [pdf, ps, other]

    cs.CV

    Semantic-Aware Domain Generalized Segmentation

    Authors: Duo Peng, Yinjie Lei, Munawar Hayat, Yulan Guo, Wen Li

    Abstract: Deep models trained on a source domain lack generalization when evaluated on unseen target domains with different data distributions. The problem becomes even more pronounced when we have no access to target domain samples for adaptation. In this paper, we address domain generalized semantic segmentation, where a segmentation model is trained to be domain-invariant without using any target domain da… ▽ More

    Submitted 2 April, 2022; originally announced April 2022.

    Comments: 16 pages, 7 figures, accepted at CVPR 2022 (Oral Presentation)

    MSC Class: 68T45 ACM Class: I.2.10; I.4.6

  37. arXiv:2203.16895  [pdf, ps, other]

    cs.CV

    Deformation and Correspondence Aware Unsupervised Synthetic-to-Real Scene Flow Estimation for Point Clouds

    Authors: Zhao Jin, Yinjie Lei, Naveed Akhtar, Haifeng Li, Munawar Hayat

    Abstract: Point cloud scene flow estimation is of practical importance for dynamic scene navigation in autonomous driving. Since scene flow labels are hard to obtain, current methods train their models on synthetic data and transfer them to real scenes. However, large disparities between existing synthetic datasets and real scenes lead to poor model transfer. We make two major contributions to address that.… ▽ More

    Submitted 31 March, 2022; originally announced March 2022.

    Comments: CVPR 2022

  38. arXiv:2201.09873  [pdf, other]

    eess.IV cs.CV

    Transformers in Medical Imaging: A Survey

    Authors: Fahad Shamshad, Salman Khan, Syed Waqas Zamir, Muhammad Haris Khan, Munawar Hayat, Fahad Shahbaz Khan, Huazhu Fu

    Abstract: Following unprecedented success on natural language tasks, Transformers have been successfully applied to several computer vision problems, achieving state-of-the-art results and prompting researchers to reconsider the supremacy of convolutional neural networks (CNNs) as de facto operators. Capitalizing on these advances in computer vision, the medical imaging field has also witnessed growin… ▽ More

    Submitted 24 January, 2022; originally announced January 2022.

    Comments: 41 pages, https://github.com/fahadshamshad/awesome-transformers-in-medical-imaging

  39. arXiv:2201.06696  [pdf, other]

    cs.CV

    ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues

    Authors: Hengcan Shi, Munawar Hayat, Yicheng Wu, Jianfei Cai

    Abstract: Object proposal generation is an important and fundamental task in computer vision. In this paper, we propose ProposalCLIP, a method towards unsupervised open-category object proposal generation. Unlike previous works which require a large number of bounding box annotations and/or can only generate proposals for limited object categories, our ProposalCLIP is able to predict proposals for a large v… ▽ More

    Submitted 17 January, 2022; originally announced January 2022.

    Comments: 10 pages, 5 figures

  40. arXiv:2201.06686  [pdf, ps, other]

    cs.CV

    Unpaired Referring Expression Grounding via Bidirectional Cross-Modal Matching

    Authors: Hengcan Shi, Munawar Hayat, Jianfei Cai

    Abstract: Referring expression grounding is an important and challenging task in computer vision. To avoid the laborious annotation in conventional referring grounding, unpaired referring grounding is introduced, where the training data only contains a number of images and queries without correspondences. The few existing solutions to unpaired referring grounding are still preliminary, due to the challenges… ▽ More

    Submitted 5 June, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

    Comments: 9 pages, 7 figures

  41. arXiv:2111.13300  [pdf, other]

    eess.IV cs.CV

    A Robust Volumetric Transformer for Accurate 3D Tumor Segmentation

    Authors: Himashi Peiris, Munawar Hayat, Zhaolin Chen, Gary Egan, Mehrtash Harandi

    Abstract: We propose a Transformer architecture for volumetric segmentation, a challenging task that requires keeping a complex balance in encoding local and global spatial cues, and preserving information along all axes of the volume. The encoder of the proposed design benefits from the self-attention mechanism to simultaneously encode local and global cues, while the decoder employs a parallel self and cross atte… ▽ More

    Submitted 30 June, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

  42. arXiv:2111.09881  [pdf, other]

    cs.CV

    Restormer: Efficient Transformer for High-Resolution Image Restoration

    Authors: Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang

    Abstract: Since convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data, these models have been extensively applied to image restoration and related tasks. Recently, another class of neural architectures, Transformers, have shown significant performance gains on natural language and high-level vision tasks. While the Transformer model mitigates the shor…

    Submitted 11 March, 2022; v1 submitted 18 November, 2021; originally announced November 2021.

    Comments: Accepted at CVPR 2022. #CVPR2022

  43. arXiv:2110.12100  [pdf, other]

    cs.CV

    MTGLS: Multi-Task Gaze Estimation with Limited Supervision

    Authors: Shreya Ghosh, Munawar Hayat, Abhinav Dhall, Jarrod Knibbe

    Abstract: Robust gaze estimation is a challenging task, even for deep CNNs, due to the non-availability of large-scale labeled data. Moreover, gaze annotation is a time-consuming process and requires specialized hardware setups. We propose MTGLS: a Multi-Task Gaze estimation framework with Limited Supervision, which leverages abundantly available non-annotated facial image data. MTGLS distills knowledge fro…

    Submitted 13 December, 2021; v1 submitted 22 October, 2021; originally announced October 2021.

  44. arXiv:2108.05479  [pdf, other]

    cs.CV

    Automatic Gaze Analysis: A Survey of Deep Learning based Approaches

    Authors: Shreya Ghosh, Abhinav Dhall, Munawar Hayat, Jarrod Knibbe, Qiang Ji

    Abstract: Eye gaze analysis is an important research problem in the field of Computer Vision and Human-Computer Interaction. Even with notable progress in the last 10 years, automatic gaze analysis still remains challenging due to the uniqueness of eye appearance, eye-head interplay, occlusion, image quality, and illumination conditions. There are several open questions, including what are the important cue…

    Submitted 21 July, 2022; v1 submitted 11 August, 2021; originally announced August 2021.

  45. Modal Decomposition of the Linear Swing Equation in Networks with Symmetries

    Authors: Kshitij Bhatta, Majeed Hayat, Francesco Sorrentino

    Abstract: Symmetries are widespread in physical, technological, biological, and social systems and networks, including power grids. The swing equation is a classic model for the dynamics of power grid networks. The main goal of this paper is to explain how network symmetries affect the swing equation transient and steady-state dynamics. We introduce a modal decomposition that allows us to study transient eff…

    Submitted 22 July, 2021; originally announced July 2021.

    Comments: 13 Pages including references, 8 Figures. Accepted for publishing in IEEE TNSE

  46. arXiv:2106.07085  [pdf, other]

    cs.CV

    Survey: Image Mixing and Deleting for Data Augmentation

    Authors: Humza Naveed, Saeed Anwar, Munawar Hayat, Kashif Javed, Ajmal Mian

    Abstract: Neural networks are prone to overfitting and memorizing data patterns. To avoid overfitting and enhance their generalization and performance, various methods have been suggested in the literature, including dropout, regularization, label smoothing, etc. One such method is augmentation which introduces different types of corruption in the data to prevent the model from overfitting and to memorize…

    Submitted 6 February, 2023; v1 submitted 13 June, 2021; originally announced June 2021.

  47. arXiv:2105.10497  [pdf, other]

    cs.CV cs.AI cs.LG

    Intriguing Properties of Vision Transformers

    Authors: Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang

    Abstract: Vision transformers (ViT) have demonstrated impressive performance across various machine vision problems. These models are based on multi-head self-attention mechanisms that can flexibly attend to a sequence of image patches to encode contextual cues. An important question is how such flexibility in attending image-wide context conditioned on a given patch can facilitate handling nuisances in nat…

    Submitted 25 November, 2021; v1 submitted 21 May, 2021; originally announced May 2021.

    Comments: NeurIPS'21 (Spotlight), Code: https://git.io/Js15X

  48. arXiv:2103.14641  [pdf, other]

    cs.CV

    On Generating Transferable Targeted Perturbations

    Authors: Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Fatih Porikli

    Abstract: While the untargeted black-box transferability of adversarial perturbations has been extensively studied before, changing an unseen model's decisions to a specific 'targeted' class remains a challenging feat. In this paper, we propose a new generative approach for highly transferable targeted perturbations (TTP). We note that the existing methods are less suitable for this task due to their reli…

    Submitted 13 August, 2021; v1 submitted 26 March, 2021; originally announced March 2021.

    Comments: ICCV, 2021. Code is available at https://github.com/Muzammal-Naseer/TTP

  49. arXiv:2103.14021  [pdf, other]

    cs.CV

    Orthogonal Projection Loss

    Authors: Kanchana Ranasinghe, Muzammal Naseer, Munawar Hayat, Salman Khan, Fahad Shahbaz Khan

    Abstract: Deep neural networks have achieved remarkable performance on a range of classification tasks, with softmax cross-entropy (CE) loss emerging as the de-facto objective function. The CE loss encourages features of a class to have a higher projection score on the true class-vector compared to the negative classes. However, this is a relative constraint and does not explicitly force different class fea…

    Submitted 25 March, 2021; originally announced March 2021.

  50. arXiv:2102.02808  [pdf, other]

    cs.CV

    Multi-Stage Progressive Image Restoration

    Authors: Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

    Abstract: Image restoration tasks demand a complex balance between spatial details and high-level contextualized information while recovering images. In this paper, we propose a novel synergistic design that can optimally balance these competing goals. Our main proposal is a multi-stage architecture, that progressively learns restoration functions for the degraded inputs, thereby breaking down the overall r…

    Submitted 16 March, 2021; v1 submitted 4 February, 2021; originally announced February 2021.

    Comments: Accepted at CVPR 2021