Showing 1–23 of 23 results for author: Zaheer, M Z

  1. arXiv:2412.02366  [pdf, other]

    cs.CV

    GenMix: Effective Data Augmentation with Generative Diffusion Model Image Editing

    Authors: Khawar Islam, Muhammad Zaigham Zaheer, Arif Mahmood, Karthik Nandakumar, Naveed Akhtar

    Abstract: Data augmentation is widely used to enhance generalization in visual classification tasks. However, traditional methods struggle when source and target domains differ, as in domain adaptation, due to their inability to address domain gaps. This paper introduces GenMix, a generalizable prompt-guided generative data augmentation approach that enhances both in-domain and cross-domain image classifica…

    Submitted 5 December, 2024; v1 submitted 3 December, 2024; originally announced December 2024.

    Comments: https://diffusemix.github.io/

  2. arXiv:2408.17059  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    A Survey of the Self Supervised Learning Mechanisms for Vision Transformers

    Authors: Asifullah Khan, Anabia Sohail, Mustansar Fiaz, Mehdi Hassan, Tariq Habib Afridi, Sibghat Ullah Marwat, Farzeen Munir, Safdar Ali, Hannan Naseem, Muhammad Zaigham Zaheer, Kamran Ali, Tangina Sultana, Ziaurrehman Tanoli, Naeem Akhter

    Abstract: Vision Transformers (ViTs) have recently demonstrated remarkable performance in computer vision tasks. However, their parameter-intensive nature and reliance on large amounts of data for effective performance have shifted the focus from traditional human-annotated labels to unsupervised learning and pretraining strategies that uncover hidden structures within the data. In response to this challeng…

    Submitted 10 June, 2025; v1 submitted 30 August, 2024; originally announced August 2024.

    Comments: 40 Pages, 4 Figures, 7 Tables

  3. arXiv:2408.07445  [pdf, other]

    cs.CV

    Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach

    Authors: Muhammad Saad Saeed, Shah Nawaz, Muhammad Zaigham Zaheer, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf, Hassan Sajjad, Tom De Schepper, Markus Schedl

    Abstract: Multimodal networks have demonstrated remarkable performance improvements over their unimodal counterparts. Existing multimodal networks are designed in a multi-branch fashion that, due to the reliance on fusion strategies, exhibit deteriorated performance if one or more modalities are missing. In this work, we propose a modality invariant multimodal learning method, which is less susceptible to t…

    Submitted 14 August, 2024; originally announced August 2024.

  4. arXiv:2407.16243  [pdf, other]

    cs.CV

    Chameleon: Images Are What You Need For Multimodal Learning Robust To Missing Modalities

    Authors: Muhammad Irzam Liaqat, Shah Nawaz, Muhammad Zaigham Zaheer, Muhammad Saad Saeed, Hassan Sajjad, Tom De Schepper, Karthik Nandakumar, Muhammad Haris Khan, Markus Schedl

    Abstract: Multimodal learning has demonstrated remarkable performance improvements over unimodal architectures. However, multimodal learning methods often exhibit deteriorated performances if one or more modalities are missing. This may be attributed to the commonly used multi-branch design containing modality-specific streams making the models reliant on the availability of a complete set of modalities. In…

    Submitted 23 July, 2024; originally announced July 2024.

  5. arXiv:2405.14881  [pdf, other]

    cs.CV

    DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models

    Authors: Khawar Islam, Muhammad Zaigham Zaheer, Arif Mahmood, Karthik Nandakumar

    Abstract: Recently, a number of image-mixing-based augmentation techniques have been introduced to improve the generalization of deep neural networks. In these techniques, two or more randomly selected natural images are mixed together to generate an augmented image. Such methods may not only omit important portions of the input images but also introduce label ambiguities by mixing images across labels resu…

    Submitted 5 April, 2024; originally announced May 2024.

    Comments: Accepted at CVPR 2024

  6. Exploiting Autoencoder's Weakness to Generate Pseudo Anomalies

    Authors: Marcella Astrid, Muhammad Zaigham Zaheer, Djamila Aouada, Seung-Ik Lee

    Abstract: Due to the rare occurrence of anomalous events, a typical approach to anomaly detection is to train an autoencoder (AE) with normal data only so that it learns the patterns or representations of the normal training data. At test time, the trained AE is expected to reconstruct normal data well but anomalous data poorly. However, contrary to the expectation, anomalous data is often well re…

    Submitted 17 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: SharedIt link: https://rdcu.be/dGOrh

    Journal ref: Neural Computing and Applications, pp.1-17 (2024)

  7. arXiv:2404.09342  [pdf, other]

    cs.CV cs.SD eess.AS

    Face-voice Association in Multilingual Environments (FAME) Challenge 2024 Evaluation Plan

    Authors: Muhammad Saad Saeed, Shah Nawaz, Muhammad Salman Tahir, Rohan Kumar Das, Muhammad Zaigham Zaheer, Marta Moscati, Markus Schedl, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf

    Abstract: Advances in technology have led to the use of multimodal systems in various real-world applications. Among them, audio-visual systems are among the most widely used. In recent years, associating the face and voice of a person has gained attention due to the presence of a unique correlation between them. The Face-voice Association in Multilingual Environments (FAME) Challenge 2…

    Submitted 22 July, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: ACM Multimedia Conference - Grand Challenge

  8. arXiv:2404.00847  [pdf, other]

    cs.CV cs.LG

    Collaborative Learning of Anomalies with Privacy (CLAP) for Unsupervised Video Anomaly Detection: A New Baseline

    Authors: Anas Al-lahham, Muhammad Zaigham Zaheer, Nurbek Tastan, Karthik Nandakumar

    Abstract: Unsupervised (US) video anomaly detection (VAD) in surveillance applications is gaining more popularity recently due to its practical real-world applications. As surveillance videos are privacy sensitive and the availability of large-scale video data may enable better US-VAD systems, collaborative learning can be highly rewarding in this setting. However, due to the extremely challenging nature of…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: Accepted in IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2024

  9. arXiv:2403.16270  [pdf, other]

    cs.CV

    Constricting Normal Latent Space for Anomaly Detection with Normal-only Training Data

    Authors: Marcella Astrid, Muhammad Zaigham Zaheer, Seung-Ik Lee

    Abstract: In order to devise an anomaly detection model using only normal training data, an autoencoder (AE) is typically trained to reconstruct the data. As a result, the AE can extract normal representations in its latent space. During test time, since the AE is not trained using real anomalies, it is expected to poorly reconstruct the anomalous data. However, several researchers have observed that it is not…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: ICLR Workshop 2024 (PML4LRS)

  10. arXiv:2308.01966  [pdf, other]

    cs.MM cs.CL cs.LG cs.SD eess.AS

    DCTM: Dilated Convolutional Transformer Model for Multimodal Engagement Estimation in Conversation

    Authors: Vu Ngoc Tu, Van Thong Huynh, Hyung-Jeong Yang, M. Zaigham Zaheer, Shah Nawaz, Karthik Nandakumar, Soo-Hyung Kim

    Abstract: Conversational engagement estimation is posed as a regression problem, entailing the identification of the favorable attention and involvement of the participants in the conversation. This task arises as a crucial pursuit to gain insights into human interaction dynamics and behavior patterns within a conversation. In this research, we introduce a dilated convolutional Transformer for modeling an…

    Submitted 31 July, 2023; originally announced August 2023.

    Comments: Accepted in ACMM Grand Challenge

  11. PseudoBound: Limiting the anomaly reconstruction capability of one-class classifiers using pseudo anomalies

    Authors: Marcella Astrid, Muhammad Zaigham Zaheer, Seung-Ik Lee

    Abstract: Due to the rarity of anomalous events, video anomaly detection is typically approached as a one-class classification (OCC) problem. Typically in OCC, an autoencoder (AE) is trained to reconstruct the normal-only training data with the expectation that, at test time, it will poorly reconstruct the anomalous data. However, previous studies have shown that, even trained with only normal data, AEs can of…

    Submitted 19 March, 2023; originally announced March 2023.

    Journal ref: Marcella Astrid, Muhammad Zaigham Zaheer, and Seung-Ik Lee. "PseudoBound: Limiting the Anomaly Reconstruction Capability of One-Class Classifiers Using Pseudo Anomalies". In: Neurocomputing 534 (May 14, 2023), pp. 147-160

  12. arXiv:2303.06129  [pdf, other]

    cs.CV

    Single-branch Network for Multimodal Training

    Authors: Muhammad Saad Saeed, Shah Nawaz, Muhammad Haris Khan, Muhammad Zaigham Zaheer, Karthik Nandakumar, Muhammad Haroon Yousaf, Arif Mahmood

    Abstract: With the rapid growth of social media platforms, users are sharing billions of multimedia posts containing audio, images, and text. Researchers have focused on building autonomous systems capable of processing such multimedia data to solve challenging multimodal tasks including cross-modal retrieval, matching, and verification. Existing works use separate networks to extract embeddings of each mod…

    Submitted 10 March, 2023; originally announced March 2023.

    Comments: Accepted at ICASSP 2023

  13. arXiv:2210.11974  [pdf, other]

    cs.CV

    Face Pyramid Vision Transformer

    Authors: Khawar Islam, Muhammad Zaigham Zaheer, Arif Mahmood

    Abstract: A novel Face Pyramid Vision Transformer (FPVT) is proposed to learn discriminative multi-scale facial representations for face recognition and verification. In FPVT, Face Spatial Reduction Attention (FSRA) and Dimensionality Reduction (FDR) layers are employed to make the feature maps compact, thus reducing the computations. An Improved Patch Embedding (IPE) algorithm is proposed to exploit the…

    Submitted 21 October, 2022; originally announced October 2022.

    Comments: Accepted in BMVC 2022

  14. arXiv:2203.13716  [pdf, other]

    cs.CV

    Stabilizing Adversarially Learned One-Class Novelty Detection Using Pseudo Anomalies

    Authors: Muhammad Zaigham Zaheer, Jin Ha Lee, Arif Mahmood, Marcella Astrid, Seung-Ik Lee

    Abstract: Recently, anomaly scores have been formulated using reconstruction loss of the adversarially learned generators and/or classification loss of discriminators. Unavailability of anomaly examples in the training data makes optimization of such networks challenging. Attributed to the adversarial training, performance of such models fluctuates drastically with each training step, making it difficult to…

    Submitted 25 March, 2022; originally announced March 2022.

    Comments: This work has been submitted to the IEEE Transactions on Image Processing for possible publication

  15. arXiv:2203.13704  [pdf, other]

    cs.CV

    Clustering Aided Weakly Supervised Training to Detect Anomalous Events in Surveillance Videos

    Authors: Muhammad Zaigham Zaheer, Arif Mahmood, Marcella Astrid, Seung-Ik Lee

    Abstract: Formulating learning systems for the detection of real-world anomalous events using only video-level labels is a challenging task mainly due to the presence of noisy labels as well as the rare occurrence of anomalous events in the training data. We propose a weakly supervised anomaly detection system which has multiple contributions including a random batch selection mechanism to reduce inter-batc…

    Submitted 25 March, 2022; originally announced March 2022.

    Comments: This work has been submitted to the IEEE Transactions on Neural Networks and Learning Systems (TNNLS) for possible publication

  16. arXiv:2203.03962  [pdf, other]

    cs.CV

    Generative Cooperative Learning for Unsupervised Video Anomaly Detection

    Authors: Muhammad Zaigham Zaheer, Arif Mahmood, Muhammad Haris Khan, Mattia Segu, Fisher Yu, Seung-Ik Lee

    Abstract: Video anomaly detection is well investigated in weakly-supervised and one-class classification (OCC) settings. However, unsupervised video anomaly detection methods are quite sparse, likely because anomalies are less frequent in occurrence and usually not well-defined, which when coupled with the absence of ground truth supervision, could adversely affect the performance of the learning algorithms…

    Submitted 8 March, 2022; originally announced March 2022.

    Comments: Accepted to the Conference on Computer Vision and Pattern Recognition CVPR 2022

  17. arXiv:2110.09768  [pdf, other]

    cs.CV

    Synthetic Temporal Anomaly Guided End-to-End Video Anomaly Detection

    Authors: Marcella Astrid, Muhammad Zaigham Zaheer, Seung-Ik Lee

    Abstract: Due to the limited availability of anomaly examples, video anomaly detection is often seen as a one-class classification (OCC) problem. A popular way to tackle this problem is by utilizing an autoencoder (AE) trained only on normal data. At test time, the AE is then expected to reconstruct the normal input well while reconstructing the anomalies poorly. However, several studies show that, even with…

    Submitted 19 October, 2021; originally announced October 2021.

    Comments: Published at ICCV Workshops 2021. https://openaccess.thecvf.com/content/ICCV2021W/RSLCV/html/Astrid_Synthetic_Temporal_Anomaly_Guided_End-to-End_Video_Anomaly_Detection_ICCVW_2021_paper.html

  18. arXiv:2110.09742  [pdf, other]

    cs.CV

    Learning Not to Reconstruct Anomalies

    Authors: Marcella Astrid, Muhammad Zaigham Zaheer, Jae-Yeong Lee, Seung-Ik Lee

    Abstract: Video anomaly detection is often seen as a one-class classification (OCC) problem due to the limited availability of anomaly examples. Typically, to tackle this problem, an autoencoder (AE) is trained to reconstruct the input with a training set consisting only of normal data. At test time, the AE is then expected to reconstruct the normal data well while poorly reconstructing the anomalous data. Howe…

    Submitted 24 October, 2021; v1 submitted 19 October, 2021; originally announced October 2021.

    Comments: Accepted in BMVC 2021

  19. arXiv:2105.11058  [pdf, other]

    cs.CV

    Deep Visual Anomaly detection with Negative Learning

    Authors: Jin-Ha Lee, Marcella Astrid, Muhammad Zaigham Zaheer, Seung-Ik Lee

    Abstract: With the increase in the learning capability of deep convolution-based architectures, various applications of such models have been proposed over time. In the field of anomaly detection, improvements in deep learning opened new prospects of exploration for the researchers who tried to automate the labor-intensive features of data collection. First, in terms of data collection, it is impossible to…

    Submitted 23 May, 2021; originally announced May 2021.

  20. arXiv:2104.14770  [pdf, other]

    cs.CV

    Cleaning Label Noise with Clusters for Minimally Supervised Anomaly Detection

    Authors: Muhammad Zaigham Zaheer, Jin-ha Lee, Marcella Astrid, Arif Mahmood, Seung-Ik Lee

    Abstract: Learning to detect real-world anomalous events using video-level annotations is a difficult task mainly because of the noise present in labels. A video labelled as anomalous may actually contain an anomaly only for a short duration while the rest of the video can be normal. In the current work, we formulate a weakly supervised anomaly detection method that is trained using only video-level labels. To th…

    Submitted 30 April, 2021; originally announced April 2021.

    Comments: Presented in the CVPR20 Workshop Learning from Unlabeled Videos. An archival version of this research work, published in SPL, can be accessed at: https://ieeexplore.ieee.org/document/9204830. arXiv admin note: substantial text overlap with arXiv:2008.11887

    Journal ref: Computer Vision and Pattern Recognition Workshops (2020)

  21. arXiv:2011.12077  [pdf, other]

    cs.CV cs.AI

    CLAWS: Clustering Assisted Weakly Supervised Learning with Normalcy Suppression for Anomalous Event Detection

    Authors: Muhammad Zaigham Zaheer, Arif Mahmood, Marcella Astrid, Seung-Ik Lee

    Abstract: Learning to detect real-world anomalous events through video-level labels is a challenging task due to the rare occurrence of anomalies as well as noise in the labels. In this work, we propose a weakly supervised anomaly detection method which has manifold contributions including 1) a random batch based training procedure to reduce inter-batch correlation, 2) a normalcy suppression mechanism to min…

    Submitted 4 August, 2021; v1 submitted 24 November, 2020; originally announced November 2020.

    Comments: Presented at the European Conference on Computer Vision (ECCV) 2020. Changes from the actual paper: 1) Recently published methods have been added in the ShanghaiTech and UCF Crime comparison tabs. 2) Due to an error in arXiv compilation, a few references exceed the paragraph. Also, the word 'normalcy' in the title is misspelled despite being correct in the code. Contents are intact.

  22. A Self-Reasoning Framework for Anomaly Detection Using Video-Level Labels

    Authors: Muhammad Zaigham Zaheer, Arif Mahmood, Hochul Shin, Seung-Ik Lee

    Abstract: Anomalous event detection in surveillance videos is a challenging and practical research problem in the image and video processing community. Compared to the frame-level annotations of anomalous events, obtaining video-level annotations is quite fast and cheap, though such high-level labels may contain significant noise. More specifically, a video labeled as anomalous may actually contain anomaly only…

    Submitted 26 August, 2020; originally announced August 2020.

    Comments: Accepted to the IEEE Signal Processing Letters Journal

  23. arXiv:2004.07657  [pdf, other]

    cs.CV

    Old is Gold: Redefining the Adversarially Learned One-Class Classifier Training Paradigm

    Authors: Muhammad Zaigham Zaheer, Jin-ha Lee, Marcella Astrid, Seung-Ik Lee

    Abstract: A popular method for anomaly detection is to use the generator of an adversarial network to formulate anomaly scores over reconstruction loss of input. Due to the rare occurrence of anomalies, optimizing such networks can be a cumbersome task. Another possible approach is to use both generator and discriminator for anomaly detection. However, attributed to the involvement of adversarial training,…

    Submitted 19 June, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

    Comments: Accepted at the Conference on Computer Vision and Pattern Recognition CVPR 2020. http://openaccess.thecvf.com/content_CVPR_2020/html/Zaheer_Old_Is_Gold_Redefining_the_Adversarially_Learned_One-Class_Classifier_Training_CVPR_2020_paper.html