Showing 1–15 of 15 results for author: Behzad, M

  1. arXiv:2506.01203  [pdf, ps, other]

    cs.CV

    Self-Supervised Multi-View Representation Learning using Vision-Language Model for 3D/4D Facial Expression Recognition

    Authors: Muzammil Behzad

    Abstract: Facial expression recognition (FER) is a fundamental task in affective computing with applications in human-computer interaction, mental health analysis, and behavioral understanding. In this paper, we propose SMILE-VLM, a self-supervised vision-language model for 3D/4D FER that unifies multiview visual representation learning with natural language supervision. SMILE-VLM learns robust, semanticall…

    Submitted 1 June, 2025; originally announced June 2025.

  2. arXiv:2505.23358  [pdf, ps, other]

    cs.CV

    Beam-Guided Knowledge Replay for Knowledge-Rich Image Captioning using Vision-Language Model

    Authors: Reem AlJunaid, Muzammil Behzad

    Abstract: Generating informative and knowledge-rich image captions remains a challenge for many existing captioning models, which often produce generic descriptions that lack specificity and contextual depth. To address this limitation, we propose KRCapVLM, a knowledge replay-based novel image captioning framework using vision-language model. We incorporate beam search decoding to generate more diverse and…

    Submitted 29 May, 2025; originally announced May 2025.

  3. arXiv:2505.19895  [pdf, ps, other]

    cs.CV

    Underwater Diffusion Attention Network with Contrastive Language-Image Joint Learning for Underwater Image Enhancement

    Authors: Afrah Shaahid, Muzammil Behzad

    Abstract: Underwater images are often affected by complex degradations such as light absorption, scattering, color casts, and artifacts, making enhancement critical for effective object detection, recognition, and scene understanding in aquatic environments. Existing methods, especially diffusion-based approaches, typically rely on synthetic paired datasets due to the scarcity of real underwater references,…

    Submitted 26 May, 2025; originally announced May 2025.

  4. arXiv:2505.19242  [pdf, ps, other]

    cs.CV

    Deformable Attentive Visual Enhancement for Referring Segmentation Using Vision-Language Model

    Authors: Alaa Dalaq, Muzammil Behzad

    Abstract: Image segmentation is a fundamental task in computer vision, aimed at partitioning an image into semantically meaningful regions. Referring image segmentation extends this task by using natural language expressions to localize specific objects, requiring effective integration of visual and linguistic information. In this work, we propose SegVLM, a vision-language model that incorporates architectu…

    Submitted 25 May, 2025; originally announced May 2025.

  5. arXiv:2505.10672  [pdf, ps, other]

    eess.IV cs.CV

    MOSAIC: A Multi-View 2.5D Organ Slice Selector with Cross-Attentional Reasoning for Anatomically-Aware CT Localization in Medical Organ Segmentation

    Authors: Hania Ghouse, Muzammil Behzad

    Abstract: Efficient and accurate multi-organ segmentation from abdominal CT volumes is a fundamental challenge in medical image analysis. Existing 3D segmentation approaches are computationally and memory intensive, often processing entire volumes that contain many anatomically irrelevant slices. Meanwhile, 2D methods suffer from class imbalance and lack cross-view contextual awareness. To address these lim…

    Submitted 15 May, 2025; originally announced May 2025.

  6. arXiv:2505.09336  [pdf, ps, other]

    cs.CV

    Unsupervised Multiview Contrastive Language-Image Joint Learning with Pseudo-Labeled Prompts Via Vision-Language Model for 3D/4D Facial Expression Recognition

    Authors: Muzammil Behzad

    Abstract: In this paper, we introduce MultiviewVLM, a vision-language model designed for unsupervised contrastive multiview representation learning of facial emotions from 3D/4D data. Our architecture integrates pseudo-labels derived from generated textual prompts to guide implicit alignment of emotional semantics. To capture shared information across multi-views, we propose a joint embedding space that ali…

    Submitted 14 May, 2025; originally announced May 2025.

  7. arXiv:2504.19739  [pdf, other]

    cs.CV

    Contrastive Language-Image Learning with Augmented Textual Prompts for 3D/4D FER Using Vision-Language Model

    Authors: Muzammil Behzad, Guoying Zhao

    Abstract: In this paper, we introduce AffectVLM, a vision-language model designed to integrate multiviews for a semantically rich and visually comprehensive understanding of facial emotions from 3D/4D data. To effectively capture visual features, we propose a joint representation learning framework paired with a novel gradient-friendly loss function that accelerates model convergence towards optimal feature…

    Submitted 28 April, 2025; originally announced April 2025.

  8. arXiv:2002.03157  [pdf, other]

    cs.CV

    Towards Reading Beyond Faces for Sparsity-Aware 4D Affect Recognition

    Authors: Muzammil Behzad, Nhat Vo, Xiaobai Li, Guoying Zhao

    Abstract: In this paper, we present a sparsity-aware deep network for automatic 4D facial expression recognition (FER). Given 4D data, we first propose a novel augmentation method to combat the data limitation problem for deep learning. This is achieved by projecting the input data into RGB and depth map images and then iteratively performing randomized channel concatenation. Encoded in the given 3D landmar…

    Submitted 19 August, 2020; v1 submitted 8 February, 2020; originally announced February 2020.

  9. arXiv:1910.05445  [pdf, other]

    cs.CV

    Landmarks-assisted Collaborative Deep Framework for Automatic 4D Facial Expression Recognition

    Authors: Muzammil Behzad, Nhat Vo, Xiaobai Li, Guoying Zhao

    Abstract: We propose a novel landmarks-assisted collaborative end-to-end deep framework for automatic 4D FER. Using 4D face scan data, we calculate its various geometrical images, and afterwards use rank pooling to generate their dynamic images encapsulating important facial muscle movements over time. As well, the given 3D landmarks are projected on a 2D plane as binary images and convolutional layers are…

    Submitted 7 February, 2020; v1 submitted 11 October, 2019; originally announced October 2019.

    Comments: Published in 15th IEEE International Conference on Automatic Face and Gesture Recognition

  10. arXiv:1905.02319  [pdf, other]

    cs.CV

    Automatic 4D Facial Expression Recognition via Collaborative Cross-domain Dynamic Image Network

    Authors: Muzammil Behzad, Nhat Vo, Xiaobai Li, Guoying Zhao

    Abstract: This paper proposes a novel 4D Facial Expression Recognition (FER) method using Collaborative Cross-domain Dynamic Image Network (CCDN). Given a 4D data of face scans, we first compute its geometrical images, and then combine their correlated information in the proposed cross-domain image representations. The acquired set is then used to generate cross-domain dynamic images (CDI) via rank pooling…

    Submitted 7 February, 2020; v1 submitted 6 May, 2019; originally announced May 2019.

    Comments: Published in the 30th British Machine Vision Conference (BMVC) 2019

  11. arXiv:1806.09980  [pdf, other]

    eess.SP cs.CV cs.NI eess.IV

    Toward Performance Optimization in IoT-based Next-Gen Wireless Sensor Networks

    Authors: Muzammil Behzad, Manal Abdullah, Muhammad Talal Hassan, Yao Ge, Mahmood Ashraf Khan

    Abstract: In this paper, we propose a novel framework for performance optimization in Internet of Things (IoT)-based next-generation wireless sensor networks. In particular, a computationally-convenient system is presented to combat two major research problems in sensor networks. First is the conventionally-tackled resource optimization problem which triggers the drainage of battery at a faster rate within…

    Submitted 23 June, 2018; originally announced June 2018.

    Comments: 45 pages, 22 figures, pending article. arXiv admin note: substantial text overlap with arXiv:1712.04259

  12. arXiv:1805.00472  [pdf, other]

    cs.CV

    Image Denoising via Collaborative Dual-Domain Patch Filtering

    Authors: Muzammil Behzad

    Abstract: In this paper, we propose a novel image denoising algorithm exploiting features from both spatial as well as transformed domain. We implement intensity-invariance based improved grouping for collaborative support-agnostic sparse reconstruction. For collaboration firstly, we stack similar-structured patches via intensity-invariant correlation measure. The grouped patches collaborate to yield desira…

    Submitted 1 May, 2018; originally announced May 2018.

    Comments: 14 pages, 14 figures, 4 tables, article pending

  13. arXiv:1804.00898  [pdf]

    cs.NI

    M-BEHZAD: Minimum distance Based Energy efficiency using Hemisphere Zoning with Advanced Divide-and-Rule Scheme for Wireless Sensor Networks

    Authors: Muzammil Behzad

    Abstract: Routing Protocols are engaged in a vigorous fashion to boost up energy efficiency in WSNs. In this paper, we propose a novel routing protocol; Minimum distance Based Energy efficiency using Hemisphere Zoning with Advanced Divide-and-Rule scheme (M-BEHZAD), to maximize network lifespan, throughput and stability period of the sensors deployed in an un-attended network zone. To accomplish these objec…

    Submitted 3 April, 2018; originally announced April 2018.

    Comments: Presented in 6th Annual Students Scientific Forum by King Fahd University of Petroleum and Minerals, Saudi Arabia

  14. arXiv:1712.04259  [pdf, other]

    cs.NI

    Layer-Adaptive Communication and Collaborative Transformed-Domain Representations for Performance Optimization in WSNs

    Authors: Muzammil Behzad, Manal Abdullah, Muhammad Talal Hassan, Yao Ge, Mahmood Ashraf Khan

    Abstract: In this paper, we combat the problem of performance optimization in wireless sensor networks. Specifically, a novel framework is proposed to handle two major research issues. Firstly, we optimize the utilization of resources available to various nodes at hand. This is achieved via proposed optimal network clustering enriched with layer-adaptive 3-tier communication mechanism to diminish energy hol…

    Submitted 12 December, 2017; originally announced December 2017.

    Comments: Submitted to the 32nd IEEE International Conference on Advanced Information Networking and Applications (IEEE AINA-2018)

  15. arXiv:1609.02932  [pdf, other]

    cs.CV

    Image Denoising Via Collaborative Support-Agnostic Recovery

    Authors: Muzammil Behzad, Mudassir Masood, Tarig Ballal, Maha Shadaydeh, Tareq Y. Al-Naffouri

    Abstract: In this paper, we propose a novel image denoising algorithm using collaborative support-agnostic sparse reconstruction. An observed image is first divided into patches. Similarly structured patches are grouped together to be utilized for collaborative processing. In the proposed collaborative schemes, similar patches are assumed to share the same support taps. For sparse reconstruction, the likeli…

    Submitted 9 September, 2016; originally announced September 2016.