Search | arXiv e-print repository

PE-CLIP: A Parameter-Efficient Fine-Tuning of Vision Language Models for Dynamic Facial Expression Recognition

Authors: Ibtissam Saadi, Abdenour Hadid, Douglas W. Cunningham, Abdelmalik Taleb-Ahmed, Yassin El Hillali

Abstract: Vision-Language Models (VLMs) like CLIP offer promising solutions for Dynamic Facial Expression Recognition (DFER) but face challenges such as inefficient full fine-tuning, high complexity, and poor alignment between textual and visual representations. Additionally, existing methods struggle with ineffective temporal modeling. To address these issues, we propose PE-CLIP, a parameter-efficient fine… ▽ More Vision-Language Models (VLMs) like CLIP offer promising solutions for Dynamic Facial Expression Recognition (DFER) but face challenges such as inefficient full fine-tuning, high complexity, and poor alignment between textual and visual representations. Additionally, existing methods struggle with ineffective temporal modeling. To address these issues, we propose PE-CLIP, a parameter-efficient fine-tuning (PEFT) framework that adapts CLIP for DFER while significantly reducing trainable parameters while maintaining high accuracy. PE-CLIP introduces two specialized adapters: a Temporal Dynamic Adapter (TDA) and a Shared Adapter (ShA). The TDA is a GRU-based module with dynamic scaling that captures sequential dependencies while emphasizing informative temporal features and suppressing irrelevant variations. The ShA is a lightweight adapter that refines representations within both textual and visual encoders, ensuring consistency and efficiency. Additionally, we integrate Multi-modal Prompt Learning (MaPLe), introducing learnable prompts for visual and action unit-based textual inputs, enhancing semantic alignment between modalities and enabling efficient CLIP adaptation for dynamic tasks. We evaluate PE-CLIP on two benchmark datasets, DFEW and FERV39K, achieving competitive performance compared to state-of-the-art methods while requiring fewer trainable parameters. By balancing efficiency and accuracy, PE-CLIP sets a new benchmark in resource-efficient DFER. The source code of the proposed PE-CLIP will be publicly available at https://github.com/Ibtissam-SAADI/PE-CLIP . △ Less

Submitted 21 March, 2025; originally announced March 2025.

arXiv:2409.03438 [pdf, other]

Shuffle Vision Transformer: Lightweight, Fast and Efficient Recognition of Driver Facial Expression

Authors: Ibtissam Saadi, Douglas W. Cunningham, Taleb-ahmed Abdelmalik, Abdenour Hadid, Yassin El Hillali

Abstract: Existing methods for driver facial expression recognition (DFER) are often computationally intensive, rendering them unsuitable for real-time applications. In this work, we introduce a novel transfer learning-based dual architecture, named ShuffViT-DFER, which elegantly combines computational efficiency and accuracy. This is achieved by harnessing the strengths of two lightweight and efficient mod… ▽ More Existing methods for driver facial expression recognition (DFER) are often computationally intensive, rendering them unsuitable for real-time applications. In this work, we introduce a novel transfer learning-based dual architecture, named ShuffViT-DFER, which elegantly combines computational efficiency and accuracy. This is achieved by harnessing the strengths of two lightweight and efficient models using convolutional neural network (CNN) and vision transformers (ViT). We efficiently fuse the extracted features to enhance the performance of the model in accurately recognizing the facial expressions of the driver. Our experimental results on two benchmarking and public datasets, KMU-FED and KDEF, highlight the validity of our proposed method for real-time application with superior performance when compared to state-of-the-art methods. △ Less

Submitted 5 September, 2024; originally announced September 2024.

Comments: Accepted for publication in The 6th IEEE International Conference on Artificial Intelligence Circuits and Systems (IEEE AICAS 2024), 5 pages, 3 figures

arXiv:2308.04150 [pdf, other]

doi 10.1109/TVCG.2023.3326523

Perception of Line Attributes for Visualization

Authors: Anna Sterzik, Nils Lichtenberg, Jana Wilms, Michael Krone, Douglas W. Cunningham, Kai Lawonn

Abstract: Line attributes such as width and dashing are commonly used to encode information. However, many questions on the perception of line attributes remain, such as how many levels of attribute variation can be distinguished or which line attributes are the preferred choices for which tasks. We conducted three studies to develop guidelines for using stylized lines to encode scalar data. In our first st… ▽ More Line attributes such as width and dashing are commonly used to encode information. However, many questions on the perception of line attributes remain, such as how many levels of attribute variation can be distinguished or which line attributes are the preferred choices for which tasks. We conducted three studies to develop guidelines for using stylized lines to encode scalar data. In our first study, participants drew stylized lines to encode uncertainty information. Uncertainty is usually visualized alongside other data. Therefore, alternative visual channels are important for the visualization of uncertainty. Additionally, uncertainty -- e.g., in weather forecasts -- is a familiar topic to most people. Thus, we picked it for our visualization scenarios in study 1. We used the results of our study to determine the most common line attributes for drawing uncertainty: Dashing, luminance, wave amplitude, and width. While those line attributes were especially common for drawing uncertainty, they are also commonly used in other areas. In studies 2 and 3, we investigated the discriminability of the line attributes determined in study 1. Studies 2 and 3 did not require specific application areas; thus, their results apply to visualizing any scalar data in line attributes. We evaluated the just-noticeable differences (JND) and derived recommendations for perceptually distinct line levels. We found that participants could discriminate considerably more levels for the line attribute width than for wave amplitude, dashing, or luminance. △ Less

Submitted 8 November, 2023; v1 submitted 8 August, 2023; originally announced August 2023.

Comments: 11 pages, 9 figures, to be published in IEEE Transactions on Visualization and Computer Graphics The supplemental material for this paper is available at https://www.doi.org/10.17605/OSF.IO/7SCF8

arXiv:2308.03644 [pdf, other]

doi 10.1109/TVCG.2023.3326574

Perceptually Uniform Construction of Illustrative Textures

Authors: Anna Sterzik, Monique Meuschke, Douglas W. Cunningham, Kai Lawonn

Abstract: Illustrative textures, such as stippling or hatching, were predominantly used as an alternative to conventional Phong rendering. Recently, the potential of encoding information on surfaces or maps using different densities has also been recognized. This has the significant advantage that additional color can be used as another visual channel and the illustrative textures can then be overlaid. Effe… ▽ More Illustrative textures, such as stippling or hatching, were predominantly used as an alternative to conventional Phong rendering. Recently, the potential of encoding information on surfaces or maps using different densities has also been recognized. This has the significant advantage that additional color can be used as another visual channel and the illustrative textures can then be overlaid. Effectively, it is thus possible to display multiple information, such as two different scalar fields on surfaces simultaneously. In previous work, these textures were manually generated and the choice of density was unempirically determined. Here, we first want to determine and understand the perceptual space of illustrative textures. We chose a succession of simplices with increasing dimensions as primitives for our textures: Dots, lines, and triangles. Thus, we explore the texture types of stippling, hatching, and triangles. We create a range of textures by sampling the density space uniformly. Then, we conduct three perceptual studies in which the participants performed pairwise comparisons for each texture type. We use multidimensional scaling (MDS) to analyze the perceptual spaces per category. The perception of stippling and triangles seems relatively similar. Both are adequately described by a 1D manifold in 2D space. The perceptual space of hatching consists of two main clusters: Crosshatched textures, and textures with only one hatching direction. However, the perception of hatching textures with only one hatching direction is similar to the perception of stippling and triangles. Based on our findings, we construct perceptually uniform illustrative textures. Afterwards, we provide concrete application examples for the constructed textures. △ Less

Submitted 8 November, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

Comments: 11 pages, 15 figures, to be published in IEEE Transactions on Visualization and Computer Graphics The supplementary material is available at https://www.doi.org/10.17605/OSF.IO/43C6V

Showing 1–4 of 4 results for author: Cunningham, D W