-
PE-CLIP: A Parameter-Efficient Fine-Tuning of Vision Language Models for Dynamic Facial Expression Recognition
Authors:
Ibtissam Saadi,
Abdenour Hadid,
Douglas W. Cunningham,
Abdelmalik Taleb-Ahmed,
Yassin El Hillali
Abstract:
Vision-Language Models (VLMs) like CLIP offer promising solutions for Dynamic Facial Expression Recognition (DFER) but face challenges such as inefficient full fine-tuning, high complexity, and poor alignment between textual and visual representations. Additionally, existing methods struggle with ineffective temporal modeling. To address these issues, we propose PE-CLIP, a parameter-efficient fine…
▽ More
Vision-Language Models (VLMs) like CLIP offer promising solutions for Dynamic Facial Expression Recognition (DFER) but face challenges such as inefficient full fine-tuning, high complexity, and poor alignment between textual and visual representations. Additionally, existing methods struggle with ineffective temporal modeling. To address these issues, we propose PE-CLIP, a parameter-efficient fine-tuning (PEFT) framework that adapts CLIP for DFER while significantly reducing trainable parameters while maintaining high accuracy. PE-CLIP introduces two specialized adapters: a Temporal Dynamic Adapter (TDA) and a Shared Adapter (ShA). The TDA is a GRU-based module with dynamic scaling that captures sequential dependencies while emphasizing informative temporal features and suppressing irrelevant variations. The ShA is a lightweight adapter that refines representations within both textual and visual encoders, ensuring consistency and efficiency. Additionally, we integrate Multi-modal Prompt Learning (MaPLe), introducing learnable prompts for visual and action unit-based textual inputs, enhancing semantic alignment between modalities and enabling efficient CLIP adaptation for dynamic tasks. We evaluate PE-CLIP on two benchmark datasets, DFEW and FERV39K, achieving competitive performance compared to state-of-the-art methods while requiring fewer trainable parameters. By balancing efficiency and accuracy, PE-CLIP sets a new benchmark in resource-efficient DFER. The source code of the proposed PE-CLIP will be publicly available at https://github.com/Ibtissam-SAADI/PE-CLIP .
△ Less
Submitted 21 March, 2025;
originally announced March 2025.
-
Shuffle Vision Transformer: Lightweight, Fast and Efficient Recognition of Driver Facial Expression
Authors:
Ibtissam Saadi,
Douglas W. Cunningham,
Taleb-ahmed Abdelmalik,
Abdenour Hadid,
Yassin El Hillali
Abstract:
Existing methods for driver facial expression recognition (DFER) are often computationally intensive, rendering them unsuitable for real-time applications. In this work, we introduce a novel transfer learning-based dual architecture, named ShuffViT-DFER, which elegantly combines computational efficiency and accuracy. This is achieved by harnessing the strengths of two lightweight and efficient mod…
▽ More
Existing methods for driver facial expression recognition (DFER) are often computationally intensive, rendering them unsuitable for real-time applications. In this work, we introduce a novel transfer learning-based dual architecture, named ShuffViT-DFER, which elegantly combines computational efficiency and accuracy. This is achieved by harnessing the strengths of two lightweight and efficient models using convolutional neural network (CNN) and vision transformers (ViT). We efficiently fuse the extracted features to enhance the performance of the model in accurately recognizing the facial expressions of the driver. Our experimental results on two benchmarking and public datasets, KMU-FED and KDEF, highlight the validity of our proposed method for real-time application with superior performance when compared to state-of-the-art methods.
△ Less
Submitted 5 September, 2024;
originally announced September 2024.
-
Perception of Line Attributes for Visualization
Authors:
Anna Sterzik,
Nils Lichtenberg,
Jana Wilms,
Michael Krone,
Douglas W. Cunningham,
Kai Lawonn
Abstract:
Line attributes such as width and dashing are commonly used to encode information. However, many questions on the perception of line attributes remain, such as how many levels of attribute variation can be distinguished or which line attributes are the preferred choices for which tasks. We conducted three studies to develop guidelines for using stylized lines to encode scalar data. In our first st…
▽ More
Line attributes such as width and dashing are commonly used to encode information. However, many questions on the perception of line attributes remain, such as how many levels of attribute variation can be distinguished or which line attributes are the preferred choices for which tasks. We conducted three studies to develop guidelines for using stylized lines to encode scalar data. In our first study, participants drew stylized lines to encode uncertainty information. Uncertainty is usually visualized alongside other data. Therefore, alternative visual channels are important for the visualization of uncertainty. Additionally, uncertainty -- e.g., in weather forecasts -- is a familiar topic to most people. Thus, we picked it for our visualization scenarios in study 1. We used the results of our study to determine the most common line attributes for drawing uncertainty: Dashing, luminance, wave amplitude, and width. While those line attributes were especially common for drawing uncertainty, they are also commonly used in other areas. In studies 2 and 3, we investigated the discriminability of the line attributes determined in study 1. Studies 2 and 3 did not require specific application areas; thus, their results apply to visualizing any scalar data in line attributes. We evaluated the just-noticeable differences (JND) and derived recommendations for perceptually distinct line levels. We found that participants could discriminate considerably more levels for the line attribute width than for wave amplitude, dashing, or luminance.
△ Less
Submitted 8 November, 2023; v1 submitted 8 August, 2023;
originally announced August 2023.
-
Perceptually Uniform Construction of Illustrative Textures
Authors:
Anna Sterzik,
Monique Meuschke,
Douglas W. Cunningham,
Kai Lawonn
Abstract:
Illustrative textures, such as stippling or hatching, were predominantly used as an alternative to conventional Phong rendering. Recently, the potential of encoding information on surfaces or maps using different densities has also been recognized. This has the significant advantage that additional color can be used as another visual channel and the illustrative textures can then be overlaid. Effe…
▽ More
Illustrative textures, such as stippling or hatching, were predominantly used as an alternative to conventional Phong rendering. Recently, the potential of encoding information on surfaces or maps using different densities has also been recognized. This has the significant advantage that additional color can be used as another visual channel and the illustrative textures can then be overlaid. Effectively, it is thus possible to display multiple information, such as two different scalar fields on surfaces simultaneously. In previous work, these textures were manually generated and the choice of density was unempirically determined. Here, we first want to determine and understand the perceptual space of illustrative textures. We chose a succession of simplices with increasing dimensions as primitives for our textures: Dots, lines, and triangles. Thus, we explore the texture types of stippling, hatching, and triangles. We create a range of textures by sampling the density space uniformly. Then, we conduct three perceptual studies in which the participants performed pairwise comparisons for each texture type. We use multidimensional scaling (MDS) to analyze the perceptual spaces per category. The perception of stippling and triangles seems relatively similar. Both are adequately described by a 1D manifold in 2D space. The perceptual space of hatching consists of two main clusters: Crosshatched textures, and textures with only one hatching direction. However, the perception of hatching textures with only one hatching direction is similar to the perception of stippling and triangles. Based on our findings, we construct perceptually uniform illustrative textures. Afterwards, we provide concrete application examples for the constructed textures.
△ Less
Submitted 8 November, 2023; v1 submitted 7 August, 2023;
originally announced August 2023.