Search | arXiv e-print repository

A survey on Graph Deep Representation Learning for Facial Expression Recognition

Authors: Théo Gueuret, Akrem Sellami, Chaabane Djeraba

Abstract: This comprehensive review delves deeply into the various methodologies applied to facial expression recognition (FER) through the lens of graph representation learning (GRL). Initially, we introduce the task of FER and the concepts of graph representation and GRL. Afterward, we discuss some of the most prevalent and valuable databases for this task. We explore promising approaches for graph representation in FER, including graph diffusion, spatio-temporal graphs, and multi-stream architectures. Finally, we identify future research opportunities and provide concluding remarks.

Submitted 13 November, 2024; originally announced November 2024.

arXiv:2304.10211 [pdf, other]

Spiking-Fer: Spiking Neural Network for Facial Expression Recognition With Event Cameras

Authors: Sami Barchid, Benjamin Allaert, Amel Aissaoui, José Mennesson, Chaabane Djéraba

Abstract: Facial Expression Recognition (FER) is an active research domain that has shown great progress recently, notably thanks to the use of large deep learning models. However, such approaches are particularly energy intensive, which makes their deployment difficult for edge devices. To address this issue, Spiking Neural Networks (SNNs) coupled with event cameras are a promising alternative, capable of processing sparse and asynchronous events with lower energy consumption. In this paper, we establish the first use of event cameras for FER, named "Event-based FER", and propose the first related benchmarks by converting popular video FER datasets to event streams. To deal with this new task, we propose "Spiking-FER", a deep convolutional SNN model, and compare it against a similar Artificial Neural Network (ANN). Experiments show that the proposed approach achieves comparable performance to the ANN architecture, while consuming less energy by orders of magnitude (up to 65.39x). In addition, an experimental study of various event-based data augmentation techniques is performed to provide insights into the efficient transformations specific to event-based FER.

Submitted 20 April, 2023; originally announced April 2023.

arXiv:2206.06506 [pdf, other]

Spiking Neural Networks for Frame-based and Event-based Single Object Localization

Authors: Sami Barchid, José Mennesson, Jason Eshraghian, Chaabane Djéraba, Mohammed Bennamoun

Abstract: Spiking neural networks have shown much promise as an energy-efficient alternative to artificial neural networks. However, understanding the impacts of sensor noises and input encodings on the network activity and performance remains difficult with common neuromorphic vision baselines like classification. Therefore, we propose a spiking neural network approach for single object localization trained using surrogate gradient descent, for frame- and event-based sensors. We compare our method with similar artificial neural networks and show that our model has competitive/better performance in accuracy, robustness against various corruptions, and has lower energy consumption. Moreover, we study the impact of neural coding schemes for static images in accuracy, robustness, and energy efficiency. Our observations differ importantly from previous studies on bio-plausible learning rules, which helps in the design of surrogate gradient trained architectures, and offers insight to design priorities in future neuromorphic technologies in terms of noise characteristics and data encoding methods.

Submitted 13 June, 2022; originally announced June 2022.

Comments: 21 pages, 12 figures

arXiv:2202.13662 [pdf, other]

Bina-Rep Event Frames: a Simple and Effective Representation for Event-based cameras

Authors: Sami Barchid, José Mennesson, Chaabane Djéraba

Abstract: This paper presents "Bina-Rep", a simple representation method that converts asynchronous streams of events from event cameras to a sequence of sparse and expressive event frames. By representing multiple binary event images as a single frame of $N$-bit numbers, our method is able to obtain sparser and more expressive event frames thanks to the retained information about event orders in the original stream. Coupled with our proposed model based on a convolutional neural network, the reported results achieve state-of-the-art performance and repeatedly outperform other common event representation methods. Our approach also shows competitive robustness against common image corruptions, compared to other representation techniques.

Submitted 28 February, 2022; originally announced February 2022.

arXiv:2106.13992 [pdf]

Mining atmospheric data

Authors: Chaabane Djeraba, Jérôme Riedi

Abstract: This paper overviews two interdependent issues important for mining remote sensing data (e.g. images) obtained from atmospheric monitoring missions. The first issue relates to building new public datasets and benchmarks, which are a high priority for the remote sensing community. The second issue is the investigation of deep learning methodologies for atmospheric data classification based on a vast amount of data without annotations and with localized annotated data provided by sparse observing networks at the surface. The targeted application is air quality assessment and prediction. Air quality is defined as the pollution level linked with several atmospheric constituents such as gases and aerosols. There are dependency relationships between bad air quality, caused by air pollution, and public health. The target application is the development of a fast prediction model for local and regional air quality assessment and tracking. The results of mining data will have significant implications for citizens and decision makers by providing a fast prediction and reliable air quality monitoring system able to cover the local and regional scale through intelligent extrapolation of sparse ground-based in situ measurement networks.

Submitted 26 June, 2021; originally announced June 2021.

Comments: 5 pages, 1 figure

arXiv:2105.11925 [pdf, other]

Review on Indoor RGB-D Semantic Segmentation with Deep Convolutional Neural Networks

Authors: Sami Barchid, José Mennesson, Chaabane Djéraba

Abstract: Many research works focus on leveraging the complementary geometric information of indoor depth sensors in vision tasks performed by deep convolutional neural networks, notably semantic segmentation. These works deal with a specific vision task known as "RGB-D Indoor Semantic Segmentation". The challenges and resulting solutions of this task differ from its standard RGB counterpart. This results in a new active research topic. The objective of this paper is to introduce the field of Deep Convolutional Neural Networks for RGB-D Indoor Semantic Segmentation. This review presents the most popular public datasets, proposes a categorization of the strategies employed by recent contributions, evaluates the performance of the current state-of-the-art, and discusses the remaining challenges and promising directions for future works.

Submitted 25 May, 2021; originally announced May 2021.

arXiv:2105.05609 [pdf, other]

Deep Spiking Convolutional Neural Network for Single Object Localization Based On Deep Continuous Local Learning

Authors: Sami Barchid, José Mennesson, Chaabane Djéraba

Abstract: With the advent of neuromorphic hardware, spiking neural networks can be a good energy-efficient alternative to artificial neural networks. However, the use of spiking neural networks to perform computer vision tasks remains limited, mainly focusing on simple tasks such as digit recognition. It remains hard to deal with more complex tasks (e.g. segmentation, object detection) due to the small number of works on deep spiking neural networks for these tasks. The objective of this paper is to make the first step towards modern computer vision with supervised spiking neural networks. We propose a deep convolutional spiking neural network for the localization of a single object in a grayscale image. We propose a network based on DECOLLE, a spiking model that enables local surrogate gradient-based learning. The encouraging results reported on Oxford-IIIT-Pet validate the exploitation of spiking neural networks with a supervised learning approach for more elaborate vision tasks in the future.

Submitted 12 May, 2021; originally announced May 2021.

arXiv:2012.13217 [pdf, other]

doi 10.1109/TIP.2021.3129120

Dynamic Facial Expression Recognition under Partial Occlusion with Optical Flow Reconstruction

Authors: Delphine Poux, Benjamin Allaert, Nacim Ihaddadene, Ioan Marius Bilasco, Chaabane Djeraba, Mohammed Bennamoun

Abstract: Video facial expression recognition is useful for many applications and has received much interest lately. Although some solutions give really good results in a controlled environment (no occlusion), recognition in the presence of partial facial occlusion remains a challenging task. To handle occlusions, solutions based on the reconstruction of the occluded part of the face have been proposed. These solutions are mainly based on the texture or the geometry of the face. However, the similarity of the face movement between different persons doing the same expression seems to be a real asset for the reconstruction. In this paper we exploit this asset and propose a new solution based on an auto-encoder with skip connections to reconstruct the occluded part of the face in the optical flow domain. To the best of our knowledge, this is the first proposition to directly reconstruct the movement for facial expression recognition. We validated our approach in the controlled dataset CK+ on which different occlusions were generated. Our experiments show that the proposed method significantly reduces the gap, in terms of recognition accuracy, between occluded and non-occluded situations. We also compare our approach with existing state-of-the-art solutions. In order to lay the basis of a reproducible and fair comparison in the future, we also propose a new experimental protocol that includes occlusion generation and reconstruction evaluation.

Submitted 24 December, 2020; originally announced December 2020.

arXiv:1905.10784 [pdf, other]

doi 10.1109/TAFFC.2021.3124142

Impact of facial landmark localization on facial expression recognition

Authors: Romain Belmonte, Benjamin Allaert, Pierre Tirilly, Ioan Marius Bilasco, Chaabane Djeraba, Nicu Sebe

Abstract: Although facial landmark localization (FLL) approaches are becoming increasingly accurate for characterizing facial regions, one question remains unanswered: what is the impact of these approaches on subsequent related tasks? In this paper, the focus is put on facial expression recognition (FER), where facial landmarks are used for face registration, which is a common usage. Since the most used datasets for facial landmark localization do not allow for a proper measurement of performance according to the different difficulties (e.g., pose, expression, illumination, occlusion, motion blur), we also quantify the performance of recent approaches in the presence of head pose variations and facial expressions. Finally, a study of the impact of these approaches on FER is conducted. We show that the landmark accuracy achieved so far optimizing the conventional Euclidean distance does not necessarily guarantee a gain in performance for FER. To deal with this issue, we propose a new evaluation metric for FLL adapted to FER.

Submitted 19 July, 2021; v1 submitted 26 May, 2019; originally announced May 2019.

arXiv:1904.13154 [pdf, other]

doi 10.1007/s11042-020-08993-5

Facial Expressions Analysis Under Occlusions Based on Specificities of Facial Motion Propagation

Authors: Delphine Poux, Benjamin Allaert, Jose Mennesson, Nacim Ihaddadene, Ioan Marius Bilasco, Chaabane Djeraba

Abstract: Although much progress has been made in the facial expression analysis field, facial occlusions are still challenging. The main innovation brought by this contribution consists in exploiting the specificities of facial movement propagation for recognizing expressions in the presence of important occlusions. The movement induced by an expression extends beyond the movement epicenter. Thus, the movement occurring in an occluded region propagates towards neighboring visible regions. In the presence of occlusions, per expression, we compute the importance of each unoccluded facial region and we construct adapted facial frameworks that boost the performance of per-expression binary classifiers. The output of each expression-dependent binary classifier is then aggregated and fed into a fusion process that aims at constructing, per occlusion, a unique model that recognizes all the facial expressions considered. The evaluations highlight the robustness of this approach in the presence of significant facial occlusions.

Submitted 30 April, 2019; originally announced April 2019.

arXiv:1904.11592 [pdf, other]

doi 10.1016/j.neucom.2022.05.077

Optical Flow Techniques for Facial Expression Analysis -- a Practical Evaluation Study

Authors: Benjamin Allaert, Isaac Ronald Ward, Ioan Marius Bilasco, Chaabane Djeraba, Mohammed Bennamoun

Abstract: Optical flow techniques are becoming increasingly performant and robust when estimating motion in a scene, but their performance has yet to be proven in the area of facial expression recognition. In this work, a variety of optical flow approaches are evaluated across multiple facial expression datasets, so as to provide a consistent performance evaluation. The aim of this work is not to propose a new expression recognition technique, but to understand better the adequacy of existing state-of-the-art optical flow for encoding facial motion in the context of facial expression recognition. Our evaluations highlight the fact that motion approximation methods used to overcome motion discontinuities have a significant impact when optical flows are used to characterize facial expressions.

Submitted 31 January, 2022; v1 submitted 25 April, 2019; originally announced April 2019.

arXiv:1805.01951 [pdf, other]

doi 10.1109/TAFFC.2019.2949559

Advanced local motion patterns for macro and micro facial expression recognition

Authors: B. Allaert, IM. Bilasco, C. Djeraba

Abstract: In this paper, we develop a new method that recognizes facial expressions, on the basis of an innovative local motion patterns feature, with three main contributions. The first one is the analysis of the face skin temporal elasticity and face deformations during expression. The second one is a unified approach for both macro and micro expression recognition. And, the third one is the step forward towards in-the-wild expression recognition, dealing with challenges such as various intensity and various expression activation patterns, illumination variation and small head pose variations. Our method outperforms state-of-the-art methods for micro expression recognition and positions itself among top-rank state-of-the-art methods for macro expression recognition.

Submitted 4 May, 2018; originally announced May 2018.

Showing 1–12 of 12 results for author: Djeraba, C