-
Mapping Emotions in the Brain: A Bi-Hemispheric Neural Model with Explainable Deep Learning
Authors:
David Freire-Obregón,
Agnieszka Dubiel,
Prasoon Kumar Vinodkumar,
Gholamreza Anbarjafari,
Dorota Kamińska,
Modesto Castrillón-Santana
Abstract:
Recent advances have shown promise in emotion recognition from electroencephalogram (EEG) signals by employing bi-hemispheric neural architectures that incorporate neuroscientific priors into deep learning models. However, interpretability remains a significant limitation for their application in sensitive fields such as affective computing and cognitive modeling. In this work, we introduce a post-hoc interpretability framework tailored to dual-stream EEG classifiers, extending the Local Interpretable Model-Agnostic Explanations (LIME) approach to accommodate structured, bi-hemispheric inputs. Specifically, our method handles two-branch inputs corresponding to left- and right-hemisphere EEG channel groups and decomposes prediction relevance into per-channel contributions across hemispheres and emotional classes. We apply this framework to a previously validated dual-branch recurrent neural network trained on EmoNeuroDB, a dataset of EEG recordings captured during a VR-based emotion elicitation task. The resulting explanations reveal emotion-specific hemispheric activation patterns consistent with known neurophysiological phenomena, such as frontal lateralization in joy and posterior asymmetry in sadness. Furthermore, we aggregate local explanations across samples to derive global channel importance profiles, enabling a neurophysiologically grounded interpretation of the model's decisions. Correlation analysis between symmetric electrodes further highlights the model's emotion-dependent lateralization behavior, supporting the functional asymmetries reported in affective neuroscience.
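The idea of adapting LIME to a two-branch input can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes one on/off bit per EEG channel across both hemispheres, perturbs by zeroing "off" channels, weights samples with an exponential kernel on cosine distance, and fits a weighted ridge surrogate whose coefficients act as per-channel relevance. The function name `bihemispheric_lime`, the `predict(left_batch, right_batch)` interface, and all default parameters are hypothetical.

```python
import numpy as np


def bihemispheric_lime(predict, left, right, target, n_samples=500,
                       kernel_width=0.25, ridge=1.0, seed=0):
    """LIME-style per-channel relevance for a two-branch (left/right) classifier.

    predict: callable(left_batch, right_batch) -> (batch, n_classes) scores.
    left, right: (channels, time) arrays for each hemisphere's channel group.
    Returns (left_relevance, right_relevance), one coefficient per channel.
    """
    rng = np.random.default_rng(seed)
    n_l, n_r = left.shape[0], right.shape[0]
    d = n_l + n_r
    # Interpretable representation: one on/off bit per channel across both branches.
    z = rng.integers(0, 2, size=(n_samples, d))
    z[0] = 1  # keep the unperturbed input in the sample set
    # Perturb by zeroing "off" channels (an assumption; other baselines are possible).
    left_batch = left[None] * z[:, :n_l, None]
    right_batch = right[None] * z[:, n_l:, None]
    y = predict(left_batch, right_batch)[:, target]
    # Locality weights: cosine similarity of a binary mask to all-ones is sqrt(k/d).
    k = z.sum(axis=1)
    dist = 1.0 - np.sqrt(k / d)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # Weighted ridge regression with an unpenalized intercept (the local surrogate).
    X = np.hstack([np.ones((n_samples, 1)), z.astype(float)])
    reg = ridge * np.eye(d + 1)
    reg[0, 0] = 0.0
    beta = np.linalg.solve(X.T @ (w[:, None] * X) + reg, X.T @ (w * y))
    coef = beta[1:]
    return coef[:n_l], coef[n_l:]
```

Averaging the absolute relevance of each channel over many samples of one emotion would yield a global importance profile of the kind the abstract mentions; the exact aggregation the authors use is not stated here.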
Submitted 16 July, 2025;
originally announced July 2025.
-
Predicting Soccer Penalty Kick Direction Using Human Action Recognition
Authors:
David Freire-Obregón,
Oliverio J. Santana,
Javier Lorenzo-Navarro,
Daniel Hernández-Sosa,
Modesto Castrillón-Santana
Abstract:
Action anticipation has become a prominent topic in Human Action Recognition (HAR). However, its application to real-world sports scenarios remains limited by the availability of suitable annotated datasets. This work presents a novel dataset of manually annotated soccer penalty kicks to predict shot direction based on pre-kick player movements. To benchmark this dataset, we propose a deep learning classifier that integrates HAR-based feature embeddings with contextual metadata. We evaluate twenty-two backbone models across seven architecture families (MViTv2, MViTv1, SlowFast, Slow, X3D, I3D, C2D), achieving up to 63.9% accuracy in predicting shot direction (left or right) and outperforming the real goalkeepers' decisions. These results demonstrate the dataset's value for anticipatory action recognition and validate our model's potential as a generalizable approach for sports-based predictive tasks.
Submitted 16 July, 2025;
originally announced July 2025.
-
An Evaluation of a Visual Question Answering Strategy for Zero-shot Facial Expression Recognition in Still Images
Authors:
Modesto Castrillón-Santana,
Oliverio J Santana,
David Freire-Obregón,
Daniel Hernández-Sosa,
Javier Lorenzo-Navarro
Abstract:
Facial expression recognition (FER) is a key research area in computer vision and human-computer interaction. Despite recent advances in deep learning, challenges persist, especially in generalizing to new scenarios. In fact, the performance of state-of-the-art FER models drops significantly in zero-shot settings. To address this problem, the community has recently started to explore integrating knowledge from Large Language Models into visual tasks. In this work, we evaluate a broad collection of locally executed Visual Language Models (VLMs), compensating for their lack of task-specific knowledge by adopting a Visual Question Answering strategy. We compare the proposed pipeline with state-of-the-art FER models, both with and without VLMs, on well-known FER benchmarks: AffectNet, FERPlus, and RAF-DB. The results show excellent performance for some VLMs in zero-shot FER scenarios, indicating the need for further exploration to improve FER generalization.
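A Visual Question Answering strategy for zero-shot FER needs two small pieces around the VLM call: a question that constrains the answer to the benchmark's label set, and a parser that maps free-text answers back to labels. The sketch below assumes a seven-class label set, a hypothetical prompt wording, and a hypothetical synonym table; it does not call any VLM and is not the prompt used in the paper.

```python
import re

# Assumed label set for illustration (basic expressions plus neutral).
LABELS = ["anger", "disgust", "fear", "happiness", "neutral", "sadness", "surprise"]

# Hypothetical synonym table for common free-text VLM answers.
SYNONYMS = {"happy": "happiness", "joy": "happiness", "angry": "anger",
            "sad": "sadness", "surprised": "surprise", "disgusted": "disgust",
            "afraid": "fear", "scared": "fear", "fearful": "fear"}


def build_prompt(labels=LABELS):
    """Question sent to the VLM together with the face image."""
    return ("What facial expression does the person in this image show? "
            "Answer with exactly one word from: " + ", ".join(labels) + ".")


def parse_answer(text, labels=LABELS):
    """Map a free-text answer to the first recognized label; None if none is found."""
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in labels:
            return token
        if token in SYNONYMS:
            return SYNONYMS[token]
    return None
```

Answers that match no label (for example, refusals) return `None`, so they can be counted as errors rather than silently mapped to a class. A first-match parser like this ignores negations such as "not happy", which is a known limitation of the sketch.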
Submitted 30 April, 2025;
originally announced April 2025.
-
Transfer Learning from Simulated to Real Scenes for Monocular 3D Object Detection
Authors:
Sondos Mohamed,
Walter Zimmer,
Ross Greer,
Ahmed Alaaeldin Ghita,
Modesto Castrillón-Santana,
Mohan Trivedi,
Alois Knoll,
Salvatore Mario Carta,
Mirko Marras
Abstract:
Accurately detecting 3D objects from monocular images in dynamic roadside scenarios remains a challenging problem due to varying camera perspectives and unpredictable scene conditions. This paper introduces a two-stage training strategy to address these challenges. Our approach first trains a model on the large-scale synthetic dataset RoadSense3D, which offers a diverse range of scenarios for robust feature learning. Subsequently, we fine-tune the model on a combination of real-world datasets to enhance its adaptability to practical conditions. Experimental results of the Cube R-CNN model on challenging public benchmarks show a remarkable improvement in detection performance when performing transfer learning, with mean average precision rising from 0.26 to 12.76 on the TUM Traffic A9 Highway dataset and from 2.09 to 6.60 on the DAIR-V2X-I dataset. Code, data, and qualitative video results are available on the project website: https://roadsense3d.github.io.
Submitted 28 August, 2024;
originally announced August 2024.
-
Towards Bi-Hemispheric Emotion Mapping through EEG: A Dual-Stream Neural Network Approach
Authors:
David Freire-Obregón,
Daniel Hernández-Sosa,
Oliverio J. Santana,
Javier Lorenzo-Navarro,
Modesto Castrillón-Santana
Abstract:
Emotion classification through EEG signals plays a significant role in psychology, neuroscience, and human-computer interaction. This paper addresses the challenge of mapping human emotions using EEG data in the Mapping Human Emotions through EEG Signals FG24 competition. Subjects mimic the facial expressions of an avatar, displaying fear, joy, anger, sadness, disgust, and surprise in a VR setting. EEG data are captured using a multi-channel sensor system to discern brain activity patterns. We propose a novel two-stream neural network employing a bi-hemispheric approach for emotion inference, which surpasses baseline methods and improves emotion recognition accuracy. Additionally, we conduct a temporal analysis revealing that specific signal intervals at the beginning and end of the emotion stimulus sequence contribute significantly to improving accuracy. Leveraging insights from this temporal analysis, our approach captures subtle variations in emotional states more effectively.
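A bi-hemispheric two-stream model needs its input split into left- and right-hemisphere channel groups before each group reaches its own branch. The sketch below shows only that partitioning step, assuming standard 10-20 electrode names, where odd-numbered suffixes lie over the left hemisphere, even-numbered suffixes over the right, and a `z` suffix marks the midline. The competition's actual channel montage and its handling of midline electrodes are not given in the abstract, so dropping midline channels here is an assumption, and `split_hemispheres` is a hypothetical helper.

```python
import re

import numpy as np


def split_hemispheres(eeg, channels):
    """Split a (channels, time) EEG array into left- and right-hemisphere streams.

    10-20 naming rule: odd suffixes (Fp1, F3, O1) are left, even suffixes
    (Fp2, F4, O2) are right, and 'z' suffixes (Fz, Cz) are midline.
    Midline and unrecognized channels are dropped in this sketch.
    """
    left_idx, right_idx = [], []
    for i, name in enumerate(channels):
        m = re.fullmatch(r"([A-Za-z]+?)(\d+|[zZ])", name)
        if m is None or m.group(2) in ("z", "Z"):
            continue
        (left_idx if int(m.group(2)) % 2 == 1 else right_idx).append(i)
    return eeg[left_idx], eeg[right_idx], left_idx, right_idx
```

Because homologous electrodes differ by one in their suffix (F3 and F4, O1 and O2), the same naming rule also makes it easy to pair symmetric channels across the two streams.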
Submitted 19 May, 2024; v1 submitted 6 April, 2024;
originally announced May 2024.
-
A Large-Scale Re-identification Analysis in Sporting Scenarios: the Betrayal of Reaching a Critical Point
Authors:
David Freire-Obregón,
Javier Lorenzo-Navarro,
Oliverio J. Santana,
Daniel Hernández-Sosa,
Modesto Castrillón-Santana
Abstract:
Re-identifying participants in ultra-distance running competitions can be daunting due to the extensive distances and constantly changing terrain. To overcome these challenges, computer vision techniques have been developed to analyze runners' faces, bib numbers, and clothing. In contrast, our study presents a novel gait-based approach for runner re-identification (re-ID) that leverages various pre-trained human action recognition (HAR) models and loss functions. Our results show that this approach is promising for re-identifying runners in ultra-distance competitions. Furthermore, we investigate the significance of distinct human body movements when athletes approach their endurance limits and their potential impact on re-ID accuracy. Our study examines how the recognition of a runner's gait is affected by a competition's critical point (CP), defined as a moment of severe fatigue at the point where the finish line comes into view, just a few kilometers from the finish. We aim to determine how this CP can improve the accuracy of athlete re-ID. Our experimental results demonstrate that gait recognition can be significantly enhanced (up to a 9% increase in mAP) as athletes approach this point. This highlights the potential of gait recognition in real-world scenarios, such as ultra-distance competitions or long-duration surveillance tasks.
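The re-ID results above are reported as mean average precision (mAP). As a reference point, here is a minimal sketch of the standard computation from a query-by-gallery distance matrix: for each query, rank the gallery by ascending distance, average the precision at each correct match, then average over queries. The paper's exact protocol (for example, which gallery entries are filtered per query) is not described in the abstract, so this sketch applies no filtering and skips queries that have no match.

```python
import numpy as np


def mean_average_precision(dist, query_ids, gallery_ids):
    """mAP for re-ID given a (n_query, n_gallery) distance matrix."""
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)
    aps = []
    for q in range(dist.shape[0]):
        order = np.argsort(dist[q], kind="stable")  # closest gallery items first
        matches = gallery_ids[order] == query_ids[q]
        if not matches.any():
            continue  # queries without a true match are skipped in this sketch
        hits = np.cumsum(matches)
        ranks = np.arange(1, len(matches) + 1)
        aps.append((hits[matches] / ranks[matches]).mean())  # precision at each hit
    return float(np.mean(aps))


# Toy example: query 0 ranks both its matches first (AP = 1),
# query 1 finds its only match at rank 3 (AP = 1/3), so mAP = 2/3.
dist = np.array([[0.1, 0.5, 0.3],
                 [0.2, 0.9, 0.4]])
score = mean_average_precision(dist, query_ids=[1, 2], gallery_ids=[1, 2, 1])
```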
Submitted 29 December, 2023;
originally announced January 2024.
-
An X3D Neural Network Analysis for Runner's Performance Assessment in a Wild Sporting Environment
Authors:
David Freire-Obregón,
Javier Lorenzo-Navarro,
Oliverio J. Santana,
Daniel Hernández-Sosa,
Modesto Castrillón-Santana
Abstract:
We present a transfer learning analysis of expanded 3D (X3D) neural networks in a sporting environment. Inspired by action quality assessment methods in the literature, our method uses an action recognition network to estimate athletes' cumulative race time (CRT) during an ultra-distance competition. We evaluate the performance of X3D, a family of action recognition networks that expand a small 2D image classification architecture along multiple network axes, including space, time, width, and depth. We demonstrate that the resulting neural network can provide remarkable performance for short input footage, with a mean absolute error of 12 and a half minutes when estimating the CRT for runners who have been active from 8 to 20 hours. Our most significant finding is that X3D achieves state-of-the-art performance, obtaining better precision than previous work while requiring almost seven times less memory.
Submitted 22 July, 2023;
originally announced July 2023.
-
Towards cumulative race time regression in sports: I3D ConvNet transfer learning in ultra-distance running events
Authors:
David Freire-Obregón,
Javier Lorenzo-Navarro,
Oliverio J. Santana,
Daniel Hernández-Sosa,
Modesto Castrillón-Santana
Abstract:
Predicting an athlete's performance based on short footage is highly challenging. Performance prediction requires substantial domain knowledge and enough evidence to infer an appropriate quality assessment. Sports pundits can often infer this kind of information in real time. In this paper, we propose regressing an ultra-distance runner's cumulative race time (CRT), i.e., the time the runner has been in action since the race start, using only a few seconds of footage as input. For that purpose, we slightly modified the I3D ConvNet backbone and trained a newly added regressor. We use appropriate pre-processing of the visual input to enable transfer learning from a specific runner. We show that the resulting neural network can provide remarkable performance for short input footage: a mean absolute error of 18 and a half minutes in estimating the CRT for runners who have been in action from 8 to 20 hours. Our methodology has several favorable properties: it does not require a human expert to provide any insight, it can be used at any moment during the race by just observing a runner, and it can inform the race staff about a runner at any given time.
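The CRT results in these works are stated as a mean absolute error in minutes over runners who have been racing for hours. The short sketch below makes that metric concrete. The helper name `crt_mae_minutes` and the three runners' values are illustrative only and are not data from the paper.

```python
import numpy as np


def crt_mae_minutes(pred_hours, true_hours):
    """Mean absolute error between predicted and true cumulative race times, in minutes."""
    pred = np.asarray(pred_hours, dtype=float)
    true = np.asarray(true_hours, dtype=float)
    return float(np.mean(np.abs(pred - true)) * 60.0)


# Illustrative values only: three runners whose true CRTs are 8, 12.5 and 20 hours.
mae = crt_mae_minutes([8.25, 12.0, 20.1], [8.0, 12.5, 20.0])
```

Here the per-runner errors are 15, 30 and 6 minutes, so the MAE is about 17 minutes. An MAE of 18.5 minutes against CRTs of 8 to 20 hours therefore corresponds to an average error of only a few percent of the elapsed race time.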
Submitted 23 August, 2022;
originally announced August 2022.
-
Decontextualized I3D ConvNet for ultra-distance runners performance analysis at a glance
Authors:
David Freire-Obregón,
Javier Lorenzo-Navarro,
Modesto Castrillón-Santana
Abstract:
In May 2021, the site runnersworld.com reported that participation in ultra-distance races had increased by 1,676% over the last 23 years. Moreover, nearly 41% of those runners participate in more than one race per year. The development of wearable devices has undoubtedly contributed to motivating participants by providing performance measures in real time. However, we believe there is room for improvement, particularly from the organizers' point of view. This work aims to determine how runners' performance can be quantified and predicted using a non-invasive technique focused on the ultra-running scenario. To this end, participants are captured when they pass through a set of locations placed along the race track. In our work, each piece of footage is fed to an I3D ConvNet to extract the participant's running gait. However, this footage may be affected by weather and illumination conditions, as well as by occlusions caused by race staff and other runners. To address this challenging task, we tracked and encoded the participant's running gait at some RPs and removed the context to ensure a proper evaluation of the runner of interest. The evaluation suggests that the features extracted by an I3D ConvNet provide enough information to estimate the participant's performance along the different race tracks.
Submitted 26 May, 2022; v1 submitted 13 March, 2022;
originally announced March 2022.
-
Deep learning for source camera identification on mobile devices
Authors:
David Freire-Obregón,
Fabio Narducci,
Silvio Barra,
Modesto Castrillón-Santana
Abstract:
In the present paper, we propose a source camera identification method for mobile devices based on deep learning. Recently, convolutional neural networks (CNNs) have shown remarkable performance on several tasks such as image recognition, video analysis, and natural language processing. A CNN consists of a set of layers, each composed of a set of high-pass filters applied across the input image. This convolution process provides the unique ability to extract features automatically from data and to learn from those features. Our proposal describes a CNN architecture that is able to infer the noise pattern of mobile camera sensors (also known as the camera fingerprint), with the aim of detecting and identifying not only the mobile device used to capture an image (with 98% accuracy), but also which of its embedded cameras captured it. More specifically, we provide an extensive analysis of the proposed architecture under different configurations. The experiments were carried out on images captured by the cameras of different mobile devices (the MICHE-I dataset), and the results demonstrate the robustness of the proposed method.
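The camera fingerprint idea above rests on high-pass filtering. Such a filter suppresses smooth scene content and keeps the sensor noise residual, and averaging residuals from one camera reinforces the pattern that camera shares across images. The sketch below illustrates that principle with a fixed second-order high-pass kernel and normalized correlation. This is a classical, non-learned stand-in chosen for clarity. In the proposed CNN, the filters are learned, and the kernel, helper names, and matching rule here are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Second-order high-pass kernel; its taps sum to zero and it cancels linear ramps.
HP_KERNEL = np.array([[-1, 2, -1],
                      [2, -4, 2],
                      [-1, 2, -1]], dtype=float) / 4.0


def noise_residual(gray):
    """Valid-mode filtering of a 2D grayscale image with HP_KERNEL."""
    windows = sliding_window_view(gray.astype(float), HP_KERNEL.shape)
    # Correlation without kernel flip; the kernel is symmetric, so this equals convolution.
    return np.einsum("ijkl,kl->ij", windows, HP_KERNEL)


def estimate_fingerprint(images):
    """Average residuals from one camera: scene content tends to cancel, sensor noise remains."""
    return np.mean([noise_residual(im) for im in images], axis=0)


def ncc(a, b):
    """Normalized cross-correlation between two residual maps of equal shape."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

A query image would be attributed to the camera whose fingerprint gives the highest `ncc` with the query's residual. The CNN approach replaces both the fixed kernel and this matching rule with learned layers and a classifier.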
Submitted 13 October, 2017; v1 submitted 30 September, 2017;
originally announced October 2017.
-
Comparative study of histogram distance measures for re-identification
Authors:
Pedro A. Marín-Reyes,
Javier Lorenzo-Navarro,
Modesto Castrillón-Santana
Abstract:
Color-based re-identification methods usually rely on a distance function to measure the similarity between individuals. In this paper, we study the behavior of several histogram distance measures in different color spaces. We investigate whether any histogram distance measure performs better than the others and, likewise, whether any color space offers better discriminative features. Several experiments are designed and evaluated on several image databases to obtain measures across various color spaces. A measure ranking is generated to calculate the area under the CMC curve; this area is the indicator used to evaluate which distance measure and color space perform best on the considered databases. Other parameters, such as the division of the image into horizontal stripes and the number of histogram bins, have also been studied.
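The evaluation pipeline described above consists of three parts: stripe-wise color histograms, a histogram distance, and the area under the CMC curve. It can be sketched compactly as follows. Assumptions: images are already converted to the color space under test with values in [0, 1]. The four distances are common textbook choices and may not match the paper's exact list. In a single-shot setup, probe i's true match is gallery i. The AUC is normalized by the number of ranks. All names are hypothetical.

```python
import numpy as np


def stripe_histogram(img, n_stripes=4, bins=16):
    """Concatenated per-channel histograms over horizontal stripes, L1-normalized.

    img: (H, W, C) array with values in [0, 1]; colour-space conversion is not shown.
    """
    feats = []
    for stripe in np.array_split(img, n_stripes, axis=0):
        for c in range(img.shape[2]):
            h, _ = np.histogram(stripe[..., c], bins=bins, range=(0.0, 1.0))
            feats.append(h)
    f = np.concatenate(feats).astype(float)
    return f / f.sum()


# Distances between L1-normalized histograms p and q.
DISTANCES = {
    "l1": lambda p, q: np.abs(p - q).sum(),
    "chi2": lambda p, q: 0.5 * np.sum((p - q) ** 2 / (p + q + 1e-12)),
    "intersection": lambda p, q: 1.0 - np.minimum(p, q).sum(),
    # Hellinger form, built on the Bhattacharyya coefficient sum(sqrt(p * q)).
    "hellinger": lambda p, q: np.sqrt(max(0.0, 1.0 - np.sqrt(p * q).sum())),
}


def cmc_auc(probe_feats, gallery_feats, dist):
    """Normalized area under the CMC curve; probe i's true match is gallery i."""
    n = len(probe_feats)
    D = np.array([[dist(p, g) for g in gallery_feats] for p in probe_feats])
    true_d = np.diag(D)
    ranks = (D < true_d[:, None]).sum(axis=1) + 1  # rank of the true match per probe
    cmc = np.array([(ranks <= k).mean() for k in range(1, n + 1)])
    return float(cmc.mean()), cmc
```

Comparing `cmc_auc` across the entries of `DISTANCES`, and across inputs converted to different color spaces, mirrors the ranking experiment in the abstract. The `n_stripes` and `bins` arguments correspond to the two extra parameters it mentions.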
Submitted 24 November, 2016;
originally announced November 2016.
-
Optimized clothes segmentation to boost gender classification in unconstrained scenarios
Authors:
D. Freire-Obregón,
M. Castrillón-Santana,
J. Lorenzo-Navarro
Abstract:
Several applications require demographic information about ordinary people in unconstrained scenarios. This is not a trivial task due to significant variations in human appearance. In this work, we introduce trixels for clustering image regions, enumerating their advantages over superpixels. The classical GrabCut algorithm is then modified to segment trixels instead of pixels in an unsupervised context. Combined with face detection, this yields a clothes segmentation approach that runs close to real time. The study uses the challenging Pascal VOC dataset for segmentation evaluation experiments. A final experiment analyzes the fusion of clothes features with state-of-the-art gender classifiers on ClothesDB, revealing a significant improvement in gender classification performance.
Submitted 12 November, 2016;
originally announced November 2016.
-
Descriptors and regions of interest fusion for gender classification in the wild. Comparison and combination with Convolutional Neural Networks
Authors:
M. Castrillón-Santana,
J. Lorenzo-Navarro,
E. Ramón-Balmaseda
Abstract:
Gender classification (GC) has achieved high accuracy in different experimental evaluations based mostly on inner facial details. However, these results do not generalize well to unrestricted datasets, particularly in cross-database experiments, where performance drops drastically. In this paper, we analyze state-of-the-art GC accuracy on three large datasets: MORPH, LFW, and GROUPS. We discuss their respective difficulties and biases, concluding that GROUPS presents the most challenging, in-the-wild conditions, such as low-resolution imagery and cluttered backgrounds. First, we analyze in depth the performance of different descriptors extracted from the face and its local context on this dataset. Selecting the best descriptors and studying their most suitable combination allows us to design a solution that beats all previously published results for GROUPS under Dago's protocol, reaching an accuracy above 94.2% and reducing the gap with other, simpler datasets. The chosen solution, based on local descriptors, is then evaluated in a cross-database scenario with the three datasets and with full-dataset 5-fold cross-validation. The results are compared with a Convolutional Neural Network approach, which achieves similar marks. Finally, we propose a solution that combines both approaches, which exhibit strong complementarity, boosting GC performance beyond previously published results in both cross-database and full in-database evaluations.
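Combining descriptor-based classifiers with a CNN, as described above, is often done at the score level. The abstract does not specify the fusion rule, so the following is only a hypothetical sketch of one simple option: sum-rule fusion of signed decision scores after dividing each classifier's scores by their standard deviation on training data. The sign convention (positive for one gender class, negative for the other) and the function names are assumptions.

```python
import numpy as np


def fuse_scores(test_scores, train_scores):
    """Sum-rule fusion of scale-normalized classifier scores.

    test_scores:  (n_samples, n_classifiers) signed decision scores.
    train_scores: (n_train, n_classifiers) scores used only to estimate each
                  classifier's spread.
    """
    # Dividing by each classifier's standard deviation keeps one descriptor from
    # dominating; no centering, so each classifier keeps its own threshold at 0.
    sigma = train_scores.std(axis=0) + 1e-12
    return (test_scores / sigma).mean(axis=1)


def predict_labels(fused):
    """Map fused scores to +1 / -1 class labels."""
    return np.where(fused > 0, 1, -1)
```

Without the scaling step, a classifier whose scores span a much larger range would effectively decide alone, which would hide any complementarity between the two approaches.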
Submitted 19 February, 2016; v1 submitted 24 July, 2015;
originally announced July 2015.