-
Overcoming Occlusions in the Wild: A Multi-Task Age Head Approach to Age Estimation
Authors:
Waqar Tanveer,
Laura Fernández-Robles,
Eduardo Fidalgo,
Víctor González-Castro,
Enrique Alegre
Abstract:
Facial age estimation has achieved considerable success under controlled conditions. However, in unconstrained real-world scenarios, often referred to as 'in the wild', age estimation remains challenging, especially when faces are partially occluded. To address this limitation, we propose a new approach integrating generative adversarial networks (GANs) and transformer architectures to enable robust age estimation from occluded faces. We employ an SN-Patch GAN to remove occlusions, while an Attentive Residual Convolution Module (ARCM), paired with a Swin Transformer, enhances feature representation. Additionally, we introduce a Multi-Task Age Head (MTAH) that combines regression and distribution learning, further improving age estimation under occlusion. Experimental results on the FG-NET, UTKFace, and MORPH datasets demonstrate that our proposed approach surpasses existing state-of-the-art techniques for occluded facial age estimation, achieving a mean absolute error (MAE) of $3.00$, $4.54$, and $2.53$ years, respectively.
Submitted 16 June, 2025;
originally announced June 2025.
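The abstract above describes a head that combines regression with distribution learning but gives no architectural details. The following is only a minimal sketch of that general idea, not the authors' MTAH: it assumes the head outputs one logit per integer age, takes the expected value of the softmax as the regressed age, and sums a KL term against a Gaussian-smoothed label with an L1 term. The function names, the age range 0 to 100, `sigma`, and the weight `lam` are all hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def gaussian_label(age, ages, sigma=2.0):
    # Assumed soft target: a discretised Gaussian centred on the true age.
    d = np.exp(-0.5 * ((ages - age) / sigma) ** 2)
    return d / d.sum()

def multitask_age_loss(logits, true_age, ages, lam=1.0, eps=1e-12):
    # Distribution branch: predicted probability for each age bin.
    p = softmax(logits)
    # Regression branch: expected age under the predicted distribution.
    pred_age = float((p * ages).sum())
    # KL divergence between the soft label q and the prediction p.
    q = gaussian_label(true_age, ages)
    kl = float((q * (np.log(q + eps) - np.log(p + eps))).sum())
    # Absolute error of the regressed age (averaged over a dataset, this is the MAE).
    l1 = abs(pred_age - true_age)
    return pred_age, kl + lam * l1

# Hypothetical example: logits peaked at age 30, ground truth 32.
ages = np.arange(0, 101, dtype=float)
logits = -0.5 * ((ages - 30.0) / 3.0) ** 2
pred_age, loss = multitask_age_loss(logits, true_age=32.0, ages=ages)
print(pred_age, loss)
```

Averaging the absolute error `l1` over a test set gives the MAE metric quoted in the abstract; how the paper weights or structures the two branches is not stated there.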
-
The formula for the completion time of project networks
Authors:
Manuel Castejón-Limas,
Gabriel Medina Martínez,
Virginia Riego del Castillo,
Laura Fernández-Robles
Abstract:
This paper formulates the completion time $\tau$ of a project network as $\tau = \|\mathbf{R}\mathbf{t}\|_\infty$, where the rows of $\mathbf{R}$ are simple paths of the network and $\mathbf{t}$ is a column vector representing the durations of the activities. Considering this product as a linear transformation leads to interesting findings on the topological relevance of both paths and activities using singular value decomposition. The notion of spectral networks is introduced to condense the fundamental structure of the project network. A definition of project stress is introduced to establish a comparison index between two alternatives in terms of slack. Additionally, the Moore-Penrose inverse of $\mathbf{R}$ is presented to find the configuration of activity durations that results in a given simple path duration vector. We then report the systematic mapping review carried out to assess the novelty of our claims. Finally, we reflect on the notion of relevance for paths and activities and on the relationship of the incidence matrix with the proposed approach.
Submitted 14 October, 2024;
originally announced October 2024.
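The completion-time formula in the abstract above can be illustrated with a small numerical sketch. The four-activity network, its durations, and the target path durations below are hypothetical and not taken from the paper; the code only shows the path-activity matrix product, its infinity norm, and the use of the Moore-Penrose inverse to recover one set of durations that yields a given path duration vector.

```python
import numpy as np

# Hypothetical network (an assumption for illustration):
# activities A, B, C, D with two simple paths A-B-D and A-C-D.
# Each row of R marks the activities that lie on one simple path.
R = np.array([
    [1, 1, 0, 1],  # path A-B-D
    [1, 0, 1, 1],  # path A-C-D
], dtype=float)

# Assumed activity durations for A, B, C, D.
t = np.array([3.0, 2.0, 4.0, 1.0])

# Duration of each simple path, then the completion time as the infinity norm.
path_durations = R @ t
tau = np.linalg.norm(path_durations, ord=np.inf)
print(tau)  # 8.0, the length of the longest path A-C-D

# Moore-Penrose inverse: the minimum-norm duration vector that
# reproduces a chosen path duration vector (here both paths take 6).
target = np.array([6.0, 6.0])
t_min_norm = np.linalg.pinv(R) @ target
print(R @ t_min_norm)
```

Because the two rows of this `R` are linearly independent, `R @ t_min_norm` reproduces `target` exactly. In general the pseudoinverse returns a least-squares solution, and nothing in this sketch constrains the recovered durations to be non-negative.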
-
MeWEHV: Mel and Wave Embeddings for Human Voice Tasks
Authors:
Andrés Carofilis,
Laura Fernández-Robles,
Enrique Alegre,
Eduardo Fidalgo
Abstract:
A recent trend in speech processing is the use of embeddings created through machine learning models trained on a specific task with large datasets. By leveraging the knowledge already acquired, these models can be reused in new tasks where the amount of available data is small. This paper proposes a pipeline to create a new model, called Mel and Wave Embeddings for Human Voice Tasks (MeWEHV), capable of generating robust embeddings for speech processing. MeWEHV combines the embeddings generated by a pre-trained raw audio waveform encoder model with deep features extracted from Mel Frequency Cepstral Coefficients (MFCCs) using Convolutional Neural Networks (CNNs). We evaluate the performance of MeWEHV on three tasks: speaker, language, and accent identification. For speaker identification, we use the VoxCeleb1 dataset and present YouSpeakers204, a new and publicly available dataset for English speaker identification that contains 19607 audio clips from 204 persons speaking in six different accents, allowing other researchers to work with a very balanced dataset and to create new models that are robust to multiple accents. For the language identification task, we use the VoxForge and Common Language datasets. Finally, for accent identification, we use the Latin American Spanish Corpora (LASC) and Common Voice datasets. Our approach significantly improves the performance of state-of-the-art models on all the tested datasets, at a low additional computational cost.
Submitted 24 June, 2023; v1 submitted 28 September, 2022;
originally announced September 2022.