
Showing 1–50 of 128 results for author: Escalera, S

Searching in archive cs.
  1. arXiv:2505.17223  [pdf, ps, other]

    cs.CV

    REACT 2025: the Third Multiple Appropriate Facial Reaction Generation Challenge

    Authors: Siyang Song, Micol Spitale, Xiangyu Kong, Hengde Zhu, Cheng Luo, Cristina Palmero, German Barquero, Sergio Escalera, Michel Valstar, Mohamed Daoudi, Tobias Baur, Fabien Ringeval, Andrew Howes, Elisabeth Andre, Hatice Gunes

    Abstract: In dyadic interactions, a broad spectrum of human facial reactions might be appropriate for responding to each human speaker behaviour. Following the successful organisation of the REACT 2023 and REACT 2024 challenges, we are proposing the REACT 2025 challenge encouraging the development and benchmarking of Machine Learning (ML) models that can be used to generate multiple appropriate, diverse, re…

    Submitted 22 May, 2025; originally announced May 2025.

    MSC Class: 68T40

  2. arXiv:2505.07300  [pdf, other]

    cs.CV

    L-SWAG: Layer-Sample Wise Activation with Gradients information for Zero-Shot NAS on Vision Transformers

    Authors: Sofia Casarin, Sergio Escalera, Oswald Lanz

    Abstract: Training-free Neural Architecture Search (NAS) efficiently identifies high-performing neural networks using zero-cost (ZC) proxies. Unlike multi-shot and one-shot NAS approaches, ZC-NAS is both (i) time-efficient, eliminating the need for model training, and (ii) interpretable, with proxy designs often theoretically grounded. Despite rapid developments in the field, current SOTA ZC proxies are typ…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: accepted at CVPR 2025

  3. arXiv:2504.12021  [pdf, other]

    cs.CV

    Action Anticipation from SoccerNet Football Video Broadcasts

    Authors: Mohamad Dalal, Artur Xarles, Anthony Cioppa, Silvio Giancola, Marc Van Droogenbroeck, Bernard Ghanem, Albert Clapés, Sergio Escalera, Thomas B. Moeslund

    Abstract: Artificial intelligence has revolutionized the way we analyze sports videos, whether to understand the actions of games in long untrimmed videos or to anticipate the player's motion in future frames. Despite these efforts, little attention has been given to anticipating game actions before they occur. In this work, we introduce the task of action anticipation for football broadcast videos, which c…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: 15 pages, 14 figures. To be published in the CVSports CVPR workshop

    ACM Class: I.2.10; I.4.8

  4. arXiv:2504.06163  [pdf, other]

    cs.CV

    Action Valuation in Sports: A Survey

    Authors: Artur Xarles, Sergio Escalera, Thomas B. Moeslund, Albert Clapés

    Abstract: Action Valuation (AV) has emerged as a key topic in Sports Analytics, offering valuable insights by assigning scores to individual actions based on their contribution to desired outcomes. Despite a few surveys addressing related concepts such as Player Valuation, there is no comprehensive review dedicated to an in-depth analysis of AV across different sports. In this survey, we introduce a taxonom…

    Submitted 8 April, 2025; originally announced April 2025.

  5. arXiv:2504.05265  [pdf, other]

    cs.CV

    From Sparse Signal to Smooth Motion: Real-Time Motion Generation with Rolling Prediction Models

    Authors: German Barquero, Nadine Bertsch, Manojkumar Marramreddy, Carlos Chacón, Filippo Arcadu, Ferran Rigual, Nicky Sijia He, Cristina Palmero, Sergio Escalera, Yuting Ye, Robin Kips

    Abstract: In extended reality (XR), generating full-body motion of the users is important to understand their actions, drive their virtual avatars for social interaction, and convey a realistic sense of presence. While prior works focused on spatially sparse and always-on input signals from motion controllers, many XR applications opt for vision-based hand tracking for reduced user friction and better immer…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: Published in CVPR'25. Webpage: https://barquerogerman.github.io/RPM/

  6. arXiv:2504.01019  [pdf, other]

    cs.CV

    MixerMDM: Learnable Composition of Human Motion Diffusion Models

    Authors: Pablo Ruiz-Ponce, German Barquero, Cristina Palmero, Sergio Escalera, José García-Rodríguez

    Abstract: Generating human motion guided by conditions such as textual descriptions is challenging due to the need for datasets with pairs of high-quality motion and their corresponding conditions. The difficulty increases when aiming for finer control in the generation. To that end, prior works have proposed to combine several motion diffusion models pre-trained on datasets with different types of conditio…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: CVPR 2025 Accepted - Project Page: https://pabloruizponce.com/papers/MixerMDM

  7. arXiv:2504.00458  [pdf, other]

    cs.CV

    Mixture-of-Attack-Experts with Class Regularization for Unified Physical-Digital Face Attack Detection

    Authors: Shunxin Chen, Ajian Liu, Junze Zheng, Jun Wan, Kailai Peng, Sergio Escalera, Zhen Lei

    Abstract: Facial recognition systems in real-world scenarios are susceptible to both digital and physical attacks. Previous methods have attempted to achieve classification by learning a comprehensive feature space. However, these methods have not adequately accounted for the inherent characteristics of physical and digital attack data, particularly the large intra-class variation in attacks and the small i…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: 9 pages, 5 figures, accepted by AAAI-2025 (Oral)

  8. arXiv:2503.15166  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.MM

    Machine Unlearning in Hyperbolic vs. Euclidean Multimodal Contrastive Learning: Adapting Alignment Calibration to MERU

    Authors: Àlex Pujol Vidal, Sergio Escalera, Kamal Nasrollahi, Thomas B. Moeslund

    Abstract: Machine unlearning methods have become increasingly important for selective concept removal in large pre-trained models. While recent work has explored unlearning in Euclidean contrastive vision-language models, the effectiveness of concept removal in hyperbolic spaces remains unexplored. This paper investigates machine unlearning in hyperbolic contrastive learning by adapting Alignment Calibratio…

    Submitted 14 April, 2025; v1 submitted 19 March, 2025; originally announced March 2025.

    Comments: Preprint

  9. YOLO11-JDE: Fast and Accurate Multi-Object Tracking with Self-Supervised Re-ID

    Authors: Iñaki Erregue, Kamal Nasrollahi, Sergio Escalera

    Abstract: We introduce YOLO11-JDE, a fast and accurate multi-object tracking (MOT) solution that combines real-time object detection with self-supervised Re-Identification (Re-ID). By incorporating a dedicated Re-ID branch into YOLO11s, our model performs Joint Detection and Embedding (JDE), generating appearance features for each detection. The Re-ID branch is trained in a fully self-supervised setting whi…

    Submitted 23 January, 2025; originally announced January 2025.

    Comments: This paper has been accepted to the 5th Workshop on Real-World Surveillance: Applications and Challenges (WACV 2025)

  10. arXiv:2501.01728  [pdf, other]

    cs.CV

    Multimodal classification of forest biodiversity potential from 2D orthophotos and 3D airborne laser scanning point clouds

    Authors: Simon B. Jensen, Stefan Oehmcke, Andreas Møgelmose, Meysam Madadi, Christian Igel, Sergio Escalera, Thomas B. Moeslund

    Abstract: Assessment of forest biodiversity is crucial for ecosystem management and conservation. While traditional field surveys provide high-quality assessments, they are labor-intensive and spatially limited. This study investigates whether deep learning-based fusion of close-range sensing data from 2D orthophotos and 3D airborne laser scanning (ALS) point clouds can reliably assess the biodiversity pote…

    Submitted 1 May, 2025; v1 submitted 3 January, 2025; originally announced January 2025.

  11. arXiv:2411.13332  [pdf, other]

    cs.LG cs.AI

    Verifying Machine Unlearning with Explainable AI

    Authors: Àlex Pujol Vidal, Anders S. Johansen, Mohammad N. S. Jahromi, Sergio Escalera, Kamal Nasrollahi, Thomas B. Moeslund

    Abstract: We investigate the effectiveness of Explainable AI (XAI) in verifying Machine Unlearning (MU) within the context of harbor front monitoring, focusing on data privacy and regulatory compliance. With the increasing need to adhere to privacy legislation such as the General Data Protection Regulation (GDPR), traditional methods of retraining ML models for data deletions prove impractical due to their…

    Submitted 20 November, 2024; originally announced November 2024.

    Comments: ICPRW2024

  12. arXiv:2411.05705  [pdf]

    cs.CV eess.IV

    Image inpainting enhancement by replacing the original mask with a self-attended region from the input image

    Authors: Kourosh Kiani, Razieh Rastgoo, Alireza Chaji, Sergio Escalera

    Abstract: Image inpainting, the process of restoring missing or corrupted regions of an image by reconstructing pixel information, has recently seen considerable advancements through deep learning-based approaches. In this paper, we introduce a novel deep learning-based pre-processing methodology for image inpainting utilizing the Vision Transformer (ViT). Our approach involves replacing masked pixel values…

    Submitted 8 November, 2024; originally announced November 2024.

  13. arXiv:2410.02392  [pdf, other]

    cs.LG math.AT

    MANTRA: The Manifold Triangulations Assemblage

    Authors: Rubén Ballester, Ernst Röell, Daniel Bīn Schmid, Mathieu Alain, Sergio Escalera, Carles Casacuberta, Bastian Rieck

    Abstract: The rising interest in leveraging higher-order interactions present in complex systems has led to a surge in more expressive models exploiting higher-order structures in the data, especially in topological deep learning (TDL), which designs neural networks on higher-order domains such as simplicial complexes. However, progress in this field is hindered by the scarcity of datasets for benchmarking…

    Submitted 3 March, 2025; v1 submitted 3 October, 2024; originally announced October 2024.

    Comments: Accepted at ICLR 2025 (https://openreview.net/forum?id=X6y5CC44HM)

  14. arXiv:2409.11923  [pdf, other]

    cs.CV

    Agglomerative Token Clustering

    Authors: Joakim Bruslund Haurum, Sergio Escalera, Graham W. Taylor, Thomas B. Moeslund

    Abstract: We present Agglomerative Token Clustering (ATC), a novel token merging method that consistently outperforms previous token merging and pruning methods across image classification, image synthesis, and object detection & segmentation tasks. ATC merges clusters through bottom-up hierarchical clustering, without the introduction of extra learnable parameters. We find that ATC achieves state-of-the-ar…

    Submitted 18 September, 2024; originally announced September 2024.

    Comments: ECCV 2024. Project webpage at https://vap.aau.dk/atc/

  15. arXiv:2409.10587  [pdf, other]

    cs.CV

    SoccerNet 2024 Challenges Results

    Authors: Anthony Cioppa, Silvio Giancola, Vladimir Somers, Victor Joos, Floriane Magera, Jan Held, Seyed Abolfazl Ghasemzadeh, Xin Zhou, Karolina Seweryn, Mateusz Kowalczyk, Zuzanna Mróz, Szymon Łukasik, Michał Hałoń, Hassan Mkhallati, Adrien Deliège, Carlos Hinojosa, Karen Sanchez, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Adam Gorski, et al. (59 additional authors not shown)

    Abstract: The SoccerNet 2024 challenges represent the fourth annual video understanding challenges organized by the SoccerNet team. These challenges aim to advance research across multiple themes in football, including broadcast video understanding, field understanding, and player understanding. This year, the challenges encompass four vision-based tasks. (1) Ball Action Spotting, focusing on precisely loca…

    Submitted 16 September, 2024; originally announced September 2024.

    Comments: 7 pages, 1 figure

  16. arXiv:2406.09073  [pdf, other]

    cs.LG

    Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition

    Authors: Eleni Triantafillou, Peter Kairouz, Fabian Pedregosa, Jamie Hayes, Meghdad Kurmanji, Kairan Zhao, Vincent Dumoulin, Julio Jacques Junior, Ioannis Mitliagkas, Jun Wan, Lisheng Sun Hosoya, Sergio Escalera, Gintare Karolina Dziugaite, Peter Triantafillou, Isabelle Guyon

    Abstract: We present the findings of the first NeurIPS competition on unlearning, which sought to stimulate the development of novel algorithms and initiate discussions on formal and robust evaluation methodologies. The competition was highly successful: nearly 1,200 teams from across the world participated, and a wealth of novel, imaginative solutions with different characteristics were contributed. In thi…

    Submitted 13 June, 2024; originally announced June 2024.

  17. arXiv:2405.14094  [pdf, other]

    cs.LG cs.AI cs.CV math.AT stat.ML

    Attending to Topological Spaces: The Cellular Transformer

    Authors: Rubén Ballester, Pablo Hernández-García, Mathilde Papillon, Claudio Battiloro, Nina Miolane, Tolga Birdal, Carles Casacuberta, Sergio Escalera, Mustafa Hajij

    Abstract: Topological Deep Learning seeks to enhance the predictive performance of neural network models by harnessing topological structures in input data. Topological neural networks operate on spaces such as cell complexes and hypergraphs, that can be seen as generalizations of graphs. In this work, we introduce the Cellular Transformer (CT), a novel architecture that generalizes graph-based transformers…

    Submitted 26 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  18. arXiv:2405.06994  [pdf, other]

    cs.CV cs.LG

    GRASP-GCN: Graph-Shape Prioritization for Neural Architecture Search under Distribution Shifts

    Authors: Sofia Casarin, Oswald Lanz, Sergio Escalera

    Abstract: Neural Architecture Search (NAS) methods have been shown to output networks that largely outperform human-designed networks. However, conventional NAS methods have mostly tackled the single dataset scenario, incurring a large computational cost as the procedure has to be run from scratch for every new dataset. In this work, we focus on predictor-based algorithms and propose a simple and efficient way…

    Submitted 11 May, 2024; originally announced May 2024.

  19. arXiv:2404.09988  [pdf, other]

    cs.CV

    in2IN: Leveraging individual Information to Generate Human INteractions

    Authors: Pablo Ruiz Ponce, German Barquero, Cristina Palmero, Sergio Escalera, Jose Garcia-Rodriguez

    Abstract: Generating human-human motion interactions conditioned on textual descriptions is a very useful application in many areas such as robotics, gaming, animation, and the metaverse. Alongside this utility also comes a great difficulty in modeling the highly dimensional inter-personal dynamics. In addition, properly capturing the intra-personal diversity of interactions has a lot of challenges. Current…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Project page: https://pabloruizponce.github.io/in2IN/

  20. arXiv:2404.09703  [pdf, other]

    cs.LG stat.ML

    AI Competitions and Benchmarks: Dataset Development

    Authors: Romain Egele, Julio C. S. Jacques Junior, Jan N. van Rijn, Isabelle Guyon, Xavier Baró, Albert Clapés, Prasanna Balaprakash, Sergio Escalera, Thomas Moeslund, Jun Wan

    Abstract: Machine learning is now used in many applications thanks to its ability to predict, generate, or discover patterns from large quantities of data. However, the process of collecting and transforming data for practical use is intricate. Even in today's digital era, where substantial data is generated daily, it is uncommon for it to be readily usable; most often, it necessitates meticulous manual dat…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Preprint version of the 3rd Chapter of the book: Competitions and Benchmarks, the science behind the contests (https://sites.google.com/chalearn.org/book/home)

  21. arXiv:2404.06211  [pdf, other]

    cs.CV

    Unified Physical-Digital Attack Detection Challenge

    Authors: Haocheng Yuan, Ajian Liu, Junze Zheng, Jun Wan, Jiankang Deng, Sergio Escalera, Hugo Jair Escalante, Isabelle Guyon, Zhen Lei

    Abstract: Face Anti-Spoofing (FAS) is crucial to safeguard Face Recognition (FR) Systems. In real-world scenarios, FRs are confronted with both physical and digital attacks. However, existing algorithms often address only one type of attack at a time, which poses significant limitations in real-world scenarios where FR systems face hybrid physical-digital threats. To facilitate the research of Unified Attac…

    Submitted 18 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: 11 pages, 10 figures

  22. arXiv:2404.05392  [pdf, other]

    cs.CV

    T-DEED: Temporal-Discriminability Enhancer Encoder-Decoder for Precise Event Spotting in Sports Videos

    Authors: Artur Xarles, Sergio Escalera, Thomas B. Moeslund, Albert Clapés

    Abstract: In this paper, we introduce T-DEED, a Temporal-Discriminability Enhancer Encoder-Decoder for Precise Event Spotting in sports videos. T-DEED addresses multiple challenges in the task, including the need for discriminability among frame representations, high output temporal resolution to maintain prediction precision, and the necessity to capture information at different temporal scales to handle e…

    Submitted 11 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  23. arXiv:2404.01891  [pdf, other]

    cs.CV

    ASTRA: An Action Spotting TRAnsformer for Soccer Videos

    Authors: Artur Xarles, Sergio Escalera, Thomas B. Moeslund, Albert Clapés

    Abstract: In this paper, we introduce ASTRA, a Transformer-based model designed for the task of Action Spotting in soccer matches. ASTRA addresses several challenges inherent in the task and dataset, including the requirement for precise action localization, the presence of a long-tail data distribution, non-visibility in certain actions, and inherent label noise. To do so, ASTRA incorporates (a) a Transfor…

    Submitted 2 April, 2024; originally announced April 2024.

  24. arXiv:2404.01775  [pdf, other]

    cs.CV cs.AI cs.LG

    A noisy elephant in the room: Is your out-of-distribution detector robust to label noise?

    Authors: Galadrielle Humblot-Renaux, Sergio Escalera, Thomas B. Moeslund

    Abstract: The ability to detect unfamiliar or unexpected images is essential for safe deployment of computer vision systems. In the context of classification, the task of detecting images outside of a model's training domain is known as out-of-distribution (OOD) detection. While there has been a growing research interest in developing post-hoc OOD detection methods, there has been comparably little discussi…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024

  25. arXiv:2403.15194  [pdf, other]

    cs.CV cs.LG

    Your Image is My Video: Reshaping the Receptive Field via Image-To-Video Differentiable AutoAugmentation and Fusion

    Authors: Sofia Casarin, Cynthia I. Ugwu, Sergio Escalera, Oswald Lanz

    Abstract: The landscape of deep learning research is moving towards innovative strategies to harness the true potential of data. Traditionally, emphasis has been on scaling model architectures, resulting in large and complex neural networks, which can be difficult to train with limited computational resources. However, independently of the model size, data quality (i.e. amount and variability) is still a ma…

    Submitted 22 March, 2024; originally announced March 2024.

  26. arXiv:2403.14333  [pdf, other]

    cs.CV

    CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing

    Authors: Ajian Liu, Shuai Xue, Jianwen Gan, Jun Wan, Yanyan Liang, Jiankang Deng, Sergio Escalera, Zhen Lei

    Abstract: Domain generalization (DG) based Face Anti-Spoofing (FAS) aims to improve the model's performance on unseen domains. Existing methods either rely on domain labels to align domain-invariant feature spaces, or disentangle generalizable features from the whole sample, which inevitably lead to the distortion of semantic feature structures and achieve limited generalization. In this work, we make use o…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 11 pages, 4 figures

  27. arXiv:2402.15509  [pdf, other]

    cs.CV

    Seamless Human Motion Composition with Blended Positional Encodings

    Authors: German Barquero, Sergio Escalera, Cristina Palmero

    Abstract: Conditional human motion generation is an important topic with many applications in virtual reality, gaming, and robotics. While prior works have focused on generating motion guided by text, music, or scenes, these typically result in isolated motions confined to short durations. Instead, we address the generation of long, continuous sequences guided by a series of varying textual descriptions. In…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: Project page: https://barquerogerman.github.io/FlowMDM/

  28. arXiv:2402.14720  [pdf]

    cs.CV

    A Transformer Model for Boundary Detection in Continuous Sign Language

    Authors: Razieh Rastgoo, Kourosh Kiani, Sergio Escalera

    Abstract: Sign Language Recognition (SLR) has garnered significant attention from researchers in recent years, particularly the intricate domain of Continuous Sign Language Recognition (CSLR), which presents heightened complexity compared to Isolated Sign Language Recognition (ISLR). One of the prominent challenges in CSLR pertains to accurately detecting the boundaries of isolated signs within a continuous…

    Submitted 22 February, 2024; originally announced February 2024.

  29. arXiv:2402.02441  [pdf, other]

    cs.LG cs.AI cs.MS stat.CO

    TopoX: A Suite of Python Packages for Machine Learning on Topological Domains

    Authors: Mustafa Hajij, Mathilde Papillon, Florian Frantzen, Jens Agerberg, Ibrahem AlJabea, Rubén Ballester, Claudio Battiloro, Guillermo Bernárdez, Tolga Birdal, Aiden Brent, Peter Chin, Sergio Escalera, Simone Fiorellino, Odin Hoff Gardaa, Gurusankar Gopalakrishnan, Devendra Govil, Josef Hoppe, Maneel Reddy Karri, Jude Khouja, Manuel Lecha, Neal Livesay, Jan Meißner, Soham Mukherjee, Alexander Nikitin, Theodore Papamarkou, et al. (18 additional authors not shown)

    Abstract: We introduce TopoX, a Python software suite that provides reliable and user-friendly building blocks for computing and machine learning on topological domains that extend graphs: hypergraphs, simplicial, cellular, path and combinatorial complexes. TopoX consists of three packages: TopoNetX facilitates constructing and computing on these domains, including working with nodes, edges and higher-order…

    Submitted 8 December, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  30. arXiv:2401.17699  [pdf, other]

    cs.CV

    Unified Physical-Digital Face Attack Detection

    Authors: Hao Fang, Ajian Liu, Haocheng Yuan, Junze Zheng, Dingheng Zeng, Yanhong Liu, Jiankang Deng, Sergio Escalera, Xiaoming Liu, Jun Wan, Zhen Lei

    Abstract: Face Recognition (FR) systems can suffer from physical (i.e., print photo) and digital (i.e., DeepFake) attacks. However, previous related work rarely considers both situations at the same time. This implies the deployment of multiple models and thus more computational burden. The main reasons for this lack of an integrated model are caused by two factors: (1) The lack of a dataset including both…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 12 pages, 8 figures

  31. arXiv:2401.05166  [pdf, other]

    cs.CV

    REACT 2024: the Second Multiple Appropriate Facial Reaction Generation Challenge

    Authors: Siyang Song, Micol Spitale, Cheng Luo, Cristina Palmero, German Barquero, Hengde Zhu, Sergio Escalera, Michel Valstar, Tobias Baur, Fabien Ringeval, Elisabeth Andre, Hatice Gunes

    Abstract: In dyadic interactions, humans communicate their intentions and state of mind using verbal and non-verbal cues, where multiple different facial reactions might be appropriate in response to a specific speaker behaviour. Then, how to develop a machine learning (ML) model that can automatically generate multiple appropriate, diverse, realistic and synchronised human facial reactions from a previous…

    Submitted 10 January, 2024; originally announced January 2024.

    MSC Class: 68T40

  32. arXiv:2312.13377  [pdf, other]

    cs.CV

    SADA: Semantic adversarial unsupervised domain adaptation for Temporal Action Localization

    Authors: David Pujol-Perich, Albert Clapés, Sergio Escalera

    Abstract: Temporal Action Localization (TAL) is a complex task that poses relevant challenges, particularly when attempting to generalize on new -- unseen -- domains in real-world applications. These scenarios, despite being realistic, are often neglected in the literature, exposing these solutions to important performance degradation. In this work, we tackle this issue by introducing, for the first time, an appr…

    Submitted 22 February, 2025; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: Accepted to WACV 2025

  33. arXiv:2312.05840  [pdf, other]

    cs.LG math.AT

    Topological Data Analysis for Neural Network Analysis: A Comprehensive Survey

    Authors: Rubén Ballester, Carles Casacuberta, Sergio Escalera

    Abstract: This survey provides a comprehensive exploration of applications of Topological Data Analysis (TDA) within neural network analysis. Using TDA tools such as persistent homology and Mapper, we delve into the intricate structures and behaviors of neural networks and their datasets. We discuss different strategies to obtain topological information from data and neural networks by means of TDA. Additio…

    Submitted 3 January, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

    Comments: 70 pages, 7 figures. 4 references added. Minor changes in the text. Part of generative models restructured to improve generality and clarity of exposition

    MSC Class: 62R40; 55N31; 68T07 ACM Class: I.2.6

  34. arXiv:2311.05567  [pdf, other]

    cs.CV cs.HC cs.LG

    Exploring Emotion Expression Recognition in Older Adults Interacting with a Virtual Coach

    Authors: Cristina Palmero, Mikel deVelasco, Mohamed Amine Hmani, Aymen Mtibaa, Leila Ben Letaifa, Pau Buch-Cardona, Raquel Justo, Terry Amorese, Eduardo González-Fraile, Begoña Fernández-Ruanova, Jofre Tenorio-Laranga, Anna Torp Johansen, Micaela Rodrigues da Silva, Liva Jenny Martinussen, Maria Stylianou Korsnes, Gennaro Cordasco, Anna Esposito, Mounim A. El-Yacoubi, Dijana Petrovska-Delacrétaz, M. Inés Torres, Sergio Escalera

    Abstract: The EMPATHIC project aimed to design an emotionally expressive virtual coach capable of engaging healthy seniors to improve well-being and promote independent aging. One of the core aspects of the system is its human sensing capabilities, allowing for the perception of emotional states to provide a personalized experience. This paper outlines the development of the emotion expression recognition m…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: This work has been submitted to the IEEE for possible publication

  35. arXiv:2311.02700  [pdf, other]

    cs.CV

    A Generative Multi-Resolution Pyramid and Normal-Conditioning 3D Cloth Draping

    Authors: Hunor Laczkó, Meysam Madadi, Sergio Escalera, Jordi Gonzalez

    Abstract: RGB cloth generation has been deeply studied in the related literature; however, 3D garment generation remains an open problem. In this paper, we build a conditional variational autoencoder for 3D garment generation and draping. We propose a pyramid network to add garment details progressively in a canonical space, i.e. unposing and unshaping the garments w.r.t. the body. We study conditioning the…

    Submitted 15 January, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

    Comments: WACV24, IEEE copyright

  36. ICML 2023 Topological Deep Learning Challenge : Design and Results

    Authors: Mathilde Papillon, Mustafa Hajij, Helen Jenne, Johan Mathe, Audun Myers, Theodore Papamarkou, Tolga Birdal, Tamal Dey, Tim Doster, Tegan Emerson, Gurusankar Gopalakrishnan, Devendra Govil, Aldo Guzmán-Sáenz, Henry Kvinge, Neal Livesay, Soham Mukherjee, Shreyas N. Samaga, Karthikeyan Natesan Ramamurthy, Maneel Reddy Karri, Paul Rosen, Sophia Sanborn, Robin Walters, Jens Agerberg, Sadrodin Barikbin, Claudio Battiloro, et al. (31 additional authors not shown)

    Abstract: This paper presents the computational challenge on topological deep learning that was hosted within the ICML 2023 Workshop on Topology and Geometry in Machine Learning. The competition asked participants to provide open-source implementations of topological neural networks from the literature by contributing to the python packages TopoNetX (data processing) and TopoModelX (deep learning). The chal…

    Submitted 18 January, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

  37. SoccerNet 2023 Challenges Results

    Authors: Anthony Cioppa, Silvio Giancola, Vladimir Somers, Floriane Magera, Xin Zhou, Hassan Mkhallati, Adrien Deliège, Jan Held, Carlos Hinojosa, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdullah Kamal, Adrien Maglo, Albert Clapés, Amr Abdelaziz, Artur Xarles, Astrid Orcesi, Atom Scott, Bin Liu, Byoungkwon Lim, et al. (77 additional authors not shown)

    Abstract: The SoccerNet 2023 challenges were the third annual video understanding challenges organized by the SoccerNet team. For this third edition, the challenges were composed of seven vision-based tasks split into three main themes. The first theme, broadcast video understanding, is composed of three high-level tasks related to describing events occurring in the video broadcasts: (1) action spotting, fo…

    Submitted 12 September, 2023; originally announced September 2023.

  38. arXiv:2308.04870  [pdf, other]

    cs.LG math.AT stat.ML

    Decorrelating neurons using persistence

    Authors: Rubén Ballester, Carles Casacuberta, Sergio Escalera

    Abstract: We propose a novel way to improve the generalisation capacity of deep learning models by reducing high correlations between neurons. For this, we present two regularisation terms computed from the weights of a minimum spanning tree of the clique whose vertices are the neurons of a given network (or a sample of those), where weights on edges are correlation dissimilarities. We provide an extensive…

    Submitted 9 August, 2023; originally announced August 2023.

    Comments: 15 pages, 4 figures

    MSC Class: 55N31; 68T07 ACM Class: I.2.6

  39. arXiv:2308.04657  [pdf, other]

    cs.CV

    Which Tokens to Use? Investigating Token Reduction in Vision Transformers

    Authors: Joakim Bruslund Haurum, Sergio Escalera, Graham W. Taylor, Thomas B. Moeslund

    Abstract: Since the introduction of the Vision Transformer (ViT), researchers have sought to make ViTs more efficient by removing redundant information in the processed tokens. While different methods have been explored to achieve this goal, we still lack understanding of the resulting reduction patterns and how those patterns differ across token reduction methods and datasets. To close this gap, we set out…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: ICCV 2023 NIVT Workshop. Project webpage https://vap.aau.dk/tokens

  40. arXiv:2307.14768  [pdf, other]

    cs.CV

    Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining

    Authors: Benjia Zhou, Zhigang Chen, Albert Clapés, Jun Wan, Yanyan Liang, Sergio Escalera, Zhen Lei, Du Zhang

    Abstract: Sign Language Translation (SLT) is a challenging task due to its cross-domain nature, involving the translation of visual-gestural language to text. Many previous methods employ an intermediate representation, i.e., gloss sequences, to facilitate SLT, thus transforming it into a two-stage task of sign language recognition (SLR) followed by sign language translation (SLT). However, the scarcity of…

    Submitted 27 July, 2023; originally announced July 2023.

    Comments: Accepted to ICCV'23

  41. arXiv:2306.14658  [pdf, other]

    cs.CV cs.AI cs.LG

    Beyond AUROC & co. for evaluating out-of-distribution detection performance

    Authors: Galadrielle Humblot-Renaux, Sergio Escalera, Thomas B. Moeslund

    Abstract: While there has been a growing research interest in developing out-of-distribution (OOD) detection methods, there has been comparably little discussion around how these methods should be evaluated. Given their relevance for safe(r) AI, it is important to examine whether the basis for comparing OOD detection methods is consistent with practical needs. In this work, we take a closer look at the go-t…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: published in SAIAD CVPRW'23 (Safe Artificial Intelligence for All Domains CVPR workshop)

  42. arXiv:2306.06583  [pdf, other]

    cs.CV

    REACT2023: the first Multi-modal Multiple Appropriate Facial Reaction Generation Challenge

    Authors: Siyang Song, Micol Spitale, Cheng Luo, German Barquero, Cristina Palmero, Sergio Escalera, Michel Valstar, Tobias Baur, Fabien Ringeval, Elisabeth Andre, Hatice Gunes

    Abstract: The Multi-modal Multiple Appropriate Facial Reaction Generation Challenge (REACT2023) is the first competition event focused on evaluating multimedia processing and machine learning techniques for generating human-appropriate facial reactions in various dyadic interaction scenarios, with all participants competing strictly under the same conditions. The goal of the challenge is to provide the firs…

    Submitted 11 June, 2023; originally announced June 2023.

    MSC Class: 68T40

  43. arXiv:2304.07580  [pdf, other]

    cs.CV

    Surveillance Face Presentation Attack Detection Challenge

    Authors: Hao Fang, Ajian Liu, Jun Wan, Sergio Escalera, Hugo Jair Escalante, Zhen Lei

    Abstract: Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, most of the studies lacked consideration of long-distance scenarios. Specifically, compared with FAS in traditional scenes such as phone unlocking, face payment, and self-service security inspection, FAS in long-distance such as station squares, parks, and self-service supermarkets are…

    Submitted 15 April, 2023; originally announced April 2023.

    Comments: 8 pages, 7 figures

  44. arXiv:2304.05753  [pdf, other]

    cs.CV cs.AI

    Wild Face Anti-Spoofing Challenge 2023: Benchmark and Results

    Authors: Dong Wang, Jia Guo, Qiqi Shao, Haochi He, Zhian Chen, Chuanbao Xiao, Ajian Liu, Sergio Escalera, Hugo Jair Escalante, Zhen Lei, Jun Wan, Jiankang Deng

    Abstract: Face anti-spoofing (FAS) is an essential mechanism for safeguarding the integrity of automated face recognition systems. Despite substantial advancements, the generalization of existing approaches to real-world applications remains challenging. This limitation can be attributed to the scarcity and lack of diversity in publicly available FAS datasets, which often leads to overfitting during trainin…

    Submitted 4 May, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

    Comments: CVPRW2023

  45. arXiv:2303.08639  [pdf, other]

    cs.CV

    Blowing in the Wind: CycleNet for Human Cinemagraphs from Still Images

    Authors: Hugo Bertiche, Niloy J. Mitra, Kuldeep Kulkarni, Chun-Hao Paul Huang, Tuanfeng Y. Wang, Meysam Madadi, Sergio Escalera, Duygu Ceylan

    Abstract: Cinemagraphs are short looping videos created by adding subtle motions to a static image. This kind of media is popular and engaging. However, automatic generation of cinemagraphs is an underexplored area and current solutions require tedious low-level manual authoring by artists. In this paper, we present an automatic method that allows generating human cinemagraphs from single RGB images. We inv…

    Submitted 15 March, 2023; originally announced March 2023.

  46. arXiv:2302.08909  [pdf, other]

    cs.CV

    Meta-Album: Multi-domain Meta-Dataset for Few-Shot Image Classification

    Authors: Ihsan Ullah, Dustin Carrión-Ojeda, Sergio Escalera, Isabelle Guyon, Mike Huisman, Felix Mohr, Jan N van Rijn, Haozhe Sun, Joaquin Vanschoren, Phan Anh Vu

    Abstract: We introduce Meta-Album, an image classification meta-dataset designed to facilitate few-shot learning, transfer learning, meta-learning, among other tasks. It includes 40 open datasets, each having at least 20 classes with 40 examples per class, with verified licences. They stem from diverse domains, such as ecology (fauna and flora), manufacturing (textures, vehicles), human actions, and optical…

    Submitted 16 February, 2023; originally announced February 2023.

    Journal ref: 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks., NeurIPS, Nov 2022, New Orleans, United States

  47. arXiv:2301.00975  [pdf, other]

    cs.CV

    Surveillance Face Anti-spoofing

    Authors: Hao Fang, Ajian Liu, Jun Wan, Sergio Escalera, Chenxu Zhao, Xu Zhang, Stan Z. Li, Zhen Lei

    Abstract: Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (i.e., phone unlocking) while lacking consideration of long-distance scenes (i.e., surveillance security checks). In order to promote relevant research and fill this gap in the community, we collect a large-scale Surveilla…

    Submitted 3 January, 2023; originally announced January 2023.

    Comments: 15 pages, 9 figures

  48. arXiv:2212.11220  [pdf, other]

    cs.CV cs.GR cs.LG

    Neural Cloth Simulation

    Authors: Hugo Bertiche, Meysam Madadi, Sergio Escalera

    Abstract: We present a general framework for the garment animation problem through unsupervised deep learning inspired in physically based simulation. Existing trends in the literature already explore this possibility. Nonetheless, these approaches do not handle cloth dynamics. Here, we propose the first methodology able to learn realistic cloth dynamics unsupervisedly, and henceforth, a general formulation…

    Submitted 13 December, 2022; originally announced December 2022.

    Journal ref: Neural Cloth Simulation. ACM Trans. Graph. 41, 6, Article 220 (December 2022), 14 pages

  49. arXiv:2212.08568  [pdf, other]

    cs.CV cs.LG

    Biomedical image analysis competitions: The state of current participation practice

    Authors: Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali, Anubha Gupta, Jan Kybic, Alison Noble, Carlos Ortiz de Solórzano, Samiksha Pachade, Caroline Petitjean, Daniel Sage, Donglai Wei, Elizabeth Wilden, Deepak Alapatt, Vincent Andrearczyk, Ujjwal Baid, Spyridon Bakas, Niranjan Balu, Sophia Bano, et al. (331 additional authors not shown)

    Abstract: The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis,…

    Submitted 12 September, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

  50. arXiv:2211.14304  [pdf, other]

    cs.CV

    BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction

    Authors: German Barquero, Sergio Escalera, Cristina Palmero

    Abstract: Stochastic human motion prediction (HMP) has generally been tackled with generative adversarial networks and variational autoencoders. Most prior works aim at predicting highly diverse movements in terms of the skeleton joints' dispersion. This has led to methods predicting fast and motion-divergent movements, which are often unrealistic and incoherent with past motion. Such methods also neglect c…

    Submitted 2 August, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: ICCV 2023 Camera-ready version. Project page: https://barquerogerman.github.io/BeLFusion/

    Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023