
Showing 1–50 of 64 results for author: Brémond, F

  1. arXiv:2506.01373  [pdf, ps, other]

    cs.CV

    No Train Yet Gain: Towards Generic Multi-Object Tracking in Sports and Beyond

    Authors: Tomasz Stanczyk, Seongro Yoon, Francois Bremond

    Abstract: Multi-object tracking (MOT) is essential for sports analytics, enabling performance evaluation and tactical insights. However, tracking in sports is challenging due to fast movements, occlusions, and camera shifts. Traditional tracking-by-detection methods require extensive tuning, while segmentation-based approaches struggle with track processing. We propose McByte, a tracking-by-detection framew…

    Submitted 2 June, 2025; originally announced June 2025.

  2. arXiv:2505.18175  [pdf, ps, other]

    eess.SP cs.AI cs.CV cs.HC cs.LG

    Evaluation in EEG Emotion Recognition: State-of-the-Art Review and Unified Framework

    Authors: Natia Kukhilava, Tatia Tsmindashvili, Rapael Kalandadze, Anchit Gupta, Sofio Katamadze, François Brémond, Laura M. Ferrari, Philipp Müller, Benedikt Emanuel Wirth

    Abstract: Electroencephalography-based Emotion Recognition (EEG-ER) has become a growing research area in recent years. Analyzing 216 papers published between 2018 and 2023, we uncover that the field lacks a unified evaluation protocol, which is essential to fairly define the state of the art, compare new approaches and to track the field's progress. We report the main inconsistencies between the used evalu…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: This work has been submitted to the IEEE for possible publication

  3. arXiv:2505.13123  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Just Dance with $π$! A Poly-modal Inductor for Weakly-supervised Video Anomaly Detection

    Authors: Snehashis Majhi, Giacomo D'Amicantonio, Antitza Dantcheva, Quan Kong, Lorenzo Garattoni, Gianpiero Francesca, Egor Bondarev, Francois Bremond

    Abstract: Weakly-supervised methods for video anomaly detection (VAD) are conventionally based merely on RGB spatio-temporal features, which continues to limit their reliability in real-world scenarios. This is due to the fact that RGB-features are not sufficiently distinctive in setting apart categories such as shoplifting from visually similar events. Therefore, towards robust complex real-world VAD, it i…

    Submitted 19 May, 2025; originally announced May 2025.

  4. arXiv:2502.03459  [pdf, other]

    cs.CV

    SKI Models: Skeleton Induced Vision-Language Embeddings for Understanding Activities of Daily Living

    Authors: Arkaprava Sinha, Dominick Reilly, Francois Bremond, Pu Wang, Srijan Das

    Abstract: The introduction of vision-language models like CLIP has enabled the development of foundational video models capable of generalizing to unseen videos and human actions. However, these models are typically trained on web videos, which often fail to capture the challenges present in Activities of Daily Living (ADL) videos. Existing works address ADL-specific challenges, such as similar appearances,…

    Submitted 5 February, 2025; originally announced February 2025.

  5. arXiv:2502.00654  [pdf, other]

    cs.CV

    EmoTalkingGaussian: Continuous Emotion-conditioned Talking Head Synthesis

    Authors: Junuk Cha, Seongro Yoon, Valeriya Strizhkova, Francois Bremond, Seungryul Baek

    Abstract: 3D Gaussian splatting-based talking head synthesis has recently gained attention for its ability to render high-fidelity images with real-time inference speed. However, since it is typically trained on only a short video that lacks the diversity in facial emotions, the resultant talking heads struggle to represent a wide range of emotions. To address this issue, we propose a lip-aligned emotional…

    Submitted 1 February, 2025; originally announced February 2025.

    Comments: 22 pages

  6. arXiv:2501.03332  [pdf, other]

    cs.CV

    CM3T: Framework for Efficient Multimodal Learning for Inhomogeneous Interaction Datasets

    Authors: Tanay Agrawal, Mohammed Guermal, Michal Balazia, Francois Bremond

    Abstract: Challenges in cross-learning involve inhomogeneous or even inadequate amount of training data and lack of resources for retraining large pretrained models. Inspired by transfer learning techniques in NLP, adapters and prefix tuning, this paper presents a new model-agnostic plugin architecture for cross-learning, called CM3T, that adapts transformer-based models to new or missing information. We in…

    Submitted 6 January, 2025; originally announced January 2025.

    Comments: Preprint. Final paper accepted at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Tucson, February, 2025. 10 pages

    MSC Class: 68T05; 68T10 ACM Class: I.5

  7. arXiv:2501.03103  [pdf, other]

    cs.CV

    MVP: Multimodal Emotion Recognition based on Video and Physiological Signals

    Authors: Valeriya Strizhkova, Hadi Kachmar, Hava Chaptoukaev, Raphael Kalandadze, Natia Kukhilava, Tatia Tsmindashvili, Nibras Abo-Alzahab, Maria A. Zuluaga, Michal Balazia, Antitza Dantcheva, François Brémond, Laura Ferrari

    Abstract: Human emotions entail a complex set of behavioral, physiological and cognitive changes. Current state-of-the-art models fuse the behavioral and physiological components using classic machine learning, rather than recent deep learning techniques. We propose to fill this gap, designing the Multimodal for Video and Physio (MVP) architecture, streamlined to fuse video and physiological signals. Differ…

    Submitted 6 January, 2025; originally announced January 2025.

    Comments: Preprint. Final paper accepted at Affective Behavior Analysis in-the-Wild (ABAW) at IEEE/CVF European Conference on Computer Vision (ECCV), Milan, September, 2024. 17 pages

    MSC Class: 68T05; 68T10 ACM Class: I.5

  8. arXiv:2501.02618  [pdf, other]

    cs.CV

    Identifying Surgical Instruments in Pedagogical Cataract Surgery Videos through an Optimized Aggregation Network

    Authors: Sanya Sinha, Michal Balazia, Francois Bremond

    Abstract: Instructional cataract surgery videos are crucial for ophthalmologists and trainees to observe surgical details repeatedly. This paper presents a deep learning model for real-time identification of surgical instruments in these videos, using a custom dataset scraped from open-access sources. Inspired by the architecture of YOLOV9, the model employs a Programmable Gradient Information (PGI) mechani…

    Submitted 5 January, 2025; originally announced January 2025.

    Comments: Preprint. Full paper accepted at the IEEE International Conference on Image Processing Applications and Systems (IPAS), Lyon, France, Jan 2025. 6 pages

    MSC Class: 68T05; 68T10 ACM Class: I.5

  9. Anti-Forgetting Adaptation for Unsupervised Person Re-identification

    Authors: Hao Chen, Francois Bremond, Nicu Sebe, Shiliang Zhang

    Abstract: Regular unsupervised domain adaptive person re-identification (ReID) focuses on adapting a model from a source domain to a fixed target domain. However, an adapted ReID model can hardly retain previously-acquired knowledge and generalize to unseen data. In this paper, we propose a Dual-level Joint Adaptation and Anti-forgetting (DJAA) framework, which incrementally adapts a model to new domains wi…

    Submitted 11 April, 2025; v1 submitted 21 November, 2024; originally announced November 2024.

    Comments: Accepted to TPAMI

  10. arXiv:2411.02065  [pdf, other]

    cs.CV

    AM Flow: Adapters for Temporal Processing in Action Recognition

    Authors: Tanay Agrawal, Abid Ali, Antitza Dantcheva, Francois Bremond

    Abstract: Deep learning models, in particular image models, have recently gained generalisability and robustness. In this work, we propose to exploit such advances in the realm of video classification. Video foundation models suffer from the requirement of extensive pretraining and a large training time. Towards mitigating such limitations,…

    Submitted 4 November, 2024; originally announced November 2024.

  11. arXiv:2410.17149  [pdf, other]

    cs.CV

    Are Visual-Language Models Effective in Action Recognition? A Comparative Study

    Authors: Mahmoud Ali, Di Yang, François Brémond

    Abstract: Current vision-language foundation models, such as CLIP, have recently shown significant improvement in performance across various downstream tasks. However, whether such foundation models significantly improve more complex fine-grained action recognition tasks is still an open question. To answer this question and better find out the future research direction on human behavior analysis in-the-wil…

    Submitted 22 October, 2024; originally announced October 2024.

  12. arXiv:2409.20270  [pdf, other]

    cs.CV

    Loose Social-Interaction Recognition in Real-world Therapy Scenarios

    Authors: Abid Ali, Rui Dai, Ashish Marisetty, Guillaume Astruc, Monique Thonnat, Jean-Marc Odobez, Susanne Thümmler, Francois Bremond

    Abstract: The computer vision community has explored dyadic interactions for atomic actions such as pushing, carrying-object, etc. However, with the advancement in deep learning models, there is a need to explore more complex dyadic situations such as loose interactions. These are interactions where two people perform certain atomic activities to complete a global action irrespective of temporal synchronisa…

    Submitted 30 September, 2024; originally announced September 2024.

    Journal ref: IEEE/CVF Winter Conference on Applications of Computer Vision 2025

  13. arXiv:2409.14220  [pdf, other]

    cs.CV

    Temporally Propagated Masks and Bounding Boxes: Combining the Best of Both Worlds for Multi-Object Tracking

    Authors: Tomasz Stanczyk, Francois Bremond

    Abstract: Multi-object tracking (MOT) involves identifying and consistently tracking objects across video sequences. Traditional tracking-by-detection methods, while effective, often require extensive tuning and lack generalizability. On the other hand, segmentation mask-based methods are more generic but struggle with tracking management, making them unsuitable for MOT. We propose a novel approach, McByte,…

    Submitted 22 November, 2024; v1 submitted 21 September, 2024; originally announced September 2024.

  14. arXiv:2409.04205  [pdf, other]

    cs.CV

    Introducing Gating and Context into Temporal Action Detection

    Authors: Aglind Reka, Diana Laura Borza, Dominick Reilly, Michal Balazia, Francois Bremond

    Abstract: Temporal Action Detection (TAD), the task of localizing and classifying actions in untrimmed video, remains challenging due to action overlaps and variable action durations. Recent findings suggest that TAD performance is dependent on the structural design of transformers rather than on the self-attention mechanism. Building on this insight, we propose a refined feature extraction process through…

    Submitted 6 September, 2024; originally announced September 2024.

    Comments: Accepted for publication at the ECCV 2024 ABAW Workshop

  15. MultiMediate'24: Multi-Domain Engagement Estimation

    Authors: Philipp Müller, Michal Balazia, Tobias Baur, Michael Dietz, Alexander Heimerl, Anna Penzkofer, Dominik Schiller, François Brémond, Jan Alexandersson, Elisabeth André, Andreas Bulling

    Abstract: Estimating the momentary level of participant's engagement is an important prerequisite for assistive systems that support human interactions. Previous work has addressed this task in within-domain evaluation scenarios, i.e. training and testing on the same dataset. This is in contrast to real-life scenarios where domain shifts between training and testing data frequently occur. With MultiMediate'…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2308.08256

  16. arXiv:2408.05562  [pdf, other]

    cs.CV

    What Matters in Autonomous Driving Anomaly Detection: A Weakly Supervised Horizon

    Authors: Utkarsh Tiwari, Snehashis Majhi, Michal Balazia, François Brémond

    Abstract: Video anomaly detection (VAD) in autonomous driving scenario is an important task, however it involves several challenges due to the ego-centric views and moving camera. Due to this, it remains largely under-explored. While recent developments in weakly-supervised VAD methods have shown remarkable progress in detecting critical real-world anomalies in static camera scenario, the development and va…

    Submitted 10 August, 2024; originally announced August 2024.

  17. arXiv:2407.09159  [pdf, other]

    cs.CV

    Weakly-supervised Autism Severity Assessment in Long Videos

    Authors: Abid Ali, Mahmoud Ali, Jean-Marc Odobez, Camilla Barbini, Séverine Dubuisson, Francois Bremond, Susanne Thümmler

    Abstract: Autism Spectrum Disorder (ASD) is a diverse collection of neurobiological conditions marked by challenges in social communication and reciprocal interactions, as well as repetitive and stereotypical behaviors. Atypical behavior patterns in a long, untrimmed video can serve as biomarkers for children with ASD. In this paper, we propose a video-based weakly-supervised method that takes spatio-tempor…

    Submitted 12 July, 2024; originally announced July 2024.

    Journal ref: https://cbmi2024.org/

  18. arXiv:2406.09390  [pdf, other]

    cs.CV cs.LG

    LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living

    Authors: Dominick Reilly, Rajatsubhra Chakraborty, Arkaprava Sinha, Manish Kumar Govind, Pu Wang, Francois Bremond, Le Xue, Srijan Das

    Abstract: Current Large Language Vision Models (LLVMs) trained on web videos perform well in general video understanding but struggle with fine-grained details, complex human-object interactions (HOI), and view-invariant representation learning essential for Activities of Daily Living (ADL). This limitation stems from a lack of specialized ADL video instruction-tuning datasets and insufficient modality inte…

    Submitted 25 March, 2025; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: CVPR 2025 Camera Ready

  19. arXiv:2311.02432  [pdf, other]

    cs.CV

    P-Age: Pexels Dataset for Robust Spatio-Temporal Apparent Age Classification

    Authors: Abid Ali, Ashish Marisetty, Francois Bremond

    Abstract: Age estimation is a challenging task that has numerous applications. In this paper, we propose a new direction for age classification that utilizes a video-based model to address challenges such as occlusions, low-resolution, and lighting conditions. To address these challenges, we propose AgeFormer which utilizes spatio-temporal information on the dynamics of the entire body dominating face-based…

    Submitted 4 November, 2023; originally announced November 2023.

    Journal ref: WACV 2024

  20. arXiv:2309.06130  [pdf, other]

    cs.CV cs.AI

    JOADAA: joint online action detection and action anticipation

    Authors: Mohammed Guermal, Francois Bremond, Rui Dai, Abid Ali

    Abstract: Action anticipation involves forecasting future actions by connecting past events to future ones. However, this reasoning ignores the real-life hierarchy of events which is considered to be composed of three main parts: past, present, and future. We argue that considering these three main parts and their dependencies could improve performance. On the other hand, online action detection is the task…

    Submitted 12 September, 2023; originally announced September 2023.

  21. arXiv:2309.00696  [pdf, other]

    cs.CV

    AAN: Attributes-Aware Network for Temporal Action Detection

    Authors: Rui Dai, Srijan Das, Michael S. Ryoo, Francois Bremond

    Abstract: The challenge of long-term video understanding remains constrained by the efficient extraction of object semantics and the modelling of their relationships for downstream tasks. Although the CLIP visual features exhibit discriminative properties for various vision tasks, particularly in object encoding, they are suboptimal for long-term video understanding. To address this issue, we present the At…

    Submitted 1 September, 2023; originally announced September 2023.

  22. arXiv:2308.14500  [pdf, other]

    cs.CV

    LAC: Latent Action Composition for Skeleton-based Action Segmentation

    Authors: Di Yang, Yaohui Wang, Antitza Dantcheva, Quan Kong, Lorenzo Garattoni, Gianpiero Francesca, Francois Bremond

    Abstract: Skeleton-based action segmentation requires recognizing composable actions in untrimmed videos. Current approaches decouple this problem by first extracting local visual features from skeleton sequences and then processing them by a temporal model to classify frame-wise actions. However, their performances remain limited as the visual features cannot sufficiently express composable actions. In thi…

    Submitted 21 February, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  23. MultiMediate'23: Engagement Estimation and Bodily Behaviour Recognition in Social Interactions

    Authors: Philipp Müller, Michal Balazia, Tobias Baur, Michael Dietz, Alexander Heimerl, Dominik Schiller, Mohammed Guermal, Dominike Thomas, François Brémond, Jan Alexandersson, Elisabeth André, Andreas Bulling

    Abstract: Automatic analysis of human behaviour is a fundamental prerequisite for the creation of machines that can effectively interact with- and support humans in social interactions. In MultiMediate'23, we address two key human social behaviour analysis tasks for the first time in a controlled challenge: engagement estimation and bodily behaviour recognition in social interactions. This paper describes t…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: ACM MultiMedia'23

  24. arXiv:2305.06437  [pdf, other]

    cs.CV cs.AI

    Self-Supervised Video Representation Learning via Latent Time Navigation

    Authors: Di Yang, Yaohui Wang, Quan Kong, Antitza Dantcheva, Lorenzo Garattoni, Gianpiero Francesca, Francois Bremond

    Abstract: Self-supervised video representation learning aimed at maximizing similarity between different temporal segments of one video, in order to enforce feature persistence over time. This leads to loss of pertinent information related to temporal relationships, rendering actions such as 'enter' and 'leave' to be indistinguishable. To mitigate this limitation, we propose Latent Time Navigation (LTN), a…

    Submitted 10 May, 2023; originally announced May 2023.

    Comments: AAAI 2023

  25. arXiv:2301.07923  [pdf]

    cs.CV

    Human-Scene Network: A Novel Baseline with Self-rectifying Loss for Weakly supervised Video Anomaly Detection

    Authors: Snehashis Majhi, Rui Dai, Quan Kong, Lorenzo Garattoni, Gianpiero Francesca, Francois Bremond

    Abstract: Video anomaly detection in surveillance systems with only video-level labels (i.e. weakly-supervised) is challenging. This is due to, (i) the complex integration of human and scene based anomalies comprising of subtle and sharp spatio-temporal cues in real-world scenarios, (ii) non-optimal optimization between normal and anomaly instances under weak supervision. In this paper, we propose a Human-S…

    Submitted 19 January, 2023; originally announced January 2023.

  26. Learning Invariance from Generated Variance for Unsupervised Person Re-identification

    Authors: Hao Chen, Yaohui Wang, Benoit Lagadec, Antitza Dantcheva, Francois Bremond

    Abstract: This work focuses on unsupervised representation learning in person re-identification (ReID). Recent self-supervised contrastive learning methods learn invariance by maximizing the representation similarity between two augmented views of a same image. However, traditional data augmentation may bring to the fore undesirable distortions on identity features, which is not always favorable in id-sensi…

    Submitted 2 January, 2023; originally announced January 2023.

    Comments: Extension of conference paper arXiv:2012.09071. Accepted to TPAMI. Project page: https://github.com/chenhao2345/GCL-extended

  27. arXiv:2212.03968  [pdf, other]

    cs.CV

    Multimodal Vision Transformers with Forced Attention for Behavior Analysis

    Authors: Tanay Agrawal, Michal Balazia, Philipp Müller, François Brémond

    Abstract: Human behavior understanding requires looking at minute details in the large context of a scene containing multiple input modalities. It is necessary as it allows the design of more human-like machines. While transformer approaches have shown great improvements, they face multiple challenges such as lack of data or background noise. To tackle these, we introduce the Forced Attention (FAt) Transfor…

    Submitted 7 December, 2022; originally announced December 2022.

    Comments: Preprint. Full paper accepted at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, USA, Jan 2023. 11 pages

    MSC Class: 68T05; 68T10 ACM Class: I.5

  28. arXiv:2209.00065  [pdf, other]

    cs.CV

    ViA: View-invariant Skeleton Action Representation Learning via Motion Retargeting

    Authors: Di Yang, Yaohui Wang, Antitza Dantcheva, Lorenzo Garattoni, Gianpiero Francesca, Francois Bremond

    Abstract: Current self-supervised approaches for skeleton action representation learning often focus on constrained scenarios, where videos and skeleton data are recorded in laboratory settings. When dealing with estimated skeleton data in real-world videos, such methods perform poorly due to the large variations across subjects and camera viewpoints. To address this issue, we introduce ViA, a novel View-In…

    Submitted 31 August, 2022; originally announced September 2022.

    Comments: project website: https://walker-a11y.github.io/ViA-project

  29. arXiv:2208.09191  [pdf, other]

    cs.CV

    Synthetic Data in Human Analysis: A Survey

    Authors: Indu Joshi, Marcel Grimmer, Christian Rathgeb, Christoph Busch, Francois Bremond, Antitza Dantcheva

    Abstract: Deep neural networks have become prevalent in human analysis, boosting the performance of applications, such as biometric recognition, action recognition, as well as person re-identification. However, the performance of such networks scales with the available training data. In human analysis, the demand for large-scale datasets poses a severe challenge, as data collection is tedious, time-expensiv…

    Submitted 19 August, 2022; originally announced August 2022.

  30. Bodily Behaviors in Social Interaction: Novel Annotations and State-of-the-Art Evaluation

    Authors: Michal Balazia, Philipp Müller, Ákos Levente Tánczos, August von Liechtenstein, François Brémond

    Abstract: Body language is an eye-catching social signal and its automatic analysis can significantly advance artificial intelligence systems to understand and actively participate in social interactions. While computer vision has made impressive progress in low-level tasks like head and body pose estimation, the detection of more subtle behaviors such as gesturing, grooming, or fumbling is not well explore…

    Submitted 7 December, 2022; v1 submitted 26 July, 2022; originally announced July 2022.

    Comments: Preprint. Full paper accepted at the ACM International Conference on Multimedia (ACMMM), Lisbon, Portugal, October 2022. 10 pages

    MSC Class: 68T05; 68T10 ACM Class: I.5

  31. arXiv:2204.09468  [pdf, other]

    cs.CV

    THORN: Temporal Human-Object Relation Network for Action Recognition

    Authors: Mohammed Guermal, Rui Dai, Francois Bremond

    Abstract: Most action recognition models treat human activities as unitary events. However, human activities often follow a certain hierarchy. In fact, many human activities are compositional. Also, these actions are mostly human-object interactions. In this paper we propose to recognize human action by leveraging the set of interactions that define an action. In this work, we present an end-to-end network:…

    Submitted 20 April, 2022; originally announced April 2022.

  32. arXiv:2203.09043  [pdf, other]

    cs.CV

    Latent Image Animator: Learning to Animate Images via Latent Space Navigation

    Authors: Yaohui Wang, Di Yang, Francois Bremond, Antitza Dantcheva

    Abstract: Due to the remarkable progress of deep generative models, animating images has become increasingly efficient, whereas associated results have become increasingly realistic. Current animation-approaches commonly exploit structure representation extracted from driving videos. Such structure representation is instrumental in transferring motion from driving videos to still images. However, such appro…

    Submitted 16 March, 2022; originally announced March 2022.

    Comments: ICLR 2022, project link https://wyhsirius.github.io/LIA-project

  33. arXiv:2203.06468  [pdf, other]

    cs.CV

    Unsupervised Lifelong Person Re-identification via Contrastive Rehearsal

    Authors: Hao Chen, Benoit Lagadec, Francois Bremond

    Abstract: Existing unsupervised person re-identification (ReID) methods focus on adapting a model trained on a source domain to a fixed target domain. However, an adapted ReID model usually only works well on a certain target domain, but can hardly memorize the source domain knowledge and generalize to upcoming unseen data. In this paper, we propose unsupervised lifelong person ReID, which focuses on contin…

    Submitted 12 March, 2022; originally announced March 2022.

  34. Multimodal Personality Recognition using Cross-Attention Transformer and Behaviour Encoding

    Authors: Tanay Agrawal, Dhruv Agarwal, Michal Balazia, Neelabh Sinha, Francois Bremond

    Abstract: Personality computing and affective computing have gained recent interest in many research areas. The datasets for the task generally have multiple modalities like video, audio, language and bio-signals. In this paper, we propose a flexible model for the task which exploits all available data. The task involves complex relations and to avoid using a large model for video processing specifically, w…

    Submitted 12 January, 2023; v1 submitted 22 December, 2021; originally announced December 2021.

    Comments: Preprint. Final paper accepted at the 17th International Conference on Computer Vision Theory and Applications (VISAPP), virtual, February, 2022. 8 pages

    MSC Class: 68T05; 68T10 ACM Class: I.5

  35. arXiv:2112.03902  [pdf, other]

    cs.CV

    MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection

    Authors: Rui Dai, Srijan Das, Kumara Kahatapitiya, Michael S. Ryoo, Francois Bremond

    Abstract: Action detection is an essential and challenging task, especially for densely labelled datasets of untrimmed videos. The temporal relation is complex in those datasets, including challenges like composite action, and co-occurring action. For detecting actions in those complex videos, efficiently capturing both short-term and long-term temporal information in the video is critical. To this end, we…

    Submitted 29 March, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

    Comments: Accepted in CVPR 2022

  36. arXiv:2110.13473  [pdf, other]

    cs.CV cs.AI

    CTRN: Class-Temporal Relational Network for Action Detection

    Authors: Rui Dai, Srijan Das, Francois Bremond

    Abstract: Action detection is an essential and challenging task, especially for densely labelled datasets of untrimmed videos. There are many real-world challenges in those datasets, such as composite action, co-occurring action, and high temporal variation of instance duration. For handling these challenges, we propose to explore both the class and temporal relations of detected actions. In this work, we i…

    Submitted 11 July, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

  37. arXiv:2110.08270  [pdf, other]

    cs.LG cs.CL

    From Multimodal to Unimodal Attention in Transformers using Knowledge Distillation

    Authors: Dhruv Agarwal, Tanay Agrawal, Laura M. Ferrari, François Bremond

    Abstract: Multimodal Deep Learning has garnered much interest, and transformers have triggered novel approaches, thanks to the cross-attention mechanism. Here we propose an approach to deal with two key existing challenges: the high computational resource demanded and the issue of missing modalities. We introduce for the first time the concept of knowledge distillation in transformers to use only one modali…

    Submitted 19 October, 2021; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: Preprint. Final paper accepted at the 17th IEEE International Conference on Advanced Video and Signal-based Surveillance, AVSS 2021, Virtual, November 16-19, 2021. 10 pages

  38. FLAME: Facial Landmark Heatmap Activated Multimodal Gaze Estimation

    Authors: Neelabh Sinha, Michal Balazia, Francois Bremond

    Abstract: 3D gaze estimation is about predicting the line of sight of a person in 3D space. Person-independent models for the same lack precision due to anatomical differences of subjects, whereas person-specific calibrated techniques add strict constraints on scalability. To overcome these issues, we propose a novel technique, Facial Landmark Heatmap Activated Multimodal Gaze Estimation (FLAME), as a way o…

    Submitted 7 December, 2022; v1 submitted 10 October, 2021; originally announced October 2021.

    Comments: Preprint. Final paper accepted at the 17th IEEE International Conference on Advanced Video and Signal-based Surveillance (AVSS), virtual, November 2021. 8 pages

    MSC Class: 68T05; 68T10 ACM Class: I.5

  39. arXiv:2108.08996  [pdf, other]

    cs.CV cs.AI

    Weakly-supervised Joint Anomaly Detection and Classification

    Authors: Snehashis Majhi, Srijan Das, Francois Bremond, Ratnakar Dash, Pankaj Kumar Sa

    Abstract: Anomalous activities such as robbery, explosions, and accidents need immediate action to prevent loss of human life and property in real-world surveillance systems. Although recent automated surveillance systems are capable of detecting anomalies, they still need human effort for categorizing the anomalies and taking necessary preventive actions. This is due to the lack of met…

    Submitted 20 August, 2021; originally announced August 2021.

    Comments: Provisionally accepted in the first round of FG 2021

  40. arXiv:2108.03619  [pdf]

    cs.CV

    Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection

    Authors: Rui Dai, Srijan Das, Francois Bremond

    Abstract: In video understanding, most cross-modal knowledge distillation (KD) methods are tailored for classification tasks, focusing on the discriminative representation of the trimmed videos. However, action detection requires not only categorizing actions, but also localizing them in untrimmed videos. Therefore, transferring knowledge pertaining to temporal relations is critical for this task which is m…

    Submitted 8 August, 2021; originally announced August 2021.

  41. arXiv:2107.08580  [pdf, other]

    cs.CV

    UNIK: A Unified Framework for Real-world Skeleton-based Action Recognition

    Authors: Di Yang, Yaohui Wang, Antitza Dantcheva, Lorenzo Garattoni, Gianpiero Francesca, Francois Bremond

    Abstract: Action recognition based on skeleton data has recently witnessed increasing attention and progress. State-of-the-art approaches adopting Graph Convolutional networks (GCNs) can effectively extract features on human skeletons relying on the pre-defined human topology. Despite associated progress, GCN-based methods have difficulties to generalize across domains, especially with different human topol…

    Submitted 18 July, 2021; originally announced July 2021.

    Comments: Code is available at: https://github.com/YangDi666/UNIK

  42. arXiv:2105.08141  [pdf, other]

    cs.CV cs.AI

    VPN++: Rethinking Video-Pose embeddings for understanding Activities of Daily Living

    Authors: Srijan Das, Rui Dai, Di Yang, Francois Bremond

    Abstract: Many attempts have been made towards combining RGB and 3D poses for the recognition of Activities of Daily Living (ADL). ADL may look very similar and often necessitate to model fine-grained details to distinguish them. Because the recent 3D ConvNets are too rigid to capture the subtle visual patterns across an action, this research direction is dominated by methods combining RGB and 3D Poses. But…

    Submitted 17 May, 2021; originally announced May 2021.

    Comments: submitted to a journal

  43. arXiv:2104.04546  [pdf, other]

    eess.SP cs.LG stat.AP

    One-class Autoencoder Approach for Optimal Electrode Set-up Identification in Wearable EEG Event Monitoring

    Authors: Laura M. Ferrari, Guy Abi Hanna, Paolo Volpe, Esma Ismailova, François Bremond, Maria A. Zuluaga

    Abstract: A limiting factor towards the wide routine use of wearables devices for continuous healthcare monitoring is their cumbersome and obtrusive nature. This is particularly true for electroencephalography (EEG) recordings, which require the placement of multiple electrodes in contact with the scalp. In this work, we propose to identify the optimal wearable EEG electrode set-up, in terms of minimal numb…

    Submitted 19 May, 2021; v1 submitted 9 April, 2021; originally announced April 2021.

  44. arXiv:2103.16364  [pdf, other]

    cs.CV

    ICE: Inter-instance Contrastive Encoding for Unsupervised Person Re-identification

    Authors: Hao Chen, Benoit Lagadec, Francois Bremond

    Abstract: Unsupervised person re-identification (ReID) aims at learning discriminative identity features without annotations. Recently, self-supervised contrastive learning has gained increasing attention for its effectiveness in unsupervised representation learning. The main idea of instance contrastive learning is to match a same instance in different augmented views. However, the relationship between dif…

    Submitted 18 August, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: ICCV 2021

  45. How Unique Is a Face: An Investigative Study

    Authors: Michal Balazia, S L Happy, Francois Bremond, Antitza Dantcheva

    Abstract: Face recognition has been widely accepted as a means of identification in applications ranging from border control to security in the banking sector. Surprisingly, while widely accepted, we still lack the understanding of uniqueness or distinctiveness of faces as biometric modality. In this work, we study the impact of factors such as image resolution, feature representation, database size, age an…

    Submitted 7 December, 2022; v1 submitted 9 February, 2021; originally announced February 2021.

    Comments: Preprint. Full paper accepted at the IEEE/IAPR International Conference on Pattern Recognition (ICPR), Milan, Italy, January 2021. 6 pages

    MSC Class: 68T05; 68T10 ACM Class: I.5

  46. arXiv:2101.03049  [pdf, other]

    cs.CV

    InMoDeGAN: Interpretable Motion Decomposition Generative Adversarial Network for Video Generation

    Authors: Yaohui Wang, Francois Bremond, Antitza Dantcheva

    Abstract: In this work, we introduce an unconditional video generative model, InMoDeGAN, targeted to (a) generate high quality videos, as well as to (b) allow for interpretation of the latent space. For the latter, we place emphasis on interpreting and manipulating motion. Towards this, we decompose motion into semantic sub-spaces, which allow for control of generated samples. We design the architecture of…

    Submitted 8 January, 2021; originally announced January 2021.

    Comments: Please visit https://wyhsirius.github.io/InMoDeGAN/ for introductions and more

  47. arXiv:2012.09071  [pdf, other]

    cs.CV

    Joint Generative and Contrastive Learning for Unsupervised Person Re-identification

    Authors: Hao Chen, Yaohui Wang, Benoit Lagadec, Antitza Dantcheva, Francois Bremond

    Abstract: Recent self-supervised contrastive learning provides an effective approach for unsupervised person re-identification (ReID) by learning invariance from different views (transformed versions) of an input. In this paper, we incorporate a Generative Adversarial Network (GAN) and a contrastive learning module into one joint training framework. While the GAN provides online data augmentation for contra…

    Submitted 30 March, 2021; v1 submitted 16 December, 2020; originally announced December 2020.

    Comments: CVPR 2021. Source code: https://github.com/chenhao2345/GCL

  48. arXiv:2011.13776  [pdf, other]

    cs.CV

    Enhancing Diversity in Teacher-Student Networks via Asymmetric branches for Unsupervised Person Re-identification

    Authors: Hao Chen, Benoit Lagadec, Francois Bremond

    Abstract: The objective of unsupervised person re-identification (Re-ID) is to learn discriminative features without labor-intensive identity annotations. State-of-the-art unsupervised Re-ID methods assign pseudo labels to unlabeled images in the target domain and learn from these noisy pseudo labels. Recently introduced Mean Teacher Model is a promising way to mitigate the label noise. However, during the…

    Submitted 27 November, 2020; originally announced November 2020.

    Comments: WACV 2021

  49. arXiv:2011.05358  [pdf, other]

    cs.CV

    Selective Spatio-Temporal Aggregation Based Pose Refinement System: Towards Understanding Human Activities in Real-World Videos

    Authors: Di Yang, Rui Dai, Yaohui Wang, Rupayan Mallick, Luca Minciullo, Gianpiero Francesca, Francois Bremond

    Abstract: Taking advantage of human pose data for understanding human activities has attracted much attention these days. However, state-of-the-art pose estimators struggle in obtaining high-quality 2D or 3D pose data due to occlusion, truncation and low-resolution in real-world un-annotated videos. Hence, in this work, we propose 1) a Selective Spatio-Temporal Aggregation mechanism, named SST-A, that refin…

    Submitted 10 November, 2020; originally announced November 2020.

    Comments: WACV2021

  50. arXiv:2010.14982  [pdf]

    cs.CV

    Toyota Smarthome Untrimmed: Real-World Untrimmed Videos for Activity Detection

    Authors: Rui Dai, Srijan Das, Saurav Sharma, Luca Minciullo, Lorenzo Garattoni, Francois Bremond, Gianpiero Francesca

    Abstract: Designing activity detection systems that can be successfully deployed in daily-living environments requires datasets that pose the challenges typical of real-world scenarios. In this paper, we introduce a new untrimmed daily-living dataset that features several real-world challenges: Toyota Smarthome Untrimmed (TSU). TSU contains a wide variety of activities performed in a spontaneous manner. The…

    Submitted 10 June, 2022; v1 submitted 28 October, 2020; originally announced October 2020.

    Comments: Toyota Smarthome Untrimmed dataset, project page: https://project.inria.fr/toyotasmarthome