Showing 1–13 of 13 results for author: Plizzari, C

  1. arXiv:2506.08553

    cs.CV

    From Pixels to Graphs: using Scene and Knowledge Graphs for HD-EPIC VQA Challenge

    Authors: Agnese Taluzzi, Davide Gesualdi, Riccardo Santambrogio, Chiara Plizzari, Francesca Palermo, Simone Mentasti, Matteo Matteucci

    Abstract: This report presents SceneNet and KnowledgeNet, our approaches developed for the HD-EPIC VQA Challenge 2025. SceneNet leverages scene graphs generated with a multi-modal large language model (MLLM) to capture fine-grained object interactions, spatial relationships, and temporally grounded events. In parallel, KnowledgeNet incorporates ConceptNet's external commonsense knowledge to introduce high-l…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: Technical report for the HD-EPIC VQA Challenge 2025 (1st place)

  2. arXiv:2503.13646

    cs.CV

    Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos

    Authors: Chiara Plizzari, Alessio Tonioni, Yongqin Xian, Achin Kulshrestha, Federico Tombari

    Abstract: Understanding fine-grained temporal dynamics is crucial in egocentric videos, where continuous streams capture frequent, close-up interactions with objects. In this work, we bring to light that current egocentric video question-answering datasets often include questions that can be answered using only few frames or commonsense reasoning, without being necessarily grounded in the actual video. Our…

    Submitted 17 March, 2025; originally announced March 2025.

    Comments: Accepted to CVPR 2025. Dataset and code are available at https://github.com/google-research-datasets/egotempo.git

  3. arXiv:2404.05072

    cs.CV

    Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind

    Authors: Chiara Plizzari, Shubham Goel, Toby Perrett, Jacob Chalk, Angjoo Kanazawa, Dima Damen

    Abstract: As humans move around, performing their daily tasks, they are able to recall where they have positioned objects in their environment, even if these objects are currently out of their sight. In this paper, we aim to mimic this spatial cognition ability. We thus formulate the task of Out of Sight, Not Out of Mind - 3D tracking active objects using observations captured through an egocentric camera.…

    Submitted 21 January, 2025; v1 submitted 7 April, 2024; originally announced April 2024.

    Comments: Accepted at 3DV 2025. 14 pages including references and appendix. Project Webpage: http://dimadamen.github.io/OSNOM/

  4. arXiv:2308.07123

    cs.CV

    An Outlook into the Future of Egocentric Vision

    Authors: Chiara Plizzari, Gabriele Goletto, Antonino Furnari, Siddhant Bansal, Francesco Ragusa, Giovanni Maria Farinella, Dima Damen, Tatiana Tommasi

    Abstract: What will the future be? We wonder! In this survey, we explore the gap between current research in egocentric vision and the ever-anticipated future, where wearable computing, with outward facing cameras and digital overlays, is expected to be integrated in our every day lives. To understand this gap, the article starts by envisaging the future through character-based stories, showcasing through e…

    Submitted 7 February, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: We invite comments, suggestions and corrections here: https://openreview.net/forum?id=V3974SUk1w

  5. arXiv:2307.12837

    cs.CV cs.AI cs.RO

    EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge: Mixed Sequences Prediction

    Authors: Amirshayan Nasirimajd, Simone Alberto Peirone, Chiara Plizzari, Barbara Caputo

    Abstract: This report presents the technical details of our approach for the EPIC-Kitchens-100 Unsupervised Domain Adaptation (UDA) Challenge in Action Recognition. Our approach is based on the idea that the order in which actions are performed is similar between the source and target domains. Based on this, we generate a modified sequence by randomly combining actions from the source and target domains. As…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: 2nd place in the 2023 EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition

  6. arXiv:2306.08713

    cs.CV

    What can a cook in Italy teach a mechanic in India? Action Recognition Generalisation Over Scenarios and Locations

    Authors: Chiara Plizzari, Toby Perrett, Barbara Caputo, Dima Damen

    Abstract: We propose and address a new generalisation problem: can a model trained for action recognition successfully classify actions when they are performed within a previously unseen scenario and in a previously unseen location? To answer this question, we introduce the Action Recognition Generalisation Over scenarios and locations dataset (ARGO1M), which contains 1.1M video clips from the large-scale E…

    Submitted 24 August, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: Accepted at ICCV 2023. Project page: https://chiaraplizz.github.io/what-can-a-cook/

  7. arXiv:2112.03596

    cs.CV

    E$^2$(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition

    Authors: Chiara Plizzari, Mirco Planamente, Gabriele Goletto, Marco Cannici, Emanuele Gusso, Matteo Matteucci, Barbara Caputo

    Abstract: Event cameras are novel bio-inspired sensors, which asynchronously capture pixel-level intensity changes in the form of "events". Due to their sensing mechanism, event cameras have little to no motion blur, a very high temporal resolution and require significantly less power and memory than traditional frame-based cameras. These characteristics make them a perfect fit to several real-world applica…

    Submitted 3 April, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

    Comments: To be presented at CVPR 2022

  8. arXiv:2110.10101

    cs.CV

    Domain Generalization through Audio-Visual Relative Norm Alignment in First Person Action Recognition

    Authors: Mirco Planamente, Chiara Plizzari, Emanuele Alberti, Barbara Caputo

    Abstract: First person action recognition is becoming an increasingly researched area thanks to the rising popularity of wearable cameras. This is bringing to light cross-domain issues that are yet to be addressed in this context. Indeed, the information extracted from learned representations suffers from an intrinsic "environmental bias". This strongly affects the ability to generalize to unseen scenarios,…

    Submitted 19 October, 2021; originally announced October 2021.

    Comments: Accepted at WACV 2022. arXiv admin note: substantial text overlap with arXiv:2106.01689

  9. arXiv:2107.00337

    cs.CV

    PoliTO-IIT Submission to the EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition

    Authors: Chiara Plizzari, Mirco Planamente, Emanuele Alberti, Barbara Caputo

    Abstract: In this report, we describe the technical details of our submission to the EPIC-Kitchens-100 Unsupervised Domain Adaptation (UDA) Challenge in Action Recognition. To tackle the domain-shift which exists under the UDA setting, we first exploited a recent Domain Generalization (DG) technique, called Relative Norm Alignment (RNA). It consists in designing a model able to generalize well to any unseen…

    Submitted 1 July, 2021; originally announced July 2021.

    Comments: 3rd place in the 2021 EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition

  10. arXiv:2106.01689

    cs.CV

    Cross-Domain First Person Audio-Visual Action Recognition through Relative Norm Alignment

    Authors: Mirco Planamente, Chiara Plizzari, Emanuele Alberti, Barbara Caputo

    Abstract: First person action recognition is an increasingly researched topic because of the growing popularity of wearable cameras. This is bringing to light cross-domain issues that are yet to be addressed in this context. Indeed, the information extracted from learned representations suffers from an intrinsic environmental bias. This strongly affects the ability to generalize to unseen scenarios, limitin…

    Submitted 3 June, 2021; originally announced June 2021.

    Comments: 11 pages, 7 figures

  11. arXiv:2103.12768

    cs.CV

    DA4Event: towards bridging the Sim-to-Real Gap for Event Cameras using Domain Adaptation

    Authors: Mirco Planamente, Chiara Plizzari, Marco Cannici, Marco Ciccone, Francesco Strada, Andrea Bottino, Matteo Matteucci, Barbara Caputo

    Abstract: Event cameras are novel bio-inspired sensors, which asynchronously capture pixel-level intensity changes in the form of "events". The innovative way they acquire data presents several advantages over standard devices, especially in poor lighting and high-speed motion conditions. However, the novelty of these sensors results in the lack of a large amount of training data capable of fully unlocking…

    Submitted 29 October, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

    Comments: Accepted at IROS 2021

  12. Spatial Temporal Transformer Network for Skeleton-based Action Recognition

    Authors: Chiara Plizzari, Marco Cannici, Matteo Matteucci

    Abstract: Skeleton-based human action recognition has achieved a great interest in recent years, as skeleton data has been demonstrated to be robust to illumination changes, body scales, dynamic camera views, and complex background. Nevertheless, an effective encoding of the latent information underlying the 3D skeleton is still an open problem. In this work, we propose a novel Spatial-Temporal Transformer…

    Submitted 11 December, 2020; originally announced December 2020.

    Comments: Accepted as ICPRW2020 (FBE2020, Workshop on Facial and Body Expressions, micro-expressions and behavior recognition) 8 pages, 2 figures. arXiv admin note: substantial text overlap with arXiv:2008.07404

    Journal ref: Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, Springer, vol 12663, 694-701, ISBN: 978-3-030-68796-0

  13. Skeleton-based Action Recognition via Spatial and Temporal Transformer Networks

    Authors: Chiara Plizzari, Marco Cannici, Matteo Matteucci

    Abstract: Skeleton-based Human Activity Recognition has achieved great interest in recent years as skeleton data has demonstrated being robust to illumination changes, body scales, dynamic camera views, and complex background. In particular, Spatial-Temporal Graph Convolutional Networks (ST-GCN) demonstrated to be effective in learning both spatial and temporal dependencies on non-Euclidean data such as ske…

    Submitted 22 June, 2021; v1 submitted 17 August, 2020; originally announced August 2020.

    Comments: Accepted at Computer Vision and Image Understanding (CVIU) 12 pages, 8 figures

    Journal ref: Computer Vision and Image Understanding, Volumes 208-209 (2021), 103219, ISSN 1077-3142