
Showing 1–46 of 46 results for author: Furnari, A

  1. arXiv:2506.16450

    cs.CV

    How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering?

    Authors: Giuseppe Lando, Rosario Forte, Giovanni Maria Farinella, Antonino Furnari

    Abstract: We investigate whether off-the-shelf Multimodal Large Language Models (MLLMs) can tackle Online Episodic-Memory Video Question Answering (OEM-VQA) without additional training. Our pipeline converts a streaming egocentric video into a lightweight textual memory, only a few kilobytes per minute, via an MLLM descriptor module, and answers multiple-choice questions by querying this memory with an LLM…

    Submitted 19 June, 2025; originally announced June 2025.

  2. arXiv:2506.05787

    cs.CV

    EASG-Bench: Video Q&A Benchmark with Egocentric Action Scene Graphs

    Authors: Ivan Rodin, Tz-Ying Wu, Kyle Min, Sharath Nittur Sridhar, Antonino Furnari, Subarna Tripathi, Giovanni Maria Farinella

    Abstract: We introduce EASG-Bench, a question-answering benchmark for egocentric videos where the question-answering pairs are created from spatio-temporally grounded dynamic scene graphs capturing intricate relationships among actors, actions, and objects. We propose a systematic evaluation framework and evaluate several language-only and video large language models (video-LLMs) on this benchmark. We obser…

    Submitted 6 June, 2025; originally announced June 2025.

  3. arXiv:2502.17753

    cs.CV

    Task Graph Maximum Likelihood Estimation for Procedural Activity Understanding in Egocentric Videos

    Authors: Luigi Seminara, Giovanni Maria Farinella, Antonino Furnari

    Abstract: We introduce a gradient-based approach for learning task graphs from procedural activities, improving over hand-crafted methods. Our method directly optimizes edge weights via maximum likelihood, enabling integration into neural architectures. We validate our approach on CaptainCook4D, EgoPER, and EgoProceL, achieving +14.5%, +10.2%, and +13.6% F1-score improvements. Our feature-based approach for…

    Submitted 26 February, 2025; v1 submitted 24 February, 2025; originally announced February 2025.

    Comments: arXiv admin note: text overlap with arXiv:2406.01486

  4. arXiv:2411.16934

    cs.CV

    Online Episodic Memory Visual Query Localization with Egocentric Streaming Object Memory

    Authors: Zaira Manigrasso, Matteo Dunnhofer, Antonino Furnari, Moritz Nottebaum, Antonio Finocchiaro, Davide Marana, Giovanni Maria Farinella, Christian Micheloni

    Abstract: Episodic memory retrieval aims to enable wearable devices with the ability to recollect from past video observations objects or events that have been observed (e.g., "where did I last see my smartphone?"). Despite the clear relevance of the task for a wide range of assistive systems, current task formulations are based on the "offline" assumption that the full video history can be accessed when th…

    Submitted 25 November, 2024; originally announced November 2024.

  5. arXiv:2411.15557

    cs.CV cs.AI cs.LG

    LAGUNA: LAnguage Guided UNsupervised Adaptation with structured spaces

    Authors: Anxhelo Diko, Antonino Furnari, Luigi Cinque, Giovanni Maria Farinella

    Abstract: Unsupervised domain adaptation remains a critical challenge in enabling the knowledge transfer of models across unseen domains. Existing methods struggle to balance the need for domain-invariant representations with preserving domain-specific features, which is often due to alignment approaches that impose the projection of samples with similar semantics close in the latent space despite their dra…

    Submitted 27 March, 2025; v1 submitted 23 November, 2024; originally announced November 2024.

  6. arXiv:2411.02570

    cs.CV

    TI-PREGO: Chain of Thought and In-Context Learning for Online Mistake Detection in PRocedural EGOcentric Videos

    Authors: Leonardo Plini, Luca Scofano, Edoardo De Matteis, Guido Maria D'Amely di Melendugno, Alessandro Flaborea, Andrea Sanchietti, Giovanni Maria Farinella, Fabio Galasso, Antonino Furnari

    Abstract: Identifying procedural errors online from egocentric videos is a critical yet challenging task across various domains, including manufacturing, healthcare, and skill-based training. The nature of such mistakes is inherently open-set, as unforeseen or novel errors may occur, necessitating robust detection systems that do not rely on prior examples of failure. Currently, however, no technique effect…

    Submitted 4 July, 2025; v1 submitted 4 November, 2024; originally announced November 2024.

  7. arXiv:2406.08379

    cs.CV

    Gazing Into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities

    Authors: Michele Mazzamuto, Antonino Furnari, Yoichi Sato, Giovanni Maria Farinella

    Abstract: We address the challenge of unsupervised mistake detection in egocentric video of skilled human activities through the analysis of gaze signals. While traditional methods rely on manually labeled mistakes, our approach does not require mistake annotations, hence overcoming the need of domain-specific labeled data. Based on the observation that eye movements closely follow object manipulation activ…

    Submitted 25 November, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  8. arXiv:2406.01486

    cs.CV

    Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos

    Authors: Luigi Seminara, Giovanni Maria Farinella, Antonino Furnari

    Abstract: Procedural activities are sequences of key-steps aimed at achieving specific goals. They are crucial to build intelligent agents able to assist users effectively. In this context, task graphs have emerged as a human-understandable representation of procedural activities, encoding a partial ordering over the key-steps. While previous works generally relied on hand-crafted procedures to extract task…

    Submitted 9 January, 2025; v1 submitted 3 June, 2024; originally announced June 2024.

  9. arXiv:2406.01194

    cs.CV

    AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation

    Authors: Lorenzo Mur-Labadia, Ruben Martinez-Cantin, Josechu Guerrero, Giovanni Maria Farinella, Antonino Furnari

    Abstract: Short-Term object-interaction Anticipation consists of detecting the location of the next-active objects, the noun and verb categories of the interaction, and the time to contact from the observation of egocentric video. This ability is fundamental for wearable assistants or human robot interaction to understand the user goals, but there is still room for improvement to perform STA in a precise an…

    Submitted 5 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  10. arXiv:2404.01933

    cs.CV

    PREGO: online mistake detection in PRocedural EGOcentric videos

    Authors: Alessandro Flaborea, Guido Maria D'Amely di Melendugno, Leonardo Plini, Luca Scofano, Edoardo De Matteis, Antonino Furnari, Giovanni Maria Farinella, Fabio Galasso

    Abstract: Promptly identifying procedural errors from egocentric videos in an online setting is highly challenging and valuable for detecting mistakes as soon as they happen. This capability has a wide range of applications across various fields, such as manufacturing and healthcare. The nature of procedural mistakes is open-set since novel types of failures might occur, which calls for one-class classifier…

    Submitted 17 May, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024

  11. arXiv:2312.03391

    cs.CV

    Action Scene Graphs for Long-Form Understanding of Egocentric Videos

    Authors: Ivan Rodin, Antonino Furnari, Kyle Min, Subarna Tripathi, Giovanni Maria Farinella

    Abstract: We present Egocentric Action Scene Graphs (EASGs), a new representation for long-form understanding of egocentric videos. EASGs extend standard manually-annotated representations of egocentric videos, such as verb-noun action labels, by providing a temporally evolving graph-based description of the actions performed by the camera wearer, including interacted objects, their relationships, and how a…

    Submitted 6 December, 2023; originally announced December 2023.

  12. arXiv:2312.02672

    cs.CV

    Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection?

    Authors: Rosario Leonardi, Antonino Furnari, Francesco Ragusa, Giovanni Maria Farinella

    Abstract: In this study, we investigate the effectiveness of synthetic data in enhancing egocentric hand-object interaction detection. Via extensive experiments and comparative analyses on three egocentric datasets, VISOR, EgoHOS, and ENIGMA-51, our findings reveal how to exploit synthetic data for the HOI detection task when real labeled data are scarce or unavailable. Specifically, by leveraging only 10%…

    Submitted 16 July, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

  13. arXiv:2312.02638

    cs.CV

    Synchronization is All You Need: Exocentric-to-Egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs

    Authors: Camillo Quattrocchi, Antonino Furnari, Daniele Di Mauro, Mario Valerio Giuffrida, Giovanni Maria Farinella

    Abstract: We consider the problem of transferring a temporal action segmentation system initially designed for exocentric (fixed) cameras to an egocentric scenario, where wearable cameras capture video data. The conventional supervised approach requires the collection and labeling of a new set of egocentric videos to adapt the model, which is costly and time-consuming. Instead, we propose a novel methodolog…

    Submitted 16 July, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

  14. arXiv:2311.18259

    cs.CV cs.AI

    Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

    Authors: Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, et al. (76 additional authors not shown)

    Abstract: We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from…

    Submitted 25 September, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Expanded manuscript (compared to arxiv v1 from Nov 2023 and CVPR 2024 paper from June 2024) for more comprehensive dataset and benchmark presentation, plus new results on v2 data release

  15. arXiv:2309.14809

    cs.CV

    ENIGMA-51: Towards a Fine-Grained Understanding of Human-Object Interactions in Industrial Scenarios

    Authors: Francesco Ragusa, Rosario Leonardi, Michele Mazzamuto, Claudia Bonanno, Rosario Scavo, Antonino Furnari, Giovanni Maria Farinella

    Abstract: ENIGMA-51 is a new egocentric dataset acquired in an industrial scenario by 19 subjects who followed instructions to complete the repair of electrical boards using industrial tools (e.g., electric screwdriver) and equipments (e.g., oscilloscope). The 51 egocentric video sequences are densely annotated with a rich set of labels that enable the systematic study of human behavior in the industrial do…

    Submitted 27 November, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

  16. arXiv:2308.07123

    cs.CV

    An Outlook into the Future of Egocentric Vision

    Authors: Chiara Plizzari, Gabriele Goletto, Antonino Furnari, Siddhant Bansal, Francesco Ragusa, Giovanni Maria Farinella, Dima Damen, Tatiana Tommasi

    Abstract: What will the future be? We wonder! In this survey, we explore the gap between current research in egocentric vision and the ever-anticipated future, where wearable computing, with outward facing cameras and digital overlays, is expected to be integrated in our every day lives. To understand this gap, the article starts by envisaging the future through character-based stories, showcasing through e…

    Submitted 7 February, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: We invite comments, suggestions and corrections here: https://openreview.net/forum?id=V3974SUk1w

  17. Streaming egocentric action anticipation: An evaluation scheme and approach

    Authors: Antonino Furnari, Giovanni Maria Farinella

    Abstract: Egocentric action anticipation aims to predict the future actions the camera wearer will perform from the observation of the past. While predictions about the future should be available before the predicted events take place, most approaches do not pay attention to the computational time required to make such predictions. As a result, current evaluation schemes assume that predictions are availabl…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: Published in Computer Vision and Image Understanding, 2023. arXiv admin note: text overlap with arXiv:2110.05386

  18. arXiv:2306.12152

    cs.CV

    Exploiting Multimodal Synthetic Data for Egocentric Human-Object Interaction Detection in an Industrial Scenario

    Authors: Rosario Leonardi, Francesco Ragusa, Antonino Furnari, Giovanni Maria Farinella

    Abstract: In this paper, we tackle the problem of Egocentric Human-Object Interaction (EHOI) detection in an industrial setting. To overcome the lack of public datasets in this context, we propose a pipeline and a tool for generating synthetic images of EHOIs paired with several annotations and data signals (e.g., depth maps or segmentation masks). Using the proposed pipeline, we present EgoISM-HOI a new mu…

    Submitted 11 March, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

  19. arXiv:2304.03959

    cs.CV

    StillFast: An End-to-End Approach for Short-Term Object Interaction Anticipation

    Authors: Francesco Ragusa, Giovanni Maria Farinella, Antonino Furnari

    Abstract: Anticipation problem has been studied considering different aspects such as predicting humans' locations, predicting hands and objects trajectories, and forecasting actions and human-object interactions. In this paper, we studied the short-term object interaction anticipation problem from the egocentric point of view, proposing a new end-to-end architecture named StillFast. Our approach simultaneo…

    Submitted 18 March, 2024; v1 submitted 8 April, 2023; originally announced April 2023.

  20. A Multi Camera Unsupervised Domain Adaptation Pipeline for Object Detection in Cultural Sites through Adversarial Learning and Self-Training

    Authors: Giovanni Pasqualino, Antonino Furnari, Giovanni Maria Farinella

    Abstract: Object detection algorithms allow to enable many interesting applications which can be implemented in different devices, such as smartphones and wearable devices. In the context of a cultural site, implementing these algorithms in a wearable device, such as a pair of smart glasses, allow to enable the use of augmented reality (AR) to show extra information about the artworks and enrich the visitor…

    Submitted 3 October, 2022; originally announced October 2022.

  21. Visual Object Tracking in First Person Vision

    Authors: Matteo Dunnhofer, Antonino Furnari, Giovanni Maria Farinella, Christian Micheloni

    Abstract: The understanding of human-object interactions is fundamental in First Person Vision (FPV). Visual tracking algorithms which follow the objects manipulated by the camera wearer can provide useful information to effectively model such interactions. In the last years, the computer vision community has significantly improved the performance of tracking algorithms for a large variety of target objects…

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: International Journal of Computer Vision (IJCV). arXiv admin note: substantial text overlap with arXiv:2108.13665

  22. MECCANO: A Multimodal Egocentric Dataset for Humans Behavior Understanding in the Industrial-like Domain

    Authors: Francesco Ragusa, Antonino Furnari, Giovanni Maria Farinella

    Abstract: Wearable cameras allow to acquire images and videos from the user's perspective. These data can be processed to understand humans behavior. Despite human behavior analysis has been thoroughly investigated in third person vision, it is still understudied in egocentric settings and in particular in industrial scenarios. To encourage research in this field, we present MECCANO, a multimodal dataset of…

    Submitted 18 September, 2022; originally announced September 2022.

    Comments: arXiv admin note: text overlap with arXiv:2010.05654

    Journal ref: Computer Vision and Image Understanding 2023

  23. arXiv:2204.07090

    cs.CV

    Weakly Supervised Attended Object Detection Using Gaze Data as Annotations

    Authors: Michele Mazzamuto, Francesco Ragusa, Antonino Furnari, Giovanni Signorello, Giovanni Maria Farinella

    Abstract: We consider the problem of detecting and recognizing the objects observed by visitors (i.e., attended objects) in cultural sites from egocentric vision. A standard approach to the problem involves detecting all objects and selecting the one which best overlaps with the gaze of the visitor, measured through a gaze tracker. Since labeling large amounts of data to train a standard object detector is…

    Submitted 14 April, 2022; originally announced April 2022.

  24. arXiv:2204.07069

    cs.CV

    Panoptic Segmentation using Synthetic and Real Data

    Authors: Camillo Quattrocchi, Daniele Di Mauro, Antonino Furnari, Giovanni Maria Farinella

    Abstract: Being able to understand the relations between the user and the surrounding environment is instrumental to assist users in a worksite. For instance, understanding which objects a user is interacting with from images and video collected through a wearable device can be useful to inform the worker on the usage of specific objects in order to improve productivity and prevent accidents. Despite modern…

    Submitted 14 April, 2022; originally announced April 2022.

  25. arXiv:2204.07061

    cs.CV

    Egocentric Human-Object Interaction Detection Exploiting Synthetic Data

    Authors: Rosario Leonardi, Francesco Ragusa, Antonino Furnari, Giovanni Maria Farinella

    Abstract: We consider the problem of detecting Egocentric Human-Object Interactions (EHOIs) in industrial contexts. Since collecting and labeling large amounts of real images is challenging, we propose a pipeline and a tool to generate photo-realistic synthetic First Person Vision (FPV) images automatically labeled for EHOI detection in a specific industrial scenario. To tackle the problem of EHOI detection,…

    Submitted 14 April, 2022; originally announced April 2022.

  26. arXiv:2202.04132

    cs.CV

    Untrimmed Action Anticipation

    Authors: Ivan Rodin, Antonino Furnari, Dimitrios Mavroeidis, Giovanni Maria Farinella

    Abstract: Egocentric action anticipation consists in predicting a future action the camera wearer will perform from egocentric video. While the task has recently attracted the attention of the research community, current approaches assume that the input videos are "trimmed", meaning that a short video sequence is sampled a fixed time before the beginning of the action. We argue that, despite the recent adva…

    Submitted 8 February, 2022; originally announced February 2022.

  27. arXiv:2202.01069

    cs.RO cs.CV

    Image-based Navigation in Real-World Environments via Multiple Mid-level Representations: Fusion Models, Benchmark and Efficient Evaluation

    Authors: Marco Rosano, Antonino Furnari, Luigi Gulino, Corrado Santoro, Giovanni Maria Farinella

    Abstract: Navigating complex indoor environments requires a deep understanding of the space the robotic agent is acting into to correctly inform the navigation process of the agent towards the goal location. In recent learning-based navigation approaches, the scene understanding and navigation abilities of the agent are achieved simultaneously by collecting the required experience in simulation. Unfortunate…

    Submitted 4 October, 2023; v1 submitted 2 February, 2022; originally announced February 2022.

    Comments: Paper accepted for submission in Autonomous Robots

  28. arXiv:2110.07058

    cs.CV cs.AI

    Ego4D: Around the World in 3,000 Hours of Egocentric Video

    Authors: Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, et al. (60 additional authors not shown)

    Abstract: We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards with cons…

    Submitted 11 March, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: To appear in the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. This version updates the baseline result numbers for the Hands and Objects benchmark (appendix)

  29. arXiv:2110.05386

    cs.CV

    Towards Streaming Egocentric Action Anticipation

    Authors: Antonino Furnari, Giovanni Maria Farinella

    Abstract: Egocentric action anticipation is the task of predicting the future actions a camera wearer will likely perform based on past video observations. While in a real-world system it is fundamental to output such predictions before the action begins, past works have not generally paid attention to model runtime during evaluation. Indeed, current evaluation schemes assume that predictions can be made of…

    Submitted 10 May, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: Accepted to the 26th International Conference on Pattern Recognition (ICPR 2022)

  30. Is First Person Vision Challenging for Object Tracking?

    Authors: Matteo Dunnhofer, Antonino Furnari, Giovanni Maria Farinella, Christian Micheloni

    Abstract: Understanding human-object interactions is fundamental in First Person Vision (FPV). Tracking algorithms which follow the objects manipulated by the camera wearer can provide useful cues to effectively model such interactions. Visual tracking solutions available in the computer vision literature have significantly improved their performance in the last years for a large variety of target objects a…

    Submitted 31 August, 2021; originally announced August 2021.

    Comments: IEEE/CVF International Conference on Computer Vision (ICCV) 2021, Visual Object Tracking Challenge VOT2021 workshop. arXiv admin note: text overlap with arXiv:2011.12263

  31. arXiv:2107.13411

    cs.CV

    Predicting the Future from First Person (Egocentric) Vision: A Survey

    Authors: Ivan Rodin, Antonino Furnari, Dimitrios Mavroedis, Giovanni Maria Farinella

    Abstract: Egocentric videos can bring a lot of information about how humans perceive the world and interact with the environment, which can be beneficial for the analysis of human behaviour. The research in egocentric video analysis is developing rapidly thanks to the increasing availability of wearable devices and the opportunities offered by new large-scale egocentric datasets. As computer vision techniqu…

    Submitted 28 July, 2021; originally announced July 2021.

    Comments: Computer Vision and Image Understanding, 2021

  32. arXiv:2106.11650

    cs.RO cs.CV

    A Survey on Human-aware Robot Navigation

    Authors: Ronja Möller, Antonino Furnari, Sebastiano Battiato, Aki Härmä, Giovanni Maria Farinella

    Abstract: Intelligent systems are increasingly part of our everyday lives and have been integrated seamlessly to the point where it is difficult to imagine a world without them. Physical manifestations of those systems on the other hand, in the form of embodied agents or robots, have so far been used only for specific applications and are often limited to functional roles (e.g. in the industry, entertainmen…

    Submitted 22 June, 2021; originally announced June 2021.

    Comments: Robotics and Autonomous Systems, 2021

  33. arXiv:2011.12263

    cs.CV

    Is First Person Vision Challenging for Object Tracking?

    Authors: Matteo Dunnhofer, Antonino Furnari, Giovanni Maria Farinella, Christian Micheloni

    Abstract: Understanding human-object interactions is fundamental in First Person Vision (FPV). Tracking algorithms which follow the objects manipulated by the camera wearer can provide useful cues to effectively model such interactions. Despite a few previous attempts to exploit trackers in FPV applications, a methodical analysis of the performance of state-of-the-art visual trackers in this domain is still…

    Submitted 24 September, 2021; v1 submitted 24 November, 2020; originally announced November 2020.

    Comments: Extended Abstract accepted by the EPIC workshop at ICCV 2021. The full version of this paper is available at arXiv:2108.13665

  34. arXiv:2010.13439

    cs.RO cs.CV cs.LG

    On Embodied Visual Navigation in Real Environments Through Habitat

    Authors: Marco Rosano, Antonino Furnari, Luigi Gulino, Giovanni Maria Farinella

    Abstract: Visual navigation models based on deep learning can learn effective policies when trained on large amounts of visual observations through reinforcement learning. Unfortunately, collecting the required experience in the real world requires the deployment of a robotic platform, which is expensive and time-consuming. To deal with this limitation, several simulation platforms have been proposed in ord…

    Submitted 26 October, 2020; originally announced October 2020.

    Comments: Published in International Conference on Pattern Recognition (ICPR), 2020

  35. arXiv:2010.05654

    cs.CV

    The MECCANO Dataset: Understanding Human-Object Interactions from Egocentric Videos in an Industrial-like Domain

    Authors: Francesco Ragusa, Antonino Furnari, Salvatore Livatino, Giovanni Maria Farinella

    Abstract: Wearable cameras allow to collect images and videos of humans interacting with the world. While human-object interactions have been thoroughly investigated in third person vision, the problem has been understudied in egocentric settings and in industrial scenarios. To fill this gap, we introduce MECCANO, the first dataset of egocentric videos to study human-object interactions in industrial-like s…

    Submitted 12 October, 2020; originally announced October 2020.

  36. arXiv:2008.01882

    cs.CV

    An Unsupervised Domain Adaptation Scheme for Single-Stage Artwork Recognition in Cultural Sites

    Authors: Giovanni Pasqualino, Antonino Furnari, Giovanni Signorello, Giovanni Maria Farinella

    Abstract: Recognizing artworks in a cultural site using images acquired from the user's point of view (First Person Vision) allows to build interesting applications for both the visitors and the site managers. However, current object detection algorithms working in fully supervised settings need to be trained with large quantities of labeled data, whose collection requires a lot of times and high costs in o…

    Submitted 21 December, 2020; v1 submitted 4 August, 2020; originally announced August 2020.

  37. Rescaling Egocentric Vision

    Authors: Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Antonino Furnari, Evangelos Kazakos, Jian Ma, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray

    Abstract: This paper introduces the pipeline to extend the largest dataset in egocentric vision, EPIC-KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M frames, 90K actions in 700 variable-length videos, capturing long-term unscripted activities in 45 environments, using head-mounted cameras. Compared to its previous version, EPIC-KITCHENS-100 has been annotated using a nov…

    Submitted 17 September, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

    Comments: Accepted at the International Journal of Computer Vision (IJCV). Dataset available from: http://epic-kitchens.github.io/

  38. SceneAdapt: Scene-based domain adaptation for semantic segmentation using adversarial learning

    Authors: Daniele Di Mauro, Antonino Furnari, Giuseppe Patanè, Sebastiano Battiato, Giovanni Maria Farinella

    Abstract: Semantic segmentation methods have achieved outstanding performance thanks to deep learning. Nevertheless, when such algorithms are deployed to new contexts not seen during training, it is necessary to collect and label scene-specific data in order to adapt them to the new domain using fine-tuning. This process is required whenever an already installed camera is moved or a new camera is introduced…

    Submitted 18 June, 2020; originally announced June 2020.

    Journal ref: Pattern Recognition Letters, Volume 136, August 2020, Pages 175-182

  39. Rolling-Unrolling LSTMs for Action Anticipation from First-Person Video

    Authors: Antonino Furnari, Giovanni Maria Farinella

    Abstract: In this paper, we tackle the problem of egocentric action anticipation, i.e., predicting what actions the camera wearer will perform in the near future and which objects they will interact with. Specifically, we contribute Rolling-Unrolling LSTM, a learning architecture to anticipate actions from egocentric videos. The method is based on three components: 1) an architecture comprised of two LSTMs…

    Submitted 8 May, 2020; v1 submitted 4 May, 2020; originally announced May 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:1905.09035

    Journal ref: Published in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020

  40. arXiv:2005.00343

    cs.CV

    The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines

    Authors: Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray

    Abstract: Since its introduction in 2018, EPIC-KITCHENS has attracted attention as the largest egocentric video benchmark, offering a unique viewpoint on people's interaction with objects, their attention, and even intention. In this paper, we detail how this large-scale dataset was captured by 32 participants in their native kitchen environments, and densely annotated with actions and object interactions.…

    Submitted 29 April, 2020; originally announced May 2020.

    Comments: Preprint for paper at IEEE TPAMI. arXiv admin note: substantial text overlap with arXiv:1804.02748

  41. arXiv:2004.07711  [pdf, other

    cs.CV cs.LG

    Knowledge Distillation for Action Anticipation via Label Smoothing

    Authors: Guglielmo Camporese, Pasquale Coscia, Antonino Furnari, Giovanni Maria Farinella, Lamberto Ballan

    Abstract: Human capability to anticipate near future from visual observations and non-verbal cues is essential for developing intelligent systems that need to interact with people. Several research areas, such as human-robot interaction (HRI), assisted living or autonomous driving need to foresee future events to avoid crashes or help people. Egocentric scenarios are classic examples where action anticipati…

    Submitted 18 December, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

    Comments: Accepted to ICPR 2020

  42. EGO-CH: Dataset and Fundamental Tasks for Visitors Behavioral Understanding using Egocentric Vision

    Authors: Francesco Ragusa, Antonino Furnari, Sebastiano Battiato, Giovanni Signorello, Giovanni Maria Farinella

    Abstract: Equipping visitors of a cultural site with a wearable device allows to easily collect information about their preferences which can be exploited to improve the fruition of cultural goods with augmented reality. Moreover, egocentric video can be processed using computer vision and machine learning to enable an automated analysis of visitors' behavior. The inferred information can be used both onlin…

    Submitted 3 February, 2020; originally announced February 2020.

    Journal ref: Pattern Recognition Letters 2020

  43. arXiv:1905.09035  [pdf, other

    cs.CV cs.AI

    What Would You Expect? Anticipating Egocentric Actions with Rolling-Unrolling LSTMs and Modality Attention

    Authors: Antonino Furnari, Giovanni Maria Farinella

    Abstract: Egocentric action anticipation consists in understanding which objects the camera wearer will interact with in the near future and which actions they will perform. We tackle the problem proposing an architecture able to anticipate actions at multiple temporal scales using two LSTMs to 1) summarize the past, and 2) formulate predictions about the future. The input video is processed considering thr…

    Submitted 5 August, 2019; v1 submitted 22 May, 2019; originally announced May 2019.

    Comments: Accepted as oral to ICCV (International Conference on Computer Vision) 2019

  44. arXiv:1904.05264  [pdf, other

    cs.CV

    Egocentric Visitors Localization in Cultural Sites

    Authors: Francesco Ragusa, Antonino Furnari, Sebastiano Battiato, Giovanni Signorello, Giovanni Maria Farinella

    Abstract: We consider the problem of localizing visitors in a cultural site from egocentric (first person) images. Localization information can be useful both to assist the user during his visit (e.g., by suggesting where to go and what to see next) and to provide behavioral information to the manager of the cultural site (e.g., how much time has been spent by visitors at a given location? What has been lik…

    Submitted 10 April, 2019; originally announced April 2019.

    Comments: To appear in ACM Journal on Computing and Cultural Heritage (JOCCH), 2019

    Journal ref: ACM Journal on Computing and Cultural Heritage (JOCCH), 2019

  45. Next-Active-Object prediction from Egocentric Videos

    Authors: Antonino Furnari, Sebastiano Battiato, Kristen Grauman, Giovanni Maria Farinella

    Abstract: Although First Person Vision systems can sense the environment from the user's perspective, they are generally unable to predict his intentions and goals. Since human activities can be decomposed in terms of atomic actions and interactions with objects, intelligent wearable systems would benefit from the ability to anticipate user-object interactions. Even if this task is not trivial, the First Pe…

    Submitted 10 April, 2019; originally announced April 2019.

    Journal ref: Journal of Visual Communication and Image Representation, Volume 49, 2017, Pages 401-411, ISSN 1047-3203

  46. arXiv:1804.02748  [pdf, other

    cs.CV

    Scaling Egocentric Vision: The EPIC-KITCHENS Dataset

    Authors: Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray

    Abstract: First-person vision is gaining interest as it offers a unique viewpoint on people's interaction with objects, their attention, and even intention. However, progress in this challenging domain has been relatively slow due to the lack of sufficiently large datasets. In this paper, we introduce EPIC-KITCHENS, a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen…

    Submitted 31 July, 2018; v1 submitted 8 April, 2018; originally announced April 2018.

    Comments: European Conference on Computer Vision (ECCV) 2018. Dataset and project page: http://epic-kitchens.github.io