
Showing 1–8 of 8 results for author: Cartillier, V

Searching in archive cs.
  1. arXiv:2404.11419  [pdf, other]

    cs.CV

    SLAIM: Robust Dense Neural SLAM for Online Tracking and Mapping

    Authors: Vincent Cartillier, Grant Schindler, Irfan Essa

    Abstract: We present SLAIM - Simultaneous Localization and Implicit Mapping. We propose a novel coarse-to-fine tracking model tailored for Neural Radiance Field SLAM (NeRF-SLAM) to achieve state-of-the-art tracking performance. Notably, existing NeRF-SLAM systems consistently exhibit inferior tracking performance compared to traditional SLAM algorithms. NeRF-SLAM methods solve camera tracking via image alig…

    Submitted 17 April, 2024; originally announced April 2024.

  2. arXiv:2403.13190  [pdf, other]

    cs.CV

    3D Semantic MapNet: Building Maps for Multi-Object Re-Identification in 3D

    Authors: Vincent Cartillier, Neha Jain, Irfan Essa

    Abstract: We study the task of 3D multi-object re-identification from embodied tours. Specifically, an agent is given two tours of an environment (e.g. an apartment) under two different layouts (e.g. arrangements of furniture). Its task is to detect and re-identify objects in 3D - e.g. a "sofa" moved from location A to B, a new "chair" in the second layout at location C, or a "lamp" from location D in the f…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: 8 pages

  3. arXiv:2205.01652  [pdf, other]

    cs.CV cs.AI

    Episodic Memory Question Answering

    Authors: Samyak Datta, Sameer Dharur, Vincent Cartillier, Ruta Desai, Mukul Khanna, Dhruv Batra, Devi Parikh

    Abstract: Egocentric augmented reality devices such as wearable glasses passively capture visual data as a human wearer tours a home environment. We envision a scenario wherein the human communicates with an AI agent powering such a device by asking questions (e.g., where did you last see my keys?). In order to succeed at this task, the egocentric AI assistant must (1) construct semantically rich and effici…

    Submitted 3 May, 2022; originally announced May 2022.

    Comments: Published at CVPR 2022 (Oral presentation)

  4. arXiv:2110.07058  [pdf, other]

    cs.CV cs.AI

    Ego4D: Around the World in 3,000 Hours of Egocentric Video

    Authors: Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, et al. (60 additional authors not shown)

    Abstract: We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards with cons…

    Submitted 11 March, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: To appear in the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. This version updates the baseline result numbers for the Hands and Objects benchmark (appendix)

  5. arXiv:2010.01191  [pdf, other]

    cs.CV

    Semantic MapNet: Building Allocentric Semantic Maps and Representations from Egocentric Views

    Authors: Vincent Cartillier, Zhile Ren, Neha Jain, Stefan Lee, Irfan Essa, Dhruv Batra

    Abstract: We study the task of semantic mapping - specifically, an embodied agent (a robot or an egocentric AI assistant) is given a tour of a new environment and asked to build an allocentric top-down semantic map ("what is where?") from egocentric observations of an RGB-D camera with known pose (via localization sensors). Towards this goal, we present SemanticMapNet (SMNet), which consists of: (1) an Egoc…

    Submitted 10 March, 2021; v1 submitted 2 October, 2020; originally announced October 2020.

  6. arXiv:1901.09107  [pdf, other]

    cs.CV

    Audio-Visual Scene-Aware Dialog

    Authors: Huda Alamri, Vincent Cartillier, Abhishek Das, Jue Wang, Anoop Cherian, Irfan Essa, Dhruv Batra, Tim K. Marks, Chiori Hori, Peter Anderson, Stefan Lee, Devi Parikh

    Abstract: We introduce the task of scene-aware dialog. Our goal is to generate a complete and natural response to a question about a scene, given video and audio of the scene and the history of previous turns in the dialog. To answer successfully, agents must ground concepts from the question in the video while leveraging contextual cues from the dialog history. To benchmark this task, we introduce the Audi…

    Submitted 8 May, 2019; v1 submitted 25 January, 2019; originally announced January 2019.

  7. arXiv:1806.08409  [pdf, other]

    cs.CL cs.CV cs.SD eess.AS

    End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features

    Authors: Chiori Hori, Huda Alamri, Jue Wang, Gordon Wichern, Takaaki Hori, Anoop Cherian, Tim K. Marks, Vincent Cartillier, Raphael Gontijo Lopes, Abhishek Das, Irfan Essa, Dhruv Batra, Devi Parikh

    Abstract: Dialog systems need to understand dynamic visual scenes in order to have conversations with users about the objects and events around them. Scene-aware dialog systems for real-world applications could be developed by integrating state-of-the-art technologies from multiple research areas, including: end-to-end dialog technologies, which generate system responses using models trained from dialog dat…

    Submitted 29 June, 2018; v1 submitted 21 June, 2018; originally announced June 2018.

    Comments: A prototype system for the Audio Visual Scene-aware Dialog (AVSD) at DSTC7

  8. arXiv:1806.00525  [pdf, other]

    cs.CL cs.CV

    Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7

    Authors: Huda Alamri, Vincent Cartillier, Raphael Gontijo Lopes, Abhishek Das, Jue Wang, Irfan Essa, Dhruv Batra, Devi Parikh, Anoop Cherian, Tim K. Marks, Chiori Hori

    Abstract: Scene-aware dialog systems will be able to have conversations with users about the objects and events around them. Progress on such systems can be made by integrating state-of-the-art technologies from multiple research areas including end-to-end dialog systems, visual dialog, and video description. We introduce the Audio Visual Scene Aware Dialog (AVSD) challenge and dataset. In this challenge, wh…

    Submitted 1 June, 2018; originally announced June 2018.