Showing 1–50 of 65 results for author: Savva, M

Searching in archive cs.
  1. arXiv:2506.08334

    cs.GR cs.CV

    Generalizable Articulated Object Reconstruction from Casually Captured RGBD Videos

    Authors: Weikun Peng, Jun Lv, Cewu Lu, Manolis Savva

    Abstract: Articulated objects are prevalent in daily life. Understanding their kinematic structure and reconstructing them have numerous applications in embodied AI and robotics. However, current methods require carefully captured data for training or inference, preventing practical, scalable, and generalizable reconstruction of articulated objects. We focus on reconstruction of an articulated object from a…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: Project website can be found at https://3dlg-hcvc.github.io/video2articulation/

  2. arXiv:2505.18926

    cs.LG physics.flu-dyn

    Hybrid Neural-MPM for Interactive Fluid Simulations in Real-Time

    Authors: Jingxuan Xu, Hong Huang, Chuhang Zou, Manolis Savva, Yunchao Wei, Wuyang Chen

    Abstract: We propose a neural physics system for real-time, interactive fluid simulations. Traditional physics-based methods, while accurate, are computationally intensive and suffer from latency issues. Recent machine-learning methods reduce computational costs while preserving fidelity; yet most still fail to satisfy the latency constraints for real-time use and lack support for interactive applications.…

    Submitted 24 May, 2025; originally announced May 2025.

  3. arXiv:2504.13159

    cs.CV

    Digital Twin Generation from Visual Data: A Survey

    Authors: Andrew Melnik, Benjamin Alt, Giang Nguyen, Artur Wilkowski, Maciej Stefańczyk, Qirui Wu, Sinan Harms, Helge Rhodin, Manolis Savva, Michael Beetz

    Abstract: This survey explores recent developments in generating digital twins from videos. Such digital twins can be used for robotics applications, media content creation, or design and construction works. We analyze various approaches, including 3D Gaussian Splatting, generative in-painting, semantic segmentation, and foundation models, highlighting their advantages and limitations. Additionally, we discus…

    Submitted 17 April, 2025; originally announced April 2025.

  4. arXiv:2503.16848

    cs.GR cs.CV

    HSM: Hierarchical Scene Motifs for Multi-Scale Indoor Scene Generation

    Authors: Hou In Derek Pun, Hou In Ivan Tam, Austin T. Wang, Xiaoliang Huo, Angel X. Chang, Manolis Savva

    Abstract: Despite advances in indoor 3D scene layout generation, synthesizing scenes with dense object arrangements remains challenging. Existing methods primarily focus on large furniture while neglecting smaller objects, resulting in unrealistically empty scenes. Those that place small objects typically do not honor arrangement specifications, resulting in largely random placement not following the text d…

    Submitted 21 March, 2025; originally announced March 2025.

    Comments: 23 pages, 7 figures

  5. arXiv:2503.14756

    cs.GR cs.CV

    SceneEval: Evaluating Semantic Coherence in Text-Conditioned 3D Indoor Scene Synthesis

    Authors: Hou In Ivan Tam, Hou In Derek Pun, Austin T. Wang, Angel X. Chang, Manolis Savva

    Abstract: Despite recent advances in text-conditioned 3D indoor scene generation, there remain gaps in the evaluation of these methods. Existing metrics primarily assess the realism of generated scenes by comparing them to a set of ground-truth scenes, often overlooking alignment with the input text, a critical factor in determining how effectively a method meets user requirements. We present SceneEval, an…

    Submitted 11 June, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

    Comments: Expanded dataset to 500 annotated scene descriptions with new scene types; added validation via extended manual evaluation and a new user study; clarified distinctions from prior metrics; included results using an open-source VLM; stated intent to release code and data; corrected terminology and typos. 24 pages with 8 figures and 6 tables

  6. arXiv:2503.04496

    cs.GR cs.CV cs.LG

    Learning Object Placement Programs for Indoor Scene Synthesis with Iterative Self Training

    Authors: Adrian Chang, Kai Wang, Yuanbo Li, Manolis Savva, Angel X. Chang, Daniel Ritchie

    Abstract: Data-driven and autoregressive indoor scene synthesis systems generate indoor scenes automatically by suggesting and then placing objects one at a time. Empirical observations show that current systems tend to produce incomplete next object location distributions. We introduce a system which addresses this problem. We design a Domain Specific Language (DSL) that specifies functional constraints. P…

    Submitted 6 March, 2025; originally announced March 2025.

    Comments: 21 pages, 20 figures. Subjects: Graphics (cs.GR), Computer Vision and Pattern Recognition (cs.CV), Machine Learning (cs.LG)

    ACM Class: I.3.6

  7. arXiv:2411.19492

    cs.CV cs.LG

    Diorama: Unleashing Zero-shot Single-view 3D Indoor Scene Modeling

    Authors: Qirui Wu, Denys Iliash, Daniel Ritchie, Manolis Savva, Angel X. Chang

    Abstract: Reconstructing structured 3D scenes from RGB images using CAD objects unlocks efficient and compact scene representations that maintain compositionality and interactability. Existing works propose training-heavy methods relying on either expensive yet inaccurate real-world annotations or controllable yet monotonous synthetic data that do not generalize well to unseen objects or domains. We present…

    Submitted 14 March, 2025; v1 submitted 29 November, 2024; originally announced November 2024.

  8. arXiv:2410.16499

    cs.CV

    SINGAPO: Single Image Controlled Generation of Articulated Parts in Objects

    Authors: Jiayi Liu, Denys Iliash, Angel X. Chang, Manolis Savva, Ali Mahdavi-Amiri

    Abstract: We address the challenge of creating 3D assets for household articulated objects from a single image. Prior work on articulated object creation either requires multi-view multi-state input, or only allows coarse control over the generation process. These limitations hinder the scalability and practicality for articulated object modeling. In this work, we propose a method to generate articulated ob…

    Submitted 19 March, 2025; v1 submitted 21 October, 2024; originally announced October 2024.

    Comments: Project page: https://3dlg-hcvc.github.io/singapo

  9. arXiv:2409.18896

    cs.CV

    S2O: Static to Openable Enhancement for Articulated 3D Objects

    Authors: Denys Iliash, Hanxiao Jiang, Yiming Zhang, Manolis Savva, Angel X. Chang

    Abstract: Despite much progress in large 3D datasets, there are currently few interactive 3D object datasets, and their scale is limited due to the manual effort required in their construction. We introduce the static to openable (S2O) task which creates interactive articulated 3D objects from static counterparts through openable part detection, motion prediction, and interior geometry completion. We formula…

    Submitted 15 March, 2025; v1 submitted 27 September, 2024; originally announced September 2024.

  10. arXiv:2408.02211

    cs.GR

    SceneMotifCoder: Example-driven Visual Program Learning for Generating 3D Object Arrangements

    Authors: Hou In Ivan Tam, Hou In Derek Pun, Austin T. Wang, Angel X. Chang, Manolis Savva

    Abstract: Despite advances in text-to-3D generation methods, generation of multi-object arrangements remains challenging. Current methods exhibit failures in generating physically plausible arrangements that respect the provided text description. We present SceneMotifCoder (SMC), an example-driven framework for generating 3D object arrangements through visual program learning. SMC leverages large language m…

    Submitted 3 June, 2025; v1 submitted 4 August, 2024; originally announced August 2024.

    Comments: Accepted at 3DV 2025 (Oral). Project page: https://3dlg-hcvc.github.io/smc/. Minor revisions for camera-ready version

  11. arXiv:2403.14937

    cs.CV

    Survey on Modeling of Human-made Articulated Objects

    Authors: Jiayi Liu, Manolis Savva, Ali Mahdavi-Amiri

    Abstract: 3D modeling of articulated objects is a research problem within computer vision, graphics, and robotics. Its objective is to understand the shape and motion of the articulated components, represent the geometry and mobility of object parts, and create realistic models that reflect articulated objects in the real world. This survey provides a comprehensive overview of the current state-of-the-art i…

    Submitted 19 March, 2025; v1 submitted 21 March, 2024; originally announced March 2024.

  12. arXiv:2403.13289

    cs.CV

    Text-to-3D Shape Generation

    Authors: Han-Hung Lee, Manolis Savva, Angel X. Chang

    Abstract: Recent years have seen an explosion of work and interest in text-to-3D shape generation. Much of the progress is driven by advances in 3D representations, large-scale pretraining and representation learning for text and image data enabling generative AI models, and differentiable rendering. Computational systems that can perform text-to-3D shape generation have captivated the popular imagination a…

    Submitted 20 March, 2024; originally announced March 2024.

  13. arXiv:2403.12301

    cs.CV

    R3DS: Reality-linked 3D Scenes for Panoramic Scene Understanding

    Authors: Qirui Wu, Sonia Raychaudhuri, Daniel Ritchie, Manolis Savva, Angel X Chang

    Abstract: We introduce the Reality-linked 3D Scenes (R3DS) dataset of synthetic 3D scenes mirroring the real-world scene arrangements from Matterport3D panoramas. Compared to prior work, R3DS has more complete and densely populated scenes with objects linked to real-world observations in panoramas. R3DS also provides an object support hierarchy, and matching object sets (e.g., same chairs around a dining ta…

    Submitted 18 March, 2024; originally announced March 2024.

  14. arXiv:2401.00405

    cs.CV

    Generalizing Single-View 3D Shape Retrieval to Occlusions and Unseen Objects

    Authors: Qirui Wu, Daniel Ritchie, Manolis Savva, Angel X. Chang

    Abstract: Single-view 3D shape retrieval is a challenging task that is increasingly important with the growth of available 3D data. Prior work that has studied this task has not focused on evaluating how realistic occlusions impact performance, and how shape retrieval methods generalize to scenarios where either the target 3D shape database contains unseen shapes, or the input image contains unseen objects.…

    Submitted 31 December, 2023; originally announced January 2024.

  15. arXiv:2312.09570

    cs.CV

    CAGE: Controllable Articulation GEneration

    Authors: Jiayi Liu, Hou In Ivan Tam, Ali Mahdavi-Amiri, Manolis Savva

    Abstract: We address the challenge of generating 3D articulated objects in a controllable fashion. Currently, modeling articulated 3D objects is either achieved through laborious manual authoring, or using methods from prior work that are hard to scale and control directly. We leverage the interplay between part shape, connectivity, and motion using a denoising diffusion-based method with attention modules…

    Submitted 20 March, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: CVPR 2024. Project page: https://3dlg-hcvc.github.io/cage/

  16. arXiv:2311.18259

    cs.CV cs.AI

    Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

    Authors: Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, et al. (76 additional authors not shown)

    Abstract: We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from…

    Submitted 25 September, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Expanded manuscript (compared to arxiv v1 from Nov 2023 and CVPR 2024 paper from June 2024) for more comprehensive dataset and benchmark presentation, plus new results on v2 data release

  17. arXiv:2310.13135

    cs.CV

    LeTFuser: Light-weight End-to-end Transformer-Based Sensor Fusion for Autonomous Driving with Multi-Task Learning

    Authors: Pedram Agand, Mohammad Mahdavian, Manolis Savva, Mo Chen

    Abstract: In end-to-end autonomous driving, the utilization of existing sensor fusion techniques and navigational control methods for imitation learning proves inadequate in challenging situations that involve numerous dynamic agents. To address this issue, we introduce LeTFuser, a lightweight transformer-based algorithm for fusing multiple RGB-D camera representations. To perform perception and control tas…

    Submitted 1 December, 2023; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: 11 pages, 2 figures, 3 tables. CVPR Workshops (VCAD) 2023

  18. arXiv:2308.07391

    cs.CV cs.AI cs.GR

    PARIS: Part-level Reconstruction and Motion Analysis for Articulated Objects

    Authors: Jiayi Liu, Ali Mahdavi-Amiri, Manolis Savva

    Abstract: We address the task of simultaneous part-level reconstruction and motion parameter estimation for articulated objects. Given two sets of multi-view images of an object in two static articulation states, we decouple the movable part from the static part and reconstruct shape and appearance while predicting the motion parameters. To tackle this problem, we present PARIS: a self-supervised, end-to-en…

    Submitted 14 August, 2023; originally announced August 2023.

    Comments: Presented at ICCV 2023. Project website: https://3dlg-hcvc.github.io/paris/

  19. arXiv:2306.11565

    cs.RO cs.AI cs.CV

    HomeRobot: Open-Vocabulary Mobile Manipulation

    Authors: Sriram Yenamandra, Arun Ramachandran, Karmesh Yadav, Austin Wang, Mukul Khanna, Theophile Gervet, Tsung-Yen Yang, Vidhi Jain, Alexander William Clegg, John Turner, Zsolt Kira, Manolis Savva, Angel Chang, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi, Yonatan Bisk, Chris Paxton

    Abstract: HomeRobot (noun): An affordable compliant robot that navigates homes and manipulates a wide range of objects in order to complete everyday tasks. Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object in any unseen environment, and placing it in a commanded location. This is a foundational challenge for robots to be useful assistants in human environments, because it invol…

    Submitted 10 January, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: 37 pages, 22 figures, 8 tables

  20. arXiv:2306.11290

    cs.CV

    Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation

    Authors: Mukul Khanna, Yongsen Mao, Hanxiao Jiang, Sanjay Haresh, Brennan Shacklett, Dhruv Batra, Alexander Clegg, Eric Undersander, Angel X. Chang, Manolis Savva

    Abstract: We contribute the Habitat Synthetic Scene Dataset, a dataset of 211 high-quality 3D scenes, and use it to test navigation agent generalization to realistic 3D environments. Our dataset represents real interiors and contains a diverse set of 18,656 models of real-world objects. We investigate the impact of synthetic 3D scene dataset scale and realism on the task of training embodied agents to find…

    Submitted 7 December, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

  21. arXiv:2305.18557

    cs.CV

    Evaluating 3D Shape Analysis Methods for Robustness to Rotation Invariance

    Authors: Supriya Gadi Patil, Angel X. Chang, Manolis Savva

    Abstract: This paper analyzes the robustness of recent 3D shape descriptors to SO(3) rotations, something that is fundamental to shape modeling. Specifically, we formulate the task of rotated 3D object instance detection. To do so, we consider a database of 3D indoor scenes, where objects occur in different orientations. We benchmark different methods for feature extraction and classification in the context…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: 20th Conference on Robots and Vision (CRV) 2023

  22. arXiv:2304.03696

    cs.RO cs.CV

    MOPA: Modular Object Navigation with PointGoal Agents

    Authors: Sonia Raychaudhuri, Tommaso Campari, Unnat Jain, Manolis Savva, Angel X. Chang

    Abstract: We propose a simple but effective modular approach MOPA (Modular ObjectNav with PointGoal agents) to systematically investigate the inherent modularity of the object navigation task in Embodied AI. MOPA consists of four modules: (a) an object detection module trained to identify objects from RGB images, (b) a map building module to build a semantic map of the observed objects, (c) an exploration m…

    Submitted 27 January, 2024; v1 submitted 7 April, 2023; originally announced April 2023.

  23. arXiv:2304.03188

    cs.GR

    Advances in Data-Driven Analysis and Synthesis of 3D Indoor Scenes

    Authors: Akshay Gadi Patil, Supriya Gadi Patil, Manyi Li, Matthew Fisher, Manolis Savva, Hao Zhang

    Abstract: This report surveys advances in deep learning-based modeling techniques that address four different 3D indoor scene analysis tasks, as well as synthesis of 3D indoor scenes. We describe different kinds of representations for indoor scenes, various indoor scene datasets available for research in the aforementioned areas, and discuss notable works employing machine learning models for such scene mod…

    Submitted 21 August, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

    Comments: Published in Computer Graphics Forum, Aug 2023

  24. arXiv:2303.14087

    cs.CV

    OPDMulti: Openable Part Detection for Multiple Objects

    Authors: Xiaohao Sun, Hanxiao Jiang, Manolis Savva, Angel Xuan Chang

    Abstract: Openable part detection is the task of detecting the openable parts of an object in a single-view image, and predicting corresponding motion parameters. Prior work investigated the unrealistic setting where all input images only contain a single openable object. We generalize this task to scenes with multiple objects each potentially possessing openable parts, and create a corresponding dataset ba…

    Submitted 24 March, 2023; originally announced March 2023.

  25. arXiv:2301.13261

    cs.AI cs.CV cs.LG cs.RO

    Emergence of Maps in the Memories of Blind Navigation Agents

    Authors: Erik Wijmans, Manolis Savva, Irfan Essa, Stefan Lee, Ari S. Morcos, Dhruv Batra

    Abstract: Animal navigation research posits that organisms build and maintain internal spatial representations, or maps, of their environment. We ask if machines -- specifically, artificial intelligence (AI) navigation agents -- also build implicit (or 'mental') maps. A positive answer to this question would (a) explain the surprising phenomenon in recent literature of ostensibly map-free neural-networks ac…

    Submitted 30 January, 2023; originally announced January 2023.

    Comments: Accepted to ICLR 2023

  26. arXiv:2210.06849

    cs.CV

    Retrospectives on the Embodied AI Workshop

    Authors: Matt Deitke, Dhruv Batra, Yonatan Bisk, Tommaso Campari, Angel X. Chang, Devendra Singh Chaplot, Changan Chen, Claudia Pérez D'Arpino, Kiana Ehsani, Ali Farhadi, Li Fei-Fei, Anthony Francis, Chuang Gan, Kristen Grauman, David Hall, Winson Han, Unnat Jain, Aniruddha Kembhavi, Jacob Krantz, Stefan Lee, Chengshu Li, Sagnik Majumder, Oleksandr Maksymets, Roberto Martín-Martín, Roozbeh Mottaghi, et al. (14 additional authors not shown)

    Abstract: We present a retrospective on the state of Embodied AI research. Our analysis focuses on 13 challenges presented at the Embodied AI Workshop at CVPR. These challenges are grouped into three themes: (1) visual navigation, (2) rearrangement, and (3) embodied vision-and-language. We discuss the dominant datasets within each theme, evaluation metrics for the challenges, and the performance of state-of…

    Submitted 4 December, 2022; v1 submitted 13 October, 2022; originally announced October 2022.

  27. arXiv:2210.05633

    cs.CV

    Habitat-Matterport 3D Semantics Dataset

    Authors: Karmesh Yadav, Ram Ramrakhya, Santhosh Kumar Ramakrishnan, Theo Gervet, John Turner, Aaron Gokaslan, Noah Maestre, Angel Xuan Chang, Dhruv Batra, Manolis Savva, Alexander William Clegg, Devendra Singh Chaplot

    Abstract: We present the Habitat-Matterport 3D Semantics (HM3DSEM) dataset. HM3DSEM is the largest dataset of 3D real-world spaces with densely annotated semantics that is currently available to the academic community. It consists of 142,646 object instance annotations across 216 3D spaces and 3,100 rooms within those spaces. The scale, quality, and diversity of object annotations far exceed those of prior…

    Submitted 12 October, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: 15 Pages, 11 Figures, 6 Tables

  28. arXiv:2209.05612

    cs.CV

    Articulated 3D Human-Object Interactions from RGB Videos: An Empirical Analysis of Approaches and Challenges

    Authors: Sanjay Haresh, Xiaohao Sun, Hanxiao Jiang, Angel X. Chang, Manolis Savva

    Abstract: Human-object interactions with articulated objects are common in everyday life. Despite much progress in single-view 3D reconstruction, it is still challenging to infer an articulated 3D object model from an RGB video showing a person manipulating the object. We canonicalize the task of articulated 3D human-object interaction reconstruction from RGB video, and carry out a systematic benchmark of f…

    Submitted 12 September, 2022; originally announced September 2022.

    Comments: 3DV 2022

  29. arXiv:2205.03797

    cs.NI

    Fuzzy-Logic Based IDS for Detecting Jamming Attacks in Wireless Mesh IoT Networks

    Authors: Michael Savva, Iacovos Ioannou, Vasos Vassiliou

    Abstract: The investigation in this paper targets the design and the evaluation of jamming intrusion detection based on Fuzzy Logic in wireless mesh IoT Networks in a distributed manner. Our approach uses information collected at local nodes and from the sink as input to the fuzzy logic controller. In order to find the best set of inputs, distributed or centralized, we made a comparison between five differe…

    Submitted 8 May, 2022; originally announced May 2022.

  30. arXiv:2203.16421

    cs.CV

    OPD: Single-view 3D Openable Part Detection

    Authors: Hanxiao Jiang, Yongsen Mao, Manolis Savva, Angel X. Chang

    Abstract: We address the task of predicting what parts of an object can open and how they move when they do so. The input is a single image of an object, and as output we detect what parts of the object can open, and the motion parameters describing the articulation of each openable part. To tackle this task, we create two datasets of 3D objects: OPDSynth based on existing synthetic objects, and OPDReal bas…

    Submitted 30 March, 2022; originally announced March 2022.

  31. arXiv:2112.07022

    cs.GR cs.CV cs.LG

    Learning Body-Aware 3D Shape Generative Models

    Authors: Bryce Blinn, Alexander Ding, R. Kenny Jones, Manolis Savva, Srinath Sridhar, Daniel Ritchie

    Abstract: The shape of many objects in the built environment is dictated by their relationships to the human body: how will a person interact with this object? Existing data-driven generative models of 3D shapes produce plausible objects but do not reason about the relationship of those objects to the human body. In this paper, we learn body-aware generative models of 3D shapes. Specifically, we train gener…

    Submitted 20 January, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

    Comments: 11 pages, 8 figures

  32. Roominoes: Generating Novel 3D Floor Plans From Existing 3D Rooms

    Authors: Kai Wang, Xianghao Xu, Leon Lei, Selena Ling, Natalie Lindsay, Angel X. Chang, Manolis Savva, Daniel Ritchie

    Abstract: Realistic 3D indoor scene datasets have enabled significant recent progress in computer vision, scene understanding, autonomous navigation, and 3D reconstruction. But the scale, diversity, and customizability of existing datasets are limited, and it is time-consuming and expensive to scan and annotate more. Fortunately, combinatorics is on our side: there are enough individual rooms in existing 3D…

    Submitted 10 December, 2021; originally announced December 2021.

    Comments: Symposium on Geometry Processing (SGP) 2021

    Journal ref: Computer Graphics Forum, 40: 57-69 (2021)

  33. arXiv:2110.05769

    cs.CV cs.AI cs.LG cs.MA

    Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents

    Authors: Shivansh Patel, Saim Wani, Unnat Jain, Alexander Schwing, Svetlana Lazebnik, Manolis Savva, Angel X. Chang

    Abstract: Communication between embodied AI agents has received increasing attention in recent years. Despite its use, it is still unclear whether the learned communication is interpretable and grounded in perception. To study the grounding of emergent forms of communication, we first introduce the collaborative multi-object navigation task CoMON. In this task, an oracle agent has detailed environment infor…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: Project page: https://shivanshpatel35.github.io/comon/ ; the first three authors contributed equally

  34. arXiv:2109.08238

    cs.CV cs.AI

    Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI

    Authors: Santhosh K. Ramakrishnan, Aaron Gokaslan, Erik Wijmans, Oleksandr Maksymets, Alex Clegg, John Turner, Eric Undersander, Wojciech Galuba, Andrew Westbury, Angel X. Chang, Manolis Savva, Yili Zhao, Dhruv Batra

    Abstract: We present the Habitat-Matterport 3D (HM3D) dataset. HM3D is a large-scale dataset of 1,000 building-scale 3D reconstructions from a diverse set of real-world locations. Each scene in the dataset consists of a textured 3D mesh reconstruction of interiors such as multi-floor residences, stores, and other private indoor spaces. HM3D surpasses existing datasets available for academic research in te…

    Submitted 16 September, 2021; originally announced September 2021.

    Comments: 21 pages, 14 figures

  35. arXiv:2108.08420

    cs.CV

    D3D-HOI: Dynamic 3D Human-Object Interactions from Videos

    Authors: Xiang Xu, Hanbyul Joo, Greg Mori, Manolis Savva

    Abstract: We introduce D3D-HOI: a dataset of monocular videos with ground truth annotations of 3D object pose, shape and part motion during human-object interactions. Our dataset consists of several common articulated objects captured from diverse real-world scenes and camera viewpoints. Each manipulated object (e.g., microwave oven) is represented with a matching 3D parametric model. This data allows us to…

    Submitted 18 August, 2021; originally announced August 2021.

  36. arXiv:2106.14405

    cs.LG cs.RO

    Habitat 2.0: Training Home Assistants to Rearrange their Habitat

    Authors: Andrew Szot, Alex Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Chaplot, Oleksandr Maksymets, Aaron Gokaslan, Vladimir Vondrus, Sameer Dharur, Franziska Meier, Wojciech Galuba, Angel Chang, Zsolt Kira, Vladlen Koltun, Jitendra Malik, Manolis Savva, Dhruv Batra

    Abstract: We introduce Habitat 2.0 (H2.0), a simulation platform for training virtual robots in interactive 3D environments and complex physics-enabled scenarios. We make comprehensive contributions to all levels of the embodied AI stack: data, simulation, and benchmark tasks. Specifically, we present: (i) ReplicaCAD: an artist-authored, annotated, reconfigurable 3D dataset of apartments (matching real spa…

    Submitted 1 July, 2022; v1 submitted 28 June, 2021; originally announced June 2021.

  37. arXiv:2106.06629

    cs.CV

    Mirror3D: Depth Refinement for Mirror Surfaces

    Authors: Jiaqi Tan, Weijie Lin, Angel X. Chang, Manolis Savva

    Abstract: Despite recent progress in depth sensing and 3D reconstruction, mirror surfaces are a significant source of errors. To address this problem, we create the Mirror3D dataset: a 3D mirror plane dataset based on three RGBD datasets (Matterport3D, NYUv2 and ScanNet) containing 7,011 mirror instance masks and 3D planes. We then develop Mirror3DNet: a module that refines raw sensor depth or estimated dep…

    Submitted 11 June, 2021; originally announced June 2021.

    Comments: Paper presented at CVPR 2021. For code, data and pretrained models, see https://3dlg-hcvc.github.io/mirror3d/

  38. arXiv:2106.05375

    cs.CV cs.GR

    Plan2Scene: Converting Floorplans to 3D Scenes

    Authors: Madhawa Vidanapathirana, Qirui Wu, Yasutaka Furukawa, Angel X. Chang, Manolis Savva

    Abstract: We address the task of converting a floorplan and a set of associated photos of a residence into a textured 3D mesh model, a task which we call Plan2Scene. Our system 1) lifts a floorplan image to a 3D mesh model; 2) synthesizes surface textures based on the input photos; and 3) infers textures for unobserved surfaces using a graph neural network architecture. To train and evaluate our system we c…

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: This paper is accepted to CVPR 2021. For code, data and pretrained models, see https://3dlg-hcvc.github.io/plan2scene/

  39. arXiv:2103.07013

    cs.LG cs.AI cs.CV cs.GR

    Large Batch Simulation for Deep Reinforcement Learning

    Authors: Brennan Shacklett, Erik Wijmans, Aleksei Petrenko, Manolis Savva, Dhruv Batra, Vladlen Koltun, Kayvon Fatahalian

    Abstract: We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work, realizing end-to-end training speeds of over 19,000 frames of experience per second on a single GPU and up to 72,000 frames per second on a single eight-GPU machine. The key idea of our approach is to design a 3D renderer and embodied navigation simulator around…

    Submitted 11 March, 2021; originally announced March 2021.

    Comments: Published as a conference paper at ICLR 2021

  40. arXiv:2012.06547

    cs.CV cs.IR

    LayoutGMN: Neural Graph Matching for Structural Layout Similarity

    Authors: Akshay Gadi Patil, Manyi Li, Matthew Fisher, Manolis Savva, Hao Zhang

    Abstract: We present a deep neural network to predict structural similarity between 2D layouts by leveraging Graph Matching Networks (GMN). Our network, coined LayoutGMN, learns the layout metric via neural graph matching, using an attention-based GMN designed under a triplet network setting. To train our network, we utilize weak labels obtained by pixel-wise Intersection-over-Union (IoUs) to define the tri…

    Submitted 5 April, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

  41. arXiv:2012.03912  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation

    Authors: Saim Wani, Shivansh Patel, Unnat Jain, Angel X. Chang, Manolis Savva

    Abstract: Navigation tasks in photorealistic 3D environments are challenging because they require perception and effective planning under partial observability. Recent work shows that map-like memory is useful for long-horizon navigation tasks. However, a focused investigation of the impact of maps on navigation tasks of varying complexity has not yet been performed. We propose the multiON task, which requi…

    Submitted 7 December, 2020; originally announced December 2020.

    Comments: Project page: https://shivanshpatel35.github.io/multi-ON/ ; the first three authors contributed equally

  42. arXiv:2011.01975  [pdf, other]

    cs.AI cs.CV cs.LG cs.RO

    Rearrangement: A Challenge for Embodied AI

    Authors: Dhruv Batra, Angel X. Chang, Sonia Chernova, Andrew J. Davison, Jia Deng, Vladlen Koltun, Sergey Levine, Jitendra Malik, Igor Mordatch, Roozbeh Mottaghi, Manolis Savva, Hao Su

    Abstract: We describe a framework for research and evaluation in Embodied AI. Our proposal is based on a canonical task: Rearrangement. A standard task can focus the development of new techniques and serve as a source of trained models that can be transferred to other settings. In the rearrangement task, the goal is to bring a given physical environment into a specified state. The goal state can be specifie…

    Submitted 3 November, 2020; originally announced November 2020.

    Comments: Authors are listed in alphabetical order

  43. arXiv:2007.02919  [pdf, other]

    cs.CV

    MCMI: Multi-Cycle Image Translation with Mutual Information Constraints

    Authors: Xiang Xu, Megha Nawhal, Greg Mori, Manolis Savva

    Abstract: We present a mutual information-based framework for unsupervised image-to-image translation. Our MCMI approach treats single-cycle image translation models as modules that can be used recurrently in a multi-cycle translation setting where the translation process is bounded by mutual information constraints between the input and output images. The proposed mutual information constraints can improve…

    Submitted 6 July, 2020; originally announced July 2020.

  44. arXiv:2006.13171  [pdf, other]

    cs.CV cs.RO

    ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects

    Authors: Dhruv Batra, Aaron Gokaslan, Aniruddha Kembhavi, Oleksandr Maksymets, Roozbeh Mottaghi, Manolis Savva, Alexander Toshev, Erik Wijmans

    Abstract: We revisit the problem of Object-Goal Navigation (ObjectNav). In its simplest form, ObjectNav is defined as the task of navigating to an object, specified by its label, in an unexplored environment. In particular, the agent is initialized at a random location and pose in an environment and asked to find an instance of an object category, e.g., find a chair, by navigating to it. As the community…

    Submitted 30 August, 2020; v1 submitted 23 June, 2020; originally announced June 2020.

  45. arXiv:1912.06321  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Sim2Real Predictivity: Does Evaluation in Simulation Predict Real-World Performance?

    Authors: Abhishek Kadian, Joanne Truong, Aaron Gokaslan, Alexander Clegg, Erik Wijmans, Stefan Lee, Manolis Savva, Sonia Chernova, Dhruv Batra

    Abstract: Does progress in simulation translate to progress on robots? If one method outperforms another in simulation, how likely is that trend to hold in reality on a robot? We examine this question for embodied PointGoal navigation, developing engineering tools and a research paradigm for evaluating a simulator by its sim2real predictivity. First, we develop Habitat-PyRobot Bridge (HaPy), a library for s…

    Submitted 16 August, 2020; v1 submitted 12 December, 2019; originally announced December 2019.

    Journal ref: IEEE Robotics and Automation Letters (RA-L) 2020

  46. arXiv:1911.00357  [pdf, other]

    cs.CV cs.AI cs.LG

    DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames

    Authors: Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra

    Abstract: We present Decentralized Distributed Proximal Policy Optimization (DD-PPO), a method for distributed reinforcement learning in resource-intensive simulated environments. DD-PPO is distributed (uses multiple machines), decentralized (lacks a centralized server), and synchronous (no computation is ever stale), making it conceptually simple and easy to implement. In our experiments on training virtua…

    Submitted 19 January, 2020; v1 submitted 1 November, 2019; originally announced November 2019.

  47. arXiv:1909.13165  [pdf, other]

    cs.RO cs.AI cs.LG

    Relational Graph Learning for Crowd Navigation

    Authors: Changan Chen, Sha Hu, Payam Nikdel, Greg Mori, Manolis Savva

    Abstract: We present a relational graph learning approach for robotic crowd navigation using model-based deep reinforcement learning that plans actions by looking into the future. Our approach reasons about the relations between all agents based on their latent features and uses a Graph Convolutional Network to encode higher-order interactions in each agent's state representation, which is subsequently leve…

    Submitted 3 August, 2020; v1 submitted 28 September, 2019; originally announced September 2019.

    Comments: Accepted to IROS 2020. Added links to codes and video demo

  48. arXiv:1906.05797  [pdf, other]

    cs.CV cs.GR eess.IV

    The Replica Dataset: A Digital Replica of Indoor Spaces

    Authors: Julian Straub, Thomas Whelan, Lingni Ma, Yufan Chen, Erik Wijmans, Simon Green, Jakob J. Engel, Raul Mur-Artal, Carl Ren, Shobhit Verma, Anton Clarkson, Mingfei Yan, Brian Budge, Yajie Yan, Xiaqing Pan, June Yon, Yuyang Zou, Kimberly Leon, Nigel Carter, Jesus Briales, Tyler Gillingham, Elias Mueggler, Luis Pesqueira, Manolis Savva, Dhruv Batra, et al. (5 additional authors not shown)

    Abstract: We introduce Replica, a dataset of 18 highly photo-realistic 3D indoor scene reconstructions at room and building scale. Each scene consists of a dense mesh, high-resolution high-dynamic-range (HDR) textures, per-primitive semantic class and instance information, and planar mirror and glass reflectors. The goal of Replica is to enable machine learning (ML) research that relies on visually, geometr…

    Submitted 13 June, 2019; originally announced June 2019.

  49. arXiv:1904.01201  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.RO

    Habitat: A Platform for Embodied AI Research

    Authors: Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, Dhruv Batra

    Abstract: We present Habitat, a platform for research in embodied artificial intelligence (AI). Habitat enables training embodied agents (virtual robots) in highly efficient photorealistic 3D simulation. Specifically, Habitat consists of: (i) Habitat-Sim: a flexible, high-performance 3D simulator with configurable agents, sensors, and generic 3D dataset handling. Habitat-Sim is fast -- when rendering a scen…

    Submitted 24 November, 2019; v1 submitted 1 April, 2019; originally announced April 2019.

    Comments: ICCV 2019

  50. arXiv:1903.03757  [pdf, other]

    cs.CV

    Hierarchy Denoising Recursive Autoencoders for 3D Scene Layout Prediction

    Authors: Yifei Shi, Angel Xuan Chang, Zhelun Wu, Manolis Savva, Kai Xu

    Abstract: Indoor scenes exhibit rich hierarchical structure in 3D object layouts. Many tasks in 3D scene understanding can benefit from reasoning jointly about the hierarchical context of a scene, and the identities of objects. We present a variational denoising recursive autoencoder (VDRAE) that generates and iteratively refines a hierarchical representation of 3D object layouts, interleaving bottom-up enc…

    Submitted 10 April, 2019; v1 submitted 9 March, 2019; originally announced March 2019.

    Comments: CVPR 2019