Showing 1–49 of 49 results for author: Tremblay, J

Searching in archive cs.
  1. arXiv:2504.02812  [pdf, other]

    cs.CV

    BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation

    Authors: Van Nguyen Nguyen, Stephen Tyree, Andrew Guo, Mederic Fourmy, Anas Gouda, Taeyeop Lee, Sungphill Moon, Hyeontae Son, Lukas Ranftl, Jonathan Tremblay, Eric Brachmann, Bertram Drost, Vincent Lepetit, Carsten Rother, Stan Birchfield, Jiri Matas, Yann Labbe, Martin Sundermeyer, Tomas Hodan

    Abstract: We present the evaluation methodology, datasets and results of the BOP Challenge 2024, the 6th in a series of public competitions organized to capture the state of the art in 6D object pose estimation and related tasks. In 2024, our goal was to transition BOP from lab-like setups to real-world scenarios. First, we introduced new model-free tasks, where no 3D object models are available and methods…

    Submitted 23 April, 2025; v1 submitted 3 April, 2025; originally announced April 2025.

    Comments: arXiv admin note: text overlap with arXiv:2403.09799

  2. arXiv:2412.08445  [pdf, other]

    cs.AI

    TapeAgents: a Holistic Framework for Agent Development and Optimization

    Authors: Dzmitry Bahdanau, Nicolas Gontier, Gabriel Huang, Ehsan Kamalloo, Rafael Pardinas, Alex Piché, Torsten Scholak, Oleh Shliazhko, Jordan Prince Tremblay, Karam Ghanem, Soham Parikh, Mitul Tiwari, Quaizar Vohra

    Abstract: We present TapeAgents, an agent framework built around a granular, structured log tape of the agent session that also plays the role of the session's resumable state. In TapeAgents we leverage tapes to facilitate all stages of the LLM Agent development lifecycle. The agent reasons by processing the tape and the LLM output to produce new thought and action steps and append them to the tape. The env…

    Submitted 11 December, 2024; originally announced December 2024.

  3. arXiv:2411.16537  [pdf, other]

    cs.CV cs.AI cs.CL cs.RO

    RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics

    Authors: Chan Hee Song, Valts Blukis, Jonathan Tremblay, Stephen Tyree, Yu Su, Stan Birchfield

    Abstract: Spatial understanding is a crucial capability that enables robots to perceive their surroundings, reason about their environment, and interact with it meaningfully. In modern robotics, these capabilities are increasingly provided by vision-language models. However, these models face significant challenges in spatial reasoning tasks, as their training data are based on general-purpose image dataset…

    Submitted 5 April, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

    Comments: CVPR 2025 (Oral); Project Website: https://chanh.ee/RoboSpatial

  4. arXiv:2410.20220  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Neural Fields in Robotics: A Survey

    Authors: Muhammad Zubair Irshad, Mauro Comi, Yen-Chen Lin, Nick Heppert, Abhinav Valada, Rares Ambrus, Zsolt Kira, Jonathan Tremblay

    Abstract: Neural Fields have emerged as a transformative approach for 3D scene representation in computer vision and robotics, enabling accurate inference of geometry, 3D semantics, and dynamics from posed 2D data. Leveraging differentiable rendering, Neural Fields encompass both continuous implicit and explicit neural representations enabling high-fidelity 3D reconstruction, integration of multi-modal sens…

    Submitted 26 October, 2024; originally announced October 2024.

    Comments: 20 pages, 20 figures. Project Page: https://robonerf.github.io

  5. arXiv:2410.15536  [pdf, other]

    cs.RO cs.AI

    GRS: Generating Robotic Simulation Tasks from Real-World Images

    Authors: Alex Zook, Fan-Yun Sun, Josef Spjut, Valts Blukis, Stan Birchfield, Jonathan Tremblay

    Abstract: We introduce GRS (Generating Robotic Simulation tasks), a system addressing real-to-sim for robotic simulations. GRS creates digital twin simulations from single RGB-D observations with solvable tasks for virtual agent training. Using vision-language models (VLMs), our pipeline operates in three stages: 1) scene comprehension with SAM2 for segmentation and object description, 2) matching objects w…

    Submitted 4 April, 2025; v1 submitted 20 October, 2024; originally announced October 2024.

  6. arXiv:2410.01925  [pdf, other]

    cs.RO

    Topological mapping for traversability-aware long-range navigation in off-road terrain

    Authors: Jean-François Tremblay, Julie Alhosh, Louis Petit, Faraz Lotfi, Lara Landauro, David Meger

    Abstract: Autonomous robots navigating in off-road terrain like forests open new opportunities for automation. While off-road navigation has been studied, existing work often relies on clearly delineated pathways. We present a method allowing for long-range planning, exploration and low-level control in unknown off-trail forest terrain, using vision and GPS only. We represent outdoor terrain with a topologi…

    Submitted 2 October, 2024; originally announced October 2024.

  7. arXiv:2409.17652  [pdf, other]

    cs.AI cs.RO

    FactorSim: Generative Simulation via Factorized Representation

    Authors: Fan-Yun Sun, S. I. Harini, Angela Yi, Yihan Zhou, Alex Zook, Jonathan Tremblay, Logan Cross, Jiajun Wu, Nick Haber

    Abstract: Generating simulations to train intelligent agents in game-playing and robotics from natural language input, from user input or task documentation, remains an open-ended challenge. Existing approaches focus on parts of this challenge, such as generating reward functions or task hyperparameters. Unlike previous work, we introduce FACTORSIM that generates full simulations in code from language input…

    Submitted 11 November, 2024; v1 submitted 26 September, 2024; originally announced September 2024.

    Comments: NeurIPS 2024. Project website: https://cs.stanford.edu/~sunfanyun/factorsim/

  8. arXiv:2406.10543  [pdf, other]

    cs.CV cs.AI

    NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows

    Authors: Zhenggang Tang, Zhongzheng Ren, Xiaoming Zhao, Bowen Wen, Jonathan Tremblay, Stan Birchfield, Alexander Schwing

    Abstract: We present a method for automatically modifying a NeRF representation based on a single observation of a non-rigid transformed version of the original scene. Our method defines the transformation as a 3D flow, specifically as a weighted linear blending of rigid transformations of 3D anchor points that are defined on the surface of the scene. In order to identify anchor points, we introduce a novel…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 8 pages of main paper, CVPR 2024. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024

  9. arXiv:2404.01440  [pdf, other]

    cs.CV cs.AI cs.GR cs.RO

    Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects

    Authors: Yijia Weng, Bowen Wen, Jonathan Tremblay, Valts Blukis, Dieter Fox, Leonidas Guibas, Stan Birchfield

    Abstract: We address the problem of building digital twins of unknown articulated objects from two RGBD scans of the object at different articulation states. We decompose the problem into two stages, each addressing distinct aspects. Our method first reconstructs object-level shape at each state, then recovers the underlying articulation model including part segmentation and joint articulations that associa…

    Submitted 6 June, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  10. arXiv:2403.20275  [pdf, other]

    cs.CV cs.RO

    Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces

    Authors: Mauro Comi, Alessio Tonioni, Max Yang, Jonathan Tremblay, Valts Blukis, Yijiong Lin, Nathan F. Lepora, Laurence Aitchison

    Abstract: Touch and vision go hand in hand, mutually enhancing our ability to understand the world. From a research perspective, the problem of mixing touch and vision is underexplored and presents interesting challenges. To this end, we propose Tactile-Informed 3DGS, a novel approach that incorporates touch data (local depth maps) with multi-view vision data to achieve surface reconstruction and novel view…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 17 pages

  11. arXiv:2312.00215  [pdf, other]

    cs.RO cs.AI

    Learning active tactile perception through belief-space control

    Authors: Jean-François Tremblay, David Meger, Francois Hogan, Gregory Dudek

    Abstract: Robots operating in an open world will encounter novel objects with unknown physical properties, such as mass, friction, or size. These robots will need to sense these properties through interaction prior to performing downstream tasks with the objects. We propose a method that autonomously learns tactile exploration policies by developing a generative world model that is leveraged to 1) estimate…

    Submitted 30 November, 2023; originally announced December 2023.

    Comments: 10 pages + references, 6 figures

  12. arXiv:2310.00463  [pdf, other]

    cs.CV cs.RO

    Diff-DOPE: Differentiable Deep Object Pose Estimation

    Authors: Jonathan Tremblay, Bowen Wen, Valts Blukis, Balakumar Sundaralingam, Stephen Tyree, Stan Birchfield

    Abstract: We introduce Diff-DOPE, a 6-DoF pose refiner that takes as input an image, a 3D textured model of an object, and an initial pose of the object. The method uses differentiable rendering to update the object pose to minimize the visual error between the image and the projection of the model. We show that this simple, yet effective, idea is able to achieve state-of-the-art results on pose estimation…

    Submitted 30 September, 2023; originally announced October 2023.

    Comments: Submitted to ICRA 2023. Project page is at https://diffdope.github.io

  13. arXiv:2308.01477  [pdf, other]

    cs.RO cs.CV

    HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions

    Authors: Andrew Guo, Bowen Wen, Jianhe Yuan, Jonathan Tremblay, Stephen Tyree, Jeffrey Smith, Stan Birchfield

    Abstract: We present the HANDAL dataset for category-level object pose estimation and affordance prediction. Unlike previous datasets, ours is focused on robotics-ready manipulable objects that are of the proper size and shape for functional grasping by robot manipulators, such as pliers, utensils, and screwdrivers. Our annotation process is streamlined, requiring only a single off-the-shelf camera and semi…

    Submitted 2 August, 2023; originally announced August 2023.

    Comments: IROS 2023. Project page: https://nvlabs.github.io/HANDAL/

  14. arXiv:2304.00673  [pdf, other]

    cs.CV

    Partial-View Object View Synthesis via Filtered Inversion

    Authors: Fan-Yun Sun, Jonathan Tremblay, Valts Blukis, Kevin Lin, Danfei Xu, Boris Ivanovic, Peter Karkus, Stan Birchfield, Dieter Fox, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Marco Pavone, Nick Haber

    Abstract: We propose Filtering Inversion (FINV), a learning framework and optimization process that predicts a renderable 3D object representation from one or few partial views. FINV addresses the challenge of synthesizing novel views of objects from partial observations, spanning cases where the object is not entirely in view, is partially occluded, or is only observed from similar views. To achieve this,…

    Submitted 17 August, 2024; v1 submitted 2 April, 2023; originally announced April 2023.

    Comments: project website: http://cs.stanford.edu/~sunfanyun/finv

  15. arXiv:2303.16730  [pdf, other]

    cs.CV

    TTA-COPE: Test-Time Adaptation for Category-Level Object Pose Estimation

    Authors: Taeyeop Lee, Jonathan Tremblay, Valts Blukis, Bowen Wen, Byeong-Uk Lee, Inkyu Shin, Stan Birchfield, In So Kweon, Kuk-Jin Yoon

    Abstract: Test-time adaptation methods have been gaining attention recently as a practical solution for addressing source-to-target domain gaps by gradually updating the model without requiring labels on the target data. In this paper, we propose a method of test-time adaptation for category-level object pose estimation called TTA-COPE. We design a pose ensemble approach with a self-training loss using pose…

    Submitted 29 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023, Project page: https://taeyeop.com/ttacope

  16. arXiv:2303.14158  [pdf, other]

    cs.CV cs.AI cs.GR cs.RO

    BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects

    Authors: Bowen Wen, Jonathan Tremblay, Valts Blukis, Stephen Tyree, Thomas Muller, Alex Evans, Dieter Fox, Jan Kautz, Stan Birchfield

    Abstract: We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence, while simultaneously performing neural 3D reconstruction of the object. Our method works for arbitrary rigid objects, even when visual texture is largely absent. The object is assumed to be segmented in the first frame only. No additional information is required, and no assumption is ma…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  17. arXiv:2212.06870  [pdf, other]

    cs.CV cs.RO

    MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare

    Authors: Yann Labbé, Lucas Manuelli, Arsalan Mousavian, Stephen Tyree, Stan Birchfield, Jonathan Tremblay, Justin Carpentier, Mathieu Aubry, Dieter Fox, Josef Sivic

    Abstract: We introduce MegaPose, a method to estimate the 6D pose of novel objects, that is, objects unseen during training. At inference time, the method only assumes knowledge of (i) a region of interest displaying the object in the image and (ii) a CAD model of the observed object. The contributions of this work are threefold. First, we present a 6D pose refiner based on a render&compare strategy which c…

    Submitted 13 December, 2022; originally announced December 2022.

    Comments: CoRL 2022

  18. arXiv:2210.12126  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    One-Shot Neural Fields for 3D Object Understanding

    Authors: Valts Blukis, Taeyeop Lee, Jonathan Tremblay, Bowen Wen, In So Kweon, Kuk-Jin Yoon, Dieter Fox, Stan Birchfield

    Abstract: We present a unified and compact scene representation for robotics, where each object in the scene is depicted by a latent code capturing geometry and appearance. This representation can be decoded for various tasks such as novel view rendering, 3D reconstruction (e.g. recovering depth, point clouds, or voxel maps), collision checking, and stable grasp prediction. We build our representation from…

    Submitted 8 August, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

    Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop (CVPRW) on XRNeRF: Advances in NeRF for the Metaverse 2023

  19. arXiv:2210.11668  [pdf, other]

    cs.RO cs.CV

    RGB-Only Reconstruction of Tabletop Scenes for Collision-Free Manipulator Control

    Authors: Zhenggang Tang, Balakumar Sundaralingam, Jonathan Tremblay, Bowen Wen, Ye Yuan, Stephen Tyree, Charles Loop, Alexander Schwing, Stan Birchfield

    Abstract: We present a system for collision-free control of a robot manipulator that uses only RGB views of the world. Perceptual input of a tabletop scene is provided by multiple images of an RGB camera (without depth) that is either handheld or mounted on the robot end effector. A NeRF-like process is used to reconstruct the 3D geometry of the scene, from which the Euclidean full signed distance function…

    Submitted 10 March, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: ICRA 2023. Project page at https://ngp-mpc.github.io/

  20. arXiv:2210.10108  [pdf, other]

    cs.CV cs.RO

    Parallel Inversion of Neural Radiance Fields for Robust Pose Estimation

    Authors: Yunzhi Lin, Thomas Müller, Jonathan Tremblay, Bowen Wen, Stephen Tyree, Alex Evans, Patricio A. Vela, Stan Birchfield

    Abstract: We present a parallelized optimization method based on fast Neural Radiance Fields (NeRF) for estimating 6-DoF pose of a camera with respect to an object or scene. Given a single observed RGB image of the target, we can predict the translation and rotation of the camera by minimizing the residual between pixels rendered from a fast NeRF model and pixels in the observed image. We integrate a moment…

    Submitted 10 March, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

    Comments: ICRA 2023. Project page at https://pnerfp.github.io/

  21. arXiv:2209.11302  [pdf, other]

    cs.RO cs.AI cs.CL cs.LG

    ProgPrompt: Generating Situated Robot Task Plans using Large Language Models

    Authors: Ishika Singh, Valts Blukis, Arsalan Mousavian, Ankit Goyal, Danfei Xu, Jonathan Tremblay, Dieter Fox, Jesse Thomason, Animesh Garg

    Abstract: Task planning can require defining myriad domain knowledge about the world in which a robot needs to act. To ameliorate that effort, large language models (LLMs) can be used to score potential next actions during task planning, and even generate action sequences directly, given an instruction in natural language with no additional domain information. However, such methods either require enumeratin…

    Submitted 22 September, 2022; originally announced September 2022.

  22. arXiv:2206.07707  [pdf, other]

    cs.CV cs.GR cs.LG cs.MM

    Variable Bitrate Neural Fields

    Authors: Towaki Takikawa, Alex Evans, Jonathan Tremblay, Thomas Müller, Morgan McGuire, Alec Jacobson, Sanja Fidler

    Abstract: Neural approximations of scalar and vector fields, such as signed distance functions and radiance fields, have emerged as accurate, high-quality representations. State-of-the-art results are obtained by conditioning a neural approximation with a lookup from trainable feature grids that take on part of the learning task and allow for smaller, more efficient neural networks. Unfortunately, these fea…

    Submitted 15 June, 2022; originally announced June 2022.

    Comments: SIGGRAPH 2022. Project Page: https://nv-tlabs.github.io/vqad/

  23. arXiv:2205.11047  [pdf, other]

    cs.CV cs.RO

    Keypoint-Based Category-Level Object Pose Tracking from an RGB Sequence with Uncertainty Estimation

    Authors: Yunzhi Lin, Jonathan Tremblay, Stephen Tyree, Patricio A. Vela, Stan Birchfield

    Abstract: We propose a single-stage, category-level 6-DoF pose estimation algorithm that simultaneously detects and tracks instances of objects within a known category. Our method takes as input the previous and current frame from a monocular RGB video, as well as predictions from the previous frame, to predict the bounding cuboid and 6-DoF pose (up to scale). Internally, a deep network predicts distributio…

    Submitted 23 May, 2022; originally announced May 2022.

    Comments: ICRA 2022. Project site is at https://sites.google.com/view/centerposetrack

  24. arXiv:2205.07058  [pdf, other]

    cs.CV

    RTMV: A Ray-Traced Multi-View Synthetic Dataset for Novel View Synthesis

    Authors: Jonathan Tremblay, Moustafa Meshry, Alex Evans, Jan Kautz, Alexander Keller, Sameh Khamis, Thomas Müller, Charles Loop, Nathan Morrical, Koki Nagano, Towaki Takikawa, Stan Birchfield

    Abstract: We present a large-scale synthetic dataset for novel view synthesis consisting of ~300k images rendered from nearly 2000 complex scenes using high-quality ray tracing at high resolution (1600 x 1600 pixels). The dataset is orders of magnitude larger than existing synthetic datasets for novel view synthesis, thus providing a large unified benchmark for both training and evaluation. Using 4 distinct…

    Submitted 24 October, 2022; v1 submitted 14 May, 2022; originally announced May 2022.

    Comments: ECCV 2022 Workshop on Learning to Generate 3D Shapes and Scenes. Project page at http://www.cs.umd.edu/~mmeshry/projects/rtmv

  25. arXiv:2203.05701  [pdf, other]

    cs.RO cs.CV

    6-DoF Pose Estimation of Household Objects for Robotic Manipulation: An Accessible Dataset and Benchmark

    Authors: Stephen Tyree, Jonathan Tremblay, Thang To, Jia Cheng, Terry Mosier, Jeffrey Smith, Stan Birchfield

    Abstract: We present a new dataset for 6-DoF pose estimation of known objects, with a focus on robotic manipulation research. We propose a set of toy grocery objects, whose physical instantiations are readily available for purchase and are appropriately sized for robotic grasping and manipulation. We provide 3D scanned textured models of these objects, suitable for generating synthetic training data, as wel…

    Submitted 15 December, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

    Comments: IROS 2022. Project page is at https://github.com/swtyree/hope-dataset

  26. arXiv:2112.11347  [pdf, other]

    cs.CV

    Watch It Move: Unsupervised Discovery of 3D Joints for Re-Posing of Articulated Objects

    Authors: Atsuhiro Noguchi, Umar Iqbal, Jonathan Tremblay, Tatsuya Harada, Orazio Gallo

    Abstract: Rendering articulated objects while controlling their poses is critical to applications such as virtual reality or animation for movies. Manipulating the pose of an object, however, requires the understanding of its underlying structure, that is, its joints and how they interact with each other. Unfortunately, assuming the structure to be known, as existing methods do, precludes the ability to wor…

    Submitted 6 April, 2022; v1 submitted 21 December, 2021; originally announced December 2021.

    Comments: CVPR 2022, 16 pages, Project page: https://nvlabs.github.io/watch-it-move

  27. arXiv:2112.07945  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Efficient Geometry-aware 3D Generative Adversarial Networks

    Authors: Eric R. Chan, Connor Z. Lin, Matthew A. Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas Guibas, Jonathan Tremblay, Sameh Khamis, Tero Karras, Gordon Wetzstein

    Abstract: Unsupervised generation of high-quality multi-view-consistent images and 3D shapes using only collections of single-view 2D photographs has been a long-standing challenge. Existing 3D GANs are either compute-intensive or make approximations that are not 3D-consistent; the former limits quality and resolution of the generated images and the latter adversely affects multi-view consistency and shape…

    Submitted 27 April, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: Project page: https://matthew-a-chan.github.io/EG3D

  28. arXiv:2109.06161  [pdf, other]

    cs.CV cs.RO

    Single-Stage Keypoint-Based Category-Level Object Pose Estimation from an RGB Image

    Authors: Yunzhi Lin, Jonathan Tremblay, Stephen Tyree, Patricio A. Vela, Stan Birchfield

    Abstract: Prior work on 6-DoF object pose estimation has largely focused on instance-level processing, in which a textured CAD model is available for each object being detected. Category-level 6-DoF pose estimation represents an important step toward developing robotic vision systems that operate in unstructured, real-world scenarios. In this work, we propose a single-stage, keypoint-based approach for cate…

    Submitted 12 May, 2022; v1 submitted 13 September, 2021; originally announced September 2021.

    Comments: ICRA 2022. Project page at https://sites.google.com/view/centerpose

  29. arXiv:2105.13962  [pdf, other]

    cs.CV cs.RO

    NViSII: A Scriptable Tool for Photorealistic Image Generation

    Authors: Nathan Morrical, Jonathan Tremblay, Yunzhi Lin, Stephen Tyree, Stan Birchfield, Valerio Pascucci, Ingo Wald

    Abstract: We present a Python-based renderer built on NVIDIA's OptiX ray tracing engine and the OptiX AI denoiser, designed to generate high-quality synthetic images for research in computer vision and deep learning. Our tool enables the description and manipulation of complex dynamic 3D scenes containing object meshes, materials, textures, lighting, volumetric data (e.g., smoke), and backgrounds. Metadata,…

    Submitted 28 May, 2021; originally announced May 2021.

    Comments: SDG Workshop at ICLR 2021. Project page is at https://github.com/owl-project/NVISII

  30. arXiv:2104.04631  [pdf, other]

    cs.CV

    DexYCB: A Benchmark for Capturing Hand Grasping of Objects

    Authors: Yu-Wei Chao, Wei Yang, Yu Xiang, Pavlo Molchanov, Ankur Handa, Jonathan Tremblay, Yashraj S. Narang, Karl Van Wyk, Umar Iqbal, Stan Birchfield, Jan Kautz, Dieter Fox

    Abstract: We introduce DexYCB, a new dataset for capturing hand grasping of objects. We first compare DexYCB with a related one through cross-dataset evaluation. We then present a thorough benchmark of state-of-the-art approaches on three relevant tasks: 2D object and keypoint detection, 6D object pose estimation, and 3D hand pose estimation. Finally, we evaluate a new robotics-relevant task: generating saf…

    Submitted 9 April, 2021; originally announced April 2021.

    Comments: Accepted to CVPR 2021

  31. arXiv:2103.13539  [pdf, other]

    cs.RO

    Multi-View Fusion for Multi-Level Robotic Scene Understanding

    Authors: Yunzhi Lin, Jonathan Tremblay, Stephen Tyree, Patricio A. Vela, Stan Birchfield

    Abstract: We present a system for multi-level scene awareness for robotic manipulation. Given a sequence of camera-in-hand RGB images, the system calculates three types of information: 1) a point cloud representation of all the surfaces in the scene, for the purpose of obstacle avoidance; 2) the rough pose of unknown objects from categories corresponding to primitive shapes (e.g., cuboids and cylinders); an…

    Submitted 14 October, 2021; v1 submitted 24 March, 2021; originally announced March 2021.

    Comments: Presented at IROS 2021. Video is at https://youtu.be/FuqMxuODGlw

  32. arXiv:2012.07277  [pdf, other]

    cs.RO

    Hierarchical Planning for Long-Horizon Manipulation with Geometric and Symbolic Scene Graphs

    Authors: Yifeng Zhu, Jonathan Tremblay, Stan Birchfield, Yuke Zhu

    Abstract: We present a visually grounded hierarchical planning algorithm for long-horizon manipulation tasks. Our algorithm offers a joint framework of neuro-symbolic task planning and low-level motion generation conditioned on the specified goal. At the core of our approach is a two-level scene graph representation, namely geometric scene graph and symbolic scene graph. This hierarchical representation ser…

    Submitted 29 March, 2021; v1 submitted 14 December, 2020; originally announced December 2020.

    Comments: Accepted to ICRA 2021

  33. arXiv:2011.11751  [pdf, other]

    cs.RO

    Multimodal dynamics modeling for off-road autonomous vehicles

    Authors: Jean-François Tremblay, Travis Manderson, Aurélio Noca, Gregory Dudek, David Meger

    Abstract: Dynamics modeling in outdoor and unstructured environments is difficult because different elements in the environment interact with the robot in ways that can be hard to predict. Leveraging multiple sensors to perceive maximal information about the robot's environment is thus crucial when building a model to perform predictions about the robot's dynamics with the goal of doing motion planning. We…

    Submitted 29 March, 2021; v1 submitted 23 November, 2020; originally announced November 2020.

  34. arXiv:2011.07748  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Fast Uncertainty Quantification for Deep Object Pose Estimation

    Authors: Guanya Shi, Yifeng Zhu, Jonathan Tremblay, Stan Birchfield, Fabio Ramos, Animashree Anandkumar, Yuke Zhu

    Abstract: Deep learning-based object pose estimators are often unreliable and overconfident especially when the input image is outside the training domain, for instance, with sim2real transfer. Efficient and robust uncertainty quantification (UQ) in pose estimators is critically needed in many robotic tasks. In this work, we propose a simple, efficient, and plug-and-play UQ method for 6-DoF object pose esti…

    Submitted 26 March, 2021; v1 submitted 16 November, 2020; originally announced November 2020.

    Comments: Video and code are available at https://sites.google.com/view/fastuq

    Journal ref: International Conference on Robotics and Automation (ICRA), 2021

  35. arXiv:2011.06332  [pdf, other]

    cs.RO

    Joint Space Control via Deep Reinforcement Learning

    Authors: Visak Kumar, David Hoeller, Balakumar Sundaralingam, Jonathan Tremblay, Stan Birchfield

    Abstract: The dominant way to control a robot manipulator uses hand-crafted differential equations leveraging some form of inverse kinematics / dynamics. We propose a simple, versatile joint-level controller that dispenses with differential equations entirely. A deep neural network, trained via model-free reinforcement learning, is used to map from task space to joint space. Experiments show the method capa…

    Submitted 20 August, 2021; v1 submitted 12 November, 2020; originally announced November 2020.

    Comments: Presented at IROS 2021. Video is at https://youtu.be/ICfve-GTTp8

  36. arXiv:2008.11822  [pdf, other]

    cs.RO

    Indirect Object-to-Robot Pose Estimation from an External Monocular RGB Camera

    Authors: Jonathan Tremblay, Stephen Tyree, Terry Mosier, Stan Birchfield

    Abstract: We present a robotic grasping system that uses a single external monocular RGB camera as input. The object-to-robot pose is computed indirectly by combining the output of two neural networks: one that estimates the object-to-camera pose, and another that estimates the robot-to-camera pose. Both networks are trained entirely on synthetic data, relying on domain randomization to bridge the sim-to-re…

    Submitted 26 August, 2020; originally announced August 2020.

    Comments: IROS 2020. Video at https://youtu.be/E0J91llX-ys

  37. arXiv:2006.16235  [pdf, other]

    cs.RO

    Vision-Based Goal-Conditioned Policies for Underwater Navigation in the Presence of Obstacles

    Authors: Travis Manderson, Juan Camilo Gamboa Higuera, Stefan Wapnick, Jean-François Tremblay, Florian Shkurti, David Meger, Gregory Dudek

    Abstract: We present Nav2Goal, a data-efficient and end-to-end learning method for goal-conditioned visual navigation. Our technique is used to train a navigation policy that enables a robot to navigate close to sparse geographic waypoints provided by a user without any prior map, all while avoiding obstacles and choosing paths that cover user-informed regions of interest. Our approach is based on recent ad…

    Submitted 29 June, 2020; originally announced June 2020.

    Comments: RSS 2020. Video and project details can be found at http://www.cim.mcgill.ca/mrl/nav2goal/

  38. arXiv:2005.10872  [pdf, other]

    cs.RO cs.AI cs.LG eess.SY

    Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning

    Authors: Michelle A. Lee, Carlos Florensa, Jonathan Tremblay, Nathan Ratliff, Animesh Garg, Fabio Ramos, Dieter Fox

    Abstract: Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state. On the other hand, reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle. In this w…

    Submitted 26 May, 2020; v1 submitted 21 May, 2020; originally announced May 2020.

    Journal ref: International Conference on Robotics and Automation 2020

  39. arXiv:2005.00673  [pdf, other]

    cs.CV cs.LG eess.IV

    PAMTRI: Pose-Aware Multi-Task Learning for Vehicle Re-Identification Using Highly Randomized Synthetic Data

    Authors: Zheng Tang, Milind Naphade, Stan Birchfield, Jonathan Tremblay, William Hodge, Ratnesh Kumar, Shuo Wang, Xiaodong Yang

    Abstract: In comparison with person re-identification (ReID), which has been widely studied in the research community, vehicle ReID has received less attention. Vehicle ReID is challenging due to 1) high intra-class variability (caused by the dependency of shape and appearance on viewpoint), and 2) small inter-class variability (caused by the similarity in shape and appearance between vehicles produced by d…

    Submitted 1 May, 2020; originally announced May 2020.

    Comments: Accepted by ICCV 2019

  40. arXiv:1911.09233  [pdf, other]

    cs.RO

    Contextual Reinforcement Learning of Visuo-tactile Multi-fingered Grasping Policies

    Authors: Visak Kumar, Tucker Hermans, Dieter Fox, Stan Birchfield, Jonathan Tremblay

    Abstract: Using simulation to train robot manipulation policies holds the promise of an almost unlimited amount of training data, generated safely out of harm's way. One of the key challenges of using simulation, to date, has been to bridge the reality gap, so that policies trained in simulation can be deployed in the real world. We explore the reality gap in the context of learning a contextual policy for…

    Submitted 24 November, 2019; v1 submitted 20 November, 2019; originally announced November 2019.

  41. arXiv:1911.09231  [pdf, other]

    cs.RO

    Camera-to-Robot Pose Estimation from a Single Image

    Authors: Timothy E. Lee, Jonathan Tremblay, Thang To, Jia Cheng, Terry Mosier, Oliver Kroemer, Dieter Fox, Stan Birchfield

    Abstract: We present an approach for estimating the pose of an external camera with respect to a robot using a single RGB image of the robot. The image is processed by a deep neural network to detect 2D projections of keypoints (such as joints) associated with the robot. The network is trained entirely on simulated data using domain randomization to bridge the reality gap. Perspective-n-point (PnP) is then…

    Submitted 23 April, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

    Comments: ICRA 2020. Project page is at https://research.nvidia.com/publication/2020-03_DREAM

  42. arXiv:1909.02075  [pdf, other]

    cs.RO

    Toward Sim-to-Real Directional Semantic Grasping

    Authors: Shariq Iqbal, Jonathan Tremblay, Thang To, Jia Cheng, Erik Leitch, Andy Campbell, Kirby Leung, Duncan McKay, Stan Birchfield

    Abstract: We address the problem of directional semantic grasping, that is, grasping a specific object from a specific direction. We approach the problem using deep reinforcement learning via a double deep Q-network (DDQN) that learns to map downsampled RGB input images from a wrist-mounted camera to Q-values, which are then translated into Cartesian robot control commands via the cross-entropy method (CEM)…

    Submitted 18 August, 2020; v1 submitted 4 September, 2019; originally announced September 2019.

    Comments: ICRA 2020. Video is at https://youtu.be/bjJLtNdVj9w

  43. arXiv:1905.04957  [pdf, other]

    cs.CV

    Few-Shot Viewpoint Estimation

    Authors: Hung-Yu Tseng, Shalini De Mello, Jonathan Tremblay, Sifei Liu, Stan Birchfield, Ming-Hsuan Yang, Jan Kautz

    Abstract: Viewpoint estimation for known categories of objects has been improved significantly thanks to deep networks and large datasets, but generalization to unknown categories is still very challenging. With an aim towards improving performance on unknown categories, we introduce the problem of category-level few-shot viewpoint estimation. We design a novel framework to successfully train viewpoint netw…

    Submitted 31 July, 2019; v1 submitted 13 May, 2019; originally announced May 2019.

    Comments: BMVC 2019

  44. arXiv:1904.07960  [pdf]

    cs.NI

    Softwire Hub and Spoke Deployment Framework with Layer Two Tunneling Protocol Version 2 (L2TPv2)

    Authors: Bill Storer, Carlos Pignataro, Maria Alice Dos Santos, Bruno Stévant, Laurent Toutain, Jean-François Tremblay

    Abstract: This document describes the framework of the Softwire "Hub and Spoke" solution with the Layer Two Tunneling Protocol version 2 (L2TPv2). The implementation details specified in this document should be followed to achieve interoperability among different vendor implementations.

    Submitted 6 March, 2019; originally announced April 2019.

  45. arXiv:1904.05281  [pdf, other]

    cs.RO

    Automatic 3D Mapping for Tree Diameter Measurements in Inventory Operations

    Authors: Jean-François Tremblay, Martin Béland, François Pomerleau, Richard Gagnon, Philippe Giguère

    Abstract: Forestry is a major industry in many parts of the world. It relies on forest inventory, which consists of measuring tree attributes. We propose to use 3D mapping, based on the iterative closest point algorithm, to automatically measure tree diameters in forests from mobile robot observations. While previous studies showed the potential for such technology, they lacked a rigorous analysis of diamet…

    Submitted 11 July, 2019; v1 submitted 10 April, 2019; originally announced April 2019.

  46. arXiv:1809.10790  [pdf, other]

    cs.RO

    Deep Object Pose Estimation for Semantic Robotic Grasping of Household Objects

    Authors: Jonathan Tremblay, Thang To, Balakumar Sundaralingam, Yu Xiang, Dieter Fox, Stan Birchfield

    Abstract: Using synthetic data for training deep neural networks for robotic manipulation holds the promise of an almost unlimited amount of pre-labeled training data, generated safely out of harm's way. One of the key challenges of synthetic data, to date, has been to bridge the so-called reality gap, so that networks trained on synthetic data operate correctly when exposed to real-world data. We explore t…

    Submitted 27 September, 2018; originally announced September 2018.

    Comments: Conference on Robot Learning (CoRL) 2018

  47. arXiv:1805.07054  [pdf, other]

    cs.RO

    Synthetically Trained Neural Networks for Learning Human-Readable Plans from Real-World Demonstrations

    Authors: Jonathan Tremblay, Thang To, Artem Molchanov, Stephen Tyree, Jan Kautz, Stan Birchfield

    Abstract: We present a system to infer and execute a human-readable program from a real-world demonstration. The system consists of a series of neural networks to perform perception, program generation, and program execution. Leveraging convolutional pose machines, the perception network reliably detects the bounding cuboids of objects in real images even when severely occluded, after training only on synth…

    Submitted 10 July, 2018; v1 submitted 18 May, 2018; originally announced May 2018.

    Comments: IEEE International Conference on Robotics and Automation (ICRA) 2018. For associated video, see https://youtu.be/B7ZT5oSnRys

  48. arXiv:1804.06534  [pdf, other]

    cs.CV

    Falling Things: A Synthetic Dataset for 3D Object Detection and Pose Estimation

    Authors: Jonathan Tremblay, Thang To, Stan Birchfield

    Abstract: We present a new dataset, called Falling Things (FAT), for advancing the state-of-the-art in object detection and 3D pose estimation in the context of robotics. By synthetically combining object models and backgrounds of complex composition and high graphical quality, we are able to generate photorealistic images with accurate 3D pose annotations for all objects in all images. Our dataset contains…

    Submitted 10 July, 2018; v1 submitted 17 April, 2018; originally announced April 2018.

    Comments: CVPR 2018 Workshop on Real World Challenges and New Benchmarks for Deep Learning in Robotic Vision

  49. arXiv:1804.06516  [pdf, other]

    cs.CV

    Training Deep Networks with Synthetic Data: Bridging the Reality Gap by Domain Randomization

    Authors: Jonathan Tremblay, Aayush Prakash, David Acuna, Mark Brophy, Varun Jampani, Cem Anil, Thang To, Eric Cameracci, Shaad Boochoon, Stan Birchfield

    Abstract: We present a system for training deep neural networks for object detection using synthetic images. To handle the variability in real-world data, the system relies upon the technique of domain randomization, in which the parameters of the simulator (such as lighting, pose, object textures, etc.) are randomized in non-realistic ways to force the neural network to learn the essential features of th…

    Submitted 23 April, 2018; v1 submitted 17 April, 2018; originally announced April 2018.

    Comments: CVPR 2018 Workshop on Autonomous Driving