Showing 1–50 of 62 results for author: Birchfield, S

  1. arXiv:2504.02812  [pdf, other]

    cs.CV

    BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation

    Authors: Van Nguyen Nguyen, Stephen Tyree, Andrew Guo, Mederic Fourmy, Anas Gouda, Taeyeop Lee, Sungphill Moon, Hyeontae Son, Lukas Ranftl, Jonathan Tremblay, Eric Brachmann, Bertram Drost, Vincent Lepetit, Carsten Rother, Stan Birchfield, Jiri Matas, Yann Labbe, Martin Sundermeyer, Tomas Hodan

    Abstract: We present the evaluation methodology, datasets and results of the BOP Challenge 2024, the 6th in a series of public competitions organized to capture the state of the art in 6D object pose estimation and related tasks. In 2024, our goal was to transition BOP from lab-like setups to real-world scenarios. First, we introduced new model-free tasks, where no 3D object models are available and methods…

    Submitted 23 April, 2025; v1 submitted 3 April, 2025; originally announced April 2025.

    Comments: arXiv admin note: text overlap with arXiv:2403.09799

  2. arXiv:2501.09898  [pdf, other]

    cs.CV cs.LG cs.RO

    FoundationStereo: Zero-Shot Stereo Matching

    Authors: Bowen Wen, Matthew Trepte, Joseph Aribido, Jan Kautz, Orazio Gallo, Stan Birchfield

    Abstract: Tremendous progress has been made in deep stereo matching to excel on benchmark datasets through per-domain fine-tuning. However, achieving strong zero-shot generalization - a hallmark of foundation models in other computer vision tasks - remains challenging for stereo matching. We introduce FoundationStereo, a foundation model for stereo depth estimation designed to achieve strong zero-shot gener…

    Submitted 3 April, 2025; v1 submitted 16 January, 2025; originally announced January 2025.

    Comments: CVPR 2025

  3. arXiv:2411.16537  [pdf, other]

    cs.CV cs.AI cs.CL cs.RO

    RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics

    Authors: Chan Hee Song, Valts Blukis, Jonathan Tremblay, Stephen Tyree, Yu Su, Stan Birchfield

    Abstract: Spatial understanding is a crucial capability that enables robots to perceive their surroundings, reason about their environment, and interact with it meaningfully. In modern robotics, these capabilities are increasingly provided by vision-language models. However, these models face significant challenges in spatial reasoning tasks, as their training data are based on general-purpose image dataset…

    Submitted 5 April, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

    Comments: CVPR 2025 (Oral); Project Website: https://chanh.ee/RoboSpatial

  4. arXiv:2411.00965  [pdf, other]

    cs.RO

    SPOT: SE(3) Pose Trajectory Diffusion for Object-Centric Manipulation

    Authors: Cheng-Chun Hsu, Bowen Wen, Jie Xu, Yashraj Narang, Xiaolong Wang, Yuke Zhu, Joydeep Biswas, Stan Birchfield

    Abstract: We introduce SPOT, an object-centric imitation learning framework. The key idea is to capture each task by an object-centric representation, specifically the SE(3) object pose trajectory relative to the target. This approach decouples embodiment actions from sensory inputs, facilitating learning from various demonstration types, including both action-based and action-less human hand demonstrations…

    Submitted 13 May, 2025; v1 submitted 1 November, 2024; originally announced November 2024.

  5. arXiv:2410.15536  [pdf, other]

    cs.RO cs.AI

    GRS: Generating Robotic Simulation Tasks from Real-World Images

    Authors: Alex Zook, Fan-Yun Sun, Josef Spjut, Valts Blukis, Stan Birchfield, Jonathan Tremblay

    Abstract: We introduce GRS (Generating Robotic Simulation tasks), a system addressing real-to-sim for robotic simulations. GRS creates digital twin simulations from single RGB-D observations with solvable tasks for virtual agent training. Using vision-language models (VLMs), our pipeline operates in three stages: 1) scene comprehension with SAM2 for segmentation and object description, 2) matching objects w…

    Submitted 4 April, 2025; v1 submitted 20 October, 2024; originally announced October 2024.

  6. arXiv:2409.16663  [pdf, other]

    cs.RO cs.CV cs.LG eess.SY

    Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models

    Authors: Alexander Popov, Alperen Degirmenci, David Wehr, Shashank Hegde, Ryan Oldja, Alexey Kamenev, Bertrand Douillard, David Nistér, Urs Muller, Ruchi Bhargava, Stan Birchfield, Nikolai Smolyanskiy

    Abstract: We propose the use of latent space generative world models to address the covariate shift problem in autonomous driving. A world model is a neural network capable of predicting an agent's next state given past states and actions. By leveraging a world model during training, the driving policy effectively mitigates covariate shift without requiring an excessive amount of training data. During end-t…

    Submitted 30 April, 2025; v1 submitted 25 September, 2024; originally announced September 2024.

    Comments: 8 pages, 6 figures, updated in March 2025, original published in September 2024, for ICRA 2025 submission, for associated video file, see https://youtu.be/7m3bXzlVQvU

    MSC Class: 68T40 (Primary) 68T05; 68T45 (Secondary)
    ACM Class: I.2.9; I.2.6; I.2.10; I.6

  7. arXiv:2406.10543  [pdf, other]

    cs.CV cs.AI

    NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows

    Authors: Zhenggang Tang, Zhongzheng Ren, Xiaoming Zhao, Bowen Wen, Jonathan Tremblay, Stan Birchfield, Alexander Schwing

    Abstract: We present a method for automatically modifying a NeRF representation based on a single observation of a non-rigid transformed version of the original scene. Our method defines the transformation as a 3D flow, specifically as a weighted linear blending of rigid transformations of 3D anchor points that are defined on the surface of the scene. In order to identify anchor points, we introduce a novel…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 8 pages of main paper, CVPR 2024. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024

  8. arXiv:2405.17383  [pdf, other]

    cs.CL

    Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective

    Authors: Zhen Qin, Xuyang Shen, Dong Li, Weigao Sun, Stan Birchfield, Richard Hartley, Yiran Zhong

    Abstract: We present the Linear Complexity Sequence Model (LCSM), a comprehensive solution that unites various sequence modeling techniques with linear complexity, including linear attention, state space model, long convolution, and linear RNN, within a single framework. The goal is to enhance comprehension of these models by analyzing the impact of each component from a cohesive and streamlined viewpoint.…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Technical report. Yiran Zhong is the corresponding author

  9. arXiv:2404.01440  [pdf, other]

    cs.CV cs.AI cs.GR cs.RO

    Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects

    Authors: Yijia Weng, Bowen Wen, Jonathan Tremblay, Valts Blukis, Dieter Fox, Leonidas Guibas, Stan Birchfield

    Abstract: We address the problem of building digital twins of unknown articulated objects from two RGBD scans of the object at different articulation states. We decompose the problem into two stages, each addressing distinct aspects. Our method first reconstructs object-level shape at each state, then recovers the underlying articulation model including part segmentation and joint articulations that associa…

    Submitted 6 June, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  10. arXiv:2312.08344  [pdf, other]

    cs.CV cs.AI cs.RO

    FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

    Authors: Bowen Wen, Wei Yang, Jan Kautz, Stan Birchfield

    Abstract: We present FoundationPose, a unified foundation model for 6D object pose estimation and tracking, supporting both model-based and model-free setups. Our approach can be instantly applied at test-time to a novel object without fine-tuning, as long as its CAD model is given, or a small number of reference images are captured. We bridge the gap between these two setups with a neural implicit represen…

    Submitted 26 March, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

  11. arXiv:2312.00583  [pdf, other]

    cs.CV cs.RO

    DeformGS: Scene Flow in Highly Deformable Scenes for Deformable Object Manipulation

    Authors: Bardienus P. Duisterhof, Zhao Mandi, Yunchao Yao, Jia-Wei Liu, Jenny Seidenschwarz, Mike Zheng Shou, Deva Ramanan, Shuran Song, Stan Birchfield, Bowen Wen, Jeffrey Ichnowski

    Abstract: Teaching robots to fold, drape, or reposition deformable objects such as cloth will unlock a variety of automation applications. While remarkable progress has been made for rigid object manipulation, manipulating deformable objects poses unique challenges, including frequent occlusions, infinite-dimensional state spaces and complex dynamics. Just as object pose estimation and tracking have aided r…

    Submitted 30 August, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

  12. arXiv:2310.00463  [pdf, other]

    cs.CV cs.RO

    Diff-DOPE: Differentiable Deep Object Pose Estimation

    Authors: Jonathan Tremblay, Bowen Wen, Valts Blukis, Balakumar Sundaralingam, Stephen Tyree, Stan Birchfield

    Abstract: We introduce Diff-DOPE, a 6-DoF pose refiner that takes as input an image, a 3D textured model of an object, and an initial pose of the object. The method uses differentiable rendering to update the object pose to minimize the visual error between the image and the projection of the model. We show that this simple, yet effective, idea is able to achieve state-of-the-art results on pose estimation…

    Submitted 30 September, 2023; originally announced October 2023.

    Comments: Submitted to ICRA 2023. Project page is at https://diffdope.github.io

  13. arXiv:2308.01477  [pdf, other]

    cs.RO cs.CV

    HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions

    Authors: Andrew Guo, Bowen Wen, Jianhe Yuan, Jonathan Tremblay, Stephen Tyree, Jeffrey Smith, Stan Birchfield

    Abstract: We present the HANDAL dataset for category-level object pose estimation and affordance prediction. Unlike previous datasets, ours is focused on robotics-ready manipulable objects that are of the proper size and shape for functional grasping by robot manipulators, such as pliers, utensils, and screwdrivers. Our annotation process is streamlined, requiring only a single off-the-shelf camera and semi…

    Submitted 2 August, 2023; originally announced August 2023.

    Comments: IROS 2023. Project page: https://nvlabs.github.io/HANDAL/

  14. arXiv:2304.00673  [pdf, other]

    cs.CV

    Partial-View Object View Synthesis via Filtered Inversion

    Authors: Fan-Yun Sun, Jonathan Tremblay, Valts Blukis, Kevin Lin, Danfei Xu, Boris Ivanovic, Peter Karkus, Stan Birchfield, Dieter Fox, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Marco Pavone, Nick Haber

    Abstract: We propose Filtering Inversion (FINV), a learning framework and optimization process that predicts a renderable 3D object representation from one or few partial views. FINV addresses the challenge of synthesizing novel views of objects from partial observations, spanning cases where the object is not entirely in view, is partially occluded, or is only observed from similar views. To achieve this,…

    Submitted 17 August, 2024; v1 submitted 2 April, 2023; originally announced April 2023.

    Comments: project website: http://cs.stanford.edu/~sunfanyun/finv

  15. arXiv:2303.16730  [pdf, other]

    cs.CV

    TTA-COPE: Test-Time Adaptation for Category-Level Object Pose Estimation

    Authors: Taeyeop Lee, Jonathan Tremblay, Valts Blukis, Bowen Wen, Byeong-Uk Lee, Inkyu Shin, Stan Birchfield, In So Kweon, Kuk-Jin Yoon

    Abstract: Test-time adaptation methods have been gaining attention recently as a practical solution for addressing source-to-target domain gaps by gradually updating the model without requiring labels on the target data. In this paper, we propose a method of test-time adaptation for category-level object pose estimation called TTA-COPE. We design a pose ensemble approach with a self-training loss using pose…

    Submitted 29 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023, Project page: https://taeyeop.com/ttacope

  16. arXiv:2303.14158  [pdf, other]

    cs.CV cs.AI cs.GR cs.RO

    BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects

    Authors: Bowen Wen, Jonathan Tremblay, Valts Blukis, Stephen Tyree, Thomas Muller, Alex Evans, Dieter Fox, Jan Kautz, Stan Birchfield

    Abstract: We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence, while simultaneously performing neural 3D reconstruction of the object. Our method works for arbitrary rigid objects, even when visual texture is largely absent. The object is assumed to be segmented in the first frame only. No additional information is required, and no assumption is ma…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  17. arXiv:2303.12538  [pdf, other]

    cs.CV cs.RO

    Affordance Diffusion: Synthesizing Hand-Object Interactions

    Authors: Yufei Ye, Xueting Li, Abhinav Gupta, Shalini De Mello, Stan Birchfield, Jiaming Song, Shubham Tulsiani, Sifei Liu

    Abstract: Recent successes in image synthesis are powered by large-scale diffusion models. However, most methods are currently limited to either text- or image-conditioned generation for synthesizing an entire image, texture transfer or inserting objects into a user-specified region. In contrast, in this work we focus on synthesizing complex interactions (ie, an articulated hand) with a given object. Given…

    Submitted 20 May, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: accepted to CVPR22, change fig 2 from .pdf to .jpg for adobe compatibility

  18. arXiv:2301.13190  [pdf, other]

    cs.CV

    Audio-Visual Segmentation with Semantics

    Authors: Jinxing Zhou, Xuyang Shen, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

    Abstract: We propose a new problem called audio-visual segmentation (AVS), in which the goal is to output a pixel-level map of the object(s) that produce sound at the time of the image frame. To facilitate this research, we construct the first audio-visual segmentation benchmark, i.e., AVSBench, providing pixel-wise annotations for sounding objects in audible videos. It contains three subsets: AVSBench-obje…

    Submitted 30 January, 2023; originally announced January 2023.

    Comments: Submitted to TPAMI as a journal extension of ECCV 2022. Jinxing Zhou, Xuyang Shen, and Jianyuan Wang contribute equally to this work. Meng Wang and Yiran Zhong are the corresponding authors. Code is available at https://github.com/OpenNLPLab/AVSBench. Online benchmark is available at http://www.avlbench.opennlplab.cn. arXiv admin note: substantial text overlap with arXiv:2207.05042

  19. arXiv:2212.06870  [pdf, other]

    cs.CV cs.RO

    MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare

    Authors: Yann Labbé, Lucas Manuelli, Arsalan Mousavian, Stephen Tyree, Stan Birchfield, Jonathan Tremblay, Justin Carpentier, Mathieu Aubry, Dieter Fox, Josef Sivic

    Abstract: We introduce MegaPose, a method to estimate the 6D pose of novel objects, that is, objects unseen during training. At inference time, the method only assumes knowledge of (i) a region of interest displaying the object in the image and (ii) a CAD model of the observed object. The contributions of this work are threefold. First, we present a 6D pose refiner based on a render&compare strategy which c…

    Submitted 13 December, 2022; originally announced December 2022.

    Comments: CoRL 2022

  20. arXiv:2210.12126  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    One-Shot Neural Fields for 3D Object Understanding

    Authors: Valts Blukis, Taeyeop Lee, Jonathan Tremblay, Bowen Wen, In So Kweon, Kuk-Jin Yoon, Dieter Fox, Stan Birchfield

    Abstract: We present a unified and compact scene representation for robotics, where each object in the scene is depicted by a latent code capturing geometry and appearance. This representation can be decoded for various tasks such as novel view rendering, 3D reconstruction (e.g. recovering depth, point clouds, or voxel maps), collision checking, and stable grasp prediction. We build our representation from…

    Submitted 8 August, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

    Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop (CVPRW) on XRNeRF: Advances in NeRF for the Metaverse 2023

  21. arXiv:2210.11668  [pdf, other]

    cs.RO cs.CV

    RGB-Only Reconstruction of Tabletop Scenes for Collision-Free Manipulator Control

    Authors: Zhenggang Tang, Balakumar Sundaralingam, Jonathan Tremblay, Bowen Wen, Ye Yuan, Stephen Tyree, Charles Loop, Alexander Schwing, Stan Birchfield

    Abstract: We present a system for collision-free control of a robot manipulator that uses only RGB views of the world. Perceptual input of a tabletop scene is provided by multiple images of an RGB camera (without depth) that is either handheld or mounted on the robot end effector. A NeRF-like process is used to reconstruct the 3D geometry of the scene, from which the Euclidean full signed distance function…

    Submitted 10 March, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: ICRA 2023. Project page at https://ngp-mpc.github.io/

  22. arXiv:2210.10108  [pdf, other]

    cs.CV cs.RO

    Parallel Inversion of Neural Radiance Fields for Robust Pose Estimation

    Authors: Yunzhi Lin, Thomas Müller, Jonathan Tremblay, Bowen Wen, Stephen Tyree, Alex Evans, Patricio A. Vela, Stan Birchfield

    Abstract: We present a parallelized optimization method based on fast Neural Radiance Fields (NeRF) for estimating 6-DoF pose of a camera with respect to an object or scene. Given a single observed RGB image of the target, we can predict the translation and rotation of the camera by minimizing the residual between pixels rendered from a fast NeRF model and pixels in the observed image. We integrate a moment…

    Submitted 10 March, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

    Comments: ICRA 2023. Project page at https://pnerfp.github.io/

  23. arXiv:2207.05042  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS eess.IV

    Audio-Visual Segmentation

    Authors: Jinxing Zhou, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

    Abstract: We propose to explore a new problem called audio-visual segmentation (AVS), in which the goal is to output a pixel-level map of the object(s) that produce sound at the time of the image frame. To facilitate this research, we construct the first audio-visual segmentation benchmark (AVSBench), providing pixel-wise annotations for the sounding objects in audible videos. Two settings are studied with…

    Submitted 17 February, 2023; v1 submitted 11 July, 2022; originally announced July 2022.

    Comments: ECCV 2022; Code is available at https://github.com/OpenNLPLab/AVSBench

  24. arXiv:2206.10552  [pdf, other]

    cs.CV

    Vicinity Vision Transformer

    Authors: Weixuan Sun, Zhen Qin, Hui Deng, Jianyuan Wang, Yi Zhang, Kaihao Zhang, Nick Barnes, Stan Birchfield, Lingpeng Kong, Yiran Zhong

    Abstract: Vision transformers have shown great success on numerous computer vision tasks. However, its central component, softmax attention, prohibits vision transformers from scaling up to high-resolution images, due to both the computational complexity and memory footprint being quadratic. Although linear attention was introduced in natural language processing (NLP) tasks to mitigate a similar issue, dire…

    Submitted 20 July, 2023; v1 submitted 21 June, 2022; originally announced June 2022.

    Comments: code: https://github.com/OpenNLPLab/Vicinity-Vision-Transformer

  25. arXiv:2205.11047  [pdf, other]

    cs.CV cs.RO

    Keypoint-Based Category-Level Object Pose Tracking from an RGB Sequence with Uncertainty Estimation

    Authors: Yunzhi Lin, Jonathan Tremblay, Stephen Tyree, Patricio A. Vela, Stan Birchfield

    Abstract: We propose a single-stage, category-level 6-DoF pose estimation algorithm that simultaneously detects and tracks instances of objects within a known category. Our method takes as input the previous and current frame from a monocular RGB video, as well as predictions from the previous frame, to predict the bounding cuboid and 6-DoF pose (up to scale). Internally, a deep network predicts distributio…

    Submitted 23 May, 2022; originally announced May 2022.

    Comments: ICRA 2022. Project site is at https://sites.google.com/view/centerposetrack

  26. arXiv:2205.07058  [pdf, other]

    cs.CV

    RTMV: A Ray-Traced Multi-View Synthetic Dataset for Novel View Synthesis

    Authors: Jonathan Tremblay, Moustafa Meshry, Alex Evans, Jan Kautz, Alexander Keller, Sameh Khamis, Thomas Müller, Charles Loop, Nathan Morrical, Koki Nagano, Towaki Takikawa, Stan Birchfield

    Abstract: We present a large-scale synthetic dataset for novel view synthesis consisting of ~300k images rendered from nearly 2000 complex scenes using high-quality ray tracing at high resolution (1600 x 1600 pixels). The dataset is orders of magnitude larger than existing synthetic datasets for novel view synthesis, thus providing a large unified benchmark for both training and evaluation. Using 4 distinct…

    Submitted 24 October, 2022; v1 submitted 14 May, 2022; originally announced May 2022.

    Comments: ECCV 2022 Workshop on Learning to Generate 3D Shapes and Scenes. Project page at http://www.cs.umd.edu/~mmeshry/projects/rtmv

  27. arXiv:2203.05701  [pdf, other]

    cs.RO cs.CV

    6-DoF Pose Estimation of Household Objects for Robotic Manipulation: An Accessible Dataset and Benchmark

    Authors: Stephen Tyree, Jonathan Tremblay, Thang To, Jia Cheng, Terry Mosier, Jeffrey Smith, Stan Birchfield

    Abstract: We present a new dataset for 6-DoF pose estimation of known objects, with a focus on robotic manipulation research. We propose a set of toy grocery objects, whose physical instantiations are readily available for purchase and are appropriately sized for robotic grasping and manipulation. We provide 3D scanned textured models of these objects, suitable for generating synthetic training data, as wel…

    Submitted 15 December, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

    Comments: IROS 2022. Project page is at https://github.com/swtyree/hope-dataset

  28. arXiv:2109.11094  [pdf, other]

    cs.RO cs.LG

    PredictionNet: Real-Time Joint Probabilistic Traffic Prediction for Planning, Control, and Simulation

    Authors: Alexey Kamenev, Lirui Wang, Ollin Boer Bohan, Ishwar Kulkarni, Bilal Kartal, Artem Molchanov, Stan Birchfield, David Nistér, Nikolai Smolyanskiy

    Abstract: Predicting the future motion of traffic agents is crucial for safe and efficient autonomous driving. To this end, we present PredictionNet, a deep neural network (DNN) that predicts the motion of all surrounding traffic agents together with the ego-vehicle's motion. All predictions are probabilistic and are represented in a simple top-down rasterization that allows an arbitrary number of agents. C…

    Submitted 19 May, 2022; v1 submitted 22 September, 2021; originally announced September 2021.

    Comments: 7 pages, 7 figures, accepted to ICRA 2022 conference, for associated video file, see https://youtu.be/C7Nb3DRjFP0

    MSC Class: 68T07 (Primary) 68T37; 68T40 (Secondary)
    ACM Class: I.2.9; I.2.6; I.6

  29. arXiv:2109.06161  [pdf, other]

    cs.CV cs.RO

    Single-Stage Keypoint-Based Category-Level Object Pose Estimation from an RGB Image

    Authors: Yunzhi Lin, Jonathan Tremblay, Stephen Tyree, Patricio A. Vela, Stan Birchfield

    Abstract: Prior work on 6-DoF object pose estimation has largely focused on instance-level processing, in which a textured CAD model is available for each object being detected. Category-level 6-DoF pose estimation represents an important step toward developing robotic vision systems that operate in unstructured, real-world scenarios. In this work, we propose a single-stage, keypoint-based approach for cate…

    Submitted 12 May, 2022; v1 submitted 13 September, 2021; originally announced September 2021.

    Comments: ICRA 2022. Project page at https://sites.google.com/view/centerpose

  30. arXiv:2105.13962  [pdf, other]

    cs.CV cs.RO

    NViSII: A Scriptable Tool for Photorealistic Image Generation

    Authors: Nathan Morrical, Jonathan Tremblay, Yunzhi Lin, Stephen Tyree, Stan Birchfield, Valerio Pascucci, Ingo Wald

    Abstract: We present a Python-based renderer built on NVIDIA's OptiX ray tracing engine and the OptiX AI denoiser, designed to generate high-quality synthetic images for research in computer vision and deep learning. Our tool enables the description and manipulation of complex dynamic 3D scenes containing object meshes, materials, textures, lighting, volumetric data (e.g., smoke), and backgrounds. Metadata,…

    Submitted 28 May, 2021; originally announced May 2021.

    Comments: SDG Workshop at ICLR 2021. Project page is at https://github.com/owl-project/NVISII

  31. arXiv:2104.04631  [pdf, other]

    cs.CV

    DexYCB: A Benchmark for Capturing Hand Grasping of Objects

    Authors: Yu-Wei Chao, Wei Yang, Yu Xiang, Pavlo Molchanov, Ankur Handa, Jonathan Tremblay, Yashraj S. Narang, Karl Van Wyk, Umar Iqbal, Stan Birchfield, Jan Kautz, Dieter Fox

    Abstract: We introduce DexYCB, a new dataset for capturing hand grasping of objects. We first compare DexYCB with a related one through cross-dataset evaluation. We then present a thorough benchmark of state-of-the-art approaches on three relevant tasks: 2D object and keypoint detection, 6D object pose estimation, and 3D hand pose estimation. Finally, we evaluate a new robotics-relevant task: generating saf…

    Submitted 9 April, 2021; originally announced April 2021.

    Comments: Accepted to CVPR 2021

  32. arXiv:2104.00556  [pdf, other]

    cs.CV

    Deep Two-View Structure-from-Motion Revisited

    Authors: Jianyuan Wang, Yiran Zhong, Yuchao Dai, Stan Birchfield, Kaihao Zhang, Nikolai Smolyanskiy, Hongdong Li

    Abstract: Two-view structure-from-motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM. Existing deep learning-based approaches formulate the problem by either recovering absolute pose scales from two consecutive frames or predicting a depth map from a single image, both of which are ill-posed problems. In contrast, we propose to revisit the problem of deep two-view SfM by leveraging the wel…

    Submitted 1 April, 2021; originally announced April 2021.

    Comments: Accepted at CVPR 2021; Yiran Zhong and Jianyuan Wang contribute equally to this work and the name listed in alphabetical order

  33. arXiv:2103.13539  [pdf, other]

    cs.RO

    Multi-View Fusion for Multi-Level Robotic Scene Understanding

    Authors: Yunzhi Lin, Jonathan Tremblay, Stephen Tyree, Patricio A. Vela, Stan Birchfield

    Abstract: We present a system for multi-level scene awareness for robotic manipulation. Given a sequence of camera-in-hand RGB images, the system calculates three types of information: 1) a point cloud representation of all the surfaces in the scene, for the purpose of obstacle avoidance; 2) the rough pose of unknown objects from categories corresponding to primitive shapes (e.g., cuboids and cylinders); an…

    Submitted 14 October, 2021; v1 submitted 24 March, 2021; originally announced March 2021.

    Comments: Presented at IROS 2021. Video is at https://youtu.be/FuqMxuODGlw

  34. arXiv:2012.07277  [pdf, other]

    cs.RO

    Hierarchical Planning for Long-Horizon Manipulation with Geometric and Symbolic Scene Graphs

    Authors: Yifeng Zhu, Jonathan Tremblay, Stan Birchfield, Yuke Zhu

    Abstract: We present a visually grounded hierarchical planning algorithm for long-horizon manipulation tasks. Our algorithm offers a joint framework of neuro-symbolic task planning and low-level motion generation conditioned on the specified goal. At the core of our approach is a two-level scene graph representation, namely geometric scene graph and symbolic scene graph. This hierarchical representation ser…

    Submitted 29 March, 2021; v1 submitted 14 December, 2020; originally announced December 2020.

    Comments: Accepted to ICRA 2021

  35. arXiv:2012.00899  [pdf, other]

    cs.CV

    Displacement-Invariant Cost Computation for Efficient Stereo Matching

    Authors: Yiran Zhong, Charles Loop, Wonmin Byeon, Stan Birchfield, Yuchao Dai, Kaihao Zhang, Alexey Kamenev, Thomas Breuel, Hongdong Li, Jan Kautz

    Abstract: Although deep learning-based methods have dominated stereo matching leaderboards by yielding unprecedented disparity accuracy, their inference time is typically slow, on the order of seconds for a pair of 540p images. The main reason is that the leading methods employ time-consuming 3D convolutions applied to a 4D feature volume. A common way to speed up the computation is to downsample the featur…

    Submitted 1 December, 2020; originally announced December 2020.

    Comments: 8 pages

  36. arXiv:2011.14488  [pdf, other]

    cs.CV

    Self-Supervised Real-to-Sim Scene Generation

    Authors: Aayush Prakash, Shoubhik Debnath, Jean-Francois Lafleche, Eric Cameracci, Gavriel State, Stan Birchfield, Marc T. Law

    Abstract: Synthetic data is emerging as a promising solution to the scalability issue of supervised deep learning, especially when real data are difficult to acquire or hard to annotate. Synthetic data generation, however, can itself be prohibitively expensive when domain experts have to manually and painstakingly oversee the process. Moreover, neural networks trained on synthetic data often do not perform…

    Submitted 18 August, 2021; v1 submitted 29 November, 2020; originally announced November 2020.

    Comments: accepted at ICCV 2021. Project page: https://research.nvidia.com/publication/2021-08_Sim2SG

  37. arXiv:2011.07748  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Fast Uncertainty Quantification for Deep Object Pose Estimation

    Authors: Guanya Shi, Yifeng Zhu, Jonathan Tremblay, Stan Birchfield, Fabio Ramos, Animashree Anandkumar, Yuke Zhu

    Abstract: Deep learning-based object pose estimators are often unreliable and overconfident especially when the input image is outside the training domain, for instance, with sim2real transfer. Efficient and robust uncertainty quantification (UQ) in pose estimators is critically needed in many robotic tasks. In this work, we propose a simple, efficient, and plug-and-play UQ method for 6-DoF object pose esti…

    Submitted 26 March, 2021; v1 submitted 16 November, 2020; originally announced November 2020.

    Comments: Video and code are available at https://sites.google.com/view/fastuq

    Journal ref: International Conference on Robotics and Automation (ICRA), 2021

  38. arXiv:2011.06332  [pdf, other]

    cs.RO

    Joint Space Control via Deep Reinforcement Learning

    Authors: Visak Kumar, David Hoeller, Balakumar Sundaralingam, Jonathan Tremblay, Stan Birchfield

    Abstract: The dominant way to control a robot manipulator uses hand-crafted differential equations leveraging some form of inverse kinematics / dynamics. We propose a simple, versatile joint-level controller that dispenses with differential equations entirely. A deep neural network, trained via model-free reinforcement learning, is used to map from task space to joint space. Experiments show the method capa…

    Submitted 20 August, 2021; v1 submitted 12 November, 2020; originally announced November 2020.

    Comments: Presented at IROS 2021. Video is at https://youtu.be/ICfve-GTTp8

  39. arXiv:2008.11822  [pdf, other]

    cs.RO

    Indirect Object-to-Robot Pose Estimation from an External Monocular RGB Camera

    Authors: Jonathan Tremblay, Stephen Tyree, Terry Mosier, Stan Birchfield

    Abstract: We present a robotic grasping system that uses a single external monocular RGB camera as input. The object-to-robot pose is computed indirectly by combining the output of two neural networks: one that estimates the object-to-camera pose, and another that estimates the robot-to-camera pose. Both networks are trained entirely on synthetic data, relying on domain randomization to bridge the sim-to-re…

    Submitted 26 August, 2020; originally announced August 2020.

    Comments: IROS 2020. Video at https://youtu.be/E0J91llX-ys

  40. arXiv:2008.11098  [pdf, other]

    cs.CV

    Improving Deep Stereo Network Generalization with Geometric Priors

    Authors: Jialiang Wang, Varun Jampani, Deqing Sun, Charles Loop, Stan Birchfield, Jan Kautz

    Abstract: End-to-end deep learning methods have advanced stereo vision in recent years and obtained excellent results when the training and test data are similar. However, large datasets of diverse real-world scenes with dense ground truth are difficult to obtain and currently not publicly available to the research community. As a result, many algorithms rely on small real-world datasets of similar scenes o…

    Submitted 25 August, 2020; originally announced August 2020.

  41. arXiv:2007.14256  [pdf, other]

    cs.RO

    RMPflow: A Geometric Framework for Generation of Multi-Task Motion Policies

    Authors: Ching-An Cheng, Mustafa Mukadam, Jan Issac, Stan Birchfield, Dieter Fox, Byron Boots, Nathan Ratliff

    Abstract: Generating robot motion for multiple tasks in dynamic environments is challenging, requiring an algorithm to respond reactively while accounting for complex nonlinear relationships between tasks. In this paper, we develop a novel policy synthesis algorithm, RMPflow, based on geometrically consistent transformations of Riemannian Motion Policies (RMPs). RMPs are a class of reactive motion policies…

    Submitted 24 July, 2020; originally announced July 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:1811.07049

  42. arXiv:2006.05518  [pdf, other]

    cs.CV cs.RO

    MVLidarNet: Real-Time Multi-Class Scene Understanding for Autonomous Driving Using Multiple Views

    Authors: Ke Chen, Ryan Oldja, Nikolai Smolyanskiy, Stan Birchfield, Alexander Popov, David Wehr, Ibrahim Eden, Joachim Pehserl

    Abstract: Autonomous driving requires the inference of actionable information such as detecting and classifying objects, and determining the drivable space. To this end, we present Multi-View LidarNet (MVLidarNet), a two-stage deep neural network for multi-class object detection and drivable space segmentation using multiple views of a single LiDAR point cloud. The first stage processes the point cloud proj…

    Submitted 17 August, 2020; v1 submitted 9 June, 2020; originally announced June 2020.

    Comments: IROS 2020 conference (submitted March 1st, 2020). For accompanying video, see https://youtu.be/2ck5_sToayc

    ACM Class: I.2.6; I.4.6; I.5.1

  43. arXiv:2005.00673  [pdf, other]

    cs.CV cs.LG eess.IV

    PAMTRI: Pose-Aware Multi-Task Learning for Vehicle Re-Identification Using Highly Randomized Synthetic Data

    Authors: Zheng Tang, Milind Naphade, Stan Birchfield, Jonathan Tremblay, William Hodge, Ratnesh Kumar, Shuo Wang, Xiaodong Yang

    Abstract: In comparison with person re-identification (ReID), which has been widely studied in the research community, vehicle ReID has received less attention. Vehicle ReID is challenging due to 1) high intra-class variability (caused by the dependency of shape and appearance on viewpoint), and 2) small inter-class variability (caused by the similarity in shape and appearance between vehicles produced by d…

    Submitted 1 May, 2020; originally announced May 2020.

    Comments: Accepted by ICCV 2019

  44. arXiv:1911.09233  [pdf, other]

    cs.RO

    Contextual Reinforcement Learning of Visuo-tactile Multi-fingered Grasping Policies

    Authors: Visak Kumar, Tucker Hermans, Dieter Fox, Stan Birchfield, Jonathan Tremblay

    Abstract: Using simulation to train robot manipulation policies holds the promise of an almost unlimited amount of training data, generated safely out of harm's way. One of the key challenges of using simulation, to date, has been to bridge the reality gap, so that policies trained in simulation can be deployed in the real world. We explore the reality gap in the context of learning a contextual policy for…

    Submitted 24 November, 2019; v1 submitted 20 November, 2019; originally announced November 2019.

  45. arXiv:1911.09231  [pdf, other]

    cs.RO

    Camera-to-Robot Pose Estimation from a Single Image

    Authors: Timothy E. Lee, Jonathan Tremblay, Thang To, Jia Cheng, Terry Mosier, Oliver Kroemer, Dieter Fox, Stan Birchfield

    Abstract: We present an approach for estimating the pose of an external camera with respect to a robot using a single RGB image of the robot. The image is processed by a deep neural network to detect 2D projections of keypoints (such as joints) associated with the robot. The network is trained entirely on simulated data using domain randomization to bridge the reality gap. Perspective-n-point (PnP) is then…

    Submitted 23 April, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

    Comments: ICRA 2020. Project page is at https://research.nvidia.com/publication/2020-03_DREAM

  46. arXiv:1910.03135  [pdf, other]

    cs.CV cs.LG cs.RO

    DexPilot: Vision Based Teleoperation of Dexterous Robotic Hand-Arm System

    Authors: Ankur Handa, Karl Van Wyk, Wei Yang, Jacky Liang, Yu-Wei Chao, Qian Wan, Stan Birchfield, Nathan Ratliff, Dieter Fox

    Abstract: Teleoperation offers the possibility of imparting robotic systems with sophisticated reasoning skills, intuition, and creativity to perform tasks. However, current teleoperation solutions for high degree-of-actuation (DoA), multi-fingered robots are generally cost-prohibitive, while low-cost offerings usually provide reduced degrees of control. Herein, a low-cost, vision based teleoperation system…

    Submitted 14 October, 2019; v1 submitted 7 October, 2019; originally announced October 2019.

    Comments: 17 pages, first version of DexPilot

  47. arXiv:1909.02075  [pdf, other]

    cs.RO

    Toward Sim-to-Real Directional Semantic Grasping

    Authors: Shariq Iqbal, Jonathan Tremblay, Thang To, Jia Cheng, Erik Leitch, Andy Campbell, Kirby Leung, Duncan McKay, Stan Birchfield

    Abstract: We address the problem of directional semantic grasping, that is, grasping a specific object from a specific direction. We approach the problem using deep reinforcement learning via a double deep Q-network (DDQN) that learns to map downsampled RGB input images from a wrist-mounted camera to Q-values, which are then translated into Cartesian robot control commands via the cross-entropy method (CEM)…

    Submitted 18 August, 2020; v1 submitted 4 September, 2019; originally announced September 2019.

    Comments: ICRA 2020. Video is at https://youtu.be/bjJLtNdVj9w

  48. arXiv:1905.04957  [pdf, other]

    cs.CV

    Few-Shot Viewpoint Estimation

    Authors: Hung-Yu Tseng, Shalini De Mello, Jonathan Tremblay, Sifei Liu, Stan Birchfield, Ming-Hsuan Yang, Jan Kautz

    Abstract: Viewpoint estimation for known categories of objects has been improved significantly thanks to deep networks and large datasets, but generalization to unknown categories is still very challenging. With an aim towards improving performance on unknown categories, we introduce the problem of category-level few-shot viewpoint estimation. We design a novel framework to successfully train viewpoint netw…

    Submitted 31 July, 2019; v1 submitted 13 May, 2019; originally announced May 2019.

    Comments: BMVC 2019

  49. arXiv:1903.09254  [pdf, other]

    cs.CV

    CityFlow: A City-Scale Benchmark for Multi-Target Multi-Camera Vehicle Tracking and Re-Identification

    Authors: Zheng Tang, Milind Naphade, Ming-Yu Liu, Xiaodong Yang, Stan Birchfield, Shuo Wang, Ratnesh Kumar, David Anastasiu, Jenq-Neng Hwang

    Abstract: Urban traffic optimization using traffic cameras as sensors is driving the need to advance state-of-the-art multi-target multi-camera (MTMC) tracking. This work introduces CityFlow, a city-scale traffic camera dataset consisting of more than 3 hours of synchronized HD videos from 40 cameras across 10 intersections, with the longest distance between two simultaneous cameras being 2.5 km. To the bes…

    Submitted 5 April, 2019; v1 submitted 21 March, 2019; originally announced March 2019.

    Comments: Accepted for oral presentation at CVPR 2019 with review ratings of 2 strong accepts and 1 accept (work done during an internship at NVIDIA)

  50. arXiv:1811.07049  [pdf, other]

    cs.RO eess.SY

    RMPflow: A Computational Graph for Automatic Motion Policy Generation

    Authors: Ching-An Cheng, Mustafa Mukadam, Jan Issac, Stan Birchfield, Dieter Fox, Byron Boots, Nathan Ratliff

    Abstract: We develop a novel policy synthesis algorithm, RMPflow, based on geometrically consistent transformations of Riemannian Motion Policies (RMPs). RMPs are a class of reactive motion policies designed to parameterize non-Euclidean behaviors as dynamical systems in intrinsically nonlinear task spaces. Given a set of RMPs designed for individual tasks, RMPflow can consistently combine these local polic…

    Submitted 5 April, 2019; v1 submitted 16 November, 2018; originally announced November 2018.

    Comments: WAFR 2018