Showing 1–21 of 21 results for author: Hodan, T

  1. arXiv:2504.02812

    cs.CV

    BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation

    Authors: Van Nguyen Nguyen, Stephen Tyree, Andrew Guo, Mederic Fourmy, Anas Gouda, Taeyeop Lee, Sungphill Moon, Hyeontae Son, Lukas Ranftl, Jonathan Tremblay, Eric Brachmann, Bertram Drost, Vincent Lepetit, Carsten Rother, Stan Birchfield, Jiri Matas, Yann Labbe, Martin Sundermeyer, Tomas Hodan

    Abstract: We present the evaluation methodology, datasets and results of the BOP Challenge 2024, the 6th in a series of public competitions organized to capture the state of the art in 6D object pose estimation and related tasks. In 2024, our goal was to transition BOP from lab-like setups to real-world scenarios. First, we introduced new model-free tasks, where no 3D object models are available and methods…

    Submitted 23 April, 2025; v1 submitted 3 April, 2025; originally announced April 2025.

    Comments: arXiv admin note: text overlap with arXiv:2403.09799

  2. arXiv:2411.19167

    cs.CV cs.AI cs.RO

    HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos

    Authors: Prithviraj Banerjee, Sindi Shkodrani, Pierre Moulon, Shreyas Hampali, Shangchen Han, Fan Zhang, Linguang Zhang, Jade Fountain, Edward Miller, Selen Basol, Richard Newcombe, Robert Wang, Jakob Julian Engel, Tomas Hodan

    Abstract: We introduce HOT3D, a publicly available dataset for egocentric hand and object tracking in 3D. The dataset offers over 833 minutes (3.7M+ images) of recordings that feature 19 subjects interacting with 33 diverse rigid objects. In addition to simple pick-up, observe, and put-down actions, the subjects perform actions typical for a kitchen, office, and living room environment. The recordings inclu…

    Submitted 30 April, 2025; v1 submitted 28 November, 2024; originally announced November 2024.

    Comments: CVPR 2025

  3. arXiv:2406.09598

    cs.CV

    Introducing HOT3D: An Egocentric Dataset for 3D Hand and Object Tracking

    Authors: Prithviraj Banerjee, Sindi Shkodrani, Pierre Moulon, Shreyas Hampali, Fan Zhang, Jade Fountain, Edward Miller, Selen Basol, Richard Newcombe, Robert Wang, Jakob Julian Engel, Tomas Hodan

    Abstract: We introduce HOT3D, a publicly available dataset for egocentric hand and object tracking in 3D. The dataset offers over 833 minutes (more than 3.7M images) of multi-view RGB/monochrome image streams showing 19 subjects interacting with 33 diverse rigid objects, multi-modal signals such as eye gaze or scene point clouds, as well as comprehensive ground truth annotations including 3D poses of object…

    Submitted 13 June, 2024; originally announced June 2024.

  4. arXiv:2403.17827

    cs.CV cs.AI cs.GR cs.LG

    DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions

    Authors: Sammy Christen, Shreyas Hampali, Fadime Sener, Edoardo Remelli, Tomas Hodan, Eric Sauser, Shugao Ma, Bugra Tekin

    Abstract: Generating natural hand-object interactions in 3D is challenging as the resulting hand and object motions are expected to be physically plausible and semantically meaningful. Furthermore, generalization to unseen objects is hindered by the limited scale of available hand-object interaction datasets. In this paper, we propose a novel method, dubbed DiffH2O, which can synthesize realistic, one or tw…

    Submitted 23 December, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: Project Page: https://diffh2o.github.io/

    Journal ref: SIGGRAPH Asia Conference Papers, Article 145, 2024

  5. arXiv:2403.09799

    cs.CV cs.RO

    BOP Challenge 2023 on Detection, Segmentation and Pose Estimation of Seen and Unseen Rigid Objects

    Authors: Tomas Hodan, Martin Sundermeyer, Yann Labbe, Van Nguyen Nguyen, Gu Wang, Eric Brachmann, Bertram Drost, Vincent Lepetit, Carsten Rother, Jiri Matas

    Abstract: We present the evaluation methodology, datasets and results of the BOP Challenge 2023, the fifth in a series of public competitions organized to capture the state of the art in model-based 6D object pose estimation from an RGB/RGB-D image and related tasks. Besides the three tasks from 2022 (model-based 2D detection, 2D segmentation, and 6D localization of objects seen during training), the 2023 c…

    Submitted 16 April, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2302.13075

  6. arXiv:2311.18809

    cs.CV cs.RO

    FoundPose: Unseen Object Pose Estimation with Foundation Features

    Authors: Evin Pınar Örnek, Yann Labbé, Bugra Tekin, Lingni Ma, Cem Keskin, Christian Forster, Tomas Hodan

    Abstract: We propose FoundPose, a model-based method for 6D pose estimation of unseen objects from a single RGB image. The method can quickly onboard new objects using their 3D models without requiring any object- or task-specific training. In contrast, existing methods typically pre-train on large-scale, task-specific datasets in order to generalize to new objects and to bridge the image-to-model domain ga…

    Submitted 19 July, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

  7. arXiv:2307.11067

    cs.CV

    CNOS: A Strong Baseline for CAD-based Novel Object Segmentation

    Authors: Van Nguyen Nguyen, Thibault Groueix, Georgy Ponimatkin, Vincent Lepetit, Tomas Hodan

    Abstract: We propose a simple three-stage approach to segment unseen objects in RGB images using their CAD models. Leveraging recent powerful foundation models, DINOv2 and Segment Anything, we create descriptors and generate proposals, including binary masks for a given input RGB image. By matching proposals with reference descriptors created from CAD models, we achieve precise object ID assignment along wi…

    Submitted 25 August, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: ICCV 2023, R6D Workshop

  8. arXiv:2304.12301

    cs.CV

    AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand Pose Estimation

    Authors: Takehiko Ohkawa, Kun He, Fadime Sener, Tomas Hodan, Luan Tran, Cem Keskin

    Abstract: We present AssemblyHands, a large-scale benchmark dataset with accurate 3D hand pose annotations, to facilitate the study of egocentric activities with challenging hand-object interactions. The dataset includes synchronized egocentric and exocentric images sampled from the recent Assembly101 dataset, in which participants assemble and disassemble take-apart toys. To obtain high-quality 3D hand pos…

    Submitted 24 April, 2023; originally announced April 2023.

    Comments: CVPR 2023. Project page: https://assemblyhands.github.io/

  9. arXiv:2302.13075

    cs.CV

    BOP Challenge 2022 on Detection, Segmentation and Pose Estimation of Specific Rigid Objects

    Authors: Martin Sundermeyer, Tomas Hodan, Yann Labbe, Gu Wang, Eric Brachmann, Bertram Drost, Carsten Rother, Jiri Matas

    Abstract: We present the evaluation methodology, datasets and results of the BOP Challenge 2022, the fourth in a series of public competitions organized with the goal to capture the status quo in the field of 6D object pose estimation from an RGB/RGB-D image. In 2022, we witnessed another significant improvement in the pose estimation accuracy -- the state of the art, which was 56.9 AR$_C$ in 2019 (Vidal et…

    Submitted 25 February, 2023; originally announced February 2023.

    Comments: arXiv admin note: text overlap with arXiv:2009.07378

  10. arXiv:2211.16193

    cs.CV

    In-Hand 3D Object Scanning from an RGB Sequence

    Authors: Shreyas Hampali, Tomas Hodan, Luan Tran, Lingni Ma, Cem Keskin, Vincent Lepetit

    Abstract: We propose a method for in-hand 3D scanning of an unknown object with a monocular camera. Our method relies on a neural implicit surface representation that captures both the geometry and the appearance of the object, however, by contrast with most NeRF-based methods, we do not assume that the camera-object relative poses are known. Instead, we simultaneously optimize both the object shape and the…

    Submitted 22 June, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: CVPR 2023

  11. UmeTrack: Unified multi-view end-to-end hand tracking for VR

    Authors: Shangchen Han, Po-chen Wu, Yubo Zhang, Beibei Liu, Linguang Zhang, Zheng Wang, Weiguang Si, Peizhao Zhang, Yujun Cai, Tomas Hodan, Randi Cabezas, Luan Tran, Muzaffer Akbay, Tsz-Ho Yu, Cem Keskin, Robert Wang

    Abstract: Real-time tracking of 3D hand pose in world space is a challenging problem and plays an important role in VR interaction. Existing work in this space are limited to either producing root-relative (versus world space) 3D pose or rely on multiple stages such as generating heatmaps and kinematic optimization to obtain 3D pose. Moreover, the typical VR scenario, which involves multi-view tracking from…

    Submitted 31 October, 2022; originally announced November 2022.

    Comments: SIGGRAPH Asia 2022 Conference Papers, 8 pages

  12. arXiv:2208.00113

    cs.CV cs.AI cs.LG cs.RO

    Neural Correspondence Field for Object Pose Estimation

    Authors: Lin Huang, Tomas Hodan, Lingni Ma, Linguang Zhang, Luan Tran, Christopher Twigg, Po-Chen Wu, Junsong Yuan, Cem Keskin, Robert Wang

    Abstract: We propose a method for estimating the 6DoF pose of a rigid object with an available 3D model from a single RGB image. Unlike classical correspondence-based methods which predict 3D object coordinates at pixels of the input image, the proposed method predicts 3D object coordinates at 3D query points sampled in the camera frustum. The move from pixels to 3D points, which is inspired by recent PIFu-…

    Submitted 29 July, 2022; originally announced August 2022.

    Comments: Accepted to ECCV 2022

  13. arXiv:2204.01695

    cs.CV

    LISA: Learning Implicit Shape and Appearance of Hands

    Authors: Enric Corona, Tomas Hodan, Minh Vo, Francesc Moreno-Noguer, Chris Sweeney, Richard Newcombe, Lingni Ma

    Abstract: This paper proposes a do-it-all neural model of human hands, named LISA. The model can capture accurate hand shape and appearance, generalize to arbitrary hand subjects, provide dense surface correspondences, be reconstructed from images in the wild and easily animated. We train LISA by minimizing the shape and appearance losses on a large set of multi-view RGB image sequences annotated with coars…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: Published at CVPR 2022

  14. arXiv:2112.15075

    cs.CV cs.AI cs.GR cs.LG cs.RO

    Pose Estimation of Specific Rigid Objects

    Authors: Tomas Hodan

    Abstract: In this thesis, we address the problem of estimating the 6D pose of rigid objects from a single RGB or RGB-D input image, assuming that 3D models of the objects are available. This problem is of great importance to many application fields such as robotic manipulation, augmented reality, and autonomous driving. First, we propose EPOS, a method for 6D object pose estimation from an RGB image. The ke…

    Submitted 30 December, 2021; originally announced December 2021.

    Comments: Tomas Hodan's PhD thesis defended on July 7, 2021. Supervisor: Prof. Jiri Matas. Reviewers: Prof. Vincent Lepetit, Prof. Markus Vincze, Dr. Slobodan Ilic. A recording of the defense: https://youtu.be/WAQmubEXCRM

    Report number: http://hdl.handle.net/10467/93910

  15. arXiv:2009.07378

    cs.CV cs.GR cs.LG cs.RO

    BOP Challenge 2020 on 6D Object Localization

    Authors: Tomas Hodan, Martin Sundermeyer, Bertram Drost, Yann Labbe, Eric Brachmann, Frank Michel, Carsten Rother, Jiri Matas

    Abstract: This paper presents the evaluation methodology, datasets, and results of the BOP Challenge 2020, the third in a series of public competitions organized with the goal to capture the status quo in the field of 6D object pose estimation from an RGB-D image. In 2020, to reduce the domain gap between synthetic training and real test RGB images, the participants were provided 350K photorealistic trainin…

    Submitted 13 October, 2020; v1 submitted 15 September, 2020; originally announced September 2020.

    Comments: In ECCV 2020 Workshops Proceedings

  16. arXiv:2007.00799

    cs.CV

    Learning Surrogates via Deep Embedding

    Authors: Yash Patel, Tomas Hodan, Jiri Matas

    Abstract: This paper proposes a technique for training a neural network by minimizing a surrogate loss that approximates the target evaluation metric, which may be non-differentiable. The surrogate is learned via a deep embedding where the Euclidean distance between the prediction and the ground truth corresponds to the value of the evaluation metric. The effectiveness of the proposed technique is demonstra…

    Submitted 17 July, 2020; v1 submitted 1 July, 2020; originally announced July 2020.

    Comments: ECCV 2020 camera-ready version

  17. arXiv:2004.00605

    cs.CV cs.LG cs.RO eess.IV

    EPOS: Estimating 6D Pose of Objects with Symmetries

    Authors: Tomas Hodan, Daniel Barath, Jiri Matas

    Abstract: We present a new method for estimating the 6D pose of rigid objects with available 3D models from a single RGB input image. The method is applicable to a broad range of objects, including challenging ones with global or partial symmetries. An object is represented by compact surface fragments which allow handling symmetries in a systematic manner. Correspondences between densely sampled pixels and…

    Submitted 1 April, 2020; originally announced April 2020.

    Comments: Accepted to CVPR 2020

  18. arXiv:1902.03334

    cs.CV cs.AI cs.RO

    Photorealistic Image Synthesis for Object Instance Detection

    Authors: Tomas Hodan, Vibhav Vineet, Ran Gal, Emanuel Shalev, Jon Hanzelka, Treb Connell, Pedro Urbina, Sudipta N. Sinha, Brian Guenter

    Abstract: We present an approach to synthesize highly photorealistic images of 3D object models, which we use to train a convolutional neural network for detecting the objects in real images. The proposed approach has three key ingredients: (1) 3D object models are rendered in 3D models of complete scenes with realistic materials and lighting, (2) plausible geometric configuration of objects and cameras in…

    Submitted 8 February, 2019; originally announced February 2019.

  19. A Summary of the 4th International Workshop on Recovering 6D Object Pose

    Authors: Tomas Hodan, Rigas Kouskouridas, Tae-Kyun Kim, Federico Tombari, Kostas Bekris, Bertram Drost, Thibault Groueix, Krzysztof Walas, Vincent Lepetit, Ales Leonardis, Carsten Steger, Frank Michel, Caner Sahin, Carsten Rother, Jiri Matas

    Abstract: This document summarizes the 4th International Workshop on Recovering 6D Object Pose which was organized in conjunction with ECCV 2018 in Munich. The workshop featured four invited talks, oral and poster presentations of accepted workshop papers, and an introduction of the BOP benchmark for 6D object pose estimation. The workshop was attended by 100+ people working on relevant topics in both acade…

    Submitted 8 October, 2018; originally announced October 2018.

    Comments: In: Computer Vision - ECCV 2018 Workshops - Munich, Germany, September 8-9 and 14, 2018, Proceedings

  20. arXiv:1808.08319

    cs.CV cs.AI cs.RO

    BOP: Benchmark for 6D Object Pose Estimation

    Authors: Tomas Hodan, Frank Michel, Eric Brachmann, Wadim Kehl, Anders Glent Buch, Dirk Kraft, Bertram Drost, Joel Vidal, Stephan Ihrke, Xenophon Zabulis, Caner Sahin, Fabian Manhardt, Federico Tombari, Tae-Kyun Kim, Jiri Matas, Carsten Rother

    Abstract: We propose a benchmark for 6D pose estimation of a rigid object from a single RGB-D input image. The training data consists of a texture-mapped 3D object model or images of the object in known 6D poses. The benchmark comprises of: i) eight datasets in a unified format that cover different practical scenarios, including two new datasets focusing on varying lighting conditions, ii) an evaluation met…

    Submitted 24 August, 2018; originally announced August 2018.

    Comments: ECCV 2018

  21. arXiv:1701.05498

    cs.CV cs.AI cs.RO

    T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-less Objects

    Authors: Tomas Hodan, Pavel Haluza, Stepan Obdrzalek, Jiri Matas, Manolis Lourakis, Xenophon Zabulis

    Abstract: We introduce T-LESS, a new public dataset for estimating the 6D pose, i.e. translation and rotation, of texture-less rigid objects. The dataset features thirty industry-relevant objects with no significant texture and no discriminative color or reflectance properties. The objects exhibit symmetries and mutual similarities in shape and/or size. Compared to other datasets, a unique property is that…

    Submitted 19 January, 2017; originally announced January 2017.

    Comments: WACV 2017