
Showing 1–50 of 72 results for author: Oswald, M R

  1. arXiv:2506.08710

    cs.CV

    SceneSplat++: A Large Dataset and Comprehensive Benchmark for Language Gaussian Splatting

    Authors: Mengjiao Ma, Qi Ma, Yue Li, Jiahuan Cheng, Runyi Yang, Bin Ren, Nikola Popovic, Mingqiang Wei, Nicu Sebe, Luc Van Gool, Theo Gevers, Martin R. Oswald, Danda Pani Paudel

    Abstract: 3D Gaussian Splatting (3DGS) serves as a highly performant and efficient encoding of scene geometry, appearance, and semantics. Moreover, grounding language in 3D scenes has proven to be an effective strategy for 3D scene understanding. Current Language Gaussian Splatting line of work fall into three main groups: (i) per-scene optimization-based, (ii) per-scene optimization-free, and (iii) general…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: 15 pages, codes, data and benchmark will be released

  2. arXiv:2506.06909

    cs.CV

    Gaussian Mapping for Evolving Scenes

    Authors: Vladimir Yugay, Thies Kersten, Luca Carlone, Theo Gevers, Martin R. Oswald, Lukas Schmid

    Abstract: Mapping systems with novel view synthesis (NVS) capabilities are widely used in computer vision, with augmented reality, robotics, and autonomous driving applications. Most notably, 3D Gaussian Splatting-based systems show high NVS performance; however, many current approaches are limited to static scenes. While recent works have started addressing short-term dynamics (motion within the view of th…

    Submitted 7 June, 2025; originally announced June 2025.

  3. arXiv:2504.16545

    cs.CV

    ToF-Splatting: Dense SLAM using Sparse Time-of-Flight Depth and Multi-Frame Integration

    Authors: Andrea Conti, Matteo Poggi, Valerio Cambareri, Martin R. Oswald, Stefano Mattoccia

    Abstract: Time-of-Flight (ToF) sensors provide efficient active depth sensing at relatively low power budgets; among such designs, only very sparse measurements from low-resolution sensors are considered to meet the increasingly limited power constraints of mobile and AR/VR devices. However, such extreme sparsity levels limit the seamless usage of ToF depth in SLAM. In this work, we propose ToF-Splatting, t…

    Submitted 23 April, 2025; originally announced April 2025.

  4. arXiv:2504.13167

    cs.CV

    ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos

    Authors: Zetong Zhang, Manuel Kaufmann, Lixin Xue, Jie Song, Martin R. Oswald

    Abstract: Creating a photorealistic scene and human reconstruction from a single monocular in-the-wild video figures prominently in the perception of a human-centric 3D world. Recent neural rendering advances have enabled holistic human-scene reconstruction but require pre-calibrated camera and human poses, and days of training time. In this work, we introduce a novel unified framework that simultaneously p…

    Submitted 18 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Comments: Accepted at CVPR 2025

    ACM Class: I.4.5

  5. arXiv:2504.01358

    cs.GR cs.CV

    3D Gaussian Inverse Rendering with Approximated Global Illumination

    Authors: Zirui Wu, Jianteng Chen, Laijian Li, Shaoteng Wu, Zhikai Zhu, Kang Xu, Martin R. Oswald, Jie Song

    Abstract: 3D Gaussian Splatting shows great potential in reconstructing photo-realistic 3D scenes. However, these methods typically bake illumination into their representations, limiting their use for physically-based rendering and scene editing. Although recent inverse rendering approaches aim to decompose scenes into material and lighting components, they often rely on simplifying assumptions that fail wh…

    Submitted 2 April, 2025; originally announced April 2025.

  6. arXiv:2503.18052

    cs.CV

    SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining

    Authors: Yue Li, Qi Ma, Runyi Yang, Huapeng Li, Mengjiao Ma, Bin Ren, Nikola Popovic, Nicu Sebe, Ender Konukoglu, Theo Gevers, Luc Van Gool, Martin R. Oswald, Danda Pani Paudel

    Abstract: Recognizing arbitrary or previously unseen categories is essential for comprehensive real-world 3D scene understanding. Currently, all existing methods rely on 2D or textual modalities during training or together at inference. This highlights the clear absence of a model capable of processing 3D data alone for learning semantics end-to-end, along with the necessary data to train such a model. Mean…

    Submitted 3 June, 2025; v1 submitted 23 March, 2025; originally announced March 2025.

    Comments: Our code, model, and dataset will be released at https://unique1i.github.io/SceneSplat_webpage/

  7. arXiv:2503.17491

    cs.RO cs.CV

    Splat-LOAM: Gaussian Splatting LiDAR Odometry and Mapping

    Authors: Emanuele Giacomini, Luca Di Giammarino, Lorenzo De Rebotti, Giorgio Grisetti, Martin R. Oswald

    Abstract: LiDARs provide accurate geometric measurements, making them valuable for ego-motion estimation and reconstruction tasks. Although its success, managing an accurate and lightweight representation of the environment still poses challenges. Both classic and NeRF-based solutions have to trade off accuracy over memory and processing times. In this work, we build on recent advancements in Gaussian Splat…

    Submitted 21 March, 2025; originally announced March 2025.

    Comments: submitted to ICCV 2025

  8. arXiv:2503.12572

    cs.CV cs.AI cs.LG

    Deblur Gaussian Splatting SLAM

    Authors: Francesco Girlanda, Denys Rozumnyi, Marc Pollefeys, Martin R. Oswald

    Abstract: We present Deblur-SLAM, a robust RGB SLAM pipeline designed to recover sharp reconstructions from motion-blurred inputs. The proposed method bridges the strengths of both frame-to-frame and frame-to-model approaches to model sub-frame camera trajectories that lead to high-fidelity reconstructions in motion-blurred settings. Moreover, our pipeline incorporates techniques such as online loop closure…

    Submitted 16 March, 2025; originally announced March 2025.

  9. arXiv:2501.02771

    cs.CV

    WorldPose: A World Cup Dataset for Global 3D Human Pose Estimation

    Authors: Tianjian Jiang, Johsan Billingham, Sebastian Müksch, Juan Zarate, Nicolas Evans, Martin R. Oswald, Marc Pollefeys, Otmar Hilliges, Manuel Kaufmann, Jie Song

    Abstract: We present WorldPose, a novel dataset for advancing research in multi-person global pose estimation in the wild, featuring footage from the 2022 FIFA World Cup. While previous datasets have primarily focused on local poses, often limited to a single person or in constrained, indoor settings, the infrastructure deployed for this sporting event allows access to multiple fixed and moving cameras in d…

    Submitted 20 January, 2025; v1 submitted 6 January, 2025; originally announced January 2025.

  10. arXiv:2411.16785

    cs.CV cs.AI cs.RO

    MAGiC-SLAM: Multi-Agent Gaussian Globally Consistent SLAM

    Authors: Vladimir Yugay, Theo Gevers, Martin R. Oswald

    Abstract: Simultaneous localization and mapping (SLAM) systems with novel view synthesis capabilities are widely used in computer vision, with applications in augmented reality, robotics, and autonomous driving. However, existing approaches are limited to single-agent operation. Recent work has addressed this problem using a distributed neural scene representation. Unfortunately, existing methods are slow,…

    Submitted 25 November, 2024; originally announced November 2024.

  11. arXiv:2411.15043

    cs.CV cs.RO

    Open-Vocabulary Online Semantic Mapping for SLAM

    Authors: Tomas Berriel Martins, Martin R. Oswald, Javier Civera

    Abstract: This paper presents an Open-Vocabulary Online 3D semantic mapping pipeline, that we denote by its acronym OVO. Given a sequence of posed RGB-D frames, we detect and track 3D segments, which we describe using CLIP vectors. These are computed from the viewpoints where they are observed by a novel CLIP merging method. Notably, our OVO has a significantly lower computational and memory footprint than…

    Submitted 10 March, 2025; v1 submitted 22 November, 2024; originally announced November 2024.

  12. arXiv:2410.10491

    cs.CV

    TWIST & SCOUT: Grounding Multimodal LLM-Experts by Forget-Free Tuning

    Authors: Aritra Bhowmik, Mohammad Mahdi Derakhshani, Dennis Koelma, Yuki M. Asano, Martin R. Oswald, Cees G. M. Snoek

    Abstract: Spatial awareness is key to enable embodied multimodal AI systems. Yet, without vast amounts of spatial supervision, current Multimodal Large Language Models (MLLMs) struggle at this task. In this paper, we introduce TWIST & SCOUT, a framework that equips pre-trained MLLMs with visual grounding ability without forgetting their existing image and language understanding skills. To this end, we propo…

    Submitted 20 March, 2025; v1 submitted 14 October, 2024; originally announced October 2024.

  13. arXiv:2406.09415

    cs.CV cs.LG

    An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

    Authors: Duy-Kien Nguyen, Mahmoud Assran, Unnat Jain, Martin R. Oswald, Cees G. M. Snoek, Xinlei Chen

    Abstract: This work does not introduce a new method. Instead, we present an interesting finding that questions the necessity of the inductive bias of locality in modern computer vision architectures. Concretely, we find that vanilla Transformers can operate by directly treating each individual pixel as a token and achieve highly performant results. This is substantially different from the popular design in…

    Submitted 13 March, 2025; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: In Proceeding of ICLR'2025

  14. arXiv:2406.09126

    cs.CV

    3D-AVS: LiDAR-based 3D Auto-Vocabulary Segmentation

    Authors: Weijie Wei, Osman Ülger, Fatemeh Karimi Nejadasl, Theo Gevers, Martin R. Oswald

    Abstract: Open-Vocabulary Segmentation (OVS) methods offer promising capabilities in detecting unseen object categories, but the category must be known and needs to be provided by a human, either via a text prompt or pre-labeled datasets, thus limiting their scalability. We propose 3D-AVS, a method for Auto-Vocabulary Segmentation of 3D point clouds for which the vocabulary is unknown and auto-generated for…

    Submitted 30 March, 2025; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: v3 is the camera-ready version for CVPR 2025, while v2 serves as both a preview and the camera-ready version for the CVPR 2024 OpenSun3D Workshop

  15. arXiv:2405.16544

    cs.CV

    Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians

    Authors: Erik Sandström, Keisuke Tateno, Michael Oechsle, Michael Niemeyer, Luc Van Gool, Martin R. Oswald, Federico Tombari

    Abstract: 3D Gaussian Splatting has emerged as a powerful representation of geometry and appearance for RGB-only dense Simultaneous Localization and Mapping (SLAM), as it provides a compact dense map representation while enabling efficient and high-quality map rendering. However, existing methods show significantly worse reconstruction quality than competing methods using other 3D representations, e.g. neur…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 21 pages

  16. arXiv:2403.19549

    cs.CV cs.RO

    GlORIE-SLAM: Globally Optimized RGB-only Implicit Encoding Point Cloud SLAM

    Authors: Ganlin Zhang, Erik Sandström, Youmin Zhang, Manthan Patel, Luc Van Gool, Martin R. Oswald

    Abstract: Recent advancements in RGB-only dense Simultaneous Localization and Mapping (SLAM) have predominantly utilized grid-based neural implicit encodings and/or struggle to efficiently realize global map and pose consistency. To this end, we propose an efficient RGB-only dense SLAM system using a flexible neural point cloud scene representation that adapts to keyframe poses and depth updates, without ne…

    Submitted 27 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  17. arXiv:2402.13255

    cs.CV cs.RO

    How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey

    Authors: Fabio Tosi, Youmin Zhang, Ziren Gong, Erik Sandström, Stefano Mattoccia, Martin R. Oswald, Matteo Poggi

    Abstract: Over the past two decades, research in the field of Simultaneous Localization and Mapping (SLAM) has undergone a significant evolution, highlighting its critical role in enabling autonomous exploration of unknown environments. This evolution ranges from hand-crafted methods, through the era of deep learning, to more recent developments focused on Neural Radiance Fields (NeRFs) and 3D Gaussian Spla…

    Submitted 27 March, 2025; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Updated to November 2024

  18. arXiv:2402.09944

    cs.CV

    Loopy-SLAM: Dense Neural SLAM with Loop Closures

    Authors: Lorenzo Liso, Erik Sandström, Vladimir Yugay, Luc Van Gool, Martin R. Oswald

    Abstract: Neural RGBD SLAM techniques have shown promise in dense Simultaneous Localization And Mapping (SLAM), yet face challenges such as error accumulation during camera tracking resulting in distorted maps. In response, we introduce Loopy-SLAM that globally optimizes poses and the dense 3D model. We use frame-to-model tracking using a data-driven point-based submap generation method and trigger loop clo…

    Submitted 10 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  19. arXiv:2401.10786

    cs.CV

    Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion

    Authors: Zuoyue Li, Zhenqiang Li, Zhaopeng Cui, Marc Pollefeys, Martin R. Oswald

    Abstract: Directly generating scenes from satellite imagery offers exciting possibilities for integration into applications like games and map services. However, challenges arise from significant view changes and scene scale. Previous efforts mainly focused on image or video generation, lacking exploration into the adaptability of scene generation for arbitrary views. Existing 3D generation works either ope…

    Submitted 1 April, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Journal ref: CVPR 2024

  20. arXiv:2401.03771

    cs.CV

    NeRFmentation: NeRF-based Augmentation for Monocular Depth Estimation

    Authors: Casimir Feldmann, Niall Siegenheim, Nikolas Hars, Lovro Rabuzin, Mert Ertugrul, Luca Wolfart, Marc Pollefeys, Zuria Bauer, Martin R. Oswald

    Abstract: The capabilities of monocular depth estimation (MDE) models are limited by the availability of sufficient and diverse datasets. In the case of MDE models for autonomous driving, this issue is exacerbated by the linearity of the captured data trajectories. We propose a NeRF-based data augmentation pipeline to introduce synthetic data with more diverse viewing directions into training datasets and d…

    Submitted 15 September, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

  21. arXiv:2312.10217

    cs.CV

    T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning

    Authors: Weijie Wei, Fatemeh Karimi Nejadasl, Theo Gevers, Martin R. Oswald

    Abstract: The scarcity of annotated data in LiDAR point cloud understanding hinders effective representation learning. Consequently, scholars have been actively investigating efficacious self-supervised pre-training paradigms. Nevertheless, temporal information, which is inherent in the LiDAR point cloud sequence, is consistently disregarded. To better utilize this property, we propose an effective pre-trai…

    Submitted 22 July, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted to ECCV 2024

  22. arXiv:2312.10070

    cs.CV cs.RO

    Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting

    Authors: Vladimir Yugay, Yue Li, Theo Gevers, Martin R. Oswald

    Abstract: We present a dense simultaneous localization and mapping (SLAM) method that uses 3D Gaussians as a scene representation. Our approach enables interactive-time reconstruction and photo-realistic rendering from real-world single-camera RGBD videos. To this end, we propose a novel effective strategy for seeding new Gaussians for newly explored areas and their effective online optimization that is ind…

    Submitted 22 March, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

  23. arXiv:2312.04539

    cs.CV

    Auto-Vocabulary Semantic Segmentation

    Authors: Osman Ülger, Maksymilian Kulicki, Yuki Asano, Martin R. Oswald

    Abstract: Open-Vocabulary Segmentation (OVS) methods are capable of performing semantic segmentation without relying on a fixed vocabulary, and in some cases, without training or fine-tuning. However, OVS methods typically require a human in the loop to specify the vocabulary based on the task or dataset at hand. In this paper, we introduce Auto-Vocabulary Semantic Segmentation (AVS), advancing open-ended i…

    Submitted 12 March, 2025; v1 submitted 7 December, 2023; originally announced December 2023.

  24. arXiv:2311.18512

    cs.CV cs.LG

    Union-over-Intersections: Object Detection beyond Winner-Takes-All

    Authors: Aritra Bhowmik, Pascal Mettes, Martin R. Oswald, Cees G. M. Snoek

    Abstract: This paper revisits the problem of predicting box locations in object detection architectures. Typically, each box proposal or box query aims to directly maximize the intersection-over-union score with the ground truth, followed by a winner-takes-all non-maximum suppression where only the highest scoring box in each region is retained. We observe that both steps are sub-optimal: the first involves…

    Submitted 19 December, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: 17 pages, 6 figures, 12 tables

  25. arXiv:2311.18068

    cs.CV

    ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction

    Authors: Silvan Weder, Francis Engelmann, Johannes L. Schönberger, Akihito Seki, Marc Pollefeys, Martin R. Oswald

    Abstract: We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames. Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality. To overcome the inherent challenges of online methods, we make two main contributions. First, to effectively extract information from the…

    Submitted 3 December, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

  26. arXiv:2310.07573

    cs.CV

    Relational Prior Knowledge Graphs for Detection and Instance Segmentation

    Authors: Osman Ülger, Yu Wang, Ysbrand Galama, Sezer Karaoglu, Theo Gevers, Martin R. Oswald

    Abstract: Humans have a remarkable ability to perceive and reason about the world around them by understanding the relationships between objects. In this paper, we investigate the effectiveness of using such relationships for object detection and instance segmentation. To this end, we propose a Relational Prior-based Feature Enhancement Model (RP-FEM), a graph transformer that enhances object proposal featu…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Published in ICCV2023 SG2RL Workshop

  27. arXiv:2310.05920

    cs.CV

    SimPLR: A Simple and Plain Transformer for Efficient Object Detection and Segmentation

    Authors: Duy-Kien Nguyen, Martin R. Oswald, Cees G. M. Snoek

    Abstract: The ability to detect objects in images at varying scales has played a pivotal role in the design of modern object detectors. Despite considerable progress in removing hand-crafted components and simplifying the architecture with transformers, multi-scale feature maps and pyramid designs remain a key factor for their empirical success. In this paper, we show that shifting the multiscale inductive…

    Submitted 13 March, 2025; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: In Proceeding of TMLR'2025

  28. arXiv:2310.00401

    cs.LG cs.RO

    Learning High-level Semantic-Relational Concepts for SLAM

    Authors: Jose Andres Millan-Romera, Hriday Bavle, Muhammad Shaheer, Martin R. Oswald, Holger Voos, Jose Luis Sanchez-Lopez

    Abstract: Recent works on SLAM extend their pose graphs with higher-level semantic concepts like Rooms exploiting relationships between them, to provide, not only a richer representation of the situation/environment but also to improve the accuracy of its estimation. Concretely, our previous work, Situational Graphs (S-Graphs+), a pioneer in jointly leveraging semantic relationships in the factor optimizati…

    Submitted 22 March, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

  29. arXiv:2309.17162

    cs.CV

    APNet: Urban-level Scene Segmentation of Aerial Images and Point Clouds

    Authors: Weijie Wei, Martin R. Oswald, Fatemeh Karimi Nejadasl, Theo Gevers

    Abstract: In this paper, we focus on semantic segmentation method for point clouds of urban scenes. Our fundamental concept revolves around the collaborative utilization of diverse scene representations to benefit from different context information and network architectures. To this end, the proposed network architecture, called APNet, is split into two branches: a point cloud branch and an aerial image bra…

    Submitted 29 September, 2023; originally announced September 2023.

    Comments: Accepted by ICCV Workshop 2023 and selected as an oral

  30. Automatic registration with continuous pose updates for marker-less surgical navigation in spine surgery

    Authors: Florentin Liebmann, Marco von Atzigen, Dominik Stütz, Julian Wolf, Lukas Zingg, Daniel Suter, Laura Leoty, Hooman Esfandiari, Jess G. Snedeker, Martin R. Oswald, Marc Pollefeys, Mazda Farshad, Philipp Fürnstahl

    Abstract: Established surgical navigation systems for pedicle screw placement have been proven to be accurate, but still reveal limitations in registration or surgical guidance. Registration of preoperative data to the intraoperative anatomy remains a time-consuming, error-prone task that includes exposure to harmful radiation. Surgical guidance through conventional displays has well-known drawbacks, as inf…

    Submitted 5 August, 2023; originally announced August 2023.

  31. arXiv:2306.16917

    cs.CV cs.LG cs.RO

    The Drunkard's Odometry: Estimating Camera Motion in Deforming Scenes

    Authors: David Recasens, Martin R. Oswald, Marc Pollefeys, Javier Civera

    Abstract: Estimating camera motion in deformable scenes poses a complex and open research challenge. Most existing non-rigid structure from motion techniques assume to observe also static scene parts besides deforming scene parts in order to establish an anchoring reference. However, this assumption does not hold true in certain relevant application cases such as endoscopies. Deformable odometry and SLAM pi…

    Submitted 29 June, 2023; originally announced June 2023.

  32. arXiv:2306.11048

    cs.CV

    UncLe-SLAM: Uncertainty Learning for Dense Neural SLAM

    Authors: Erik Sandström, Kevin Ta, Luc Van Gool, Martin R. Oswald

    Abstract: We present an uncertainty learning framework for dense neural simultaneous localization and mapping (SLAM). Estimating pixel-wise uncertainties for the depth input of dense SLAM methods allows re-weighing the tracking and mapping losses towards image regions that contain more suitable information that is more reliable for SLAM. To this end, we propose an online framework for sensor uncertainty est…

    Submitted 6 September, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

    Comments: ICCV 2023 Workshop. 20 pages, 9 figures

  33. arXiv:2306.05411

    cs.CV

    R-MAE: Regions Meet Masked Autoencoders

    Authors: Duy-Kien Nguyen, Vaibhav Aggarwal, Yanghao Li, Martin R. Oswald, Alexander Kirillov, Cees G. M. Snoek, Xinlei Chen

    Abstract: In this work, we explore regions as a potential visual analogue of words for self-supervised image representation learning. Inspired by Masked Autoencoding (MAE), a generative pre-training baseline, we propose masked region autoencoding to learn from groups of pixels or regions. Specifically, we design an architecture which efficiently addresses the one-to-many mapping between images and regions,…

    Submitted 4 January, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

  34. arXiv:2305.02398

    cs.CV

    Learning-based Relational Object Matching Across Views

    Authors: Cathrin Elich, Iro Armeni, Martin R. Oswald, Marc Pollefeys, Joerg Stueckler

    Abstract: Intelligent robots require object-level scene understanding to reason about possible tasks and interactions with the environment. Moreover, many perception tasks such as scene reconstruction, image retrieval, or place recognition can benefit from reasoning on the level of objects. While keypoint-based matching can yield strong results for finding correspondences for images with small to medium vie…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: Accepted for publication in IEEE International Conference on Robotics and Automation (ICRA), 2023

    MSC Class: 68T45 ACM Class: I.2.10; I.4.8

  35. arXiv:2304.06419

    cs.CV cs.GR

    Tracking by 3D Model Estimation of Unknown Objects in Videos

    Authors: Denys Rozumnyi, Jiri Matas, Marc Pollefeys, Vittorio Ferrari, Martin R. Oswald

    Abstract: Most model-free visual object tracking methods formulate the tracking task as object location estimation given by a 2D segmentation or a bounding box in each video frame. We argue that this representation is limited and instead propose to guide and improve 2D tracking with an explicit object representation, namely the textured 3D shape and 6DoF pose in each video frame. Our representation tackles…

    Submitted 13 April, 2023; originally announced April 2023.

  36. arXiv:2304.04278

    cs.CV

    Point-SLAM: Dense Neural Point Cloud-based SLAM

    Authors: Erik Sandström, Yue Li, Luc Van Gool, Martin R. Oswald

    Abstract: We propose a dense neural simultaneous localization and mapping (SLAM) approach for monocular RGBD input which anchors the features of a neural scene representation in a point cloud that is iteratively generated in an input-dependent data-driven manner. We demonstrate that both tracking and mapping can be performed with the same point-based neural scene representation by minimizing an RGBD-based r…

    Submitted 12 September, 2023; v1 submitted 9 April, 2023; originally announced April 2023.

    Comments: ICCV 2023. 18 Pages, 12 Figures

  37. arXiv:2303.17209

    cs.CV

    Human from Blur: Human Pose Tracking from Blurry Images

    Authors: Yiming Zhao, Denys Rozumnyi, Jie Song, Otmar Hilliges, Marc Pollefeys, Martin R. Oswald

    Abstract: We propose a method to estimate 3D human poses from substantially blurred images. The key idea is to tackle the inverse problem of image deblurring by modeling the forward problem with a 3D human model, a texture map, and a sequence of poses to describe human motion. The blurring process is then modeled by a temporal image aggregation step. Using a differentiable renderer, we can solve the inverse…

    Submitted 25 September, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

    Comments: typos and minor error fixed

  38. arXiv:2302.03594

    cs.CV

    NICER-SLAM: Neural Implicit Scene Encoding for RGB SLAM

    Authors: Zihan Zhu, Songyou Peng, Viktor Larsson, Zhaopeng Cui, Martin R. Oswald, Andreas Geiger, Marc Pollefeys

    Abstract: Neural implicit representations have recently become popular in simultaneous localization and mapping (SLAM), especially in dense visual SLAM. However, previous works in this direction either rely on RGB-D sensors, or require a separate monocular SLAM approach for camera tracking and do not produce high-fidelity dense 3D scene reconstruction. In this paper, we present NICER-SLAM, a dense RGB SLAM…

    Submitted 7 February, 2023; originally announced February 2023.

    Comments: Video: https://youtu.be/tUXzqEZWg2w

  39. arXiv:2212.12395

    cs.CV

    Detecting Objects with Context-Likelihood Graphs and Graph Refinement

    Authors: Aritra Bhowmik, Yu Wang, Nora Baka, Martin R. Oswald, Cees G. M. Snoek

    Abstract: The goal of this paper is to detect objects by exploiting their interrelationships. Contrary to existing methods, which learn objects and relations separately, our key idea is to learn the object-relation distribution jointly. We first propose a novel way of creating a graphical representation of an image from inter-object relation priors and initial class predictions, we call a context-likelihood…

    Submitted 27 September, 2023; v1 submitted 23 December, 2022; originally announced December 2022.

    Comments: 13 pages, 8 figures. In Proceedings of International Conference on Computer Vision (ICCV) 2023

  40. arXiv:2212.07766

    cs.CV

    DeepLSD: Line Segment Detection and Refinement with Deep Image Gradients

    Authors: Rémi Pautrat, Daniel Barath, Viktor Larsson, Martin R. Oswald, Marc Pollefeys

    Abstract: Line segments are ubiquitous in our human-made world and are increasingly used in vision tasks. They are complementary to feature points thanks to their spatial extent and the structural information they provide. Traditional line detectors based on the image gradient are extremely fast and accurate, but lack robustness in noisy images and challenging conditions. Their learned counterparts are more…

    Submitted 28 March, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: Accepted at CVPR 2023

  41. NeuralMeshing: Differentiable Meshing of Implicit Neural Representations

    Authors: Mathias Vetsch, Sandro Lombardi, Marc Pollefeys, Martin R. Oswald

    Abstract: The generation of triangle meshes from point clouds, i.e. meshing, is a core task in computer graphics and computer vision. Traditional techniques directly construct a surface mesh using local decision heuristics, while some recent methods based on neural implicit representations try to leverage data-driven approaches for this meshing process. However, it is challenging to define a learnable repre…

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: This preprint has not undergone any post-submission improvements or corrections. The Version of Record of this contribution is published in "44th DAGM German Conference on Pattern Recognition (GCPR 2022), Konstanz, Germany, September 27-30, 2022, Proceedings", and is available at https://doi.org/10.1007/978-3-031-16788-1_20

  42. arXiv:2207.11467

    cs.CV cs.AI

    CompNVS: Novel View Synthesis with Scene Completion

    Authors: Zuoyue Li, Tianxing Fan, Zhenqiang Li, Zhaopeng Cui, Yoichi Sato, Marc Pollefeys, Martin R. Oswald

    Abstract: We introduce a scalable framework for novel view synthesis from RGB-D images with largely incomplete scene coverage. While generative neural approaches have demonstrated spectacular results on 2D images, they have not yet achieved similar photorealistic results in combination with scene completion where a spatial 3D scene understanding is essential. To this end, we propose a generative pipeline pe…

    Submitted 23 July, 2022; originally announced July 2022.

    Comments: ECCV 2022

  43. arXiv:2204.03353

    cs.CV

    Learning Online Multi-Sensor Depth Fusion

    Authors: Erik Sandström, Martin R. Oswald, Suryansh Kumar, Silvan Weder, Fisher Yu, Cristian Sminchisescu, Luc Van Gool

    Abstract: Many hand-held or mixed reality devices are used with a single sensor for 3D reconstruction, although they often comprise multiple sensors. Multi-sensor depth fusion is able to substantially improve the robustness and accuracy of 3D reconstruction methods, but existing techniques are not robust enough to handle sensors which operate with diverse value ranges as well as noise and outlier statistics…

    Submitted 21 September, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted to ECCV 2022. 31 pages, 17 figures, 15 Tables

  44. arXiv:2203.15601

    cs.CV cs.LG eess.IV

    Photographic Visualization of Weather Forecasts with Generative Adversarial Networks

    Authors: Christian Sigg, Flavia Cavallaro, Tobias Günther, Martin R. Oswald

    Abstract: Outdoor webcam images are an information-dense yet accessible visualization of past and present weather conditions, and are consulted by meteorologists and the general public alike. Weather forecasts, however, are still communicated as text, pictograms or charts. We therefore introduce a novel method that uses photographic images to also visualize future weather conditions. This is challenging,…

    Submitted 29 March, 2022; originally announced March 2022.

  45. NVS-MonoDepth: Improving Monocular Depth Prediction with Novel View Synthesis

    Authors: Zuria Bauer, Zuoyue Li, Sergio Orts-Escolano, Miguel Cazorla, Marc Pollefeys, Martin R. Oswald

    Abstract: Building upon the recent progress in novel view synthesis, we propose its application to improve monocular depth estimation. In particular, we propose a novel training method split in three main steps. First, the prediction results of a monocular depth network are warped to an additional view point. Second, we apply an additional image synthesis network, which corrects and improves the quality of…

    Submitted 22 December, 2021; originally announced December 2021.

    Comments: 8 pages (main paper), 9 pages (supplementary material), 14 figures, 4 tables

    Journal ref: 2021 International Conference on 3D Vision (3DV)

  46. arXiv:2112.12130

    cs.CV

    NICE-SLAM: Neural Implicit Scalable Encoding for SLAM

    Authors: Zihan Zhu, Songyou Peng, Viktor Larsson, Weiwei Xu, Hujun Bao, Zhaopeng Cui, Martin R. Oswald, Marc Pollefeys

    Abstract: Neural implicit representations have recently shown encouraging results in various domains, including promising progress in simultaneous localization and mapping (SLAM). Nevertheless, existing methods produce over-smoothed scene reconstructions and have difficulty scaling up to large scenes. These limitations are mainly due to their simple fully-connected network architecture that does not incorpo…

    Submitted 21 April, 2022; v1 submitted 22 December, 2021; originally announced December 2021.

    Comments: CVPR 2022, first two authors contributed equally. Project page: https://pengsongyou.github.io/nice-slam

  47. arXiv:2111.14465

    cs.CV cs.AI cs.GR cs.LG

    Motion-from-Blur: 3D Shape and Motion Estimation of Motion-blurred Objects in Videos

    Authors: Denys Rozumnyi, Martin R. Oswald, Vittorio Ferrari, Marc Pollefeys

    Abstract: We propose a method for jointly estimating the 3D motion, 3D shape, and appearance of highly motion-blurred objects from a video. To this end, we model the blurred appearance of a fast moving object in a generative fashion by parametrizing its 3D position, rotation, velocity, acceleration, bounces, shape, and texture over the duration of a predefined time window spanning multiple frames. Using dif…

    Submitted 7 April, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

    Comments: CVPR 2022 camera-ready

    Journal ref: 2022 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

  48. arXiv:2111.13087

    cs.CV

    BoxeR: Box-Attention for 2D and 3D Transformers

    Authors: Duy-Kien Nguyen, Jihong Ju, Olaf Booij, Martin R. Oswald, Cees G. M. Snoek

    Abstract: In this paper, we propose a simple attention mechanism that we call box-attention. It enables spatial interaction between grid features, as sampled from boxes of interest, and improves the learning capability of transformers for several vision tasks. Specifically, we present BoxeR, short for Box Transformer, which attends to a set of boxes by predicting their transformation from a reference window on…

    Submitted 25 March, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

    Comments: In Proceedings of CVPR 2022

  49. arXiv:2110.06436

    cs.CV

    Non-local Recurrent Regularization Networks for Multi-view Stereo

    Authors: Qingshan Xu, Martin R. Oswald, Wenbing Tao, Marc Pollefeys, Zhaopeng Cui

    Abstract: In deep multi-view stereo networks, cost regularization is crucial to achieve accurate depth estimation. Since 3D cost volume filtering is usually memory-consuming, recurrent 2D cost map regularization has recently become popular and has shown great potential in reconstructing 3D models of different scales. However, existing recurrent methods only model the local dependencies in the depth domain,…

    Submitted 12 October, 2021; originally announced October 2021.

  50. arXiv:2108.13995

    cs.CV

    RealisticHands: A Hybrid Model for 3D Hand Reconstruction

    Authors: Michael Seeber, Roi Poranne, Marc Pollefeys, Martin R. Oswald

    Abstract: Estimating 3D hand meshes from RGB images robustly is a highly desirable task, made challenging due to the numerous degrees of freedom, and issues such as self similarity and occlusions. Previous methods generally either use parametric 3D hand models or follow a model-free approach. While the former can be considered more robust, e.g. to occlusions, they are less expressive. We propose a hybrid ap…

    Submitted 1 February, 2022; v1 submitted 31 August, 2021; originally announced August 2021.

    Comments: International Conference on 3D Vision (3DV) 2021