
Showing 1–13 of 13 results for author: Antsfeld, L

Searching in archive cs.
  1. arXiv:2506.21348  [pdf, ps, other]

    cs.CV

    PanSt3R: Multi-view Consistent Panoptic Segmentation

    Authors: Lojze Zust, Yohann Cabon, Juliette Marrie, Leonid Antsfeld, Boris Chidlovskii, Jerome Revaud, Gabriela Csurka

    Abstract: Panoptic segmentation of 3D scenes, involving the segmentation and classification of object instances in a dense 3D reconstruction of a scene, is a challenging problem, especially when relying solely on unposed 2D images. Existing approaches typically leverage off-the-shelf models to extract per-frame 2D panoptic segmentations, before optimizing an implicit geometric representation (often based on…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Accepted at ICCV 2025

  2. arXiv:2503.08306  [pdf, other]

    cs.RO cs.CV cs.LG

    Reasoning in visual navigation of end-to-end trained agents: a dynamical systems approach

    Authors: Steeven Janny, Hervé Poirier, Leonid Antsfeld, Guillaume Bono, Gianluca Monaci, Boris Chidlovskii, Francesco Giuliari, Alessio Del Bue, Christian Wolf

    Abstract: Progress in Embodied AI has made it possible for end-to-end-trained agents to navigate in photo-realistic environments with high-level reasoning and zero-shot or language-conditioned behavior, but benchmarks are still dominated by simulation. In this work, we focus on the fine-grained behavior of fast-moving real robots and present a large-scale experimental study involving \numepisodes{} navigati…

    Submitted 15 April, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

    Journal ref: Computer Vision and Pattern Recognition Conference (CVPR) 2025

  3. arXiv:2503.01661  [pdf, other]

    cs.CV

    MUSt3R: Multi-view Network for Stereo 3D Reconstruction

    Authors: Yohann Cabon, Lucas Stoffl, Leonid Antsfeld, Gabriela Csurka, Boris Chidlovskii, Jerome Revaud, Vincent Leroy

    Abstract: DUSt3R introduced a novel paradigm in geometric computer vision by proposing a model that can provide dense and unconstrained Stereo 3D Reconstruction of arbitrary image collections with no prior information about camera calibration or viewpoint poses. Under the hood, however, DUSt3R processes image pairs, regressing local 3D reconstructions that need to be aligned in a global coordinate system.…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: Accepted at CVPR 2025

  4. arXiv:2412.05881  [pdf, other]

    cs.CV cs.AI

    3D-Consistent Image Inpainting with Diffusion Models

    Authors: Leonid Antsfeld, Boris Chidlovskii

    Abstract: We address the problem of 3D inconsistency of image inpainting based on diffusion models. We propose a generative model using image pairs that belong to the same scene. To achieve 3D-consistent and semantically coherent inpainting, we modify the generative diffusion model by incorporating an alternative point of view of the scene into the denoising process. This creates an inductive bias that…

    Submitted 8 December, 2024; originally announced December 2024.

    Comments: 8 pages, 9 figures, 4 tables

  5. arXiv:2406.11019  [pdf, other]

    cs.CV

    Self-supervised Pretraining and Finetuning for Monocular Depth and Visual Odometry

    Authors: Boris Chidlovskii, Leonid Antsfeld

    Abstract: For the task of simultaneous monocular depth and visual odometry estimation, we propose learning self-supervised transformer-based models in two steps. Our first step consists of a generic pretraining to learn 3D geometry, using a cross-view completion objective (CroCo), followed by self-supervised finetuning on non-annotated videos. We show that our self-supervised models can reach state-of-the-art…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 8 pages, to appear in ICRA'24

  6. arXiv:2402.13848  [pdf, other]

    cs.CV cs.RO

    Zero-BEV: Zero-shot Projection of Any First-Person Modality to BEV Maps

    Authors: Gianluca Monaci, Leonid Antsfeld, Boris Chidlovskii, Christian Wolf

    Abstract: Bird's-eye view (BEV) maps are an important geometrically structured representation widely used in robotics, in particular self-driving vehicles and terrestrial robots. Existing algorithms either require depth information for the geometric projection, which is not always reliably available, or are trained end-to-end in a fully supervised way to map visual first-person observations to BEV represent…

    Submitted 25 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  7. arXiv:2401.14349  [pdf, other]

    cs.RO cs.CV

    Learning to navigate efficiently and precisely in real environments

    Authors: Guillaume Bono, Hervé Poirier, Leonid Antsfeld, Gianluca Monaci, Boris Chidlovskii, Christian Wolf

    Abstract: In the context of autonomous navigation of terrestrial robots, the creation of realistic models for agent dynamics and sensing is a widespread habit in the robotics literature and in commercial applications, where they are used for model-based control and/or for localization and mapping. The more recent Embodied AI literature, on the other hand, focuses on modular or end-to-end agents trained in s…

    Submitted 25 January, 2024; originally announced January 2024.

  8. arXiv:2309.16634  [pdf, other]

    cs.CV

    End-to-End (Instance)-Image Goal Navigation through Correspondence as an Emergent Phenomenon

    Authors: Guillaume Bono, Leonid Antsfeld, Boris Chidlovskii, Philippe Weinzaepfel, Christian Wolf

    Abstract: Most recent work in goal-oriented visual navigation resorts to large-scale machine learning in simulated environments. The main challenge lies in learning compact representations generalizable to unseen environments and in learning high-capacity perception modules capable of reasoning on high-dimensional input. The latter is particularly difficult when the goal is not given as a category ("ObjectN…

    Submitted 28 September, 2023; originally announced September 2023.

  9. arXiv:2306.03857  [pdf, other]

    cs.RO cs.CV

    Learning with a Mole: Transferable latent spatial representations for navigation without reconstruction

    Authors: Guillaume Bono, Leonid Antsfeld, Assem Sadek, Gianluca Monaci, Christian Wolf

    Abstract: Agents navigating in 3D environments require some form of memory, which should hold a compact and actionable representation of the history of observations useful for decision making and planning. In most end-to-end learning approaches the representation is latent and usually does not have a clearly defined interpretation, whereas classical robotics addresses this with scene reconstruction resultin…

    Submitted 29 September, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

  10. arXiv:2211.10408  [pdf, other]

    cs.CV

    CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow

    Authors: Philippe Weinzaepfel, Thomas Lucas, Vincent Leroy, Yohann Cabon, Vaibhav Arora, Romain Brégier, Gabriela Csurka, Leonid Antsfeld, Boris Chidlovskii, Jérôme Revaud

    Abstract: Despite impressive performance for high-level downstream tasks, self-supervised pre-training methods have not yet fully delivered on dense geometric vision tasks such as stereo matching or optical flow. The application of self-supervised concepts, such as instance discrimination or masked image modeling, to geometric tasks is an active area of research. In this work, we build on the recent cross-v…

    Submitted 18 August, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

    Comments: ICCV 2023

  11. arXiv:2210.10716  [pdf, other]

    cs.CV

    CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion

    Authors: Philippe Weinzaepfel, Vincent Leroy, Thomas Lucas, Romain Brégier, Yohann Cabon, Vaibhav Arora, Leonid Antsfeld, Boris Chidlovskii, Gabriela Csurka, Jérôme Revaud

    Abstract: Masked Image Modeling (MIM) has recently been established as a potent pre-training paradigm. A pretext task is constructed by masking patches in an input image, and this masked content is then predicted by a neural network using visible patches as sole input. This pre-training leads to state-of-the-art performance when finetuned for high-level semantic tasks, e.g. image classification and object d…

    Submitted 12 January, 2023; v1 submitted 19 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022

  12. arXiv:2108.11824  [pdf, other]

    cs.RO cs.AI

    Magnetic Field Sensing for Pedestrian and Robot Indoor Positioning

    Authors: Leonid Antsfeld, Boris Chidlovskii

    Abstract: In this paper we address the problem of indoor localization using magnetic field data in two setups, when data is collected by (i) a human-held mobile phone and (ii) localization robots that perturb magnetic data with their own electromagnetic field. For the first setup, we revisit the state-of-the-art approaches and propose a novel extended pipeline to benefit from the presence of magnetic anomal…

    Submitted 26 August, 2021; originally announced August 2021.

  13. arXiv:2011.10799  [pdf, other]

    cs.LG cs.NI cs.RO

    Deep Smartphone Sensors-WiFi Fusion for Indoor Positioning and Tracking

    Authors: Leonid Antsfeld, Boris Chidlovskii, Emilio Sansano-Sansano

    Abstract: We address the indoor localization problem, where the goal is to predict the user's trajectory from the data collected by their smartphone, using inertial sensors such as accelerometer, gyroscope and magnetometer, as well as other environment and network sensors such as barometer and WiFi. Our system implements a deep-learning-based pedestrian dead reckoning (deep PDR) model that provides a high-rate…

    Submitted 21 November, 2020; originally announced November 2020.

    ACM Class: H.4