Showing 1–27 of 27 results for author: Chidlovskii, B

Searching in archive cs.
  1. arXiv:2506.21348  [pdf, ps, other]

    cs.CV

    PanSt3R: Multi-view Consistent Panoptic Segmentation

    Authors: Lojze Zust, Yohann Cabon, Juliette Marrie, Leonid Antsfeld, Boris Chidlovskii, Jerome Revaud, Gabriela Csurka

    Abstract: Panoptic segmentation of 3D scenes, involving the segmentation and classification of object instances in a dense 3D reconstruction of a scene, is a challenging problem, especially when relying solely on unposed 2D images. Existing approaches typically leverage off-the-shelf models to extract per-frame 2D panoptic segmentations, before optimizing an implicit geometric representation (often based on…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Accepted at ICCV 2025

  2. arXiv:2503.08306  [pdf, other]

    cs.RO cs.CV cs.LG

    Reasoning in visual navigation of end-to-end trained agents: a dynamical systems approach

    Authors: Steeven Janny, Hervé Poirier, Leonid Antsfeld, Guillaume Bono, Gianluca Monaci, Boris Chidlovskii, Francesco Giuliari, Alessio Del Bue, Christian Wolf

    Abstract: Progress in Embodied AI has made it possible for end-to-end-trained agents to navigate in photo-realistic environments with high-level reasoning and zero-shot or language-conditioned behavior, but benchmarks are still dominated by simulation. In this work, we focus on the fine-grained behavior of fast-moving real robots and present a large-scale experimental study involving \numepisodes{} navigati…

    Submitted 15 April, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

    Journal ref: Computer Vision and Pattern Recognition Conference (CVPR) 2025

  3. arXiv:2503.01661  [pdf, other]

    cs.CV

    MUSt3R: Multi-view Network for Stereo 3D Reconstruction

    Authors: Yohann Cabon, Lucas Stoffl, Leonid Antsfeld, Gabriela Csurka, Boris Chidlovskii, Jerome Revaud, Vincent Leroy

    Abstract: DUSt3R introduced a novel paradigm in geometric computer vision by proposing a model that can provide dense and unconstrained Stereo 3D Reconstruction of arbitrary image collections with no prior information about camera calibration nor viewpoint poses. Under the hood, however, DUSt3R processes image pairs, regressing local 3D reconstructions that need to be aligned in a global coordinate system.…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: Accepted at CVPR 2025

  4. arXiv:2412.05881  [pdf, other]

    cs.CV cs.AI

    3D-Consistent Image Inpainting with Diffusion Models

    Authors: Leonid Antsfeld, Boris Chidlovskii

    Abstract: We address the problem of 3D inconsistency of image inpainting based on diffusion models. We propose a generative model using image pairs that belong to the same scene. To achieve the 3D-consistent and semantically coherent inpainting, we modify the generative diffusion model by incorporating an alternative point of view of the scene into the denoising process. This creates an inductive bias that…

    Submitted 8 December, 2024; originally announced December 2024.

    Comments: 8 pages, 9 figures, 4 tables

  5. arXiv:2406.11019  [pdf, other]

    cs.CV

    Self-supervised Pretraining and Finetuning for Monocular Depth and Visual Odometry

    Authors: Boris Chidlovskii, Leonid Antsfeld

    Abstract: For the task of simultaneous monocular depth and visual odometry estimation, we propose learning self-supervised transformer-based models in two steps. Our first step consists in a generic pretraining to learn 3D geometry, using cross-view completion objective (CroCo), followed by self-supervised finetuning on non-annotated videos. We show that our self-supervised models can reach state-of-the-art…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 8 pages, to appear in ICRA'24

  6. arXiv:2402.13848  [pdf, other]

    cs.CV cs.RO

    Zero-BEV: Zero-shot Projection of Any First-Person Modality to BEV Maps

    Authors: Gianluca Monaci, Leonid Antsfeld, Boris Chidlovskii, Christian Wolf

    Abstract: Bird's-eye view (BEV) maps are an important geometrically structured representation widely used in robotics, in particular self-driving vehicles and terrestrial robots. Existing algorithms either require depth information for the geometric projection, which is not always reliably available, or are trained end-to-end in a fully supervised way to map visual first-person observations to BEV represent…

    Submitted 25 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  7. arXiv:2401.14349  [pdf, other]

    cs.RO cs.CV

    Learning to navigate efficiently and precisely in real environments

    Authors: Guillaume Bono, Hervé Poirier, Leonid Antsfeld, Gianluca Monaci, Boris Chidlovskii, Christian Wolf

    Abstract: In the context of autonomous navigation of terrestrial robots, the creation of realistic models for agent dynamics and sensing is a widespread habit in the robotics literature and in commercial applications, where they are used for model based control and/or for localization and mapping. The more recent Embodied AI literature, on the other hand, focuses on modular or end-to-end agents trained in s…

    Submitted 25 January, 2024; originally announced January 2024.

  8. arXiv:2401.13800  [pdf, other]

    cs.RO cs.AI

    Multi-Object Navigation in real environments using hybrid policies

    Authors: Assem Sadek, Guillaume Bono, Boris Chidlovskii, Atilla Baskurt, Christian Wolf

    Abstract: Navigation has been classically solved in robotics through the combination of SLAM and planning. More recently, beyond waypoint planning, problems involving significant components of (visual) high-level reasoning have been explored in simulated environments, mostly addressed with large-scale machine learning, in particular RL, offline-RL or imitation learning. These methods require the agent to le…

    Submitted 24 January, 2024; originally announced January 2024.

  9. arXiv:2312.14132  [pdf, other]

    cs.CV

    DUSt3R: Geometric 3D Vision Made Easy

    Authors: Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, Jerome Revaud

    Abstract: Multi-view stereo reconstruction (MVS) in the wild requires to first estimate the camera parameters e.g. intrinsic and extrinsic parameters. These are usually tedious and cumbersome to obtain, yet they are mandatory to triangulate corresponding pixels in 3D space, which is the core of all best performing MVS algorithms. In this work, we take an opposite stance and introduce DUSt3R, a radically nov…

    Submitted 2 December, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: fixing the ref for StaticThings3D dataset

  10. arXiv:2309.16634  [pdf, other]

    cs.CV

    End-to-End (Instance)-Image Goal Navigation through Correspondence as an Emergent Phenomenon

    Authors: Guillaume Bono, Leonid Antsfeld, Boris Chidlovskii, Philippe Weinzaepfel, Christian Wolf

    Abstract: Most recent work in goal oriented visual navigation resorts to large-scale machine learning in simulated environments. The main challenge lies in learning compact representations generalizable to unseen environments and in learning high-capacity perception modules capable of reasoning on high-dimensional input. The latter is particularly difficult when the goal is not given as a category ("ObjectN…

    Submitted 28 September, 2023; originally announced September 2023.

  11. arXiv:2307.16710  [pdf, other]

    cs.RO

    Learning whom to trust in navigation: dynamically switching between classical and neural planning

    Authors: Sombit Dey, Assem Sadek, Gianluca Monaci, Boris Chidlovskii, Christian Wolf

    Abstract: Navigation of terrestrial robots is typically addressed either with localization and mapping (SLAM) followed by classical planning on the dynamically created maps, or by machine learning (ML), often through end-to-end training with reinforcement learning (RL) or imitation learning (IL). Recently, modular designs have achieved promising results, and hybrid algorithms that combine ML with classical…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: 8 pages including references. International Conference on Intelligent Robots and Systems (IROS 2023)

  12. arXiv:2302.06378  [pdf, other]

    cs.CV

    Semantic Image Segmentation: Two Decades of Research

    Authors: Gabriela Csurka, Riccardo Volpi, Boris Chidlovskii

    Abstract: Semantic image segmentation (SiS) plays a fundamental role in a broad variety of computer vision applications, providing key information for the global understanding of an image. This survey is an effort to summarize two decades of research in the field of SiS, where we propose a literature review of solutions starting from early historical methods followed by an overview of more recent deep learn…

    Submitted 13 February, 2023; originally announced February 2023.

    Comments: Pre-print of the book: G. Csurka, R. Volpi and B. Chidlovskii: Semantic Image Segmentation: Two Decades of Research, FTCGV (14): No. 1-2, http://dx.doi.org/10.1561/0600000095. The authors retained the copyright and are allowed to post it on arXiv. Research only use, commercial use or systematic downloading (by robots or other automatic processes) is prohibited

  13. arXiv:2211.10408  [pdf, other]

    cs.CV

    CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow

    Authors: Philippe Weinzaepfel, Thomas Lucas, Vincent Leroy, Yohann Cabon, Vaibhav Arora, Romain Brégier, Gabriela Csurka, Leonid Antsfeld, Boris Chidlovskii, Jérôme Revaud

    Abstract: Despite impressive performance for high-level downstream tasks, self-supervised pre-training methods have not yet fully delivered on dense geometric vision tasks such as stereo matching or optical flow. The application of self-supervised concepts, such as instance discrimination or masked image modeling, to geometric tasks is an active area of research. In this work, we build on the recent cross-v…

    Submitted 18 August, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

    Comments: ICCV 2023

  14. arXiv:2210.10716  [pdf, other]

    cs.CV

    CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion

    Authors: Philippe Weinzaepfel, Vincent Leroy, Thomas Lucas, Romain Brégier, Yohann Cabon, Vaibhav Arora, Leonid Antsfeld, Boris Chidlovskii, Gabriela Csurka, Jérôme Revaud

    Abstract: Masked Image Modeling (MIM) has recently been established as a potent pre-training paradigm. A pretext task is constructed by masking patches in an input image, and this masked content is then predicted by a neural network using visible patches as sole input. This pre-training leads to state-of-the-art performance when finetuned for high-level semantic tasks, e.g. image classification and object d…

    Submitted 12 January, 2023; v1 submitted 19 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022

  15. arXiv:2112.03241  [pdf, other]

    cs.CV cs.AI

    Unsupervised Domain Adaptation for Semantic Image Segmentation: a Comprehensive Survey

    Authors: Gabriela Csurka, Riccardo Volpi, Boris Chidlovskii

    Abstract: Semantic segmentation plays a fundamental role in a broad variety of computer vision applications, providing key information for the global understanding of an image. Yet, the state-of-the-art models rely on large amount of annotated samples, which are more expensive to obtain than in tasks such as image classification. Since unlabelled data is instead significantly cheaper to obtain, it is not su…

    Submitted 6 December, 2021; originally announced December 2021.

    Comments: 33 pages

    ACM Class: I.4.6; I.2

  16. arXiv:2111.14666  [pdf, other]

    cs.AI cs.RO

    An in-depth experimental study of sensor usage and visual reasoning of robots navigating in real environments

    Authors: Assem Sadek, Guillaume Bono, Boris Chidlovskii, Christian Wolf

    Abstract: Visual navigation by mobile robots is classically tackled through SLAM plus optimal planning, and more recently through end-to-end training of policies implemented as deep networks. While the former are often limited to waypoint planning, but have proven their efficiency even on real physical environments, the latter solutions are most frequently employed in simulation, but have been shown to be a…

    Submitted 29 November, 2021; originally announced November 2021.

  17. arXiv:2108.11824  [pdf, other]

    cs.RO cs.AI

    Magnetic Field Sensing for Pedestrian and Robot Indoor Positioning

    Authors: Leonid Antsfeld, Boris Chidlovskii

    Abstract: In this paper we address the problem of indoor localization using magnetic field data in two setups, when data is collected by (i) human-held mobile phone and (ii) by localization robots that perturb magnetic data with their own electromagnetic field. For the first setup, we revise the state of the art approaches and propose a novel extended pipeline to benefit from the presence of magnetic anomal…

    Submitted 26 August, 2021; originally announced August 2021.

  18. arXiv:2108.11096  [pdf, other]

    cs.CV cs.LG

    Learning From Long-Tailed Data With Noisy Labels

    Authors: Shyamgopal Karthik, Jérome Revaud, Boris Chidlovskii

    Abstract: Class imbalance and noisy labels are the norm rather than the exception in many large-scale classification datasets. Nevertheless, most works in machine learning typically assume balanced and clean data. There have been some recent attempts to tackle, on one side, the problem of learning from noisy labels and, on the other side, learning from long-tailed data. Each group of methods make simplifyin…

    Submitted 12 September, 2021; v1 submitted 25 August, 2021; originally announced August 2021.

  19. arXiv:2106.11576  [pdf, other]

    cs.CV cs.AI

    Universal Domain Adaptation in Ordinal Regression

    Authors: Boris Chidlovskii, Assem Sadek, Christian Wolf

    Abstract: We address the problem of universal domain adaptation (UDA) in ordinal regression (OR), which attempts to solve classification problems in which labels are not independent, but follow a natural order. We show that the UDA techniques developed for classification and based on the clustering assumption, under-perform in OR settings. We propose a method that complements the OR classifier with an auxil…

    Submitted 25 August, 2021; v1 submitted 22 June, 2021; originally announced June 2021.

  20. arXiv:2011.10799  [pdf, other]

    cs.LG cs.NI cs.RO

    Deep Smartphone Sensors-WiFi Fusion for Indoor Positioning and Tracking

    Authors: Leonid Antsfeld, Boris Chidlovskii, Emilio Sansano-Sansano

    Abstract: We address the indoor localization problem, where the goal is to predict user's trajectory from the data collected by their smartphone, using inertial sensors such as accelerometer, gyroscope and magnetometer, as well as other environment and network sensors such as barometer and WiFi. Our system implements a deep learning based pedestrian dead reckoning (deep PDR) model that provides a high-rate…

    Submitted 21 November, 2020; originally announced November 2020.

    ACM Class: H.4

  21. arXiv:2011.10274  [pdf, other]

    cs.RO cs.AI cs.CV

    Learning Synthetic to Real Transfer for Localization and Navigational Tasks

    Authors: Maxime Pietrantoni, Boris Chidlovskii, Tomi Silander

    Abstract: Autonomous navigation consists in an agent being able to navigate without human intervention or supervision, it affects both high level planning and low level control. Navigation is at the crossroad of multiple disciplines, it combines notions of computer vision, robotics and control. This work aimed at creating, in a simulation, a navigation pipeline whose transfer to the real world could be done…

    Submitted 23 November, 2020; v1 submitted 20 November, 2020; originally announced November 2020.

    ACM Class: I.2.9

  22. arXiv:2006.11658  [pdf, other]

    cs.CV cs.LG

    Adversarial Transfer of Pose Estimation Regression

    Authors: Boris Chidlovskii, Assem Sadek

    Abstract: We address the problem of camera pose estimation in visual localization. Current regression-based methods for pose estimation are trained and evaluated scene-wise. They depend on the coordinate frame of the training dataset and show a low generalization across scenes and datasets. We identify the dataset shift an important barrier to generalization and consider transfer learning as an alternative…

    Submitted 23 November, 2020; v1 submitted 20 June, 2020; originally announced June 2020.

    Comments: Published in ECCV'20 TASK-CV Workshop

    ACM Class: I.2.10

  23. arXiv:2004.13077  [pdf, other]

    cs.CV cs.RO

    Self-Supervised Attention Learning for Depth and Ego-motion Estimation

    Authors: Assem Sadek, Boris Chidlovskii

    Abstract: We address the problem of depth and ego-motion estimation from image sequences. Recent advances in the domain propose to train a deep learning model for both tasks using image reconstruction in a self-supervised manner. We revise the assumptions and the limitations of the current approaches and propose two improvements to boost the performance of the depth and ego-motion estimation. We first use L…

    Submitted 5 December, 2022; v1 submitted 27 April, 2020; originally announced April 2020.

  24. arXiv:1909.08962  [pdf, other]

    cs.LG stat.ML

    Using Latent Codes for Class Imbalance Problem in Unsupervised Domain Adaptation

    Authors: Boris Chidlovskii

    Abstract: We address the problem of severe class imbalance in unsupervised domain adaptation, when the class spaces in source and target domains diverge considerably. Till recently, domain adaptation methods assumed the aligned class spaces, such that reducing distribution divergence makes the transfer between domains easier. Such an alignment assumption is invalidated in real world scenarios where some sou…

    Submitted 17 September, 2019; originally announced September 2019.

  25. arXiv:1812.06873  [pdf, other]

    cs.AI

    Learning Common Representation from RGB and Depth Images

    Authors: Giorgio Giannone, Boris Chidlovskii

    Abstract: We propose a new deep learning architecture for the tasks of semantic segmentation and depth prediction from RGB-D images. We revise the state of art based on the RGB and depth feature fusion, where both modalities are assumed to be available at train and test time. We propose a new architecture where the feature fusion is replaced with a common deep representation. Combined with an encoder-decode…

    Submitted 17 December, 2018; originally announced December 2018.

    Comments: 7 pages, 3 figures, 2 tables

  26. arXiv:1712.08164  [pdf, other]

    cs.LG cs.AI

    Multi-task learning of time series and its application to the travel demand

    Authors: Boris Chidlovskii

    Abstract: We address the problem of modeling and prediction of a set of temporal events in the context of intelligent transportation systems. To leverage the information shared by different events, we propose a multi-task learning framework. We develop a support vector regression model for joint learning of mutually dependent time series. It is the regularization-based multi-task learning previously develop…

    Submitted 21 December, 2017; originally announced December 2017.

  27. arXiv:1712.06935  [pdf, other]

    cs.AI

    Mining Smart Card Data for Travelers' Mini Activities

    Authors: Boris Chidlovskii

    Abstract: In the context of public transport modeling and simulation, we address the problem of mismatch between simulated transit trips and observed ones. We point to the weakness of the current travel demand modeling process; the trips it generates are over-optimistic and do not reflect the real passenger choices. We introduce the notion of mini activities the travelers do during the trips; they can expla…

    Submitted 19 December, 2017; originally announced December 2017.