Showing 1–21 of 21 results for author: Revaud, J

  1. arXiv:2506.21348

    cs.CV

    PanSt3R: Multi-view Consistent Panoptic Segmentation

    Authors: Lojze Zust, Yohann Cabon, Juliette Marrie, Leonid Antsfeld, Boris Chidlovskii, Jerome Revaud, Gabriela Csurka

    Abstract: Panoptic segmentation of 3D scenes, involving the segmentation and classification of object instances in a dense 3D reconstruction of a scene, is a challenging problem, especially when relying solely on unposed 2D images. Existing approaches typically leverage off-the-shelf models to extract per-frame 2D panoptic segmentations, before optimizing an implicit geometric representation (often based on…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Accepted at ICCV 2025

  2. arXiv:2503.17316

    cs.CV

    Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors

    Authors: Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy, Lourdes Agapito, Jerome Revaud

    Abstract: We present Pow3R, a novel large 3D vision regression model that is highly versatile in the input modalities it accepts. Unlike previous feed-forward models that lack any mechanism to exploit known camera or scene priors at test time, Pow3R incorporates any combination of auxiliary information such as intrinsics, relative pose, dense or sparse depth, alongside input images, within a single network.…

    Submitted 21 March, 2025; originally announced March 2025.

    Comments: CVPR 2025

  3. arXiv:2503.01661

    cs.CV

    MUSt3R: Multi-view Network for Stereo 3D Reconstruction

    Authors: Yohann Cabon, Lucas Stoffl, Leonid Antsfeld, Gabriela Csurka, Boris Chidlovskii, Jerome Revaud, Vincent Leroy

    Abstract: DUSt3R introduced a novel paradigm in geometric computer vision by proposing a model that can provide dense and unconstrained Stereo 3D Reconstruction of arbitrary image collections with no prior information about camera calibration nor viewpoint poses. Under the hood, however, DUSt3R processes image pairs, regressing local 3D reconstructions that need to be aligned in a global coordinate system.…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: Accepted at CVPR 2025

  4. arXiv:2409.19152

    cs.CV

    MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion

    Authors: Bardienus Duisterhof, Lojze Zust, Philippe Weinzaepfel, Vincent Leroy, Yohann Cabon, Jerome Revaud

    Abstract: Structure-from-Motion (SfM), a task aiming at jointly recovering camera poses and 3D geometry of a scene given a set of images, remains a hard problem with still many open challenges despite decades of significant progress. The traditional solution for SfM consists of a complex pipeline of minimal solvers which tends to propagate errors and fails when images do not sufficiently overlap, have too l…

    Submitted 27 September, 2024; originally announced September 2024.

  5. arXiv:2406.09756

    cs.CV

    Grounding Image Matching in 3D with MASt3R

    Authors: Vincent Leroy, Yohann Cabon, Jérôme Revaud

    Abstract: Image Matching is a core component of all best-performing algorithms and pipelines in 3D vision. Yet despite matching being fundamentally a 3D problem, intrinsically linked to camera pose and scene geometry, it is typically treated as a 2D problem. This makes sense as the goal of matching is to establish correspondences between 2D pixel fields, but also seems like a potentially hazardous choice. I…

    Submitted 14 June, 2024; originally announced June 2024.

  6. arXiv:2312.14132

    cs.CV

    DUSt3R: Geometric 3D Vision Made Easy

    Authors: Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, Jerome Revaud

    Abstract: Multi-view stereo reconstruction (MVS) in the wild requires to first estimate the camera parameters e.g. intrinsic and extrinsic parameters. These are usually tedious and cumbersome to obtain, yet they are mandatory to triangulate corresponding pixels in 3D space, which is the core of all best performing MVS algorithms. In this work, we take an opposite stance and introduce DUSt3R, a radically nov…

    Submitted 2 December, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: fixing the ref for StaticThings3D dataset

  7. arXiv:2310.01897

    cs.CV

    MFOS: Model-Free & One-Shot Object Pose Estimation

    Authors: JongMin Lee, Yohann Cabon, Romain Brégier, Sungjoo Yoo, Jerome Revaud

    Abstract: Existing learning-based methods for object pose estimation in RGB images are mostly model-specific or category based. They lack the capability to generalize to new object categories at test time, hence severely hindering their practicability and scalability. Notably, recent attempts have been made to solve this issue, but they still require accurate 3D data of the object surface at both train and…

    Submitted 3 October, 2023; originally announced October 2023.

  8. arXiv:2310.00632

    cs.CV

    Win-Win: Training High-Resolution Vision Transformers from Two Windows

    Authors: Vincent Leroy, Jerome Revaud, Thomas Lucas, Philippe Weinzaepfel

    Abstract: Transformers have become the standard in state-of-the-art vision architectures, achieving impressive performance on both image-level and dense pixelwise tasks. However, training vision transformers for high-resolution pixelwise tasks has a prohibitive cost. Typical solutions boil down to hierarchical architectures, fast and approximate attention, or training on low-resolution crops. This latter so…

    Submitted 22 March, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  9. arXiv:2307.11702

    cs.CV

    SACReg: Scene-Agnostic Coordinate Regression for Visual Localization

    Authors: Jerome Revaud, Yohann Cabon, Romain Brégier, JongMin Lee, Philippe Weinzaepfel

    Abstract: Scene coordinates regression (SCR), i.e., predicting 3D coordinates for every pixel of a given image, has recently shown promising potential. However, existing methods remain limited to small scenes memorized during training, and thus hardly scale to realistic datasets and scenarios. In this paper, we propose a generalized SCR model trained once to be deployed in new test scenes, regardless of the…

    Submitted 30 November, 2023; v1 submitted 21 July, 2023; originally announced July 2023.

  10. arXiv:2211.10408

    cs.CV

    CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow

    Authors: Philippe Weinzaepfel, Thomas Lucas, Vincent Leroy, Yohann Cabon, Vaibhav Arora, Romain Brégier, Gabriela Csurka, Leonid Antsfeld, Boris Chidlovskii, Jérôme Revaud

    Abstract: Despite impressive performance for high-level downstream tasks, self-supervised pre-training methods have not yet fully delivered on dense geometric vision tasks such as stereo matching or optical flow. The application of self-supervised concepts, such as instance discrimination or masked image modeling, to geometric tasks is an active area of research. In this work, we build on the recent cross-v…

    Submitted 18 August, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

    Comments: ICCV 2023

  11. arXiv:2210.10716

    cs.CV

    CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion

    Authors: Philippe Weinzaepfel, Vincent Leroy, Thomas Lucas, Romain Brégier, Yohann Cabon, Vaibhav Arora, Leonid Antsfeld, Boris Chidlovskii, Gabriela Csurka, Jérôme Revaud

    Abstract: Masked Image Modeling (MIM) has recently been established as a potent pre-training paradigm. A pretext task is constructed by masking patches in an input image, and this masked content is then predicted by a neural network using visible patches as sole input. This pre-training leads to state-of-the-art performance when finetuned for high-level semantic tasks, e.g. image classification and object d…

    Submitted 12 January, 2023; v1 submitted 19 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022

  12. arXiv:2108.11096

    cs.CV cs.LG

    Learning From Long-Tailed Data With Noisy Labels

    Authors: Shyamgopal Karthik, Jérome Revaud, Boris Chidlovskii

    Abstract: Class imbalance and noisy labels are the norm rather than the exception in many large-scale classification datasets. Nevertheless, most works in machine learning typically assume balanced and clean data. There have been some recent attempts to tackle, on one side, the problem of learning from noisy labels and, on the other side, learning from long-tailed data. Each group of methods make simplifyin…

    Submitted 12 September, 2021; v1 submitted 25 August, 2021; originally announced August 2021.

  13. arXiv:2007.13867

    cs.CV cs.LG

    Robust Image Retrieval-based Visual Localization using Kapture

    Authors: Martin Humenberger, Yohann Cabon, Nicolas Guerin, Julien Morat, Vincent Leroy, Jérôme Revaud, Philippe Rerole, Noé Pion, Cesar de Souza, Gabriela Csurka

    Abstract: Visual localization tackles the challenge of estimating the camera pose from images by using correspondence analysis between query images and a map. This task is computation and data intensive which poses challenges on thorough evaluation of methods on various datasets. However, in order to further advance in the field, we claim that robust visual localization algorithms should be evaluated on mul…

    Submitted 7 January, 2022; v1 submitted 27 July, 2020; originally announced July 2020.

  14. arXiv:1906.07589

    cs.CV

    Learning with Average Precision: Training Image Retrieval with a Listwise Loss

    Authors: Jerome Revaud, Jon Almazan, Rafael Sampaio de Rezende, Cesar Roberto de Souza

    Abstract: Image retrieval can be formulated as a ranking problem where the goal is to order database images by decreasing similarity to the query. Recent deep models for image retrieval have outperformed traditional methods by leveraging ranking-tailored loss functions, but important theoretical and practical problems remain. First, rather than directly optimizing the global ranking, they minimize an upper-…

    Submitted 18 June, 2019; originally announced June 2019.

  15. arXiv:1906.06195

    cs.CV

    R2D2: Repeatable and Reliable Detector and Descriptor

    Authors: Jerome Revaud, Philippe Weinzaepfel, César De Souza, Noe Pion, Gabriela Csurka, Yohann Cabon, Martin Humenberger

    Abstract: Interest point detection and local feature description are fundamental steps in many computer vision applications. Classical methods for these tasks are based on a detect-then-describe paradigm where separate handcrafted methods are used to first identify repeatable keypoints and then represent them with a local descriptor. Neural networks trained with metric learning losses have recently caught u…

    Submitted 17 June, 2019; v1 submitted 14 June, 2019; originally announced June 2019.

  16. arXiv:1610.07940

    cs.CV

    End-to-end Learning of Deep Visual Representations for Image Retrieval

    Authors: Albert Gordo, Jon Almazan, Jerome Revaud, Diane Larlus

    Abstract: While deep learning has become a key ingredient in the top performing methods for many computer vision tasks, it has failed so far to bring similar improvements to instance-level image retrieval. In this article, we argue that reasons for the underwhelming results of deep methods on image retrieval are threefold: i) noisy training data, ii) inappropriate deep architecture, and iii) suboptimal trai…

    Submitted 5 May, 2017; v1 submitted 25 October, 2016; originally announced October 2016.

    Comments: Accepted for publication at the International Journal of Computer Vision (IJCV). Extended version of our ECCV2016 paper "Deep Image Retrieval: Learning global representations for image search"

  17. arXiv:1604.01325

    cs.CV

    Deep Image Retrieval: Learning global representations for image search

    Authors: Albert Gordo, Jon Almazan, Jerome Revaud, Diane Larlus

    Abstract: We propose a novel approach for instance-level image retrieval. It produces a global and compact fixed-length representation for each image by aggregating many region-wise descriptors. In contrast to previous works employing pre-trained deep networks as a black box to produce features, our method leverages a deep architecture trained for the specific task of image retrieval. Our contribution is tw…

    Submitted 28 July, 2016; v1 submitted 5 April, 2016; originally announced April 2016.

    Comments: ECCV 2016 version + additional results

  18. arXiv:1508.03755

    cs.CV

    Beat-Event Detection in Action Movie Franchises

    Authors: Danila Potapov, Matthijs Douze, Jerome Revaud, Zaid Harchaoui, Cordelia Schmid

    Abstract: While important advances were recently made towards temporally localizing and recognizing specific human actions or activities in videos, efficient detection and classification of long video chunks belonging to semantically defined categories such as "pursuit" or "romance" remains challenging. We introduce a new dataset, Action Movie Franchises, consisting of a collection of Hollywood action movie…

    Submitted 15 August, 2015; originally announced August 2015.

  19. arXiv:1506.07656

    cs.CV

    DeepMatching: Hierarchical Deformable Dense Matching

    Authors: Jerome Revaud, Philippe Weinzaepfel, Zaid Harchaoui, Cordelia Schmid

    Abstract: We introduce a novel matching algorithm, called DeepMatching, to compute dense correspondences between images. DeepMatching relies on a hierarchical, multi-layer, correlational architecture designed for matching images and was inspired by deep convolutional approaches. The proposed matching algorithm can handle non-rigid deformations and repetitive textures and efficiently determines dense corresp…

    Submitted 8 October, 2015; v1 submitted 25 June, 2015; originally announced June 2015.

  20. arXiv:1506.02588

    cs.CV

    Circulant temporal encoding for video retrieval and temporal alignment

    Authors: Matthijs Douze, Jérôme Revaud, Jakob Verbeek, Hervé Jégou, Cordelia Schmid

    Abstract: We address the problem of specific video event retrieval. Given a query video of a specific event, e.g., a concert of Madonna, the goal is to retrieve other videos of the same event that temporally overlap with the query. Our approach encodes the frame descriptors of a video to jointly represent their appearance and temporal order. It exploits the properties of circulant matrices to efficiently co…

    Submitted 30 November, 2015; v1 submitted 8 June, 2015; originally announced June 2015.

  21. arXiv:1501.02565

    cs.CV

    EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow

    Authors: Jerome Revaud, Philippe Weinzaepfel, Zaid Harchaoui, Cordelia Schmid

    Abstract: We propose a novel approach for optical flow estimation, targeted at large displacements with significant occlusions. It consists of two steps: i) dense matching by edge-preserving interpolation from a sparse set of matches; ii) variational energy minimization initialized with the dense matches. The sparse-to-dense interpolation relies on an appropriate choice of the distance, namely an edge-awa…

    Submitted 19 May, 2015; v1 submitted 12 January, 2015; originally announced January 2015.