Showing 1–17 of 17 results for author: Leroy, V

Searching in archive cs.
  1. arXiv:2503.17316

    cs.CV

    Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors

    Authors: Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy, Lourdes Agapito, Jerome Revaud

    Abstract: We present Pow3r, a novel large 3D vision regression model that is highly versatile in the input modalities it accepts. Unlike previous feed-forward models that lack any mechanism to exploit known camera or scene priors at test time, Pow3r incorporates any combination of auxiliary information such as intrinsics, relative pose, dense or sparse depth, alongside input images, within a single network.…

    Submitted 21 March, 2025; originally announced March 2025.

    Comments: CVPR 2025

  2. arXiv:2503.01661

    cs.CV

    MUSt3R: Multi-view Network for Stereo 3D Reconstruction

    Authors: Yohann Cabon, Lucas Stoffl, Leonid Antsfeld, Gabriela Csurka, Boris Chidlovskii, Jerome Revaud, Vincent Leroy

    Abstract: DUSt3R introduced a novel paradigm in geometric computer vision by proposing a model that can provide dense and unconstrained Stereo 3D Reconstruction of arbitrary image collections with no prior information about camera calibration nor viewpoint poses. Under the hood, however, DUSt3R processes image pairs, regressing local 3D reconstructions that need to be aligned in a global coordinate system.…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: Accepted at CVPR 2025

  3. arXiv:2409.19152

    cs.CV

    MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion

    Authors: Bardienus Duisterhof, Lojze Zust, Philippe Weinzaepfel, Vincent Leroy, Yohann Cabon, Jerome Revaud

    Abstract: Structure-from-Motion (SfM), a task aiming at jointly recovering camera poses and 3D geometry of a scene given a set of images, remains a hard problem with still many open challenges despite decades of significant progress. The traditional solution for SfM consists of a complex pipeline of minimal solvers which tends to propagate errors and fails when images do not sufficiently overlap, have too l…

    Submitted 27 September, 2024; originally announced September 2024.

  4. arXiv:2406.09756

    cs.CV

    Grounding Image Matching in 3D with MASt3R

    Authors: Vincent Leroy, Yohann Cabon, Jérôme Revaud

    Abstract: Image Matching is a core component of all best-performing algorithms and pipelines in 3D vision. Yet despite matching being fundamentally a 3D problem, intrinsically linked to camera pose and scene geometry, it is typically treated as a 2D problem. This makes sense as the goal of matching is to establish correspondences between 2D pixel fields, but also seems like a potentially hazardous choice. I…

    Submitted 14 June, 2024; originally announced June 2024.

  5. arXiv:2312.14132

    cs.CV

    DUSt3R: Geometric 3D Vision Made Easy

    Authors: Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, Jerome Revaud

    Abstract: Multi-view stereo reconstruction (MVS) in the wild requires to first estimate the camera parameters e.g. intrinsic and extrinsic parameters. These are usually tedious and cumbersome to obtain, yet they are mandatory to triangulate corresponding pixels in 3D space, which is the core of all best performing MVS algorithms. In this work, we take an opposite stance and introduce DUSt3R, a radically nov…

    Submitted 2 December, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: fixing the ref for StaticThings3D dataset

  6. arXiv:2311.09104

    cs.CV

    Cross-view and Cross-pose Completion for 3D Human Understanding

    Authors: Matthieu Armando, Salma Galaaoui, Fabien Baradel, Thomas Lucas, Vincent Leroy, Romain Brégier, Philippe Weinzaepfel, Grégory Rogez

    Abstract: Human perception and understanding is a major domain of computer vision which, like many other vision subdomains recently, stands to gain from the use of large models pre-trained on large datasets. We hypothesize that the most common pre-training strategy of relying on general purpose, object-centric image datasets such as ImageNet, is limited by an important domain shift. On the other hand, colle…

    Submitted 18 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: CVPR 2024

  7. arXiv:2310.00632

    cs.CV

    Win-Win: Training High-Resolution Vision Transformers from Two Windows

    Authors: Vincent Leroy, Jerome Revaud, Thomas Lucas, Philippe Weinzaepfel

    Abstract: Transformers have become the standard in state-of-the-art vision architectures, achieving impressive performance on both image-level and dense pixelwise tasks. However, training vision transformers for high-resolution pixelwise tasks has a prohibitive cost. Typical solutions boil down to hierarchical architectures, fast and approximate attention, or training on low-resolution crops. This latter so…

    Submitted 22 March, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  8. arXiv:2309.10748

    cs.CV cs.AI cs.GR cs.LG cs.RO

    SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction

    Authors: Anilkumar Swamy, Vincent Leroy, Philippe Weinzaepfel, Fabien Baradel, Salma Galaaoui, Romain Bregier, Matthieu Armando, Jean-Sebastien Franco, Gregory Rogez

    Abstract: Recent hand-object interaction datasets show limited real object variability and rely on fitting the MANO parametric model to obtain groundtruth hand shapes. To go beyond these limitations and spur further research, we introduce the SHOWMe dataset which consists of 96 videos, annotated with real and detailed hand-object 3D textured meshes. Following recent work, we consider a rigid hand-object sce…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: Paper and Appendix, Accepted in ACVR workshop at ICCV conference

  9. arXiv:2306.07399

    cs.CV

    4DHumanOutfit: a multi-subject 4D dataset of human motion sequences in varying outfits exhibiting large displacements

    Authors: Matthieu Armando, Laurence Boissieux, Edmond Boyer, Jean-Sebastien Franco, Martin Humenberger, Christophe Legras, Vincent Leroy, Mathieu Marsot, Julien Pansiot, Sergi Pujades, Rim Rekik, Gregory Rogez, Anilkumar Swamy, Stefanie Wuhrer

    Abstract: This work presents 4DHumanOutfit, a new dataset of densely sampled spatio-temporal 4D human motion data of different actors, outfits and motions. The dataset is designed to contain different actors wearing different outfits while performing different motions in each outfit. In this way, the dataset can be seen as a cube of data containing 4D motion sequences along 3 axes with identity, outfit and…

    Submitted 12 June, 2023; originally announced June 2023.

  10. arXiv:2211.10408

    cs.CV

    CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow

    Authors: Philippe Weinzaepfel, Thomas Lucas, Vincent Leroy, Yohann Cabon, Vaibhav Arora, Romain Brégier, Gabriela Csurka, Leonid Antsfeld, Boris Chidlovskii, Jérôme Revaud

    Abstract: Despite impressive performance for high-level downstream tasks, self-supervised pre-training methods have not yet fully delivered on dense geometric vision tasks such as stereo matching or optical flow. The application of self-supervised concepts, such as instance discrimination or masked image modeling, to geometric tasks is an active area of research. In this work, we build on the recent cross-v…

    Submitted 18 August, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

    Comments: ICCV 2023

  11. arXiv:2210.10716

    cs.CV

    CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion

    Authors: Philippe Weinzaepfel, Vincent Leroy, Thomas Lucas, Romain Brégier, Yohann Cabon, Vaibhav Arora, Leonid Antsfeld, Boris Chidlovskii, Gabriela Csurka, Jérôme Revaud

    Abstract: Masked Image Modeling (MIM) has recently been established as a potent pre-training paradigm. A pretext task is constructed by masking patches in an input image, and this masked content is then predicted by a neural network using visible patches as sole input. This pre-training leads to state-of-the-art performance when finetuned for high-level semantic tasks, e.g. image classification and object d…

    Submitted 12 January, 2023; v1 submitted 19 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022

  12. arXiv:2210.00627

    cs.CV

    MonoNHR: Monocular Neural Human Renderer

    Authors: Hongsuk Choi, Gyeongsik Moon, Matthieu Armando, Vincent Leroy, Kyoung Mu Lee, Gregory Rogez

    Abstract: Existing neural human rendering methods struggle with a single image input due to the lack of information in invisible areas and the depth ambiguity of pixels in visible areas. In this regard, we propose Monocular Neural Human Renderer (MonoNHR), a novel approach that renders robust free-viewpoint images of an arbitrary human given only a single image. MonoNHR is the first method that (i) renders…

    Submitted 2 October, 2022; originally announced October 2022.

    Comments: Hongsuk Choi and Gyeongsik Moon contributed equally, 15 pages including the reference and supplementary material

  13. arXiv:2012.02743

    cs.CV

    SMPLy Benchmarking 3D Human Pose Estimation in the Wild

    Authors: Vincent Leroy, Philippe Weinzaepfel, Romain Brégier, Hadrien Combaluzier, Grégory Rogez

    Abstract: Predicting 3D human pose from images has seen great recent improvements. Novel approaches that can even predict both pose and shape from a single input image have been introduced, often relying on a parametric model of the human body such as SMPL. While qualitative results for such methods are often shown for images captured in-the-wild, a proper benchmark in such conditions is still missing, as i…

    Submitted 4 December, 2020; originally announced December 2020.

    Comments: 3DV 2020 Oral presentation

  14. arXiv:2008.09457

    cs.CV

    DOPE: Distillation Of Part Experts for whole-body 3D pose estimation in the wild

    Authors: Philippe Weinzaepfel, Romain Brégier, Hadrien Combaluzier, Vincent Leroy, Grégory Rogez

    Abstract: We introduce DOPE, the first method to detect and estimate whole-body 3D human poses, including bodies, hands and faces, in the wild. Achieving this level of details is key for a number of applications that require understanding the interactions of the people with each other or with the environment. The main challenge is the lack of in-the-wild data with labeled whole-body 3D poses. In previous wo…

    Submitted 21 August, 2020; originally announced August 2020.

    Comments: ECCV 2020

  15. arXiv:2007.13867

    cs.CV cs.LG

    Robust Image Retrieval-based Visual Localization using Kapture

    Authors: Martin Humenberger, Yohann Cabon, Nicolas Guerin, Julien Morat, Vincent Leroy, Jérôme Revaud, Philippe Rerole, Noé Pion, Cesar de Souza, Gabriela Csurka

    Abstract: Visual localization tackles the challenge of estimating the camera pose from images by using correspondence analysis between query images and a map. This task is computation and data intensive which poses challenges on thorough evaluation of methods on various datasets. However, in order to further advance in the field, we claim that robust visual localization algorithms should be evaluated on mul…

    Submitted 7 January, 2022; v1 submitted 27 July, 2020; originally announced July 2020.

  16. arXiv:1609.00222

    cs.LG cs.AI cs.NE

    Ternary Neural Networks for Resource-Efficient AI Applications

    Authors: Hande Alemdar, Vincent Leroy, Adrien Prost-Boucle, Frédéric Pétrot

    Abstract: The computation and storage requirements for Deep Neural Networks (DNNs) are usually high. This issue limits their deployability on ubiquitous computing devices such as smart phones, wearables and autonomous drones. In this paper, we propose ternary neural networks (TNNs) in order to make deep learning more resource-efficient. We train these TNNs using a teacher-student approach based on a novel,…

    Submitted 26 February, 2017; v1 submitted 1 September, 2016; originally announced September 2016.

  17. arXiv:1603.04792

    cs.DB

    Testing Interestingness Measures in Practice: A Large-Scale Analysis of Buying Patterns

    Authors: Martin Kirchgessner, Vincent Leroy, Sihem Amer-Yahia, Shashwat Mishra

    Abstract: Understanding customer buying patterns is of great interest to the retail industry and has shown to benefit a wide variety of goals ranging from managing stocks to implementing loyalty programs. Association rule mining is a common technique for extracting correlations such as "people in the South of France buy rosé wine" or "customers who buy paté also buy salted butter and sour bread." Unfortunat…

    Submitted 15 March, 2016; originally announced March 2016.