Showing 1–15 of 15 results for author: Firman, M

  1. arXiv:2503.22430

    cs.CV

    MVSAnywhere: Zero-Shot Multi-View Stereo

    Authors: Sergio Izquierdo, Mohamed Sayed, Michael Firman, Guillermo Garcia-Hernando, Daniyar Turmukhambetov, Javier Civera, Oisin Mac Aodha, Gabriel Brostow, Jamie Watson

    Abstract: Computing accurate depth from multiple views is a fundamental and longstanding challenge in computer vision. However, most existing approaches do not generalize well across different domains and scene types (e.g. indoor vs. outdoor). Training a general-purpose multi-view stereo model is challenging and raises several questions, e.g. how to best make use of transformer-based architectures, how to i…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: CVPR 2025

  2. arXiv:2406.18387

    cs.CV cs.LG

    DoubleTake: Geometry Guided Depth Estimation

    Authors: Mohamed Sayed, Filippo Aleotti, Jamie Watson, Zawar Qureshi, Guillermo Garcia-Hernando, Gabriel Brostow, Sara Vicente, Michael Firman

    Abstract: Estimating depth from a sequence of posed RGB images is a fundamental computer vision task, with applications in augmented reality, path planning, etc. Prior work typically makes use of previous frames in a multi-view stereo framework, relying on matching textures in a local neighborhood. In contrast, our model leverages historical predictions by giving the latest 3D geometry data as an extra input…

    Submitted 15 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: ECCV 2024 Version

  3. arXiv:2406.08960

    cs.CV

    AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings

    Authors: Jamie Watson, Filippo Aleotti, Mohamed Sayed, Zawar Qureshi, Oisin Mac Aodha, Gabriel Brostow, Michael Firman, Sara Vicente

    Abstract: Extracting planes from a 3D scene is useful for downstream tasks in robotics and augmented reality. In this paper we tackle the problem of estimating the planar surfaces in a scene from posed images. Our first finding is that a surprisingly competitive baseline results from combining popular clustering algorithms with recent improvements in 3D geometry estimation. However, such purely geometric me…

    Submitted 13 June, 2024; originally announced June 2024.

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  4. arXiv:2305.07014

    cs.CV

    Virtual Occlusions Through Implicit Depth

    Authors: Jamie Watson, Mohamed Sayed, Zawar Qureshi, Gabriel J. Brostow, Sara Vicente, Oisin Mac Aodha, Michael Firman

    Abstract: For augmented reality (AR), it is important that virtual assets appear to 'sit among' real world objects. The virtual element should variously occlude and be occluded by real matter, based on a plausible depth ordering. This occlusion should be consistent over time as the viewer's camera moves. Unfortunately, small mistakes in the estimated scene depth can ruin the downstream occlusion mask, and t…

    Submitted 11 May, 2023; originally announced May 2023.

    Comments: Accepted to CVPR 2023

  5. arXiv:2212.11966

    cs.CV

    Removing Objects From Neural Radiance Fields

    Authors: Silvan Weder, Guillermo Garcia-Hernando, Aron Monszpart, Marc Pollefeys, Gabriel Brostow, Michael Firman, Sara Vicente

    Abstract: Neural Radiance Fields (NeRFs) are emerging as a ubiquitous scene representation that allows for novel view synthesis. Increasingly, NeRFs will be shareable with other people. Before sharing a NeRF, though, it might be desirable to remove personal information or unsightly objects. Such removal is not easily achieved with the current NeRF editing frameworks. We propose a framework to remove objects…

    Submitted 22 December, 2022; originally announced December 2022.

  6. arXiv:2208.14743

    cs.CV

    SimpleRecon: 3D Reconstruction Without 3D Convolutions

    Authors: Mohamed Sayed, John Gibson, Jamie Watson, Victor Prisacariu, Michael Firman, Clément Godard

    Abstract: Traditionally, 3D indoor scene reconstruction from posed images happens in two phases: per-image depth estimation, followed by depth merging and surface reconstruction. Recently, a family of methods have emerged that perform reconstruction directly in final 3D volumetric feature space. While these methods have shown impressive reconstruction results, they rely on expensive 3D convolutional layers,…

    Submitted 31 August, 2022; originally announced August 2022.

    Comments: ECCV 2022 version with improved timings. 14 pages + 5 pages of references

  7. arXiv:2106.02022

    cs.CV

    Single Image Depth Prediction with Wavelet Decomposition

    Authors: Michaël Ramamonjisoa, Michael Firman, Jamie Watson, Vincent Lepetit, Daniyar Turmukhambetov

    Abstract: We present a novel method for predicting accurate depths from monocular images with high efficiency. This optimal efficiency is achieved by exploiting wavelet decomposition, which is integrated in a fully differentiable encoder-decoder architecture. We demonstrate that we can reconstruct high-fidelity depth maps by predicting sparse wavelet coefficients. In contrast with previous works, we show th…

    Submitted 16 August, 2021; v1 submitted 3 June, 2021; originally announced June 2021.

    Comments: CVPR 2021

  8. arXiv:2104.14540

    cs.CV

    The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth

    Authors: Jamie Watson, Oisin Mac Aodha, Victor Prisacariu, Gabriel Brostow, Michael Firman

    Abstract: Self-supervised monocular depth estimation networks are trained to predict scene depth using nearby frames as a supervision signal during training. However, for many applications, sequence information in the form of video frames is also available at test time. The vast majority of monocular networks do not make use of this extra signal, thus ignoring valuable information that could be used to impr…

    Submitted 14 July, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

    Comments: CVPR 2021

  9. arXiv:2104.03962

    cs.CV

    Panoptic Segmentation Forecasting

    Authors: Colin Graber, Grace Tsai, Michael Firman, Gabriel Brostow, Alexander Schwing

    Abstract: Our goal is to forecast the near future given a set of recent observations. We think this ability to forecast, i.e., to anticipate, is integral for the success of autonomous agents which need not only passively analyze an observation but also must react to it in real-time. Importantly, accurate forecasting hinges upon the chosen scene decomposition. We think that superior forecasting can be achiev…

    Submitted 8 April, 2021; originally announced April 2021.

    Comments: CVPR 2021

  10. arXiv:2008.10634

    cs.CV

    DiverseNet: When One Right Answer is not Enough

    Authors: Michael Firman, Neill D. F. Campbell, Lourdes Agapito, Gabriel J. Brostow

    Abstract: Many structured prediction tasks in machine vision have a collection of acceptable answers, instead of one definitive ground truth answer. Segmentation of images, for example, is subject to human labeling bias. Similarly, there are multiple possible pixel values that could plausibly complete occluded image regions. State-of-the-art supervised learning methods are typically optimized to make a sing…

    Submitted 24 August, 2020; originally announced August 2020.

    Comments: Presented at CVPR 2018

  11. arXiv:2008.01484

    cs.CV

    Learning Stereo from Single Images

    Authors: Jamie Watson, Oisin Mac Aodha, Daniyar Turmukhambetov, Gabriel J. Brostow, Michael Firman

    Abstract: Supervised deep networks are among the best methods for finding correspondences in stereo image pairs. Like all supervised approaches, these networks require ground truth data during training. However, collecting large quantities of accurate dense correspondence data is very challenging. We propose that it is unnecessary to have such a high reliance on ground truth depths or even corresponding ste…

    Submitted 20 August, 2020; v1 submitted 4 August, 2020; originally announced August 2020.

    Comments: Accepted as an oral presentation at ECCV 2020

  12. arXiv:2004.06376

    cs.CV

    Footprints and Free Space from a Single Color Image

    Authors: Jamie Watson, Michael Firman, Aron Monszpart, Gabriel J. Brostow

    Abstract: Understanding the shape of a scene from a single color image is a formidable computer vision task. However, most methods aim to predict the geometry of surfaces that are visible to the camera, which is of limited use when planning paths for robots or augmented reality agents. Such agents can only move when grounded on a traversable surface, which we define as the set of classes which humans can al…

    Submitted 14 April, 2020; originally announced April 2020.

    Comments: Accepted to CVPR 2020 as an oral presentation

  13. arXiv:1909.09051

    cs.CV

    Self-Supervised Monocular Depth Hints

    Authors: Jamie Watson, Michael Firman, Gabriel J. Brostow, Daniyar Turmukhambetov

    Abstract: Monocular depth estimators can be trained with various forms of self-supervision from binocular-stereo data to circumvent the need for high-quality laser scans or other ground-truth data. The disadvantage, however, is that the photometric reprojection losses used with self-supervised learning typically have multiple local minima. These plausible-looking alternatives to ground truth can restrict wh…

    Submitted 19 September, 2019; originally announced September 2019.

    Comments: Accepted to ICCV 2019

  14. arXiv:1806.01260

    cs.CV stat.ML

    Digging Into Self-Supervised Monocular Depth Estimation

    Authors: Clément Godard, Oisin Mac Aodha, Michael Firman, Gabriel Brostow

    Abstract: Per-pixel ground-truth depth data is challenging to acquire at scale. To overcome this limitation, self-supervised learning has emerged as a promising alternative for training models to perform monocular depth estimation. In this paper, we propose a set of improvements, which together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods.…

    Submitted 17 August, 2019; v1 submitted 4 June, 2018; originally announced June 2018.

    Comments: ICCV 2019

  15. arXiv:1604.00999

    cs.CV cs.RO

    RGBD Datasets: Past, Present and Future

    Authors: Michael Firman

    Abstract: Since the launch of the Microsoft Kinect, scores of RGBD datasets have been released. These have propelled advances in areas from reconstruction to gesture recognition. In this paper we explore the field, reviewing datasets across eight categories: semantics, object pose estimation, camera tracking, scene reconstruction, object tracking, human actions, faces and identification. By extracting relev…

    Submitted 13 April, 2016; v1 submitted 4 April, 2016; originally announced April 2016.

    Comments: 8 pages excluding references (CVPR style)