Showing 1–8 of 8 results for author: Szymanowicz, S

  1. arXiv:2503.14445  [pdf, other]

    cs.CV

    Bolt3D: Generating 3D Scenes in Seconds

    Authors: Stanislaw Szymanowicz, Jason Y. Zhang, Pratul Srinivasan, Ruiqi Gao, Arthur Brussee, Aleksander Holynski, Ricardo Martin-Brualla, Jonathan T. Barron, Philipp Henzler

    Abstract: We present a latent diffusion model for fast feed-forward 3D scene generation. Given one or more images, our model Bolt3D directly samples a 3D scene representation in less than seven seconds on a single GPU. We achieve this by leveraging powerful and scalable existing 2D diffusion network architectures to produce consistent high-fidelity 3D scene representations. To train this model, we create a…

    Submitted 18 March, 2025; originally announced March 2025.

    Comments: Project page: https://szymanowiczs.github.io/bolt3d

  2. arXiv:2406.04343  [pdf, ps, other]

    cs.CV

    Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image

    Authors: Stanislaw Szymanowicz, Eldar Insafutdinov, Chuanxia Zheng, Dylan Campbell, João F. Henriques, Christian Rupprecht, Andrea Vedaldi

    Abstract: We propose Flash3D, a method for scene reconstruction and novel view synthesis from a single image which is both very generalisable and efficient. For generalisability, we start from a "foundation" model for monocular depth estimation and extend it to a full 3D shape and appearance reconstructor. For efficiency, we base this extension on feed-forward Gaussian Splatting. Specifically, we predict a…

    Submitted 1 June, 2025; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: Project page: https://www.robots.ox.ac.uk/~vgg/research/flash3d/

  3. arXiv:2312.13150  [pdf, other]

    cs.CV

    Splatter Image: Ultra-Fast Single-View 3D Reconstruction

    Authors: Stanislaw Szymanowicz, Christian Rupprecht, Andrea Vedaldi

    Abstract: We introduce the Splatter Image, an ultra-efficient approach for monocular 3D object reconstruction. Splatter Image is based on Gaussian Splatting, which allows fast and high-quality reconstruction of 3D scenes from multiple images. We apply Gaussian Splatting to monocular reconstruction by learning a neural network that, at test time, performs reconstruction in a feed-forward manner, at 38 FPS. Our main…

    Submitted 16 April, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: CVPR 2024. Project page: https://szymanowiczs.github.io/splatter-image.html . Code: https://github.com/szymanowiczs/splatter-image , Demo: https://huggingface.co/spaces/szymanowiczs/splatter_image

  4. arXiv:2306.07881  [pdf, other]

    cs.CV

    Viewset Diffusion: (0-)Image-Conditioned 3D Generative Models from 2D Data

    Authors: Stanislaw Szymanowicz, Christian Rupprecht, Andrea Vedaldi

    Abstract: We present Viewset Diffusion, a diffusion-based generator that outputs 3D objects while only using multi-view 2D data for supervision. We note that there exists a one-to-one mapping between viewsets, i.e., collections of several 2D views of an object, and 3D models. Hence, we train a diffusion model to generate viewsets, but design the neural network generator to reconstruct internally correspondi…

    Submitted 1 September, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: International Conference on Computer Vision 2023

  5. arXiv:2210.11594  [pdf, other]

    cs.CV

    Photo-realistic 360 Head Avatars in the Wild

    Authors: Stanislaw Szymanowicz, Virginia Estellers, Tadas Baltrusaitis, Matthew Johnson

    Abstract: Delivering immersive, 3D experiences for human communication requires a method to obtain 360 degree photo-realistic avatars of humans. To make these experiences accessible to all, only commodity hardware, like mobile phone cameras, should be necessary to capture the data needed for avatar creation. For avatars to be rendered realistically from any viewpoint, we require training images and camera p…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: ECCV 2022 Workshop on Computer Vision for Metaverse

  6. arXiv:2208.00949  [pdf, other]

    cs.GR cs.CV

    VolTeMorph: Realtime, Controllable and Generalisable Animation of Volumetric Representations

    Authors: Stephan J. Garbin, Marek Kowalski, Virginia Estellers, Stanislaw Szymanowicz, Shideh Rezaeifar, Jingjing Shen, Matthew Johnson, Julien Valentin

    Abstract: The recent increase in popularity of volumetric representations for scene reconstruction and novel view synthesis has put renewed focus on animating volumetric content at high visual quality and in real-time. While implicit deformation methods based on learned functions can produce impressive results, they are "black boxes" to artists and content creators, they require large amounts of training da…

    Submitted 1 August, 2022; originally announced August 2022.

    Comments: 18 pages, 21 figures

  7. arXiv:2112.05585  [pdf, other]

    cs.CV

    Discrete neural representations for explainable anomaly detection

    Authors: Stanislaw Szymanowicz, James Charles, Roberto Cipolla

    Abstract: The aim of this work is to detect and automatically generate high-level explanations of anomalous events in video. Understanding the cause of an anomalous event is crucial as the required response is dependent on its nature and severity. Recent works typically use an object or action classifier to detect and provide labels for anomalous events. However, this constrains detection systems to a finite s…

    Submitted 10 December, 2021; originally announced December 2021.

    Journal ref: Winter Conference on Applications of Computer Vision 2022

  8. arXiv:2106.08856  [pdf, other]

    cs.CV

    X-MAN: Explaining multiple sources of anomalies in video

    Authors: Stanislaw Szymanowicz, James Charles, Roberto Cipolla

    Abstract: Our objective is to detect anomalies in video while also automatically explaining the reason behind the detector's response. In a practical sense, explainability is crucial for this task as the required response to an anomaly depends on its nature and severity. However, most leading methods (based on deep neural networks) are not interpretable and hide the decision making process in uninterpretabl…

    Submitted 16 June, 2021; originally announced June 2021.

    Comments: In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2021