
Showing 1–50 of 181 results for author: Niessner, M

  1. arXiv:2506.06271 [pdf, ps, other]

    cs.CV

    BecomingLit: Relightable Gaussian Avatars with Hybrid Neural Shading

    Authors: Jonathan Schmidt, Simon Giebenhain, Matthias Niessner

    Abstract: We introduce BecomingLit, a novel method for reconstructing relightable, high-resolution head avatars that can be rendered from novel viewpoints at interactive rates. To this end, we propose a new low-cost light stage capture setup, tailored specifically towards capturing faces. Using this setup, we collect a novel dataset consisting of diverse multi-view sequences of numerous subjects under varying…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: Project Page: see https://jonathsch.github.io/becominglit/ ; YouTube Video: see https://youtu.be/xPyeIqKdszA

  2. arXiv:2506.02846 [pdf, ps, other]

    cs.CV

    PBR-SR: Mesh PBR Texture Super Resolution from 2D Image Priors

    Authors: Yujin Chen, Yinyu Nie, Benjamin Ummenhofer, Reiner Birkl, Michael Paulitsch, Matthias Nießner

    Abstract: We present PBR-SR, a novel method for physically based rendering (PBR) texture super resolution (SR). It outputs high-resolution, high-quality PBR textures from low-resolution (LR) PBR input in a zero-shot manner. PBR-SR leverages an off-the-shelf super-resolution model trained on natural images, and iteratively minimizes the deviations between super-resolution priors and differentiable renderings…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Project page: https://terencecyj.github.io/projects/PBR-SR/, Video: https://youtu.be/eaM5S3Mt1RM

  3. arXiv:2506.01799 [pdf, ps, other]

    cs.CV

    WorldExplorer: Towards Generating Fully Navigable 3D Scenes

    Authors: Manuel-Andreas Schneider, Lukas Höllein, Matthias Nießner

    Abstract: Generating 3D worlds from text is a highly anticipated goal in computer vision. Existing works are limited in the degree of exploration they allow inside a scene, i.e., they produce stretched-out and noisy artifacts when moving beyond central or panoramic perspectives. To this end, we propose WorldExplorer, a novel method based on autoregressive video trajectory generation, which builds fully navigab…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: project page: see https://the-world-explorer.github.io/, video: see https://youtu.be/c1lBnwJWNmE

  4. arXiv:2505.07396 [pdf, ps, other]

    cs.CV cs.LG

    TUM2TWIN: Introducing the Large-Scale Multimodal Urban Digital Twin Benchmark Dataset

    Authors: Olaf Wysocki, Benedikt Schwab, Manoj Kumar Biswanath, Michael Greza, Qilin Zhang, Jingwei Zhu, Thomas Froech, Medhini Heeramaglore, Ihab Hijazi, Khaoula Kanna, Mathias Pechinger, Zhaiyu Chen, Yao Sun, Alejandro Rueda Segura, Ziyang Xu, Omar AbdelGafar, Mansour Mehranfar, Chandan Yeshwanth, Yueh-Cheng Liu, Hadi Yazdi, Jiapan Wang, Stefan Auer, Katharina Anders, Klaus Bogenberger, Andre Borrmann, et al. (9 additional authors not shown)

    Abstract: Urban Digital Twins (UDTs) have become essential for managing cities and integrating complex, heterogeneous data from diverse sources. Creating UDTs involves challenges at multiple process stages, including acquiring accurate 3D source data, reconstructing high-fidelity 3D models, maintaining models' updates, and ensuring seamless interoperability to downstream tasks. Current datasets are usually…

    Submitted 13 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

    Comments: Submitted to the ISPRS Journal of Photogrammetry and Remote Sensing

  5. arXiv:2505.05591 [pdf, ps, other]

    cs.CV

    QuickSplat: Fast 3D Surface Reconstruction via Learned Gaussian Initialization

    Authors: Yueh-Cheng Liu, Lukas Höllein, Matthias Nießner, Angela Dai

    Abstract: Surface reconstruction is fundamental to computer vision and graphics, enabling applications in 3D modeling, mixed reality, robotics, and more. Existing approaches based on volumetric rendering obtain promising results, but optimize on a per-scene basis, resulting in a slow optimization that can struggle to model under-observed or textureless regions. We introduce QuickSplat, which learns data-dri…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: Project page: https://liu115.github.io/quicksplat, Video: https://youtu.be/2IA_gnFvFG8

  6. arXiv:2505.05376 [pdf, other]

    cs.CV

    GeomHair: Reconstruction of Hair Strands from Colorless 3D Scans

    Authors: Rachmadio Noval Lazuardi, Artem Sevastopolsky, Egor Zakharov, Matthias Niessner, Vanessa Sklyarova

    Abstract: We propose a novel method that reconstructs hair strands directly from colorless 3D scans by leveraging multi-modal hair orientation extraction. Hair strand reconstruction is a fundamental problem in computer vision and graphics that can be used for high-fidelity digital avatar synthesis, animation, and AR/VR applications. However, accurately recovering hair strands from raw scan data remains chal…

    Submitted 9 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Comments: 15 pages, 9 figures, 1 table

  7. arXiv:2505.00615 [pdf, ps, other]

    cs.CV cs.AI

    Pixel3DMM: Versatile Screen-Space Priors for Single-Image 3D Face Reconstruction

    Authors: Simon Giebenhain, Tobias Kirschstein, Martin Rünz, Lourdes Agapito, Matthias Nießner

    Abstract: We address the 3D reconstruction of human faces from a single RGB image. To this end, we propose Pixel3DMM, a set of highly-generalized vision transformers which predict per-pixel geometric cues in order to constrain the optimization of a 3D morphable face model (3DMM). We exploit the latent features of the DINO foundation model, and introduce a tailored surface normal and uv-coordinate prediction…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: Project Website: https://simongiebenhain.github.io/pixel3dmm/ ; Video: https://www.youtube.com/watch?v=BwxwEXJwUDc

  8. arXiv:2504.12292 [pdf, ps, other]

    cs.CV cs.AI cs.LG

    SHeaP: Self-Supervised Head Geometry Predictor Learned via 2D Gaussians

    Authors: Liam Schoneveld, Zhe Chen, Davide Davoli, Jiapeng Tang, Saimon Terazawa, Ko Nishino, Matthias Nießner

    Abstract: Accurate, real-time 3D reconstruction of human heads from monocular images and videos underlies numerous visual applications. As 3D ground truth data is hard to come by at scale, previous methods have sought to learn from abundant 2D videos in a self-supervised manner. Typically, this involves the use of differentiable mesh rendering, which is effective but faces limitations. To improve on this, w…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: For video demonstrations and additional materials please see https://nlml.github.io/sheap/

  9. arXiv:2504.01008 [pdf, other]

    cs.CV cs.AI

    IntrinsiX: High-Quality PBR Generation using Image Priors

    Authors: Peter Kocsis, Lukas Höllein, Matthias Nießner

    Abstract: We introduce IntrinsiX, a novel method that generates high-quality intrinsic images from text descriptions. In contrast to existing text-to-image models whose outputs contain baked-in scene lighting, our approach predicts physically-based rendering (PBR) maps. This enables the generated outputs to be used for content creation scenarios in core graphics applications that facilitate re-lighting, edit…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: Project page: https://peter-kocsis.github.io/IntrinsiX/ Video: https://youtu.be/b0wVA44R93Y

    ACM Class: I.4.8; I.4.9; I.2.10

  10. arXiv:2503.15996 [pdf, other]

    cs.GR cs.CV

    Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models

    Authors: Marc Benedí San Millán, Angela Dai, Matthias Nießner

    Abstract: Animation of humanoid characters is essential in various graphics applications, but requires significant time and cost to create realistic animations. We propose an approach to synthesize 4D animated sequences of input static 3D humanoid meshes, leveraging strong generalized motion priors from generative video models -- as such video models contain powerful motion information covering a wide varie…

    Submitted 20 March, 2025; originally announced March 2025.

    Comments: 16 pages, 10 figures

  11. arXiv:2503.12552 [pdf, other]

    cs.CV cs.GR

    MTGS: Multi-Traversal Gaussian Splatting

    Authors: Tianyu Li, Yihang Qiu, Zhenhua Wu, Carl Lindström, Peng Su, Matthias Nießner, Hongyang Li

    Abstract: Multi-traversal data, commonly collected through daily commutes or by self-driving fleets, provides multiple viewpoints for scene reconstruction within a road block. This data offers significant potential for high-quality novel view synthesis, which is crucial for applications such as autonomous vehicle simulators. However, inherent challenges in multi-traversal data often result in suboptimal rec…

    Submitted 22 March, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

  12. arXiv:2503.01425 [pdf, other]

    cs.GR cs.CV

    MeshPad: Interactive Sketch-Conditioned Artist-Designed Mesh Generation and Editing

    Authors: Haoxuan Li, Ziya Erkoc, Lei Li, Daniele Sirigatti, Vladyslav Rozov, Angela Dai, Matthias Nießner

    Abstract: We introduce MeshPad, a generative approach that creates 3D meshes from sketch inputs. Building on recent advances in artist-designed triangle mesh generation, our approach addresses the need for interactive mesh creation. To this end, we focus on enabling consistent edits by decomposing editing into 'deletion' of regions of a mesh, followed by 'addition' of new mesh geometry. Both operations are…

    Submitted 17 March, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

    Comments: Project page: https://derkleineli.github.io/meshpad/ Video: https://www.youtube.com/watch?v=_T6UTGTMZ1E

  13. arXiv:2502.20220 [pdf, other]

    cs.CV

    Avat3r: Large Animatable Gaussian Reconstruction Model for High-fidelity 3D Head Avatars

    Authors: Tobias Kirschstein, Javier Romero, Artem Sevastopolsky, Matthias Nießner, Shunsuke Saito

    Abstract: Traditionally, creating photo-realistic 3D head avatars requires a studio-level multi-view capture setup and expensive optimization during test-time, limiting the use of digital human doubles to the VFX industry or offline renderings. To address this shortcoming, we present Avat3r, which regresses a high-quality and animatable 3D head avatar from just a few input images, vastly reducing compute…

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: Project website: https://tobias-kirschstein.github.io/avat3r/, Video: https://youtu.be/P3zNVx15gYs

  14. arXiv:2501.16312 [pdf, other]

    cs.CV

    LinPrim: Linear Primitives for Differentiable Volumetric Rendering

    Authors: Nicolas von Lützow, Matthias Nießner

    Abstract: Volumetric rendering has become central to modern novel view synthesis methods, which use differentiable rendering to optimize 3D scene representations directly from observed views. While many recent works build on NeRF or 3D Gaussians, we explore an alternative volumetric scene representation. More specifically, we introduce two new scene representations based on linear primitives - octahedra and…

    Submitted 23 April, 2025; v1 submitted 27 January, 2025; originally announced January 2025.

    Comments: Project page: https://nicolasvonluetzow.github.io/LinPrim - Project video: https://youtu.be/NRRlmFZj5KQ

  15. arXiv:2412.10294 [pdf, other]

    cs.CV

    Coherent 3D Scene Diffusion From a Single RGB Image

    Authors: Manuel Dahnert, Angela Dai, Norman Müller, Matthias Nießner

    Abstract: We present a novel diffusion-based approach for coherent 3D scene reconstruction from a single RGB image. Our method utilizes an image-conditioned 3D scene diffusion model to simultaneously denoise the 3D poses and geometries of all objects within the scene. Motivated by the ill-posed nature of the task and to obtain consistent scene reconstruction results, we learn a generative scene prior by con…

    Submitted 13 December, 2024; originally announced December 2024.

    Comments: Project Page: https://www.manuel-dahnert.com/research/scene-diffusion - Accepted at NeurIPS 2024

  16. arXiv:2412.10209 [pdf, other]

    cs.CV cs.AI cs.GR

    GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion

    Authors: Jiapeng Tang, Davide Davoli, Tobias Kirschstein, Liam Schoneveld, Matthias Niessner

    Abstract: We propose a novel approach for reconstructing animatable 3D Gaussian avatars from monocular videos captured by commodity devices like smartphones. Photorealistic 3D head avatar reconstruction from such recordings is challenging due to limited observations, which leaves unobserved regions under-constrained and can lead to artifacts in novel views. To address this problem, we introduce a multi-view…

    Submitted 14 April, 2025; v1 submitted 13 December, 2024; originally announced December 2024.

    Comments: Paper Video: https://youtu.be/QuIYTljvhyg Project Page: https://tangjiapeng.github.io/projects/GAF

  17. arXiv:2412.06592 [pdf, other]

    cs.CV cs.GR

    PrEditor3D: Fast and Precise 3D Shape Editing

    Authors: Ziya Erkoç, Can Gümeli, Chaoyang Wang, Matthias Nießner, Angela Dai, Peter Wonka, Hsin-Ying Lee, Peiye Zhuang

    Abstract: We propose a training-free approach to 3D editing that enables the editing of a single shape within a few minutes. The edited 3D mesh aligns well with the prompts, and remains identical for regions that are not intended to be altered. To this end, we first project the 3D object onto 4-view images and perform synchronized multi-view image editing along with user-guided text prompts and user-provide…

    Submitted 9 December, 2024; originally announced December 2024.

    Comments: Project Page: https://ziyaerkoc.com/preditor3d/ Video: https://www.youtube.com/watch?v=Ty2xXaEuewI

  18. arXiv:2411.18675 [pdf, other]

    cs.CV cs.AI cs.GR cs.SD eess.AS

    GaussianSpeech: Audio-Driven Gaussian Avatars

    Authors: Shivangi Aneja, Artem Sevastopolsky, Tobias Kirschstein, Justus Thies, Angela Dai, Matthias Nießner

    Abstract: We introduce GaussianSpeech, a novel approach that synthesizes high-fidelity animation sequences of photo-realistic, personalized 3D human head avatars from spoken audio. To capture the expressive, detailed nature of human heads, including skin furrowing and finer-scale facial movements, we propose to couple speech signal with 3D Gaussian splatting to create realistic, temporally coherent motion s…

    Submitted 27 November, 2024; originally announced November 2024.

    Comments: Paper Video: https://youtu.be/2VqYoFlYcwQ Project Page: https://shivangi-aneja.github.io/projects/gaussianspeech

  19. L3DG: Latent 3D Gaussian Diffusion

    Authors: Barbara Roessle, Norman Müller, Lorenzo Porzi, Samuel Rota Bulò, Peter Kontschieder, Angela Dai, Matthias Nießner

    Abstract: We propose L3DG, the first approach for generative 3D modeling of 3D Gaussians through a latent 3D Gaussian diffusion formulation. This enables effective generative 3D modeling, scaling to generation of entire room-scale scenes which can be very efficiently rendered. To enable effective synthesis of 3D Gaussians, we propose a latent diffusion formulation, operating in a compressed latent space of…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: SIGGRAPH Asia 2024, project page: https://barbararoessle.github.io/l3dg , video: https://youtu.be/UHEEiXCYeLU

  20. arXiv:2409.15875 [pdf, other]

    cs.CV

    Zero-Shot Detection of AI-Generated Images

    Authors: Davide Cozzolino, Giovanni Poggi, Matthias Nießner, Luisa Verdoliva

    Abstract: Detecting AI-generated images has become an extraordinarily difficult challenge as new generative architectures emerge on a daily basis with more and more capabilities and unprecedented realism. New versions of many commercial tools, such as DALLE, Midjourney, and Stable Diffusion, have been released recently, and it is impractical to continually update and retrain supervised forensic detectors to…

    Submitted 24 September, 2024; originally announced September 2024.

  21. arXiv:2409.12892 [pdf, other]

    cs.CV

    3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt

    Authors: Lukas Höllein, Aljaž Božič, Michael Zollhöfer, Matthias Nießner

    Abstract: We present 3DGS-LM, a new method that accelerates the reconstruction of 3D Gaussian Splatting (3DGS) by replacing its ADAM optimizer with a tailored Levenberg-Marquardt (LM). Existing methods reduce the optimization time by decreasing the number of Gaussians or by improving the implementation of the differentiable rasterizer. However, they still rely on the ADAM optimizer to fit Gaussian parameter…

    Submitted 19 September, 2024; originally announced September 2024.

    Comments: project page: https://lukashoel.github.io/3DGS-LM, video: https://www.youtube.com/watch?v=tDiGuGMssg8, code: https://github.com/lukasHoel/3DGS-LM

  22. arXiv:2409.08215 [pdf, other]

    cs.CV cs.AI

    LT3SD: Latent Trees for 3D Scene Diffusion

    Authors: Quan Meng, Lei Li, Matthias Nießner, Angela Dai

    Abstract: We present LT3SD, a novel latent diffusion model for large-scale 3D scene generation. Recent advances in diffusion models have shown impressive results in 3D object generation, but are limited in spatial extent and quality when extended to 3D scenes. To generate complex and diverse 3D scene structures, we introduce a latent tree representation to effectively encode both lower-frequency geometry an…

    Submitted 1 May, 2025; v1 submitted 12 September, 2024; originally announced September 2024.

    Comments: Project page: https://quan-meng.github.io/projects/lt3sd/ Video: https://youtu.be/AJ5sG9VyjGA

  23. arXiv:2408.13508 [pdf, other]

    cs.CV

    G3DST: Generalizing 3D Style Transfer with Neural Radiance Fields across Scenes and Styles

    Authors: Adil Meric, Umut Kocasari, Matthias Nießner, Barbara Roessle

    Abstract: Neural Radiance Fields (NeRF) have emerged as a powerful tool for creating highly detailed and photorealistic scenes. Existing methods for NeRF-based 3D style transfer need extensive per-scene optimization for single or multiple styles, limiting the applicability and efficiency of 3D style transfer. In this work, we overcome the limitations of existing methods by rendering stylized novel views fro…

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: GCPR 2024, Project page: https://mericadil.github.io/G3DST/

  24. arXiv:2408.11697 [pdf, other]

    cs.CV

    Robust 3D Gaussian Splatting for Novel View Synthesis in Presence of Distractors

    Authors: Paul Ungermann, Armin Ettenhofer, Matthias Nießner, Barbara Roessle

    Abstract: 3D Gaussian Splatting has shown impressive novel view synthesis results; nonetheless, it is vulnerable to dynamic objects polluting the input data of an otherwise static scene, so-called distractors. Distractors have a severe impact on rendering quality, as they get represented as view-dependent effects or result in floating artifacts. Our goal is to identify and ignore such distractors during th…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: GCPR 2024, Project Page: https://paulungermann.github.io/Robust3DGaussians , Video: https://www.youtube.com/watch?v=P9unyR7yK3E

  25. arXiv:2406.18524 [pdf, other]

    cs.CV

    MultiDiff: Consistent Novel View Synthesis from a Single Image

    Authors: Norman Müller, Katja Schwarz, Barbara Roessle, Lorenzo Porzi, Samuel Rota Bulò, Matthias Nießner, Peter Kontschieder

    Abstract: We introduce MultiDiff, a novel approach for consistent novel view synthesis of scenes from a single RGB image. The task of synthesizing novel views from a single reference image is highly ill-posed by nature, as there exist multiple, plausible explanations for unobserved areas. To address this issue, we incorporate strong priors in the form of monocular depth predictors and video-diffusion models. Mo…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Project page: https://sirwyver.github.io/MultiDiff Video: https://youtu.be/zBC4z4qXW_4 - CVPR 2024

  26. GGHead: Fast and Generalizable 3D Gaussian Heads

    Authors: Tobias Kirschstein, Simon Giebenhain, Jiapeng Tang, Markos Georgopoulos, Matthias Nießner

    Abstract: Learning 3D head priors from large 2D image collections is an important step towards high-quality 3D-aware human modeling. A core requirement is an efficient architecture that scales well to large-scale datasets and large image resolutions. Unfortunately, existing 3D GANs struggle to scale to generate samples at high resolutions due to their relatively slow training and rendering speeds, and typically h…

    Submitted 24 September, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Project Page: https://tobias-kirschstein.github.io/gghead/ ; YouTube Video: https://youtu.be/M5vq3DoZ7RI

  27. arXiv:2405.19331 [pdf, other]

    cs.CV cs.AI cs.GR

    NPGA: Neural Parametric Gaussian Avatars

    Authors: Simon Giebenhain, Tobias Kirschstein, Martin Rünz, Lourdes Agapito, Matthias Nießner

    Abstract: The creation of high-fidelity, digital versions of human heads is an important stepping stone in the process of further integrating virtual components into our everyday lives. Constructing such avatars is a challenging research problem, due to a high demand for photo-realism and real-time rendering performance. In this work, we propose Neural Parametric Gaussian Avatars (NPGA), a data-driven appro…

    Submitted 13 September, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Project Page: see https://simongiebenhain.github.io/NPGA/ ; Youtube Video: see https://youtu.be/t0S0OK7WnA4

    Journal ref: SIGGRAPH Asia 2024 Conference Papers (SA Conference Papers '24), December 3-6, 2024, Tokyo, Japan

  28. arXiv:2405.10255 [pdf, other]

    cs.CV cs.RO

    When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

    Authors: Xianzheng Ma, Yash Bhalgat, Brandon Smart, Shuai Chen, Xinghui Li, Jian Ding, Jindong Gu, Dave Zhenyu Chen, Songyou Peng, Jia-Wang Bian, Philip H Torr, Marc Pollefeys, Matthias Nießner, Ian D Reid, Angel X. Chang, Iro Laina, Victor Adrian Prisacariu

    Abstract: As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs) has seen rapid progress, offering unprecedented capabilities for understanding and interacting with physical spaces. This survey provides a comprehensive overview of the methodologies enabling LLMs to process, understand, and generate 3D data. Highlighting the unique advantages of LLMs, such as in-context lear…

    Submitted 16 May, 2024; originally announced May 2024.

  29. arXiv:2403.19319 [pdf, other]

    cs.CV

    Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and Generation

    Authors: Yujin Chen, Yinyu Nie, Benjamin Ummenhofer, Reiner Birkl, Michael Paulitsch, Matthias Müller, Matthias Nießner

    Abstract: We present Mesh2NeRF, an approach to derive ground-truth radiance fields from textured meshes for 3D generation tasks. Many 3D generative approaches represent 3D scenes as radiance fields for training. Their ground-truth radiance fields are usually fitted from multi-view renderings from a large-scale synthetic 3D dataset, which often results in artifacts due to occlusions or under-fitting issues.…

    Submitted 5 September, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted to ECCV 2024, Project page: https://terencecyj.github.io/projects/Mesh2NeRF/ Video: https://youtu.be/SsFkhSuQYGM

  30. arXiv:2403.17550 [pdf, other]

    cs.CV cs.LG cs.RO

    DeepMIF: Deep Monotonic Implicit Fields for Large-Scale LiDAR 3D Mapping

    Authors: Kutay Yılmaz, Matthias Nießner, Anastasiia Kornilova, Alexey Artemov

    Abstract: Recently, significant progress has been achieved in sensing real large-scale outdoor 3D environments, particularly by using modern acquisition equipment such as LiDAR sensors. Unfortunately, they are fundamentally limited in their ability to produce dense, complete 3D scenes. To address this issue, recent learning-based methods integrate neural implicit representations and optimizable feature grid…

    Submitted 28 August, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: 8 pages, 6 figures

  31. arXiv:2403.16318 [pdf, other]

    cs.CV

    AutoInst: Automatic Instance-Based Segmentation of LiDAR 3D Scans

    Authors: Cedric Perauer, Laurenz Adrian Heidrich, Haifan Zhang, Matthias Nießner, Anastasiia Kornilova, Alexey Artemov

    Abstract: Recently, progress in acquisition equipment such as LiDAR sensors has enabled sensing increasingly spacious outdoor 3D environments. Making sense of such 3D acquisitions requires fine-grained scene understanding, such as constructing instance-based 3D scene segmentations. Commonly, a neural network is trained for this task; however, this requires access to a large, densely annotated dataset, which…

    Submitted 28 August, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

    Comments: 8 pages, 7 figures, to be published in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2024

    ACM Class: I.4.6; I.2.9

  32. arXiv:2403.10615 [pdf, other]

    cs.CV cs.GR cs.LG

    LightIt: Illumination Modeling and Control for Diffusion Models

    Authors: Peter Kocsis, Julien Philip, Kalyan Sunkavalli, Matthias Nießner, Yannick Hold-Geoffroy

    Abstract: We introduce LightIt, a method for explicit illumination control for image generation. Recent generative methods lack lighting control, which is crucial to numerous artistic aspects of image generation such as setting the overall mood or cinematic appearance. To overcome these limitations, we propose to condition the generation on shading and normal maps. We model the lighting with single bounce s…

    Submitted 25 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: Project page: https://peter-kocsis.github.io/LightIt/ Video: https://youtu.be/cCfSBD5aPLI

    ACM Class: I.4.8; I.2.10

  33. arXiv:2403.01807 [pdf, other]

    cs.CV

    ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models

    Authors: Lukas Höllein, Aljaž Božič, Norman Müller, David Novotny, Hung-Yu Tseng, Christian Richardt, Michael Zollhöfer, Matthias Nießner

    Abstract: 3D asset generation is getting massive amounts of attention, inspired by the recent success of text-guided 2D content creation. Existing text-to-3D methods use pretrained text-to-image diffusion models in an optimization problem or fine-tune them on synthetic data, which often results in non-photorealistic 3D objects without backgrounds. In this paper, we present a method that leverages pretrained…

    Submitted 29 July, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024, project page: https://lukashoel.github.io/ViewDiff/, video: https://www.youtube.com/watch?v=SdjoCqHzMMk, code: https://github.com/facebookresearch/ViewDiff

  34. arXiv:2401.06614 [pdf, other]

    cs.CV

    Motion2VecSets: 4D Latent Vector Set Diffusion for Non-rigid Shape Reconstruction and Tracking

    Authors: Wei Cao, Chang Luo, Biao Zhang, Matthias Nießner, Jiapeng Tang

    Abstract: We introduce Motion2VecSets, a 4D diffusion model for dynamic surface reconstruction from point cloud sequences. While existing state-of-the-art methods have demonstrated success in reconstructing non-rigid objects using neural field representations, conventional feed-forward networks encounter challenges with ambiguous observations from noisy, partial, or sparse point clouds. To address these cha…

    Submitted 13 April, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

  35. arXiv:2312.14140 [pdf, other]

    cs.CV

    HeadCraft: Modeling High-Detail Shape Variations for Animated 3DMMs

    Authors: Artem Sevastopolsky, Philip-William Grassal, Simon Giebenhain, ShahRukh Athar, Luisa Verdoliva, Matthias Niessner

    Abstract: Current advances in human head modeling allow the generation of plausible-looking 3D head models via neural representations, such as NeRFs and SDFs. Nevertheless, constructing complete high-fidelity head models with explicitly controlled animation remains an issue. Furthermore, completing the head geometry based on a partial observation, e.g., coming from a depth sensor, while preserving a high le…

    Submitted 31 January, 2025; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: 2nd version includes updated method and results. Project page: https://seva100.github.io/headcraft. Video: https://youtu.be/uBeBT2f1CL0. 24 pages, 21 figures, 3 tables

  36. arXiv:2312.12274 [pdf, other]

    cs.CV cs.AI cs.GR

    Intrinsic Image Diffusion for Indoor Single-view Material Estimation

    Authors: Peter Kocsis, Vincent Sitzmann, Matthias Nießner

    Abstract: We present Intrinsic Image Diffusion, a generative model for appearance decomposition of indoor scenes. Given a single input view, we sample multiple possible material explanations represented as albedo, roughness, and metallic maps. Appearance decomposition poses a considerable challenge in computer vision due to the inherent ambiguity between lighting and material properties and the lack of real…

    Submitted 21 March, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: Project page: https://peter-kocsis.github.io/IntrinsicImageDiffusion/ Video: https://youtu.be/lz0meJlj5cA

    ACM Class: I.4.8; I.2.10

  37. arXiv:2312.11417 [pdf, other]

    cs.CV

    PolyDiff: Generating 3D Polygonal Meshes with Diffusion Models

    Authors: Antonio Alliegro, Yawar Siddiqui, Tatiana Tommasi, Matthias Nießner

    Abstract: We introduce PolyDiff, the first diffusion-based approach capable of directly generating realistic and diverse 3D polygonal meshes. In contrast to methods that use alternate 3D shape representations (e.g. implicit representations), our approach is a discrete denoising diffusion probabilistic model that operates natively on the polygonal mesh data structure. This enables learning of both the geomet…

    Submitted 18 December, 2023; originally announced December 2023.

  38. arXiv:2312.08459 [pdf, other]

    cs.CV cs.AI cs.GR cs.SD eess.AS

    FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models

    Authors: Shivangi Aneja, Justus Thies, Angela Dai, Matthias Nießner

    Abstract: We introduce FaceTalk, a novel generative approach designed for synthesizing high-fidelity 3D motion sequences of talking human heads from input audio signal. To capture the expressive, detailed nature of human heads, including hair, ears, and finer-scale eye movements, we propose to couple speech signal with the latent space of neural parametric head models to create high-fidelity, temporally coh…

    Submitted 17 March, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: Paper Video: https://youtu.be/7Jf0kawrA3Q Project Page: https://shivangi-aneja.github.io/projects/facetalk/

    Journal ref: CVPR 2024

  39. arXiv:2312.07231

    cs.CV cs.AI cs.LG

    Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation

    Authors: Shentong Mo, Enze Xie, Yue Wu, Junsong Chen, Matthias Nießner, Zhenguo Li

    Abstract: Diffusion Transformers have recently shown remarkable effectiveness in generating high-quality 3D point clouds. However, training voxel-based diffusion models for high-resolution 3D voxels remains prohibitively expensive due to the cubic complexity of attention operators, which arises from the additional dimension of voxels. Motivated by the inherent redundancy of 3D compared to 2D, we propose Fas…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: Project Page: https://dit-3d.github.io/FastDiT-3D/

  40. arXiv:2312.06740

    cs.CV

    MonoNPHM: Dynamic Head Reconstruction from Monocular Videos

    Authors: Simon Giebenhain, Tobias Kirschstein, Markos Georgopoulos, Martin Rünz, Lourdes Agapito, Matthias Nießner

    Abstract: We present Monocular Neural Parametric Head Models (MonoNPHM) for dynamic 3D head reconstructions from monocular RGB videos. To this end, we propose a latent appearance space that parameterizes a texture field on top of a neural parametric model. We constrain predicted color values to be correlated with the underlying geometry such that gradients from RGB effectively influence latent geometry code…

    Submitted 29 May, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Project Page: see https://simongiebenhain.github.io/MonoNPHM/ ; Video: see https://youtu.be/n-wjaC3UIeE

  41. arXiv:2312.02069

    cs.CV

    GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians

    Authors: Shenhan Qian, Tobias Kirschstein, Liam Schoneveld, Davide Davoli, Simon Giebenhain, Matthias Nießner

    Abstract: We introduce GaussianAvatars, a new method to create photorealistic head avatars that are fully controllable in terms of expression, pose, and viewpoint. The core idea is a dynamic 3D representation based on 3D Gaussian splats that are rigged to a parametric morphable face model. This combination facilitates photorealistic rendering while allowing for precise animation control via the underlying p…

    Submitted 28 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: Project page: https://shenhanqian.github.io/gaussian-avatars

  42. arXiv:2312.01068

    cs.CV

    DPHMs: Diffusion Parametric Head Models for Depth-based Tracking

    Authors: Jiapeng Tang, Angela Dai, Yinyu Nie, Lev Markhasin, Justus Thies, Matthias Niessner

    Abstract: We introduce Diffusion Parametric Head Models (DPHMs), a generative model that enables robust volumetric head reconstruction and tracking from monocular depth sequences. While recent volumetric head models, such as NPHMs, can now excel in representing high-fidelity head geometries, tracking and reconstructing heads from real-world single-view depth sequences remains very challenging, as the fittin…

    Submitted 8 April, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

    Comments: CVPR 2024; homepage: https://tangjiapeng.github.io/projects/DPHMs/

  43. arXiv:2312.00195

    cs.CV

    Raising the Bar of AI-generated Image Detection with CLIP

    Authors: Davide Cozzolino, Giovanni Poggi, Riccardo Corvi, Matthias Nießner, Luisa Verdoliva

    Abstract: The aim of this work is to explore the potential of pre-trained vision-language models (VLMs) for universal detection of AI-generated images. We develop a lightweight detection strategy based on CLIP features and study its performance in a wide variety of challenging scenarios. We find that, contrary to previous beliefs, it is neither necessary nor convenient to use a large domain-specific dataset…

    Submitted 29 April, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

  44. arXiv:2311.18635

    cs.CV

    DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars

    Authors: Tobias Kirschstein, Simon Giebenhain, Matthias Nießner

    Abstract: DiffusionAvatars synthesizes a high-fidelity 3D head avatar of a person, offering intuitive control over both pose and expression. We propose a diffusion-based neural renderer that leverages generic 2D priors to produce compelling images of faces. For coarse guidance of the expression and head pose, we render a neural parametric head model (NPHM) from the target viewpoint, which acts as a proxy ge…

    Submitted 16 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Project Page: https://tobias-kirschstein.github.io/diffusion-avatars/ , Video: https://youtu.be/nSjDiiTnp2E

  45. arXiv:2311.18494

    cs.CV

    PRS: Sharp Feature Priors for Resolution-Free Surface Remeshing

    Authors: Natalia Soboleva, Olga Gorbunova, Maria Ivanova, Evgeny Burnaev, Matthias Nießner, Denis Zorin, Alexey Artemov

    Abstract: Surface reconstruction with preservation of geometric features is a challenging computer vision task. Despite significant progress in implicit shape reconstruction, state-of-the-art mesh extraction methods often produce aliased, perceptually distorted surfaces and lack scalability to high-resolution 3D shapes. We present a data-driven approach for automatic feature detection and remeshing that req…

    Submitted 30 November, 2023; originally announced November 2023.

  46. arXiv:2311.17261

    cs.CV

    SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors

    Authors: Dave Zhenyu Chen, Haoxuan Li, Hsin-Ying Lee, Sergey Tulyakov, Matthias Nießner

    Abstract: We propose SceneTex, a novel method for effectively generating high-quality and style-consistent textures for indoor scenes using depth-to-image diffusion priors. Unlike previous methods that either iteratively warp 2D views onto a mesh surface or distill diffusion latent features without accurate geometric and style cues, SceneTex formulates the texture synthesis task as an optimization proble…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: Project website: https://daveredrum.github.io/SceneTex/

  47. arXiv:2311.15475

    cs.CV cs.LG

    MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers

    Authors: Yawar Siddiqui, Antonio Alliegro, Alexey Artemov, Tatiana Tommasi, Daniele Sirigatti, Vladislav Rosov, Angela Dai, Matthias Nießner

    Abstract: We introduce MeshGPT, a new approach for generating triangle meshes that reflects the compactness typical of artist-created meshes, in contrast to dense triangle meshes extracted by iso-surfacing methods from neural fields. Inspired by recent advances in powerful large language models, we adopt a sequence-based approach to autoregressively generate triangle meshes as sequences of triangles. We fir…

    Submitted 26 November, 2023; originally announced November 2023.

    Comments: Project Page: https://nihalsid.github.io/mesh-gpt/, Video: https://youtu.be/UV90O1_69_o

  48. arXiv:2310.19516

    cs.CV

    Generating Context-Aware Natural Answers for Questions in 3D Scenes

    Authors: Mohammed Munzer Dwedari, Matthias Niessner, Dave Zhenyu Chen

    Abstract: 3D question answering is a young field in 3D vision-language that is yet to be explored. Previous methods are limited to a pre-defined answer space and cannot generate answers naturally. In this work, we pivot the question answering task to a sequence generation task to generate free-form natural answers for questions in 3D scenes (Gen3DQA). To this end, we optimize our model directly on the langu…

    Submitted 30 October, 2023; originally announced October 2023.

  49. arXiv:2310.07204

    cs.AI cs.CV cs.GR cs.LG

    State of the Art on Diffusion Models for Visual Computing

    Authors: Ryan Po, Wang Yifan, Vladislav Golyanik, Kfir Aberman, Jonathan T. Barron, Amit H. Bermano, Eric Ryan Chan, Tali Dekel, Aleksander Holynski, Angjoo Kanazawa, C. Karen Liu, Lingjie Liu, Ben Mildenhall, Matthias Nießner, Björn Ommer, Christian Theobalt, Peter Wonka, Gordon Wetzstein

    Abstract: The field of visual computing is rapidly advancing due to the emergence of generative artificial intelligence (AI), which unlocks unprecedented capabilities for the generation, editing, and reconstruction of images, videos, and 3D scenes. In these domains, diffusion models are the generative AI architecture of choice. Within the last year alone, the literature on diffusion-based tools and applicat…

    Submitted 11 October, 2023; originally announced October 2023.

  50. arXiv:2308.11417

    cs.CV

    ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes

    Authors: Chandan Yeshwanth, Yueh-Cheng Liu, Matthias Nießner, Angela Dai

    Abstract: We present ScanNet++, a large-scale dataset that couples together capture of high-quality and commodity-level geometry and color of indoor scenes. Each scene is captured with a high-end laser scanner at sub-millimeter resolution, along with registered 33-megapixel images from a DSLR camera, and RGB-D streams from an iPhone. Scene reconstructions are further annotated with an open vocabulary of sem…

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: ICCV 2023. Video: https://youtu.be/E6P9e2r6M8I , Project page: https://cy94.github.io/scannetpp/