Quark: Real-time, High-resolution, and General Neural View Synthesis

Flynn, John; Broxton, Michael; Murmann, Lukas; Chai, Lucy; DuVall, Matthew; Godard, Clément; Heal, Kathryn; Kaza, Srinivas; Lombardi, Stephen; Luo, Xuan; Achar, Supreeth; Prabhu, Kira; Sun, Tiancheng; Tsai, Lynn; Overbeck, Ryan

arXiv:2411.16680 (cs)

[Submitted on 25 Nov 2024]

Abstract: We present a novel neural algorithm for performing high-quality, high-resolution, real-time novel view synthesis. From a sparse set of input RGB images or video streams, our network both reconstructs the 3D scene and renders novel views at 1080p resolution at 30fps on an NVIDIA A100. Our feed-forward network generalizes across a wide variety of datasets and scenes and produces state-of-the-art quality for a real-time method. Our quality approaches, and in some cases surpasses, the quality of some of the top offline methods. In order to achieve these results we use a novel combination of several key concepts, and tie them together into a cohesive and effective algorithm. We build on previous works that represent the scene using semi-transparent layers and use an iterative learned render-and-refine approach to improve those layers. Instead of flat layers, our method reconstructs layered depth maps (LDMs) that efficiently represent scenes with complex depth and occlusions. The iterative update steps are embedded in a multi-scale, UNet-style architecture to perform as much compute as possible at reduced resolution. Within each update step, to better aggregate the information from multiple input views, we use a specialized Transformer-based network component. This allows the majority of the per-input image processing to be performed in the input image space, as opposed to layer space, further increasing efficiency. Finally, due to the real-time nature of our reconstruction and rendering, we dynamically create and discard the internal 3D geometry for each frame, generating the LDM for each view. Taken together, this produces a novel and effective algorithm for view synthesis. Through extensive evaluation, we demonstrate that we achieve state-of-the-art quality at real-time rates. Project page: this https URL

Comments:	SIGGRAPH Asia 2024 camera-ready version; project page this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
Cite as:	arXiv:2411.16680 [cs.CV]
	(or arXiv:2411.16680v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2411.16680

Submission history

From: Lucy Chai
[v1] Mon, 25 Nov 2024 18:59:50 UTC (30,433 KB)
