MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views

Chen, Yuedong; Zheng, Chuanxia; Xu, Haofei; Zhuang, Bohan; Vedaldi, Andrea; Cham, Tat-Jen; Cai, Jianfei

arXiv:2411.04924 (cs)

[Submitted on 7 Nov 2024]

Abstract: We introduce MVSplat360, a feed-forward approach for 360° novel view synthesis (NVS) of diverse real-world scenes using only sparse observations. This setting is inherently ill-posed because the input views overlap minimally and provide insufficient visual information, making it challenging for conventional methods to achieve high-quality results. MVSplat360 addresses this by combining geometry-aware 3D reconstruction with temporally consistent video generation. Specifically, it refactors a feed-forward 3D Gaussian Splatting (3DGS) model to render features directly into the latent space of a pre-trained Stable Video Diffusion (SVD) model, where these features act as pose and visual cues that guide the denoising process to produce photorealistic, 3D-consistent views. The model is end-to-end trainable and supports rendering arbitrary views from as few as 5 sparse input views. To evaluate MVSplat360, we introduce a new benchmark on the challenging DL3DV-10K dataset, where MVSplat360 achieves better visual quality than state-of-the-art methods on wide-sweeping or even 360° NVS tasks. Experiments on the existing RealEstate10K benchmark further confirm the model's effectiveness. Video results are available on our project page: this https URL.
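The key architectural idea above is that the 3DGS model renders feature vectors rather than RGB colors. The sketch below shows only the generic 3DGS front-to-back alpha-compositing rule, out = Σ_i f_i α_i Π_{j<i}(1 − α_j), applied to per-Gaussian feature vectors for a single pixel. It is an illustration, not the authors' implementation. It assumes the Gaussians have already been projected, evaluated at the pixel, and sorted nearest-first. The function name `composite_features` and its inputs are hypothetical.

```python
from typing import List, Sequence


def composite_features(
    alphas: Sequence[float], features: Sequence[Sequence[float]]
) -> List[float]:
    """Front-to-back alpha compositing of per-Gaussian feature vectors at one pixel.

    Hypothetical helper (not from the MVSplat360 code base).
    alphas:   opacity of each depth-sorted Gaussian at this pixel, nearest first, in [0, 1].
    features: one feature vector per Gaussian (e.g. latent channels instead of RGB).
    """
    if len(alphas) != len(features):
        raise ValueError("need exactly one alpha per feature vector")
    dim = len(features[0]) if features else 0
    out = [0.0] * dim
    transmittance = 1.0  # fraction of the ray not yet occluded by nearer Gaussians
    for alpha, feat in zip(alphas, features):
        weight = alpha * transmittance  # contribution of this Gaussian
        for k in range(dim):
            out[k] += weight * feat[k]
        transmittance *= 1.0 - alpha  # nearer Gaussians occlude farther ones
    return out


# Two half-transparent Gaussians: the nearer one gets weight 0.5, the farther 0.25.
print(composite_features([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]]))  # [0.5, 0.25]
```

Because the rule is linear in the features, the same compositing works for any channel count. A feed-forward model can therefore output latent-space features and feed them to a diffusion model as conditioning instead of final colors.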

Comments:	NeurIPS 2024, Project page: this https URL, Code: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2411.04924 [cs.CV]
	(or arXiv:2411.04924v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2411.04924

Submission history

From: Yuedong Chen [view email]
[v1] Thu, 7 Nov 2024 17:59:31 UTC (9,368 KB)