TextField3D: Towards Enhancing Open-Vocabulary 3D Generation with Noisy Text Fields

Huang, Tianyu; Zeng, Yihan; Dong, Bowen; Xu, Hang; Xu, Songcen; Lau, Rynson W. H.; Zuo, Wangmeng

arXiv:2309.17175 (cs)

[Submitted on 29 Sep 2023 (v1), last revised 14 Mar 2024 (this version, v2)]

Abstract: Recent works learn 3D representations explicitly under text-3D guidance. However, limited text-3D data restricts the vocabulary scale and text control of generations. Generators may easily fall into a stereotyped concept for certain text prompts, thus losing their open-vocabulary generation ability. To tackle this issue, we introduce a conditional 3D generative model, namely TextField3D. Specifically, rather than using the text prompts as input directly, we propose injecting dynamic noise into the latent space of given text prompts, i.e., Noisy Text Fields (NTFs). In this way, limited 3D data can be mapped to the appropriate range of the textual latent space that is expanded by NTFs. To this end, an NTFGen module is proposed to model general text latent codes in noisy fields. Meanwhile, an NTFBind module is proposed to align view-invariant image latent codes to noisy fields, further supporting image-conditional 3D generation. To guide the conditional generation in both geometry and texture, multi-modal discrimination is constructed with a text-3D discriminator and a text-2.5D discriminator. Compared to previous methods, TextField3D offers three merits: 1) large vocabulary, 2) text consistency, and 3) low latency. Extensive experiments demonstrate that our method achieves a potential open-vocabulary 3D generation capability.
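
The idea of injecting noise into a text latent can be illustrated with a minimal sketch. This is not the paper's NTFGen module: the function name `noisy_text_latent`, the fixed noise scale `sigma`, the isotropic Gaussian noise, and the unit-length re-normalization are all assumptions made for illustration, and the three-element vector stands in for the output of a real text encoder. The abstract describes the noise as dynamic, whereas this sketch uses a constant scale.

```python
import math
import random


def noisy_text_latent(latent, sigma, rng):
    """Perturb a text latent with Gaussian noise and re-normalize it.

    Hypothetical helper for illustration only; the noise model and the
    normalization are assumptions, not the method described in the paper.
    """
    # Add independent Gaussian noise with standard deviation sigma to each dimension.
    noisy = [x + rng.gauss(0.0, sigma) for x in latent]
    norm = math.sqrt(sum(x * x for x in noisy))
    if norm == 0.0:
        raise ValueError("perturbed latent has zero norm")
    # Project back onto the unit sphere so all samples share the same scale.
    return [x / norm for x in noisy]


rng = random.Random(0)
text_latent = [0.6, 0.8, 0.0]  # placeholder for an encoded text prompt
# Several draws around one prompt: a small neighborhood rather than a single point.
samples = [noisy_text_latent(text_latent, sigma=0.1, rng=rng) for _ in range(4)]
```

Each call returns a different point near the original latent, so one prompt covers a region of the latent space instead of a single code; setting `sigma` to zero recovers the normalized input.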

Comments:	Accepted by ICLR 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2309.17175 [cs.CV]
	(or arXiv:2309.17175v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2309.17175

Submission history

From: Tianyu Huang
[v1] Fri, 29 Sep 2023 12:14:41 UTC (3,923 KB)
[v2] Thu, 14 Mar 2024 07:36:29 UTC (4,939 KB)
