CLIP-Layout: Style-Consistent Indoor Scene Synthesis with Semantic Furniture Embedding

Liu, Jingyu; Xiong, Wenhan; Jones, Ian; Nie, Yixin; Gupta, Anchit; Oğuz, Barlas

arXiv:2303.03565v1 (cs)

[Submitted on 7 Mar 2023 (this version), latest version 2 Jun 2023 (v2)]

Abstract: Indoor scene synthesis involves automatically picking and placing furniture appropriately on a floor plan, so that the scene looks realistic and is functionally plausible. Such scenes can serve as a home for immersive 3D experiences, or be used to train embodied agents. Existing methods for this task rely on labeled categories of furniture, e.g. bed, chair or table, to generate contextually relevant combinations of furniture. Whether heuristic or learned, these methods ignore instance-level attributes of objects such as color and style, and as a result may produce less visually coherent scenes. In this paper, we introduce an auto-regressive scene model which can output instance-level predictions, making use of general-purpose image embeddings based on CLIP. This allows us to learn visual correspondences such as matching color and style, and produce more plausible and aesthetically pleasing scenes. Evaluated on the 3D-FRONT dataset, our model achieves state-of-the-art results in scene generation and improves auto-completion metrics by over 50%. Moreover, our embedding-based approach enables zero-shot text-guided scene generation and editing, which easily generalizes to furniture not seen at training time.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2303.03565 [cs.CV]
	(or arXiv:2303.03565v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2303.03565

Submission history

From: Jingyu Liu
[v1] Tue, 7 Mar 2023 00:26:02 UTC (30,004 KB)
[v2] Fri, 2 Jun 2023 04:48:55 UTC (31,937 KB)
