Panoptic Diffusion Models: co-generation of images and segmentation maps

Long, Yinghan; Roy, Kaushik

arXiv:2412.02929 (cs)

[Submitted on 4 Dec 2024 (v1), last revised 22 Feb 2025 (this version, v2)]

Abstract: Recently, diffusion models have demonstrated impressive capabilities in text-guided and image-conditioned image generation. However, existing diffusion models cannot simultaneously generate an image and a panoptic segmentation of objects and stuff from the prompt, even though an inherent understanding of shapes and scene layouts can improve the creativity and realism of diffusion models. To address this limitation, we present the Panoptic Diffusion Model (PDM), the first model designed to generate both images and panoptic segmentation maps concurrently. PDM bridges the gap between image and text by constructing segmentation layouts that provide detailed, built-in guidance throughout the generation process. This ensures the inclusion of categories mentioned in text prompts and enriches the diversity of segments within the background. We demonstrate the effectiveness of PDM across two architectures: a unified diffusion transformer and a two-stream transformer with a pretrained backbone. We propose a Multi-Scale Patching mechanism to generate high-resolution segmentation maps. Additionally, when ground-truth maps are available, PDM can function as a text-guided image-to-image generation model. Finally, we propose a novel metric for evaluating the quality of generated maps and show that PDM achieves state-of-the-art results in image generation with implicit scene control.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2412.02929 [cs.CV]
	(or arXiv:2412.02929v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.02929

Submission history

From: Yinghan Long
[v1] Wed, 4 Dec 2024 00:42:15 UTC (30,752 KB)
[v2] Sat, 22 Feb 2025 05:58:21 UTC (26,009 KB)
