One-Step Image Translation with Text-to-Image Models

Parmar, Gaurav; Park, Taesung; Narasimhan, Srinivasa; Zhu, Jun-Yan

arXiv:2403.12036 (cs)

[Submitted on 18 Mar 2024]

Abstract: In this work, we address two limitations of existing conditional diffusion models: their slow inference speed due to the iterative denoising process and their reliance on paired data for model fine-tuning. To tackle these issues, we introduce a general method for adapting a single-step diffusion model to new tasks and domains through adversarial learning objectives. Specifically, we consolidate various modules of the vanilla latent diffusion model into a single end-to-end generator network with a small number of trainable weights, enhancing its ability to preserve the input image structure while reducing overfitting. We demonstrate that, for unpaired settings, our model CycleGAN-Turbo outperforms existing GAN-based and diffusion-based methods for various scene translation tasks, such as day-to-night conversion and adding/removing weather effects like fog, snow, and rain. We extend our method to paired settings, where our model pix2pix-Turbo is on par with recent works like ControlNet for Sketch2Photo and Edge2Image, but with single-step inference. This work suggests that single-step diffusion models can serve as strong backbones for a range of GAN learning objectives. Our code and models are available at this https URL.
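
The unpaired setting described in the abstract can be illustrated with a toy sketch. It assumes the standard CycleGAN-style formulation (a cycle-consistency term plus an adversarial term), which the abstract does not spell out. The translators G and F, the discriminators D_X and D_Y, and the weight lambda_gan are hypothetical stand-ins; in the paper the generator is a one-step diffusion network, not the simple functions used here.

```python
# Toy sketch of a CycleGAN-style unpaired objective (assumed formulation,
# not the authors' code). Images are flat lists of floats; G and F are
# stand-in translators for the two directions (e.g. day->night, night->day).
import math


def l1(a, b):
    """Mean absolute difference between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_loss(x, y, G, F):
    """Cycle consistency: F(G(x)) should recover x, and G(F(y)) should recover y."""
    return l1(F(G(x)), x) + l1(G(F(y)), y)


def generator_gan_loss(fake, D):
    """Non-saturating generator loss -log D(fake); D returns a probability in (0, 1]."""
    return -math.log(D(fake))


def total_generator_loss(x, y, G, F, D_X, D_Y, lambda_gan=0.5):
    """Cycle term plus weighted adversarial term (lambda_gan is an illustrative weight)."""
    gan = generator_gan_loss(G(x), D_Y) + generator_gan_loss(F(y), D_X)
    return cycle_loss(x, y, G, F) + lambda_gan * gan


# Hypothetical toy translators: "night" is "day" darkened by 0.5, and vice versa.
def G(img):
    return [p - 0.5 for p in img]


def F(img):
    return [p + 0.5 for p in img]


# Hypothetical discriminators that are maximally unsure (always output 0.5).
def D_X(img):
    return 0.5


def D_Y(img):
    return 0.5


day = [0.75, 1.0, 0.5]
night = [0.25, 0.0, 0.5]
loss = total_generator_loss(day, night, G, F, D_X, D_Y)
```

Because the toy translators are exact inverses, the cycle term is zero and only the adversarial term contributes to `loss`. In a real setup both terms would be minimized over the generator's trainable weights while the discriminators are trained in alternation.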

Comments:	GitHub: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
Cite as:	arXiv:2403.12036 [cs.CV]
	(or arXiv:2403.12036v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.12036

Submission history

From: Gaurav Parmar
[v1] Mon, 18 Mar 2024 17:59:40 UTC (28,648 KB)
