Fine-grained Text to Image Synthesis

Ouyang, Xu; Chen, Ying; Zhu, Kaiyue; Agam, Gady

arXiv:2412.07196 (cs)

[Submitted on 10 Dec 2024 (v1), last revised 15 Dec 2024 (this version, v2)]

Abstract: Fine-grained text to image synthesis involves generating images from texts that belong to different categories. In contrast to general text to image synthesis, fine-grained synthesis must cope with high similarity between images of different subclasses and with linguistic discrepancies among texts describing the same image. Recent Generative Adversarial Networks (GANs), such as the Recurrent Affine Transformation (RAT) GAN model, are able to synthesize clear and realistic images from texts. However, these GAN models ignore fine-grained information. In this paper we propose an approach that incorporates an auxiliary classifier in the discriminator and a contrastive learning method to improve the accuracy of fine-grained details in images synthesized by RAT GAN. The auxiliary classifier helps the discriminator predict the class of each image and helps the generator synthesize more accurate fine-grained images. The contrastive learning method minimizes the similarity between images from different subclasses and maximizes the similarity between images from the same subclass. We compare against several state-of-the-art methods on the commonly used CUB-200-2011 bird dataset and Oxford-102 flower dataset, and demonstrate superior performance.
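The two training signals named in the abstract can be illustrated with a small sketch. The abstract does not give the exact loss formulas, so everything below is an assumption: the contrastive term uses a supervised-contrastive-style form over cosine similarities, the auxiliary term is plain cross-entropy over class logits, and the function names, the temperature value, and the toy embeddings are all hypothetical. It is written in dependency-free Python for readability, not as the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def class_contrastive_loss(features, labels, temperature=0.1):
    """Supervised-contrastive-style loss (assumed form, not from the paper).

    For each anchor, samples of the same subclass are positives and all
    other samples are negatives. The loss decreases as positives become
    more similar to the anchor than negatives are.
    """
    n = len(features)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a same-class partner is skipped
        logits = {
            j: cosine(features[i], features[j]) / temperature
            for j in range(n)
            if j != i
        }
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def auxiliary_class_loss(class_logits, label):
    """Cross-entropy of one image's class logits against its subclass label."""
    m = max(class_logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in class_logits))
    return log_sum - class_logits[label]


# Hypothetical 2-D image embeddings for two subclasses.
labels = [0, 0, 1, 1]
clustered = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # grouped by subclass
mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]      # subclasses interleaved

loss_clustered = class_contrastive_loss(clustered, labels)
loss_mixed = class_contrastive_loss(mixed, labels)
print(loss_clustered < loss_mixed)
```

In this toy setup, embeddings grouped by subclass yield a lower contrastive loss than interleaved ones, which is the behavior the abstract describes. In a GAN, one plausible arrangement is to add both terms to the adversarial objective with weighting coefficients, but the abstract does not state how the terms are combined.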

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2412.07196 [cs.CV]
	(or arXiv:2412.07196v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.07196

Submission history

From: Xu Ouyang
[v1] Tue, 10 Dec 2024 05:09:52 UTC (25,452 KB)
[v2] Sun, 15 Dec 2024 22:56:40 UTC (25,453 KB)
