LocInv: Localization-aware Inversion for Text-Guided Image Editing

Tang, Chuanming; Wang, Kai; Yang, Fei; van de Weijer, Joost

arXiv:2405.01496 (cs)

[Submitted on 2 May 2024]

Abstract: Large-scale Text-to-Image (T2I) diffusion models demonstrate significant generation capabilities based on textual prompts. Building on these models, text-guided image editing research aims to let users manipulate generated images by altering the text prompts. However, existing image editing techniques are prone to editing unintended regions beyond the target area, primarily due to inaccuracies in cross-attention maps. To address this problem, we propose Localization-aware Inversion (LocInv), which exploits segmentation maps or bounding boxes as extra localization priors to refine the cross-attention maps in the denoising phases of the diffusion process. By dynamically updating the tokens corresponding to noun words in the textual input, we compel the cross-attention maps to align closely with the correct noun and adjective words in the text prompt. Based on this technique, we achieve fine-grained image editing over particular objects while preventing undesired changes to other regions. Our method LocInv, based on the publicly available Stable Diffusion, is extensively evaluated on a subset of the COCO dataset and consistently obtains superior results both quantitatively and qualitatively. The code will be released at this https URL
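
The idea of steering a token's cross-attention map toward a localization prior can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the sigmoid attention (a simplification of softmax cross-attention), the binary cross-entropy loss used as a stand-in objective, and the synthetic 4x4 features. Only the token embedding is updated; the image features stay fixed.

```python
import math
import random


def attention_map(features, token):
    """Per-pixel attention of one token: sigmoid of the scaled dot product.

    Assumption: a single-token sigmoid replaces the softmax over all tokens.
    """
    scale = 1.0 / math.sqrt(len(token))
    return [
        1.0 / (1.0 + math.exp(-scale * sum(f * t for f, t in zip(feat, token))))
        for feat in features
    ]


def localization_loss(attn, mask):
    """Mean binary cross-entropy between the attention map and a 0/1 prior mask.

    Assumption: a stand-in objective, not the loss used by LocInv.
    """
    eps = 1e-9
    total = sum(
        m * math.log(a + eps) + (1 - m) * math.log(1 - a + eps)
        for a, m in zip(attn, mask)
    )
    return -total / len(mask)


def update_token(features, token, mask, lr=0.5):
    """One gradient-descent step on the token embedding only."""
    attn = attention_map(features, token)
    scale = 1.0 / math.sqrt(len(token))
    n = len(mask)
    # d(BCE)/d(logit) = attn - mask, and d(logit)/d(token_k) = scale * feature_k.
    grad = [
        scale * sum((a - m) * feat[k] for a, m, feat in zip(attn, mask, features)) / n
        for k in range(len(token))
    ]
    return [t - lr * g for t, g in zip(token, grad)]


# Hypothetical prior: a bounding box covering the central 2x2 cells of a 4x4 grid.
mask = [1 if (1 <= i // 4 <= 2 and 1 <= i % 4 <= 2) else 0 for i in range(16)]

# Synthetic features: channel 0 separates box from background, the rest is noise.
rng = random.Random(0)
features = [[1.0 if m else -1.0] + [rng.uniform(-1, 1) for _ in range(3)] for m in mask]

token = [0.0] * 4
loss_before = localization_loss(attention_map(features, token), mask)
for _ in range(50):
    token = update_token(features, token, mask)
loss_after = localization_loss(attention_map(features, token), mask)

print(f"loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

In this sketch the attention map concentrates inside the box as the token is updated, which mirrors the abstract's description only at a conceptual level; a real pipeline would apply such updates to Stable Diffusion's cross-attention layers during denoising.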

Comments:	Accepted by CVPR 2024 Workshop AI4CC
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2405.01496 [cs.CV]
	(or arXiv:2405.01496v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.01496

Submission history

From: Chuanming Tang
[v1] Thu, 2 May 2024 17:27:04 UTC (17,588 KB)
