Exploring Text-Guided Single Image Editing for Remote Sensing Images

Han, Fangzhou; Si, Lingyu; Jiang, Zhizhuo; Dong, Hongwei; Zhang, Lamei; Liu, Yu; Chen, Hao; Du, Bo

arXiv:2405.05769 (cs)

[Submitted on 9 May 2024 (v1), last revised 1 Jul 2025 (this version, v4)]

Abstract: Artificial intelligence-generated content (AIGC) has significantly impacted image generation in the field of remote sensing. However, the equally important area of remote sensing image (RSI) editing has not received sufficient attention. Deep learning-based editing methods generally involve two sequential stages: generation and editing. For natural images, these stages primarily rely on generative backbones pre-trained on large-scale benchmark datasets and on text guidance facilitated by vision-language models (VLMs). However, this approach becomes less viable for RSIs. First, existing generative RSI benchmark datasets do not fully capture the diversity of RSIs and are often inadequate for universal editing tasks. Second, a single text semantic can correspond to multiple image semantics, which introduces incorrect semantics. To solve these problems, this paper proposes a text-guided RSI editing method that can be trained using only a single image. A multi-scale training approach is adopted to preserve consistency without the need for training on extensive benchmarks, while RSI pre-trained VLMs and prompt ensembling (PE) are leveraged to ensure accuracy and controllability. Experimental results on multiple RSI editing tasks show that the proposed method offers significant advantages in both CLIP scores and subjective evaluations compared to existing methods. Additionally, we explore the ability of the edited RSIs to support disaster assessment tasks in order to validate their practicality. Code will be released at this https URL.
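
Note on prompt ensembling (PE): the abstract names the technique but not its implementation. A common form averages the text embeddings of several phrasings of one instruction, so that no single ambiguous wording dominates the guidance. The sketch below illustrates that general idea under stated assumptions and is not the authors' code: the function names are hypothetical, and the toy vectors stand in for embeddings that an RSI pre-trained VLM would produce. The final cosine similarity is the quantity a CLIP-style score is typically built on.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def ensemble_text_embedding(prompt_embeddings):
    """Average unit-normalized prompt embeddings, then renormalize."""
    if not prompt_embeddings:
        raise ValueError("need at least one prompt embedding")
    unit = [l2_normalize(e) for e in prompt_embeddings]
    dim = len(unit[0])
    mean = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    return l2_normalize(mean)


def cosine_similarity(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


# Hypothetical toy embeddings for three phrasings of one edit instruction.
# In practice these would come from the text encoder of a VLM (assumption).
prompt_embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.0, 0.2],
    [0.7, 0.3, 0.1],
]
# Hypothetical toy embedding of the edited image.
image_embedding = [1.0, 0.1, 0.1]

text_embedding = ensemble_text_embedding(prompt_embeddings)
score = cosine_similarity(image_embedding, text_embedding)
print(f"similarity to ensembled prompt: {score:.3f}")
```

Normalizing each prompt embedding before averaging keeps any one phrasing from dominating through its magnitude alone; whether the paper weights prompts differently is not stated in the abstract.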

Comments:	17 pages, 18 figures, Accepted by IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2405.05769 [cs.CV]
	(or arXiv:2405.05769v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.05769

Submission history

From: Fangzhou Han
[v1] Thu, 9 May 2024 13:45:04 UTC (1,561 KB)
[v2] Thu, 26 Sep 2024 05:10:23 UTC (10,118 KB)
[v3] Fri, 27 Jun 2025 16:23:00 UTC (14,184 KB)
[v4] Tue, 1 Jul 2025 14:55:57 UTC (14,184 KB)
