TinyRS-R1: Compact Multimodal Language Model for Remote Sensing

Koksal, Aybora; Alatan, A. Aydin

Computer Science > Computer Vision and Pattern Recognition

arXiv:2505.12099 (cs)

[Submitted on 17 May 2025]

Abstract: Remote-sensing applications often run on edge hardware that cannot host today's 7B-parameter multimodal language models. This paper introduces TinyRS, the first 2B-parameter multimodal small language model (MSLM) optimized for remote sensing tasks, and TinyRS-R1, its reasoning-augmented variant. Built upon Qwen2-VL-2B, TinyRS is trained through a four-stage pipeline: pre-training on million satellite images, instruction tuning on visual instruction examples, fine-tuning with Chain-of-Thought (CoT) annotations from the proposed reasoning dataset, and alignment via Group Relative Policy Optimization (GRPO). TinyRS-R1 matches or surpasses recent 7B-parameter remote sensing models across classification, VQA, visual grounding, and open-ended question answering, while requiring just one-third of their memory and latency. Our analysis shows that CoT reasoning substantially benefits spatial grounding and scene understanding, while the non-reasoning TinyRS excels in concise, latency-sensitive VQA tasks. TinyRS-R1 represents the first domain-specialized MSLM with GRPO-aligned CoT reasoning for general-purpose remote sensing.
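
The final training stage named above is GRPO alignment. As a minimal sketch, assuming the formulation introduced in DeepSeekMath, GRPO samples a group of completions for the same prompt and scores each one relative to its own group instead of using a learned value model. The code below shows that group-relative advantage and a PPO-style clipped surrogate. It works per completion rather than per token for brevity, omits the KL penalty against a reference model, and uses an assumed clip value of 0.2. The reward values and the helper names group_relative_advantages and clipped_surrogate are hypothetical illustrations, not the paper's code or reward design.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages for completions sampled for the same prompt.

    Each reward is normalized by its group's mean and standard deviation,
    A_i = (r_i - mean(r)) / (std(r) + eps), so no value model is needed.
    Population std is used here; implementations differ on this detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO-style clipped objective for one completion (to be maximized).

    clip_eps=0.2 is an assumed value, not taken from the paper. The KL
    penalty toward a reference model that GRPO also applies is omitted.
    """
    ratio = math.exp(logp_new - logp_old)  # pi_new / pi_old
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


# Hypothetical rewards for a group of four reasoning traces: 1.0 if the final
# answer is correct, 0.0 otherwise (the paper's reward design is not given here).
rewards = [1.0, 0.0, 1.0, 0.0]
print(group_relative_advantages(rewards))  # approximately [1.0, -1.0, 1.0, -1.0]
```

Completions that beat their group's average receive a positive advantage and are reinforced; a group whose members all earn the same reward yields zero advantage and contributes no learning signal.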

Comments:	Submitted to BMVC 2025. Code, models, and the captions for datasets will be released.
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2505.12099 [cs.CV]
	(or arXiv:2505.12099v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2505.12099

Submission history

From: Aybora Koksal
[v1] Sat, 17 May 2025 17:53:21 UTC (443 KB)