Q-Mirror: Unlocking the Multi-Modal Potential of Scientific Text-Only QA Pairs

Wang, Junying; Zhang, Zicheng; Shen, Ye; Wu, Yalun; Liang, Yingji; Guo, Yijin; Wen, Farong; Li, Wenzhe; Zhao, Xuezhi; Jia, Qi; Zhai, Guangtao

arXiv:2509.24297 (cs)

[Submitted on 29 Sep 2025 (v1), last revised 30 Sep 2025 (this version, v2)]

Abstract: High-quality multi-modal benchmarks are crucial for advancing scientific reasoning in large models, yet their manual creation is costly and unscalable. To address this bottleneck, we explore the potential for transforming Text-Only QA Pairs (TQAs) into high-quality Multi-Modal QA Pairs (MMQAs). Our work has three parts: 1) Task Definition & Evaluation Rubric: We develop a TQA-to-MMQA framework and establish a comprehensive, multi-dimensional MMQA quality rubric that provides principles for the transformation. 2) Benchmark Construction: We construct two extensive benchmarks to rigorously evaluate state-of-the-art generation and understanding models on the distinct tasks of MMQA generation and MMQA quality evaluation. 3) Preliminary Solution: We develop an agentic system (Q-Mirror), which operationalizes our framework by integrating MMQA generation and evaluation into a closed loop for iterative refinement. Our experiments show that while state-of-the-art models can generate MMQAs, their outputs still leave substantial gaps, underscoring the need for reliable evaluation. We further demonstrate that top-tier understanding models align closely with human judgment in MMQA quality assessment. Leveraging both insights, the Q-Mirror agent raises average scores from 78.90 to 85.22 and pass rates from 72% to 95%, offering a practical path to large-scale scientific benchmarks.
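
The closed loop described in the abstract (generate an MMQA, score it, feed the critique back into generation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `refine_loop`, the `generate` and `evaluate` callables, the `pass_threshold` of 80.0, and the `max_rounds` cap are all hypothetical placeholders chosen for illustration, and the abstract does not state the actual pass criterion or number of rounds.

```python
from typing import Any, Callable, Optional, Tuple

# Hypothetical signatures (assumptions, not from the paper):
#   generate(tqa, feedback) -> candidate MMQA (any object)
#   evaluate(candidate)     -> (score, textual feedback)
Generator = Callable[[Any, Optional[str]], Any]
Evaluator = Callable[[Any], Tuple[float, str]]


def refine_loop(
    tqa: Any,
    generate: Generator,
    evaluate: Evaluator,
    pass_threshold: float = 80.0,  # assumed value, not given in the abstract
    max_rounds: int = 3,           # assumed cap on refinement rounds
) -> Tuple[Any, float, int]:
    """Run a generate -> evaluate -> refine loop on one text-only QA pair.

    Returns the best-scoring candidate, its score, and the rounds used.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")

    feedback: Optional[str] = None
    best: Any = None
    best_score = float("-inf")
    rounds_used = 0

    for _ in range(max_rounds):
        rounds_used += 1
        # The generator sees the evaluator's critique from the previous round.
        candidate = generate(tqa, feedback)
        score, feedback = evaluate(candidate)

        # Keep the best candidate seen so far, since a later round may regress.
        if score > best_score:
            best, best_score = candidate, score

        # Stop as soon as the candidate clears the quality bar.
        if score >= pass_threshold:
            break

    return best, best_score, rounds_used
```

In this sketch the evaluator doubles as the stopping criterion and the source of feedback, which is one simple way to couple generation and evaluation. Keeping the best-scoring candidate rather than the last one guards against a refinement round that lowers quality.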

Comments:	25 pages
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2509.24297 [cs.CL]
	(or arXiv:2509.24297v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2509.24297

Submission history

From: Junying Wang
[v1] Mon, 29 Sep 2025 05:22:10 UTC (5,982 KB)
[v2] Tue, 30 Sep 2025 04:56:54 UTC (5,981 KB)
