CAPO: Reinforcing Consistent Reasoning in Medical Decision-Making

Jiang, Songtao; Wang, Yuan; Chen, Ruizhe; Zhang, Yan; Luo, Ruilin; Lei, Bohan; Song, Sibo; Feng, Yang; Sun, Jimeng; Wu, Jian; Liu, Zuozhu

Abstract:In medical visual question answering (Med-VQA), achieving accurate responses relies on three critical steps: precise perception of medical imaging data, logical reasoning grounded in visual input and textual questions, and coherent answer derivation from the reasoning process. Recent advances in general vision-language models (VLMs) show that large-scale reinforcement learning (RL) could significantly enhance both reasoning capabilities and overall model performance. However, their application in medical domains is hindered by two fundamental challenges: 1) misalignment between perceptual understanding and reasoning stages, and 2) inconsistency between reasoning pathways and answer generation, both compounded by the scarcity of high-quality medical datasets for effective large-scale RL. In this paper, we first introduce Med-Zero-17K, a curated dataset for pure RL-based training, encompassing over 30 medical image modalities and 24 clinical tasks. Moreover, we propose a novel large-scale RL framework for Med-VLMs, Consistency-Aware Preference Optimization (CAPO), which integrates rewards to ensure fidelity between perception and reasoning, consistency in reasoning-to-answer derivation, and rule-based accuracy for final responses. Extensive experiments on both in-domain and out-of-domain scenarios demonstrate the superiority of our method over strong VLM baselines, showcasing strong generalization capability to 3D Med-VQA benchmarks and R1-like training paradigms.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2506.12849 [cs.CV]
	(or arXiv:2506.12849v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2506.12849

Computer Science > Computer Vision and Pattern Recognition

Title:CAPO: Reinforcing Consistent Reasoning in Medical Decision-Making

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators