Select to Know: An Internal-External Knowledge Self-Selection Framework for Domain-Specific Question Answering

He, Bolei; He, Xinran; Shao, Run; Shu, Shanfu; Xue, Xianwei; Cheng, Mingquan; Li, Haifeng; Ling, Zhenhua

Computer Science > Computation and Language

arXiv:2508.15213 (cs)

[Submitted on 21 Aug 2025]

Abstract: Large Language Models (LLMs) perform well in general QA but often struggle in domain-specific scenarios. Retrieval-Augmented Generation (RAG) introduces external knowledge but suffers from hallucinations and latency due to noisy retrievals. Continued pretraining internalizes domain knowledge but is costly and lacks cross-domain flexibility. We attribute this challenge to the long-tail distribution of domain knowledge, which leaves partial yet useful internal knowledge underutilized. We further argue that knowledge acquisition should be progressive, mirroring human learning: first understanding concepts, then applying them to complex reasoning. To address this, we propose Select2Know (S2K), a cost-effective framework that internalizes domain knowledge through an internal-external knowledge self-selection strategy and selective supervised fine-tuning. We also introduce a structured reasoning data generation pipeline and integrate GRPO (Group Relative Policy Optimization) to enhance reasoning ability. Experiments on medical, legal, and financial QA benchmarks show that S2K consistently outperforms existing methods and matches domain-pretrained LLMs at significantly lower cost.
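
The abstract names GRPO for the reasoning stage but gives no details of how S2K configures it. As a hedged illustration of what distinguishes GRPO from PPO-style training, the sketch below computes the group-relative advantage: several completions are sampled for the same question, and each completion's reward is normalized by the mean and standard deviation of its own group, so no learned value model is needed. The function name `group_relative_advantages`, the `eps` term, the use of population standard deviation, and the binary correctness rewards are all assumptions for illustration; this is not S2K's implementation, and its reward design is not described here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against the other completions sampled for the same prompt.

    Hypothetical helper for illustration only; not taken from the S2K paper.
    """
    if not rewards:
        raise ValueError("need at least one sampled completion")
    mu = mean(rewards)
    # Population std over the group (assumption: some implementations use sample std).
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward;
    # in that case all advantages are 0 and the group contributes no learning signal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Assumed rewards for G = 4 completions of one question:
# 1.0 if the final answer is correct, 0.0 otherwise.
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # prints [1.0, -1.0, 1.0, -1.0]
```

Correct completions receive a positive advantage and incorrect ones a negative advantage relative to their siblings, which is what lets GRPO drop the critic network used in PPO.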

Comments:	EMNLP 2025 Findings
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2508.15213 [cs.CL]
	(or arXiv:2508.15213v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2508.15213

Submission history

From: Xinran He
[v1] Thu, 21 Aug 2025 03:53:35 UTC (1,243 KB)