Repurposing the scientific literature with vision-language models

Alyakin, Anton; Stryker, Jaden; Alber, Daniel Alexander; Sangwon, Karl L.; Lee, Jin Vivian; Duderstadt, Brandon; Save, Akshay; Kurland, David; Frome, Spencer; Singh, Shrutika; Zhang, Jeff; Yang, Eunice; Park, Ki Yun; Orillac, Cordelia; Valliani, Aly A.; Neifert, Sean; Liu, Albert; Patel, Aneek; Livia, Christopher; Lau, Darryl; Laufer, Ilya; Rozman, Peter A.; Hidalgo, Eveline Teresa; Riina, Howard; Feng, Rui; Hollon, Todd; Aphinyanaphongs, Yindalon; Golfinos, John G.; Snyder, Laura; Leuthardt, Eric; Kondziolka, Douglas; Oermann, Eric Karl

Computer Science > Artificial Intelligence

arXiv:2502.19546 (cs)

[Submitted on 26 Feb 2025 (v1), last revised 28 Apr 2025 (this version, v3)]

Abstract: Leading vision-language models (VLMs) are trained on general Internet content, overlooking scientific journals' rich, domain-specific knowledge. Training on specialty-specific literature could yield high-performance, task-specific tools, enabling generative AI to match generalist models in specialty publishing, educational, and clinical tasks. We created NeuroPubs, a multimodal dataset of 23,000 Neurosurgery Publications articles (134M words, 78K image-caption pairs). Using NeuroPubs, VLMs generated publication-ready graphical abstracts (70% of 100 abstracts) and board-style questions indistinguishable from human-written ones (54% of 89,587 questions). We used these questions to train CNS-Obsidian, a 34B-parameter VLM. In a blinded, randomized controlled trial, our model demonstrated non-inferiority to the then-state-of-the-art GPT-4o in neurosurgical differential diagnosis (clinical utility, 40.62% upvotes vs. 57.89%, p=0.1150; accuracy, 59.38% vs. 65.79%, p=0.3797). Our pilot study demonstrates how training generative AI models on specialty-specific journal content, without large-scale Internet data, results in high-performance academic and clinical tools, enabling domain-tailored AI across diverse fields.

Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Cite as:	arXiv:2502.19546 [cs.AI]
	(or arXiv:2502.19546v3 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2502.19546

Submission history

From: Anton Alyakin
[v1] Wed, 26 Feb 2025 20:35:37 UTC (6,367 KB)
[v2] Fri, 25 Apr 2025 13:29:53 UTC (6,618 KB)
[v3] Mon, 28 Apr 2025 00:52:00 UTC (6,597 KB)