Repurposing the scientific literature with vision-language models
Authors:
Anton Alyakin,
Jaden Stryker,
Daniel Alexander Alber,
Karl L. Sangwon,
Jin Vivian Lee,
Brandon Duderstadt,
Akshay Save,
David Kurland,
Spencer Frome,
Shrutika Singh,
Jeff Zhang,
Eunice Yang,
Ki Yun Park,
Cordelia Orillac,
Aly A. Valliani,
Sean Neifert,
Albert Liu,
Aneek Patel,
Christopher Livia,
Darryl Lau,
Ilya Laufer,
Peter A. Rozman,
Eveline Teresa Hidalgo,
Howard Riina,
Rui Feng,
et al. (7 additional authors not shown)
Abstract:
Leading vision-language models (VLMs) are trained on general Internet content, overlooking scientific journals' rich, domain-specific knowledge. Training on specialty-specific literature could yield high-performance, task-specific tools, enabling generative AI to match generalist models in specialty publishing, educational, and clinical tasks. We created NeuroPubs, a multimodal dataset of 23,000 Neurosurgery Publications articles (134M words, 78K image-caption pairs). Using NeuroPubs, VLMs generated publication-ready graphical abstracts (70% of 100 abstracts) and board-style questions indistinguishable from human-written ones (54% of 89,587 questions). We used these questions to train CNS-Obsidian, a 34B-parameter VLM. In a blinded, randomized controlled trial, our model demonstrated non-inferiority to the then state-of-the-art GPT-4o in neurosurgical differential diagnosis (clinical utility, 40.62% upvotes vs. 57.89%, p=0.1150; accuracy, 59.38% vs. 65.79%, p=0.3797). Our pilot study demonstrates how training generative AI models on specialty-specific journal content, without large-scale Internet data, results in high-performance academic and clinical tools, enabling domain-tailored AI across diverse fields.
Submitted 27 April, 2025; v1 submitted 26 February, 2025;
originally announced February 2025.