BioLORD-2023: Semantic Textual Representations Fusing LLM and Clinical Knowledge Graph Insights

Remy, François; Demuynck, Kris; Demeester, Thomas

arXiv:2311.16075 (cs)

[Submitted on 27 Nov 2023]

Abstract: In this study, we investigate the potential of Large Language Models to complement biomedical knowledge graphs in the training of semantic models for the biomedical and clinical domains. Drawing on the wealth of the UMLS knowledge graph and harnessing cutting-edge Large Language Models, we propose a new state-of-the-art approach for obtaining high-fidelity representations of biomedical concepts and sentences, consisting of three steps: an improved contrastive learning phase, a novel self-distillation phase, and a weight averaging phase. Through rigorous evaluations via the extensive BioLORD testing suite and diverse downstream tasks, we demonstrate consistent and substantial performance improvements over the previous state of the art (e.g. +2pts on MedSTS, +2.5pts on MedNLI-S, +6.1pts on EHR-Rel-B). Besides our new state-of-the-art biomedical model for English, we also distill and release a multilingual model compatible with 50+ languages and finetuned on 7 European languages. Many clinical pipelines can benefit from our latest models. Our new multilingual model enables a range of languages to benefit from our advancements in biomedical semantic representation learning, opening a new avenue for bioinformatics researchers around the world. As a result, we hope to see BioLORD-2023 becoming a precious tool for future biomedical applications.
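
The abstract names weight averaging as the third training step but does not say which averaging scheme is used. As an illustration only, the sketch below shows uniform averaging of same-named parameters across checkpoints in plain Python. The function name `average_weights`, the toy checkpoints, and the choice of a uniform mean are all assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List


def average_weights(checkpoints: List[Dict[str, List[float]]]) -> Dict[str, List[float]]:
    """Uniformly average same-named parameters across checkpoints.

    Hypothetical helper: each checkpoint maps a parameter name to a flat
    list of floats, and all checkpoints are assumed to share names and shapes.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    averaged: Dict[str, List[float]] = {}
    for name in checkpoints[0]:
        vectors = [ckpt[name] for ckpt in checkpoints]
        # Element-wise mean of this parameter over all checkpoints.
        averaged[name] = [sum(vals) / len(vals) for vals in zip(*vectors)]
    return averaged


# Toy checkpoints (made-up values, for illustration only).
ckpt_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
ckpt_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
merged = average_weights([ckpt_a, ckpt_b])
print(merged)
```

With real models the same idea would be applied to tensors in the checkpoints' parameter dictionaries rather than to Python lists.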

Comments:	Preprint of upcoming journal article
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Cite as:	arXiv:2311.16075 [cs.CL]
	(or arXiv:2311.16075v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2311.16075

Submission history

From: Francois Remy
[v1] Mon, 27 Nov 2023 18:46:17 UTC (1,270 KB)
