Pan-infection Foundation Framework Enables Multiple Pathogen Prediction

Zhang, Lingrui; Wu, Haonan; Jin, Nana; Zheng, Chenqing; Xie, Jize; Cai, Qitai; Wang, Jun; Cao, Qin; Zheng, Xubin; Wang, Jiankun; Cheng, Lixin

Computer Science > Machine Learning

arXiv:2501.01462 (cs)

[Submitted on 31 Dec 2024]

Abstract: Host-response-based diagnostics can improve the accuracy of diagnosing bacterial and viral infections, thereby reducing inappropriate antibiotic prescriptions. However, existing cohorts, with their limited sample sizes and coarse infection types, cannot support the development of an accurate and generalizable diagnostic model. Here, we curate the largest infection host-response transcriptome dataset, comprising 11,247 samples across 89 blood transcriptome datasets from 13 countries and 21 platforms. We build a diagnostic model for pathogen prediction, starting from a pan-infection foundation model (AUC = 0.97) built on this pan-infection dataset. We then use knowledge distillation to efficiently transfer the insights of this "teacher" model to four lightweight pathogen "student" models, namely staphylococcal infection (AUC = 0.99), streptococcal infection (AUC = 0.94), HIV infection (AUC = 0.93), and RSV infection (AUC = 0.94), as well as a sepsis "student" model (AUC = 0.99). The proposed knowledge distillation framework not only facilitates pathogen diagnosis using pan-infection data, but also enables a cross-disease study from pan-infection to sepsis. Moreover, the framework enables highly lightweight diagnostic models, which are expected to be adaptively deployable in clinical settings.
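
The abstract does not specify the distillation objective, so the following is only a minimal sketch of one common formulation (a Hinton-style loss that mixes a temperature-softened teacher/student KL term with a hard-label cross-entropy term). The function names, the temperature of 2.0, and the mixing weight alpha are illustrative assumptions, not the authors' settings.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Assumed loss: alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE.

    `label` is the index of the true class (e.g. 0 = non-infected,
    1 = infected); `temperature` and `alpha` are hypothetical defaults.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Cross-entropy of the unsoftened student prediction against the hard label
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


# Toy usage: a student that agrees with the teacher incurs a lower loss
teacher = [3.0, -1.0]
good_student = [2.5, -0.5]
bad_student = [-1.0, 2.0]
loss_good = distillation_loss(good_student, teacher, label=0)
loss_bad = distillation_loss(bad_student, teacher, label=0)
```

In this formulation, the T^2 factor keeps the gradient scale of the softened term comparable to that of the hard-label term as the temperature changes; a lightweight student would minimize this loss averaged over training samples.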

Comments:	15 pages, 8 figures
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Genomics (q-bio.GN)
Cite as:	arXiv:2501.01462 [cs.LG]
	(or arXiv:2501.01462v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2501.01462

Submission history

From: Lingrui Zhang
[v1] Tue, 31 Dec 2024 14:34:53 UTC (5,983 KB)
