-
ProtoBERT-LoRA: Parameter-Efficient Prototypical Finetuning for Immunotherapy Study Identification
Authors:
Shijia Zhang,
Xiyu Ding,
Kai Ding,
Jacob Zhang,
Kevin Galinsky,
Mengrui Wang,
Ryan P. Mayers,
Zheyu Wang,
Hadi Kharrazi
Abstract:
Identifying immune checkpoint inhibitor (ICI) studies in genomic repositories like Gene Expression Omnibus (GEO) is vital for cancer research yet remains challenging due to semantic ambiguity, extreme class imbalance, and limited labeled data in low-resource settings. We present ProtoBERT-LoRA, a hybrid framework that combines PubMedBERT with prototypical networks and Low-Rank Adaptation (LoRA) for efficient fine-tuning. The model enforces class-separable embeddings via episodic prototype training while preserving biomedical domain knowledge. The dataset was divided into four sets: Training (20 positive, 20 negative), Prototype Set (10 positive, 10 negative), Validation (20 positive, 200 negative), and Test (71 positive, 765 negative). Evaluated on the test set, ProtoBERT-LoRA achieved an F1-score of 0.624 (precision: 0.481, recall: 0.887), outperforming the rule-based system, machine learning baselines, and fine-tuned PubMedBERT. Application to 44,287 unlabeled studies reduced manual review effort by 82%. Ablation studies confirmed that combining prototypes with LoRA improved performance by 29% over stand-alone LoRA.
Submitted 25 March, 2025;
originally announced March 2025.
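The reported F1-score in the abstract above can be checked against the reported precision and recall, since F1 is their harmonic mean. The sketch below is only an arithmetic check; the helper name `f1_score` is chosen here for illustration and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for the test set.
reported_f1 = f1_score(0.481, 0.887)
print(round(reported_f1, 3))  # 0.624
```

The low precision alongside high recall is what one would expect under the stated class imbalance (71 positive versus 765 negative test studies).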
-
Cross-Attention Fusion of MRI and Jacobian Maps for Alzheimer's Disease Diagnosis
Authors:
Shijia Zhang,
Xiyu Ding,
Brian Caffo,
Junyu Chen,
Cindy Zhang,
Hadi Kharrazi,
Zheyu Wang
Abstract:
Early diagnosis of Alzheimer's disease (AD) is critical for intervention before irreversible neurodegeneration occurs. Structural MRI (sMRI) is widely used for AD diagnosis, but conventional deep learning approaches primarily rely on intensity-based features, which require large datasets to capture subtle structural changes. Jacobian determinant maps (JSM) provide complementary information by encoding localized brain deformations, yet existing multimodal fusion strategies fail to fully integrate these features with sMRI. We propose a cross-attention fusion framework to model the intrinsic relationship between sMRI intensity and JSM-derived deformations for AD classification. Using the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, we compare cross-attention, pairwise self-attention, and bottleneck attention with four pre-trained 3D image encoders. Cross-attention fusion achieves superior performance, with mean ROC-AUC scores of 0.903 (±0.033) for AD vs. cognitively normal (CN) and 0.692 (±0.061) for mild cognitive impairment (MCI) vs. CN. Despite its strong performance, our model remains highly efficient, with only 1.56 million parameters, over 40 times fewer than ResNet-34 (63M) and Swin UNETR (61.98M). These findings demonstrate the potential of cross-attention fusion for improving AD diagnosis while maintaining computational efficiency.
Submitted 1 March, 2025;
originally announced March 2025.
-
Early Autism Diagnosis based on Path Signature and Siamese Unsupervised Feature Compressor
Authors:
Zhuowen Yin,
Xinyao Ding,
Xin Zhang,
Zhengwang Wu,
Li Wang,
Xiangmin Xu,
Gang Li
Abstract:
Autism Spectrum Disorder (ASD) is a growing public health threat. Early diagnosis of ASD is crucial for timely, effective intervention and treatment. However, conventional diagnostic methods based on communication and behavioral patterns are unreliable for children younger than 2 years of age. Given evidence of neurodevelopmental abnormalities in ASD infants, we turn to a novel deep learning-based method to extract key features from the inherently scarce, class-imbalanced, and heterogeneous structural MR images for early autism diagnosis. Specifically, we propose a Siamese verification framework to extend the scarce data, and an unsupervised compressor to alleviate data imbalance by extracting key features. We also propose weight constraints to cope with sample heterogeneity by giving different samples different voting weights during validation, and we use Path Signature to extract meaningful developmental features from the longitudinal two-time-point data. We further identify the brain regions that the model focuses on for autism diagnosis. Extensive experiments show that our method performs well under practical scenarios, outperforming existing machine learning methods and providing anatomical insights for early autism diagnosis.
Submitted 2 May, 2024; v1 submitted 12 July, 2023;
originally announced July 2023.
-
Effective drug combination for Caenorhabditis elegans nematodes discovered by output-driven feedback system control technique
Authors:
Xianting Ding,
Zach Njus,
Taejoon Kong,
Wenqiong Su,
Chih-Ming Ho,
Santosh Pandey
Abstract:
Infections from parasitic nematodes (or roundworms) contribute to a significant disease burden and productivity losses for humans and livestock. The limited number of anthelmintics (or antinematode drugs) available today to treat these infections are rapidly losing their efficacy as multidrug resistance in parasites becomes a global health challenge. We propose an engineering approach to discover an anthelmintic drug combination that is more potent at killing wild-type Caenorhabditis elegans worms than four individual drugs. In the experiment, freely swimming single worms are enclosed in microfluidic drug environments to assess the centroid velocity and track curvature of worm movements. After the behavioral data are analyzed in each iteration, the feedback system control (FSC) scheme is used to predict new drug combinations to test. Through a differential evolutionary search, we reach a winning drug combination that produces minimal centroid velocity and high track curvature, while requiring each drug at less than its EC50 concentration. The FSC approach is model-less and does not need any information on the drug pharmacology, signaling pathways, or animal biology. Toward combating multidrug resistance, the method presented here is applicable to the discovery of new potent combinations of available anthelmintics on C. elegans, parasitic nematodes, and other small model organisms.
Submitted 25 May, 2022;
originally announced May 2022.
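The abstract above describes an iterative loop in which measured responses drive a differential evolutionary search over drug combinations. A minimal sketch of a generic differential evolution loop (the common rand/1/bin variant) is given below. It is not the authors' FSC implementation: in the paper the objective is measured experimentally on worms, whereas here `toy_response`, its `HYPOTHETICAL_OPTIMUM`, the dose bounds (fractions of EC50), and all hyperparameters are assumptions made up purely for illustration.

```python
import random

# Hypothetical stand-in for the experimental readout: lower is "better".
# The optimum below is invented for illustration, not taken from the paper.
HYPOTHETICAL_OPTIMUM = [0.3, 0.7, 0.1, 0.5]


def toy_response(doses):
    """Squared distance from a made-up optimal dose vector."""
    return sum((d - t) ** 2 for d, t in zip(doses, HYPOTHETICAL_OPTIMUM))


def differential_evolution(objective, bounds, pop_size=12, generations=40,
                           weight=0.8, crossover=0.9, seed=0):
    """Generic DE/rand/1/bin minimizer; returns (best, best_score, history)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    history = [min(scores)]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct population members, none of them the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < crossover:
                    value = pop[a][d] + weight * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the allowed dose range
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_score = objective(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
        history.append(min(scores))
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best], history


# Four drugs, each dosed between 0 and 1 times its EC50 (an assumed encoding).
best_doses, best_score, history = differential_evolution(
    toy_response, [(0.0, 1.0)] * 4
)
print(best_doses, best_score)
```

Because selection is greedy, the best score in `history` never gets worse from one generation to the next. In a wet-lab setting each call to the objective would correspond to running an experiment, so population size and generation count would be chosen with that cost in mind.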
-
AlphaFold Accelerates Artificial Intelligence Powered Drug Discovery: Efficient Discovery of a Novel Cyclin-dependent Kinase 20 (CDK20) Small Molecule Inhibitor
Authors:
Feng Ren,
Xiao Ding,
Min Zheng,
Mikhail Korzinkin,
Xin Cai,
Wei Zhu,
Alexey Mantsyzov,
Alex Aliper,
Vladimir Aladinskiy,
Zhongying Cao,
Shanshan Kong,
Xi Long,
Bonnie Hei Man Liu,
Yingtao Liu,
Vladimir Naumov,
Anastasia Shneyderman,
Ivan V. Ozerov,
Ju Wang,
Frank W. Pun,
Alan Aspuru-Guzik,
Michael Levitt,
Alex Zhavoronkov
Abstract:
The AlphaFold computer program predicted protein structures for the whole human genome, which has been considered a remarkable breakthrough in both artificial intelligence (AI) applications and structural biology. Despite their varying confidence levels, these predicted structures could still contribute significantly to structure-based drug design for novel targets, especially those with no or limited structural information. In this work, we successfully applied AlphaFold in our end-to-end AI-powered drug discovery engines, consisting of the biocomputational platform PandaOmics and the generative chemistry platform Chemistry42, to identify a first-in-class hit molecule for a novel target without an experimental structure, from target selection through hit identification, in a cost- and time-efficient manner. PandaOmics provided the targets of interest and Chemistry42 generated the molecules based on the AlphaFold-predicted structure, and the selected molecules were synthesized and tested in biological assays. Through this approach, we identified a small molecule hit compound for CDK20 with a Kd value of 8.9 +/- 1.6 μM (n = 4) within 30 days of target selection and after synthesizing only 7 compounds. Based on the available data, a second round of AI-powered compound generation was conducted, through which a more potent hit molecule, ISM042-2-048, was discovered with a Kd value of 210.0 +/- 42.4 nM (n = 2), within 30 days of the discovery of the first hit, ISM042-2-001, and after synthesizing 6 compounds. To the best of our knowledge, this is the first reported small molecule targeting CDK20 and, more importantly, this work is the first demonstration of AlphaFold being applied to the hit identification process in early drug discovery.
Submitted 12 February, 2022; v1 submitted 21 January, 2022;
originally announced January 2022.
-
Computational neurology: Computational modeling approaches in dementia
Authors:
KongFatt Wong-Lin,
Jose M. Sanchez-Bornot,
Niamh McCombe,
Daman Kaur,
Paula L. McClean,
Xin Zou,
Vahab Youssofzadeh,
Xuemei Ding,
Magda Bucholc,
Su Yang,
Girijesh Prasad,
Damien Coyle,
Liam P. Maguire,
Haiying Wang,
Hui Wang,
Nadim A. A. Atiya,
Alok Joshi
Abstract:
Dementia is a collection of symptoms associated with impaired cognition that impedes normal everyday functioning. Dementia, with Alzheimer's disease constituting its most common type, is highly complex in terms of etiology and pathophysiology. A more quantitative or computational attitude towards dementia research, or more generally in neurology, is becoming necessary: Computational Neurology. We provide a focused review of some computational approaches that have been developed and applied to the study of dementia, particularly Alzheimer's disease. Both mechanistic modeling and data-driven approaches, including AI or machine learning, are discussed. Linkage to clinical decision support systems for dementia diagnosis is also discussed.
Submitted 5 May, 2020;
originally announced May 2020.
-
A machine learning methodology for real-time forecasting of the 2019-2020 COVID-19 outbreak using Internet searches, news alerts, and estimates from mechanistic models
Authors:
Dianbo Liu,
Leonardo Clemente,
Canelle Poirier,
Xiyu Ding,
Matteo Chinazzi,
Jessica T Davis,
Alessandro Vespignani,
Mauricio Santillana
Abstract:
We present a timely and novel methodology that combines disease estimates from mechanistic models with digital traces, via interpretable machine-learning methodologies, to reliably forecast COVID-19 activity in Chinese provinces in real-time. Specifically, our method is able to produce stable and accurate forecasts 2 days ahead of the current time, and uses as inputs (a) official health reports from the Chinese Center for Disease Control and Prevention (China CDC), (b) COVID-19-related internet search activity from Baidu, (c) news media activity reported by Media Cloud, and (d) daily forecasts of COVID-19 activity from GLEAM, an agent-based mechanistic model. Our machine-learning methodology uses a clustering technique that enables the exploitation of geo-spatial synchronicities of COVID-19 activity across Chinese provinces, and a data augmentation technique to deal with the small number of historical disease activity observations, characteristic of emerging outbreaks. Our model outperforms a collection of baseline models in 27 out of the 32 Chinese provinces, and could be easily extended to other geographies currently affected by the COVID-19 outbreak to help decision makers.
Submitted 8 April, 2020;
originally announced April 2020.