-
ILETIA: An AI-enhanced method for individualized trigger-oocyte pickup interval estimation of progestin-primed ovarian stimulation protocol
Authors:
Binjian Wu,
Qian Li,
Zhe Kuang,
Hongyuan Gao,
Xinyi Liu,
Haiyan Guo,
Qiuju Chen,
Xinyi Liu,
Yangruizhe Jiang,
Yuqi Zhang,
Jinyin Zha,
Mingyu Li,
Qiuhan Ren,
Sishuo Feng,
Haicang Zhang,
Xuefeng Lu,
Jian Zhang
Abstract:
In vitro fertilization-embryo transfer (IVF-ET) is one of the most prevalent treatments for infertility. During an IVF-ET cycle, the time interval between the trigger shot and oocyte pickup (OPU) is a pivotal period for follicular maturation, which determines the yield of mature oocytes and affects the success of subsequent procedures. However, accurate prediction of this interval is severely hindered by variability in clinicians' experience, which often leads to a suboptimal oocyte retrieval rate. To address this challenge, we propose ILETIA, the first machine learning-based method that can predict the optimal trigger-OPU interval for patients receiving the progestin-primed ovarian stimulation (PPOS) protocol. Specifically, ILETIA leverages a Transformer to learn representations from clinical tabular data, and then employs gradient-boosted trees for interval prediction. For model training and evaluation, we compiled a dataset, PPOS-DS, of nearly ten thousand patients receiving the PPOS protocol, the largest such dataset to our knowledge. Experimental results demonstrate that our method achieves strong performance (AUROC = 0.889), outperforming both clinicians and other widely used computational models. Moreover, ILETIA also supports prediction of premature ovulation risk at a specific OPU time (AUROC = 0.838). Collectively, by enabling more precise and individualized decisions, ILETIA has the potential to improve clinical outcomes and lay the foundation for future IVF-ET research.
Submitted 25 January, 2025;
originally announced January 2025.
-
Using Deep Learning Sequence Models to Identify SARS-CoV-2 Divergence
Authors:
Yanyi Ding,
Zhiyi Kuang,
Yuxin Pei,
Jeff Tan,
Ziyu Zhang,
Joseph Konan
Abstract:
SARS-CoV-2 is an upper respiratory system RNA virus that had caused over 3 million deaths and infected over 150 million people worldwide as of May 2021. With thousands of strains sequenced to date, SARS-CoV-2 mutations pose significant challenges to scientists in keeping pace with vaccine development and public health measures. Therefore, an efficient method of identifying the divergence of lab samples from patients would greatly aid the documentation of SARS-CoV-2 genomics. In this study, we propose a neural network model that leverages recurrent and convolutional units to directly take in amino acid sequences of spike proteins and classify the corresponding clades. We also compare our model's performance with Bidirectional Encoder Representations from Transformers (BERT) pre-trained on a protein database. Our approach has the potential to provide a more computationally efficient alternative to current homology-based intra-species differentiation.
Submitted 12 November, 2021;
originally announced November 2021.
-
High-Throughput Machine Learning from Electronic Health Records
Authors:
Ross S. Kleiman,
Paul S. Bennett,
Peggy L. Peissig,
Richard L. Berg,
Zhaobin Kuang,
Scott J. Hebbring,
Michael D. Caldwell,
David Page
Abstract:
The widespread digitization of patient data via electronic health records (EHRs) has created an unprecedented opportunity to use machine learning algorithms to better predict disease risk at the patient level. Although predictive models have previously been constructed for a few important diseases, such as breast cancer and myocardial infarction, we currently know very little about how accurately the risk for most diseases or events can be predicted, and how far in advance. Machine learning algorithms use training data rather than preprogrammed rules to make predictions and are well suited for the complex task of disease prediction. Although there are thousands of conditions and illnesses patients can encounter, no prior research simultaneously predicts risks for thousands of diagnosis codes and thereby establishes a comprehensive patient risk profile. Here we show that such pandiagnostic prediction is possible with a high level of performance across diagnosis codes. For the tasks of predicting diagnosis risks both 1 and 6 months in advance, we achieve average areas under the receiver operating characteristic curve (AUCs) of 0.803 and 0.758, respectively, across thousands of prediction tasks. Finally, our research contributes a new clinical prediction dataset in which researchers can explore how well a diagnosis can be predicted and what health factors are most useful for prediction. For the first time, we can get a much more complete picture of how well risks for thousands of different diagnosis codes can be predicted.
Submitted 3 July, 2019;
originally announced July 2019.
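As a rough illustration of the headline metric in the abstract above, the following sketch computes an AUC per prediction task and then averages across tasks. It is not the authors' pipeline: the unweighted (macro) average, the function names `roc_auc` and `mean_auc`, and the toy data are all assumptions made for illustration.

```python
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_auc(tasks):
    """Unweighted average AUC over {task_name: (labels, scores)}.

    Assumption: every task counts equally, regardless of its size.
    """
    return mean(roc_auc(labels, scores) for labels, scores in tasks.values())


# Hypothetical toy data: two made-up prediction tasks, not real diagnosis codes.
tasks = {
    "code_A": ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),
    "code_B": ([1, 0, 0, 1], [0.8, 0.7, 0.1, 0.3]),
}

if __name__ == "__main__":
    for name, (labels, scores) in tasks.items():
        print(name, roc_auc(labels, scores))
    print("mean AUC:", mean_auc(tasks))
```

The pairwise formulation is quadratic in the number of examples; it is used here only because it makes the definition explicit. A rank-based implementation would be preferable at the scale the abstract describes.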
-
A Simple Text Mining Approach for Ranking Pairwise Associations in Biomedical Applications
Authors:
Finn Kuusisto,
John Steill,
Zhaobin Kuang,
James Thomson,
David Page,
Ron Stewart
Abstract:
We present a simple text mining method that is easy to implement, requires minimal data collection and preparation, and is easy to use for proposing ranked associations between a list of target terms and a key phrase. We call this method KinderMiner, and apply it to two biomedical applications. The first application is to identify relevant transcription factors for cell reprogramming, and the second is to identify potential drugs for investigation in drug repositioning. We compare the results from our algorithm to existing data and state-of-the-art algorithms, demonstrating compelling results for both application areas. While we apply the algorithm here for biomedical applications, we argue that the method is generalizable to any available corpus of sufficient size.
Submitted 12 June, 2019;
originally announced June 2019.
-
TransPath: A Computational Method to Study the Ion Transit Pathways in Membrane Channels
Authors:
Z. Kuang,
A. Liu,
T. L. Beck
Abstract:
The finely tuned structures of membrane channels allow selective passage of ions through the available aqueous pores. In order to understand channel function, it is crucial to locate the pore and study its physical and chemical properties. Recently obtained X-ray crystal structures of bacterial chloride channel homologues reveal a complicated topology with curvilinear pores. The commonly used HOLE program encounters difficulties in studying such pores. Here we propose a new pore-searching algorithm (TransPath) which uses the Configurational Bias Monte Carlo (CBMC) method to generate transmembrane trajectories driven by both geometric and electrostatic features. The trajectories are binned into groups determined by a vector distance criterion.
From each group, a representative trajectory is selected based on the Rosenbluth weight, and the geometrically optimal path is obtained by simulated annealing. Candidate ion pathways can then be determined by analysis of the radius and potential profiles. The proposed method and its implementation are illustrated using the bacterial KcsA potassium channel as an example. The procedure is then applied to the more complex structures of the bacterial E. coli ClC channel homologues.
Submitted 22 July, 2005;
originally announced July 2005.
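The grouping and selection steps mentioned in the TransPath abstract can be sketched roughly as follows. This is a minimal illustration, not the published algorithm: the abstract does not specify the distance criterion or the selection rule, so the Euclidean distance over matched points, the greedy comparison against each group's first member, the `cutoff` parameter, and the choice of the highest-weight trajectory are all assumptions. The trajectories and weights are made-up toy values.

```python
import math


def distance(traj_a, traj_b):
    """Euclidean distance between two trajectories with the same number of 3D points."""
    return math.sqrt(
        sum((x - y) ** 2 for p, q in zip(traj_a, traj_b) for x, y in zip(p, q))
    )


def group_trajectories(trajectories, cutoff):
    """Greedily bin trajectory indices: join the first group whose seed is within cutoff."""
    groups = []
    for idx, traj in enumerate(trajectories):
        for group in groups:
            seed = trajectories[group[0]]  # first member acts as the group's seed
            if distance(seed, traj) <= cutoff:
                group.append(idx)
                break
        else:
            groups.append([idx])  # no group was close enough: start a new one
    return groups


def representatives(groups, weights):
    """Pick one index per group; assumption: the largest weight wins."""
    return [max(group, key=lambda i: weights[i]) for group in groups]


# Hypothetical toy trajectories (two points each) and made-up weights.
trajectories = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
    [(0.1, 0.0, 0.0), (0.0, 0.1, 1.0)],
    [(5.0, 5.0, 0.0), (5.0, 5.0, 1.0)],
]
weights = [0.2, 0.7, 0.4]

if __name__ == "__main__":
    groups = group_trajectories(trajectories, cutoff=1.0)
    print("groups:", groups)
    print("representatives:", representatives(groups, weights))
```

In this sketch the representative of each group would then be handed to a refinement step; the simulated annealing stage described in the abstract is not reproduced here.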