-
Challenges and perspectives in computational deconvolution of genomics data
Authors:
Lana X. Garmire,
Yijun Li,
Qianhui Huang,
Chuan Xu,
Sarah Teichmann,
Naftali Kaminski,
Matteo Pellegrini,
Quan Nguyen,
Andrew E. Teschendorff
Abstract:
Deciphering cell type heterogeneity is crucial for systematically understanding tissue homeostasis and its dysregulation in diseases. Computational deconvolution is an efficient approach for estimating cell type abundances from a variety of omics data. Despite significant methodological progress in computational deconvolution in recent years, challenges remain. Here we list four significant challenges in computational deconvolution: the quality of reference data, the generation of ground truth data, the limitations of computational methodologies, and benchmarking design and implementation. Finally, we make recommendations on reference data generation, new directions for computational methodologies, and strategies to promote rigorous benchmarking.
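To illustrate what "estimating cell type abundances" means, here is a minimal sketch of reference-based deconvolution, assuming the common linear mixing model b ≈ S·p, where S is a genes × cell types reference (signature) matrix, b is a bulk expression profile, and p holds non-negative cell type weights. The function name `deconvolve_nnls`, the toy signature values, and the choice of projected gradient descent as the non-negative least squares solver are illustrative assumptions, not methods from the paper.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def deconvolve_nnls(signature: Matrix, bulk: Vector, n_iter: int = 5000) -> Vector:
    """Hypothetical helper: estimate proportions p >= 0 with signature @ p ~= bulk.

    signature: genes x cell types reference matrix.
    bulk: observed bulk expression for the same genes.
    Minimizes ||S p - b||^2 by projected gradient descent, then rescales
    the non-negative weights so they sum to 1.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # Step size 1/L with L = 2 * ||S||_F^2, an upper bound on the Lipschitz
    # constant of the gradient 2 S^T (S p - b).
    frob_sq = sum(v * v for row in signature for v in row)
    step = 1.0 / (2.0 * frob_sq)
    p = [0.0] * n_types
    for _ in range(n_iter):
        # Residual r = S p - b for every gene
        residual = [sum(signature[g][k] * p[k] for k in range(n_types)) - bulk[g]
                    for g in range(n_genes)]
        # Gradient 2 S^T r for every cell type
        grad = [2.0 * sum(signature[g][k] * residual[g] for g in range(n_genes))
                for k in range(n_types)]
        # Gradient step followed by projection onto the constraint p >= 0
        p = [max(0.0, p[k] - step * grad[k]) for k in range(n_types)]
    total = sum(p)
    return [x / total for x in p] if total > 0 else p


# Toy example: 4 marker genes, 2 cell types, bulk built from p = [0.3, 0.7].
S = [[10, 1], [8, 2], [1, 9], [0, 12]]
bulk = [3.7, 3.8, 6.6, 8.4]
proportions = deconvolve_nnls(S, bulk)
print([round(x, 3) for x in proportions])  # [0.3, 0.7]
```

The non-negativity projection keeps estimates biologically meaningful even when an unconstrained least squares fit would go negative. Published tools use a range of solvers, and the accuracy of any of them depends heavily on how well S reflects the cell types actually present, which is the reference-data challenge raised above.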
Submitted 2 September, 2023; v1 submitted 21 November, 2022;
originally announced November 2022.
-
Emerging Artificial Intelligence Applications in Spatial Transcriptomics Analysis
Authors:
Yijun Li,
Stefan Stanojevic,
Lana X. Garmire
Abstract:
Spatial transcriptomics (ST) has advanced significantly in the last few years. Such advancement comes with the urgent need for novel computational methods to handle the unique challenges of ST data analysis. Many artificial intelligence (AI) methods have been developed to utilize various machine learning and deep learning techniques for computational ST analysis. This review provides a comprehensive and up-to-date survey of current AI methods for ST analysis.
Submitted 17 March, 2022;
originally announced March 2022.
-
Computational Methods for Single-Cell Multi-Omics Integration and Alignment
Authors:
Stefan Stanojevic,
Yijun Li,
Lana X. Garmire
Abstract:
Recently developed technologies to generate single-cell genomic data have made a revolutionary impact in the field of biology. Multi-omics assays offer even greater opportunities to understand cellular states and biological processes. However, the problem of integrating different -omics data with very different dimensionality and statistical properties remains quite challenging. A growing body of computational tools is being developed for this task, leveraging ideas ranging from machine translation to the theory of networks and representing a new frontier at the interface of biology and data science. Our goal in this review paper is to provide a comprehensive, up-to-date survey of computational techniques for the integration of multi-omics data and the alignment of multiple modalities of genomics data in the single-cell research field.
Submitted 17 January, 2022;
originally announced January 2022.
-
Blood-derived lncRNAs as potential biomarkers for early cancer diagnosis: The Good, the Bad and the Beauty
Authors:
Cedric Badowski,
Bing He,
Lana X Garmire
Abstract:
Cancer ranks as one of the deadliest diseases worldwide. The high mortality rate associated with cancer is partially due to the lack of reliable early detection methods and/or inaccurate diagnostic tools such as certain protein biomarkers. Cell-free nucleic acids (cfNA) such as circulating long non-coding RNAs (lncRNAs) have recently been proposed as a new class of potential biomarkers that could improve cancer diagnosis. The reported correlation between circulating lncRNA levels and the presence of tumors has triggered a great deal of interest among clinicians and scientists, who have been actively investigating their potential as reliable cancer biomarkers. In this report, we review the progress achieved (the Good) and the challenges encountered (the Bad) in the development of circulating lncRNAs as potential biomarkers for early cancer diagnosis. We report and discuss the specificity and sensitivity issues of blood-based lncRNAs currently considered promising biomarkers for various cancers, such as hepatocellular carcinoma, colorectal cancer, gastric cancer and prostate cancer. We also emphasize the potential clinical applications (the Beauty) of circulating lncRNAs as both therapeutic targets and agents, in addition to their diagnostic and prognostic capabilities. Based on different published works, we finally provide recommendations for investigators who seek to measure and compare the levels of circulating lncRNAs in the blood of cancer patients and healthy subjects by RT-qPCR or next-generation sequencing.
Submitted 24 September, 2021;
originally announced September 2021.
-
ASGARD: A Single-cell Guided pipeline to Aid Repurposing of Drugs
Authors:
Bing He,
Yao Xiao,
Haodong Liang,
Qianhui Huang,
Yuheng Du,
Yijun Li,
David Garmire,
Duxin Sun,
Lana X. Garmire
Abstract:
Intercellular heterogeneity is a major obstacle to successful precision medicine. Single-cell RNA sequencing (scRNA-seq) technology has enabled in-depth analysis of intercellular heterogeneity in various diseases. However, its full potential for precision medicine has yet to be reached. Towards this goal, we propose a new drug recommendation system called A Single-cell Guided Pipeline to Aid Repurposing of Drugs (ASGARD). ASGARD defines a novel drug score that predicts drugs by considering all cell clusters, addressing the intercellular heterogeneity within each patient. We tested ASGARD on multiple diseases, including breast cancer, acute lymphoblastic leukemia, and coronavirus disease 2019 (COVID-19). On single-drug therapy, ASGARD shows significantly better average accuracy (AUC of 0.92) than two other bulk-cell-based drug repurposing methods (AUC of 0.80 and 0.76). It also performs considerably better (AUC of 0.82) than other cell-cluster-level prediction methods (AUC of 0.67 and 0.55). In addition, ASGARD is validated by the drug response prediction method TRANSACT using triple-negative breast cancer patient samples. Many top-ranked drugs are either approved by the FDA or in clinical trials for the corresponding diseases. In silico cell-type-specific drop-out experiments using triple-negative breast cancers show the importance of T cells in the tumor microenvironment in affecting drug predictions. In conclusion, ASGARD is a promising drug repurposing recommendation tool guided by single-cell RNA-seq for personalized medicine. ASGARD is free for educational use at https://github.com/lanagarmire/ASGARD.
Submitted 22 December, 2022; v1 submitted 13 September, 2021;
originally announced September 2021.
-
Cox-nnet v2.0: improved neural-network based survival prediction extended to large-scale EMR dataset
Authors:
Di Wang,
Kevin He,
Lana X Garmire
Abstract:
Cox-nnet is a neural-network-based prognosis prediction method, originally applied to genomics data. Here we propose version 2 of Cox-nnet, with significant improvements in efficiency and interpretability, making it suitable for predicting prognosis from large-scale electronic medical record (EMR) datasets. We also add permutation-based feature importance scores and the direction of feature coefficients. Applied to an EMR dataset of OPTN kidney transplantation, Cox-nnet v2.0 reduces the training time of Cox-nnet up to 32-fold (n=10,000) and achieves better prediction accuracy than Cox-PH (p<0.05). Availability and implementation: Cox-nnet v2.0 is freely available to the public at https://github.com/lanagarmire/Cox-nnet-v2.0
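The abstract mentions permutation-based feature importance scores. A minimal, generic sketch of that idea for survival models is shown here; it is not Cox-nnet v2.0's code. The fixed linear risk function stands in for a trained network, Harrell's concordance index (C-index) as the scoring metric is an assumption, and all function names and toy data are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def concordance_index(times: Sequence[float], events: Sequence[int],
                      risks: Sequence[float]) -> float:
    """Harrell's C-index: share of comparable pairs ranked correctly.

    A pair (i, j) is comparable when sample i has an observed event and
    times[i] < times[j]; it is concordant when i (the earlier failure)
    has the higher predicted risk. Tied risks count as half.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored samples cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def permutation_importance(predict: Callable[[List[Row]], List[float]],
                           X: List[Row], times: Sequence[float],
                           events: Sequence[int], n_repeats: int = 20,
                           seed: int = 0) -> List[float]:
    """Mean drop in C-index when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = concordance_index(times, events, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and outcome
            X_perm = [row[:j] + [column[i]] + row[j + 1:]
                      for i, row in enumerate(X)]
            drops.append(baseline - concordance_index(times, events, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: feature 0 drives risk, feature 1 is ignored by the model.
weights = [1.0, 0.0]


def predict(X: List[Row]) -> List[float]:
    # Stand-in for a trained network: a fixed linear risk score.
    return [sum(w * x for w, x in zip(weights, row)) for row in X]


times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
X = [[8.0 - k, float(v)] for k, v in enumerate([3, 1, 4, 1, 5, 9, 2, 6])]
print(permutation_importance(predict, X, times, events))
```

Because the model gives feature 1 zero weight, shuffling it leaves every risk score, and hence the C-index, unchanged, so its importance is exactly 0.0. Feature 0 receives a positive score because shuffles almost always break the perfect risk ordering. Permutation importance is model-agnostic, which is why it suits neural-network models whose weights are hard to read directly.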
Submitted 9 September, 2020;
originally announced September 2020.
-
Strategies to integrate multi-omics data for patient survival prediction
Authors:
Lana X Garmire
Abstract:
Genomics, especially multi-omics, has made precision medicine feasible. Complete and publicly accessible multi-omics resources with clinical outcomes, such as The Cancer Genome Atlas (TCGA), are great test beds for developing computational methods that integrate multi-omics data to predict patient cancer phenotypes. We have been utilizing TCGA multi-omics data to predict cancer patient survival using a variety of approaches, including prior biological knowledge (such as pathways) and, more recently, deep-learning methods. Over time, we have developed methods such as Cox-nnet, DeepProg, and two-stage Cox-nnet to address the challenges posed by multi-omics and multi-modality data. Despite the limited sample size (hundreds to thousands) of the training datasets and the heterogeneous nature of human populations, these methods have shown significant and robust performance in predicting patient survival in independent population cohorts. In the following, we describe in detail these methodologies, the modeling results, and important biological insights revealed by these methods.
Submitted 27 August, 2020;
originally announced August 2020.
-
Single Cell Transcriptome Research in Human Placenta
Authors:
Hui Li,
Qianhui Huang,
Yu Liu,
Lana X Garmire
Abstract:
The human placenta is a complex and heterogeneous organ that interfaces between the mother and the fetus and supports fetal development. Alterations to placental structural components are associated with various pregnancy complications. To reveal the heterogeneity among placental cell types in normal and diseased placentas, as well as to elucidate molecular interactions within populations of placental cells, a new genomics technology called single-cell RNA sequencing (scRNA-seq) has been employed in the last couple of years. Here we review the principles of scRNA-seq technology and summarize recent scRNA-seq studies of the human placenta across gestational ages as well as in pregnancy complications such as preterm birth and preeclampsia. We list the computational analysis platforms and resources available for public use. Lastly, we discuss future areas of interest for placenta single-cell studies, as well as the data analytics needed to accomplish them.
Submitted 7 August, 2020;
originally announced August 2020.
-
Recommendations to enhance rigor and reproducibility in biomedical research
Authors:
Jaqueline J. Brito,
Jun Li,
Jason H. Moore,
Casey S. Greene,
Nicole A. Nogoy,
Lana X. Garmire,
Serghei Mangul
Abstract:
Computational methods have reshaped the landscape of modern biology. While the biomedical community is increasingly dependent on computational tools, the mechanisms ensuring open data, open software, and reproducibility are variably enforced by academic institutions, funders, and publishers. Publications may present academic software for which essential materials, such as source code and documentation, are or become unavailable. Publications that lack such information compromise the role of peer review in evaluating technical strength and scientific contribution. Incomplete ancillary information for an academic software package may bias or limit any subsequent work produced with the tool. We provide eight recommendations across four different domains to improve reproducibility, transparency, and rigor in computational biology, which are precisely the main values that should be emphasized in life science curricula. Our recommendations for improving software availability, usability, and archival stability aim to foster a sustainable data science ecosystem in biomedicine and life science research.
Submitted 27 July, 2020; v1 submitted 14 January, 2020;
originally announced January 2020.