-
ULV: A robust statistical method for clustered data, with applications to multisubject, single-cell omics data
Authors:
Mingyu Du,
Kevin Johnston,
Veronica Berrocal,
Wei Li,
Xiangmin Xu,
Zhaoxia Yu
Abstract:
Molecular and genomic technological advancements have greatly enhanced our understanding of biological processes by allowing us to quantify key biological variables such as gene expression, protein levels, and microbiome compositions. These breakthroughs have enabled us to achieve increasingly higher levels of resolution in our measurements, exemplified by our ability to comprehensively profile biological information at the single-cell level. However, the analysis of such data faces several critical challenges: limited number of individuals, non-normality, potential dropouts, outliers, and repeated measurements from the same individual. In this article, we propose a novel method, which we call U-statistic based latent variable (ULV). Our proposed method takes advantage of the robustness of rank-based statistics and exploits the statistical efficiency of parametric methods for small sample sizes. It is a computationally feasible framework that addresses all the issues mentioned above simultaneously. An additional advantage of ULV is its flexibility in modeling various types of single-cell data, including both RNA and protein abundance. The usefulness of our method is demonstrated in two studies: a single-cell proteomics study of acute myelogenous leukemia (AML) and a single-cell RNA study of COVID-19 symptoms. In the AML study, ULV successfully identified differentially expressed proteins that would have been missed by the pseudobulk version of the Wilcoxon rank-sum test. In the COVID-19 study, ULV identified genes associated with covariates such as age and gender, and genes that would be missed without adjusting for covariates. The differentially expressed genes identified by our method are less biased toward genes with high expression levels. Furthermore, ULV identified additional gene pathways likely contributing to the mechanisms of COVID-19 severity.
Submitted 10 June, 2024;
originally announced June 2024.
-
Protein sequence design with deep generative models
Authors:
Zachary Wu,
Kadina E. Johnston,
Frances H. Arnold,
Kevin K. Yang
Abstract:
Protein engineering seeks to identify protein sequences with optimized properties. When guided by machine learning, protein sequence generation methods can draw on prior knowledge and experimental efforts to improve this process. In this review, we highlight recent applications of machine learning to generate protein sequences, focusing on the emerging field of deep generative methods.
Submitted 9 April, 2021;
originally announced April 2021.
-
Time-varying $\ell_0$ optimization for Spike Inference from Multi-Trial Calcium Recordings
Authors:
Tong Shen,
Kevin Johnston,
Gyorgy Lur,
Michele Guindani,
Hernando Ombao,
Zhaoxia Yu
Abstract:
Optical imaging of genetically encoded calcium indicators is a powerful tool to record the activity of a large number of neurons simultaneously over a long period of time from freely behaving animals. However, determining the exact time at which a neuron spikes and estimating the underlying firing rate from calcium fluorescence data remains challenging, especially for calcium imaging data obtained from a longitudinal study. We propose a multi-trial time-varying $\ell_0$ penalized method to jointly detect spikes and estimate firing rates by robustly integrating evolving neural dynamics across trials. Our simulation study shows that the proposed method performs well in both spike detection and firing rate estimation. We demonstrate the usefulness of our method on calcium fluorescence trace data from two studies, with the first study showing differential firing rate functions between two behaviors and the second study showing evolving firing rate function across trials due to learning.
Submitted 1 March, 2021;
originally announced March 2021.
-
Orbital Separation Amplification in Fragile Binaries with Evolved Components
Authors:
Kyle B. Johnston,
Terry D. Oswalt,
David Valls-Gabaud
Abstract:
The secular stellar mass-loss causes an amplification of the orbital separation in fragile, common proper motion, binary systems with separations of the order of 1000 A.U. In these systems, companions evolve as two independent coeval stars as they experience negligible mutual tidal interactions or mass transfer. We present models for how post-main sequence mass-loss statistically distorts the frequency distribution of separations in fragile binaries. These models demonstrate the expected increase in orbital separation resulting from stellar mass-loss, as well as a perturbation of associated orbital parameters. Comparisons between our models and observations resulting from the Luyten survey of wide visual binaries, specifically those containing MS and white-dwarf pairs, demonstrate a good agreement between the calculated and the observed angular separation distribution functions.
Submitted 22 November, 2011; v1 submitted 17 November, 2011;
originally announced November 2011.