-
GRAPE: Heterogeneous Graph Representation Learning for Genetic Perturbation with Coding and Non-Coding Biotype
Authors:
Changxi Chi,
Jun Xia,
Jingbo Zhou,
Jiabei Cheng,
Chang Yu,
Stan Z. Li
Abstract:
Predicting genetic perturbations enables the identification of potentially crucial genes prior to wet-lab experiments, significantly improving overall experimental efficiency. Since genes are the foundation of cellular life, building gene regulatory networks (GRNs) is essential to understand and predict the effects of genetic perturbations. However, current methods fail to fully leverage gene-related information and rely solely on simple evaluation metrics to construct coarse-grained GRNs. More importantly, they ignore functional differences between biotypes, limiting the ability to capture potential gene interactions. In this work, we leverage a pre-trained large language model and a DNA sequence model to extract features from gene descriptions and DNA sequence data, respectively, which serve as the initialization for gene representations. Additionally, we introduce gene biotype information into genetic perturbation prediction for the first time, simulating the distinct roles of genes with different biotypes in regulating cellular processes, while capturing implicit gene relationships through graph structure learning (GSL). We propose GRAPE, a heterogeneous graph neural network (HGNN) that leverages gene representations initialized with features from descriptions and sequences, models the distinct roles of genes with different biotypes, and dynamically refines the GRN through GSL. The results on publicly available datasets show that our method achieves state-of-the-art performance.
Submitted 5 May, 2025;
originally announced May 2025.
-
Mitigating mode collapse in normalizing flows by annealing with an adaptive schedule: Application to parameter estimation
Authors:
Yihang Wang,
Chris Chi,
Aaron R. Dinner
Abstract:
Normalizing flows (NFs) provide uncorrelated samples from complex distributions, making them an appealing tool for parameter estimation. However, the practical utility of NFs remains limited by their tendency to collapse to a single mode of a multimodal distribution. In this study, we show that annealing with an adaptive schedule based on the effective sample size (ESS) can mitigate mode collapse. We demonstrate that our approach can converge the marginal likelihood for a biochemical oscillator model fit to time-series data in ten-fold less computation time than a widely used ensemble Markov chain Monte Carlo (MCMC) method. We show that the ESS can also be used to reduce variance by pruning the samples. We expect these developments to be of general use for sampling with NFs and discuss potential opportunities for further improvements.
Submitted 6 May, 2025;
originally announced May 2025.
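Note on the ESS-based schedule: the abstract above does not spell out the schedule, so the following is only a minimal sketch of one common way such a rule can work, not the authors' implementation. It computes the Kish effective sample size from log-weights and picks the next annealing parameter by bisection so that the ESS stays above a target fraction. The function names, the `target_frac` parameter, and the `log_ratio` input (log target density minus log base density at each sample) are all assumptions made for illustration.

```python
import math


def effective_sample_size(log_weights):
    """Kish ESS, (sum w)^2 / sum w^2, computed stably from log-weights."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    s = sum(w)
    s2 = sum(x * x for x in w)
    return s * s / s2


def next_beta(log_ratio, beta, target_frac=0.5, tol=1e-6):
    """Pick the next annealing parameter in (beta, 1] by bisection.

    log_ratio[i] is a hypothetical input: log target(x_i) - log base(x_i).
    Moving from beta to b reweights sample i by exp((b - beta) * log_ratio[i]);
    we look for the largest step that keeps ESS >= target_frac * N,
    assuming the ESS decreases as the step grows.
    """
    n = len(log_ratio)

    def ess_at(b):
        return effective_sample_size([(b - beta) * r for r in log_ratio])

    if ess_at(1.0) >= target_frac * n:
        return 1.0  # can jump straight to the target distribution
    lo, hi = beta, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ess_at(mid) >= target_frac * n:
            lo = mid
        else:
            hi = mid
    return lo


# Toy usage: four samples with very uneven log density ratios.
log_ratio = [0.0, 10.0, 20.0, 30.0]
beta_next = next_beta(log_ratio, beta=0.0, target_frac=0.5)
```

In this toy case a direct jump to the target would leave roughly one effective sample, so the rule takes a smaller step instead.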
-
Characterizing nonlinear dynamics by contrastive cartography
Authors:
Nicolas Romeo,
Chris Chi,
Aaron R. Dinner,
Elizabeth R. Jerison
Abstract:
The qualitative study of dynamical systems using bifurcation theory is key to understanding systems from biological clocks and neurons to physical phase transitions. Data generated from such systems can feature complex transients, an unknown number of attractors, and stochasticity. Characterization of these often-complicated behaviors remains challenging. Making an analogy to bifurcation analysis, which specifies that useful dynamical features are often invariant to coordinate transforms, we leverage contrastive learning to devise a generic tool to discover dynamical classes from stochastic trajectory data. By providing a model-free trajectory analysis tool, this method automatically recovers the dynamical phase diagram of known models and provides a "map" of dynamical behaviors for a large ensemble of dynamical systems. The method thus provides a way to characterize and compare dynamical trajectories without governing equations or prior knowledge of target behavior. We additionally show that the same strategy can be used to characterize the stochastic motion of bacteria, establishing that this approach can be used as a standalone analysis tool or as a component of a broader data-driven analysis framework for dynamical data.
Submitted 19 May, 2025; v1 submitted 30 January, 2025;
originally announced February 2025.
-
Sampling parameters of ordinary differential equations with Langevin dynamics that satisfy constraints
Authors:
Chris Chi,
Jonathan Weare,
Aaron R. Dinner
Abstract:
Fitting models to data to obtain distributions of consistent parameter values is important for uncertainty quantification, model comparison, and prediction. Standard Markov chain Monte Carlo (MCMC) approaches for fitting ordinary differential equations (ODEs) to time-series data involve proposing trial parameter sets, numerically integrating the ODEs forward in time, and accepting or rejecting the trial parameter sets. When the model dynamics depend nonlinearly on the parameters, as is generally the case, trial parameter sets are often rejected, and MCMC approaches become prohibitively computationally costly to converge. Here, we build on methods for numerical continuation and trajectory optimization to introduce an approach in which we use Langevin dynamics in the joint space of variables and parameters to sample models that satisfy constraints on the dynamics. We demonstrate the method by sampling Hopf bifurcations and limit cycles of a model of a biochemical oscillator in a Bayesian framework for parameter estimation, and we obtain more than a hundredfold speedup relative to a leading ensemble MCMC approach that requires numerically integrating the ODEs forward in time. We describe numerical experiments that provide insight into the speedup. The method is general and can be used in any framework for parameter estimation and model selection.
Submitted 27 August, 2024;
originally announced August 2024.
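Note on Langevin sampling: as a rough illustration of the basic ingredient named in the abstract above, the sketch below runs unadjusted overdamped Langevin dynamics on a one-dimensional standard Gaussian. It is not the constrained joint-space scheme the abstract describes; the function name `langevin_samples`, the step size, and the toy target are assumptions chosen only to show the update rule x <- x + h * grad log p(x) + sqrt(2h) * noise.

```python
import math
import random


def langevin_samples(grad_log_p, x0, step, n_steps, seed=0):
    """Unadjusted overdamped Langevin chain for a 1D target density p.

    Each update drifts along grad log p and adds Gaussian noise with
    variance 2 * step. There is no accept/reject correction, so the
    chain carries a small step-size bias.
    """
    rng = random.Random(seed)
    x = x0
    out = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x + step * grad_log_p(x) + math.sqrt(2.0 * step) * noise
        out.append(x)
    return out


# Toy target: standard Gaussian, for which grad log p(x) = -x.
samples = langevin_samples(lambda x: -x, x0=3.0, step=0.05, n_steps=50000)
burned = samples[5000:]  # discard the initial transient
mean = sum(burned) / len(burned)
var = sum((s - mean) ** 2 for s in burned) / len(burned)
```

For this target the empirical mean and variance should land near 0 and 1, up to sampling noise and the step-size bias of the uncorrected scheme.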