Modeling Gene Expression Distributional Shifts for Unseen Genetic Perturbations
Authors: Kalyan Ramakrishnan, Jonathan G. Hedley, Sisi Qu, Puneet K. Dokania, Philip H. S. Torr, Cesar A. Prada-Medina, Julien Fauqueur, Kaspar Martens
Abstract:
We train a neural network to predict distributional responses in gene expression following genetic perturbations. This is an essential task in early-stage drug discovery, where such responses can offer insights into gene function and inform target identification. Existing methods only predict changes in the mean expression, overlooking stochasticity inherent in single-cell data. In contrast, we offer a more realistic view of cellular responses by modeling expression distributions. Our model predicts gene-level histograms conditioned on perturbations and outperforms baselines in capturing higher-order statistics, such as variance, skewness, and kurtosis, at a fraction of the training cost. To generalize to unseen perturbations, we incorporate prior knowledge via gene embeddings from large language models (LLMs). While modeling a richer output space, the method remains competitive in predicting mean expression changes. This work offers a practical step towards more expressive and biologically informative models of perturbation effects.
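As an illustration of how higher-order statistics can be read off a gene-level histogram, the following is a minimal sketch and not the authors' implementation. It assumes a histogram given as bin edges plus per-bin probabilities and treats all mass in a bin as sitting at the bin centre; the function name `histogram_moments` and its arguments are hypothetical.

```python
import numpy as np


def histogram_moments(bin_edges, probs):
    """Return (mean, variance, skewness, kurtosis) of a 1-D histogram.

    Assumption: each bin's mass is concentrated at its centre.
    Kurtosis is the plain fourth standardized moment (not excess kurtosis).
    The histogram must have non-zero variance, otherwise the
    standardized moments are undefined.
    """
    edges = np.asarray(bin_edges, dtype=float)
    p = np.asarray(probs, dtype=float)
    p = p / p.sum()  # normalise so the bin masses sum to 1

    centers = 0.5 * (edges[:-1] + edges[1:])
    mean = np.sum(p * centers)
    dev = centers - mean

    var = np.sum(p * dev**2)
    std = np.sqrt(var)
    skew = np.sum(p * dev**3) / std**3
    kurt = np.sum(p * dev**4) / std**4
    return mean, var, skew, kurt


# Toy, made-up histogram with three bins (not data from the paper).
mean, var, skew, kurt = histogram_moments([0, 1, 2, 3], [0.25, 0.5, 0.25])
```

Comparing these quantities between a predicted and an observed histogram is one simple way to check whether a model captures more than the mean.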
Submitted 1 July, 2025;
originally announced July 2025.