Showing 1–23 of 23 results for author: Theis, F

Searching in archive q-bio.
  1. arXiv:2507.01163  [pdf, ps, other]

    cs.CV q-bio.CB q-bio.QM

    cp_measure: API-first feature extraction for image-based profiling workflows

    Authors: Alán F. Muñoz, Tim Treis, Alexandr A. Kalinin, Shatavisha Dasgupta, Fabian Theis, Anne E. Carpenter, Shantanu Singh

    Abstract: Biological image analysis has traditionally focused on measuring specific visual properties of interest for cells or other entities. A complementary paradigm gaining increasing traction is image-based profiling - quantifying many distinct visual features to form comprehensive profiles which may reveal hidden patterns in cellular states, drug responses, and disease mechanisms. While current tools l…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: 10 pages, 4 figures, 4 supplementary figures. CODEML Workshop paper accepted (non-archival), as a part of ICML2025 events

    ACM Class: I.4.7

  2. arXiv:2504.17247  [pdf, other]

    cs.LG cs.AI q-bio.BM

    Targeted AMP generation through controlled diffusion with efficient embeddings

    Authors: Diogo Soares, Leon Hetzel, Paulina Szymczak, Fabian Theis, Stephan Günnemann, Ewa Szczurek

    Abstract: Deep learning-based antimicrobial peptide (AMP) discovery faces critical challenges such as low experimental hit rates as well as the need for nuanced controllability and efficient modeling of peptide properties. To address these challenges, we introduce OmegAMP, a framework that leverages a diffusion-based generative model with efficient low-dimensional embeddings, precise controllability mechani…

    Submitted 24 April, 2025; originally announced April 2025.

  3. arXiv:2503.20027  [pdf]

    q-bio.MN cs.LG

    A scalable gene network model of regulatory dynamics in single cells

    Authors: Paul Bertin, Joseph D. Viviano, Alejandro Tejada-Lapuerta, Weixu Wang, Stefan Bauer, Fabian J. Theis, Yoshua Bengio

    Abstract: Single-cell data provide high-dimensional measurements of the transcriptional states of cells, but extracting insights into the regulatory functions of genes, particularly identifying transcriptional mechanisms affected by biological perturbations, remains a challenge. Many perturbations induce compensatory cellular responses, making it difficult to distinguish direct from indirect effects on gene…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: 42 pages, 10 figures

  4. arXiv:2501.02526  [pdf, other]

    q-bio.BM cs.LG

    Unified Guidance for Geometry-Conditioned Molecular Generation

    Authors: Sirine Ayadi, Leon Hetzel, Johanna Sommer, Fabian Theis, Stephan Günnemann

    Abstract: Effectively designing molecular geometries is essential to advancing pharmaceutical innovations, a domain, which has experienced great attention through the success of generative models and, in particular, diffusion models. However, current molecular diffusion models are tailored towards a specific downstream task and lack adaptability. We introduce UniGuide, a framework for controlled geometric g…

    Submitted 5 January, 2025; originally announced January 2025.

    Comments: 38th Conference on Neural Information Processing Systems (NeurIPS)

  5. arXiv:2409.11654  [pdf, other]

    q-bio.QM cs.AI cs.LG q-bio.NC

    How to Build the Virtual Cell with Artificial Intelligence: Priorities and Opportunities

    Authors: Charlotte Bunne, Yusuf Roohani, Yanay Rosen, Ankit Gupta, Xikun Zhang, Marcel Roed, Theo Alexandrov, Mohammed AlQuraishi, Patricia Brennan, Daniel B. Burkhardt, Andrea Califano, Jonah Cool, Abby F. Dernburg, Kirsty Ewing, Emily B. Fox, Matthias Haury, Amy E. Herr, Eric Horvitz, Patrick D. Hsu, Viren Jain, Gregory R. Johnson, Thomas Kalil, David R. Kelley, Shana O. Kelley, Anna Kreshuk, et al. (17 additional authors not shown)

    Abstract: The cell is arguably the most fundamental unit of life and is central to understanding biology. Accurate modeling of cells is important for this understanding as well as for determining the root causes of disease. Recent advances in artificial intelligence (AI), combined with the ability to generate large-scale experimental data, present novel opportunities to model cells. Here we propose a vision…

    Submitted 14 October, 2024; v1 submitted 17 September, 2024; originally announced September 2024.

  6. arXiv:2407.11734  [pdf, other]

    q-bio.QM cs.LG q-bio.GN

    Multi-Modal and Multi-Attribute Generation of Single Cells with CFGen

    Authors: Alessandro Palma, Till Richter, Hanyi Zhang, Manuel Lubetzki, Alexander Tong, Andrea Dittadi, Fabian Theis

    Abstract: Generative modeling of single-cell RNA-seq data is crucial for tasks like trajectory inference, batch effect removal, and simulation of realistic cellular data. However, recent deep generative models simulating synthetic single cells from noise operate on pre-processed continuous gene expression approximations, overlooking the discrete nature of single-cell data, which limits their effectiveness a…

    Submitted 3 March, 2025; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: 41 pages, 22 figures

    Journal ref: The Thirteenth International Conference on Learning Representations (2025)

  7. arXiv:2311.07621  [pdf, other]

    q-bio.GN cs.LG

    To Transformers and Beyond: Large Language Models for the Genome

    Authors: Micaela E. Consens, Cameron Dufault, Michael Wainberg, Duncan Forster, Mehran Karimzadeh, Hani Goodarzi, Fabian J. Theis, Alan Moses, Bo Wang

    Abstract: In the rapidly evolving landscape of genomics, deep learning has emerged as a useful tool for tackling complex computational challenges. This review focuses on the transformative role of Large Language Models (LLMs), which are mostly based on the transformer architecture, in genomics. Building on the foundation of traditional convolutional neural networks and recurrent neural networks, we explore…

    Submitted 12 November, 2023; originally announced November 2023.

  8. arXiv:2311.02455  [pdf, other]

    cs.LG q-bio.GN q-bio.QM stat.AP

    Mixed Models with Multiple Instance Learning

    Authors: Jan P. Engelmann, Alessandro Palma, Jakub M. Tomczak, Fabian J. Theis, Francesco Paolo Casale

    Abstract: Predicting patient features from single-cell data can help identify cellular states implicated in health and disease. Linear models and average cell type expressions are typically favored for this task for their efficiency and robustness, but they overlook the rich cell heterogeneity inherent in single-cell data. To address this gap, we introduce MixMIL, a framework integrating Generalized Linear…

    Submitted 8 March, 2024; v1 submitted 4 November, 2023; originally announced November 2023.

    Comments: AISTATS 2024 Oral, Code: https://github.com/AIH-SGML/MixMIL

  9. arXiv:2310.14935  [pdf]

    cs.LG q-bio.GN

    Causal machine learning for single-cell genomics

    Authors: Alejandro Tejada-Lapuerta, Paul Bertin, Stefan Bauer, Hananeh Aliee, Yoshua Bengio, Fabian J. Theis

    Abstract: Advances in single-cell omics allow for unprecedented insights into the transcription profiles of individual cells. When combined with large-scale perturbation screens, through which specific biological mechanisms can be targeted, these technologies allow for measuring the effect of targeted perturbations on the whole transcriptome. These advances provide an opportunity to better understand the ca…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: 35 pages, 7 figures, 3 tables, 1 box

  10. arXiv:2307.00558  [pdf, other]

    cs.LG q-bio.QM

    Conditionally Invariant Representation Learning for Disentangling Cellular Heterogeneity

    Authors: Hananeh Aliee, Ferdinand Kapl, Soroor Hediyeh-Zadeh, Fabian J. Theis

    Abstract: This paper presents a novel approach that leverages domain variability to learn representations that are conditionally invariant to unwanted variability or distractors. Our approach identifies both spurious and invariant latent features necessary for achieving accurate reconstruction by placing distinct conditional priors on latent features. The invariant signals are disentangled from noise by enf…

    Submitted 2 July, 2023; originally announced July 2023.

  11. arXiv:2306.17246  [pdf, other]

    cs.LG q-bio.BM stat.AP

    The power of motifs as inductive bias for learning molecular distributions

    Authors: Johanna Sommer, Leon Hetzel, David Lüdke, Fabian Theis, Stephan Günnemann

    Abstract: Machine learning for molecules holds great potential for efficiently exploring the vast chemical space and thus streamlining the drug discovery process by facilitating the design of new therapeutic molecules. Deep generative models have shown promising results for molecule generation, but the benefits of specific inductive biases for learning distributions over small graphs are unclear. Our study…

    Submitted 4 April, 2023; originally announced June 2023.

    Comments: Accepted for publication at the MLDD workshop, ICLR 2023

  12. arXiv:2211.03793  [pdf, other]

    q-bio.GN cs.LG q-bio.QM stat.AP

    Uncertainty Quantification for Atlas-Level Cell Type Transfer

    Authors: Jan Engelmann, Leon Hetzel, Giovanni Palla, Lisa Sikkema, Malte Luecken, Fabian Theis

    Abstract: Single-cell reference atlases are large-scale, cell-level maps that capture cellular heterogeneity within an organ using single cell genomics. Given their size and cellular diversity, these atlases serve as high-quality training data for the transfer of cell type labels to new datasets. Such label transfer, however, must be robust to domain shifts in gene expression due to measurement technique, l…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: Workshop paper at the 2022 ICML Workshop on Computational Biology

  13. arXiv:2205.07110  [pdf, other]

    cs.LG q-bio.QM

    SystemMatch: optimizing preclinical drug models to human clinical outcomes via generative latent-space matching

    Authors: Scott Gigante, Varsha G. Raghavan, Amanda M. Robinson, Robert A. Barton, Adeeb H. Rahman, Drausin F. Wulsin, Jacques Banchereau, Noam Solomon, Luis F. Voloch, Fabian J. Theis

    Abstract: Translating the relevance of preclinical models ($\textit{in vitro}$, animal models, or organoids) to their relevance in humans presents an important challenge during drug development. The rising abundance of single-cell genomic data from human tumors and tissue offers a new opportunity to optimize model systems by their similarity to targeted human cell types in disease. In this work, we introduc…

    Submitted 14 May, 2022; originally announced May 2022.

    Comments: Published at the MLDD workshop, ICLR 2022

  14. arXiv:2204.13545  [pdf, other]

    cs.LG q-bio.GN stat.AP stat.ML

    Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution

    Authors: Leon Hetzel, Simon Böhm, Niki Kilbertus, Stephan Günnemann, Mohammad Lotfollahi, Fabian Theis

    Abstract: Single-cell transcriptomics enabled the study of cellular heterogeneity in response to perturbations at the resolution of individual cells. However, scaling high-throughput screens (HTSs) to measure cellular responses for many drugs remains a challenge due to technical limitations and, more importantly, the cost of such multiplexed experiments. Thus, transferring information from routinely perform…

    Submitted 30 December, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

    Comments: 10 pages. NeurIPS 2022 conference paper

  15. arXiv:2104.11364  [pdf]

    q-bio.OT cs.CY

    A field guide to cultivating computational biology

    Authors: Anne E Carpenter, Casey S Greene, Piero Carnici, Benilton S Carvalho, Michiel de Hoon, Stacey Finley, Kim-Anh Le Cao, Jerry SH Lee, Luigi Marchionni, Suzanne Sindi, Fabian J Theis, Gregory P Way, Jean YH Yang, Elana J Fertig

    Abstract: Biomedical research centers can empower basic discovery and novel therapeutic strategies by leveraging their large-scale datasets from experiments and patients. This data, together with new technologies to create and analyze it, has ushered in an era of data-driven discovery which requires moving beyond the traditional individual, single-discipline investigator research model. This interdisciplina…

    Submitted 22 April, 2021; originally announced April 2021.

  16. arXiv:1910.01791  [pdf, other]

    cs.LG eess.IV q-bio.CB q-bio.GN stat.ML

    Conditional out-of-sample generation for unpaired data using trVAE

    Authors: Mohammad Lotfollahi, Mohsen Naghipourfar, Fabian J. Theis, F. Alexander Wolf

    Abstract: While generative models have shown great success in generating high-dimensional samples conditional on low-dimensional descriptors (learning e.g. stroke thickness in MNIST, hair color in CelebA, or speaker identity in Wavenet), their generation out-of-sample poses fundamental problems. The conditional variational autoencoder (CVAE) as a simple conditional generative model does not explicitly relat…

    Submitted 30 October, 2019; v1 submitted 3 October, 2019; originally announced October 2019.

    Comments: Added reference to Johansson et al. (2016) and removed sentences from Lopez et al. (2018) in the background section (see acknowledgements)

  17. arXiv:1909.12550  [pdf]

    q-bio.GN q-bio.MN q-bio.PE

    Single-cell eQTLGen Consortium: a personalized understanding of disease

    Authors: Monique G. P. van der Wijst, Dylan H. de Vries, Hilde E. Groot, Gosia Trynka, Chung-Chau Hon, Martijn C. Nawijn, Youssef Idaghdour, Pim van der Harst, Chun J. Ye, Joseph Powell, Fabian J. Theis, Ahmed Mahfouz, Matthias Heinig, Lude Franke

    Abstract: In recent years, functional genomics approaches combining genetic information with bulk RNA-sequencing data have identified the downstream expression effects of disease-associated genetic risk factors through so-called expression quantitative trait locus (eQTL) analysis. Single-cell RNA-sequencing creates enormous opportunities for mapping eQTLs across different cell types and in dynamic processes…

    Submitted 27 September, 2019; originally announced September 2019.

    Comments: 26 pages, 5 figures, position paper of sc-eQTLGen consortium

  18. arXiv:1810.04281  [pdf, other]

    stat.AP q-bio.QM

    Fully integrative data analysis of NMR metabolic fingerprints with comprehensive patient data: a case report based on the German Chronic Kidney Disease (GCKD) study

    Authors: Helena U. Zacharias, Michael Altenbuchinger, Stefan Solbrig, Andreas Schäfer, Mustafa Buyukozkan, Ulla T. Schultheiß, Fruzsina Kotsis, Anna Köttgen, Jan Krumsiek, Fabian J. Theis, Rainer Spang, Peter J. Oefner, Wolfram Gronwald, GCKD study investigators

    Abstract: Omics data facilitate the gain of novel insights into the pathophysiology of diseases and, consequently, their diagnosis, treatment, and prevention. To that end, it is necessary to integrate omics data with other data types such as clinical, phenotypic, and demographic parameters of categorical or continuous nature. Here, we exemplify this data integration issue for a study on chronic kidney disea…

    Submitted 8 October, 2018; originally announced October 2018.

  19. arXiv:1511.01658  [pdf, other]

    math.OC q-bio.MN

    A simulation-based approach for solving optimisation problems with ODE-type steady state constraints

    Authors: Anna Fiedler, Fabian J. Theis, Jan Hasenauer

    Abstract: Ordinary differential equations (ODEs) are widely used to model biological, (bio-)chemical and technical processes. The parameters of these ODEs are often estimated from experimental data using ODE-constrained optimisation. This article proposes a simple simulation-based approach for solving optimisation problems with steady state constraints relying on an ODE. This simulation-based optimisation m…

    Submitted 5 November, 2015; originally announced November 2015.

    Comments: 11 pages, 3 figures

  20. arXiv:1506.06392  [pdf, other]

    q-bio.MN q-bio.QM

    Data-driven modelling of biological multi-scale processes

    Authors: Jan Hasenauer, Nick Jagiella, Sabrina Hross, Fabian J. Theis

    Abstract: Biological processes involve a variety of spatial and temporal scales. A holistic understanding of many biological processes therefore requires multi-scale models which capture the relevant properties on all these scales. In this manuscript we review mathematical modelling approaches used to describe the individual spatial scales and how they are integrated into holistic models. We discuss the rel…

    Submitted 21 June, 2015; originally announced June 2015.

    Comments: This manuscript will appear in the Journal of Coupled Systems and Multiscale Dynamics (American Scientific Publishers)

    MSC Class: 92Bxx; 93A30

  21. arXiv:1407.2112  [pdf]

    cs.GR cs.HC q-bio.QM

    MCA: Multiresolution Correlation Analysis, a graphical tool for subpopulation identification in single-cell gene expression data

    Authors: Justin Feigelman, Fabian J. Theis, Carsten Marr

    Abstract: Background: Biological data often originate from samples containing mixtures of subpopulations, corresponding e.g. to distinct cellular phenotypes. However, identification of distinct subpopulations may be difficult if biological measurements yield distributions that are not easily separable. Results: We present Multiresolution Correlation Analysis (MCA), a method for visually identifying subpopul…

    Submitted 8 July, 2014; originally announced July 2014.

    Comments: BioVis 2014 conference

  22. Stability and multi-attractor dynamics of a toggle switch based on a two-stage model of stochastic gene expression

    Authors: Michael K. Strasser, Fabian J. Theis, Carsten Marr

    Abstract: A toggle switch consists of two genes that mutually repress each other. This regulatory motif is active during cell differentiation and is thought to act as a memory device, being able to choose and maintain cell fate decisions. In this contribution, we study the stability and dynamics of a two-stage gene expression switch within a probabilistic framework inspired by the properties of the Pu/Gata…

    Submitted 1 December, 2011; originally announced December 2011.

    Comments: to appear in the Biophysical Journal

  23. Patterns of subnet usage reveal distinct scales of regulation in the transcriptional regulatory network of Escherichia coli

    Authors: Carsten Marr, Fabian J. Theis, Larry S. Liebovitch, Marc-Thorsten Hütt

    Abstract: The set of regulatory interactions between genes, mediated by transcription factors, forms a species' transcriptional regulatory network (TRN). By comparing this network with measured gene expression data one can identify functional properties of the TRN and gain general insight into transcriptional control. We define the subnet of a node as the subgraph consisting of all nodes topologically downs…

    Submitted 24 May, 2010; originally announced May 2010.

    Comments: 14 pages, 8 figures, to be published in PLoS Computational Biology