-
Progress and new challenges in image-based profiling
Authors:
Erik Serrano,
John Peters,
Jesko Wagner,
Rebecca E. Graham,
Zhenghao Chen,
Brian Feng,
Gisele Miranda,
Alexandr A. Kalinin,
Loan Vulliard,
Jenna Tomkinson,
Cameron Mattson,
Michael J. Lippincott,
Ziqi Kang,
Divya Sitani,
Dave Bunten,
Srijit Seal,
Neil O. Carragher,
Anne E. Carpenter,
Shantanu Singh,
Paula A. Marin Zapata,
Juan C. Caicedo,
Gregory P. Way
Abstract:
For over two decades, image-based profiling has revolutionized cellular phenotype analysis. Image-based profiling processes rich, high-throughput microscopy data into unbiased measurements that reveal phenotypic patterns powerful for drug discovery, functional genomics, and cell state classification. Here, we review the evolving computational landscape of image-based profiling, detailing current procedures, discussing limitations, and highlighting future development directions. Deep learning has fundamentally reshaped image-based profiling, improving feature extraction, scalability, and multimodal data integration. Methodological advancements such as single-cell analysis and batch effect correction, drawing inspiration from single-cell transcriptomics, have enhanced analytical precision. The growth of open-source software ecosystems and the development of community-driven standards have further democratized access to image-based profiling, fostering reproducibility and collaboration across research groups. Despite these advancements, the field still faces significant challenges requiring innovative solutions. By focusing on the technical evolution of image-based profiling rather than the wide-ranging biological applications, our aim with this review is to provide researchers with a roadmap for navigating the progress and new challenges in this rapidly advancing domain.
Submitted 7 August, 2025;
originally announced August 2025.
-
Reproducible image-based profiling with Pycytominer
Authors:
Erik Serrano,
Srinivas Niranj Chandrasekaran,
Dave Bunten,
Kenneth I. Brewer,
Jenna Tomkinson,
Roshan Kern,
Michael Bornholdt,
Stephen Fleming,
Ruifan Pei,
John Arevalo,
Hillary Tsang,
Vincent Rubinetti,
Callum Tromans-Coia,
Tim Becker,
Erin Weisbart,
Charlotte Bunne,
Alexandr A. Kalinin,
Rebecca Senft,
Stephen J. Taylor,
Nasim Jamali,
Adeniyi Adeboye,
Hamdah Shafqat Abbasi,
Allen Goodman,
Juan C. Caicedo,
Anne E. Carpenter
, et al. (3 additional authors not shown)
Abstract:
Advances in high-throughput microscopy have enabled the rapid acquisition of large numbers of high-content microscopy images. Whether by deep learning or classical algorithms, image analysis pipelines then produce single-cell features. To process these single cells for downstream applications, we present Pycytominer, a user-friendly, open-source Python package that implements the bioinformatics steps known as image-based profiling. We demonstrate Pycytominer's usefulness in a machine learning project to predict nuisance compounds that cause undesirable cell injuries.
Submitted 2 July, 2024; v1 submitted 22 November, 2023;
originally announced November 2023.
-
A field guide to cultivating computational biology
Authors:
Anne E Carpenter,
Casey S Greene,
Piero Carnici,
Benilton S Carvalho,
Michiel de Hoon,
Stacey Finley,
Kim-Anh Le Cao,
Jerry SH Lee,
Luigi Marchionni,
Suzanne Sindi,
Fabian J Theis,
Gregory P Way,
Jean YH Yang,
Elana J Fertig
Abstract:
Biomedical research centers can empower basic discovery and novel therapeutic strategies by leveraging their large-scale datasets from experiments and patients. This data, together with new technologies to create and analyze it, has ushered in an era of data-driven discovery which requires moving beyond the traditional individual, single-discipline investigator research model. This interdisciplinary niche is where computational biology thrives. It has matured over the past three decades and made major contributions to scientific knowledge and human health, yet researchers in the field often languish in career advancement, publication, and grant review. We propose solutions for individual scientists, institutions, journal publishers, funding agencies, and educators.
Submitted 22 April, 2021;
originally announced April 2021.
-
Evaluating deep variational autoencoders trained on pan-cancer gene expression
Authors:
Gregory P. Way,
Casey S. Greene
Abstract:
Cancer is a heterogeneous disease with diverse molecular etiologies and outcomes. The Cancer Genome Atlas (TCGA) has released a large compendium of over 10,000 tumors with RNA-seq gene expression measurements. Gene expression captures the diverse molecular profiles of tumors and can be interrogated to reveal differential pathway activations. Deep unsupervised models, including Variational Autoencoders (VAE), can be used to reveal these underlying patterns. We compare a one-hidden-layer VAE to two alternative VAE architectures with increased depth. We determine that the additional capacity marginally improves performance. We train and compare the three VAE architectures to other dimensionality reduction techniques including principal components analysis (PCA), independent components analysis (ICA), non-negative matrix factorization (NMF), and analysis of gene expression by denoising autoencoders (ADAGE). We compare performance in a supervised learning task predicting gene inactivation pan-cancer and in a latent space analysis of high grade serous ovarian cancer (HGSC) subtypes. We do not observe substantial differences across algorithms in the classification task. VAE latent spaces offer biological insights into HGSC subtype biology.
Submitted 13 November, 2017;
originally announced November 2017.