-
RxRx3-core: Benchmarking drug-target interactions in High-Content Microscopy
Authors:
Oren Kraus,
Federico Comitani,
John Urbanik,
Kian Kenyon-Dean,
Lakshmanan Arumugam,
Saber Saberian,
Cas Wognum,
Safiye Celik,
Imran S. Haque
Abstract:
High Content Screening (HCS) microscopy datasets have transformed the ability to profile cellular responses to genetic and chemical perturbations, enabling cell-based inference of drug-target interactions (DTI). However, the adoption of representation learning methods for HCS data has been hindered by the lack of accessible datasets and robust benchmarks. To address this gap, we present RxRx3-core…
▽ More
High Content Screening (HCS) microscopy datasets have transformed the ability to profile cellular responses to genetic and chemical perturbations, enabling cell-based inference of drug-target interactions (DTI). However, the adoption of representation learning methods for HCS data has been hindered by the lack of accessible datasets and robust benchmarks. To address this gap, we present RxRx3-core, a curated and compressed subset of the RxRx3 dataset, and an associated DTI benchmarking task. At just 18GB, RxRx3-core significantly reduces the size barrier associated with large-scale HCS datasets while preserving critical data necessary for benchmarking representation learning models against a zero-shot DTI prediction task. RxRx3-core includes 222,601 microscopy images spanning 736 CRISPR knockouts and 1,674 compounds at 8 concentrations. RxRx3-core is available on HuggingFace and Polaris, along with pre-trained embeddings and benchmarking code, ensuring accessibility for the research community. By providing a compact dataset and robust benchmarks, we aim to accelerate innovation in representation learning methods for HCS data and support the discovery of novel biological insights.
△ Less
Submitted 30 May, 2025; v1 submitted 25 March, 2025;
originally announced March 2025.
-
ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy
Authors:
Kian Kenyon-Dean,
Zitong Jerry Wang,
John Urbanik,
Konstantin Donhauser,
Jason Hartford,
Saber Saberian,
Nil Sahin,
Ihab Bendidi,
Safiye Celik,
Marta Fay,
Juan Sebastian Rodriguez Vera,
Imran S Haque,
Oren Kraus
Abstract:
Large-scale cell microscopy screens are used in drug discovery and molecular biology research to study the effects of millions of chemical and genetic perturbations on cells. To use these images in downstream analysis, we need models that can map each image into a feature space that represents diverse biological phenotypes consistently, in the sense that perturbations with similar biological effec…
▽ More
Large-scale cell microscopy screens are used in drug discovery and molecular biology research to study the effects of millions of chemical and genetic perturbations on cells. To use these images in downstream analysis, we need models that can map each image into a feature space that represents diverse biological phenotypes consistently, in the sense that perturbations with similar biological effects have similar representations. In this work, we present the largest foundation model for cell microscopy data to date, a new 1.9 billion-parameter ViT-G/8 MAE trained on over 8 billion microscopy image crops. Compared to a previous published ViT-L/8 MAE, our new model achieves a 60% improvement in linear separability of genetic perturbations and obtains the best overall performance on whole-genome biological relationship recall and replicate consistency benchmarks. Beyond scaling, we developed two key methods that improve performance: (1) training on a curated and diverse dataset; and, (2) using biologically motivated linear probing tasks to search across each transformer block for the best candidate representation of whole-genome screens. We find that many self-supervised vision transformers, pretrained on either natural or microscopy images, yield significantly more biologically meaningful representations of microscopy images in their intermediate blocks than in their typically used final blocks. More broadly, our approach and results provide insights toward a general strategy for successfully building foundation models for large-scale biological data.
△ Less
Submitted 4 November, 2024;
originally announced November 2024.
-
WILDS: A Benchmark of in-the-Wild Distribution Shifts
Authors:
Pang Wei Koh,
Shiori Sagawa,
Henrik Marklund,
Sang Michael Xie,
Marvin Zhang,
Akshay Balsubramani,
Weihua Hu,
Michihiro Yasunaga,
Richard Lanas Phillips,
Irena Gao,
Tony Lee,
Etienne David,
Ian Stavness,
Wei Guo,
Berton A. Earnshaw,
Imran S. Haque,
Sara Beery,
Jure Leskovec,
Anshul Kundaje,
Emma Pierson,
Sergey Levine,
Chelsea Finn,
Percy Liang
Abstract:
Distribution shifts -- where the training distribution differs from the test distribution -- can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild. Despite their ubiquity in the real-world deployments, these distribution shifts are under-represented in the datasets widely used in the ML community today. To address this gap, we present WILDS, a curated benchma…
▽ More
Distribution shifts -- where the training distribution differs from the test distribution -- can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild. Despite their ubiquity in the real-world deployments, these distribution shifts are under-represented in the datasets widely used in the ML community today. To address this gap, we present WILDS, a curated benchmark of 10 datasets reflecting a diverse range of distribution shifts that naturally arise in real-world applications, such as shifts across hospitals for tumor identification; across camera traps for wildlife monitoring; and across time and location in satellite imaging and poverty mapping. On each dataset, we show that standard training yields substantially lower out-of-distribution than in-distribution performance. This gap remains even with models trained by existing methods for tackling distribution shifts, underscoring the need for new methods for training models that are more robust to the types of distribution shifts that arise in practice. To facilitate method development, we provide an open-source package that automates dataset loading, contains default model architectures and hyperparameters, and standardizes evaluations. Code and leaderboards are available at https://wilds.stanford.edu.
△ Less
Submitted 16 July, 2021; v1 submitted 14 December, 2020;
originally announced December 2020.
-
Hard Data on Soft Errors: A Large-Scale Assessment of Real-World Error Rates in GPGPU
Authors:
Imran S. Haque,
Vijay S. Pande
Abstract:
Graphics processing units (GPUs) are gaining widespread use in computational chemistry and other scientific simulation contexts because of their huge performance advantages relative to conventional CPUs. However, the reliability of GPUs in error-intolerant applications is largely unproven. In particular, a lack of error checking and correcting (ECC) capability in the memory subsystems of graphic…
▽ More
Graphics processing units (GPUs) are gaining widespread use in computational chemistry and other scientific simulation contexts because of their huge performance advantages relative to conventional CPUs. However, the reliability of GPUs in error-intolerant applications is largely unproven. In particular, a lack of error checking and correcting (ECC) capability in the memory subsystems of graphics cards has been cited as a hindrance to the acceptance of GPUs as high-performance coprocessors, but the impact of this design has not been previously quantified.
In this article we present MemtestG80, our software for assessing memory error rates on NVIDIA G80 and GT200-architecture-based graphics cards. Furthermore, we present the results of a large-scale assessment of GPU error rate, conducted by running MemtestG80 on over 20,000 hosts on the Folding@home distributed computing network. Our control experiments on consumer-grade and dedicated-GPGPU hardware in a controlled environment found no errors. However, our survey over cards on Folding@home finds that, in their installed environments, two-thirds of tested GPUs exhibit a detectable, pattern-sensitive rate of memory soft errors. We demonstrate that these errors persist after controlling for overclocking and environmental proxies for temperature, but depend strongly on board architecture.
△ Less
Submitted 13 November, 2009; v1 submitted 2 October, 2009;
originally announced October 2009.