-
PRISM2: Unlocking Multi-Modal General Pathology AI with Clinical Dialogue
Authors:
George Shaikovski,
Eugene Vorontsov,
Adam Casson,
Julian Viret,
Eric Zimmermann,
Neil Tenenholtz,
Yi Kan Wang,
Jan H. Bernhard,
Ran A. Godrich,
Juan A. Retamero,
Razik Yousfi,
Nicolo Fusi,
Thomas J. Fuchs,
Kristen Severson,
Siqi Liu
Abstract:
Recent pathology foundation models can provide rich tile-level representations but fall short of delivering general-purpose clinical utility without further extensive model development. These models lack whole-slide image (WSI) understanding and are not trained with large-scale diagnostic data, limiting their performance on diverse downstream tasks. We introduce PRISM2, a multi-modal slide-level foundation model trained via clinical dialogue to enable scalable, generalizable pathology AI. PRISM2 is trained on nearly 700,000 specimens (2.3 million WSIs) paired with real-world clinical diagnostic reports in a two-stage process. In Stage 1, a vision-language model is trained using contrastive and captioning objectives to align whole-slide embeddings with textual clinical diagnoses. In Stage 2, the language model is unfrozen to enable diagnostic conversation and to extract more clinically meaningful representations from its hidden states. PRISM2 achieves strong performance on diagnostic and biomarker prediction tasks, outperforming prior slide-level models including PRISM and TITAN. It also introduces a zero-shot yes/no classification approach that surpasses CLIP-style methods without requiring prompt tuning or class enumeration. By aligning visual features with clinical reasoning, PRISM2 improves generalization on both data-rich and low-sample tasks, offering a scalable path toward general pathology AI agents capable of assisting with diagnostic and prognostic decisions.
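The Stage 1 contrastive objective is described only at a high level. As a rough illustration, a generic CLIP-style symmetric InfoNCE loss over paired slide and report embeddings might look like the sketch below. This is not PRISM2's implementation: the function names, the temperature of 0.07, and the use of NumPy are assumptions, and the captioning objective is not shown.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_contrastive_loss(slide_emb: np.ndarray, text_emb: np.ndarray,
                          temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss (hypothetical helper) for a batch of pairs.

    Row i of slide_emb and row i of text_emb are assumed to be a matched
    slide/report pair; all other rows in the batch serve as negatives.
    """
    s = l2_normalize(slide_emb)
    t = l2_normalize(text_emb)
    logits = s @ t.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])
    loss_s2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # slide -> text
    loss_t2s = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> slide
    return float(0.5 * (loss_s2t + loss_t2s))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
print(clip_contrastive_loss(x, x), clip_contrastive_loss(x, x[::-1]))
```

Matched pairs should yield a much lower loss than mismatched ones, which is the signal that pulls each slide embedding toward its own report and away from the others in the batch.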
Submitted 15 June, 2025;
originally announced June 2025.
-
Virchow2: Scaling Self-Supervised Mixed Magnification Models in Pathology
Authors:
Eric Zimmermann,
Eugene Vorontsov,
Julian Viret,
Adam Casson,
Michal Zelechowski,
George Shaikovski,
Neil Tenenholtz,
James Hall,
David Klimstra,
Razik Yousfi,
Thomas Fuchs,
Nicolo Fusi,
Siqi Liu,
Kristen Severson
Abstract:
Foundation models are rapidly being developed for computational pathology applications. However, it remains an open question which factors are most important for downstream performance, with data scale and diversity, model size, and training algorithm all playing a role. In this work, we propose algorithmic modifications tailored for pathology, and we present the results of scaling both data and model size, surpassing previous studies in both dimensions. We introduce three new models: Virchow2, a 632 million parameter vision transformer; Virchow2G, a 1.9 billion parameter vision transformer; and Virchow2G Mini, a 22 million parameter distillation of Virchow2G. Each is trained on 3.1 million histopathology whole slide images with diverse tissues, originating institutions, and stains. We achieve state-of-the-art performance on 12 tile-level tasks compared with the top-performing competing models. Our results suggest that data diversity and domain-specific methods can outperform models that only scale in the number of parameters, but that, on average, performance benefits from the combination of domain-specific methods, data scale, and model scale.
Submitted 6 November, 2024; v1 submitted 1 August, 2024;
originally announced August 2024.
-
PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology
Authors:
George Shaikovski,
Adam Casson,
Kristen Severson,
Eric Zimmermann,
Yi Kan Wang,
Jeremy D. Kunz,
Juan A. Retamero,
Gerard Oakley,
David Klimstra,
Christopher Kanan,
Matthew Hanna,
Michal Zelechowski,
Julian Viret,
Neil Tenenholtz,
James Hall,
Nicolo Fusi,
Razik Yousfi,
Peter Hamilton,
William A. Moye,
Eugene Vorontsov,
Siqi Liu,
Thomas J. Fuchs
Abstract:
Foundation models in computational pathology promise to unlock the development of new clinical decision support systems and models for precision medicine. However, there is a mismatch between most clinical analysis, which is defined at the level of one or more whole slide images, and foundation models to date, which process the thousands of image tiles contained in a whole slide image separately. The requirement to train a network to aggregate information across a large number of tiles in multiple whole slide images limits these models' impact. In this work, we present a slide-level foundation model for H&E-stained histopathology, PRISM, that builds on Virchow tile embeddings and leverages clinical report text for pre-training. Using the tile embeddings, PRISM produces slide-level embeddings with the ability to generate clinical reports, resulting in several modes of use. Using text prompts, PRISM achieves zero-shot cancer detection and sub-typing performance approaching and surpassing that of a supervised aggregator model. Using the slide embeddings with linear classifiers, PRISM surpasses supervised aggregator models. Furthermore, we demonstrate that fine-tuning of the PRISM slide encoder yields label-efficient training for biomarker prediction, a task that typically suffers from low availability of training data; an aggregator initialized with PRISM and trained on as little as 10% of the training data can outperform a supervised baseline that uses all of the data.
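Zero-shot use of text prompts, as described above, typically reduces to comparing a slide embedding against embedded class descriptions. The sketch below shows a generic CLIP-style version under stated assumptions: the slide and prompt embeddings are taken as already computed (no PRISM encoder is called), `zero_shot_predict` is a hypothetical helper, and averaging several prompts per class into one prototype is a common convention rather than a documented PRISM detail.

```python
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    # Unit-normalize along the last axis so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def zero_shot_predict(slide_emb: np.ndarray, prompt_embs_per_class: list) -> np.ndarray:
    """Assign each slide to the class whose prompts it is most similar to.

    slide_emb: (n_slides, d) array of slide embeddings (assumed precomputed).
    prompt_embs_per_class: list of (n_prompts_k, d) arrays, one per class.
    Returns an array of predicted class indices.
    """
    # Average each class's normalized prompt embeddings into a single prototype.
    prototypes = np.stack([normalize(normalize(p).mean(axis=0))
                           for p in prompt_embs_per_class])
    scores = normalize(slide_emb) @ prototypes.T  # (n_slides, n_classes)
    return scores.argmax(axis=1)

# Toy 3-d embeddings standing in for "benign" and "cancer" prompt sets.
benign = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]])
cancer = np.array([[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]])
slides = np.array([[2.0, 0.1, 0.3], [0.2, 3.0, 0.1]])
print(zero_shot_predict(slides, [benign, cancer]))
```

No labeled slides are needed here; the linear-classifier mode mentioned in the abstract would instead fit a classifier on the same slide embeddings.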
Submitted 22 May, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.
-
Adapting Self-Supervised Learning for Computational Pathology
Authors:
Eric Zimmermann,
Neil Tenenholtz,
James Hall,
George Shaikovski,
Michal Zelechowski,
Adam Casson,
Fausto Milletari,
Julian Viret,
Eugene Vorontsov,
Siqi Liu,
Kristen Severson
Abstract:
Self-supervised learning (SSL) has emerged as a key technique for training networks that can generalize well to diverse tasks without task-specific supervision. This property makes SSL desirable for computational pathology, the study of digitized images of tissues, as there are many target applications and often limited labeled training samples. However, SSL algorithms and models have been primarily developed in the field of natural images and whether their performance can be improved by adaptation to particular domains remains an open question. In this work, we present an investigation of modifications to SSL for pathology data, specifically focusing on the DINOv2 algorithm. We propose alternative augmentations, regularization functions, and position encodings motivated by the characteristics of pathology images. We evaluate the impact of these changes on several benchmarks to demonstrate the value of tailored approaches.
Submitted 2 May, 2024;
originally announced May 2024.
-
Improving mitosis detection on histopathology images using large vision-language models
Authors:
Ruiwen Ding,
James Hall,
Neil Tenenholtz,
Kristen Severson
Abstract:
In certain types of cancerous tissue, mitotic count has been shown to be associated with tumor proliferation, poor prognosis, and therapeutic resistance. Because of the high inter-rater variability of mitotic counting by pathologists, convolutional neural networks (CNNs) have been employed to reduce the subjectivity of mitosis detection in hematoxylin and eosin (H&E)-stained whole slide images. However, most existing models perform below the level of expert panel review and incorporate only visual information. In this work, we demonstrate that pre-trained large-scale vision-language models that leverage both visual features and natural language improve mitosis detection accuracy. We formulate mitosis detection as an image captioning task and as a visual question answering (VQA) task, including metadata such as tumor and scanner types as context. We demonstrate the effectiveness of our pipeline via comparison with various baseline models using 9,501 mitotic figures and 11,051 hard negatives (non-mitotic figures that are difficult to characterize) from the publicly available Mitosis Domain Generalization Challenge (MIDOG22) dataset.
Submitted 11 October, 2023;
originally announced October 2023.
-
Virchow: A Million-Slide Digital Pathology Foundation Model
Authors:
Eugene Vorontsov,
Alican Bozkurt,
Adam Casson,
George Shaikovski,
Michal Zelechowski,
Siqi Liu,
Kristen Severson,
Eric Zimmermann,
James Hall,
Neil Tenenholtz,
Nicolo Fusi,
Philippe Mathieu,
Alexander van Eck,
Donghun Lee,
Julian Viret,
Eric Robert,
Yi Kan Wang,
Jeremy D. Kunz,
Matthew C. H. Lee,
Jan Bernhard,
Ran A. Godrich,
Gerard Oakley,
Ewan Millar,
Matthew Hanna,
Juan Retamero,
et al. (6 additional authors not shown)
Abstract:
The use of artificial intelligence to enable precision medicine and decision support systems through the analysis of pathology images has the potential to revolutionize the diagnosis and treatment of cancer. Such applications will depend on models' abilities to capture the diverse patterns observed in pathology images. To address this challenge, we present Virchow, a foundation model for computational pathology. Virchow is a vision transformer with 632 million parameters, trained with the DINOv2 self-supervised learning algorithm on 1.5 million hematoxylin and eosin (H&E)-stained whole slide images from diverse tissue and specimen types, orders of magnitude more data than previous works. Virchow enables the development of a pan-cancer detection system with 0.949 overall specimen-level AUC across 17 different cancer types, while also achieving 0.937 AUC on 7 rare cancer types. Virchow sets the state of the art on internal and external tile-level benchmarks and on slide-level biomarker prediction tasks. These performance gains highlight the importance of training on massive pathology image datasets, suggesting that scaling up the data and network architecture can improve accuracy for many high-impact computational pathology applications where only limited training data are available.
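The detection results above are reported as AUC. As a reminder of what that number measures, the sketch below computes ROC AUC through its equivalence with the Mann-Whitney U statistic. It is a generic illustration with a hypothetical helper name and toy scores; it does not reproduce how Virchow's specimen-level scores are aggregated from slides.

```python
import numpy as np

def roc_auc(labels, scores) -> float:
    """ROC AUC via the Mann-Whitney U statistic.

    AUC is the probability that a randomly chosen positive scores higher than
    a randomly chosen negative, with ties counted as one half.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    # Compare every positive with every negative: O(n_pos * n_neg), fine for a sketch.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 3 of 4 pairs ordered correctly
```

An AUC of 0.949 therefore means that, for a random cancer/benign specimen pair, the cancer specimen is ranked higher about 95% of the time.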
Submitted 17 January, 2024; v1 submitted 14 September, 2023;
originally announced September 2023.
-
Statistical learning for accurate and interpretable battery lifetime prediction
Authors:
Peter M. Attia,
Kristen A. Severson,
Jeremy D. Witmer
Abstract:
Data-driven methods for battery lifetime prediction are attracting increasing attention for applications in which the degradation mechanisms are poorly understood and suitable training sets are available. However, while advanced machine learning and deep learning methods promise high performance with minimal data preprocessing, simpler linear models with engineered features often achieve comparable performance, especially for small training sets, while also providing physical and statistical interpretability. In this work, we use a previously published dataset to develop simple, accurate, and interpretable data-driven models for battery lifetime prediction. We first present the "capacity matrix" concept as a compact representation of battery electrochemical cycling data, along with a series of feature representations. We then create a number of univariate and multivariate models, many of which achieve comparable performance to the highest-performing models previously published for this dataset. These models also provide insights into the degradation of these cells. Our approaches can be used both to quickly train models for a new dataset and to benchmark the performance of more advanced machine learning methods.
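The univariate models mentioned above are, in their simplest form, a single engineered feature regressed against lifetime. The sketch below fits such a model by ordinary least squares and scores it with RMSE. The abstract does not define the capacity-matrix features, so the input feature here is a hypothetical placeholder, and the toy data are synthetic rather than drawn from the published dataset.

```python
import numpy as np

def fit_univariate(feature: np.ndarray, target: np.ndarray):
    """Fit target ~ slope * feature + intercept by ordinary least squares."""
    X = np.column_stack([feature, np.ones_like(feature)])  # add an intercept column
    (slope, intercept), *_ = np.linalg.lstsq(X, target, rcond=None)
    return float(slope), float(intercept)

def rmse(y_true, y_pred) -> float:
    # Root-mean-square error between observed and predicted values.
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))

# Synthetic example: a hypothetical feature that relates linearly to log10(cycle life).
feature = np.array([0.0, 1.0, 2.0, 3.0])
log_life = 2.0 * feature + 1.0
slope, intercept = fit_univariate(feature, log_life)
print(slope, intercept, rmse(log_life, slope * feature + intercept))
```

A model this small is easy to interpret, since the sign and magnitude of the slope can be read directly, which is the trade-off against more complex learners that the abstract highlights.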
Submitted 24 April, 2021; v1 submitted 6 January, 2021;
originally announced January 2021.
-
DPVis: Visual Analytics with Hidden Markov Models for Disease Progression Pathways
Authors:
Bum Chul Kwon,
Vibha Anand,
Kristen A Severson,
Soumya Ghosh,
Zhaonan Sun,
Brigitte I Frohnert,
Markus Lundgren,
Kenney Ng
Abstract:
Clinical researchers use disease progression models to understand patient status and characterize progression patterns from longitudinal health records. One approach to disease progression modeling is to describe patient status using a small number of states that represent distinctive distributions over a set of observed measures. Hidden Markov models (HMMs) and their variants are a class of models that both discover these states and infer patients' health states. Despite the advantages of these algorithms for discovering interesting patterns, it remains challenging for medical experts to interpret model outputs, understand complex modeling parameters, and make clinical sense of the patterns. To tackle these problems, we conducted a design study with clinical scientists, statisticians, and visualization experts, with the goal of investigating disease progression pathways of chronic diseases, namely type 1 diabetes (T1D), Huntington's disease, Parkinson's disease, and chronic obstructive pulmonary disease (COPD). As a result, we introduce DPVis, which seamlessly integrates model parameters and outcomes of HMMs into interpretable and interactive visualizations. In this study, we demonstrate that DPVis is successful in evaluating disease progression models, visually summarizing disease states, interactively exploring disease progression patterns, and building, analyzing, and comparing clinically relevant patient subgroups.
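To make "infer patients' health states" concrete, the sketch below runs the standard HMM forward (filtering) recursion for discrete observations, returning the probability of each hidden state at each visit given the visits so far. It is a generic textbook algorithm, not DPVis code; the two-state "stable"/"progressing" model and all parameter values are hypothetical.

```python
import numpy as np

def forward_filter(pi: np.ndarray, A: np.ndarray, B: np.ndarray, obs) -> np.ndarray:
    """Filtered state probabilities P(state_t | obs_1..t) for a discrete HMM.

    pi:  (K,) initial state distribution
    A:   (K, K) transitions, A[i, j] = P(state_{t+1} = j | state_t = i)
    B:   (K, M) emissions, B[k, m] = P(obs = m | state = k)
    obs: sequence of observation indices in [0, M)
    """
    alpha = pi * B[:, obs[0]]
    alpha = alpha / alpha.sum()            # normalize each step to avoid underflow
    filtered = [alpha]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]      # propagate through A, weight by emission
        alpha = alpha / alpha.sum()
        filtered.append(alpha)
    return np.array(filtered)

# Hypothetical model: state 0 = stable, state 1 = progressing (absorbing).
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.0, 1.0]])
B = np.array([[0.8, 0.2], [0.2, 0.8]])     # observation 1 = abnormal measurement
print(forward_filter(pi, A, B, [0, 0, 1, 1, 1]).round(3))
```

The resulting per-visit state probabilities are exactly the kind of model output a tool like DPVis would need to summarize visually across many patients.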
Submitted 9 April, 2020; v1 submitted 25 April, 2019;
originally announced April 2019.
-
Unsupervised learning with contrastive latent variable models
Authors:
Kristen Severson,
Soumya Ghosh,
Kenney Ng
Abstract:
In unsupervised learning, dimensionality reduction is an important tool for data exploration and visualization. Because these aims are typically open-ended, it can be useful to frame the problem as looking for patterns that are enriched in one dataset relative to another. Such pairs of datasets occur commonly, for instance a population of interest vs. a control population, or signal vs. signal-free recordings. However, there are few methods that work on sets of data as opposed to data points or sequences. Here, we present a probabilistic model for dimensionality reduction to discover signal that is enriched in the target dataset relative to the background dataset. The data in these sets do not need to be paired or grouped beyond set membership. By using a probabilistic model in which some structure is shared between the two datasets and some is unique to the target dataset, we are able to recover interesting structure in the latent space of the target dataset. The method also has the advantages of a probabilistic model, namely that it allows for the incorporation of prior information, handles missing data, and can be generalized to different distributional assumptions. We describe several possible variations of the model and demonstrate the application of the technique to de-noising, feature selection, and subgroup discovery settings.
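The shared-versus-target-specific structure described above can be illustrated by sampling from a simple linear-Gaussian generative model: background points use only shared factors `S`, while target points add target-specific factors `W`. This is a minimal sketch under assumptions: the symbols, Gaussian latents, isotropic noise, and zero means are illustrative choices, not the paper's exact specification, and model fitting is not shown.

```python
import numpy as np

def sample_contrastive_lvm(S: np.ndarray, W: np.ndarray, n_background: int,
                           n_target: int, noise_std: float = 0.1, seed: int = 0):
    """Draw synthetic data from an assumed linear-Gaussian contrastive LVM.

    background: x = S z + eps
    target:     y = S z + W t + eps
    S (d x k_shared) is shared by both datasets; W (d x k_target) affects only the target.
    """
    rng = np.random.default_rng(seed)
    d = S.shape[0]
    z_b = rng.standard_normal((n_background, S.shape[1]))   # shared latents, background
    background = z_b @ S.T + noise_std * rng.standard_normal((n_background, d))
    z_t = rng.standard_normal((n_target, S.shape[1]))       # shared latents, target
    t = rng.standard_normal((n_target, W.shape[1]))         # target-only latents
    target = z_t @ S.T + t @ W.T + noise_std * rng.standard_normal((n_target, d))
    return background, target

S = np.array([[1.0], [0.0], [0.0]])  # shared variation along the first axis
W = np.array([[0.0], [1.0], [0.0]])  # target-enriched variation along the second axis
bg, tg = sample_contrastive_lvm(S, W, 2000, 2000)
print(bg.var(axis=0).round(2), tg.var(axis=0).round(2))
```

Both datasets vary along the first axis, but only the target varies along the second; recovering that second direction from unpaired sets is the kind of enriched signal the model is meant to find.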
Submitted 14 November, 2018;
originally announced November 2018.