-
From Embeddings to Accuracy: Comparing Foundation Models for Radiographic Classification
Authors:
Xue Li,
Jameson Merkow,
Noel C. F. Codella,
Alberto Santamaria-Pang,
Naiteek Sangani,
Alexander Ersoy,
Christopher Burt,
John W. Garrett,
Richard J. Bruce,
Joshua D. Warner,
Tyler Bradshaw,
Ivan Tarapov,
Matthew P. Lungren,
Alan B. McMillan
Abstract:
Foundation models, pretrained on extensive datasets, have significantly advanced machine learning by providing robust and transferable embeddings applicable to various domains, including medical imaging diagnostics. This study evaluates the utility of embeddings derived from both general-purpose and medical domain-specific foundation models for training lightweight adapter models in multi-class ra…
▽ More
Foundation models, pretrained on extensive datasets, have significantly advanced machine learning by providing robust and transferable embeddings applicable to various domains, including medical imaging diagnostics. This study evaluates the utility of embeddings derived from both general-purpose and medical domain-specific foundation models for training lightweight adapter models in multi-class radiography classification, focusing specifically on tube placement assessment. A dataset comprising 8842 radiographs classified into seven distinct categories was employed to extract embeddings using six foundation models: DenseNet121, BiomedCLIP, Med-Flamingo, MedImageInsight, Rad-DINO, and CXR-Foundation. Adapter models were subsequently trained using classical machine learning algorithms. Among these combinations, MedImageInsight embeddings paired with an support vector machine adapter yielded the highest mean area under the curve (mAUC) at 93.8%, followed closely by Rad-DINO (91.1%) and CXR-Foundation (89.0%). In comparison, BiomedCLIP and DenseNet121 exhibited moderate performance with mAUC scores of 83.0% and 81.8%, respectively, whereas Med-Flamingo delivered the lowest performance at 75.1%. Notably, most adapter models demonstrated computational efficiency, achieving training within one minute and inference within seconds on CPU, underscoring their practicality for clinical applications. Furthermore, fairness analyses on adapters trained on MedImageInsight-derived embeddings indicated minimal disparities, with gender differences in performance within 2% and standard deviations across age groups not exceeding 3%. These findings confirm that foundation model embeddings-especially those from MedImageInsight-facilitate accurate, computationally efficient, and equitable diagnostic classification using lightweight adapters for radiographic image analysis.
△ Less
Submitted 15 May, 2025;
originally announced May 2025.
-
WaveFormer: A 3D Transformer with Wavelet-Driven Feature Representation for Efficient Medical Image Segmentation
Authors:
Md Mahfuz Al Hasan,
Mahdi Zaman,
Abdul Jawad,
Alberto Santamaria-Pang,
Ho Hin Lee,
Ivan Tarapov,
Kyle See,
Md Shah Imran,
Antika Roy,
Yaser Pourmohammadi Fallah,
Navid Asadizanjani,
Reza Forghani
Abstract:
Transformer-based architectures have advanced medical image analysis by effectively modeling long-range dependencies, yet they often struggle in 3D settings due to substantial memory overhead and insufficient capture of fine-grained local features. We address these limitations with WaveFormer, a novel 3D-transformer that: i) leverages the fundamental frequency-domain properties of features for con…
▽ More
Transformer-based architectures have advanced medical image analysis by effectively modeling long-range dependencies, yet they often struggle in 3D settings due to substantial memory overhead and insufficient capture of fine-grained local features. We address these limitations with WaveFormer, a novel 3D-transformer that: i) leverages the fundamental frequency-domain properties of features for contextual representation, and ii) is inspired by the top-down mechanism of the human visual recognition system, making it a biologically motivated architecture. By employing discrete wavelet transformations (DWT) at multiple scales, WaveFormer preserves both global context and high-frequency details while replacing heavy upsampling layers with efficient wavelet-based summarization and reconstruction. This significantly reduces the number of parameters, which is critical for real-world deployment where computational resources and training times are constrained. Furthermore, the model is generic and easily adaptable to diverse applications. Evaluations on BraTS2023, FLARE2021, and KiTS2023 demonstrate performance on par with state-of-the-art methods while offering substantially lower computational complexity.
△ Less
Submitted 31 March, 2025; v1 submitted 31 March, 2025;
originally announced March 2025.
-
Multi-Modal Mamba Modeling for Survival Prediction (M4Survive): Adapting Joint Foundation Model Representations
Authors:
Ho Hin Lee,
Alberto Santamaria-Pang,
Jameson Merkov,
Matthew Lungren,
Ivan Tarapov
Abstract:
Accurate survival prediction in oncology requires integrating diverse imaging modalities to capture the complex interplay of tumor biology. Traditional single-modality approaches often fail to leverage the complementary insights provided by radiological and pathological assessments. In this work, we introduce M4Survive (Multi-Modal Mamba Modeling for Survival Prediction), a novel framework that le…
▽ More
Accurate survival prediction in oncology requires integrating diverse imaging modalities to capture the complex interplay of tumor biology. Traditional single-modality approaches often fail to leverage the complementary insights provided by radiological and pathological assessments. In this work, we introduce M4Survive (Multi-Modal Mamba Modeling for Survival Prediction), a novel framework that learns joint foundation model representations using efficient adapter networks. Our approach dynamically fuses heterogeneous embeddings from a foundation model repository (e.g., MedImageInsight, BiomedCLIP, Prov-GigaPath, UNI2-h), creating a correlated latent space optimized for survival risk estimation. By leveraging Mamba-based adapters, M4Survive enables efficient multi-modal learning while preserving computational efficiency. Experimental evaluations on benchmark datasets demonstrate that our approach outperforms both unimodal and traditional static multi-modal baselines in survival prediction accuracy. This work underscores the potential of foundation model-driven multi-modal fusion in advancing precision oncology and predictive analytics.
△ Less
Submitted 13 March, 2025;
originally announced March 2025.
-
OPTIC: Optimizing Patient-Provider Triaging & Improving Communications in Clinical Operations using GPT-4 Data Labeling and Model Distillation
Authors:
Alberto Santamaria-Pang,
Frank Tuan,
Ross Campbell,
Cindy Zhang,
Ankush Jindal,
Roopa Surapur,
Brad Holloman,
Deanna Hanisch,
Rae Buckley,
Carisa Cooney,
Ivan Tarapov,
Kimberly S. Peairs,
Brian Hasselfeld,
Peter Greene
Abstract:
The COVID-19 pandemic has accelerated the adoption of telemedicine and patient messaging through electronic medical portals (patient medical advice requests, or PMARs). While these platforms enhance patient access to healthcare, they have also increased the burden on healthcare providers due to the surge in PMARs. This study seeks to develop an efficient tool for message triaging to reduce physici…
▽ More
The COVID-19 pandemic has accelerated the adoption of telemedicine and patient messaging through electronic medical portals (patient medical advice requests, or PMARs). While these platforms enhance patient access to healthcare, they have also increased the burden on healthcare providers due to the surge in PMARs. This study seeks to develop an efficient tool for message triaging to reduce physician workload and improve patient-provider communication. We developed OPTIC (Optimizing Patient-Provider Triaging & Improving Communications in Clinical Operations), a powerful message triaging tool that utilizes GPT-4 for data labeling and BERT for model distillation. The study used a dataset of 405,487 patient messaging encounters from Johns Hopkins Medicine between January and June 2020. High-quality labeled data was generated through GPT-4-based prompt engineering, which was then used to train a BERT model to classify messages as "Admin" or "Clinical." The BERT model achieved 88.85% accuracy on the test set validated by GPT-4 labeling, with a sensitivity of 88.29%, specificity of 89.38%, and an F1 score of 0.8842. BERTopic analysis identified 81 distinct topics within the test data, with over 80% accuracy in classifying 58 topics. The system was successfully deployed through Epic's Nebula Cloud Platform, demonstrating its practical effectiveness in healthcare settings.
△ Less
Submitted 5 February, 2025;
originally announced March 2025.
-
MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging
Authors:
Noel C. F. Codella,
Ying Jin,
Shrey Jain,
Yu Gu,
Ho Hin Lee,
Asma Ben Abacha,
Alberto Santamaria-Pang,
Will Guyman,
Naiteek Sangani,
Sheng Zhang,
Hoifung Poon,
Stephanie Hyland,
Shruthi Bannur,
Javier Alvarez-Valle,
Xue Li,
John Garrett,
Alan McMillan,
Gaurav Rajguru,
Madhu Maddi,
Nilesh Vijayrania,
Rehaan Bhimai,
Nick Mecklenburg,
Rupal Jain,
Daniel Holstein,
Naveen Gaur
, et al. (6 additional authors not shown)
Abstract:
In this work, we present MedImageInsight, an open-source medical imaging embedding model. MedImageInsight is trained on medical images with associated text and labels across a diverse collection of domains, including X-Ray, CT, MRI, dermoscopy, OCT, fundus photography, ultrasound, histopathology, and mammography. Rigorous evaluations demonstrate MedImageInsight's ability to achieve state-of-the-ar…
▽ More
In this work, we present MedImageInsight, an open-source medical imaging embedding model. MedImageInsight is trained on medical images with associated text and labels across a diverse collection of domains, including X-Ray, CT, MRI, dermoscopy, OCT, fundus photography, ultrasound, histopathology, and mammography. Rigorous evaluations demonstrate MedImageInsight's ability to achieve state-of-the-art (SOTA) or human expert level performance across classification, image-image search, and fine-tuning tasks. Specifically, on public datasets, MedImageInsight achieves SOTA in CT 3D medical image retrieval, as well as SOTA in disease classification and search for chest X-ray, dermatology, and OCT imaging. Furthermore, MedImageInsight achieves human expert performance in bone age estimation (on both public and partner data), as well as AUC above 0.9 in most other domains. When paired with a text decoder, MedImageInsight achieves near SOTA level single image report findings generation with less than 10\% the parameters of other models. Compared to fine-tuning GPT-4o with only MIMIC-CXR data for the same task, MedImageInsight outperforms in clinical metrics, but underperforms on lexical metrics where GPT-4o sets a new SOTA. Importantly for regulatory purposes, MedImageInsight can generate ROC curves, adjust sensitivity and specificity based on clinical need, and provide evidence-based decision support through image-image search (which can also enable retrieval augmented generation). In an independent clinical evaluation of image-image search in chest X-ray, MedImageInsight outperformed every other publicly available foundation model evaluated by large margins (over 6 points AUC), and significantly outperformed other models in terms of AI fairness (across age and gender). We hope releasing MedImageInsight will help enhance collective progress in medical imaging AI research and development.
△ Less
Submitted 9 October, 2024;
originally announced October 2024.
-
Emergent Language Symbolic Autoencoder (ELSA) with Weak Supervision to Model Hierarchical Brain Networks
Authors:
Ammar Ahmed Pallikonda Latheef,
Alberto Santamaria-Pang,
Craig K Jones,
Haris I Sair
Abstract:
Brain networks display a hierarchical organization, a complexity that poses a challenge for existing deep learning models, often structured as flat classifiers, leading to difficulties in interpretability and the 'black box' issue. To bridge this gap, we propose a novel architecture: a symbolic autoencoder informed by weak supervision and an Emergent Language (EL) framework. This model moves beyon…
▽ More
Brain networks display a hierarchical organization, a complexity that poses a challenge for existing deep learning models, often structured as flat classifiers, leading to difficulties in interpretability and the 'black box' issue. To bridge this gap, we propose a novel architecture: a symbolic autoencoder informed by weak supervision and an Emergent Language (EL) framework. This model moves beyond traditional flat classifiers by producing hierarchical clusters and corresponding imagery, subsequently represented through symbolic sentences to improve the clinical interpretability of hierarchically organized data such as intrinsic brain networks, which can be characterized using resting-state fMRI images. Our innovation includes a generalized hierarchical loss function designed to ensure that both sentences and images accurately reflect the hierarchical structure of functional brain networks. This enables us to model functional brain networks from a broader perspective down to more granular details. Furthermore, we introduce a quantitative method to assess the hierarchical consistency of these symbolic representations. Our qualitative analyses show that our model successfully generates hierarchically organized, clinically interpretable images, a finding supported by our quantitative evaluations. We find that our best performing loss function leads to a hierarchical consistency of over 97% when identifying images corresponding to brain networks. This approach not only advances the interpretability of deep learning models in neuroimaging analysis but also represents a significant step towards modeling the intricate hierarchical nature of brain networks.
△ Less
Submitted 15 April, 2024;
originally announced April 2024.
-
Foundation Models for Biomedical Image Segmentation: A Survey
Authors:
Ho Hin Lee,
Yu Gu,
Theodore Zhao,
Yanbo Xu,
Jianwei Yang,
Naoto Usuyama,
Cliff Wong,
Mu Wei,
Bennett A. Landman,
Yuankai Huo,
Alberto Santamaria-Pang,
Hoifung Poon
Abstract:
Recent advancements in biomedical image analysis have been significantly driven by the Segment Anything Model (SAM). This transformative technology, originally developed for general-purpose computer vision, has found rapid application in medical image processing. Within the last year, marked by over 100 publications, SAM has demonstrated its prowess in zero-shot learning adaptations for medical im…
▽ More
Recent advancements in biomedical image analysis have been significantly driven by the Segment Anything Model (SAM). This transformative technology, originally developed for general-purpose computer vision, has found rapid application in medical image processing. Within the last year, marked by over 100 publications, SAM has demonstrated its prowess in zero-shot learning adaptations for medical imaging. The fundamental premise of SAM lies in its capability to segment or identify objects in images without prior knowledge of the object type or imaging modality. This approach aligns well with tasks achievable by the human visual system, though its application in non-biological vision contexts remains more theoretically challenging. A notable feature of SAM is its ability to adjust segmentation according to a specified resolution scale or area of interest, akin to semantic priming. This adaptability has spurred a wave of creativity and innovation in applying SAM to medical imaging. Our review focuses on the period from April 1, 2023, to September 30, 2023, a critical first six months post-initial publication. We examine the adaptations and integrations of SAM necessary to address longstanding clinical challenges, particularly in the context of 33 open datasets covered in our analysis. While SAM approaches or achieves state-of-the-art performance in numerous applications, it falls short in certain areas, such as segmentation of the carotid artery, adrenal glands, optic nerve, and mandible bone. Our survey delves into the innovative techniques where SAM's foundational approach excels and explores the core concepts in translating and applying these models effectively in diverse medical imaging scenarios.
△ Less
Submitted 15 January, 2024;
originally announced January 2024.
-
3D-MIR: A Benchmark and Empirical Study on 3D Medical Image Retrieval in Radiology
Authors:
Asma Ben Abacha,
Alberto Santamaria-Pang,
Ho Hin Lee,
Jameson Merkow,
Qin Cai,
Surya Teja Devarakonda,
Abdullah Islam,
Julia Gong,
Matthew P. Lungren,
Thomas Lin,
Noel C Codella,
Ivan Tarapov
Abstract:
The increasing use of medical imaging in healthcare settings presents a significant challenge due to the increasing workload for radiologists, yet it also offers opportunity for enhancing healthcare outcomes if effectively leveraged. 3D image retrieval holds potential to reduce radiologist workloads by enabling clinicians to efficiently search through diagnostically similar or otherwise relevant c…
▽ More
The increasing use of medical imaging in healthcare settings presents a significant challenge due to the increasing workload for radiologists, yet it also offers opportunity for enhancing healthcare outcomes if effectively leveraged. 3D image retrieval holds potential to reduce radiologist workloads by enabling clinicians to efficiently search through diagnostically similar or otherwise relevant cases, resulting in faster and more precise diagnoses. However, the field of 3D medical image retrieval is still emerging, lacking established evaluation benchmarks, comprehensive datasets, and thorough studies. This paper attempts to bridge this gap by introducing a novel benchmark for 3D Medical Image Retrieval (3D-MIR) that encompasses four different anatomies imaged with computed tomography. Using this benchmark, we explore a diverse set of search strategies that use aggregated 2D slices, 3D volumes, and multi-modal embeddings from popular multi-modal foundation models as queries. Quantitative and qualitative assessments of each approach are provided alongside an in-depth discussion that offers insight for future research. To promote the advancement of this field, our benchmark, dataset, and code are made publicly available.
△ Less
Submitted 22 November, 2023;
originally announced November 2023.
-
Region-based Contrastive Pretraining for Medical Image Retrieval with Anatomic Query
Authors:
Ho Hin Lee,
Alberto Santamaria-Pang,
Jameson Merkow,
Ozan Oktay,
Fernando Pérez-García,
Javier Alvarez-Valle,
Ivan Tarapov
Abstract:
We introduce a novel Region-based contrastive pretraining for Medical Image Retrieval (RegionMIR) that demonstrates the feasibility of medical image retrieval with similar anatomical regions. RegionMIR addresses two major challenges for medical image retrieval i) standardization of clinically relevant searching criteria (e.g., anatomical, pathology-based), and ii) localization of anatomical area o…
▽ More
We introduce a novel Region-based contrastive pretraining for Medical Image Retrieval (RegionMIR) that demonstrates the feasibility of medical image retrieval with similar anatomical regions. RegionMIR addresses two major challenges for medical image retrieval i) standardization of clinically relevant searching criteria (e.g., anatomical, pathology-based), and ii) localization of anatomical area of interests that are semantically meaningful. In this work, we propose an ROI image retrieval image network that retrieves images with similar anatomy by extracting anatomical features (via bounding boxes) and evaluate similarity between pairwise anatomy-categorized features between the query and the database of images using contrastive learning. ROI queries are encoded using a contrastive-pretrained encoder that was fine-tuned for anatomy classification, which generates an anatomical-specific latent space for region-correlated image retrieval. During retrieval, we compare the anatomically encoded query to find similar features within a feature database generated from training samples, and retrieve images with similar regions from training samples. We evaluate our approach on both anatomy classification and image retrieval tasks using the Chest ImaGenome Dataset. Our proposed strategy yields an improvement over state-of-the-art pretraining and co-training strategies, from 92.24 to 94.12 (2.03%) classification accuracy in anatomies. We qualitatively evaluate the image retrieval performance demonstrating generalizability across multiple anatomies with different morphology.
△ Less
Submitted 9 May, 2023;
originally announced May 2023.
-
Deep Labeling of fMRI Brain Networks
Authors:
Ammar Ahmed Pallikonda Latheef,
Sejal Ghate,
Zhipeng Hui,
Alberto Santamaria-Pang,
Ivan Tarapov,
Haris I Sair,
Craig K Jones
Abstract:
Resting State Networks (RSNs) of the brain extracted from Resting State functional Magnetic Resonance Imaging (RS-fMRI) are used in the pre-surgical planning to guide the neurosurgeon. This is difficult, though, as expert knowledge is required to label each of the RSNs. There is a lack of efficient and standardized methods to be used in clinical workflows. Additionally, these methods need to be ge…
▽ More
Resting State Networks (RSNs) of the brain extracted from Resting State functional Magnetic Resonance Imaging (RS-fMRI) are used in the pre-surgical planning to guide the neurosurgeon. This is difficult, though, as expert knowledge is required to label each of the RSNs. There is a lack of efficient and standardized methods to be used in clinical workflows. Additionally, these methods need to be generalizable since the method needs to work well regardless of the acquisition technique. We propose an accurate, fast, and lightweight deep learning approach to label RSNs. Group Independent Component Analysis (ICA) was used to extract large scale functional connectivity patterns in the cohort and dual regression was used to back project them on individual subject RSNs. We compare a Multi-Layer Perceptron (MLP) based method with 2D and 3D Convolutional Neural Networks (CNNs) and find that the MLP is faster and more accurate. The MLP method performs as good or better than other works despite its compact size. We prove the generalizability of our method by showing that the MLP performs at 100% accuracy in the holdout dataset and 98.3% accuracy in three other sites' fMRI acquisitions.
△ Less
Submitted 5 May, 2023;
originally announced May 2023.
-
Deep Labeling of fMRI Brain Networks Using Cloud Based Processing
Authors:
Sejal Ghate,
Alberto Santamaria-Pang,
Ivan Tarapov,
Haris I Sair,
Craig K Jones
Abstract:
Resting state fMRI is an imaging modality which reveals brain activity localization through signal changes, in what is known as Resting State Networks (RSNs). This technique is gaining popularity in neurosurgical pre-planning to visualize the functional regions and assess regional activity. Labeling of rs-fMRI networks require subject-matter expertise and is time consuming, creating a need for an…
▽ More
Resting state fMRI is an imaging modality which reveals brain activity localization through signal changes, in what is known as Resting State Networks (RSNs). This technique is gaining popularity in neurosurgical pre-planning to visualize the functional regions and assess regional activity. Labeling of rs-fMRI networks require subject-matter expertise and is time consuming, creating a need for an automated classification algorithm. While the impact of AI in medical diagnosis has shown great progress; deploying and maintaining these in a clinical setting is an unmet need. We propose an end-to-end reproducible pipeline which incorporates image processing of rs-fMRI in a cloud-based workflow while using deep learning to automate the classification of RSNs. We have architected a reproducible Azure Machine Learning cloud-based medical imaging concept pipeline for fMRI analysis integrating the popular FMRIB Software Library (FSL) toolkit. To demonstrate a clinical application using a large dataset, we compare three neural network architectures for classification of deeper RSNs derived from processed rs-fMRI. The three algorithms are: an MLP, a 2D projection-based CNN, and a fully 3D CNN classification networks. Each of the net-works was trained on the rs-fMRI back-projected independent components giving >98% accuracy for each classification method.
△ Less
Submitted 20 September, 2022; v1 submitted 16 September, 2022;
originally announced September 2022.
-
Adversarial Attacks with Time-Scale Representations
Authors:
Alberto Santamaria-Pang,
Jianwei Qiu,
Aritra Chowdhury,
James Kubricht,
Peter Tu,
Iyer Naresh,
Nurali Virani
Abstract:
We propose a novel framework for real-time black-box universal attacks which disrupts activations of early convolutional layers in deep learning models. Our hypothesis is that perturbations produced in the wavelet space disrupt early convolutional layers more effectively than perturbations performed in the time domain. The main challenge in adversarial attacks is to preserve low frequency image co…
▽ More
We propose a novel framework for real-time black-box universal attacks which disrupts activations of early convolutional layers in deep learning models. Our hypothesis is that perturbations produced in the wavelet space disrupt early convolutional layers more effectively than perturbations performed in the time domain. The main challenge in adversarial attacks is to preserve low frequency image content while minimally changing the most meaningful high frequency content. To address this, we formulate an optimization problem using time-scale (wavelet) representations as a dual space in three steps. First, we project original images into orthonormal sub-spaces for low and high scales via wavelet coefficients. Second, we perturb wavelet coefficients for high scale projection using a generator network. Third, we generate new adversarial images by projecting back the original coefficients from the low scale and the perturbed coefficients from the high scale sub-space. We provide a theoretical framework that guarantees a dual mapping from time and time-scale domain representations. We compare our results with state-of-the-art black-box attacks from generative-based and gradient-based models. We also verify efficacy against multiple defense methods such as JPEG compression, Guided Denoiser and Comdefend. Our results show that wavelet-based perturbations consistently outperform time-based attacks thus providing new insights into vulnerabilities of deep learning models and could potentially lead to robust architectures or new defense and attack mechanisms by leveraging time-scale representations.
△ Less
Submitted 26 July, 2021;
originally announced July 2021.
-
Symbolic Semantic Segmentation and Interpretation of COVID-19 Lung Infections in Chest CT volumes based on Emergent Languages
Authors:
Aritra Chowdhury,
Alberto Santamaria-Pang,
James R. Kubricht,
Jianwei Qiu,
Peter Tu
Abstract:
The coronavirus disease (COVID-19) has resulted in a pandemic crippling the a breadth of services critical to daily life. Segmentation of lung infections in computerized tomography (CT) slices could be be used to improve diagnosis and understanding of COVID-19 in patients. Deep learning systems lack interpretability because of their black box nature. Inspired by human communication of complex idea…
▽ More
The coronavirus disease (COVID-19) has resulted in a pandemic crippling the a breadth of services critical to daily life. Segmentation of lung infections in computerized tomography (CT) slices could be be used to improve diagnosis and understanding of COVID-19 in patients. Deep learning systems lack interpretability because of their black box nature. Inspired by human communication of complex ideas through language, we propose a symbolic framework based on emergent languages for the segmentation of COVID-19 infections in CT scans of lungs. We model the cooperation between two artificial agents - a Sender and a Receiver. These agents synergistically cooperate using emergent symbolic language to solve the task of semantic segmentation. Our game theoretic approach is to model the cooperation between agents unlike Generative Adversarial Networks (GANs). The Sender retrieves information from one of the higher layers of the deep network and generates a symbolic sentence sampled from a categorical distribution of vocabularies. The Receiver ingests the stream of symbols and cogenerates the segmentation mask. A private emergent language is developed that forms the communication channel used to describe the task of segmentation of COVID infections. We augment existing state of the art semantic segmentation architectures with our symbolic generator to form symbolic segmentation models. Our symbolic segmentation framework achieves state of the art performance for segmentation of lung infections caused by COVID-19. Our results show direct interpretation of symbolic sentences to discriminate between normal and infected regions, infection morphology and image characteristics. We show state of the art results for segmentation of COVID-19 lung infections in CT.
△ Less
Submitted 22 August, 2020;
originally announced August 2020.
-
Emergent symbolic language based deep medical image classification
Authors:
Aritra Chowdhury,
Alberto Santamaria-Pang,
James R. Kubricht,
Peter Tu
Abstract:
Modern deep learning systems for medical image classification have demonstrated exceptional capabilities for distinguishing between image based medical categories. However, they are severely hindered by their ina-bility to explain the reasoning behind their decision making. This is partly due to the uninterpretable continuous latent representations of neural net-works. Emergent languages (EL) have…
▽ More
Modern deep learning systems for medical image classification have demonstrated exceptional capabilities for distinguishing between image based medical categories. However, they are severely hindered by their ina-bility to explain the reasoning behind their decision making. This is partly due to the uninterpretable continuous latent representations of neural net-works. Emergent languages (EL) have recently been shown to enhance the capabilities of neural networks by equipping them with symbolic represen-tations in the framework of referential games. Symbolic representations are one of the cornerstones of highly explainable good old fashioned AI (GOFAI) systems. In this work, we demonstrate for the first time, the emer-gence of deep symbolic representations of emergent language in the frame-work of image classification. We show that EL based classification models can perform as well as, if not better than state of the art deep learning mod-els. In addition, they provide a symbolic representation that opens up an entire field of possibilities of interpretable GOFAI methods involving symbol manipulation. We demonstrate the EL classification framework on immune cell marker based cell classification and chest X-ray classification using the CheXpert dataset. Code is available online at https://github.com/AriChow/EL.
△ Less
Submitted 22 August, 2020;
originally announced August 2020.
-
Automated Phenotyping via Cell Auto Training (CAT) on the Cell DIVE Platform
Authors:
Alberto Santamaria-Pang,
Anup Sood,
Dan Meyer,
Aritra Chowdhury,
Fiona Ginty
Abstract:
We present a method for automatic cell classification in tissue samples using an automated training set from multiplexed immunofluorescence images. The method utilizes multiple markers stained in situ on a single tissue section on a robust hyperplex immunofluorescence platform (Cell DIVE, GE Healthcare) that provides multi-channel images allowing analysis at single cell/sub-cellular levels. The ce…
▽ More
We present a method for automatic cell classification in tissue samples using an automated training set from multiplexed immunofluorescence images. The method utilizes multiple markers stained in situ on a single tissue section on a robust hyperplex immunofluorescence platform (Cell DIVE, GE Healthcare) that provides multi-channel images allowing analysis at single cell/sub-cellular levels. The cell classification method consists of two steps: first, an automated training set from every image is generated using marker-to-cell staining information. This mimics how a pathologist would select samples from a very large cohort at the image level. In the second step, a probability model is inferred from the automated training set. The probabilistic model captures staining patterns in mutually exclusive cell types and builds a single probability model for the data cohort. We have evaluated the proposed approach to classify: i) immune cells in cancer and ii) brain cells in neurological degenerative diseased tissue with average accuracies above 95%.
△ Less
Submitted 18 July, 2020;
originally announced July 2020.
-
ESCELL: Emergent Symbolic Cellular Language
Authors:
Aritra Chowdhury,
James R. Kubricht,
Anup Sood,
Peter Tu,
Alberto Santamaria-Pang
Abstract:
We present ESCELL, a method for developing an emergent symbolic language of communication between multiple agents reasoning about cells. We show how agents are able to cooperate and communicate successfully in the form of symbols similar to human language to accomplish a task in the form of a referential game (Lewis' signaling game). In one form of the game, a sender and a receiver observe a set o…
▽ More
We present ESCELL, a method for developing an emergent symbolic language of communication between multiple agents reasoning about cells. We show how agents are able to cooperate and communicate successfully in the form of symbols similar to human language to accomplish a task in the form of a referential game (Lewis' signaling game). In one form of the game, a sender and a receiver observe a set of cells from 5 different cell phenotypes. The sender is told one cell is a target and is allowed to send one symbol to the receiver from a fixed arbitrary vocabulary size. The receiver relies on the information in the symbol to identify the target cell. We train the sender and receiver networks to develop an innate emergent language between themselves to accomplish this task. We observe that the networks are able to successfully identify cells from 5 different phenotypes with an accuracy of 93.2%. We also introduce a new form of the signaling game where the sender is shown one image instead of all the images that the receiver sees. The networks successfully develop an emergent language to get an identification accuracy of 77.8%.
△ Less
Submitted 18 July, 2020;
originally announced July 2020.
-
Towards Emergent Language Symbolic Semantic Segmentation and Model Interpretability
Authors:
Alberto Santamaria-Pang,
James Kubricht,
Aritra Chowdhury,
Chitresh Bhushan,
Peter Tu
Abstract:
Recent advances in methods focused on the grounding problem have resulted in techniques that can be used to construct a symbolic language associated with a specific domain. Inspired by how humans communicate complex ideas through language, we developed a generalized Symbolic Semantic ($\text{S}^2$) framework for interpretable segmentation. Unlike adversarial models (e.g., GANs), we explicitly mode…
▽ More
Recent advances in methods focused on the grounding problem have resulted in techniques that can be used to construct a symbolic language associated with a specific domain. Inspired by how humans communicate complex ideas through language, we developed a generalized Symbolic Semantic ($\text{S}^2$) framework for interpretable segmentation. Unlike adversarial models (e.g., GANs), we explicitly model cooperation between two agents, a Sender and a Receiver, that must cooperate to achieve a common goal. The Sender receives information from a high layer of a segmentation network and generates a symbolic sentence derived from a categorical distribution. The Receiver obtains the symbolic sentences and co-generates the segmentation mask. In order for the model to converge, the Sender and Receiver must learn to communicate using a private language. We apply our architecture to segment tumors in the TCGA dataset. A UNet-like architecture is used to generate input to the Sender network which produces a symbolic sentence, and a Receiver network co-generates the segmentation mask based on the sentence. Our Segmentation framework achieved similar or better performance compared with state-of-the-art segmentation methods. In addition, our results suggest direct interpretation of the symbolic sentences to discriminate between normal and tumor tissue, tumor morphology, and other image characteristics.
△ Less
Submitted 4 August, 2020; v1 submitted 18 July, 2020;
originally announced July 2020.