-
Unsupervised Discovery of Clinical Disease Signatures Using Probabilistic Independence
Authors:
Thomas A. Lasko,
John M. Still,
Thomas Z. Li,
Marco Barbero Mota,
William W. Stead,
Eric V. Strobl,
Bennett A. Landman,
Fabien Maldonado
Abstract:
Insufficiently precise diagnosis of clinical disease is likely responsible for many treatment failures, even for common conditions and treatments. With a large enough dataset, it may be possible to use unsupervised machine learning to define clinical disease patterns more precisely. We present an approach to learning these patterns by using probabilistic independence to disentangle the imprint on the medical record of causal latent sources of disease. We inferred a broad set of 2000 clinical signatures of latent sources from 9195 variables in 269,099 Electronic Health Records. The learned signatures produced better discrimination than the original variables in a lung cancer prediction task unknown to the inference algorithm, predicting 3-year malignancy in patients with no history of cancer before a solitary lung nodule was discovered. More importantly, the signatures' greater explanatory power identified pre-nodule signatures of apparently undiagnosed cancer in many of those patients.
Submitted 8 February, 2024;
originally announced February 2024.
-
Semi-supervised Contrastive Learning Using Partial Label Information
Authors:
Colin B. Hansen,
Vishwesh Nath,
Diego A. Mesa,
Yuankai Huo,
Bennett A. Landman,
Thomas A. Lasko
Abstract:
In semi-supervised learning, information from unlabeled examples is used to improve the model learned from labeled examples. In some learning problems, partial label information can be inferred from otherwise unlabeled examples and used to further improve the model. In particular, partial label information exists when subsets of training examples are known to have the same label, even though the label itself is missing. By encouraging the model to give the same label to all such examples through contrastive learning objectives, we can potentially improve its performance. We call this encouragement Nullspace Tuning because the difference vector between any pair of examples with the same label should lie in the nullspace of a linear model. In this paper, we investigate the benefit of using partial label information using a careful comparison framework over well-characterized public datasets. We show that the additional information provided by partial labels reduces test error over good semi-supervised methods usually by a factor of 2, up to a factor of 5.5 in the best case. We also show that adding Nullspace Tuning to the newer and state-of-the-art MixMatch method decreases its test error by up to a factor of 1.8.
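The nullspace idea in this abstract can be illustrated with a minimal sketch. It assumes a plain linear model with weight matrix W and penalizes the squared norm of W applied to the difference between two examples known to share a label. The function name `nullspace_penalty` and the toy data are hypothetical; the paper's own contrastive objectives are not reproduced here.

```python
import numpy as np

def nullspace_penalty(W, x_a, x_b):
    """Squared norm of W @ (x_a - x_b).

    The value is zero exactly when the difference vector lies in the
    nullspace of W, i.e. when the linear model maps both examples to
    the same output.
    """
    diff = np.asarray(x_a, dtype=float) - np.asarray(x_b, dtype=float)
    return float(np.sum((W @ diff) ** 2))

# Toy linear model: the nullspace of W is spanned by (0, 0, 1).
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

x_a = np.array([2.0, 3.0, 5.0])
x_b = np.array([2.0, 3.0, -1.0])  # differs from x_a only along the nullspace
x_c = np.array([4.0, 3.0, 5.0])   # differs along a direction W responds to

same_label_penalty = nullspace_penalty(W, x_a, x_b)
other_penalty = nullspace_penalty(W, x_a, x_c)
```

In training, a term of this kind would be added to the usual loss for every pair known to share a label, so that the model is pushed toward giving both examples the same output.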
Submitted 3 June, 2024; v1 submitted 17 March, 2020;
originally announced March 2020.
-
MRI correlates of chronic symptoms in mild traumatic brain injury
Authors:
Cailey I. Kerley,
Kurt G. Schilling,
Justin Blaber,
Beth Miller,
Allen Newton,
Adam W. Anderson,
Bennett A. Landman,
Tonia S. Rex
Abstract:
Veterans with mild traumatic brain injury (mTBI) have reported auditory and visual dysfunction that persists beyond the acute incident. The etiology behind these symptoms is difficult to characterize with current clinical imaging. These functional deficits may be caused by shear injury or micro-bleeds, which can be detected with special imaging modalities. We explore these hypotheses in a pilot study of multi-parametric MRI. We extract over 1,000 imaging and clinical metrics and project them to a low-dimensional space, where we can discriminate between healthy controls and patients with mTBI. We also show correlations between the metric representations and patient symptoms.
Submitted 22 June, 2020; v1 submitted 6 December, 2019;
originally announced December 2019.
-
Methods and open-source toolkit for analyzing and visualizing challenge results
Authors:
Manuel Wiesenfarth,
Annika Reinke,
Bennett A. Landman,
Manuel Jorge Cardoso,
Lena Maier-Hein,
Annette Kopp-Schneider
Abstract:
Biomedical challenges have become the de facto standard for benchmarking biomedical image analysis algorithms. While the number of challenges is steadily increasing, surprisingly little effort has been invested in ensuring high quality design, execution and reporting for these international competitions. Specifically, results analysis and visualization in the event of uncertainties have been given almost no attention in the literature. Given these shortcomings, the contribution of this paper is two-fold: (1) We present a set of methods to comprehensively analyze and visualize the results of single-task and multi-task challenges and apply them to a number of simulated and real-life challenges to demonstrate their specific strengths and weaknesses; (2) We release the open-source framework challengeR as part of this work to enable fast and wide adoption of the methodology proposed in this paper. Our approach offers an intuitive way to gain important insights into the relative and absolute performance of algorithms, which cannot be revealed by commonly applied visualization techniques. This is demonstrated by the experiments performed within this work. Our framework could thus become an important tool for analyzing and visualizing challenge results in the field of biomedical image analysis and beyond.
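One way to examine ranking uncertainty of the kind this abstract discusses is to bootstrap the test cases and re-rank the algorithms on each resample. The sketch below is an illustration in Python under that assumption. It is not the challengeR API (challengeR is an R package), and the function name `bootstrap_ranks` and the toy scores are hypothetical.

```python
import numpy as np

def bootstrap_ranks(scores, n_boot=1000, seed=0):
    """Rank algorithms on bootstrap resamples of the test cases.

    scores: array of shape (n_cases, n_algorithms), higher is better.
    Returns an integer array of shape (n_boot, n_algorithms) where
    rank 1 is the best mean score in that resample.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    n_cases, n_algos = scores.shape
    ranks = np.empty((n_boot, n_algos), dtype=int)
    for b in range(n_boot):
        # Resample cases with replacement, then rank by mean score.
        sample = scores[rng.integers(0, n_cases, size=n_cases)]
        order = np.argsort(-sample.mean(axis=0))
        ranks[b, order] = np.arange(1, n_algos + 1)
    return ranks

# Toy data: algorithm 0 beats algorithm 2, which beats algorithm 1, on every case.
toy_scores = [[0.90, 0.50, 0.70],
              [0.80, 0.40, 0.75],
              [0.95, 0.60, 0.65]]
toy_ranks = bootstrap_ranks(toy_scores, n_boot=200)
```

With real challenge data, the spread of each column of the returned array indicates how stable an algorithm's rank is under resampling of the test cases.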
Submitted 5 December, 2019; v1 submitted 11 October, 2019;
originally announced October 2019.
-
Montage based 3D Medical Image Retrieval from Traumatic Brain Injury Cohort using Deep Convolutional Neural Network
Authors:
Cailey I. Kerley,
Yuankai Huo,
Shikha Chaganti,
Shunxing Bao,
Mayur B. Patel,
Bennett A. Landman
Abstract:
Brain imaging analysis on clinically acquired computed tomography (CT) is essential for the diagnosis, risk prediction of progression, and treatment of the structural phenotypes of traumatic brain injury (TBI). However, in real clinical imaging scenarios, entire body CT images (e.g., neck, abdomen, chest, pelvis) are typically captured along with whole brain CT scans. For instance, in a typical sample from a clinical TBI imaging cohort, only ~15% of CT scans actually contain whole brain CT images suitable for volumetric brain analyses; the remainder are partial brain or non-brain images. Therefore, a manual image retrieval process is typically required to isolate the whole brain CT scans from the entire cohort. However, manual image retrieval is time- and resource-consuming, and becomes even more difficult for larger cohorts. To reduce this manual effort, in this paper we propose an automated 3D medical image retrieval pipeline, called deep montage-based image retrieval (dMIR), which performs classification on 2D montage images via a deep convolutional neural network. The novelty of the proposed method is to characterize the medical image retrieval task based on the montage images. In a cohort of 2000 clinically acquired TBI scans, 794 scans were used as training data, 206 scans were used as validation data, and the remaining 1000 scans were used as testing data. The proposed method achieved accuracy=1.0, recall=1.0, precision=1.0, f1=1.0 on the validation data, and accuracy=0.988, recall=0.962, precision=0.962, f1=0.962 on the testing data. Thus, the proposed dMIR is able to perform accurate CT whole brain image retrieval from large-scale clinical cohorts.
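The abstract does not specify how the 2D montage images are laid out. As a minimal sketch, assuming a montage is a grid of evenly spaced axial slices, a 3D volume could be tiled as follows. The function name `make_montage`, the 4x4 grid, and the slice-selection rule are all assumptions for illustration.

```python
import numpy as np

def make_montage(volume, rows=4, cols=4):
    """Tile rows*cols evenly spaced slices of a (D, H, W) volume into one 2D image."""
    depth, height, width = volume.shape
    # Pick evenly spaced slice indices from the first to the last slice.
    idx = np.linspace(0, depth - 1, rows * cols).round().astype(int)
    slices = volume[idx]                               # (rows*cols, H, W)
    grid = slices.reshape(rows, cols, height, width)   # (rows, cols, H, W)
    # Interleave grid rows with image rows, and grid columns with image columns.
    return grid.transpose(0, 2, 1, 3).reshape(rows * height, cols * width)

# Toy volume: 32 slices of 8x8 voxels.
toy_volume = np.arange(32 * 8 * 8, dtype=float).reshape(32, 8, 8)
toy_montage = make_montage(toy_volume)
```

The resulting 2D array can then be passed to any standard 2D image classifier, which is the role the deep convolutional neural network plays in the pipeline described above.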
Submitted 10 December, 2018;
originally announced December 2018.
-
Data-driven Probabilistic Atlases Capture Whole-brain Individual Variation
Authors:
Yuankai Huo,
Katherine Swett,
Susan M. Resnick,
Laurie E. Cutting,
Bennett A. Landman
Abstract:
Probabilistic atlases provide essential spatial contextual information for image interpretation, Bayesian modeling, and algorithmic processing. Such atlases are typically constructed by grouping subjects with similar demographic information. Importantly, use of the same scanner minimizes inter-group variability. However, the generalizability and spatial specificity of such approaches are more limited than one might like. Inspired by Commowick's "Frankenstein's creature paradigm", which builds a person-specific anatomical atlas, we propose a data-driven framework to build a person-specific probabilistic atlas under a large-scale data scheme. The data-driven framework clusters regions with similar features using a point distribution model to learn different anatomical phenotypes. Regional structural atlases and corresponding regional probabilistic atlases are used as indices and targets in the dictionary. By indexing the dictionary, the whole-brain probabilistic atlases adapt to each new subject quickly and can be used as spatial priors for visualization and processing. The novelties of this approach are: (1) it provides a new perspective on generating person-specific whole-brain probabilistic atlases (132 regions) under a data-driven scheme across sites; (2) the framework employs a large amount of heterogeneous data (2349 images); and (3) the proposed framework achieves low computational cost, since only one affine registration and Pearson correlation operation are required for a new subject. Our method matches individual regions better, with higher Dice similarity values, when testing the probabilistic atlases. Importantly, the advantage of the large-scale scheme is demonstrated by the better performance obtained with large-scale training data (1888 images) than with a smaller training set (720 images).
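The dictionary indexing step can be sketched as a nearest-match lookup by Pearson correlation. The snippet below is a simplified illustration that assumes each dictionary index and the new subject's region are flattened feature vectors of equal length. The function name `best_dictionary_match` and the toy vectors are hypothetical, and the affine registration step is omitted.

```python
import numpy as np

def best_dictionary_match(query, dictionary):
    """Return the index of the dictionary entry most correlated with the query.

    query: 1D feature vector for one region of the new subject.
    dictionary: sequence of 1D feature vectors (the index atlases).
    """
    query = np.asarray(query, dtype=float).ravel()
    scores = [
        np.corrcoef(query, np.asarray(entry, dtype=float).ravel())[0, 1]
        for entry in dictionary
    ]
    return int(np.argmax(scores)), scores

# Toy dictionary of three index vectors and one query vector.
toy_dictionary = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 0, 1, 0]]
toy_query = [2, 4, 6, 8.5]
best_index, pearson_scores = best_dictionary_match(toy_query, toy_dictionary)
```

The selected index would then be used to retrieve the corresponding regional probabilistic atlas as the target for that region.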
Submitted 6 June, 2018;
originally announced June 2018.
-
Opportunities for Mining Radiology Archives for Pediatric Control Images
Authors:
Camilo Bermudez,
Varvara N Probst,
Larry T Davis,
Thomas Lasko,
Bennett A Landman
Abstract:
A large database of brain imaging data from healthy, normal controls is useful to describe physiologic and pathologic structural changes at a population scale. In particular, these data can provide information about structural changes throughout development and aging. However, scarcity of control data, as well as technical challenges during image acquisition, has made it difficult to collect large amounts of data in a healthy pediatric population. In this study, we search the medical record at Vanderbilt University Medical Center for pediatric patients who received brain imaging, either CT or MRI, according to 7 common complaints (headache, seizure, altered level of consciousness, nausea and vomiting, dizziness, head injury, and gait abnormalities) in order to find the percentage of studies that demonstrated pathologic findings. Using a text-search based algorithm, we show that an average of 59.3% of MRI studies and 37.3% of CT scans are classified as normal, resulting in the production of thousands of normal images. These results suggest there is a wealth of pediatric imaging control data which can be used to create normative descriptions of development as well as to establish biomarkers for disease.
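The abstract does not describe the text-search algorithm itself. As a minimal sketch of the general idea, assuming that a report is called normal when it contains one of a few fixed phrases, the percentage of normal studies could be computed as below. The phrase list, the function names, and the example reports are invented for illustration and are not taken from the study.

```python
import re

# Hypothetical phrases that mark a report as normal; not the study's actual list.
NORMAL_PATTERNS = [
    r"\bno acute intracranial abnormality\b",
    r"\bnormal (?:brain )?(?:mri|ct)\b",
    r"\bunremarkable\b",
]
_NORMAL_RE = re.compile("|".join(NORMAL_PATTERNS), re.IGNORECASE)

def is_normal_report(text):
    """Return True if the report text contains any of the normal phrases."""
    return _NORMAL_RE.search(text) is not None

def percent_normal(reports):
    """Percentage of reports classified as normal (0.0 for an empty list)."""
    if not reports:
        return 0.0
    return 100.0 * sum(is_normal_report(r) for r in reports) / len(reports)

# Invented example reports.
toy_reports = [
    "Normal brain MRI.",
    "Large left frontal mass with surrounding edema.",
    "No acute intracranial abnormality.",
    "Findings consistent with subdural hemorrhage.",
]
toy_percent = percent_normal(toy_reports)
```

A production classifier would also need to handle negation and hedged language in reports, which simple phrase matching like this does not address.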
Submitted 7 December, 2017;
originally announced December 2017.