-
Virtual Dosimetrists: A Radiotherapy Training "Flight Simulator"
Authors:
Skylar S. Gay,
Tucker Netherton,
Barbara Marquez,
Raymond Mumme,
Mary Gronberg,
Brent Parker,
Chelsea Pinnix,
Sanjay Shete,
Carlos Cardenas,
Laurence Court
Abstract:
Effective education in radiotherapy plan quality review requires a robust, regularly updated set of examples and the flexibility to demonstrate multiple possible planning approaches and their consequences. However, the current clinic-based paradigm does not support these needs. To address this, we have developed 'Virtual Dosimetrist' models that can both generate training examples of suboptimal tr…
▽ More
Effective education in radiotherapy plan quality review requires a robust, regularly updated set of examples and the flexibility to demonstrate multiple possible planning approaches and their consequences. However, the current clinic-based paradigm does not support these needs. To address this, we have developed 'Virtual Dosimetrist' models that can both generate training examples of suboptimal treatment plans and then allow trainees to improve the plan quality through simple natural language prompts, as if communicating with a dosimetrist. The dose generation and modification process is accurate, rapid, and requires only modest resources. This work is the first to combine dose distribution prediction with natural language processing; providing a robust pipeline for both generating suboptimal training plans and allowing trainees to practice their critical plan review and improvement skills that addresses the challenges of the current clinic-based paradigm.
△ Less
Submitted 14 May, 2025;
originally announced May 2025.
-
When are Diffusion Priors Helpful in Sparse Reconstruction? A Study with Sparse-view CT
Authors:
Matt Y. Cheung,
Sophia Zorek,
Tucker J. Netherton,
Laurence E. Court,
Sadeer Al-Kindi,
Ashok Veeraraghavan,
Guha Balakrishnan
Abstract:
Diffusion models demonstrate state-of-the-art performance on image generation, and are gaining traction for sparse medical image reconstruction tasks. However, compared to classical reconstruction algorithms relying on simple analytical priors, diffusion models have the dangerous property of producing realistic looking results \emph{even when incorrect}, particularly with few observations. We inve…
▽ More
Diffusion models demonstrate state-of-the-art performance on image generation, and are gaining traction for sparse medical image reconstruction tasks. However, compared to classical reconstruction algorithms relying on simple analytical priors, diffusion models have the dangerous property of producing realistic looking results \emph{even when incorrect}, particularly with few observations. We investigate the utility of diffusion models as priors for image reconstruction by varying the number of observations and comparing their performance to classical priors (sparse and Tikhonov regularization) using pixel-based, structural, and downstream metrics. We make comparisons on low-dose chest wall computed tomography (CT) for fat mass quantification. First, we find that classical priors are superior to diffusion priors when the number of projections is ``sufficient''. Second, we find that diffusion priors can capture a large amount of detail with very few observations, significantly outperforming classical priors. However, they fall short of capturing all details, even with many observations. Finally, we find that the performance of diffusion priors plateau after extremely few ($\approx$10-15) projections. Ultimately, our work highlights potential issues with diffusion-based sparse reconstruction and underscores the importance of further investigation, particularly in high-stakes clinical settings.
△ Less
Submitted 4 February, 2025;
originally announced February 2025.
-
Regression Conformal Prediction under Bias
Authors:
Matt Y. Cheung,
Tucker J. Netherton,
Laurence E. Court,
Ashok Veeraraghavan,
Guha Balakrishnan
Abstract:
Uncertainty quantification is crucial to account for the imperfect predictions of machine learning algorithms for high-impact applications. Conformal prediction (CP) is a powerful framework for uncertainty quantification that generates calibrated prediction intervals with valid coverage. In this work, we study how CP intervals are affected by bias - the systematic deviation of a prediction from gr…
▽ More
Uncertainty quantification is crucial to account for the imperfect predictions of machine learning algorithms for high-impact applications. Conformal prediction (CP) is a powerful framework for uncertainty quantification that generates calibrated prediction intervals with valid coverage. In this work, we study how CP intervals are affected by bias - the systematic deviation of a prediction from ground truth values - a phenomenon prevalent in many real-world applications. We investigate the influence of bias on interval lengths of two different types of adjustments -- symmetric adjustments, the conventional method where both sides of the interval are adjusted equally, and asymmetric adjustments, a more flexible method where the interval can be adjusted unequally in positive or negative directions. We present theoretical and empirical analyses characterizing how symmetric and asymmetric adjustments impact the "tightness" of CP intervals for regression tasks. Specifically for absolute residual and quantile-based non-conformity scores, we prove: 1) the upper bound of symmetrically adjusted interval lengths increases by $2|b|$ where $b$ is a globally applied scalar value representing bias, 2) asymmetrically adjusted interval lengths are not affected by bias, and 3) conditions when asymmetrically adjusted interval lengths are guaranteed to be smaller than symmetric ones. Our analyses suggest that even if predictions exhibit significant drift from ground truth values, asymmetrically adjusted intervals are still able to maintain the same tightness and validity of intervals as if the drift had never happened, while symmetric ones significantly inflate the lengths. We demonstrate our theoretical results with two real-world prediction tasks: sparse-view computed tomography (CT) reconstruction and time-series weather forecasting. Our work paves the way for more bias-robust machine learning systems.
△ Less
Submitted 7 October, 2024;
originally announced October 2024.
-
DIFR3CT: Latent Diffusion for Probabilistic 3D CT Reconstruction from Few Planar X-Rays
Authors:
Yiran Sun,
Hana Baroudi,
Tucker Netherton,
Laurence Court,
Osama Mawlawi,
Ashok Veeraraghavan,
Guha Balakrishnan
Abstract:
Computed Tomography (CT) scans are the standard-of-care for the visualization and diagnosis of many clinical ailments, and are needed for the treatment planning of external beam radiotherapy. Unfortunately, the availability of CT scanners in low- and mid-resource settings is highly variable. Planar x-ray radiography units, in comparison, are far more prevalent, but can only provide limited 2D obse…
▽ More
Computed Tomography (CT) scans are the standard-of-care for the visualization and diagnosis of many clinical ailments, and are needed for the treatment planning of external beam radiotherapy. Unfortunately, the availability of CT scanners in low- and mid-resource settings is highly variable. Planar x-ray radiography units, in comparison, are far more prevalent, but can only provide limited 2D observations of the 3D anatomy. In this work we propose DIFR3CT, a 3D latent diffusion model, that can generate a distribution of plausible CT volumes from one or few (<10) planar x-ray observations. DIFR3CT works by fusing 2D features from each x-ray into a joint 3D space, and performing diffusion conditioned on these fused features in a low-dimensional latent space. We conduct extensive experiments demonstrating that DIFR3CT is better than recent sparse CT reconstruction baselines in terms of standard pixel-level (PSNR, SSIM) on both the public LIDC and in-house post-mastectomy CT datasets. We also show that DIFR3CT supports uncertainty quantification via Monte Carlo sampling, which provides an opportunity to measure reconstruction reliability. Finally, we perform a preliminary pilot study evaluating DIFR3CT for automated breast radiotherapy contouring and planning -- and demonstrate promising feasibility. Our code is available at https://github.com/yransun/DIFR3CT.
△ Less
Submitted 27 August, 2024;
originally announced August 2024.
-
Dimensionality Reduction and Nearest Neighbors for Improving Out-of-Distribution Detection in Medical Image Segmentation
Authors:
McKell Woodland,
Nihil Patel,
Austin Castelo,
Mais Al Taie,
Mohamed Eltaher,
Joshua P. Yung,
Tucker J. Netherton,
Tiffany L. Calderone,
Jessica I. Sanchez,
Darrel W. Cleere,
Ahmed Elsaiey,
Nakul Gupta,
David Victor,
Laura Beretta,
Ankit B. Patel,
Kristy K. Brock
Abstract:
Clinically deployed deep learning-based segmentation models are known to fail on data outside of their training distributions. While clinicians review the segmentations, these models tend to perform well in most instances, which could exacerbate automation bias. Therefore, detecting out-of-distribution images at inference is critical to warn the clinicians that the model likely failed. This work a…
▽ More
Clinically deployed deep learning-based segmentation models are known to fail on data outside of their training distributions. While clinicians review the segmentations, these models tend to perform well in most instances, which could exacerbate automation bias. Therefore, detecting out-of-distribution images at inference is critical to warn the clinicians that the model likely failed. This work applied the Mahalanobis distance (MD) post hoc to the bottleneck features of four Swin UNETR and nnU-net models that segmented the liver on T1-weighted magnetic resonance imaging and computed tomography. By reducing the dimensions of the bottleneck features with either principal component analysis or uniform manifold approximation and projection, images the models failed on were detected with high performance and minimal computational load. In addition, this work explored a non-parametric alternative to the MD, a k-th nearest neighbors distance (KNN). KNN drastically improved scalability and performance over MD when both were applied to raw and average-pooled bottleneck features.
△ Less
Submitted 2 October, 2024; v1 submitted 5 August, 2024;
originally announced August 2024.
-
MIST: A Simple and Scalable End-To-End 3D Medical Imaging Segmentation Framework
Authors:
Adrian Celaya,
Evan Lim,
Rachel Glenn,
Brayden Mi,
Alex Balsells,
Dawid Schellingerhout,
Tucker Netherton,
Caroline Chung,
Beatrice Riviere,
David Fuentes
Abstract:
Medical imaging segmentation is a highly active area of research, with deep learning-based methods achieving state-of-the-art results in several benchmarks. However, the lack of standardized tools for training, testing, and evaluating new methods makes the comparison of methods difficult. To address this, we introduce the Medical Imaging Segmentation Toolkit (MIST), a simple, modular, and end-to-e…
▽ More
Medical imaging segmentation is a highly active area of research, with deep learning-based methods achieving state-of-the-art results in several benchmarks. However, the lack of standardized tools for training, testing, and evaluating new methods makes the comparison of methods difficult. To address this, we introduce the Medical Imaging Segmentation Toolkit (MIST), a simple, modular, and end-to-end medical imaging segmentation framework designed to facilitate consistent training, testing, and evaluation of deep learning-based medical imaging segmentation methods. MIST standardizes data analysis, preprocessing, and evaluation pipelines, accommodating multiple architectures and loss functions. This standardization ensures reproducible and fair comparisons across different methods. We detail MIST's data format requirements, pipelines, and auxiliary features and demonstrate its efficacy using the BraTS Adult Glioma Post-Treatment Challenge dataset. Our results highlight MIST's ability to produce accurate segmentation masks and its scalability across multiple GPUs, showcasing its potential as a powerful tool for future medical imaging research and development.
△ Less
Submitted 18 November, 2024; v1 submitted 31 July, 2024;
originally announced July 2024.
-
Metric-Guided Conformal Bounds for Probabilistic Image Reconstruction
Authors:
Matt Y Cheung,
Tucker J Netherton,
Laurence E Court,
Ashok Veeraraghavan,
Guha Balakrishnan
Abstract:
Modern deep learning reconstruction algorithms generate impressively realistic scans from sparse inputs, but can often produce significant inaccuracies. This makes it difficult to provide statistically guaranteed claims about the true state of a subject from scans reconstructed by these algorithms. In this study, we propose a framework for computing provably valid prediction bounds on claims deriv…
▽ More
Modern deep learning reconstruction algorithms generate impressively realistic scans from sparse inputs, but can often produce significant inaccuracies. This makes it difficult to provide statistically guaranteed claims about the true state of a subject from scans reconstructed by these algorithms. In this study, we propose a framework for computing provably valid prediction bounds on claims derived from probabilistic black-box image reconstruction algorithms. The key insights behind our framework are to represent reconstructed scans with a derived clinical metric of interest, and to calibrate bounds on the ground truth metric with conformal prediction (CP) using a prior calibration dataset. These bounds convey interpretable feedback about the subject's state, and can also be used to retrieve nearest-neighbor reconstructed scans for visual inspection. We demonstrate the utility of this framework on sparse-view computed tomography (CT) for fat mass quantification and radiotherapy planning tasks. Results show that our framework produces bounds with better semantical interpretation than conventional pixel-based bounding approaches. Furthermore, we can flag dangerous outlier reconstructions that look plausible but have statistically unlikely metric values.
△ Less
Submitted 3 March, 2025; v1 submitted 23 April, 2024;
originally announced April 2024.
-
Dimensionality Reduction for Improving Out-of-Distribution Detection in Medical Image Segmentation
Authors:
McKell Woodland,
Nihil Patel,
Mais Al Taie,
Joshua P. Yung,
Tucker J. Netherton,
Ankit B. Patel,
Kristy K. Brock
Abstract:
Clinically deployed segmentation models are known to fail on data outside of their training distribution. As these models perform well on most cases, it is imperative to detect out-of-distribution (OOD) images at inference to protect against automation bias. This work applies the Mahalanobis distance post hoc to the bottleneck features of a Swin UNETR model that segments the liver on T1-weighted m…
▽ More
Clinically deployed segmentation models are known to fail on data outside of their training distribution. As these models perform well on most cases, it is imperative to detect out-of-distribution (OOD) images at inference to protect against automation bias. This work applies the Mahalanobis distance post hoc to the bottleneck features of a Swin UNETR model that segments the liver on T1-weighted magnetic resonance imaging. By reducing the dimensions of the bottleneck features with principal component analysis, OOD images were detected with high performance and minimal computational load.
△ Less
Submitted 19 October, 2023; v1 submitted 7 August, 2023;
originally announced August 2023.
-
CT Reconstruction from Few Planar X-rays with Application towards Low-resource Radiotherapy
Authors:
Yiran Sun,
Tucker Netherton,
Laurence Court,
Ashok Veeraraghavan,
Guha Balakrishnan
Abstract:
CT scans are the standard-of-care for many clinical ailments, and are needed for treatments like external beam radiotherapy. Unfortunately, CT scanners are rare in low and mid-resource settings due to their costs. Planar X-ray radiography units, in comparison, are far more prevalent, but can only provide limited 2D observations of the 3D anatomy. In this work, we propose a method to generate CT vo…
▽ More
CT scans are the standard-of-care for many clinical ailments, and are needed for treatments like external beam radiotherapy. Unfortunately, CT scanners are rare in low and mid-resource settings due to their costs. Planar X-ray radiography units, in comparison, are far more prevalent, but can only provide limited 2D observations of the 3D anatomy. In this work, we propose a method to generate CT volumes from few (<5) planar X-ray observations using a prior data distribution, and perform the first evaluation of such a reconstruction algorithm for a clinical application: radiotherapy planning. We propose a deep generative model, building on advances in neural implicit representations to synthesize volumetric CT scans from few input planar X-ray images at different angles. To focus the generation task on clinically-relevant features, our model can also leverage anatomical guidance during training (via segmentation masks). We generated 2-field opposed, palliative radiotherapy plans on thoracic CTs reconstructed by our method, and found that isocenter radiation dose on reconstructed scans have <1% error with respect to the dose calculated on clinically acquired CTs using <=4 X-ray views. In addition, our method is better than recent sparse CT reconstruction baselines in terms of standard pixel and structure-level metrics (PSNR, SSIM, Dice score) on the public LIDC lung CT dataset. Code is available at: https://github.com/wanderinrain/Xray2CT.
△ Less
Submitted 3 August, 2023;
originally announced August 2023.
-
OpenKBP-Opt: An international and reproducible evaluation of 76 knowledge-based planning pipelines
Authors:
Aaron Babier,
Rafid Mahmood,
Binghao Zhang,
Victor G. L. Alves,
Ana Maria Barragán-Montero,
Joel Beaudry,
Carlos E. Cardenas,
Yankui Chang,
Zijie Chen,
Jaehee Chun,
Kelly Diaz,
Harold David Eraso,
Erik Faustmann,
Sibaji Gaj,
Skylar Gay,
Mary Gronberg,
Bingqi Guo,
Junjun He,
Gerd Heilemann,
Sanchit Hira,
Yuliang Huang,
Fuxin Ji,
Dashan Jiang,
Jean Carlo Jimenez Giraldo,
Hoyeon Lee
, et al. (34 additional authors not shown)
Abstract:
We establish an open framework for developing plan optimization models for knowledge-based planning (KBP) in radiotherapy. Our framework includes reference plans for 100 patients with head-and-neck cancer and high-quality dose predictions from 19 KBP models that were developed by different research groups during the OpenKBP Grand Challenge. The dose predictions were input to four optimization mode…
▽ More
We establish an open framework for developing plan optimization models for knowledge-based planning (KBP) in radiotherapy. Our framework includes reference plans for 100 patients with head-and-neck cancer and high-quality dose predictions from 19 KBP models that were developed by different research groups during the OpenKBP Grand Challenge. The dose predictions were input to four optimization models to form 76 unique KBP pipelines that generated 7600 plans. The predictions and plans were compared to the reference plans via: dose score, which is the average mean absolute voxel-by-voxel difference in dose a model achieved; the deviation in dose-volume histogram (DVH) criterion; and the frequency of clinical planning criteria satisfaction. We also performed a theoretical investigation to justify our dose mimicking models. The range in rank order correlation of the dose score between predictions and their KBP pipelines was 0.50 to 0.62, which indicates that the quality of the predictions is generally positively correlated with the quality of the plans. Additionally, compared to the input predictions, the KBP-generated plans performed significantly better (P<0.05; one-sided Wilcoxon test) on 18 of 23 DVH criteria. Similarly, each optimization model generated plans that satisfied a higher percentage of criteria than the reference plans. Lastly, our theoretical investigation demonstrated that the dose mimicking models generated plans that are also optimal for a conventional planning model. This was the largest international effort to date for evaluating the combination of KBP prediction and optimization models. In the interest of reproducibility, our data and code is freely available at https://github.com/ababier/open-kbp-opt.
△ Less
Submitted 16 February, 2022;
originally announced February 2022.
-
VerSe: A Vertebrae Labelling and Segmentation Benchmark for Multi-detector CT Images
Authors:
Anjany Sekuboyina,
Malek E. Husseini,
Amirhossein Bayat,
Maximilian Löffler,
Hans Liebl,
Hongwei Li,
Giles Tetteh,
Jan Kukačka,
Christian Payer,
Darko Štern,
Martin Urschler,
Maodong Chen,
Dalong Cheng,
Nikolas Lessmann,
Yujin Hu,
Tianfu Wang,
Dong Yang,
Daguang Xu,
Felix Ambellan,
Tamaz Amiranashvili,
Moritz Ehlke,
Hans Lamecker,
Sebastian Lehnert,
Marilia Lirio,
Nicolás Pérez de Olaguer
, et al. (44 additional authors not shown)
Abstract:
Vertebral labelling and segmentation are two fundamental tasks in an automated spine processing pipeline. Reliable and accurate processing of spine images is expected to benefit clinical decision-support systems for diagnosis, surgery planning, and population-based analysis on spine and bone health. However, designing automated algorithms for spine processing is challenging predominantly due to co…
▽ More
Vertebral labelling and segmentation are two fundamental tasks in an automated spine processing pipeline. Reliable and accurate processing of spine images is expected to benefit clinical decision-support systems for diagnosis, surgery planning, and population-based analysis on spine and bone health. However, designing automated algorithms for spine processing is challenging predominantly due to considerable variations in anatomy and acquisition protocols and due to a severe shortage of publicly available data. Addressing these limitations, the Large Scale Vertebrae Segmentation Challenge (VerSe) was organised in conjunction with the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) in 2019 and 2020, with a call for algorithms towards labelling and segmentation of vertebrae. Two datasets containing a total of 374 multi-detector CT scans from 355 patients were prepared and 4505 vertebrae have individually been annotated at voxel-level by a human-machine hybrid algorithm (https://osf.io/nqjyw/, https://osf.io/t98fz/). A total of 25 algorithms were benchmarked on these datasets. In this work, we present the the results of this evaluation and further investigate the performance-variation at vertebra-level, scan-level, and at different fields-of-view. We also evaluate the generalisability of the approaches to an implicit domain shift in data by evaluating the top performing algorithms of one challenge iteration on data from the other iteration. The principal takeaway from VerSe: the performance of an algorithm in labelling and segmenting a spine scan hinges on its ability to correctly identify vertebrae in cases of rare anatomical variations. The content and code concerning VerSe can be accessed at: https://github.com/anjany/verse.
△ Less
Submitted 5 April, 2022; v1 submitted 24 January, 2020;
originally announced January 2020.