-
When are Diffusion Priors Helpful in Sparse Reconstruction? A Study with Sparse-view CT
Authors:
Matt Y. Cheung,
Sophia Zorek,
Tucker J. Netherton,
Laurence E. Court,
Sadeer Al-Kindi,
Ashok Veeraraghavan,
Guha Balakrishnan
Abstract:
Diffusion models demonstrate state-of-the-art performance on image generation, and are gaining traction for sparse medical image reconstruction tasks. However, compared to classical reconstruction algorithms relying on simple analytical priors, diffusion models have the dangerous property of producing realistic-looking results \emph{even when incorrect}, particularly with few observations. We investigate the utility of diffusion models as priors for image reconstruction by varying the number of observations and comparing their performance to classical priors (sparse and Tikhonov regularization) using pixel-based, structural, and downstream metrics. We make comparisons on low-dose chest wall computed tomography (CT) for fat mass quantification. First, we find that classical priors are superior to diffusion priors when the number of projections is ``sufficient''. Second, we find that diffusion priors can capture a large amount of detail with very few observations, significantly outperforming classical priors. However, they fall short of capturing all details, even with many observations. Finally, we find that the performance of diffusion priors plateaus after extremely few ($\approx$10-15) projections. Ultimately, our work highlights potential issues with diffusion-based sparse reconstruction and underscores the importance of further investigation, particularly in high-stakes clinical settings.
Submitted 4 February, 2025;
originally announced February 2025.
-
Regression Conformal Prediction under Bias
Authors:
Matt Y. Cheung,
Tucker J. Netherton,
Laurence E. Court,
Ashok Veeraraghavan,
Guha Balakrishnan
Abstract:
Uncertainty quantification is crucial to account for the imperfect predictions of machine learning algorithms for high-impact applications. Conformal prediction (CP) is a powerful framework for uncertainty quantification that generates calibrated prediction intervals with valid coverage. In this work, we study how CP intervals are affected by bias (the systematic deviation of a prediction from ground truth values), a phenomenon prevalent in many real-world applications. We investigate the influence of bias on interval lengths of two different types of adjustments: symmetric adjustments, the conventional method where both sides of the interval are adjusted equally, and asymmetric adjustments, a more flexible method where the interval can be adjusted unequally in positive or negative directions. We present theoretical and empirical analyses characterizing how symmetric and asymmetric adjustments impact the "tightness" of CP intervals for regression tasks. Specifically for absolute residual and quantile-based non-conformity scores, we prove: 1) the upper bound of symmetrically adjusted interval lengths increases by $2|b|$ where $b$ is a globally applied scalar value representing bias, 2) asymmetrically adjusted interval lengths are not affected by bias, and 3) conditions under which asymmetrically adjusted interval lengths are guaranteed to be smaller than symmetric ones. Our analyses suggest that even if predictions exhibit significant drift from ground truth values, asymmetrically adjusted intervals are still able to maintain the same tightness and validity of intervals as if the drift had never happened, while symmetric ones significantly inflate the lengths. We demonstrate our theoretical results with two real-world prediction tasks: sparse-view computed tomography (CT) reconstruction and time-series weather forecasting. Our work paves the way for more bias-robust machine learning systems.
Submitted 7 October, 2024;
originally announced October 2024.
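The effect of a global bias on interval length described in the abstract above can be illustrated with a minimal sketch of standard split conformal prediction on absolute-residual scores. This is not the authors' code: the function names, the synthetic Gaussian residuals, and the choice of splitting the miscoverage level equally between the two sides in the asymmetric case are all assumptions made for illustration.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Finite-sample CP quantile: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


def symmetric_length(residuals, alpha):
    """Length of [y_hat - q, y_hat + q], with q calibrated on |y - y_hat|."""
    q = conformal_quantile([abs(r) for r in residuals], alpha)
    return 2 * q


def asymmetric_length(residuals, alpha):
    """Length of [y_hat - q_lo, y_hat + q_hi], each side calibrated at alpha / 2."""
    q_hi = conformal_quantile(residuals, alpha / 2)                # upper side: y - y_hat
    q_lo = conformal_quantile([-r for r in residuals], alpha / 2)  # lower side: y_hat - y
    return q_lo + q_hi


# Hypothetical calibration residuals (y - y_hat) from an unbiased predictor.
random.seed(0)
residuals = [random.gauss(0.0, 1.0) for _ in range(999)]

# Adding a global bias b to every prediction shifts each residual by -b.
bias = 3.0
biased = [r - bias for r in residuals]

alpha = 0.1
sym_clean, sym_biased = symmetric_length(residuals, alpha), symmetric_length(biased, alpha)
asym_clean, asym_biased = asymmetric_length(residuals, alpha), asymmetric_length(biased, alpha)

print(f"symmetric:  {sym_clean:.3f} -> {sym_biased:.3f}")
print(f"asymmetric: {asym_clean:.3f} -> {asym_biased:.3f}")
```

In this sketch the asymmetric length is unchanged because the bias shifts the two one-sided quantiles by equal and opposite amounts, whereas the symmetric length grows by at most $2|b|$, matching the first two claims in the abstract.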
-
Metric-Guided Conformal Bounds for Probabilistic Image Reconstruction
Authors:
Matt Y Cheung,
Tucker J Netherton,
Laurence E Court,
Ashok Veeraraghavan,
Guha Balakrishnan
Abstract:
Modern deep learning reconstruction algorithms generate impressively realistic scans from sparse inputs, but can often produce significant inaccuracies. This makes it difficult to provide statistically guaranteed claims about the true state of a subject from scans reconstructed by these algorithms. In this study, we propose a framework for computing provably valid prediction bounds on claims derived from probabilistic black-box image reconstruction algorithms. The key insights behind our framework are to represent reconstructed scans with a derived clinical metric of interest, and to calibrate bounds on the ground truth metric with conformal prediction (CP) using a prior calibration dataset. These bounds convey interpretable feedback about the subject's state, and can also be used to retrieve nearest-neighbor reconstructed scans for visual inspection. We demonstrate the utility of this framework on sparse-view computed tomography (CT) for fat mass quantification and radiotherapy planning tasks. Results show that our framework produces bounds with better semantic interpretation than conventional pixel-based bounding approaches. Furthermore, we can flag dangerous outlier reconstructions that look plausible but have statistically unlikely metric values.
Submitted 3 March, 2025; v1 submitted 23 April, 2024;
originally announced April 2024.
-
Evolving Horizons in Radiotherapy Auto-Contouring: Distilling Insights, Embracing Data-Centric Frameworks, and Moving Beyond Geometric Quantification
Authors:
Kareem A. Wahid,
Carlos E. Cardenas,
Barbara Marquez,
Tucker J. Netherton,
Benjamin H. Kann,
Laurence E. Court,
Renjie He,
Mohamed A. Naser,
Amy C. Moreno,
Clifton D. Fuller,
David Fuentes
Abstract:
Deep learning has significantly advanced the potential for automated contouring in radiotherapy planning. In this manuscript, guided by contemporary literature, we underscore three key insights: (1) High-quality training data is essential for auto-contouring algorithms; (2) Auto-contouring models demonstrate commendable performance even with limited medical image data; (3) The quantitative performance of auto-contouring is reaching a plateau. Given these insights, we emphasize the need for the radiotherapy research community to embrace data-centric approaches to further foster clinical adoption of auto-contouring technologies.
Submitted 16 October, 2023;
originally announced October 2023.
-
Deep Learning-Based Dose Prediction for Automated, Individualized Quality Assurance of Head and Neck Radiation Therapy Plans
Authors:
Mary P. Gronberg,
Beth M. Beadle,
Adam S. Garden,
Heath Skinner,
Skylar Gay,
Tucker Netherton,
Wenhua Cao,
Carlos E. Cardenas,
Christine Chung,
David Fuentes,
Clifton D. Fuller,
Rebecca M. Howell,
Anuja Jhingran,
Tze Yee Lim,
Barbara Marquez,
Raymond Mumme,
Adenike M. Olanrewaju,
Christine B. Peterson,
Ivan Vazquez,
Thomas J. Whitaker,
Zachary Wooten,
Ming Yang,
Laurence E. Court
Abstract:
Purpose: This study aimed to use deep learning-based dose prediction to assess head and neck (HN) plan quality and identify suboptimal plans.
Methods: A total of 245 VMAT HN plans were created using RapidPlan knowledge-based planning (KBP). A subset of 112 high-quality plans was selected under the supervision of an HN radiation oncologist. We trained a 3D Dense Dilated U-Net architecture to predict 3-dimensional dose distributions using 3-fold cross-validation on 90 plans. Model inputs included CT images, target prescriptions, and contours for targets and organs at risk (OARs). The model's performance was assessed on the remaining 22 test plans. We then tested the application of the dose prediction model for automated review of plan quality. Dose distributions were predicted on 14 clinical plans. The predicted versus clinical OAR dose metrics were compared to flag OARs with suboptimal normal tissue sparing using a 2 Gy dose difference or 3% dose-volume threshold. OAR flags were compared to manual flags by 3 HN radiation oncologists.
Results: The predicted dose distributions were of comparable quality to the KBP plans. The differences between the predicted and KBP-planned D1%, D95%, and D99% across the targets were within -2.53% (SD = 1.34%), -0.42% (SD = 1.27%), and -0.12% (SD = 1.97%), respectively, and the OAR mean and maximum doses were within -0.33 Gy (SD = 1.40 Gy) and -0.96 Gy (SD = 2.08 Gy). For the plan quality assessment study, radiation oncologists flagged 47 OARs for possible plan improvement. There was high interphysician variability; 83% of physician-flagged OARs were flagged by only one of 3 physicians. The comparative dose prediction model flagged 63 OARs, including 30 of 47 physician-flagged OARs.
Conclusion: Deep learning can predict high-quality dose distributions, which can be used as comparative dose distributions for automated, individualized assessment of HN plan quality.
Submitted 25 April, 2023; v1 submitted 28 September, 2022;
originally announced September 2022.
-
VerSe: A Vertebrae Labelling and Segmentation Benchmark for Multi-detector CT Images
Authors:
Anjany Sekuboyina,
Malek E. Husseini,
Amirhossein Bayat,
Maximilian Löffler,
Hans Liebl,
Hongwei Li,
Giles Tetteh,
Jan Kukačka,
Christian Payer,
Darko Štern,
Martin Urschler,
Maodong Chen,
Dalong Cheng,
Nikolas Lessmann,
Yujin Hu,
Tianfu Wang,
Dong Yang,
Daguang Xu,
Felix Ambellan,
Tamaz Amiranashvili,
Moritz Ehlke,
Hans Lamecker,
Sebastian Lehnert,
Marilia Lirio,
Nicolás Pérez de Olaguer
, et al. (44 additional authors not shown)
Abstract:
Vertebral labelling and segmentation are two fundamental tasks in an automated spine processing pipeline. Reliable and accurate processing of spine images is expected to benefit clinical decision-support systems for diagnosis, surgery planning, and population-based analysis on spine and bone health. However, designing automated algorithms for spine processing is challenging predominantly due to considerable variations in anatomy and acquisition protocols and due to a severe shortage of publicly available data. Addressing these limitations, the Large Scale Vertebrae Segmentation Challenge (VerSe) was organised in conjunction with the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) in 2019 and 2020, with a call for algorithms towards labelling and segmentation of vertebrae. Two datasets containing a total of 374 multi-detector CT scans from 355 patients were prepared and 4505 vertebrae have individually been annotated at voxel-level by a human-machine hybrid algorithm (https://osf.io/nqjyw/, https://osf.io/t98fz/). A total of 25 algorithms were benchmarked on these datasets. In this work, we present the results of this evaluation and further investigate the performance-variation at vertebra-level, scan-level, and at different fields-of-view. We also evaluate the generalisability of the approaches to an implicit domain shift in data by evaluating the top performing algorithms of one challenge iteration on data from the other iteration. The principal takeaway from VerSe: the performance of an algorithm in labelling and segmenting a spine scan hinges on its ability to correctly identify vertebrae in cases of rare anatomical variations. The content and code concerning VerSe can be accessed at: https://github.com/anjany/verse.
Submitted 5 April, 2022; v1 submitted 24 January, 2020;
originally announced January 2020.