AI-Assisted Decision-Making for Clinical Assessment of Auto-Segmented Contour Quality
Authors:
Biling Wang,
Austen Maniscalco,
Ti Bai,
Siqiu Wang,
Michael Dohopolski,
Mu-Han Lin,
Chenyang Shen,
Dan Nguyen,
Junzhou Huang,
Steve Jiang,
Xinlei Wang
Abstract:
Purpose: This study presents a Deep Learning (DL)-based quality assessment (QA) approach for evaluating auto-generated contours (auto-contours) in radiotherapy, with emphasis on Online Adaptive Radiotherapy (OART). Leveraging Bayesian Ordinal Classification (BOC) and calibrated uncertainty thresholds, the method enables confident QA predictions without relying on ground truth contours or extensive manual labeling. Methods: We developed a BOC model to classify auto-contour quality and quantify prediction uncertainty. A calibration step was used to optimize uncertainty thresholds that meet clinical accuracy needs. The method was validated under three data scenarios: no manual labels, limited labels, and extensive labels. For rectum contours in prostate cancer, we applied geometric surrogate labels when manual labels were absent, transfer learning when limited, and direct supervision when ample labels were available. Results: The BOC model delivered robust performance across all scenarios. Fine-tuning with just 30 manual labels and calibrating with 34 subjects yielded over 90% accuracy on test data. Using the calibrated threshold, over 93% of the auto-contours' qualities were accurately predicted in over 98% of cases, reducing unnecessary manual reviews and highlighting cases needing correction. Conclusion: The proposed QA model enhances contouring efficiency in OART by reducing manual workload and enabling fast, informed clinical decisions. Through uncertainty quantification, it ensures safer, more reliable radiotherapy workflows.
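The calibration step described in the abstract can be illustrated with a minimal sketch. It assumes each calibration case has a scalar uncertainty score and a flag saying whether the predicted quality class was correct, and it picks the largest uncertainty threshold at which the accepted predictions still reach a target accuracy. The function name `calibrate_threshold`, the inputs, and the selection rule are illustrative assumptions, not the authors' published procedure.

```python
def calibrate_threshold(uncertainties, correct, target_accuracy=0.9):
    """Return the largest uncertainty threshold whose accepted predictions
    (uncertainty <= threshold) reach target_accuracy, or None if none does.

    Hypothetical helper: the inputs and the selection rule are assumptions.
    """
    # Sort cases from most to least confident.
    pairs = sorted(zip(uncertainties, correct))
    best = None
    n_correct = 0
    for k, (u, c) in enumerate(pairs, start=1):
        n_correct += bool(c)
        # Evaluate only at the end of a run of tied uncertainty values,
        # because a threshold at u accepts every case with that value.
        is_last_of_tie = k == len(pairs) or pairs[k][0] != u
        if is_last_of_tie and n_correct / k >= target_accuracy:
            best = u
    return best


# Toy calibration set: the three most confident predictions are correct.
threshold = calibrate_threshold(
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [True, True, True, False, False],
    target_accuracy=0.9,
)
```

Under this sketch, predictions with uncertainty at or below the returned threshold would be accepted automatically, and the rest would be routed to manual review.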
Submitted 11 May, 2025; v1 submitted 1 May, 2025;
originally announced May 2025.
Performance Deterioration of Deep Learning Models after Clinical Deployment: A Case Study with Auto-segmentation for Definitive Prostate Cancer Radiotherapy
Authors:
Biling Wang,
Michael Dohopolski,
Ti Bai,
Junjie Wu,
Raquibul Hannan,
Neil Desai,
Aurelie Garant,
Daniel Yang,
Dan Nguyen,
Mu-Han Lin,
Robert Timmerman,
Xinlei Wang,
Steve Jiang
Abstract:
We evaluated the temporal performance of a deep learning (DL)-based artificial intelligence (AI) model for auto-segmentation in prostate radiotherapy, seeking to correlate its efficacy with changes in the clinical landscape. Our study involved 1328 prostate cancer patients who underwent definitive radiotherapy from January 2006 to August 2022 at the University of Texas Southwestern Medical Center. We trained a UNet-based segmentation model on data from 2006 to 2011 and tested it on data from 2012 to 2022 to simulate real-world clinical deployment. We measured model performance using the Dice similarity coefficient (DSC) and visualized the trends in contour quality using exponentially weighted moving average (EMA) curves. Additionally, we performed the Wilcoxon rank-sum test to analyze the differences in DSC distributions across distinct periods, and multiple linear regression to investigate the impact of various clinical factors. The model exhibited peak performance in the initial phase (from 2012 to 2014) for segmenting the prostate, rectum, and bladder. However, we observed a notable decline in performance for the prostate and rectum after 2015, while bladder contour quality remained stable. Key factors that impacted the prostate contour quality included physician contouring styles, the use of various hydrogel spacers, CT scan slice thickness, MRI-guided contouring, and the use of intravenous (IV) contrast. Rectum contour quality was influenced by factors such as slice thickness, physician contouring styles, and the use of various hydrogel spacers. The bladder contour quality was primarily affected by the use of IV contrast. This study highlights the challenges in maintaining consistent AI model performance in a dynamic clinical setting. It underscores the need for continuous monitoring and updating of AI models to ensure their ongoing effectiveness and relevance in patient care.
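The two monitoring quantities named in the abstract, the DSC and the EMA trend, can be sketched as follows. This is a minimal illustration that assumes binary masks flattened to sequences and a common recursive form of the EMA. The smoothing factor `alpha` and both function names are assumptions, since the abstract does not state the settings used.

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|) for binary masks."""
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    if len(a) != len(b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(x and y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def ema(values, alpha=0.3):
    """Exponentially weighted moving average, seeded with the first value.

    alpha is an assumed smoothing factor in (0, 1].
    """
    smoothed = []
    for v in values:
        if not smoothed:
            smoothed.append(float(v))
        else:
            smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed


# Toy example: one overlapping voxel, mask sizes 2 and 1.
score = dice([1, 1, 0, 0], [1, 0, 0, 0])
trend = ema([1.0, 0.0], alpha=0.5)
```

Applied to per-patient DSC values ordered by treatment date, a curve like `trend` smooths out case-to-case noise so that a sustained drop in contour quality becomes visible.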
Submitted 16 November, 2023; v1 submitted 10 October, 2022;
originally announced October 2022.