-
Evaluating Large Language Models for Zero-Shot Disease Labeling in CT Radiology Reports Across Organ Systems
Authors:
Michael E. Garcia-Alcoser,
Mobina GhojoghNejad,
Fakrul Islam Tushar,
David Kim,
Kyle J. Lafata,
Geoffrey D. Rubin,
Joseph Y. Lo
Abstract:
Purpose: This study aims to evaluate the effectiveness of large language models (LLMs) in automating disease annotation of CT radiology reports. We compare a rule-based algorithm (RBA), RadBERT, and three lightweight open-weight LLMs for multi-disease labeling of chest, abdomen, and pelvis (CAP) CT reports.
Materials and Methods: This retrospective study analyzed 40,833 CT reports from 29,540 pa…
▽ More
Purpose: This study aims to evaluate the effectiveness of large language models (LLMs) in automating disease annotation of CT radiology reports. We compare a rule-based algorithm (RBA), RadBERT, and three lightweight open-weight LLMs for multi-disease labeling of chest, abdomen, and pelvis (CAP) CT reports.
Materials and Methods: This retrospective study analyzed 40,833 CT reports from 29,540 patients, with 1,789 CAP reports manually annotated across three organ systems. External validation was conducted using the CT-RATE dataset. Three open-weight LLMs were tested with zero-shot prompting. Performance was evaluated using Cohen's Kappa and micro/macro-averaged F1 scores.
Results: In 12,197 Duke CAP reports from 8,854 patients, Llama-3.1 8B and Gemma-3 27B showed the highest agreement ($κ$ median: 0.87). On the manually annotated set, Gemma-3 27B achieved the top macro-F1 (0.82), followed by Llama-3.1 8B (0.79), while the RBA scored lowest (0.64). On the CT-RATE dataset (lungs/pleura only), Llama-3.1 8B performed best (0.91), with Gemma-3 27B close behind (0.89). Performance differences were mainly due to differing labeling practices, especially for lung atelectasis.
Conclusion: Lightweight LLMs outperform rule-based methods for CT report annotation and generalize across organ systems with zero-shot prompting. However, binary labels alone cannot capture the full nuance of report language. LLMs can provide a flexible, efficient solution aligned with clinical judgment and user needs.
△ Less
Submitted 3 June, 2025;
originally announced June 2025.
-
Learning Energy-Based Generative Models via Potential Flow: A Variational Principle Approach to Probability Density Homotopy Matching
Authors:
Junn Yong Loo,
Michelle Adeline,
Julia Kaiwen Lau,
Fang Yu Leong,
Hwa Hui Tew,
Arghya Pal,
Vishnu Monn Baskaran,
Chee-Ming Ting,
Raphaël C. -W. Phan
Abstract:
Energy-based models (EBMs) are a powerful class of probabilistic generative models due to their flexibility and interpretability. However, relationships between potential flows and explicit EBMs remain underexplored, while contrastive divergence training via implicit Markov chain Monte Carlo (MCMC) sampling is often unstable and expensive in high-dimensional settings. In this paper, we propose Var…
▽ More
Energy-based models (EBMs) are a powerful class of probabilistic generative models due to their flexibility and interpretability. However, relationships between potential flows and explicit EBMs remain underexplored, while contrastive divergence training via implicit Markov chain Monte Carlo (MCMC) sampling is often unstable and expensive in high-dimensional settings. In this paper, we propose Variational Potential Flow Bayes (VPFB), a new energy-based generative framework that eliminates the need for implicit MCMC sampling and does not rely on auxiliary networks or cooperative training. VPFB learns an energy-parameterized potential flow by constructing a flow-driven density homotopy that is matched to the data distribution through a variational loss minimizing the Kullback-Leibler divergence between the flow-driven and marginal homotopies. This principled formulation enables robust and efficient generative modeling while preserving the interpretability of EBMs. Experimental results on image generation, interpolation, out-of-distribution detection, and compositional generation confirm the effectiveness of VPFB, showing that our method performs competitively with existing approaches in terms of sample quality and versatility across diverse generative modeling tasks.
△ Less
Submitted 22 April, 2025;
originally announced April 2025.
-
SYN-LUNGS: Towards Simulating Lung Nodules with Anatomy-Informed Digital Twins for AI Training
Authors:
Fakrul Islam Tushar,
Lavsen Dahal,
Cindy McCabe,
Fong Chi Ho,
Paul Segars,
Ehsan Abadi,
Kyle J. Lafata,
Ehsan Samei,
Joseph Y. Lo
Abstract:
AI models for lung cancer screening are limited by data scarcity, impacting generalizability and clinical applicability. Generative models address this issue but are constrained by training data variability. We introduce SYN-LUNGS, a framework for generating high-quality 3D CT images with detailed annotations. SYN-LUNGS integrates XCAT3 phantoms for digital twin generation, X-Lesions for nodule si…
▽ More
AI models for lung cancer screening are limited by data scarcity, impacting generalizability and clinical applicability. Generative models address this issue but are constrained by training data variability. We introduce SYN-LUNGS, a framework for generating high-quality 3D CT images with detailed annotations. SYN-LUNGS integrates XCAT3 phantoms for digital twin generation, X-Lesions for nodule simulation (varying size, location, and appearance), and DukeSim for CT image formation with vendor and parameter variability. The dataset includes 3,072 nodule images from 1,044 simulated CT scans, with 512 lesions and 174 digital twins. Models trained on clinical + simulated data outperform clinical only models, achieving 10% improvement in detection, 2-9% in segmentation and classification, and enhanced synthesis. By incorporating anatomy-informed simulations, SYN-LUNGS provides a scalable approach for AI model development, particularly in rare disease representation and improving model reliability.
△ Less
Submitted 6 May, 2025; v1 submitted 28 February, 2025;
originally announced February 2025.
-
Humanity's Last Exam
Authors:
Long Phan,
Alice Gatti,
Ziwen Han,
Nathaniel Li,
Josephina Hu,
Hugh Zhang,
Chen Bo Calvin Zhang,
Mohamed Shaaban,
John Ling,
Sean Shi,
Michael Choi,
Anish Agrawal,
Arnav Chopra,
Adam Khoja,
Ryan Kim,
Richard Ren,
Jason Hausenloy,
Oliver Zhang,
Mantas Mazeika,
Dmitry Dodonov,
Tung Nguyen,
Jaeho Lee,
Daron Anderson,
Mikhail Doroshenko,
Alun Cennyth Stokes
, et al. (1084 additional authors not shown)
Abstract:
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90\% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of…
▽ More
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90\% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.
△ Less
Submitted 19 April, 2025; v1 submitted 24 January, 2025;
originally announced January 2025.
-
ST-HCSS: Deep Spatio-Temporal Hypergraph Convolutional Neural Network for Soft Sensing
Authors:
Hwa Hui Tew,
Fan Ding,
Gaoxuan Li,
Junn Yong Loo,
Chee-Ming Ting,
Ze Yang Ding,
Chee Pin Tan
Abstract:
Higher-order sensor networks are more accurate in characterizing the nonlinear dynamics of sensory time-series data in modern industrial settings by allowing multi-node connections beyond simple pairwise graph edges. In light of this, we propose a deep spatio-temporal hypergraph convolutional neural network for soft sensing (ST-HCSS). In particular, our proposed framework is able to construct and…
▽ More
Higher-order sensor networks are more accurate in characterizing the nonlinear dynamics of sensory time-series data in modern industrial settings by allowing multi-node connections beyond simple pairwise graph edges. In light of this, we propose a deep spatio-temporal hypergraph convolutional neural network for soft sensing (ST-HCSS). In particular, our proposed framework is able to construct and leverage a higher-order graph (hypergraph) to model the complex multi-interactions between sensor nodes in the absence of prior structural knowledge. To capture rich spatio-temporal relationships underlying sensor data, our proposed ST-HCSS incorporates stacked gated temporal and hypergraph convolution layers to effectively aggregate and update hypergraph information across time and nodes. Our results validate the superiority of ST-HCSS compared to existing state-of-the-art soft sensors, and demonstrates that the learned hypergraph feature representations aligns well with the sensor data correlations. The code is available at https://github.com/htew0001/ST-HCSS.git
△ Less
Submitted 2 January, 2025;
originally announced January 2025.
-
KANS: Knowledge Discovery Graph Attention Network for Soft Sensing in Multivariate Industrial Processes
Authors:
Hwa Hui Tew,
Gaoxuan Li,
Fan Ding,
Xuewen Luo,
Junn Yong Loo,
Chee-Ming Ting,
Ze Yang Ding,
Chee Pin Tan
Abstract:
Soft sensing of hard-to-measure variables is often crucial in industrial processes. Current practices rely heavily on conventional modeling techniques that show success in improving accuracy. However, they overlook the non-linear nature, dynamics characteristics, and non-Euclidean dependencies between complex process variables. To tackle these challenges, we present a framework known as a Knowledg…
▽ More
Soft sensing of hard-to-measure variables is often crucial in industrial processes. Current practices rely heavily on conventional modeling techniques that show success in improving accuracy. However, they overlook the non-linear nature, dynamics characteristics, and non-Euclidean dependencies between complex process variables. To tackle these challenges, we present a framework known as a Knowledge discovery graph Attention Network for effective Soft sensing (KANS). Unlike the existing deep learning soft sensor models, KANS can discover the intrinsic correlations and irregular relationships between the multivariate industrial processes without a predefined topology. First, an unsupervised graph structure learning method is introduced, incorporating the cosine similarity between different sensor embedding to capture the correlations between sensors. Next, we present a graph attention-based representation learning that can compute the multivariate data parallelly to enhance the model in learning complex sensor nodes and edges. To fully explore KANS, knowledge discovery analysis has also been conducted to demonstrate the interpretability of the model. Experimental results demonstrate that KANS significantly outperforms all the baselines and state-of-the-art methods in soft sensing performance. Furthermore, the analysis shows that KANS can find sensors closely related to different process variables without domain knowledge, significantly improving soft sensing accuracy.
△ Less
Submitted 2 January, 2025;
originally announced January 2025.
-
Energy-efficient Hybrid Model Predictive Trajectory Planning for Autonomous Electric Vehicles
Authors:
Fan Ding,
Xuewen Luo,
Gaoxuan Li,
Hwa Hui Tew,
Junn Yong Loo,
Chor Wai Tong,
A. S. M Bakibillah,
Ziyuan Zhao,
Zhiyu Tao
Abstract:
To tackle the twin challenges of limited battery life and lengthy charging durations in electric vehicles (EVs), this paper introduces an Energy-efficient Hybrid Model Predictive Planner (EHMPP), which employs an energy-saving optimization strategy. EHMPP focuses on refining the design of the motion planner to be seamlessly integrated with the existing automatic driving algorithms, without additio…
▽ More
To tackle the twin challenges of limited battery life and lengthy charging durations in electric vehicles (EVs), this paper introduces an Energy-efficient Hybrid Model Predictive Planner (EHMPP), which employs an energy-saving optimization strategy. EHMPP focuses on refining the design of the motion planner to be seamlessly integrated with the existing automatic driving algorithms, without additional hardware. It has been validated through simulation experiments on the Prescan, CarSim, and Matlab platforms, demonstrating that it can increase passive recovery energy by 11.74\% and effectively track motor speed and acceleration at optimal power. To sum up, EHMPP not only aids in trajectory planning but also significantly boosts energy efficiency in autonomous EVs.
△ Less
Submitted 9 November, 2024;
originally announced November 2024.
-
Cross-Domain Transfer Learning using Attention Latent Features for Multi-Agent Trajectory Prediction
Authors:
Jia Quan Loh,
Xuewen Luo,
Fan Ding,
Hwa Hui Tew,
Junn Yong Loo,
Ze Yang Ding,
Susilawati Susilawati,
Chee Pin Tan
Abstract:
With the advancements of sensor hardware, traffic infrastructure and deep learning architectures, trajectory prediction of vehicles has established a solid foundation in intelligent transportation systems. However, existing solutions are often tailored to specific traffic networks at particular time periods. Consequently, deep learning models trained on one network may struggle to generalize effec…
▽ More
With the advancements of sensor hardware, traffic infrastructure and deep learning architectures, trajectory prediction of vehicles has established a solid foundation in intelligent transportation systems. However, existing solutions are often tailored to specific traffic networks at particular time periods. Consequently, deep learning models trained on one network may struggle to generalize effectively to unseen networks. To address this, we proposed a novel spatial-temporal trajectory prediction framework that performs cross-domain adaption on the attention representation of a Transformer-based model. A graph convolutional network is also integrated to construct dynamic graph feature embeddings that accurately model the complex spatial-temporal interactions between the multi-agent vehicles across multiple traffic domains. The proposed framework is validated on two case studies involving the cross-city and cross-period settings. Experimental results show that our proposed framework achieves superior trajectory prediction and domain adaptation performances over the state-of-the-art models.
△ Less
Submitted 12 November, 2024; v1 submitted 9 November, 2024;
originally announced November 2024.
-
Variational Potential Flow: A Novel Probabilistic Framework for Energy-Based Generative Modelling
Authors:
Junn Yong Loo,
Michelle Adeline,
Arghya Pal,
Vishnu Monn Baskaran,
Chee-Ming Ting,
Raphael C. -W. Phan
Abstract:
Energy based models (EBMs) are appealing for their generality and simplicity in data likelihood modeling, but have conventionally been difficult to train due to the unstable and time-consuming implicit MCMC sampling during contrastive divergence training. In this paper, we present a novel energy-based generative framework, Variational Potential Flow (VAPO), that entirely dispenses with implicit MC…
▽ More
Energy based models (EBMs) are appealing for their generality and simplicity in data likelihood modeling, but have conventionally been difficult to train due to the unstable and time-consuming implicit MCMC sampling during contrastive divergence training. In this paper, we present a novel energy-based generative framework, Variational Potential Flow (VAPO), that entirely dispenses with implicit MCMC sampling and does not rely on complementary latent models or cooperative training. The VAPO framework aims to learn a potential energy function whose gradient (flow) guides the prior samples, so that their density evolution closely follows an approximate data likelihood homotopy. An energy loss function is then formulated to minimize the Kullback-Leibler divergence between density evolution of the flow-driven prior and the data likelihood homotopy. Images can be generated after training the potential energy, by initializing the samples from Gaussian prior and solving the ODE governing the potential flow on a fixed time interval using generic ODE solvers. Experiment results show that the proposed VAPO framework is capable of generating realistic images on various image datasets. In particular, our proposed framework achieves competitive FID scores for unconditional image generation on the CIFAR-10 and CelebA datasets.
△ Less
Submitted 21 July, 2024;
originally announced July 2024.
-
MDHA: Multi-Scale Deformable Transformer with Hybrid Anchors for Multi-View 3D Object Detection
Authors:
Michelle Adeline,
Junn Yong Loo,
Vishnu Monn Baskaran
Abstract:
Multi-view 3D object detection is a crucial component of autonomous driving systems. Contemporary query-based methods primarily depend either on dataset-specific initialization of 3D anchors, introducing bias, or utilize dense attention mechanisms, which are computationally inefficient and unscalable. To overcome these issues, we present MDHA, a novel sparse query-based framework, which constructs…
▽ More
Multi-view 3D object detection is a crucial component of autonomous driving systems. Contemporary query-based methods primarily depend either on dataset-specific initialization of 3D anchors, introducing bias, or utilize dense attention mechanisms, which are computationally inefficient and unscalable. To overcome these issues, we present MDHA, a novel sparse query-based framework, which constructs adaptive 3D output proposals using hybrid anchors from multi-view, multi-scale image input. Fixed 2D anchors are combined with depth predictions to form 2.5D anchors, which are projected to obtain 3D proposals. To ensure high efficiency, our proposed Anchor Encoder performs sparse refinement and selects the top-$k$ anchors and features. Moreover, while existing multi-view attention mechanisms rely on projecting reference points to multiple images, our novel Circular Deformable Attention mechanism only projects to a single image but allows reference points to seamlessly attend to adjacent images, improving efficiency without compromising on performance. On the nuScenes val set, it achieves 46.4\% mAP and 55.0\% NDS with a ResNet101 backbone. MDHA significantly outperforms the baseline where anchor proposals are modelled as learnable embeddings. Code is available at https://github.com/NaomiEX/MDHA.
△ Less
Submitted 9 November, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
FPN-IAIA-BL: A Multi-Scale Interpretable Deep Learning Model for Classification of Mass Margins in Digital Mammography
Authors:
Julia Yang,
Alina Jade Barnett,
Jon Donnelly,
Satvik Kishore,
Jerry Fang,
Fides Regina Schwartz,
Chaofan Chen,
Joseph Y. Lo,
Cynthia Rudin
Abstract:
Digital mammography is essential to breast cancer detection, and deep learning offers promising tools for faster and more accurate mammogram analysis. In radiology and other high-stakes environments, uninterpretable ("black box") deep learning models are unsuitable and there is a call in these fields to make interpretable models. Recent work in interpretable computer vision provides transparency t…
▽ More
Digital mammography is essential to breast cancer detection, and deep learning offers promising tools for faster and more accurate mammogram analysis. In radiology and other high-stakes environments, uninterpretable ("black box") deep learning models are unsuitable and there is a call in these fields to make interpretable models. Recent work in interpretable computer vision provides transparency to these formerly black boxes by utilizing prototypes for case-based explanations, achieving high accuracy in applications including mammography. However, these models struggle with precise feature localization, reasoning on large portions of an image when only a small part is relevant. This paper addresses this gap by proposing a novel multi-scale interpretable deep learning model for mammographic mass margin classification. Our contribution not only offers an interpretable model with reasoning aligned with radiologist practices, but also provides a general architecture for computer vision with user-configurable prototypes from coarse- to fine-grained prototypes.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
XCAT-3.0: A Comprehensive Library of Personalized Digital Twins Derived from CT Scans
Authors:
Lavsen Dahal,
Mobina Ghojoghnejad,
Dhrubajyoti Ghosh,
Yubraj Bhandari,
David Kim,
Fong Chi Ho,
Fakrul Islam Tushar,
Sheng Luoa,
Kyle J. Lafata,
Ehsan Abadi,
Ehsan Samei,
Joseph Y. Lo,
W. Paul Segars
Abstract:
Virtual Imaging Trials (VIT) offer a cost-effective and scalable approach for evaluating medical imaging technologies. Computational phantoms, which mimic real patient anatomy and physiology, play a central role in VITs. However, the current libraries of computational phantoms face limitations, particularly in terms of sample size and diversity. Insufficient representation of the population hamper…
▽ More
Virtual Imaging Trials (VIT) offer a cost-effective and scalable approach for evaluating medical imaging technologies. Computational phantoms, which mimic real patient anatomy and physiology, play a central role in VITs. However, the current libraries of computational phantoms face limitations, particularly in terms of sample size and diversity. Insufficient representation of the population hampers accurate assessment of imaging technologies across different patient groups. Traditionally, the more realistic computational phantoms were created by manual segmentation, which is a laborious and time-consuming task, impeding the expansion of phantom libraries. This study presents a framework for creating realistic computational phantoms using a suite of automatic segmentation models and performing three forms of automated quality control on the segmented organ masks. The result is the release of over 2500 new computational phantoms, so-named XCAT3.0 after the ubiquitous XCAT computational construct. This new formation embodies 140 structures and represents a comprehensive approach to detailed anatomical modeling. The developed computational phantoms are formatted in both voxelized and surface mesh formats. The framework is combined with an in-house CT scanner simulator to produce realistic CT images. The framework has the potential to advance virtual imaging trials, facilitating comprehensive and reliable evaluations of medical imaging technologies. Phantoms may be requested at https://cvit.duke.edu/resources/. Code, model weights, and sample CT images are available at https://xcat-3.github.io/.
△ Less
Submitted 9 September, 2024; v1 submitted 17 May, 2024;
originally announced May 2024.
-
AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets
Authors:
Fakrul Islam Tushar,
Avivah Wang,
Lavsen Dahal,
Michael R. Harowicz,
Kyle J. Lafata,
Tina D. Tailor,
Joseph Y. Lo
Abstract:
Lung cancer remains the leading cause of cancer-related mortality worldwide, and early detection through low-dose computed tomography (LDCT) has shown significant promise in reducing death rates. With the growing integration of artificial intelligence (AI) into medical imaging, the development and evaluation of robust AI models require access to large, well-annotated datasets. In this study, we in…
▽ More
Lung cancer remains the leading cause of cancer-related mortality worldwide, and early detection through low-dose computed tomography (LDCT) has shown significant promise in reducing death rates. With the growing integration of artificial intelligence (AI) into medical imaging, the development and evaluation of robust AI models require access to large, well-annotated datasets. In this study, we introduce the utility of Duke Lung Cancer Screening (DLCS) Dataset, the largest open-access LDCT dataset with over 2,000 scans and 3,000 expert-verified nodules. We benchmark deep learning models for both 3D nodule detection and lung cancer classification across internal and external datasets including LUNA16, LUNA25, and NLST-3D+. For detection, we develop two MONAI-based RetinaNet models (DLCSDmD and LUNA16-mD), evaluated using the Competition Performance Metric (CPM). For classification, we compare five models, including state-of-the-art pretrained models (Models Genesis, Med3D), a selfsupervised foundation model (FMCB), a randomly initialized ResNet50, and proposed a novel Strategic Warm-Start++ (SWS++) model. SWS++ uses curated candidate patches to pretrain a classification backbone within the same detection pipeline, enabling task-relevant feature learning. Our models demonstrated strong generalizability, with SWS++ achieving comparable or superior performance to existing foundational models across multiple datasets (AUC: 0.71 to 0.90). All code, models, and data are publicly released to promote reproducibility and collaboration. This work establishes a standardized benchmarking resource for lung cancer AI research, supporting future efforts in model development, validation, and clinical translation.
△ Less
Submitted 23 April, 2025; v1 submitted 7 May, 2024;
originally announced May 2024.
-
What limits performance of weakly supervised deep learning for chest CT classification?
Authors:
Fakrul Islam Tushar,
Vincent M. D'Anniballe,
Geoffrey D. Rubin,
Joseph Y. Lo
Abstract:
Weakly supervised learning with noisy data has drawn attention in the medical imaging community due to the sparsity of high-quality disease labels. However, little is known about the limitations of such weakly supervised learning and the effect of these constraints on disease classification performance. In this paper, we test the effects of such weak supervision by examining model tolerance for th…
▽ More
Weakly supervised learning with noisy data has drawn attention in the medical imaging community due to the sparsity of high-quality disease labels. However, little is known about the limitations of such weakly supervised learning and the effect of these constraints on disease classification performance. In this paper, we test the effects of such weak supervision by examining model tolerance for three conditions. First, we examined model tolerance for noisy data by incrementally increasing error in the labels within the training data. Second, we assessed the impact of dataset size by varying the amount of training data. Third, we compared performance differences between binary and multi-label classification. Results demonstrated that the model could endure up to 10% added label error before experiencing a decline in disease classification performance. Disease classification performance steadily rose as the amount of training data was increased for all disease classes, before experiencing a plateau in performance at 75% of training data. Last, the binary model outperformed the multilabel model in every disease category. However, such interpretations may be misleading, as the binary model was heavily influenced by co-occurring diseases and may not have learned the specific features of the disease in the image. In conclusion, this study may help the medical imaging community understand the benefits and risks of weak supervision with noisy labels. Such studies demonstrate the need to build diverse, large-scale datasets and to develop explainable and responsible AI.
△ Less
Submitted 6 February, 2024;
originally announced February 2024.
-
Large Intestine 3D Shape Refinement Using Point Diffusion Models for Digital Phantom Generation
Authors:
Kaouther Mouheb,
Mobina Ghojogh Nejad,
Lavsen Dahal,
Ehsan Samei,
Kyle J. Lafata,
W. Paul Segars,
Joseph Y. Lo
Abstract:
Accurate 3D modeling of human organs plays a crucial role in building computational phantoms for virtual imaging trials. However, generating anatomically plausible reconstructions of organ surfaces from computed tomography scans remains challenging for many structures in the human body. This challenge is particularly evident when dealing with the large intestine. In this study, we leverage recent…
▽ More
Accurate 3D modeling of human organs plays a crucial role in building computational phantoms for virtual imaging trials. However, generating anatomically plausible reconstructions of organ surfaces from computed tomography scans remains challenging for many structures in the human body. This challenge is particularly evident when dealing with the large intestine. In this study, we leverage recent advancements in geometric deep learning and denoising diffusion probabilistic models to refine the segmentation results of the large intestine. We begin by representing the organ as point clouds sampled from the surface of the 3D segmentation mask. Subsequently, we employ a hierarchical variational autoencoder to obtain global and local latent representations of the organ's shape. We train two conditional denoising diffusion models in the hierarchical latent space to perform shape refinement. To further enhance our method, we incorporate a state-of-the-art surface reconstruction model, allowing us to generate smooth meshes from the obtained complete point clouds. Experimental results demonstrate the effectiveness of our approach in capturing both the global distribution of the organ's shape and its fine details. Our complete refinement pipeline demonstrates remarkable enhancements in surface representation compared to the initial segmentation, reducing the Chamfer distance by 70%, the Hausdorff distance by 32%, and the Earth Mover's distance by 6%. By combining geometric deep learning, denoising diffusion models, and advanced surface reconstruction techniques, our proposed method offers a promising solution for accurately modeling the large intestine's surface and can easily be extended to other anatomical structures.
△ Less
Submitted 20 May, 2024; v1 submitted 15 September, 2023;
originally announced September 2023.
-
Virtual imaging trials improved the transparency and reliability of AI systems in COVID-19 imaging
Authors:
Fakrul Islam Tushar,
Lavsen Dahal,
Saman Sotoudeh-Paima,
Ehsan Abadi,
W. Paul Segars,
Ehsan Samei,
Joseph Y. Lo
Abstract:
The credibility of Artificial Intelligence (AI) models in medical imaging, particularly during the COVID-19 pandemic, has been challenged by reproducibility issues and obscured clinical insights. To address these concerns, we propose a Virtual Imaging Trials (VIT) framework, utilizing both clinical and simulated datasets to evaluate AI systems. This study focuses on using convolutional neural netw…
▽ More
The credibility of Artificial Intelligence (AI) models in medical imaging, particularly during the COVID-19 pandemic, has been challenged by reproducibility issues and obscured clinical insights. To address these concerns, we propose a Virtual Imaging Trials (VIT) framework, utilizing both clinical and simulated datasets to evaluate AI systems. This study focuses on using convolutional neural networks (CNNs) for COVID-19 diagnosis using computed tomography (CT) and chest radiography (CXR). We developed and tested multiple AI models, 3D ResNet-like and 2D EfficientNetv2 architectures, across diverse datasets. Our evaluation metrics included the area under the curve (AUC). Statistical analyses, such as the DeLong method for AUC confidence intervals, were employed to assess performance differences. Our findings demonstrate that VIT provides a robust platform for objective assessment, revealing significant influences of dataset characteristics, patient factors, and imaging physics on AI efficacy. Notably, models trained on the most diverse datasets showed the highest external testing performance, with AUC values ranging from 0.73 to 0.76 for CT and 0.70 to 0.73 for CXR. Internal testing yielded higher AUC values (0.77 to 0.85 for CT and 0.77 to 1.0 for CXR), highlighting a substantial drop in performance during external validation, which underscores the importance of diverse and comprehensive training and testing data. This approach enhances model transparency and reliability, offering nuanced insights into the factors driving AI performance and bridging the gap between experimental and clinical settings. The study underscores the potential of VIT to improve the reproducibility and clinical relevance of AI systems in medical imaging.
△ Less
Submitted 27 October, 2024; v1 submitted 17 August, 2023;
originally announced August 2023.
-
Sigma-point Kalman Filter with Nonlinear Unknown Input Estimation via Optimization and Data-driven Approach for Dynamic Systems
Authors:
Junn Yong Loo,
Ze Yang Ding,
Vishnu Monn Baskaran,
Surya Girinatha Nurzaman,
Chee Pin Tan
Abstract:
Most works on joint state and unknown input (UI) estimation require the assumption that the UIs are linear; this is potentially restrictive as it does not hold in many intelligent autonomous systems. To overcome this restriction and circumvent the need to linearize the system, we propose a derivative-free Unknown Input Sigma-point Kalman Filter (SPKF-nUI) where the SPKF is interconnected with a ge…
▽ More
Most works on joint state and unknown input (UI) estimation require the assumption that the UIs are linear; this is potentially restrictive as it does not hold in many intelligent autonomous systems. To overcome this restriction and circumvent the need to linearize the system, we propose a derivative-free Unknown Input Sigma-point Kalman Filter (SPKF-nUI) where the SPKF is interconnected with a general nonlinear UI estimator that can be implemented via nonlinear optimization and data-driven approaches. The nonlinear UI estimator uses the posterior state estimate which is less susceptible to state prediction error. In addition, we introduce a joint sigma-point transformation scheme to incorporate both the state and UI uncertainties in the estimation of SPKF-nUI. An in-depth stochastic stability analysis proves that the proposed SPKF-nUI yields exponentially converging estimation error bounds under reasonable assumptions. Finally, two case studies are carried out on a simulation-based rigid robot and a physical soft robot, i.e., robots made of soft materials with complex dynamics to validate effectiveness of the proposed filter on nonlinear dynamic systems. Our results demonstrate that the proposed SPKF-nUI achieves the lowest state and UI estimation errors when compared to the existing nonlinear state-UI filters.
△ Less
Submitted 9 November, 2024; v1 submitted 21 June, 2023;
originally announced June 2023.
-
Unsupervised Cross-Domain Soft Sensor Modelling via Deep Physics-Inspired Particle Flow Bayes
Authors:
Junn Yong Loo,
Ze Yang Ding,
Surya G. Nurzaman,
Chee-Ming Ting,
Vishnu Monn Baskaran,
Chee Pin Tan
Abstract:
Data-driven soft sensors are essential for achieving accurate perception through reliable state inference. However, developing representative soft sensor models is challenged by issues such as missing labels, domain adaptability, and temporal coherence in data. To address these challenges, we propose a deep Particle Flow Bayes (DPFB) framework for cross-domain soft sensor modeling in the absence o…
▽ More
Data-driven soft sensors are essential for achieving accurate perception through reliable state inference. However, developing representative soft sensor models is challenged by issues such as missing labels, domain adaptability, and temporal coherence in data. To address these challenges, we propose a deep Particle Flow Bayes (DPFB) framework for cross-domain soft sensor modeling in the absence of target state labels. In particular, a sequential Bayes objective is first formulated to perform the maximum likelihood estimation underlying the cross-domain soft sensing problem. At the core of the framework, we incorporate a physics-inspired particle flow that optimizes the sequential Bayes objective to perform an exact Bayes update of the model extracted latent and hidden features. As a result, these contributions enable the proposed framework to learn a rich approximate posterior feature representation capable of characterizing complex cross-domain system dynamics and performing effective time series unsupervised domain adaptation (UDA). Finally, we validate the framework on a complex industrial multiphase flow process system with complex dynamics and multiple operating conditions. The results demonstrate that the DPFB framework achieves superior cross-domain soft sensing performance, outperforming state-of-the-art deep UDA and normalizing flow approaches.
△ Less
Submitted 8 July, 2023; v1 submitted 7 June, 2023;
originally announced June 2023.
-
Cross-domain Transfer Learning and State Inference for Soft Robots via a Semi-supervised Sequential Variational Bayes Framework
Authors:
Shageenderan Sapai,
Junn Yong Loo,
Ze Yang Ding,
Chee Pin Tan,
Raphael CW Phan,
Vishnu Monn Baskaran,
Surya Girinatha Nurzaman
Abstract:
Recently, data-driven models such as deep neural networks have shown to be promising tools for modelling and state inference in soft robots. However, voluminous amounts of data are necessary for deep models to perform effectively, which requires exhaustive and quality data collection, particularly of state labels. Consequently, obtaining labelled state data for soft robotic systems is challenged f…
▽ More
Recently, data-driven models such as deep neural networks have shown to be promising tools for modelling and state inference in soft robots. However, voluminous amounts of data are necessary for deep models to perform effectively, which requires exhaustive and quality data collection, particularly of state labels. Consequently, obtaining labelled state data for soft robotic systems is challenged for various reasons, including difficulty in the sensorization of soft robots and the inconvenience of collecting data in unstructured environments. To address this challenge, in this paper, we propose a semi-supervised sequential variational Bayes (DSVB) framework for transfer learning and state inference in soft robots with missing state labels on certain robot configurations. Considering that soft robots may exhibit distinct dynamics under different robot configurations, a feature space transfer strategy is also incorporated to promote the adaptation of latent features across multiple configurations. Unlike existing transfer learning approaches, our proposed DSVB employs a recurrent neural network to model the nonlinear dynamics and temporal coherence in soft robot data. The proposed framework is validated on multiple setup configurations of a pneumatic-based soft robot finger. Experimental results on four transfer scenarios demonstrate that DSVB performs effective transfer learning and accurate state inference amidst missing state labels. The data and code are available at https://github.com/shageenderan/DSVB.
△ Less
Submitted 9 November, 2024; v1 submitted 2 March, 2023;
originally announced March 2023.
-
A Deep Probabilistic Spatiotemporal Framework for Dynamic Graph Representation Learning with Application to Brain Disorder Identification
Authors:
Sin-Yee Yap,
Junn Yong Loo,
Chee-Ming Ting,
Fuad Noman,
Raphael C. -W. Phan,
Adeel Razi,
David L. Dowe
Abstract:
Recent applications of pattern recognition techniques on brain connectome classification using functional connectivity (FC) are shifting towards acknowledging the non-Euclidean topology and dynamic aspects of brain connectivity across time. In this paper, a deep spatiotemporal variational Bayes (DSVB) framework is proposed to learn time-varying topological structures in dynamic FC networks for ide…
▽ More
Recent applications of pattern recognition techniques on brain connectome classification using functional connectivity (FC) are shifting towards acknowledging the non-Euclidean topology and dynamic aspects of brain connectivity across time. In this paper, a deep spatiotemporal variational Bayes (DSVB) framework is proposed to learn time-varying topological structures in dynamic FC networks for identifying autism spectrum disorder (ASD) in human participants. The framework incorporates a spatial-aware recurrent neural network with an attention-based message passing scheme to capture rich spatiotemporal patterns across dynamic FC networks. To overcome model overfitting on limited training datasets, an adversarial training strategy is introduced to learn graph embedding models that generalize well to unseen brain networks. Evaluation on the ABIDE resting-state functional magnetic resonance imaging dataset shows that our proposed framework substantially outperforms state-of-the-art methods in identifying patients with ASD. Dynamic FC analyses with DSVB-learned embeddings reveal apparent group differences between ASD and healthy controls in brain network connectivity patterns and switching dynamics of brain states. The code is available at https://github.com/Monash-NeuroAI/Deep-Spatiotemporal-Variational-Bayes.
△ Less
Submitted 9 November, 2024; v1 submitted 14 February, 2023;
originally announced February 2023.
-
Fast and Efficient Malware Detection with Joint Static and Dynamic Features Through Transfer Learning
Authors:
Mao V. Ngo,
Tram Truong-Huu,
Dima Rabadi,
Jia Yi Loo,
Sin G. Teo
Abstract:
In malware detection, dynamic analysis extracts the runtime behavior of malware samples in a controlled environment and static analysis extracts features using reverse engineering tools. While the former faces the challenges of anti-virtualization and evasive behavior of malware samples, the latter faces the challenges of code obfuscation. To tackle these drawbacks, prior works proposed to develop…
▽ More
In malware detection, dynamic analysis extracts the runtime behavior of malware samples in a controlled environment and static analysis extracts features using reverse engineering tools. While the former faces the challenges of anti-virtualization and evasive behavior of malware samples, the latter faces the challenges of code obfuscation. To tackle these drawbacks, prior works proposed to develop detection models by aggregating dynamic and static features, thus leveraging the advantages of both approaches. However, simply concatenating dynamic and static features raises an issue of imbalanced contribution due to the heterogeneous dimensions of feature vectors to the performance of malware detection models. Yet, dynamic analysis is a time-consuming task and requires a secure environment, leading to detection delays and high costs for maintaining the analysis infrastructure. In this paper, we first introduce a method of constructing aggregated features via concatenating latent features learned through deep learning with equally-contributed dimensions. We then develop a knowledge distillation technique to transfer knowledge learned from aggregated features by a teacher model to a student model trained only on static features and use the trained student model for the detection of new malware samples. We carry out extensive experiments with a dataset of 86709 samples including both benign and malware samples. The experimental results show that the teacher model trained on aggregated features constructed by our method outperforms the state-of-the-art models with an improvement of up to 2.38% in detection accuracy. The distilled student model not only achieves high performance (97.81% in terms of accuracy) as that of the teacher model but also significantly reduces the detection time (from 70046.6 ms to 194.9 ms) without requiring dynamic analysis.
△ Less
Submitted 24 November, 2022;
originally announced November 2022.
-
Virtual vs. Reality: External Validation of COVID-19 Classifiers using XCAT Phantoms for Chest Computed Tomography
Authors:
Fakrul Islam Tushar,
Ehsan Abadi,
Saman Sotoudeh-Paima,
Rafael B. Fricks,
Maciej A. Mazurowski,
W. Paul Segars,
Ehsan Samei,
Joseph Y. Lo
Abstract:
Research studies of artificial intelligence models in medical imaging have been hampered by poor generalization. This problem has been especially concerning over the last year with numerous applications of deep learning for COVID-19 diagnosis. Virtual imaging trials (VITs) could provide a solution for objective evaluation of these models. In this work utilizing the VITs, we created the CVIT-COVID…
▽ More
Research studies of artificial intelligence models in medical imaging have been hampered by poor generalization. This problem has been especially concerning over the last year with numerous applications of deep learning for COVID-19 diagnosis. Virtual imaging trials (VITs) could provide a solution for objective evaluation of these models. In this work utilizing the VITs, we created the CVIT-COVID dataset including 180 virtually imaged computed tomography (CT) images from simulated COVID-19 and normal phantom models under different COVID-19 morphology and imaging properties. We evaluated the performance of an open-source, deep-learning model from the University of Waterloo trained with multi-institutional data and an in-house model trained with the open clinical dataset called MosMed. We further validated the model's performance against open clinical data of 305 CT images to understand virtual vs. real clinical data performance. The open-source model was published with nearly perfect performance on the original Waterloo dataset but showed a consistent performance drop in external testing on another clinical dataset (AUC=0.77) and our simulated CVIT-COVID dataset (AUC=0.55). The in-house model achieved an AUC of 0.87 while testing on the internal test set (MosMed test set). However, performance dropped to an AUC of 0.65 and 0.69 when evaluated on clinical and our simulated CVIT-COVID dataset. The VIT framework offered control over imaging conditions, allowing us to show there was no change in performance as CT exposure was changed from 28.5 to 57 mAs. The VIT framework also provided voxel-level ground truth, revealing that performance of in-house model was much higher at AUC=0.87 for diffuse COVID-19 infection size >2.65% lung volume versus AUC=0.52 for focal disease with <2.65% volume. The virtual imaging framework enabled these uniquely rigorous analyses of model performance.
△ Less
Submitted 6 March, 2022;
originally announced March 2022.
-
Quality or Quantity: Toward a Unified Approach for Multi-organ Segmentation in Body CT
Authors:
Fakrul Islam Tushar,
Husam Nujaim,
Wanyi Fu,
Ehsan Abadi,
Maciej A. Mazurowski,
Ehsan Samei,
William P. Segars,
Joseph Y. Lo
Abstract:
Organ segmentation of medical images is a key step in virtual imaging trials. However, organ segmentation datasets are limited in terms of quality (because labels cover only a few organs) and quantity (since case numbers are limited). In this study, we explored the tradeoffs between quality and quantity. Our goal is to create a unified approach for multi-organ segmentation of body CT, which will f…
▽ More
Organ segmentation of medical images is a key step in virtual imaging trials. However, organ segmentation datasets are limited in terms of quality (because labels cover only a few organs) and quantity (since case numbers are limited). In this study, we explored the tradeoffs between quality and quantity. Our goal is to create a unified approach for multi-organ segmentation of body CT, which will facilitate the creation of large numbers of accurate virtual phantoms. Initially, we compared two segmentation architectures, 3D-Unet and DenseVNet, which were trained using XCAT data that is fully labeled with 22 organs, and chose the 3D-Unet as the better performing model. We used the XCAT-trained model to generate pseudo-labels for the CT-ORG dataset that has only 7 organs segmented. We performed two experiments: First, we trained 3D-UNet model on the XCAT dataset, representing quality data, and tested it on both XCAT and CT-ORG datasets. Second, we trained 3D-UNet after including the CT-ORG dataset into the training set to have more quantity. Performance improved for segmentation in the organs where we have true labels in both datasets and degraded when relying on pseudo-labels. When organs were labeled in both datasets, Exp-2 improved Average DSC in XCAT and CT-ORG by 1. This demonstrates that quality data is the key to improving the model's performance.
△ Less
Submitted 2 March, 2022;
originally announced March 2022.
-
Interpretable Mammographic Image Classification using Case-Based Reasoning and Deep Learning
Authors:
Alina Jade Barnett,
Fides Regina Schwartz,
Chaofan Tao,
Chaofan Chen,
Yinhao Ren,
Joseph Y. Lo,
Cynthia Rudin
Abstract:
When we deploy machine learning models in high-stakes medical settings, we must ensure these models make accurate predictions that are consistent with known medical science. Inherently interpretable networks address this need by explaining the rationale behind each decision while maintaining equal or higher accuracy compared to black-box models. In this work, we present a novel interpretable neura…
▽ More
When we deploy machine learning models in high-stakes medical settings, we must ensure these models make accurate predictions that are consistent with known medical science. Inherently interpretable networks address this need by explaining the rationale behind each decision while maintaining equal or higher accuracy compared to black-box models. In this work, we present a novel interpretable neural network algorithm that uses case-based reasoning for mammography. Designed to aid a radiologist in their decisions, our network presents both a prediction of malignancy and an explanation of that prediction using known medical features. In order to yield helpful explanations, the network is designed to mimic the reasoning processes of a radiologist: our network first detects the clinically relevant semantic features of each image by comparing each new image with a learned set of prototypical image parts from the training images, then uses those clinical features to predict malignancy. Compared to other methods, our model detects clinical features (mass margins) with equal or higher accuracy, provides a more detailed explanation of its prediction, and is better able to differentiate the classification-relevant parts of the image.
△ Less
Submitted 4 October, 2021; v1 submitted 12 July, 2021;
originally announced July 2021.
-
IAIA-BL: A Case-based Interpretable Deep Learning Model for Classification of Mass Lesions in Digital Mammography
Authors:
Alina Jade Barnett,
Fides Regina Schwartz,
Chaofan Tao,
Chaofan Chen,
Yinhao Ren,
Joseph Y. Lo,
Cynthia Rudin
Abstract:
Interpretability in machine learning models is important in high-stakes decisions, such as whether to order a biopsy based on a mammographic exam. Mammography poses important challenges that are not present in other computer vision tasks: datasets are small, confounding information is present, and it can be difficult even for a radiologist to decide between watchful waiting and biopsy based on a m…
▽ More
Interpretability in machine learning models is important in high-stakes decisions, such as whether to order a biopsy based on a mammographic exam. Mammography poses important challenges that are not present in other computer vision tasks: datasets are small, confounding information is present, and it can be difficult even for a radiologist to decide between watchful waiting and biopsy based on a mammogram alone. In this work, we present a framework for interpretable machine learning-based mammography. In addition to predicting whether a lesion is malignant or benign, our work aims to follow the reasoning processes of radiologists in detecting clinically relevant semantic features of each image, such as the characteristics of the mass margins. The framework includes a novel interpretable neural network algorithm that uses case-based reasoning for mammography. Our algorithm can incorporate a combination of data with whole image labelling and data with pixel-wise annotations, leading to better accuracy and interpretability even with a small number of images. Our interpretable models are able to highlight the classification-relevant parts of the image, whereas other methods highlight healthy tissue and confounding information. Our models are decision aids, rather than decision makers, aimed at better overall human-machine collaboration. We do not observe a loss in mass margin classification accuracy over a black box neural network trained on the same data.
△ Less
Submitted 23 March, 2021;
originally announced March 2021.
-
Multi-Label Annotation of Chest Abdomen Pelvis Computed Tomography Text Reports Using Deep Learning
Authors:
Vincent M. D'Anniballe,
Fakrul Islam Tushar,
Khrystyna Faryna,
Songyue Han,
Maciej A. Mazurowski,
Geoffrey D. Rubin,
Joseph Y. Lo
Abstract:
Purpose: To develop high throughput multi-label annotators for body (chest, abdomen, and pelvis) Computed Tomography (CT) reports that can be applied across a variety of abnormalities, organs, and disease states.
Approach: We used a dictionary approach to develop rule-based algorithms (RBA) for extraction of disease labels from radiology text reports. We targeted three organ systems (lungs/pleur…
▽ More
Purpose: To develop high throughput multi-label annotators for body (chest, abdomen, and pelvis) Computed Tomography (CT) reports that can be applied across a variety of abnormalities, organs, and disease states.
Approach: We used a dictionary approach to develop rule-based algorithms (RBA) for extraction of disease labels from radiology text reports. We targeted three organ systems (lungs/pleura, liver/gallbladder, kidneys/ureters) with four diseases per system based on their prevalence in our dataset. To expand the algorithms beyond pre-defined keywords, attention-guided recurrent neural networks (RNN) were trained using the RBA-extracted labels to classify reports as being positive for one or more diseases or normal for each organ system. Confounding effects on model performance were evaluated using random initialization or pre-trained embedding as well as different sizes of training datasets. Performance was evaluated using the receiver operating characteristic (ROC) area under the curve (AUC) against 2,158 manually obtained labels.
Results: Our models extracted disease labels from 261,229 radiology reports of 112,501 unique subjects. Pre-trained models outperformed random initialization across all diseases. As the training dataset size was reduced, performance was robust except for a few diseases with relatively small number of cases. Pre-trained classification AUCs achieved > 0.95 for all five disease outcomes across all three organ systems.
Conclusions: Our label-extracting pipeline was able to encompass a variety of cases and diseases by generalizing beyond strict rules with exceptional accuracy. This method can be easily adapted to enable automated labeling of hospital-scale medical data sets for training image-based disease classifiers.
△ Less
Submitted 7 March, 2022; v1 submitted 4 February, 2021;
originally announced February 2021.
-
Detection of masses and architectural distortions in digital breast tomosynthesis: a publicly available dataset of 5,060 patients and a deep learning model
Authors:
Mateusz Buda,
Ashirbani Saha,
Ruth Walsh,
Sujata Ghate,
Nianyi Li,
Albert Święcicki,
Joseph Y. Lo,
Maciej A. Mazurowski
Abstract:
Breast cancer screening is one of the most common radiological tasks with over 39 million exams performed each year. While breast cancer screening has been one of the most studied medical imaging applications of artificial intelligence, the development and evaluation of the algorithms are hindered due to the lack of well-annotated large-scale publicly available datasets. This is particularly an is…
▽ More
Breast cancer screening is one of the most common radiological tasks with over 39 million exams performed each year. While breast cancer screening has been one of the most studied medical imaging applications of artificial intelligence, the development and evaluation of the algorithms are hindered due to the lack of well-annotated large-scale publicly available datasets. This is particularly an issue for digital breast tomosynthesis (DBT) which is a relatively new breast cancer screening modality. We have curated and made publicly available a large-scale dataset of digital breast tomosynthesis images. It contains 22,032 reconstructed DBT volumes belonging to 5,610 studies from 5,060 patients. This included four groups: (1) 5,129 normal studies, (2) 280 studies where additional imaging was needed but no biopsy was performed, (3) 112 benign biopsied studies, and (4) 89 studies with cancer. Our dataset included masses and architectural distortions which were annotated by two experienced radiologists. Additionally, we developed a single-phase deep learning detection model and tested it using our dataset to serve as a baseline for future research. Our model reached a sensitivity of 65% at 2 false positives per breast. Our large, diverse, and highly-curated dataset will facilitate development and evaluation of AI algorithms for breast cancer screening through providing data for training as well as common set of cases for model validation. The performance of the model developed in our study shows that the task remains challenging and will serve as a baseline for future model development.
△ Less
Submitted 20 November, 2022; v1 submitted 13 November, 2020;
originally announced November 2020.
-
Weakly Supervised 3D Classification of Chest CT using Aggregated Multi-Resolution Deep Segmentation Features
Authors:
Anindo Saha,
Fakrul I. Tushar,
Khrystyna Faryna,
Vincent M. D'Anniballe,
Rui Hou,
Maciej A. Mazurowski,
Geoffrey D. Rubin,
Joseph Y. Lo
Abstract:
Weakly supervised disease classification of CT imaging suffers from poor localization owing to case-level annotations, where even a positive scan can hold hundreds to thousands of negative slices along multiple planes. Furthermore, although deep learning segmentation and classification models extract distinctly unique combinations of anatomical features from the same target class(es), they are typ…
▽ More
Weakly supervised disease classification of CT imaging suffers from poor localization owing to case-level annotations, where even a positive scan can hold hundreds to thousands of negative slices along multiple planes. Furthermore, although deep learning segmentation and classification models extract distinctly unique combinations of anatomical features from the same target class(es), they are typically seen as two independent processes in a computer-aided diagnosis (CAD) pipeline, with little to no feature reuse. In this research, we propose a medical classifier that leverages the semantic structural concepts learned via multi-resolution segmentation feature maps, to guide weakly supervised 3D classification of chest CT volumes. Additionally, a comparative analysis is drawn across two different types of feature aggregation to explore the vast possibilities surrounding feature fusion. Using a dataset of 1593 scans labeled on a case-level basis via rule-based model, we train a dual-stage convolutional neural network (CNN) to perform organ segmentation and binary classification of four representative diseases (emphysema, pneumonia/atelectasis, mass and nodules) in lungs. The baseline model, with separate stages for segmentation and classification, results in AUC of 0.791. Using identical hyperparameters, the connected architecture using static and dynamic feature aggregation improves performance to AUC of 0.832 and 0.851, respectively. This study advances the field in two key ways. First, case-level report data is used to weakly supervise a 3D CT classifier of multiple, simultaneous diseases for an organ. Second, segmentation and classification models are connected with two different feature aggregation strategies to enhance the classification performance.
△ Less
Submitted 30 October, 2020;
originally announced November 2020.
-
iPhantom: a framework for automated creation of individualized computational phantoms and its application to CT organ dosimetry
Authors:
Wanyi Fu,
Shobhit Sharma,
Ehsan Abadi,
Alexandros-Stavros Iliopoulos,
Qi Wang,
Joseph Y. Lo,
Xiaobai Sun,
William P. Segars,
Ehsan Samei
Abstract:
Objective: This study aims to develop and validate a novel framework, iPhantom, for automated creation of patient-specific phantoms or digital-twins (DT) using patient medical images. The framework is applied to assess radiation dose to radiosensitive organs in CT imaging of individual patients. Method: From patient CT images, iPhantom segments selected anchor organs (e.g. liver, bones, pancreas)…
▽ More
Objective: This study aims to develop and validate a novel framework, iPhantom, for automated creation of patient-specific phantoms or digital-twins (DT) using patient medical images. The framework is applied to assess radiation dose to radiosensitive organs in CT imaging of individual patients. Method: From patient CT images, iPhantom segments selected anchor organs (e.g. liver, bones, pancreas) using a learning-based model developed for multi-organ CT segmentation. Organs challenging to segment (e.g. intestines) are incorporated from a matched phantom template, using a diffeomorphic registration model developed for multi-organ phantom-voxels. The resulting full-patient phantoms are used to assess organ doses during routine CT exams. Result: iPhantom was validated on both the XCAT (n=50) and an independent clinical (n=10) dataset with similar accuracy. iPhantom precisely predicted all organ locations with good accuracy of Dice Similarity Coefficients (DSC) >0.6 for anchor organs and DSC of 0.3-0.9 for all other organs. iPhantom showed less than 10% dose errors for the majority of organs, which was notably superior to the state-of-the-art baseline method (20-35% dose errors). Conclusion: iPhantom enables automated and accurate creation of patient-specific phantoms and, for the first time, provides sufficient and automated patient-specific dose estimates for CT dosimetry. Significance: The new framework brings the creation and application of CHPs to the level of individual CHPs through automation, achieving a wider and precise organ localization, paving the way for clinical monitoring, and personalized optimization, and large-scale research.
△ Less
Submitted 19 August, 2020;
originally announced August 2020.
-
Classification of Multiple Diseases on Body CT Scans using Weakly Supervised Deep Learning
Authors:
Fakrul Islam Tushar,
Vincent M. D'Anniballe,
Rui Hou,
Maciej A. Mazurowski,
Wanyi Fu,
Ehsan Samei,
Geoffrey D. Rubin,
Joseph Y. Lo
Abstract:
Purpose: To design multi-disease classifiers for body CT scans for three different organ systems using automatically extracted labels from radiology text reports.Materials & Methods: This retrospective study included a total of 12,092 patients (mean age 57 +- 18; 6,172 women) for model development and testing (from 2012-2017). Rule-based algorithms were used to extract 19,225 disease labels from 1…
▽ More
Purpose: To design multi-disease classifiers for body CT scans for three different organ systems using automatically extracted labels from radiology text reports.Materials & Methods: This retrospective study included a total of 12,092 patients (mean age 57 +- 18; 6,172 women) for model development and testing (from 2012-2017). Rule-based algorithms were used to extract 19,225 disease labels from 13,667 body CT scans from 12,092 patients. Using a three-dimensional DenseVNet, three organ systems were segmented: lungs and pleura; liver and gallbladder; and kidneys and ureters. For each organ, a three-dimensional convolutional neural network classified no apparent disease versus four common diseases for a total of 15 different labels across all three models. Testing was performed on a subset of 2,158 CT volumes relative to 2,875 manually derived reference labels from 2133 patients (mean age 58 +- 18;1079 women). Performance was reported as receiver operating characteristic area under the curve (AUC) with 95% confidence intervals by the DeLong method. Results: Manual validation of the extracted labels confirmed 91% to 99% accuracy across the 15 different labels. AUCs for lungs and pleura labels were: atelectasis 0.77 (95% CI: 0.74, 0.81), nodule 0.65 (0.61, 0.69), emphysema 0.89 (0.86, 0.92), effusion 0.97 (0.96, 0.98), and no apparent disease 0.89 (0.87, 0.91). AUCs for liver and gallbladder were: hepatobiliary calcification 0.62 (95% CI: 0.56, 0.67), lesion 0.73 (0.69, 0.77), dilation 0.87 (0.84, 0.90), fatty 0.89 (0.86, 0.92), and no apparent disease 0.82 (0.78, 0.85). AUCs for kidneys and ureters were: stone 0.83 (95% CI: 0.79, 0.87), atrophy 0.92 (0.89, 0.94), lesion 0.68 (0.64, 0.72), cyst 0.70 (0.66, 0.73), and no apparent disease 0.79 (0.75, 0.83). Conclusion: Weakly-supervised deep learning models were able to classify diverse diseases in multiple organ systems.
△ Less
Submitted 16 November, 2021; v1 submitted 3 August, 2020;
originally announced August 2020.
-
Machine-Learning-Based Multiple Abnormality Prediction with Large-Scale Chest Computed Tomography Volumes
Authors:
Rachel Lea Draelos,
David Dov,
Maciej A. Mazurowski,
Joseph Y. Lo,
Ricardo Henao,
Geoffrey D. Rubin,
Lawrence Carin
Abstract:
Machine learning models for radiology benefit from large-scale data sets with high quality labels for abnormalities. We curated and analyzed a chest computed tomography (CT) data set of 36,316 volumes from 19,993 unique patients. This is the largest multiply-annotated volumetric medical imaging data set reported. To annotate this data set, we developed a rule-based method for automatically extract…
▽ More
Machine learning models for radiology benefit from large-scale data sets with high quality labels for abnormalities. We curated and analyzed a chest computed tomography (CT) data set of 36,316 volumes from 19,993 unique patients. This is the largest multiply-annotated volumetric medical imaging data set reported. To annotate this data set, we developed a rule-based method for automatically extracting abnormality labels from free-text radiology reports with an average F-score of 0.976 (min 0.941, max 1.0). We also developed a model for multi-organ, multi-disease classification of chest CT volumes that uses a deep convolutional neural network (CNN). This model reached a classification performance of AUROC greater than 0.90 for 18 abnormalities, with an average AUROC of 0.773 for all 83 abnormalities, demonstrating the feasibility of learning from unfiltered whole volume CT data. We show that training on more labels improves performance significantly: for a subset of 9 labels - nodule, opacity, atelectasis, pleural effusion, consolidation, mass, pericardial effusion, cardiomegaly, and pneumothorax - the model's average AUROC increased by 10% when the number of training labels was increased from 9 to all 83. All code for volume preprocessing, automated label extraction, and the volume abnormality prediction model will be made publicly available. The 36,316 CT volumes and labels will also be made publicly available pending institutional approval.
△ Less
Submitted 12 October, 2020; v1 submitted 11 February, 2020;
originally announced February 2020.