-
STACT-Time: Spatio-Temporal Cross Attention for Cine Thyroid Ultrasound Time Series Classification
Authors:
Irsyad Adam,
Tengyue Zhang,
Shrayes Raman,
Zhuyu Qiu,
Brandon Taraku,
Hexiang Feng,
Sile Wang,
Ashwath Radhachandran,
Shreeram Athreya,
Vedrana Ivezic,
Peipei Ping,
Corey Arnold,
William Speier
Abstract:
Thyroid cancer is among the most common cancers in the United States. Thyroid nodules are frequently detected through ultrasound (US) imaging, and some require further evaluation via fine-needle aspiration (FNA) biopsy. Despite its effectiveness, FNA often leads to unnecessary biopsies of benign nodules, causing patient discomfort and anxiety. To address this, the American College of Radiology Thy…
▽ More
Thyroid cancer is among the most common cancers in the United States. Thyroid nodules are frequently detected through ultrasound (US) imaging, and some require further evaluation via fine-needle aspiration (FNA) biopsy. Despite its effectiveness, FNA often leads to unnecessary biopsies of benign nodules, causing patient discomfort and anxiety. To address this, the American College of Radiology Thyroid Imaging Reporting and Data System (TI-RADS) has been developed to reduce benign biopsies. However, such systems are limited by interobserver variability. Recent deep learning approaches have sought to improve risk stratification, but they often fail to utilize the rich temporal and spatial context provided by US cine clips, which contain dynamic global information and surrounding structural changes across various views. In this work, we propose the Spatio-Temporal Cross Attention for Cine Thyroid Ultrasound Time Series Classification (STACT-Time) model, a novel representation learning framework that integrates imaging features from US cine clips with features from segmentation masks automatically generated by a pretrained model. By leveraging self-attention and cross-attention mechanisms, our model captures the rich temporal and spatial context of US cine clips while enhancing feature representation through segmentation-guided learning. Our model improves malignancy prediction compared to state-of-the-art models, achieving a cross-validation precision of 0.91 (plus or minus 0.02) and an F1 score of 0.89 (plus or minus 0.02). By reducing unnecessary biopsies of benign nodules while maintaining high sensitivity for malignancy detection, our model has the potential to enhance clinical decision-making and improve patient outcomes.
△ Less
Submitted 22 June, 2025;
originally announced June 2025.
-
CytoFM: The first cytology foundation model
Authors:
Vedrana Ivezić,
Ashwath Radhachandran,
Ekaterina Redekop,
Shreeram Athreya,
Dongwoo Lee,
Vivek Sant,
Corey Arnold,
William Speier
Abstract:
Cytology is essential for cancer diagnostics and screening due to its minimally invasive nature. However, the development of robust deep learning models for digital cytology is challenging due to the heterogeneity in staining and preparation methods of samples, differences across organs, and the limited availability of large, diverse, annotated datasets. Developing a task-specific model for every…
▽ More
Cytology is essential for cancer diagnostics and screening due to its minimally invasive nature. However, the development of robust deep learning models for digital cytology is challenging due to the heterogeneity in staining and preparation methods of samples, differences across organs, and the limited availability of large, diverse, annotated datasets. Developing a task-specific model for every cytology application is impractical and non-cytology-specific foundation models struggle to generalize to tasks in this domain where the emphasis is on cell morphology. To address these challenges, we introduce CytoFM, the first cytology self-supervised foundation model. Using iBOT, a self-supervised Vision Transformer (ViT) training framework incorporating masked image modeling and self-distillation, we pretrain CytoFM on a diverse collection of cytology datasets to learn robust, transferable representations. We evaluate CytoFM on multiple downstream cytology tasks, including breast cancer classification and cell type identification, using an attention-based multiple instance learning framework. Our results demonstrate that CytoFM performs better on two out of three downstream tasks than existing foundation models pretrained on histopathology (UNI) or natural images (iBOT-Imagenet). Visualizations of learned representations demonstrate our model is able to attend to cytologically relevant features. Despite a small pre-training dataset, CytoFM's promising results highlight the ability of task-agnostic pre-training approaches to learn robust and generalizable features from cytology data.
△ Less
Submitted 17 April, 2025;
originally announced April 2025.
-
Reducing Overtreatment of Indeterminate Thyroid Nodules Using a Multimodal Deep Learning Model
Authors:
Shreeram Athreya,
Andrew Melehy,
Sujit Silas Armstrong Suthahar,
Vedrana Ivezić,
Ashwath Radhachandran,
Vivek Sant,
Chace Moleta,
Henry Zheng,
Maitraya Patel,
Rinat Masamed,
Corey W. Arnold,
William Speier
Abstract:
Objective: Molecular testing (MT) classifies cytologically indeterminate thyroid nodules as benign or malignant with high sensitivity but low positive predictive value (PPV), only using molecular profiles, ignoring ultrasound (US) imaging and biopsy. We address this limitation by applying attention multiple instance learning (AMIL) to US images.
Methods: We retrospectively reviewed 333 patients…
▽ More
Objective: Molecular testing (MT) classifies cytologically indeterminate thyroid nodules as benign or malignant with high sensitivity but low positive predictive value (PPV), only using molecular profiles, ignoring ultrasound (US) imaging and biopsy. We address this limitation by applying attention multiple instance learning (AMIL) to US images.
Methods: We retrospectively reviewed 333 patients with indeterminate thyroid nodules at UCLA medical center (259 benign, 74 malignant). A multi-modal deep learning AMIL model was developed, combining US images and MT to classify the nodules as benign or malignant and enhance the malignancy risk stratification of MT.
Results: The final AMIL model matched MT sensitivity (0.946) while significantly improving PPV (0.477 vs 0.448 for MT alone), indicating fewer false positives while maintaining high sensitivity.
Conclusion: Our approach reduces false positives compared to MT while maintaining the same ability to identify positive cases, potentially reducing unnecessary benign thyroid resections in patients with indeterminate nodules.
△ Less
Submitted 27 September, 2024;
originally announced September 2024.
-
Ultrasound Image Enhancement using CycleGAN and Perceptual Loss
Authors:
Shreeram Athreya,
Ashwath Radhachandran,
Vedrana Ivezić,
Vivek Sant,
Corey W. Arnold,
William Speier
Abstract:
Purpose: The objective of this work is to introduce an advanced framework designed to enhance ultrasound images, especially those captured by portable hand-held devices, which often produce lower quality images due to hardware constraints. Additionally, this framework is uniquely capable of effectively handling non-registered input ultrasound image pairs, addressing a common challenge in medical i…
▽ More
Purpose: The objective of this work is to introduce an advanced framework designed to enhance ultrasound images, especially those captured by portable hand-held devices, which often produce lower quality images due to hardware constraints. Additionally, this framework is uniquely capable of effectively handling non-registered input ultrasound image pairs, addressing a common challenge in medical imaging. Materials and Methods: In this retrospective study, we utilized an enhanced generative adversarial network (CycleGAN) model for ultrasound image enhancement across five organ systems. Perceptual loss, derived from deep features of pretrained neural networks, is applied to ensure the human-perceptual quality of the enhanced images. These images are compared with paired images acquired from high resolution devices to demonstrate the model's ability to generate realistic high-quality images across organ systems. Results: Preliminary validation of the framework reveals promising performance metrics. The model generates images that result in a Structural Similarity Index (SSI) score of 0.722, Locally Normalized Cross-Correlation (LNCC) score of 0.902 and 28.802 for the Peak Signal-to-Noise Ratio (PSNR) metric. Conclusion: This work presents a significant advancement in medical imaging through the development of a CycleGAN model enhanced with Perceptual Loss (PL), effectively bridging the quality gap between ultrasound images from varied devices. By training on paired images, the model not only improves image quality but also ensures the preservation of vital anatomic structural content. This approach may improve equity in access to healthcare by enhancing portable device capabilities, although further validation and optimizations are necessary for broader clinical application.
△ Less
Submitted 18 December, 2023;
originally announced December 2023.