-
A Foundation Model Framework for Multi-View MRI Classification of Extramural Vascular Invasion and Mesorectal Fascia Invasion in Rectal Cancer
Authors:
Yumeng Zhang,
Zohaib Salahuddin,
Danial Khan,
Shruti Atul Mali,
Henry C. Woodruff,
Sina Amirrajab,
Eduardo Ibor-Crespo,
Ana Jimenez-Pastor,
Luis Marti-Bonmati,
Philippe Lambin
Abstract:
Background: Accurate MRI-based identification of extramural vascular invasion (EVI) and mesorectal fascia invasion (MFI) is pivotal for risk-stratified management of rectal cancer, yet visual assessment is subjective and vulnerable to inter-institutional variability. Purpose: To develop and externally evaluate a multicenter, foundation-model-driven framework that automatically classifies EVI and M…
▽ More
Background: Accurate MRI-based identification of extramural vascular invasion (EVI) and mesorectal fascia invasion (MFI) is pivotal for risk-stratified management of rectal cancer, yet visual assessment is subjective and vulnerable to inter-institutional variability. Purpose: To develop and externally evaluate a multicenter, foundation-model-driven framework that automatically classifies EVI and MFI on axial and sagittal T2-weighted MRI. Methods: This retrospective study used 331 pre-treatment rectal cancer MRI examinations from three European hospitals. After TotalSegmentator-guided rectal patch extraction, a self-supervised frequency-domain harmonization pipeline was trained to minimize scanner-related contrast shifts. Four classifiers were compared: ResNet50, SeResNet, the universal biomedical pretrained transformer (UMedPT) with a lightweight MLP head, and a logistic-regression variant using frozen UMedPT features (UMedPT_LR). Results: UMedPT_LR achieved the best EVI detection when axial and sagittal features were fused (AUC = 0.82; sensitivity = 0.75; F1 score = 0.73), surpassing the Chaimeleon Grand-Challenge winner (AUC = 0.74). The highest MFI performance was attained by UMedPT on axial harmonized images (AUC = 0.77), surpassing the Chaimeleon Grand-Challenge winner (AUC = 0.75). Frequency-domain harmonization improved MFI classification but variably affected EVI performance. Conventional CNNs (ResNet50, SeResNet) underperformed, especially in F1 score and balanced accuracy. Conclusion: These findings demonstrate that combining foundation model features, harmonization, and multi-view fusion significantly enhances diagnostic performance in rectal MRI.
△ Less
Submitted 23 May, 2025;
originally announced May 2025.
-
Explainable Anatomy-Guided AI for Prostate MRI: Foundation Models and In Silico Clinical Trials for Virtual Biopsy-based Risk Assessment
Authors:
Danial Khan,
Zohaib Salahuddin,
Yumeng Zhang,
Sheng Kuang,
Shruti Atul Mali,
Henry C. Woodruff,
Sina Amirrajab,
Rachel Cavill,
Eduardo Ibor-Crespo,
Ana Jimenez-Pastor,
Adrian Galiana-Bordera,
Paula Jimenez Gomez,
Luis Marti-Bonmati,
Philippe Lambin
Abstract:
We present a fully automated, anatomically guided deep learning pipeline for prostate cancer (PCa) risk stratification using routine MRI. The pipeline integrates three key components: an nnU-Net module for segmenting the prostate gland and its zones on axial T2-weighted MRI; a classification module based on the UMedPT Swin Transformer foundation model, fine-tuned on 3D patches with optional anatom…
▽ More
We present a fully automated, anatomically guided deep learning pipeline for prostate cancer (PCa) risk stratification using routine MRI. The pipeline integrates three key components: an nnU-Net module for segmenting the prostate gland and its zones on axial T2-weighted MRI; a classification module based on the UMedPT Swin Transformer foundation model, fine-tuned on 3D patches with optional anatomical priors and clinical data; and a VAE-GAN framework for generating counterfactual heatmaps that localize decision-driving image regions. The system was developed using 1,500 PI-CAI cases for segmentation and 617 biparametric MRIs with metadata from the CHAIMELEON challenge for classification (split into 70% training, 10% validation, and 20% testing). Segmentation achieved mean Dice scores of 0.95 (gland), 0.94 (peripheral zone), and 0.92 (transition zone). Incorporating gland priors improved AUC from 0.69 to 0.72, with a three-scale ensemble achieving top performance (AUC = 0.79, composite score = 0.76), outperforming the 2024 CHAIMELEON challenge winners. Counterfactual heatmaps reliably highlighted lesions within segmented regions, enhancing model interpretability. In a prospective multi-center in-silico trial with 20 clinicians, AI assistance increased diagnostic accuracy from 0.72 to 0.77 and Cohen's kappa from 0.43 to 0.53, while reducing review time per case by 40%. These results demonstrate that anatomy-aware foundation models with counterfactual explainability can enable accurate, interpretable, and efficient PCa risk assessment, supporting their potential use as virtual biopsies in clinical practice.
△ Less
Submitted 23 May, 2025;
originally announced May 2025.
-
Automated Surgical Skill Assessment in Endoscopic Pituitary Surgery using Real-time Instrument Tracking on a High-fidelity Bench-top Phantom
Authors:
Adrito Das,
Bilal Sidiqi,
Laurent Mennillo,
Zhehua Mao,
Mikael Brudfors,
Miguel Xochicale,
Danyal Z. Khan,
Nicola Newall,
John G. Hanrahan,
Matthew J. Clarkson,
Danail Stoyanov,
Hani J. Marcus,
Sophia Bano
Abstract:
Improved surgical skill is generally associated with improved patient outcomes, although assessment is subjective; labour-intensive; and requires domain specific expertise. Automated data driven metrics can alleviate these difficulties, as demonstrated by existing machine learning instrument tracking models in minimally invasive surgery. However, these models have been tested on limited datasets o…
▽ More
Improved surgical skill is generally associated with improved patient outcomes, although assessment is subjective; labour-intensive; and requires domain specific expertise. Automated data driven metrics can alleviate these difficulties, as demonstrated by existing machine learning instrument tracking models in minimally invasive surgery. However, these models have been tested on limited datasets of laparoscopic surgery, with a focus on isolated tasks and robotic surgery. In this paper, a new public dataset is introduced, focusing on simulated surgery, using the nasal phase of endoscopic pituitary surgery as an exemplar. Simulated surgery allows for a realistic yet repeatable environment, meaning the insights gained from automated assessment can be used by novice surgeons to hone their skills on the simulator before moving to real surgery. PRINTNet (Pituitary Real-time INstrument Tracking Network) has been created as a baseline model for this automated assessment. Consisting of DeepLabV3 for classification and segmentation; StrongSORT for tracking; and the NVIDIA Holoscan SDK for real-time performance, PRINTNet achieved 71.9% Multiple Object Tracking Precision running at 22 Frames Per Second. Using this tracking output, a Multilayer Perceptron achieved 87% accuracy in predicting surgical skill level (novice or expert), with the "ratio of total procedure time to instrument visible time" correlated with higher surgical skill. This therefore demonstrates the feasibility of automated surgical skill assessment in simulated endoscopic pituitary surgery. The new publicly available dataset can be found here: https://doi.org/10.5522/04/26511049.
△ Less
Submitted 25 September, 2024;
originally announced September 2024.
-
PitRSDNet: Predicting Intra-operative Remaining Surgery Duration in Endoscopic Pituitary Surgery
Authors:
Anjana Wijekoon,
Adrito Das,
Roxana R. Herrera,
Danyal Z. Khan,
John Hanrahan,
Eleanor Carter,
Valpuri Luoma,
Danail Stoyanov,
Hani J. Marcus,
Sophia Bano
Abstract:
Accurate intra-operative Remaining Surgery Duration (RSD) predictions allow for anaesthetists to more accurately decide when to administer anaesthetic agents and drugs, as well as to notify hospital staff to send in the next patient. Therefore RSD plays an important role in improving patient care and minimising surgical theatre costs via efficient scheduling. In endoscopic pituitary surgery, it is…
▽ More
Accurate intra-operative Remaining Surgery Duration (RSD) predictions allow for anaesthetists to more accurately decide when to administer anaesthetic agents and drugs, as well as to notify hospital staff to send in the next patient. Therefore RSD plays an important role in improving patient care and minimising surgical theatre costs via efficient scheduling. In endoscopic pituitary surgery, it is uniquely challenging due to variable workflow sequences with a selection of optional steps contributing to high variability in surgery duration. This paper presents PitRSDNet for predicting RSD during pituitary surgery, a spatio-temporal neural network model that learns from historical data focusing on workflow sequences. PitRSDNet integrates workflow knowledge into RSD prediction in two forms: 1) multi-task learning for concurrently predicting step and RSD; and 2) incorporating prior steps as context in temporal learning and inference. PitRSDNet is trained and evaluated on a new endoscopic pituitary surgery dataset with 88 videos to show competitive performance improvements over previous statistical and machine learning methods. The findings also highlight how PitRSDNet improve RSD precision on outlier cases utilising the knowledge of prior steps.
△ Less
Submitted 4 November, 2024; v1 submitted 25 September, 2024;
originally announced September 2024.
-
Biomedical Image Segmentation: A Systematic Literature Review of Deep Learning Based Object Detection Methods
Authors:
Fazli Wahid,
Yingliang Ma,
Dawar Khan,
Muhammad Aamir,
Syed U. K. Bukhari
Abstract:
Biomedical image segmentation plays a vital role in diagnosis of diseases across various organs. Deep learning-based object detection methods are commonly used for such segmentation. There exists an extensive research in this topic. However, there is no standard review on this topic. Existing surveys often lack a standardized approach or focus on broader segmentation techniques. In this paper, we…
▽ More
Biomedical image segmentation plays a vital role in diagnosis of diseases across various organs. Deep learning-based object detection methods are commonly used for such segmentation. There exists an extensive research in this topic. However, there is no standard review on this topic. Existing surveys often lack a standardized approach or focus on broader segmentation techniques. In this paper, we conducted a systematic literature review (SLR), collected and analysed 148 articles that explore deep learning object detection methods for biomedical image segmentation. We critically analyzed these methods, identified the key challenges, and discussed the future directions. From the selected articles we extracted the results including the deep learning models, targeted imaging modalities, targeted diseases, and the metrics for the analysis of the methods. The results have been presented in tabular and/or charted forms. The results are presented in three major categories including two stage detection models, one stage detection models and point-based detection models. Each article is individually analyzed along with its pros and cons. Finally, we discuss open challenges, potential benefits, and future research directions. This SLR aims to provide the research community with a quick yet deeper understanding of these segmentation models, ultimately facilitating the development of more powerful solutions for biomedical image analysis.
△ Less
Submitted 28 August, 2024; v1 submitted 6 August, 2024;
originally announced August 2024.
-
Code-Switched Urdu ASR for Noisy Telephonic Environment using Data Centric Approach with Hybrid HMM and CNN-TDNN
Authors:
Muhammad Danyal Khan,
Raheem Ali,
Arshad Aziz
Abstract:
Call Centers have huge amount of audio data which can be used for achieving valuable business insights and transcription of phone calls is manually tedious task. An effective Automated Speech Recognition system can accurately transcribe these calls for easy search through call history for specific context and content allowing automatic call monitoring, improving QoS through keyword search and sent…
▽ More
Call Centers have huge amount of audio data which can be used for achieving valuable business insights and transcription of phone calls is manually tedious task. An effective Automated Speech Recognition system can accurately transcribe these calls for easy search through call history for specific context and content allowing automatic call monitoring, improving QoS through keyword search and sentiment analysis. ASR for Call Center requires more robustness as telephonic environment are generally noisy. Moreover, there are many low-resourced languages that are on verge of extinction which can be preserved with help of Automatic Speech Recognition Technology. Urdu is the $10^{th}$ most widely spoken language in the world, with 231,295,440 worldwide still remains a resource constrained language in ASR. Regional call-center conversations operate in local language, with a mix of English numbers and technical terms generally causing a "code-switching" problem. Hence, this paper describes an implementation framework of a resource efficient Automatic Speech Recognition/ Speech to Text System in a noisy call-center environment using Chain Hybrid HMM and CNN-TDNN for Code-Switched Urdu Language. Using Hybrid HMM-DNN approach allowed us to utilize the advantages of Neural Network with less labelled data. Adding CNN with TDNN has shown to work better in noisy environment due to CNN's additional frequency dimension which captures extra information from noisy speech, thus improving accuracy. We collected data from various open sources and labelled some of the unlabelled data after analysing its general context and content from Urdu language as well as from commonly used words from other languages, primarily English and were able to achieve WER of 5.2% with noisy as well as clean environment in isolated words or numbers as well as in continuous spontaneous speech.
△ Less
Submitted 24 July, 2023;
originally announced July 2023.
-
Event-Triggered Output Synchronization of Heterogeneous Nonlinear Multi-Agents
Authors:
Gulam Dastagir Khan,
Zhiyong Chen,
Yamin Yan
Abstract:
This paper addresses the output synchronization problem for heterogeneous nonlinear multi-agent systems with distributed event-based controllers. Employing the two-step synchronization process, we first outline the distributed event-triggered consensus controllers for linear reference models under a directed communication topology. It is further shown that the subsequent triggering instants are ba…
▽ More
This paper addresses the output synchronization problem for heterogeneous nonlinear multi-agent systems with distributed event-based controllers. Employing the two-step synchronization process, we first outline the distributed event-triggered consensus controllers for linear reference models under a directed communication topology. It is further shown that the subsequent triggering instants are based on intermittent communication. Secondly, by using certain input-to-state stability (ISS) property, we design an event-triggered perturbed output regulation controller for each nonlinear multi-agent. The ISS technique used in this paper is based on the milder condition that each agent has a certain ISS property from input (actuator) disturbance to state rather than measurement (sensor) disturbance to state. With the two-step design, the objective of output synchronization is successfully achieved with Zeno behavior avoided.
△ Less
Submitted 21 August, 2019;
originally announced August 2019.