-
Symmetrical Flow Matching: Unified Image Generation, Segmentation, and Classification with Score-Based Generative Models
Authors:
Francisco Caetano,
Christiaan Viviers,
Peter H. N. De With,
Fons van der Sommen
Abstract:
Flow Matching has emerged as a powerful framework for learning continuous transformations between distributions, enabling high-fidelity generative modeling. This work introduces Symmetrical Flow Matching (SymmFlow), a new formulation that unifies semantic segmentation, classification, and image generation within a single model. Using a symmetric learning objective, SymmFlow models forward and reverse transformations jointly, ensuring bi-directional consistency, while preserving sufficient entropy for generative diversity. A new training objective is introduced to explicitly retain semantic information across flows, featuring efficient sampling while preserving semantic structure, allowing for one-step segmentation and classification without iterative refinement. Unlike previous approaches that impose strict one-to-one mapping between masks and images, SymmFlow generalizes to flexible conditioning, supporting both pixel-level and image-level class labels. Experimental results on various benchmarks demonstrate that SymmFlow achieves state-of-the-art performance on semantic image synthesis, obtaining FID scores of 11.9 on CelebAMask-HQ and 7.0 on COCO-Stuff with only 25 inference steps. Additionally, it delivers competitive results on semantic segmentation and shows promising capabilities in classification tasks. The code will be publicly available.
Submitted 12 June, 2025;
originally announced June 2025.
-
SemiVT-Surge: Semi-Supervised Video Transformer for Surgical Phase Recognition
Authors:
Yiping Li,
Ronald de Jong,
Sahar Nasirihaghighi,
Tim Jaspers,
Romy van Jaarsveld,
Gino Kuiper,
Richard van Hillegersberg,
Fons van der Sommen,
Jelle Ruurda,
Marcel Breeuwer,
Yasmina Al Khalil
Abstract:
Accurate surgical phase recognition is crucial for computer-assisted interventions and surgical video analysis. Annotating long surgical videos is labor-intensive, driving research toward leveraging unlabeled data for strong performance with minimal annotations. Although self-supervised learning has gained popularity by enabling large-scale pretraining followed by fine-tuning on small labeled subsets, semi-supervised approaches remain largely underexplored in the surgical domain. In this work, we propose a video transformer-based model with a robust pseudo-labeling framework. Our method incorporates temporal consistency regularization for unlabeled data and contrastive learning with class prototypes, which leverages both labeled data and pseudo-labels to refine the feature space. Through extensive experiments on the private RAMIE (Robot-Assisted Minimally Invasive Esophagectomy) dataset and the public Cholec80 dataset, we demonstrate the effectiveness of our approach. By incorporating unlabeled data, we achieve state-of-the-art performance on RAMIE with a 4.9% accuracy increase and obtain comparable results to full supervision while using only 1/4 of the labeled data on Cholec80. Our findings establish a strong benchmark for semi-supervised surgical phase recognition, paving the way for future research in this domain.
Submitted 2 June, 2025;
originally announced June 2025.
-
AdverX-Ray: Ensuring X-Ray Integrity Through Frequency-Sensitive Adversarial VAEs
Authors:
Francisco Caetano,
Christiaan Viviers,
Lena Filatova,
Peter H. N. de With,
Fons van der Sommen
Abstract:
Ensuring the quality and integrity of medical images is crucial for maintaining diagnostic accuracy in deep learning-based Computer-Aided Diagnosis and Computer-Aided Detection (CAD) systems. Covariate shifts are subtle variations in the data distribution caused by different imaging devices or settings and can severely degrade model performance, similar to the effects of adversarial attacks. Therefore, it is vital to have a lightweight and fast method to assess the quality of these images prior to using CAD models. AdverX-Ray addresses this need by serving as an image-quality assessment layer, designed to detect covariate shifts effectively. This Adversarial Variational Autoencoder prioritizes the discriminator's role, using the suboptimal outputs of the generator as negative samples to fine-tune the discriminator's ability to identify high-frequency artifacts. Images generated by adversarial networks often exhibit severe high-frequency artifacts, guiding the discriminator to focus excessively on these components. This makes the discriminator ideal for this approach. Trained on patches from X-ray images of specific machine models, AdverX-Ray can evaluate whether a scan matches the training distribution, or if a scan from the same machine is captured under different settings. Extensive comparisons with various OOD detection methods show that AdverX-Ray significantly outperforms existing techniques, achieving a 96.2% average AUROC using only 64 random patches from an X-ray. Its lightweight and fast architecture makes it suitable for real-time applications, enhancing the reliability of medical imaging systems. The code and pretrained models are publicly available.
Submitted 23 February, 2025;
originally announced February 2025.
-
Scaling up self-supervised learning for improved surgical foundation models
Authors:
Tim J. M. Jaspers,
Ronald L. P. D. de Jong,
Yiping Li,
Carolus H. J. Kusters,
Franciscus H. A. Bakker,
Romy C. van Jaarsveld,
Gino M. Kuiper,
Richard van Hillegersberg,
Jelle P. Ruurda,
Willem M. Brinkman,
Josien P. W. Pluim,
Peter H. N. de With,
Marcel Breeuwer,
Yasmina Al Khalil,
Fons van der Sommen
Abstract:
Foundation models have revolutionized computer vision by achieving vastly superior performance across diverse tasks through large-scale pretraining on extensive datasets. However, their application in surgical computer vision has been limited. This study addresses this gap by introducing SurgeNetXL, a novel surgical foundation model that sets a new benchmark in surgical computer vision. Trained on the largest reported surgical dataset to date, comprising over 4.7 million video frames, SurgeNetXL achieves consistent top-tier performance across six datasets spanning four surgical procedures and three tasks, including semantic segmentation, phase recognition, and critical view of safety (CVS) classification. Compared with the best-performing surgical foundation models, SurgeNetXL shows mean improvements of 2.4, 9.0, and 12.6 percent for semantic segmentation, phase recognition, and CVS classification, respectively. Additionally, SurgeNetXL outperforms the best-performing ImageNet-based variants by 14.4, 4.0, and 1.6 percent in the respective tasks. In addition to advancing model performance, this study provides key insights into scaling pretraining datasets, extending training durations, and optimizing model architectures specifically for surgical computer vision. These findings pave the way for improved generalizability and robustness in data-scarce scenarios, offering a comprehensive framework for future research in this domain. All models and a subset of the SurgeNetXL dataset, including over 2 million video frames, are publicly available at: https://github.com/TimJaspers0801/SurgeNet.
Submitted 16 January, 2025;
originally announced January 2025.
-
DisCoPatch: Taming Adversarially-driven Batch Statistics for Improved Out-of-Distribution Detection
Authors:
Francisco Caetano,
Christiaan Viviers,
Luis A. Zavala-Mondragón,
Peter H. N. de With,
Fons van der Sommen
Abstract:
Out-of-distribution (OOD) detection holds significant importance across many applications. While semantic and domain-shift OOD problems are well-studied, this work focuses on covariate shifts - subtle variations in the data distribution that can degrade machine learning performance. We hypothesize that detecting these subtle shifts can improve our understanding of in-distribution boundaries, ultimately improving OOD detection. In adversarial discriminators trained with Batch Normalization (BN), real and adversarial samples form distinct domains with unique batch statistics - a property we exploit for OOD detection. We introduce DisCoPatch, an unsupervised Adversarial Variational Autoencoder (VAE) framework that harnesses this mechanism. During inference, batches consist of patches from the same image, ensuring a consistent data distribution that allows the model to rely on batch statistics. DisCoPatch uses the VAE's suboptimal outputs (generated and reconstructed) as negative samples to train the discriminator, thereby improving its ability to delineate the boundary between in-distribution samples and covariate shifts. By tightening this boundary, DisCoPatch achieves state-of-the-art results in public OOD detection benchmarks. The proposed model not only excels in detecting covariate shifts, achieving 95.5% AUROC on ImageNet-1K(-C) but also outperforms all prior methods on public Near-OOD (95.0%) benchmarks. With a compact model size of 25MB, it achieves high OOD detection performance at notably lower latency than existing methods, making it an efficient and practical solution for real-world OOD detection applications. The code is publicly available.
Submitted 30 June, 2025; v1 submitted 14 January, 2025;
originally announced January 2025.
-
Benchmarking Pretrained Attention-based Models for Real-Time Recognition in Robot-Assisted Esophagectomy
Authors:
Ronald L. P. D. de Jong,
Yasmina al Khalil,
Tim J. M. Jaspers,
Romy C. van Jaarsveld,
Gino M. Kuiper,
Yiping Li,
Richard van Hillegersberg,
Jelle P. Ruurda,
Marcel Breeuwer,
Fons van der Sommen
Abstract:
Esophageal cancer is among the most common types of cancer worldwide. It is traditionally treated using open esophagectomy, but in recent years, robot-assisted minimally invasive esophagectomy (RAMIE) has emerged as a promising alternative. However, robot-assisted surgery can be challenging for novice surgeons, as they often suffer from a loss of spatial orientation. Computer-aided anatomy recognition holds promise for improving surgical navigation, but research in this area remains limited. In this study, we developed a comprehensive dataset for semantic segmentation in RAMIE, featuring the largest collection of vital anatomical structures and surgical instruments to date. Handling this diverse set of classes presents challenges, including class imbalance and the recognition of complex structures such as nerves. This study aims to understand the challenges and limitations of current state-of-the-art algorithms on this novel dataset and problem. Therefore, we benchmarked eight real-time deep learning models using two pretraining datasets. We assessed both traditional and attention-based networks, hypothesizing that attention-based networks better capture global patterns and address challenges such as occlusion caused by blood or other tissues. The benchmark includes our RAMIE dataset and the publicly available CholecSeg8k dataset, enabling a thorough assessment of surgical segmentation tasks. Our findings indicate that pretraining on ADE20k, a dataset for semantic segmentation, is more effective than pretraining on ImageNet. Furthermore, attention-based models outperform traditional convolutional neural networks, with SegNeXt and Mask2Former achieving higher Dice scores, and Mask2Former additionally excelling in average symmetric surface distance.
Submitted 18 December, 2024; v1 submitted 4 December, 2024;
originally announced December 2024.
-
A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation
Authors:
M. M. A. Valiuddin,
R. J. G. van Sloun,
C. G. A. Viviers,
P. H. N. de With,
F. van der Sommen
Abstract:
Advancements in image segmentation play an integral role within the broad scope of Deep Learning-based Computer Vision. Furthermore, their widespread applicability in critical real-world tasks has resulted in challenges related to the reliability of such algorithms. Hence, uncertainty quantification has been extensively studied within this context, enabling the expression of model ignorance (epistemic uncertainty) or data ambiguity (aleatoric uncertainty) to prevent uninformed decision-making. Due to the rapid adoption of Convolutional Neural Network (CNN)-based segmentation models in high-stakes applications, a substantial body of research has been published on this very topic, causing its swift expansion into a distinct field. This work provides a comprehensive overview of probabilistic segmentation by discussing fundamental concepts of uncertainty quantification, governing advancements in the field, as well as the application to various tasks. Moreover, literature on both types of uncertainty traces back to four key applications: (1) quantifying statistical inconsistencies in the annotation process due to ambiguous images, (2) correlating prediction error with uncertainty, (3) expanding the model hypothesis space for better generalization, and (4) Active Learning. An extensive discussion follows that includes an overview of utilized datasets for each of the applications and evaluation of the available methods. We also highlight challenges related to architectures, uncertainty quantification methods, standardization and benchmarking, and finally end with recommendations for future work such as methods based on single forward passes and models that appropriately leverage volumetric data.
Submitted 2 July, 2025; v1 submitted 25 November, 2024;
originally announced November 2024.
-
Can Your Generative Model Detect Out-of-Distribution Covariate Shift?
Authors:
Christiaan Viviers,
Amaan Valiuddin,
Francisco Caetano,
Lemar Abdi,
Lena Filatova,
Peter de With,
Fons van der Sommen
Abstract:
Detecting Out-of-Distribution (OOD) sensory data and covariate distribution shift aims to identify new test examples with different high-level image statistics to the captured, normal and In-Distribution (ID) set. Existing OOD detection literature largely focuses on semantic shift with little-to-no consensus over covariate shift. Generative models capture the ID data in an unsupervised manner, enabling them to effectively identify samples that deviate significantly from this learned distribution, irrespective of the downstream task. In this work, we elucidate the ability of generative models to detect and quantify domain-specific covariate shift through extensive analyses that involve a variety of models. To this end, we conjecture that it is sufficient to detect most occurring sensory faults (anomalies and deviations in global signal statistics) by solely modeling high-frequency signal-dependent and independent details. We propose a novel method, CovariateFlow, for OOD detection, specifically tailored to covariate heteroscedastic high-frequency image-components using conditional Normalizing Flows (cNFs). Our results on CIFAR10 vs. CIFAR10-C and ImageNet200 vs. ImageNet200-C demonstrate the effectiveness of the method by accurately detecting OOD covariate shift. This work contributes to enhancing the fidelity of imaging systems and aiding machine learning models in OOD detection in the presence of covariate shift.
Submitted 9 October, 2024; v1 submitted 4 September, 2024;
originally announced September 2024.
-
Find the Assembly Mistakes: Error Segmentation for Industrial Applications
Authors:
Dan Lehman,
Tim J. Schoonbeek,
Shao-Hsuan Hung,
Jacek Kustra,
Peter H. N. de With,
Fons van der Sommen
Abstract:
Recognizing errors in assembly and maintenance procedures is valuable for industrial applications, since it can increase worker efficiency and prevent unplanned down-time. Although assembly state recognition is gaining attention, none of the current works investigate assembly error localization. Therefore, we propose StateDiffNet, which localizes assembly errors based on detecting the differences between a (correct) intended assembly state and a test image from a similar viewpoint. StateDiffNet is trained on synthetically generated image pairs, providing full control over the type of meaningful change that should be detected. The proposed approach is the first to correctly localize assembly errors taken from real ego-centric video data for both states and error types that are never presented during training. Furthermore, the deployment of change detection to this industrial application provides valuable insights and considerations into the mechanisms of state-of-the-art change detection algorithms. The code and data generation pipeline are publicly available at: https://timschoonbeek.github.io/error_seg.
Submitted 23 August, 2024;
originally announced August 2024.
-
Supervised Representation Learning towards Generalizable Assembly State Recognition
Authors:
Tim J. Schoonbeek,
Goutham Balachandran,
Hans Onvlee,
Tim Houben,
Shao-Hsuan Hung,
Jacek Kustra,
Peter H. N. de With,
Fons van der Sommen
Abstract:
Assembly state recognition facilitates the execution of assembly procedures, offering feedback to enhance efficiency and minimize errors. However, recognizing assembly states poses challenges in scalability, since parts are frequently updated, and the robustness to execution errors remains underexplored. To address these challenges, this paper proposes an approach based on representation learning and the novel intermediate-state informed loss function modification (ISIL). ISIL leverages unlabeled transitions between states and demonstrates significant improvements in clustering and classification performance for all tested architectures and losses. Despite being trained exclusively on images without execution errors, thorough analysis on error states demonstrates that our approach accurately distinguishes between correct states and states with various types of execution errors. The integration of the proposed algorithm can offer meaningful assistance to workers and mitigate unexpected losses due to procedural mishaps in industrial settings. The code is available at: https://timschoonbeek.github.io/state_rec
Submitted 21 August, 2024;
originally announced August 2024.
-
Exploring the Effect of Dataset Diversity in Self-Supervised Learning for Surgical Computer Vision
Authors:
Tim J. M. Jaspers,
Ronald L. P. D. de Jong,
Yasmina Al Khalil,
Tijn Zeelenberg,
Carolus H. J. Kusters,
Yiping Li,
Romy C. van Jaarsveld,
Franciscus H. A. Bakker,
Jelle P. Ruurda,
Willem M. Brinkman,
Peter H. N. De With,
Fons van der Sommen
Abstract:
Over the past decade, computer vision applications in minimally invasive surgery have rapidly increased. Despite this growth, the impact of surgical computer vision remains limited compared to other medical fields like pathology and radiology, primarily due to the scarcity of representative annotated data. Whereas transfer learning from large annotated datasets such as ImageNet has been conventionally the norm to achieve high-performing models, recent advancements in self-supervised learning (SSL) have demonstrated superior performance. In medical image analysis, in-domain SSL pretraining has already been shown to outperform ImageNet-based initialization. Although unlabeled data in the field of surgical computer vision is abundant, the diversity within this data is limited. This study investigates the role of dataset diversity in SSL for surgical computer vision, comparing procedure-specific datasets against a more heterogeneous general surgical dataset across three different downstream surgical applications. The obtained results show that using solely procedure-specific data can lead to substantial improvements of 13.8%, 9.5%, and 36.8% compared to ImageNet pretraining. However, extending this data with more heterogeneous surgical data further increases performance by an additional 5.0%, 5.2%, and 2.5%, suggesting that increasing diversity within SSL data is beneficial for model performance. The code and pretrained model weights are made publicly available at https://github.com/TimJaspers0801/SurgeNet.
Submitted 26 July, 2024; v1 submitted 25 July, 2024;
originally announced July 2024.
-
Advancing 6-DoF Instrument Pose Estimation in Variable X-Ray Imaging Geometries
Authors:
Christiaan G. A. Viviers,
Lena Filatova,
Maurice Termeer,
Peter H. N. de With,
Fons van der Sommen
Abstract:
Accurate 6-DoF pose estimation of surgical instruments during minimally invasive surgeries can substantially improve treatment strategies and eventual surgical outcome. Existing deep learning methods have achieved accurate results, but they require custom approaches for each object and laborious setup and training environments often stretching to extensive simulations, whilst lacking real-time computation. We propose a general-purpose approach of data acquisition for 6-DoF pose estimation tasks in X-ray systems, a novel and general-purpose YOLOv5-6D pose architecture for accurate and fast object pose estimation, and a complete method for surgical screw pose estimation under acquisition geometry consideration from a monocular cone-beam X-ray image. The proposed YOLOv5-6D pose model achieves competitive results on public benchmarks whilst being considerably faster at 42 FPS on GPU. In addition, the method generalizes across varying X-ray acquisition geometry and semantic image complexity to enable accurate pose estimation over different domains. Finally, the proposed approach is tested for bone-screw pose estimation for computer-aided guidance during spine surgeries. The model achieves 92.41% on the 0.1 ADD-S metric, demonstrating a promising approach for enhancing surgical precision and patient outcomes. The code for YOLOv5-6D is publicly available at https://github.com/cviviers/YOLOv5-6D-Pose
Submitted 19 May, 2024;
originally announced May 2024.
-
IndustReal: A Dataset for Procedure Step Recognition Handling Execution Errors in Egocentric Videos in an Industrial-Like Setting
Authors:
Tim J. Schoonbeek,
Tim Houben,
Hans Onvlee,
Peter H. N. de With,
Fons van der Sommen
Abstract:
Although action recognition for procedural tasks has received notable attention, it has a fundamental flaw in that no measure of success for actions is provided. This limits the applicability of such systems especially within the industrial domain, since the outcome of procedural actions is often significantly more important than the mere execution. To address this limitation, we define the novel task of procedure step recognition (PSR), focusing on recognizing the correct completion and order of procedural steps. Alongside the new task, we also present the multi-modal IndustReal dataset. Unlike currently available datasets, IndustReal contains procedural errors (such as omissions) as well as execution errors. A significant part of these errors are exclusively present in the validation and test sets, making IndustReal suitable to evaluate robustness of algorithms to new, unseen mistakes. Additionally, to encourage reproducibility and allow for scalable approaches trained on synthetic data, the 3D models of all parts are publicly available. Annotations and benchmark performance are provided for action recognition and assembly state detection, as well as the new PSR task. IndustReal, along with the code and model weights, is available at: https://github.com/TimSchoonbeek/IndustReal .
Submitted 26 October, 2023;
originally announced October 2023.
-
Segmentation-based Assessment of Tumor-Vessel Involvement for Surgical Resectability Prediction of Pancreatic Ductal Adenocarcinoma
Authors:
Christiaan Viviers,
Mark Ramaekers,
Amaan Valiuddin,
Terese Hellström,
Nick Tasios,
John van der Ven,
Igor Jacobs,
Lotte Ewals,
Joost Nederend,
Peter de With,
Misha Luyer,
Fons van der Sommen
Abstract:
Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive cancer with limited treatment options. This research proposes a workflow and deep learning-based segmentation models to automatically assess tumor-vessel involvement, a key factor in determining tumor resectability. Correct assessment of resectability is vital to determine treatment options. The proposed workflow involves processing CT scans to segment the tumor and vascular structures, analyzing spatial relationships and the extent of vascular involvement, which follows a similar way of working as expert radiologists in PDAC assessment. Three segmentation architectures (nnU-Net, 3D U-Net, and Probabilistic 3D U-Net) achieve a high accuracy in segmenting veins, arteries, and the tumor. The segmentations enable automated detection of tumor involvement with high accuracy (0.88 sensitivity and 0.86 specificity) and automated computation of the degree of tumor-vessel contact. Additionally, due to significant inter-observer variability in these important structures, we present the uncertainty captured by each of the models to further increase insights into the predicted involvement. This result provides clinicians with a clear indication of tumor-vessel involvement and may be used to facilitate more informed decision-making for surgical interventions. The proposed method offers a valuable tool for improving patient outcomes, personalized treatment strategies and survival rates in pancreatic cancer.
Submitted 1 October, 2023;
originally announced October 2023.
-
Investigating and Improving Latent Density Segmentation Models for Aleatoric Uncertainty Quantification in Medical Imaging
Authors:
M. M. Amaan Valiuddin,
Christiaan G. A. Viviers,
Ruud J. G. van Sloun,
Peter H. N. de With,
Fons van der Sommen
Abstract:
Data uncertainties, such as sensor noise, occlusions or limitations in the acquisition method can introduce irreducible ambiguities in images, which result in varying, yet plausible, semantic hypotheses. In Machine Learning, this ambiguity is commonly referred to as aleatoric uncertainty. In image segmentation, latent density models can be utilized to address this problem. The most popular approach is the Probabilistic U-Net (PU-Net), which uses latent Normal densities to optimize the conditional data log-likelihood Evidence Lower Bound. In this work, we demonstrate that the PU-Net latent space is severely sparse and heavily under-utilized. To address this, we introduce mutual information maximization and entropy-regularized Sinkhorn Divergence in the latent space to promote homogeneity across all latent dimensions, effectively improving gradient-descent updates and latent space informativeness. Our results show that, when applied to public datasets of various clinical segmentation problems, our proposed methodology achieves performance gains of up to 11% over preceding latent variable models for probabilistic segmentation on the Hungarian-Matched Intersection over Union. The results indicate that encouraging a homogeneous latent space significantly improves latent density modeling for medical image segmentation.
Submitted 20 August, 2024; v1 submitted 31 July, 2023;
originally announced July 2023.
-
A signal processing interpretation of noise-reduction convolutional neural networks
Authors:
Luis A. Zavala-Mondragón,
Peter H. N. de With,
Fons van der Sommen
Abstract:
Encoding-decoding CNNs play a central role in data-driven noise reduction and can be found within numerous deep-learning algorithms. However, the development of these CNN architectures is often done in an ad-hoc fashion, and theoretical underpinnings for important design choices are generally lacking. Several existing works strive to explain the internal operation of these CNNs. Still, these ideas are either scattered or require significant expertise to be accessible to a broader audience. In order to open up this exciting field, this article builds intuition on the theory of deep convolutional framelets and explains diverse ED CNN architectures in a unified theoretical framework. By connecting basic principles from signal processing to the field of deep learning, this self-contained material offers significant guidance for designing robust and efficient novel CNN architectures.
Submitted 25 July, 2023;
originally announced July 2023.
-
Probabilistic 3D segmentation for aleatoric uncertainty quantification in full 3D medical data
Authors:
Christiaan G. A. Viviers,
Amaan M. M. Valiuddin,
Peter H. N. de With,
Fons van der Sommen
Abstract:
Uncertainty quantification in medical images has become an essential addition to segmentation models for practical application in the real world. Although there are valuable developments in accurate uncertainty quantification methods using 2D images and slices of 3D volumes, in clinical practice, the complete 3D volumes (such as CT and MRI scans) are used to evaluate and plan the medical procedure. As a result, the existing 2D methods miss the rich 3D spatial information when resolving the uncertainty. A popular approach for quantifying the ambiguity in the data is to learn a distribution over the possible hypotheses. In recent work, this ambiguity has been modeled to be strictly Gaussian. Normalizing Flows (NFs) are capable of modelling more complex distributions and thus better fit the embedding space of the data. To this end, we have developed a 3D probabilistic segmentation framework augmented with NFs to enable capturing distributions of varying complexity. To test the proposed approach, we evaluate the model on the LIDC-IDRI dataset for lung nodule segmentation and quantify the aleatoric uncertainty introduced by the multi-annotator setting and inherent ambiguity in the CT data. Following this approach, we are the first to present a 3D Squared Generalized Energy Distance (GED) of 0.401 and a high 0.468 Hungarian-matched 3D IoU. The obtained results reveal the value in capturing the 3D uncertainty, using a flexible posterior distribution augmented with a Normalizing Flow. Finally, we present the aleatoric uncertainty in a visual manner with the aim of providing clinicians with additional insight into data ambiguity and facilitating more informed decision-making.
Submitted 1 May, 2023;
originally announced May 2023.
-
Towards real-time 6D pose estimation of objects in single-view cone-beam X-ray
Authors:
Christiaan G. A. Viviers,
Joel de Bruijn,
Lena Filatova,
Peter H. N. de With,
Fons van der Sommen
Abstract:
Deep learning-based pose estimation algorithms can successfully estimate the pose of objects in an image, especially in the field of color images. Deep learning-based 6D object pose estimation methods for X-ray images often use custom architectures that employ extensive CAD models and simulated data for training purposes. Recent RGB-based methods opt to solve pose estimation problems using small datasets, making them more attractive for the X-ray domain where medical data is scarcely available. We refine an existing RGB-based model (SingleShotPose) to estimate the 6D pose of a marked cube from grayscale X-ray images by creating a generic solution trained on only real X-ray data and adjusted for X-ray acquisition geometry. The model regresses 2D control points and calculates the pose through 2D/3D correspondences using Perspective-n-Point (PnP), allowing a single trained model to be used across all supporting cone-beam-based X-ray geometries. Since modern X-ray systems continuously adjust acquisition parameters during a procedure, it is essential for such a pose estimation network to consider these parameters in order to be deployed successfully and find a real use case. With a 5-cm/5-degree accuracy of 93% and an average 3D rotation error of 2.2 degrees, the results of the proposed approach are comparable with state-of-the-art alternatives, while requiring significantly fewer real training examples and being applicable in real-time applications.
Submitted 6 November, 2022;
originally announced November 2022.
-
Efficient Out-of-Distribution Detection of Melanoma with Wavelet-based Normalizing Flows
Authors:
M. M. Amaan Valiuddin,
Christiaan G. A. Viviers,
Ruud J. G. van Sloun,
Peter H. N. de With,
Fons van der Sommen
Abstract:
Melanoma is a serious form of skin cancer with a high mortality rate at later stages. Fortunately, when detected early, the prognosis of melanoma is promising and malignant melanoma incidence rates are relatively low. As a result, datasets are heavily imbalanced, which complicates training current state-of-the-art supervised classification AI models. We propose to use generative models to learn the benign data distribution and detect Out-of-Distribution (OOD) malignant images through density estimation. Normalizing Flows (NFs) are ideal candidates for OOD detection due to their ability to compute exact likelihoods. Nevertheless, their inductive biases towards apparent graphical features rather than semantic context hamper accurate OOD detection. In this work, we aim at using these biases with domain-level knowledge of melanoma to improve likelihood-based OOD detection of malignant images. Our encouraging results demonstrate potential for OOD detection of melanoma using NFs. We achieve a 9% increase in Area Under Curve of the Receiver Operating Characteristics by using wavelet-based NFs. This model requires significantly fewer parameters for inference, making it more applicable to edge devices. The proposed methodology can aid medical experts with diagnosis of skin-cancer patients and continuously increase survival rates. Furthermore, this research paves the way for other areas in oncology with similar data imbalance issues.
Submitted 10 August, 2022; v1 submitted 9 August, 2022;
originally announced August 2022.
-
Improved Pancreatic Tumor Detection by Utilizing Clinically-Relevant Secondary Features
Authors:
Christiaan G. A. Viviers,
Mark Ramaekers,
Peter H. N. de With,
Dimitrios Mavroeidis,
Joost Nederend,
Misha Luyer,
Fons van der Sommen
Abstract:
Pancreatic cancer is one of the leading causes of cancer-related deaths globally. Despite the success of Deep Learning in computer-aided diagnosis and detection (CAD) methods, little attention has been paid to the detection of Pancreatic Cancer. We propose a method for detecting pancreatic tumors that utilizes clinically-relevant features in the surrounding anatomical structures, thereby aiming to better exploit the radiologist's knowledge compared to other, conventional deep learning approaches. To this end, we collect a new dataset consisting of 99 cases with pancreatic ductal adenocarcinoma (PDAC) and 97 control cases without any pancreatic tumor. Due to the growth pattern of pancreatic cancer, the tumor may not always be visible as a hypodense lesion; therefore, experts refer to the visibility of secondary external features that may indicate the presence of the tumor. We propose a method based on a U-Net-like Deep CNN that exploits the following external secondary features: the pancreatic duct, common bile duct and the pancreas, along with a processed CT scan. Using these features, the model segments the pancreatic tumor if it is present. This segmentation for classification and localization approach achieves a performance of 99% sensitivity (one case missed) and 99% specificity, which realizes a 5% increase in sensitivity over the previous state-of-the-art method. The model additionally provides location information with reasonable accuracy and a shorter inference time compared to previous PDAC detection methods. These results offer a significant performance improvement and highlight the importance of incorporating the knowledge of the clinical expert when developing novel CAD methods.
Submitted 6 August, 2022;
originally announced August 2022.
-
Improving Aleatoric Uncertainty Quantification in Multi-Annotated Medical Image Segmentation with Normalizing Flows
Authors:
M. M. A. Valiuddin,
C. G. A. Viviers,
R. J. G. van Sloun,
P. H. N. de With,
F. van der Sommen
Abstract:
Quantifying uncertainty in medical image segmentation applications is essential, as it is often connected to vital decision-making. Compelling attempts have been made in quantifying the uncertainty in image segmentation architectures, e.g. to learn a density segmentation model conditioned on the input image. Typical work in this field restricts these learnt densities to be strictly Gaussian. In this paper, we propose to use a more flexible approach by introducing Normalizing Flows (NFs), which enable the learnt densities to be more complex and facilitate more accurate modeling of uncertainty. We prove this hypothesis by adopting the Probabilistic U-Net and augmenting the posterior density with an NF, allowing it to be more expressive. Our qualitative as well as quantitative (GED and IoU) evaluations on the multi-annotated and single-annotated LIDC-IDRI and Kvasir-SEG segmentation datasets, respectively, show a clear improvement. This is mostly apparent in the quantification of aleatoric uncertainty and the increased predictive performance of up to 14 percent. This result strongly indicates that a more flexible density model should be seriously considered in architectures that attempt to capture segmentation ambiguity through density modeling. The benefit of this improved modeling will increase human confidence in annotation and segmentation, and enable eager adoption of the technology in practice.
Submitted 5 August, 2021; v1 submitted 4 August, 2021;
originally announced August 2021.
-
Why rankings of biomedical image analysis competitions should be interpreted with care
Authors:
Lena Maier-Hein,
Matthias Eisenmann,
Annika Reinke,
Sinan Onogur,
Marko Stankovic,
Patrick Scholz,
Tal Arbel,
Hrvoje Bogunovic,
Andrew P. Bradley,
Aaron Carass,
Carolin Feldmann,
Alejandro F. Frangi,
Peter M. Full,
Bram van Ginneken,
Allan Hanbury,
Katrin Honauer,
Michal Kozubek,
Bennett A. Landman,
Keno März,
Oskar Maier,
Klaus Maier-Hein,
Bjoern H. Menze,
Henning Müller,
Peter F. Neher,
Wiro Niessen, et al. (13 additional authors not shown)
Abstract:
International challenges have become the standard for validation of biomedical image analysis methods. Given their scientific impact, it is surprising that a critical analysis of common practices related to the organization of challenges has not yet been performed. In this paper, we present a comprehensive analysis of biomedical image analysis challenges conducted up to now. We demonstrate the importance of challenges and show that the lack of quality control has critical consequences. First, reproducibility and interpretation of the results are often hampered as only a fraction of relevant information is typically provided. Second, the rank of an algorithm is generally not robust to a number of variables such as the test data used for validation, the ranking scheme applied and the observers that make the reference annotations. To overcome these problems, we recommend best practice guidelines and define open research questions to be addressed in the future.
Submitted 18 September, 2019; v1 submitted 6 June, 2018;
originally announced June 2018.
-
Proceedings of Workshop AEW10: Concepts in Information Theory and Communications
Authors:
Kees A. Schouhamer Immink,
Stan Baggen,
Ferdaous Chaabane,
Yanling Chen,
Peter H. N. de With,
Hela Gassara,
Hamed Gharbi,
Adel Ghazel,
Khaled Grati,
Naira M. Grigoryan,
Ashot Harutyunyan,
Masayuki Imanishi,
Mitsugu Iwamoto,
Ken-ichi Iwata,
Hiroshi Kamabe,
Brian M. Kurkoski,
Shigeaki Kuzuoka,
Patrick Langenhuizen,
Jan Lewandowsky,
Akiko Manada,
Shigeki Miyake,
Hiroyoshi Morita,
Jun Muramatsu,
Safa Najjar,
Arnak V. Poghosyan, et al. (9 additional authors not shown)
Abstract:
The 10th Asia-Europe workshop in "Concepts in Information Theory and Communications" AEW10 was held in Boppard, Germany on June 21-23, 2017. It is based on a longstanding cooperation between Asian and European scientists. The first workshop was held in Eindhoven, the Netherlands in 1989. The idea of the workshop is threefold: 1) to improve the communication between scientists in different parts of the world; 2) to exchange knowledge and ideas; and 3) to pay a tribute to a well-respected and special scientist.
Submitted 27 July, 2017;
originally announced July 2017.