-
GynSurg: A Comprehensive Gynecology Laparoscopic Surgery Dataset
Authors:
Sahar Nasirihaghighi,
Negin Ghamsarian,
Leonie Peschek,
Matteo Munari,
Heinrich Husslein,
Raphael Sznitman,
Klaus Schoeffmann
Abstract:
Recent advances in deep learning have transformed computer-assisted intervention and surgical video analysis, driving improvements not only in surgical training, intraoperative decision support, and patient outcomes, but also in postoperative documentation and surgical discovery. Central to these developments is the availability of large, high-quality annotated datasets. In gynecologic laparoscopy…
▽ More
Recent advances in deep learning have transformed computer-assisted intervention and surgical video analysis, driving improvements not only in surgical training, intraoperative decision support, and patient outcomes, but also in postoperative documentation and surgical discovery. Central to these developments is the availability of large, high-quality annotated datasets. In gynecologic laparoscopy, surgical scene understanding and action recognition are fundamental for building intelligent systems that assist surgeons during operations and provide deeper analysis after surgery. However, existing datasets are often limited by small scale, narrow task focus, or insufficiently detailed annotations, limiting their utility for comprehensive, end-to-end workflow analysis. To address these limitations, we introduce GynSurg, the largest and most diverse multi-task dataset for gynecologic laparoscopic surgery to date. GynSurg provides rich annotations across multiple tasks, supporting applications in action recognition, semantic segmentation, surgical documentation, and discovery of novel procedural insights. We demonstrate the dataset quality and versatility by benchmarking state-of-the-art models under a standardized training protocol. To accelerate progress in the field, we publicly release the GynSurg dataset and its annotations
△ Less
Submitted 12 June, 2025;
originally announced June 2025.
-
WetCat: Automating Skill Assessment in Wetlab Cataract Surgery Videos
Authors:
Negin Ghamsarian,
Raphael Sznitman,
Klaus Schoeffmann,
Jens Kowal
Abstract:
To meet the growing demand for systematic surgical training, wetlab environments have become indispensable platforms for hands-on practice in ophthalmology. Yet, traditional wetlab training depends heavily on manual performance evaluations, which are labor-intensive, time-consuming, and often subject to variability. Recent advances in computer vision offer promising avenues for automated skill ass…
▽ More
To meet the growing demand for systematic surgical training, wetlab environments have become indispensable platforms for hands-on practice in ophthalmology. Yet, traditional wetlab training depends heavily on manual performance evaluations, which are labor-intensive, time-consuming, and often subject to variability. Recent advances in computer vision offer promising avenues for automated skill assessment, enhancing both the efficiency and objectivity of surgical education. Despite notable progress in ophthalmic surgical datasets, existing resources predominantly focus on real surgeries or isolated tasks, falling short of supporting comprehensive skill evaluation in controlled wetlab settings. To address these limitations, we introduce WetCat, the first dataset of wetlab cataract surgery videos specifically curated for automated skill assessment. WetCat comprises high-resolution recordings of surgeries performed by trainees on artificial eyes, featuring comprehensive phase annotations and semantic segmentations of key anatomical structures. These annotations are meticulously designed to facilitate skill assessment during the critical capsulorhexis and phacoemulsification phases, adhering to standardized surgical skill assessment frameworks. By focusing on these essential phases, WetCat enables the development of interpretable, AI-driven evaluation tools aligned with established clinical metrics. This dataset lays a strong foundation for advancing objective, scalable surgical education and sets a new benchmark for automated workflow analysis and skill assessment in ophthalmology training. The dataset and annotations are publicly available in Synapse https://www.synapse.org/Synapse:syn66401174/files.
△ Less
Submitted 10 June, 2025;
originally announced June 2025.
-
Feedback-Driven Pseudo-Label Reliability Assessment: Redefining Thresholding for Semi-Supervised Semantic Segmentation
Authors:
Negin Ghamsarian,
Sahar Nasirihaghighi,
Klaus Schoeffmann,
Raphael Sznitman
Abstract:
Semi-supervised learning leverages unlabeled data to enhance model performance, addressing the limitations of fully supervised approaches. Among its strategies, pseudo-supervision has proven highly effective, typically relying on one or multiple teacher networks to refine pseudo-labels before training a student network. A common practice in pseudo-supervision is filtering pseudo-labels based on pr…
▽ More
Semi-supervised learning leverages unlabeled data to enhance model performance, addressing the limitations of fully supervised approaches. Among its strategies, pseudo-supervision has proven highly effective, typically relying on one or multiple teacher networks to refine pseudo-labels before training a student network. A common practice in pseudo-supervision is filtering pseudo-labels based on pre-defined confidence thresholds or entropy. However, selecting optimal thresholds requires large labeled datasets, which are often scarce in real-world semi-supervised scenarios. To overcome this challenge, we propose Ensemble-of-Confidence Reinforcement (ENCORE), a dynamic feedback-driven thresholding strategy for pseudo-label selection. Instead of relying on static confidence thresholds, ENCORE estimates class-wise true-positive confidence within the unlabeled dataset and continuously adjusts thresholds based on the model's response to different levels of pseudo-label filtering. This feedback-driven mechanism ensures the retention of informative pseudo-labels while filtering unreliable ones, enhancing model training without manual threshold tuning. Our method seamlessly integrates into existing pseudo-supervision frameworks and significantly improves segmentation performance, particularly in data-scarce conditions. Extensive experiments demonstrate that integrating ENCORE with existing pseudo-supervision frameworks enhances performance across multiple datasets and network architectures, validating its effectiveness in semi-supervised learning.
△ Less
Submitted 12 May, 2025;
originally announced May 2025.
-
Dual Invariance Self-training for Reliable Semi-supervised Surgical Phase Recognition
Authors:
Sahar Nasirihaghighi,
Negin Ghamsarian,
Raphael Sznitman,
Klaus Schoeffmann
Abstract:
Accurate surgical phase recognition is crucial for advancing computer-assisted interventions, yet the scarcity of labeled data hinders training reliable deep learning models. Semi-supervised learning (SSL), particularly with pseudo-labeling, shows promise over fully supervised methods but often lacks reliable pseudo-label assessment mechanisms. To address this gap, we propose a novel SSL framework…
▽ More
Accurate surgical phase recognition is crucial for advancing computer-assisted interventions, yet the scarcity of labeled data hinders training reliable deep learning models. Semi-supervised learning (SSL), particularly with pseudo-labeling, shows promise over fully supervised methods but often lacks reliable pseudo-label assessment mechanisms. To address this gap, we propose a novel SSL framework, Dual Invariance Self-Training (DIST), that incorporates both Temporal and Transformation Invariance to enhance surgical phase recognition. Our two-step self-training process dynamically selects reliable pseudo-labels, ensuring robust pseudo-supervision. Our approach mitigates the risk of noisy pseudo-labels, steering decision boundaries toward true data distribution and improving generalization to unseen data. Evaluations on Cataract and Cholec80 datasets show our method outperforms state-of-the-art SSL approaches, consistently surpassing both supervised and SSL baselines across various network architectures.
△ Less
Submitted 29 January, 2025;
originally announced January 2025.
-
SAM-DA: Decoder Adapter for Efficient Medical Domain Adaptation
Authors:
Javier Gamazo Tejero,
Moritz Schmid,
Pablo Márquez Neila,
Martin S. Zinkernagel,
Sebastian Wolf,
Raphael Sznitman
Abstract:
This paper addresses the domain adaptation challenge for semantic segmentation in medical imaging. Despite the impressive performance of recent foundational segmentation models like SAM on natural images, they struggle with medical domain images. Beyond this, recent approaches that perform end-to-end fine-tuning of models are simply not computationally tractable. To address this, we propose a nove…
▽ More
This paper addresses the domain adaptation challenge for semantic segmentation in medical imaging. Despite the impressive performance of recent foundational segmentation models like SAM on natural images, they struggle with medical domain images. Beyond this, recent approaches that perform end-to-end fine-tuning of models are simply not computationally tractable. To address this, we propose a novel SAM adapter approach that minimizes the number of trainable parameters while achieving comparable performances to full fine-tuning. The proposed SAM adapter is strategically placed in the mask decoder, offering excellent and broad generalization capabilities and improved segmentation across both fully supervised and test-time domain adaptation tasks. Extensive validation on four datasets showcases the adapter's efficacy, outperforming existing methods while training less than 1% of SAM's total parameters.
△ Less
Submitted 12 January, 2025;
originally announced January 2025.
-
Active Learning with Context Sampling and One-vs-Rest Entropy for Semantic Segmentation
Authors:
Fei Wu,
Pablo Marquez-Neila,
Hedyeh Rafi-Tarii,
Raphael Sznitman
Abstract:
Multi-class semantic segmentation remains a cornerstone challenge in computer vision. Yet, dataset creation remains excessively demanding in time and effort, especially for specialized domains. Active Learning (AL) mitigates this challenge by selecting data points for annotation strategically. However, existing patch-based AL methods often overlook boundary pixels critical information, essential f…
▽ More
Multi-class semantic segmentation remains a cornerstone challenge in computer vision. Yet, dataset creation remains excessively demanding in time and effort, especially for specialized domains. Active Learning (AL) mitigates this challenge by selecting data points for annotation strategically. However, existing patch-based AL methods often overlook boundary pixels critical information, essential for accurate segmentation. We present OREAL, a novel patch-based AL method designed for multi-class semantic segmentation. OREAL enhances boundary detection by employing maximum aggregation of pixel-wise uncertainty scores. Additionally, we introduce one-vs-rest entropy, a novel uncertainty score function that computes class-wise uncertainties while achieving implicit class balancing during dataset creation. Comprehensive experiments across diverse datasets and model architectures validate our hypothesis.
△ Less
Submitted 16 March, 2025; v1 submitted 9 December, 2024;
originally announced December 2024.
-
Non-Linear Outlier Synthesis for Out-of-Distribution Detection
Authors:
Lars Doorenbos,
Raphael Sznitman,
Pablo Márquez-Neila
Abstract:
The reliability of supervised classifiers is severely hampered by their limitations in dealing with unexpected inputs, leading to great interest in out-of-distribution (OOD) detection. Recently, OOD detectors trained on synthetic outliers, especially those generated by large diffusion models, have shown promising results in defining robust OOD decision boundaries. Building on this progress, we pre…
▽ More
The reliability of supervised classifiers is severely hampered by their limitations in dealing with unexpected inputs, leading to great interest in out-of-distribution (OOD) detection. Recently, OOD detectors trained on synthetic outliers, especially those generated by large diffusion models, have shown promising results in defining robust OOD decision boundaries. Building on this progress, we present NCIS, which enhances the quality of synthetic outliers by operating directly in the diffusion's model embedding space rather than combining disjoint models as in previous work and by modeling class-conditional manifolds with a conditional volume-preserving network for more expressive characterization of the training distribution. We demonstrate that these improvements yield new state-of-the-art OOD detection results on standard ImageNet100 and CIFAR100 benchmarks and provide insights into the importance of data pre-processing and other key design choices. We make our code available at \url{https://github.com/LarsDoorenbos/NCIS}.
△ Less
Submitted 20 November, 2024;
originally announced November 2024.
-
Online 3D reconstruction and dense tracking in endoscopic videos
Authors:
Michel Hayoz,
Christopher Hahne,
Thomas Kurmann,
Max Allan,
Guido Beldi,
Daniel Candinas,
ablo Márquez-Neila,
Raphael Sznitman
Abstract:
3D scene reconstruction from stereo endoscopic video data is crucial for advancing surgical interventions. In this work, we present an online framework for online, dense 3D scene reconstruction and tracking, aimed at enhancing surgical scene understanding and assisting interventions. Our method dynamically extends a canonical scene representation using Gaussian splatting, while modeling tissue def…
▽ More
3D scene reconstruction from stereo endoscopic video data is crucial for advancing surgical interventions. In this work, we present an online framework for online, dense 3D scene reconstruction and tracking, aimed at enhancing surgical scene understanding and assisting interventions. Our method dynamically extends a canonical scene representation using Gaussian splatting, while modeling tissue deformations through a sparse set of control points. We introduce an efficient online fitting algorithm that optimizes the scene parameters, enabling consistent tracking and accurate reconstruction. Through experiments on the StereoMIS dataset, we demonstrate the effectiveness of our approach, outperforming state-of-the-art tracking methods and achieving comparable performance to offline reconstruction techniques. Our work enables various downstream applications thus contributing to advancing the capabilities of surgical assistance systems.
△ Less
Submitted 9 September, 2024;
originally announced September 2024.
-
Targeted Visual Prompting for Medical Visual Question Answering
Authors:
Sergio Tascon-Morales,
Pablo Márquez-Neila,
Raphael Sznitman
Abstract:
With growing interest in recent years, medical visual question answering (Med-VQA) has rapidly evolved, with multimodal large language models (MLLMs) emerging as an alternative to classical model architectures. Specifically, their ability to add visual information to the input of pre-trained LLMs brings new capabilities for image interpretation. However, simple visual errors cast doubt on the actu…
▽ More
With growing interest in recent years, medical visual question answering (Med-VQA) has rapidly evolved, with multimodal large language models (MLLMs) emerging as an alternative to classical model architectures. Specifically, their ability to add visual information to the input of pre-trained LLMs brings new capabilities for image interpretation. However, simple visual errors cast doubt on the actual visual understanding abilities of these models. To address this, region-based questions have been proposed as a means to assess and enhance actual visual understanding through compositional evaluation. To combine these two perspectives, this paper introduces targeted visual prompting to equip MLLMs with region-based questioning capabilities. By presenting the model with both the isolated region and the region in its context in a customized visual prompt, we show the effectiveness of our method across multiple datasets while comparing it to several baseline models. Our code and data are available at https://github.com/sergiotasconmorales/locvqallm.
△ Less
Submitted 6 August, 2024;
originally announced August 2024.
-
SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge
Authors:
Hao Ding,
Yuqian Zhang,
Tuxun Lu,
Ruixing Liang,
Hongchao Shu,
Lalithkumar Seenivasan,
Yonghao Long,
Qi Dou,
Cong Gao,
Yicheng Leng,
Seok Bong Yoo,
Eung-Joo Lee,
Negin Ghamsarian,
Klaus Schoeffmann,
Raphael Sznitman,
Zijian Wu,
Yuxin Chen,
Septimiu E. Salcudean,
Samra Irshad,
Shadi Albarqouni,
Seong Tae Kim,
Yueyi Sun,
An Wang,
Long Bai,
Hongliang Ren
, et al. (17 additional authors not shown)
Abstract:
Surgical data science has seen rapid advancement due to the excellent performance of end-to-end deep neural networks (DNNs) for surgical video analysis. Despite their successes, end-to-end DNNs have been proven susceptible to even minor corruptions, substantially impairing the model's performance. This vulnerability has become a major concern for the translation of cutting-edge technology, especia…
▽ More
Surgical data science has seen rapid advancement due to the excellent performance of end-to-end deep neural networks (DNNs) for surgical video analysis. Despite their successes, end-to-end DNNs have been proven susceptible to even minor corruptions, substantially impairing the model's performance. This vulnerability has become a major concern for the translation of cutting-edge technology, especially for high-stakes decision-making in surgical data science. We introduce SegSTRONG-C, a benchmark and challenge in surgical data science dedicated, aiming to better understand model deterioration under unforeseen but plausible non-adversarial corruption and the capabilities of contemporary methods that seek to improve it. Through comprehensive baseline experiments and participating submissions from widespread community engagement, SegSTRONG-C reveals key themes for model failure and identifies promising directions for improving robustness. The performance of challenge winners, achieving an average 0.9394 DSC and 0.9301 NSD across the unreleased test sets with corruption types: bleeding, smoke, and low brightness, shows inspiring improvement of 0.1471 DSC and 0.2584 NSD in average comparing to strongest baseline methods with UNet architecture trained with AutoAugment. In conclusion, the SegSTRONG-C challenge has identified some practical approaches for enhancing model robustness, yet most approaches relied on conventional techniques that have known, and sometimes quite severe, limitations. Looking ahead, we advocate for expanding intellectual diversity and creativity in non-adversarial robustness beyond data augmentation or training scale, calling for new paradigms that enhance universal robustness to corruptions and may enable richer applications in surgical data science.
△ Less
Submitted 7 April, 2025; v1 submitted 16 July, 2024;
originally announced July 2024.
-
Learning Non-Linear Invariants for Unsupervised Out-of-Distribution Detection
Authors:
Lars Doorenbos,
Raphael Sznitman,
Pablo Márquez-Neila
Abstract:
The inability of deep learning models to handle data drawn from unseen distributions has sparked much interest in unsupervised out-of-distribution (U-OOD) detection, as it is crucial for reliable deep learning models. Despite considerable attention, theoretically-motivated approaches are few and far between, with most methods building on top of some form of heuristic. Recently, U-OOD was formalize…
▽ More
The inability of deep learning models to handle data drawn from unseen distributions has sparked much interest in unsupervised out-of-distribution (U-OOD) detection, as it is crucial for reliable deep learning models. Despite considerable attention, theoretically-motivated approaches are few and far between, with most methods building on top of some form of heuristic. Recently, U-OOD was formalized in the context of data invariants, allowing a clearer understanding of how to characterize U-OOD, and methods leveraging affine invariants have attained state-of-the-art results on large-scale benchmarks. Nevertheless, the restriction to affine invariants hinders the expressiveness of the approach. In this work, we broaden the affine invariants formulation to a more general case and propose a framework consisting of a normalizing flow-like architecture capable of learning non-linear invariants. Our novel approach achieves state-of-the-art results on an extensive U-OOD benchmark, and we demonstrate its further applicability to tabular data. Finally, we show our method has the same desirable properties as those based on affine invariants.
△ Less
Submitted 4 July, 2024;
originally announced July 2024.
-
Galaxy spectroscopy without spectra: Galaxy properties from photometric images with conditional diffusion models
Authors:
Lars Doorenbos,
Eva Sextl,
Kevin Heng,
Stefano Cavuoti,
Massimo Brescia,
Olena Torbaniuk,
Giuseppe Longo,
Raphael Sznitman,
Pablo Márquez-Neila
Abstract:
Modern spectroscopic surveys can only target a small fraction of the vast amount of photometrically cataloged sources in wide-field surveys. Here, we report the development of a generative AI method capable of predicting optical galaxy spectra from photometric broad-band images alone. This method draws from the latest advances in diffusion models in combination with contrastive networks. We pass m…
▽ More
Modern spectroscopic surveys can only target a small fraction of the vast amount of photometrically cataloged sources in wide-field surveys. Here, we report the development of a generative AI method capable of predicting optical galaxy spectra from photometric broad-band images alone. This method draws from the latest advances in diffusion models in combination with contrastive networks. We pass multi-band galaxy images into the architecture to obtain optical spectra. From these, robust values for galaxy properties can be derived with any methods in the spectroscopic toolbox, such as standard population synthesis techniques and Lick indices. When trained and tested on 64x64-pixel images from the Sloan Digital Sky Survey, the global bimodality of star-forming and quiescent galaxies in photometric space is recovered, as well as a mass-metallicity relation of star-forming galaxies. The comparison between the observed and the artificially created spectra shows good agreement in overall metallicity, age, Dn4000, stellar velocity dispersion, and E(B-V) values. Photometric redshift estimates of our generative algorithm can compete with other current, specialized deep-learning techniques. Moreover, this work is the first attempt in the literature to infer velocity dispersion from photometric images. Additionally, we can predict the presence of an active galactic nucleus up to an accuracy of 82%. With our method, scientifically interesting galaxy properties, normally requiring spectroscopic inputs, can be obtained in future data sets from large-scale photometric surveys alone. The spectra prediction via AI can further assist in creating realistic mock catalogs.
△ Less
Submitted 28 October, 2024; v1 submitted 26 June, 2024;
originally announced June 2024.
-
Iterative Deployment Exposure for Unsupervised Out-of-Distribution Detection
Authors:
Lars Doorenbos,
Raphael Sznitman,
Pablo Márquez-Neila
Abstract:
Deep learning models are vulnerable to performance degradation when encountering out-of-distribution (OOD) images, potentially leading to misdiagnoses and compromised patient care. These shortcomings have led to great interest in the field of OOD detection. Existing unsupervised OOD (U-OOD) detection methods typically assume that OOD samples originate from an unconcentrated distribution complement…
▽ More
Deep learning models are vulnerable to performance degradation when encountering out-of-distribution (OOD) images, potentially leading to misdiagnoses and compromised patient care. These shortcomings have led to great interest in the field of OOD detection. Existing unsupervised OOD (U-OOD) detection methods typically assume that OOD samples originate from an unconcentrated distribution complementary to the training distribution, neglecting the reality that deployed models passively accumulate task-specific OOD samples over time. To better reflect this real-world scenario, we introduce Iterative Deployment Exposure (IDE), a novel and more realistic setting for U-OOD detection. We propose CSO, a method for IDE that starts from a U-OOD detector that is agnostic to the OOD distribution and slowly refines it during deployment using observed unlabeled data. CSO uses a new U-OOD scoring function that combines the Mahalanobis distance with a nearest-neighbor approach, along with a novel confidence-scaled few-shot OOD detector to effectively learn from limited OOD examples. We validate our approach on a dedicated benchmark, showing that our method greatly improves upon strong baselines on three medical imaging modalities.
△ Less
Submitted 19 May, 2025; v1 submitted 4 June, 2024;
originally announced June 2024.
-
Masked Image Modelling for retinal OCT understanding
Authors:
Theodoros Pissas,
Pablo Márquez-Neila,
Sebastian Wolf,
Martin Zinkernagel,
Raphael Sznitman
Abstract:
This work explores the effectiveness of masked image modelling for learning representations of retinal OCT images. To this end, we leverage Masked Autoencoders (MAE), a simple and scalable method for self-supervised learning, to obtain a powerful and general representation for OCT images by training on 700K OCT images from 41K patients collected under real world clinical settings. We also provide…
▽ More
This work explores the effectiveness of masked image modelling for learning representations of retinal OCT images. To this end, we leverage Masked Autoencoders (MAE), a simple and scalable method for self-supervised learning, to obtain a powerful and general representation for OCT images by training on 700K OCT images from 41K patients collected under real world clinical settings. We also provide the first extensive evaluation for a model of OCT on a challenging battery of 6 downstream tasks. Our model achieves strong performance when fully finetuned but can also serve as a versatile frozen feature extractor for many tasks using lightweight adapters. Furthermore, we propose an extension of the MAE pretraining to fuse OCT with an auxiliary modality, namely, IR fundus images and learn a joint model for both. We demonstrate our approach improves performance on a multimodal downstream application. Our experiments utilize most publicly available OCT datasets, thus enabling future comparisons. Our code and model weights are publicly available https://github.com/TheoPis/MIM_OCT.
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection
Authors:
Negin Ghamsarian,
Yosuf El-Shabrawi,
Sahar Nasirihaghighi,
Doris Putzgruber-Adamitsch,
Martin Zinkernagel,
Sebastian Wolf,
Klaus Schoeffmann,
Raphael Sznitman
Abstract:
In recent years, the landscape of computer-assisted interventions and post-operative surgical video analysis has been dramatically reshaped by deep-learning techniques, resulting in significant advancements in surgeons' skills, operation room management, and overall surgical outcomes. However, the progression of deep-learning-powered surgical technologies is profoundly reliant on large-scale datas…
▽ More
In recent years, the landscape of computer-assisted interventions and post-operative surgical video analysis has been dramatically reshaped by deep-learning techniques, resulting in significant advancements in surgeons' skills, operation room management, and overall surgical outcomes. However, the progression of deep-learning-powered surgical technologies is profoundly reliant on large-scale datasets and annotations. Particularly, surgical scene understanding and phase recognition stand as pivotal pillars within the realm of computer-assisted surgery and post-operative assessment of cataract surgery videos. In this context, we present the largest cataract surgery video dataset that addresses diverse requisites for constructing computerized surgical workflow analysis and detecting post-operative irregularities in cataract surgery. We validate the quality of annotations by benchmarking the performance of several state-of-the-art neural network architectures for phase recognition and surgical scene segmentation. Besides, we initiate the research on domain adaptation for instrument segmentation in cataract surgery by evaluating cross-domain instrument segmentation performance in cataract surgery videos. The dataset and annotations will be publicly available upon acceptance of the paper.
△ Less
Submitted 11 December, 2023;
originally announced December 2023.
-
DeepPyramid+: Medical Image Segmentation using Pyramid View Fusion and Deformable Pyramid Reception
Authors:
Negin Ghamsarian,
Sebastian Wolf,
Martin Zinkernagel,
Klaus Schoeffmann,
Raphael Sznitman
Abstract:
Semantic Segmentation plays a pivotal role in many applications related to medical image and video analysis. However, designing a neural network architecture for medical image and surgical video segmentation is challenging due to the diverse features of relevant classes, including heterogeneity, deformability, transparency, blunt boundaries, and various distortions. We propose a network architectu…
▽ More
Semantic Segmentation plays a pivotal role in many applications related to medical image and video analysis. However, designing a neural network architecture for medical image and surgical video segmentation is challenging due to the diverse features of relevant classes, including heterogeneity, deformability, transparency, blunt boundaries, and various distortions. We propose a network architecture, DeepPyramid+, which addresses diverse challenges encountered in medical image and surgical video segmentation. The proposed DeepPyramid+ incorporates two major modules, namely "Pyramid View Fusion" (PVF) and "Deformable Pyramid Reception," (DPR), to address the outlined challenges. PVF replicates a deduction process within the neural network, aligning with the human visual system, thereby enhancing the representation of relative information at each pixel position. Complementarily, DPR introduces shape- and scale-adaptive feature extraction techniques using dilated deformable convolutions, enhancing accuracy and robustness in handling heterogeneous classes and deformable shapes. Extensive experiments conducted on diverse datasets, including endometriosis videos, MRI images, OCT scans, and cataract and laparoscopy videos, demonstrate the effectiveness of DeepPyramid+ in handling various challenges such as shape and scale variation, reflection, and blur degradation. DeepPyramid+ demonstrates significant improvements in segmentation performance, achieving up to a 3.65% increase in Dice coefficient for intra-domain segmentation and up to a 17% increase in Dice coefficient for cross-domain segmentation. DeepPyramid+ consistently outperforms state-of-the-art networks across diverse modalities considering different backbone networks, showcasing its versatility.
△ Less
Submitted 6 December, 2023;
originally announced December 2023.
-
Predicting Postoperative Intraocular Lens Dislocation in Cataract Surgery via Deep Learning
Authors:
Negin Ghamsarian,
Doris Putzgruber-Adamitsch,
Stephanie Sarny,
Raphael Sznitman,
Klaus Schoeffmann,
Yosuf El-Shabrawi
Abstract:
A critical yet unpredictable complication following cataract surgery is intraocular lens dislocation. Postoperative stability is imperative, as even a tiny decentration of multifocal lenses or inadequate alignment of the torus in toric lenses due to postoperative rotation can lead to a significant drop in visual acuity. Investigating possible intraoperative indicators that can predict post-surgica…
▽ More
A critical yet unpredictable complication following cataract surgery is intraocular lens dislocation. Postoperative stability is imperative, as even a tiny decentration of multifocal lenses or inadequate alignment of the torus in toric lenses due to postoperative rotation can lead to a significant drop in visual acuity. Investigating possible intraoperative indicators that can predict post-surgical instabilities of intraocular lenses can help prevent this complication. In this paper, we develop and evaluate the first fully-automatic framework for the computation of lens unfolding delay, rotation, and instability during surgery. Adopting a combination of three types of CNNs, namely recurrent, region-based, and pixel-based, the proposed framework is employed to assess the possibility of predicting post-operative lens dislocation during cataract surgery. This is achieved via performing a large-scale study on the statistical differences between the behavior of different brands of intraocular lenses and aligning the results with expert surgeons' hypotheses and observations about the lenses. We exploit a large-scale dataset of cataract surgery videos featuring four intraocular lens brands. Experimental results confirm the reliability of the proposed framework in evaluating the lens' statistics during the surgery. The Pearson correlation and t-test results reveal significant correlations between lens unfolding delay and lens rotation and significant differences between the intra-operative rotations stability of four groups of lenses. These results suggest that the proposed framework can help surgeons select the lenses based on the patient's eye conditions and predict post-surgical lens dislocation.
△ Less
Submitted 6 December, 2023;
originally announced December 2023.
-
Correlation-aware active learning for surgery video segmentation
Authors:
Fei Wu,
Pablo Marquez-Neila,
Mingyi Zheng,
Hedyeh Rafii-Tari,
Raphael Sznitman
Abstract:
Semantic segmentation is a complex task that relies heavily on large amounts of annotated image data. However, annotating such data can be time-consuming and resource-intensive, especially in the medical domain. Active Learning (AL) is a popular approach that can help to reduce this burden by iteratively selecting images for annotation to improve the model performance. In the case of video data, i…
▽ More
Semantic segmentation is a complex task that relies heavily on large amounts of annotated image data. However, annotating such data can be time-consuming and resource-intensive, especially in the medical domain. Active Learning (AL) is a popular approach that can help to reduce this burden by iteratively selecting images for annotation to improve the model performance. In the case of video data, it is important to consider the model uncertainty and the temporal nature of the sequences when selecting images for annotation. This work proposes a novel AL strategy for surgery video segmentation, COWAL, COrrelation-aWare Active Learning. Our approach involves projecting images into a latent space that has been fine-tuned using contrastive learning and then selecting a fixed number of representative images from local clusters of video frames. We demonstrate the effectiveness of this approach on two video datasets of surgical instruments and three real-world video datasets. The datasets and code will be made publicly available upon receiving necessary approvals.
△ Less
Submitted 11 December, 2023; v1 submitted 15 November, 2023;
originally announced November 2023.
-
Learning Super-Resolution Ultrasound Localization Microscopy from Radio-Frequency Data
Authors:
Christopher Hahne,
Georges Chabouh,
Olivier Couture,
Raphael Sznitman
Abstract:
Ultrasound Localization Microscopy (ULM) enables imaging of vascular structures in the micrometer range by accumulating contrast agent particle locations over time. Precise and efficient target localization accuracy remains an active research topic in the ULM field to further push the boundaries of this promising medical imaging technology. Existing work incorporates Delay-And-Sum (DAS) beamformin…
▽ More
Ultrasound Localization Microscopy (ULM) enables imaging of vascular structures in the micrometer range by accumulating contrast agent particle locations over time. Precise and efficient target localization accuracy remains an active research topic in the ULM field to further push the boundaries of this promising medical imaging technology. Existing work incorporates Delay-And-Sum (DAS) beamforming into particle localization pipelines, which ultimately determines the ULM image resolution capability. In this paper we propose to feed unprocessed Radio-Frequency (RF) data into a super-resolution network while bypassing DAS beamforming and its limitations. To facilitate this, we demonstrate label projection and inverse point transformation between B-mode and RF coordinate space as required by our approach. We assess our method against state-of-the-art techniques based on a public dataset featuring in silico and in vivo data. Results from our RF-trained network suggest that excluding DAS beamforming offers a great potential to optimize on the ULM resolution performance.
△ Less
Submitted 7 November, 2023;
originally announced November 2023.
-
RF-ULM: Ultrasound Localization Microscopy Learned from Radio-Frequency Wavefronts
Authors:
Christopher Hahne,
Georges Chabouh,
Arthur Chavignon,
Olivier Couture,
Raphael Sznitman
Abstract:
In Ultrasound Localization Microscopy (ULM), achieving high-resolution images relies on the precise localization of contrast agent particles across a series of beamformed frames. However, our study uncovers an enormous potential: The process of delay-and-sum beamforming leads to an irreversible reduction of Radio-Frequency (RF) channel data, while its implications for localization remain largely u…
▽ More
In Ultrasound Localization Microscopy (ULM), achieving high-resolution images relies on the precise localization of contrast agent particles across a series of beamformed frames. However, our study uncovers an enormous potential: The process of delay-and-sum beamforming leads to an irreversible reduction of Radio-Frequency (RF) channel data, while its implications for localization remain largely unexplored. The rich contextual information embedded within RF wavefronts, including their hyperbolic shape and phase, offers great promise for guiding Deep Neural Networks (DNNs) in challenging localization scenarios. To fully exploit this data, we propose to directly localize scatterers in RF channel data. Our approach involves a custom super-resolution DNN using learned feature channel shuffling, non-maximum suppression, and a semi-global convolutional block for reliable and accurate wavefront localization. Additionally, we introduce a geometric point transformation that facilitates seamless mapping to the B-mode coordinate space. To understand the impact of beamforming on ULM, we validate the effectiveness of our method by conducting an extensive comparison with State-Of-The-Art (SOTA) techniques. We present the inaugural in vivo results from a wavefront-localizing DNN, highlighting its real-world practicality. Our findings show that RF-ULM bridges the domain shift between synthetic and real datasets, offering a considerable advantage in terms of precision and complexity. To enable the broader research community to benefit from our findings, our code and the associated SOTA methods are made available at https://github.com/hahnec/rf-ulm.
△ Less
Submitted 5 April, 2024; v1 submitted 2 October, 2023;
originally announced October 2023.
-
Hyperbolic Random Forests
Authors:
Lars Doorenbos,
Pablo Márquez-Neila,
Raphael Sznitman,
Pascal Mettes
Abstract:
Hyperbolic space is becoming a popular choice for representing data due to the hierarchical structure - whether implicit or explicit - of many real-world datasets. Along with it comes a need for algorithms capable of solving fundamental tasks, such as classification, in hyperbolic space. Recently, multiple papers have investigated hyperbolic alternatives to hyperplane-based classifiers, such as lo…
▽ More
Hyperbolic space is becoming a popular choice for representing data due to the hierarchical structure - whether implicit or explicit - of many real-world datasets. Along with it comes a need for algorithms capable of solving fundamental tasks, such as classification, in hyperbolic space. Recently, multiple papers have investigated hyperbolic alternatives to hyperplane-based classifiers, such as logistic regression and SVMs. While effective, these approaches struggle with more complex hierarchical data. We, therefore, propose to generalize the well-known random forests to hyperbolic space. We do this by redefining the notion of a split using horospheres. Since finding the globally optimal split is computationally intractable, we find candidate horospheres through a large-margin classifier. To make hyperbolic random forests work on multi-class data and imbalanced experiments, we furthermore outline a new method for combining classes based on their lowest common ancestor and a class-balanced version of the large-margin loss. Experiments on standard and new benchmarks show that our approach outperforms both conventional random forest algorithms and recent hyperbolic classifiers.
△ Less
Submitted 24 June, 2024; v1 submitted 25 August, 2023;
originally announced August 2023.
-
StofNet: Super-resolution Time of Flight Network
Authors:
Christopher Hahne,
Michel Hayoz,
Raphael Sznitman
Abstract:
Time of Flight (ToF) is a prevalent depth sensing technology in the fields of robotics, medical imaging, and non-destructive testing. Yet, ToF sensing faces challenges from complex ambient conditions making an inverse modelling from the sparse temporal information intractable. This paper highlights the potential of modern super-resolution techniques to learn varying surroundings for a reliable and…
▽ More
Time of Flight (ToF) is a prevalent depth sensing technology in the fields of robotics, medical imaging, and non-destructive testing. Yet, ToF sensing faces challenges from complex ambient conditions making an inverse modelling from the sparse temporal information intractable. This paper highlights the potential of modern super-resolution techniques to learn varying surroundings for a reliable and accurate ToF detection. Unlike existing models, we tailor an architecture for sub-sample precise semi-global signal localization by combining super-resolution with an efficient residual contraction block to balance between fine signal details and large scale contextual information. We consolidate research on ToF by conducting a benchmark comparison against six state-of-the-art methods for which we employ two publicly available datasets. This includes the release of our SToF-Chirp dataset captured by an airborne ultrasound transducer. Results showcase the superior performance of our proposed StofNet in terms of precision, reliability and model complexity. Our code is available at https://github.com/hahnec/stofnet.
△ Less
Submitted 23 December, 2023; v1 submitted 23 August, 2023;
originally announced August 2023.
-
Domain Adaptation for Medical Image Segmentation using Transformation-Invariant Self-Training
Authors:
Negin Ghamsarian,
Javier Gamazo Tejero,
Pablo Márquez Neila,
Sebastian Wolf,
Martin Zinkernagel,
Klaus Schoeffmann,
Raphael Sznitman
Abstract:
Models capable of leveraging unlabelled data are crucial in overcoming large distribution gaps between the acquired datasets across different imaging devices and configurations. In this regard, self-training techniques based on pseudo-labeling have been shown to be highly effective for semi-supervised domain adaptation. However, the unreliability of pseudo labels can hinder the capability of self-…
▽ More
Models capable of leveraging unlabelled data are crucial in overcoming large distribution gaps between the acquired datasets across different imaging devices and configurations. In this regard, self-training techniques based on pseudo-labeling have been shown to be highly effective for semi-supervised domain adaptation. However, the unreliability of pseudo labels can hinder the capability of self-training techniques to induce abstract representation from the unlabeled target dataset, especially in the case of large distribution gaps. Since the neural network performance should be invariant to image transformations, we look to this fact to identify uncertain pseudo labels. Indeed, we argue that transformation invariant detections can provide more reasonable approximations of ground truth. Accordingly, we propose a semi-supervised learning strategy for domain adaptation termed transformation-invariant self-training (TI-ST). The proposed method assesses pixel-wise pseudo-labels' reliability and filters out unreliable detections during self-training. We perform comprehensive evaluations for domain adaptation using three different modalities of medical images, two different network architectures, and several alternative state-of-the-art domain adaptation methods. Experimental results confirm the superiority of our proposed method in mitigating the lack of target domain annotation and boosting segmentation performance in the target domain.
△ Less
Submitted 31 July, 2023;
originally announced July 2023.
-
A reinforcement learning approach for VQA validation: an application to diabetic macular edema grading
Authors:
Tatiana Fountoukidou,
Raphael Sznitman
Abstract:
Recent advances in machine learning models have greatly increased the performance of automated methods in medical image analysis. However, the internal functioning of such models is largely hidden, which hinders their integration in clinical practice. Explainability and trust are viewed as important aspects of modern methods, for the latter's widespread use in clinical communities. As such, valida…
▽ More
Recent advances in machine learning models have greatly increased the performance of automated methods in medical image analysis. However, the internal functioning of such models is largely hidden, which hinders their integration in clinical practice. Explainability and trust are viewed as important aspects of modern methods, for the latter's widespread use in clinical communities. As such, validation of machine learning models represents an important aspect and yet, most methods are only validated in a limited way. In this work, we focus on providing a richer and more appropriate validation approach for highly powerful Visual Question Answering (VQA) algorithms. To better understand the performance of these methods, which answer arbitrary questions related to images, this work focuses on an automatic visual Turing test (VTT). That is, we propose an automatic adaptive questioning method, that aims to expose the reasoning behavior of a VQA algorithm. Specifically, we introduce a reinforcement learning (RL) agent that observes the history of previously asked questions, and uses it to select the next question to pose. We demonstrate our approach in the context of evaluating algorithms that automatically answer questions related to diabetic macular edema (DME) grading. The experiments show that such an agent has similar behavior to a clinician, whereby asking questions that are relevant to key clinical concepts.
△ Less
Submitted 19 July, 2023;
originally announced July 2023.
-
Localized Questions in Medical Visual Question Answering
Authors:
Sergio Tascon-Morales,
Pablo Márquez-Neila,
Raphael Sznitman
Abstract:
Visual Question Answering (VQA) models aim to answer natural language questions about given images. Due to its ability to ask questions that differ from those used when training the model, medical VQA has received substantial attention in recent years. However, existing medical VQA models typically focus on answering questions that refer to an entire image rather than where the relevant content ma…
▽ More
Visual Question Answering (VQA) models aim to answer natural language questions about given images. Due to its ability to ask questions that differ from those used when training the model, medical VQA has received substantial attention in recent years. However, existing medical VQA models typically focus on answering questions that refer to an entire image rather than where the relevant content may be located in the image. Consequently, VQA models are limited in their interpretability power and the possibility to probe the model about specific image regions. This paper proposes a novel approach for medical VQA that addresses this limitation by developing a model that can answer questions about image regions while considering the context necessary to answer the questions. Our experimental results demonstrate the effectiveness of our proposed model, outperforming existing methods on three datasets. Our code and data are available at https://github.com/sergiotasconmorales/locvqa.
△ Less
Submitted 3 July, 2023;
originally announced July 2023.
-
Geometric Ultrasound Localization Microscopy
Authors:
Christopher Hahne,
Raphael Sznitman
Abstract:
Contrast-Enhanced Ultra-Sound (CEUS) has become a viable method for non-invasive, dynamic visualization in medical diagnostics, yet Ultrasound Localization Microscopy (ULM) has enabled a revolutionary breakthrough by offering ten times higher resolution. To date, Delay-And-Sum (DAS) beamformers are used to render ULM frames, ultimately determining the image resolution capability. To take full adva…
▽ More
Contrast-Enhanced Ultra-Sound (CEUS) has become a viable method for non-invasive, dynamic visualization in medical diagnostics, yet Ultrasound Localization Microscopy (ULM) has enabled a revolutionary breakthrough by offering ten times higher resolution. To date, Delay-And-Sum (DAS) beamformers are used to render ULM frames, ultimately determining the image resolution capability. To take full advantage of ULM, this study questions whether beamforming is the most effective processing step for ULM, suggesting an alternative approach that relies solely on Time-Difference-of-Arrival (TDoA) information. To this end, a novel geometric framework for micro bubble localization via ellipse intersections is proposed to overcome existing beamforming limitations. We present a benchmark comparison based on a public dataset for which our geometric ULM outperforms existing baseline methods in terms of accuracy and robustness while only utilizing a portion of the available transducer data.
△ Less
Submitted 18 July, 2023; v1 submitted 27 June, 2023;
originally announced June 2023.
-
Learning How To Robustly Estimate Camera Pose in Endoscopic Videos
Authors:
Michel Hayoz,
Christopher Hahne,
Mathias Gallardo,
Daniel Candinas,
Thomas Kurmann,
Maximilian Allan,
Raphael Sznitman
Abstract:
Purpose: Surgical scene understanding plays a critical role in the technology stack of tomorrow's intervention-assisting systems in endoscopic surgeries. For this, tracking the endoscope pose is a key component, but remains challenging due to illumination conditions, deforming tissues and the breathing motion of organs. Method: We propose a solution for stereo endoscopes that estimates depth and o…
▽ More
Purpose: Surgical scene understanding plays a critical role in the technology stack of tomorrow's intervention-assisting systems in endoscopic surgeries. For this, tracking the endoscope pose is a key component, but remains challenging due to illumination conditions, deforming tissues and the breathing motion of organs. Method: We propose a solution for stereo endoscopes that estimates depth and optical flow to minimize two geometric losses for camera pose estimation. Most importantly, we introduce two learned adaptive per-pixel weight mappings that balance contributions according to the input image content. To do so, we train a Deep Declarative Network to take advantage of the expressiveness of deep-learning and the robustness of a novel geometric-based optimization approach. We validate our approach on the publicly available SCARED dataset and introduce a new in-vivo dataset, StereoMIS, which includes a wider spectrum of typically observed surgical settings. Results: Our method outperforms state-of-the-art methods on average and more importantly, in difficult scenarios where tissue deformations and breathing motion are visible. We observed that our proposed weight mappings attenuate the contribution of pixels on ambiguous regions of the images, such as deforming tissues. Conclusion: We demonstrate the effectiveness of our solution to robustly estimate the camera pose in challenging endoscopic surgical scenes. Our contributions can be used to improve related tasks like simultaneous localization and mapping (SLAM) or 3D reconstruction, therefore advancing surgical scene understanding in minimally-invasive surgery.
△ Less
Submitted 17 April, 2023;
originally announced April 2023.
-
Unsupervised out-of-distribution detection for safer robotically guided retinal microsurgery
Authors:
Alain Jungo,
Lars Doorenbos,
Tommaso Da Col,
Maarten Beelen,
Martin Zinkernagel,
Pablo Márquez-Neila,
Raphael Sznitman
Abstract:
Purpose: A fundamental problem in designing safe machine learning systems is identifying when samples presented to a deployed model differ from those observed at training time. Detecting so-called out-of-distribution (OoD) samples is crucial in safety-critical applications such as robotically guided retinal microsurgery, where distances between the instrument and the retina are derived from sequen…
▽ More
Purpose: A fundamental problem in designing safe machine learning systems is identifying when samples presented to a deployed model differ from those observed at training time. Detecting so-called out-of-distribution (OoD) samples is crucial in safety-critical applications such as robotically guided retinal microsurgery, where distances between the instrument and the retina are derived from sequences of 1D images that are acquired by an instrument-integrated optical coherence tomography (iiOCT) probe.
Methods: This work investigates the feasibility of using an OoD detector to identify when images from the iiOCT probe are inappropriate for subsequent machine learning-based distance estimation. We show how a simple OoD detector based on the Mahalanobis distance can successfully reject corrupted samples coming from real-world ex vivo porcine eyes.
Results: Our results demonstrate that the proposed approach can successfully detect OoD samples and help maintain the performance of the downstream task within reasonable levels. MahaAD outperformed a supervised approach trained on the same kind of corruptions and achieved the best performance in detecting OoD cases from a collection of iiOCT samples with real-world corruptions.
Conclusion: The results indicate that detecting corrupted iiOCT data through OoD detection is feasible and does not need prior knowledge of possible corruptions. Consequently, MahaAD could aid in ensuring patient safety during robotically guided microsurgery by preventing deployed prediction models from estimating distances that put the patient at risk.
△ Less
Submitted 3 May, 2023; v1 submitted 11 April, 2023;
originally announced April 2023.
-
Full or Weak annotations? An adaptive strategy for budget-constrained annotation campaigns
Authors:
Javier Gamazo Tejero,
Martin S. Zinkernagel,
Sebastian Wolf,
Raphael Sznitman,
Pablo Márquez Neila
Abstract:
Annotating new datasets for machine learning tasks is tedious, time-consuming, and costly. For segmentation applications, the burden is particularly high as manual delineations of relevant image content are often extremely expensive or can only be done by experts with domain-specific knowledge. Thanks to developments in transfer learning and training with weak supervision, segmentation models can…
▽ More
Annotating new datasets for machine learning tasks is tedious, time-consuming, and costly. For segmentation applications, the burden is particularly high as manual delineations of relevant image content are often extremely expensive or can only be done by experts with domain-specific knowledge. Thanks to developments in transfer learning and training with weak supervision, segmentation models can now also greatly benefit from annotations of different kinds. However, for any new domain application looking to use weak supervision, the dataset builder still needs to define a strategy to distribute full segmentation and other weak annotations. Doing so is challenging, however, as it is a priori unknown how to distribute an annotation budget for a given new dataset. To this end, we propose a novel approach to determine annotation strategies for segmentation datasets, whereby estimating what proportion of segmentation and classification annotations should be collected given a fixed budget. To do so, our method sequentially determines proportions of segmentation and classification annotations to collect for budget-fractions by modeling the expected improvement of the final segmentation model. We show in our experiments that our approach yields annotations that perform very close to the optimal for a number of different annotation budgets and datasets.
△ Less
Submitted 21 March, 2023;
originally announced March 2023.
-
Logical Implications for Visual Question Answering Consistency
Authors:
Sergio Tascon-Morales,
Pablo Márquez-Neila,
Raphael Sznitman
Abstract:
Despite considerable recent progress in Visual Question Answering (VQA) models, inconsistent or contradictory answers continue to cast doubt on their true reasoning capabilities. However, most proposed methods use indirect strategies or strong assumptions on pairs of questions and answers to enforce model consistency. Instead, we propose a novel strategy intended to improve model performance by di…
▽ More
Despite considerable recent progress in Visual Question Answering (VQA) models, inconsistent or contradictory answers continue to cast doubt on their true reasoning capabilities. However, most proposed methods use indirect strategies or strong assumptions on pairs of questions and answers to enforce model consistency. Instead, we propose a novel strategy intended to improve model performance by directly reducing logical inconsistencies. To do this, we introduce a new consistency loss term that can be used by a wide range of the VQA models and which relies on knowing the logical relation between pairs of questions and answers. While such information is typically not available in VQA datasets, we propose to infer these logical relations using a dedicated language model and use these in our proposed consistency loss function. We conduct extensive experiments on the VQA Introspect and DME datasets and show that our method brings improvements to state-of-the-art VQA models, while being robust across different architectures and settings.
△ Less
Submitted 16 March, 2023;
originally announced March 2023.
-
Stochastic Segmentation with Conditional Categorical Diffusion Models
Authors:
Lukas Zbinden,
Lars Doorenbos,
Theodoros Pissas,
Adrian Thomas Huber,
Raphael Sznitman,
Pablo Márquez-Neila
Abstract:
Semantic segmentation has made significant progress in recent years thanks to deep neural networks, but the common objective of generating a single segmentation output that accurately matches the image's content may not be suitable for safety-critical domains such as medical diagnostics and autonomous driving. Instead, multiple possible correct segmentation maps may be required to reflect the true…
▽ More
Semantic segmentation has made significant progress in recent years thanks to deep neural networks, but the common objective of generating a single segmentation output that accurately matches the image's content may not be suitable for safety-critical domains such as medical diagnostics and autonomous driving. Instead, multiple possible correct segmentation maps may be required to reflect the true distribution of annotation maps. In this context, stochastic semantic segmentation methods must learn to predict conditional distributions of labels given the image, but this is challenging due to the typically multimodal distributions, high-dimensional output spaces, and limited annotation data. To address these challenges, we propose a conditional categorical diffusion model (CCDM) for semantic segmentation based on Denoising Diffusion Probabilistic Models. Our model is conditioned to the input image, enabling it to generate multiple segmentation label maps that account for the aleatoric uncertainty arising from divergent ground truth annotations. Our experimental results show that CCDM achieves state-of-the-art performance on LIDC, a stochastic semantic segmentation dataset, and outperforms established baselines on the classical segmentation dataset Cityscapes.
△ Less
Submitted 11 September, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.
-
ULISSE: A Tool for One-shot Sky Exploration and its Application to Active Galactic Nuclei Detection
Authors:
Lars Doorenbos,
Olena Torbaniuk,
Stefano Cavuoti,
Maurizio Paolillo,
Giuseppe Longo,
Massimo Brescia,
Raphael Sznitman,
Pablo Márquez-Neila
Abstract:
Modern sky surveys are producing ever larger amounts of observational data, which makes the application of classical approaches for the classification and analysis of objects challenging and time-consuming. However, this issue may be significantly mitigated by the application of automatic machine and deep learning methods. We propose ULISSE, a new deep learning tool that, starting from a single pr…
▽ More
Modern sky surveys are producing ever larger amounts of observational data, which makes the application of classical approaches for the classification and analysis of objects challenging and time-consuming. However, this issue may be significantly mitigated by the application of automatic machine and deep learning methods. We propose ULISSE, a new deep learning tool that, starting from a single prototype object, is capable of identifying objects sharing the same morphological and photometric properties, and hence of creating a list of candidate sosia. In this work, we focus on applying our method to the detection of AGN candidates in a Sloan Digital Sky Survey galaxy sample, since the identification and classification of Active Galactic Nuclei (AGN) in the optical band still remains a challenging task in extragalactic astronomy. Intended for the initial exploration of large sky surveys, ULISSE directly uses features extracted from the ImageNet dataset to perform a similarity search. The method is capable of rapidly identifying a list of candidates, starting from only a single image of a given prototype, without the need for any time-consuming neural network training. Our experiments show ULISSE is able to identify AGN candidates based on a combination of host galaxy morphology, color and the presence of a central nuclear source, with a retrieval efficiency ranging from 21% to 65% (including composite sources) depending on the prototype, where the random guess baseline is 12%. We find ULISSE to be most effective in retrieving AGN in early-type host galaxies, as opposed to prototypes with spiral- or late-type properties. Based on the results described in this work, ULISSE can be a promising tool for selecting different types of astrophysical objects in current and future wide-field surveys (e.g. Euclid, LSST etc.) that target millions of sources every single night.
△ Less
Submitted 23 August, 2022;
originally announced August 2022.
-
DeepPyramid: Enabling Pyramid View and Deformable Pyramid Reception for Semantic Segmentation in Cataract Surgery Videos
Authors:
Negin Ghamsarian,
Mario Taschwer,
Raphael Sznitman,
Klaus Schoeffmann
Abstract:
Semantic segmentation in cataract surgery has a wide range of applications contributing to surgical outcome enhancement and clinical risk reduction. However, the varying issues in segmenting the different relevant structures in these surgeries make the designation of a unique network quite challenging. This paper proposes a semantic segmentation network, termed DeepPyramid, that can deal with thes…
▽ More
Semantic segmentation in cataract surgery has a wide range of applications contributing to surgical outcome enhancement and clinical risk reduction. However, the varying issues in segmenting the different relevant structures in these surgeries make the designation of a unique network quite challenging. This paper proposes a semantic segmentation network, termed DeepPyramid, that can deal with these challenges using three novelties: (1) a Pyramid View Fusion module which provides a varying-angle global view of the surrounding region centering at each pixel position in the input convolutional feature map; (2) a Deformable Pyramid Reception module which enables a wide deformable receptive field that can adapt to geometric transformations in the object of interest; and (3) a dedicated Pyramid Loss that adaptively supervises multi-scale semantic feature maps. Combined, we show that these modules can effectively boost semantic segmentation performance, especially in the case of transparency, deformability, scalability, and blunt edges in objects. We demonstrate that our approach performs at a state-of-the-art level and outperforms a number of existing methods with a large margin (3.66% overall improvement in intersection over union compared to the best rival approach).
△ Less
Submitted 4 July, 2022;
originally announced July 2022.
-
Consistency-preserving Visual Question Answering in Medical Imaging
Authors:
Sergio Tascon-Morales,
Pablo Márquez-Neila,
Raphael Sznitman
Abstract:
Visual Question Answering (VQA) models take an image and a natural-language question as input and infer the answer to the question. Recently, VQA systems in medical imaging have gained popularity thanks to potential advantages such as patient engagement and second opinions for clinicians. While most research efforts have been focused on improving architectures and overcoming data-related limitatio…
▽ More
Visual Question Answering (VQA) models take an image and a natural-language question as input and infer the answer to the question. Recently, VQA systems in medical imaging have gained popularity thanks to potential advantages such as patient engagement and second opinions for clinicians. While most research efforts have been focused on improving architectures and overcoming data-related limitations, answer consistency has been overlooked even though it plays a critical role in establishing trustworthy models. In this work, we propose a novel loss function and corresponding training procedure that allows the inclusion of relations between questions into the training process. Specifically, we consider the case where implications between perception and reasoning questions are known a-priori. To show the benefits of our approach, we evaluate it on the clinically relevant task of Diabetic Macular Edema (DME) staging from fundus imaging. Our experiments show that our method outperforms state-of-the-art baselines, not only by improving model consistency, but also in terms of overall model accuracy. Our code and data are available at https://github.com/sergiotasconmorales/consistency_vqa.
△ Less
Submitted 27 June, 2022;
originally announced June 2022.
-
Data Invariants to Understand Unsupervised Out-of-Distribution Detection
Authors:
Lars Doorenbos,
Raphael Sznitman,
Pablo Márquez-Neila
Abstract:
Unsupervised out-of-distribution (U-OOD) detection has recently attracted much attention due its importance in mission-critical systems and broader applicability over its supervised counterpart. Despite this increase in attention, U-OOD methods suffer from important shortcomings. By performing a large-scale evaluation on different benchmarks and image modalities, we show in this work that most pop…
▽ More
Unsupervised out-of-distribution (U-OOD) detection has recently attracted much attention due its importance in mission-critical systems and broader applicability over its supervised counterpart. Despite this increase in attention, U-OOD methods suffer from important shortcomings. By performing a large-scale evaluation on different benchmarks and image modalities, we show in this work that most popular state-of-the-art methods are unable to consistently outperform a simple anomaly detector based on pre-trained features and the Mahalanobis distance (MahaAD). A key reason for the inconsistencies of these methods is the lack of a formal description of U-OOD. Motivated by a simple thought experiment, we propose a characterization of U-OOD based on the invariants of the training dataset. We show how this characterization is unknowingly embodied in the top-scoring MahaAD method, thereby explaining its quality. Furthermore, our approach can be used to interpret predictions of U-OOD detectors and provides insights into good practices for evaluating future U-OOD methods.
△ Less
Submitted 21 July, 2022; v1 submitted 26 November, 2021;
originally announced November 2021.
-
A Positive/Unlabeled Approach for the Segmentation of Medical Sequences using Point-Wise Supervision
Authors:
Laurent Lejeune,
Raphael Sznitman
Abstract:
The ability to quickly annotate medical imaging data plays a critical role in training deep learning frameworks for segmentation. Doing so for image volumes or video sequences is even more pressing as annotating these is particularly burdensome. To alleviate this problem, this work proposes a new method to efficiently segment medical imaging volumes or videos using point-wise annotations only. Thi…
▽ More
The ability to quickly annotate medical imaging data plays a critical role in training deep learning frameworks for segmentation. Doing so for image volumes or video sequences is even more pressing as annotating these is particularly burdensome. To alleviate this problem, this work proposes a new method to efficiently segment medical imaging volumes or videos using point-wise annotations only. This allows annotations to be collected extremely quickly and remains applicable to numerous segmentation tasks. Our approach trains a deep learning model using an appropriate Positive/Unlabeled objective function using sparse point-wise annotations. While most methods of this kind assume that the proportion of positive samples in the data is known a-priori, we introduce a novel self-supervised method to estimate this prior efficiently by combining a Bayesian estimation framework and new stopping criteria. Our method iteratively estimates appropriate class priors and yields high segmentation quality for a variety of object types and imaging modalities. In addition, by leveraging a spatio-temporal tracking framework, we regularize our predictions by leveraging the complete data volume. We show experimentally that our approach outperforms state-of-the-art methods tailored to the same problem.
△ Less
Submitted 18 July, 2021;
originally announced July 2021.
-
CataNet: Predicting remaining cataract surgery duration
Authors:
Andrés Marafioti,
Michel Hayoz,
Mathias Gallardo,
Pablo Márquez Neila,
Sebastian Wolf,
Martin Zinkernagel,
Raphael Sznitman
Abstract:
Cataract surgery is a sight saving surgery that is performed over 10 million times each year around the world. With such a large demand, the ability to organize surgical wards and operating rooms efficiently is critical to delivery this therapy in routine clinical care. In this context, estimating the remaining surgical duration (RSD) during procedures is one way to help streamline patient through…
▽ More
Cataract surgery is a sight saving surgery that is performed over 10 million times each year around the world. With such a large demand, the ability to organize surgical wards and operating rooms efficiently is critical to delivery this therapy in routine clinical care. In this context, estimating the remaining surgical duration (RSD) during procedures is one way to help streamline patient throughput and workflows. To this end, we propose CataNet, a method for cataract surgeries that predicts in real time the RSD jointly with two influential elements: the surgeon's experience, and the current phase of the surgery. We compare CataNet to state-of-the-art RSD estimation methods, showing that it outperforms them even when phase and experience are not considered. We investigate this improvement and show that a significant contributor is the way we integrate the elapsed time into CataNet's feature extractor.
△ Less
Submitted 21 June, 2021;
originally announced June 2021.
-
Stereo Correspondence and Reconstruction of Endoscopic Data Challenge
Authors:
Max Allan,
Jonathan Mcleod,
Congcong Wang,
Jean Claude Rosenthal,
Zhenglei Hu,
Niklas Gard,
Peter Eisert,
Ke Xue Fu,
Trevor Zeffiro,
Wenyao Xia,
Zhanshi Zhu,
Huoling Luo,
Fucang Jia,
Xiran Zhang,
Xiaohong Li,
Lalith Sharan,
Tom Kurmann,
Sebastian Schmid,
Raphael Sznitman,
Dimitris Psychogyios,
Mahdi Azizian,
Danail Stoyanov,
Lena Maier-Hein,
Stefanie Speidel
Abstract:
The stereo correspondence and reconstruction of endoscopic data sub-challenge was organized during the Endovis challenge at MICCAI 2019 in Shenzhen, China. The task was to perform dense depth estimation using 7 training datasets and 2 test sets of structured light data captured using porcine cadavers. These were provided by a team at Intuitive Surgical. 10 teams participated in the challenge day.…
▽ More
The stereo correspondence and reconstruction of endoscopic data sub-challenge was organized during the Endovis challenge at MICCAI 2019 in Shenzhen, China. The task was to perform dense depth estimation using 7 training datasets and 2 test sets of structured light data captured using porcine cadavers. These were provided by a team at Intuitive Surgical. 10 teams participated in the challenge day. This paper contains 3 additional methods which were submitted after the challenge finished as well as a supplemental section from these teams on issues they found with the dataset.
△ Less
Submitted 28 January, 2021; v1 submitted 4 January, 2021;
originally announced January 2021.
-
Surgical Data Science -- from Concepts toward Clinical Translation
Authors:
Lena Maier-Hein,
Matthias Eisenmann,
Duygu Sarikaya,
Keno März,
Toby Collins,
Anand Malpani,
Johannes Fallert,
Hubertus Feussner,
Stamatia Giannarou,
Pietro Mascagni,
Hirenkumar Nakawala,
Adrian Park,
Carla Pugh,
Danail Stoyanov,
Swaroop S. Vedula,
Kevin Cleary,
Gabor Fichtinger,
Germain Forestier,
Bernard Gibaud,
Teodor Grantcharov,
Makoto Hashizume,
Doreen Heckmann-Nötzel,
Hannes G. Kenngott,
Ron Kikinis,
Lars Mündermann
, et al. (25 additional authors not shown)
Abstract:
Recent developments in data science in general and machine learning in particular have transformed the way experts envision the future of surgery. Surgical Data Science (SDS) is a new research field that aims to improve the quality of interventional healthcare through the capture, organization, analysis and modeling of data. While an increasing number of data-driven approaches and clinical applica…
▽ More
Recent developments in data science in general and machine learning in particular have transformed the way experts envision the future of surgery. Surgical Data Science (SDS) is a new research field that aims to improve the quality of interventional healthcare through the capture, organization, analysis and modeling of data. While an increasing number of data-driven approaches and clinical applications have been studied in the fields of radiological and clinical data science, translational success stories are still lacking in surgery. In this publication, we shed light on the underlying reasons and provide a roadmap for future advances in the field. Based on an international workshop involving leading researchers in the field of SDS, we review current practice, key achievements and initiatives as well as available standards and tools for a number of topics relevant to the field, namely (1) infrastructure for data acquisition, storage and access in the presence of regulatory constraints, (2) data annotation and sharing and (3) data analytics. We further complement this technical perspective with (4) a review of currently available SDS products and the translational progress from academia and (5) a roadmap for faster clinical translation and exploitation of the full potential of SDS, based on an international multi-round Delphi process.
△ Less
Submitted 30 July, 2021; v1 submitted 30 October, 2020;
originally announced November 2020.
-
A Question-Centric Model for Visual Question Answering in Medical Imaging
Authors:
Minh H. Vu,
Tommy Löfstedt,
Tufve Nyholm,
Raphael Sznitman
Abstract:
Deep learning methods have proven extremely effective at performing a variety of medical image analysis tasks. With their potential use in clinical routine, their lack of transparency has however been one of their few weak points, raising concerns regarding their behavior and failure modes. While most research to infer model behavior has focused on indirect strategies that estimate prediction unce…
▽ More
Deep learning methods have proven extremely effective at performing a variety of medical image analysis tasks. With their potential use in clinical routine, their lack of transparency has however been one of their few weak points, raising concerns regarding their behavior and failure modes. While most research to infer model behavior has focused on indirect strategies that estimate prediction uncertainties and visualize model support in the input image space, the ability to explicitly query a prediction model regarding its image content offers a more direct way to determine the behavior of trained models. To this end, we present a novel Visual Question Answering approach that allows an image to be queried by means of a written question. Experiments on a variety of medical and natural image datasets show that by fusing image and question features in a novel way, the proposed approach achieves an equal or higher accuracy compared to current methods.
△ Less
Submitted 2 March, 2020;
originally announced March 2020.
-
Fused Detection of Retinal Biomarkers in OCT Volumes
Authors:
Thomas Kurmann,
Pablo Márquez-Neila,
Siqing Yu,
Marion Munk,
Sebastian Wolf,
Raphael Sznitman
Abstract:
Optical Coherence Tomography (OCT) is the primary imaging modality for detecting pathological biomarkers associated to retinal diseases such as Age-Related Macular Degeneration. In practice, clinical diagnosis and treatment strategies are closely linked to biomarkers visible in OCT volumes and the ability to identify these plays an important role in the development of ophthalmic pharmaceutical pro…
▽ More
Optical Coherence Tomography (OCT) is the primary imaging modality for detecting pathological biomarkers associated to retinal diseases such as Age-Related Macular Degeneration. In practice, clinical diagnosis and treatment strategies are closely linked to biomarkers visible in OCT volumes and the ability to identify these plays an important role in the development of ophthalmic pharmaceutical products. In this context, we present a method that automatically predicts the presence of biomarkers in OCT cross-sections by incorporating information from the entire volume. We do so by adding a bidirectional LSTM to fuse the outputs of a Convolutional Neural Network that predicts individual biomarkers. We thus avoid the need to use pixel-wise annotations to train our method, and instead provide fine-grained biomarker information regardless. On a dataset of 416 volumes, we show that our approach imposes coherence between biomarker predictions across volume slices and our predictions are superior to several existing approaches.
△ Less
Submitted 16 July, 2019;
originally announced July 2019.
-
Concept-Centric Visual Turing Tests for Method Validation
Authors:
Tatiana Fountoukidou,
Raphael Sznitman
Abstract:
Recent advances in machine learning for medical imaging have led to impressive increases in model complexity and overall capabilities. However, the ability to discern the precise information a machine learning method is using to make decisions has lagged behind and it is often unclear how these performances are in fact achieved. Conventional evaluation metrics that reduce method performance to a s…
▽ More
Recent advances in machine learning for medical imaging have led to impressive increases in model complexity and overall capabilities. However, the ability to discern the precise information a machine learning method is using to make decisions has lagged behind and it is often unclear how these performances are in fact achieved. Conventional evaluation metrics that reduce method performance to a single number or a curve only provide limited insights. Yet, systems used in clinical practice demand thorough validation that such crude characterizations miss. To this end, we present a framework to evaluate classification methods based on a number of interpretable concepts that are crucial for a clinical task. Our approach is inspired by the Turing Test concept and how to devise a test that adaptively questions a method for its ability to interpret medical images. To do this, we make use of a Twenty Questions paradigm whereby we use a probabilistic model to characterize the method's capacity to grasp task-specific concepts, and we introduce a strategy to sequentially query the method according to its previous answers. The results show that the probabilistic model is able to expose both the dataset's and the method's biases, and can be used to reduced the number of queries needed for confident performance evaluation.
△ Less
Submitted 15 March, 2020; v1 submitted 15 July, 2019;
originally announced July 2019.
-
Deep Multi Label Classification in Affine Subspaces
Authors:
Thomas Kurmann,
Pablo Marquez Neila,
Sebastian Wolf,
Raphael Sznitman
Abstract:
Multi-label classification (MLC) problems are becoming increasingly popular in the context of medical imaging. This has in part been driven by the fact that acquiring annotations for MLC is far less burdensome than for semantic segmentation and yet provides more expressiveness than multi-class classification. However, to train MLCs, most methods have resorted to similar objective functions as with…
▽ More
Multi-label classification (MLC) problems are becoming increasingly popular in the context of medical imaging. This has in part been driven by the fact that acquiring annotations for MLC is far less burdensome than for semantic segmentation and yet provides more expressiveness than multi-class classification. However, to train MLCs, most methods have resorted to similar objective functions as with traditional multi-class classification settings. We show in this work that such approaches are not optimal and instead propose a novel deep MLC classification method in affine subspace. At its core, the method attempts to pull features of class-labels towards different affine subspaces while maximizing the distance between them. We evaluate the method using two MLC medical imaging datasets and show a large performance increase compared to previous multi-label frameworks. This method can be seen as a plug-in replacement loss function and is trainable in an end-to-end fashion.
△ Less
Submitted 10 July, 2019;
originally announced July 2019.
-
Discovering General-Purpose Active Learning Strategies
Authors:
Ksenia Konyushkova,
Raphael Sznitman,
Pascal Fua
Abstract:
We propose a general-purpose approach to discovering active learning (AL) strategies from data. These strategies are transferable from one domain to another and can be used in conjunction with many machine learning models. To this end, we formalize the annotation process as a Markov decision process, design universal state and action spaces and introduce a new reward function that precisely model…
▽ More
We propose a general-purpose approach to discovering active learning (AL) strategies from data. These strategies are transferable from one domain to another and can be used in conjunction with many machine learning models. To this end, we formalize the annotation process as a Markov decision process, design universal state and action spaces and introduce a new reward function that precisely model the AL objective of minimizing the annotation cost. We seek to find an optimal (non-myopic) AL strategy using reinforcement learning. We evaluate the learned strategies on multiple unrelated domains and show that they consistently outperform state-of-the-art baselines.
△ Less
Submitted 2 April, 2019; v1 submitted 9 October, 2018;
originally announced October 2018.
-
Iterative multi-path tracking for video and volume segmentation with sparse point supervision
Authors:
Laurent Lejeune,
Jan Grossrieder,
Raphael Sznitman
Abstract:
Recent machine learning strategies for segmentation tasks have shown great ability when trained on large pixel-wise annotated image datasets. It remains a major challenge however to aggregate such datasets, as the time and monetary cost associated with collecting extensive annotations is extremely high. This is particularly the case for generating precise pixel-wise annotations in video and volume…
▽ More
Recent machine learning strategies for segmentation tasks have shown great ability when trained on large pixel-wise annotated image datasets. It remains a major challenge however to aggregate such datasets, as the time and monetary cost associated with collecting extensive annotations is extremely high. This is particularly the case for generating precise pixel-wise annotations in video and volumetric image data. To this end, this work presents a novel framework to produce pixel-wise segmentations using minimal supervision. Our method relies on 2D point supervision, whereby a single 2D location within an object of interest is provided on each image of the data. Our method then estimates the object appearance in a semi-supervised fashion by learning object-image-specific features and by using these in a semi-supervised learning framework. Our object model is then used in a graph-based optimization problem that takes into account all provided locations and the image data in order to infer the complete pixel-wise segmentation. In practice, we solve this optimally as a tracking problem using a K-shortest path approach. Both the object model and segmentation are then refined iteratively to further improve the final segmentation. We show that by collecting 2D locations using a gaze tracker, our approach can provide state-of-the-art segmentations on a range of objects and image modalities (video and 3D volumes), and that these can then be used to train supervised machine learning classifiers.
△ Less
Submitted 27 August, 2018;
originally announced September 2018.
-
Comparative evaluation of instrument segmentation and tracking methods in minimally invasive surgery
Authors:
Sebastian Bodenstedt,
Max Allan,
Anthony Agustinos,
Xiaofei Du,
Luis Garcia-Peraza-Herrera,
Hannes Kenngott,
Thomas Kurmann,
Beat Müller-Stich,
Sebastien Ourselin,
Daniil Pakhomov,
Raphael Sznitman,
Marvin Teichmann,
Martin Thoma,
Tom Vercauteren,
Sandrine Voros,
Martin Wagner,
Pamela Wochner,
Lena Maier-Hein,
Danail Stoyanov,
Stefanie Speidel
Abstract:
Intraoperative segmentation and tracking of minimally invasive instruments is a prerequisite for computer- and robotic-assisted surgery. Since additional hardware like tracking systems or the robot encoders are cumbersome and lack accuracy, surgical vision is evolving as promising techniques to segment and track the instruments using only the endoscopic images. However, what is missing so far are…
▽ More
Intraoperative segmentation and tracking of minimally invasive instruments is a prerequisite for computer- and robotic-assisted surgery. Since additional hardware like tracking systems or the robot encoders are cumbersome and lack accuracy, surgical vision is evolving as promising techniques to segment and track the instruments using only the endoscopic images. However, what is missing so far are common image data sets for consistent evaluation and benchmarking of algorithms against each other. The paper presents a comparative validation study of different vision-based methods for instrument segmentation and tracking in the context of robotic as well as conventional laparoscopic surgery. The contribution of the paper is twofold: we introduce a comprehensive validation data set that was provided to the study participants and present the results of the comparative validation study. Based on the results of the validation study, we arrive at the conclusion that modern deep learning approaches outperform other methods in instrument segmentation tasks, but the results are still not perfect. Furthermore, we show that merging results from different methods actually significantly increases accuracy in comparison to the best stand-alone method. On the other hand, the results of the instrument tracking task show that this is still an open challenge, especially during challenging scenarios in conventional laparoscopic surgery.
△ Less
Submitted 7 May, 2018;
originally announced May 2018.
-
Simultaneous Recognition and Pose Estimation of Instruments in Minimally Invasive Surgery
Authors:
Thomas Kurmann,
Pablo Marquez Neila,
Xiaofei Du,
Pascal Fua,
Danail Stoyanov,
Sebastian Wolf,
Raphael Sznitman
Abstract:
Detection of surgical instruments plays a key role in ensuring patient safety in minimally invasive surgery. In this paper, we present a novel method for 2D vision-based recognition and pose estimation of surgical instruments that generalizes to different surgical applications. At its core, we propose a novel scene model in order to simultaneously recognize multiple instruments as well as their pa…
▽ More
Detection of surgical instruments plays a key role in ensuring patient safety in minimally invasive surgery. In this paper, we present a novel method for 2D vision-based recognition and pose estimation of surgical instruments that generalizes to different surgical applications. At its core, we propose a novel scene model in order to simultaneously recognize multiple instruments as well as their parts. We use a Convolutional Neural Network architecture to embody our model and show that the cross-entropy loss is well suited to optimize its parameters which can be trained in an end-to-end fashion. An additional advantage of our approach is that instrument detection at test time is achieved while avoiding the need for scale-dependent sliding window evaluation. This allows our approach to be relatively parameter free at test time and shows good performance for both instrument detection and tracking. We show that our approach surpasses state-of-the-art results on in-vivo retinal microsurgery image data, as well as ex-vivo laparoscopic sequences.
△ Less
Submitted 18 October, 2017;
originally announced October 2017.
-
Pathological OCT Retinal Layer Segmentation using Branch Residual U-shape Networks
Authors:
Stefanos Apostolopoulos,
Sandro De Zanet,
Carlos Ciller,
Sebastian Wolf,
Raphael Sznitman
Abstract:
The automatic segmentation of retinal layer structures enables clinically-relevant quantification and monitoring of eye disorders over time in OCT imaging. Eyes with late-stage diseases are particularly challenging to segment, as their shape is highly warped due to pathological biomarkers. In this context, we propose a novel fully Convolutional Neural Network (CNN) architecture which combines dila…
▽ More
The automatic segmentation of retinal layer structures enables clinically-relevant quantification and monitoring of eye disorders over time in OCT imaging. Eyes with late-stage diseases are particularly challenging to segment, as their shape is highly warped due to pathological biomarkers. In this context, we propose a novel fully Convolutional Neural Network (CNN) architecture which combines dilated residual blocks in an asymmetric U-shape configuration, and can segment multiple layers of highly pathological eyes in one shot. We validate our approach on a dataset of late-stage AMD patients and demonstrate lower computational costs and higher performance compared to other state-of-the-art methods.
△ Less
Submitted 16 July, 2017;
originally announced July 2017.
-
Expected exponential loss for gaze-based video and volume ground truth annotation
Authors:
Laurent Lejeune,
Mario Christoudias,
Raphael Sznitman
Abstract:
Many recent machine learning approaches used in medical imaging are highly reliant on large amounts of image and ground truth data. In the context of object segmentation, pixel-wise annotations are extremely expensive to collect, especially in video and 3D volumes. To reduce this annotation burden, we propose a novel framework to allow annotators to simply observe the object to segment and record…
▽ More
Many recent machine learning approaches used in medical imaging are highly reliant on large amounts of image and ground truth data. In the context of object segmentation, pixel-wise annotations are extremely expensive to collect, especially in video and 3D volumes. To reduce this annotation burden, we propose a novel framework to allow annotators to simply observe the object to segment and record where they have looked at with a \$200 eye gaze tracker. Our method then estimates pixel-wise probabilities for the presence of the object throughout the sequence from which we train a classifier in semi-supervised setting using a novel Expected Exponential loss function. We show that our framework provides superior performances on a wide range of medical image settings compared to existing strategies and that our method can be combined with current crowd-sourcing paradigms as well.
△ Less
Submitted 16 July, 2017;
originally announced July 2017.
-
Learning Active Learning from Data
Authors:
Ksenia Konyushkova,
Raphael Sznitman,
Pascal Fua
Abstract:
In this paper, we suggest a novel data-driven approach to active learning (AL). The key idea is to train a regressor that predicts the expected error reduction for a candidate sample in a particular learning state. By formulating the query selection procedure as a regression problem we are not restricted to working with existing AL heuristics; instead, we learn strategies based on experience from…
▽ More
In this paper, we suggest a novel data-driven approach to active learning (AL). The key idea is to train a regressor that predicts the expected error reduction for a candidate sample in a particular learning state. By formulating the query selection procedure as a regression problem we are not restricted to working with existing AL heuristics; instead, we learn strategies based on experience from previous AL outcomes. We show that a strategy can be learnt either from simple synthetic 2D datasets or from a subset of domain-specific data. Our method yields strategies that work well on real data from a wide range of domains.
△ Less
Submitted 14 July, 2017; v1 submitted 9 March, 2017;
originally announced March 2017.