Search | arXiv e-print repository

arXiv:2506.12006 [pdf, ps, other]

crossMoDA Challenge: Evolution of Cross-Modality Domain Adaptation Techniques for Vestibular Schwannoma and Cochlea Segmentation from 2021 to 2023

Authors: Navodini Wijethilake, Reuben Dorent, Marina Ivory, Aaron Kujawa, Stefan Cornelissen, Patrick Langenhuizen, Mohamed Okasha, Anna Oviedova, Hexin Dong, Bogyeong Kang, Guillaume Sallé, Luyi Han, Ziyuan Zhao, Han Liu, Yubo Fan, Tao Yang, Shahad Hardan, Hussain Alasmawi, Santosh Sanjeev, Yuzhou Zhuang, Satoshi Kondo, Maria Baldeon Calisto, Shaikh Muhammad Uzair Noman, Cancan Chen, Ipek Oguz, et al. (16 additional authors not shown)

Abstract: The cross-Modality Domain Adaptation (crossMoDA) challenge series, initiated in 2021 in conjunction with the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), focuses on unsupervised cross-modality segmentation, learning from contrast-enhanced T1 (ceT1) and transferring to T2 MRI. The task is an extreme example of domain shift, chosen to serve as a meaningful and illustrative benchmark. From a clinical application perspective, it aims to automate Vestibular Schwannoma (VS) and cochlea segmentation on T2 scans for more cost-effective VS management. Over time, the challenge objectives have evolved to enhance its clinical relevance: the challenge progressed from single-institutional data and basic segmentation in 2021 to multi-institutional data and Koos grading in 2022, and by 2023 it included heterogeneous routine data and sub-segmentation of intra- and extra-meatal tumour components. In this work, we report the findings of the 2022 and 2023 editions and perform a retrospective analysis of the challenge progression over the years. The observations from the successive challenge contributions indicate that the number of outliers decreases with an expanding dataset. This is notable since the diversity of scanning protocols of the datasets concurrently increased. The winning approach of the 2023 edition reduced the number of outliers on the 2021 and 2022 testing data, demonstrating how increased data heterogeneity can enhance segmentation performance even on homogeneous data.
However, the cochlea Dice score declined in 2023, likely due to the added complexity from tumour sub-annotations affecting overall segmentation performance. While progress is still needed for clinically acceptable VS segmentation, the plateauing performance suggests that a more challenging cross-modal task may better serve future benchmarking.
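The Dice score quoted for the VS and cochlea structures measures volumetric overlap, 2|P ∩ T| / (|P| + |T|). A minimal NumPy sketch follows; the label encoding (1 = VS, 2 = cochlea) is a hypothetical assumption for illustration, not the challenge's official convention, and scoring two empty masks as 1.0 is one common choice among several.

```python
import numpy as np

# Hypothetical label encoding for this sketch only.
LABELS = {"VS": 1, "cochlea": 2}


def dice_score(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for two binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks empty: treated as perfect agreement (a convention).
        return 1.0
    return float(2.0 * np.logical_and(pred, target).sum() / denom)


def per_structure_dice(pred_labels: np.ndarray, true_labels: np.ndarray) -> dict:
    """Dice for each structure in an integer label map."""
    return {name: dice_score(pred_labels == lab, true_labels == lab)
            for name, lab in LABELS.items()}


if __name__ == "__main__":
    truth = np.zeros((4, 4), dtype=int)
    truth[0:2, 0:3] = 1          # 6 VS voxels
    truth[3, 3] = 2              # 1 cochlea voxel
    pred = np.zeros((4, 4), dtype=int)
    pred[0:2, 0:2] = 1           # 4 VS voxels, all inside the truth
    pred[3, 3] = 2
    print(per_structure_dice(pred, truth))  # VS: 2*4/(4+6) = 0.8, cochlea: 1.0
```

Challenge reports usually summarize such per-case scores with the mean or median over the test set.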

Submitted 24 July, 2025; v1 submitted 13 June, 2025; originally announced June 2025.

arXiv:2505.13746 [pdf, ps, other]

ReSW-VL: Representation Learning for Surgical Workflow Analysis Using Vision-Language Model

Authors: Satoshi Kondo

Abstract: Surgical phase recognition from video is a technology that automatically classifies the progress of a surgical procedure and has a wide range of potential applications, including real-time surgical support, optimization of medical resources, training and skill assessment, and safety improvement. Recent advances in surgical phase recognition technology have focused primarily on Transformer-based methods, although methods that extract spatial features from individual frames using a CNN and video features from the resulting time series of spatial features using time series modeling have shown high performance. However, there remains a paucity of research on training methods for the CNNs employed for feature extraction or representation learning in surgical phase recognition. In this study, we propose a method for representation learning in surgical workflow analysis using a vision-language model (ReSW-VL). Our proposed method involves fine-tuning the image encoder of a CLIP (Contrastive Language-Image Pre-training) vision-language model using prompt learning for surgical phase recognition. The experimental results on three surgical phase recognition datasets demonstrate the effectiveness of the proposed method in comparison to conventional methods.

Submitted 19 May, 2025; originally announced May 2025.

arXiv:2407.02738 [pdf, other]

ZEAL: Surgical Skill Assessment with Zero-shot Tool Inference Using Unified Foundation Model

Authors: Satoshi Kondo

Abstract: Surgical skill assessment is paramount for ensuring patient safety and enhancing surgical outcomes. This study addresses the need for efficient and objective evaluation methods by introducing ZEAL (surgical skill assessment with Zero-shot surgical tool segmentation with a unifiEd foundAtion modeL). ZEAL uses segmentation masks of surgical instruments obtained through a unified foundation model for proficiency assessment. Through zero-shot inference with text prompts, ZEAL predicts segmentation masks, capturing essential features of both instruments and surroundings. Utilizing sparse convolutional neural networks and segmentation masks, ZEAL extracts feature vectors for foreground (instruments) and background. Long Short-Term Memory (LSTM) networks encode temporal dynamics, modeling sequential data and dependencies in surgical videos. Combining LSTM-encoded vectors, ZEAL produces a surgical skill score, offering an objective measure of proficiency. Comparative analysis with conventional methods using open datasets demonstrates ZEAL's superiority, affirming its potential in advancing surgical training and evaluation. This innovative approach to surgical skill assessment addresses challenges in traditional supervised learning techniques, paving the way for enhanced surgical care quality and patient outcomes.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2308.13553 [pdf]

Synthesizing 3D computed tomography from MRI or CBCT using 2.5D deep neural networks

Authors: Satoshi Kondo, Satoshi Kasai, Kousuke Hirasawa

Abstract: Deep learning techniques, particularly convolutional neural networks (CNNs), have gained traction for synthetic computed tomography (sCT) generation from magnetic resonance imaging (MRI), cone-beam computed tomography (CBCT) and PET. In this report, we introduce a method to synthesize CT from MRI or CBCT. Our method is based on multi-slice (2.5D) CNNs. 2.5D CNNs offer distinct advantages over 3D CNNs when dealing with volumetric data. In the experiments, we evaluate the performance of our method for two tasks, MRI-to-sCT and CBCT-to-sCT generation. The target regions for both tasks are the brain and pelvis.
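A 2.5D network typically receives each slice together with its neighbours stacked as input channels, so that 2D convolutions see some through-plane context. The abstract does not state how many neighbouring slices are used; the sketch below assumes a hypothetical `context` of one slice on each side and replicates edge slices at the volume boundaries.

```python
import numpy as np


def make_25d_inputs(volume: np.ndarray, context: int = 1) -> np.ndarray:
    """Turn a (D, H, W) volume into (D, 2*context + 1, H, W) slice stacks.

    Stack i holds slices i - context ... i + context; out-of-range
    neighbours are filled by repeating the first or last slice.
    """
    depth = volume.shape[0]
    width = 2 * context + 1
    padded = np.pad(volume, ((context, context), (0, 0), (0, 0)), mode="edge")
    return np.stack([padded[i:i + width] for i in range(depth)], axis=0)


if __name__ == "__main__":
    vol = np.arange(3 * 2 * 2).reshape(3, 2, 2)
    stacks = make_25d_inputs(vol, context=1)
    print(stacks.shape)  # (3, 3, 2, 2)
```

In this assumed setup, each stack is fed to a 2D CNN as a multi-channel image and the network predicts the sCT values for the centre slice, which keeps memory use closer to a 2D model than a full 3D one.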

Submitted 23 August, 2023; originally announced August 2023.

arXiv:2305.18033 [pdf]

doi 10.1038/s41597-023-02422-6

The ACROBAT 2022 Challenge: Automatic Registration Of Breast Cancer Tissue

Authors: Philippe Weitz, Masi Valkonen, Leslie Solorzano, Circe Carr, Kimmo Kartasalo, Constance Boissin, Sonja Koivukoski, Aino Kuusela, Dusan Rasic, Yanbo Feng, Sandra Sinius Pouplier, Abhinav Sharma, Kajsa Ledesma Eriksson, Stephanie Robertson, Christian Marzahl, Chandler D. Gatenbee, Alexander R. A. Anderson, Marek Wodzinski, Artur Jurgas, Niccolò Marini, Manfredo Atzori, Henning Müller, Daniel Budelmann, Nick Weiss, Stefan Heldmann, et al. (16 additional authors not shown)

Abstract: The alignment of tissue between histopathological whole-slide-images (WSI) is crucial for research and clinical applications. Advances in computing, deep learning, and the availability of large WSI datasets have revolutionised WSI analysis. Therefore, the current state-of-the-art in WSI registration is unclear. To address this, we conducted the ACROBAT challenge, based on the largest WSI registration dataset to date, including 4,212 WSIs from 1,152 breast cancer patients. The challenge objective was to align WSIs of tissue that was stained with routine diagnostic immunohistochemistry to their H&E-stained counterparts. We compare the performance of eight WSI registration algorithms, including an investigation of the impact of different WSI properties and clinical covariates. We find that conceptually distinct WSI registration methods can lead to highly accurate registration performances and identify covariates that impact performances across methods. These results establish the current state-of-the-art in WSI registration and guide researchers in selecting and developing methods.

Submitted 29 May, 2023; originally announced May 2023.

arXiv:2302.12774 [pdf]

Automated Lesion Segmentation in Whole-Body FDG-PET/CT with Multi-modality Deep Neural Networks

Authors: Satoshi Kondo, Satoshi Kasai

Abstract: Recent progress in automated PET/CT lesion segmentation using deep learning methods has demonstrated the feasibility of this task. However, tumor lesion detection and segmentation in whole-body PET/CT is still a challenging task. To promote research on machine learning-based automated tumor lesion segmentation on whole-body FDG-PET/CT data, the Automated Lesion Segmentation in Whole-Body FDG-PET/CT (autoPET) challenge is held, and a large, publicly available training dataset is provided. In this report, we present our solution to the autoPET challenge. We employ a multi-modal residual U-Net with deep supervision. The experimental results for five preliminary test cases show a Dice score of 0.79 ± 0.21.

Submitted 15 February, 2023; originally announced February 2023.

Comments: arXiv admin note: text overlap with arXiv:2302.08016

arXiv:2302.08016 [pdf]

Unsupervised Domain Adaptation for MRI Volume Segmentation and Classification Using Image-to-Image Translation

Authors: Satoshi Kondo, Satoshi Kasai

Abstract: Unsupervised domain adaptation is a type of domain adaptation that exploits labeled data from the source domain and unlabeled data from the target domain. In the Cross-Modality Domain Adaptation for Medical Image Segmentation challenge (crossMoDA2022), contrast-enhanced T1 brain MRI volumes are provided as the source domain data, and high-resolution T2 MRI volumes are provided as the target domain data. The crossMoDA2022 challenge contains two tasks: segmentation of vestibular schwannoma (VS) and cochlea, and classification of VS by Koos grade. In this report, we present our solution for the crossMoDA2022 challenge. We employ an image-to-image translation method for unsupervised domain adaptation and a residual U-Net for the segmentation task. We use a support vector machine (SVM) for the classification task. The experimental results show that the mean DSC and ASSD are 0.614 and 2.936 for the segmentation task and the MA-MAE is 0.84 for the classification task.
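The classification result is reported as MA-MAE, the macro-averaged mean absolute error commonly used for ordinal labels such as Koos grades. A minimal sketch, assuming the usual definition (the MAE is computed separately for each true grade and then averaged over the grades present); the challenge's official evaluation code may handle details differently.

```python
import numpy as np


def macro_averaged_mae(y_true, y_pred) -> float:
    """Average over true classes of the mean |prediction - class|."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_class = [np.abs(y_pred[y_true == c] - c).mean() for c in np.unique(y_true)]
    return float(np.mean(per_class))


if __name__ == "__main__":
    # Koos grades 1-4; grade 1 errors average 0.5, grade 2 is 0, grade 4 is 1.
    print(macro_averaged_mae([1, 1, 2, 4], [1, 2, 2, 3]))  # 0.5
```

Unlike plain MAE, each grade contributes equally regardless of how many cases it has, so rare grades are not swamped by common ones.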

Submitted 15 February, 2023; originally announced February 2023.

arXiv:2302.06294 [pdf, other]

doi 10.1016/j.media.2023.102888

CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection

Authors: Chinedu Innocent Nwoye, Tong Yu, Saurav Sharma, Aditya Murali, Deepak Alapatt, Armine Vardazaryan, Kun Yuan, Jonas Hajek, Wolfgang Reiter, Amine Yamlahi, Finn-Henri Smidt, Xiaoyang Zou, Guoyan Zheng, Bruno Oliveira, Helena R. Torres, Satoshi Kondo, Satoshi Kasai, Felix Holm, Ege Özsoy, Shuangchun Gui, Han Li, Sista Raviteja, Rachana Sathish, Pranav Poudel, Binod Bhattarai, et al. (24 additional authors not shown)

Abstract: Formalizing surgical activities as triplets of the used instruments, actions performed, and target anatomies is becoming a gold standard approach for surgical activity modeling. The benefit is that this formalization helps to obtain a more detailed understanding of tool-tissue interaction, which can be used to develop better Artificial Intelligence assistance for image-guided surgery. Earlier efforts and the CholecTriplet challenge introduced in 2021 have put together techniques aimed at recognizing these triplets from surgical footage. Also estimating the spatial locations of the triplets would offer more precise intraoperative context-aware decision support for computer-assisted intervention. This paper presents the CholecTriplet2022 challenge, which extends surgical action triplet modeling from recognition to detection. It includes weakly-supervised bounding box localization of every visible surgical instrument (or tool), as the key actors, and the modeling of each tool-activity in the form of an <instrument, verb, target> triplet. The paper describes a baseline method and 10 new deep learning algorithms presented at the challenge to solve the task. It also provides thorough methodological comparisons of the methods, an in-depth analysis of the obtained results across multiple metrics, visual and procedural challenges and their significance, and useful insights for future research directions and applications in surgery.

Submitted 14 July, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: MICCAI EndoVis CholecTriplet2022 challenge report. Published in the Elsevier journal Medical Image Analysis. 25 pages, 15 figures, 8 tables

Journal ref: Medical Image Analysis, Volume 89, 2023, 102888, ISSN 1361-8415

arXiv:2302.01738 [pdf, other]

AIROGS: Artificial Intelligence for RObust Glaucoma Screening Challenge

Authors: Coen de Vente, Koenraad A. Vermeer, Nicolas Jaccard, He Wang, Hongyi Sun, Firas Khader, Daniel Truhn, Temirgali Aimyshev, Yerkebulan Zhanibekuly, Tien-Dung Le, Adrian Galdran, Miguel Ángel González Ballester, Gustavo Carneiro, Devika R G, Hrishikesh P S, Densen Puthussery, Hong Liu, Zekang Yang, Satoshi Kondo, Satoshi Kasai, Edward Wang, Ashritha Durvasula, Jónathan Heras, Miguel Ángel Zapata, Teresa Araújo, et al. (11 additional authors not shown)

Abstract: The early detection of glaucoma is essential in preventing visual impairment. Artificial intelligence (AI) can be used to analyze color fundus photographs (CFPs) in a cost-effective manner, making glaucoma screening more accessible. While AI models for glaucoma screening from CFPs have shown promising results in laboratory settings, their performance decreases significantly in real-world scenarios due to the presence of out-of-distribution and low-quality images. To address this issue, we propose the Artificial Intelligence for Robust Glaucoma Screening (AIROGS) challenge. This challenge includes a large dataset of around 113,000 images from about 60,000 patients and 500 different screening centers, and encourages the development of algorithms that are robust to ungradable and unexpected input data. We evaluated solutions from 14 teams in this paper, and found that the best teams performed similarly to a set of 20 expert ophthalmologists and optometrists. The highest-scoring team achieved an area under the receiver operating characteristic curve of 0.99 (95% CI: 0.98-0.99) for detecting ungradable images on-the-fly. Additionally, many of the algorithms showed robust performance when tested on three other publicly available datasets. These results demonstrate the feasibility of robust AI-enabled glaucoma screening.

Submitted 10 February, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

Comments: 19 pages, 8 figures, 3 tables

arXiv:2209.01300 [pdf, ps, other]

Source-Free Unsupervised Domain Adaptation with Norm and Shape Constraints for Medical Image Segmentation

Authors: Satoshi Kondo

Abstract: Unsupervised domain adaptation (UDA) is one of the key technologies for problems in which it is hard to obtain the ground-truth labels needed for supervised learning. In general, UDA assumes that all samples from the source and target domains are available during the training process. However, this is not a realistic assumption in applications where data privacy is a concern. To overcome this limitation, UDA without source data, referred to as source-free unsupervised domain adaptation (SFUDA), has recently been proposed. Here, we propose an SFUDA method for medical image segmentation. In addition to the entropy minimization method, which is commonly used in UDA, we introduce a loss function that keeps feature norms in the target domain from becoming small and a prior to preserve shape constraints of the target organ. We conduct experiments using datasets including multiple types of source-target domain combinations in order to show the versatility and robustness of our method. We confirm that our method outperforms the state-of-the-art in all datasets.
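Entropy minimization, the standard UDA component named above, penalizes uncertain per-pixel predictions on unlabeled target images. The NumPy sketch below computes only that term, as a mean per-pixel Shannon entropy over a (C, H, W) logit map; the paper's feature-norm loss and shape prior are not reproduced, and in practice the term would be written in an autodiff framework so it can be minimized by gradient descent.

```python
import numpy as np


def softmax(logits: np.ndarray, axis: int = 0) -> np.ndarray:
    """Numerically stable softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def entropy_loss(logits: np.ndarray, axis: int = 0, eps: float = 1e-12) -> float:
    """Mean per-pixel entropy -sum_c p_c log p_c of softmax predictions."""
    probs = softmax(logits, axis=axis)
    entropy = -(probs * np.log(probs + eps)).sum(axis=axis)
    return float(entropy.mean())


if __name__ == "__main__":
    uncertain = np.zeros((2, 3, 3))                  # p = 0.5 for both classes
    confident = np.stack([np.full((3, 3), 10.0), np.full((3, 3), -10.0)])
    print(entropy_loss(uncertain), entropy_loss(confident))  # about 0.693 vs near 0
```

Minimizing this term alone can collapse predictions onto one class, which is one motivation for pairing it with additional constraints such as the norm and shape terms the abstract describes.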

Submitted 2 September, 2022; originally announced September 2022.

arXiv:2208.12635 [pdf]

A Two Step Approach for Whole Slide Image Registration

Authors: Satoshi Kondo, Satoshi Kasai, Kousuke Hirasawa

Abstract: Multi-stain whole-slide-image (WSI) registration is an active field of research. It is unclear, however, how current WSI registration methods would perform on a real-world dataset. The AutomatiC Registration Of Breast cAncer Tissue (ACROBAT) challenge is held to verify the performance of current WSI registration methods using a new dataset that originates from routine diagnostics, in order to assess real-world applicability. In this report, we present our solution for the ACROBAT challenge. We employ a two-step approach consisting of rigid and non-rigid transforms. The experimental results show that the median 90th-percentile error is 1,250 µm on the validation dataset.
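The reported figure aggregates registration errors in two stages. A sketch, assuming the errors are Euclidean distances between corresponding landmarks after registration (the abstract only states the "median 90th percentile"): the 90th percentile is taken within each image pair, then the median across pairs. The function names are illustrative.

```python
import numpy as np


def pair_p90_error(moved_landmarks, fixed_landmarks) -> float:
    """90th percentile of landmark distances for one registered image pair."""
    moved = np.asarray(moved_landmarks, dtype=float)
    fixed = np.asarray(fixed_landmarks, dtype=float)
    distances = np.linalg.norm(moved - fixed, axis=1)
    return float(np.percentile(distances, 90))


def median_p90_error(pairs) -> float:
    """Median over image pairs of the per-pair 90th-percentile error."""
    return float(np.median([pair_p90_error(m, f) for m, f in pairs]))


if __name__ == "__main__":
    fixed = np.zeros((11, 2))
    base = np.column_stack([np.arange(11.0), np.zeros(11)])  # distances 0..10
    pairs = [(scale * base, fixed) for scale in (1.0, 2.0, 3.0)]
    print(median_p90_error(pairs))  # per-pair p90 = 9, 18, 27 -> median 18.0
```

Using a high percentile per pair emphasizes the worst-aligned regions of each slide, while the median across pairs limits the influence of a few failed registrations. The result is in whatever unit the landmark coordinates use (µm here).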

Submitted 24 August, 2022; originally announced August 2022.

arXiv:2208.12041 [pdf]

Multi-Modality Abdominal Multi-Organ Segmentation with Deep Supervised 3D Segmentation Model

Authors: Satoshi Kondo, Satoshi Kasai

Abstract: To promote the development of medical image segmentation technology, AMOS, a large-scale abdominal multi-organ dataset for versatile medical image segmentation, has been provided, and the AMOS 2022 challenge is held using this dataset. In this report, we present our solution for the AMOS 2022 challenge. We employ a residual U-Net with deep supervision as our base model. The experimental results show that the mean scores of the Dice similarity coefficient and normalized surface Dice are 0.8504 and 0.8476 for the CT-only task and the CT/MRI task, respectively.

Submitted 23 August, 2022; originally announced August 2022.

arXiv:2204.03742 [pdf, other]

doi 10.1016/j.media.2022.102699

Mitosis domain generalization in histopathology images -- The MIDOG challenge

Authors: Marc Aubreville, Nikolas Stathonikos, Christof A. Bertram, Robert Klopfleisch, Natalie ter Hoeve, Francesco Ciompi, Frauke Wilm, Christian Marzahl, Taryn A. Donovan, Andreas Maier, Jack Breen, Nishant Ravikumar, Youjin Chung, Jinah Park, Ramin Nateghi, Fattaneh Pourakpour, Rutger H. J. Fick, Saima Ben Hadj, Mostafa Jahanifar, Nasir Rajpoot, Jakob Dexl, Thomas Wittenberg, Satoshi Kondo, Maxime W. Lafarge, Viktor H. Koelzer, et al. (10 additional authors not shown)

Abstract: The density of mitotic figures within tumor tissue is known to be highly correlated with tumor proliferation and thus is an important marker in tumor grading. Recognition of mitotic figures by pathologists is known to be subject to a strong inter-rater bias, which limits the prognostic value. State-of-the-art deep learning methods can support the expert in this assessment but are known to deteriorate strongly when applied in a clinical environment different from the one used for training. One decisive component in the underlying domain shift has been identified as the variability caused by using different whole slide scanners. The goal of the MICCAI MIDOG 2021 challenge was to propose and evaluate methods that counter this domain shift and derive scanner-agnostic mitosis detection algorithms. The challenge used a training set of 200 cases, split across four scanning systems. As a test set, an additional 100 cases split across four scanning systems, including two previously unseen scanners, were given. The best approaches performed on an expert level, with the winning algorithm yielding an F1 score of 0.748 (95% CI: 0.704-0.781). In this paper, we evaluate and compare the approaches that were submitted to the challenge and identify methodological factors contributing to better performance.

Submitted 6 April, 2022; originally announced April 2022.

Comments: 19 pages, 9 figures, summary paper of the 2021 MICCAI MIDOG challenge

Journal ref: Medical Image Analysis 84 (2023) 102699

arXiv:2202.11804 [pdf, ps, other]

Nuclei panoptic segmentation and composition regression with multi-task deep neural networks

Authors: Satoshi Kondo, Satoshi Kasai

Abstract: Nuclear segmentation, classification and quantification within Haematoxylin & Eosin stained histology images enable the extraction of interpretable cell-based features that can be used in downstream explainable models in computational pathology. The Colon Nuclei Identification and Counting (CoNIC) Challenge is held to help drive forward research and innovation for automatic nuclei recognition in computational pathology. This report describes our proposed method submitted to the CoNIC challenge. Our method employs a multi-task learning framework, which performs a panoptic segmentation task and a regression task. For the panoptic segmentation task, we use encoder-decoder type deep neural networks that predict a direction map in addition to a segmentation map in order to separate neighboring nuclei into different instances.

Submitted 23 February, 2022; originally announced February 2022.

arXiv:2201.02831 [pdf, other]

doi 10.1016/j.media.2022.102628

CrossMoDA 2021 challenge: Benchmark of Cross-Modality Domain Adaptation techniques for Vestibular Schwannoma and Cochlea Segmentation

Authors: Reuben Dorent, Aaron Kujawa, Marina Ivory, Spyridon Bakas, Nicola Rieke, Samuel Joutard, Ben Glocker, Jorge Cardoso, Marc Modat, Kayhan Batmanghelich, Arseniy Belkov, Maria Baldeon Calisto, Jae Won Choi, Benoit M. Dawant, Hexin Dong, Sergio Escalera, Yubo Fan, Lasse Hansen, Mattias P. Heinrich, Smriti Joshi, Victoriya Kashtanova, Hyeon Gyu Kim, Satoshi Kondo, Christian N. Kruse, Susana K. Lai-Yuen, et al. (15 additional authors not shown)

Abstract: Domain Adaptation (DA) has recently attracted strong interest in the medical imaging community. While a large variety of DA techniques have been proposed for image segmentation, most of these techniques have been validated either on private datasets or on small publicly available datasets. Moreover, these datasets mostly addressed single-class problems. To tackle these limitations, the Cross-Modality Domain Adaptation (crossMoDA) challenge was organised in conjunction with the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). CrossMoDA is the first large and multi-class benchmark for unsupervised cross-modality DA. The challenge's goal is to segment two key brain structures involved in the follow-up and treatment planning of vestibular schwannoma (VS): the VS and the cochleas. Currently, the diagnosis and surveillance in patients with VS are performed using contrast-enhanced T1 (ceT1) MRI. However, there is growing interest in using non-contrast sequences such as high-resolution T2 (hrT2) MRI. Therefore, we created an unsupervised cross-modality segmentation benchmark. The training set provides annotated ceT1 (N=105) and unpaired non-annotated hrT2 (N=105) scans. The aim was to automatically perform unilateral VS and bilateral cochlea segmentation on hrT2 as provided in the testing set (N=137). A total of 16 teams submitted their algorithms for the evaluation phase.
The level of performance reached by the top-performing teams is strikingly high (best median Dice - VS: 88.4%; Cochleas: 85.7%) and close to full supervision (median Dice - VS: 92.5%; Cochleas: 87.7%). All top-performing methods made use of an image-to-image translation approach to transform the source-domain images into pseudo-target-domain images. A segmentation network was then trained using these generated images and the manual annotations provided for the source images.

Submitted 14 December, 2022; v1 submitted 8 January, 2022; originally announced January 2022.

Comments: In Medical Image Analysis

arXiv:2112.06695 [pdf, other]

Bi-directional Beamforming Feedback-based Firmware-agnostic WiFi Sensing: An Empirical Study

Authors: S. Kondo, S. Itahara, K. Yamashita, K. Yamamoto, Y. Koda, T. Nishio, A. Taya

Abstract: In the field of WiFi sensing, the beamforming feedback matrix (BFM), which is a right singular matrix of the channel state information (CSI) matrix, has attracted significant interest as an alternative sensing source to the CSI matrix, owing to its wide availability across the underlying WiFi systems. In the IEEE 802.11ac/ax standard, the station (STA) transmits a BFM to an access point (AP), which uses the BFM for precoded multiple-input and multiple-output communications. In the same way, the AP transmits a BFM to the STA, and the STA uses the received BFM. Regarding BFM-based sensing, extensive real-world experiments were conducted as part of this study, and two key insights were reported. Firstly, this report identified a potential accuracy issue in existing uni-directional BFM-based sensing frameworks that leverage only the BFMs transmitted for the AP or the STA. Such uni-directionality introduces accuracy concerns when there is a sensing capability gap between the uni-directional BFMs for the AP and STA. Thus, this report experimentally evaluates the sensing ability disparity between the uni-directional BFMs, and shows that the BFMs transmitted for an AP achieve higher sensing accuracy than the BFMs transmitted from the STA when the sensing target values are estimated depending on the angle of departure of the AP. Secondly, to close the sensing gap, this paper proposes a bi-directional sensing framework, which simultaneously leverages the BFMs transmitted from the AP and STA.
The experimental evaluations reveal that bi-directional sensing achieves higher accuracy than uni-directional sensing in the human localization task.

Submitted 27 February, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

Comments: 10 pages, 7 figures

arXiv:2110.14211 [pdf, other]

Beamforming Feedback-based Model-Driven Angle of Departure Estimation Toward Legacy Support in WiFi Sensing: An Experimental Study

Authors: Sohei Itahara, Sota Kondo, Kota Yamashita, Takayuki Nishio, Koji Yamamoto, Yusuke Koda

Abstract: This study experimentally validated the possibility of angle of departure (AoD) estimation using multiple signal classification (MUSIC) with only the WiFi control frames for beamforming feedback (BFF) defined in IEEE 802.11ac/ax. The examined BFF-based MUSIC is a model-driven algorithm that does not require a pre-obtained database. This contrasts with most existing BFF-based sensing techniques, which are data-driven and require a pre-obtained database. Moreover, BFF-based MUSIC offers an alternative AoD estimation method that does not require access to channel state information (CSI). Specifically, extensive experimental and numerical evaluations demonstrated that BFF-based MUSIC successfully estimates the AoDs of multiple propagation paths. The evaluations also revealed that BFF-based MUSIC achieved an AoD estimation error comparable to that of CSI-based MUSIC, even though BFF is a highly compressed version of CSI in IEEE 802.11ac/ax.

Submitted 2 February, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

Comments: Submitted to IEEE Access
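For readers unfamiliar with MUSIC, the following is a minimal sketch of the generic algorithm on a simulated uniform linear array (ULA). It is not the paper's BFF-based method: it assumes direct access to complex array snapshots (CSI-like data) rather than compressed beamforming feedback, and the 4-antenna half-wavelength array, the two path angles, and the helper names (`steering_vector`, `music_spectrum`, `top_peaks`) are illustrative assumptions.

```python
import numpy as np

def steering_vector(theta_deg, n_antennas, spacing=0.5):
    """ULA steering vector; spacing in wavelengths (0.5 = half-wavelength, assumed)."""
    k = np.arange(n_antennas)
    return np.exp(-2j * np.pi * spacing * k * np.sin(np.deg2rad(theta_deg)))

def music_spectrum(snapshots, n_sources, angles_deg, spacing=0.5):
    """MUSIC pseudospectrum for an (n_antennas, n_snapshots) complex array."""
    n_antennas, n_snapshots = snapshots.shape
    # Sample covariance matrix of the array output
    r = snapshots @ snapshots.conj().T / n_snapshots
    # eigh returns eigenvalues in ascending order for Hermitian matrices
    _, eigvecs = np.linalg.eigh(r)
    # Noise subspace: eigenvectors of the (n_antennas - n_sources) smallest eigenvalues
    noise_subspace = eigvecs[:, : n_antennas - n_sources]
    spectrum = []
    for theta in angles_deg:
        a = steering_vector(theta, n_antennas, spacing)
        # True directions are (nearly) orthogonal to the noise subspace -> large peak
        denom = np.linalg.norm(noise_subspace.conj().T @ a) ** 2
        spectrum.append(1.0 / denom)
    return np.array(spectrum)

def top_peaks(spectrum, grid, k):
    """Return the k grid angles at the largest local maxima, sorted ascending."""
    idx = [i for i in range(1, len(spectrum) - 1)
           if spectrum[i] > spectrum[i - 1] and spectrum[i] >= spectrum[i + 1]]
    idx.sort(key=lambda i: spectrum[i], reverse=True)
    return sorted(float(grid[i]) for i in idx[:k])

# Simulated scenario (assumed): two uncorrelated paths at -30 and 20 degrees
rng = np.random.default_rng(0)
n_ant, n_snap = 4, 200
true_aods = [-30.0, 20.0]
A = np.stack([steering_vector(t, n_ant) for t in true_aods], axis=1)  # (4, 2)
signals = (rng.standard_normal((2, n_snap)) + 1j * rng.standard_normal((2, n_snap))) / np.sqrt(2)
awgn = 0.05 * (rng.standard_normal((n_ant, n_snap)) + 1j * rng.standard_normal((n_ant, n_snap)))
X = A @ signals + awgn

grid = np.arange(-90.0, 90.5, 0.5)
p = music_spectrum(X, n_sources=2, angles_deg=grid)
estimates = top_peaks(p, grid, 2)
```

The number of paths (`n_sources`) must be known or estimated beforehand, since it fixes the split between the signal and noise subspaces.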

arXiv:2109.14956 [pdf]

Comparative Validation of Machine Learning Algorithms for Surgical Workflow and Skill Analysis with the HeiChole Benchmark

Authors: Martin Wagner, Beat-Peter Müller-Stich, Anna Kisilenko, Duc Tran, Patrick Heger, Lars Mündermann, David M Lubotsky, Benjamin Müller, Tornike Davitashvili, Manuela Capek, Annika Reinke, Tong Yu, Armine Vardazaryan, Chinedu Innocent Nwoye, Nicolas Padoy, Xinyang Liu, Eung-Joo Lee, Constantin Disch, Hans Meine, Tong Xia, Fucang Jia, Satoshi Kondo, Wolfgang Reiter, Yueming Jin, Yonghao Long , et al. (16 additional authors not shown)

Abstract: PURPOSE: Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase the safety of the operation through context-sensitive warnings and semi-autonomous robotic assistance, or improve the training of surgeons via data-driven feedback. In surgical workflow analysis, up to 91% average precision has been reported for phase recognition on an open-data, single-center dataset. In this work, we investigated the generalizability of phase recognition algorithms in a multi-center setting, including more difficult recognition tasks such as surgical action and surgical skill. METHODS: To achieve this goal, a dataset of 33 laparoscopic cholecystectomy videos from three surgical centers, with a total operation time of 22 hours, was created. Labels included annotations of seven surgical phases with 250 phase transitions, 5514 occurrences of four surgical actions, 6980 occurrences of 21 surgical instruments from seven instrument categories, and 495 skill classifications in five skill dimensions. The dataset was used in the surgical workflow and skill analysis sub-challenge of the 2019 Endoscopic Vision challenge, where 12 teams submitted machine learning algorithms for recognition of phase, action, instrument and/or skill assessment. RESULTS: F1-scores for phase recognition ranged between 23.9% and 67.7% (n=9 teams) and for instrument presence detection between 38.5% and 63.8% (n=8 teams), but for action recognition only between 21.8% and 23.3% (n=5 teams). The average absolute error for skill assessment was 0.78 (n=1 team). CONCLUSION: As our comparison of algorithms shows, surgical workflow and skill analysis are promising technologies to support the surgical team but are not yet solved. This novel benchmark can be used for comparable evaluation and validation of future work.

Submitted 30 September, 2021; originally announced September 2021.

arXiv:2109.01503 [pdf]

Multi-source Domain Adaptation Using Gradient Reversal Layer for Mitotic Cell Detection

Authors: Satoshi Kondo

Abstract: This is a write-up of our method submitted to the Mitosis Domain Generalization (MIDOG 2021) Challenge held at the MICCAI 2021 conference.

Submitted 1 September, 2021; originally announced September 2021.

arXiv:2108.03170 [pdf, other]

doi 10.1109/CCNC49033.2022.9700721

Respiratory Rate Estimation Based on WiFi Frame Capture

Authors: T. Kanda, T. Sato, H. Awano, S. Kondo, K. Yamamoto

Abstract: This paper presents a method that estimates the respiratory rate based on frame capturing in wireless local area networks. The method uses the beamforming feedback matrices (BFMs) contained in the captured frames; a BFM is a rotation matrix of the channel state information (CSI). BFMs are transmitted unencrypted and are easily obtained through frame capturing, requiring no specific firmware or WiFi chipsets, unlike methods that use CSI. These properties of BFMs allow frame capturing to be applied to various sensing tasks, e.g., vital sensing. In the proposed method, principal component analysis is applied to the BFMs to isolate the effect of the subject's chest movement, and then a discrete Fourier transform is performed to extract the respiratory rate in the frequency domain. Experimental evaluation results confirm that frame-capture-based respiratory rate estimation can achieve an estimation error lower than 3.2 breaths/minute.

Submitted 14 September, 2021; v1 submitted 5 August, 2021; originally announced August 2021.

Journal ref: Proc. IEEE 19th Annual Consumer Communications & Networking Conference (CCNC 2022)
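The PCA-then-DFT pipeline summarized in the abstract above can be sketched as follows on synthetic data. This is an illustrative reconstruction, not the authors' code. It assumes that the captured BFMs have already been flattened into a real-valued (time × features) matrix sampled at a uniform rate `fs`, that the first principal component carries the chest motion, and that the respiratory band is 0.1 to 0.5 Hz (6 to 30 breaths/minute); the function name `estimate_respiratory_rate` and these parameters are hypothetical.

```python
import numpy as np

def estimate_respiratory_rate(features, fs, band_hz=(0.1, 0.5)):
    """Estimate breaths/minute from a (n_samples, n_features) real-valued series.

    features: e.g. flattened BFM entries over time (assumed preprocessing)
    fs: sampling rate in Hz (assumed uniform)
    band_hz: assumed respiratory frequency band for the peak search
    """
    # Center each feature so PCA captures variation over time
    centered = features - features.mean(axis=0)
    # PCA via SVD: rows of vt are the principal directions
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Project onto the first principal component (assumed to carry chest motion)
    pc1 = centered @ vt[0]
    # Magnitude spectrum via the real-input DFT
    spectrum = np.abs(np.fft.rfft(pc1))
    freqs = np.fft.rfftfreq(len(pc1), d=1.0 / fs)
    # Pick the strongest frequency inside the assumed respiratory band
    mask = (freqs >= band_hz[0]) & (freqs <= band_hz[1])
    peak_hz = freqs[mask][np.argmax(spectrum[mask])]
    return 60.0 * peak_hz  # Hz -> breaths/minute

# Synthetic example (assumed): 15 breaths/min (0.25 Hz) mixed into 8 noisy features
rng = np.random.default_rng(1)
fs, duration_s = 10.0, 60
t = np.arange(int(fs * duration_s)) / fs  # 600 samples
breath = np.sin(2 * np.pi * 0.25 * t)
weights = rng.standard_normal(8)
features = np.outer(breath, weights) + 0.1 * rng.standard_normal((len(t), 8))
rate = estimate_respiratory_rate(features, fs)
```

With a 60 s window the DFT bin spacing is 1/60 Hz, i.e. a resolution of 1 breath/minute; longer windows or spectral interpolation would be needed for finer estimates.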

Showing 1–20 of 20 results for author: Kondo, S