Search | arXiv e-print repository

EgoSurgery-HTS: A Dataset for Egocentric Hand-Tool Segmentation in Open Surgery Videos

Authors: Nathan Darjana, Ryo Fujii, Hideo Saito, Hiroki Kajita

Abstract: Egocentric open-surgery videos capture rich, fine-grained details essential for accurately modeling surgical procedures and human behavior in the operating room. A detailed, pixel-level understanding of hands and surgical tools is crucial for interpreting a surgeon's actions and intentions. We introduce EgoSurgery-HTS, a new dataset with pixel-wise annotations and a benchmark suite for segmenting surgical tools, hands, and interacting tools in egocentric open-surgery videos. Specifically, we provide a labeled dataset for (1) tool instance segmentation of 14 distinct surgical tools, (2) hand instance segmentation, and (3) hand-tool segmentation to label hands and the tools they manipulate. Using EgoSurgery-HTS, we conduct extensive evaluations of state-of-the-art segmentation methods and demonstrate significant improvements in the accuracy of hand and hand-tool segmentation in egocentric open-surgery videos compared to existing datasets. The dataset will be released at https://github.com/Fujiry0/EgoSurgery.

Submitted 24 March, 2025; originally announced March 2025.

arXiv:2503.03558 [pdf, other]

High-Quality Virtual Single-Viewpoint Surgical Video: Geometric Autocalibration of Multiple Cameras in Surgical Lights

Authors: Yuna Kato, Mariko Isogawa, Shohei Mori, Hideo Saito, Hiroki Kajita, Yoshifumi Takatsume

Abstract: Occlusion-free video generation is challenging due to surgeons' obstructions in the camera field of view. Prior work has addressed this issue by installing multiple cameras on a surgical light, hoping some cameras will observe the surgical field with less occlusion. However, this special camera setup poses a new imaging challenge since camera configurations can change every time surgeons move the light, and manual image alignment is required. This paper proposes an algorithm to automate this alignment task. The proposed method detects frames where the lighting system moves, realigns them, and selects the camera with the least occlusion. This algorithm results in a stabilized video with less occlusion. Quantitative results show that our method outperforms conventional approaches. A user study involving medical doctors also confirmed the superiority of our method.

Submitted 5 March, 2025; originally announced March 2025.

Comments: Accepted at MICCAI 2023

arXiv:2406.03095 [pdf, other]

EgoSurgery-Tool: A Dataset of Surgical Tool and Hand Detection from Egocentric Open Surgery Videos

Authors: Ryo Fujii, Hideo Saito, Hiroki Kajita

Abstract: Surgical tool detection is a fundamental task for understanding egocentric open surgery videos. However, detecting surgical tools presents significant challenges due to their highly imbalanced class distribution, similar shapes and similar textures, and heavy occlusion. The lack of a comprehensive large-scale dataset compounds these challenges. In this paper, we introduce EgoSurgery-Tool, an extension of the existing EgoSurgery-Phase dataset, which contains real open surgery videos captured using an egocentric camera attached to the surgeon's head, along with phase annotations. EgoSurgery-Tool has been densely annotated with surgical tools and comprises over 49K surgical tool bounding boxes across 15 categories, constituting a large-scale surgical tool detection dataset. EgoSurgery-Tool also provides annotations for hand detection with over 46K hand-bounding boxes, capturing hand-object interactions that are crucial for understanding activities in egocentric open surgery. EgoSurgery-Tool is superior to existing datasets due to its larger scale, greater variety of surgical tools, more annotations, and denser scenes. We conduct a comprehensive analysis of EgoSurgery-Tool using nine popular object detectors to assess their effectiveness in both surgical tool and hand detection. The dataset will be released at https://github.com/Fujiry0/EgoSurgery.

Submitted 26 November, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

arXiv:2405.19644 [pdf, other]

EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos

Authors: Ryo Fujii, Masashi Hatano, Hideo Saito, Hiroki Kajita

Abstract: Surgical phase recognition has gained significant attention due to its potential to offer solutions to numerous demands of the modern operating room. However, most existing methods concentrate on minimally invasive surgery (MIS), leaving surgical phase recognition for open surgery understudied. This discrepancy is primarily attributed to the scarcity of publicly available open surgery video datasets for surgical phase recognition. To address this issue, we introduce a new egocentric open surgery video dataset for phase recognition, named EgoSurgery-Phase. This dataset comprises 15 hours of real open surgery videos spanning 9 distinct surgical phases, all captured using an egocentric camera attached to the surgeon's head. In addition to video, EgoSurgery-Phase offers eye gaze data. As far as we know, it is the first publicly available real open surgery video dataset for surgical phase recognition. Furthermore, inspired by the notable success of masked autoencoders (MAEs) in video understanding tasks (e.g., action recognition), we propose a gaze-guided masked autoencoder (GGMAE). Considering that the regions where surgeons' gaze focuses are often critical for surgical phase recognition (e.g., the surgical field), in our GGMAE the gaze information acts as an empirical semantic richness prior that guides the masking process, promoting better attention to semantically rich spatial regions. GGMAE significantly improves on the previous state-of-the-art recognition method (by 6.4% in Jaccard) and the masked autoencoder-based method (by 3.1% in Jaccard) on EgoSurgery-Phase. The dataset is released at https://github.com/Fujiry0/EgoSurgery.

Submitted 26 November, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

Comments: Early accepted by MICCAI 2024

arXiv:2303.15947 [pdf, other]

Deep Selection: A Fully Supervised Camera Selection Network for Surgery Recordings

Authors: Ryo Hachiuma, Tomohiro Shimizu, Hideo Saito, Hiroki Kajita, Yoshifumi Takatsume

Abstract: Recording surgery in operating rooms is an essential task for education and evaluation of medical treatment. However, recording the desired targets, such as the surgery field, surgical tools, or doctor's hands, is difficult because the targets are heavily occluded during surgery. We use a recording system in which multiple cameras are embedded in the surgical lamp, and we assume that at least one camera is recording the target without occlusion at any given time. As the embedded cameras obtain multiple video sequences, we address the task of selecting the camera with the best view of the surgery. Unlike the conventional method, which selects the camera based on the area size of the surgery field, we propose a deep neural network that predicts the camera selection probability from multiple video sequences by learning the supervision of the expert annotation. We created a dataset in which six different types of plastic surgery are recorded, and we provided the annotation of camera switching. Our experiments show that our approach successfully switched between cameras and outperformed three baseline methods.

Submitted 28 March, 2023; originally announced March 2023.

Comments: MICCAI 2020

arXiv:2010.03341 [pdf, other]

doi: 10.1016/j.compbiomed.2021.104596

Deep Learning in Diabetic Foot Ulcers Detection: A Comprehensive Evaluation

Authors: Moi Hoon Yap, Ryo Hachiuma, Azadeh Alavi, Raphael Brungel, Bill Cassidy, Manu Goyal, Hongtao Zhu, Johannes Ruckert, Moshe Olshansky, Xiao Huang, Hideo Saito, Saeed Hassanpour, Christoph M. Friedrich, David Ascher, Anping Song, Hiroki Kajita, David Gillespie, Neil D. Reeves, Joseph Pappachan, Claire O'Shea, Eibe Frank

Abstract: There has been a substantial amount of research involving computer methods and technology for the detection and recognition of diabetic foot ulcers (DFUs), but there is a lack of systematic comparisons of state-of-the-art deep learning object detection frameworks applied to this problem. DFUC2020 provided participants with a comprehensive dataset consisting of 2,000 images for training and 2,000 images for testing. This paper summarises the results of DFUC2020 by comparing the deep learning-based algorithms proposed by the winning teams: Faster R-CNN, three variants of Faster R-CNN and an ensemble method; YOLOv3; YOLOv5; EfficientDet; and a new Cascade Attention Network. For each deep learning method, we provide a detailed description of model architecture, parameter settings for training and additional stages including pre-processing, data augmentation and post-processing. We provide a comprehensive evaluation for each method. All the methods required a data augmentation stage to increase the number of images available for training and a post-processing stage to remove false positives. The best performance was obtained from Deformable Convolution, a variant of Faster R-CNN, with a mean average precision (mAP) of 0.6940 and an F1-Score of 0.7434. Finally, we demonstrate that the ensemble method based on different deep learning methods can enhance the F1-Score but not the mAP.

Submitted 24 May, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

Comments: 19 pages, 18 figures, 10 tables

Journal ref: Computers in Biology and Medicine, Volume 135, 2021, 104596, ISSN 0010-4825

arXiv:1111.5031 [pdf, ps, other]

doi: 10.1103/PhysRevD.85.052007

Supernova Relic Neutrino Search at Super-Kamiokande

Authors: The Super-Kamiokande Collaboration: K. Bays, T. Iida, K. Abe, Y. Hayato, K. Iyogi, J. Kameda, Y. Koshio, L. Marti, M. Miura, S. Moriyama, M. Nakahata, S. Nakayama, Y. Obayashi, H. Sekiya, M. Shiozawa, Y. Suzuki, A. Takeda, Y. Takenaga, K. Ueno, K. Ueshima, S. Yamada, T. Yokozawa, H. Kaji, T. Kajita, K. Kaneyuki, T. McLachlan, K. Okumura, et al. (83 additional authors not shown)

Abstract: A new Super-Kamiokande (SK) search for Supernova Relic Neutrinos (SRNs) was conducted using 2853 live days of data. Sensitivity is now greatly improved compared to the 2003 SK result, which placed a flux limit near many theoretical predictions. This more detailed analysis includes a variety of improvements such as increased efficiency, a lower energy threshold, and an expanded data set. New combined upper limits on SRN flux are between 2.8 and 3.0 nu_e cm^-2 s^-1 > 16 MeV total positron energy (17.3 MeV E_nu).

Submitted 21 November, 2011; originally announced November 2011.

Journal ref: Phys. Rev. D 85, 052007 (2012)

Showing 1–7 of 7 results for author: Kajita, H