-
CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection
Authors:
Chinedu Innocent Nwoye,
Tong Yu,
Saurav Sharma,
Aditya Murali,
Deepak Alapatt,
Armine Vardazaryan,
Kun Yuan,
Jonas Hajek,
Wolfgang Reiter,
Amine Yamlahi,
Finn-Henri Smidt,
Xiaoyang Zou,
Guoyan Zheng,
Bruno Oliveira,
Helena R. Torres,
Satoshi Kondo,
Satoshi Kasai,
Felix Holm,
Ege Özsoy,
Shuangchun Gui,
Han Li,
Sista Raviteja,
Rachana Sathish,
Pranav Poudel,
Binod Bhattarai
, et al. (24 additional authors not shown)
Abstract:
Formalizing surgical activities as triplets of the used instruments, actions performed, and target anatomies is becoming a gold standard approach for surgical activity modeling. The benefit is that this formalization helps to obtain a more detailed understanding of tool-tissue interaction which can be used to develop better Artificial Intelligence assistance for image-guided surgery. Earlier effor…
▽ More
Formalizing surgical activities as triplets of the used instruments, actions performed, and target anatomies is becoming a gold standard approach for surgical activity modeling. The benefit is that this formalization helps to obtain a more detailed understanding of tool-tissue interaction which can be used to develop better Artificial Intelligence assistance for image-guided surgery. Earlier efforts and the CholecTriplet challenge introduced in 2021 have put together techniques aimed at recognizing these triplets from surgical footage. Estimating also the spatial locations of the triplets would offer a more precise intraoperative context-aware decision support for computer-assisted intervention. This paper presents the CholecTriplet2022 challenge, which extends surgical action triplet modeling from recognition to detection. It includes weakly-supervised bounding box localization of every visible surgical instrument (or tool), as the key actors, and the modeling of each tool-activity in the form of <instrument, verb, target> triplet. The paper describes a baseline method and 10 new deep learning algorithms presented at the challenge to solve the task. It also provides thorough methodological comparisons of the methods, an in-depth analysis of the obtained results across multiple metrics, visual and procedural challenges; their significance, and useful insights for future research directions and applications in surgery.
△ Less
Submitted 14 July, 2023; v1 submitted 13 February, 2023;
originally announced February 2023.
-
2020 CATARACTS Semantic Segmentation Challenge
Authors:
Imanol Luengo,
Maria Grammatikopoulou,
Rahim Mohammadi,
Chris Walsh,
Chinedu Innocent Nwoye,
Deepak Alapatt,
Nicolas Padoy,
Zhen-Liang Ni,
Chen-Chen Fan,
Gui-Bin Bian,
Zeng-Guang Hou,
Heonjin Ha,
Jiacheng Wang,
Haojie Wang,
Dong Guo,
Lu Wang,
Guotai Wang,
Mobarakol Islam,
Bharat Giddwani,
Ren Hongliang,
Theodoros Pissas,
Claudio Ravasio,
Martin Huber,
Jeremy Birch,
Joan M. Nunez Do Rio
, et al. (15 additional authors not shown)
Abstract:
Surgical scene segmentation is essential for anatomy and instrument localization which can be further used to assess tissue-instrument interactions during a surgical procedure. In 2017, the Challenge on Automatic Tool Annotation for cataRACT Surgery (CATARACTS) released 50 cataract surgery videos accompanied by instrument usage annotations. These annotations included frame-level instrument presenc…
▽ More
Surgical scene segmentation is essential for anatomy and instrument localization which can be further used to assess tissue-instrument interactions during a surgical procedure. In 2017, the Challenge on Automatic Tool Annotation for cataRACT Surgery (CATARACTS) released 50 cataract surgery videos accompanied by instrument usage annotations. These annotations included frame-level instrument presence information. In 2020, we released pixel-wise semantic annotations for anatomy and instruments for 4670 images sampled from 25 videos of the CATARACTS training set. The 2020 CATARACTS Semantic Segmentation Challenge, which was a sub-challenge of the 2020 MICCAI Endoscopic Vision (EndoVis) Challenge, presented three sub-tasks to assess participating solutions on anatomical structure and instrument segmentation. Their performance was assessed on a hidden test set of 531 images from 10 videos of the CATARACTS test set.
△ Less
Submitted 24 February, 2022; v1 submitted 21 October, 2021;
originally announced October 2021.
-
Comparative Validation of Machine Learning Algorithms for Surgical Workflow and Skill Analysis with the HeiChole Benchmark
Authors:
Martin Wagner,
Beat-Peter Müller-Stich,
Anna Kisilenko,
Duc Tran,
Patrick Heger,
Lars Mündermann,
David M Lubotsky,
Benjamin Müller,
Tornike Davitashvili,
Manuela Capek,
Annika Reinke,
Tong Yu,
Armine Vardazaryan,
Chinedu Innocent Nwoye,
Nicolas Padoy,
Xinyang Liu,
Eung-Joo Lee,
Constantin Disch,
Hans Meine,
Tong Xia,
Fucang Jia,
Satoshi Kondo,
Wolfgang Reiter,
Yueming Jin,
Yonghao Long
, et al. (16 additional authors not shown)
Abstract:
PURPOSE: Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase the safety of the operation through context-sensitive warnings and semi-autonomous robotic assistance or improve training of surgeons via data-driven feedback. In surgical workflow analysis up to 91% average precision has been reported fo…
▽ More
PURPOSE: Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase the safety of the operation through context-sensitive warnings and semi-autonomous robotic assistance or improve training of surgeons via data-driven feedback. In surgical workflow analysis up to 91% average precision has been reported for phase recognition on an open data single-center dataset. In this work we investigated the generalizability of phase recognition algorithms in a multi-center setting including more difficult recognition tasks such as surgical action and surgical skill. METHODS: To achieve this goal, a dataset with 33 laparoscopic cholecystectomy videos from three surgical centers with a total operation time of 22 hours was created. Labels included annotation of seven surgical phases with 250 phase transitions, 5514 occurences of four surgical actions, 6980 occurences of 21 surgical instruments from seven instrument categories and 495 skill classifications in five skill dimensions. The dataset was used in the 2019 Endoscopic Vision challenge, sub-challenge for surgical workflow and skill analysis. Here, 12 teams submitted their machine learning algorithms for recognition of phase, action, instrument and/or skill assessment. RESULTS: F1-scores were achieved for phase recognition between 23.9% and 67.7% (n=9 teams), for instrument presence detection between 38.5% and 63.8% (n=8 teams), but for action recognition only between 21.8% and 23.3% (n=5 teams). The average absolute error for skill assessment was 0.78 (n=1 team). CONCLUSION: Surgical workflow and skill analysis are promising technologies to support the surgical team, but are not solved yet, as shown by our comparison of algorithms. This novel benchmark can be used for comparable evaluation and validation of future work.
△ Less
Submitted 30 September, 2021;
originally announced September 2021.
-
Recognition of Instrument-Tissue Interactions in Endoscopic Videos via Action Triplets
Authors:
Chinedu Innocent Nwoye,
Cristians Gonzalez,
Tong Yu,
Pietro Mascagni,
Didier Mutter,
Jacques Marescaux,
Nicolas Padoy
Abstract:
Recognition of surgical activity is an essential component to develop context-aware decision support for the operating room. In this work, we tackle the recognition of fine-grained activities, modeled as action triplets <instrument, verb, target> representing the tool activity. To this end, we introduce a new laparoscopic dataset, CholecT40, consisting of 40 videos from the public dataset Cholec80…
▽ More
Recognition of surgical activity is an essential component to develop context-aware decision support for the operating room. In this work, we tackle the recognition of fine-grained activities, modeled as action triplets <instrument, verb, target> representing the tool activity. To this end, we introduce a new laparoscopic dataset, CholecT40, consisting of 40 videos from the public dataset Cholec80 in which all frames have been annotated using 128 triplet classes. Furthermore, we present an approach to recognize these triplets directly from the video data. It relies on a module called Class Activation Guide (CAG), which uses the instrument activation maps to guide the verb and target recognition. To model the recognition of multiple triplets in the same frame, we also propose a trainable 3D Interaction Space, which captures the associations between the triplet components. Finally, we demonstrate the significance of these contributions via several ablation studies and comparisons to baselines on CholecT40.
△ Less
Submitted 10 July, 2020;
originally announced July 2020.