SurgiTrack: Fine-Grained Multi-Class Multi-Tool Tracking in Surgical Videos

Nwoye, Chinedu Innocent; Padoy, Nicolas

doi:10.1016/j.media.2024.103438

arXiv:2405.20333 (cs)

[Submitted on 30 May 2024 (v1), last revised 27 Dec 2024 (this version, v3)]

Abstract: Accurate tool tracking is essential for the success of computer-assisted intervention. Previous efforts often modeled tool trajectories rigidly, overlooking the dynamic nature of surgical procedures, especially tracking scenarios such as out-of-body and out-of-camera views. Addressing this limitation, the new CholecTrack20 dataset provides detailed labels that account for multiple tool trajectories in three perspectives: (1) intraoperative, (2) intracorporeal, and (3) visibility, representing the different types of temporal duration of tool tracks. These fine-grained labels enhance tracking flexibility but also increase task complexity. Re-identifying tools after occlusion or re-insertion into the body remains challenging due to high visual similarity, especially among tools of the same category. This work recognizes the critical role of the tool operators in distinguishing tool track instances, especially those belonging to the same tool category. The operators' information is, however, not explicitly captured in surgical videos. We therefore propose SurgiTrack, a novel deep learning method that leverages YOLOv7 for precise tool detection and employs an attention mechanism to model the originating direction of the tools, as a proxy for their operators, for tool re-identification. To handle diverse tool trajectory perspectives, SurgiTrack employs a harmonizing bipartite matching graph, minimizing conflicts and ensuring accurate tool identity association. Experimental results on CholecTrack20 demonstrate SurgiTrack's effectiveness, outperforming baselines and state-of-the-art methods with real-time inference capability. This work sets a new standard in surgical tool tracking, providing dynamic trajectories for more adaptable and precise assistance in minimally invasive surgeries.
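The association step mentioned in the abstract can be illustrated with a generic minimum-cost bipartite matching between existing tracks and new detections. The sketch below is not the authors' harmonizing bipartite matching graph. It is a simplified stand-in that assumes each tool already carries a class label, a bounding box, and an estimated originating direction in degrees (in the paper this direction is modeled by an attention mechanism; here it is simply given). The field names `cls`, `dir`, and `box`, the weight `w_dir`, and the gate `max_cost` are all hypothetical, and brute-force enumeration is used in place of a Hungarian solver because only a handful of tools are visible at once.

```python
from itertools import permutations
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def angle_gap(a, b):
    """Smallest absolute difference between two angles (degrees), scaled to [0, 1]."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d) / 180.0


def pair_cost(track, det, w_dir=0.5):
    """Hypothetical cost: direction mismatch blended with box dissimilarity."""
    if track["cls"] != det["cls"]:
        return math.inf  # never link tools of different categories
    box_term = 1.0 - iou(track["box"], det["box"])
    return w_dir * angle_gap(track["dir"], det["dir"]) + (1.0 - w_dir) * box_term


def associate(tracks, dets, max_cost=0.7):
    """Return {track_index: detection_index} with minimum total cost.

    Each track may also stay unmatched at a fixed price of max_cost,
    modeled by padding the detections with one dummy slot per track.
    """
    n, m = len(tracks), len(dets)
    best, best_total = {}, math.inf
    for perm in permutations(range(m + n), n):
        total, assign = 0.0, {}
        for t, c in enumerate(perm):
            if c >= m:  # dummy slot: track t is left unmatched
                total += max_cost
                continue
            cost = pair_cost(tracks[t], dets[c])
            if cost > max_cost:  # gated out: this pairing is not allowed
                total = math.inf
                break
            total += cost
            assign[t] = c
        if total < best_total:
            best_total, best = total, assign
    return best


# Two tools of the same category with heavily overlapping boxes,
# one entering from the left (about 180 degrees), one from the right (about 0).
tracks = [
    {"cls": 0, "dir": 170.0, "box": (100, 100, 200, 200)},
    {"cls": 0, "dir": 10.0, "box": (120, 110, 220, 210)},
]
dets = [
    {"cls": 0, "dir": 5.0, "box": (110, 105, 210, 205)},
    {"cls": 0, "dir": 175.0, "box": (112, 104, 212, 204)},
]
matches = associate(tracks, dets)
print(matches)
```

In this toy case the boxes alone barely separate the two tools, so the direction term decides the assignment, which mirrors the abstract's motivation for using the originating direction as a proxy for the operator.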

Comments:	15 pages, 7 figures, 7 tables, 1 video. Supplementary video available at: this https URL. Article published in Medical Image Analysis Journal 2025.
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2405.20333 [cs.CV]
	(or arXiv:2405.20333v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.20333
Journal reference:	Medical Image Analysis, Volume 101, Article 103438 (April 2025)
Related DOI:	https://doi.org/10.1016/j.media.2024.103438

Submission history

From: Chinedu Nwoye
[v1] Thu, 30 May 2024 17:59:10 UTC (1,341 KB)
[v2] Sun, 8 Dec 2024 23:30:46 UTC (1,357 KB)
[v3] Fri, 27 Dec 2024 09:24:45 UTC (1,357 KB)
