Tracking Everything in Robotic-Assisted Surgery

Zhan, Bohan; Zhao, Wang; Fang, Yi; Du, Bo; Vasconcelos, Francisco; Stoyanov, Danail; Elson, Daniel S.; Huang, Baoru


arXiv:2409.19821 (cs)

[Submitted on 29 Sep 2024 (v1), last revised 20 Mar 2025 (this version, v2)]



Abstract: Accurate tracking of tissues and instruments in videos is crucial for Robotic-Assisted Minimally Invasive Surgery (RAMIS), as it enables the robot to comprehend the surgical scene with precise locations and interactions of tissues and tools. Traditional keypoint-based sparse tracking is limited to distinctive feature points, while flow-based dense two-view matching suffers from long-term drift. Recently, the Tracking Any Point (TAP) algorithm was proposed to overcome these limitations and achieve dense, accurate, long-term tracking. However, its efficacy in surgical scenarios remains untested, largely due to the lack of a comprehensive surgical tracking dataset for evaluation. To address this gap, we introduce a new annotated surgical tracking dataset for benchmarking tracking methods in surgical scenarios, comprising real-world surgical videos with complex tissue and instrument motions. We extensively evaluate state-of-the-art (SOTA) TAP-based algorithms on this dataset and reveal their limitations in challenging surgical scenarios, including fast instrument motion, severe occlusions, and motion blur. Furthermore, we propose a new tracking method, SurgMotion, to address these challenges and further improve tracking performance. Our proposed method outperforms most TAP-based algorithms in surgical instrument tracking, and demonstrates especially significant improvements over baselines in challenging medical videos. Our code and dataset are available at this https URL.

Comments:	7 pages
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2409.19821 [cs.CV]
	(or arXiv:2409.19821v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2409.19821

Submission history

From: Baoru Huang
[v1] Sun, 29 Sep 2024 23:06:57 UTC (15,146 KB)
[v2] Thu, 20 Mar 2025 19:50:04 UTC (15,146 KB)
