Human-in-the-Loop Segmentation of Multi-species Coral Imagery

Raine, Scarlett; Marchant, Ross; Kusy, Brano; Maire, Frederic; Suenderhauf, Niko; Fischer, Tobias

Computer Science > Computer Vision and Pattern Recognition

arXiv:2404.09406 (cs)

[Submitted on 15 Apr 2024 (v1), last revised 12 Nov 2024 (this version, v3)]




Abstract: Marine surveys by robotic underwater and surface vehicles produce substantial quantities of coral reef imagery; however, labeling these images is expensive and time-consuming for domain experts. Point label propagation is a technique that uses existing images labeled with sparse points to create augmented ground truth data, which can then be used to train a semantic segmentation model. In this work, we show that recent advances in large foundation models facilitate the creation of augmented ground truth masks using only features extracted by the denoised version of the DINOv2 foundation model and K-Nearest Neighbors (KNN), without any pre-training. For images with extremely sparse labels, we present a labeling method based on human-in-the-loop principles that greatly enhances annotation efficiency: with 5 point labels per image, our human-in-the-loop method outperforms the prior state of the art by 14.2% for pixel accuracy and 19.7% for mIoU, and with 10 point labels per image it does so by 8.9% and 18.3%, respectively. When human-in-the-loop labeling is not available, using the denoised DINOv2 features with a KNN still improves on the prior state of the art by 2.7% for pixel accuracy and 5.8% for mIoU (5 grid points). On the semantic segmentation task, we outperform the prior state of the art by 8.8% for pixel accuracy and by 13.5% for mIoU when only 5 point labels are used for point label propagation. Additionally, we perform a comprehensive study of how the point label placement style and the number of points affect point label propagation quality, and make several recommendations for improving the efficiency of labeling images with points.
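The abstract describes propagation only at a high level: per-pixel features from a foundation model paired with a KNN classifier, evaluated by pixel accuracy and mIoU. The following is a minimal NumPy sketch of that general idea, not the authors' implementation. It assumes the per-pixel feature map has already been extracted (DINOv2 feature extraction, denoising, and upsampling are not shown), uses Euclidean distance with a majority vote over the k nearest labeled points, and computes mIoU over classes present in either mask. The function names `propagate_point_labels`, `pixel_accuracy`, and `mean_iou` are hypothetical.

```python
import numpy as np


def propagate_point_labels(features, points, labels, k=1):
    """Densify sparse point labels with a k-nearest-neighbour vote in feature space.

    Hypothetical helper, a sketch only.
    features: (H, W, D) per-pixel feature map, assumed precomputed
              (e.g. foundation-model features upsampled to image size).
    points:   (N, 2) integer (row, col) coordinates of the labeled pixels.
    labels:   (N,) non-negative integer class IDs of those pixels.
    Returns an (H, W) array of class IDs.
    """
    h, w, d = features.shape
    flat = features.reshape(-1, d)                      # (H*W, D)
    point_feats = features[points[:, 0], points[:, 1]]  # (N, D)
    # Squared Euclidean distance from every pixel to every labeled point.
    dists = ((flat[:, None, :] - point_feats[None, :, :]) ** 2).sum(axis=-1)
    k = min(k, len(labels))
    nearest = np.argsort(dists, axis=1)[:, :k]          # indices of the k closest points
    neighbour_labels = labels[nearest]                  # (H*W, k)
    # Majority vote; argmax breaks ties in favour of the smaller class ID.
    votes = np.zeros((flat.shape[0], labels.max() + 1), dtype=int)
    np.add.at(votes, (np.arange(flat.shape[0])[:, None], neighbour_labels), 1)
    return votes.argmax(axis=1).reshape(h, w)


def pixel_accuracy(pred, gt):
    """Fraction of pixels whose predicted class matches the ground truth."""
    return float((pred == gt).mean())


def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union, skipping classes absent from both masks."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))


# Toy example: a 4x6 image whose left half (class 0) has feature 0.0
# and right half (class 1) has feature 1.0, with one point label per class.
gt = np.zeros((4, 6), dtype=int)
gt[:, 3:] = 1
features = gt[..., None].astype(float)              # (4, 6, 1)
points = np.array([[1, 1], [2, 4]])
point_labels = gt[points[:, 0], points[:, 1]]       # [0, 1]
pred = propagate_point_labels(features, points, point_labels, k=1)
print(pixel_accuracy(pred, gt), mean_iou(pred, gt, num_classes=2))  # 1.0 1.0
```

The brute-force distance matrix costs memory proportional to H·W·N·D, which stays small when each image carries only a handful of point labels; larger label sets would call for a spatial index or a dedicated nearest-neighbour library instead.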

Comments:	Journal article preprint of extended paper, 30 pages, 11 figures. Original conference paper (v2) accepted at the CVPR2024 3rd Workshop on Learning with Limited Labelled Data for Image and Video Understanding (L3D-IVU)
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Robotics (cs.RO)
Cite as:	arXiv:2404.09406 [cs.CV]
	(or arXiv:2404.09406v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2404.09406

Submission history

From: Scarlett Raine Dr
[v1] Mon, 15 Apr 2024 01:47:44 UTC (6,520 KB)
[v2] Tue, 16 Apr 2024 05:58:39 UTC (6,520 KB)
[v3] Tue, 12 Nov 2024 04:37:47 UTC (7,185 KB)
