Detect Everything with Few Examples

Zhang, Xinyu; Liu, Yuhan; Wang, Yuting; Boularias, Abdeslam

arXiv:2309.12969 (cs)

[Submitted on 22 Sep 2023 (v1), last revised 2 Oct 2024 (this version, v4)]

Abstract: Few-shot object detection aims at detecting novel categories given only a few example images. It is a basic skill for a robot to perform tasks in open environments. Recent methods focus on finetuning strategies, with complicated procedures that hinder wider application. In this paper, we introduce DE-ViT, a few-shot object detector that does not need finetuning. DE-ViT's novel architecture is based on a new region-propagation mechanism for localization. The propagated region masks are transformed into bounding boxes through a learnable spatial integral layer. Instead of training prototype classifiers, we propose to use prototypes to project ViT features into a subspace that is robust to overfitting on base classes. We evaluate DE-ViT on few-shot and one-shot object detection benchmarks with Pascal VOC, COCO, and LVIS. DE-ViT establishes new state-of-the-art results on all benchmarks. Notably, for COCO, DE-ViT surpasses the few-shot SoTA by 15 mAP on 10-shot and 7.2 mAP on 30-shot, and the one-shot SoTA by 2.8 AP50. For LVIS, DE-ViT outperforms the few-shot SoTA by 17 box APr. Further, we evaluate DE-ViT with a real robot by building a pick-and-place system for sorting novel objects based on example images. The videos of our robot demonstrations, the source code and the models of DE-ViT can be found at this https URL.
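
The idea of using prototypes to project features, rather than training a classifier on them, can be illustrated with a toy sketch. This is not DE-ViT's implementation: it assumes that a class prototype is simply the mean of a few example feature vectors and that "projection" means re-expressing a feature by its cosine similarity to each prototype. The function names and the 3-dimensional toy features are hypothetical; real ViT features would be much higher-dimensional.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def class_prototype(example_features):
    """Assumption: a prototype is the mean of a class's example features."""
    count = len(example_features)
    dim = len(example_features[0])
    return [sum(f[i] for f in example_features) / count for i in range(dim)]


def project_onto_prototypes(feature, prototypes):
    """Re-express a feature as its cosine similarity to each prototype.

    The output has one entry per class, so its size depends on the number
    of prototypes rather than on the original feature dimension.
    """
    unit_feature = l2_normalize(feature)
    return [
        sum(a * b for a, b in zip(unit_feature, l2_normalize(proto)))
        for proto in prototypes
    ]


# Toy stand-ins for features extracted from a few example images per class.
support = {
    "mug": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "box": [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]],
}
class_names = sorted(support)
prototypes = [class_prototype(support[name]) for name in class_names]

# A query region feature is mapped into the prototype subspace; no weights
# are trained, so adding a class only requires adding its prototype.
query = [0.9, 0.1, 0.0]
projected = project_onto_prototypes(query, prototypes)
predicted = class_names[max(range(len(projected)), key=projected.__getitem__)]
print(predicted, projected)
```

In this sketch, supporting a novel category means appending one more prototype computed from its example images, which is the property that makes a finetuning-free detector plausible. How DE-ViT actually builds and uses its prototypes is described in the paper itself.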

Comments:	CoRL 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2309.12969 [cs.CV]
	(or arXiv:2309.12969v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2309.12969

Submission history

From: Xinyu Zhang
[v1] Fri, 22 Sep 2023 16:07:16 UTC (6,974 KB)
[v2] Tue, 21 Nov 2023 16:27:11 UTC (33,580 KB)
[v3] Thu, 7 Mar 2024 12:43:21 UTC (38,129 KB)
[v4] Wed, 2 Oct 2024 19:26:18 UTC (42,635 KB)
