Enhancing DETRs Variants through Improved Content Query and Similar Query Aggregation

Zhang, Yingying; Shi, Chuangji; Guo, Xin; Lao, Jiangwei; Wang, Jian; Wang, Jiaotuan; Chen, Jingdong

arXiv:2405.03318 (cs)

[Submitted on 6 May 2024]

Abstract: The design of the query is crucial for the performance of DETR and its variants. Each query consists of two components: a content part and a positional part. Traditionally, the content query is initialized with a zero or learnable embedding, which lacks essential content information and results in sub-optimal performance. In this paper, we introduce a novel plug-and-play module, Self-Adaptive Content Query (SACQ), to address this limitation. The SACQ module uses features from the transformer encoder to generate content queries via self-attention pooling. This allows candidate queries to adapt to the input image, resulting in a more comprehensive content prior and better focus on target objects. However, this improved concentration poses a challenge for training with Hungarian matching, which selects only a single candidate and suppresses other similar ones. To overcome this, we propose a query aggregation strategy to cooperate with SACQ. It merges similar predicted candidates from different queries, easing the optimization. Our extensive experiments on the COCO dataset demonstrate the effectiveness of the proposed approaches across six different DETR variants with multiple configurations, achieving an average improvement of over 1.0 AP.
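
The abstract names two mechanisms without giving their details: pooling encoder features into content queries, and merging similar predicted candidates. The following is a minimal illustrative sketch of both ideas in plain Python, not the authors' implementation. The function names, the per-query scoring vectors `score_weights`, the IoU `threshold`, and the choice to average merged boxes are all assumptions made for illustration.

```python
import math


def attention_pool(features, score_weights):
    """Pool N encoder tokens into K content queries (illustrative only).

    features:      N tokens, each a list of D floats (encoder output).
    score_weights: K vectors of D floats (hypothetical learnable parameters).
    Returns K queries, each a softmax-weighted average of the tokens.
    """
    dim = len(features[0])
    queries = []
    for w in score_weights:
        # One scalar score per token, then a softmax over the N tokens.
        scores = [sum(f_d * w_d for f_d, w_d in zip(f, w)) for f in features]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        attn = [e / total for e in exps]
        # Attention-weighted sum of token features gives one content query.
        queries.append(
            [sum(a * f[d] for a, f in zip(attn, features)) for d in range(dim)]
        )
    return queries


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def aggregate_similar(boxes, threshold=0.8):
    """Greedily group boxes by IoU with each group's first box, then average.

    The threshold and the averaging rule are assumptions, not from the paper.
    """
    groups = []
    for box in boxes:
        for group in groups:
            if iou(group[0], box) >= threshold:
                group.append(box)
                break
        else:
            groups.append([box])
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups]
```

In this sketch each content query is a convex combination of encoder tokens, so it depends on the input image rather than being a fixed embedding, and near-duplicate predictions are collapsed into a single candidate before matching.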

Comments:	11 pages, 7 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Cite as:	arXiv:2405.03318 [cs.CV]
	(or arXiv:2405.03318v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.03318

Submission history

From: Yingying Zhang
[v1] Mon, 6 May 2024 09:50:04 UTC (7,494 KB)
