Self-Training Boosted Multi-Factor Matching Network for Composed Image Retrieval

Wen, Haokun; Song, Xuemeng; Yin, Jianhua; Wu, Jianlong; Guan, Weili; Nie, Liqiang

doi:10.1109/TPAMI.2023.3346434

arXiv:2305.09979 (cs)

[Submitted on 17 May 2023 (v1), last revised 28 Nov 2024 (this version, v2)]

Abstract: The composed image retrieval (CIR) task aims to retrieve the desired target image for a given multimodal query, i.e., a reference image with its corresponding modification text. Existing efforts have two key limitations: 1) they ignore the multi-faceted query-target matching factors; 2) they ignore the potential unlabeled reference-target image pairs in existing benchmark datasets. Addressing these two limitations is non-trivial due to the following challenges: 1) how to effectively model the multi-faceted matching factors in a latent way without direct supervision signals; 2) how to fully utilize the potential unlabeled reference-target image pairs to improve the generalization ability of the CIR model. To address these challenges, we first propose a muLtI-faceted Matching Network (LIMN), which consists of three key modules: multi-grained image/text encoder, latent factor-oriented feature aggregation, and query-target matching modeling. Thereafter, we design an iterative dual self-training paradigm to further enhance the performance of LIMN by fully utilizing the potential unlabeled reference-target image pairs in a semi-supervised manner. We denote LIMN enhanced by the iterative dual self-training paradigm as LIMN+. Extensive experiments on three real-world datasets, FashionIQ, Shoes, and Birds-to-Words, show that our proposed method significantly surpasses the state-of-the-art baselines.
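
To make the CIR task setup concrete, the following is a minimal sketch of retrieval by similarity ranking, not the LIMN architecture. It assumes that embeddings for the reference image, the modification text, and the candidate images are already available as plain lists of floats. The element-wise sum used in `compose_query` is a hypothetical stand-in for a learned composition module, and all function names are illustrative.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def compose_query(ref_image_emb, text_emb):
    """Fuse the reference image and modification text into one query vector.

    Assumption: an element-wise sum. A real CIR model would use a learned
    composition module here.
    """
    return l2_normalize([a + b for a, b in zip(ref_image_emb, text_emb)])


def rank_candidates(query_emb, candidate_embs):
    """Return candidate indices sorted by descending cosine similarity."""
    scored = []
    for idx, cand in enumerate(candidate_embs):
        cand_unit = l2_normalize(cand)
        score = sum(q * c for q, c in zip(query_emb, cand_unit))
        scored.append((score, idx))
    # Highest score first; ties are broken by the lower index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored]


def recall_at_k(ranking, target_idx, k):
    """1.0 if the target image appears in the top-k results, else 0.0."""
    return 1.0 if target_idx in ranking[:k] else 0.0


# Toy example with 3-dimensional embeddings.
reference_image = [1.0, 0.0, 0.0]
modification_text = [0.0, 1.0, 0.0]
candidates = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

query = compose_query(reference_image, modification_text)
ranking = rank_candidates(query, candidates)
```

In this toy setup the candidate at index 1 points in the same direction as the composed query, so it ranks first. `recall_at_k` mirrors the Recall@K style of metric commonly reported for retrieval tasks.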

Subjects:	Multimedia (cs.MM)
Cite as:	arXiv:2305.09979 [cs.MM]
	(or arXiv:2305.09979v2 [cs.MM] for this version)
	https://doi.org/10.48550/arXiv.2305.09979
Journal reference:	IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 5, pp. 3665-3678, May 2024
Related DOI:	https://doi.org/10.1109/TPAMI.2023.3346434

Submission history

From: Haokun Wen
[v1] Wed, 17 May 2023 06:23:06 UTC (3,960 KB)
[v2] Thu, 28 Nov 2024 07:49:44 UTC (1,989 KB)
