FashionFAE: Fine-grained Attributes Enhanced Fashion Vision-Language Pre-training

Huang, Jiale; Gao, Dehong; Zhang, Jinxia; Zhan, Zechao; Hu, Yang; Wang, Xin

arXiv:2412.19997 (cs)

[Submitted on 28 Dec 2024 (v1), last revised 12 Jan 2025 (this version, v2)]

Abstract: Large-scale Vision-Language Pre-training (VLP) has demonstrated remarkable success in the general domain. However, in the fashion domain, items are distinguished by fine-grained attributes like texture and material, which are crucial for tasks such as retrieval. Existing models often fail to leverage these fine-grained attributes from both text and image modalities. To address these issues, we propose a novel approach for the fashion domain, Fine-grained Attributes Enhanced VLP (FashionFAE), which focuses on the detailed characteristics of fashion data. An attribute-emphasized text prediction task is proposed to predict fine-grained attributes of the items, which forces the model to focus on the salient attributes from the text modality. Additionally, a novel attribute-promoted image reconstruction task is proposed, which further enhances the fine-grained ability of the model by leveraging the representative attributes from the image modality. Extensive experiments show that FashionFAE significantly outperforms state-of-the-art (SOTA) methods, achieving 2.9% and 5.2% improvements in retrieval on the sub-test and full test sets, respectively, and a 1.6% average improvement in recognition tasks.

Comments:	5 pages, Accepted by ICASSP2025, full paper
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2412.19997 [cs.CV]
	(or arXiv:2412.19997v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.19997

Submission history

From: JiaLe Huang
[v1] Sat, 28 Dec 2024 03:45:49 UTC (1,872 KB)
[v2] Sun, 12 Jan 2025 07:27:03 UTC (2,064 KB)
