AToMiC: An Image/Text Retrieval Test Collection to Support Multimedia Content Creation

Yang, Jheng-Hong; Lassance, Carlos; de Rezende, Rafael Sampaio; Srinivasan, Krishna; Redi, Miriam; Clinchant, Stéphane; Lin, Jimmy

arXiv:2304.01961 (cs)

[Submitted on 4 Apr 2023]

Abstract: This paper presents the AToMiC (Authoring Tools for Multimedia Content) dataset, designed to advance research in image/text cross-modal retrieval. While vision-language pretrained transformers have led to significant improvements in retrieval effectiveness, existing research has relied on image-caption datasets that feature only simplistic image-text relationships and underspecified user models of retrieval tasks. To address the gap between these oversimplified settings and real-world applications for multimedia content creation, we introduce a new approach for building retrieval test collections. We leverage hierarchical structures and diverse domains of texts, styles, and types of images, as well as large-scale image-document associations embedded in Wikipedia. We formulate two tasks based on a realistic user model and validate our dataset through retrieval experiments using baseline models. AToMiC offers a testbed for scalable, diverse, and reproducible multimedia retrieval research. Finally, the dataset provides the basis for a dedicated track at the 2023 Text Retrieval Conference (TREC), and is publicly available at this https URL.

Subjects:	Information Retrieval (cs.IR); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2304.01961 [cs.IR]
	(or arXiv:2304.01961v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2304.01961

Submission history

From: Jheng-Hong Yang
[v1] Tue, 4 Apr 2023 17:11:34 UTC (30,356 KB)
