Study on LLMs for Promptagator-Style Dense Retriever Training

Gwon, Daniel; Jedidi, Nour; Lin, Jimmy

arXiv:2510.02241 (cs)

[Submitted on 2 Oct 2025]

Abstract: Promptagator demonstrated that Large Language Models (LLMs) with few-shot prompts can be used as task-specific query generators for fine-tuning domain-specialized dense retrieval models. However, the original Promptagator approach relied on large, proprietary LLMs, which users may not have access to or may be prohibited from using with sensitive data. In this work, we study the impact of open-source LLMs at accessible scales ($\leq$14B parameters) as an alternative. Our results demonstrate that open-source LLMs as small as 3B parameters can serve as effective Promptagator-style query generators. We hope our work will provide practitioners with reliable alternatives for synthetic data generation and offer insights for maximizing fine-tuning results in domain-specific applications.

Comments:	CIKM 2025 short research paper
Subjects:	Information Retrieval (cs.IR); Computation and Language (cs.CL)
Cite as:	arXiv:2510.02241 [cs.IR]
	(or arXiv:2510.02241v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2510.02241

Submission history

From: Nour Jedidi
[v1] Thu, 2 Oct 2025 17:29:51 UTC (79 KB)
