Revisiting Automated Topic Model Evaluation with Large Language Models

Stammbach, Dominik; Zouhar, Vilém; Hoyle, Alexander; Sachan, Mrinmaya; Ash, Elliott

arXiv:2305.12152 (cs)

[Submitted on 20 May 2023 (v1), last revised 22 Oct 2023 (this version, v2)]

Abstract: Topic models are used to make sense of large text collections. However, automatically evaluating topic model output and determining the optimal number of topics have both been longstanding challenges, with no effective automated solutions to date. This paper proposes using large language models to evaluate such output. We find that large language models appropriately assess the resulting topics, correlating more strongly with human judgments than existing automated metrics. We then investigate whether we can use large language models to automatically determine the optimal number of topics. We automatically assign labels to documents, and choosing the configurations with the purest labels returns reasonable values for the optimal number of topics.
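
The last step of the abstract, picking the configuration whose document labels are purest, can be illustrated with a minimal sketch. It assumes the textbook definition of cluster purity (the share of documents whose label matches the majority label of their topic); the abstract does not spell out the exact measure, so the paper's may differ. The names `purity`, `best_num_topics`, and `runs` are hypothetical, and the labels stand in for whatever a language model would assign to each document.

```python
from collections import Counter, defaultdict


def purity(topic_ids, labels):
    """Share of documents whose label equals the majority label of their topic.

    Assumption: standard cluster purity, not necessarily the paper's measure.
    """
    if len(topic_ids) != len(labels) or not labels:
        raise ValueError("topic_ids and labels must be non-empty and equal in length")
    counts_by_topic = defaultdict(Counter)
    for topic, label in zip(topic_ids, labels):
        counts_by_topic[topic][label] += 1
    # For each topic, count the documents that carry its most frequent label.
    majority_total = sum(c.most_common(1)[0][1] for c in counts_by_topic.values())
    return majority_total / len(labels)


def best_num_topics(runs):
    """Return the number of topics k whose run has the highest purity.

    `runs` (hypothetical structure) maps k to a (topic_ids, labels) pair,
    one entry per sampled document.
    """
    return max(runs, key=lambda k: purity(*runs[k]))


# Toy data: two configurations scored on four labeled documents each.
runs = {
    2: ([0, 0, 1, 1], ["sports", "sports", "politics", "politics"]),
    3: ([0, 1, 2, 2], ["sports", "politics", "politics", "sports"]),
}
print(best_num_topics(runs))
```

Purity tends to rise as topics become smaller, since a topic holding a single document is trivially pure, so a sketch like this is only meaningful when the candidate configurations are compared on the same sample of documents.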

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2305.12152 [cs.CL]
	(or arXiv:2305.12152v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.12152
Journal reference:	Forthcoming in EMNLP 2023

Submission history

From: Dominik Stammbach
[v1] Sat, 20 May 2023 09:42:00 UTC (119 KB)
[v2] Sun, 22 Oct 2023 09:46:13 UTC (196 KB)
