Do Large Language Models Rank Fairly? An Empirical Study on the Fairness of LLMs as Rankers

Wang, Yuan; Wu, Xuyang; Wu, Hsin-Tai; Tao, Zhiqiang; Fang, Yi

arXiv:2404.03192 (cs)

[Submitted on 4 Apr 2024 (v1), last revised 25 Jun 2024 (this version, v2)]

Abstract: The integration of Large Language Models (LLMs) into information retrieval has prompted a critical reevaluation of fairness in text-ranking models. LLMs such as the GPT models and Llama2 have proven effective in natural language understanding tasks, and prior work (e.g., RankGPT) has demonstrated that LLMs outperform traditional ranking models on ranking tasks. However, their fairness remains largely unexplored. This paper presents an empirical study evaluating these LLMs on the TREC Fair Ranking dataset, focusing on the representation of binary protected attributes such as gender and geographic location, which are historically underrepresented in search outcomes. Our analysis examines how these LLMs handle queries and documents related to these attributes, aiming to uncover biases in their ranking algorithms. We assess fairness from both user and content perspectives, contributing an empirical benchmark for evaluating LLMs as fair rankers.

Comments:	Accepted at NAACL 2024 Main Conference
Subjects:	Information Retrieval (cs.IR); Computation and Language (cs.CL)
Cite as:	arXiv:2404.03192 [cs.IR]
	(or arXiv:2404.03192v2 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2404.03192

Submission history

From: Xuyang Wu
[v1] Thu, 4 Apr 2024 04:23:19 UTC (8,149 KB)
[v2] Tue, 25 Jun 2024 20:54:16 UTC (8,150 KB)
