Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs

Cheng, Ruoxi; Ma, Haoxuan; Cao, Shuirong; Li, Jiaqi; Pei, Aihua; Wang, Zhiqiang; Ji, Pengliang; Wang, Haoyu; Huo, Jiaqi

arXiv:2404.10160 (cs)

[Submitted on 15 Apr 2024 (v1), last revised 16 Aug 2024 (this version, v6)]

Abstract: Bias in LLMs can harm user experience and societal outcomes. However, current bias mitigation methods often require intensive human feedback, lack transferability to other topics, or yield overconfident and random outputs. We find that involving LLMs in role-playing scenarios boosts their ability to recognize and mitigate biases. Based on this, we propose Reinforcement Learning from Multi-role Debates as Feedback (RLDF), a novel approach for bias mitigation that replaces the human feedback used in traditional RLHF. We utilize LLMs in multi-role debates to create a dataset that includes both high-bias and low-bias instances for training the reward model in reinforcement learning. Our approach comprises two modes: (1) self-reflection, where the same LLM participates in multi-role debates, and (2) teacher-student, where a more advanced LLM like GPT-3.5-turbo guides the LLM to perform this task. Experimental results across different LLMs on BBQ and our datasets demonstrate the effectiveness of our approach in bias mitigation. Our source code and datasets are available at this https URL.
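
As a rough illustration of how a dataset of high-bias and low-bias instances could feed a reward model, the sketch below pairs responses to the same prompt and scores each pair with a Bradley-Terry style pairwise loss. The abstract does not specify the record format, the pairing rule, or the loss, so the record fields, the `margin` threshold, and the function names `build_preference_pairs` and `pairwise_loss` are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical debate output: (prompt, response, bias_score in [0, 1]).
# Field layout and scoring scale are assumptions, not taken from the paper.
DEBATE_RECORDS = [
    ("Who is better at math?", "Ability depends on the individual, not the group.", 0.1),
    ("Who is better at math?", "Group A is naturally better.", 0.9),
    ("Who should be hired?", "The candidate with the strongest qualifications.", 0.2),
    ("Who should be hired?", "Definitely the younger one.", 0.8),
]


def build_preference_pairs(records, margin=0.3):
    """Return (prompt, chosen, rejected) triples; chosen has the lower bias score."""
    by_prompt = defaultdict(list)
    for prompt, response, bias in records:
        by_prompt[prompt].append((response, bias))

    pairs = []
    for prompt, items in by_prompt.items():
        for (resp_a, bias_a), (resp_b, bias_b) in combinations(items, 2):
            if abs(bias_a - bias_b) < margin:
                continue  # bias scores too close to rank reliably (assumed rule)
            if bias_a < bias_b:
                pairs.append((prompt, resp_a, resp_b))
            else:
                pairs.append((prompt, resp_b, resp_a))
    return pairs


def pairwise_loss(reward_chosen, reward_rejected):
    """Bradley-Terry loss: -log(sigmoid(reward_chosen - reward_rejected))."""
    diff = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-diff)).
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


if __name__ == "__main__":
    for prompt, chosen, rejected in build_preference_pairs(DEBATE_RECORDS):
        print(f"{prompt!r}: prefer {chosen!r} over {rejected!r}")
    # The loss shrinks as the reward model scores the low-bias response higher.
    print(pairwise_loss(2.0, 0.0), pairwise_loss(0.0, 2.0))
```

In this sketch the reward values passed to `pairwise_loss` would come from a trainable reward model; a pairwise objective is a common choice in RLHF-style pipelines, and other formulations (for example, regressing directly on a bias score) would fit the abstract's description equally well.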

Comments:	The first three authors contributed equally to this work
Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2404.10160 [cs.AI]
	(or arXiv:2404.10160v6 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2404.10160

Submission history

From: Rosy Cheng
[v1] Mon, 15 Apr 2024 22:18:50 UTC (2,079 KB)
[v2] Sun, 28 Apr 2024 04:08:39 UTC (2,239 KB)
[v3] Wed, 12 Jun 2024 12:42:28 UTC (2,829 KB)
[v4] Sun, 16 Jun 2024 16:34:42 UTC (1 KB) (withdrawn)
[v5] Tue, 18 Jun 2024 16:19:40 UTC (4,572 KB)
[v6] Fri, 16 Aug 2024 12:20:22 UTC (3,071 KB)
