Rewarding Chatbots for Real-World Engagement with Millions of Users

Irvine, Robert; Boubert, Douglas; Raina, Vyas; Liusie, Adian; Zhu, Ziyi; Mudupalli, Vineet; Korshuk, Aliaksei; Liu, Zongyi; Cremer, Fritz; Assassi, Valentin; Beauchamp, Christie-Carol; Lu, Xiaoding; Rialan, Thomas; Beauchamp, William

arXiv:2303.06135 (cs)

[Submitted on 10 Mar 2023 (v1), last revised 30 Mar 2023 (this version, v2)]

Abstract: The emergence of pretrained large language models has led to the deployment of a range of social chatbots for chitchat. Although these chatbots demonstrate language ability and fluency, they are not guaranteed to be engaging and can struggle to retain users. This work investigates the development of social chatbots that prioritize user engagement to enhance retention, specifically examining the use of human feedback to efficiently develop highly engaging chatbots. The proposed approach uses automatic pseudo-labels collected from user interactions to train a reward model that can be used to reject low-scoring sample responses generated by the chatbot model at inference time. Intuitive evaluation metrics, such as mean conversation length (MCL), are introduced as proxies to measure the level of engagement of deployed chatbots. A/B testing on groups of 10,000 new daily chatbot users on the Chai Research platform shows that this approach increases the MCL by up to 70%, which translates to a more than 30% increase in user retention for a GPT-J 6B model. Future work aims to use the reward model to realise a data fly-wheel, where the latest user conversations can be used to alternately fine-tune the language model and the reward model.
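
The abstract describes rejecting low-scoring sampled responses at inference time and measuring engagement with mean conversation length. The following is only a minimal sketch of that idea, not the authors' implementation. The names `best_of_n`, `generate`, `reward`, and `mean_conversation_length` are hypothetical. The sketch assumes the highest-scoring of `n` samples is kept and that MCL is the average number of messages per conversation, neither of which the abstract specifies.

```python
from typing import Callable, Sequence


def best_of_n(
    context: str,
    generate: Callable[[str], str],
    reward: Callable[[str, str], float],
    n: int = 4,
) -> str:
    """Sample n candidate replies and keep the one the reward model scores highest.

    `generate` stands in for the chatbot model and `reward` for the trained
    reward model; both are placeholders (assumptions), not a real API.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(context) for _ in range(n)]
    # All lower-scoring candidates are rejected; only the top one is returned.
    return max(candidates, key=lambda reply: reward(context, reply))


def mean_conversation_length(conversations: Sequence[Sequence[str]]) -> float:
    """Average number of messages per conversation (assumed reading of MCL)."""
    if not conversations:
        raise ValueError("need at least one conversation")
    return sum(len(c) for c in conversations) / len(conversations)
```

In a real deployment `generate` would sample from the language model with non-zero temperature so that the `n` candidates differ, and `reward` would be the model trained on the pseudo-labels collected from user interactions.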

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2303.06135 [cs.CL]
	(or arXiv:2303.06135v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2303.06135

Submission history

From: Douglas Boubert
[v1] Fri, 10 Mar 2023 18:53:52 UTC (8,150 KB)
[v2] Thu, 30 Mar 2023 18:28:05 UTC (8,146 KB)
