Reinforcement Learning with Feedback from Multiple Humans with Diverse Skills

Yamagata, Taku; McConville, Ryan; Santos-Rodriguez, Raul

arXiv:2111.08596 (cs)

[Submitted on 16 Nov 2021]

Authors:Taku Yamagata, Ryan McConville, Raul Santos-Rodriguez (Department of Engineering Mathematics, University of Bristol)

Abstract: A promising way to improve robustness and exploration in Reinforcement Learning is to collect human feedback, thereby incorporating prior knowledge of the target environment. It is, however, often too expensive to obtain enough feedback of good quality. To mitigate the issue, we aim to rely on a group of multiple experts (and non-experts) with different skill levels to generate enough feedback. Such feedback can therefore be inconsistent and infrequent. In this paper, we build upon prior work, Advise, a Bayesian approach that attempts to maximise the information gained from human feedback, and extend the algorithm to accept feedback from this larger group of humans, the trainers, while also estimating each trainer's reliability. We show how aggregating feedback from multiple trainers improves the total feedback's accuracy and makes the collection process easier in two ways. Firstly, this approach addresses the case of some of the trainers being adversarial. Secondly, having access to information about each trainer's reliability provides a second layer of robustness and offers valuable information for the people managing the whole system, improving overall trust in the system. It offers an actionable tool for improving the feedback collection process or modifying the reward function design if needed. We empirically show that our approach can accurately learn the reliability of each trainer and use it to maximise the information gained from the multiple trainers' feedback, even if some of the sources are adversarial.
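
To make the idea of reliability-weighted aggregation concrete, the following is a minimal sketch, not the paper's algorithm. It assumes binary "right"/"wrong" labels per state-action pair, a uniform prior, conditionally independent trainers, and a known per-trainer consistency C_m (the probability that trainer m's label is correct). The function name `prob_action_optimal` and its parameters are hypothetical. In the paper the reliabilities are estimated from the feedback; here they are simply given as inputs.

```python
import math


def prob_action_optimal(deltas, consistencies):
    """Posterior probability that an action is optimal, given trainer feedback.

    deltas[m]        : (# "right" labels) - (# "wrong" labels) from trainer m
    consistencies[m] : assumed probability C_m that trainer m labels correctly
                       (must lie strictly between 0 and 1)

    Assumptions (illustrative, not taken from the paper): uniform prior and
    conditionally independent trainers, so log-odds add across trainers.
    """
    log_odds = 0.0
    for delta, c in zip(deltas, consistencies):
        if not 0.0 < c < 1.0:
            raise ValueError("consistency must be strictly between 0 and 1")
        # Each net label shifts the log-odds by log(C_m / (1 - C_m)).
        # For C_m < 0.5 (adversarial trainer) the shift has the opposite sign.
        log_odds += delta * (math.log(c) - math.log(1.0 - c))
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


# A reliable trainer, a weak trainer, and an adversarial trainer.
p = prob_action_optimal(deltas=[2, -1, -3], consistencies=[0.9, 0.6, 0.2])
```

Under these assumptions a trainer with C_m = 0.5 contributes no information, and "wrong" labels from a trainer with C_m below 0.5 count as evidence in favour of the action, which is one way adversarial feedback can still be informative once its reliability is known.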

Comments:	Accepted at the NeurIPS 2021 Workshop on Safe and Robust Control of Uncertain Systems. arXiv admin note: text overlap with arXiv:1908.06134
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2111.08596 [cs.LG]
	(or arXiv:2111.08596v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2111.08596

Submission history

From: Taku Yamagata
[v1] Tue, 16 Nov 2021 16:19:19 UTC (1,301 KB)