Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards

Kaushik, Rituraj; Chatzilygeroudis, Konstantinos; Mouret, Jean-Baptiste

Computer Science > Machine Learning

arXiv:1806.09351 (cs)

[Submitted on 25 Jun 2018 (v1), last revised 3 Mar 2020 (this version, v3)]

Abstract: The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. However, current algorithms lack an effective exploration strategy for sparse or misleading reward scenarios: if they do not experience any state with a positive reward during the initial random exploration, they are very unlikely to solve the problem. Here, we propose a novel model-based policy search algorithm, Multi-DEX, that leverages a learned dynamical model to efficiently explore the task space and solve tasks with sparse rewards in a few episodes. To achieve this, we frame the policy search problem as a multi-objective, model-based policy optimization problem with three objectives: (1) generate maximally novel state trajectories, (2) maximize the expected return, and (3) keep the system in state-space regions for which the model is as accurate as possible. We then optimize these objectives using a Pareto-based multi-objective optimization algorithm. The experiments show that Multi-DEX is able to solve sparse reward scenarios (with a simulated robotic arm) in much less interaction time than VIME, TRPO, GEP-PG, CMA-ES and Black-DROPS.
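
The abstract does not describe the internals of the Pareto-based optimizer. As a minimal sketch, assuming each candidate policy has already been scored on the three objectives (novelty, expected return, and a "model confidence" score standing in for model accuracy, with all three treated as higher-is-better), the core Pareto step is a non-dominance filter. The names `dominates`, `pareto_front` and the example scores are hypothetical, and this is plain non-dominated filtering, not the specific optimizer used by Multi-DEX.

```python
from typing import List, Sequence, Tuple

# Hypothetical objective vector per candidate policy:
# (novelty, expected_return, model_confidence), all "higher is better".
# model_confidence is an assumed proxy for objective (3), e.g. the negated
# predictive variance of the learned dynamical model along the trajectory.
Objectives = Tuple[float, float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Sequence[Objectives]) -> List[int]:
    """Return indices of the non-dominated candidates (the first Pareto front)."""
    front = []
    for i, a in enumerate(candidates):
        if not any(dominates(b, a) for j, b in enumerate(candidates) if j != i):
            front.append(i)
    return front


if __name__ == "__main__":
    scores = [
        (0.9, 0.0, 0.2),  # very novel, no reward seen, poorly modelled region
        (0.1, 0.0, 0.9),  # familiar region where the model is confident
        (0.5, 0.3, 0.5),  # balanced trade-off
        (0.4, 0.2, 0.4),  # dominated by the balanced candidate
    ]
    print(pareto_front(scores))  # [0, 1, 2]
```

Keeping the whole front, rather than collapsing the three objectives into a weighted sum, preserves policies that trade reward for exploration; how one policy is then chosen from the front for execution on the robot is not covered by this sketch.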

Comments:	Conference on Robot Learning (CoRL) 2018; Code at this https URL; Video at this https URL
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO); Machine Learning (stat.ML)
Cite as:	arXiv:1806.09351 [cs.LG]
	(or arXiv:1806.09351v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1806.09351
Journal reference:	Proceedings of the Conference on Robot Learning, PMLR 87:839-855, 2018

Submission history

From: Rituraj Kaushik
[v1] Mon, 25 Jun 2018 09:46:47 UTC (2,317 KB)
[v2] Thu, 11 Oct 2018 10:20:33 UTC (2,501 KB)
[v3] Tue, 3 Mar 2020 22:57:46 UTC (2,507 KB)