Online Meta-learning by Parallel Algorithm Competition

Elfwing, Stefan; Uchibe, Eiji; Doya, Kenji

Abstract:The efficiency of reinforcement learning algorithms depends critically on a few meta-parameters that modulates the learning updates and the trade-off between exploration and exploitation. The adaptation of the meta-parameters is an open question in reinforcement learning, which arguably has become more of an issue recently with the success of deep reinforcement learning in high-dimensional state spaces. The long learning times in domains such as Atari 2600 video games makes it not feasible to perform comprehensive searches of appropriate meta-parameter values. We propose the Online Meta-learning by Parallel Algorithm Competition (OMPAC) method. In the OMPAC method, several instances of a reinforcement learning algorithm are run in parallel with small differences in the initial values of the meta-parameters. After a fixed number of episodes, the instances are selected based on their performance in the task at hand. Before continuing the learning, Gaussian noise is added to the meta-parameters with a predefined probability. We validate the OMPAC method by improving the state-of-the-art results in stochastic SZ-Tetris and in standard Tetris with a smaller, 10$\times$10, board, by 31% and 84%, respectively, and by improving the results for deep Sarsa($\lambda$) agents in three Atari 2600 games by 62% or more. The experiments also show the ability of the OMPAC method to adapt the meta-parameters according to the learning progress in different tasks.

Comments:	15 pages, 10 figures. arXiv admin note: text overlap with arXiv:1702.03118
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:1702.07490 [cs.LG]
	(or arXiv:1702.07490v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1702.07490

Computer Science > Machine Learning

Title:Online Meta-learning by Parallel Algorithm Competition

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators