Non-Markovian Control with Gated End-to-End Memory Policy Networks

Perez, Julien; Silander, Tomi

arXiv:1705.10993 (stat)

[Submitted on 31 May 2017]

Abstract: Partially observable environments present an important open challenge in sequential control learning with delayed rewards. Despite numerous attempts during the last two decades, the majority of reinforcement learning algorithms and associated approximate models applied to this setting still assume Markovian state transitions. In this paper, we explore the use of a recently proposed attention-based model, the Gated End-to-End Memory Network, for sequential control. We call the resulting model the Gated End-to-End Memory Policy Network. More precisely, we use a model-free, value-based algorithm to learn policies for partially observed domains using this memory-enhanced neural network. The model is end-to-end learnable and features unbounded memory. Because of its non-parametric memory, the proposed model, unlike recurrent models, lets us define an attention mechanism directly over the observation stream. We show encouraging results that illustrate the capability of our attention-based model on the continuous-state, non-stationary control problem of stock trading. We also present an OpenAI Gym environment for a simulated stock exchange and explain its relevance as a benchmark for non-Markovian decision process learning.
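The abstract gives no equations, so the following is only a rough sketch of what attention over an observation stream with a gated memory read could look like. It assumes the general form used by gated memory networks: softmax attention over stored observations, then a sigmoid gate that mixes the read vector with the current query. All names (`gated_memory_hop`, `q_values`, `gate_w`, `out_w`), the dimensions, the random weights, and the reuse of one embedding for both memory keys and values are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_memory_hop(query, memory_keys, memory_values, gate_w, gate_b):
    """One gated attention hop over a memory of embedded past observations."""
    # Attention weight for every stored observation (no fixed window size).
    scores = [sum(q * k for q, k in zip(query, key)) for key in memory_keys]
    attention = softmax(scores)
    # Read vector: attention-weighted sum of the memory values.
    dim = len(query)
    read = [sum(a * v[d] for a, v in zip(attention, memory_values))
            for d in range(dim)]
    # Assumed gate: decides per dimension how much of the read replaces the query.
    gate = [sigmoid(z + b) for z, b in zip(matvec(gate_w, query), gate_b)]
    new_query = [g * r + (1.0 - g) * q for g, r, q in zip(gate, read, query)]
    return new_query, attention


def q_values(query, memory_keys, memory_values, gate_w, gate_b, out_w, hops=2):
    """Stack a few hops, then map the final state to one value per action."""
    state, attention = query, []
    for _ in range(hops):
        state, attention = gated_memory_hop(
            state, memory_keys, memory_values, gate_w, gate_b)
    return matvec(out_w, state), attention


# Hypothetical toy setup: 4-dimensional embeddings, 3 actions, random weights.
rng = random.Random(0)
DIM, N_ACTIONS = 4, 3


def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


memory = [rand_vec(DIM) for _ in range(5)]        # five past observations
query = rand_vec(DIM)                             # current observation
gate_w = [rand_vec(DIM) for _ in range(DIM)]
gate_b = [0.0] * DIM
out_w = [rand_vec(DIM) for _ in range(N_ACTIONS)]

q, attention = q_values(query, memory, memory, gate_w, gate_b, out_w)

# The memory is just a list, so a new observation extends it without
# changing any parameter shapes.
memory.append(rand_vec(DIM))
q2, attention2 = q_values(query, memory, memory, gate_w, gate_b, out_w)
```

In a value-based setting, the action with the largest entry of `q` would be taken greedily. Appending to `memory` changes only the length of the attention vector, which is the sense in which such a memory is non-parametric.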

Comments:	11 pages, 1 figure, 1 table
Subjects:	Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Cite as:	arXiv:1705.10993 [stat.ML]
	(or arXiv:1705.10993v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.1705.10993

Submission history

From: Julien Perez
[v1] Wed, 31 May 2017 09:00:44 UTC (131 KB)
