Causally Correct Partial Models for Reinforcement Learning
Authors:
Danilo J. Rezende,
Ivo Danihelka,
George Papamakarios,
Nan Rosemary Ke,
Ray Jiang,
Theophane Weber,
Karol Gregor,
Hamza Merzic,
Fabio Viola,
Jane Wang,
Jovana Mitrovic,
Frederic Besse,
Ioannis Antonoglou,
Lars Buesing
Abstract:
In reinforcement learning, we can learn a model of future observations and rewards, and use it to plan the agent's next actions. However, jointly modeling future observations can be computationally expensive or even intractable if the observations are high-dimensional (e.g. images). For this reason, previous works have considered partial models, which model only part of the observation. In this paper, we show that partial models can be causally incorrect: they are confounded by the observations they don't model, and can therefore lead to incorrect planning. To address this, we introduce a general family of partial models that are provably causally correct, yet remain fast because they do not need to fully model future observations.
Submitted 7 February, 2020;
originally announced February 2020.
Shaping Belief States with Generative Environment Models for RL
Authors:
Karol Gregor,
Danilo Jimenez Rezende,
Frederic Besse,
Yan Wu,
Hamza Merzic,
Aaron van den Oord
Abstract:
When agents interact with a complex environment, they must form and maintain beliefs about the relevant aspects of that environment. We propose a way to efficiently train expressive generative models in complex environments. We show that a predictive algorithm with an expressive generative model can form stable belief-states in visually rich and dynamic 3D environments. More precisely, we show that the learned representation captures the layout of the environment as well as the position and orientation of the agent. Our experiments show that the model substantially improves data-efficiency on a number of reinforcement learning (RL) tasks compared with strong model-free baseline agents. We find that predicting multiple steps into the future (overshooting), in combination with an expressive generative model, is critical for stable representations to emerge. In practice, using expressive generative models in RL is computationally expensive and we propose a scheme to reduce this computational burden, allowing us to build agents that are competitive with model-free baselines.
Submitted 24 June, 2019; v1 submitted 21 June, 2019;
originally announced June 2019.