Guaranteed Discovery of Control-Endogenous Latent States with Multi-Step Inverse Models
Authors:
Alex Lamb,
Riashat Islam,
Yonathan Efroni,
Aniket Didolkar,
Dipendra Misra,
Dylan Foster,
Lekan Molu,
Rajan Chari,
Akshay Krishnamurthy,
John Langford
Abstract:
In many sequential decision-making tasks, the agent is not able to model the full complexity of the world, which consists of multitudes of relevant and irrelevant information. For example, a person walking along a city street who tries to model all aspects of the world would quickly be overwhelmed by a multitude of shops, cars, and people moving in and out of view, each following their own complex and inscrutable dynamics. Is it possible to turn the agent's firehose of sensory information into a minimal latent state that is both necessary and sufficient for an agent to successfully act in the world? We formulate this question concretely, and propose the Agent Control-Endogenous State Discovery algorithm (AC-State), which has theoretical guarantees and is practically demonstrated to discover the minimal control-endogenous latent state which contains all of the information necessary for controlling the agent, while fully discarding all irrelevant information. This algorithm consists of a multi-step inverse model (predicting actions from distant observations) with an information bottleneck. AC-State enables localization, exploration, and navigation without reward or demonstrations. We demonstrate the discovery of the control-endogenous latent state in three domains: localizing a robot arm with distractions (e.g., changing lighting conditions and background), exploring a maze alongside other agents, and navigating in the Matterport house simulator.
Submitted 27 December, 2022; v1 submitted 17 July, 2022;
originally announced July 2022.
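The AC-State abstract above describes a multi-step inverse model as "predicting actions from distant observations". As a rough illustration of the data side of that idea only, the sketch below builds training tuples from a single trajectory: an observation, a later observation some k steps ahead, the gap k, and the first action taken. The function name `multistep_inverse_examples`, the tuple layout, the uniform sampling of k, and the `max_k` parameter are all assumptions made for this example and are not taken from the paper. The classifier and the information bottleneck are omitted entirely.

```python
import random


def multistep_inverse_examples(observations, actions, max_k, rng=None):
    """Build (x_t, x_{t+k}, k, a_t) tuples from one trajectory.

    Hypothetical helper: `observations` has one more element than
    `actions`, and actions[t] is the action taken at observations[t].
    A model trained on these tuples would predict a_t from the pair
    (x_t, x_{t+k}); the model itself is not shown here.
    """
    if max_k < 1:
        raise ValueError("max_k must be at least 1")
    if len(observations) != len(actions) + 1:
        raise ValueError("need exactly one more observation than actions")
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible

    examples = []
    for t in range(len(actions)):
        # Largest gap that keeps x_{t+k} inside the trajectory.
        k_limit = min(max_k, len(observations) - 1 - t)
        k = rng.randint(1, k_limit)  # inclusive on both ends
        examples.append((observations[t], observations[t + k], k, actions[t]))
    return examples


if __name__ == "__main__":
    obs = list(range(6))            # stand-in observations x_0..x_5
    acts = ["a", "b", "c", "d", "e"]  # stand-in actions a_0..a_4
    for example in multistep_inverse_examples(obs, acts, max_k=3):
        print(example)
```

Sampling a different gap for each starting step is one simple way to expose the model to both nearby and distant observation pairs; other sampling schemes would fit the same description.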
Efficient Contextual Bandits with Continuous Actions
Authors:
Maryam Majzoubi,
Chicheng Zhang,
Rajan Chari,
Akshay Krishnamurthy,
John Langford,
Aleksandrs Slivkins
Abstract:
We create a computationally tractable algorithm for contextual bandits with continuous actions having unknown structure. Our reduction-style algorithm composes with most supervised learning representations. We prove that it works in a general sense and verify the new functionality with large-scale experiments.
Submitted 3 December, 2020; v1 submitted 10 June, 2020;
originally announced June 2020.