-
Minimax-Bayes Reinforcement Learning
Authors:
Thomas Kleine Buening,
Christos Dimitrakakis,
Hannes Eriksson,
Divya Grover,
Emilio Jorge
Abstract:
While the Bayesian decision-theoretic framework offers an elegant solution to the problem of decision making under uncertainty, one question is how to appropriately select the prior distribution. One idea is to employ a worst-case prior. However, this is not as easy to specify in sequential decision making as in simple statistical estimation problems. This paper studies (sometimes approximate) minimax-Bayes solutions for various reinforcement learning problems to gain insights into the properties of the corresponding priors and policies. We find that while the worst-case prior depends on the setting, the corresponding minimax policies are more robust than those that assume a standard (i.e. uniform) prior.
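To make the notion of a worst-case prior concrete, here is a toy sketch that is not the paper's method. It assumes two hypothetical environments and three hypothetical policies whose expected utilities `U[policy][model]` are known in advance. The worst-case prior is the one that minimises the utility of the Bayes-optimal policy, and here it is found by a simple grid search. Exact `Fraction` arithmetic avoids floating-point ties.

```python
from fractions import Fraction as F

# Hypothetical expected utilities U[policy][model] for two candidate
# environments; these numbers are illustrative, not from the paper.
U = [
    [F(1), F(1, 10)],    # policy 0: excellent in model 0, poor in model 1
    [F(0), F(4, 5)],     # policy 1: good in model 1 only
    [F(1, 2), F(1, 2)],  # policy 2: a "hedging" policy
]


def bayes_optimal(p0):
    """Bayes-optimal policy index and its expected utility under prior (p0, 1 - p0)."""
    values = [p0 * u0 + (1 - p0) * u1 for u0, u1 in U]
    best = max(range(len(U)), key=lambda i: values[i])
    return best, values[best]


def worst_case_prior(grid=100):
    """Grid search over priors for the one minimising the Bayes-optimal utility."""
    best = None
    for k in range(grid + 1):
        p0 = F(k, grid)
        pi, v = bayes_optimal(p0)
        if best is None or v < best[2]:
            best = (p0, pi, v)
    return best


if __name__ == "__main__":
    uniform_pi, _ = bayes_optimal(F(1, 2))
    p0, minimax_pi, v = worst_case_prior()
    print("worst-case prior on model 0:", p0, "Bayes-optimal utility:", v)
    for name, pi in [("uniform prior", uniform_pi), ("worst-case prior", minimax_pi)]:
        # Robustness here = utility of the chosen policy in its least favourable model.
        print(name, "-> policy", pi, "worst-case utility", min(U[pi]))
```

In this toy table, the uniform prior selects policy 0, whose utility drops to 1/10 in the unfavourable model. The worst-case prior found on the grid (19/50 on model 0) selects the hedging policy 2, which guarantees 1/2 in both models. The sketch considers only pure policies and a coarse grid. In general, minimax policies may need randomisation, and sequential problems need far more than a lookup table.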
Submitted 21 February, 2023;
originally announced February 2023.
-
Adaptive Belief Discretization for POMDP Planning
Authors:
Divya Grover,
Christos Dimitrakakis
Abstract:
Partially Observable Markov Decision Processes (POMDPs) are a widely used model of the interaction between an agent and its environment under state uncertainty. Since the agent does not observe the environment state, its uncertainty is typically represented through a probabilistic belief. While the set of possible beliefs is infinite, making exact planning intractable, the complexity of the belief space (and hence of planning) is characterized by its covering number. Many POMDP solvers uniformly discretize the belief space and state the planning error in terms of the (typically unknown) covering number. We instead propose an adaptive belief discretization scheme and give its associated planning error. We further characterize the covering number in terms of the POMDP parameters. This allows us to specify the exact memory requirements of the planner needed to bound the value function error. We then propose a novel, computationally efficient solver using this scheme. We demonstrate that our algorithm is highly competitive with the state of the art in a variety of scenarios.
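For reference, the standard belief update and the uniform belief-space discretisation that the abstract contrasts with can be sketched as follows. This is the baseline and not the proposed adaptive scheme. The layouts `T[a][s][s']` (transition probabilities) and `O[a][s'][o]` (observation probabilities) and the helper names are assumptions made for illustration.

```python
import math


def belief_update(b, a, o, T, O):
    """Bayes filter: b'(s') is proportional to O[a][s'][o] * sum_s T[a][s][s'] * b(s)."""
    n = len(b)
    unnorm = [
        O[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(n))
        for s2 in range(n)
    ]
    z = sum(unnorm)  # probability of observing o after taking a in belief b
    if z == 0:
        raise ValueError("observation o has zero probability under belief b")
    return [x / z for x in unnorm]


def snap_to_grid(b, resolution):
    """Map b to a point of the uniform grid {k / resolution} on the simplex.

    Largest-remainder rounding keeps the coordinates summing to one and
    moves each coordinate by at most 1 / resolution.
    """
    scaled = [x * resolution for x in b]
    counts = [math.floor(x) for x in scaled]
    leftover = resolution - sum(counts)
    by_remainder = sorted(range(len(b)), key=lambda i: scaled[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return tuple(c / resolution for c in counts)


def grid_size(n_states, resolution):
    """Number of points in the uniform grid: C(resolution + |S| - 1, |S| - 1)."""
    return math.comb(resolution + n_states - 1, n_states - 1)
```

Take a two-state "listening" action that leaves the state unchanged and reports it correctly with probability 0.85 (an illustrative assumption). Starting from the uniform belief, one observation gives the belief (0.85, 0.15), which a resolution-4 grid snaps to (0.75, 0.25). The grid size grows combinatorially with the number of states; for example, `grid_size(10, 10)` is 92378. This growth is one reason uniform discretisation can be memory-hungry.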
Submitted 15 April, 2021;
originally announced April 2021.
-
A Maneuver-based Urban Driving Dataset and Model for Cooperative Vehicle Applications
Authors:
Behrad Toghi,
Divas Grover,
Mahdi Razzaghpour,
Rajat Jain,
Rodolfo Valiente,
Mahdi Zaman,
Ghayoor Shah,
Yaser P. Fallah
Abstract:
The short-term future of automated driving can be imagined as a hybrid scenario in which automated and human-driven vehicles coexist in the same environment. To address the needs of such road configurations, many technology solutions, such as vehicular communication and predictive control for automated vehicles, have been introduced in the literature. Both of these solutions rely on driving data from human drivers. In this work, we investigate the currently available driving datasets and introduce a real-world, maneuver-based driving dataset collected during our urban driving data collection campaign. We also provide a model that embeds the patterns in maneuver-specific samples. Such a model can be employed for classification and prediction.
Submitted 21 August, 2020; v1 submitted 28 May, 2020;
originally announced May 2020.
-
Inferential Induction: A Novel Framework for Bayesian Reinforcement Learning
Authors:
Hannes Eriksson,
Emilio Jorge,
Christos Dimitrakakis,
Debabrota Basu,
Divya Grover
Abstract:
Bayesian reinforcement learning (BRL) offers a decision-theoretic solution for reinforcement learning. While "model-based" BRL algorithms have focused either on maintaining a posterior distribution on models or value functions and combining this with approximate dynamic programming or tree search, previous Bayesian "model-free" value function distribution approaches implicitly make strong assumptions or approximations. We describe a novel Bayesian framework, Inferential Induction, for correctly inferring value function distributions from data, which leads to the development of a new class of BRL algorithms. Within this framework, we design an algorithm, Bayesian Backwards Induction. We experimentally demonstrate that the proposed algorithm is competitive with the state of the art.
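To illustrate what a "value function distribution" is, the sketch below follows the simple model-based route the abstract mentions. It samples MDPs from a posterior over transition models and then runs dynamic programming (finite-horizon backward induction) on each sample. This is a Monte Carlo baseline and is not Inferential Induction or Bayesian Backwards Induction. It assumes a uniform Dirichlet(1) prior on each transition row, known rewards `R[s][a]`, and observed transition counts `counts[s][a][s']`; all of these are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def backward_induction(P, R, horizon):
    """Optimal finite-horizon values for transitions P[s][a][s'] and rewards R[s][a]."""
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(horizon):
        V = [
            max(
                R[s][a] + sum(P[s][a][t] * V[t] for t in range(n_states))
                for a in range(len(P[s]))
            )
            for s in range(n_states)
        ]
    return V


def value_samples(counts, R, horizon, n_samples, start=0, seed=0):
    """Sample MDPs from the Dirichlet posterior and return start-state values.

    With a uniform Dirichlet(1) prior, the posterior parameters are counts + 1.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        P = [
            [sample_dirichlet([c + 1 for c in counts[s][a]], rng) for a in range(len(counts[s]))]
            for s in range(len(counts))
        ]
        samples.append(backward_induction(P, R, horizon)[start])
    return samples
```

The returned list is an empirical distribution over the optimal start-state value. Its spread (for example, via `statistics.mean` and sorted quantiles) shrinks as the counts grow.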
Submitted 1 July, 2020; v1 submitted 8 February, 2020;
originally announced February 2020.
-
Bayesian Reinforcement Learning via Deep, Sparse Sampling
Authors:
Divya Grover,
Debabrota Basu,
Christos Dimitrakakis
Abstract:
We address the problem of Bayesian reinforcement learning using efficient model-based online planning. We propose an optimism-free Bayes-adaptive algorithm that induces deeper and sparser exploration, has a theoretical bound on its performance relative to the Bayes-optimal policy, and has lower computational complexity. The main novelty is the use of a candidate policy generator to generate long-term options in the planning tree (over beliefs), which allows us to create much sparser and deeper trees. Experimental results on different environments show that, compared with the state of the art, our algorithm is computationally more efficient and obtains significantly higher reward in discrete environments.
Submitted 27 June, 2020; v1 submitted 7 February, 2019;
originally announced February 2019.
-
MNIST Dataset Classification Utilizing k-NN Classifier with Modified Sliding-window Metric
Authors:
Divas Grover,
Behrad Toghi
Abstract:
The MNIST dataset of handwritten digits is one of the most commonly used datasets in machine learning and computer vision research. We study this widely applicable classification problem and apply a simple yet efficient k-nearest neighbor classifier with an enhanced heuristic. We evaluate the k-nearest neighbor classification algorithm on the MNIST dataset, comparing the $L_2$ (Euclidean) distance metric with a modified distance metric that uses a sliding-window technique to avoid performance degradation due to slight spatial misalignments. Accuracy and confusion matrices are used as performance indicators to compare the baseline algorithm with the enhanced sliding-window method, and the results show a significant improvement with the proposed method.
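A minimal sketch of the sliding-window idea follows. The abstract does not give the exact construction, so the details are assumptions. Images are nested lists of floats. The query is shifted by up to `window` pixels in each direction, with zero fill, and the modified distance is the smallest squared $L_2$ distance over those shifts. The function names are hypothetical.

```python
from collections import Counter


def shift(img, dy, dx):
    """Translate an image by (dy, dx) pixels, filling uncovered pixels with 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def l2_sq(a, b):
    """Squared Euclidean (L2) distance between two equally sized images."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def sliding_l2(a, b, window=1):
    """Smallest squared L2 distance over all shifts of `a` within +/- window pixels."""
    return min(
        l2_sq(shift(a, dy, dx), b)
        for dy in range(-window, window + 1)
        for dx in range(-window, window + 1)
    )


def knn_predict(query, train, k=3, metric=sliding_l2):
    """Majority vote among the k training images closest to `query`."""
    nearest = sorted((metric(query, img), label) for img, label in train)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

As a small illustration, consider 5×5 images where the query is a vertical bar one column to the right of the only "vertical" training image. Under plain squared $L_2$, the query is closer to a horizontal bar (distance 8 versus 10). Once one-pixel shifts are allowed, it matches the vertical bar exactly. On real MNIST data, each comparison costs $(2w+1)^2$ plain distance evaluations, so the window size trades accuracy against run time.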
Submitted 12 March, 2019; v1 submitted 18 September, 2018;
originally announced September 2018.