Search | arXiv e-print repository

arXiv:2010.14680

Learning to Represent Action Values as a Hypergraph on the Action Vertices

Authors: Arash Tavakoli, Mehdi Fatemi, Petar Kormushev

Abstract: Action-value estimation is a critical component of many reinforcement learning (RL) methods whereby sample complexity relies heavily on how fast a good estimator for action value can be learned. By viewing this problem through the lens of representation learning, good representations of both state and action can facilitate action-value estimation. While advances in deep learning have seamlessly driven progress in learning state representations, given the specificity of the notion of agency to RL, little attention has been paid to learning action representations. We conjecture that leveraging the combinatorial structure of multi-dimensional action spaces is a key ingredient for learning good representations of action. To test this, we set forth the action hypergraph networks framework -- a class of functions for learning action representations in multi-dimensional discrete action spaces with a structural inductive bias. Using this framework we realise an agent class based on a combination with deep Q-networks, which we dub hypergraph Q-networks. We show the effectiveness of our approach on a myriad of domains: illustrative prediction problems under minimal confounding effects, Atari 2600 games, and discretised physical control benchmarks.

Submitted 20 June, 2021; v1 submitted 27 October, 2020; originally announced October 2020.

Comments: ICLR 2021, code: https://github.com/atavakol/action-hypergraph-networks

arXiv:1906.00572

Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning

Authors: Harm van Seijen, Mehdi Fatemi, Arash Tavakoli

Abstract: In an effort to better understand the different ways in which the discount factor affects the optimization process in reinforcement learning, we designed a set of experiments to study each effect in isolation. Our analysis reveals that the common perception that poor performance of low discount factors is caused by (too) small action-gaps requires revision. We propose an alternative hypothesis that identifies the size-difference of the action-gap across the state-space as the primary cause. We then introduce a new method that enables more homogeneous action-gaps by mapping value estimates to a logarithmic space. We prove convergence for this method under standard assumptions and demonstrate empirically that it indeed enables lower discount factors for approximate reinforcement-learning methods. This in turn allows tackling a class of reinforcement-learning problems that are challenging to solve with traditional methods.

Submitted 23 December, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

Comments: NeurIPS 2019, code: https://github.com/microsoft/logrl

arXiv:1811.03154

doi 10.1109/TSP.2017.2675866

Poisson Multi-Bernoulli Mapping Using Gibbs Sampling

Authors: Maryam Fatemi, Karl Granström, Lennart Svensson, Francisco J. R. Ruiz, Lars Hammarstrand

Abstract: This paper addresses the mapping problem. Using a conjugate prior form, we derive the exact theoretical batch multi-object posterior density of the map given a set of measurements. The landmarks in the map are modeled as extended objects, and the measurements are described as a Poisson process, conditioned on the map. We use a Poisson process prior on the map and prove that the posterior distribution is a hybrid Poisson, multi-Bernoulli mixture distribution. We devise a Gibbs sampling algorithm to sample from the batch multi-object posterior. The proposed method can handle uncertainties in the data associations and the cardinality of the set of landmarks, and is parallelizable, making it suitable for large-scale problems. The performance of the proposed method is evaluated on synthetic data and is shown to outperform a state-of-the-art method.

Submitted 7 November, 2018; originally announced November 2018.

Comments: 14 pages, 6 figures

Journal ref: IEEE Transactions on Signal Processing, Vol. 65, Issue 11, June 2017

arXiv:1704.00756

Multi-Advisor Reinforcement Learning

Authors: Romain Laroche, Mehdi Fatemi, Joshua Romoff, Harm van Seijen

Abstract: We consider tackling a single-agent RL problem by distributing it to $n$ learners. These learners, called advisors, endeavour to solve the problem from a different focus. Their advice, taking the form of action values, is then communicated to an aggregator, which is in control of the system. We show that the local planning method for the advisors is critical and that none of the ones found in the literature is flawless: the egocentric planning overestimates values of states where the other advisors disagree, and the agnostic planning is inefficient around danger zones. We introduce a novel approach called empathic and discuss its theoretical aspects. We empirically examine and validate our theoretical findings on a fruit collection task.

Submitted 14 November, 2017; v1 submitted 3 April, 2017; originally announced April 2017.

Comments: Submitted at ICLR2018

arXiv:1605.06311

doi 10.1109/TAES.2019.2920220

Poisson multi-Bernoulli conjugate prior for multiple extended object filtering

Authors: Karl Granström, Maryam Fatemi, Lennart Svensson

Abstract: This paper presents a Poisson multi-Bernoulli mixture (PMBM) conjugate prior for multiple extended object filtering. A Poisson point process is used to describe the existence of yet undetected targets, while a multi-Bernoulli mixture describes the distribution of the targets that have been detected. The prediction and update equations are presented for the standard transition density and measurement likelihood. Both the prediction and the update preserve the PMBM form of the density, and in this sense the PMBM density is a conjugate prior. However, the unknown data associations lead to an intractably large number of terms in the PMBM density, and approximations are necessary for tractability. A gamma Gaussian inverse Wishart implementation is presented, along with methods to handle the data association problem. A simulation study shows that the extended target PMBM filter performs well in comparison to the extended target d-GLMB and LMB filters. An experiment with Lidar data illustrates the benefit of tracking both detected and undetected targets.

Submitted 6 December, 2019; v1 submitted 20 May, 2016; originally announced May 2016.

Showing 1–5 of 5 results for author: Fatemi, M