Showing 1–26 of 26 results for author: Tamar, A

  1. arXiv:2306.02418  [pdf, other]

    cs.LG cs.AI stat.ML

    ContraBAR: Contrastive Bayes-Adaptive Deep RL

    Authors: Era Choshen, Aviv Tamar

    Abstract: In meta reinforcement learning (meta RL), an agent seeks a Bayes-optimal policy -- the optimal policy when facing an unknown task that is sampled from some known task distribution. Previous approaches tackled this problem by inferring a belief over task parameters, using variational inference methods. Motivated by recent successes of contrastive learning approaches in RL, such as contrastive predi…

    Submitted 4 June, 2023; originally announced June 2023.

    Comments: ICML 2023. Pytorch code available at https://github.com/ec2604/ContraBAR

  2. arXiv:2301.01320  [pdf, ps, other]

    cs.LG stat.ML

    Towards Deployable RL -- What's Broken with RL Research and a Potential Fix

    Authors: Shie Mannor, Aviv Tamar

    Abstract: Reinforcement learning (RL) has demonstrated great potential, but is currently full of overhyping and pipe dreams. We point to some difficulties with current research which we feel are endemic to the direction taken by the community. To us, the current direction is not likely to lead to "deployable" RL: RL that works in practice and can work in practical situations yet still is economically viable…

    Submitted 3 January, 2023; originally announced January 2023.

  3. arXiv:2109.11792  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Regularization Guarantees Generalization in Bayesian Reinforcement Learning through Algorithmic Stability

    Authors: Aviv Tamar, Daniel Soudry, Ev Zisselman

    Abstract: In the Bayesian reinforcement learning (RL) setting, a prior distribution over the unknown problem parameters -- the rewards and transitions -- is assumed, and a policy that optimizes the (posterior) expected return is sought. A common approximation, which has been recently popularized as meta-RL, is to train the agent on a sample of $N$ problem instances from the prior, with the hope that for lar…

    Submitted 24 September, 2021; originally announced September 2021.

  4. arXiv:2008.02598  [pdf, other]

    cs.LG cs.AI stat.ML

    Offline Meta Learning of Exploration

    Authors: Ron Dorfman, Idan Shenfeld, Aviv Tamar

    Abstract: Consider the following instance of the Offline Meta Reinforcement Learning (OMRL) problem: given the complete training logs of $N$ conventional RL agents, trained on $N$ different tasks, design a meta-agent that can quickly maximize reward in a new, unseen task from the same task distribution. In particular, while each conventional RL agent explored and exploited its own different task, the meta-a…

    Submitted 12 February, 2021; v1 submitted 6 August, 2020; originally announced August 2020.

  5. arXiv:2001.05419  [pdf, other]

    cs.LG cs.CV stat.ML

    Deep Residual Flow for Out of Distribution Detection

    Authors: Ev Zisselman, Aviv Tamar

    Abstract: The effective application of neural networks in the real-world relies on proficiently detecting out-of-distribution examples. Contemporary methods seek to model the distribution of feature activations in the training data for adequately distinguishing abnormalities, and the state-of-the-art method uses Gaussian distribution models. In this work, we present a novel approach that improves upon the s…

    Submitted 19 July, 2020; v1 submitted 15 January, 2020; originally announced January 2020.

  6. arXiv:1911.04971  [pdf, other]

    cs.LG cs.CV cs.RO stat.ML

    Deep Variational Semi-Supervised Novelty Detection

    Authors: Tal Daniel, Thanard Kurutach, Aviv Tamar

    Abstract: In anomaly detection (AD), one seeks to identify whether a test sample is abnormal, given a data set of normal samples. A recent and promising approach to AD relies on deep generative models, such as variational autoencoders (VAEs), for unsupervised learning of the normal data distribution. In semi-supervised AD (SSAD), the data also includes a small sample of labeled anomalies. In this work, we p…

    Submitted 4 November, 2021; v1 submitted 12 November, 2019; originally announced November 2019.

    Comments: NeurIPS 2021 Workshop on DGMs and Downstream Applications

  7. arXiv:1906.05329  [pdf, other]

    cs.LG cs.AI stat.ML

    Sub-Goal Trees -- a Framework for Goal-Directed Trajectory Prediction and Optimization

    Authors: Tom Jurgenson, Edward Groshev, Aviv Tamar

    Abstract: Many AI problems, in robotics and other domains, are goal-directed, essentially seeking a trajectory leading to some goal state. In such problems, the way we choose to represent a trajectory underlies algorithms for trajectory prediction and optimization. Interestingly, most all prior work in imitation and reinforcement learning builds on a sequential trajectory representation -- calculating the n…

    Submitted 12 June, 2019; originally announced June 2019.

    Comments: 15 pages (8 main), 2 figures, 4 tables

  8. arXiv:1906.00214  [pdf, other]

    cs.RO cs.LG stat.ML

    Harnessing Reinforcement Learning for Neural Motion Planning

    Authors: Tom Jurgenson, Aviv Tamar

    Abstract: Motion planning is an essential component in most of today's robotic applications. In this work, we consider the learning setting, where a set of solved motion planning problems is used to improve the efficiency of motion planning on different, yet similar problems. This setting is important in applications with rapidly changing environments such as in e-commerce, among others. We investigate a ge…

    Submitted 1 June, 2019; originally announced June 2019.

    Comments: 13 pages (all), 8 pages (main sections), 6 figures, 4 tables, accepted to RSS 2019

  9. arXiv:1901.10251  [pdf, other]

    cs.LG stat.ML

    Multi-Agent Reinforcement Learning with Multi-Step Generative Models

    Authors: Orr Krupnik, Igor Mordatch, Aviv Tamar

    Abstract: We consider model-based reinforcement learning (MBRL) in 2-agent, high-fidelity continuous control problems -- an important domain for robots interacting with other agents in the same workspace. For non-trivial dynamical systems, MBRL typically suffers from accumulating errors. Several recent studies have addressed this problem by learning latent variable models for trajectory segments and optimiz…

    Submitted 1 November, 2019; v1 submitted 29 January, 2019; originally announced January 2019.

  10. arXiv:1809.10842  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning and Planning with a Semantic Model

    Authors: Yi Wu, Yuxin Wu, Aviv Tamar, Stuart Russell, Georgia Gkioxari, Yuandong Tian

    Abstract: Building deep reinforcement learning agents that can generalize and adapt to unseen environments remains a fundamental challenge for AI. This paper describes progresses on this challenge in the context of man-made environments, which are visually diverse but contain intrinsic semantic regularities. We propose a hybrid model-based and model-free approach, LEArning and Planning with Semantics (LEAPS…

    Submitted 27 September, 2018; originally announced September 2018.

    Comments: submitted to ICLR 2019

  11. arXiv:1808.01960  [pdf, other]

    cs.LG stat.ML

    Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN

    Authors: Dror Freirich, Ron Meir, Aviv Tamar

    Abstract: The recently proposed distributional approach to reinforcement learning (DiRL) is centered on learning the distribution of the reward-to-go, often referred to as the value distribution. In this work, we show that the distributional Bellman equation, which drives DiRL methods, is equivalent to a generative adversarial network (GAN) model. In this formulation, DiRL can be seen as learning a deep gen…

    Submitted 6 August, 2018; originally announced August 2018.

  12. arXiv:1807.09341  [pdf, other]

    cs.LG cs.AI cs.CV cs.NE cs.RO stat.ML

    Learning Plannable Representations with Causal InfoGAN

    Authors: Thanard Kurutach, Aviv Tamar, Ge Yang, Stuart Russell, Pieter Abbeel

    Abstract: In recent years, deep generative models have been shown to 'imagine' convincing high-dimensional observations such as images, audio, and even video, learning directly from raw data. In this work, we ask how to imagine goal-directed visual plans -- a plausible sequence of observations that transition a dynamical system from its current configuration to a desired goal state, which can later be used…

    Submitted 24 July, 2018; originally announced July 2018.

    Comments: ICML / IJCAI / AAMAS 2018 Workshop on Planning and Learning (PAL-18)

  13. arXiv:1805.07805  [pdf, other]

    cs.LG cs.AI stat.ML

    Constrained Policy Improvement for Safe and Efficient Reinforcement Learning

    Authors: Elad Sarafian, Aviv Tamar, Sarit Kraus

    Abstract: We propose a policy improvement algorithm for Reinforcement Learning (RL) which is called Rerouted Behavior Improvement (RBI). RBI is designed to take into account the evaluation errors of the Q-function. Such errors are common in RL when learning the $Q$-value from finite past experience data. Greedy policies or even constrained policy optimization algorithms which ignore these errors may suffer…

    Submitted 10 July, 2019; v1 submitted 20 May, 2018; originally announced May 2018.

  14. arXiv:1711.08534  [pdf, other]

    cs.LG cs.AI stat.ML

    Safer Classification by Synthesis

    Authors: William Wang, Angelina Wang, Aviv Tamar, Xi Chen, Pieter Abbeel

    Abstract: The discriminative approach to classification using deep neural networks has become the de-facto standard in various fields. Complementing recent reservations about safety against adversarial examples, we show that conventional discriminative methods can easily be fooled to provide incorrect labels with very high confidence to out of distribution examples. We posit that a generative approach is th…

    Submitted 23 July, 2018; v1 submitted 22 November, 2017; originally announced November 2017.

  15. arXiv:1705.07461  [pdf, other]

    cs.AI cs.LG stat.ML

    Shallow Updates for Deep Reinforcement Learning

    Authors: Nir Levine, Tom Zahavy, Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

    Abstract: Deep reinforcement learning (DRL) methods such as the Deep Q-Network (DQN) have achieved state-of-the-art results in a variety of challenging, high-dimensional domains. This success is mainly attributed to the power of deep neural networks to learn rich domain representations for approximating the value function or policy. Batch reinforcement learning methods with linear representations, on the ot…

    Submitted 2 November, 2017; v1 submitted 21 May, 2017; originally announced May 2017.

  16. arXiv:1609.04436  [pdf, other]

    cs.AI cs.LG stat.ML

    Bayesian Reinforcement Learning: A Survey

    Authors: Mohammad Ghavamzadeh, Shie Mannor, Joelle Pineau, Aviv Tamar

    Abstract: Bayesian methods for machine learning have been widely investigated, yielding principled methods for incorporating prior information into inference algorithms. In this survey, we provide an in-depth review of the role of Bayesian methods for the reinforcement learning (RL) paradigm. The major incentives for incorporating Bayesian reasoning in RL are: 1) it provides an elegant approach to action-se…

    Submitted 14 September, 2016; originally announced September 2016.

    Journal ref: Foundations and Trends in Machine Learning, Vol. 8: No. 5-6, pp 359-492, 2015

  17. arXiv:1602.02867  [pdf, other]

    cs.AI cs.LG cs.NE stat.ML

    Value Iteration Networks

    Authors: Aviv Tamar, Yi Wu, Garrett Thomas, Sergey Levine, Pieter Abbeel

    Abstract: We introduce the value iteration network (VIN): a fully differentiable neural network with a 'planning module' embedded within. VINs can learn to plan, and are suitable for predicting outcomes that involve planning-based reasoning, such as policies for reinforcement learning. Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented as a…

    Submitted 20 March, 2017; v1 submitted 9 February, 2016; originally announced February 2016.

    Comments: Fixed missing table values

    Journal ref: Advances in Neural Information Processing Systems 29 pages 2154--2162, 2016

  18. arXiv:1509.05172  [pdf, ps, other]

    stat.ML cs.LG

    Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis

    Authors: Assaf Hallak, Aviv Tamar, Remi Munos, Shie Mannor

    Abstract: We consider the off-policy evaluation problem in Markov decision processes with function approximation. We propose a generalization of the recently introduced \emph{emphatic temporal differences} (ETD) algorithm \citep{SuttonMW15}, which encompasses the original ETD($λ$), as well as several other off-policy evaluation algorithms as special cases. We call this framework \ETD, where our introduced p…

    Submitted 27 November, 2015; v1 submitted 17 September, 2015; originally announced September 2015.

    Comments: arXiv admin note: text overlap with arXiv:1508.03411

  19. arXiv:1508.03411  [pdf, ps, other]

    stat.ML cs.LG

    Emphatic TD Bellman Operator is a Contraction

    Authors: Assaf Hallak, Aviv Tamar, Shie Mannor

    Abstract: Recently, \citet{SuttonMW15} introduced the emphatic temporal differences (ETD) algorithm for off-policy evaluation in Markov decision processes. In this short note, we show that the projected fixed-point equation that underlies ETD involves a contraction operator, with a $\sqrt{γ}$-contraction modulus (where $γ$ is the discount factor). This allows us to provide error bounds on the approximation er…

    Submitted 23 August, 2015; v1 submitted 13 August, 2015; originally announced August 2015.

  20. arXiv:1502.03919  [pdf, other]

    cs.AI cs.LG stat.ML

    Policy Gradient for Coherent Risk Measures

    Authors: Aviv Tamar, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor

    Abstract: Several authors have recently developed risk-sensitive policy gradient methods that augment the standard expected cost minimization problem with a measure of variability in cost. These studies have focused on specific risk-measures, such as the variance or conditional value at risk (CVaR). In this work, we extend the policy gradient method to the whole class of coherent risk measures, which is wid…

    Submitted 8 June, 2015; v1 submitted 13 February, 2015; originally announced February 2015.

  21. arXiv:1412.6734  [pdf, other]

    stat.ML cs.LG

    Implicit Temporal Differences

    Authors: Aviv Tamar, Panos Toulis, Shie Mannor, Edoardo M. Airoldi

    Abstract: In reinforcement learning, the TD($λ$) algorithm is a fundamental policy evaluation method with an efficient online implementation that is suitable for large-scale problems. One practical drawback of TD($λ$) is its sensitivity to the choice of the step-size. It is an empirically well-known fact that a large step-size leads to fast convergence, at the cost of higher variance and risk of instability…

    Submitted 21 December, 2014; originally announced December 2014.

  22. arXiv:1404.3862  [pdf, other]

    stat.ML cs.AI cs.LG

    Optimizing the CVaR via Sampling

    Authors: Aviv Tamar, Yonatan Glassner, Shie Mannor

    Abstract: Conditional Value at Risk (CVaR) is a prominent risk measure that is being used extensively in various domains. We develop a new formula for the gradient of the CVaR in the form of a conditional expectation. Based on this formula, we propose a novel sampling-based estimator for the CVaR gradient, in the spirit of the likelihood-ratio method. We analyze the bias of the estimator, and prove the conv…

    Submitted 22 November, 2014; v1 submitted 15 April, 2014; originally announced April 2014.

    Comments: To appear in AAAI 2015

  23. arXiv:1310.3697  [pdf, ps, other]

    stat.ML cs.LG eess.SY

    Variance Adjusted Actor Critic Algorithms

    Authors: Aviv Tamar, Shie Mannor

    Abstract: We present an actor-critic framework for MDPs where the objective is the variance-adjusted expected return. Our critic uses linear function approximation, and we extend the concept of compatible features to the variance-adjusted setting. We present an episodic actor-critic algorithm and show that it converges almost surely to a locally optimal point of the objective function.

    Submitted 14 October, 2013; originally announced October 2013.

  24. arXiv:1306.6189  [pdf, other]

    cs.LG stat.ML

    Scaling Up Robust MDPs by Reinforcement Learning

    Authors: Aviv Tamar, Huan Xu, Shie Mannor

    Abstract: We consider large-scale Markov decision processes (MDPs) with parameter uncertainty, under the robust MDP paradigm. Previous studies showed that robust MDPs, based on a minimax approach to handle uncertainty, can be solved using dynamic programming for small to medium sized problems. However, due to the "curse of dimensionality", MDPs that model real-life problems are typically prohibitively large…

    Submitted 26 June, 2013; originally announced June 2013.

  25. arXiv:1301.0104  [pdf, other]

    cs.LG stat.ML

    Policy Evaluation with Variance Related Risk Criteria in Markov Decision Processes

    Authors: Aviv Tamar, Dotan Di Castro, Shie Mannor

    Abstract: In this paper we extend temporal difference policy evaluation algorithms to performance criteria that include the variance of the cumulative reward. Such criteria are useful for risk management, and are important in domains such as finance and process control. We propose both TD(0) and LSTD(lambda) variants with linear function approximation, prove their convergence, and demonstrate their utility…

    Submitted 1 January, 2013; originally announced January 2013.

    Journal ref: JMLR Workshop and Conference Proceedings 28 (3): 495-503, 2013

  26. arXiv:1206.6404  [pdf]

    cs.LG cs.CY math.OC stat.ML

    Policy Gradients with Variance Related Risk Criteria

    Authors: Dotan Di Castro, Aviv Tamar, Shie Mannor

    Abstract: Managing risk in dynamic decision problems is of cardinal importance in many fields such as finance and process control. The most common approach to defining risk is through various variance related criteria such as the Sharpe Ratio or the standard deviation adjusted reward. It is known that optimizing many of the variance related risk criteria is NP-hard. In this paper we devise a framework for l…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)