Showing 1–7 of 7 results for author: Imagawa, T

Searching in archive cs.
  1. arXiv:2305.14377  [pdf, other]

    cs.LG cs.AI cs.RO

    Unsupervised Discovery of Continuous Skills on a Sphere

    Authors: Takahisa Imagawa, Takuya Hiraoka, Yoshimasa Tsuruoka

    Abstract: Recently, methods for learning diverse skills to generate various behaviors without external rewards have been actively studied as a form of unsupervised reinforcement learning. However, most of the existing methods learn a finite number of discrete skills, and thus the variety of behaviors that can be exhibited with the learned skills is limited. In this paper, we propose a novel method for learn…

    Submitted 25 May, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: 14 pages, 12 figures

  2. arXiv:2110.02034  [pdf, other]

    cs.LG cs.AI

    Dropout Q-Functions for Doubly Efficient Reinforcement Learning

    Authors: Takuya Hiraoka, Takahisa Imagawa, Taisei Hashimoto, Takashi Onishi, Yoshimasa Tsuruoka

    Abstract: Randomized ensembled double Q-learning (REDQ) (Chen et al., 2021b) has recently achieved state-of-the-art sample efficiency on continuous-action reinforcement learning benchmarks. This superior sample efficiency is made possible by using a large Q-function ensemble. However, REDQ is much less computationally efficient than non-ensemble counterparts such as Soft Actor-Critic (SAC) (Haarnoja et al.,…

    Submitted 16 March, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

    Comments: ICLR 2022. Source code: https://github.com/TakuyaHiraoka/Dropout-Q-Functions-for-Doubly-Efficient-Reinforcement-Learning Poster: https://drive.google.com/file/d/1_JSuwlUsMjzo6zRaAIcXXj3__AmOvu2t/view?usp=sharing Slides: https://drive.google.com/file/d/1ecq9SQ2KSNpfeblCkr6TYPz5gRk_Y4S8/view?usp=sharing

  3. arXiv:2101.01883  [pdf, other]

    cs.AI

    Off-Policy Meta-Reinforcement Learning Based on Feature Embedding Spaces

    Authors: Takahisa Imagawa, Takuya Hiraoka, Yoshimasa Tsuruoka

    Abstract: Meta-reinforcement learning (RL) addresses the problem of sample inefficiency in deep RL by using experience obtained in past tasks for a new task to be solved. However, most meta-RL methods require partially or fully on-policy data, i.e., they cannot reuse the data collected by past policies, which hinders the improvement of sample efficiency. To alleviate this problem, we propose a novel off…

    Submitted 6 January, 2021; originally announced January 2021.

    Comments: 14 pages

  4. arXiv:2006.02608  [pdf, ps, other]

    cs.LG stat.ML

    Meta-Model-Based Meta-Policy Optimization

    Authors: Takuya Hiraoka, Takahisa Imagawa, Voot Tangkaratt, Takayuki Osa, Takashi Onishi, Yoshimasa Tsuruoka

    Abstract: Model-based meta-reinforcement learning (RL) methods have recently been shown to be a promising approach to improving the sample efficiency of RL in multi-task settings. However, the theoretical understanding of those methods is yet to be established, and there is currently no theoretical guarantee of their performance in a real-world environment. In this paper, we analyze the performance guarante…

    Submitted 11 October, 2021; v1 submitted 3 June, 2020; originally announced June 2020.

    Comments: ACML 2021. Video demo: https://drive.google.com/file/d/1DRA-pmIWnHGNv5G_gFrml8YzKCtMcGnu/view?usp=sharing Source code: https://github.com/TakuyaHiraoka/Meta-Model-Based-Meta-Policy-Optimization

  5. arXiv:1906.11075  [pdf, other]

    cs.LG cs.AI

    Optimistic Proximal Policy Optimization

    Authors: Takahisa Imagawa, Takuya Hiraoka, Yoshimasa Tsuruoka

    Abstract: Reinforcement Learning, a machine learning framework for training an autonomous agent based on rewards, has shown outstanding results in various domains. However, it is known that learning a good policy is difficult in a domain where rewards are rare. We propose a method, optimistic proximal policy optimization (OPPO) to alleviate this difficulty. OPPO considers the uncertainty of the estimated to…

    Submitted 25 June, 2019; originally announced June 2019.

    Comments: Exploration in RL (workshop @ ICML2019)

  6. arXiv:1905.09191  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Learning Robust Options by Conditional Value at Risk Optimization

    Authors: Takuya Hiraoka, Takahisa Imagawa, Tatsuya Mori, Takashi Onishi, Yoshimasa Tsuruoka

    Abstract: Options are generally learned by using an inaccurate environment model (or simulator), which contains uncertain model parameters. While there are several methods to learn options that are robust against the uncertainty of model parameters, these methods only consider either the worst case or the average (ordinary) case for learning options. This limited consideration of the cases often produces op…

    Submitted 31 October, 2019; v1 submitted 22 May, 2019; originally announced May 2019.

    Comments: NeurIPS 2019. Video demo: https://drive.google.com/open?id=1xXgSeEa_nNG397ZkIayk3CwYPy_BPy8X Source code: https://github.com/TakuyaHiraoka/Learning-Robust-Options-by-Conditional-Value-at-Risk-Optimization

  7. arXiv:1810.00177  [pdf, ps, other]

    cs.AI

    Refining Manually-Designed Symbol Grounding and High-Level Planning by Policy Gradients

    Authors: Takuya Hiraoka, Takashi Onishi, Takahisa Imagawa, Yoshimasa Tsuruoka

    Abstract: Hierarchical planners that produce interpretable and appropriate plans are desired, especially in its application to supporting human decision making. In the typical development of the hierarchical planners, higher-level planners and symbol grounding functions are manually created, and this manual creation requires much human effort. In this paper, we propose a framework that can automatically ref…

    Submitted 29 September, 2018; originally announced October 2018.

    Comments: Presented at the IJCAI-ICAI 2018 workshop on Learning & Reasoning (L&R 2018)