-
Model-based Adversarial Meta-Reinforcement Learning
Authors:
Zichuan Lin,
Garrett Thomas,
Guangwen Yang,
Tengyu Ma
Abstract:
Meta-reinforcement learning (meta-RL) aims to learn from multiple training tasks the ability to adapt efficiently to unseen test tasks. Despite the success, existing meta-RL algorithms are known to be sensitive to the task distribution shift. When the test task distribution is different from the training task distribution, the performance may degrade significantly. To address this issue, this paper proposes Model-based Adversarial Meta-Reinforcement Learning (AdMRL), where we aim to minimize the worst-case sub-optimality gap -- the difference between the optimal return and the return that the algorithm achieves after adaptation -- across all tasks in a family of tasks, with a model-based approach. We propose a minimax objective and optimize it by alternating between learning the dynamics model on a fixed task and finding the adversarial task for the current model -- the task for which the policy induced by the model is maximally suboptimal. Assuming the family of tasks is parameterized, we derive a formula for the gradient of the suboptimality with respect to the task parameters via the implicit function theorem, and show how the gradient estimator can be efficiently implemented by the conjugate gradient method and a novel use of the REINFORCE estimator. We evaluate our approach on several continuous control benchmarks and demonstrate its efficacy in the worst-case performance over all tasks, the generalization power to out-of-distribution tasks, and in training and test time sample efficiency, over existing state-of-the-art meta-RL algorithms.
Submitted 27 February, 2021; v1 submitted 15 June, 2020;
originally announced June 2020.
-
MOPO: Model-based Offline Policy Optimization
Authors:
Tianhe Yu,
Garrett Thomas,
Lantao Yu,
Stefano Ermon,
James Zou,
Sergey Levine,
Chelsea Finn,
Tengyu Ma
Abstract:
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data. This problem setting offers the promise of utilizing such datasets to acquire policies without any costly or dangerous active exploration. However, it is also challenging, due to the distributional shift between the offline training data and those states visited by the learned policy. Despite significant recent progress, the most successful prior methods are model-free and constrain the policy to the support of data, precluding generalization to unseen states. In this paper, we first observe that an existing model-based RL algorithm already produces significant gains in the offline setting compared to model-free approaches. However, standard model-based RL methods, designed for the online setting, do not provide an explicit mechanism to avoid the offline setting's distributional shift issue. Instead, we propose to modify the existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics. We theoretically show that the algorithm maximizes a lower bound of the policy's return under the true MDP. We also characterize the trade-off between the gain and risk of leaving the support of the batch data. Our algorithm, Model-based Offline Policy Optimization (MOPO), outperforms standard model-based RL algorithms and prior state-of-the-art model-free offline RL algorithms on existing offline RL benchmarks and two challenging continuous control tasks that require generalizing from data collected for a different task. The code is available at https://github.com/tianheyu927/mopo.
Submitted 22 November, 2020; v1 submitted 27 May, 2020;
originally announced May 2020.
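The reward penalty described in the MOPO abstract can be illustrated with a minimal sketch. The abstract does not say how the dynamics uncertainty is estimated, so this example assumes disagreement across an ensemble of learned dynamics models; the function name `penalized_reward` and the parameter `penalty_coef` are hypothetical and not taken from the paper or its code.

```python
import numpy as np

def penalized_reward(reward, ensemble_next_states, penalty_coef=1.0):
    """Subtract an uncertainty penalty from model-predicted rewards.

    reward: array of shape (batch,)
    ensemble_next_states: array of shape (n_models, batch, state_dim),
        next-state predictions from each ensemble member (assumed setup).
    penalty_coef: weight on the uncertainty penalty (hypothetical name).
    """
    preds = np.asarray(ensemble_next_states, dtype=float)
    # Assumed uncertainty estimate: standard deviation across ensemble
    # members, reduced to one scalar per transition with a Euclidean norm.
    uncertainty = np.linalg.norm(preds.std(axis=0), axis=-1)
    return np.asarray(reward, dtype=float) - penalty_coef * uncertainty
```

Where the ensemble members agree, the reward is left unchanged; where they disagree, the reward is reduced, which discourages the policy from exploiting regions the data does not cover.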
-
LRP2020: Astrostatistics in Canada
Authors:
Gwendolyn Eadie,
Arash Bahramian,
Pauline Barmby,
Radu Craiu,
Derek Bingham,
Renée Hložek,
JJ Kavelaars,
David Stenning,
Samantha Benincasa,
Guillaume Thomas,
Karun Thanjavur,
Jo Bovy,
Jan Cami,
Ray Carlberg,
Sam Lawler,
Adrian Liu,
Henry Ngo,
Mubdi Rahman,
Michael Rupen
Abstract:
(Abridged from Executive Summary) This white paper focuses on the interdisciplinary fields of astrostatistics and astroinformatics, in which modern statistical and computational methods are applied to and developed for astronomical data. Astrostatistics and astroinformatics have grown dramatically in the past ten years, with international organizations, societies, conferences, workshops, and summer schools becoming the norm. Canada's formal role in astrostatistics and astroinformatics has been relatively limited, but there is a great opportunity and necessity for growth in this area. We conducted a survey of astronomers in Canada to gain information on the training mechanisms through which we learn statistical methods and to identify areas for improvement. In general, the results of our survey indicate that while astronomers see statistical methods as critically important for their research, they lack focused training in this area and wish they had received more formal training during all stages of education and professional development. These findings inform our recommendations for the LRP2020 on how to increase interdisciplinary connections between astronomy and statistics at the institutional, national, and international levels over the next ten years. We recommend specific, actionable ways to increase these connections, and discuss how interdisciplinary work can benefit not only research but also astronomy's role in training Highly Qualified Personnel (HQP) in Canada.
Submitted 19 October, 2019;
originally announced October 2019.
-
A Model-based Approach for Sample-efficient Multi-task Reinforcement Learning
Authors:
Nicholas C. Landolfi,
Garrett Thomas,
Tengyu Ma
Abstract:
The aim of multi-task reinforcement learning is two-fold: (1) efficiently learn by training against multiple tasks and (2) quickly adapt, using limited samples, to a variety of new tasks. In this work, the tasks correspond to reward functions for environments with the same (or similar) dynamical models. We propose to learn a dynamical model during the training process and use this model to perform sample-efficient adaptation to new tasks at test time. We use significantly fewer samples by performing policy optimization only in a "virtual" environment whose transitions are given by our learned dynamical model. Our algorithm sequentially trains against several tasks. Upon encountering a new task, we first warm-up a policy on our learned dynamical model, which requires no new samples from the environment. We then adapt the dynamical model with samples from this policy in the real environment. We evaluate our approach on several continuous control benchmarks and demonstrate its efficacy over MAML, a state-of-the-art meta-learning algorithm, on these tasks.
Submitted 3 November, 2019; v1 submitted 10 July, 2019;
originally announced July 2019.
-
NeuNetS: An Automated Synthesis Engine for Neural Network Design
Authors:
Atin Sood,
Benjamin Elder,
Benjamin Herta,
Chao Xue,
Costas Bekas,
A. Cristiano I. Malossi,
Debashish Saha,
Florian Scheidegger,
Ganesh Venkataraman,
Gegi Thomas,
Giovanni Mariani,
Hendrik Strobelt,
Horst Samulowitz,
Martin Wistuba,
Matteo Manica,
Mihir Choudhury,
Rong Yan,
Roxana Istrate,
Ruchir Puri,
Tejaswini Pedapati
Abstract:
Application of neural networks to a vast variety of practical applications is transforming the way AI is applied in practice. Pre-trained neural network models available through APIs, or the capability to custom train pre-built neural network architectures with customer data, have made the consumption of AI by developers much simpler and resulted in broad adoption of these complex AI models. While prebuilt network models exist for certain scenarios, to try and meet the constraints that are unique to each application, AI teams need to think about developing custom neural network architectures that can meet the tradeoff between accuracy and memory footprint to achieve the tight constraints of their unique use-cases. However, only a small proportion of data science teams have the skills and experience needed to create a neural network from scratch, and the demand far exceeds the supply. In this paper, we present NeuNetS: an automated Neural Network Synthesis engine for custom neural network design that is available as part of IBM's AI OpenScale product. NeuNetS is available for both Text and Image domains and can build neural networks for specific tasks in a fraction of the time it takes today with human effort, and with accuracy similar to that of human-designed AI models.
Submitted 16 January, 2019;
originally announced January 2019.
-
Knowing what you know in brain segmentation using Bayesian deep neural networks
Authors:
Patrick McClure,
Nao Rho,
John A. Lee,
Jakub R. Kaczmarzyk,
Charles Zheng,
Satrajit S. Ghosh,
Dylan Nielson,
Adam G. Thomas,
Peter Bandettini,
Francisco Pereira
Abstract:
In this paper, we describe a Bayesian deep neural network (DNN) for predicting FreeSurfer segmentations of structural MRI volumes, in minutes rather than hours. The network was trained and evaluated on a large dataset (n = 11,480), obtained by combining data from more than a hundred different sites, and also evaluated on another completely held-out dataset (n = 418). The network was trained using a novel spike-and-slab dropout-based variational inference approach. We show that, on these datasets, the proposed Bayesian DNN outperforms previously proposed methods, in terms of the similarity between the segmentation predictions and the FreeSurfer labels, and the usefulness of the estimate uncertainty of these predictions. In particular, we demonstrated that the prediction uncertainty of this network at each voxel is a good indicator of whether the network has made an error and that the uncertainty across the whole brain can predict the manual quality control ratings of a scan. The proposed Bayesian DNN method should be applicable to any new network architecture for addressing the segmentation problem.
Submitted 18 September, 2019; v1 submitted 3 December, 2018;
originally announced December 2018.
-
Analysis of a longitudinal multilevel experiment using GAMLSSs
Authors:
Gustavo Thomas,
Alexandre Igor de Azevedo Pereira,
Cristian Marcelo Villegas Lobos,
Clarice G. B. Demétrio
Abstract:
The standard procedures for analysing hierarchical or grouped data are (non)linear mixed models or generalized mixed models. However, the generalized additive models for location, scale and shape (GAMLSSs) also allow different types of random effects to be included in the model formulation. Even though already popular in many areas of research, this type of model has not yet been used for mixed modeling purposes. Therefore, this paper describes the analysis of a plant growth experiment using mixed GAMLSSs, comparing it to a linear mixed model approach.
Submitted 7 October, 2018;
originally announced October 2018.
-
Modeling data with zero inflation and overdispersion using GAMLSSs
Authors:
Gustavo Thomas,
Luiz R. Nakamura,
Rafael A. Moral,
Clarice G. B. Demétrio
Abstract:
Count data with high frequencies of zeros are found in many areas, especially in biology. Statistical models to analyze such data started to be developed in the 1980s and are still a topic of active research. Such models usually assume a response distribution that belongs to the exponential family of distributions, and the analysis is performed under the generalized linear models framework. However, the generalized additive models for location, scale and shape (GAMLSSs) represent a more general class of univariate models that can also be used to model zero-inflated data. In this paper, the analysis of a data set with excess zeros and overdispersion is described using GAMLSSs. Specific GAMLSS tools were used in the analysis, which enhanced model comparison and eased the interpretation of results.
Submitted 5 October, 2018;
originally announced October 2018.
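As a small illustration of what "zero inflation" means in the abstract above, the sketch below evaluates the probability mass function of a zero-inflated Poisson distribution, one standard model for counts with excess zeros. It is not the GAMLSS analysis from the paper; the function name `zip_pmf` and its parameter names are hypothetical.

```python
import math

def zip_pmf(k, rate, zero_prob):
    """Probability of count k under a zero-inflated Poisson model.

    With probability zero_prob the count is a structural zero;
    otherwise it is drawn from a Poisson distribution with mean rate.
    """
    poisson = math.exp(-rate) * rate**k / math.factorial(k)
    if k == 0:
        # Zeros come from both the structural part and the Poisson part.
        return zero_prob + (1 - zero_prob) * poisson
    return (1 - zero_prob) * poisson
```

Setting `zero_prob` to 0 recovers the ordinary Poisson distribution, so the extra parameter only adds mass at zero.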
-
Value Iteration Networks
Authors:
Aviv Tamar,
Yi Wu,
Garrett Thomas,
Sergey Levine,
Pieter Abbeel
Abstract:
We introduce the value iteration network (VIN): a fully differentiable neural network with a 'planning module' embedded within. VINs can learn to plan, and are suitable for predicting outcomes that involve planning-based reasoning, such as policies for reinforcement learning. Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented as a convolutional neural network, and trained end-to-end using standard backpropagation. We evaluate VIN-based policies on discrete and continuous path-planning domains, and on a natural-language-based search task. We show that by learning an explicit planning computation, VIN policies generalize better to new, unseen domains.
Submitted 20 March, 2017; v1 submitted 9 February, 2016;
originally announced February 2016.
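To show why value iteration maps naturally onto convolution-like operations, here is a minimal sketch of classical (non-learned) value iteration on a deterministic grid world. It is not the VIN architecture itself: the function name `grid_value_iteration`, the four-move action set, and the per-cell reward map are assumptions made for illustration.

```python
import numpy as np

def grid_value_iteration(reward, gamma=0.9, iterations=50):
    """Value iteration on a 2-D grid with moves up, down, left, right.

    reward: 2-D array giving the reward for occupying each cell.
    Moving into a wall leaves the agent in place (edge padding).
    """
    reward = np.asarray(reward, dtype=float)
    value = np.zeros_like(reward)
    for _ in range(iterations):
        # Edge padding makes an off-grid neighbor equal to the cell itself.
        padded = np.pad(value, 1, mode="edge")
        neighbors = np.stack([
            padded[:-2, 1:-1],   # value of the cell above
            padded[2:, 1:-1],    # value of the cell below
            padded[1:-1, :-2],   # value of the cell to the left
            padded[1:-1, 2:],    # value of the cell to the right
        ])
        # Bellman backup: local neighborhood lookup, then max over actions.
        value = reward + gamma * neighbors.max(axis=0)
    return value
```

Each backup reads a fixed local neighborhood of the value map and then takes a maximum over actions, which is the same access pattern as a convolution followed by channel-wise max pooling.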