Showing 1–20 of 20 results for author: Bagnell, J A

  1. arXiv:2205.15397

    cs.LG stat.ML

    Minimax Optimal Online Imitation Learning via Replay Estimation

    Authors: Gokul Swamy, Nived Rajaraman, Matthew Peng, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu, Jiantao Jiao, Kannan Ramchandran

    Abstract: Online imitation learning is the problem of how best to mimic expert demonstrations, given access to the environment or an accurate simulator. Prior work has shown that in the infinite sample regime, exact moment matching achieves value equivalence to the expert policy. However, in the finite sample regime, even if one has no optimization error, empirical variance can lead to a performance gap tha…

    Submitted 14 January, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

  2. arXiv:2103.03236

    cs.LG cs.RO stat.ML

    Of Moments and Matching: A Game-Theoretic Framework for Closing the Imitation Gap

    Authors: Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu

    Abstract: We provide a unifying view of a large family of previous imitation learning algorithms through the lens of moment matching. At its core, our classification scheme is based on whether the learner attempts to match (1) reward or (2) action-value moments of the expert's behavior, with each option leading to differing algorithmic approaches. By considering adversarially chosen divergences between lear…

    Submitted 10 June, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

  3. arXiv:2102.02872

    cs.LG cs.RO stat.ML

    Feedback in Imitation Learning: The Three Regimes of Covariate Shift

    Authors: Jonathan Spencer, Sanjiban Choudhury, Arun Venkatraman, Brian Ziebart, J. Andrew Bagnell

    Abstract: Imitation learning practitioners have often noted that conditioning policies on previous actions leads to a dramatic divergence between "held out" error and performance of the learner in situ. Interactive approaches can provably address this divergence but require repeated querying of a demonstrator. Recent work identifies this divergence as stemming from a "causal confound" in predicting the curr…

    Submitted 11 February, 2021; v1 submitted 4 February, 2021; originally announced February 2021.

  4. arXiv:2004.00500

    cs.LG stat.ML

    Exploration in Action Space

    Authors: Anirudh Vemula, Wen Sun, J. Andrew Bagnell

    Abstract: Parameter space exploration methods with black-box optimization have recently been shown to outperform state-of-the-art approaches in continuous control reinforcement learning domains. In this paper, we examine reasons why these methods work better and the situations in which they are worse than traditional action space exploration methods. Through a simple theoretical analysis, we show that when…

    Submitted 30 March, 2020; originally announced April 2020.

    Comments: Presented at RSS 2018 in Learning and Inference in Robotics: Integrating Structure, Priors and Models workshop. arXiv admin note: text overlap with arXiv:1901.11503

  5. arXiv:1905.10948

    cs.LG stat.ML

    Provably Efficient Imitation Learning from Observation Alone

    Authors: Wen Sun, Anirudh Vemula, Byron Boots, J. Andrew Bagnell

    Abstract: We study Imitation Learning (IL) from Observations alone (ILFO) in large-scale MDPs. While most IL algorithms rely on an expert to directly provide actions to the learner, in this setting the expert only supplies sequences of observations. We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing a…

    Submitted 11 June, 2019; v1 submitted 26 May, 2019; originally announced May 2019.

    Comments: ICML 2019

  6. arXiv:1901.11503

    cs.LG cs.AI cs.RO stat.ML

    Contrasting Exploration in Parameter and Action Space: A Zeroth-Order Optimization Perspective

    Authors: Anirudh Vemula, Wen Sun, J. Andrew Bagnell

    Abstract: Black-box optimizers that explore in parameter space have often been shown to outperform more sophisticated action space exploration methods developed specifically for the reinforcement learning problem. We examine these black-box methods closely to identify situations in which they are worse than action space exploration methods and those in which they are superior. Through simple theoretical ana…

    Submitted 31 January, 2019; originally announced January 2019.

    Comments: Accepted at AISTATS 2019

  7. arXiv:1805.11240

    cs.LG stat.ML

    Truncated Horizon Policy Search: Combining Reinforcement Learning & Imitation Learning

    Authors: Wen Sun, J. Andrew Bagnell, Byron Boots

    Abstract: In this paper, we propose to combine imitation and reinforcement learning via the idea of reward shaping using an oracle. We study the effectiveness of the near-optimal cost-to-go oracle on the planning horizon and demonstrate that the cost-to-go oracle shortens the learner's planning horizon as a function of its accuracy: a globally optimal oracle can shorten the planning horizon to one, leading to…

    Submitted 29 May, 2018; originally announced May 2018.

    Comments: ICLR 2018

  8. arXiv:1805.10755

    cs.LG stat.ML

    Dual Policy Iteration

    Authors: Wen Sun, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell

    Abstract: Recently, a novel class of Approximate Policy Iteration (API) algorithms has demonstrated impressive practical performance (e.g., ExIt from [2], AlphaGo-Zero from [27]). This new family of algorithms maintains, and alternately optimizes, two policies: a fast, reactive policy (e.g., a deep neural network) deployed at test time, and a slow, non-reactive policy (e.g., Tree Search), that can plan mul…

    Submitted 5 April, 2019; v1 submitted 27 May, 2018; originally announced May 2018.

    Comments: NeurIPS 2018; Additional related works

  9. arXiv:1709.08520

    stat.ML cs.LG

    Predictive-State Decoders: Encoding the Future into Recurrent Networks

    Authors: Arun Venkatraman, Nicholas Rhinehart, Wen Sun, Lerrel Pinto, Martial Hebert, Byron Boots, Kris M. Kitani, J. Andrew Bagnell

    Abstract: Recurrent neural networks (RNNs) are a vital modeling technique that rely on internal states learned indirectly by optimization of a supervised, unsupervised, or reinforcement training loss. RNNs are used to model dynamic processes that are characterized by underlying latent states whose form is often unknown, precluding its analytic representation inside an RNN. In the Predictive-State Representa…

    Submitted 25 September, 2017; originally announced September 2017.

    Comments: NIPS 2017

  10. arXiv:1609.08938

    cs.CV stat.ML

    A Discriminative Framework for Anomaly Detection in Large Videos

    Authors: Allison Del Giorno, J. Andrew Bagnell, Martial Hebert

    Abstract: We address an anomaly detection setting in which training sequences are unavailable and anomalies are scored independently of temporal ordering. Current algorithms in anomaly detection are based on the classical density estimation approach of learning high-dimensional models and finding low-probability events. These algorithms are sensitive to the order in which anomalies appear and require either…

    Submitted 28 September, 2016; originally announced September 2016.

    Comments: 14 pages without references, 16 pages with. 7 figures. Accepted to ECCV 2016

  11. arXiv:1406.5979

    cs.LG stat.ML

    Reinforcement and Imitation Learning via Interactive No-Regret Learning

    Authors: Stephane Ross, J. Andrew Bagnell

    Abstract: Recent work has demonstrated that problems, particularly imitation learning and structured prediction, where a learner's predictions influence the input-distribution it is tested on can be naturally addressed by an interactive approach and analyzed using no-regret online learning. These approaches to imitation learning, however, neither require nor benefit from information about the cost of acti…

    Submitted 23 June, 2014; originally announced June 2014.

    Comments: 14 pages. Under review for NIPS 2014 conference

  12. arXiv:1308.3506

    cs.GT cs.LG stat.ML

    Computational Rationalization: The Inverse Equilibrium Problem

    Authors: Kevin Waugh, Brian D. Ziebart, J. Andrew Bagnell

    Abstract: Modeling the purposeful behavior of imperfect agents from a small number of observations is a challenging task. When restricted to the single-agent decision-theoretic setting, inverse optimal control techniques assume that observed behavior is an approximately optimal solution to an unknown decision problem. These techniques learn a utility function that explains the example behavior and can then…

    Submitted 15 August, 2013; originally announced August 2013.

    Comments: In submission to JMLR, conference version: arXiv:1103.5254

  13. arXiv:1305.2532

    cs.LG stat.ML

    Learning Policies for Contextual Submodular Prediction

    Authors: Stephane Ross, Jiaji Zhou, Yisong Yue, Debadeepta Dey, J. Andrew Bagnell

    Abstract: Many prediction domains, such as ad placement, recommendation, trajectory prediction, and document summarization, require predicting a set or list of options. Such lists are often evaluated using submodular reward functions that measure both quality and diversity. We propose a simple, efficient, and provably near-optimal approach to optimizing such prediction problems based on no-regret learning.…

    Submitted 11 May, 2013; originally announced May 2013.

    Comments: 13 pages. To appear in proceedings of the International Conference on Machine Learning (ICML), 2013

  14. arXiv:1301.0556

    cs.LG cs.IR stat.ML

    Learning with Scope, with Application to Information Extraction and Classification

    Authors: David Blei, J Andrew Bagnell, Andrew McCallum

    Abstract: In probabilistic approaches to classification and information extraction, one typically builds a statistical model of words under the assumption that future data will exhibit the same regularities as the training data. In many data sets, however, there are scope-limited features whose predictive power is only applicable to a certain subset of the data. For example, in information extraction from…

    Submitted 12 December, 2012; originally announced January 2013.

    Comments: Appears in Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI2002)

    Report number: UAI-P-2002-PG-53-60

  15. arXiv:1206.5281

    cs.LG stat.ML

    Learning Selectively Conditioned Forest Structures with Applications to DBNs and Classification

    Authors: Brian D. Ziebart, Anind K. Dey, J Andrew Bagnell

    Abstract: Dealing with uncertainty in Bayesian Network structures using maximum a posteriori (MAP) estimation or Bayesian Model Averaging (BMA) is often intractable due to the superexponential number of possible directed, acyclic graphs. When the prior is decomposable, two classes of graphs where efficient learning can take place are tree structures, and fixed-orderings with limited in-degree. We show how M…

    Submitted 20 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI2007)

    Report number: UAI-P-2007-PG-458-465

  16. arXiv:1205.2656

    cs.LG cs.IT stat.ML

    Convex Coding

    Authors: David M. Bradley, J Andrew Bagnell

    Abstract: Inspired by recent work on convex formulations of clustering (Lashkari & Golland, 2008; Nowozin & Bakir, 2008) we investigate a new formulation of the Sparse Coding Problem (Olshausen & Field, 1997). In sparse coding we attempt to simultaneously represent a sequence of data-vectors sparsely (i.e. sparse approximation (Tropp et al., 2006)) in terms of a 'code' defined by a set of basis elements, wh…

    Submitted 9 May, 2012; originally announced May 2012.

    Comments: Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI2009)

    Report number: UAI-P-2009-PG-83-90

  17. arXiv:1203.1007

    cs.LG cs.AI eess.SY stat.ML

    Agnostic System Identification for Model-Based Reinforcement Learning

    Authors: Stephane Ross, J. Andrew Bagnell

    Abstract: A fundamental problem in control is to learn a model of a system from observations that is useful for controller synthesis. To provide good performance guarantees, existing methods must assume that the real system is in the class of models considered during learning. We present an iterative method with strong guarantees even in the agnostic case where the system is not in the class. In particular,…

    Submitted 3 July, 2012; v1 submitted 5 March, 2012; originally announced March 2012.

    Comments: 8 pages, published in ICML 2012

  18. arXiv:1108.3154

    cs.LG stat.ML

    Stability Conditions for Online Learnability

    Authors: Stephane Ross, J. Andrew Bagnell

    Abstract: Stability is a general notion that quantifies the sensitivity of a learning algorithm's output to small changes in the training dataset (e.g. deletion or replacement of a single training sample). Such conditions have recently been shown to be more powerful to characterize learnability in the general learning setting under i.i.d. samples where uniform convergence is not necessary for learnability, b…

    Submitted 17 August, 2011; v1 submitted 16 August, 2011; originally announced August 2011.

    Comments: 16 pages. Earlier version of this work submitted (but rejected) to COLT 2011

  19. arXiv:1105.2054

    cs.LG stat.ML

    Generalized Boosting Algorithms for Convex Optimization

    Authors: Alexander Grubb, J. Andrew Bagnell

    Abstract: Boosting is a popular way to derive powerful learners from simpler hypothesis classes. Following previous work (Mason et al., 1999; Friedman, 2000) on general boosting frameworks, we analyze gradient-based descent algorithms for boosting with respect to any convex objective and introduce a new measure of weak learner performance into this setting which generalizes existing work. We present the wea…

    Submitted 14 February, 2012; v1 submitted 10 May, 2011; originally announced May 2011.

    Comments: Extended version of paper presented at the International Conference on Machine Learning, 2011. 9 pages + appendix with proofs

  20. arXiv:1011.0686

    cs.LG cs.AI stat.ML

    A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

    Authors: Stephane Ross, Geoffrey J. Gordon, J. Andrew Bagnell

    Abstract: Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. This leads to poor performance in theory and often in practice. Some recent approaches provide stronger guarantees in this setting, but remain somewhat unsatisfactory as they train either non-stationary or…

    Submitted 16 March, 2011; v1 submitted 2 November, 2010; originally announced November 2010.

    Comments: Appearing in the 14th International Conference on Artificial Intelligence and Statistics (AISTATS 2011)