Showing 1–3 of 3 results for author: Banjac, G

  1. arXiv:2201.00039  [pdf, ps, other]

    cs.LG math.OC

    Stochastic convex optimization for provably efficient apprenticeship learning

    Authors: Angeliki Kamoutsi, Goran Banjac, John Lygeros

    Abstract: We consider large-scale Markov decision processes (MDPs) with an unknown cost function and employ stochastic convex optimization tools to address the problem of imitation learning, which consists of learning a policy from a finite set of expert demonstrations. We adopt the apprenticeship learning formalism, which carries the assumption that the true cost function can be represented as a linear c…

    Submitted 31 December, 2021; originally announced January 2022.

    Comments: arXiv admin note: text overlap with arXiv:2112.14004

    Journal ref: Optimization Foundations for Reinforcement Learning Workshop at NeurIPS 2019, Vancouver, Canada

  2. arXiv:2112.14004  [pdf, ps, other]

    cs.LG cs.AI math.OC

    Efficient Performance Bounds for Primal-Dual Reinforcement Learning from Demonstrations

    Authors: Angeliki Kamoutsi, Goran Banjac, John Lygeros

    Abstract: We consider large-scale Markov decision processes with an unknown cost function and address the problem of learning a policy from a finite set of expert demonstrations. We assume that the learner is not allowed to interact with the expert and has no access to reinforcement signal of any kind. Existing inverse reinforcement learning methods come with strong theoretical guarantees, but are computati…

    Submitted 28 December, 2021; originally announced December 2021.

    Journal ref: International Conference on Machine Learning (ICML) 2021

  3. arXiv:2107.10847  [pdf, other]

    cs.LG math.OC

    Accelerating Quadratic Optimization with Reinforcement Learning

    Authors: Jeffrey Ichnowski, Paras Jain, Bartolomeo Stellato, Goran Banjac, Michael Luo, Francesco Borrelli, Joseph E. Gonzalez, Ion Stoica, Ken Goldberg

    Abstract: First-order methods for quadratic optimization such as OSQP are widely used for large-scale machine learning and embedded optimal control, where many related problems must be rapidly solved. These methods face two persistent challenges: manual hyperparameter tuning and convergence time to high-accuracy solutions. To address these, we explore how Reinforcement Learning (RL) can learn a policy to tu…

    Submitted 22 July, 2021; originally announced July 2021.

    Comments: 25 pages, 7 figures. Code available at https://github.com/berkeleyautomation/rlqp