Showing 1–3 of 3 results for author: Gimelfarb, M

Searching in archive eess.
  1. arXiv:2401.12243 [pdf, other]

    math.OC cs.LG cs.RO cs.SC eess.SY

    Constraint-Generation Policy Optimization (CGPO): Nonlinear Programming for Policy Optimization in Mixed Discrete-Continuous MDPs

    Authors: Michael Gimelfarb, Ayal Taitler, Scott Sanner

    Abstract: We propose the Constraint-Generation Policy Optimization (CGPO) framework to optimize policy parameters within compact and interpretable policy classes for mixed discrete-continuous Markov Decision Processes (DC-MDP). CGPO can not only provide bounded policy error guarantees over an infinite range of initial states for many DC-MDPs with expressive nonlinear dynamics, but it can also provably deriv…

    Submitted 14 March, 2025; v1 submitted 20 January, 2024; originally announced January 2024.

    Comments: Published in the 22nd International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research

  2. arXiv:2305.07844 [pdf, other]

    eess.SY cs.LG

    Thompson Sampling for Parameterized Markov Decision Processes with Uninformative Actions

    Authors: Michael Gimelfarb, Michael Jong Kim

    Abstract: We study parameterized MDPs (PMDPs) in which the key parameters of interest are unknown and must be learned using Bayesian inference. One key defining feature of such models is the presence of "uninformative" actions that provide no information about the unknown parameters. We contribute a set of assumptions for PMDPs under which Thompson sampling guarantees an asymptotically optimal expected regr…

    Submitted 13 May, 2023; originally announced May 2023.

  3. arXiv:2003.00203 [pdf, other]

    cs.LG cs.NE cs.RO eess.SY

    Contextual Policy Transfer in Reinforcement Learning Domains via Deep Mixtures-of-Experts

    Authors: Michael Gimelfarb, Scott Sanner, Chi-Guhn Lee

    Abstract: In reinforcement learning, agents that consider the context, or current state, when selecting source policies for transfer have been shown to outperform context-free approaches. However, none of the existing approaches transfer knowledge contextually from model-based learners to a model-free learner. This could be useful, for instance, when source policies are intentionally learned on diverse simu…

    Submitted 10 June, 2020; v1 submitted 29 February, 2020; originally announced March 2020.

    Comments: updated experiment for Lander domain (fixed a bug in the UCB baseline); minor editing and formatting, fixing typos; new template; 15 pages, 6 figures