Showing 1–1 of 1 results for author: Pike-Burke, C

  1. arXiv:2302.11381  [pdf, other]

    math.OC cs.LG math.ST

    Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes

    Authors: Emmeran Johnson, Ciara Pike-Burke, Patrick Rebeschini

    Abstract: Policy Mirror Descent (PMD) is a general family of algorithms that covers a wide range of novel and fundamental methods in reinforcement learning. Motivated by the instability of policy iteration (PI) with inexact policy evaluation, PMD algorithmically regularises the policy improvement step of PI. With exact policy evaluation, PI is known to converge linearly with a rate given by the discount fac…

    Submitted 21 November, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

    Comments: Accepted at NeurIPS 2023