Showing 1–1 of 1 results for author: Moon, S B

  1. arXiv:2405.02188  [pdf, other]

    stat.ML cs.AI cs.LG

    Optimistic Regret Bounds for Online Learning in Adversarial Markov Decision Processes

    Authors: Sang Bin Moon, Abolfazl Hashemi

    Abstract: The Adversarial Markov Decision Process (AMDP) is a learning framework that deals with unknown and varying tasks in decision-making applications like robotics and recommendation systems. A major limitation of the AMDP formalism, however, is that its regret analysis yields pessimistic results: although the cost function can change from one episode to the next, the evolution in many settings is not…

    Submitted 3 May, 2024; originally announced May 2024.