
Showing 1–8 of 8 results for author: Chang, H S

Searching in archive math.
  1. arXiv:2402.07063  [pdf, ps, other]

    math.OC

    On the Convergence Rate of MCTS for the Optimal Value Estimation in Markov Decision Processes

    Authors: Hyeong Soo Chang

    Abstract: A recent theoretical analysis of a Monte-Carlo tree search (MCTS) method, properly modified from the "upper confidence bound applied to trees" (UCT) algorithm, established a result that is surprising in light of the many empirical successes reported in the literature from heuristic use of UCT, with relevant adjustments, in various problem domains: that its rate of convergence of the expected absolute e…

    Submitted 1 February, 2025; v1 submitted 10 February, 2024; originally announced February 2024.

  2. arXiv:2401.08845  [pdf, ps, other]

    math.OC

    Top Feasible-Arm Subset Identification in Constrained Multi-Armed Bandit with Limited Budget

    Authors: Hyeong Soo Chang

    Abstract: We present an algorithm, "constrained successive accept or reject (CSAR)," for the problem of identifying the subset of top feasible arms from a given finite set of arms under a limited sampling budget equal to a given time horizon, when the sequential dynamics of the arms follow the model of a constrained multi-armed bandit. We provide a finite-time upper bound on the probability of the incorrec…

    Submitted 21 January, 2025; v1 submitted 16 January, 2024; originally announced January 2024.

  3. arXiv:2308.03297  [pdf, ps, other]

    math.OC

    Approximate Constrained Discounted Dynamic Programming with Uniform Feasibility and Optimality

    Authors: Hyeong Soo Chang

    Abstract: We consider a dynamic programming (DP) approach to approximately solving an infinite-horizon constrained Markov decision process (CMDP) problem with a fixed initial state under the expected total discounted-reward criterion, subject to a uniform-feasibility constraint on the expected total discounted cost, over the set of deterministic, history-independent, stationary policies. We derive a DP-equation that recur…

    Submitted 7 August, 2023; originally announced August 2023.

  4. arXiv:2206.01860  [pdf, ps, other]

    math.OC

    On Supervised On-line Rolling-Horizon Control for Infinite-Horizon Discounted Markov Decision Processes

    Authors: Hyeong Soo Chang

    Abstract: This note revisits the rolling-horizon control approach to the problem of a Markov decision process (MDP) under the infinite-horizon discounted expected-reward criterion. In contrast to the classical value-iteration approach, we develop an asynchronous on-line algorithm based on policy iteration integrated with a multi-policy improvement method of policy switching. A sequence of monotonically impr…

    Submitted 3 June, 2022; originally announced June 2022.

  5. arXiv:2112.02177  [pdf, ps, other]

    math.OC

    On-line Policy Iteration with Policy Switching for Markov Decision Processes

    Authors: Hyeong Soo Chang

    Abstract: Motivated by Bertsekas' recent study on policy iteration (PI) for solving infinite-horizon discounted Markov decision processes (MDPs) in an on-line setting, we develop an off-line PI integrated with a multi-policy improvement method of policy switching and then adapt its asynchronous variant into an on-line PI algorithm that generates a sequence of policies over time. The current p…

    Submitted 3 December, 2021; originally announced December 2021.

  6. arXiv:2007.14550  [pdf, ps, other]

    math.OC cs.LG stat.ML

    An Index-based Deterministic Asymptotically Optimal Algorithm for Constrained Multi-armed Bandit Problems

    Authors: Hyeong Soo Chang

    Abstract: For the constrained multi-armed bandit model, we show by construction that there exists an index-based deterministic asymptotically optimal algorithm. Asymptotic optimality here means that the probability of choosing an optimal feasible arm converges to one over an infinite horizon. The algorithm is built upon Locatelli et al.'s "anytime parameter-free thresholding" algorithm under the assumption…

    Submitted 28 July, 2020; originally announced July 2020.

  7. arXiv:1805.01237  [pdf, ps, other]

    math.OC cs.LG stat.ML

    An Asymptotically Optimal Strategy for Constrained Multi-armed Bandit Problems

    Authors: Hyeong Soo Chang

    Abstract: For the stochastic multi-armed bandit (MAB) problem under a constrained model that generalizes the classical one, we show that asymptotic optimality is achievable by a simple strategy extended from the $ε_t$-greedy strategy. We provide a finite-time lower bound on the probability of correct selection of an optimal near-feasible arm that holds for all time steps. Under some conditions, the bound…

    Submitted 3 May, 2018; originally announced May 2018.

  8. arXiv:1412.4898  [pdf, ps, other]

    math.OC

    Sleeping Experts and Bandits Approach to Constrained Markov Decision Processes

    Authors: Hyeong Soo Chang

    Abstract: This brief paper presents simple simulation-based algorithms for obtaining an approximately optimal policy from a given finite policy set in large finite constrained Markov decision processes. The algorithms are adapted from playing strategies for the "sleeping experts and bandits" problem, and their computational complexities are independent of the state and action space sizes if the given policy set is relatively…

    Submitted 16 December, 2014; originally announced December 2014.
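Entries 4 and 5 (arXiv:2206.01860 and arXiv:2112.02177) both use policy switching as a multi-policy improvement step. As a rough illustration only, the Python sketch below assumes the usual definition of policy switching (at each state, follow whichever candidate policy has the largest value at that state) on a hypothetical two-state, two-action MDP. It is not taken from either paper; the MDP, the discount factor, and the names `evaluate` and `policy_switch` are illustrative assumptions.

```python
from typing import List, Sequence

# Hypothetical toy MDP (assumption, not from the papers): P[s][a][s2] is the
# probability of moving from s to s2 under action a; R[s][a] is the expected reward.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.05, 0.95]],
]
R = [[1.0, 0.0], [0.0, 2.0]]
GAMMA = 0.9  # assumed discount factor


def evaluate(policy: Sequence[int], tol: float = 1e-10) -> List[float]:
    """Iterative policy evaluation for a deterministic stationary policy."""
    n = len(P)
    v = [0.0] * n
    while True:
        # One application of the policy's Bellman operator.
        new_v = [
            R[s][policy[s]]
            + GAMMA * sum(p * v[t] for t, p in enumerate(P[s][policy[s]]))
            for s in range(n)
        ]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def policy_switch(policies: Sequence[Sequence[int]],
                  values: Sequence[Sequence[float]]) -> List[int]:
    """At each state, take the action of the candidate whose value is largest there."""
    switched = []
    for s in range(len(values[0])):
        best = max(range(len(policies)), key=lambda k: values[k][s])
        switched.append(policies[best][s])
    return switched


# Usage: three hypothetical candidate policies (state -> action lists).
candidates = [[0, 0], [0, 1], [1, 0]]
candidate_values = [evaluate(pi) for pi in candidates]
switched = policy_switch(candidates, candidate_values)
switched_value = evaluate(switched)
print(switched, switched_value)
```

Under this definition, the switched policy's value is at least the pointwise maximum of the candidates' values at every state, which is what makes switching usable as an improvement step over a set of policies.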
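Entry 7 (arXiv:1805.01237) describes a strategy extended from $ε_t$-greedy for a constrained bandit model. The Python sketch below shows one plausible constrained variant, not necessarily the paper's: with probability $ε_t$ it explores uniformly, and otherwise it picks the arm with the highest empirical reward among arms whose empirical cost is within a threshold. The Bernoulli reward/cost model, the schedule $ε_t = \min(1, c/t)$, the fallback used when no arm looks feasible, and all parameter values are assumptions made for illustration.

```python
import random
from typing import List, Sequence


def constrained_eps_greedy(
    reward_means: Sequence[float],
    cost_means: Sequence[float],
    cost_threshold: float,
    horizon: int,
    c: float = 50.0,
    seed: int = 0,
) -> List[int]:
    """Simulate a constrained eps_t-greedy rule on Bernoulli arms; return pull counts."""
    rng = random.Random(seed)
    k = len(reward_means)
    pulls = [0] * k
    reward_sum = [0.0] * k
    cost_sum = [0.0] * k
    for t in range(1, horizon + 1):
        eps_t = min(1.0, c / t)  # assumed decaying exploration schedule
        untried = [i for i in range(k) if pulls[i] == 0]
        if untried:
            arm = untried[0]  # make sure every arm has an estimate
        elif rng.random() < eps_t:
            arm = rng.randrange(k)  # explore uniformly at random
        else:
            mean_r = [reward_sum[i] / pulls[i] for i in range(k)]
            mean_c = [cost_sum[i] / pulls[i] for i in range(k)]
            feasible = [i for i in range(k) if mean_c[i] <= cost_threshold]
            if feasible:
                # Exploit: best empirical reward among empirically feasible arms.
                arm = max(feasible, key=lambda i: mean_r[i])
            else:
                # Hypothetical fallback: least costly arm when none looks feasible.
                arm = min(range(k), key=lambda i: mean_c[i])
        pulls[arm] += 1
        reward_sum[arm] += 1.0 if rng.random() < reward_means[arm] else 0.0
        cost_sum[arm] += 1.0 if rng.random() < cost_means[arm] else 0.0
    return pulls


# Usage: arm 0 has the best reward but violates the cost threshold,
# so arm 1 is the optimal feasible arm in this hypothetical instance.
counts = constrained_eps_greedy(
    reward_means=[0.9, 0.6, 0.3],
    cost_means=[0.8, 0.2, 0.1],
    cost_threshold=0.5,
    horizon=5000,
)
print(counts)
```

Because exploration decays only like $c/t$, every arm keeps being sampled occasionally, so a misestimated cost or reward is eventually corrected and the greedy step concentrates on the optimal feasible arm.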
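Entry 6 (arXiv:2007.14550) builds on Locatelli et al.'s anytime parameter-free thresholding (APT) algorithm. As usually stated, APT pulls each arm once and then repeatedly pulls the arm minimizing $\sqrt{T_i}\,(|\hat\mu_i - \tau| + \epsilon)$, where $T_i$ is the arm's pull count, $\hat\mu_i$ its empirical mean, $\tau$ the threshold, and $\epsilon$ a precision parameter; at the end, arms with $\hat\mu_i \ge \tau$ are labeled as above the threshold. The Python sketch below illustrates only that base rule on hypothetical Bernoulli arms. It is not the constrained index algorithm of entry 6, and the function name `apt` and all parameter values are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


def apt(means: Sequence[float], tau: float, epsilon: float, budget: int,
        seed: int = 0) -> Tuple[List[bool], List[int]]:
    """Sketch of APT on Bernoulli arms; returns (above-threshold labels, pull counts)."""
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    sums = [0.0] * k

    def pull(i: int) -> None:
        # Draw one Bernoulli(means[i]) sample from arm i and record it.
        pulls[i] += 1
        sums[i] += 1.0 if rng.random() < means[i] else 0.0

    for i in range(k):  # initialization: pull every arm once
        pull(i)
    for _ in range(budget - k):
        # APT index sqrt(T_i) * (|mu_hat_i - tau| + epsilon); the smallest is pulled.
        index = [
            math.sqrt(pulls[i]) * (abs(sums[i] / pulls[i] - tau) + epsilon)
            for i in range(k)
        ]
        pull(min(range(k), key=lambda i: index[i]))
    above = [sums[i] / pulls[i] >= tau for i in range(k)]
    return above, pulls


# Usage on a hypothetical instance: only the third arm is truly above tau.
above, pulls = apt(means=[0.2, 0.45, 0.7], tau=0.5, epsilon=0.05, budget=3000)
print(above, pulls)
```

Since the index grows with $\sqrt{T_i}$ but is scaled by the estimated gap to the threshold, arms whose means sit close to $\tau$ keep the smallest index longest and therefore receive most of the budget.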