Showing 1–2 of 2 results for author: Kubíček, O

Search v0.5.6 released 2020-02-24

arXiv:2312.15220 [pdf, other]

cs.GT

Look-ahead Search on Top of Policy Networks in Imperfect Information Games

Authors: Ondrej Kubicek, Neil Burch, Viliam Lisy

Abstract: Search in test time is often used to improve the performance of reinforcement learning algorithms. Performing theoretically sound search in fully adversarial two-player games with imperfect information is notoriously difficult and requires a complicated training process. We present a method for adding test-time search to an arbitrary policy-gradient algorithm that learns from sampled trajectories.… ▽ More Search in test time is often used to improve the performance of reinforcement learning algorithms. Performing theoretically sound search in fully adversarial two-player games with imperfect information is notoriously difficult and requires a complicated training process. We present a method for adding test-time search to an arbitrary policy-gradient algorithm that learns from sampled trajectories. Besides the policy network, the algorithm trains an additional critic network, which estimates the expected values of players following various transformations of the policies given by the policy network. These values are then used for depth-limited search. We show how the values from this critic can create a value function for imperfect information games. Moreover, they can be used to compute the summary statistics necessary to start the search from an arbitrary decision point in the game. The presented algorithm is scalable to very large games since it does not require any search during train time. We evaluate the algorithm's performance when trained along Regularized Nash Dynamics, and we evaluate the benefit of using the search in the standard benchmark game of Leduc hold'em, multiple variants of imperfect information Goofspiel, and Battleships. △ Less

Submitted 29 January, 2025; v1 submitted 23 December, 2023; originally announced December 2023.
arXiv:2112.12594 [pdf, other]

cs.GT

Continual Depth-limited Responses for Computing Counter-strategies in Sequential Games

Authors: David Milec, Ondřej Kubíček, Viliam Lisý

Abstract: In zero-sum games, the optimal strategy is well-defined by the Nash equilibrium. However, it is overly conservative when playing against suboptimal opponents and it can not exploit their weaknesses. Limited look-ahead game solving in imperfect-information games allows defeating human experts in massive real-world games such as Poker, Liar's Dice, and Scotland Yard. However, since they approximate… ▽ More In zero-sum games, the optimal strategy is well-defined by the Nash equilibrium. However, it is overly conservative when playing against suboptimal opponents and it can not exploit their weaknesses. Limited look-ahead game solving in imperfect-information games allows defeating human experts in massive real-world games such as Poker, Liar's Dice, and Scotland Yard. However, since they approximate Nash equilibrium, they tend to only win slightly against weak opponents. We propose methods combining limited look-ahead solving with an opponent model in order to 1) approximate a best response in large games or 2) compute a robust response with control over the robustness of the response. Both methods can compute the response in real time to previously unseen strategies. We present theoretical guarantees of our methods. We show that existing robust response methods do not work combined with limited look-ahead solving of the shelf, and we propose a novel solution for the issue. Our algorithm performs significantly better than multiple baselines in smaller games and outperforms state-of-the-art methods against SlumBot. △ Less

Submitted 3 April, 2024; v1 submitted 23 December, 2021; originally announced December 2021.

Comments: 16 pages, 15 figures

Search v0.5.6 released 2020-02-24