-
Generalized Proof-Number Monte-Carlo Tree Search
Authors:
Jakub Kowalski,
Dennis J. N. J. Soemers,
Szymon Kosakowski,
Mark H. M. Winands
Abstract:
This paper presents Generalized Proof-Number Monte-Carlo Tree Search: a generalization of recently proposed combinations of Proof-Number Search (PNS) with Monte-Carlo Tree Search (MCTS), which use (dis)proof numbers to bias UCB1-based Selection strategies towards parts of the search that are expected to be easily (dis)proven. We propose three core modifications of prior combinations of PNS with MC…
▽ More
This paper presents Generalized Proof-Number Monte-Carlo Tree Search: a generalization of recently proposed combinations of Proof-Number Search (PNS) with Monte-Carlo Tree Search (MCTS), which use (dis)proof numbers to bias UCB1-based Selection strategies towards parts of the search that are expected to be easily (dis)proven. We propose three core modifications of prior combinations of PNS with MCTS. First, we track proof numbers per player. This reduces code complexity in the sense that we no longer need disproof numbers, and generalizes the technique to be applicable to games with more than two players. Second, we propose and extensively evaluate different methods of using proof numbers to bias the selection strategy, achieving strong performance with strategies that are simpler to implement and compute. Third, we merge our technique with Score Bounded MCTS, enabling the algorithm to prove and leverage upper and lower bounds on scores - as opposed to only proving wins or not-wins. Experiments demonstrate substantial performance increases, reaching the range of 80% for 8 out of the 11 tested board games.
△ Less
Submitted 16 June, 2025;
originally announced June 2025.
-
Towards Explaining Monte-Carlo Tree Search by Using Its Enhancements
Authors:
Jakub Kowalski,
Mark H. M. Winands,
Maksymilian Wiśniewski,
Stanisław Reda,
Anna Wilbik
Abstract:
Typically, research on Explainable Artificial Intelligence (XAI) focuses on black-box models within the context of a general policy in a known, specific domain. This paper advocates for the need for knowledge-agnostic explainability applied to the subfield of XAI called Explainable Search, which focuses on explaining the choices made by intelligent search techniques. It proposes Monte-Carlo Tree S…
▽ More
Typically, research on Explainable Artificial Intelligence (XAI) focuses on black-box models within the context of a general policy in a known, specific domain. This paper advocates for the need for knowledge-agnostic explainability applied to the subfield of XAI called Explainable Search, which focuses on explaining the choices made by intelligent search techniques. It proposes Monte-Carlo Tree Search (MCTS) enhancements as a solution to obtaining additional data and providing higher-quality explanations while remaining knowledge-free, and analyzes the most popular enhancements in terms of the specific types of explainability they introduce. So far, no other research has considered the explainability of MCTS enhancements. We present a proof-of-concept that demonstrates the advantages of utilizing enhancements.
△ Less
Submitted 16 June, 2025;
originally announced June 2025.
-
Quantum Circuit Design using a Progressive Widening Enhanced Monte Carlo Tree Search
Authors:
Vincenzo Lipardi,
Domenica Dibenedetto,
Georgios Stamoulis,
Mark H. M. Winands
Abstract:
The performance of Variational Quantum Algorithms (VQAs) strongly depends on the choice of the parameterized quantum circuit to optimize. One of the biggest challenges in VQAs is designing quantum circuits tailored to the particular problem. This article proposes a gradient-free Monte Carlo Tree Search (MCTS) technique to automate the process of quantum circuit design. Our proposed technique intro…
▽ More
The performance of Variational Quantum Algorithms (VQAs) strongly depends on the choice of the parameterized quantum circuit to optimize. One of the biggest challenges in VQAs is designing quantum circuits tailored to the particular problem. This article proposes a gradient-free Monte Carlo Tree Search (MCTS) technique to automate the process of quantum circuit design. Our proposed technique introduces a novel formulation of the action space based on a sampling scheme and a progressive widening technique to explore the space dynamically. When testing our MCTS approach on the domain of random quantum circuits, MCTS approximates unstructured circuits under different values of stabilizer Rényi entropy. It turns out that MCTS manages to approximate the benchmark quantum states independently from their degree of nonstabilizerness. Next, our technique exhibits robustness across various application domains, including quantum chemistry and systems of linear equations. Compared to previous MCTS research, our technique reduces the number of quantum circuit evaluations by a factor of 10 up to 100 while achieving equal or better results. In addition, the resulting quantum circuits exhibit up to three times fewer CNOT gates, which is important for implementation on noisy quantum hardware.
△ Less
Submitted 28 April, 2025; v1 submitted 6 February, 2025;
originally announced February 2025.
-
Environment Descriptions for Usability and Generalisation in Reinforcement Learning
Authors:
Dennis J. N. J. Soemers,
Spyridon Samothrakis,
Kurt Driessens,
Mark H. M. Winands
Abstract:
The majority of current reinforcement learning (RL) research involves training and deploying agents in environments that are implemented by engineers in general-purpose programming languages and more advanced frameworks such as CUDA or JAX. This makes the application of RL to novel problems of interest inaccessible to small organisations or private individuals with insufficient engineering experti…
▽ More
The majority of current reinforcement learning (RL) research involves training and deploying agents in environments that are implemented by engineers in general-purpose programming languages and more advanced frameworks such as CUDA or JAX. This makes the application of RL to novel problems of interest inaccessible to small organisations or private individuals with insufficient engineering expertise. This position paper argues that, to enable more widespread adoption of RL, it is important for the research community to shift focus towards methodologies where environments are described in user-friendly domain-specific or natural languages. Aside from improving the usability of RL, such language-based environment descriptions may also provide valuable context and boost the ability of trained agents to generalise to unseen environments within the set of all environments that can be described in any language of choice.
△ Less
Submitted 22 December, 2024;
originally announced December 2024.
-
Anytime Sequential Halving in Monte-Carlo Tree Search
Authors:
Dominic Sagers,
Mark H. M. Winands,
Dennis J. N. J. Soemers
Abstract:
Monte-Carlo Tree Search (MCTS) typically uses multi-armed bandit (MAB) strategies designed to minimize cumulative regret, such as UCB1, as its selection strategy. However, in the root node of the search tree, it is more sensible to minimize simple regret. Previous work has proposed using Sequential Halving as selection strategy in the root node, as, in theory, it performs better with respect to si…
▽ More
Monte-Carlo Tree Search (MCTS) typically uses multi-armed bandit (MAB) strategies designed to minimize cumulative regret, such as UCB1, as its selection strategy. However, in the root node of the search tree, it is more sensible to minimize simple regret. Previous work has proposed using Sequential Halving as selection strategy in the root node, as, in theory, it performs better with respect to simple regret. However, Sequential Halving requires a budget of iterations to be predetermined, which is often impractical. This paper proposes an anytime version of the algorithm, which can be halted at any arbitrary time and still return a satisfactory result, while being designed such that it approximates the behavior of Sequential Halving. Empirical results in synthetic MAB problems and ten different board games demonstrate that the algorithm's performance is competitive with Sequential Halving and UCB1 (and their analogues in MCTS).
△ Less
Submitted 11 November, 2024;
originally announced November 2024.
-
Enhancements for Real-Time Monte-Carlo Tree Search in General Video Game Playing
Authors:
Dennis J. N. J. Soemers,
Chiara F. Sironi,
Torsten Schuster,
Mark H. M. Winands
Abstract:
General Video Game Playing (GVGP) is a field of Artificial Intelligence where agents play a variety of real-time video games that are unknown in advance. This limits the use of domain-specific heuristics. Monte-Carlo Tree Search (MCTS) is a search technique for game playing that does not rely on domain-specific knowledge. This paper discusses eight enhancements for MCTS in GVGP; Progressive Histor…
▽ More
General Video Game Playing (GVGP) is a field of Artificial Intelligence where agents play a variety of real-time video games that are unknown in advance. This limits the use of domain-specific heuristics. Monte-Carlo Tree Search (MCTS) is a search technique for game playing that does not rely on domain-specific knowledge. This paper discusses eight enhancements for MCTS in GVGP; Progressive History, N-Gram Selection Technique, Tree Reuse, Breadth-First Tree Initialization, Loss Avoidance, Novelty-Based Pruning, Knowledge-Based Evaluations, and Deterministic Game Detection. Some of these are known from existing literature, and are either extended or introduced in the context of GVGP, and some are novel enhancements for MCTS. Most enhancements are shown to provide statistically significant increases in win percentages when applied individually. When combined, they increase the average win percentage over sixty different games from 31.0% to 48.4% in comparison to a vanilla MCTS implementation, approaching a level that is competitive with the best agents of the GVG-AI competition in 2015.
△ Less
Submitted 3 July, 2024;
originally announced July 2024.
-
Towards a Characterisation of Monte-Carlo Tree Search Performance in Different Games
Authors:
Dennis J. N. J. Soemers,
Guillaume Bams,
Max Persoon,
Marco Rietjens,
Dimitar Sladić,
Stefan Stefanov,
Kurt Driessens,
Mark H. M. Winands
Abstract:
Many enhancements to Monte-Carlo Tree Search (MCTS) have been proposed over almost two decades of general game playing and other artificial intelligence research. However, our ability to characterise and understand which variants work well or poorly in which games is still lacking. This paper describes work on an initial dataset that we have built to make progress towards such an understanding: 26…
▽ More
Many enhancements to Monte-Carlo Tree Search (MCTS) have been proposed over almost two decades of general game playing and other artificial intelligence research. However, our ability to characterise and understand which variants work well or poorly in which games is still lacking. This paper describes work on an initial dataset that we have built to make progress towards such an understanding: 268,386 plays among 61 different agents across 1494 distinct games. We describe a preliminary analysis and work on training predictive models on this dataset, as well as lessons learned and future plans for a new and improved version of the dataset.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
Proof Number Based Monte-Carlo Tree Search
Authors:
Jakub Kowalski,
Elliot Doe,
Mark H. M. Winands,
Daniel Górski,
Dennis J. N. J. Soemers
Abstract:
This paper proposes a new game-search algorithm, PN-MCTS, which combines Monte-Carlo Tree Search (MCTS) and Proof-Number Search (PNS). These two algorithms have been successfully applied for decision making in a range of domains. We define three areas where the additional knowledge provided by the proof and disproof numbers gathered in MCTS trees might be used: final move selection, solving subtre…
▽ More
This paper proposes a new game-search algorithm, PN-MCTS, which combines Monte-Carlo Tree Search (MCTS) and Proof-Number Search (PNS). These two algorithms have been successfully applied for decision making in a range of domains. We define three areas where the additional knowledge provided by the proof and disproof numbers gathered in MCTS trees might be used: final move selection, solving subtrees, and the UCB1 selection mechanism. We test all possible combinations on different time settings, playing against vanilla UCT on several games: Lines of Action ($7$$\times$$7$ and $8$$\times$$8$ board sizes), MiniShogi, Knightthrough, and Awari. Furthermore, we extend this new algorithm to properly address games with draws, like Awari, by adding an additional layer of PNS on top of the MCTS tree. The experiments show that PN-MCTS is able to outperform MCTS in all tested game domains, achieving win rates up to 96.2% for Lines of Action.
△ Less
Submitted 29 May, 2024; v1 submitted 16 March, 2023;
originally announced March 2023.
-
Monte-Carlo Tree-Search for Leveraging Performance of Blackbox Job-Shop Scheduling Heuristics
Authors:
Florian Wimmenauer,
Matúš Mihalák,
Mark H. M. Winands
Abstract:
In manufacturing, the production is often done on out-of-the-shelf manufacturing lines, whose underlying scheduling heuristics are not known due to the intellectual property. We consider such a setting with a black-box job-shop system and an unknown scheduling heuristic that, for a given permutation of jobs, schedules the jobs for the black-box job-shop with the goal of minimizing the makespan. He…
▽ More
In manufacturing, the production is often done on out-of-the-shelf manufacturing lines, whose underlying scheduling heuristics are not known due to the intellectual property. We consider such a setting with a black-box job-shop system and an unknown scheduling heuristic that, for a given permutation of jobs, schedules the jobs for the black-box job-shop with the goal of minimizing the makespan. Here, the jobs need to enter the job-shop in the given order of the permutation, but may take different paths within the job shop, which depends on the black-box heuristic. The performance of the black-box heuristic depends on the order of the jobs, and the natural problem for the manufacturer is to find an optimum ordering of the jobs.
Facing a real-world scenario as described above, we engineer the Monte-Carlo tree-search for finding a close-to-optimum ordering of jobs. To cope with a large solutions-space in planning scenarios, a hierarchical Monte-Carlo tree search (H-MCTS) is proposed based on abstraction of jobs. On synthetic and real-life problems, H-MCTS with integrated abstraction significantly outperforms pure heuristic-based techniques as well as other Monte-Carlo search variants. We furthermore show that, by modifying the evaluation metric in H-MCTS, it is possible to achieve other optimization objectives than what the scheduling heuristics are designed for -- e.g., minimizing the total completion time instead of the makespan. Our experimental observations have been also validated in real-life cases, and our H-MCTS approach has been implemented in a production plant's controller.
△ Less
Submitted 14 December, 2022;
originally announced December 2022.
-
Combining Monte-Carlo Tree Search with Proof-Number Search
Authors:
Elliot Doe,
Mark H. M. Winands,
Dennis J. N. J. Soemers,
Cameron Browne
Abstract:
Proof-Number Search (PNS) and Monte-Carlo Tree Search (MCTS) have been successfully applied for decision making in a range of games. This paper proposes a new approach called PN-MCTS that combines these two tree-search methods by incorporating the concept of proof and disproof numbers into the UCT formula of MCTS. Experimental results demonstrate that PN-MCTS outperforms basic MCTS in several game…
▽ More
Proof-Number Search (PNS) and Monte-Carlo Tree Search (MCTS) have been successfully applied for decision making in a range of games. This paper proposes a new approach called PN-MCTS that combines these two tree-search methods by incorporating the concept of proof and disproof numbers into the UCT formula of MCTS. Experimental results demonstrate that PN-MCTS outperforms basic MCTS in several games including Lines of Action, MiniShogi, Knightthrough, and Awari, achieving win rates up to 94.0%.
△ Less
Submitted 8 June, 2022;
originally announced June 2022.
-
Split Moves for Monte-Carlo Tree Search
Authors:
Jakub Kowalski,
Maksymilian Mika,
Wojciech Pawlik,
Jakub Sutowicz,
Marek Szykuła,
Mark H. M. Winands
Abstract:
In many games, moves consist of several decisions made by the player. These decisions can be viewed as separate moves, which is already a common practice in multi-action games for efficiency reasons. Such division of a player move into a sequence of simpler / lower level moves is called \emph{splitting}. So far, split moves have been applied only in forementioned straightforward cases, and further…
▽ More
In many games, moves consist of several decisions made by the player. These decisions can be viewed as separate moves, which is already a common practice in multi-action games for efficiency reasons. Such division of a player move into a sequence of simpler / lower level moves is called \emph{splitting}. So far, split moves have been applied only in forementioned straightforward cases, and furthermore, there was almost no study revealing its impact on agents' playing strength. Taking the knowledge-free perspective, we aim to answer how to effectively use split moves within Monte-Carlo Tree Search (MCTS) and what is the practical impact of split design on agents' strength. This paper proposes a generalization of MCTS that works with arbitrarily split moves. We design several variations of the algorithm and try to measure the impact of split moves separately on efficiency, quality of MCTS, simulations, and action-based heuristics. The tests are carried out on a set of board games and performed using the Regular Boardgames General Game Playing formalism, where split strategies of different granularity can be automatically derived based on an abstract description of the game. The results give an overview of the behavior of agents using split design in different ways. We conclude that split design can be greatly beneficial for single- as well as multi-action games.
△ Less
Submitted 14 December, 2021;
originally announced December 2021.
-
Service Selection using Predictive Models and Monte-Carlo Tree Search
Authors:
Cliff Laschet,
Jorn op den Buijs,
Mark H. M. Winands,
Steffen Pauws
Abstract:
This article proposes a method for automated service selection to improve treatment efficacy and reduce re-hospitalization costs. A predictive model is developed using the National Home and Hospice Care Survey (NHHCS) dataset to quantify the effect of care services on the risk of re-hospitalization. By taking the patient's characteristics and other selected services into account, the model is able…
▽ More
This article proposes a method for automated service selection to improve treatment efficacy and reduce re-hospitalization costs. A predictive model is developed using the National Home and Hospice Care Survey (NHHCS) dataset to quantify the effect of care services on the risk of re-hospitalization. By taking the patient's characteristics and other selected services into account, the model is able to indicate the overall effectiveness of a combination of services for a specific NHHCS patient. The developed model is incorporated in Monte-Carlo Tree Search (MCTS) to determine optimal combinations of services that minimize the risk of emergency re-hospitalization. MCTS serves as a risk minimization algorithm in this case, using the predictive model for guidance during the search. Using this method on the NHHCS dataset, a significant reduction in risk of re-hospitalization is observed compared to the original selections made by clinicians. An 11.89 percentage points risk reduction is achieved on average. Higher reductions of roughly 40 percentage points on average are observed for NHHCS patients in the highest risk categories. These results seem to indicate that there is enormous potential for improving service selection in the near future.
△ Less
Submitted 12 February, 2020;
originally announced February 2020.
-
Foundations of Digital Archæoludology
Authors:
Cameron Browne,
Dennis J. N. J. Soemers,
Éric Piette,
Matthew Stephenson,
Michael Conrad,
Walter Crist,
Thierry Depaulis,
Eddie Duggan,
Fred Horn,
Steven Kelk,
Simon M. Lucas,
João Pedro Neto,
David Parlett,
Abdallah Saffidine,
Ulrich Schädler,
Jorge Nuno Silva,
Alex de Voogt,
Mark H. M. Winands
Abstract:
Digital Archaeoludology (DAL) is a new field of study involving the analysis and reconstruction of ancient games from incomplete descriptions and archaeological evidence using modern computational techniques. The aim is to provide digital tools and methods to help game historians and other researchers better understand traditional games, their development throughout recorded human history, and the…
▽ More
Digital Archaeoludology (DAL) is a new field of study involving the analysis and reconstruction of ancient games from incomplete descriptions and archaeological evidence using modern computational techniques. The aim is to provide digital tools and methods to help game historians and other researchers better understand traditional games, their development throughout recorded human history, and their relationship to the development of human culture and mathematical knowledge. This work is being explored in the ERC-funded Digital Ludeme Project.
The aim of this inaugural international research meeting on DAL is to gather together leading experts in relevant disciplines - computer science, artificial intelligence, machine learning, computational phylogenetics, mathematics, history, archaeology, anthropology, etc. - to discuss the key themes and establish the foundations for this new field of research, so that it may continue beyond the lifetime of its initiating project.
△ Less
Submitted 31 May, 2019;
originally announced May 2019.
-
Ludii -- The Ludemic General Game System
Authors:
Éric Piette,
Dennis J. N. J. Soemers,
Matthew Stephenson,
Chiara F. Sironi,
Mark H. M. Winands,
Cameron Browne
Abstract:
While current General Game Playing (GGP) systems facilitate useful research in Artificial Intelligence (AI) for game-playing, they are often somewhat specialised and computationally inefficient. In this paper, we describe the "ludemic" general game system Ludii, which has the potential to provide an efficient tool for AI researchers as well as game designers, historians, educators and practitioner…
▽ More
While current General Game Playing (GGP) systems facilitate useful research in Artificial Intelligence (AI) for game-playing, they are often somewhat specialised and computationally inefficient. In this paper, we describe the "ludemic" general game system Ludii, which has the potential to provide an efficient tool for AI researchers as well as game designers, historians, educators and practitioners in related fields. Ludii defines games as structures of ludemes -- high-level, easily understandable game concepts -- which allows for concise and human-understandable game descriptions. We formally describe Ludii and outline its main benefits: generality, extensibility, understandability and efficiency. Experimentally, Ludii outperforms one of the most efficient Game Description Language (GDL) reasoners, based on a propositional network, in all games available in the Tiltyard GGP repository. Moreover, Ludii is also competitive in terms of performance with the more recently proposed Regular Boardgames (RBG) system, and has various advantages in qualitative aspects such as generality.
△ Less
Submitted 21 February, 2020; v1 submitted 13 May, 2019;
originally announced May 2019.
-
Monte Carlo Tree Search with Heuristic Evaluations using Implicit Minimax Backups
Authors:
Marc Lanctot,
Mark H. M. Winands,
Tom Pepels,
Nathan R. Sturtevant
Abstract:
Monte Carlo Tree Search (MCTS) has improved the performance of game engines in domains such as Go, Hex, and general game playing. MCTS has been shown to outperform classic alpha-beta search in games where good heuristic evaluations are difficult to obtain. In recent years, combining ideas from traditional minimax search in MCTS has been shown to be advantageous in some domains, such as Lines of Ac…
▽ More
Monte Carlo Tree Search (MCTS) has improved the performance of game engines in domains such as Go, Hex, and general game playing. MCTS has been shown to outperform classic alpha-beta search in games where good heuristic evaluations are difficult to obtain. In recent years, combining ideas from traditional minimax search in MCTS has been shown to be advantageous in some domains, such as Lines of Action, Amazons, and Breakthrough. In this paper, we propose a new way to use heuristic evaluations to guide the MCTS search by storing the two sources of information, estimated win rates and heuristic evaluations, separately. Rather than using the heuristic evaluations to replace the playouts, our technique backs them up implicitly during the MCTS simulations. These minimax values are then used to guide future simulations. We show that using implicit minimax backups leads to stronger play performance in Kalah, Breakthrough, and Lines of Action.
△ Less
Submitted 19 June, 2014; v1 submitted 2 June, 2014;
originally announced June 2014.
-
Monte Carlo *-Minimax Search
Authors:
Marc Lanctot,
Abdallah Saffidine,
Joel Veness,
Christopher Archibald,
Mark H. M. Winands
Abstract:
This paper introduces Monte Carlo *-Minimax Search (MCMS), a Monte Carlo search algorithm for turned-based, stochastic, two-player, zero-sum games of perfect information. The algorithm is designed for the class of of densely stochastic games; that is, games where one would rarely expect to sample the same successor state multiple times at any particular chance node. Our approach combines sparse sa…
▽ More
This paper introduces Monte Carlo *-Minimax Search (MCMS), a Monte Carlo search algorithm for turned-based, stochastic, two-player, zero-sum games of perfect information. The algorithm is designed for the class of of densely stochastic games; that is, games where one would rarely expect to sample the same successor state multiple times at any particular chance node. Our approach combines sparse sampling techniques from MDP planning with classic pruning techniques developed for adversarial expectimax planning. We compare and contrast our algorithm to the traditional *-Minimax approaches, as well as MCTS enhanced with the Double Progressive Widening, on four games: Pig, EinStein Würfelt Nicht!, Can't Stop, and Ra. Our results show that MCMS can be competitive with enhanced MCTS variants in some domains, while consistently outperforming the equivalent classic approaches given the same amount of thinking time.
△ Less
Submitted 22 April, 2013;
originally announced April 2013.