-
Predictive Control Using Learned State Space Models via Rolling Horizon Evolution
Authors:
Alvaro Ovalle,
Simon M. Lucas
Abstract:
A large part of the interest in model-based reinforcement learning derives from the potential utility to acquire a forward model capable of strategic long term decision making. Assuming that an agent succeeds in learning a useful predictive model, it still requires a mechanism to harness it to generate and select among competing simulated plans. In this paper, we explore this theme combining evolu…
▽ More
A large part of the interest in model-based reinforcement learning derives from the potential utility to acquire a forward model capable of strategic long term decision making. Assuming that an agent succeeds in learning a useful predictive model, it still requires a mechanism to harness it to generate and select among competing simulated plans. In this paper, we explore this theme combining evolutionary algorithmic planning techniques with models learned via deep learning and variational inference. We demonstrate the approach with an agent that reliably performs online planning in a set of visual navigation tasks.
△ Less
Submitted 25 June, 2021;
originally announced June 2021.
-
Cross-Platform Games in Kotlin
Authors:
Simon M. Lucas
Abstract:
This demo paper describes a simple and practical approach to writing cross-platform casual games using the Kotlin programming language. A key aim is to make it much easier for researchers to demonstrate their AI playing a range of games. Pure Kotlin code (which excludes using any Java graphics libraries) can be transpiled to JavaScript and run in a web browser. However, writing Kotlin code that wi…
▽ More
This demo paper describes a simple and practical approach to writing cross-platform casual games using the Kotlin programming language. A key aim is to make it much easier for researchers to demonstrate their AI playing a range of games. Pure Kotlin code (which excludes using any Java graphics libraries) can be transpiled to JavaScript and run in a web browser. However, writing Kotlin code that will run without modification both in a web browser and on the JVM is not trivial; it requires strict adherence to an appropriate methodology. The contribution of this paper is to provide such a method including a software design and to demonstrate this working for Tetris, played either by AI or human.
△ Less
Submitted 10 August, 2020;
originally announced August 2020.
-
Modulation of viability signals for self-regulatory control
Authors:
Alvaro Ovalle,
Simon M. Lucas
Abstract:
We revisit the role of instrumental value as a driver of adaptive behavior. In active inference, instrumental or extrinsic value is quantified by the information-theoretic surprisal of a set of observations measuring the extent to which those observations conform to prior beliefs or preferences. That is, an agent is expected to seek the type of evidence that is consistent with its own model of the…
▽ More
We revisit the role of instrumental value as a driver of adaptive behavior. In active inference, instrumental or extrinsic value is quantified by the information-theoretic surprisal of a set of observations measuring the extent to which those observations conform to prior beliefs or preferences. That is, an agent is expected to seek the type of evidence that is consistent with its own model of the world. For reinforcement learning tasks, the distribution of preferences replaces the notion of reward. We explore a scenario in which the agent learns this distribution in a self-supervised manner. In particular, we highlight the distinction between observations induced by the environment and those pertaining more directly to the continuity of an agent in time. We evaluate our methodology in a dynamic environment with discrete time and actions. First with a surprisal minimizing model-free agent (in the RL sense) and then expanding to the model-based case to minimize the expected free energy.
△ Less
Submitted 13 October, 2020; v1 submitted 17 July, 2020;
originally announced July 2020.
-
Evaluating Generalisation in General Video Game Playing
Authors:
Martin Balla,
Simon M. Lucas,
Diego Perez-Liebana
Abstract:
The General Video Game Artificial Intelligence (GVGAI) competition has been running for several years with various tracks. This paper focuses on the challenge of the GVGAI learning track in which 3 games are selected and 2 levels are given for training, while 3 hidden levels are left for evaluation. This setup poses a difficult challenge for current Reinforcement Learning (RL) algorithms, as they…
▽ More
The General Video Game Artificial Intelligence (GVGAI) competition has been running for several years with various tracks. This paper focuses on the challenge of the GVGAI learning track in which 3 games are selected and 2 levels are given for training, while 3 hidden levels are left for evaluation. This setup poses a difficult challenge for current Reinforcement Learning (RL) algorithms, as they typically require much more data. This work investigates 3 versions of the Advantage Actor-Critic (A2C) algorithm trained on a maximum of 2 levels from the available 5 from the GVGAI framework and compares their performance on all levels. The selected sub-set of games have different characteristics, like stochasticity, reward distribution and objectives. We found that stochasticity improves the generalisation, but too much can cause the algorithms to fail to learn the training levels. The quality of the training levels also matters, different sets of training levels can boost generalisation over all levels. In the GVGAI competition agents are scored based on their win rates and then their scores achieved in the games. We found that solely using the rewards provided by the game might not encourage winning.
△ Less
Submitted 22 May, 2020;
originally announced May 2020.
-
Bootstrapped model learning and error correction for planning with uncertainty in model-based RL
Authors:
Alvaro Ovalle,
Simon M. Lucas
Abstract:
Having access to a forward model enables the use of planning algorithms such as Monte Carlo Tree Search and Rolling Horizon Evolution. Where a model is unavailable, a natural aim is to learn a model that reflects accurately the dynamics of the environment. In many situations it might not be possible and minimal glitches in the model may lead to poor performance and failure. This paper explores the…
▽ More
Having access to a forward model enables the use of planning algorithms such as Monte Carlo Tree Search and Rolling Horizon Evolution. Where a model is unavailable, a natural aim is to learn a model that reflects accurately the dynamics of the environment. In many situations it might not be possible and minimal glitches in the model may lead to poor performance and failure. This paper explores the problem of model misspecification through uncertainty-aware reinforcement learning agents. We propose a bootstrapped multi-headed neural network that learns the distribution of future states and rewards. We experiment with a number of schemes to extract the most likely predictions. Moreover, we also introduce a global error correction filter that applies high-level constraints guided by the context provided through the predictive distribution. We illustrate our approach on Minipacman. The evaluation demonstrates that when dealing with imperfect models, our methods exhibit increased performance and stability, both in terms of model accuracy and in its use within a planning algorithm.
△ Less
Submitted 15 April, 2020;
originally announced April 2020.
-
Enhanced Rolling Horizon Evolution Algorithm with Opponent Model Learning: Results for the Fighting Game AI Competition
Authors:
Zhentao Tang,
Yuanheng Zhu,
Dongbin Zhao,
Simon M. Lucas
Abstract:
The Fighting Game AI Competition (FTGAIC) provides a challenging benchmark for 2-player video game AI. The challenge arises from the large action space, diverse styles of characters and abilities, and the real-time nature of the game. In this paper, we propose a novel algorithm that combines Rolling Horizon Evolution Algorithm (RHEA) with opponent model learning. The approach is readily applicable…
▽ More
The Fighting Game AI Competition (FTGAIC) provides a challenging benchmark for 2-player video game AI. The challenge arises from the large action space, diverse styles of characters and abilities, and the real-time nature of the game. In this paper, we propose a novel algorithm that combines Rolling Horizon Evolution Algorithm (RHEA) with opponent model learning. The approach is readily applicable to any 2-player video game. In contrast to conventional RHEA, an opponent model is proposed and is optimized by supervised learning with cross-entropy and reinforcement learning with policy gradient and Q-learning respectively, based on history observations from opponent. The model is learned during the live gameplay. With the learned opponent model, the extended RHEA is able to make more realistic plans based on what the opponent is likely to do. This tends to lead to better results. We compared our approach directly with the bots from the FTGAIC 2018 competition, and found our method to significantly outperform all of them, for all three character. Furthermore, our proposed bot with the policy-gradient-based opponent model is the only one without using Monte-Carlo Tree Search (MCTS) among top five bots in the 2019 competition in which it achieved second place, while using much less domain knowledge than the winner.
△ Less
Submitted 31 March, 2020;
originally announced March 2020.
-
Rolling Horizon Evolutionary Algorithms for General Video Game Playing
Authors:
Raluca D. Gaina,
Sam Devlin,
Simon M. Lucas,
Diego Perez-Liebana
Abstract:
Game-playing Evolutionary Algorithms, specifically Rolling Horizon Evolutionary Algorithms, have recently managed to beat the state of the art in win rate across many video games. However, the best results in a game are highly dependent on the specific configuration of modifications and hybrids introduced over several papers, each adding additional parameters to the core algorithm. Further, the be…
▽ More
Game-playing Evolutionary Algorithms, specifically Rolling Horizon Evolutionary Algorithms, have recently managed to beat the state of the art in win rate across many video games. However, the best results in a game are highly dependent on the specific configuration of modifications and hybrids introduced over several papers, each adding additional parameters to the core algorithm. Further, the best previously published parameters have been found from only a few human-picked combinations, as the possibility space has grown beyond exhaustive search. This paper presents the state of the art in Rolling Horizon Evolutionary Algorithms, combining all modifications described in literature, as well as new ones, for a large resultant hybrid. We then use a parameter optimiser, the N-Tuple Bandit Evolutionary Algorithm, to find the best combination of parameters in 20 games from the General Video Game AI Framework. Further, we analyse the algorithm's parameters and some interesting combinations revealed through the optimisation process. Lastly, we find new state of the art solutions on several games by automatically exploring the large parameter space of RHEA.
△ Less
Submitted 24 August, 2020; v1 submitted 27 March, 2020;
originally announced March 2020.
-
Learning Local Forward Models on Unforgiving Games
Authors:
Alexander Dockhorn,
Simon M. Lucas,
Vanessa Volz,
Ivan Bravi,
Raluca D. Gaina,
Diego Perez-Liebana
Abstract:
This paper examines learning approaches for forward models based on local cell transition functions. We provide a formal definition of local forward models for which we propose two basic learning approaches. Our analysis is based on the game Sokoban, where a wrong action can lead to an unsolvable game state. Therefore, an accurate prediction of an action's resulting state is necessary to avoid thi…
▽ More
This paper examines learning approaches for forward models based on local cell transition functions. We provide a formal definition of local forward models for which we propose two basic learning approaches. Our analysis is based on the game Sokoban, where a wrong action can lead to an unsolvable game state. Therefore, an accurate prediction of an action's resulting state is necessary to avoid this scenario.
In contrast to learning the complete state transition function, local forward models allow extracting multiple training examples from a single state transition. In this way, the Hash Set model, as well as the Decision Tree model, quickly learn to predict upcoming state transitions of both the training and the test set. Applying the model using a statistical forward planner showed that the best models can be used to satisfying degree even in cases in which the test levels have not yet been seen.
Our evaluation includes an analysis of various local neighbourhood patterns and sizes to test the learners' capabilities in case too few or too many attributes are extracted, of which the latter has shown do degrade the performance of the model learner.
△ Less
Submitted 1 September, 2019;
originally announced September 2019.
-
Project Thyia: A Forever Gameplayer
Authors:
Raluca D. Gaina,
Simon M. Lucas,
Diego Perez-Liebana
Abstract:
The space of Artificial Intelligence entities is dominated by conversational bots. Some of them fit in our pockets and we take them everywhere we go, or allow them to be a part of human homes. Siri, Alexa, they are recognised as present in our world. But a lot of games research is restricted to existing in the separate realm of software. We enter different worlds when playing games, but those worl…
▽ More
The space of Artificial Intelligence entities is dominated by conversational bots. Some of them fit in our pockets and we take them everywhere we go, or allow them to be a part of human homes. Siri, Alexa, they are recognised as present in our world. But a lot of games research is restricted to existing in the separate realm of software. We enter different worlds when playing games, but those worlds cease to exist once we quit. Similarly, AI game-players are run once on a game (or maybe for longer periods of time, in the case of learning algorithms which need some, still limited, period for training), and they cease to exist once the game ends. But what if they didn't? What if there existed artificial game-players that continuously played games, learned from their experiences and kept getting better? What if they interacted with the real world and us, humans: live-streaming games, chatting with viewers, accepting suggestions for strategies or games to play, forming opinions on popular game titles? In this paper, we introduce the vision behind a new project called Thyia, which focuses around creating a present, continuous, `always-on', interactive game-player.
△ Less
Submitted 10 June, 2019;
originally announced June 2019.
-
Foundations of Digital Archæoludology
Authors:
Cameron Browne,
Dennis J. N. J. Soemers,
Éric Piette,
Matthew Stephenson,
Michael Conrad,
Walter Crist,
Thierry Depaulis,
Eddie Duggan,
Fred Horn,
Steven Kelk,
Simon M. Lucas,
João Pedro Neto,
David Parlett,
Abdallah Saffidine,
Ulrich Schädler,
Jorge Nuno Silva,
Alex de Voogt,
Mark H. M. Winands
Abstract:
Digital Archaeoludology (DAL) is a new field of study involving the analysis and reconstruction of ancient games from incomplete descriptions and archaeological evidence using modern computational techniques. The aim is to provide digital tools and methods to help game historians and other researchers better understand traditional games, their development throughout recorded human history, and the…
▽ More
Digital Archaeoludology (DAL) is a new field of study involving the analysis and reconstruction of ancient games from incomplete descriptions and archaeological evidence using modern computational techniques. The aim is to provide digital tools and methods to help game historians and other researchers better understand traditional games, their development throughout recorded human history, and their relationship to the development of human culture and mathematical knowledge. This work is being explored in the ERC-funded Digital Ludeme Project.
The aim of this inaugural international research meeting on DAL is to gather together leading experts in relevant disciplines - computer science, artificial intelligence, machine learning, computational phylogenetics, mathematics, history, archaeology, anthropology, etc. - to discuss the key themes and establish the foundations for this new field of research, so that it may continue beyond the lifetime of its initiating project.
△ Less
Submitted 31 May, 2019;
originally announced May 2019.
-
Tile Pattern KL-Divergence for Analysing and Evolving Game Levels
Authors:
Simon M. Lucas,
Vanessa Volz
Abstract:
This paper provides a detailed investigation of using the Kullback-Leibler (KL) Divergence as a way to compare and analyse game-levels, and hence to use the measure as the objective function of an evolutionary algorithm to evolve new levels. We describe the benefits of its asymmetry for level analysis and demonstrate how (not surprisingly) the quality of the results depends on the features used. H…
▽ More
This paper provides a detailed investigation of using the Kullback-Leibler (KL) Divergence as a way to compare and analyse game-levels, and hence to use the measure as the objective function of an evolutionary algorithm to evolve new levels. We describe the benefits of its asymmetry for level analysis and demonstrate how (not surprisingly) the quality of the results depends on the features used. Here we use tile-patterns of various sizes as features.
When using the measure for evolution-based level generation, we demonstrate that the choice of variation operator is critical in order to provide an efficient search process, and introduce a novel convolutional mutation operator to facilitate this. We compare the results with alternative generators, including evolving in the latent space of generative adversarial networks, and Wave Function Collapse. The results clearly show the proposed method to provide competitive performance, providing reasonable quality results with very fast training and reasonably fast generation.
△ Less
Submitted 24 April, 2019;
originally announced May 2019.
-
A Local Approach to Forward Model Learning: Results on the Game of Life Game
Authors:
Simon M. Lucas,
Alexander Dockhorn,
Vanessa Volz,
Chris Bamford,
Raluca D. Gaina,
Ivan Bravi,
Diego Perez-Liebana,
Sanaz Mostaghim,
Rudolf Kruse
Abstract:
This paper investigates the effect of learning a forward model on the performance of a statistical forward planning agent. We transform Conway's Game of Life simulation into a single-player game where the objective can be either to preserve as much life as possible or to extinguish all life as quickly as possible.
In order to learn the forward model of the game, we formulate the problem in a nov…
▽ More
This paper investigates the effect of learning a forward model on the performance of a statistical forward planning agent. We transform Conway's Game of Life simulation into a single-player game where the objective can be either to preserve as much life as possible or to extinguish all life as quickly as possible.
In order to learn the forward model of the game, we formulate the problem in a novel way that learns the local cell transition function by creating a set of supervised training data and predicting the next state of each cell in the grid based on its current state and immediate neighbours. Using this method we are able to harvest sufficient data to learn perfect forward models by observing only a few complete state transitions, using either a look-up table, a decision tree or a neural network.
In contrast, learning the complete state transition function is a much harder task and our initial efforts to do this using deep convolutional auto-encoders were less successful.
We also investigate the effects of imperfect learned models on prediction errors and game-playing performance, and show that even models with significant errors can provide good performance.
△ Less
Submitted 29 March, 2019;
originally announced March 2019.
-
Efficient Evolutionary Methods for Game Agent Optimisation: Model-Based is Best
Authors:
Simon M. Lucas,
Jialin Liu,
Ivan Bravi,
Raluca D. Gaina,
John Woodward,
Vanessa Volz,
Diego Perez-Liebana
Abstract:
This paper introduces a simple and fast variant of Planet Wars as a test-bed for statistical planning based Game AI agents, and for noisy hyper-parameter optimisation. Planet Wars is a real-time strategy game with simple rules but complex game-play. The variant introduced in this paper is designed for speed to enable efficient experimentation, and also for a fixed action space to enable practical…
▽ More
This paper introduces a simple and fast variant of Planet Wars as a test-bed for statistical planning based Game AI agents, and for noisy hyper-parameter optimisation. Planet Wars is a real-time strategy game with simple rules but complex game-play. The variant introduced in this paper is designed for speed to enable efficient experimentation, and also for a fixed action space to enable practical inter-operability with General Video Game AI agents. If we treat the game as a win-loss game (which is standard), then this leads to challenging noisy optimisation problems both in tuning agents to play the game, and in tuning game parameters. Here we focus on the problem of tuning an agent, and report results using the recently developed N-Tuple Bandit Evolutionary Algorithm and a number of other optimisers, including Sequential Model-based Algorithm Configuration (SMAC). Results indicate that the N-Tuple Bandit Evolutionary offers competitive performance as well as insight into the effects of combinations of parameter choices.
△ Less
Submitted 3 January, 2019;
originally announced January 2019.
-
Game AI Research with Fast Planet Wars Variants
Authors:
Simon M. Lucas
Abstract:
This paper describes a new implementation of Planet Wars, designed from the outset for Game AI research. The skill-depth of the game makes it a challenge for game-playing agents, and the speed of more than 1 million game ticks per second enables rapid experimentation and prototyping. The parameterised nature of the game together with an interchangeable actuator model make it well suited to automat…
▽ More
This paper describes a new implementation of Planet Wars, designed from the outset for Game AI research. The skill-depth of the game makes it a challenge for game-playing agents, and the speed of more than 1 million game ticks per second enables rapid experimentation and prototyping. The parameterised nature of the game together with an interchangeable actuator model make it well suited to automated game tuning. The game is designed to be fun to play for humans, and is directly playable by General Video Game AI agents.
△ Less
Submitted 22 June, 2018;
originally announced June 2018.
-
Evolving Mario Levels in the Latent Space of a Deep Convolutional Generative Adversarial Network
Authors:
Vanessa Volz,
Jacob Schrum,
Jialin Liu,
Simon M. Lucas,
Adam Smith,
Sebastian Risi
Abstract:
Generative Adversarial Networks (GANs) are a machine learning approach capable of generating novel example outputs across a space of provided training examples. Procedural Content Generation (PCG) of levels for video games could benefit from such models, especially for games where there is a pre-existing corpus of levels to emulate. This paper trains a GAN to generate levels for Super Mario Bros u…
▽ More
Generative Adversarial Networks (GANs) are a machine learning approach capable of generating novel example outputs across a space of provided training examples. Procedural Content Generation (PCG) of levels for video games could benefit from such models, especially for games where there is a pre-existing corpus of levels to emulate. This paper trains a GAN to generate levels for Super Mario Bros using a level from the Video Game Level Corpus. The approach successfully generates a variety of levels similar to one in the original corpus, but is further improved by application of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). Specifically, various fitness functions are used to discover levels within the latent space of the GAN that maximize desired properties. Simple static properties are optimized, such as a given distribution of tile types. Additionally, the champion A* agent from the 2009 Mario AI competition is used to assess whether a level is playable, and how many jumping actions are required to beat it. These fitness functions allow for the discovery of levels that exist within the space of examples designed by experts, and also guide the search towards levels that fulfill one or more specified objectives.
△ Less
Submitted 2 May, 2018;
originally announced May 2018.
-
General Video Game AI: a Multi-Track Framework for Evaluating Agents, Games and Content Generation Algorithms
Authors:
Diego Perez-Liebana,
Jialin Liu,
Ahmed Khalifa,
Raluca D. Gaina,
Julian Togelius,
Simon M. Lucas
Abstract:
General Video Game Playing (GVGP) aims at designing an agent that is capable of playing multiple video games with no human intervention. In 2014, The General Video Game AI (GVGAI) competition framework was created and released with the purpose of providing researchers a common open-source and easy to use platform for testing their AI methods with potentially infinity of games created using Video G…
▽ More
General Video Game Playing (GVGP) aims at designing an agent that is capable of playing multiple video games with no human intervention. In 2014, The General Video Game AI (GVGAI) competition framework was created and released with the purpose of providing researchers a common open-source and easy to use platform for testing their AI methods with potentially infinity of games created using Video Game Description Language (VGDL). The framework has been expanded into several tracks during the last few years to meet the demand of different research directions. The agents are required either to play multiple unknown games with or without access to game simulations, or to design new game levels or rules. This survey paper presents the VGDL, the GVGAI framework, existing tracks, and reviews the wide use of GVGAI framework in research, education and competitions five years after its birth. A future plan of framework improvements is also described.
△ Less
Submitted 22 February, 2019; v1 submitted 28 February, 2018;
originally announced February 2018.
-
The N-Tuple Bandit Evolutionary Algorithm for Game Agent Optimisation
Authors:
Simon M Lucas,
Jialin Liu,
Diego Perez-Liebana
Abstract:
This paper describes the N-Tuple Bandit Evolutionary Algorithm (NTBEA), an optimisation algorithm developed for noisy and expensive discrete (combinatorial) optimisation problems. The algorithm is applied to two game-based hyper-parameter optimisation problems. The N-Tuple system directly models the statistics, approximating the fitness and number of evaluations of each modelled combination of par…
▽ More
This paper describes the N-Tuple Bandit Evolutionary Algorithm (NTBEA), an optimisation algorithm developed for noisy and expensive discrete (combinatorial) optimisation problems. The algorithm is applied to two game-based hyper-parameter optimisation problems. The N-Tuple system directly models the statistics, approximating the fitness and number of evaluations of each modelled combination of parameters. The model is simple, efficient and informative. Results show that the NTBEA significantly outperforms grid search and an estimation of distribution algorithm.
△ Less
Submitted 8 May, 2018; v1 submitted 16 February, 2018;
originally announced February 2018.
-
Efficient Noisy Optimisation with the Sliding Window Compact Genetic Algorithm
Authors:
Simon M. Lucas,
Jialin Liu,
Diego Pérez-Liébana
Abstract:
The compact genetic algorithm is an Estimation of Distribution Algorithm for binary optimisation problems. Unlike the standard Genetic Algorithm, no cross-over or mutation is involved. Instead, the compact Genetic Algorithm uses a virtual population represented as a probability distribution over the set of binary strings. At each optimisation iteration, exactly two individuals are generated by sam…
▽ More
The compact genetic algorithm is an Estimation of Distribution Algorithm for binary optimisation problems. Unlike the standard Genetic Algorithm, no cross-over or mutation is involved. Instead, the compact Genetic Algorithm uses a virtual population represented as a probability distribution over the set of binary strings. At each optimisation iteration, exactly two individuals are generated by sampling from the distribution, and compared exactly once to determine a winner and a loser. The probability distribution is then adjusted to increase the likelihood of generating individuals similar to the winner.
This paper introduces two straightforward variations of the compact Genetic Algorithm, each of which lead to a significant improvement in performance. The main idea is to make better use of each fitness evaluation, by ensuring that each evaluated individual is used in multiple win/loss comparisons. The first variation is to sample $n>2$ individuals at each iteration to make $n(n-1)/2$ comparisons. The second variation only samples one individual at each iteration but keeps a sliding history window of previous individuals to compare with. We evaluate methods on two noisy test problems and show that in each case they significantly outperform the compact Genetic Algorithm, while maintaining the simplicity of the algorithm.
△ Less
Submitted 7 August, 2017;
originally announced August 2017.
-
Evaluating Noisy Optimisation Algorithms: First Hitting Time is Problematic
Authors:
Simon M. Lucas,
Jialin Liu,
Diego Pérez-Liébana
Abstract:
A key part of any evolutionary algorithm is fitness evaluation. When fitness evaluations are corrupted by noise, as happens in many real-world problems as a consequence of various types of uncertainty, a strategy is needed in order to cope with this. Resampling is one of the most common strategies, whereby each solution is evaluated many times in order to reduce the variance of the fitness estimat…
▽ More
A key part of any evolutionary algorithm is fitness evaluation. When fitness evaluations are corrupted by noise, as happens in many real-world problems as a consequence of various types of uncertainty, a strategy is needed in order to cope with this. Resampling is one of the most common strategies, whereby each solution is evaluated many times in order to reduce the variance of the fitness estimates. When evaluating the performance of a noisy optimisation algorithm, a key consideration is the stopping condition for the algorithm. A frequently used stopping condition in runtime analysis, known as "First Hitting Time", is to stop the algorithm as soon as it encounters the optimal solution. However, this is unrealistic for real-world problems, as if the optimal solution were already known, there would be no need to search for it. This paper argues that the use of First Hitting Time, despite being a commonly used approach, is significantly flawed and overestimates the quality of many algorithms in real-world cases, where the optimum is not known in advance and has to be genuinely searched for. A better alternative is to measure the quality of the solution an algorithm returns after a fixed evaluation budget, i.e., to focus on final solution quality. This paper argues that focussing on final solution quality is more realistic and demonstrates cases where the results produced by each algorithm evaluation method lead to very different conclusions regarding the quality of each noisy optimisation algorithm.
△ Less
Submitted 12 July, 2017; v1 submitted 13 June, 2017;
originally announced June 2017.
-
The N-Tuple Bandit Evolutionary Algorithm for Automatic Game Improvement
Authors:
Kamolwan Kunanusont,
Raluca D. Gaina,
Jialin Liu,
Diego Perez-Liebana,
Simon M. Lucas
Abstract:
This paper describes a new evolutionary algorithm that is especially well suited to AI-Assisted Game Design. The approach adopted in this paper is to use observations of AI agents playing the game to estimate the game's quality. Some of best agents for this purpose are General Video Game AI agents, since they can be deployed directly on a new game without game-specific tuning; these agents tend to…
▽ More
This paper describes a new evolutionary algorithm that is especially well suited to AI-Assisted Game Design. The approach adopted in this paper is to use observations of AI agents playing the game to estimate the game's quality. Some of best agents for this purpose are General Video Game AI agents, since they can be deployed directly on a new game without game-specific tuning; these agents tend to be based on stochastic algorithms which give robust but noisy results and tend to be expensive to run. This motivates the main contribution of the paper: the development of the novel N-Tuple Bandit Evolutionary Algorithm, where a model is used to estimate the fitness of unsampled points and a bandit approach is used to balance exploration and exploitation of the search space. Initial results on optimising a Space Battle game variant suggest that the algorithm offers far more robust results than the Random Mutation Hill Climber and a Biased Mutation variant, which are themselves known to offer competitive performance across a range of problems. Subjective observations are also given by human players on the nature of the evolved games, which indicate a preference towards games generated by the N-Tuple algorithm.
△ Less
Submitted 18 March, 2017;
originally announced May 2017.
-
Analysis of Vanilla Rolling Horizon Evolution Parameters in General Video Game Playing
Authors:
Raluca D. Gaina,
Jialin Liu,
Simon M. Lucas,
Diego Perez-Liebana
Abstract:
Monte Carlo Tree Search techniques have generally dominated General Video Game Playing, but recent research has started looking at Evolutionary Algorithms and their potential at matching Tree Search level of play or even outperforming these methods. Online or Rolling Horizon Evolution is one of the options available to evolve sequences of actions for planning in General Video Game Playing, but no…
▽ More
Monte Carlo Tree Search techniques have generally dominated General Video Game Playing, but recent research has started looking at Evolutionary Algorithms and their potential at matching Tree Search level of play or even outperforming these methods. Online or Rolling Horizon Evolution is one of the options available to evolve sequences of actions for planning in General Video Game Playing, but no research has been done up to date that explores the capabilities of the vanilla version of this algorithm in multiple games. This study aims to critically analyse the different configurations regarding population size and individual length in a set of 20 games from the General Video Game AI corpus. Distinctions are made between deterministic and stochastic games, and the implications of using superior time budgets are studied. Results show that there is scope for the use of these techniques, which in some configurations outperform Monte Carlo Tree Search, and also suggest that further research in these methods could boost their performance.
△ Less
Submitted 24 April, 2017;
originally announced April 2017.
-
Evaluating and Modelling Hanabi-Playing Agents
Authors:
Joseph Walton-Rivers,
Piers R. Williams,
Richard Bartle,
Diego Perez-Liebana,
Simon M. Lucas
Abstract:
Agent modelling involves considering how other agents will behave, in order to influence your own actions. In this paper, we explore the use of agent modelling in the hidden-information, collaborative card game Hanabi. We implement a number of rule-based agents, both from the literature and of our own devising, in addition to an Information Set Monte Carlo Tree Search (IS-MCTS) agent. We observe p…
▽ More
Agent modelling involves considering how other agents will behave, in order to influence your own actions. In this paper, we explore the use of agent modelling in the hidden-information, collaborative card game Hanabi. We implement a number of rule-based agents, both from the literature and of our own devising, in addition to an Information Set Monte Carlo Tree Search (IS-MCTS) agent. We observe poor results from IS-MCTS, so construct a new, predictor version that uses a model of the agents with which it is paired. We observe a significant improvement in game-playing strength from this agent in comparison to IS-MCTS, resulting from its consideration of what the other agents in a game would do. In addition, we create a flawed rule-based agent to highlight the predictor's capabilities with such an agent.
△ Less
Submitted 24 April, 2017;
originally announced April 2017.
-
General Video Game AI: Learning from Screen Capture
Authors:
Kamolwan Kunanusont,
Simon M. Lucas,
Diego Perez-Liebana
Abstract:
General Video Game Artificial Intelligence is a general game playing framework for Artificial General Intelligence research in the video-games domain. In this paper, we propose for the first time a screen capture learning agent for General Video Game AI framework. A Deep Q-Network algorithm was applied and improved to develop an agent capable of learning to play different games in the framework. A…
▽ More
General Video Game Artificial Intelligence is a general game playing framework for Artificial General Intelligence research in the video-games domain. In this paper, we propose for the first time a screen capture learning agent for General Video Game AI framework. A Deep Q-Network algorithm was applied and improved to develop an agent capable of learning to play different games in the framework. After testing this algorithm using various games of different categories and difficulty levels, the results suggest that our proposed screen capture learning agent has the potential to learn many different games using only a single learning algorithm.
△ Less
Submitted 23 April, 2017;
originally announced April 2017.
-
Population Seeding Techniques for Rolling Horizon Evolution in General Video Game Playing
Authors:
Rauca D. Gaina,
Simon M. Lucas,
Diego Perez-Liebana
Abstract:
While Monte Carlo Tree Search and closely related methods have dominated General Video Game Playing, recent research has demonstrated the promise of Rolling Horizon Evolutionary Algorithms as an interesting alternative. However, there is little attention paid to population initialization techniques in the setting of general real-time video games. Therefore, this paper proposes the use of populatio…
▽ More
While Monte Carlo Tree Search and closely related methods have dominated General Video Game Playing, recent research has demonstrated the promise of Rolling Horizon Evolutionary Algorithms as an interesting alternative. However, there is little attention paid to population initialization techniques in the setting of general real-time video games. Therefore, this paper proposes the use of population seeding to improve the performance of Rolling Horizon Evolution and presents the results of two methods, One Step Look Ahead and Monte Carlo Tree Search, tested on 20 games of the General Video Game AI corpus with multiple evolution parameter values (population size and individual length). An in-depth analysis is carried out between the results of the seeding methods and the vanilla Rolling Horizon Evolution. In addition, the paper presents a comparison to a Monte Carlo Tree Search algorithm. The results are promising, with seeding able to boost performance significantly over baseline evolution and even match the high level of play obtained by the Monte Carlo Tree Search.
△ Less
Submitted 23 April, 2017;
originally announced April 2017.
-
Evolving Game Skill-Depth using General Video Game AI Agents
Authors:
Jialin Liu,
Julian Togelius,
Diego Perez-Liebana,
Simon M. Lucas
Abstract:
Most games have, or can be generalised to have, a number of parameters that may be varied in order to provide instances of games that lead to very different player experiences. The space of possible parameter settings can be seen as a search space, and we can therefore use a Random Mutation Hill Climbing algorithm or other search methods to find the parameter settings that induce the best games. O…
▽ More
Most games have, or can be generalised to have, a number of parameters that may be varied in order to provide instances of games that lead to very different player experiences. The space of possible parameter settings can be seen as a search space, and we can therefore use a Random Mutation Hill Climbing algorithm or other search methods to find the parameter settings that induce the best games. One of the hardest parts of this approach is defining a suitable fitness function. In this paper we explore the possibility of using one of a growing set of General Video Game AI agents to perform automatic play-testing. This enables a very general approach to game evaluation based on estimating the skill-depth of a game. Agent-based play-testing is computationally expensive, so we compare two simple but efficient optimisation algorithms: the Random Mutation Hill-Climber and the Multi-Armed Bandit Random Mutation Hill-Climber. For the test game we use a space-battle game in order to provide a suitable balance between simulation speed and potential skill-depth. Results show that both algorithms are able to rapidly evolve game versions with significant skill-depth, but that choosing a suitable resampling number is essential in order to combat the effects of noise.
△ Less
Submitted 18 March, 2017;
originally announced March 2017.
-
Ms. Pac-Man Versus Ghost Team CIG 2016 Competition
Authors:
Piers R. Williams,
Diego Perez-Liebana,
Simon M. Lucas
Abstract:
This paper introduces the revival of the popular Ms. Pac-Man Versus Ghost Team competition. We present an updated game engine with Partial Observability constraints, a new Multi-Agent Systems approach to developing Ghost agents and several sample controllers to ease the development of entries. A restricted communication protocol is provided for the Ghosts, providing a more challenging environment…
▽ More
This paper introduces the revival of the popular Ms. Pac-Man Versus Ghost Team competition. We present an updated game engine with Partial Observability constraints, a new Multi-Agent Systems approach to developing Ghost agents and several sample controllers to ease the development of entries. A restricted communication protocol is provided for the Ghosts, providing a more challenging environment than before. The competition will debut at the IEEE Computational Intelligence and Games Conference 2016. Some preliminary results showing the effects of Partial Observability and the benefits of simple communication are also presented.
△ Less
Submitted 8 September, 2016;
originally announced September 2016.
-
Optimal resampling for the noisy OneMax problem
Authors:
Jialin Liu,
Michael Fairbank,
Diego Pérez-Liébana,
Simon M. Lucas
Abstract:
The OneMax problem is a standard benchmark optimisation problem for a binary search space. Recent work on applying a Bandit-Based Random Mutation Hill-Climbing algorithm to the noisy OneMax Problem showed that it is important to choose a good value for the resampling number to make a careful trade off between taking more samples in order to reduce noise, and taking fewer samples to reduce the tota…
▽ More
The OneMax problem is a standard benchmark optimisation problem for a binary search space. Recent work on applying a Bandit-Based Random Mutation Hill-Climbing algorithm to the noisy OneMax Problem showed that it is important to choose a good value for the resampling number to make a careful trade off between taking more samples in order to reduce noise, and taking fewer samples to reduce the total computational cost. This paper extends that observation, by deriving an analytical expression for the running time of the RMHC algorithm with resampling applied to the noisy OneMax problem, and showing both theoretically and empirically that the optimal resampling number increases with the number of dimensions in the search space.
△ Less
Submitted 12 June, 2017; v1 submitted 22 July, 2016;
originally announced July 2016.
-
Rolling Horizon Coevolutionary Planning for Two-Player Video Games
Authors:
Jialin Liu,
Diego Pérez-Liébana,
Simon M. Lucas
Abstract:
This paper describes a new algorithm for decision making in two-player real-time video games. As with Monte Carlo Tree Search, the algorithm can be used without heuristics and has been developed for use in general video game AI. The approach is to extend recent work on rolling horizon evolutionary planning, which has been shown to work well for single-player games, to two (or in principle many) pl…
▽ More
This paper describes a new algorithm for decision making in two-player real-time video games. As with Monte Carlo Tree Search, the algorithm can be used without heuristics and has been developed for use in general video game AI. The approach is to extend recent work on rolling horizon evolutionary planning, which has been shown to work well for single-player games, to two (or in principle many) player games. To select an action the algorithm co-evolves two (or in the general case N) populations, one for each player, where each individual is a sequence of actions for the respective player. The fitness of each individual is evaluated by playing it against a selection of action-sequences from the opposing population. When choosing an action to take in the game, the first action is chosen from the fittest member of the population for that player. The new algorithm is compared with a number of general video game AI algorithms on three variations of a two-player space battle game, with promising results.
△ Less
Submitted 6 July, 2016;
originally announced July 2016.
-
Bandit-Based Random Mutation Hill-Climbing
Authors:
Jialin Liu,
Diego Peŕez-Liebana,
Simon M. Lucas
Abstract:
The Random Mutation Hill-Climbing algorithm is a direct search technique mostly used in discrete domains. It repeats the process of randomly selecting a neighbour of a best-so-far solution and accepts the neighbour if it is better than or equal to it. In this work, we propose to use a novel method to select the neighbour solution using a set of independent multi- armed bandit-style selection units…
▽ More
The Random Mutation Hill-Climbing algorithm is a direct search technique mostly used in discrete domains. It repeats the process of randomly selecting a neighbour of a best-so-far solution and accepts the neighbour if it is better than or equal to it. In this work, we propose to use a novel method to select the neighbour solution using a set of independent multi- armed bandit-style selection units which results in a bandit-based Random Mutation Hill-Climbing algorithm. The new algorithm significantly outperforms Random Mutation Hill-Climbing in both OneMax (in noise-free and noisy cases) and Royal Road problems (in the noise-free case). The algorithm shows particular promise for discrete optimisation problems where each fitness evaluation is expensive.
△ Less
Submitted 20 June, 2016;
originally announced June 2016.
-
Evolving controllers for simulated car racing
Authors:
Julian Togelius,
Simon M. Lucas
Abstract:
This paper describes the evolution of controllers for racing a simulated radio-controlled car around a track, modelled on a real physical track. Five different controller architectures were compared, based on neural networks, force fields and action sequences. The controllers use either egocentric (first person), Newtonian (third person) or no information about the state of the car (open-loop co…
▽ More
This paper describes the evolution of controllers for racing a simulated radio-controlled car around a track, modelled on a real physical track. Five different controller architectures were compared, based on neural networks, force fields and action sequences. The controllers use either egocentric (first person), Newtonian (third person) or no information about the state of the car (open-loop controller). The only controller that was able to evolve good racing behaviour was based on a neural network acting on egocentric inputs.
△ Less
Submitted 1 November, 2006;
originally announced November 2006.