Search | arXiv e-print repository

Predictive Control Using Learned State Space Models via Rolling Horizon Evolution

Abstract: A large part of the interest in model-based reinforcement learning derives from the potential utility to acquire a forward model capable of strategic long term decision making. Assuming that an agent succeeds in learning a useful predictive model, it still requires a mechanism to harness it to generate and select among competing simulated plans. In this paper, we explore this theme combining evolu… ▽ More A large part of the interest in model-based reinforcement learning derives from the potential utility to acquire a forward model capable of strategic long term decision making. Assuming that an agent succeeds in learning a useful predictive model, it still requires a mechanism to harness it to generate and select among competing simulated plans. In this paper, we explore this theme combining evolutionary algorithmic planning techniques with models learned via deep learning and variational inference. We demonstrate the approach with an agent that reliably performs online planning in a set of visual navigation tasks. △ Less

Submitted 25 June, 2021; originally announced June 2021.

Comments: Accepted at the Bridging the Gap Between AI Planning and Reinforcement Learning (PRL) Workshop at ICAPS 2021

arXiv:2008.04446 [pdf, other]

Cross-Platform Games in Kotlin

Authors: Simon M. Lucas

Abstract: This demo paper describes a simple and practical approach to writing cross-platform casual games using the Kotlin programming language. A key aim is to make it much easier for researchers to demonstrate their AI playing a range of games. Pure Kotlin code (which excludes using any Java graphics libraries) can be transpiled to JavaScript and run in a web browser. However, writing Kotlin code that wi… ▽ More This demo paper describes a simple and practical approach to writing cross-platform casual games using the Kotlin programming language. A key aim is to make it much easier for researchers to demonstrate their AI playing a range of games. Pure Kotlin code (which excludes using any Java graphics libraries) can be transpiled to JavaScript and run in a web browser. However, writing Kotlin code that will run without modification both in a web browser and on the JVM is not trivial; it requires strict adherence to an appropriate methodology. The contribution of this paper is to provide such a method including a software design and to demonstrate this working for Tetris, played either by AI or human. △ Less

Submitted 10 August, 2020; originally announced August 2020.

Comments: To appear in IEEE Conference on Games 2020. Describes software in: https://github.com/SimonLucas/XKG

arXiv:2007.09297 [pdf, other]

Modulation of viability signals for self-regulatory control

Authors: Alvaro Ovalle, Simon M. Lucas

Abstract: We revisit the role of instrumental value as a driver of adaptive behavior. In active inference, instrumental or extrinsic value is quantified by the information-theoretic surprisal of a set of observations measuring the extent to which those observations conform to prior beliefs or preferences. That is, an agent is expected to seek the type of evidence that is consistent with its own model of the… ▽ More We revisit the role of instrumental value as a driver of adaptive behavior. In active inference, instrumental or extrinsic value is quantified by the information-theoretic surprisal of a set of observations measuring the extent to which those observations conform to prior beliefs or preferences. That is, an agent is expected to seek the type of evidence that is consistent with its own model of the world. For reinforcement learning tasks, the distribution of preferences replaces the notion of reward. We explore a scenario in which the agent learns this distribution in a self-supervised manner. In particular, we highlight the distinction between observations induced by the environment and those pertaining more directly to the continuity of an agent in time. We evaluate our methodology in a dynamic environment with discrete time and actions. First with a surprisal minimizing model-free agent (in the RL sense) and then expanding to the model-based case to minimize the expected free energy. △ Less

Submitted 13 October, 2020; v1 submitted 17 July, 2020; originally announced July 2020.

Comments: Accepted at the International Workshop on Active Inference 2020 (camera-ready version). Extended from 6 to 13 pages to include appendices and a more comprehensive reference list

arXiv:2005.11247 [pdf, other]

Evaluating Generalisation in General Video Game Playing

Authors: Martin Balla, Simon M. Lucas, Diego Perez-Liebana

Abstract: The General Video Game Artificial Intelligence (GVGAI) competition has been running for several years with various tracks. This paper focuses on the challenge of the GVGAI learning track in which 3 games are selected and 2 levels are given for training, while 3 hidden levels are left for evaluation. This setup poses a difficult challenge for current Reinforcement Learning (RL) algorithms, as they… ▽ More The General Video Game Artificial Intelligence (GVGAI) competition has been running for several years with various tracks. This paper focuses on the challenge of the GVGAI learning track in which 3 games are selected and 2 levels are given for training, while 3 hidden levels are left for evaluation. This setup poses a difficult challenge for current Reinforcement Learning (RL) algorithms, as they typically require much more data. This work investigates 3 versions of the Advantage Actor-Critic (A2C) algorithm trained on a maximum of 2 levels from the available 5 from the GVGAI framework and compares their performance on all levels. The selected sub-set of games have different characteristics, like stochasticity, reward distribution and objectives. We found that stochasticity improves the generalisation, but too much can cause the algorithms to fail to learn the training levels. The quality of the training levels also matters, different sets of training levels can boost generalisation over all levels. In the GVGAI competition agents are scored based on their win rates and then their scores achieved in the games. We found that solely using the rewards provided by the game might not encourage winning. △ Less

Submitted 22 May, 2020; originally announced May 2020.

Comments: accepted for publication in IEEE Conference on Games (CoG) 2020

arXiv:2004.07155 [pdf, other]

Bootstrapped model learning and error correction for planning with uncertainty in model-based RL

Authors: Alvaro Ovalle, Simon M. Lucas

Abstract: Having access to a forward model enables the use of planning algorithms such as Monte Carlo Tree Search and Rolling Horizon Evolution. Where a model is unavailable, a natural aim is to learn a model that reflects accurately the dynamics of the environment. In many situations it might not be possible and minimal glitches in the model may lead to poor performance and failure. This paper explores the… ▽ More Having access to a forward model enables the use of planning algorithms such as Monte Carlo Tree Search and Rolling Horizon Evolution. Where a model is unavailable, a natural aim is to learn a model that reflects accurately the dynamics of the environment. In many situations it might not be possible and minimal glitches in the model may lead to poor performance and failure. This paper explores the problem of model misspecification through uncertainty-aware reinforcement learning agents. We propose a bootstrapped multi-headed neural network that learns the distribution of future states and rewards. We experiment with a number of schemes to extract the most likely predictions. Moreover, we also introduce a global error correction filter that applies high-level constraints guided by the context provided through the predictive distribution. We illustrate our approach on Minipacman. The evaluation demonstrates that when dealing with imperfect models, our methods exhibit increased performance and stability, both in terms of model accuracy and in its use within a planning algorithm. △ Less

Submitted 15 April, 2020; originally announced April 2020.

arXiv:2003.13949 [pdf, other]

Enhanced Rolling Horizon Evolution Algorithm with Opponent Model Learning: Results for the Fighting Game AI Competition

Authors: Zhentao Tang, Yuanheng Zhu, Dongbin Zhao, Simon M. Lucas

Abstract: The Fighting Game AI Competition (FTGAIC) provides a challenging benchmark for 2-player video game AI. The challenge arises from the large action space, diverse styles of characters and abilities, and the real-time nature of the game. In this paper, we propose a novel algorithm that combines Rolling Horizon Evolution Algorithm (RHEA) with opponent model learning. The approach is readily applicable… ▽ More The Fighting Game AI Competition (FTGAIC) provides a challenging benchmark for 2-player video game AI. The challenge arises from the large action space, diverse styles of characters and abilities, and the real-time nature of the game. In this paper, we propose a novel algorithm that combines Rolling Horizon Evolution Algorithm (RHEA) with opponent model learning. The approach is readily applicable to any 2-player video game. In contrast to conventional RHEA, an opponent model is proposed and is optimized by supervised learning with cross-entropy and reinforcement learning with policy gradient and Q-learning respectively, based on history observations from opponent. The model is learned during the live gameplay. With the learned opponent model, the extended RHEA is able to make more realistic plans based on what the opponent is likely to do. This tends to lead to better results. We compared our approach directly with the bots from the FTGAIC 2018 competition, and found our method to significantly outperform all of them, for all three character. Furthermore, our proposed bot with the policy-gradient-based opponent model is the only one without using Monte-Carlo Tree Search (MCTS) among top five bots in the 2019 competition in which it achieved second place, while using much less domain knowledge than the winner. △ Less

Submitted 31 March, 2020; originally announced March 2020.

Comments: 10 pages, 7 figures

arXiv:2003.12331 [pdf, other]

Rolling Horizon Evolutionary Algorithms for General Video Game Playing

Authors: Raluca D. Gaina, Sam Devlin, Simon M. Lucas, Diego Perez-Liebana

Abstract: Game-playing Evolutionary Algorithms, specifically Rolling Horizon Evolutionary Algorithms, have recently managed to beat the state of the art in win rate across many video games. However, the best results in a game are highly dependent on the specific configuration of modifications and hybrids introduced over several papers, each adding additional parameters to the core algorithm. Further, the be… ▽ More Game-playing Evolutionary Algorithms, specifically Rolling Horizon Evolutionary Algorithms, have recently managed to beat the state of the art in win rate across many video games. However, the best results in a game are highly dependent on the specific configuration of modifications and hybrids introduced over several papers, each adding additional parameters to the core algorithm. Further, the best previously published parameters have been found from only a few human-picked combinations, as the possibility space has grown beyond exhaustive search. This paper presents the state of the art in Rolling Horizon Evolutionary Algorithms, combining all modifications described in literature, as well as new ones, for a large resultant hybrid. We then use a parameter optimiser, the N-Tuple Bandit Evolutionary Algorithm, to find the best combination of parameters in 20 games from the General Video Game AI Framework. Further, we analyse the algorithm's parameters and some interesting combinations revealed through the optimisation process. Lastly, we find new state of the art solutions on several games by automatically exploring the large parameter space of RHEA. △ Less

Submitted 24 August, 2020; v1 submitted 27 March, 2020; originally announced March 2020.

arXiv:1909.00442 [pdf, other]

Learning Local Forward Models on Unforgiving Games

Authors: Alexander Dockhorn, Simon M. Lucas, Vanessa Volz, Ivan Bravi, Raluca D. Gaina, Diego Perez-Liebana

Abstract: This paper examines learning approaches for forward models based on local cell transition functions. We provide a formal definition of local forward models for which we propose two basic learning approaches. Our analysis is based on the game Sokoban, where a wrong action can lead to an unsolvable game state. Therefore, an accurate prediction of an action's resulting state is necessary to avoid thi… ▽ More This paper examines learning approaches for forward models based on local cell transition functions. We provide a formal definition of local forward models for which we propose two basic learning approaches. Our analysis is based on the game Sokoban, where a wrong action can lead to an unsolvable game state. Therefore, an accurate prediction of an action's resulting state is necessary to avoid this scenario. In contrast to learning the complete state transition function, local forward models allow extracting multiple training examples from a single state transition. In this way, the Hash Set model, as well as the Decision Tree model, quickly learn to predict upcoming state transitions of both the training and the test set. Applying the model using a statistical forward planner showed that the best models can be used to satisfying degree even in cases in which the test levels have not yet been seen. Our evaluation includes an analysis of various local neighbourhood patterns and sizes to test the learners' capabilities in case too few or too many attributes are extracted, of which the latter has shown do degrade the performance of the model learner. △ Less

Submitted 1 September, 2019; originally announced September 2019.

Comments: 4 pages, 3 figures, 3 tables, accepted at IEEE COG 2019

arXiv:1906.04023 [pdf, other]

Project Thyia: A Forever Gameplayer

Authors: Raluca D. Gaina, Simon M. Lucas, Diego Perez-Liebana

Abstract: The space of Artificial Intelligence entities is dominated by conversational bots. Some of them fit in our pockets and we take them everywhere we go, or allow them to be a part of human homes. Siri, Alexa, they are recognised as present in our world. But a lot of games research is restricted to existing in the separate realm of software. We enter different worlds when playing games, but those worl… ▽ More The space of Artificial Intelligence entities is dominated by conversational bots. Some of them fit in our pockets and we take them everywhere we go, or allow them to be a part of human homes. Siri, Alexa, they are recognised as present in our world. But a lot of games research is restricted to existing in the separate realm of software. We enter different worlds when playing games, but those worlds cease to exist once we quit. Similarly, AI game-players are run once on a game (or maybe for longer periods of time, in the case of learning algorithms which need some, still limited, period for training), and they cease to exist once the game ends. But what if they didn't? What if there existed artificial game-players that continuously played games, learned from their experiences and kept getting better? What if they interacted with the real world and us, humans: live-streaming games, chatting with viewers, accepting suggestions for strategies or games to play, forming opinions on popular game titles? In this paper, we introduce the vision behind a new project called Thyia, which focuses around creating a present, continuous, `always-on', interactive game-player. △ Less

Submitted 10 June, 2019; originally announced June 2019.

Comments: 8 pages, 1 figure, accepted at IEEE COG 2019

arXiv:1905.13516 [pdf, other]

Foundations of Digital Archæoludology

Authors: Cameron Browne, Dennis J. N. J. Soemers, Éric Piette, Matthew Stephenson, Michael Conrad, Walter Crist, Thierry Depaulis, Eddie Duggan, Fred Horn, Steven Kelk, Simon M. Lucas, João Pedro Neto, David Parlett, Abdallah Saffidine, Ulrich Schädler, Jorge Nuno Silva, Alex de Voogt, Mark H. M. Winands

Abstract: Digital Archaeoludology (DAL) is a new field of study involving the analysis and reconstruction of ancient games from incomplete descriptions and archaeological evidence using modern computational techniques. The aim is to provide digital tools and methods to help game historians and other researchers better understand traditional games, their development throughout recorded human history, and the… ▽ More Digital Archaeoludology (DAL) is a new field of study involving the analysis and reconstruction of ancient games from incomplete descriptions and archaeological evidence using modern computational techniques. The aim is to provide digital tools and methods to help game historians and other researchers better understand traditional games, their development throughout recorded human history, and their relationship to the development of human culture and mathematical knowledge. This work is being explored in the ERC-funded Digital Ludeme Project. The aim of this inaugural international research meeting on DAL is to gather together leading experts in relevant disciplines - computer science, artificial intelligence, machine learning, computational phylogenetics, mathematics, history, archaeology, anthropology, etc. - to discuss the key themes and establish the foundations for this new field of research, so that it may continue beyond the lifetime of its initiating project. △ Less

Submitted 31 May, 2019; originally announced May 2019.

Comments: Report on Dagstuhl Research Meeting. Authored/edited by all participants. Appendices by Thierry Depaulis

arXiv:1905.05077 [pdf, other]

doi 10.1145/3321707.3321781

Tile Pattern KL-Divergence for Analysing and Evolving Game Levels

Authors: Simon M. Lucas, Vanessa Volz

Abstract: This paper provides a detailed investigation of using the Kullback-Leibler (KL) Divergence as a way to compare and analyse game-levels, and hence to use the measure as the objective function of an evolutionary algorithm to evolve new levels. We describe the benefits of its asymmetry for level analysis and demonstrate how (not surprisingly) the quality of the results depends on the features used. H… ▽ More This paper provides a detailed investigation of using the Kullback-Leibler (KL) Divergence as a way to compare and analyse game-levels, and hence to use the measure as the objective function of an evolutionary algorithm to evolve new levels. We describe the benefits of its asymmetry for level analysis and demonstrate how (not surprisingly) the quality of the results depends on the features used. Here we use tile-patterns of various sizes as features. When using the measure for evolution-based level generation, we demonstrate that the choice of variation operator is critical in order to provide an efficient search process, and introduce a novel convolutional mutation operator to facilitate this. We compare the results with alternative generators, including evolving in the latent space of generative adversarial networks, and Wave Function Collapse. The results clearly show the proposed method to provide competitive performance, providing reasonable quality results with very fast training and reasonably fast generation. △ Less

Submitted 24 April, 2019; originally announced May 2019.

Comments: 8 pages plus references. Proceedings of GECCO 2019

arXiv:1903.12508 [pdf, other]

A Local Approach to Forward Model Learning: Results on the Game of Life Game

Authors: Simon M. Lucas, Alexander Dockhorn, Vanessa Volz, Chris Bamford, Raluca D. Gaina, Ivan Bravi, Diego Perez-Liebana, Sanaz Mostaghim, Rudolf Kruse

Abstract: This paper investigates the effect of learning a forward model on the performance of a statistical forward planning agent. We transform Conway's Game of Life simulation into a single-player game where the objective can be either to preserve as much life as possible or to extinguish all life as quickly as possible. In order to learn the forward model of the game, we formulate the problem in a nov… ▽ More This paper investigates the effect of learning a forward model on the performance of a statistical forward planning agent. We transform Conway's Game of Life simulation into a single-player game where the objective can be either to preserve as much life as possible or to extinguish all life as quickly as possible. In order to learn the forward model of the game, we formulate the problem in a novel way that learns the local cell transition function by creating a set of supervised training data and predicting the next state of each cell in the grid based on its current state and immediate neighbours. Using this method we are able to harvest sufficient data to learn perfect forward models by observing only a few complete state transitions, using either a look-up table, a decision tree or a neural network. In contrast, learning the complete state transition function is a much harder task and our initial efforts to do this using deep convolutional auto-encoders were less successful. We also investigate the effects of imperfect learned models on prediction errors and game-playing performance, and show that even models with significant errors can provide good performance. △ Less

Submitted 29 March, 2019; originally announced March 2019.

Comments: Submitted to IEEE Conference on Games 2019

arXiv:1901.00723 [pdf, other]

Efficient Evolutionary Methods for Game Agent Optimisation: Model-Based is Best

Authors: Simon M. Lucas, Jialin Liu, Ivan Bravi, Raluca D. Gaina, John Woodward, Vanessa Volz, Diego Perez-Liebana

Abstract: This paper introduces a simple and fast variant of Planet Wars as a test-bed for statistical planning based Game AI agents, and for noisy hyper-parameter optimisation. Planet Wars is a real-time strategy game with simple rules but complex game-play. The variant introduced in this paper is designed for speed to enable efficient experimentation, and also for a fixed action space to enable practical… ▽ More This paper introduces a simple and fast variant of Planet Wars as a test-bed for statistical planning based Game AI agents, and for noisy hyper-parameter optimisation. Planet Wars is a real-time strategy game with simple rules but complex game-play. The variant introduced in this paper is designed for speed to enable efficient experimentation, and also for a fixed action space to enable practical inter-operability with General Video Game AI agents. If we treat the game as a win-loss game (which is standard), then this leads to challenging noisy optimisation problems both in tuning agents to play the game, and in tuning game parameters. Here we focus on the problem of tuning an agent, and report results using the recently developed N-Tuple Bandit Evolutionary Algorithm and a number of other optimisers, including Sequential Model-based Algorithm Configuration (SMAC). Results indicate that the N-Tuple Bandit Evolutionary offers competitive performance as well as insight into the effects of combinations of parameter choices. △ Less

Submitted 3 January, 2019; originally announced January 2019.

Comments: 8 pages, to appear in 2019 AAAI workshop on Games and Simulations for Artificial Intelligence ( https://www.gamesim.ai/ )

arXiv:1806.08544 [pdf, other]

Game AI Research with Fast Planet Wars Variants

Authors: Simon M. Lucas

Abstract: This paper describes a new implementation of Planet Wars, designed from the outset for Game AI research. The skill-depth of the game makes it a challenge for game-playing agents, and the speed of more than 1 million game ticks per second enables rapid experimentation and prototyping. The parameterised nature of the game together with an interchangeable actuator model make it well suited to automat… ▽ More This paper describes a new implementation of Planet Wars, designed from the outset for Game AI research. The skill-depth of the game makes it a challenge for game-playing agents, and the speed of more than 1 million game ticks per second enables rapid experimentation and prototyping. The parameterised nature of the game together with an interchangeable actuator model make it well suited to automated game tuning. The game is designed to be fun to play for humans, and is directly playable by General Video Game AI agents. △ Less

Submitted 22 June, 2018; originally announced June 2018.

Comments: To appear in Proceedings of IEEE Conference on Computational and Games, 2018

arXiv:1805.00728 [pdf, other]

Evolving Mario Levels in the Latent Space of a Deep Convolutional Generative Adversarial Network

Authors: Vanessa Volz, Jacob Schrum, Jialin Liu, Simon M. Lucas, Adam Smith, Sebastian Risi

Abstract: Generative Adversarial Networks (GANs) are a machine learning approach capable of generating novel example outputs across a space of provided training examples. Procedural Content Generation (PCG) of levels for video games could benefit from such models, especially for games where there is a pre-existing corpus of levels to emulate. This paper trains a GAN to generate levels for Super Mario Bros u… ▽ More Generative Adversarial Networks (GANs) are a machine learning approach capable of generating novel example outputs across a space of provided training examples. Procedural Content Generation (PCG) of levels for video games could benefit from such models, especially for games where there is a pre-existing corpus of levels to emulate. This paper trains a GAN to generate levels for Super Mario Bros using a level from the Video Game Level Corpus. The approach successfully generates a variety of levels similar to one in the original corpus, but is further improved by application of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). Specifically, various fitness functions are used to discover levels within the latent space of the GAN that maximize desired properties. Simple static properties are optimized, such as a given distribution of tile types. Additionally, the champion A* agent from the 2009 Mario AI competition is used to assess whether a level is playable, and how many jumping actions are required to beat it. These fitness functions allow for the discovery of levels that exist within the space of examples designed by experts, and also guide the search towards levels that fulfill one or more specified objectives. △ Less

Submitted 2 May, 2018; originally announced May 2018.

Comments: 8 pages, GECCO2018

arXiv:1802.10363 [pdf, other]

General Video Game AI: a Multi-Track Framework for Evaluating Agents, Games and Content Generation Algorithms

Authors: Diego Perez-Liebana, Jialin Liu, Ahmed Khalifa, Raluca D. Gaina, Julian Togelius, Simon M. Lucas

Abstract: General Video Game Playing (GVGP) aims at designing an agent that is capable of playing multiple video games with no human intervention. In 2014, The General Video Game AI (GVGAI) competition framework was created and released with the purpose of providing researchers a common open-source and easy to use platform for testing their AI methods with potentially infinity of games created using Video G… ▽ More General Video Game Playing (GVGP) aims at designing an agent that is capable of playing multiple video games with no human intervention. In 2014, The General Video Game AI (GVGAI) competition framework was created and released with the purpose of providing researchers a common open-source and easy to use platform for testing their AI methods with potentially infinity of games created using Video Game Description Language (VGDL). The framework has been expanded into several tracks during the last few years to meet the demand of different research directions. The agents are required either to play multiple unknown games with or without access to game simulations, or to design new game levels or rules. This survey paper presents the VGDL, the GVGAI framework, existing tracks, and reviews the wide use of GVGAI framework in research, education and competitions five years after its birth. A future plan of framework improvements is also described. △ Less

Submitted 22 February, 2019; v1 submitted 28 February, 2018; originally announced February 2018.

Comments: 20 pages, 1 figure, accepted by IEEE ToG

arXiv:1802.05991 [pdf, other]

The N-Tuple Bandit Evolutionary Algorithm for Game Agent Optimisation

Authors: Simon M Lucas, Jialin Liu, Diego Perez-Liebana

Abstract: This paper describes the N-Tuple Bandit Evolutionary Algorithm (NTBEA), an optimisation algorithm developed for noisy and expensive discrete (combinatorial) optimisation problems. The algorithm is applied to two game-based hyper-parameter optimisation problems. The N-Tuple system directly models the statistics, approximating the fitness and number of evaluations of each modelled combination of par… ▽ More This paper describes the N-Tuple Bandit Evolutionary Algorithm (NTBEA), an optimisation algorithm developed for noisy and expensive discrete (combinatorial) optimisation problems. The algorithm is applied to two game-based hyper-parameter optimisation problems. The N-Tuple system directly models the statistics, approximating the fitness and number of evaluations of each modelled combination of parameters. The model is simple, efficient and informative. Results show that the NTBEA significantly outperforms grid search and an estimation of distribution algorithm. △ Less

Submitted 8 May, 2018; v1 submitted 16 February, 2018; originally announced February 2018.

Comments: 9 pages, 3 figures, 3 table. This is the final version of the article accepted by WCCI2018

arXiv:1708.02068 [pdf, other]

Efficient Noisy Optimisation with the Sliding Window Compact Genetic Algorithm

Authors: Simon M. Lucas, Jialin Liu, Diego Pérez-Liébana

Abstract: The compact genetic algorithm is an Estimation of Distribution Algorithm for binary optimisation problems. Unlike the standard Genetic Algorithm, no cross-over or mutation is involved. Instead, the compact Genetic Algorithm uses a virtual population represented as a probability distribution over the set of binary strings. At each optimisation iteration, exactly two individuals are generated by sam… ▽ More The compact genetic algorithm is an Estimation of Distribution Algorithm for binary optimisation problems. Unlike the standard Genetic Algorithm, no cross-over or mutation is involved. Instead, the compact Genetic Algorithm uses a virtual population represented as a probability distribution over the set of binary strings. At each optimisation iteration, exactly two individuals are generated by sampling from the distribution, and compared exactly once to determine a winner and a loser. The probability distribution is then adjusted to increase the likelihood of generating individuals similar to the winner. This paper introduces two straightforward variations of the compact Genetic Algorithm, each of which lead to a significant improvement in performance. The main idea is to make better use of each fitness evaluation, by ensuring that each evaluated individual is used in multiple win/loss comparisons. The first variation is to sample $n>2$ individuals at each iteration to make $n(n-1)/2$ comparisons. The second variation only samples one individual at each iteration but keeps a sliding history window of previous individuals to compare with. We evaluate methods on two noisy test problems and show that in each case they significantly outperform the compact Genetic Algorithm, while maintaining the simplicity of the algorithm. △ Less

Submitted 7 August, 2017; originally announced August 2017.

Comments: 11 pages, 2 tables, 8 figures

arXiv:1706.05086 [pdf, other]

Evaluating Noisy Optimisation Algorithms: First Hitting Time is Problematic

Authors: Simon M. Lucas, Jialin Liu, Diego Pérez-Liébana

Abstract: A key part of any evolutionary algorithm is fitness evaluation. When fitness evaluations are corrupted by noise, as happens in many real-world problems as a consequence of various types of uncertainty, a strategy is needed in order to cope with this. Resampling is one of the most common strategies, whereby each solution is evaluated many times in order to reduce the variance of the fitness estimat… ▽ More A key part of any evolutionary algorithm is fitness evaluation. When fitness evaluations are corrupted by noise, as happens in many real-world problems as a consequence of various types of uncertainty, a strategy is needed in order to cope with this. Resampling is one of the most common strategies, whereby each solution is evaluated many times in order to reduce the variance of the fitness estimates. When evaluating the performance of a noisy optimisation algorithm, a key consideration is the stopping condition for the algorithm. A frequently used stopping condition in runtime analysis, known as "First Hitting Time", is to stop the algorithm as soon as it encounters the optimal solution. However, this is unrealistic for real-world problems, as if the optimal solution were already known, there would be no need to search for it. This paper argues that the use of First Hitting Time, despite being a commonly used approach, is significantly flawed and overestimates the quality of many algorithms in real-world cases, where the optimum is not known in advance and has to be genuinely searched for. A better alternative is to measure the quality of the solution an algorithm returns after a fixed evaluation budget, i.e., to focus on final solution quality. This paper argues that focussing on final solution quality is more realistic and demonstrates cases where the results produced by each algorithm evaluation method lead to very different conclusions regarding the quality of each noisy optimisation algorithm. △ Less

Submitted 12 July, 2017; v1 submitted 13 June, 2017; originally announced June 2017.

Comments: 4 pages, 4 figurs, 1 table

arXiv:1705.01080 [pdf, other]

The N-Tuple Bandit Evolutionary Algorithm for Automatic Game Improvement

Authors: Kamolwan Kunanusont, Raluca D. Gaina, Jialin Liu, Diego Perez-Liebana, Simon M. Lucas

Abstract: This paper describes a new evolutionary algorithm that is especially well suited to AI-Assisted Game Design. The approach adopted in this paper is to use observations of AI agents playing the game to estimate the game's quality. Some of best agents for this purpose are General Video Game AI agents, since they can be deployed directly on a new game without game-specific tuning; these agents tend to… ▽ More This paper describes a new evolutionary algorithm that is especially well suited to AI-Assisted Game Design. The approach adopted in this paper is to use observations of AI agents playing the game to estimate the game's quality. Some of best agents for this purpose are General Video Game AI agents, since they can be deployed directly on a new game without game-specific tuning; these agents tend to be based on stochastic algorithms which give robust but noisy results and tend to be expensive to run. This motivates the main contribution of the paper: the development of the novel N-Tuple Bandit Evolutionary Algorithm, where a model is used to estimate the fitness of unsampled points and a bandit approach is used to balance exploration and exploitation of the search space. Initial results on optimising a Space Battle game variant suggest that the algorithm offers far more robust results than the Random Mutation Hill Climber and a Biased Mutation variant, which are themselves known to offer competitive performance across a range of problems. Subjective observations are also given by human players on the nature of the evolved games, which indicate a preference towards games generated by the N-Tuple algorithm. △ Less

Submitted 18 March, 2017; originally announced May 2017.

Comments: 8 pages, 9 figure, 2 tables, CEC2017

arXiv:1704.07075 [pdf, other]

Analysis of Vanilla Rolling Horizon Evolution Parameters in General Video Game Playing

Authors: Raluca D. Gaina, Jialin Liu, Simon M. Lucas, Diego Perez-Liebana

Abstract: Monte Carlo Tree Search techniques have generally dominated General Video Game Playing, but recent research has started looking at Evolutionary Algorithms and their potential at matching Tree Search level of play or even outperforming these methods. Online or Rolling Horizon Evolution is one of the options available to evolve sequences of actions for planning in General Video Game Playing, but no… ▽ More Monte Carlo Tree Search techniques have generally dominated General Video Game Playing, but recent research has started looking at Evolutionary Algorithms and their potential at matching Tree Search level of play or even outperforming these methods. Online or Rolling Horizon Evolution is one of the options available to evolve sequences of actions for planning in General Video Game Playing, but no research has been done up to date that explores the capabilities of the vanilla version of this algorithm in multiple games. This study aims to critically analyse the different configurations regarding population size and individual length in a set of 20 games from the General Video Game AI corpus. Distinctions are made between deterministic and stochastic games, and the implications of using superior time budgets are studied. Results show that there is scope for the use of these techniques, which in some configurations outperform Monte Carlo Tree Search, and also suggest that further research in these methods could boost their performance. △ Less

Submitted 24 April, 2017; originally announced April 2017.

Journal ref: Applications of Evolutionary Computation, EvoApplications, Lecture Notes in Computer Science, vol. 10199, Springer, Cham., p. 418-434, 2017

arXiv:1704.07069 [pdf, other]

Evaluating and Modelling Hanabi-Playing Agents

Authors: Joseph Walton-Rivers, Piers R. Williams, Richard Bartle, Diego Perez-Liebana, Simon M. Lucas

Abstract: Agent modelling involves considering how other agents will behave, in order to influence your own actions. In this paper, we explore the use of agent modelling in the hidden-information, collaborative card game Hanabi. We implement a number of rule-based agents, both from the literature and of our own devising, in addition to an Information Set Monte Carlo Tree Search (IS-MCTS) agent. We observe p… ▽ More Agent modelling involves considering how other agents will behave, in order to influence your own actions. In this paper, we explore the use of agent modelling in the hidden-information, collaborative card game Hanabi. We implement a number of rule-based agents, both from the literature and of our own devising, in addition to an Information Set Monte Carlo Tree Search (IS-MCTS) agent. We observe poor results from IS-MCTS, so construct a new, predictor version that uses a model of the agents with which it is paired. We observe a significant improvement in game-playing strength from this agent in comparison to IS-MCTS, resulting from its consideration of what the other agents in a game would do. In addition, we create a flawed rule-based agent to highlight the predictor's capabilities with such an agent. △ Less

Submitted 24 April, 2017; originally announced April 2017.

Comments: Proceedings of the IEEE Conference on Evolutionary Computation (2017)

arXiv:1704.06945 [pdf, other]

General Video Game AI: Learning from Screen Capture

Authors: Kamolwan Kunanusont, Simon M. Lucas, Diego Perez-Liebana

Abstract: General Video Game Artificial Intelligence is a general game playing framework for Artificial General Intelligence research in the video-games domain. In this paper, we propose for the first time a screen capture learning agent for General Video Game AI framework. A Deep Q-Network algorithm was applied and improved to develop an agent capable of learning to play different games in the framework. A… ▽ More General Video Game Artificial Intelligence is a general game playing framework for Artificial General Intelligence research in the video-games domain. In this paper, we propose for the first time a screen capture learning agent for General Video Game AI framework. A Deep Q-Network algorithm was applied and improved to develop an agent capable of learning to play different games in the framework. After testing this algorithm using various games of different categories and difficulty levels, the results suggest that our proposed screen capture learning agent has the potential to learn many different games using only a single learning algorithm. △ Less

Submitted 23 April, 2017; originally announced April 2017.

Comments: Proceedings of the IEEE Conference on Evolutionary Computation 2017

arXiv:1704.06942 [pdf, other]

Population Seeding Techniques for Rolling Horizon Evolution in General Video Game Playing

Authors: Rauca D. Gaina, Simon M. Lucas, Diego Perez-Liebana

Abstract: While Monte Carlo Tree Search and closely related methods have dominated General Video Game Playing, recent research has demonstrated the promise of Rolling Horizon Evolutionary Algorithms as an interesting alternative. However, there is little attention paid to population initialization techniques in the setting of general real-time video games. Therefore, this paper proposes the use of populatio… ▽ More While Monte Carlo Tree Search and closely related methods have dominated General Video Game Playing, recent research has demonstrated the promise of Rolling Horizon Evolutionary Algorithms as an interesting alternative. However, there is little attention paid to population initialization techniques in the setting of general real-time video games. Therefore, this paper proposes the use of population seeding to improve the performance of Rolling Horizon Evolution and presents the results of two methods, One Step Look Ahead and Monte Carlo Tree Search, tested on 20 games of the General Video Game AI corpus with multiple evolution parameter values (population size and individual length). An in-depth analysis is carried out between the results of the seeding methods and the vanilla Rolling Horizon Evolution. In addition, the paper presents a comparison to a Monte Carlo Tree Search algorithm. The results are promising, with seeding able to boost performance significantly over baseline evolution and even match the high level of play obtained by the Monte Carlo Tree Search. △ Less

Submitted 23 April, 2017; originally announced April 2017.

Comments: Proceedings of the IEEE Conference on Evolutionary Computation 2017

arXiv:1703.06275 [pdf, other]

Evolving Game Skill-Depth using General Video Game AI Agents

Authors: Jialin Liu, Julian Togelius, Diego Perez-Liebana, Simon M. Lucas

Abstract: Most games have, or can be generalised to have, a number of parameters that may be varied in order to provide instances of games that lead to very different player experiences. The space of possible parameter settings can be seen as a search space, and we can therefore use a Random Mutation Hill Climbing algorithm or other search methods to find the parameter settings that induce the best games. O… ▽ More Most games have, or can be generalised to have, a number of parameters that may be varied in order to provide instances of games that lead to very different player experiences. The space of possible parameter settings can be seen as a search space, and we can therefore use a Random Mutation Hill Climbing algorithm or other search methods to find the parameter settings that induce the best games. One of the hardest parts of this approach is defining a suitable fitness function. In this paper we explore the possibility of using one of a growing set of General Video Game AI agents to perform automatic play-testing. This enables a very general approach to game evaluation based on estimating the skill-depth of a game. Agent-based play-testing is computationally expensive, so we compare two simple but efficient optimisation algorithms: the Random Mutation Hill-Climber and the Multi-Armed Bandit Random Mutation Hill-Climber. For the test game we use a space-battle game in order to provide a suitable balance between simulation speed and potential skill-depth. Results show that both algorithms are able to rapidly evolve game versions with significant skill-depth, but that choosing a suitable resampling number is essential in order to combat the effects of noise. △ Less

Submitted 18 March, 2017; originally announced March 2017.

Comments: 9 pages, 17 figures, CEC2017

arXiv:1609.02316 [pdf, other]

Ms. Pac-Man Versus Ghost Team CIG 2016 Competition

Authors: Piers R. Williams, Diego Perez-Liebana, Simon M. Lucas

Abstract: This paper introduces the revival of the popular Ms. Pac-Man Versus Ghost Team competition. We present an updated game engine with Partial Observability constraints, a new Multi-Agent Systems approach to developing Ghost agents and several sample controllers to ease the development of entries. A restricted communication protocol is provided for the Ghosts, providing a more challenging environment… ▽ More This paper introduces the revival of the popular Ms. Pac-Man Versus Ghost Team competition. We present an updated game engine with Partial Observability constraints, a new Multi-Agent Systems approach to developing Ghost agents and several sample controllers to ease the development of entries. A restricted communication protocol is provided for the Ghosts, providing a more challenging environment than before. The competition will debut at the IEEE Computational Intelligence and Games Conference 2016. Some preliminary results showing the effects of Partial Observability and the benefits of simple communication are also presented. △ Less

Submitted 8 September, 2016; originally announced September 2016.

arXiv:1607.06641 [pdf, other]

Optimal resampling for the noisy OneMax problem

Authors: Jialin Liu, Michael Fairbank, Diego Pérez-Liébana, Simon M. Lucas

Abstract: The OneMax problem is a standard benchmark optimisation problem for a binary search space. Recent work on applying a Bandit-Based Random Mutation Hill-Climbing algorithm to the noisy OneMax Problem showed that it is important to choose a good value for the resampling number to make a careful trade off between taking more samples in order to reduce noise, and taking fewer samples to reduce the tota… ▽ More The OneMax problem is a standard benchmark optimisation problem for a binary search space. Recent work on applying a Bandit-Based Random Mutation Hill-Climbing algorithm to the noisy OneMax Problem showed that it is important to choose a good value for the resampling number to make a careful trade off between taking more samples in order to reduce noise, and taking fewer samples to reduce the total computational cost. This paper extends that observation, by deriving an analytical expression for the running time of the RMHC algorithm with resampling applied to the noisy OneMax problem, and showing both theoretically and empirically that the optimal resampling number increases with the number of dimensions in the search space. △ Less

Submitted 12 June, 2017; v1 submitted 22 July, 2016; originally announced July 2016.

Comments: 8 pages, 1 table, 6 figures

ACM Class: I.2.8

arXiv:1607.01730 [pdf, other]

Rolling Horizon Coevolutionary Planning for Two-Player Video Games

Authors: Jialin Liu, Diego Pérez-Liébana, Simon M. Lucas

Abstract: This paper describes a new algorithm for decision making in two-player real-time video games. As with Monte Carlo Tree Search, the algorithm can be used without heuristics and has been developed for use in general video game AI. The approach is to extend recent work on rolling horizon evolutionary planning, which has been shown to work well for single-player games, to two (or in principle many) pl… ▽ More This paper describes a new algorithm for decision making in two-player real-time video games. As with Monte Carlo Tree Search, the algorithm can be used without heuristics and has been developed for use in general video game AI. The approach is to extend recent work on rolling horizon evolutionary planning, which has been shown to work well for single-player games, to two (or in principle many) player games. To select an action the algorithm co-evolves two (or in the general case N) populations, one for each player, where each individual is a sequence of actions for the respective player. The fitness of each individual is evaluated by playing it against a selection of action-sequences from the opposing population. When choosing an action to take in the game, the first action is chosen from the fittest member of the population for that player. The new algorithm is compared with a number of general video game AI algorithms on three variations of a two-player space battle game, with promising results. △ Less

Submitted 6 July, 2016; originally announced July 2016.

Comments: 2 figures, 1 table, 6 pages

MSC Class: 91A05; 91A15; 68T20; 97R40

arXiv:1606.06041 [pdf, other]

Bandit-Based Random Mutation Hill-Climbing

Authors: Jialin Liu, Diego Peŕez-Liebana, Simon M. Lucas

Abstract: The Random Mutation Hill-Climbing algorithm is a direct search technique mostly used in discrete domains. It repeats the process of randomly selecting a neighbour of a best-so-far solution and accepts the neighbour if it is better than or equal to it. In this work, we propose to use a novel method to select the neighbour solution using a set of independent multi- armed bandit-style selection units… ▽ More The Random Mutation Hill-Climbing algorithm is a direct search technique mostly used in discrete domains. It repeats the process of randomly selecting a neighbour of a best-so-far solution and accepts the neighbour if it is better than or equal to it. In this work, we propose to use a novel method to select the neighbour solution using a set of independent multi- armed bandit-style selection units which results in a bandit-based Random Mutation Hill-Climbing algorithm. The new algorithm significantly outperforms Random Mutation Hill-Climbing in both OneMax (in noise-free and noisy cases) and Royal Road problems (in the noise-free case). The algorithm shows particular promise for discrete optimisation problems where each fitness evaluation is expensive. △ Less

Submitted 20 June, 2016; originally announced June 2016.

Comments: 7 pages, 10 figures

ACM Class: I.2.8

arXiv:cs/0611006 [pdf]

Evolving controllers for simulated car racing

Authors: Julian Togelius, Simon M. Lucas

Abstract: This paper describes the evolution of controllers for racing a simulated radio-controlled car around a track, modelled on a real physical track. Five different controller architectures were compared, based on neural networks, force fields and action sequences. The controllers use either egocentric (first person), Newtonian (third person) or no information about the state of the car (open-loop co… ▽ More This paper describes the evolution of controllers for racing a simulated radio-controlled car around a track, modelled on a real physical track. Five different controller architectures were compared, based on neural networks, force fields and action sequences. The controllers use either egocentric (first person), Newtonian (third person) or no information about the state of the car (open-loop controller). The only controller that was able to evolve good racing behaviour was based on a neural network acting on egocentric inputs. △ Less

Submitted 1 November, 2006; originally announced November 2006.

Comments: Won the CEC 2005 Best Student Paper Award

Journal ref: Proceedings of the 2005 Congress on Evolutionary Computation, pages 1906-1913

Showing 1–30 of 30 results for author: Lucas, S M