
Game-Benchmark for Evolutionary Algorithms

On this site, we provide some details on the constructed benchmark. Everything is a work in progress. If you have questions, please don't hesitate to ask.

The benchmark was integrated into the COCO framework in order to make it convenient to run and analyse. You can find it on GitHub here. For more details on the COCO framework, including how to run your own algorithms and how to post-process results, please check out the separate documentation. The complete benchmark (as of July 2019) is also described in our recent GECCO 2019 paper, "Single- and Multi-Objective Game-Benchmark for Evolutionary Algorithms". Link coming soon.

Function Suites

In the following, we describe the function suites contained in the GBEA benchmark. For more detailed and technical descriptions of the context, please refer to the respective papers.

Top Trumps

The Top Trumps problem suites are based on Demonstrating the Feasibility of Automatic Game Balancing, where the optimisation task is to fine-tune the values on the cards of a Top Trumps deck. This is done using fitness functions taken from the paper, but the game was re-implemented in C++ to allow for an easier integration into COCO.

Below is an illustrative video of the game. For a detailed description, please refer to the paper. You can also find a more detailed and formalised description of the fitness functions there.

Game Rules

  1. Shuffle the deck and distribute the cards evenly among the players
  2. The starting player chooses a characteristic (category)
  3. All players compare the corresponding values on their cards
  4. The player with the highest value wins the trick
  5. The winning player announces a new characteristic; play continues from step 3
  6. The game ends once every card has been played exactly once
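
To make the loop concrete, here is a minimal Python sketch of a two-player game following these rules. The greedy category choice and the tie handling are simplifying assumptions for this example, not part of the benchmark's C++ implementation.

    import random

    def play_top_trumps(deck, num_categories=4):
        """Simulate a simplified two-player game of Top Trumps.

        deck: list of cards, each card a list of num_categories values.
        Returns the number of tricks won by each player.
        """
        random.shuffle(deck)                 # 1. shuffle the deck ...
        hands = [deck[0::2], deck[1::2]]     # ... and distribute evenly
        tricks = [0, 0]
        starter = 0
        while all(hands):                    # 6. until all cards are played
            cards = [hand.pop(0) for hand in hands]
            # 2./5. the player on turn picks a category; we assume they
            # greedily choose their card's best one
            category = max(range(num_categories), key=lambda c: cards[starter][c])
            # 3./4. compare values; the highest value wins the trick
            winner = max((0, 1), key=lambda p: cards[p][category])
            tricks[winner] += 1
            starter = winner                 # the winner starts the next trick
        return tricks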

Fitness Functions

We provide two function suites related to Top Trumps. The first, rw-top-trumps, comprises 5 single-objective functions. The second, rw-top-trumps-biobj, contains 3 bi-objective functions.

A solution to the problem is an array that specifies all values of all cards in a deck. Each deck has 4 categories, so the search space dimension is a multiple of 4 and depends only on the number of cards in the deck.
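
For illustration, a minimal sketch of how such a solution vector could be decoded into a deck; the row-wise layout (all values of a card stored contiguously) is an assumption for this example, not the benchmark's documented encoding.

    def decode_deck(x, num_categories=4):
        """Reshape a flat solution vector into a deck of cards.

        x: sequence of length num_cards * num_categories; we assume the
        values of each card are stored contiguously (an illustrative
        assumption, not the benchmark's documented encoding).
        """
        assert len(x) % num_categories == 0
        return [list(x[i:i + num_categories])
                for i in range(0, len(x), num_categories)]

    # e.g. a 22-card deck lives in an 88-dimensional search space:
    # deck = decode_deck(x_88)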

The instances represent different potential themes of a deck. We express that by allowing different value ranges for each category, depending on the instance. The exact boundaries are generated using the instance number as a random seed.

rw-top-trumps
objectives    1
dimensions    88, 128, 168, 208 (= 4 × 22, 32, 42, 52 cards)
functions     5
instances     15

Overview and characterisation of functions in rw-top-trumps. Column [min] indicates how the fitness measure x is transformed into a minimisation problem.

fid  Fitness Measure (x)                       min
t_1  deck hypervolume                          -x
t_2  standard deviation of category averages   -x
t_3  win rate of the better player             -x
t_4  switches of the trick winner              -x
t_5  trick difference at the end of the game   x

rw-top-trumps-biobj
objectives    2
dimensions    88, 128, 168, 208 (= 4 × 22, 32, 42, 52 cards)
functions     3
instances     15

Construction of the bi-objective functions from the rw-top-trumps-biobj suite (T_i) as pairs of single-objective functions from the rw-top-trumps suite (t_j).

T_1 = (t_1, t_2) T_2 = (t_3, t_4) T_3 = (t_3, t_5)
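
In other words, each bi-objective problem evaluates two of the single-objective functions on the same solution. A minimal sketch (the function handles t_1 and t_2 are placeholders for the single-objective evaluators):

    def make_biobjective(f_a, f_b):
        """Combine two single-objective functions into one bi-objective one."""
        return lambda x: (f_a(x), f_b(x))

    # e.g. T_1 pairs deck hypervolume (t_1) with the standard deviation
    # of category averages (t_2):
    # T_1 = make_biobjective(t_1, t_2)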

Mario GAN

The MarioGAN suite is based on Evolving Mario Levels in the Latent Space of a Deep Convolutional Generative Adversarial Network, where the optimisation task is to find good Mario levels in a specifically defined search space. The fitness of the discovered levels is measured using several functions surveyed in Understanding Mario: An Evaluation of Design Metrics for Platformers. The existing implementation from Evolving Mario Levels in the Latent Space of a Deep Convolutional Generative Adversarial Network, available on GitHub here, was modified for integration with COCO. In addition, new training examples were added and the encoding was replaced. The modified version of the code can be accessed on branch gbea.

Below is an illustrative video of the generation method. For a detailed description, please refer to the paper. You can also find a more detailed and formalised description of the fitness functions there.
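
Conceptually, a candidate solution is a point in the GAN's latent space (hence the small search space dimensions below): the generator turns it into a level, which is then scored by a fitness function. A rough PyTorch-style sketch; all names and tensor shapes here are assumptions for illustration, not the GBEA API.

    import torch

    def evaluate_level(z, generator, fitness):
        """Map a latent vector z to a level and score it.

        generator: assumed to be a trained DCGAN generator taking a
        (1, len(z), 1, 1) tensor and returning per-tile class scores.
        Purely illustrative, not the GBEA interface.
        """
        z_t = torch.tensor(z, dtype=torch.float32).view(1, -1, 1, 1)
        with torch.no_grad():
            out = generator(z_t)
        level = out.squeeze(0).argmax(dim=0)  # most likely tile type per position
        return fitness(level)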

Fitness Functions

rw-gan-mario
objectives    1
dimensions    10, 20, 30, 40
functions     84
instances     7

Overview of the functions in rw-gan-mario. Function ids in the benchmark are given in the last four columns, split by level type: overworld [o], underground [u], overworld concatenated [oc] and underground concatenated [uc]. Column [min] indicates how the fitness measure x is transformed into a minimisation problem.

Fitness Measure (x)   AI      min   o   u   oc  uc
enemyDistribution     -       -x    1   2   -   -
positionDistribution  -       -x    3   4   -   -
decorationFrequency   -       1-x   5   6   -   -
negativeSpace         -       1-x   7   8   -   -
leniency              -       x     9   10  -   -
basicFitness          A*      x     11  12  13  14
basicFitness          Scared  x     15  16  -   -
airTime               A*      1-x   17  18  19  20
airTime               Scared  1-x   21  22  -   -
timeTaken             A*      1-x   23  24  25  26
timeTaken             Scared  1-x   27  28  -   -

Below is a more detailed explanation of each of the fitness measures:

  • enemyDistribution: standard deviation (std.) of the positions of enemy tiles along the x-axis
  • positionDistribution: std. of the positions of tiles you can stand on along the y-axis
  • decorationFrequency: percentage of pretty tiles (Tube, Enemy, Destructible Block, Question Block, or Bullet Bill)
  • negativeSpace: percentage of tiles you can stand on
  • leniency: weighted sum of the subjective leniency of tiles
  • basicFitness: MarioAI championship score achieved by the AI
  • airTime: ratio of ticks spent in the air to total ticks (if the level was completed, otherwise 1)
  • timeTaken: ratio of the time taken to the total time allowed (if the level was completed, otherwise 1)
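
To give a flavour of the tile-based measures, here is a sketch of two of them for a level represented as a 2-D grid of tile codes. The tile names are invented for this example and do not match the benchmark's encoding.

    import statistics

    STANDABLE = {"ground", "block", "question_block", "tube_top"}  # assumed tile codes

    def negative_space(level):
        """Fraction of tiles the player can stand on (cf. negativeSpace)."""
        tiles = [t for row in level for t in row]
        return sum(t in STANDABLE for t in tiles) / len(tiles)

    def enemy_distribution(level):
        """Std. of the x-positions of enemy tiles (cf. enemyDistribution)."""
        xs = [x for row in level for x, t in enumerate(row) if t == "enemy"]
        return statistics.pstdev(xs) if len(xs) > 1 else 0.0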

rw-gan-mario-biobj
objectives    2
dimensions    10, 20, 30, 40
functions     10
instances     7

Construction of the bi-objective functions from the rw-gan-mario-biobj suite (M_i) as pairs of single-objective functions from the rw-gan-mario suite (m_j).

M_1 = (m_4, m_6)     M_6  = (m_12, m_24)
M_2 = (m_4, m_8)     M_7  = (m_13, m_19)
M_3 = (m_11, m_17)   M_8  = (m_13, m_25)
M_4 = (m_11, m_23)   M_9  = (m_14, m_20)
M_5 = (m_12, m_18)   M_10 = (m_14, m_26)

Problem Submission

We will be using the COCO framework to produce benchmarking results based on your submitted problem. To do that, we need the following information from you:

  • possible search space dimensions
  • possible objective space dimensions
  • number of instances

There are three ways to submit your problem to us; they are detailed below in order of preference.

Server

If you have your problem running on a server, we just need an interface to your evaluation function. To keep calls consistent across all submissions, please adhere to the following specification and format. We want to be able to send a request with the following parameters to the address and port you provide:

n d i x_1 x_2 ... x_n, where
  • n is the search space dimension
  • d is the objective space dimension
  • i is the instance identifier
  • x_1 to x_n are the input values

You should then return the objective value(s) y_1 to y_d as

y_1 y_2 ... y_d.
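
For illustration, a minimal sketch of such an evaluation server in Python. The TCP transport, the newline-terminated messages, the port, and the evaluate stub are assumptions to be replaced by your own setup.

    import socketserver

    def evaluate(n, d, i, x):
        """Placeholder for your evaluation function: return d objective values."""
        raise NotImplementedError

    class EvalHandler(socketserver.StreamRequestHandler):
        def handle(self):
            # request: "n d i x_1 x_2 ... x_n"
            parts = self.rfile.readline().decode().split()
            n, d, i = int(parts[0]), int(parts[1]), int(parts[2])
            x = [float(v) for v in parts[3:3 + n]]
            # response: "y_1 y_2 ... y_d"
            y = evaluate(n, d, i, x)
            self.wfile.write((" ".join(map(str, y)) + "\n").encode())

    if __name__ == "__main__":
        with socketserver.TCPServer(("0.0.0.0", 8080), EvalHandler) as server:
            server.serve_forever()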

Code

If you prefer to have us run the code locally, we need an interface to your evaluation function. To make things easier, please adhere to the following specifications and formats. We want to be able to call your code from the command line with the following parameters

n d i x_1 x_2 ... x_n, where
  • n is the search space dimension
  • d is the objective space dimension
  • i is the instance identifier
  • x_1 to x_n are the input values

You should then return the objective value(s) y_1 to y_d to stdout as

y_1 y_2 ... y_d.
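
A minimal sketch of a matching command-line wrapper; the script name and the evaluate stub are placeholders for your own code.

    import sys

    def evaluate(n, d, i, x):
        """Placeholder for your evaluation function: return d objective values."""
        raise NotImplementedError

    if __name__ == "__main__":
        # called as: python eval.py n d i x_1 x_2 ... x_n
        n, d, i = int(sys.argv[1]), int(sys.argv[2]), int(sys.argv[3])
        x = [float(v) for v in sys.argv[4:4 + n]]
        print(" ".join(str(y) for y in evaluate(n, d, i, x)))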

Please include a readme file with

  • details on the setup and installation process
  • an example of calling your problem from the command line

The code will be run on an Ubuntu machine, so please make it either platform independent or executable on Ubuntu.

Data

Finally, if none of the above methods is feasible for you, you can also send us output values as raw data. For this option, please contact one of the organisers. We will then provide you with a text file containing input vectors for each possible combination of search space dimension n, objective space dimension d and instance, as

x_1 x_2 ... x_n, separated by line breaks.

For each input vector in each text file, you should then compute the objective value(s) and send them back to us in the same format, i.e. text files containing output vectors as

x_1 x_2 ... x_n : y_1 y_2 ... y_d, separated by line breaks.
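
A sketch of how such a file could be processed; the file names and the evaluate stub are placeholders.

    def evaluate(x):
        """Placeholder for your evaluation function: return the objective values."""
        raise NotImplementedError

    # file names are invented for this example
    with open("inputs_n10_d1_i1.txt") as fin, open("outputs_n10_d1_i1.txt", "w") as fout:
        for line in fin:
            x = [float(v) for v in line.split()]
            y = evaluate(x)
            # output line: "x_1 ... x_n : y_1 ... y_d"
            fout.write(line.strip() + " : " + " ".join(map(str, y)) + "\n")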

If you are interested in this option, please note that we would like to receive your data by the problem submission deadline on June 30th, 2018. Therefore, please contact us early enough to be able to compute the output values in time. We are looking for at least 50 samples per input dimension, if possible.

Glossary

In the following, we explain some key terms relevant to the GBEA.

Search Space Scalability

As we are compiling a benchmark, it is important that we are able to investigate key questions such as how an algorithm performs across different search spaces. We therefore aim to provide each problem in 4 or more search space dimensions so that the scalability of a given algorithm can be tested.

Instances

In order to obtain statistically sound results, we aim to provide 7 or more instances for each problem. Instances have the same (or very similar) fitness landscape characteristics, but their optima are located in different areas of the search space; their scaling might also differ. Instances help avoid outlier results caused, for example, by initialising an algorithm near the global optimum. One way to obtain instances is through transformations such as rotation, but several other options are possible, especially in the context of games.
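
For intuition, a common way to derive instances of a numerical problem is to transform the inputs before evaluation. A minimal sketch using a shift seeded by the instance number (an illustration of the idea, not how any specific GBEA suite is built):

    import random

    def make_instance(f, instance, dim, shift_range=1.0):
        """Derive an instance of f by shifting its inputs.

        Seeding the shift with the instance number makes the transformation
        reproducible, mirroring how e.g. the Top Trumps value ranges are
        seeded above. Purely illustrative.
        """
        rng = random.Random(instance)
        shift = [rng.uniform(-shift_range, shift_range) for _ in range(dim)]
        return lambda x: f([xi - si for xi, si in zip(x, shift)])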

Targets

Targets are used to measure the progress of an optimisation algorithm towards the optimum: every time a target precision is reached, the number of function evaluations taken so far is recorded. If there are meaningful milestones for your problem, the targets should reflect that. In addition, if the problem is very hard, there should be more targets so that progress can still be recorded effectively.
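
A simplified sketch of this bookkeeping, assuming minimisation and a best-so-far history of objective values (not COCO's actual logging):

    def record_target_hits(history, targets):
        """First evaluation count at which each target precision was reached.

        history: best-so-far objective value after each evaluation.
        targets: target precisions, e.g. [1e1, 1e0, 1e-1, 1e-2].
        """
        hits = {}
        for evals, value in enumerate(history, start=1):
            for t in targets:
                if t not in hits and value <= t:
                    hits[t] = evals
        return hits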

Line Walks

A simple tool for obtaining a first impression of the fitness landscape: walk through the search space in a random direction, evaluate the problem at equidistantly spaced points, and plot the results.
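
A minimal sketch of such a walk:

    import random

    def line_walk(f, lower, upper, num_points=100):
        """Evaluate f at equidistant points along a random direction.

        lower/upper: bounds of the search space box; the walk starts at a
        random point and, for brevity, is not clipped to the box.
        Plot the returned values to get a first impression of the landscape.
        """
        dim = len(lower)
        start = [random.uniform(l, u) for l, u in zip(lower, upper)]
        direction = [random.gauss(0.0, 1.0) for _ in range(dim)]
        return [f([s + (i / (num_points - 1)) * d for s, d in zip(start, direction)])
                for i in range(num_points)]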

ELA features

Features used for exploratory landscape analysis (ELA), which can be used to further characterise the landscape. The flacco R package provides an easy way to compute a large number of features suggested in the literature.

Landscape Characteristics

Higher-level characteristics, such as the existence of plateaus, multi-modality, etc. See this paper (specifically Fig. 1) for more details on landscapes and their analysis.