Mix and Match
Many games contain a number of opponents, and normally the computer must provide the behaviour for most of them. One way this might be done is for each opponent to perceive its environment and then execute a rule suited to that situation. This is the task of the classifier system used here.
Below is a Java demonstration in which individual tanks compete against one another within a simulated arena. A single click on the applet will bring up a list of control parameters, and further down the page there is a more detailed description of the environment in which the tanks perform. The page banner shows a rendered 3D tank, which can be toggled on and off with a single mouse click.
The environment or world in which the combat is played out is a simple one. There are three sorts of objects - the tanks, the shells (bullets) they fire, and walls. Walls form a solid barrier to both moving objects and the tanks' radar senses. Their locations are downloaded at the beginning of the simulation and, after processing, the walls are stored in a binary space partition tree to speed up collision detection.
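A shell's movement over one simulation step is a short line segment, and a wall is another segment, so the core collision test is segment-segment intersection; the binary space partition tree only prunes which walls need testing. The sketch below shows that core test under my own naming - none of these classes or methods are from the original Think Tank source, and the proper-intersection test shown ignores exactly-touching and collinear cases.

```java
// Illustrative sketch only; names are my own, not from the Think Tank source.
public class CollisionDemo {

    // Signed area of the triangle (a, b, c): positive if c lies to the
    // left of the directed line a->b, negative if to the right.
    static double cross(double ax, double ay, double bx, double by,
                        double cx, double cy) {
        return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
    }

    // True if segment p1-p2 properly crosses segment q1-q2: the
    // endpoints of each segment must lie on opposite sides of the other.
    static boolean segmentsIntersect(double[] p1, double[] p2,
                                     double[] q1, double[] q2) {
        double d1 = cross(q1[0], q1[1], q2[0], q2[1], p1[0], p1[1]);
        double d2 = cross(q1[0], q1[1], q2[0], q2[1], p2[0], p2[1]);
        double d3 = cross(p1[0], p1[1], p2[0], p2[1], q1[0], q1[1]);
        double d4 = cross(p1[0], p1[1], p2[0], p2[1], q2[0], q2[1]);
        return ((d1 > 0) != (d2 > 0)) && ((d3 > 0) != (d4 > 0));
    }

    public static void main(String[] args) {
        double[] shellFrom = {0, 0}, shellTo = {2, 2}; // shell's step
        double[] wallA = {0, 2}, wallB = {2, 0};       // one wall segment
        // The shell crosses the wall during this step, so prints "true".
        System.out.println(segmentsIntersect(shellFrom, shellTo, wallA, wallB));
    }
}
```

In the full simulation the BSP tree would walk the shell's segment down the partition planes so that only nearby walls reach this test.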
The tank body consists of a chassis and a gun turret; movement is provided by two treads. There are four actions which can be performed by the tank: the chassis may be turned, the tank may move forward, the turret may be turned, and the gun may be fired. When the gun is fired a shell is created and introduced into the environment. The shell travels until it impacts a wall, a tank, or another shell. If the target is another tank then the attacker is credited extra time and the target debited. Both chassis and turret are equipped with "radar" sensors which can probe a specified view field to a specified distance. Such a probe reports only the presence or absence of tanks; their exact locations are not available. Additional sensors report shell or collision damage but have the same limitation as the radar: they are only accurate to a limited view field. Wall sensors allow rudimentary obstacle-avoidance behaviour.
Each tank has its own "brain", called a classifier system. The tank's environment is quantised through its senses into just 32 binary bits; this is its current status. Behaviour comes from the execution of a number of 128-bit rules. A single rule is divided into four 32-bit components - the input mask, the input map, the output mask, and the output map. Each particular bit corresponds to a particular action or sensation. For instance bit 0 is called "forwardLeftTread", and when it is set high the left tread will move forward. Bit 17 is called "tankRightTurret" and is set high when a tank is visible to the right of the turret. To find an appropriate rule the current environment status is compared against each rule's input map, but the input mask is used as a filter, allowing some factors to be ignored. When a rule has been chosen the status is modified by the output map, but again a filter is used, this time the output mask. After the decision is made, the tank's chosen actions are carried out.
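The mask-and-map scheme above amounts to a few bitwise operations. Here is a minimal sketch of how matching and firing a rule might look; the class and method names are mine, and the FIRE_GUN bit position is a hypothetical assignment (only bits 0 and 17 are named in the text).

```java
// Hedged sketch of the rule scheme described above, not the original source.
public class ClassifierDemo {

    // A rule is four 32-bit words packed into one 128-bit classifier.
    static final class Rule {
        final int inputMask, inputMap, outputMask, outputMap;
        Rule(int inputMask, int inputMap, int outputMask, int outputMap) {
            this.inputMask = inputMask;
            this.inputMap = inputMap;
            this.outputMask = outputMask;
            this.outputMap = outputMap;
        }
    }

    // A rule applies when the status bits selected by the input mask
    // agree with the corresponding bits of the input map; masked-out
    // bits are "don't care" and are ignored.
    static boolean matches(Rule r, int status) {
        return (status & r.inputMask) == (r.inputMap & r.inputMask);
    }

    // Firing the rule overwrites only the status bits selected by the
    // output mask with the corresponding bits of the output map.
    static int fire(Rule r, int status) {
        return (status & ~r.outputMask) | (r.outputMap & r.outputMask);
    }

    public static void main(String[] args) {
        int TANK_RIGHT_TURRET = 1 << 17; // "tankRightTurret", from the text
        int FIRE_GUN          = 1 << 4;  // hypothetical action bit

        // "If a tank is visible right of the turret, fire": every other
        // sensation is ignored, and only the fire bit is written.
        Rule r = new Rule(TANK_RIGHT_TURRET, TANK_RIGHT_TURRET,
                          FIRE_GUN, FIRE_GUN);

        int status = TANK_RIGHT_TURRET;  // current sensations
        if (matches(r, status)) status = fire(r, status);
        System.out.println((status & FIRE_GUN) != 0); // prints "true"
    }
}
```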
An individual tank is incapable of learning; however, those with better rule sets and radar settings are more likely to survive. Of all the tanks in any combat most will be allowed one offspring in the next round, but the first loser will have none and the final winner will have two. In this way those best fitted to survive come to make up more and more of the population. Each offspring of a tank is mutated slightly. Many different changes can occur: radar settings may be widened or deepened, and the classifier rules may be duplicated, deleted, or mutated. This brings about change very similar to natural selection. Systems such as this are called genetic algorithms and play an important part in artificial life.
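The round-to-round bookkeeping can be sketched as follows, under my own assumptions: offspring counts follow the text (two for the winner, none for the first loser, one for everyone else), a rule set is reduced to an array of 32-bit words, and mutation is simplified to random bit flips, omitting the radar changes and rule duplication or deletion the text also mentions.

```java
// Illustrative sketch only; names and the bit-flip mutation are my own.
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class EvolveDemo {
    static final Random rng = new Random();

    // Flip each bit of each rule word independently with probability `rate`.
    static int[] mutate(int[] rules, double rate) {
        int[] child = rules.clone();
        for (int i = 0; i < child.length; i++)
            for (int b = 0; b < 32; b++)
                if (rng.nextDouble() < rate)
                    child[i] ^= 1 << b;
        return child;
    }

    // Build the next round from tanks ranked best (index 0) to worst.
    // The winner leaves two offspring, the first loser none, the rest one,
    // so the population size stays constant from round to round.
    static List<int[]> nextRound(List<int[]> ranked) {
        List<int[]> next = new ArrayList<>();
        for (int i = 0; i < ranked.size(); i++) {
            if (i == ranked.size() - 1) continue;           // first loser
            next.add(mutate(ranked.get(i), 0.01));
            if (i == 0) next.add(mutate(ranked.get(i), 0.01)); // winner
        }
        return next;
    }

    public static void main(String[] args) {
        List<int[]> ranked = new ArrayList<>();
        for (int i = 0; i < 4; i++) ranked.add(new int[]{i, i, i, i});
        System.out.println(nextRound(ranked).size()); // prints "4"
    }
}
```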
The initial version of Think Tank used evolutionary principles similar to the current one, but the results were disappointing. There were two driving forces behind the tanks' development.
The first was energy. Inside the arena there was a fixed quantity of energy, some concentrated in each tank and some dispersed around the environment. A tank was able to collect a certain percentage of this global energy in each cycle. Any energy that a tank expended was released into the environment and became available to the others. Shooting also required energy, but upon a successful hit the attacker would absorb energy directly from the target. The intention had been to encourage evolution to conserve energy where possible but still hunt and kill. However, because of the difficulty of guessing the correct energy cost for each action, the result of every evolution was the same: stationary tanks that did nothing, expended no energy, and would therefore never die. Energy proved to be a poor measure of success and the entire energy system was abandoned.
The second driving force was hit ratings and hit ratios. The current round system was not yet in place, and when a tank died its replacement was generated immediately. To pick the winner at any particular moment, each tank's gunning prowess was examined. The hit ratio compared the number of shells fired with the number which hit their target. Unfortunately this selector evolved tanks which could not afford to miss and so only fired from point-blank range. The hit rating tried to correct this by counting only successful shots, but this developed tanks which would try for any shot: a tank that always fires can count on hitting something, even if only by accident.
Another aspect of the original implementation which has been lost is the range of sensors available to the tanks. There were three sensors available, each with a different cost:
All but radar have since been dropped. This simplification has allowed the same code to be used in other demonstrations.
Here the control system was modified to try and cope with some of the excesses of the previous attempt. The requirement for each rule to have a complementary "switch off" rule was dropped; for instance, when the gun was activated it would now fire a single shot and then stop. Also, in circumstances in which no applicable rule could be found, a random action was forced to help the tank past an unfamiliar situation. This new combination was also unsuccessful, with tanks developing the tendency to jitter about and regularly fire at nothing.
Work on this demonstration has been suspended. I still believe in the concept of learning systems optimising behaviour for game play, but I now understand the difficulty of controlling the direction of adaptation. In many settings a simple policy of continuous shooting is the most profitable but not the most interesting behaviour. Selection should be based not on what benefits each tank best but on what gives the most player-friendly behaviour.
Here is the Java source for the final attempt:
Thanks to Richard Jobling for the tank model which can be seen in the banner.