Here I place the Python source files for abstract games. The Python version used is 2.7.2. Comments in the source files are in English and, I hope, detailed enough for the programs to be easily understood. The Python Imaging Library (PIL) must also be installed for the programs to work.
A PowerPoint presentation (in PDF format) is also included. It presents some MLM principles, explains the decision and learning processes as they are currently implemented in Python, and gives a short description of the abstract games.
All Python files below are to be placed inside the same folder. The main file is "AMLM324Xvoluntary.py". Running this file will open a dialog window that allows several predefined settings to be tested. Three agents can interact, and some aspects of each agent can be configured:
A. We can choose the Cognitive Level for each MLM agent:
1. Level 3 uses the full MLM cinematic mechanism, so the corresponding choices are voluntary actions.
2. In this abstract-games setting, levels 2, 1, or 0 reduce the agent to a blind machine performing random actions. For the situations tested, this is the simplest and most convenient configuration. (In other robot settings, level 2 corresponds to reflexes that activate only when no voluntary action is present, while level 1 corresponds to basic reflex actions reacting directly to sensory information. Level 0 encodes blind actions that do not result from any sensory information.)
3. Level minus 1 (i.e. -1) codes a fixed strategy, Win-Stay-Lose-Change (WSLC), that can also be used in most of the games.
B. Another aspect that can be configured is the amplitude of the dominance updating mechanism, called the dominance update factor (DUF). In the MLM, heuristics and scenes are moved up or down in lists according to their predictive success or failure. We can set a null or positive factor. A factor of 0 means that nothing is moved in the lists, so the most recent scenes always dominate, while the original heuristics dominance is preserved. A factor of 1 means that list items are moved up or down by some predefined amounts, a factor of 10 means items are moved by ten times the predefined amounts, and so on (a minimal sketch of this updating is given after item F below).
C. We can also choose, for each agent, whether its actions contribute to the objective state of the world. If they do not, the agent is powerless.
D. We can experiment with different sizes for the dynamic dominance long-term memory (DDLTM) of each MLM. This allows us to study the influence of memory size on the machine's performance.
E. We can reuse the DDLTM from previous runs (the default option) or reset it to start anew. This allows us, for instance, to see how a previous cooperative setting affects a subsequent competitive matching-pennies game.
F. Finally, we can configure the way the machine reacts to neutral situations. The machine's energy level goes from 0 to 100. When its energy is high, it can act on predictions that are neither positive nor negative in terms of energy reward; when its energy is low, only positive rewards are desirable. Two values can be defined. For instance, a "High Energy Shift Value" of 90 means that an increase of energy above 90 makes the machine shift to accepting positive or neutral situations, while a "Low Energy Shift Value" of 50 means that a decrease of energy below 50 makes the machine shift to accepting positive situations only (a sketch of this shift rule is also given below).
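To make item B more concrete, here is a minimal sketch (not the program's actual code) of how a dominance update factor could scale the movement of items in a dominance-ordered list. The function name and the base step are assumptions made for this example only.

    # Minimal sketch of a dominance update scaled by a dominance update factor (DUF).
    # The list is ordered by dominance: index 0 is the most dominant item.
    # BASE_STEP and the function name are illustrative assumptions, not the actual code.

    BASE_STEP = 1  # predefined amount by which an item is moved on success or failure

    def update_dominance(items, index, success, duf):
        """Move items[index] up (towards 0) on success, down on failure,
        by BASE_STEP * duf positions. A DUF of 0 leaves the list unchanged."""
        shift = BASE_STEP * duf
        if shift == 0:
            return index
        new_index = index - shift if success else index + shift
        new_index = max(0, min(len(items) - 1, new_index))
        items.insert(new_index, items.pop(index))
        return new_index

    heuristics = ["h1", "h2", "h3", "h4", "h5"]
    update_dominance(heuristics, 3, success=True, duf=1)    # "h4" moves up one place
    update_dominance(heuristics, 0, success=False, duf=10)  # failure: pushed to the bottom
    print(heuristics)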
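Item F describes a hysteresis on the energy level. Below is a rough sketch of that shift rule, under the assumption that the machine simply keeps a boolean flag recording whether neutral predictions are currently acceptable; the threshold values follow the example above, and the names are illustrative.

    # Sketch of the energy-based shift between "positive only" and "positive or neutral"
    # acceptance of predicted situations. Variable names are assumptions for illustration.

    HIGH_ENERGY_SHIFT = 90   # rising above this also accepts neutral situations
    LOW_ENERGY_SHIFT = 50    # falling below this restricts choice to positive situations

    def accepts_neutral(energy, currently_accepting):
        if energy > HIGH_ENERGY_SHIFT:
            return True              # high energy: positive or neutral predictions may be acted on
        if energy < LOW_ENERGY_SHIFT:
            return False             # low energy: only positive rewards are desirable
        return currently_accepting   # between the two thresholds, keep the previous mode

    # Setting both shift values to zero (as suggested for Chicken-Dare below) means the
    # machine accepts neutral situations whenever its energy is above zero.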
Some radio buttons allow us to choose among the different types of abstract games available. The mandatory predefined settings of the abstract games may override the A, B, and C configurations chosen by the user. The games are:
Cooperation
Three agents try to reach the same game state. A fixed sequence of external actions, with a small amount of noise, is added to the agents' actions.
Matching-Pennies (2 or 3 agents)
Two agents try to reach different states. The third is not powerless by default, and shares the first agent's goal. Cooperation between the first and third agents may therefore be needed to outperform the second agent.
Chicken-Dare (2 agents)
Imagine two car drivers on a head-on collision course. The one that swerves is the chicken; the one that keeps going straight is the daring one. If both dare, they crash: the least desired result. If both swerve, it is a tie, a neutral state. If only one swerves (the chicken), it is a desired state for the daring player and an undesired state for the chicken player. In this game, the best solution is obtained when both players are configured to always accept neutral situations (i.e. setting both the "High Energy Shift Value" and the "Low Energy Shift Value" to zero).
Prisoner's Dilemma (2 agents)
If both prisoners cooperate, they get a small reward. If both defect, both are punished. But the greatest reward and the greatest punishment come when one defects and the other cooperates. Since betrayal is preferred to cooperation (although constant cooperation is better than alternating betrayals), it is harder for cooperation to become the dominant strategy. It can still become dominant, even in the presence of noise, as long as cooperation is classified as a wanted state. A small LTM (size 10) with a large enough DUF (10) favors cooperation, because patience-in-failure becomes very small; larger LTMs make cooperation more difficult.
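The payoff ordering described above can be made concrete with conventional illustrative values; the numbers below only exemplify that ordering and are not necessarily the rewards used by the program.

    # Illustrative Prisoner's Dilemma payoffs: (row agent's reward, column agent's reward).
    # "C" = cooperate, "D" = defect. The values are examples only, chosen so that constant
    # cooperation beats alternating betrayals, while defecting against a cooperator yields
    # the single largest reward (and cooperating against a defector the largest punishment).
    PAYOFFS = {
        ("C", "C"): (+3, +3),   # both cooperate: small reward for each
        ("D", "D"): (-3, -3),   # both defect: both are punished
        ("D", "C"): (+5, -5),   # greatest reward for the defector, greatest punishment for the cooperator
        ("C", "D"): (-5, +5),
    }

    reward_1, reward_2 = PAYOFFS[("C", "D")]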
Superstitious Learning (3 independent agents)
A reward is given at fixed time intervals, and soon we find that each agent keeps the sequence of actions that preceded the reward. This situation has been reported in pigeons, and is a good confirmation of the validity of the MLM concept.
Hunter-Hunted Game (2 agents)
An extremely simple and abstract setting. Three locations are given. In two of them the prey (the hunted agent) can feed. The predator (the hunter agent) feeds if it is found in the same location as the prey. Moving around costs some energy. In this setting, the inter-predictability issue can be studied. It might seem that some randomness must always be maintained by both the prey and the predator, but we find that a combination of prey-predator actions can be preserved and repeated as long as both the predator and the prey can survive.
Minority Game
Two locations are given, and, at each turn, the three agents can choose their location. The agent found alone in one of the two locations wins, while the two others found together in the other location lose.
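The winning rule is simple enough to sketch directly, assuming the two locations are coded 0 and 1 and the three agents' picks are given as a list; the function name is illustrative.

    # Minority game: the agent alone in its location wins, the two together lose.
    def minority_winner(choices):
        """choices is a list of three location picks, e.g. [0, 1, 1].
        Returns the index of the winning agent, or None if all chose the same place."""
        for agent, place in enumerate(choices):
            if choices.count(place) == 1:
                return agent
        return None

    print(minority_winner([0, 1, 1]))  # agent 0 wins
    print(minority_winner([1, 1, 1]))  # None: nobody is alone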
Four-Armed Bandit
All agents are powerless, except that agent 1 can choose the bandit lever. The four levers (0, 1, 2, 3) have different winning probabilities: 0.3, 0.5, 0.7, 0.9. The agent's task is to find the best lever.
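The quoted lever probabilities translate into a one-line reward draw; the sketch below is only an illustration of that draw, not the program's code.

    import random

    # Winning probability of each of the four levers (0, 1, 2, 3).
    LEVER_PROBABILITIES = [0.3, 0.5, 0.7, 0.9]

    def pull_lever(lever):
        """Return 1 (win) or 0 (loss) according to the chosen lever's probability."""
        return 1 if random.random() < LEVER_PROBABILITIES[lever] else 0

    # Rough check: lever 3 should win about 90% of the time.
    wins = sum(pull_lever(3) for _ in range(1000))
    print(wins)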
Iowa Gambling Task
A game similar to the four-armed bandit, but with a much more complex reward structure. Agent 1 selects a card deck among four decks, say A, B, C, D. Deck A gives a -20 reward with probability 0.5, and a +10 reward with probability 0.5. Deck B gives a -120 reward with probability 0.1, and a +10 reward with probability 0.9. Deck C gives a +5 reward with probability 0.5, and a zero reward otherwise. Deck D gives a -20 reward with probability 0.1, and a +5 reward with probability 0.9.
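These deck distributions translate directly into a small sampling function; the sketch below uses the stated probabilities and rewards, with illustrative names.

    import random

    # Per deck: (probability of the first outcome, its reward, reward of the alternative outcome),
    # taken from the description above.
    DECKS = {
        "A": (0.5, -20, +10),
        "B": (0.1, -120, +10),
        "C": (0.5, +5, 0),     # +5 with probability 0.5, zero otherwise
        "D": (0.1, -20, +5),
    }

    def draw_card(deck):
        """Return the reward obtained by drawing once from the given deck."""
        probability, first_reward, second_reward = DECKS[deck]
        return first_reward if random.random() < probability else second_reward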
When the Python program starts, it displays detailed information about the first twenty moves of the iterated game. The actions of the three agents are coded with numbers. Values for noise and fixed sequences are also indicated, when they exist. The action values are then multiplied by some factors (for powerless agents the factor is zero), and everything is added up (modulo the allowed number of world states) to obtain the world state shown in the last column. The only exception is the bandit game, where the action defines the probability distribution of a random result.
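A sketch of this world-state computation is given below; the parameter names and the sample values are assumptions made for the example, since the actual factors and number of states are game-specific.

    # Sketch of the world-state computation described above: each agent's action is
    # multiplied by a factor (zero for powerless agents), a fixed external action and
    # noise may be added, and the sum is taken modulo the number of world states.

    def world_state(actions, factors, external=0, noise=0, n_states=4):
        total = external + noise
        for action, factor in zip(actions, factors):
            total += action * factor
        return total % n_states

    # Three agents; agent 3 is powerless (factor 0): (1 + 3 + 2) % 4 = 2
    print(world_state(actions=[1, 3, 2], factors=[1, 1, 0], external=2, n_states=4))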
The same information is also displayed at the end of the game run, for the last 24 moves.
Some statistical information about the types of actions is also displayed: how many actions resulted from beliefs, how often these beliefs predicted correctly, how many actions were merely instinctive, etc.
A set of graphs (one for each agent) is generated at the end of the game run. In each graph, four coloured lines are drawn. The first three are discounted predictive scores (i.e. dynamic measures of how often the predictions were correct) for [1] the MLM cinematic mechanism, [2] the MLM global process that includes all reflex and random actions, and [3] a random prediction of states that serves as a baseline against which the MLM's predictive accuracy can be compared.
The fourth colored line (trapezoidal in shape) refers to the agent's accumulated energy (this can also be seen as the accumulated objective reward). It is bounded by a maximum energy value. When the machine goes under a minimum energy value, it requires a rescue. The number of needed rescues is an indication of the agent's environmental intelligence in a hostile world.
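To make the plotted quantities concrete, here is a minimal sketch of a discounted (exponentially weighted) predictive score and of the bounded energy account with rescue counting; the discount constant, the bounds, and the energy level restored by a rescue are assumptions for this example, not values taken from the program.

    # Sketch of the two kinds of curves plotted per agent.

    DISCOUNT = 0.95        # weight given to past predictions (illustrative value)
    MAX_ENERGY = 100       # energy is bounded above, as described in the text
    RESCUE_LEVEL = 0       # assumed minimum below which a rescue is triggered
    RESCUE_ENERGY = 50     # assumed energy restored by a rescue

    def update_score(score, correct):
        """Discounted predictive score: how often recent predictions were correct (0-100)."""
        return DISCOUNT * score + (1.0 - DISCOUNT) * (100.0 if correct else 0.0)

    def update_energy(energy, reward, rescues):
        """Add the objective reward, clip at the maximum, and rescue when too low."""
        energy = min(MAX_ENERGY, energy + reward)
        if energy < RESCUE_LEVEL:
            rescues += 1
            energy = RESCUE_ENERGY
        return energy, rescues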
Two horizontal black lines (above and below the level 50 in the graph) are plotted. The line above level 50 is plotted when predictions are generated by the cinematic mechanism, resulting in voluntary actions; the line below level 50 is plotted when reflex or random actions are performed.
The graphs also indicate the number of world states in the world model and how the measurement states are partitioned (pS means positive, xpS highly positive; nS negative, xnS highly negative), as well as the agent's cognitive level, the LTM size reached, when the LTM became full, etc.