To give an idea of the type of problems easily addressed by the M-Logic Machine inference mechanism, a few typical problems are presented here. They are typical because long-term recorded scenes are assumed to be similar in size, with a small and constant number of channels in each frame. The nature of the recorded measurements is irrelevant: the same basic strategies used hereafter can be applied to a multitude of specific situations. Admittedly, elephants don't play chess or matching-coin games. But they do play other games, because play trains their needed predictive abilities in a safer setting. The information needed to handle a problem must be simple enough to be recordable within the available channels, but the remaining channels (if any) may often record data unrelated to the problem. At least two frame channels must be dedicated (meaning that a specific mapping is consistently used by the heuristics across all sensory modes): one to record the frame's pleasure/pain evaluation (the "driving channels"), and the other to record the measurement that monitors the triggered actions (the "actuator channels"). Notice that the notion of pleasure and pain is operational: pain is just like any other measurement, except that it triggers specific motor responses in the machine by means of a heuristic. The other channels are used to better discriminate the frames (the "situation channels").
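As an illustration, the channel layout just described can be sketched as a simple data structure. This is only a sketch: the names `Frame` and `make_frame`, and the particular channel ordering, are illustrative assumptions, not part of the MLM specification.

```python
# A minimal sketch of a six-channel frame. Hypothetical layout: one driving
# channel (pleasure/pain), one actuator channel (the machine's action), and
# four situation channels for the remaining measurements.
from collections import namedtuple

Frame = namedtuple("Frame", ["driving", "actuator", "situation"])

def make_frame(pain_pleasure, action, measurements):
    """Pack one round of measurements into a single frame."""
    assert len(measurements) == 4  # six channels total in this sketch
    return Frame(pain_pleasure, action, tuple(measurements))

frame = make_frame(+1, 2, [0, 1, 1, 0])
```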
If an unbalanced coin is flipped and the probability is biased towards one of the possible results, say tails, the best strategy is to stick to the most frequent result and keep betting tails. If, on the contrary, the probability of getting heads or tails is identical, then the only hope for a valuable prediction is that certain sequences of results are more probable than others. A trivial example is a sequence where heads and tails alternate. Another valuable strategy is to find a correlation in time with some other coin. For instance, if the result of a coin is always equal to the previous flip result of another coin, we can guess the result once we have discovered the relation between the two coins. The Prolog implementation of the M-Logic Machine dominance mechanism integrates in a single procedure the search for these three types of regularity in the incoming data, and tends to keep betting on the one that is most rewarding. This is achieved without the need to evaluate or manipulate probabilities.
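The three types of regularity can be illustrated with a toy predictor selection. This is only a sketch of the general idea, not the Prolog dominance mechanism itself; all function names are hypothetical, and the scoring (counting past correct guesses) stands in for the dominance-based selection.

```python
# Three candidate predictors for a coin sequence, each scored by how often it
# would have been right on the past data; the best scorer makes the next guess.
from collections import Counter

def predict_majority(history, other=None):
    # Regularity 1: bet on the most frequent past result (biased coin).
    return Counter(history).most_common(1)[0][0] if history else "T"

def predict_alternation(history, other=None):
    # Regularity 2: assume heads and tails alternate in sequence.
    return ("T" if history[-1] == "H" else "H") if history else "T"

def predict_follow_other(history, other=None, lag=1):
    # Regularity 3: assume this coin copies another coin's result `lag` flips ago.
    return other[-lag] if other and len(other) >= lag else "T"

def best_predictor(history, other, predictors):
    # Keep betting on whichever predictor has been most rewarding so far.
    def score(p):
        return sum(p(history[:i], other[:i] if other else None) == history[i]
                   for i in range(1, len(history)))
    return max(predictors, key=score)
```

For an alternating sequence such as "HTHTHT...", `best_predictor` selects the alternation strategy.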
The incoming data recorded in the cinematic memories must include the "right" data needed to achieve a useful prediction. The attention of the agent needs to be oriented to the relevant measurements (i.e. to an adequate sensory mode). Heuristics are thus needed to search for the best sensory mode in a given situation. These heuristics result from evolutionary learning. This is a topic under current research.
The sensory-motor inference mechanism uses past experience to find good motor continuations of the present moment. If there is a purely reflexive answer hardwired in the agent, it will be used instead. A prior evolutionary setting is assumed, in which bad reflex actions were randomly generated and eliminated. The best rate for frame generation is constrained by the time constants of the sensory-motor dynamics, and is also assumed to be the result of evolutionary learning.
1. Matching Coins
The first problem is a version of the Matching-Coins game: two players, Alice and Bob, each hide one or two coins in their right hand. When the number of coins held by each player is disclosed, Alice wins if both hold the same number of coins, and Bob wins if the numbers differ.
1.1 Random Biased Version (implemented in Prolog and Python 2.7)
Let us assume an M-Logic Machine (call it Arthur) with cinematic memories of six channels per frame. Each round will be recorded in a single frame. One dedicated channel records the pain/pleasure evaluation of the frame. Another dedicated channel records the actions of the machine. Two of the remaining four channels will record irrelevant information (for the purpose of the game, it's just noise) and the other two will record the number of coins presented by each player.
The action rule for Bob is to choose the number of coins at random, but with some bias: for instance, 80% of the time he will play one coin, 20% of the time two coins. After a certain time, Bob will change the rule to one coin 20% of the time and two coins 80% of the time. The shift of the action rule is random, with, for instance, a 2% shift probability per round. These action rules, together with the noise recorded in two of Arthur's frame channels, make the world Arthur sees a noisy and changing place.
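Bob's shifting biased rule can be sketched as follows; the helper name `make_bob` and the default parameters are illustrative, matching the 80/20 bias and 2% shift probability given above.

```python
# A sketch of Bob's action rule: play one coin with probability `bias`,
# and with a small probability per round flip the bias to its complement
# (80/20 becomes 20/80), making the rule change at a random time.
import random

def make_bob(bias=0.8, shift_prob=0.02, rng=random):
    state = {"bias": bias}
    def play():
        if rng.random() < shift_prob:
            state["bias"] = 1.0 - state["bias"]  # rule shift
        return 1 if rng.random() < state["bias"] else 2
    return play

bob = make_bob()
rounds = [bob() for _ in range(1000)]
```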
Arthur starts with some amount of coins, say 200. If he loses, he pays two coins; if he wins, he receives one coin. Arthur also loses if his actions fail to meet the requirements of the game (taking actions that leave his hands empty, for instance). When Arthur runs out of coins, he is out of the game. This makes the world a hostile place, since purely random choices would make Arthur lose all his coins.
Arthur's implicit goal is to stay in the game. No explicit goal is given, beyond the search for pleasant futures and the avoidance of future pain, based on fast and frugal heuristics that generate beliefs about the future. To situate Arthur in this game, pleasure and pain will be associated with gaining and losing coins (without an intelligent designer, the right correspondence must be found by trial and error, but that is not an in-life learning problem). No background information about the rules of the game is given. The four undedicated channels are not specifically assigned to noise or to relevant information: Arthur will have to figure out which channels are relevant and which are not.
1.2. Pattern Recognition (implemented in Prolog and Python 3.4)
The same as above, but now Bob plays according to some fixed pattern that is changed from time to time. For instance, the sequence "111222111222..." randomly alternating with "122121112122121112..." Notice that the M-Logic Machine does not reach (or seek) predictive perfection.
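Such a pattern-playing Bob can be sketched as a generator; the name `pattern_bob` and the switch probability are illustrative assumptions, and on a switch the cycle simply restarts from the beginning of the chosen pattern.

```python
# A sketch of a pattern-playing Bob: cycle through a fixed sequence of plays,
# occasionally jumping to a (possibly identical) pattern restarted from the
# start. Patterns are those quoted in the text.
import itertools
import random

def pattern_bob(patterns=("111222", "122121112122121112"),
                switch_prob=0.01, rng=random):
    current = itertools.cycle(patterns[0])
    while True:
        if rng.random() < switch_prob:
            current = itertools.cycle(rng.choice(patterns))
        yield int(next(current))
```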
1.3. Random Unbiased Version With Channel Correlations (implemented in Prolog and Python 3.4)
Can Arthur survive if Bob plays randomly 50% one coin and 50% two coins? It may still be possible, even if the game is just as hostile as above, if Bob plays according to the noise values recorded in some prior frame, say two frames before. Arthur will need to figure out that short-term correlation between present and past random events in order to make correct predictions and stay in the game.
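The lagged correlation can be sketched as follows, assuming (for illustration) a single binary noise channel and a lag of two frames; `run_rounds` and `noise_channel` are hypothetical names.

```python
# Sketch of the correlated unbiased version: Bob's play copies a noise
# channel value recorded two frames earlier, so the "noise" actually
# predicts him, even though each play looks 50/50 in isolation.
import random

def noise_channel():
    return random.choice([1, 2])

def run_rounds(n, lag=2):
    noise, bob_plays = [], []
    for t in range(n):
        # Bob copies the noise value from `lag` frames back,
        # playing at random until enough history exists.
        play = noise[t - lag] if t >= lag else random.choice([1, 2])
        bob_plays.append(play)
        noise.append(noise_channel())
    return noise, bob_plays
```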
1.4. Mutual Search for Patterns (implemented in Python)
What happens when we place three MLM machines playing against each other? As expected, no stable pattern is produced by any of the machines, because any such pattern could be used by the other two to better predict and win.
2. Prisoner's Dilemma (implemented in Python)
This is a classical problem, where two prisoners can cooperate (C) or defect (D). According to game theory, the rational outcome is DD, where both prisoners defect. In iterated games, defecting often becomes the stable choice, because the tit-for-tat (TFT) strategy is found to be most efficient. But cooperation is often found in living agents, and this seeming irrationality is called the paradox of cooperation. The MLM machine does not work with any predefined strategy or set of strategies, like TFT, AC (always cooperate), AD (always defect), or WSLC (win-stay, lose-change). But if the CC choice is classified as a desirable state, just like DC for the first agent and CD for the second, the MLM will stabilize in cooperation.
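The desirability labelling described above can be sketched for the first agent as follows; the ±1 values and the function name are illustrative, standing in for the frame's pleasure/pain evaluation.

```python
# Sketch of the desirability labelling for the first agent: CC (mutual
# cooperation) and DC (I defect, the other cooperates) count as pleasure,
# everything else as pain. The symmetric labelling holds for the second agent.
def desirability(my_move, other_move):
    desirable = {("C", "C"), ("D", "C")}
    return +1 if (my_move, other_move) in desirable else -1
```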
3. Minority Game (implemented in Python)
A minimal setting for this game is implemented. At each turn, given two rooms and three players, each player chooses a room to stay in. A player that stays alone in a room is the winner. In this game, any two players with a not-too-complex fixed strategy will lose against the MLM.
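Determining the winner in this minimal setting can be sketched as follows; the function name and the dict-based representation of choices are illustrative.

```python
# Sketch of the minimal minority game: with two rooms and three players,
# the winner (if any) is the player who ends up alone in a room. With
# three players the split is either 2-1 (one winner) or 3-0 (no winner).
from collections import Counter

def minority_winner(choices):
    """choices: dict mapping player -> chosen room; returns the lone player or None."""
    counts = Counter(choices.values())
    lone_rooms = [room for room, c in counts.items() if c == 1]
    if not lone_rooms:
        return None  # all three players chose the same room
    room = lone_rooms[0]
    return next(p for p, r in choices.items() if r == room)
```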
4. Robot Driving
The MLM memorizes the sequences of actions that lead to desired sensory states. If the world is regular enough, many of these sequences can be successfully reused for motor decisions, while the unreliable ones are gradually forgotten.