A real-world agent is always an open thermodynamic system, with a definite separation between its inside and the outside world. To act, it must obtain usable energy and matter from the outside world. It is also a fragile physical structure. Death can therefore come from slow depletion or from sudden destruction.
In a stationary environment, simple reflex actions (in the limit just physical reactions, not even involving a nervous system) are often all that is needed to ensure the agent's persistence. In non-stationary environments, some fast learning may become necessary for survival. The most distinctive feature of an M-Logic Machine (MLM) is its fast reinforcement learning mechanism.
To be fast, the reinforcement learning needs a very small search space. This suggests an MLM architecture made of many similar reinforcement learning modules, each one dealing with very simple data. A clever pre-processing of sensory data is therefore assumed.
Formally, the M-Logic Machine (MLM) learning module is a Persistent Turing Machine with finite memories. It is an implementation of the online learning paradigm, and it avoids the symbol grounding problem. Small recorded sequences of the input stream are used as "experts" that eventually provide short-term predictions. To my knowledge, it is the only existing learning mechanism based on a structure of many small cinematic memories built from the available measurement instruments. Cinematic memories are built as sequences of frames, each frame being a snapshot of selected measurement results within a short time interval. The machine does not provide explanations based on a chosen set of adequate state variables; it just finds regularities among whatever is being measured and recorded. There is no domain-specific idea of which regularity is to be expected (the machine just uses a set of heuristics that search for different types of general patterns), and no hidden or nominal variables are assumed in the search for a model. The machine knows when, without knowing why, or even knowing what is being measured: from the machine's viewpoint, the measurement result is all there is. The sensory data is mapped to the cinematic memory structures, and the cinematic memories provide the raw data for the inference mechanism. Besides the "learning from expert advice" methods, the closest thing I found is PSR (Predictive State Representation), for which efficient learning algorithms have apparently not been found.
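To make the idea of recorded scenes acting as "experts" more concrete, here is a minimal Python sketch. It is not the author's implementation: the class name `CinematicMemory`, the scene length of 3 frames, and the exact-prefix matching rule are all assumptions chosen for illustration. A frame is modelled as a tuple of measurement results, and a scene predicts the next frame when its beginning matches the most recent frames.

```python
from collections import deque


class CinematicMemory:
    """Hypothetical sketch: stores short, fixed-length scenes cut from a frame stream."""

    def __init__(self, scene_len=3):
        self.scene_len = scene_len              # frames per scene (assumed value)
        self.window = deque(maxlen=scene_len)   # the most recent frames
        self.scenes = []                        # recorded scenes, in recording order

    def record(self, frame):
        """Append a frame; store the current window as a scene once it is full."""
        self.window.append(frame)
        if len(self.window) == self.scene_len:
            scene = tuple(self.window)
            if scene not in self.scenes:
                self.scenes.append(scene)

    def predict(self, recent):
        """Return the frame that followed `recent` in the first matching scene.

        Returns None when no scene starts with the given frames.
        """
        recent = tuple(recent)
        n = len(recent)
        if n == 0 or n >= self.scene_len:
            return None
        for scene in self.scenes:               # linear search over the "experts"
            if scene[:n] == recent:
                return scene[n]
        return None


# Each frame is a tuple of raw measurement results, e.g. (light, touch).
memory = CinematicMemory(scene_len=3)
for frame in [(0, 0), (1, 0), (1, 1), (0, 1)] * 2:
    memory.record(frame)

next_frame = memory.predict([(0, 0), (1, 0)])   # continuation suggested by a scene
```

Note that the sketch never interprets the measurements: it only compares recorded values, which is the sense in which the machine "knows when, without knowing why".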
There is no need to provide the machine with statistical calculation tools. These calculations are implicit in the simple manipulations of the M-Logic Machine cinematic memories. Actually, the machine is prone to the probabilistic illusions of natural thought and Skinner's "superstitious learning". The recorded data is prioritized according to its success in predicting the future.
The use of a specific working memory, along with a few basic processes running in parallel, allows complex epistemic states to emerge at a given moment inside the M-Logic Machine. The epistemic states are identified by the configurations of its working memory. To know, to believe, to feel something: all can be present and combined in various degrees. Just as the Turing Machine allowed a clear definition of the notion of algorithm, the M-Logic Machine allows a clear definition of epistemic states, assuming such a correspondence exists. Other interesting notions, like voluntary motor actions, pain, and emotions, can also be defined. As a natural result of its architecture, voluntary motor actions are triggered before the machine knows about them.
The M-Logic Machine long-term cinematic records are made of short scenes. The length of the scenes doesn't vary much, and the number of different measurements inside a scene is kept constant in all long-term memories. This suggests a neural basis made of many similar units with individual cinematic capabilities. Since in many situations irrelevant information will be recorded in the available channels of the cinematic memories, a cleaning process is implemented in the Prolog version that greatly improves the inference abilities of the machine. It's an internally generated Hebbian learning process that strongly evokes dreaming. Memories become less accurate, but more useful.
The M-Logic Machine is truly autonomous, in the sense that it starts from a state of total ignorance (i.e. an empty long-term memory) of how the surrounding world evolves and what it can do to change the sequence of events. Initial survival depends on smart sensors and efficient reflex actions (i.e. the machine may start with "implicit" or "hard-wired" knowledge resulting from evolutionary learning). After an initial period of random exploration and cinematic data recording, the machine uses the cinematic memories as a source of micro-theories ("experts", or "specialists") regarding the future continuation of present events (a type of in-life learning). Some simple continuation patterns generate beliefs about the future. These beliefs are used to drive actions and improve the machine's own perpetuation. The survival status is given by specific evaluations of the available energy and physical integrity of the agent, but many other measurements can be linked to "pleasure-pain", or "good-bad" evaluations. This allows a multitude of possible goals, all competing for the attention of the agent. The final result is similar to a behaviour-based architecture, where each behaviour can be discovered and/or improved instead of being totally pre-defined. This implements a compartmentalised approach to learning.
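As a rough illustration of several "good-bad" evaluations competing for attention, the Python sketch below scores each measurement against a comfort level and selects the most negative one. Everything here is hypothetical: the measurement names, the comfort levels, and the "most negative score wins" rule are assumptions made for the example, not a description of the actual machine.

```python
def make_evaluation(key, comfort_level):
    """Build a hypothetical good-bad evaluation: negative when below the comfort level."""
    def evaluate(measurements):
        return measurements[key] - comfort_level
    return evaluate


# Assumed evaluations; the text only states that energy and physical
# integrity define the survival status and that others can be added.
EVALUATIONS = {
    "energy": make_evaluation("energy", 50),
    "integrity": make_evaluation("integrity", 80),
    "temperature": make_evaluation("temperature", 20),
}


def most_urgent(measurements, evaluations=EVALUATIONS):
    """Return the name of the evaluation with the lowest (worst) score, plus all scores."""
    scores = {name: ev(measurements) for name, ev in evaluations.items()}
    winner = min(scores, key=scores.get)
    return winner, scores


# Low energy dominates here, so the energy-related goal gets the agent's attention.
goal, scores = most_urgent({"energy": 20, "integrity": 90, "temperature": 22})
```

Adding a new goal in this sketch only requires registering another evaluation, which loosely mirrors the idea that many measurements can be linked to "pleasure-pain" evaluations.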
The M-Logic Machine is permanently forgetting its long-term cinematic records. The records that are a source of successful motor-sensory sequences will tend to dominate (i.e. they are found first in the memory's linear search), while the others are ultimately pushed to oblivion. This allows implementing fast and efficient heuristics with relatively small memory sizes that quickly adapt to hostile, noisy, and changing environments.
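One simple way to obtain this behaviour is a bounded list in which successful records move toward the front and the record at the back is dropped when space is needed. The Python sketch below shows that scheme under stated assumptions: the class `RankedMemory`, the one-step promotion rule, and the fixed capacity are illustrative choices, and the actual machine may prioritize and forget its records differently.

```python
class RankedMemory:
    """Hypothetical bounded memory: searched front to back, forgotten from the back."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []        # index 0 is found first by the linear search

    def add(self, record):
        """Insert a new record at the back, forgetting the last record if full."""
        if record in self.records:
            return
        if len(self.records) >= self.capacity:
            self.records.pop()   # the least successful record is pushed to oblivion
        self.records.append(record)

    def reward(self, record):
        """Move a record whose prediction succeeded one step toward the front."""
        if record not in self.records:
            return
        i = self.records.index(record)
        if i > 0:
            self.records[i - 1], self.records[i] = self.records[i], self.records[i - 1]

    def first_match(self, predicate):
        """Linear search: return the first record satisfying `predicate`, or None."""
        for record in self.records:
            if predicate(record):
                return record
        return None


memory = RankedMemory(capacity=3)
for name in ["a", "b", "c"]:
    memory.add(name)
memory.reward("c")
memory.reward("c")   # "c" has now climbed to the front
memory.add("d")      # memory is full, so the record at the back is forgotten
```

Because the search stops at the first match, frequently rewarded records also become the cheapest to retrieve, which is one reason a small memory can remain effective in a changing environment.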
Reflex, conditioned, and instinctive actions (these words are also given specific meanings in the M-Logic Machine architecture) have also been integrated in the M-Logic Machine. Conditioned actions are another type of in-life learning. This integration makes it possible to study when cinematic learning is really needed to handle different problems, and how it coordinates with reflexes.
Some additional details are presented hereafter in the subpages.