Because of its universality, information theory has produced a wide range of results far beyond its original application to communication systems. One of these applications concerns the close relationship between Shannon's model of communication and control systems: just as a communication channel transforms the input messages sent by the transmitter into output messages received by the receiver, a controller can be viewed as an actuation channel that acts on its initial input state in order to reach an output goal state.
This analogy can be used to establish fundamental limits on controllability and observability in control systems, and it makes it possible to compute the minimum amount of state information that must be used in order to achieve a given level of performance.
Moreover, the communication-channel model of control can be exploited to improve the algorithms used to compute policies for stochastic control problems with partial observability. A partially observable Markov decision process (POMDP) is a sequential stochastic control model in which knowledge of the system's state is incomplete, so the state can only be estimated by updating a belief-state distribution. Because of this partial observability, in POMDPs it is important to find the proper trade-off between utility optimization and the information-processing cost incurred to reduce state uncertainty through future action sequences.
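For context, the belief update referred to here is the usual Bayesian filter over hidden states; the transition model T, observation model O, and belief b below are standard POMDP notation, introduced only for illustration. After taking action a and receiving observation o, the belief becomes

\[
b'(s') \;=\; \frac{O(o \mid s', a)\,\sum_{s} T(s' \mid s, a)\, b(s)}{\sum_{s''} O(o \mid s'', a)\,\sum_{s} T(s'' \mid s, a)\, b(s)}.
\]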
Within this framework, my research is concerned with the use of information measures, such as belief-state entropy and action entropy, to compute policies that depend on the agent's degree of uncertainty and that allow it to gain information. It is important to underline that it is not necessary to use all the available information to reach a given performance level; hence the concept of relevant information is used to decrease the cost associated with information-gathering actions (i.e. epistemic actions), leaving out potentially useless computations.
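As a reminder of the measures involved (standard Shannon entropies, written here in illustrative notation), the belief-state entropy and the action entropy of a stochastic policy are

\[
H(b) \;=\; -\sum_{s} b(s)\,\log b(s),
\qquad
H\big(\pi(\cdot \mid b)\big) \;=\; -\sum_{a} \pi(a \mid b)\,\log \pi(a \mid b).
\]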
In this work, we propose a definition of epistemic actions for partially observable Markov decision processes. These actions are used by a model-free, memory-free reinforcement learning agent in an experiment with an integrated arm-eye robot based on the attention-for-action principle. The results show that the architecture can use all of its effectors to execute epistemic actions and can exploit their informational value to achieve its task efficiently.
What is the minimal amount of state information that has to be used to achieve a certain level of performance in a control task? The answer to this question can be obtained using the concept of relevant information: the information used by a policy that minimizes the mutual information between its actions and the corresponding visited states. Starting from the information bottleneck method, such a policy can be computed by interleaving dynamic programming and Blahut-Arimoto iterations.
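The following is only an illustrative sketch of this interleaving, not the exact algorithm used in the work: a Bellman backup alternates with a Blahut-Arimoto update of a softmax policy weighted by an action marginal, which implements the utility/information trade-off for a tabular problem. The tensors P and R, the trade-off parameter beta, and the uniform state distribution used for the action marginal are assumptions made here for the example.

```python
import numpy as np

def logsumexp(x, axis):
    """Numerically stable log-sum-exp along the given axis."""
    m = x.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def relevant_info_policy(P, R, beta, gamma=0.95, n_iters=200):
    """Sketch: stochastic policy trading off expected utility against the
    mutual information I(S;A) between visited states and actions.

    P: transition tensor, shape (S, A, S); R: reward matrix, shape (S, A);
    beta: trade-off parameter (larger beta -> more state information is used).
    """
    n_states, n_actions = R.shape
    prior = np.full(n_actions, 1.0 / n_actions)   # action marginal p(a)
    V = np.zeros(n_states)                        # free-energy value function
    for _ in range(n_iters):
        # Dynamic-programming step: one Bellman backup of the action values.
        Q = R + gamma * (P @ V)                   # Q(s,a) = R(s,a) + gamma E[V(s')]
        # Blahut-Arimoto step: p(a|s) proportional to p(a) * exp(beta * Q(s,a)).
        logits = np.log(prior)[None, :] + beta * Q
        V = logsumexp(logits, axis=1) / beta      # soft (free-energy) state value
        pi = np.exp(logits - beta * V[:, None])   # normalized conditional policy
        # Update the action marginal; a uniform state distribution is assumed here.
        prior = pi.mean(axis=0)
    return pi, V
```

In this sketch, small values of beta bias the policy towards the state-independent marginal p(a) (little state information is used), while large values recover the deterministic optimal policy.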
Here the optimal utility/information trade-off is investigated for the control of an unstable continuous system (inverted pole balancing) at several levels of performance.
Starting from the definition of relevant information (i.e. the minimum amount of information needed to act optimally), we propose an algorithm for solving partially observable Markov decision processes that drives information gathering towards the collection of "relevant observations", defined as the observations that unveil only relevant information about the state space. This approach can reduce the computational complexity of information gathering through a termination criterion for the belief update, which stops gathering once the belief-state distribution reaches the maximum affordable amount of uncertainty.
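A minimal sketch of such a termination criterion, assuming a discrete belief distribution and treating the maximum affordable entropy as a hypothetical tuning parameter (both are assumptions made for this example, not definitions from the text):

```python
import numpy as np

def belief_entropy(b):
    """Shannon entropy of a discrete belief-state distribution b (sums to 1)."""
    b = np.asarray(b, dtype=float)
    nz = b[b > 0.0]
    return float(-(nz * np.log(nz)).sum())

def keep_gathering(belief, max_uncertainty):
    """Termination test for the belief update: continue collecting
    observations only while the residual uncertainty of the belief
    exceeds the maximum affordable level."""
    return belief_entropy(belief) > max_uncertainty

# Example: the agent is still too uncertain, so information gathering continues.
b = np.array([0.7, 0.2, 0.1])
print(keep_gathering(b, max_uncertainty=0.5))   # True
```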