Meta-reasoning & Stochastic Control

Meta-reasoning is a methodology for models that take into account the computational cost of decision-making and control. Optimal decision-making is often intractable in dynamic, uncertain, and complex domains, since it requires impractical amounts of computation under a bounded amount of time. Starting from the observation that the computational resources used to select optimal actions also reduce the utility of the result, the idea is to improve the quality of control by spending some resources on deciding what and how much reasoning to do, as opposed to what actions to take.

Meta-reasoning tackles this problem by introducing an additional level of reasoning that makes run-time decisions about the problem-solving process (or object-level deliberation). This additional module, called the "meta-level controller", is typically responsible for the run-time allocation of computational resources so as to maximize the overall performance of the system. Finding the best trade-off between the allocation of computational resources and output quality in decision-making and control is one of the key problems of meta-reasoning, as is deciding when execution should switch between object-level and meta-level computations.
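A minimal sketch of this trade-off, under illustrative assumptions (all names and the linear time-cost model are hypothetical, not from the original work): the meta-level controller lets object-level deliberation continue only while the estimated marginal value of computation exceeds its cost.

```python
def meta_control(deliberate_step, expected_gain, time_cost, budget):
    """Run object-level deliberation steps while the marginal value of
    computation is positive, or until the time budget is exhausted.

    deliberate_step(t): one unit of object-level reasoning, returns the
        quality improvement it produced.
    expected_gain(t): meta-level estimate of the next step's improvement.
    time_cost: utility lost per unit of computation time.
    """
    quality, spent = 0.0, 0
    while spent < budget:
        gain = expected_gain(spent)       # estimated quality improvement
        if gain <= time_cost:             # marginal value of computation <= 0
            break                         # stop deliberating, act now
        quality += deliberate_step(spent) # one object-level step
        spent += 1
    return quality, spent
```

With a diminishing-returns gain estimate (e.g. `1 / (t + 1)`), deliberation stops as soon as the next step is expected to cost more than it is worth, rather than running until the budget is gone.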

Anytime algorithms are among the best approaches to these problems. In this class of algorithms, the quality of the result improves gradually as computation time increases. Such algorithms make it possible to obtain a performance profile for each of the system's components, representing the relationship between run time and expected output quality.
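A toy anytime algorithm makes the idea concrete (this example is illustrative, not from the original work): estimating pi with the Leibniz series, where the partial sum can be interrupted at any step and the recorded (step, error) pairs form a simple performance profile.

```python
import math

def anytime_pi(max_steps):
    """Anytime estimate of pi via the Leibniz series.

    The estimate is valid after every step and its quality (negative
    absolute error) improves as more steps run; the list of
    (step, error) pairs is a rudimentary performance profile.
    """
    estimate, profile = 0.0, []
    for k in range(max_steps):
        estimate += 4.0 * (-1) ** k / (2 * k + 1)   # next series term
        profile.append((k + 1, abs(estimate - math.pi)))
    return estimate, profile
```

A meta-level controller can consult such a profile to predict how much quality one more unit of computation is expected to buy.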

One of the goals of my research is to tackle meta-reasoning in stochastic control (using Markov decision processes or influence diagrams/decision networks), and to obtain good performance profiles through model checking and formal methods (e.g. abstract interpretation).

Decomposing online planning problems in dynamic environments can be a useful way to alleviate their real-time computational requirements. In this work, this idea is implemented through event detection in a model that combines a mixture of continuous Markov decision processes with a mixture of Kalman filters (KFs). The two mixtures are coupled so that the anytime upper confidence bound for trees (UCT) algorithm plans proactively only over the events currently detected by the KFs. The proposal is validated by controlling a simulated autonomous vehicle in a race using the TORCS racing simulator.
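The event-detection side can be sketched with a 1-D Kalman filter whose normalized innovation flags "surprising" observations; in the spirit of the approach above, only such flagged events would then trigger planning (e.g. a UCT search). All parameter names, the random-walk model, and the threshold value are illustrative assumptions, not the paper's actual model.

```python
def kf_step(mean, var, z, process_var=1.0, meas_var=1.0, event_thresh=3.0):
    """One predict/update step of a 1-D Kalman filter with event detection.

    An 'event' is flagged when the innovation, normalized by its standard
    deviation, exceeds a threshold -- i.e. the observation z is unlikely
    under the current belief, so it deserves planning attention.
    """
    # Predict (random-walk dynamics: state stays put, uncertainty grows).
    mean_p, var_p = mean, var + process_var
    # Innovation (measurement residual) and its variance.
    innovation = z - mean_p
    s = var_p + meas_var
    event = abs(innovation) / s ** 0.5 > event_thresh  # surprise test
    # Update the belief with the measurement.
    k = var_p / s                        # Kalman gain
    mean_new = mean_p + k * innovation
    var_new = (1 - k) * var_p
    return mean_new, var_new, event
```

A measurement close to the predicted state updates the belief quietly, while a far-off one raises the event flag that would wake the planner.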

In this project a Bayesian framework is adopted to investigate distributed control through probabilistic inference. Starting from the observation that, for each sub-goal and sensory context, usually only a few local inferences are relevant, a set of subtasks is embedded in a decision network that arranges them hierarchically. This makes it possible to direct the choice of what to infer with higher priority, giving the most relevant actions the best opportunity to be executed with little uncertainty, more quickly, and with more up-to-date information. We validate our proposal on a Pong game, a real-time task composed of different subtasks (e.g. hitting the ball, tracking the adversary's moves).
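The scheduling idea can be sketched as follows (a deliberately simplified stand-in for the decision network: the subtask names, relevance functions, and budget mechanism are all hypothetical): each subtask scores its relevance in the current sensory context, and only the highest-priority ones receive inference time.

```python
def schedule_inferences(subtasks, context, budget):
    """Rank subtasks by context-dependent relevance and return the names
    of the top ones that fit in the inference budget."""
    ranked = sorted(subtasks, key=lambda s: s["relevance"](context), reverse=True)
    return [s["name"] for s in ranked[:budget]]

# Pong-style example: hitting the ball dominates when the ball is near,
# otherwise tracking the adversary gets the inference time.
pong_subtasks = [
    {"name": "hit_ball", "relevance": lambda c: 1.0 if c["ball_near"] else 0.2},
    {"name": "track_opponent", "relevance": lambda c: 0.5},
]
```

With a budget of one inference slot, `schedule_inferences(pong_subtasks, {"ball_near": True}, 1)` selects `hit_ball`, and with the ball far away it selects `track_opponent`.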

One of the key issues of meta-reasoning is to find a compact representation of the performance profile of each of the system's components. This can be defined as a probabilistic description of the mapping between computational resources and expected output quality. The meta-controller uses this information to decide how to proactively allocate computational resources to components. Recent results in probabilistic verification of MDPs use abstract interpretation to efficiently answer questions such as "what is the probability of reaching certain states?", "what is the expected reward of reaching those states?", "what is the expected time to completion?", and so forth. The idea here is to use this information to solve meta-reasoning in MDPs.
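The first of those queries can be illustrated concretely (a fixpoint iteration on an explicit model, not the abstract-interpretation machinery itself; the encoding of the MDP is an assumption made for the example): maximal reachability probability computed by value iteration.

```python
def max_reach_prob(states, actions, trans, goal, iters=100):
    """Approximate the maximal probability of eventually reaching a goal
    state in a small MDP by iterating the Bellman fixpoint.

    states: iterable of states; actions(s): enabled actions in s;
    trans[s][a]: list of (next_state, probability) pairs; goal: set of
    target states.
    """
    v = {s: (1.0 if s in goal else 0.0) for s in states}
    for _ in range(iters):
        for s in states:
            if s in goal:
                continue  # goal states are absorbing with value 1
            v[s] = max(sum(p * v[t] for t, p in trans[s][a])
                       for a in actions(s))
    return v
```

On a tiny chain where state 0 loops back to itself with probability 0.5 or advances to state 1, and state 1 reaches the goal state 2 surely, the iteration converges to reachability probability 1 from state 0.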