WP5 Focus
(from the proposal)
Work Package WP5 focuses on the fundamental question: how does an AI agent decide and learn how to act? More precisely, in Theme H2 (WP4), the AI agent learns how the world works and reasons about it to better understand its properties. The present theme V1 aims at empowering the agent with the ability to deliberate autonomously (i.e., without human intervention) on how to act in the world. This means reasoning about the effects of its actions, learning from past experiences (or simulated experiences), monitoring the actual outcomes of its actions, learning possibly unexpected outcomes, and again reasoning and learning how to deal with such new outcomes. Crucially, empowering an AI agent with the ability to self-deliberate its own behavior and act autonomously carries significant risks, and we must therefore be able to balance such power with safety. This means that the autonomy of the agent must be guarded by human-guided specifications and oversight, to make it verifiable and comprehensible in human terms and ultimately trustworthy (cf. WP3-H1, D2).
There is widespread recognition in the AI and CS communities that building systems that act, and deliberate how to act, autonomously is strategically crucial. However, we are currently stuck between two extremes: either we provide pre-programmed solutions, which is not cost-effective or even feasible for certain applications, or we try to exploit progress in deep machine learning to produce solutions, which is not always feasible and often means giving up human comprehensibility and trustworthiness. In TAILOR we aim at realizing self-deliberating and autonomous systems by leveraging the European scientific competences in Planning and Knowledge Representation and Reasoning (KR) [43], as well as the deep competences in Learning and Optimization. Specifically, we aim at addressing the following major challenges:
Novel models of the world dynamics and agent tasks (see WP5 Task T5.1). Planning and KR are based on models, and this allows for explanations, which can be generated from queries over the model of the world, the task, and the plan itself [44, 45]. These queries are essentially based on verification and forms of model checking. However, Planning and KR need to be extended in several directions to become instrumental to the aims of the project. On the one hand, we need more powerful task specification formalisms. In particular, we need to specify complex tasks, involving specific sequencing of activities, moving from one stage of the process to the next, maintaining certain properties throughout, etc. In a nutshell, we need to specify arbitrary tasks in formalisms such as temporal logics or logics of programs, which are the ones used in model checking and reactive synthesis in Formal Methods [46]. On the other hand, we need more powerful model specification formalisms that are fully realistic. In particular, we need to develop, learn, and reason on models that give up Markovian assumptions, handle first-order representations, allow for a multi-faceted view of the domain of interest at different levels of abstraction, and allow for tolerant models and tolerant plans (plans that work in a reference model plus variations) [47].
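To give a concrete flavour of what such richer task specifications look like, the sketch below (purely illustrative; the atoms, trace, and helper names are invented here and are not part of the proposal) evaluates a small temporally extended goal over a finite trace of states, in the spirit of LTLf goals of the form "eventually deliver the package, and keep the safety property true in every state".

```python
# Minimal sketch (illustrative only): evaluating a temporally extended goal,
# in the spirit of LTLf (linear temporal logic over finite traces), against a
# finite trace of world states. All formula and trace names are made up.

from dataclasses import dataclass
from typing import List, Set, Union

Trace = List[Set[str]]  # each position holds the set of atoms true in that state

@dataclass
class Atom:
    name: str

@dataclass
class Not:
    f: "Formula"

@dataclass
class And:
    left: "Formula"
    right: "Formula"

@dataclass
class Next:
    f: "Formula"

@dataclass
class Until:
    left: "Formula"
    right: "Formula"

Formula = Union[Atom, Not, And, Next, Until]

def Eventually(f: Formula) -> Formula:   # F f  ==  true U f
    return Until(Atom("__true__"), f)

def Always(f: Formula) -> Formula:       # G f  ==  not F not f
    return Not(Eventually(Not(f)))

def holds(f: Formula, trace: Trace, i: int = 0) -> bool:
    """LTLf-style semantics: does formula f hold on the trace at position i?"""
    if isinstance(f, Atom):
        return f.name == "__true__" or f.name in trace[i]
    if isinstance(f, Not):
        return not holds(f.f, trace, i)
    if isinstance(f, And):
        return holds(f.left, trace, i) and holds(f.right, trace, i)
    if isinstance(f, Next):
        return i + 1 < len(trace) and holds(f.f, trace, i + 1)
    if isinstance(f, Until):
        return any(holds(f.right, trace, k) and
                   all(holds(f.left, trace, j) for j in range(i, k))
                   for k in range(i, len(trace)))
    raise TypeError(f)

# Task: "eventually the package is delivered, and the agent stays safe throughout".
goal = And(Eventually(Atom("delivered")), Always(Atom("safe")))
trace = [{"safe"}, {"safe", "picked"}, {"safe", "delivered"}]
print(holds(goal, trace))  # True: 'delivered' eventually holds, 'safe' holds in every state
```

Task specifications of this kind go beyond plain reachability goals, which is precisely why the machinery of model checking and reactive synthesis becomes relevant.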
New generation of solvers (see WP5 Task T5.3). The research work of several decades in Planning and Reasoning about Actions in KR has led to the development of a sort of science of algorithms, which has allowed us to confine the inherent complexity of self-deliberation and program synthesis within a set of hard instances, while keeping most cases of practical interest efficiently solvable [48]. Since we expect agents to mostly need to solve non-puzzle-like problems from well-behaved classes, we can exploit this body of knowledge to develop effective algorithms and heuristics to handle generalized forms of Planning [49]. Moreover, we can expect major advancements from studies on how to inject learning into reasoners and planners and vice versa, including, on the one hand, learning how to process models so as to facilitate reasoning and planning, and how learning can improve problem solving, and, on the other hand, using reasoners, planners, and symbolic models to drive the learning. Of particular interest is the study of foundations and methods for (a) learning procedural control knowledge and reformulating problem representations to improve problem solving, and (b) modelling planning problem solving as a learning task.
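As a minimal sketch of the interface through which learning could be injected into a planner, the following illustrative fragment (all names invented here, not the proposal's method) implements a forward best-first search whose node ordering is driven by a pluggable heuristic; a hand-crafted goal-counting estimate and a heuristic learned from previously solved instances would be interchangeable behind the same signature.

```python
# Minimal sketch (illustrative only): a forward best-first planner with a
# pluggable heuristic. The heuristic may be hand-crafted or learned; the
# search code does not need to know which.

import heapq
from itertools import count
from typing import Callable, FrozenSet, List, Optional, Tuple

State = FrozenSet[str]
# An action is (name, preconditions, add effects, delete effects).
Action = Tuple[str, FrozenSet[str], FrozenSet[str], FrozenSet[str]]

def applicable(state: State, a: Action) -> bool:
    return a[1] <= state

def apply_action(state: State, a: Action) -> State:
    return (state - a[3]) | a[2]

def plan(init: State, goal: FrozenSet[str], actions: List[Action],
         h: Callable[[State, FrozenSet[str]], float]) -> Optional[List[str]]:
    """Greedy best-first search ordered by the (possibly learned) heuristic h."""
    tie = count()  # unique tie-breaker so states are never compared directly
    frontier = [(h(init, goal), next(tie), init, [])]
    seen = {init}
    while frontier:
        _, _, state, path = heapq.heappop(frontier)
        if goal <= state:
            return path
        for a in actions:
            if applicable(state, a):
                nxt = apply_action(state, a)
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(frontier, (h(nxt, goal), next(tie), nxt, path + [a[0]]))
    return None

# Baseline heuristic: count unsatisfied goal atoms. A learned heuristic with the
# same (state, goal) -> float signature could be dropped in instead.
def goal_count(state: State, goal: FrozenSet[str]) -> float:
    return len(goal - state)

acts = [("pick", frozenset({"at_A"}), frozenset({"holding"}), frozenset()),
        ("move", frozenset({"at_A"}), frozenset({"at_B"}), frozenset({"at_A"})),
        ("drop", frozenset({"at_B", "holding"}), frozenset({"delivered"}), frozenset({"holding"}))]
print(plan(frozenset({"at_A"}), frozenset({"delivered"}), acts, goal_count))
# -> ['pick', 'move', 'drop']
```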
Integrating data-based methods with model-based methods in deciding and learning how to act (see WP5 Task T5.2). A crucial scientific question is how we can learn the models from data (see WP4). Here, in particular, we are interested in learning the type of first-order symbolic models that are commonly used in Planning and that, most often, have to be constructed by hand. By showing how to learn meaningful, symbolic models from raw perceptions, the work in WP5 aims at integrating the benefits of learners and planners, where representations play a key role in expressing, communicating, achieving, and recognizing goals [50]. The problem of representation learning for planning is largely unsolved, and current ideas and methods prove to be inadequate [51]. Indeed, two characteristics of deep reinforcement learning that have to do with both its successes and its failures are its ability to deal with high-dimensional perceptual spaces from scratch without prior knowledge, combined with its inability to use or produce such knowledge when solving related tasks [52]. The construction of reusable knowledge from experience (transfer learning; see also WP7) has been a central concern in reinforcement learning [53] and in recent work in deep reinforcement learning [54], but the semantic and conceptual gap between the low-level techniques that are used (neural network architectures and loss functions) and the high-level representations that are required (first-order representations involving objects and relations) remains just too large [55]. Within WP5 we will develop the formulations and algorithms for showing how first-order symbolic representations of world dynamics involving objects and relations can be learned automatically from data without using any prior symbolic knowledge (see WP4). Unlike current work in deep reinforcement learning, these representations will not be expected to emerge bottom-up from the learning process but will be imposed top-down. Indeed, we know the structure of the first-order representations that are used in planning and the benefits that they have: they can be used to attain a variety of compound goals (compositionality), can be reused easily in a variety of problems (transfer), and can be queried at a high level of abstraction (transparency). There is thus no need to re-discover the structure of these representations nor to learn alternative ones that lack these properties. The challenge is to learn them from data. A related major research task is to study how to monitor and control actions (see WP5 Task T5.4), so as to update and correct imperfect models with data gathered during execution, mix prior human knowledge with learning from data, and handle grey-box systems, with the ultimate aim of allowing for human control and oversight and making AI-based deliberation and acting safe and trustworthy (cf. WP3-H1, D2).
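As a toy illustration of one simple flavour of model learning (hypothetical code, not the proposal's method), the sketch below induces add/delete effects and candidate preconditions of actions from observed propositional state transitions, assuming full observability; the actual research targets first-order representations learned from raw perceptions, which this fragment does not attempt.

```python
# Minimal sketch (illustrative only): inducing a STRIPS-like action model from
# observed transitions (state, action, next_state), assuming full observability
# over a propositional abstraction of the states. All names are made up.

from typing import Dict, FrozenSet, List, Tuple

State = FrozenSet[str]
Transition = Tuple[State, str, State]

def learn_action_models(transitions: List[Transition]) -> Dict[str, dict]:
    models: Dict[str, dict] = {}
    for pre_state, action, post_state in transitions:
        m = models.setdefault(action, {"pre": None, "add": set(), "del": set()})
        # Candidate preconditions: atoms true in every state where the action was applied.
        m["pre"] = set(pre_state) if m["pre"] is None else m["pre"] & pre_state
        # Effects: atoms whose truth value changed across the transition.
        m["add"] |= post_state - pre_state
        m["del"] |= pre_state - post_state
    return models

traces = [
    (frozenset({"at_A", "clear_B"}), "move_A_B", frozenset({"at_B", "clear_B"})),
    (frozenset({"at_A", "raining"}), "move_A_B", frozenset({"at_B", "raining"})),
]
for name, m in learn_action_models(traces).items():
    print(name, "pre:", m["pre"], "add:", m["add"], "del:", m["del"])
# move_A_B pre: {'at_A'} add: {'at_B'} del: {'at_A'}
```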
References:
[43] It is widely recognized that Europe has the best scientists in the world both in Planning and in KR.
[44] Langley, Meadows, et al., AAAI 2017; Chakraborti, Kulkarni, et al., ICAPS 2019; Fox, Long, et al., XAIP 2017.
[45] Clarke, Grumberg, & Peled, MIT Press 1999; Lomuscio, Qu, Raimondi, STTT 2017.
[46] Pnueli & Rosner, POPL 1989; Ehlers, Lafortune, et al., Discrete Event Dynamic Systems 2017; Gerstacker, Klein, Finkbeiner, ATVA 2018; Meyer, Sickert, Luttenberger, CAV 2018; Kress-Gazit, et al., Annual Review of Control, Robotics, and Autonomous Systems 2018; De Giacomo, Rubin, IJCAI 2018.
[47] Brafman, De Giacomo, IJCAI 2019; Brafman, De Giacomo, Patrizi, AAAI 2018; Calvanese, De Giacomo, Montali, Patrizi, Inf. Comput. 2018; Aminof, De Giacomo, Murano, Rubin, ICAPS 2019; De Giacomo, Iocchi, Favorito, Patrizi, ICAPS 2019; Bonet, Francès, Geffner, AAAI 2019; Bonet, Geffner, CoRR abs/1909.05546, 2019.
[48] Pommerening, Helmert, Bonet, AAAI 2017; Steinmetz, Hoffmann, AIJ 2017; Lipovetzky, Geffner, AAAI 2017; De Giacomo, Maggi, Marrella, Patrizi, AAAI 2017.
[49] De Giacomo, Vardi, IJCAI 2013 & IJCAI 2015; Camacho, Triantafillou, et al., AAAI 2017; Camacho, Baier, et al., ICAPS 2018; De Giacomo, Rubin, IJCAI 2018.
[50] Schank, Abelson, Lawrence Erlbaum 1977; Cohen, Levesque, AIJ 1990; Ramírez, Geffner, IJCAI 2009; Pezzulo, Castelfranchi, Psychol. Res. 2009; Geffner, Wiley Interdiscip. Rev. Cogn. Sci. 2013; Seligman, Railton, et al., OUP 2016.
[51] Mnih, et al., Nature 2016; Silver, et al., Nature 2017; Silver, Hubert, et al., Science 2018; Chevalier-Boisvert, Bahdanau, et al., ICLR 2019; Pearl, arXiv:1801.04016, 2018; Darwiche, Commun. ACM 2018; Geffner, IJCAI 2018.
[52] Lake, Ullman, et al., BBS 2017; Marcus, arXiv:1801.00631, 2018; Marcus, arXiv:1801.05667, 2018.
[53] Taylor, Stone, AI Magazine 2011; Lazaric, Reinforcement Learning, 2012.
[54] Gupta, Devin, et al., ICLR 2017; Barreto, Borsa, et al., arXiv:1901.10964, 2019.
[55] Asai, Fukunaga, AAAI 2018; Thomas, Bengio, et al., arXiv:1802.09484, 2018; François-Lavet, Bengio, et al., AAAI 2019; Asai, ICAPS 2019; Garnelo, Shanahan, Curr. Opin. Behav. Sci. 2019.