FALL 2021
Friday, December 3rd, 2021
Glen Berseth
Recording: TBD
Developing Autonomous Agents to Learn and Plan in the Real World
While humans plan and solve tasks with ease, simulated and robotic agents struggle to reproduce the same fidelity, robustness and skill. For example, humans can grow to perform incredible gymnastics, prove that black holes exist, and produce works of art, all starting from the same base learning system. If we can design an agent with a similar learning capability, the agent can acquire skills through experience, without the need for expertly constructed planning systems or supervision. In this talk, I present a series of developments on current reinforcement learning methods in the area of long-term planning and learning autonomously without human guidance or supervision. I show how modularity and policy reuse can be used to address challenges in long-horizon planning. Still, learning with current RL methods requires types of supervision that are easy to come by in simulation but expensive in open, real-world settings. I will discuss how to develop more versatile learning agents that do not require expensive or unrealistic data constraints. Finally, in an effort to create agents that learn general-purpose skills, I present an unsupervised reinforcement learning objective for encouraging sophisticated control over the environment.
SUBGOAL SEARCH
Planning and search have been extensively studied topics in AI research. However, many questions remain unanswered. In my talk, I will explore topics at the intersection of planning and learning. A general question is how learning methods can be used to make planning more efficient or, conversely, how planning can mitigate errors in learned models.
I will present a stream of hierarchical research in which planning is performed at a coarse-grained temporal resolution. Intuitively, the hope is that the search space is tamer when one can jump a few steps ahead. After a general introduction, I will show a particular implementation of hierarchical planning developed in our new paper: “Subgoal Search For Complex Reasoning Tasks”, NeurIPS 2021.
Silly rules enhance learning of compliance and enforcement behavior in artificial agents
How do societies learn and maintain social norms? Here we use multi-agent reinforcement learning to investigate the learning dynamics of enforcement and compliance behaviors. Artificial agents populate a foraging environment and need to learn to avoid a poisonous berry. Agents learn to avoid eating poisonous berries better when doing so is taboo, meaning the behavior is punished by other agents. The taboo helps overcome a credit-assignment problem in discovering delayed health effects. By probing what individual agents have learned, we demonstrate that normative behavior is socially interdependent. Learning rule compliance builds upon other agents having learned rule enforcement beforehand. Critically, introducing an additional taboo, which results in punishment for eating a harmless berry, further improves overall returns. This "silly rule" counterintuitively has a positive effect because it gives agents more practice in learning rule enforcement. Our results highlight the benefit of employing a computational model that allows open-ended learning and contribute to our efforts to build AI systems and the normative infrastructure needed to ensure alignment with human values.
Developmental AI: Machines that learn like children
Current approaches to AI and machine learning are still fundamentally limited in comparison with the amazing learning capabilities of children. What is remarkable is not that some children become world champions in certain games or specialties: it is rather their autonomy, open-endedness, flexibility and efficiency at learning many everyday skills under strongly limited resources of time, computation and energy. And they do not need the intervention of an engineer for each new task (e.g. they do not need someone to provide a new task-specific reward function or representation).
I will present a research program, which I call Developmental AI, that studies models of open-ended development and learning. These models are used as tools to help us better understand how children learn, as well as to build machines that learn like children, with applications in educational technologies, automated discovery, robotics and human-computer interaction. I will ground this research program in several fundamental ideas proposed by developmental psychologists:
1) the child is autotelic, setting its own goals, spontaneously exploring the world like a curious little scientist while self-organizing its learning curriculum (e.g. Piaget, Berlyne);
2) intelligence develops in a social context, where language and culture are internalized to become cognitive tools (e.g. Vygotsky and Bruner);
3) intelligence is embodied and develops through self-organization of the dynamical system formed by the brain-body-environment interactions (e.g. Thelen and Smith).
I will show how, together with many colleagues and students, we have worked on operationalizing these ideas in computer and robotic models.
I will explain how this has advanced our understanding of child development and is now opening new possibilities for AI systems.
In particular, I will review our recent work on:
- Autotelic deep reinforcement learning, where agents learn to represent and sample their own goals towards open-ended learning
- The learning progress hypothesis, enabling efficient automatic curriculum learning in machines and humans
- Self-organization of developmental trajectories replicating fundamental dynamics of human sensorimotor learning
- Vygotskian AI, with the IMAGINE system where deep RL agents leverage language as a cognitive tool for creative exploration
The SpeechBrain Project
SpeechBrain is an open-source and all-in-one conversational AI toolkit. It is designed to facilitate the research and development of speech and language technologies by being simple, flexible, user-friendly, and well-documented. This talk describes the core architecture designed to support several tasks of common interest, allowing users to naturally conceive, compare and share novel conversational AI pipelines. SpeechBrain achieves competitive or state-of-the-art performance in a wide range of benchmarks. It also provides training recipes, pretrained models, and inference scripts for popular datasets, as well as tutorials that allow anyone with basic Python proficiency to familiarize themselves with speech technologies. The talk will also discuss the future research direction for this project and share some ideas for the future development of intelligent speaking machines.
Foundations of GFlowNets
GFlowNets have recently emerged at Mila from the setup of generative active learning to find good molecules, where we want to train a generative policy to generate a composite candidate object (like a molecular graph) with probability proportional to a given reward function (an already trained proxy for the experimental goodness and epistemic uncertainty of that candidate). Systems which can do that well could be applied to discover new drugs, but also new materials, to control plants or learn how to reason and build a causal model, as I will argue. In the above setup, we could in principle use MCMC methods, but they fail in many of the cases of interest in AI, with highly separated modes occupying a tiny volume in a high-dimensional space. Instead, similarly to the analogy between VAE encoders and traditional unamortized variational inference, GFlowNets amortize the work done by MCMC and exploit the power of generalization to guess (and generate from) yet unvisited modes. It turns out that this formalism also opens the door to fascinating possibilities for probabilistic modeling, including training of compositional energy-based models with latent variables, the ability to quickly estimate marginalized probabilities and efficiently represent distributions over sets and graphs, and the ability to estimate entropies, conditional entropies, or mutual information. The ability to generate partial explanatory graphs (marginalizing over larger supergraphs) could be a key tool to enable the kind of higher-level cognition, reasoning, planning and long-term credit assignment that seems much easier for humans, with their conscious and attentive processing, and that has motivated recent work to extend deep learning to achieve better out-of-distribution generalization.
Brain Machine Interfacing: frontiers, next steps, and challenges for AI
Direct interaction between machines and brains makes it possible to read information directly from neuronal circuits, and to repair them when they are broken. Neural prosthetic devices known as Brain-Machine Interfaces (BMI) are under rapid development, and have the potential to profoundly impact how we treat brain injuries and pathologies, how we study the brain's learning and computing mechanisms, and ultimately, how we interact with AI. While BMI hardware development is rapidly advancing, a number of technical and algorithmic challenges must be overcome to truly leverage high-throughput interaction with the nervous system. In this talk, I will present an overview of these challenges, and present some research efforts to meet them. This presentation is meant to promote a discussion around this topic, and to inform Mila members of ongoing activity in this space.
Program Reasoning: An Interesting AI Challenge
Software is ubiquitous in our daily lives, and our society, in areas ranging from healthcare to entertainment, increasingly depends on it functioning well. The enormous rise in the scale, scope, and complexity of modern software has led to widespread interest in program reasoning tools. Despite great successes in industry, both designing and using these program reasoning tools are non-trivial. Their designers must carefully customize heuristics or analysis rules for each codebase in order to achieve usable accuracy and scalability. Moreover, their users often have to provide correct annotations and inspect a large number of reports. Furthermore, the slow feedback cycle between users and designers hinders the usability of these reasoning tools in the modern software CI/CD cycle. Can program reasoning algorithms be effectively learned and automatically improved over time? In this talk, I will give an overview of our group's research on using machine learning-based techniques to address various program reasoning challenges, specifically, program analysis, program synthesis, and formal verification. I will also discuss some new opportunities for applying program reasoning ideas for better machine learning, especially in terms of interpretability and learning efficiency.
Prof Series - Spotlight Talks
Talk 1 by Blake Richards - How might bias and variance in the brain's gradient estimates affect learning?
Abstract: Gradient descent is a central tool in machine learning, in large part because it scales well to large networks and complicated problems. As such, computational neuroscientists are exploring potential means by which the real brain may approximate gradient descent. However, it is unlikely that real brains could ever achieve perfect gradient descent, and it is likely that any gradient estimates used by real brains would have both bias and variance. We have been exploring how bias and variance affect learning in order to understand the implications for gradient estimates in real brains. Through analysis and simulation, we show that bias is less problematic very early in learning, but can have a negative impact when the loss gets lower. Meanwhile, we find that variance can be problematic in small networks, but becomes less problematic as network size increases. These findings suggest that for real brains, which are both innately wired with some initial capabilities and very large, bias is likely to be problematic, but variance less so. This implies that real brains should use unbiased gradient estimators, even if those estimators have high variance.
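As a rough illustration of the bias and variance effects described in the abstract above (not the speakers' analysis or simulations), the following toy sketch runs gradient descent on a quadratic loss with a corrupted gradient. The function name `run_gd`, the quadratic loss, and all parameter values are assumptions chosen for illustration only.

```python
import random


def run_gd(bias=0.0, noise_std=0.0, dim=10, steps=500, lr=0.1, seed=0):
    """Gradient descent on L(w) = 0.5 * sum(w_i^2) with a corrupted gradient.

    Hypothetical toy setup: `bias` is a constant offset added to every
    gradient coordinate, `noise_std` is the standard deviation of
    zero-mean Gaussian noise added to every coordinate.
    Returns the loss after each update step.
    """
    rng = random.Random(seed)
    w = [1.0] * dim
    losses = []
    for _ in range(steps):
        # The true gradient of this quadratic loss is w itself;
        # the learner only sees a biased and/or noisy estimate of it.
        g_hat = [wi + bias + rng.gauss(0.0, noise_std) for wi in w]
        w = [wi - lr * gi for wi, gi in zip(w, g_hat)]
        losses.append(0.5 * sum(wi * wi for wi in w))
    return losses


exact = run_gd()                 # perfect gradient
biased = run_gd(bias=-0.5)       # constant bias, no noise
noisy = run_gd(noise_std=0.5)    # unbiased but noisy

for name, losses in [("exact", exact), ("biased", biased), ("noisy", noisy)]:
    print(f"{name:>6}: loss after 10 steps = {losses[9]:.4f}, final loss = {losses[-1]:.4f}")
```

In this toy setting, a constant bias leaves the loss stuck at a floor of 0.5 * dim * bias^2, which only becomes the limiting factor once the loss is already low, while zero-mean noise leaves a fluctuating residual loss. The sketch says nothing about how these effects scale with network size.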
Talk 2 by Irina Rish - Making AI Robust and Versatile: a Path to AGI?
Abstract: Modern AI systems have achieved impressive results in many specific domains, from image and speech recognition to natural language processing and mastering complex games such as chess and Go. However, they often remain inflexible, fragile and narrow, unable to continually adapt to a wide range of changing environments and novel tasks without "catastrophically forgetting" what they have learned before, to infer higher-order abstractions allowing for systematic generalization to out-of-distribution data, and to achieve the level of robustness necessary to "survive" various perturbations in their environment - a natural property of most biological intelligent systems, and a necessary property for successfully deploying AI systems in real-life applications. In this talk, we will provide a brief overview of some modern approaches towards making AI more “broad” (versatile) and robust, including transfer learning, domain generalization, the invariance principle in causality, adversarial robustness and continual learning. Furthermore, we briefly discuss the role of scale, and summarize recent advances in training large-scale unsupervised models, such as GPT-3, CLIP and DALL-E, which demonstrate remarkable improvements in transfer, both forward (few-shot generalization to novel tasks) and backward (alleviating catastrophic forgetting). We also emphasize the importance of developing an empirical science of AI behaviors, and focus on the rapidly expanding field of neural scaling laws, which allow us to better compare and extrapolate the behavior of various algorithms and models with increasing amounts of data, model size and computational resources.
Talk 3 by Danilo Bzdok - Markov decision processes and deep biological neural network layers in humans
Abstract: Which brain function could be important enough for the existence and survival of the human species to justify constantly high energy costs? The default mode network (DMN) is believed to subserve the baseline mental activity in humans. Its higher energy consumption compared to other brain networks and its intimate coupling with conscious awareness are both pointing to an unknown overarching function. Many research streams speak in favor of an evolutionarily adaptive role in envisioning experience to anticipate the future. In the present work, we propose a process model that tries to explain how the DMN may implement continuous evaluation and prediction of the environment to guide behavior. The main purpose of DMN activity, we argue, may be described by Markov decision processes that optimize action policies via value estimates through vicarious trial and error.