Topic Area

MCS19: Machine Common Sense: An Embodied and Developmental Paradigm

Topic leaders

Invited guests


In this Telluride Workshop topic area, we plan to develop embodied computational models and algorithms for machine common sense, and to introduce participants to the developmental psychological theories that center on the notion of actions and their consequences. Over the last decade, we have observed the great success of deep learning and other approaches that are data-driven (requiring millions or billions of annotated or weakly annotated data entries) and computationally heavy (high-end server arrays training for days, months, or even years) in mimicking intelligent human behaviors such as object recognition and autonomous navigation. However, these machine behaviors are still far from human intelligence and, more importantly, lack common sense.

This lack of machine common sense (MCS) is a major concern for Artificial Intelligence (AI) and raises a series of technical and theoretical challenges. Traditionally, machine common sense has been studied with an “object-centric” view, which performs increasingly well under conventional supervised settings and on narrow performance tasks. The projects we will conduct this year center on a more general machine common sense framework: an embodied computational architecture that can recover innate concepts and build and learn new concepts from existing ones. This type of framework would more closely approximate the active and constructive processes underlying early human learning [Byr14, Gop12]. This is very challenging for conventional models, in which the innate concepts are pre-defined, static, and disconnected from their embodied senses. We have partially overcome these challenges in two ways. First, we identify a critical goal of MCS as the prediction of (real or imagined) action consequences, and treat the interpretation of scene observations as a process of parsing the continuous dynamic stream into a few entities [Sum12, Yan13, Yan14, Yan15]. These entities are the protagonists (or constituents) of the action: static components, such as the objects and their attributes, and dynamic components, such as the motion of objects and the actions of animate objects (a.k.a. agents). Second, we take advice from Developmental Psychology, which calls for innate concepts such as “objecthood”, “agency”, “numerosity”, “contact”, left-right and spatial prepositions, and others, to create a bio-inspired computational infrastructure that supports abstract reasoning and causal inference.
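As a purely illustrative sketch (not the implementation used in the cited work), the parse into static and dynamic components and the prediction of an action's consequence could be represented in Python as follows. The names `Entity`, `Action`, `CONSEQUENCES`, and `predict_consequence`, as well as the "cut" and "pour" examples, are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Entity:
    """Static component of a scene: an object (or agent) and its attributes."""
    name: str
    attributes: Tuple[Tuple[str, str], ...] = ()
    animate: bool = False  # animate objects are treated as agents

    def attribute(self, key: str) -> Optional[str]:
        """Look up one attribute value, or None if it is not set."""
        return dict(self.attributes).get(key)

    def with_attribute(self, key: str, value: str) -> "Entity":
        """Return a copy of this entity with one attribute set or changed."""
        attrs = dict(self.attributes)
        attrs[key] = value
        return replace(self, attributes=tuple(sorted(attrs.items())))


@dataclass(frozen=True)
class Action:
    """Dynamic component of a scene: an agent acting on a target object."""
    verb: str
    agent: Entity
    target: Entity


# Hypothetical consequence table: verb -> (attribute changed, new value).
CONSEQUENCES: Dict[str, Tuple[str, str]] = {
    "cut": ("state", "divided"),
    "pour": ("state", "emptied"),
}


def predict_consequence(action: Action) -> Entity:
    """Predict how the target is expected to look after the action."""
    if not action.agent.animate:
        raise ValueError("only animate entities (agents) can act")
    effect = CONSEQUENCES.get(action.verb)
    if effect is None:
        return action.target  # unknown verb: predict no change
    key, value = effect
    return action.target.with_attribute(key, value)


# Usage: an agent (hand) cuts an object (cucumber).
hand = Entity("hand", animate=True)
cucumber = Entity("cucumber", (("state", "whole"),))
after = predict_consequence(Action("cut", hand, cucumber))
print(after.attribute("state"))  # prints "divided"
```

A hand-written lookup table stands in here for whatever learned model would actually predict consequences; the point is only that the action, rather than the object, is the unit that links the entities.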

The goal of this project:

We will develop computational models and algorithms and study developmental psychological theories that center on the notion of actions and their consequences. The computational models and developmental psychological theories will be studied via two sub-projects (developing core knowledge and concept learning) and one education project on developmental psychology, interconnected through a centralized “action-centric” MCS knowledge-building activity.

For sub-project 1, we will build the “action-centric” foundations of human common sense through a combination of techniques, such as knowledge distillation to deep neural networks and probabilistic reasoning-based “action-centric” knowledge mapping, with the aim of linking the concepts of objects, agents and places through a set of primitive actions.
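The proposal does not specify which distillation formulation will be used. As a minimal sketch under that caveat, the widely used soft-target formulation (a temperature-softened KL divergence between teacher and student outputs) can be written in plain Python. The function names, the temperature value, and the reading of the logits as scores over three primitive actions are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_total for s in scaled]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    if len(student_logits) != len(teacher_logits):
        raise ValueError("student and teacher must score the same classes")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    log_p = log_softmax(teacher_logits, temperature)  # teacher (soft targets)
    log_q = log_softmax(student_logits, temperature)  # student
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


# Usage: hypothetical scores over three primitive actions.
teacher = [2.0, 0.5, -1.0]
student = [1.5, 0.7, -0.8]
loss = distillation_loss(student, teacher)
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge; in practice it is usually combined with a standard supervised loss on labeled data.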

For sub-project 2, we will develop computational models that transfer the common sense learned in sub-project 1 to embodied agents (a.k.a. cognitive robots with sensors), which will refine, process and build new concepts through lifelong interactions with the physical world and other social agents.

For the educational project, we will bring in developmental psychologists and cognitive development scientists to conduct a lecture series. Extensive research in developmental psychology and embodied cognition has shown that infants’ actions play a critical role in constructing, modifying, and supporting learning throughout development [Smi05]. With these insights, we will encourage interdisciplinary discussion among computer scientists, computer engineers and psychologists on developing AI approaches that approximate the active process underlying conceptual development in humans. We will organize education sessions for workshop participants to learn about and discuss these topics.

Educational Component:

The following tutorials may be offered:

  • Representations of action in cognitive studies and lifelong learning.
  • Reinforcement learning for action representations and the role of knowledge distillation.
  • Neural and cognitive studies and perception sensing for action representations and learning.
  • Deep learning in Computer Vision, Natural Language Processing, Knowledge Retrieval and Reasoning, Knowledge Distillation and Robotics, partially using the NVIDIA Deep Learning Institute online training platform (Dr. Yang is a certified instructor).
  • Developmental Psychology and Cognition mini lecture series.

Physical Platforms:

The ASU team will drive from Phoenix to Telluride (an 8-hour drive) and will bring computers, monitors, cameras, a lightweight motion capture system, small mobile robots (TurtleBots), and other accessories for setting up physical experiments and for educational purposes. We will provide tutorials on how to operate these platforms and conduct experiments on them.


[Byr14] Byrge, L., Sporns, O., & Smith, L. B. (2014). Developmental process emerges from extended brain–body–behavior networks. Trends in Cognitive Sciences, 18(8), 395-403.

[Gop12] Gopnik, A., & Wellman, H. (2012). Reconstructing constructivism: Causal models, Bayesian learning mechanisms, and the theory theory. Psychological Bulletin, 138(6), 1085-1108.

[Gun18] Gunning, D. (2018). Machine Common Sense. Concept paper.

[Smi05] Smith, L. B., & Gasser, M. (2005). The development of embodied cognition: Six lessons from babies. Artificial Life, 11, 13-29.

[Sum12] Summers-Stay, D., Teo, C. L., Yang, Y., Fermüller, C., & Aloimonos, Y. (2012). Using a minimal action grammar for activity understanding in the real world. In IROS.

[Yan13] Yang, Y., Fermüller, C., & Aloimonos, Y. (2013). Detection of manipulation action consequences (MAC). In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[Yan14] Yang, Y., Fermüller, C., Guha, A., & Aloimonos, Y. (2014). A cognitive system for understanding human manipulation actions. Advances in Cognitive Systems, 3, 67-86.

[Yan15] Yang, Y., Li, Y., Fermüller, C., & Aloimonos, Y. (2015). Robot learning manipulation action plans by "watching" unconstrained videos from the World Wide Web. In AAAI (pp. 3686-3693).