Workshop on Multimodal Understanding and Learning for Embodied Applications

Welcome to the website of the MULEA 2019 Workshop

The MULEA 2019 Workshop will be held in conjunction with the 2019 ACM Multimedia Conference in Nice, France, from 2019/10/21 through 2019/10/25. The full name of the workshop is the 1st International Workshop on Multimodal Understanding and Learning for Embodied Applications.

The workshop focuses on embodied applications, which cover many of the “fashionable” applications in AI, such as robotics, autonomous driving, multimodal chatbots, and simulated games. It also covers many new and exciting research areas:


  1. Multimodal context understanding.
    • Context includes the environment, task/goal states, dynamic objects in the scene, activities, etc. Relevant research streams include visually grounded learning, context understanding, and environment modeling, including 3D environment modeling and understanding. Language grounding is also an interesting topic: connecting the vision and language modalities is essential in applications such as question answering and image captioning. Other relevant research areas include multimodal understanding, context modeling, and grounded dialog systems.
  2. Knowledge inference.
    • Knowledge in this multimedia scenario is represented with knowledge graphs, scene graphs, memory, etc. Representing contextual knowledge is a topic that has attracted much interest, and goal-driven knowledge representation and reasoning are emerging research directions. Deep learning methods are well suited to handling unstructured multimodal knowledge signals.
  3. Embodied learning.
    • Building on context understanding and knowledge representation, a learned policy generates actions that let intelligent agents achieve goals or finish tasks. The input signals are multimodal and can include images, dialogues, etc. Learned policies must not only provide short-term reactions but also plan their actions to achieve long-term goals optimally. The actions may also involve navigation and localization, which are mainstream topics in robotics and self-driving vehicles. This area is closely related to reinforcement learning, and its algorithms are driven by industrial applications in robotics, self-driving vehicles, simulated games, multimodal chatbots, etc.