We are pleased to announce our new Multimodal Machine Learning Taxonomy and Survey, which summarizes and expands on the topics presented in the workshop.

Welcome to the 2015 Multimodal Machine Learning workshop homepage!

Starting with the initial research on audio-visual speech recognition and continuing more recently with language & vision projects such as image and video captioning, multimodal machine learning is a vibrant multi-disciplinary research field that addresses some of the original goals of artificial intelligence (AI) by integrating and modeling multiple communicative modalities, including linguistic, acoustic and visual messages. The field poses unique challenges for machine learning researchers, given the heterogeneity of the data and the contingency often found between modalities. This workshop will bring together researchers from natural language processing, multimedia, computer vision, speech processing and machine learning to discuss the current challenges in multimodal machine learning and to identify the research infrastructure needed to enable stronger collaboration among multi-disciplinary researchers.

The workshop will take place in 512Dh on Friday the 11th of December, 2015.


  • Louis-Philippe Morency
  • Aaron Courville
  • Tadas Baltrušaitis
  • KyungHyun Cho