  • 7:30 - 7:45 Opening remarks: welcome and an overview of the day's objectives, talks, and posters/demos
  • 7:45 - 8:45 Invited talk by Samy Bengio: Large Scale Image Annotation: Learning to Rank with Joint Word-Image Embeddings

    Image annotation datasets are becoming larger and larger, with tens of millions of images and tens of thousands of possible annotations. We propose a strongly performing method that scales to such datasets by simultaneously learning to optimize precision at k of the ranked list of annotations for a given image and learning a low-dimensional joint embedding space for both images and annotations. Our method both outperforms several baseline methods and, in comparison to them, is faster and consumes less memory. We also demonstrate how our method learns an interpretable model, where annotations with alternate spellings or even languages are close in the embedding space. Hence, even when our model does not predict the exact annotation given by a human labeler, it often predicts similar annotations, a fact we quantify with a newly introduced "sibling" precision metric, on which our method also obtains excellent results (this is joint work with Jason Weston and Nicolas Usunier). (A toy sketch of such a joint embedding model follows this talk description.)

    The talk will conclude with my own views on the upcoming challenges for the machine learning for computer vision community.
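
    The sketch below is an editor's illustration of the kind of model this abstract describes: a joint image-annotation embedding trained with a pairwise margin-ranking (hinge) loss. All dimensions, variable names, and the plain uniform negative sampling are assumptions made for illustration; it is not the speakers' implementation.

      import numpy as np

      rng = np.random.default_rng(0)

      # Toy sizes (hypothetical): d-dim image features, k candidate annotations,
      # and a shared m-dim embedding space for images and annotation labels.
      d, k, m = 128, 1000, 32
      V = rng.normal(scale=0.01, size=(m, d))   # maps image features into the embedding space
      W = rng.normal(scale=0.01, size=(k, m))   # one embedding row per annotation

      def scores(x):
          """Score every annotation for image features x via the joint embedding."""
          return W @ (V @ x)

      def sgd_step(x, pos, lr=0.1, margin=1.0):
          """One pairwise-ranking update: push the correct annotation `pos` above a
          uniformly sampled incorrect annotation by at least `margin`."""
          global V, W
          neg = int(rng.integers(k))
          while neg == pos:
              neg = int(rng.integers(k))
          emb_x = V @ x
          if margin - W[pos] @ emb_x + W[neg] @ emb_x > 0:   # hinge loss is active
              grad_V = np.outer(W[pos] - W[neg], x)          # uses the current label rows
              W[pos] += lr * emb_x
              W[neg] -= lr * emb_x
              V += lr * grad_V

      # Toy usage: train on random (image, annotation) pairs, then rank annotations.
      for _ in range(1000):
          sgd_step(rng.normal(size=d), pos=int(rng.integers(k)))
      top5 = np.argsort(-scores(rng.normal(size=d)))[:5]

    Methods of the kind described in the talk typically replace the uniform negative sampling used here with rank-dependent sampling and weighting, so that training effort concentrates on the top of the ranked list.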

  • 8:45 - 8:55 Contributed Challenge: Gesture Recognition Competition (Isabelle Guyon, Vassilis Athitsos, Jitendra Malik, Ivan Laptev)
  • 8:55 - 9:10 Poster / Demo *Spotlights*
  • 9:10 - 9:45 Break -- Poster/Demo/Challenge session
  • 9:45 - 10:15 Contributed talk: Sifting through Images with Multinomial Relevance Feedback (Dorota Glowacka, Alan Medlar, John Shawe-Taylor) [paper]
  • 10:15 - 10:45 Food for thought: short discussion of the state of the art and current challenges.

  • Ski Break / Discussion

  • 3:30 - 4:30 Invited talk by Fei-Fei Li: Visual recognition: ML methods for handling hidden structure, high dimensionality and large-scale data

    Understanding the meaning and structure of images is a central topic in computer vision. In this talk, I present a number of recent projects from the Stanford Vision Lab on tackling the problems of scene understanding, object recognition, and human-object interaction. Each of these visual recognition tasks highlights an aspect of the challenges in vision: inferring hidden structure, handling high dimensionality, and classifying tens of thousands of objects in a data ontology of tens of millions of samples. We show how we use structural learning, sparsification, and efficient metric learning methods in both generative and discriminative frameworks to approach these problems. If time permits, I will also discuss challenges and opportunities of ML research in vision now and in the future.

  • 4:30 - 5:00 Contributed talk: Toward Artificial Synesthesia: Linking Images and Sounds via Words (Han Xiao, Thomas Stibor) [paper]
  • 5:00 - 5:30 Break & Poster/Demo session
  • 5:30 - 6:00 Invited talk by Sebastian Nowozin: No Hype, All Hallelujah: Structured Models in Computer Vision

    Rich statistical models have revolutionized computer vision research: graphical models and structured prediction in particular are now commonly used tools to address hard computer vision problems. I discuss what distinguishes these computer vision problems from other machine learning problems and how this poses unique challenges. One current line of research addresses these challenges by enriching the model structure: the use of latent variables, hierarchical and deep architectures, higher-order interactions, and structure learning. All of these suffer from the limitations of today's estimation and learning methods and make clear the need for alternatives. Towards this goal I will discuss current parameter estimation methods for discrete graphical models, exposing conceptual flaws of commonly used methods and advantages of less well-known estimators. I will conclude with a positive outlook on the future. (A toy parameter-estimation sketch follows this talk description.)
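
    Since the abstract weighs different parameter estimation methods for discrete graphical models, the toy sketch below shows one classical alternative to maximum likelihood: pseudo-likelihood estimation for a binary chain MRF with a tied unary weight and a tied pairwise weight. The model, the ±1 encoding, and the fake data are the editor's assumptions for illustration, not the speaker's examples.

      import numpy as np

      def sigmoid(z):
          return 1.0 / (1.0 + np.exp(-z))

      def neighbour_sums(S):
          """Sum of chain-neighbour spins for each node (S: samples x nodes, entries ±1)."""
          H = np.zeros_like(S, dtype=float)
          H[:, 1:] += S[:, :-1]
          H[:, :-1] += S[:, 1:]
          return H

      def pseudolikelihood_grad(S, theta_u, theta_p):
          """Gradient of the mean pseudo-log-likelihood: each node's conditional is
          P(s_i | rest) = sigmoid(2 * s_i * (theta_u + theta_p * sum of neighbours))."""
          H = neighbour_sums(S)
          F = theta_u + theta_p * H          # local field at every node
          R = sigmoid(-2.0 * S * F)          # derivative of log sigmoid at 2*s*f
          return np.mean(2.0 * S * R), np.mean(2.0 * S * H * R)

      # Toy usage: fit the two parameters by gradient ascent on fake ±1 observations.
      rng = np.random.default_rng(0)
      S = np.where(rng.random((200, 10)) < 0.7, 1.0, -1.0)
      theta_u, theta_p, lr = 0.0, 0.0, 0.1
      for _ in range(500):
          g_u, g_p = pseudolikelihood_grad(S, theta_u, theta_p)
          theta_u, theta_p = theta_u + lr * g_u, theta_p + lr * g_p

    For this model each conditional is a logistic regression in (theta_u, theta_p), so the pseudo-likelihood objective is concave and plain gradient ascent suffices.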

  • 6:00 - 6:30 Closing panel discussion / thoughts / feedback: Challenges in Computer Vision, opportunities for ML to make a 'quantum leap' impact