AAAI 2013 Spring Symposium on Weakly Supervised Learning from Multimedia

March 25-27, 2013 (Stanford University, California)

What can computers learn about the real world from large quantities of audio-visual data, with minimal human supervision? While weakly supervised learning has been an active research topic in the natural language community, learning from large multimedia collections (and video in particular) is a field that is still in its infancy. Early efforts in this direction include learning models of objects and actions from internet video and of humans in images, and localizing sounds in audio.

Topics of interest for the symposium include:
  • Scaling weakly supervised learning to very large collections (e.g., internet video);
  • Features and representations;
  • Weakly supervised learning algorithms and connections to related topics, such as multiple instance learning and semi-supervised learning;
  • Learning in the presence of significant label noise;
  • Value of "seeding" weakly supervised learning with small amounts of strongly supervised data;
  • Datasets to enable direct comparison of approaches, including challenges in obtaining reliable groundtruth annotations;
  • Transfer learning (e.g., learning from video and testing in the image domain);
  • Weakly supervised approaches in robotics;
  • Combining audio/visual content with text;
  • Challenge problems in audio, image, video and multimodal domains.

AAAI Spring Symposium Information

This symposium is part of the AAAI Spring Symposium series. Participants registered for our symposium can also attend the other collocated events. AAAI reserves a block of discounted hotel rooms for participants, as well as discounted parking on the Stanford campus. For additional details, see the 2013 AAAI Spring Symposia page.

Program (PDF)

All talks are 30 minutes (including questions/discussion).

Monday March 25 - Stanford University, History Building (200), Room 217, second level

Session 1: (9:00am-10:30am)
  • Opening
  • Vittorio Ferrari, "Learning object class detectors from consumer video"
  • Olga Russakovsky, Yuanqing Lin, Kai Yu, Li Fei-Fei, "Object-centric spatial pooling for image classification"

Coffee break (10:30am-11:00am)

Session 2: (11:00am-12:30pm)
  • Ricardo Cabral, Fernando De la Torre, Joao P. Costeira, Alexandre Bernardino, "Rank minimization techniques for weakly supervised learning"
  • Gang Hua, "Joint visual inference in an image collection with weakly supervised information"
  • Michael Rubinstein, Ce Liu, William T. Freeman, "Joint Inference in Image Databases via Dense Correspondence"

Lunch (12:30pm-2:00pm)

Session 3: (2:00pm-3:30pm)

  • Sourish Chaudhuri, Rita Singh, Bhiksha Raj, "Unsupervised and weakly supervised structure discovery for audio"
  • Stanley Kok, "Integration of TV news stories using Markov Logic Networks"
  • Judith Hoffman, Omid Madani, "The more classes the merrier? The effect of additional classes on weakly supervised learning in video"

Coffee break (3:30pm-4:00pm)

Session 4: (4:00pm-5:30pm)

  • Jingen Liu, Qian Yu, Omar Javed, Saad Ali, Amir Tamrakar, Ajay Divakaran, Hui Cheng, Harpreet Sawhney, "Large-scale multimedia event detection and understanding"
  • Panel discussion: Vittorio Ferrari, Matthias Grundmann, Gang Hua, Kevin Murphy, Harpreet Sawhney
    (moderator: Rahul Sukthankar)

Reception (6:00pm-7:00pm)

Tuesday March 26 - Stanford University, History Building (200), Room 217, second level

Session 5: (9:00am-10:30am)
  • S. Hussain Raza, M. Grundmann, I. Essa, "Geometric context from videos"
  • Rong Jin, "Query budget online learning"
  • Kevin Tang, Rahul Sukthankar, Jay Yagnik, Li Fei-Fei, "Discriminative segment annotation in weakly labeled video"

Coffee break (10:30am-11:00am)

Session 6: (11:00am-12:30pm)

  • Zhigang Ma, Yi Yang, Yang Cai, Nicu Sebe, Alexander G. Hauptmann, "Structural adaptive regression for multimedia event detection with few positive exemplars"
  • Oscar Deniz Suarez, "Fight detection"
  • Hung Bui, Tuyen N. Huynh, J. Brian Burns, Vlad I. Morariu, "Recognizing and localizing actions in videos using multiple instance learning"

Lunch (12:30pm-2:30pm)

Session 7: (2:30pm-3:30pm)

  • Enrique Ortiz, Mubarak Shah, "Video action recognition with a handful of labeled examples"
  • James M. Rehg, "Egocentric recognition of objects and activities"

Coffee break (3:30pm-4:00pm)

Session 8: (4:00pm-5:30pm)

  • Alvaro Collet, Bo Xiong, Corina Gurau, Martial Hebert, Siddhartha S. Srinivasa, "Weakly supervised robotic object discovery"
  • Panel discussion: Irfan Essa, Bhiksha Raj, Rita Singh, Siddhartha Srinivasa
    (moderator: Omid Madani)

Plenary Session -- shared with other AAAI Symposia (6:00pm-7:00pm)

Wednesday March 27 - Google Research

Session 9: (10:00am-12:30pm)
Optional session held at Google -- participants who wish to attend must inform organizers in advance.
Presentations about selected Google projects and open discussion with researchers.

Lunch -- at Google for Session 9 attendees (12:30pm-1:30pm)

Primary Contact

Rahul Sukthankar. Email:

Organizing Committee

  • Omid Madani, Google Research
  • James M. Rehg, Georgia Tech
  • Rahul Sukthankar, Google Research and Carnegie Mellon