Challenges in Learning Hierarchical Models: Transfer Learning and Optimization
A workshop in conjunction with the 25th Annual Conference on Neural Information Processing Systems (NIPS 2011)
Saturday December 17, 2011
Melia Sierra Nevada & Melia Sol y Nieve, Sierra Nevada, Spain
The ability to learn abstract representations that support transfer to novel but related tasks lies at the core of many AI problems, including visual object recognition, information retrieval, speech perception, and language understanding. Hierarchical models that support inference at multiple levels of abstraction have been developed and are argued to be among the most promising candidates for achieving this goal. An important property of these models is that they can extract complex statistical dependencies from high-dimensional sensory input and efficiently learn latent variables by re-using and combining intermediate concepts, allowing them to generalize well across a wide variety of tasks.
In the past few years, researchers across many different communities, from applied statistics to engineering, computer science and neuroscience, have proposed several hierarchical models that are capable of extracting useful, high-level structured representations. The learned representations have been shown to give promising results on a multitude of novel learning tasks. A few notable examples of such models include Deep Belief Networks, Deep Boltzmann Machines, sparse coding-based methods, and nonparametric and parametric hierarchical Bayesian models.
Despite recent successes, many existing hierarchical models are still far from being able to represent, identify and learn the wide variety of possible patterns and structure in real-world data. Existing models cannot cope with new tasks for which they have not been specifically trained. Even when applied to related tasks, trained systems often display unstable behavior. Furthermore, massive volumes of training data (e.g., data transferred between tasks) and high-dimensional input spaces pose challenging questions about how to effectively train deep hierarchical models. The recent availability of large-scale datasets (such as ImageNet for visual object recognition or the Wall Street Journal corpus for large-vocabulary speech recognition), continuous advances in optimization methods, and the availability of cluster computing have drastically changed the working scenario, calling for a re-assessment of the strengths and weaknesses of many existing optimization strategies.
The aim of this workshop is to bring together researchers working on such hierarchical models to discuss two important challenges: the ability to perform transfer learning and the best strategies for optimizing these systems on large-scale problems. These problems are "large" in terms of input dimensionality (on the order of millions), number of training samples (on the order of 100 million or more) and number of categories (on the order of several tens of thousands). During the course of the workshop, we shall be interested in discussing the following topics:
1. State of the field: What are the existing methods and what is the relationship between them? Which problems can be solved using existing learning algorithms and which require fundamentally different approaches? How are current methods optimized? Which models can scale to very high-dimensional inputs, to datasets with a large number of categories, and to huge numbers of training samples? Which models best leverage large amounts of unlabeled data?
2. Learning structured representations: How can machines extract invariant representations from a large supply of high-dimensional, highly structured unlabeled data? How can these representations be used to represent and learn tens of thousands of different concepts (e.g., visual object categories) and expand on them without disrupting previously-learned concepts? How can these representations be used in multiple applications?
3. Transfer learning: How can previously-learned representations help learning new tasks so that less labeled supervision is needed? How can this facilitate knowledge representation for transfer learning tasks?
4. One-shot learning: For many traditional machine classification algorithms, learning curves are measured in tens, hundreds or thousands of training examples. For human learners, however, just a few training examples are often sufficient to grasp a new concept. Can we develop models that are capable of efficiently leveraging previously-learned background knowledge in order to learn novel categories from a single training example? Are there models suitable for generalizing across domains when presented with one or a few examples?
5. Scalability and success in real-world applications: How well do existing transfer learning models scale to large-scale problems, including problems in computer vision, natural language processing, and speech perception? How well do these algorithms perform when applied to modeling high-dimensional real-world distributions (e.g., the distribution of natural images)?
6. Optimization: Which optimization methods are best for training a deep deterministic network? Which stochastic optimization algorithms are best for training probabilistic generative models? Which optimization strategies are best for training on several thousand categories?
7. Parallel computing: Which optimization algorithms are best on GPUs, and which benefit the most from parallel computing on a cloud?
8. Theoretical Foundations: What are the theoretical guarantees of learning hierarchical models? Under what conditions is it possible to provide performance guarantees for such algorithms?
9. Suitable tasks and datasets: What are the right datasets and tasks for future research on the topic, and for facilitating comparisons between methods?
In order to facilitate the discussion, we will invite participants to test their methods on the following two challenges.
- Transfer Learning Challenge: we will make available a dataset that has a large amount of unlabeled data and a large number of categories. The task is to categorize samples belonging to a novel category for which only a few labeled training samples are available. Participants will have to follow a strict training/test protocol to make results comparable. Performance is measured in terms of accuracy as well as training and test time.
- Optimization Challenge: the aim is to test several optimization algorithms for training a non-linear predictor on a large-scale dataset. A strict protocol will be enforced to make results comparable, and performance will be evaluated in terms of accuracy as well as training time, both on a single-core machine and on a GPU.
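As a rough illustration of the kind of protocol the transfer learning challenge envisions, the sketch below pretrains a feature representation on "unlabeled" data, fits a classifier for a novel category from a handful of labeled samples, and reports accuracy alongside training and test time. Everything here is an illustrative assumption, not the actual challenge setup: the data is synthetic, the "pretrained" representation is just a random linear projection, and the classifier is a simple nearest-centroid rule.

```python
import time
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for a representation pretrained on unlabeled data:
# here, a fixed random linear projection of the raw input.
def pretrain_representation(unlabeled, dim=16):
    proj = rng.standard_normal((unlabeled.shape[1], dim))
    return lambda x: x @ proj

# Synthetic "unlabeled" pool, plus a novel category with only a few labels.
unlabeled = rng.standard_normal((1000, 32))
novel_pos = rng.standard_normal((5, 32)) + 2.0   # few positives of the novel class
novel_neg = rng.standard_normal((5, 32)) - 2.0   # few negatives
test_x = np.vstack([rng.standard_normal((50, 32)) + 2.0,
                    rng.standard_normal((50, 32)) - 2.0])
test_y = np.array([1] * 50 + [0] * 50)

encode = pretrain_representation(unlabeled)

# Training phase: fit a nearest-centroid rule in the learned feature space.
t0 = time.perf_counter()
centroids = np.stack([encode(novel_neg).mean(axis=0),
                      encode(novel_pos).mean(axis=0)])
train_time = time.perf_counter() - t0

# Test phase: assign each sample to its nearest centroid; time this as well,
# since the challenge measures both training and test time.
t0 = time.perf_counter()
feats = encode(test_x)
dists = np.linalg.norm(feats[:, None, :] - centroids[None, :, :], axis=2)
pred = dists.argmin(axis=1)
test_time = time.perf_counter() - t0

accuracy = (pred == test_y).mean()
print(f"accuracy={accuracy:.2f} train_time={train_time:.4f}s test_time={test_time:.4f}s")
```

The point of the sketch is the shape of the evaluation, not the model: a real entry would substitute its own pretrained representation and few-shot classifier while keeping the same measured quantities (accuracy, training time, test time).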
Paper Submission Deadline: 23:59 PDT, Sunday 23 October 2011
Challenge Submission Deadline: 23:59 PDT, Sunday 23 October 2011
Acceptance Notification: 28 October 2011
Workshop Date: Saturday 17 December 2011
Submissions of Papers
We solicit submissions of unpublished research papers. Papers must be at most 6 pages long (even in the form of extended abstracts) and must satisfy the formatting instructions of the NIPS 2011 call for papers. Papers need not be anonymous. Submissions should include the title, authors' names, institutions and email addresses. Style files are available here.
Papers should be submitted in PDF or PS format by email to: email@example.com no later than 23:59 PDT, Sunday, October 23, 2011.
We encourage submissions on the following and related topics:
* transfer learning
* one-shot learning
* learning hierarchical models
* scalability of hierarchical models at training and test time
* deterministic and stochastic optimization for hierarchical models
* parallel computing
* theoretical foundations of transfer learning
* applications of hierarchical models to large scale datasets
Contributors will present their papers in a short spotlight presentation as well as a poster session.
Submissions of Challenges
We solicit participants to test their methods on two challenges:
- Transfer Learning Challenge
- Optimization Challenge
The aim is to facilitate discussion among participants through a common task using a strict protocol that makes results comparable.
More details about the transfer learning challenge are available here, while details about the optimization challenge are available here.
The winners of the challenges will be invited to give an oral presentation of their methods.
Results should be submitted via email to: firstname.lastname@example.org no later than 23:59 PDT, Sunday, October 23, 2011.
The paper and challenge tracks are independent of each other: contributors can submit a paper to the workshop without entering the challenges, or enter one of the challenges without submitting a paper. Submissions to both tracks are also encouraged.
Organizers
Quoc V. Le, Computer Science Department, Stanford University
Marc'Aurelio Ranzato, University of Toronto and Google Inc
Ruslan Salakhutdinov, Department of Statistics, University of Toronto
Andrew Ng, Computer Science Department, Stanford University
Josh Tenenbaum, Department of Brain and Cognitive Sciences, MIT