We are excited to have the following renowned researchers as speakers at our workshop:
Principal Researcher, Microsoft Research India, Adjunct Professor of Computer Science, IIT Delhi
Manik Varma is a Principal Researcher at Microsoft Research India and an Adjunct Professor of Computer Science at the Indian Institute of Technology (IIT) Delhi. His research interests lie in the areas of machine learning, computational advertising and computer vision. Classifiers that he has developed have been deployed on millions of devices around the world and have protected them from viruses and malware. His algorithms are also generating millions of dollars on the Bing search engine (up to sign ambiguity). In 2013, he and John Langford coined the term extreme classification and found that they had inadvertently started a new area in machine learning. Today, by happenstance, extreme classification is thriving in both academia and industry, with Manik’s classifiers being used in various Microsoft products as well as in the wider tech sector. Manik recently proclaimed “2 KB (RAM) ought to be enough for everybody”, prompting the media in the US, India, China, France, Belgium and Singapore to cover his research and compare him to Bill Gates (unfair, Manik’s more handsome!). Manik has been awarded the Microsoft Gold Star award and the Microsoft Achievement award, has won the PASCAL VOC Object Detection Challenge, and has stood first in chicken chess tournaments and Pepsi drinking competitions. He has served as an area chair/senior PC member for machine learning, artificial intelligence and computer vision conferences such as AAAI, CVPR, ICCV, ICML, IJCAI and NIPS, and is serving as an associate editor of the IEEE PAMI journal. Manik is also a failed physicist (BSc St. Stephen's College, David Raja Ram Prize), theoretician (BA Oxford, Rhodes Scholar), engineer (DPhil Oxford, University Scholar), mathematician (MSRI Berkeley, Post-doctoral Fellow) and astronomer (Visiting Miller Professor, UC Berkeley).
Title: Extreme Classification: Tagging on Wikipedia, Recommendation on Amazon & Advertising on Bing
Abstract: I will introduce extreme classification, a new area of machine learning research focusing on multi-class & multi-label problems involving millions of categories. Extreme classification has opened up a new paradigm for thinking about key applications such as tagging, ranking and recommendation. I will discuss algorithms for some of these applications and present results on tagging on Wikipedia, product recommendation on Amazon, and search and advertising on the Bing search engine. More details can be found on The Extreme Classification Repository at http://manikvarma.org/downloads/XC/XMLRepository.html
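The benchmarks hosted on the repository are typically scored with precision@k: the fraction of an instance's k highest-scoring predicted labels that are true labels. Below is a minimal sketch of this metric in Python; it is not the repository's official evaluation code, and the dense score matrix and toy sizes are simplifications (real XC datasets use sparse matrices with millions of label columns).

```python
import numpy as np
from scipy.sparse import csr_matrix

def precision_at_k(scores, labels, k=5):
    """Mean precision@k: the fraction of each instance's top-k
    predicted labels that appear in its ground-truth label set."""
    n = scores.shape[0]
    # Indices of the k highest-scoring labels per instance (order irrelevant).
    topk = np.argpartition(-scores, k, axis=1)[:, :k]
    hits = sum(labels[i, topk[i]].sum() for i in range(n))
    return hits / (n * k)

# Toy example: 2 instances, 6 labels (real XC datasets have millions).
scores = np.array([[0.1, 0.9, 0.3, 0.8, 0.2, 0.0],
                   [0.7, 0.1, 0.6, 0.2, 0.9, 0.3]])
labels = csr_matrix([[0, 1, 0, 1, 0, 0],
                     [1, 0, 0, 0, 1, 0]])
print(precision_at_k(scores, labels, k=2))  # 1.0: both top-2 sets are correct
```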
Research Scientist, Facebook Artificial Intelligence Research
Dr. Moustapha Cisse is a Research Scientist at Facebook Artificial Intelligence Research. His long-term goal is to build reliable and fair human-level intelligence, and then use it to improve the lives of those who need it most. Currently, he works towards that goal by designing novel algorithms addressing the problem of AI safety and by applying machine learning to challenges in the developing world. He obtained an MSc and a Ph.D. in AI and machine learning from the University Pierre and Marie Curie (France) in 2014. Before that, he studied Mathematics and Physics at the University Gaston Berger in Senegal, graduating in 2008.
Title: Deep Extreme Classification: From Head to Tail
Abstract: Neural Machine Translation and Language Modeling are exciting and challenging instances of Extreme Classification. They exhibit all the curses of learning to predict in high-dimensional output spaces (long tail, computational complexity, etc.). Additionally, because the best-performing solutions to these tasks use Deep Neural Networks, any solution must be amenable to training on the specialized hardware (GPUs) commonly used to learn the parameters of large models in a reasonable amount of time. In this talk, I will discuss how to leverage the structure of the problem to design solutions satisfying these two desiderata: efficiency and accuracy. In particular, I will present alternatives to the traditional Softmax regression layer that are significantly faster to train while notably improving performance.
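The abstract leaves the specific softmax alternatives open, but one widely used member of this family is the adaptive softmax of Grave et al., which exploits the long-tailed class distribution by giving the few frequent classes a full-capacity head classifier and pushing the many rare classes into lower-dimensional tail clusters. PyTorch implements it as nn.AdaptiveLogSoftmaxWithLoss; the sketch below uses illustrative sizes and cutoffs, not settings from the talk.

```python
import torch
import torch.nn as nn

# Vocabulary ids are assumed sorted by frequency: small ids = frequent words.
vocab_size, hidden = 50_000, 512
adaptive_softmax = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=hidden,
    n_classes=vocab_size,
    cutoffs=[1_000, 10_000],  # head / mid-tail / far-tail boundaries (illustrative)
    div_value=4.0,            # each successive tail cluster gets a 4x smaller projection
)

hidden_states = torch.randn(32, hidden)        # e.g. RNN outputs for 32 positions
targets = torch.randint(0, vocab_size, (32,))  # next-word ids
out = adaptive_softmax(hidden_states, targets) # NamedTuple: (output, loss)
out.loss.backward()                            # mean NLL over the batch; train as usual

full_log_probs = adaptive_softmax.log_prob(hidden_states)  # (32, vocab_size), if needed
```

Since most tokens in a natural-language batch belong to the head, most examples touch only the small head projection rather than the full vocab-sized matrix, which is where the GPU training speedup comes from.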
Professor of Machine Learning, Department of Computer Science, TU Kaiserslautern
Since 2017, Marius Kloft has been a professor of machine learning in the Department of Computer Science at TU Kaiserslautern. Prior to joining TUK, he was a junior professor at HU Berlin (2014-2017) and a joint postdoctoral fellow at the Courant Institute and MSKCC, working with M. Mohri, C. Cortes, and G. Rätsch. Marius Kloft is interested in the theory and algorithms of statistical machine learning and its applications, especially in statistical genetics. He has been working on, e.g., multiple kernel learning, multi-task learning, anomaly detection, extreme classification, and adversarial learning for computer security. He has co-organized workshops on multiple kernel learning, multi-task learning, anomaly detection, and extreme classification at NIPS (2010, 2013, 2014, 2017), ICML (2016), and Dagstuhl (2015, 2018). For his research, Marius Kloft received the Google Most Influential Papers 2013 award.
Title: Distributed Training of All-in-one Multi-class SVMs
Abstract: Training of multi-class or multi-label classification machines is embarrassingly parallelizable via the one-vs.-rest approach. However, training of all-in-one multi-class learning machines, such as multinomial logistic regression or all-in-one multi-class SVMs (MC-SVMs), is not parallelizable out of the box. In my talk, I will present optimization strategies that distribute the training of all-in-one MC-SVMs over the classes, making them appealing for use in extreme classification.
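For context, one canonical all-in-one machine is the Crammer-Singer MC-SVM, whose standard objective (reproduced here for illustration, not taken from the talk) couples all K class weight vectors:

```latex
\min_{w_1,\dots,w_K}\; \frac{1}{2}\sum_{k=1}^{K}\lVert w_k\rVert^2
\;+\; C\sum_{i=1}^{n}\max_{k\neq y_i}\Bigl[\,1+\langle w_k, x_i\rangle-\langle w_{y_i}, x_i\rangle\Bigr]_{+}
```

Because the hinge term takes a max over all competing classes, the objective does not decompose into K independent binary problems the way one-vs.-rest training does; this coupling is exactly what distributed optimization strategies must work around.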
Assistant Professor, Rice University
Anshumali Shrivastava is an assistant professor in the Computer Science department at Rice University. His broad research interests include randomized algorithms for large-scale machine learning. He is a recipient of the National Science Foundation CAREER Award, a Young Investigator Award from the Air Force Office of Scientific Research, and a machine learning research award from Amazon. His research on hashing inner products won the Best Paper Award at NIPS 2014, while his work on representing graphs won the Best Paper Award at IEEE/ACM ASONAM 2014. His work on how hashing can slash 95% or more of the computation in deep learning was picked up by several media outlets and attracted significant attention on social media.
Title: Training 100,000 Classes on a Single Titan X in 7 Hours, or in 15 Minutes with 25 Titan Xs
Abstract: In this talk, I will present Merged-Averaged Classifiers via Hashing (MACH) for K-classification with ultra-large values of K. Compared to traditional one-vs-all classifiers that require O(Kd) memory and inference cost, MACH only needs O(d log K) memory (d is the dimensionality) and only O(K log K + d log K) operations for inference. MACH is a generic K-classification algorithm, with provable theoretical guarantees, that makes no assumption about the relationship between classes. MACH uses universal hashing to reduce classification with a large number of classes to a few (logarithmically many) independent classification tasks, each with a small (constant) number of classes. I will show the first quantification of the discriminability-memory tradeoff in multi-class classification. Using the simple idea of hashing, we can train on the ODP dataset, with 100,000 classes and 400,000 features, on a single Titan X GPU and reach a classification accuracy of 19.28%, the best reported accuracy on this dataset. Before this work, the best-performing baseline was a one-vs-all classifier that requires 40 billion parameters (160 GB model size) and achieves 9% accuracy. In contrast, MACH can achieve 9% accuracy with a 480x reduction in model size (a mere 0.3 GB). With MACH, we also demonstrate complete training of the fine-grained ImageNet dataset (compressed size 104 GB), with 21,000 classes, on a single GPU. To the best of our knowledge, this is the first work to demonstrate complete training of these extreme-class datasets on a single Titan X.
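The merge-and-average construction is compact enough to sketch end to end. The toy below follows the scheme described above, assuming scikit-learn's LogisticRegression as the small per-repetition classifier and a random lookup table standing in for the paper's universal hash family; the sizes are illustrative and far from the ODP scale.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

K, B, R = 1000, 32, 16  # classes, buckets per repetition, repetitions (toy sizes)
rng = np.random.default_rng(0)

# R independent hash maps from K classes into B buckets; a random lookup
# table stands in for a 2-universal hash family.
hash_tables = rng.integers(0, B, size=(R, K))

X, y = make_classification(n_samples=5000, n_features=64, n_informative=32,
                           n_classes=K, n_clusters_per_class=1, random_state=0)

# "Merged": train R small B-class classifiers on the hashed labels.
models = []
for r in range(R):
    clf = LogisticRegression(max_iter=200)
    clf.fit(X, hash_tables[r][y])  # collapse K classes into B buckets
    models.append(clf)

# "Averaged": score class k by averaging, across repetitions, the predicted
# probability of the bucket that k hashes into.
def mach_scores(X_test):
    scores = np.zeros((X_test.shape[0], K))
    for r, clf in enumerate(models):
        # (n, B) bucket probabilities; assumes every bucket occurred in
        # training, which is near-certain at these sizes.
        proba = clf.predict_proba(X_test)
        # Column k of the class-score matrix reads bucket h_r(k).
        scores += proba[:, hash_tables[r]]
    return scores / R

predictions = mach_scores(X[:10]).argmax(axis=1)
```

With B held constant and R = O(log K) repetitions, the ensemble stores O(dB log K) parameters rather than the O(dK) of one-vs-all, which is the source of the memory savings claimed in the abstract.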