Subset Selection in Machine Learning: Theory, Applications, and Hands On


23rd February, Wednesday, 8:30 AM to 12:30 PM PST

Goal of this tutorial

The goal of this tutorial is to provide a gentle introduction to ideas in combinatorial optimization, submodularity, and coresets for the broader community of machine learning and deep learning researchers, and to ground these ideas in applications. Specifically, we believe that the applications presented in this tutorial, in areas such as label-efficient, compute-efficient, robust, fair, and personalized learning, will enable researchers to think beyond just improving model accuracy and to consider broader yet important aspects such as Green AI, fairness, robustness, personalization, and data efficiency. Furthermore, the hands-on demonstrations will help students and researchers from industry get oriented in these topics and get started with them in practice. Another goal of this tutorial is to connect researchers working on theoretical and algorithmic areas to the numerous applications where their work can have impact, and vice versa. The target audience of this tutorial is practitioners in deep learning and machine learning, as well as researchers working on more theoretical areas of optimization in machine learning.

Topics Covered in This Tutorial

[Rishabh] Introduction (5 Mins)

Part I: Theory (1 Hour: 08:35 to 09:35 PST)

  1. [Jeff] Introduction to Submodularity: Definition, Properties, Examples (20 Mins)

  2. [Jeff] Submodular Optimization Problems and Algorithms: Minimization, Maximization, Constraints, etc. (20 Mins)

  3. [Rishabh] Introduction to Combinatorial Information Measures: Submodular (Conditional) Mutual Information, Conditional Gains, Multi-Set information (20 Mins)

Part II: Applications of Subset Selection (1 Hour, 30 Mins, 09:35 PST to 11:05 PST)

  1. [Jeff] A summary of the history of summarization, coresets, sketches, and distillation (10 Mins)

  2. [Ganesh] Compute-Efficient DNN Learning: Reduce model training, hyperparameter tuning, and AutoML time by factors of 3x to 10x, and reduce inference time via Model Pruning/Compression (20 Mins)

  3. [Rishabh] Active Learning: Select unlabeled data items iteratively to reduce labeling efforts (20 Mins)

  4. [Abir] Human-Assisted Learning: Rely on a human for certain critical decisions (20 Mins)

  5. [Abir] Social Networks: Subset Selection applications in social network analysis (20 Mins)

25 Minute Break (11:05 PST to 11:30 PST)

Part III: Hands-On (1 Hour, 11:30 PST to 12:30 PST)

  1. [Suraj] SubmodLib: Submodular Optimization (15 Mins)

  2. [Krishnateja] CORDS: Efficient Learning/Auto-ML (15 Mins)

  3. [Nathan] DISTIL: Active Learning (15 Mins)

  4. [Jeff] smr.ai: Submodularity in Industry (15 Mins)

Tutorial Material

For the slides presented in this tutorial, please see https://drive.google.com/drive/folders/16fvksE_X9D_2ewYGuTdfMi8hIMQ_qPr8?usp=sharing

Why this Tutorial?

A growing number of machine learning problems involve finding subsets of data points. Examples range from selecting a subset of labeled or unlabeled data points, to selecting subsets of features or parameters of a deep model, to selecting subsets of data for outsourcing predictions to humans (human-assisted machine learning). The tutorial encompasses a wide variety of topics, ranging from theoretical aspects of subset selection (e.g., coresets, submodularity, determinantal point processes) to several practical applications (e.g., time- and energy-efficient learning, learning under resource constraints, active learning, human-assisted learning, feature selection, model compression, feature induction, and fair, robust, and personalized machine learning).

We believe that this tutorial will prove very useful because a) subset selection arises naturally in several of the above applications but has often been considered in isolation within each of them, and b) by connecting researchers working in both the theoretical and application domains above, we can foster a much-needed discussion on reusing technical innovations across these subareas and applications. Furthermore, we would like to connect researchers working on the theoretical foundations of subset selection (in areas such as coresets and submodularity) with researchers working on applications (such as feature selection, active learning, data-efficient learning, model compression, and human-assisted machine learning).

Tutorial Organizers