SubSetML: Subset Selection in Machine Learning: From Theory to Practice

Workshop @ ICML 2021, 24th July 2021

Workshop Link:

About this Workshop

A growing number of machine learning problems involve finding subsets of data points. Examples range from selecting subsets of labeled or unlabeled data points, to selecting subsets of features or model parameters, to selecting subsets of pixels, keypoints, or sentences in image segmentation, correspondence, and summarization problems. The workshop encompasses a wide variety of topics, from theoretical aspects of subset selection (e.g., coresets, submodularity, and determinantal point processes) to practical applications (e.g., time- and energy-efficient learning, learning under resource constraints, active learning, human-assisted learning, feature selection, model compression, and feature induction).

Call For Papers

Subset selection is growing in importance across an increasing number of machine learning applications. It emerges naturally in many of them, yet has often been studied in isolation within each application. We invite original contributions (especially early research work) on the following topics.

Theoretical Directions

  1. Coresets

  2. Determinantal Point Processes

  3. Submodular functions and their optimization

  4. Information-Theoretic Approaches

Applications of Subset Selection

  1. Compute-efficient training (training time and energy efficiency)

  2. Active Learning and selecting subsets of unlabelled data for labelling

  3. Human-assisted learning

  4. Feature selection and dimensionality reduction

  5. Cost-sensitive feature selection

  6. Model compression

  7. Rule augmentation and Data programming

  8. Image segmentation, image correspondence, and MAP inference in graphical models

  9. Data summarization (e.g., video, image collection, document, and news summarization)

  10. Peptide Matching, Proteomics, etc.

  11. Learning of neural set functions

The above are just a few of the potential applications and theoretical directions. If you are working on anything related to subset selection in ML, AI, and deep learning, please consider submitting to and attending our workshop!

Submissions should be extended abstracts of at most 6 pages in the ICML format. The page limit excludes references and supplemental material; supplemental material may be of any length, but reviewers are not required to take it into account. We accept submissions of work recently published or currently under review. Submissions should be anonymized. The workshop will not have formal proceedings, but authors of accepted abstracts can choose to have either a link to an arXiv version of their paper or a PDF published on the workshop webpage. If authors provide an arXiv link, we will link to it from the list of accepted papers on this webpage.

Important dates:

  • Submission deadline: Tuesday, June 8th, 23:59 AOE

  • Author notification: Wednesday, June 16th

  • Deadline for SlidesLive for selected talks: Sunday, June 27th 2021

  • Camera-ready deadline: July 10th 2021

  • Workshop date: Saturday, 24th July 2021

We will be using CMT to handle paper submissions. Please submit papers before the deadline above.

Accepted Papers

We received a number of high-quality submissions to SubSetML 2021. We have accepted 33 papers as spotlight presentations and 12 papers as posters, making the total number of accepted papers 45. The full list of accepted papers is here.


We are very excited to have an amazing set of speakers whose expertise ranges from discrete optimization, submodularity, and coresets to applications of subset selection such as time- and energy-efficient training, model compression, active learning, human-assisted AI, feature selection, column selection, explainability, and rule induction.

Amin Karbasi (Yale University)

Andreas Krause (ETH Zurich)

Baharan Mirzasoleiman (UCLA)

Cody Coleman (Stanford)

Dan Feldman (Haifa University)

Luc De Raedt (KU Leuven)

Manuel Gomez Rodriguez (MPI-SWS)

Rajiv Khanna (UC Berkeley)

In addition to the above, Jeff Bilmes, Rishabh Iyer, and Ganesh Ramakrishnan will also be speaking. Jeff will talk about recent work done in Summary Analytics, while Rishabh and Ganesh will talk about DECILE (Data efficient Learning), an open-source platform. Both talks are very relevant to this workshop.

Panel Discussion

In addition to the speakers, the panel includes two more panelists: Matthai Philipose (Microsoft Research) and Gaurav Aggarwal (Google Research). The topic of the panel discussion is: Subset Selection for Machine Learning in the Real World.

Workshop Schedule and Plan

The workshop will consist of ten talks, two spotlight sessions, two poster sessions, and a panel discussion, with enough time for scientific discussions throughout a full-day schedule. Each talk will be roughly 30 minutes including questions: 25 minutes for the talk and 5 minutes for Q&A. We plan to have the talks pre-recorded, along with a live Q&A session. The introductory remarks and the panel discussion will be live.

Below is the detailed schedule. All times are in PDT.

06:00 AM - 06:30 AM: Introductory remarks by the organizers

06:30 AM - 07:00 AM: Talk 1: Introduction to Coresets and Open Problems (Dan Feldman)

07:00 AM - 07:30 AM: Talk 2: Differentiable Learning Under Algorithmic Triage (Manuel Gomez Rodriguez)

07:30 AM - 08:00 AM: Talk 3: Data Summarization via Bilevel Coresets (Andreas Krause)

08:00 AM - 08:30 AM: Talk 4: Learning Constraints from Examples (Luc De Raedt)

08:30 AM - 09:00 AM: Talk 5: Greedy and Its Friends (Amin Karbasi)

09:00 AM - 09:30 AM: Poster Session I

09:30 AM - 10:30 AM: Panel Discussion on Subset Selection for ML Problems in the Real World

10:30 AM - 10:45 AM: Talk 6.1: Benchmarks and Toolkits for Subset Selection in ML through DECILE, Part 1 (Rishabh Iyer)

10:45 AM - 11:00 AM: Talk 6.2: Benchmarks and Toolkits for Subset Selection in ML through DECILE, Part 2 (Ganesh Ramakrishnan)

11:00 AM - 11:30 AM: Talk 7: More Information, Less Data (Jeff Bilmes)

11:30 AM - 12:00 PM: Talk 8: Theory of Feature Selection (Rajiv Khanna)

12:00 PM - 01:10 PM: Spotlight Session I

01:10 PM - 02:00 PM: Posters

02:00 PM - 02:30 PM: Talk 9: Data-efficient and Robust Learning from Massive Datasets (Baharan Mirzasoleiman)

02:30 PM - 03:00 PM: Talk 10: Computationally Efficient Data Selection for Deep Learning (Cody Coleman)

03:00 PM - 04:10 PM: Spotlight Session II

Workshop Organizers

Rishabh Iyer (UT Dallas)

Abir De (IIT Bombay)

Ganesh Ramakrishnan (IIT Bombay)

Jeff Bilmes (University of Washington, Seattle)