SubSetML: Subset Selection in Machine Learning: From Theory to Practice
Workshop @ ICML 2021, 24th July 2021
Workshop Link: https://icml.cc/virtual/2021/workshop/8351
About this Workshop
A growing number of machine learning problems involve finding subsets of data points. Examples range from selecting a subset of labeled or unlabeled data points, to selecting subsets of features or model parameters, to selecting subsets of pixels, keypoints, sentences, etc. in image segmentation, correspondence, and summarization problems. The workshop will encompass a wide variety of topics, ranging from theoretical aspects of subset selection, e.g., coresets, submodularity, and determinantal point processes, to several practical applications, e.g., time- and energy-efficient learning, learning under resource constraints, active learning, human-assisted learning, feature selection, model compression, feature induction, etc.
Call For Papers
Subset selection is growing in importance across an increasing number of machine learning applications. It is a naturally emerging topic and has often been considered in isolation within individual applications. We invite original contributions (especially early research work) on the following topics.
Theoretical Directions
Coresets
Determinantal Point Processes
Submodular functions and their optimization
Information-Theoretic Approaches
Applications of Subset Selection
Compute-efficient training (training time and energy efficiency)
Active learning and selecting subsets of unlabeled data for labeling
Human-assisted learning
Feature selection and dimensionality reduction
Cost-sensitive feature selection
Model compression
Rule augmentation and data programming
Image segmentation, image correspondence, and MAP inference in graphical models
Data Summarization (e.g. video, image collection, document, news summarization)
Peptide Matching, Proteomics, etc.
Learning of neural set functions
The above are just a few of the potential applications and theoretical directions. If you are working on anything related to subset selection in ML, AI, and deep learning, please consider submitting to and attending our workshop!
Submissions in the form of extended abstracts must be at most 6 pages long, excluding references, and adhere to the ICML format. Supplemental material may run to any number of pages, but reviewers are not required to take it into account. We accept submissions of work recently published or currently under review. Submissions should be anonymized. The workshop will not have formal proceedings, but authors of accepted abstracts can choose to have either a link to an arXiv version of their paper or a PDF published on the workshop webpage. If authors provide an arXiv link, we will link to it from the list of accepted papers on this webpage.
Important dates:
Submission deadline: Tuesday, June 8th, 23:59 AoE
Author notification: Wednesday, June 16th
Deadline for SlidesLive for selected talks: Sunday, June 27th 2021
Camera-ready deadline: July 10th 2021
Workshop date: Saturday, 24th July 2021
We will be using CMT to handle paper submissions (https://cmt3.research.microsoft.com/SUBSETML2021). Please submit papers before the deadline above.
Accepted Papers
We received a number of high-quality submissions to SubSetML 2021. We have accepted 33 papers as spotlight presentations and 12 papers as posters, making the total number of accepted papers 45. The full list of accepted papers is here.
Speakers
We are very excited to have an amazing set of speakers whose expertise ranges from discrete optimization, submodularity, and coresets to applications of subset selection such as time- and energy-efficient training, model compression, active learning, human-assisted AI, feature and column selection, explainability, and rule induction.
Amin Karbasi (Yale University)
Andreas Krause (ETH Zurich)
Baharan Mirzasoleiman (UCLA)
Cody Coleman (Stanford)
Dan Feldman (Haifa University)
Luc De Raedt (KU Leuven)
Manuel Gomez Rodriguez (MPI-SWS)
Rajiv Khanna (UC Berkeley)
In addition to the above, Jeff Bilmes, Rishabh Iyer, and Ganesh Ramakrishnan will also be speaking. Jeff will talk about recent work done in Summary Analytics (smr.ai), while Rishabh and Ganesh will talk about the open-source platform DECILE (Data efficient Learning). Both talks are very relevant to this workshop.
Panel Discussion
In addition to the speakers, we have two additional panelists: Matthai Phillipose (Microsoft Research) and Gaurav Aggarwal (Google Research). The topic of the panel discussion is: Subset Selection for Machine Learning in the Real World.
Workshop Schedule and Plan
The workshop will consist of ten talks, a spotlight session, a poster session, and a panel discussion, with enough time for scientific discussions throughout a full-day schedule. Each talk will be roughly 30 mins including questions -- 25 mins for the talk and 5 mins for Q&A. We plan to have talks pre-recorded, along with a live Q&A session. The introductory remarks and the panel discussion will be live.
Below is the detailed schedule. All times are in PDT.
06:00 AM - 06:30 AM: Introductory remarks by the organizers
06:30 AM - 07:00 AM: Talk 1: Introduction to Coresets and Open Problems (Dan Feldman)
07:00 AM - 07:30 AM: Talk 2: Differentiable Learning Under Algorithmic Triage (Manuel Gomez Rodriguez)
07:30 AM - 08:00 AM: Talk 3: Data Summarization via Bilevel Coresets (Andreas Krause)
08:00 AM - 08:30 AM: Talk 4: Learning Constraints from Examples (Luc De Raedt)
08:30 AM - 09:00 AM: Talk 5: Greedy and Its Friends (Amin Karbasi)
09:00 AM - 09:30 AM: Poster Session I
09:30 AM - 10:30 AM: Panel Discussion on Subset Selection for ML Problems in the Real World
10:30 AM - 10:45 AM: Talk 6.1: Benchmarks and Toolkits for Subset Selection in ML through DECILE (decile.org): Part 1 (Rishabh Iyer)
10:45 AM - 11:00 AM: Talk 6.2: Benchmarks and Toolkits for Subset Selection in ML through DECILE (decile.org): Part 2 (Ganesh Ramakrishnan)
11:00 AM - 11:30 AM: Talk 7: More Information, Less Data (Jeff Bilmes)
11:30 AM - 12:00 PM: Talk 8: Theory of Feature Selection (Rajiv Khanna)
12:00 PM - 01:10 PM: Spotlight Session I
01:10 PM - 02:00 PM: Posters
02:00 PM - 02:30 PM: Talk 9: Data-efficient and Robust Learning from Massive Datasets (Baharan Mirzasoleiman)
02:30 PM - 03:00 PM: Talk 10: Computationally Efficient Data Selection for Deep Learning (Cody Coleman)
03:00 PM - 04:10 PM: Spotlight Session II
Workshop Organizers
Rishabh Iyer (UT Dallas)
Abir De (IIT Bombay)
Ganesh Ramakrishnan (IIT Bombay)
Jeff Bilmes (University of Washington, Seattle)