T6: Combinatorial Approaches for Data and Feature Subset Selection

IJCAI 2020 Tutorial

January 8th 2021, 9am - 12:15pm JST

Rishabh Iyer and Ganesh Ramakrishnan

Abstract

This tutorial will address several applications of data selection and summarization, including data subset selection for efficient learning, active learning, feature selection and model compression, and data summarization (videos/images/text). We shall view all these problems through the lens of combinatorial optimization, and in particular submodular optimization, which has received a lot of attention in machine learning, robotics, game theory and social networks.

We shall study a rich class of submodular functions and corresponding optimization and learning algorithms, focusing on their utility in modeling aspects such as representation, coverage, and diversity for summarization problems, and in modeling the information content of data and feature subsets for data subset selection, feature selection and active learning. We will provide a number of demos throughout this tutorial, with a significant hands-on component to try these out.

Tutorial Details

IJCAI Tutorial Website: https://ijcai20.org/tutorials/

Time and Date (different time zones): 9am - 12:15pm JST (8th January 2021) / 12am - 3:15am UTC (8th January 2021) / 5:30am - 8:45am IST (8th January 2021) / 7pm - 10:15pm EST (7th January 2021) / 6pm - 9:15pm CST (7th January 2021) / 4pm - 7:15pm PST (7th January 2021)

Venue: Online (IJCAI conference Virtual Chair: https://www.virtualchair.net/events/ijcai-pricai-2020)

Slides

Part 1 (PDF)

Part 2 (PDF)

Motivation and Outline of this Tutorial

Data from multiple modalities, in the form of images, videos, live streams, unstructured text, and knowledge graphs, has been growing at an unprecedented rate in the last few years. While this massive data is a blessing to data science, helping improve predictive accuracy, it is also a curse, since humans are unable to consume it all. Today, machine-generated videos (from drones, dash-cams, body-cams, security cameras, GoPros, etc.) as well as textual feeds (news sites, social media data, etc.) are being generated at a rate higher than we as humans can process. Moreover, the majority of this data is plagued with redundancy.

Massive data is not only difficult to consume, but also costly and time-consuming to process. An important reason is that the ML and deep learning systems which consume these massive datasets are getting increasingly complex and deep. This results in longer training times and delayed experimental turn-around, higher compute and cloud costs for training these models, CO2 costs, and growing labeling costs and time. For example, training a deep model for NLP (with NAS) can cost around $1M - $3.2M. The same model consumed 656 kWh over 247,000 hours, and produced 626,000 pounds of CO2 (5x the lifetime emissions of a car in the US). Similarly, training an ELMo model on a P100 GPU and a YOLOv5x on a V100 GPU can take 14 days and 8 days respectively for just a single run! Obtaining high-quality labeled data is equally expensive: labeling a 50k image dataset costs around $30k, labeling a 50k 3D point cloud dataset can cost between $120k - $250k, and obtaining a 50k text classification dataset can cost between $80k - $150k (all estimates obtained from Google, Amazon and hiva.ai, which provide labeling services).

Given this data explosion, machine learning techniques which automatically understand, organize and categorize this data and its features are of utmost importance. We seek to ask the following questions:

  1. Can we train machine learning models on subsets of much larger datasets, thereby getting significant speedups without compromising accuracy? Datasets today are growing, creating the need for expensive and large GPU clusters and longer experimental turn-around times. Moreover, labeling these large datasets is getting more and more expensive and time consuming.

  2. Can we identify feature/rule subsets for better generalization? The features/rules could come from a potentially exponentially large space, which is often fraught with redundancies. Can we select feature and rule subsets to enable efficient and cost-sensitive learning?

  3. Can we create a summary containing the highlights (best moments / a diverse set of events) in an image collection, video, or large text corpus?

Motivated by these applications, we shall explore a unified combinatorial approach of modeling the above problems. In particular, we shall study:

  1. Properties and examples of submodular functions and their modeling aspects,

  2. The role of submodular functions in summarization, why they are a natural fit, and the problem of learning submodular functions for summarization,

  3. Several natural and interesting formulations of data subset selection for machine learning, how they naturally connect to submodularity, and the theoretical and empirical properties of the resulting algorithms,

  4. Different types of feature selection and rule induction techniques, and their connections to submodularity,

  5. Data summarization in various applications (e.g. videos, images, text), and in different settings (generic, query-focused, privacy-preserving).
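As a taste of the unified approach, the classic greedy algorithm of Nemhauser, Wolsey and Fisher (referenced below) maximizes any monotone submodular function under a cardinality constraint to within a factor of (1 - 1/e) of optimal. A minimal sketch in python, using the facility-location function as a running example (all function and variable names here are illustrative, not from any particular library):

```python
# Minimal sketch: the facility-location function
# f(S) = sum_i max_{j in S} sim[i][j] (monotone submodular) and the
# classic greedy algorithm, which achieves a (1 - 1/e) approximation
# for cardinality-constrained maximization.

def facility_location(subset, sim):
    """f(S): how well the subset S 'covers' every item under similarity sim."""
    if not subset:
        return 0.0
    return sum(max(sim[i][j] for j in subset) for i in range(len(sim)))

def greedy_maximize(sim, budget):
    """Repeatedly add the element with the largest marginal gain."""
    selected = []
    for _ in range(budget):
        base = facility_location(selected, sim)
        gains = {j: facility_location(selected + [j], sim) - base
                 for j in range(len(sim)) if j not in selected}
        selected.append(max(gains, key=gains.get))
    return selected

# Toy data: two clusters of near-duplicates. A budget-2 summary picks
# one representative from each cluster, rather than two near-duplicates.
sim = [[1.0, 0.75, 0.0, 0.0],
       [0.75, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.5],
       [0.0, 0.0, 0.5, 1.0]]
print(greedy_maximize(sim, 2))  # -> [0, 2]
```

The quadratic re-evaluation of marginal gains here is for clarity only; the tutorial's scaling material (lazy evaluation, memoization, stochastic greedy) addresses exactly this cost.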

Topics Covered in this Tutorial

This tutorial shall cover the following topics in detail, with the following tentative structure:

Part I: Submodularity: Models, Examples and Optimization Algorithms

    1. Motivation for the problems of data summarization (30 mins)

    2. An introduction to Submodular Functions and Optimization Algorithms (45 mins)

    3. Submodular Information Measures (15 mins)

    4. A demo of submodular optimization and machine learning in python (10 mins)

Part II: Applications in data subset selection, active learning, feature selection, and summarization

  1. Data Selection for training and labeling, Feature Selection and Coresets and connections to Submodularity (30 mins)

  2. Submodularity in active and interactive learning (15 mins)

  3. Python demo of data subset selection and active learning (15 mins)

  4. Feature Selection and Rule Induction (15 mins)

  5. Submodular Functions and their applications in summarizing image collection, videos and topic hierarchies (30 mins = 3 x 10 mins)
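To give a flavor of the scaling tricks such demos rely on, below is a sketch (not the tutorial's actual demo code; all names are our own) of Minoux's lazy greedy. By submodularity, a cached marginal gain can only shrink as the selected set grows, so it remains a valid upper bound, and most candidates never need re-evaluation:

```python
import heapq

# Sketch of Minoux's "lazy" greedy for data subset selection under a
# facility-location objective. A stale cached gain is an upper bound on
# the true gain (by submodularity), so each round we only re-evaluate
# the top of a max-heap instead of every remaining candidate.

def fl_gain(sim, selected, j):
    """Marginal gain of adding j under facility location."""
    cover = [max((sim[i][k] for k in selected), default=0.0)
             for i in range(len(sim))]
    return sum(max(sim[i][j] - cover[i], 0.0) for i in range(len(sim)))

def lazy_greedy(sim, budget):
    selected = []
    # Max-heap via negated gains, seeded with exact initial gains.
    heap = [(-fl_gain(sim, [], j), j) for j in range(len(sim))]
    heapq.heapify(heap)
    while heap and len(selected) < budget:
        _, j = heapq.heappop(heap)
        fresh = fl_gain(sim, selected, j)
        if not heap or fresh >= -heap[0][0]:
            selected.append(j)                 # still the best: take it
        else:
            heapq.heappush(heap, (-fresh, j))  # stale bound: re-queue
    return selected

# Same two-cluster toy data as before: one representative per cluster.
sim = [[1.0, 0.75, 0.0, 0.0],
       [0.75, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.5],
       [0.0, 0.0, 0.5, 1.0]]
print(lazy_greedy(sim, 2))  # -> [0, 2]
```

The output matches naive greedy, but in practice the lazy variant evaluates far fewer marginal gains, which is what makes greedy selection feasible on large datasets.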


Video Recordings of the Tutorial

References

Submodular Optimization

  1. Wolsey, Laurence A. "An analysis of the greedy algorithm for the submodular set covering problem." Combinatorica 2.4 (1982): 385-393.

  2. Nemhauser, George L., Laurence A. Wolsey, and Marshall L. Fisher. "An analysis of approximations for maximizing submodular set functions—I." Mathematical programming 14.1 (1978): 265-294

  3. Mirzasoleiman, Baharan, et al. "Distributed submodular maximization: Identifying representative elements in massive data." Advances in Neural Information Processing Systems 26 (2013): 2049-2057

  4. Rishabh Iyer and Jeff Bilmes, Submodular optimization with submodular cover and submodular knapsack constraints, In Advances Neural Information Processing Systems 2013 (Best Paper Award)

  5. Rishabh K Iyer, Stefanie Jegelka, Jeff A Bilmes, Curvature and optimal algorithms for learning and minimizing submodular functions, In Advances of Neural Information Processing Systems 2013

  6. Rishabh Iyer, Stefanie Jegelka, Jeff Bilmes, Fast semidifferential-based submodular function optimization, International Conference on Machine Learning (ICML) 2013 (Best Paper Award)

  7. Kai Wei, Rishabh K. Iyer, Jeff A. Bilmes, Fast multi-stage submodular maximization, International Conference on Machine Learning (ICML) 2014

  8. Mirzasoleiman, Baharan, et al. "Lazier than lazy greedy." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 29. No. 1. 2015.

  9. Badanidiyuru, Ashwinkumar, et al. "Streaming submodular maximization: Massive data summarization on the fly." Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 2014.

  10. Rishabh Iyer, Ninad Khargonkar, Jeff Bilmes, and Himanshu Asnani, Submodular Combinatorial Information Measures with Applications in Machine Learning, The 32nd International Conference on Algorithmic Learning Theory, ALT 2021 (29.2% Acceptance Rate)

  11. Rishabh Iyer and Jeff Bilmes, A Memoization Framework for Scaling Submodular Optimization to Large Scale Problems, To Appear in Artificial Intelligence and Statistics (AISTATS) 2019, Naha, Okinawa, Japan (32.4% Acceptance Rate)

  12. Rishabh Iyer and Jeff Bilmes, Concave Aspects of Submodular Functions, In IEEE International Symposium on Information Theory, ISIT 2020


Data Subset Selection

  1. Kai Wei, Rishabh Iyer, Jeff Bilmes, Submodularity in data subset selection and active learning, International Conference on Machine Learning (ICML) 2015:

  2. Wei, Kai, et al. "Submodular subset selection for large-scale speech training data." 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014.

  3. Kirchhoff, Katrin, and Jeff Bilmes. "Submodularity for data selection in machine translation." Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014.

  4. Yuzong Liu, Rishabh Iyer, Katrin Kirchhoff, Jeff Bilmes, SVitchboard-II and FiSVer-I: Crafting high quality and low complexity conversational english speech corpora using submodular function optimization, Computer Speech & Language 42, 122-142, 2017

  5. Vishal Kaushal, Rishabh Iyer, Suraj Kothawade, Rohan Mahadev, Khoshrav Doctor, and Ganesh Ramakrishnan, Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision, 7th IEEE Winter Conference on Applications of Computer Vision (WACV), 2019 Hawaii, USA

  6. Coleman, Cody, et al. "Selection via proxy: Efficient data selection for deep learning." ICLR 2020

  7. Baharan Mirzasoleiman, Jeff Bilmes, and Jure Leskovec. Coresets for Data-efficient Training of Machine Learning Models. In International Conference on Machine Learning (ICML), July 2020.

  8. Baharan Mirzasoleiman, Kaidi Cao, Jure Leskovec, Coresets for Robust Training of Neural Networks against Noisy Labels, In Proc. Advances in Neural Information Processing Systems (NeurIPS), 2020

  9. Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, and Rishabh Iyer, GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning, 35th AAAI Conference on Artificial Intelligence, AAAI 2021

  10. S Durga, Krishnateja Killamsetty, Abir De, Ganesh Ramakrishnan, Baharan Mirzasoleiman, Rishabh Iyer, Grad-Match: A Gradient Matching based Data Selection Framework for Efficient Learning (coming soon)


Active Learning

  1. Settles, Burr. Active learning literature survey. University of Wisconsin-Madison Department of Computer Sciences, 2009.

  2. A New Active Labeling Method for Deep Learning, IJCNN, 2014

  3. Kai Wei, Rishabh Iyer, Jeff Bilmes, Submodularity in data subset selection and active learning, International Conference on Machine Learning (ICML) 2015

  4. Deep Bayesian Active Learning with Image Data, ICML, 2017

  5. Ashish Kulkarni, Narasimha Raju Uppalapati, Pankaj Singh, Ganesh Ramakrishnan: An Interactive Multi-Label Consensus Labeling Model for Multiple Labeler Judgments. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, Louisiana, USA.

  6. Active Learning for Convolutional Neural Networks: A Core-Set Approach, ICLR, 2018

  7. Adversarial Active Learning for Deep Networks: a Margin Based Approach, arXiv, 2018

  8. Vishal Kaushal, Rishabh Iyer, Anurag Sahoo, Khoshrav Doctor, Narasimha Raju, Ganesh Ramakrishnan, Learning From Less Data: Diversified Subset Selection and Active Learning in Image Classification Tasks, In Proceedings of The 7th IEEE Winter Conference on Applications of Computer Vision (WACV), 2019, Hawaii, USA.

  9. Ash, Jordan T., et al. "Deep batch active learning by diverse, uncertain gradient lower bounds." ICLR 2020.

  10. Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, and Rishabh Iyer, GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning, In Proceedings of the 35th AAAI Conference on Artificial Intelligence, AAAI 2021


Feature Selection

  1. Peng, Hanchuan, Fuhui Long, and Chris Ding. "Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy." IEEE Transactions on pattern analysis and machine intelligence 27.8 (2005): 1226-1238.

  2. Brown, Gavin, et al. "Conditional likelihood maximisation: a unifying framework for information theoretic feature selection." The journal of machine learning research 13.1 (2012): 27-66.

  3. Naveen Nair, Amrita Saha, Ganesh Ramakrishnan, Shonali Krishnaswamy, Efficient Rule Ensemble Learning in Structured Output Spaces, AAAI 2012

  4. Rishabh Iyer, Jeff Bilmes, Algorithms for approximate minimization of the difference between submodular functions, with applications, Uncertainty in Artificial Intelligence (UAI) 2012

  5. Bateni, MohammadHossein, et al. "Categorical feature compression via submodular optimization." arXiv preprint arXiv:1904.13389 (2019).

  6. Srijita Das, Rishabh Iyer, Sriraam Natarajan , A Clustering based Selection Framework for Cost Aware and Test-time Feature Elicitation, In CODS-COMAD 2021 (Best Paper Award, Honorable Mention)

Data Summarization

  1. Ian Simon, Noah Snavely, and Steven M. Seitz. Scene Summarization for Online Image Collections. In ICCV, 2007.

  2. Lin, Hui, and Jeff Bilmes. "A class of submodular functions for document summarization." Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. 2011.

  3. Sebastian Tschiatschek, Rishabh K Iyer, Haochen Wei, Jeff A Bilmes, Learning mixtures of submodular functions for image collection summarization, In Advances in Neural Information Processing Systems (NIPS) 2014

  4. Gong, Boqing, et al. Diverse sequential subset selection for supervised video summarization. Advances in Neural Information Processing Systems. 2014

  5. Gygli, Michael, Helmut Grabner, and Luc Van Gool. Video summarization by learning submodular mixtures of objectives. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

  6. Xu, Jia, et al. Gaze-enabled egocentric video summarization via constrained submodular maximization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015

  7. Ramakrishna Bairi, Rishabh Iyer, Ganesh Ramakrishnan and Jeff Bilmes , Summarizing Multi-Document Topic Hierarchies using Submodular Mixtures, (ACL), 2015

  8. Sharghi, Aidean, Boqing Gong, and Mubarak Shah. "Query-focused extractive video summarization." European Conference on Computer Vision. Springer, Cham, 2016.

  9. Zhang, Ke, et al. Video summarization with long short-term memory. European Conference on Computer Vision. Springer, Cham, 2016.

  10. Mirzasoleiman, Baharan, Stefanie Jegelka, and Andreas Krause. Streaming non-monotone submodular maximization: Personalized video summarization on the fly. arXiv preprint arXiv:1706.03583 (2017).

  11. Vishal Kaushal, Rishabh Iyer, Suraj Kothawade, Sandeep Subramanium, and Ganesh Ramakrishnan, A Framework Towards Domain Specific Video Summarization, 7th IEEE Winter Conference on Applications of Computer Vision (WACV), 2019, Hawaii, USA.

  12. Vishal Kaushal, Rishabh Iyer, Anurag Sahoo, Pratik Dubal, Suraj Kothawade, Rohan Mahadev, Kunal Dargan, Ganesh Ramakrishnan, Demystifying Multi-Faceted Video Summarization: Tradeoff Between Diversity, Representation, Coverage and Importance, 7th IEEE Winter Conference on Applications of Computer Vision (WACV), 2019, Hawaii, USA.

  13. Vishal Kaushal, Suraj Kothawade, Ganesh Ramakrishnan, Jeff Bilmes, Himanshu Asnani, and Rishabh Iyer, A Unified Framework for Generic, Query-Focused, Privacy Preserving and Update Summarization using Submodular Information Measures, arXiv:2010.05631