Research
Research Program
My research interests lie broadly at the intersection of optimization and machine learning. Specifically, I'm interested in going beyond accuracy (where, thanks to deep learning, we have today achieved near-human performance) to also achieve other desiderata such as compute and memory efficiency, human interaction, label efficiency, robustness, and fairness. I'm interested in efficiency on multiple fronts: label efficiency (how can we learn with less labeled data?), model efficiency (reducing model complexity for resource-constrained environments), and time and resource efficiency (how do we reduce the end-to-end running time of training, and train models in resource-constrained environments?). I am also interested in building intelligent systems that organize, analyze, and summarize massive amounts of data, and automatically learn from it. Below are some of the concrete applications my group is currently working on.
Data Subset Selection/Coresets for efficient learning: How do we select the right subsets of data to make training, inference, hyperparameter tuning, etc. more efficient, particularly for deep learning? I'm interested in speeding up deep learning by an order of magnitude (e.g. 5x to 10x speedup) by training on much smaller subsets without significant loss in accuracy or other evaluation metrics.
Active Learning for Deep Models: How do we trade off uncertainty and diversity in a principled manner for active learning (i.e. iteratively selecting data points to label) in deep learning? This is particularly important since labeled data is very time-consuming and expensive to obtain for real-world problems. We study techniques that can achieve 2x - 5x labeling cost reductions for a wide range of applications.
Data Programming and Weak Supervision: Using weak supervision to automatically create noisy labeled data and thereby reduce labeling costs.
Robust Learning: How do we learn machine learning models in a robust manner in the presence of noisy labels, outliers, distribution shift, and imbalance?
Fair & Continuous Learning: Learning Deep Models and ML Models while ensuring fairness to under-represented and minority classes and attributes, and continuously learning from data in a resource and compute efficient manner.
Feature selection: What are principled ways of selecting the right sets of features, and how do we do this in model-dependent or model-independent ways? How do we do this in an online manner, and when eliciting features has an associated cost (e.g. in medical domains, each additional medical test might have a cost)?
Neural Network Compression and Architecture Search in Resource Constraints: How do we compress neural networks (top-down) or search for resource-constrained architectures (bottom-up) in an efficient manner?
Data Summarization: What makes a good summary of data, and how do we consume these summaries?
Data Partitioning: Efficient partitioning of data for clustering and distributed training
I'm also interested in achieving multiple desiderata simultaneously, i.e., approaches that can be efficient (either label or compute efficient), while being robust, fair, etc.
Motivating Applications
Below are more details of the applications listed above. We study each of the applications below in a broad range of domains including computer vision, video analytics, speech recognition, and natural language processing/text classification.
Data Subset Selection/Coresets for Efficient Learning
Selecting the right data for training is a critical problem today given massive datasets, from the standpoint of both training efficiency and labeling cost. The selection can be unsupervised, where we don’t have labels (select a subset of unlabeled data points for labeling), or supervised, where we have labels (for faster training or hyper-parameter tuning). In either case, we are interested in obtaining a representative subset of instances for training machine learning models. We show that the problem of selecting a subset of data with maximum likelihood on the training set is a submodular optimization problem for several classifiers. We show that by learning the right data subsets, we can achieve significant speedups in training time (between 5x and 10x) with minimal loss in accuracy.
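As a rough illustration of what submodular subset selection looks like in practice, the sketch below greedily maximizes a facility-location function over a similarity matrix. It is a generic textbook example, not the algorithm of any paper listed here; the function name `facility_location_greedy`, the RBF similarity, and the toy two-cluster data are all assumptions made for the example.

```python
import numpy as np

def facility_location_greedy(similarity, budget):
    """Greedily pick `budget` indices maximizing
    f(S) = sum_i max_{j in S} similarity[i, j] (assumes similarity >= 0)."""
    n = similarity.shape[0]
    selected = []
    chosen = np.zeros(n, dtype=bool)
    # best_cover[i] is the similarity of point i to its closest selected point.
    best_cover = np.zeros(n)
    for _ in range(budget):
        # Marginal gain of adding each candidate j to the current selection.
        gains = np.maximum(similarity, best_cover[:, None]).sum(axis=0) - best_cover.sum()
        gains[chosen] = -np.inf  # never pick the same point twice
        j = int(np.argmax(gains))
        selected.append(j)
        chosen[j] = True
        best_cover = np.maximum(best_cover, similarity[:, j])
    return selected

# Toy data (hypothetical): two well-separated clusters of 2-D points.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
                    rng.normal(5.0, 0.1, size=(20, 2))])
# Nonnegative RBF-kernel similarities between all pairs of points.
sq_dists = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
similarity = np.exp(-sq_dists)
subset = facility_location_greedy(similarity, budget=2)
```

With a budget of two, the greedy rule picks one representative from each cluster, since a second point from an already covered cluster adds almost nothing to the objective.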
Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, Abir De, Rishabh Iyer, GRAD-MATCH: A Gradient Matching Based Data Subset Selection for Efficient Deep Model Training, In ICML 2021
Durga Sivasubramanian, Rishabh Iyer, Ganesh Ramakrishnan, and Abir De, Training Data Subset Selection for Regression with Controlled Validation Error, In ICML 2021
Krishnateja Killamsetty, S Durga, Ganesh Ramakrishnan, and Rishabh Iyer, GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning, In AAAI 2021
Krishnateja Killamsetty, Xujiang Zhao, Feng Chen, and Rishabh Iyer, RETRIEVE: Coreset Selection for Efficient and Robust Semi-Supervised Learning, In NeurIPS 2021
Kai Wei, Rishabh Iyer, Jeff Bilmes, Submodularity in data subset selection and active learning, International Conference on Machine Learning (ICML) 2015
Yuzong Liu, Rishabh Iyer, Katrin Kirchhoff, Jeff Bilmes, SVitchboard-II and FiSVer-I: Crafting high quality and low complexity conversational english speech corpora using submodular function optimization, Computer Speech & Language 42, 122-142, 2017 (shorter version also appeared in INTERSPEECH 2015)
Vishal Kaushal, Rishabh Iyer, Suraj Kothawade, Rohan Mahadev, Khoshrav Doctor, and Ganesh Ramakrishnan, Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision, 7th IEEE Winter Conference on Applications of Computer Vision (WACV), 2019
Active Learning in the Real World
Another very important application we study in our group is active learning: how do we reduce labeling costs by selecting (in an active learning manner) the right subset/batch of examples to label? Active learning approaches can significantly reduce the amount of labeled data required (by almost 5x to 20x) without significantly reducing accuracy. I am also interested in active learning in realistic settings, i.e. with out-of-distribution (OOD) data, rare classes, imbalance, etc. We have applied active learning to a number of application domains including computer vision, text classification, and speech recognition, and below are some of our recent papers on this topic.
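To make the selection step concrete, here is a minimal sketch of the simplest uncertainty-based baseline: query the unlabeled points whose predicted class distribution has the highest entropy. This is a standard baseline rather than any of the methods in the papers below, and it ignores diversity; the function name `entropy_sampling` and the example probabilities are made up for illustration.

```python
import numpy as np

def entropy_sampling(probs, batch_size):
    """Return indices of the `batch_size` unlabeled points with the highest
    predictive entropy, most uncertain first."""
    eps = 1e-12  # guards against log(0)
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    # argsort is ascending, so take the tail and reverse it.
    return np.argsort(entropy)[-batch_size:][::-1].tolist()

# Hypothetical softmax outputs of a model on four unlabeled points.
probs = np.array([
    [0.98, 0.01, 0.01],  # confident
    [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
])
query = entropy_sampling(probs, batch_size=2)
```

In a full active learning loop, the queried points would be labeled, added to the training set, and the model retrained before the next round.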
Vishal Kaushal, Suraj Kothawade, Ganesh Ramakrishnan, Jeff Bilmes, Rishabh Iyer, PRISM: A Unified Framework of Parameterized Submodular Information Measures for Targeted Data Subset Selection and Summarization, AAAI 2022
Suraj Kothawade; Nathan Beck; Krishnateja Killamsetty; Rishabh Iyer, SIMILAR: Submodular Information Measures Based Active Learning In Realistic Scenarios, In NeurIPS 2021
Suraj Kothawade, Saikat Ghosh, Sumit Shekhar, Yu Xiang, Rishabh Iyer, TALISMAN: Targeted Active Learning for Object Detection with Rare Classes and Slices using Submodular Mutual Information, In ECCV 2022
Nathan Beck, Durga Sivasubramanian, Apurva Dani, Ganesh Ramakrishnan, and Rishabh Iyer, Effective Evaluation of Deep Active Learning on Image Classification Tasks, arXiv: 2106.15324
Vishal Kaushal, Rishabh Iyer, Suraj Kothawade, Rohan Mahadev, Khoshrav Doctor, and Ganesh Ramakrishnan, Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision, 7th IEEE Winter Conference on Applications of Computer Vision (WACV), 2019
Kai Wei, Rishabh Iyer, Jeff Bilmes, Submodularity in data subset selection and active learning, International Conference on Machine Learning (ICML) 2015
Semi-Supervised Learning in the Real World
In our group, we are also working on developing semi-supervised learning for real-world situations such as out-of-distribution data and imbalance. We have also designed efficient approaches for semi-supervised learning. Finally, we have also studied semi-supervised few-shot and meta-learning.
Xujiang Zhao, Killamsetty Krishnateja, Rishabh Iyer, Feng Chen, Robust Semi-Supervised Learning with Out of Distribution Data, In Proc. ICDM 2022
Krishnateja Killamsetty, Xujiang Zhao, Feng Chen, and Rishabh Iyer, RETRIEVE: Coreset Selection for Efficient and Robust Semi-Supervised Learning, In NeurIPS 2021
Few-Shot and Meta-Learning
Another application in the data-efficient learning space is meta-learning and few-shot learning. We have studied meta-learning in real-world scenarios such as noisy and OOD data, as well as semi-supervised meta-learning.
Changbin Li, Suraj Kothawade, Feng Chen, and Rishabh Iyer, PLATINUM: Semi-Supervised Model Agnostic Meta-Learning using Submodular Mutual Information, In International Conference on Machine Learning, ICML 2022
Krishnateja Killamsetty, Changbin Li, Chen Zhou, Feng Chen, Rishabh Iyer, A Nested Bi-level Optimization Framework for Robust Few Shot Learning, In 36th AAAI Conference on Artificial Intelligence, AAAI 2022 (15% Acceptance Rate)
Data Programming & Weak Supervision
Getting high-quality labeled data is very expensive, and machine learning models require massive amounts of labeled data. I am studying weak supervision approaches for effectively learning machine learning models from very few labeled instances and a large number of unlabeled instances, using noisy labels from multiple sources (semi-supervised data programming). I'm also interested in subset selection problems in this space (e.g. how do we select a subset of labeling functions for robustness, and how do we select a subset of labeled instances to complement
Ayush Maheshwari, Krishnateja Killamsetty, Ganesh Ramakrishnan, Rishabh Iyer, Marina Danilevsky, Lucian Popa, Learning to Robustly Aggregate Labeling Functions for Semi-supervised Data Programming, Findings of ACL, 2022 (Long Paper)
Ayush Maheshwari, Oishik Chatterjee, KrishnaTeja Killamsetty, Ganesh Ramakrishnan, and Rishabh Iyer, Data Programming using Semi-Supervision and Subset Selection, To Appear in Findings of ACL, 2021 (Long Paper)
Atul Sahay, Anshul Nasery, Ayush Maheshwari, Ganesh Ramakrishnan, and Rishabh Iyer, Rule Augmented Unsupervised Constituency Parsing, To Appear in Findings of ACL, 2021 (Short Paper)
Robust Learning
Can we make machine learning algorithms robust to noisy labels, out-of-distribution samples, distribution shift, and imbalance? We study this problem in various settings (supervised, semi-supervised, and few-shot learning) and also study the impact of robustness in these settings. We pose this problem as a bi-level optimization problem and study algorithms for solving it.
Krishnateja Killamsetty, Changbin Li, Chen Zhou, Feng Chen, Rishabh Iyer, A Nested Bi-level Optimization Framework for Robust Few Shot Learning, In 36th AAAI Conference on Artificial Intelligence, AAAI 2022 (15% Acceptance Rate)
Xujiang Zhao, Killamsetty Krishnateja, Rishabh Iyer, Feng Chen, Robust Semi-Supervised Learning with Out of Distribution Data, In International Conference on Data Mining, ICDM 2022
Krishnateja Killamsetty, S Durga, Ganesh Ramakrishnan, and Rishabh Iyer, GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning, In AAAI 2021
Krishnateja Killamsetty, Xujiang Zhao, Feng Chen, and Rishabh Iyer, RETRIEVE: Coreset Selection for Efficient and Robust Semi-Supervised Learning, In NeurIPS 2021
Data Summarization
I am interested in several applications of data summarization including video summarization, image collection summarization, document/text summarization and summarization of topic hierarchies. We study questions like what are natural models for summarization, how do we choose the right models for different problems/domains and how do we learn the right combinations of functions for various tasks. A lot of effort is also spent on interpretability of models, evaluation and loss functions for summarization, and at the core of it, understanding what makes a good summary for the problem at hand. We have also created new datasets for domain specific video summarization and image collection summarization. We recently released a dataset called VISIOCITY with large videos for video summarization and video understanding.
Sebastian Tschiatschek, Rishabh K Iyer, Haochen Wei, Jeff A Bilmes, Learning mixtures of submodular functions for image collection summarization, In Advances in Neural Information Processing Systems (NIPS) 2014
Ramkrishna Bairi, Rishabh Iyer, Ganesh Ramakrishnan, Jeff Bilmes, Summarization of Multi-Document Topic Hierarchies using Submodular Mixtures, In Association for Computational Linguistics (ACL) 2015
Vishal Kaushal, Sandeep Subramanium, Suraj Kothawade, Rishabh Iyer, and Ganesh Ramakrishnan, A Framework Towards Domain Specific Video Summarization, 7th IEEE Winter Conference on Applications of Computer Vision (WACV) 2019
Vishal Kaushal, Rishabh Iyer, Khoshrav Doctor, Anurag Sahoo, Pratik Dubal, Suraj Kothawade, Rohan Mahadev, Kunal Dargan, Ganesh Ramakrishnan, Demystifying Multi-Faceted Video Summarization: Tradeoff Between Diversity, Representation, Coverage and Importance, 7th IEEE Winter Conference on Applications of Computer Vision (WACV) 2019
Vishal Kaushal, Suraj Kothawade, Ganesh Ramakrishnan, Jeff Bilmes, Himanshu Asnani, and Rishabh Iyer, A Unified Framework for Generic, Query-Focused, Privacy Preserving and Update Summarization using Submodular Information Measures, arXiv:2010.05631
V. Kaushal, S. Kothawade, R. Iyer and G. Ramakrishnan, Realistic Video Summarization through VISIOCITY: A New Benchmark and Evaluation Framework, ACM MM Workshops 2020
Data Partitioning
We seek to intelligently partition data for large scale distributed training, so that we can achieve superior results compared to simple random partitioning and other baselines. We demonstrate that diversified partitioning via submodular functions can achieve significant improvements on several distributed deep learning and general machine learning tasks.
Kai Wei, Rishabh Iyer, Shenjie Wang, Wenruo Bai, Jeff Bilmes, Mixed robust/average submodular partitioning: Fast algorithms, guarantees, and applications, In Advances in Neural Information Processing Systems (NIPS) 2015
Kai Wei, Rishabh Iyer, Shenjie Wang, Wenruo Bai, Jeff Bilmes, How to intelligently distribute training data to multiple compute nodes: Distributed machine learning via submodular partitioning, Neural Information Processing Systems (NIPS) Workshop, Montreal, Canada 2015
Feature Selection
Feature selection is a very important pre-processing step for machine learning and data science applications, and is mostly used to reduce prediction time and memory, reduce feature acquisition cost, and remove noisy and irrelevant features. We study a parameterized feature selection framework using submodular functions, in particular a family of mutual-information-based models. We show how this framework can be extended to cost-aware feature elicitation.
Rishabh Iyer, Jeff Bilmes, Algorithms for approximate minimization of the difference between submodular functions, with applications, Uncertainty in Artificial Intelligence (UAI) 2012
Srijita Das, Rishabh Iyer, Sriraam Natarajan , A Clustering based Selection Framework for Cost Aware and Test-time Feature Elicitation, In CODS-COMAD 2021 Research Track (Honorable Mention)
Srijita Das, Rishabh Iyer, Sriraam Natarajan, A Parameterized Information-theoretic Feature Selection Framework for Test-time Feature Elicitation, In Review 2021
Other Applications
In addition to the applications above, our group has also studied the use of combinatorial optimization (subset selection) and bi-level optimization in other areas such as Fair Learning, Continuous Learning, and Social Networks.
Rishabh Tiwari, Krishnateja Killamsetty, Rishabh Iyer, and Pradeep Shenoy, GCR: Gradient Coreset based Replay Buffer Selection for Continual Learning, In Computer Vision and Pattern Recognition, CVPR 2022
MS Ozdayi, M Kantarcioglu, R Iyer, Fair Machine Learning under Limited Demographically Labeled Data, ICLR Workshop on Socially Responsible Machine Learning
Ping Zhang, Rishabh K Iyer, Ashish V. Tendulkar, Gaurav Aggarwal, Abir De, Learning to Select Exogenous Events for Marked Temporal Point Process, In Neural Information Processing Systems, NeurIPS 2021
Theoretical Advances
To address the motivating applications listed in Thread 1, below are some of the theoretical directions I'm pursuing.
UNIFIED ALGORITHMS AND THEORY OF SUBMODULAR OPTIMIZATION
Submodular optimization is a rich and expressive class of non-linear discrete optimization problems which generalizes important combinatorial functions like set cover, facility location, log-determinants, etc. A number of applications such as data subset selection, data summarization, data partitioning, and active learning naturally involve flavors of submodular optimization. In this thread, we develop fast and scalable algorithms for a number of problems which occur in practice. Examples include submodular minimization, submodular maximization, difference of submodular optimization, submodular optimization subject to submodular constraints, and ratio of submodular optimization. This framework of algorithms achieves (near) optimal approximation guarantees, while being easy to implement and scaling to massive datasets. Empirically, we have demonstrated orders of magnitude speedups. Our algorithms have been used for several real-world problems including cooperative cuts for image segmentation and cooperative matching, diffusion-aware optimization, path planning, mobile crowd-sensing, trajectory optimization for aerial 3D scanning, sensor placement under cooperative costs, and limited vocabulary speech data selection. Some relevant publications are:
Rishabh Iyer, Jeff Bilmes, Algorithms for approximate minimization of the difference between submodular functions, with applications, Uncertainty in Artificial Intelligence (UAI) 2012
Rishabh Iyer, Stefanie Jegelka, Jeff Bilmes, Fast semidifferential-based submodular function optimization, International Conference on Machine Learning (ICML) 2013 (Winner of the Best Paper Award)
Rishabh Iyer and Jeff Bilmes, Submodular optimization with submodular cover and submodular knapsack constraints, In Advances in Neural Information Processing Systems 2013 (Winner of the Outstanding Paper Award)
Rishabh K Iyer, Stefanie Jegelka, Jeff A Bilmes, Curvature and optimal algorithms for learning and minimizing submodular functions, In Advances in Neural Information Processing Systems 2013
Rishabh Iyer, Stefanie Jegelka, Jeff Bilmes, Monotone Closure of Relaxed Constraints in Submodular Optimization: Connections Between Minimization and Maximization, Uncertainty in Artificial Intelligence (UAI) 2014
Kai Wei, Rishabh K. Iyer, Jeff A. Bilmes, Fast multi-stage submodular maximization, International Conference on Machine Learning, ICML 2014
Wenruo Bai, Rishabh Iyer, Kai Wei, Jeff Bilmes, Algorithms for optimizing the ratio of submodular functions, In Proc. International Conference on Machine Learning (ICML) 2016
Rishabh Iyer and Jeff Bilmes, Near Optimal Algorithms for Hard Submodular Programs with Discounted Cooperative Costs, To Appear in Artificial Intelligence and Statistics (AISTATS) 2019, Naha, Okinawa, Japan
Rishabh Iyer and Jeff Bilmes, A Memoization Framework for Scaling Submodular Optimization to Large Scale Problems, To Appear in Artificial Intelligence and Statistics (AISTATS) 2019
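For readers new to the area, the defining property behind all the problems in this thread is diminishing returns. The block below states it together with two of the example functions mentioned above; the notation (ground set V, similarities s_ij, positive semidefinite kernel L with principal submatrix L_A) is chosen here for illustration and is not taken from a specific paper.

```latex
% A set function f : 2^V \to \mathbb{R} is submodular if adding an element
% to a smaller set helps at least as much as adding it to a larger one:
f(A \cup \{v\}) - f(A) \;\ge\; f(B \cup \{v\}) - f(B)
\qquad \text{for all } A \subseteq B \subseteq V,\ v \in V \setminus B.

% Two standard examples: facility location and log-determinant.
f_{\mathrm{FL}}(A) = \sum_{i \in V} \max_{j \in A} s_{ij},
\qquad
f_{\mathrm{LD}}(A) = \log\det(L_A).
```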
LEARNING WITH SUBMODULAR FUNCTIONS
While submodular optimization occurs in inference, a critical component of fitting submodular functions to machine learning applications is learning the right submodular functions. In this thread, we study a rich class of models associated with submodular functions and the associated learning problems.
Vishal Kaushal, Sandeep Subramanium, Suraj Kothawade, Rishabh Iyer, and Ganesh Ramakrishnan, A Framework Towards Domain Specific Video Summarization, 7th IEEE Winter Conference on Applications of Computer Vision (WACV) 2019
Vishal Kaushal, Rishabh Iyer, Khoshrav Doctor, Anurag Sahoo, Pratik Dubal, Suraj Kothawade, Rohan Mahadev, Kunal Dargan, Ganesh Ramakrishnan, Demystifying Multi-Faceted Video Summarization: Tradeoff Between Diversity, Representation, Coverage and Importance, 7th IEEE Winter Conference on Applications of Computer Vision (WACV) 2019
Rishabh Iyer and Jeff Bilmes, Submodular point processes with applications to machine learning, Proc. Artificial Intelligence and Statistics (AISTATS) 2015
Sebastian Tschiatschek, Rishabh K Iyer, Haochen Wei, Jeff A Bilmes, Learning mixtures of submodular functions for image collection summarization, In Advances in Neural Information Processing Systems (NIPS) 2014
Suraj Kothawade, Jiten Girdhar, Chandrashekar Lavania, and Rishabh Iyer, Deep Submodular Networks for Extractive Data Summarization, arXiv:2010.08593
SUBMODULAR INFORMATION FUNCTIONS
This thread studies the intersection between submodular/combinatorial optimization and information theory via the study of submodular information measures. We study properties, modeling and representational power, instantiations, and applications of such measures. Examples of this include submodular mutual information, submodular distance metrics, divergences, and multi-set submodular information measures. We have successfully applied the submodular information functions to a number of application domains including active learning with rare classes and out-of-distribution data, targeted learning, personalized speech recognition, and semi-supervised meta-learning.
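As a brief reminder of the central definitions, for a submodular function f, a candidate set A, a query set Q, and a conditioning set P, the submodular mutual information and the submodular conditional gain can be written as below. The symbol names are chosen here for the example; when f is the entropy of a set of random variables, the first expression reduces to the usual mutual information.

```latex
% Submodular mutual information between A and Q:
I_f(A; Q) = f(A) + f(Q) - f(A \cup Q)

% Submodular conditional gain of A given P:
f(A \mid P) = f(A \cup P) - f(P)
```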
Rishabh Iyer, Ninad Khargonkar, Jeff Bilmes, and Himanshu Asnani, Submodular Combinatorial Information Measures with Applications in Machine Learning, The 32nd International Conference on Algorithmic Learning Theory, ALT 2021.
Rishabh Iyer, Ninad Khargonkar, Jeff Bilmes, Himanshu Asnani, Generalized Submodular Information Measures: Theoretical Properties, Examples, Optimization, Algorithms, and Applications, In IEEE Transactions on Information Theory, February 2022
Jennifer A Gillenwater, Rishabh K Iyer, Bethany Lusch, Rahul Kidambi, Jeff A Bilmes, Submodular hamming metrics, In Advances in Neural Information Processing Systems 2015
Himanshu Asnani, Jeff Bilmes, and Rishabh Iyer, Independence Properties of Generalized Submodular Information Measures, 2021 IEEE International Symposium on Information Theory, ISIT 2021
Vishal Kaushal, Suraj Kothawade, Ganesh Ramakrishnan, Jeff Bilmes, Rishabh Iyer, PRISM: A Unified Framework of Parameterized Submodular Information Measures for Targeted Data Subset Selection and Summarization, In AAAI 2022
Suraj Kothawade; Nathan Beck; Krishnateja Killamsetty; Rishabh Iyer, SIMILAR: Submodular Information Measures Based Active Learning In Realistic Scenarios, In NeurIPS 2021
Suraj Kothawade, Saikat Ghosh, Sumit Shekhar, Yu Xiang, Rishabh Iyer, TALISMAN: Targeted Active Learning for Object Detection with Rare Classes and Slices using Submodular Mutual Information, arXiv:2112.00166
Mayank Kothyari, Anmol Reddy Mekala, Rishabh Iyer, Ganesh Ramakrishnan, and Preethi Jyothi, Personalizing ASR with Limited Data Using Targeted Subset Selection, arXiv:2110.04908
DISCRETE AND CONTINUOUS BILEVEL OPTIMIZATION
A growing number of applications for efficient and robust learning involve bi-level optimization. In this thread, we study approaches for solving such bilevel optimization problems in an efficient manner, particularly for deep learning models. I'm particularly interested in bi-level optimization problems that have a discrete component in them (i.e. a mixed discrete/continuous bi-level optimization problem).
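A generic template for such a mixed discrete/continuous problem, written for the case of selecting a training subset, might look as follows. This is only a sketch of the problem shape: the symbols (ground set V, budget k, training loss L_train, validation loss L_val, model parameters θ) are assumptions for illustration, and the individual papers below use their own objectives.

```latex
% Outer (discrete) problem: choose a subset S of the data.
% Inner (continuous) problem: train model parameters \theta on S.
\min_{S \subseteq V,\ |S| \le k} \; L_{\mathrm{val}}\big(\theta^{*}(S)\big)
\quad \text{subject to} \quad
\theta^{*}(S) \in \arg\min_{\theta} \; L_{\mathrm{train}}(\theta, S)
```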
Krishnateja Killamsetty, Changbin Li, Chen Zhou, Feng Chen, Rishabh Iyer, A Nested Bi-level Optimization Framework for Robust Few Shot Learning, In 36th AAAI Conference on Artificial Intelligence, AAAI 2022 (15% Acceptance Rate)
Xujiang Zhao, Killamsetty Krishnateja, Rishabh Iyer, Feng Chen, Robust Semi-Supervised Learning with Out of Distribution Data, arXiv:2010.03658
Krishnateja Killamsetty, S Durga, Ganesh Ramakrishnan, and Rishabh Iyer, GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning, 35th AAAI Conference on Artificial Intelligence (AAAI) 2021
Ayush Maheshwari, Krishnateja Killamsetty, Ganesh Ramakrishnan, Rishabh Iyer, Marina Danilevsky, Lucian Popa, Learning to Robustly Aggregate Labeling Functions for Semi-supervised Data Programming, Findings of ACL, 2022 (Long Paper)