Combinatorial Approaches for Visual Data Summarization

Rishabh Iyer

Monday, January 7th 2PM - 5PM

Growing Visual Data

Exabytes of visual data are created every day

Visual Data Summarization

How do you organize and summarize this big data?


This tutorial will address several aspects of Visual Data Summarization, including Image Collection Summarization, Video Summarization, Entity/Object Summarization in Videos/Images, Data Subset Selection, and Diversified Active Learning. We shall study a combinatorial framework for these problems, built on a class of discrete set functions called submodular functions. We shall motivate various summarization models and discuss how they model different aspects of summarization, including diversity, coverage, representation and importance. Moreover, we shall show how combinatorial models can be learned from data. Throughout this tutorial, we shall show that summarization models defined this way not only work well in practice and scale to massive problems, but are also interpretable and intuitive.
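
For readers new to the term, the standard definition of submodularity can be stated as a diminishing-returns property. The notation below (ground set V, set function f, similarities s_ij) is introduced here only for illustration and is not taken from the tutorial material. The facility location function is shown as one common example of a representation-style model, assuming non-negative similarities.

```latex
% Diminishing returns: adding an item helps a small set at least as much as a larger one.
f(A \cup \{v\}) - f(A) \;\ge\; f(B \cup \{v\}) - f(B)
\qquad \text{for all } A \subseteq B \subseteq V,\; v \in V \setminus B.

% Example: facility location, monotone submodular when s_{ij} \ge 0.
f_{\mathrm{FL}}(A) \;=\; \sum_{i \in V} \max_{j \in A} s_{ij}
```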

Motivation of this Tutorial

Visual data in the form of images, videos and live streams has been growing at an unprecedented rate in the last few years. While this massive data is a blessing to data science because it helps improve predictive accuracy, it is also a curse, since humans are unable to consume this much data. Today, machine-generated videos (from drones, dash-cams, body-cams, security cameras, GoPros, etc.) are being produced at a rate higher than we as humans can process, and the majority of this data is plagued with redundancy. Given this data explosion, machine learning techniques that automatically understand, organize and categorize this data are of utmost importance. Visual data summarization attempts to solve this problem in two ways.

  1. Create a highlight of the most critical and important events in a video (or an image collection), giving the viewer a quick glimpse of the entire video (or photo album) in a short amount of time. It is not uncommon today to take thousands of photographs while on a vacation: can we automatically find the highlights of the trip to send to friends? Similarly, security officers have to go through several hours of footage to find important events: can this be done automatically to save human time?
  2. Create data summaries for training visual classification and detection models. Datasets today are growing, creating the need for large, expensive GPU clusters and lengthening experimental turnaround times. Moreover, labeling these large datasets is becoming increasingly expensive and time-consuming. Visual data subset selection attempts to extract the most critical aspects of the data to reduce both training time and labeling effort.

Topics Covered in this Tutorial

  1. Combinatorial Summarization Models for Summarization
    • Submodular Functions and Set Functions for Summarization
    • Modeling power of summarization functions: modeling diversity, representation, coverage and importance, and the trade-offs between them. We shall demonstrate this with several examples of videos and image collections.
    • Examples of Combinatorial Summarization Models (DPPs, Disparity Functions, Coverage, Facility Location, Graph Cut, Spectral Functions etc.)
  2. Combinatorial Optimization Algorithms for Summarization
    • Optimization Algorithms for different function classes (monotone submodular, non-monotone submodular, approximately submodular, dispersion functions and combinations of these)
    • Optimization Algorithms in different settings (batch, streaming, distributed)
    • Practical implementation tricks such as lazy greedy implementations and memoization
  3. Submodular Functions as Models for Data Subset Selection and Data Partitioning
    • Submodularity and Data Subset Selection: Theoretical connections
    • Data subset selection for quick training, faster hyper-parameter tuning, reduced labeling costs, and active learning
    • Extensions to Data Partitioning and Data Subset Selection from multiple distributions
  4. Learning Summarization Functions for Summarization
    • Max-Margin Framework for learning
    • Probabilistic Models
    • Other Learning approaches
  5. Video and Image Summarization
    • Examples of Summarization Models used in literature
    • Learning Summarization Models for Image Collection Summarization and Video Summarization
    • Can we make sense of the learnt mixtures? Do they match our domain knowledge?
    • Video Summarization in an online/streaming setting
    • Extensions to Query Focused and Entity Video Summarization
  6. Discussions and Future Directions
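
To make topics 1 and 2 above concrete, here is a minimal sketch of lazy greedy maximization of a facility location function under a cardinality constraint. It is an illustration under stated assumptions, not the tutorial's own implementation: the function names, the toy points, and the Gaussian similarity are all hypothetical choices, and the similarities are assumed non-negative.

```python
import heapq
import numpy as np


def facility_location_gain(sim, best, j):
    """Marginal gain of adding item j, where best[i] is the highest
    similarity between item i and the summary selected so far."""
    return float(np.maximum(sim[:, j] - best, 0.0).sum())


def lazy_greedy_facility_location(sim, k):
    """Greedily pick up to k items for f(A) = sum_i max_{j in A} sim[i, j].

    sim: (n, n) array of non-negative similarities (assumption of this sketch).
    """
    n = sim.shape[0]
    best = np.zeros(n)  # memoized statistic, updated after every selection
    selected = []
    # Max-heap (via negation) of upper bounds on each item's marginal gain.
    heap = [(-facility_location_gain(sim, best, j), j) for j in range(n)]
    heapq.heapify(heap)
    while heap and len(selected) < k:
        _, j = heapq.heappop(heap)
        gain = facility_location_gain(sim, best, j)  # refresh the stale bound
        # Submodularity means gains never grow, so if the refreshed gain still
        # beats the next stale bound, j is the true greedy choice.
        if not heap or gain >= -heap[0][0]:
            selected.append(j)
            best = np.maximum(best, sim[:, j])
        else:
            heapq.heappush(heap, (-gain, j))
    return selected


# Toy data: three tight, well-separated clusters of two points each.
points = np.array([[0.0, 0.0], [0.1, 0.0],
                   [5.0, 5.0], [5.1, 5.0],
                   [10.0, 0.0], [10.0, 0.1]])
diff = points[:, None, :] - points[None, :, :]
sim = np.exp(-(diff ** 2).sum(axis=-1))  # Gaussian similarity, values in (0, 1]

summary = lazy_greedy_facility_location(sim, 3)
print(summary)
```

On this toy input the summary contains one point from each cluster, which is the representation behaviour the facility location model is meant to capture. The `best` array plays the role of a memoized statistic: each gain evaluation costs O(n) instead of recomputing the objective from scratch.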

Related Papers

  1. Gygli, Michael, Helmut Grabner, and Luc Van Gool. Video summarization by learning submodular mixtures of objectives. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
  2. Elhamifar, Ehsan, and M. Clara De Paolis Kaluza. Online Summarization via Submodular and Convex Optimization. CVPR. 2017.
  3. Mirzasoleiman, Baharan, Stefanie Jegelka, and Andreas Krause. Streaming non-monotone submodular maximization: Personalized video summarization on the fly. arXiv preprint arXiv:1706.03583 (2017).
  4. Xu, Jia, et al. Gaze-enabled egocentric video summarization via constrained submodular maximization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
  5. Vishal Kaushal, Rishabh Iyer, Suraj Kothawade, Sandeep Subramanium, and Ganesh Ramakrishnan, A Framework Towards Domain Specific Video Summarization, 7th IEEE Winter Conference on Applications of Computer Vision (WACV), 2019, Hawaii, USA.
  6. Vishal Kaushal, Rishabh Iyer, Rohan Mahadev, Suraj Kothawade, Khoshrav Doctor, Narsimha Raju, and Ganesh Ramakrishnan, Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision, 7th IEEE Winter Conference on Applications of Computer Vision (WACV), 2019, Hawaii, USA.
  7. Vishal Kaushal, Rishabh Iyer, Anurag Sahoo, Pratik Dubal, Suraj Kothawade, Rohan Mahadev, Kunal Dargan, Ganesh Ramakrishnan, Demystifying Multi-Faceted Video Summarization: Tradeoff Between Diversity, Representation, Coverage and Importance, 7th IEEE Winter Conference on Applications of Computer Vision (WACV), 2019, Hawaii, USA.
  8. Kai Wei, Rishabh Iyer, Shengjie Wang, Wenruo Bai, Jeff Bilmes, Mixed robust/average submodular partitioning: Fast algorithms, guarantees, and applications, In Advances in Neural Information Processing Systems (NIPS) 2015
  9. Rishabh Iyer and Jeff Bilmes, Submodular point processes with applications to machine learning, Artificial Intelligence and Statistics (AISTATS) 2015
  10. Djolonga, Josip, and Andreas Krause. From MAP to marginals: Variational inference in Bayesian submodular models. Advances in Neural Information Processing Systems. 2014.
  11. Kai Wei, Rishabh Iyer, Jeff Bilmes, Submodularity in data subset selection and active learning, International Conference on Machine Learning (ICML) 2015
  12. Sebastian Tschiatschek, Rishabh K Iyer, Haochen Wei, Jeff A Bilmes, Learning mixtures of submodular functions for image collection summarization, In Advances in Neural Information Processing Systems (NIPS) 2014
  13. Kai Wei, Rishabh K. Iyer, Jeff A. Bilmes, Fast multi-stage submodular maximization, International Conference on Machine Learning, ICML 2014
  14. Gong, Boqing, et al. Diverse sequential subset selection for supervised video summarization. Advances in Neural Information Processing Systems. 2014
  15. Sharghi, Aidean, Boqing Gong, and Mubarak Shah. Query-focused extractive video summarization. European Conference on Computer Vision. Springer, Cham, 2016.
  16. Rishabh Iyer and Jeff Bilmes, A Memoization Framework for Scaling Submodular Optimization to Large Scale Problems, Artificial Intelligence and Statistics (AISTATS) 2019
  17. Hoi, Steven CH, Jin, Rong, Zhu, Jianke, and Lyu, Michael R. Batch mode active learning and its application to medical image classification. In ICML, 2006.
  18. Ian Simon, Noah Snavely, and Steven M. Seitz. Scene Summarization for Online Image Collections. In ICCV, 2007.
  19. Zhang, Ke, et al. Summary transfer: Exemplar-based subset selection for video summarization. Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
  20. Zhang, Ke, et al. Video summarization with long short-term memory. European conference on computer vision. Springer, Cham, 2016.

Bio of the Speakers

Rishabh Iyer is currently a Research Scientist at Microsoft, where he works on problems in computer vision, discrete optimization, online learning, contextual bandits, and reinforcement learning. He completed his Ph.D. and postdoc at the University of Washington, Seattle, where he worked with Prof. Jeff Bilmes. His work has received best paper awards at the International Conference on Machine Learning (ICML) and Neural Information Processing Systems (NIPS). He also won the Microsoft Ph.D. Fellowship and the Facebook Ph.D. Fellowship, along with the Yang Outstanding Doctoral Student Award from the University of Washington. He completed his B.Tech at the Department of Electrical Engineering at IIT Bombay in 2011, and has been a visitor at Microsoft Research, Redmond and Simon Fraser University. He has worked on several aspects of machine learning, including discrete and convex optimization, deep learning, video/image summarization, data subset selection, active learning, and online learning. He has applied his work in several domains, including search advertisement, computer vision, text classification and speech. He has given invited talks and tutorials at numerous conferences and workshops, including the AMS Sectional Meeting, the International Symposium on Mathematical Programming (ISMP), and the Non-convex Optimization for Machine Learning (NOML) Summer School, as well as at several renowned research and academic institutions worldwide.