MLML Reading Group

Description:

The MLML (short for Multi-Level Machine Learning, or Madras Loving Machine Learning) reading group is a regular meeting of ML enthusiasts at IIT Madras. In each meeting, a participant takes the stage and presents, via a talk or demo, their own work, other recent work, or a classic technique. The topic can be anything within the broad area of machine learning.

The key point is that the audience should walk away with something significant, regardless of their level of expertise in the topic. A beginning master's student should go home with a broad idea of what was discussed and of situations where they might make use of it. A professor of machine learning should leave with a deep understanding of the work and its connections to related work. The presentations are as much about learning and practicing how to communicate to a general audience as they are about the particular machine learning concept.

Anyone at IIT Madras is welcome to attend. Each meeting lasts one hour, including questions.

Timing and Venue:

The venue is BSB 361 (Turing Hall), unless announced otherwise. Meetings are planned every Monday from 4 PM to 5 PM.

Sign up to speak:

If you are interested in presenting in the group, please send an email to me (hariguru [at] cse [dot] iitm [dot] ac [dot] in). MS and PhD students in particular should use this opportunity to present their work and get feedback on both the work and the presentation.

Google Groups link:

https://groups.google.com/forum/#!forum/mlml-reading-group-iitm/join

Presentations:

  1. (20/08/2018): Complex tasks in machine learning. Harish.
    • Abstract: Learning discrete-valued functions from training data is the most popular machine learning task among practitioners. It includes the standard tasks of binary classification, multiclass classification, structured prediction and ranking. In most practical scenarios the ultimate task falls into one of these bins, but the evaluation metric of choice has to be specialised in some way, which is often done via cost-sensitive classification and is reasonably well studied. However, in several such scenarios, any "simple", "decomposable" or "linear" evaluation metric is unacceptable due to issues such as unbalanced data and trivial solutions, paving the way for "complex" performance measures like the F-measure or harmonic-mean measure. In addition, in certain regulated use cases of machine learning, the learnt classifier has to satisfy constraints, e.g. it must be fair by accepting equal proportions of men and women to a college. We call such tasks with "complex" objectives complex classification problems, and we will discuss a broad class of algorithms that has emerged to tackle problems of this kind. (A small illustrative sketch is given at the end of this entry.)
    • Refs: http://proceedings.mlr.press/v84/narasimhan18a/narasimhan18a.pdf
    • Slides: Link.
    • Space/Time: Aug 20, 2018, 4 to 5PM, Turing hall, Dept. of CSE
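    • Sketch: a minimal numerical illustration (toy labels, not from the talk) of why measures like the F-measure are needed: on unbalanced data, the trivial majority-class classifier scores well on accuracy (a decomposable, per-example metric) but is exposed by the F-measure, which must be computed from the whole confusion matrix.
      import numpy as np

      y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])  # unbalanced labels: 3 positives, 7 negatives
      y_pred = np.zeros(10, dtype=int)                    # trivial classifier: always predict the majority class

      tp = np.sum((y_pred == 1) & (y_true == 1))
      fp = np.sum((y_pred == 1) & (y_true == 0))
      fn = np.sum((y_pred == 0) & (y_true == 1))

      accuracy = np.mean(y_pred == y_true)                # 0.7 -- looks acceptable
      f1 = 2 * tp / (2 * tp + fp + fn)                    # 0.0 -- exposes the trivial solution
      print(accuracy, f1)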
  2. (27/08/2018): Attention mechanisms and multiple instance learning. Harish.
    • Abstract: Attention mechanisms are a popular tool in deep learning for "focusing" on a part of the input, e.g. a sub-image in an image, a sentence in a paragraph, or a word in a sentence. The underlying idea that most of the input is irrelevant for deciding the answer is not new and has striking similarities to a machine learning task known as multiple instance learning. In this talk we will discuss the similarities between these two frameworks, and consider experiments that can potentially answer questions like the following. 1. Under what conditions can an attention mechanism give much better answers than a naive method that looks at the entire input? 2. What is the reason for the success of such mechanisms? 3. Under what conditions can attention mechanisms be misled easily? 4. Are there better algorithms for attention mechanisms? (A small sketch of attention-based pooling is given at the end of this entry.)
    • Refs: http://proceedings.mlr.press/v80/ilse18a/ilse18a.pdf , https://arxiv.org/pdf/1706.00687.pdf , http://www.jmlr.org/papers/volume13/sabato12a/sabato12a.pdf
    • Slides: Link
    • Space/Time: Aug 27, 2018, 4:30 to 5:30PM, RBC DSAI Seminar hall.
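    • Sketch: a minimal numpy sketch of the attention-based pooling for multiple instance learning described in the first reference (toy dimensions, with random weights standing in for learned parameters): each instance in a bag gets a relevance score, the scores are softmax-normalized, and the bag representation is the weighted sum of the instances.
      import numpy as np

      rng = np.random.default_rng(0)
      H = rng.normal(size=(12, 32))      # a "bag" of 12 instance embeddings (e.g. image patches)
      V = rng.normal(size=(32, 16))      # attention parameters (learned in practice)
      w = rng.normal(size=(16,))

      scores = np.tanh(H @ V) @ w        # one relevance score per instance
      a = np.exp(scores - scores.max())
      a = a / a.sum()                    # softmax attention weights over the instances
      z = a @ H                          # bag-level representation: weighted sum of instances
      print(a.round(3), z.shape)         # large a[i] ~ the "attended" part of the input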
  3. (03/09/2018): Concentration bounds for empirical conditional value-at-risk: The unbounded case. Prashanth.
    • Abstract: In several real-world applications involving decision making under uncertainty, the traditional expected value objective may not be suitable, as it may be necessary to control losses in the case of a rare but extreme event. Conditional Value-at-Risk (CVaR) is a popular risk measure for modeling the aforementioned objective. We consider the problem of estimating CVaR from i.i.d. samples of an unbounded random variable, which is either sub-Gaussian or sub-exponential. We derive a novel one-sided concentration bound for a natural sample-based CVaR estimator in this setting. Our bound relies on a concentration result for a quantile-based estimator for Value-at-Risk (VaR), which may be of independent interest.
    • Refs: https://arxiv.org/abs/1808.01739
    • Slides:
    • Space/Time: Sep 3, 2018, 3:30 to 4:30 PM, ALC (MR1), Dept. of CSE.
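    • Sketch: a small numpy sketch of the natural sample-based estimators the abstract refers to: the empirical Value-at-Risk at level alpha is a sample quantile, and the empirical CVaR adds the average excess loss beyond it (Rockafellar-Uryasev form). The Exp(1) example below is only for illustration.
      import numpy as np

      rng = np.random.default_rng(1)
      alpha = 0.95
      X = rng.exponential(scale=1.0, size=10_000)   # i.i.d. samples of an unbounded (sub-exponential) loss

      var_hat = np.quantile(X, alpha)                                            # empirical VaR at level alpha
      cvar_hat = var_hat + np.mean(np.maximum(X - var_hat, 0.0)) / (1 - alpha)   # empirical CVaR
      print(var_hat, cvar_hat)                      # for Exp(1): VaR is about 3.0 and CVaR about 4.0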

4. (10/09/2018): Graph Convolutional architectures for node classification.

    • Speaker: Priyesh.
    • Space/Time: Sep 10, 2018, 3:30 to 4:30PM, Turing hall, Dept. of CSE
    • Abstract: In many real-life applications, entities in an environment are not independent but are influenced by each other through their interactions. Such relational datasets are popularly modeled as graphs, where the entities make up the nodes and the edges represent interactions. The node classification task in attributed graphs, i.e., graphs with node features, is called Collective Classification (CC). CC involves learning to classify unlabeled nodes given a partially labeled graph, in which label predictions are made by jointly modeling a node's own features and its neighborhood features. It is often the case that a node is influenced not only by its immediate neighbors but also by its higher-order neighbors, multiple hops away. Current state-of-the-art neural network architectures for CC are end-to-end differentiable variations of recursively defined graph kernels that aggregate and filter multi-hop neighborhood information. In this talk, I'll present our works HOPF [1], a Higher Order Propagation Framework for collective classification, and Fusion-GCN [2], Fusion Graph Convolutional Networks. HOPF is a hybrid semi-supervised learning framework that couples generic differentiable graph kernels with an iterative inference procedure, and Fusion-GCN is a simple extension to existing models that improves their representation capacity. Through HOPF, I'll discuss the limitations of existing differentiable graph kernels in effectively capturing information from multiple hops, in terms of scalability, information morphing, and representation capacity. In our extensive experiments on multiple datasets from six different domains, we observed that existing models do not perform consistently across datasets, whereas the proposed models provide robust performance, with state-of-the-art results on many datasets while being highly competitive on the rest. (A minimal sketch of graph-convolutional propagation is given at the end of this entry.)
    • Refs:
      • [1] Vijayan, P., Chandak, Y., Khapra, M. M., Parthasarathy, S & Ravindran, B. (2018). HOPF: Higher Order Propagation Framework for Deep Collective Classification. arXiv preprint arXiv:1805.12421.
      • [2] Vijayan, P., Chandak, Y., Khapra, M. M., Parthasarathy, S & Ravindran, B. (2018). Fusion Graph Convolutional Networks. arXiv preprint arXiv:1805.12528.
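    • Sketch: a minimal numpy sketch (toy 4-node graph, random weights) of the kind of graph-convolutional propagation these kernels build on: add self-loops, aggregate neighbourhood features through a normalized adjacency, and apply a shared linear map and nonlinearity; stacking K such steps mixes information from K-hop neighbours.
      import numpy as np

      A = np.array([[0, 1, 1, 0],        # adjacency matrix of a 4-node toy graph
                    [1, 0, 0, 1],
                    [1, 0, 0, 1],
                    [0, 1, 1, 0]], dtype=float)
      X = np.eye(4)                      # node features (here just one-hot node ids)

      A_hat = A + np.eye(4)              # add self-loops so a node keeps its own features
      A_norm = A_hat / A_hat.sum(axis=1, keepdims=True)   # row-normalized aggregation (one common choice)

      rng = np.random.default_rng(0)
      W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 3))

      H1 = np.maximum(A_norm @ X @ W1, 0)   # hop 1: aggregate neighbours + transform (ReLU)
      H2 = A_norm @ H1 @ W2                 # hop 2: per-node class scores, 2-hop receptive field
      print(H2.shape)                       # (4 nodes, 3 classes)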

5. (01/10/2018): Introduction to Deep NLP through an application - Reading Comprehension with Multiple Choice Question Answering.

    • Speaker: Ananya Sai.
    • Space/Time: Oct 1, 2018, 3:30 to 4:30PM, Turing hall, Dept. of CSE
    • Abstract: The task of Reading Comprehension with Multiple Choice Questions requires a human (or machine) to read a given {passage, question} pair and select one of the n given options. For an AI agent to perform the task, we need to effectively encode the information in the passage and form representations of the options so that they can be compared in light of the question. Most models for this task first compute a question-aware vectorial representation of the passage and then select the option which has the maximum similarity with this representation. However, when humans perform this task they do not just focus on option selection but use a combination of elimination and selection. Specifically, a human would first try to eliminate the most irrelevant option and then read the passage again in the light of this new information (and perhaps ignore portions corresponding to the eliminated option). This process could be repeated multiple times until the reader is finally ready to select the correct option. This talk discusses some basics of neural networks for NLP (Natural Language Processing) and Question Answering, and then presents ElimiNet, a neural network-based model which tries to mimic this process of option elimination for MCQs. Specifically, ElimiNet has gates which decide whether an option can be eliminated given the {passage, question} pair and, if so, it tries to make the passage representation orthogonal to this eliminated option (akin to ignoring portions of the passage corresponding to the eliminated option). The model makes multiple rounds of partial elimination to refine the passage representation and finally uses a selection module to pick the best option. ElimiNet is evaluated on the recently released large-scale RACE dataset, where it outperforms the state-of-the-art model on 7 out of the 13 question types in the dataset. Further, an ensemble of this elimination-selection based method with a selection-based method gives an improvement of 3.1% over using just the selection-based approach on this dataset. (A tiny sketch of the orthogonalization step is given at the end of this entry.)
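    • Sketch: a tiny numpy illustration of the orthogonalization step described above (the vectors are made up): the passage representation is stripped of its component along an eliminated option's representation, so the refined passage vector ignores that option.
      import numpy as np

      p = np.array([2.0, 1.0, 0.0])          # passage representation (toy)
      o = np.array([1.0, 0.0, 0.0])          # representation of the eliminated option (toy)

      p_refined = p - (p @ o) / (o @ o) * o  # subtract the projection of p onto o
      print(p_refined, p_refined @ o)        # [0. 1. 0.] 0.0 -> orthogonal to the eliminated option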

6. (08/10/2018): Feature-based and end-to-end methods for source separation.

    • Speaker: Jilt Sebastian
    • Space/Time: Oct 8, 2018, 3:30 to 4:30PM, Turing hall, Dept. of CSE
    • Abstract: Estimating the constituent components of a mixture signal is of great interest to the audio research community, owing to its applications in audio-based enhancement and recommendation systems. Source separation is the task of extracting the relevant sources from a mixture of signals. In this talk, we will discuss deep learning techniques used for the source separation task. State-of-the-art techniques for source separation employ recurrent neural networks to predict source-specific mask functions, using magnitude-based features as input. We propose to use the modified group delay feature for musical source separation, which has a higher discriminative ability than the magnitude spectrum. The use of this network as the percussive separation stage in onset detection from musical mixtures will also be discussed. Motivated by attempts at end-to-end source separation, a task-dependent signal-to-signal conversion framework for information extraction will then be presented, with a focus on spike estimation from neuronal signals. (A minimal sketch of mask-based separation is given at the end of this entry.)
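    • Sketch: a minimal numpy sketch of the mask-based recipe mentioned above (shapes and mask values are placeholders; in practice a recurrent network predicts the mask from magnitude or group-delay features): a per-source time-frequency mask in [0, 1] is multiplied with the mixture spectrogram to estimate that source's spectrogram.
      import numpy as np

      rng = np.random.default_rng(0)
      mix_mag = np.abs(rng.normal(size=(513, 200)))                        # |STFT| of the mixture: freq bins x frames

      vocal_mask = 1.0 / (1.0 + np.exp(-rng.normal(size=mix_mag.shape)))   # stand-in for a predicted mask in (0, 1)

      vocal_mag = vocal_mask * mix_mag                                     # estimated vocal spectrogram
      accomp_mag = (1.0 - vocal_mask) * mix_mag                            # complementary mask -> accompaniment
      # each estimate is then inverted to audio with the mixture phase (inverse STFT)
      print(vocal_mag.shape, accomp_mag.shape)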

7. (15/10/2018): Zero-Shot/Few-Shot Learning and Learning with limited supervision.

    • Speaker: Ashish Mishra
    • Space/Time: Oct 15, 2018, 3:30 to 4:30PM, Turing hall, Dept. of CSE
    • Abstract: We present a generative framework for generalized zero-shot learning, where the training and test classes are not necessarily disjoint. Built upon a variational autoencoder based architecture, consisting of a probabilistic encoder and a probabilistic conditional decoder, our model can generate novel exemplars from seen/unseen classes, given their respective class attributes. These exemplars can subsequently be used to train any off-the-shelf classification model. One of the key aspects of our encoder-decoder architecture is a feedback-driven mechanism in which a discriminator (a multivariate regressor) learns to map the generated exemplars to the corresponding class attribute vectors, leading to an improved generator. Our model's ability to generate and leverage examples from unseen classes to train the classification model naturally helps to mitigate the bias towards predicting seen classes in generalized zero-shot learning settings. Through a comprehensive set of experiments, we show that our model outperforms several state-of-the-art methods on several benchmark datasets, for both standard and generalized zero-shot learning. (A small sketch of the exemplar-generation recipe is given at the end of this entry.)
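    • Sketch: a small illustrative sketch of the overall recipe (the decode function and attribute vectors below are hypothetical placeholders, not the authors' model): once the conditional decoder is trained, exemplars for unseen classes are synthesised from their class attributes and used to train an off-the-shelf classifier.
      import numpy as np
      from sklearn.linear_model import LogisticRegression

      def decode(z, a):
          # hypothetical stand-in for a trained conditional decoder p(x | z, attributes)
          rng = np.random.default_rng(0)
          W = rng.normal(size=(z.shape[1] + a.shape[0], 64))
          return np.tanh(np.concatenate([z, np.tile(a, (z.shape[0], 1))], axis=1) @ W)

      unseen_attrs = {3: np.ones(16), 4: -np.ones(16)}             # class id -> class-attribute vector (toy)
      X_syn, y_syn = [], []
      for cls, a in unseen_attrs.items():
          z = np.random.default_rng(cls).normal(size=(200, 32))    # latent samples
          X_syn.append(decode(z, a))                               # synthetic exemplars for the unseen class
          y_syn.append(np.full(200, cls))
      clf = LogisticRegression(max_iter=1000).fit(np.vstack(X_syn), np.concatenate(y_syn))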

8. (29/10/2018): A Brief Overview of Speech Recognition Systems.

    • Speaker: Sai Prabhakar
    • Space/Time: Oct 29, 2018, 3:30 to 4:30PM, Turing hall, Dept. of CSE
    • Abstract: Speech recognition is a vast field with a long research history. Commercial speech recognition systems have already started to impact our daily lives, with large untapped potential. Along with natural language processing, speech recognition is a field where end-to-end models are yet to outperform non-end-to-end hybrid approaches. In this talk, we take a quick look at speech recognition systems, from the classical GMM-HMM pipeline to more recent attention-based approaches to automatic speech recognition. We compare different approaches along with their performance on the standard Switchboard speech corpus. This talk is meant to give non-experts in the field a general understanding of speech recognition approaches, performance, and challenges.

9. (12/11/2018): Boosting, Margins and Generalisation.

    • Speaker: Harish
    • Space/Time: Nov 12, 2018, 3:15 to 4:15PM, Turing hall, Dept. of CSE
    • Abstract:

10. (19/11/2018): Preference Aggregation with Battling Bandits

    • Speaker: Aadirupa Saha. IISc.
    • Space/Time: Nov 19, 2018, 3:30 to 4:30PM, Turing hall, Dept. of CSE
    • Abstract: The multi-armed bandit (MAB) problem is a classical example of online learning under uncertainty with partial information, where the objective is to maximize the total profit over time by sequentially playing one bandit arm (decision) at a time. However, MAB requires the absolute reward of the chosen arm to be revealed at each play, whereas in reality a relative judgement -- "item a is better than item b" -- is usually easier to elicit than its absolute counterpart -- "the value of item a is 4, and of b is 7", etc. This led to the well-known Dueling Bandit problem of learning from pairwise preferences. Motivated by various real-world problems, we note that it is often even easier and more budget-friendly to collect subsetwise preferences, which also offer the flexibility to collect different types of feedback: the winning item, the top-m items, a full ordering, etc. This inspires us to formulate "Battling Bandits (BB)" --- a generalization of Dueling Bandits where, given a set of n alternatives, the goal is to learn by sequentially choosing a small k-sized subset at each round (k < n) and observing setwise preference feedback. Clearly, the main challenge of BB lies in handling the combinatorially large decision space of O(n^k) choices. We will see how to handle this problem for certain classes of parametrized feedback models, and seek answers to some interesting questions: Does the luxury of playing general k-sized sets really help in faster information aggregation? What are the fundamental limits of performance for different types of preference feedback? To the best of our knowledge, we are the first to propose such a generalization of Dueling Bandits to subsetwise preferences in the online setup. This is based on joint work with Prof. Aditya Gopalan (ECE, IISc); an initial version appeared at UAI 2018.


11. (3/12/2018): Necessity of Causal Explainability in Artificial Intelligence

    • Speaker: Pavan Ravishankar
    • Space/Time: Dec 3, 2018, 3:30 to 4:30PM. Turing hall, Dept. of CSE
    • Abstract: As AI permeates human lives, it is important to incorporate transparency into decision making. Over time, the AI community will be answerable for the successes and failures of AI systems, and causality as a concept becomes imperative in such scenarios. If a machine cannot explain why it did what it did, or what it saw, in terms of human-like causal and analogical concepts, it becomes difficult to give it a fair trial when it is involved in a conflicting scenario. A self-driving car killing a human will always be seen as a mistake of the AI unless it can explain that it had a brake failure, or that had it not hit the human, it would have crashed into a moving car. Although producing such causal explanations may take time, we need to start addressing the problem of causal explainability now. The talk will address questions such as: Why is causality important? What impact will it have in the future? How important are hierarchies in representations and decision making? How is this different from the current state of artificial intelligence, as seen from Pearl's formulation of causality? What are the ethical issues?

12. (24/12/2018): On Stein's Identity and Gradient and Hessian-Free Stochastic Non-Convex Optimization.

    • Speaker: Dr. Krishnakumar, UC Davis
    • Space/Time: Dec 24, 2018, 3:30 to 4:30PM. Turing hall, Dept. of CSE
    • Abstract: Gaussian-smoothing-based techniques for gradient-free stochastic optimization are common in the optimization literature. It will be shown that such techniques are essentially instantiations of Stein's identity, popular in the statistics literature. Based on this relationship, the following results will be discussed. First, for constrained non-convex optimization problems, I will introduce a gradient-free conditional gradient algorithm that achieves rates similar to the gradient-free stochastic gradient descent algorithm for the unconstrained setting. Next, under a structural sparsity assumption on the optimization problem, I will illustrate an implicit regularization phenomenon in which the standard gradient-free stochastic gradient algorithm adapts to the sparsity of the problem at hand by just varying the step-size. I will then discuss a truncated gradient-free stochastic gradient algorithm whose rate of convergence depends only poly-logarithmically on the dimensionality. Finally, leveraging the second-order Stein's identity, I will introduce a Hessian (and gradient)-free cubic-regularized Newton method with zeroth-order information and show that it escapes saddle points and converges to second-order stationary points at rates comparable to the standard cubic-regularization method with full Hessian (and gradient) information. Joint work with Saeed Ghadimi. (A small sketch of the Gaussian-smoothing gradient estimator is given at the end of this entry.)
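    • Sketch: a small numpy sketch of the standard Gaussian-smoothing (two-point, zeroth-order) gradient estimator underlying such methods (the quadratic objective and step sizes below are made up): with u drawn from N(0, I), g = (f(x + mu*u) - f(x)) / mu * u estimates the gradient of the smoothed objective and can drive a gradient-free SGD step.
      import numpy as np

      def f(x):                                     # black-box objective: only function values are available
          return np.sum(x ** 2)

      rng = np.random.default_rng(0)
      x = rng.normal(size=10)
      mu, step = 1e-4, 0.02                         # smoothing radius and (small) step size
      print(f(x))                                   # initial objective value

      for _ in range(500):
          u = rng.normal(size=x.shape)              # random Gaussian direction
          g = (f(x + mu * u) - f(x)) / mu * u       # two-point zeroth-order gradient estimate
          x = x - step * g                          # gradient-free "SGD" step
      print(f(x))                                   # much smaller: the estimator drives the descent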

13. (8/4/2019): Thompson Sampling in Multi-Armed Bandits

    • Space/Time: Monday April 8, 4:00 PM, BSB 361 (Turing Hall), Dept. of CSE.
    • Speaker: Dr. Abhishek Sinha
    • Abstract: In this talk, we will survey the technique of Thompson Sampling (TS) in the context of multi-armed bandits. After giving a few examples of bandit problems, we will discuss various statistical and computational techniques for implementing approximate TS where exact TS cannot be carried out efficiently. Next, we will survey some recent theoretical advances on the performance of TS, briefly outlining some of the proof techniques. We will conclude the talk by showing some experimental results on the application of TS to real data sets. (A short sketch of TS for Bernoulli bandits is given at the end of this entry.)
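    • Sketch: a short self-contained Python sketch (toy arm means, not from the talk) of exact Thompson Sampling for Bernoulli bandits with Beta priors: sample a plausible mean for each arm from its posterior, play the arm with the largest sample, and update that arm's posterior with the observed reward.
      import numpy as np

      rng = np.random.default_rng(0)
      true_means = np.array([0.3, 0.5, 0.7])             # unknown to the learner
      alpha = np.ones(3)                                  # Beta(1, 1) priors: alpha counts successes + 1
      beta = np.ones(3)                                   #                    beta  counts failures  + 1
      pulls = np.zeros(3, dtype=int)

      for t in range(5000):
          theta = rng.beta(alpha, beta)                   # one posterior sample per arm
          arm = int(np.argmax(theta))                     # play the arm that currently looks best
          reward = float(rng.random() < true_means[arm])  # Bernoulli reward
          alpha[arm] += reward                            # conjugate posterior update
          beta[arm] += 1.0 - reward
          pulls[arm] += 1
      print(pulls)                                        # pulls concentrate on the best arm (mean 0.7)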


14. (15/4/2019): Duelling Bandits - A tour

    • Space/Time: Monday April 15, 4:00 PM, ALC (MR1), Dept. of CSE.
    • Speaker: Dr. Arun Rajkumar
    • Abstract: In this talk, we will introduce the problem of Duelling Bandits, an extension of the traditional multi-armed bandit problem to the scenario where only pairwise preferences can be elicited. We will define the notion of regret in this setup and study several popular algorithms that have been proposed in the literature over the last decade. These algorithms, as we will see, work well under increasingly general assumptions and enjoy good theoretical regret guarantees. A brief introduction to classical UCB-style algorithms will also be given to set the context.