Research

Choosing the sample with the lowest loss makes SGD robust [ArXiv]

Joint work with Xiaoxia Wu and Sujay Sanghavi

To Appear in AISTATS 2020

Abstract: The presence of outliers can significantly skew the parameters of machine learning models trained via stochastic gradient descent (SGD). In this paper we propose a simple variant of SGD: in each step, first choose a set of k samples, then from these choose the one with the smallest current loss, and do an SGD-like update with this chosen sample. Vanilla SGD corresponds to k = 1, i.e. no choice; k >= 2 represents a new algorithm that is, however, effectively minimizing a non-convex surrogate loss. Our main contribution is a theoretical analysis of the robustness properties of this idea for ML problems which are sums of convex losses; these results are backed up with linear regression and small-scale neural network experiments.
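As an illustration, here is a minimal sketch of the update rule in NumPy (the function names and the per-sample loss/gradient interface are illustrative, not from the paper):

import numpy as np

def min_loss_sgd_step(w, X, y, k, lr, loss_fn, grad_fn):
    """One step of the proposed variant: draw k samples, keep the one with
    the smallest current loss, and take an SGD-like step on it."""
    n = X.shape[0]
    idx = np.random.choice(n, size=k, replace=False)   # the k candidate samples
    losses = [loss_fn(w, X[i], y[i]) for i in idx]     # current loss of each candidate
    i_star = idx[int(np.argmin(losses))]               # minimum-loss sample
    return w - lr * grad_fn(w, X[i_star], y[i_star])

# Example interface for linear regression with squared loss (k = 1 is vanilla SGD):
sq_loss = lambda w, x, y: 0.5 * (x @ w - y) ** 2
sq_grad = lambda w, x, y: (x @ w - y) * x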

Negative sampling in semi-supervised learning [ArXiv]

Joint work with John Chen and Anastasios Kyrillidis

Abstract: We introduce Negative Sampling in Semi-Supervised Learning (NS3L), a simple, fast, easy-to-tune algorithm for semi-supervised learning (SSL). NS3L is motivated by the success of negative sampling/contrastive estimation. We demonstrate that adding the NS3L loss to state-of-the-art SSL algorithms, such as Virtual Adversarial Training (VAT), significantly improves upon vanilla VAT and its variant, VAT with Entropy Minimization. Adding the NS3L loss to MixMatch, the current state-of-the-art approach on semi-supervised tasks, yields significant improvements over vanilla MixMatch. We conduct extensive experiments on the CIFAR10, CIFAR100, SVHN and STL-10 benchmark datasets.
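As a rough illustration of attaching a negative-sampling loss to unlabeled data (the thresholding rule below is an illustrative simplification, not the exact NS3L formulation):

import torch

def negative_sampling_loss(logits, threshold=0.05):
    """On unlabeled inputs, treat classes the model already deems unlikely
    (probability below `threshold`) as negatives and push them further down."""
    probs = torch.softmax(logits, dim=1)
    neg_mask = (probs < threshold).float()             # likely-negative classes
    loss = -(neg_mask * torch.log(1.0 - probs + 1e-8)).sum(dim=1)
    return loss.mean()

# hypothetical combined objective:
# total_loss = supervised_loss + vat_loss + lam * negative_sampling_loss(model(x_unlabeled))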

On Generalization of Adaptive Methods for the Over-parameterized Setting [ArXiv]

Joint work with Soumya Basu, Anastasios Kyrillidis and Sujay Sanghavi

In NeurIPS Workshop on Integration of Deep Learning Theories, 2018.

Abstract: Stochastic gradient descent (SGD) is the de facto algorithm for training deep neural networks (DNNs). Despite its popularity, it still requires fine-tuning in order to achieve its best performance. This has led to the development of adaptive methods that claim automatic hyper-parameter tuning. Recently, researchers have studied both algorithmic classes via toy examples: e.g., for over-parameterized linear regression, Wilson et al. (2017) show that, while SGD always converges to the minimum-norm solution, adaptive methods show no such inclination, leading to worse generalization. Our aim is to study this conjecture further. We empirically show that the minimum weight norm is not necessarily the proper gauge of good generalization in simplified scenarios, and that different models found by adaptive methods could outperform plain gradient methods. In practical DNN settings, we observe that adaptive methods can outperform SGD, outputting models with larger weight norms, but without necessarily reducing the amount of tuning required.
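A small synthetic experiment in the spirit of the simplified setting (data sizes and hyper-parameters are illustrative): fit an over-parameterized linear regression with plain gradient descent and with an Adam-style update, then compare the norms of the interpolating solutions.

import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                                   # fewer samples than features
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# minimum-norm solution that interpolates the data exactly
w_min_norm = X.T @ np.linalg.solve(X @ X.T, y)

def fit(adaptive, lr=0.05, steps=20000, eps=1e-8):
    w, m, v = np.zeros(d), np.zeros(d), np.zeros(d)
    for t in range(1, steps + 1):
        g = X.T @ (X @ w - y) / n
        if adaptive:                             # Adam-style adaptive update
            m = 0.9 * m + 0.1 * g
            v = 0.999 * v + 0.001 * g ** 2
            w -= lr * (m / (1 - 0.9 ** t)) / (np.sqrt(v / (1 - 0.999 ** t)) + eps)
        else:                                    # plain gradient descent
            w -= lr * g
    return w

# both fits drive the training loss to (near) zero; compare solution norms
print(np.linalg.norm(fit(False)), np.linalg.norm(fit(True)), np.linalg.norm(w_min_norm))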

On Robust Learning of Ising Models [Link]

Joint work with Erik Lindgren, Yanyao Shen, Alex Dimakis and Adam Klivans

In NeurIPS Workshop on Relational Representation Learning, 2018.

Abstract: Ising models are among the most popular classes of probability distributions, with applications in wide-ranging fields such as physics, engineering and finance. In this paper, we attempt to learn the underlying graphical model robustly in the presence of adversarial corruptions, and we establish new lower and upper bounds for robustly learning Ising models.

Matrix Factorization with Side and Higher Order Information [ArXiv]

Joint work with Nikhil Rao and Weicong Ding (work done as an intern at Technicolor Research, Bay Area)

In 13th International KDD Workshop on Mining and Learning with Graphs, 2017.

Abstract: The problem of predicting unobserved entries of a partially observed matrix has found wide applicability in several areas, such as recommender systems, computational biology, and computer vision. Many scalable methods with rigorous theoretical guarantees have been developed for algorithms where the matrix is factored into low-rank components, and an embedding is learned for the row and column variables. While there has been recent research on incorporating explicit side information in the low-rank matrix factorization setting, often implicit information can be gleaned from the data, via higher-order interactions among variables. In this paper, we design a method to make use of this implicit information, via random walks on graphs. We show that the problem we intend to solve can be cast as factoring a nonlinear transform of the observed matrix, and develop an efficient coordinate-descent-based algorithm to solve it. Experiments on several datasets show that the method we propose outperforms vanilla matrix factorization, as well as methods that use the available side information.
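As a sketch of the general idea (the transform and solver below are simplified stand-ins; the paper's actual transform and coordinate descent algorithm differ), one can smooth the observed matrix with multi-hop random-walk probabilities from a side-information graph and then factorize the result:

import numpy as np

def random_walk_matrix(A, steps=2):
    """Row-normalized adjacency raised to `steps`: multi-hop transition probabilities."""
    P = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)
    return np.linalg.matrix_power(P, steps)

def factorize(T, rank=10, lr=0.01, iters=500, seed=0):
    """Gradient-descent factorization T ~ U V^T (stand-in for coordinate descent)."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((T.shape[0], rank))
    V = 0.1 * rng.standard_normal((T.shape[1], rank))
    for _ in range(iters):
        R = U @ V.T - T                          # residual
        U, V = U - lr * R @ V, V - lr * R.T @ U  # simultaneous gradient step
    return U, V

# Hypothetical usage: M is the observed matrix, A_row a graph over its rows.
# T = random_walk_matrix(A_row) @ M              # walk-smoothed target
# U, V = factorize(T)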

Trading-off variance and complexity in stochastic gradient descent [ArXiv]

Joint work with Megasthenis Asteris, Anastasios Kyrillidis and Sujay Sanghavi

Abstract: Stochastic gradient descent (SGD) is the method of choice for large-scale machine learning problems, by virtue of its light complexity per iteration. However, it lags behind its non-stochastic counterparts with respect to the convergence rate, due to the high variance introduced by the stochastic updates. The popular Stochastic Variance-Reduced Gradient (SVRG) method mitigates this shortcoming, introducing a new update rule which requires infrequent passes over the entire input dataset to compute the full gradient. In this work, we propose CheapSVRG, a stochastic variance-reduction optimization scheme. Our algorithm is similar to SVRG, but instead of the full gradient it uses a surrogate that can be efficiently computed on a small subset of the input data. It achieves a linear convergence rate, up to an error level that depends on the nature of the optimization problem, and features a trade-off between computational complexity and convergence rate. Empirical evaluation shows that CheapSVRG performs at least competitively with the state of the art.
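A schematic of the update rule, assuming a user-supplied per-sample gradient grad_i (epoch lengths and batch sizes here are illustrative placeholders):

import numpy as np

def cheap_svrg(grad_i, w0, n, lr=0.01, epochs=20, inner=100, batch=50, seed=0):
    """SVRG-style loop where the 'full' gradient snapshot is approximated
    on a random subset of `batch` samples instead of all n (the CheapSVRG idea)."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(epochs):
        w_snap = w.copy()
        S = rng.choice(n, size=batch, replace=False)            # cheap surrogate snapshot
        mu = np.mean([grad_i(w_snap, i) for i in S], axis=0)
        for _ in range(inner):
            i = rng.integers(n)
            # variance-reduced stochastic update
            w -= lr * (grad_i(w, i) - grad_i(w_snap, i) + mu)
    return w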

Trust-Based Recommender Systems

Course Project, Instructor: Prof. Joydeep Ghosh, UT Austin (Joint work with Divya Gadde, Sindhu Maiyya and Yamini Gotimukul)

We evaluated the performance of user-based and item-based collaborative filtering (CF) methods, as well as three trust-based recommender systems: Pure Trust, Random TrustWalker and Augmented Trust, on the Epinions dataset. RMSE, coverage and F-measure were used to compare the performance of these algorithms. Both Random TrustWalker and Augmented Trust make use of the trust network as well as collaborative filtering to make recommendations, thereby improving coverage tremendously for a marginal trade-off in accuracy. For cold-start users, we saw that the coverage almost doubles for trust-based methods when compared to traditional CF-based methods. Finally, we also experimented with an algorithm that extends Augmented Trust to produce top-N recommendations.
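For reference, the two headline metrics can be computed as follows (the predict interface, returning None when no prediction is possible, is an assumption of this sketch):

import math

def rmse_and_coverage(predict, test_ratings):
    """RMSE over the pairs the system can score; coverage is the fraction
    of test pairs for which any prediction exists."""
    errs, covered = [], 0
    for user, item, rating in test_ratings:
        p = predict(user, item)                # None when no prediction is possible
        if p is not None:
            covered += 1
            errs.append((p - rating) ** 2)
    rmse = math.sqrt(sum(errs) / len(errs)) if errs else float("nan")
    return rmse, covered / len(test_ratings)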

Additional Resources:

GitHub Repository (Python implementation)

Project Report

Clustering Sand Dunes Using the Radon Transform

Project, Advisor: Prof. B.K. Mohan, IIT Bombay

Statistical features of the Radon transform orientation, such as mode, entropy and standard deviation, were obtained for each pixel using a window-based processing technique. The sand dunes were then clustered based on these features, using a thresholding process for each dune orientation. We also developed an algorithm to detect temporal changes occurring in the boundaries of the clustered regions, and tested it on desert regions in India, China, Mexico and Libya.
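A sketch of the per-window feature extraction, using scikit-image's Radon transform (the variance-based orientation score is one common heuristic; the exact features used in the project may differ):

import numpy as np
from skimage.transform import radon

def orientation_features(window, angles=np.arange(180.0)):
    """Radon-based orientation statistics for one image window."""
    sino = radon(window, theta=angles, circle=False)  # one projection per angle
    energy = sino.var(axis=0)            # projection variance peaks near the dune orientation
    p = energy / energy.sum()            # normalized orientation distribution
    mode = angles[int(np.argmax(p))]
    entropy = -np.sum(p * np.log(p + 1e-12))
    mean = np.sum(p * angles)
    std = np.sqrt(np.sum(p * (angles - mean) ** 2))   # ignores angular wrap-around, for simplicity
    return mode, entropy, std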

Publications:

Change Detection of Desert Sand Dunes: A Remote Sensing Approach

Unchanging Desert Sand Dunes

Segmentation of desert sand dunes

Energy Efficient Scheduling Using Green Energy Sources

Dual Degree Dissertation, Advisors: Prof. Abhay Karandikar, Prof. Prasanna Chaporkar, IIT Bombay

The first problem in this thesis concerns finding the optimal power allocation policy in the presence of alternate (green) energy sources. These renewable sources pose a challenge of their own due to their unpredictable nature, so existing policies may have to be modified to suit them. The water-filling technique was modified to achieve maximum capacity for point-to-point links powered by a green energy source.
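For reference, a sketch of the standard water-filling baseline that the thesis modifies (the renewable-energy variant is not shown):

import numpy as np

def water_filling(gains, P_total):
    """Allocate total power P across channels with gains g_i, maximizing
    sum_i log(1 + g_i * p_i) subject to sum_i p_i = P and p_i >= 0."""
    g = np.sort(np.asarray(gains, dtype=float))[::-1]    # best channels first
    for k in range(len(g), 0, -1):                       # try keeping the k best channels
        level = (P_total + np.sum(1.0 / g[:k])) / k      # candidate water level
        if level - 1.0 / g[k - 1] >= 0:                  # all k allocations non-negative
            break
    return np.maximum(level - 1.0 / np.asarray(gains, dtype=float), 0.0)

print(water_filling([2.0, 1.0, 0.5], P_total=1.0))       # -> [0.75, 0.25, 0.]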

The second problem focuses on determining an index policy for existing scheduling problems. The close resemblance of the scheduling problem to the Restless Bandit Problem motivates this approach. The thesis considers two such problems and tries to determine whether they satisfy certain indexability criteria, which in turn are necessary for the existence of an index policy.

Additional Resources:

Dual Degree Thesis

Minimizing System Backlog in Wireless Networks using Bandit Optimization

Internship, Advisor: Prof. Alexandre Proutiere, KTH Sweden

Modeled the downlink scheduling system in wireless networks as a Restless Multi-Armed Bandit problem. Showed that the Whittle indexability criteria hold for the problem of minimizing system backlog with delayed queue-state information, for a two-channel-state, two-action model. Obtained a closed-form expression for the Whittle index, which revealed a threshold-like structure.
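For context, a generic sketch of how an index policy is applied once a per-queue Whittle index is available (the index function below is a placeholder, not the closed form derived in this work):

def whittle_schedule(queues, index_fn, m=1):
    """Index policy: each round, serve the m queues with the largest Whittle index."""
    indices = {q: index_fn(state) for q, state in queues.items()}
    return sorted(indices, key=indices.get, reverse=True)[:m]

# e.g. with a hypothetical threshold-structured index in the (delayed) queue length b:
# served = whittle_schedule({"q1": 3, "q2": 7}, index_fn=lambda b: max(b - 5, 0))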