Testing Lipschitz property over hyper-cube domain under non-uniform distribution and its applications to privacy, 2012: We design algorithms to test the privacy guarantees of a given algorithm A when run on a dataset x containing potentially sensitive information about individuals. More formally, we design a computationally efficient algorithm T_priv that verifies whether A satisfies the differential privacy on typical datasets (DPTD) guarantee in time sublinear in the size of the domain of the datasets. DPTD, a notion similar to the generalized differential privacy first proposed by [Bhaskar et al., 2013], is a distributional relaxation of the popular notion of differential privacy.

To design algorithm T_priv, we establish a formal connection between testing the privacy guarantee of an algorithm and testing the Lipschitz property of a related function. More specifically, we show that an efficient algorithm for testing the Lipschitz property can be used as a subroutine in T_priv to test whether an algorithm satisfies differential privacy on typical datasets.

Apart from formalizing the connection between testing privacy guarantees and testing the Lipschitz property, we generalize the work of [Jha and Raskhodnikova, 2011] to the setting of property testing under product distributions. More precisely, we design an efficient Lipschitz tester for the case where the domain points are drawn from the hypercube according to some fixed but unknown product distribution instead of the uniform distribution.
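The flavor of such a tester can be conveyed by a toy edge tester on the hypercube {0,1}^n: sample edges with endpoints weighted by a product distribution and reject on any violated edge. The sketch below (names and parameters are ours) only illustrates the testing paradigm; the paper's algorithm and analysis are considerably more involved.

```python
import random

def lipschitz_edge_tester(f, n, dist, trials=1000):
    """Toy edge tester for the Lipschitz property on {0,1}^n under a
    product distribution (dist[i] = Pr[x_i = 1]).  Samples hypercube
    edges and rejects if any sampled edge violates |f(x) - f(y)| <= 1.
    An illustration of the paradigm, not the paper's algorithm."""
    for _ in range(trials):
        x = [1 if random.random() < p else 0 for p in dist]
        i = random.randrange(n)            # flip one coordinate: an edge
        y = x[:]
        y[i] = 1 - y[i]
        if abs(f(tuple(x)) - f(tuple(y))) > 1:
            return False                   # violated edge found: reject
    return True                            # no violation seen: accept
```

For instance, f(x) = sum(x) changes by exactly 1 along every edge and is accepted, while f(x) = 2*sum(x) violates every edge and is rejected on the first sample.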

It is a joint work with Kashyap Dixit [Pennsylvania State University], Madhav Jha [Sandia National Laboratory] and Sofya Raskhodnikova [Pennsylvania State University].

Privacy and stability of online advertisement, 2012-present: The goal of this project is to develop a systematic approach to analyzing the stability of online advertisement systems such as Bing Ads or Google Ads, and then to use the resulting stability guarantees to design differentially private algorithms. One key feature of our approach is that our private algorithms are often oblivious to the exact workings of the underlying advertisement system; they come as a ``wrapper'' over the underlying system.

It is a joint work with Misha Bilenko [MSR Redmond], Cynthia Dwork [MSR Silicon Valley], Muthu Muthukrishnan [MSR India], Guy Rothblum [MSR Silicon Valley] and Helen Wang [MSR Redmond].

Differentially private sparse high-dimensional regression with optimal sample complexity, 2012-2013: We design differentially private algorithms for statistical model selection. Given a data set and a large, discrete collection of "models", each of which is a family of probability distributions, the goal is to determine the model that best "fits" the data. This is a basic problem in many areas of statistics and machine learning. We consider settings in which there is a well-defined answer, in the following sense: Suppose that there is a nonprivate model selection procedure f which is the reference to which we compare our performance. Our differentially private algorithms output the correct value f(D) whenever f is stable on the input data set D. We work with two notions, perturbation stability and sub-sampling stability.

We give two classes of results: generic ones, that apply to any function with discrete output set; and specific algorithms for the problem of sparse linear regression. The algorithms we describe are efficient and in some cases match the optimal non-private asymptotic sample complexity.

Our algorithms for sparse linear regression require analyzing the stability properties of the popular LASSO estimator. We give sufficient conditions for the LASSO estimator to be robust to small changes in the data set, and show that these conditions hold with high probability under essentially the same stochastic assumptions that are used in the literature to analyze convergence of the LASSO.
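The sub-sampling stability idea can be sketched as follows: run the non-private selector on many small random subsamples and check whether one model wins a large majority of them. The sketch below is a hypothetical illustration of the stability check only; the actual private algorithm perturbs the vote counts with calibrated noise before deciding.

```python
import random
from collections import Counter

def subsampling_stable_select(f, data, q=0.1, m=50, threshold=0.8):
    """Hypothetical sub-sampling stability check: run the non-private
    selector f on m random subsamples (each record kept with prob. q)
    and return the winning model only if it wins at least a `threshold`
    fraction of the votes.  The actual private algorithm adds calibrated
    noise to the vote counts before this comparison."""
    votes = Counter()
    for _ in range(m):
        subsample = [record for record in data if random.random() < q]
        votes[f(subsample)] += 1
    model, count = votes.most_common(1)[0]
    return model if count >= threshold * m else None   # None: not stable
```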

It is a joint work with Adam Smith [Pennsylvania State University].

Differentially private kernel learning, 2012: In this paper, we consider the problem of differentially private learning where access to the training features is through a kernel function only. As mentioned in Chaudhuri et al. (2011), the problem seems to be intractable for general kernel functions in the standard learning model of releasing a differentially private predictor. We study this problem in three simpler but practical settings. We first study an interactive model where the user sends its test points to a trusted learner (such as a search engine) and expects accurate but differentially private predictions. In the second model, the learner has access to a subset of the unlabeled test set, using which it releases a predictor that preserves the privacy of the training data. (NIH, 2003) is an example of such a publicly available test set. Our third model is similar to the traditional model, where the learner is oblivious to the test set but the kernels are restricted to functions over vector spaces. For each of these models, we derive differentially private learning algorithms with provable utility or error bounds. Moreover, we show that our methods can also be applied to the traditional model, where they demonstrate better dimensionality dependence than the methods of (Rubinstein et al., 2009; Chaudhuri et al., 2011). Finally, we provide experimental validation of our methods.

It is a joint work with Prateek Jain [MSR, India].

Unified learning theoretic approach towards interactive database privacy, 2011-2012: This project is about unifying previous online-learning-based differentially private interactive query response mechanisms, such as the multiplicative weights update and the Frieze/Kannan update. We show that all these mechanisms are special instances of a general online learning framework called the \emph{mirror descent algorithm}. We also show that our unified mechanism works for a larger class of datasets and queries than previous work.
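As a concrete instance of the framework, mirror descent with the entropic mirror map over the probability simplex reduces exactly to the multiplicative weights update. The sketch below shows this reduction generically; it is not the private query-release mechanism itself.

```python
import math

def multiplicative_weights(losses, eta=0.5):
    """Mirror descent over the probability simplex with the entropic
    mirror map, which is exactly the multiplicative-weights update.
    `losses` is a list of per-round loss vectors; returns the final
    distribution over the n coordinates.  Generic illustration only."""
    n = len(losses[0])
    w = [1.0 / n] * n                       # start at the uniform distribution
    for loss in losses:
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, loss)]
        total = sum(w)
        w = [wi / total for wi in w]        # Bregman projection = renormalize
    return w
```

After a few rounds of penalizing one coordinate, the weight shifts to the unpenalized one, exactly as multiplicative weights would.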

It is a joint work with Prateek Jain [MSR, India].

Differentially private online learning, 2011-2013: We provide a general technique for making online learning algorithms differentially private, in both the full information and bandit settings. Our technique applies to algorithms that aim to minimize a convex loss function which is a sum of smaller convex loss terms, one for each data point. We modify the popular mirror descent approach, or rather a variant called follow the approximate leader. The technique yields the first algorithms for private online learning in the bandit setting. In the full information setting, our algorithms improve over the regret bounds of previous work. In many cases, our algorithms (in both settings) match the dependence on the input length, T, of the optimal non-private regret bounds up to logarithmic factors in T. Our algorithms require logarithmic space and update time.
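A standard building block behind such private online learners is the binary-tree counter mechanism, which releases all prefix sums of a stream so that each element affects only O(log T) noisy tree nodes. The sketch below is a simplified illustration of that building block (parameter names are ours), not the paper's learning algorithm.

```python
import math
import random

def laplace(scale):
    # Laplace(0, scale) as the difference of two exponential samples
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def tree_prefix_sums(stream, eps=1.0):
    """Binary-tree counter mechanism: every stream element enters
    O(log T) tree nodes, each released with Laplace noise, so all T
    prefix sums come out with only polylogarithmic error.  Simplified
    sketch of the building block, not the paper's algorithm."""
    T = len(stream)
    L = math.floor(math.log2(T)) + 1        # number of tree levels
    alpha = [0.0] * (L + 1)                 # exact partial sums per level
    noisy = [0.0] * (L + 1)                 # their noisy releases
    out = []
    for t in range(1, T + 1):
        i = 0                               # lowest set bit of t: the node
        while (t >> i) & 1 == 0:            # that closes at time t
            i += 1
        alpha[i] = sum(alpha[:i]) + stream[t - 1]
        for j in range(i):                  # children folded into node i
            alpha[j] = noisy[j] = 0.0
        noisy[i] = alpha[i] + laplace(L / eps)
        # prefix sum = noisy nodes along the binary representation of t
        out.append(sum(noisy[j] for j in range(L + 1) if (t >> j) & 1))
    return out
```

In a private online learner the stream entries would be (bounded) gradients, and the learner updates against the noisy prefix sums instead of the exact ones.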

It is a joint work with Adam Smith [Pennsylvania State University]. This is a follow up work on an earlier paper with Prateek Jain [MSR India] and Pravesh Kothari [University of Texas at Austin] from COLT 2012.

Differential privacy for cloud systems, 2011: It is often highly valuable for organizations to have their data analyzed by external agents. However, any program that computes on potentially sensitive data risks leaking information through its output. Differential privacy provides a theoretical framework for processing data while protecting the privacy of individual records in a dataset. Unfortunately, it has seen limited adoption because of the loss in output accuracy, the difficulty in making programs differentially private, the lack of mechanisms to describe the privacy budget in a programmer's utilitarian terms, and the challenging requirement that data owners and data analysts manually distribute the limited privacy budget between queries.

This paper presents the design and evaluation of a new system, GUPT, that overcomes these challenges. Unlike existing differentially private systems such as PINQ and Airavat, it guarantees differential privacy to programs not developed with privacy in mind, makes no trust assumptions about the analysis program, and is secure to all known classes of side-channel attacks.

GUPT uses a new model of data sensitivity that degrades privacy of data over time. This enables efficient allocation of different levels of privacy for different user applications while guaranteeing an overall constant level of privacy and maximizing the utility of each application. GUPT also introduces techniques that improve the accuracy of output while achieving the same level of privacy. These approaches enable GUPT to easily execute a wide variety of data analysis programs while providing both utility and privacy.
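GUPT builds on the sample-and-aggregate paradigm, which can be sketched as follows: partition the data into blocks, run the untrusted analysis independently on each block, and release the noisy average of the clamped per-block outputs. Parameter names below are illustrative, not GUPT's API, and the real system handles budget allocation and output ranges far more carefully.

```python
import random

def laplace(scale):
    # Laplace(0, scale) as the difference of two exponential samples
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def sample_and_aggregate(analysis, data, blocks=10, lo=0.0, hi=1.0, eps=1.0):
    """Sample-and-aggregate sketch: partition the data into blocks, run
    the untrusted analysis on each block, clamp each output to [lo, hi],
    and release the Laplace-noised average.  Changing one record changes
    at most one block's output, so sensitivity is (hi - lo) / blocks."""
    random.shuffle(data)
    size = len(data) // blocks
    outputs = []
    for b in range(blocks):
        o = analysis(data[b * size:(b + 1) * size])
        outputs.append(min(max(o, lo), hi))           # clamp to output range
    sensitivity = (hi - lo) / blocks
    return sum(outputs) / blocks + laplace(sensitivity / eps)
```

Because the analysis is only ever run on disjoint blocks and its outputs are clamped, no trust assumption is needed about the analysis code itself.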

It is a joint work with Prashanth Mohan [University of California, Berkeley], Elaine Shi [Researcher, University of California, Berkeley], Dawn Song [Associate Professor, University of California, Berkeley] and David Culler [University of California, Berkeley].

Differentially private convex optimization and high-dimensional regression, 2010-2012: We consider differentially private algorithms for convex empirical risk minimization (ERM). Differential privacy is a recently introduced notion of privacy which guarantees that an algorithm's output does not depend on the data of any individual in the dataset. This is crucial in fields that handle sensitive data, such as genomics, collaborative filtering, and economics. Our motivation is the design of private algorithms for sparse learning problems, in which one aims to find solutions (e.g., regression parameters) with few non-zero coefficients. To this end:

(a) We significantly extend the analysis of the ``objective perturbation'' algorithm of Chaudhuri et al., 2011 for convex ERM problems. We show that their method can be modified to use less noise (be more accurate), and to apply to problems with hard constraints and non-differentiable regularizers. We also give a tighter, data-dependent analysis of the additional error introduced by their method.

A key tool in our analysis is a new nontrivial limit theorem for differential privacy which is of independent interest: if a sequence of differentially private algorithms converges, in a \emph{weak} sense, then the limit algorithm is also differentially private. In particular, our methods give the best known algorithms for differentially private linear regression. These methods work in settings where the number of parameters p is less than the number of samples n.
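The objective perturbation idea itself is easy to sketch: instead of adding noise to the output, add a random linear term b·θ to the regularized objective before minimizing. The sketch below uses a Gaussian surrogate for the noise and plain gradient descent on logistic loss; the actual algorithm calibrates the noise distribution and regularizer to the privacy parameters precisely.

```python
import math
import random

def objective_perturbation(data, dim, eps=1.0, lam=0.1, steps=2000, lr=0.1):
    """Objective perturbation for L2-regularized logistic regression:
    add a random linear term b.theta to the objective before minimizing,
    rather than perturbing the minimizer afterwards.  Bare-bones sketch:
    Gaussian surrogate noise and fixed-step gradient descent, with all
    parameter choices illustrative rather than privacy-calibrated."""
    b = [random.gauss(0.0, 2.0 / eps) for _ in range(dim)]  # noise vector
    n = len(data)
    theta = [0.0] * dim
    for _ in range(steps):
        grad = [lam * t + bi / n for t, bi in zip(theta, b)]
        for x, y in data:                                   # labels in {-1, +1}
            s = sum(xi * ti for xi, ti in zip(x, theta))
            c = -y / (1.0 + math.exp(y * s))                # d(logloss)/ds
            for j in range(dim):
                grad[j] += c * x[j] / n
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta
```

On linearly separable data the perturbed minimizer still points in the right direction; the random linear term only shifts it slightly when eps is not too small.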

(b) We give the first two private algorithms for sparse regression problems in high-dimensional settings, where p is much larger than n. We analyze their performance for linear regression: under standard assumptions on the data, our algorithms have vanishing empirical risk for n = poly(s,log p) when there exists a good regression vector with s nonzero coefficients.

Our algorithms demonstrate that randomized algorithms for sparse regression problems can be both stable and accurate -- a combination which is impossible for deterministic algorithms.

It is a joint work with Adam Smith [Pennsylvania State University], Daniel Kifer [Pennsylvania State University].

Noiseless Database Privacy, 2010-2011: Differential Privacy (DP) has emerged as a formal, flexible framework for privacy protection, with a guarantee that is agnostic to auxiliary information and that admits simple rules for composition. Benefits notwithstanding, a major drawback of DP is that it provides noisy responses to queries, making it unsuitable for many applications. We propose a new notion called Noiseless Privacy that provides exact answers to queries, without adding any noise whatsoever. While the form of our guarantee is similar to that of DP, the source of the privacy is very different: it rests on statistical assumptions about the data and on restrictions on the auxiliary information available to the adversary. We present a set of results for Noiseless Privacy of arbitrary Boolean-function queries and of linear Real-function queries, when data are drawn independently from nearly-uniform and Gaussian distributions, respectively. We also derive simple rules for composition under models of dynamically changing data.
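A toy calculation conveys why exact answers on random data can still hide individual records: if a query releases the exact sum of n independent unbiased bits, the adversary's likelihood ratio for any one bit is a ratio of adjacent binomial probabilities, which is close to 1 for typical outcomes. The function below computes this ratio (our illustration, not the paper's analysis).

```python
from math import comb

def sum_query_likelihood_ratio(n, s):
    """For a query releasing the exact sum s of n independent unbiased
    bits, the adversary's likelihood ratio for bit i is
    Pr[others sum to s-1] / Pr[others sum to s], a ratio of adjacent
    binomial probabilities.  Near s = n/2 it is close to 1, so the
    exact answer reveals little about any single bit."""
    p_if_zero = comb(n - 1, s) / 2 ** (n - 1)      # x_i = 0: others sum to s
    p_if_one = comb(n - 1, s - 1) / 2 ** (n - 1)   # x_i = 1: others sum to s-1
    return p_if_one / p_if_zero
```

For n = 1000 and the typical outcome s = 500, the ratio is exactly 1; it drifts away from 1 only for atypical sums, which is why the guarantee is distributional rather than worst-case.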

It is a joint work with Raghav Bhaskar, Vipul Goyal and Srivatsan Laxman from [MSR, India], and Abhishek Bhowmick [University of Texas, Austin].

Discovering Frequent Patterns in Sensitive Data, 2009-2010: Discovering frequent patterns from data is a popular exploratory technique in data mining. However, if the data are sensitive (e.g. patient health records, user behavior records), releasing information about significant patterns or trends carries a serious risk to privacy. This paper shows how one can accurately discover and release the most significant patterns along with their frequencies in a data set containing sensitive information, while providing rigorous guarantees of privacy for the individuals whose information is stored there.

We present two efficient algorithms for discovering the K most frequent patterns in a data set of sensitive records. Our algorithms satisfy differential privacy, a recently introduced definition that provides meaningful privacy guarantees in the presence of arbitrary external information. Differentially private algorithms require a degree of uncertainty in their output to preserve privacy. Our algorithms handle this by returning ‘noisy’ lists of patterns that are close to the actual list of K most frequent patterns in the data. We define a new notion of utility that quantifies the output accuracy of private top-K pattern mining algorithms. In typical data sets, our utility criterion implies low false positive and false negative rates in the reported lists. We prove that our methods meet the new utility criterion; we also demonstrate the performance of our algorithms through extensive experiments on the transaction data sets from the FIMI repository. While the paper focuses on frequent pattern mining, the techniques developed here are relevant whenever the data mining output is a list of elements ordered according to an appropriately ‘robust’ measure of interest.
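The basic shape of such an algorithm is easy to sketch: perturb each pattern's count with Laplace noise and report the K patterns with the highest noisy counts. The sketch below is in the spirit of the paper but is not its exact algorithm, which also calibrates the noise to K and releases noisy frequencies alongside the list.

```python
import random

def laplace(scale):
    # Laplace(0, scale) as the difference of two exponential samples
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def noisy_top_k(counts, k, eps=1.0):
    """Sketch of private top-K selection: add Laplace noise to every
    pattern's frequency and report the K patterns with the highest
    noisy counts.  The noise scale 2k/eps is an illustrative
    calibration, not the paper's exact choice."""
    noisy = {p: c + laplace(2.0 * k / eps) for p, c in counts.items()}
    return sorted(noisy, key=noisy.get, reverse=True)[:k]
```

When the true counts are well separated, the noisy list coincides with the true top-K with high probability, which is exactly the low false positive/negative regime the utility criterion captures.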

It is a joint work with Raghav Bhaskar [MSR, India], Srivatsan Laxman [MSR, India] and Adam Smith [Associate Professor, Pennsylvania State University].

File Manager for SQL Server Compact v5.0, 2007-2008: Designed and implemented a prototype File System Manager enabling the Storage Engine to support databases of up to 4 TB. Work done in collaboration with Mr. S. Murali and Mr. Anil Nori at Microsoft.

English to Sanskrit Speech Translator, 2006-2007: Designed an on-the-fly speech translator from English to Sanskrit using an HMM-based speech-to-text converter, a Maximum Entropy parser, and a Paninian-grammar-based English-to-Sanskrit translation technique. Worked under Dr. Vijayan Immanuel [National Institute of Technology Karnataka, Surathkal] and Dr. P Ramanujan [CDAC-Bangalore].

GRASP Approach to NP Glass Cutting Problem, 2005-2006: Designed an algorithm that improves the existing Greedy Randomized Adaptive Search Procedure (GRASP) for the NP-hard glass cutting problem, raising efficiency to 96\%. Worked under Dr. Vijayan Immanuel [National Institute of Technology Karnataka, Surathkal].