Current Projects

For many machine learning tasks, the input data lies on a low-dimensional manifold embedded in a high-dimensional space. Most algorithms scale poorly with the ambient dimension, so the typical remedy is to reduce the dimension of the input data using standard dimension reduction algorithms such as Isomap, Laplacian Eigenmaps, or LLE. This approach, however, does not always work in practice, as these algorithms assume fairly ideal data. Unfortunately, most real data sets have missing entries or unacceptably noisy values; that is, real data are far from ideal, and we cannot apply these algorithms directly. For this reason, Gilbert and Jain [11] and Fan et al. [8] introduce the closely related problems of sparse metric repair and metric violation distance. The goal of each problem is to repair as few distances as possible so that the distances between the data points satisfy a metric.
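As a concrete illustration (a minimal sketch, not the algorithms of [11] or [8]), the snippet below counts triangle-inequality violations in a distance matrix and performs a decrease-only repair: replacing every entry with its shortest-path distance always restores the metric property, though it may change far more entries than the minimum that sparse metric repair asks for.

```python
import numpy as np

def count_triangle_violations(D, tol=1e-12):
    """Count triples (i, j, k) with D[i, j] > D[i, k] + D[k, j],
    i.e. triples where the triangle inequality fails."""
    n = D.shape[0]
    violations = 0
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(n):
                if k != i and k != j and D[i, j] > D[i, k] + D[k, j] + tol:
                    violations += 1
    return violations

def decrease_only_repair(D):
    """Decrease-only repair via Floyd-Warshall: shortest-path
    distances always satisfy the triangle inequality."""
    M = D.astype(float).copy()
    for k in range(D.shape[0]):
        M = np.minimum(M, M[:, [k]] + M[[k], :])
    return M
```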

My interest in this problem is twofold: first, to develop the theory, as can be seen here; second, to develop data science tools based on metric repair, as can be seen here (Julia package for MR-Missing).

More details can be found on the Metric Repair page.

Filling in Missing Data in a Sequential Model

Many data sets, such as text and speech, have an inherent sequential structure. Given a text with missing entries, I am interested in developing algorithms that can fill those entries in. This is useful in any setting where the data has an inherent sequential structure. In particular, the problem is motivated by old manuscripts and texts that are only partially readable, where we would like to reconstruct the missing portions.
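As a toy illustration of the fill-in task, the sketch below trains bigram counts on a small corpus and fills each missing token with the most frequent successor of its left neighbor. The `<?>` marker and function names are hypothetical, and serious work would use a proper sequence model rather than bigrams.

```python
from collections import Counter, defaultdict

def train_bigrams(tokens):
    """Count, for each token, how often each successor follows it."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def fill_missing(tokens, counts, missing="<?>"):
    """Replace each missing token with the most frequent successor
    of its left neighbor, when one is available."""
    filled = list(tokens)
    for i, tok in enumerate(filled):
        if tok == missing and i > 0 and counts[filled[i - 1]]:
            filled[i] = counts[filled[i - 1]].most_common(1)[0][0]
    return filled

corpus = "the cat sat on the mat while the cat ate".split()
print(fill_missing("the <?> sat on the <?>".split(), train_bigrams(corpus)))
# ['the', 'cat', 'sat', 'on', 'the', 'cat']
```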

Nonlinear Independent Component Analysis

In many cases we are given a linear mixture of multiple independent sources, such as the sound of people talking in different parts of a room, and we want to separate the sources, i.e., recover each person's voice. Independent Component Analysis (ICA) works remarkably well for this. Its one shortfall is that the mixture has to be linear. Suppose instead the mixing were nonlinear: is it still possible to separate the independent components?
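For the linear case the problem is well solved in practice: the sketch below mixes two synthetic sources with a matrix and recovers them, up to permutation and scaling, with scikit-learn's FastICA. No analogous off-the-shelf tool comes with guarantees once the mixing is nonlinear, which is exactly the open question.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)

# Two independent sources: a sinusoid and a square wave.
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]

# Linear mixing: each "microphone" hears a weighted sum of the sources.
A = np.array([[1.0, 0.5],
              [0.3, 2.0]])
X = S @ A.T + 0.01 * rng.standard_normal((len(t), 2))

# FastICA recovers the sources up to permutation and scaling.
S_est = FastICA(n_components=2, random_state=0).fit_transform(X)
```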

Dropout

Dropout is a regularization technique that is widely used in neural networks and other graphical inference models to prevent overfitting. I am interested in proving theoretical results about dropout.
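For reference, here is a minimal numpy sketch of the standard "inverted dropout" formulation: each unit is zeroed with probability p during training and the survivors are rescaled by 1/(1-p), so activations have the same expectation at train and test time and nothing needs to change at inference.

```python
import numpy as np

def dropout(x, p, rng, train=True):
    """Inverted dropout: zero each unit with probability p and
    rescale the survivors by 1/(1-p); identity at test time."""
    if not train or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p   # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
h = np.ones(8)
print(dropout(h, p=0.5, rng=rng))  # ~half the entries zeroed, the rest equal to 2.0
```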