I am currently a PhD student in the Division of Applied Mathematics at Brown University. My research interests are broadly in the area of statistics and machine learning. Here are some of my recent projects:

  Insight Data Science Project: Predicting 311 Complaints

My most recent project is a tool I developed to identify and predict high-priority complaints in your NYC neighborhood. I worked with the 311 complaint dataset, which contains over 7 million rows of non-emergency complaint logs from 2010 onwards. The data has interesting spatio-temporal information that can be used to gain insights into the different problems each area faces.

I modeled the periodic trends in the complaint volume time series and compared the expected and actual volumes to identify complaint types with unusually high volume. I used MySQL to store and access the data, and performed the analysis in Python using the NumPy, SciPy, matplotlib and scikit-learn libraries. I also used Flask and jQuery to build an interactive front-end hosted on Amazon Web Services.
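
The idea of comparing expected and actual volumes can be illustrated with a small sketch. This is not the project's actual model: it assumes a simple weekly cycle, uses the mean for each day of the week as the expected volume, and flags days that sit far above it. The function name `flag_unusual_days`, its parameters, and the default threshold are all hypothetical.

```python
from statistics import mean, pstdev


def flag_unusual_days(daily_counts, period=7, threshold=2.5):
    """Return indices of days whose count is unusually high for their phase.

    daily_counts: complaint counts, one per day (hypothetical input format).
    period: length of the seasonal cycle in days (7 assumes a weekly pattern).
    threshold: standard deviations above the phase mean needed to flag a day.
    """
    flagged = []
    for phase in range(period):
        # All days that fall on the same position in the cycle.
        idx = list(range(phase, len(daily_counts), period))
        values = [daily_counts[i] for i in idx]
        expected = mean(values)   # expected volume for this phase
        spread = pstdev(values)   # typical deviation for this phase
        if spread == 0:
            continue              # perfectly regular phase, nothing to flag
        for i in idx:
            if (daily_counts[i] - expected) / spread > threshold:
                flagged.append(i)
    return sorted(flagged)


# Ten weeks of a repeating weekly pattern with one injected spike.
counts = [10, 12, 11, 13, 12, 5, 4] * 10
counts[21] = 60
unusual = flag_unusual_days(counts)
```

With only a handful of cycles, a single spike inflates the standard deviation it is measured against, so a median-based baseline may be more robust in practice.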

For more details, check it out at

Slides - [pdf]

  PhD Research: Conditional Inference and non-parametric approaches for network analysis of neural spiking data

My PhD research focuses on analyzing neuron spike train data, modeled as a multivariate binary time series, to infer an underlying anatomical network using conditional inference techniques.

Inference about the fast temporal correlation structure of neurons (such as zero-lag synchrony and precise lag-lead relationships) is complicated by their many unobserved and correlated inputs. Classical approaches to this problem suffer from model misspecification and incidental parameter problems (meaning the number of nuisance parameters grows with the amount of data, leading to inconsistent estimates).

Conditional inference focuses on the parameters of interest while being robust to various non-stationary background effects that influence neuron firing rates. So far we have looked at the exact conditional approach for log-linear (maximum entropy) models. We are also exploring pseudo-likelihood and saddle-point approximations that would scale better to larger networks and to more complex models such as generalized linear models and hidden Markov models.

We also implemented a non-parametric Bayesian approach using Dirichlet process priors on the incidental parameters to tackle the inconsistency issues. The conditional inference approach is typically faster than the non-parametric sampling techniques and better suited to problems where the incidental parameters themselves are not of interest.

  Geographic Information from Gene data

This project, a collaborative effort with Sohini Ramachandran from the Ecology and Evolutionary Biology department at Brown, explored the connections of genetic data with geography and migration.

Principal Component Analysis (PCA) applied to genetic data sampled from populations in Europe and Asia provides evidence for the correlation between genetic information and the geographic location of individuals. The goal of the project was to investigate whether gene data can predict migration patterns. We ran multiple simulations of population growth and migration and, using ancestral distance as a proxy for genetic data, attempted to map out the inferred geography. We used various techniques and visualizations, such as Multidimensional Scaling (MDS), Independent Component Analysis (ICA) and Isomap, to infer the underlying patterns in the high-dimensional data.

We also applied these techniques to the original European and Asian data sets and were able to identify some sub-populations with distinct genetic traits using ICA.
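
As a rough illustration of how a low-dimensional summary can recover geography, the sketch below computes the leading principal component of a small data matrix by power iteration. It is a toy, not the project's code: the data are synthetic, each "locus" value is assumed to vary linearly with a one-dimensional position, and the function name `first_principal_component` is hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the leading principal component.

    rows: list of equal-length numeric lists, one per individual.
    Uses power iteration on the sample covariance matrix, starting from a
    uniform vector (this fails if that vector is orthogonal to the answer).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no variance in the data
        v = [x / norm for x in w]
    # Project each centered row onto the leading direction.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Synthetic example: five individuals placed along a line, three loci whose
# values change linearly with position (assumed weights, not real data).
positions = [0, 1, 2, 3, 4]
weights = [0.1, 0.3, 0.2]
genotypes = [[0.5 + p * w for w in weights] for p in positions]
direction, scores = first_principal_component(genotypes)
```

In this idealized setting the scores order the individuals exactly as their positions do, which is the one-dimensional analogue of PCA coordinates mirroring a map.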

Final project presentation link - [pdf]

  Work presented at Society of Molecular Biology and Evolution Annual Meeting, Kyoto, Japan, 2011. 

  S. Ramachandran, D. Nadkarni and M. Harrison, "Detecting gene flow low-dimensional summaries of genotype data"

  SEIR epidemics on graphs: Latent Network Inference

This was a group project for the class 'Probabilistic and statistical models for graphs and networks (AM2821K)' in collaboration with Laura Slivinsky and Daniel Klein.  

In order to predict and hopefully prevent epidemics, we need to better understand the underlying contact networks on which diseases spread, as well as certain disease-specific parameters. We investigated if and how we can retrospectively infer these parameters given an incomplete record (e.g., population size and recovery times) of a disease outbreak.

We simulated the SEIR model of diseases spreading over several types of contact networks, such as Erdős-Rényi and preferential attachment models, and used Gibbs sampling in R to estimate the parameters in various scenarios of missing data. In particular, we explored the parameter identifiability issue, which limits the scope of inference.
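
The forward simulation step can be sketched as follows. This is a minimal discrete-time version written in Python for illustration, not the R code used in the project, and it covers only simulation, not the Gibbs sampler. The per-step probabilities `beta` (transmission per infectious neighbor), `sigma` (exposed to infectious) and `gamma` (recovery), along with both function names, are assumptions.

```python
import random


def erdos_renyi(n, p, rng):
    """Adjacency lists for a G(n, p) random graph: each edge appears with probability p."""
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def simulate_seir(neighbors, beta, sigma, gamma, steps, rng, seed_node=0):
    """Discrete-time SEIR on a graph; returns (S, E, I, R) counts per step."""
    state = ["S"] * len(neighbors)
    state[seed_node] = "I"  # a single initial infectious node
    history = [tuple(state.count(c) for c in "SEIR")]
    for _ in range(steps):
        new_state = list(state)  # synchronous update from the old state
        for node, s in enumerate(state):
            if s == "S":
                infectious = sum(1 for nb in neighbors[node] if state[nb] == "I")
                # Probability that at least one infectious neighbor transmits.
                if rng.random() < 1 - (1 - beta) ** infectious:
                    new_state[node] = "E"
            elif s == "E" and rng.random() < sigma:
                new_state[node] = "I"
            elif s == "I" and rng.random() < gamma:
                new_state[node] = "R"
        state = new_state
        history.append(tuple(state.count(c) for c in "SEIR"))
    return history


rng = random.Random(0)
graph = erdos_renyi(50, 0.1, rng)
history = simulate_seir(graph, beta=0.3, sigma=0.5, gamma=0.2, steps=30, rng=rng)
```

Hiding part of such a simulated history (for example, the exposure times) gives a missing-data scenario of the kind an inference method would then have to cope with.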


Link to the final poster presentation - [pdf]

  Masters Thesis: Time Optimal Control of a Spherical Mobile Robot (SMR)

I studied the control problem of time optimal maneuvering of an SMR with two or three internal actuators having constraints on the control inputs. Formulating the problem in a differential geometric framework, I used Pontryagin’s maximum principle and the Lie-Poisson reduction theorem to reduce the control problem to a lower dimensional space. I then established an algorithm to compute the candidate optimal trajectories for a given start and end point and to explore each candidate trajectory to find the global minimum of the cost function (time).


Link to thesis defense - [presentation]

Work presented at the European Control Conference 2009:

R. Banavar, A. Menon, D. Nadkarni, "Time optimal transfer in the plate-ball problem"