I am currently a PhD student in the Division
of Applied Mathematics at Brown University. My research interests are broadly
in the area of statistics and machine learning. Here are some of my recent
projects:
Insight Data Science Project: Predicting 311 Complaints
My most recent project is LocalityLookout.com, a tool I developed to identify and predict high-priority complaints in your NYC neighborhood. I worked with the 311 complaint dataset, which contains over 7 million rows of non-emergency complaint logs from 2010 onwards. The data has interesting spatiotemporal information that can be used to gain insight into the different problems each area faces. I modeled the periodic trends in the complaint-volume time series and compared the expected and actual volumes to identify complaint types with unusually high volume. I used MySQL to store and access the data, and performed the analysis in Python using the NumPy, SciPy, matplotlib and scikit-learn libraries. I also used Flask and jQuery to build an interactive front end hosted on Amazon Web Services. For more details, check it out at www.localitylookout.com
Slides  [pdf]
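The expected-versus-actual comparison used in the 311 project can be illustrated with a minimal sketch. It assumes a purely weekly pattern and a simple standard-deviation threshold, which is not necessarily the model used on the site. The function names, the `period` of 7 days and the `threshold` of 3 are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def periodic_baseline(counts, period=7):
    """Return one (mean, std) pair per phase of a series with a fixed period."""
    phases = [[] for _ in range(period)]
    for i, count in enumerate(counts):
        phases[i % period].append(count)
    return [(mean(p), pstdev(p)) for p in phases]


def flag_unusual(history, recent, period=7, threshold=3.0):
    """Indices of `recent` whose count exceeds the expected value by more
    than `threshold` standard deviations (illustrative rule, an assumption)."""
    baseline = periodic_baseline(history, period)
    offset = len(history)  # `recent` is assumed to directly follow `history`
    flagged = []
    for i, actual in enumerate(recent):
        expected, spread = baseline[(offset + i) % period]
        spread = max(spread, 1.0)  # floor so a perfectly flat phase does not divide by zero
        if (actual - expected) / spread > threshold:
            flagged.append(i)
    return flagged


# Synthetic daily complaint counts with a weekend peak (made-up numbers).
history = [10, 12, 11, 13, 12, 30, 28] * 8
recent = [11, 12, 40, 13, 12, 31, 29]
print(flag_unusual(history, recent))
```

In this synthetic example only the third day of the recent week stands out: the weekend counts are high but match the periodic baseline, so they are not flagged.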
PhD Research: Conditional Inference and nonparametric approaches for network analysis of neural spiking data
My PhD research
focuses on analyzing neuron spike train data, modeled as a multivariate binary
time series, to infer an underlying anatomical network using conditional inference
techniques.
Inference about the
fast temporal correlation structure of neurons (such as zero-lag synchrony and
precise lag-lead relationships) is complicated by their many unobserved and
correlated inputs. Classical approaches to this problem suffer from model
misspecification and incidental parameter problems (meaning the number of
nuisance parameters grows with the amount of data, leading to inconsistent estimates).
Conditional inference
focuses on the parameters of interest while being robust to various
nonstationary background effects that influence neuron firing rates. So far we
have looked at the exact conditional approach for log-linear (maximum entropy) models. We are also exploring pseudolikelihood and
saddlepoint approximations that would scale better to a larger network and to
more complex models such as generalized linear models and hidden Markov models. We also implemented a nonparametric Bayesian approach using Dirichlet process priors on the incidental parameters to tackle the inconsistency issues. The conditional inference approach is typically faster than nonparametric sampling techniques and better suited to problems where the incidental parameters themselves do not need to be inferred.
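To give a flavor of conditioning away nuisance parameters, here is a minimal sketch of a Monte Carlo conditional test for zero-lag synchrony between two binary spike trains. It is a generic permutation-style illustration, not the exact method from my research. It assumes that background firing rates are constant within short windows, so that conditioning on each window's spike count removes the rate parameters. The function names, the `window` length and the sample count are hypothetical.

```python
import random


def sync_count(a, b):
    """Number of time bins in which both neurons spike (zero-lag synchrony)."""
    return sum(x & y for x, y in zip(a, b))


def shuffle_within_windows(train, window, rng):
    """Permute spikes inside each window, preserving every window's spike count."""
    out = []
    for start in range(0, len(train), window):
        block = train[start:start + window]  # slicing copies, so `train` is untouched
        rng.shuffle(block)
        out.extend(block)
    return out


def conditional_sync_pvalue(a, b, window=20, n_samples=999, seed=0):
    """Monte Carlo p-value for excess synchrony, conditional on the
    per-window spike counts of train `a` (train `b` is held fixed)."""
    rng = random.Random(seed)
    observed = sync_count(a, b)
    exceed = 0
    for _ in range(n_samples):
        surrogate = shuffle_within_windows(a, window, rng)
        if sync_count(surrogate, b) >= observed:
            exceed += 1
    # The +1 terms count the observed data as one draw from the null.
    return (exceed + 1) / (n_samples + 1)


# Synthetic example: two identical trains are maximally synchronous.
rng = random.Random(1)
b = [1 if rng.random() < 0.2 else 0 for _ in range(400)]
a = list(b)
print(conditional_sync_pvalue(a, b))
```

Because the surrogates keep the slowly varying spike counts intact, a small p-value points to fine-timescale structure rather than to shared rate fluctuations, and no rate parameter is ever estimated.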


Geographic Information from Gene data
This project, a collaborative effort
with Sohini Ramachandran from the Ecology and Evolutionary Biology department at
Brown, explored the connections of genetic data with geography and migration. Principal Component Analysis (PCA)
applied to genetic data sampled from populations in Europe and Asia provides
evidence for the correlation between genetic information and geographic
location of individuals. The goal of the project was to investigate whether
gene data can predict migration patterns. We ran multiple simulations of
population growth and migration, and using ancestral distance as a proxy for
genetic data, attempted to map out inferred geography. We used various
techniques and visualizations, such as Multidimensional Scaling (MDS),
Independent Component Analysis (ICA) and Isomap, to infer the underlying
patterns in the high-dimensional data.
We also applied these techniques to
the original European and Asian data sets and were able
to identify some subpopulations with distinct genetic traits using ICA.
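As a rough illustration of how a low-dimensional projection can separate populations, the sketch below computes the first principal component of a small genotype matrix by power iteration. The data are made up (rows are individuals, columns are allele counts at hypothetical loci), and the function names are assumptions for illustration. The project itself used standard PCA, MDS, ICA and Isomap tools rather than this hand-rolled routine.

```python
import random


def center_columns(rows):
    """Subtract each column's mean so that every feature has zero mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def first_principal_component(rows, n_iter=200, seed=0):
    """Power iteration on X^T X. Returns (unit direction, per-row scores)."""
    x = center_columns(rows)
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in x[0]]  # random starting direction
    for _ in range(n_iter):
        scores = [sum(a * b for a, b in zip(row, v)) for row in x]          # X v
        v = [sum(s * row[j] for s, row in zip(scores, x)) for j in range(len(v))]  # X^T (X v)
        norm = sum(c * c for c in v) ** 0.5
        v = [c / norm for c in v]
    scores = [sum(a * b for a, b in zip(row, v)) for row in x]
    return v, scores


# Made-up allele counts (0, 1 or 2) for two groups of three individuals.
genotypes = [
    [2, 2, 1, 0, 0, 0],
    [2, 1, 2, 0, 1, 0],
    [1, 2, 2, 0, 0, 1],
    [0, 0, 1, 2, 2, 2],
    [0, 1, 0, 2, 1, 2],
    [1, 0, 0, 2, 2, 1],
]
direction, scores = first_principal_component(genotypes)
print([round(s, 2) for s in scores])
```

The sign of a principal component is arbitrary, so what matters is that the first three individuals receive scores of one sign and the last three the opposite sign.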
Final project presentation link
 [pdf]

Work presented at the Society for Molecular Biology and Evolution Annual Meeting, Kyoto, Japan, 2011: S. Ramachandran, D. Nadkarni and M. Harrison, "Detecting gene flow low-dimensional summaries of genotype data"
SEIR epidemics on graphs: Latent Network Inference
This was a group project for the
class 'Probabilistic and statistical models for graphs and networks (AM2821K)' in
collaboration with Laura Slivinsky and Daniel Klein.
In order to predict and hopefully
prevent epidemics, we need to better understand the underlying contact networks
on which diseases spread, as well as certain disease-specific parameters. We
investigated if and how we can retrospectively infer these parameters given an
incomplete record (e.g., population size and recovery times) of a disease
outbreak. We simulated the SEIR model of diseases spreading over different types of
contact networks, such as Erdős-Rényi and preferential attachment models, and
used Gibbs sampling in R to estimate
the parameters in various scenarios of missing data. In particular, we explored
the parameter identifiability issue, which limits the scope of inference.
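The forward simulation step can be sketched as follows. This is a minimal discrete-time SEIR model on an Erdős-Rényi graph, written in Python for illustration (the project itself used R). The parameter names `beta`, `sigma` and `gamma` (per-step transmission, incubation and recovery probabilities) and the update rule are assumptions, not the project's exact formulation.

```python
import random


def erdos_renyi(n, p, rng):
    """Adjacency lists for G(n, p): each pair of nodes is linked with probability p."""
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def simulate_seir(neighbors, beta, sigma, gamma, steps, seed=0, patient_zero=0):
    """Discrete-time SEIR on a graph. Returns (S, E, I, R) counts per step."""
    rng = random.Random(seed)
    state = ["S"] * len(neighbors)
    state[patient_zero] = "I"
    history = [tuple(state.count(c) for c in "SEIR")]
    for _ in range(steps):
        new_state = list(state)  # synchronous update based on the old state
        for node, s in enumerate(state):
            if s == "S":
                infectious = sum(state[m] == "I" for m in neighbors[node])
                # Each infectious neighbor transmits independently with probability beta.
                if rng.random() < 1 - (1 - beta) ** infectious:
                    new_state[node] = "E"
            elif s == "E" and rng.random() < sigma:
                new_state[node] = "I"
            elif s == "I" and rng.random() < gamma:
                new_state[node] = "R"
        state = new_state
        history.append(tuple(state.count(c) for c in "SEIR"))
    return history


graph = erdos_renyi(50, 0.1, random.Random(1))
for counts in simulate_seir(graph, beta=0.3, sigma=0.5, gamma=0.2, steps=10):
    print(counts)
```

Inference then runs in the opposite direction: given only part of such a trajectory, the goal is to recover the parameters and the contact structure that generated it.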
Link to the final poster presentation  [pdf] 

Master's Thesis: Time Optimal Control of a Spherical Mobile Robot (SMR)
I studied the control
problem of time optimal maneuvering of an SMR with two or three internal
actuators having constraints on the control inputs. Formulating the problem in
a differential geometric framework, I used Pontryagin’s maximum principle
and the Lie-Poisson reduction theorem to reduce the control problem to a
lower-dimensional space. I then established an algorithm to compute the
candidate optimal trajectories for a given start and end point and to explore
each candidate trajectory to find the global minimum for the cost function
(time).
Link to thesis defense  [presentation]
Work presented at the European Control Conference 2009: R. Banavar, A. Menon, D. Nadkarni, "Time optimal transfer in the plate-ball problem"


