Pre-doctoral research projects

This is a selected list of projects I worked on as a research staff member at the Auton Lab from January 2005 to August 2014.

UPMC Health Plan data analysis 2013

Analyzed hospital operating room cost data together with quality-of-service data, and demonstrated an analytical framework for objectively measuring the performance of doctors. Also demonstrated statistical analysis and data mining techniques for building a data-driven model that forecasts hospital admissions from a real-time authorization data stream and other data streams.

mHealth Real Time Biosurveillance Project in Sri Lanka 2010-2011

Spatial and temporal disease outbreak detection for outpatient visit data collected in real time via mobile phones in Sri Lanka. Automatic detection of erroneous entries using information-theoretic metrics.
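As a rough illustration of what an information-theoretic check on entries can look like, the sketch below scores each categorical value by its surprisal (negative log probability under the empirical distribution) and flags unusually rare values. This is only one possible metric and not necessarily the one used in the project; the function names, the example values, and the threshold are all hypothetical.

```python
import math
from collections import Counter


def surprisal_scores(values):
    """Return the surprisal, in bits, of each value under the empirical distribution."""
    counts = Counter(values)
    total = len(values)
    return [-math.log2(counts[v] / total) for v in values]


def flag_rare_entries(values, threshold_bits):
    """Return the indices of entries whose surprisal exceeds the threshold (hypothetical rule)."""
    scores = surprisal_scores(values)
    return [i for i, score in enumerate(scores) if score > threshold_bits]


# Illustrative data only: seven consistent entries and one likely typo.
entries = ["fever"] * 7 + ["fevr"]
suspicious = flag_rare_entries(entries, threshold_bits=2.0)
```

A rare value is not necessarily an error, so in practice a flag like this would only prompt a review of the entry.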

Modeling dynamics of conflict with Angola civil war data 2012

Applied machine learning to event prediction and explanation to help understand the dynamics of conflict, using data from the Angolan Civil War.

Fuel efficiency based driver performance analysis 2011

The outcome of the project is a data-driven performance evaluation scheme that objectively rates a driver’s fuel consumption performance, accounting for both controllable and uncontrollable factors. It also provides insight into how drivers may improve performance by adjusting controllable factors. The method/framework is being used at a startup company in Pittsburgh. In this project, I used tree-based regression models as well as a multilevel (hierarchical) approach to model the relationship between a truck driver’s driving style and fuel consumption. Estimates were obtained via Markov chain Monte Carlo (MCMC) simulation.

Clusters of USDA/CDC sponsored projects 2005-2011

Cross agency, multiple data stream analysis using data from USDA and CDC

Contributed to the design of a web-based interface (TCWI) that lets users interactively explore multiple streams of spatial and temporal data. Contributed to the design of methods for generating hypotheses about links between USDA positive sampling results and human illness reports from CDC. Alerts generated from the analysis may be used to assist the traceback work of USDA/CDC teams during an outbreak. TCWI is now part of the Public Health Information System (PHIS) deployed at USDA (remotely accessible by CDC users).

Analytical work related to Risk based inspection

Co-developed lift analysis metrics, a variation of the odds ratio adapted to the analysis of multiple streams of time series data. The metric is designed to measure the temporal correlation between two types of events. Applied to USDA data, it provided insight into how useful sanitation inspection records are for predicting Salmonella contamination in food processing plants in the near future. This analysis was used to support the Risk Based Inspection initiative at USDA. The lift analysis method is now also part of the PHIS system at USDA/CDC.
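To make the idea of measuring temporal correlation between two event types concrete, here is a minimal sketch of a simple lift ratio on two binary time series: the probability that event B occurs within a look-ahead window after event A, divided by the baseline probability that B occurs within the same window after any time step. This is a simplified probability ratio written for illustration, not the exact metric developed in the project; the function name, the window parameter, and the example series are assumptions.

```python
from typing import Sequence


def lift(a: Sequence[int], b: Sequence[int], window: int) -> float:
    """Ratio of P(B within `window` steps after an A event) to the baseline
    P(B within `window` steps after any time step)."""
    if len(a) != len(b):
        raise ValueError("both series must have the same length")

    def b_follows(t: int) -> bool:
        # True if any B event occurs in the `window` steps after time t.
        return any(b[t + 1 : t + 1 + window])

    # Only use time steps that have a full look-ahead window.
    steps = range(len(a) - window)
    after_a = [b_follows(t) for t in steps if a[t]]
    baseline = [b_follows(t) for t in steps]
    if not after_a or not any(baseline):
        return float("nan")  # undefined without A events or without any B follow-ups

    p_after_a = sum(after_a) / len(after_a)
    p_baseline = sum(baseline) / len(baseline)
    return p_after_a / p_baseline


# Illustrative series only: every A event is followed by a B event one step later.
events_a = [1, 0, 0, 0, 1, 0, 0, 0]
events_b = [0, 1, 0, 0, 0, 1, 0, 0]
score = lift(events_a, events_b, window=1)
```

A value well above 1 suggests that B tends to follow A more often than it occurs by chance, while a value near 1 suggests no temporal association at that window length.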

Network/graph based analysis

Applied social network analysis to model the supply relations among USDA establishments, revealing complex relations that are otherwise easily missed in a flat table. The model can be used by an outbreak investigation team to pinpoint the possible source of contamination. Applied a dynamic social network analysis method to USDA routine sampling data to predict the near-future risk level of food processing plants at the serotype and PFGE pattern level.
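One way a supply network can help narrow down a contamination source is sketched below: represent supply relations as a directed graph, collect every direct and indirect supplier of each affected establishment, and intersect those sets. This is a plain graph traversal written for illustration under assumed data, not the social network analysis method used in the project; the establishment names and function names are hypothetical.

```python
from collections import deque


def upstream(suppliers_of, node):
    """Return all establishments that directly or indirectly supply `node`."""
    seen = set()
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for supplier in suppliers_of.get(current, ()):
            if supplier not in seen:  # also guards against cycles
                seen.add(supplier)
                queue.append(supplier)
    return seen


def common_sources(suppliers_of, affected):
    """Return the suppliers shared by every affected establishment."""
    supplier_sets = [upstream(suppliers_of, node) for node in affected]
    return set.intersection(*supplier_sets) if supplier_sets else set()


# Illustrative supply relations only: each key lists its direct suppliers.
supply = {
    "plant_c": ["plant_a"],
    "plant_d": ["plant_a", "plant_b"],
    "plant_e": ["plant_d"],
}
candidates = common_sources(supply, ["plant_c", "plant_e"])
```

The link from plant_e back to plant_a runs through plant_d, which is the kind of indirect relation that is hard to spot in a flat table of direct supplier pairs.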