A data analysis and visualization website highlighting interesting trends from the SFPD emergency dispatch dataset; these insights inform proposed steps to improve dispatch efficiency and reduce response times. Created in March 2018 as an entry to the MindSumo computer science challenge found here, for admission to the May 2017 Capital One Software Engineering Summit. Built from scratch with the help of Jupyter notebooks, numpy, pandas, seaborn, matplotlib, Tableau, the Google Maps API, and HTML/CSS/JavaScript.
My work has been distinguished as a top submission to the MindSumo challenge by Capital One engineers. Check it out here!
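For a rough sense of the underlying workflow, here is a minimal pandas/seaborn sketch of the kind of analysis behind the site: computing per-call response times and comparing them across call types. The file name and column names (`received_time`, `on_scene_time`, `call_type`) are placeholders, not the dataset's actual schema.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Placeholder file and column names; the real dispatch export has its own schema.
df = pd.read_csv("sfpd_dispatch.csv", parse_dates=["received_time", "on_scene_time"])

# Response time in minutes for each dispatched call
df["response_min"] = (df["on_scene_time"] - df["received_time"]).dt.total_seconds() / 60

# Compare response-time distributions across call types, ordered by median
order = df.groupby("call_type")["response_min"].median().sort_values().index
sns.boxplot(data=df, x="response_min", y="call_type", order=order)
plt.xlabel("Response time (minutes)")
plt.tight_layout()
plt.show()
```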
An end-to-end feature extraction and machine learning pipeline for predicting a village's electrification level using aerial imagery; streamlines the conversion of ~30 GB of raw image data into feature statistics, followed by feature selection, dimensionality reduction, and classification; the current model predicts village-level electrification category for the Indian state of Bihar with 0.80 AUC. Built with Python, scikit-image, numpy, pandas, matplotlib, scikit-learn, and ArcGIS.
Our research poster won 1st Place at the 2018 Duke University Research Computing Symposium; check it out here. We also received an Honorable Mention at the 2018 State Energy Conference of North Carolina.
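The later stages of the pipeline (dimensionality reduction and classification, scored by AUC) follow a standard scikit-learn pattern. The sketch below uses synthetic stand-ins for the extracted feature matrix and labels, and the PCA/random-forest choices are illustrative rather than the exact model.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# X: per-village feature statistics already extracted from the imagery;
# y: binary electrification category. Both are synthetic stand-ins here.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))
y = rng.integers(0, 2, size=500)

clf = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=10)),        # dimensionality reduction
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
])

# Cross-validated AUC, the same metric quoted above
scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(scores.mean())
```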
As a Co-Lab consultant, I maintain, develop, and improve the main website, infrastructure, and microservices for Duke's technical innovation hub, the Co-Lab. Recently, I've taken up development of Project Discover, a new section of our website that lets creators find other students to collaborate with on new initiatives based on skills and interests. I work with a team and oversee both frontend and backend development of this web application using Ruby on Rails, HTML/CSS/JavaScript, and various deployment tools.
View the website: https://colab.duke.edu/
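Illustrative only: Project Discover itself is built in Ruby on Rails, but the core matching idea, ranking potential collaborators by skill/interest overlap, can be sketched in a few lines. The Jaccard-overlap scoring below is an assumption for illustration, not the app's actual logic.

```python
# Toy sketch of skill/interest matching; the production feature lives in Rails.
def jaccard(a, b):
    """Overlap between two skill/interest sets, from 0 (disjoint) to 1 (identical)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def rank_collaborators(creator_skills, candidates):
    """Return candidate names sorted by overlap with the creator's skills."""
    return sorted(candidates, key=lambda name: jaccard(creator_skills, candidates[name]), reverse=True)

candidates = {
    "alice": ["python", "design"],
    "bob": ["rails", "javascript"],
}
print(rank_collaborators(["rails", "css", "javascript"], candidates))  # bob ranks first
```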
Research project advised by Dr. Lawrence Carin, investigating semi-supervised deep learning and autoencoder-based approaches to feature selection in natural language processing tasks, such as text generation and summarization with limited ground-truth data. Collaborating with graduate researchers to apply newly developed techniques to new datasets and compare various architectures, primarily in TensorFlow.
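For a sense of the basic building block involved, here is a minimal TensorFlow/Keras autoencoder over bag-of-words vectors whose learned latent features could feed a classifier trained on the small labeled subset. The dimensions and placeholder data are illustrative; the research models, objectives, and datasets are considerably more involved.

```python
import numpy as np
import tensorflow as tf

# Placeholder "documents" as bag-of-words vectors; real inputs come from the corpus.
vocab_size, latent_dim = 2000, 64
x = np.random.default_rng(0).random((256, vocab_size)).astype("float32")

encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(latent_dim, activation="relu"),
])
decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(vocab_size, activation="sigmoid"),
])
autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
autoencoder.fit(x, x, epochs=3, batch_size=32, verbose=0)

# Low-dimensional features for the downstream (semi-supervised) classifier
features = encoder.predict(x, verbose=0)
```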
A supervised-learning dataset compiled in summer 2017 through the Data+ program, containing detailed features for every village in the Indian state of Bihar, including satellite imagery, political boundaries, nighttime lights imagery, rainfall and vegetation indices, and electrification rate approximations. In the current academic semester, I’m leading a Bass Connections research team developing classifiers on this data to predict electrification rates at scale. Technologies used include Python, scikit-image, GDAL, and ArcGIS.
Our dataset was published as an open resource on Figshare. For more about our work: Summary Presentation, Duke Research Magazine Feature, Project Poster
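One representative feature-extraction step, averaging a raster layer such as nighttime lights over a village's bounding box, can be sketched with GDAL as below. The file name, band, and coordinates are placeholders; the full pipeline handles many more layers and proper village boundaries rather than simple bounding boxes.

```python
from osgeo import gdal
import numpy as np

# Placeholder raster; the real pipeline loops over nightlights, rainfall, etc.
ds = gdal.Open("nightlights.tif")
band = ds.GetRasterBand(1)
gt = ds.GetGeoTransform()  # (origin_x, pixel_w, 0, origin_y, 0, -pixel_h)

def window_mean(lon_min, lat_max, lon_max, lat_min):
    """Mean raster value over a lon/lat bounding box."""
    x0 = int((lon_min - gt[0]) / gt[1])
    y0 = int((lat_max - gt[3]) / gt[5])
    x1 = int((lon_max - gt[0]) / gt[1])
    y1 = int((lat_min - gt[3]) / gt[5])
    arr = band.ReadAsArray(x0, y0, max(x1 - x0, 1), max(y1 - y0, 1)).astype(float)
    return float(np.nanmean(arr))

# Example bounding box (placeholder coordinates in Bihar)
village_brightness = window_mean(85.10, 25.65, 85.20, 25.55)
```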
A web platform and accompanying REST API built using Django, the Django REST Framework, and PostgreSQL for tracking loaner equipment inventory and users of Duke’s Biology Departmental IT helpdesk. Built with Python/Django, HTML/CSS/JavaScript, and Bootstrap.
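The API follows the usual Django REST Framework model/serializer/viewset pattern; the sketch below uses an illustrative schema and app name rather than the helpdesk's actual one. Registering the viewset with DRF's DefaultRouter then exposes the standard list/create/retrieve/update endpoints.

```python
# Illustrative model, serializer, and viewset in the DRF style used by the API.
from django.db import models
from rest_framework import serializers, viewsets

class LoanerItem(models.Model):
    name = models.CharField(max_length=100)
    asset_tag = models.CharField(max_length=50, unique=True)
    checked_out_to = models.CharField(max_length=100, blank=True)
    due_back = models.DateField(null=True, blank=True)

    class Meta:
        app_label = "inventory"  # assumed app name

class LoanerItemSerializer(serializers.ModelSerializer):
    class Meta:
        model = LoanerItem
        fields = ["id", "name", "asset_tag", "checked_out_to", "due_back"]

class LoanerItemViewSet(viewsets.ModelViewSet):
    queryset = LoanerItem.objects.all()
    serializer_class = LoanerItemSerializer
```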
A Python GUI review program that uses the Amazon Mechanical Turk API to let researchers or Turk users review the image labels and annotations received for their crowdsourced Human Intelligence Tasks (HITs). This tool drastically reduced the time needed for dataset review and ensured the integrity of ground-truth data used for supervised learning. Built with Python, Tkinter, and the Amazon Mechanical Turk API.
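Underneath the Tkinter interface, the review step amounts to pulling submitted assignments for a HIT and approving or rejecting each one. The sketch below uses boto3's MTurk client; the HIT ID and the acceptance check are placeholders, and the actual tool presents the labeled image in a GUI instead of a callback.

```python
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

def review_hit(hit_id, is_acceptable):
    """Approve or reject every submitted assignment for one HIT.

    is_acceptable is a placeholder callback that inspects the worker's answer;
    the real tool shows the annotated image in a Tkinter window instead.
    """
    resp = mturk.list_assignments_for_hit(HITId=hit_id, AssignmentStatuses=["Submitted"])
    for a in resp["Assignments"]:
        if is_acceptable(a["Answer"]):  # e.g. the annotation XML passes a spot check
            mturk.approve_assignment(AssignmentId=a["AssignmentId"])
        else:
            mturk.reject_assignment(
                AssignmentId=a["AssignmentId"],
                RequesterFeedback="Annotation did not match the image.",
            )
```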