Projects

My projects are divided into three categories: 

Model development is about designing statistical & machine learning models, either for research purposes or as a product for tech companies.

Business intelligence is concerned with optimizing how any company uses its data, identifying trends and issues in the business itself.

Educational projects provide data & AI literacy, making all of the above worthwhile for decision makers.

In this set of projects, which together make up my dissertation, I identify how the distribution of statistically generated data across graphical networks affects the reliability and accuracy of the theories that result from the data. This is done by using bandit problems to simulate scientific data generation, graph algorithms to distribute the data, and Bayesian learning to update theories.
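To make the bandit-plus-Bayesian-learning idea concrete, here is a minimal sketch of a single agent learning about a two-armed Bernoulli bandit. It is an illustration only, not the dissertation's code: the function names, the greedy choice rule, the uniform Beta(1, 1) prior, and the success rates are all assumptions, and the step where data is shared across a network is left out.

```python
import random

def update(belief, success):
    """Bayesian update of a Beta(alpha, beta) belief after one Bernoulli trial."""
    alpha, beta = belief
    return (alpha + 1, beta) if success else (alpha, beta + 1)

def posterior_mean(belief):
    """Expected success rate under a Beta(alpha, beta) belief."""
    alpha, beta = belief
    return alpha / (alpha + beta)

def run_agent(true_rates, rounds, rng):
    # Assumed starting point: a uniform Beta(1, 1) prior on each arm.
    beliefs = [(1, 1) for _ in true_rates]
    for _ in range(rounds):
        # Greedy choice: pull the arm currently believed to be best.
        arm = max(range(len(beliefs)), key=lambda i: posterior_mean(beliefs[i]))
        success = rng.random() < true_rates[arm]
        beliefs[arm] = update(beliefs[arm], success)
    return beliefs

# Hypothetical success rates for two competing "theories".
beliefs = run_agent(true_rates=[0.5, 0.6], rounds=200, rng=random.Random(0))
print([round(posterior_mean(b), 2) for b in beliefs])
```

In a networked version, each agent would also update on results shared by its neighbours, which is where the structure of the graph starts to matter.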

Hierarchical machine learning models are built from layers of models, which allow for incremental categorization of hierarchical class structures (e.g. predicting 'mammal' before 'dog'). Doing so speeds up performance by dividing up both the difficulty of the problem and the data that each sub-model needs to process. Additionally, it provides predictions at intermediate levels, thereby increasing insight and facilitating hyperparameter tuning.
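As a rough sketch of the layering, the snippet below routes an example through a coarse model first and then through the sub-model for the predicted coarse class. The rule-based "models" and feature names are hypothetical stand-ins for trained classifiers, chosen only to keep the example self-contained.

```python
def top_model(animal):
    # Hypothetical stand-in for a trained coarse classifier.
    return "mammal" if animal["has_fur"] else "bird"

# One hypothetical sub-model per coarse class; each only ever sees its own class.
sub_models = {
    "mammal": lambda animal: "dog" if animal["barks"] else "cat",
    "bird": lambda animal: "penguin" if animal["swims"] else "sparrow",
}

def predict_hierarchical(animal, top_model, sub_models):
    coarse = top_model(animal)         # intermediate prediction, e.g. 'mammal'
    fine = sub_models[coarse](animal)  # refined prediction, e.g. 'dog'
    return coarse, fine

example = {"has_fur": True, "barks": True, "swims": False}
print(predict_hierarchical(example, top_model, sub_models))
```

Returning both levels is what makes the intermediate predictions available for inspection and tuning.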

Often you need to know which causes have which effects; mere correlation is not enough. This means standard statistical approaches and machine learning methods aren't up to the task. Luckily, Bayesian networks come to the rescue! In this ad hoc analysis, I apply Bayesian networks to assess the causal structure of a supply chain, identifying bottlenecks and points of vulnerability to ensure a more robust system.

Coming soon!

Advanced statistical methods are known for two traits: 1) second-to-none power and versatility for analysis across essentially any domain, and 2) being seemingly locked behind a wall of complicated notation that takes years of experience to master. In this project, I explore how this challenge can be overcome with the design of an intuitive business intelligence dashboard. The goal here is not a particular piece of analysis, but providing easy access to techniques otherwise reserved for experts.

Leveraging the BallDontLie API, this dashboard provides up-to-date information on NBA games from the past 75 years. Through a careful orchestration of data slicers, the dashboard allows for all sorts of comparisons: compare Abdul-Jabbar's time with the Lakers against the stats of the entire Celtics team, or select your favorite combination of players to build your super team and compare it against alternatives.

k-means clustering is a classic way to explore segmenting customers by their tendencies, but it only works for numerical data (sales amounts, etc.). k-modes clustering allows for grouping data based on categorical features (location, etc.), but not numerical data. k-prototypes (k-proto) is a hybrid of the two. In this project, I apply k-proto clustering to customer segmentation, which in turn facilitates optimizing marketing and shipping strategies.
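The hybrid idea can be seen in the dissimilarity measure that k-prototypes clustering uses: squared Euclidean distance on the numerical features plus a weighted count of mismatches on the categorical ones. The sketch below is illustrative only; the function names, the customer features, and the weight `gamma=0.5` are assumptions, not values from the project.

```python
def mixed_dissimilarity(a_num, a_cat, b_num, b_cat, gamma=1.0):
    """Numeric part as in k-means, categorical part as in k-modes."""
    numeric = sum((x - y) ** 2 for x, y in zip(a_num, b_num))
    mismatches = sum(x != y for x, y in zip(a_cat, b_cat))
    return numeric + gamma * mismatches

def nearest_prototype(point, prototypes, gamma=1.0):
    """Index of the prototype closest to a (numeric, categorical) point."""
    num, cat = point
    return min(
        range(len(prototypes)),
        key=lambda i: mixed_dissimilarity(num, cat, *prototypes[i], gamma=gamma),
    )

# Hypothetical customer: (scaled spend, scaled order count), (region, channel).
customer = ([1.0, 2.0], ["north", "web"])
prototypes = [
    ([0.0, 0.0], ["north", "store"]),
    ([1.0, 1.5], ["south", "web"]),
]
print(mixed_dissimilarity(*customer, *prototypes[0], gamma=0.5))
print(nearest_prototype(customer, prototypes, gamma=0.5))
```

The weight `gamma` controls how much a categorical mismatch counts relative to numerical distance, so the numerical features should usually be scaled before it is chosen.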

Coming soon!

As data becomes more prevalent in everyone's lives, it's vital to know the basic concepts. However, many people have real trouble anytime math-talk pops up. I built this app to demonstrate three of the most important concepts in all of statistics and data analysis: mean, variance, and correlation. Seeing how these parameters determine the shape of data provides an intuitive, practical understanding that sidesteps technical jargon.
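For anyone curious about what sits behind the three concepts, here is a small sketch that computes each one by hand. The data is made up for illustration, and the variance shown is the population variance (dividing by the number of points), which is an assumption about which convention to use.

```python
from math import sqrt

def mean(xs):
    """The centre of the data."""
    return sum(xs) / len(xs)

def variance(xs):
    """Average squared distance from the mean (population variance)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def correlation(xs, ys):
    """How closely two variables move together, from -1 to 1."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / sqrt(variance(xs) * variance(ys))

# Made-up data: sales rise in lockstep with hours.
hours = [1, 2, 3, 4, 5]
sales = [2, 4, 6, 8, 10]
print(mean(hours), variance(hours), correlation(hours, sales))
```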

Probability theory is the cornerstone of data science and decision making, and the cornerstone of probability theory is Bayes' rule. Understanding Bayes' rule is key to understanding how to update one's strategies based on evidence. In this project, I provide a simple yet effective visualization of probability theory to show how Bayes' rule works without the formulas.
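For readers who do want to see the arithmetic, the sketch below works through one update with Bayes' rule. The scenario and all three numbers (a 1% prior, a 90% hit rate, a 5% false alarm rate) are hypothetical and chosen only for illustration.

```python
def bayes_update(prior, hit_rate, false_alarm_rate):
    """Probability the hypothesis is true after seeing positive evidence."""
    true_and_evidence = prior * hit_rate
    false_and_evidence = (1 - prior) * false_alarm_rate
    return true_and_evidence / (true_and_evidence + false_and_evidence)

# Hypothetical numbers: a rare problem and a fairly reliable alert.
posterior = bayes_update(prior=0.01, hit_rate=0.90, false_alarm_rate=0.05)
print(round(posterior, 3))
```

Even with a fairly reliable alert, the updated probability stays modest here because the problem was so unlikely to begin with, which is the kind of effect a visualization makes easy to see.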

Clustering is a family of unsupervised machine learning methods that partition data. The point is to find naturally occurring subsets of the data, to carve it at its joints, you might say. In this project, I discuss the differences between three clustering methods: k-means, Gaussian mixture modelling, and density-based clustering (DBSCAN). This includes an app that allows you to compare the clusterings in real time!


What is Decision Science?


Decision Science ⊃ Data Science ⊃ Machine Learning
(Read '⊃' as 'includes'.)


Machine learning is one of those concepts that's hard to define, but one crucial aspect is that it deals with helping machines find more insights in data, often using neural networks. A machine learning engineer is concerned with improving a computer's abilities.

Data science includes machine learning as well as other statistical techniques (which don't always involve machine learning) to improve our abilities. A data scientist is concerned with helping people find more insights in data, and using more insightful machines is just one way to do it.

Decision science is the combination of data science with the science of making decisions, called decision theory. While data science is largely passive and focuses on observing, decision science is active and focuses on what to do. As a decision scientist, I am concerned with helping people make better decisions, and I use decision theory along with statistical and machine learning techniques to do so.
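One standard tool from decision theory is choosing the action with the highest expected utility. The sketch below shows the idea under stated assumptions: the actions, the demand probabilities, and the payoffs are all hypothetical, with the probabilities standing in for what a data science model might estimate.

```python
def expected_utility(action, probs, utility):
    """Average payoff of an action, weighted by how likely each state is."""
    return sum(p * utility[action][state] for state, p in probs.items())

# Hypothetical estimates of next month's demand (the "observing" part).
probs = {"high_demand": 0.3, "low_demand": 0.7}

# Hypothetical payoffs for each action in each state (the "what to do" part).
utility = {
    "stock_more": {"high_demand": 100, "low_demand": -20},
    "stock_less": {"high_demand": 40, "low_demand": 10},
}

best = max(utility, key=lambda a: expected_utility(a, probs, utility))
for action in utility:
    print(action, round(expected_utility(action, probs, utility), 2))
print("best:", best)
```

Here the cautious option wins because low demand is more likely; shift the probabilities toward high demand and the recommendation changes, which is how better data feeds into better decisions.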