Projects

My projects are divided into three categories:

Model development is about designing statistical & machine learning models, either for research purposes or as a product for tech companies.

Business intelligence is concerned with optimizing how any company uses its data, identifying trends and issues in the business itself.

Educational projects provide data & AI literacy, making all of the above worthwhile for decision makers.

Model Development Projects

I use novel approaches to explore new modeling techniques in several directions: scientific networks, hierarchical prediction, and causal inference.

How Variance in Data Affects Reliability of Predictions in Scientific Networks
[PhD Dissertation]

In this set of projects, which together make up my dissertation, I identify how the distribution of statistically generated data across graphical networks affects the reliability and accuracy of the theories that result from the data. I do this by using bandit problems to simulate scientific data generation, graph algorithms to distribute the data, and Bayesian learning to update theories.
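The three ingredients named above can be sketched in a few lines. This is only a toy illustration, not the dissertation's actual model: the Beta-Bernoulli update, the greedy arm choice, the function names, and the success rates are all assumptions made for the example.

```python
import random


def update_beta(alpha, beta, successes, failures):
    """Conjugate Beta-Bernoulli update: add the observed counts to the prior."""
    return alpha + successes, beta + failures


def posterior_mean(alpha, beta):
    """Expected success rate under a Beta(alpha, beta) belief."""
    return alpha / (alpha + beta)


def simulate(neighbors, true_rates, rounds=200, pulls=10, seed=0):
    """Run a toy network of agents learning about a Bernoulli bandit.

    neighbors: dict mapping each agent to the agents whose data it sees.
    true_rates: hypothetical success probability of each arm (each "theory").
    """
    rng = random.Random(seed)
    arms = range(len(true_rates))
    # Every agent starts with a flat Beta(1, 1) belief about every arm.
    beliefs = {agent: [(1.0, 1.0) for _ in arms] for agent in neighbors}
    for _ in range(rounds):
        results = {}
        for agent in neighbors:
            # Greedy choice: test the arm the agent currently thinks is best.
            arm = max(arms, key=lambda k: posterior_mean(*beliefs[agent][k]))
            wins = sum(rng.random() < true_rates[arm] for _ in range(pulls))
            results[agent] = (arm, wins, pulls - wins)
        for agent, linked in neighbors.items():
            # Each agent updates on its own data plus its neighbors' data.
            for source in (agent, *linked):
                arm, wins, losses = results[source]
                beliefs[agent][arm] = update_beta(*beliefs[agent][arm], wins, losses)
    return beliefs


# A complete network of three agents and two arms with similar success rates.
complete = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
final_beliefs = simulate(complete, true_rates=[0.5, 0.55])
```

Changing the `neighbors` dictionary (for example, to a ring or a star) changes how data is distributed while leaving the data generation and the updating rule untouched.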

Hierarchical Models for Student Performance Predictions

Hierarchical machine learning models consist of layers of models. These allow for incremental categorization of hierarchical class structures (e.g., predicting 'mammal' before 'dog'). Doing so speeds up performance by dividing both the difficulty of the problem and the data that each sub-model needs to process. Additionally, it provides predictions at intermediate levels, thereby increasing insight and facilitating hyperparameter tuning.
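A minimal sketch of the layered idea, assuming a two-level hierarchy and a simple nearest-centroid model at each level. The class names, the toy points, and the choice of sub-model are hypothetical and are not taken from the project itself.

```python
from collections import defaultdict
from math import dist


class NearestCentroid:
    """Tiny classifier: predict the label whose mean point is closest."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict_one(self, row):
        return min(self.centroids, key=lambda label: dist(row, self.centroids[label]))


class TwoLevelClassifier:
    """Predict a coarse class first, then a fine class within it."""

    def fit(self, X, coarse, fine):
        self.top = NearestCentroid().fit(X, coarse)
        self.sub = {}
        for group in set(coarse):
            # Each sub-model only sees the rows that belong to its coarse class.
            idx = [i for i, label in enumerate(coarse) if label == group]
            self.sub[group] = NearestCentroid().fit(
                [X[i] for i in idx], [fine[i] for i in idx]
            )
        return self

    def predict_one(self, row):
        group = self.top.predict_one(row)
        return group, self.sub[group].predict_one(row)


# Toy data: two coarse classes, each with two fine classes.
X = [[0, 0], [0, 1], [2, 0], [2, 1], [10, 10], [10, 11], [13, 10], [13, 11]]
coarse = ["mammal"] * 4 + ["bird"] * 4
fine = ["dog", "dog", "cat", "cat", "sparrow", "sparrow", "eagle", "eagle"]
model = TwoLevelClassifier().fit(X, coarse, fine)
```

Calling `model.predict_one([0.2, 0.4])` returns both the intermediate and the final label, which is what makes the intermediate level available for inspection.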

Causal Bayesian Networks for Supply Chains

Often you need to know which causes have which effects; mere correlation is not enough. This means standard statistical approaches and machine learning methods aren't up to the task on their own. Luckily, Bayesian networks come to the rescue! In this ad hoc analysis, I apply Bayesian networks to assess the causal structure of a supply chain, identifying bottlenecks and points of vulnerability to ensure a more robust system.
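To make the gap between correlation and causation concrete, here is a small sketch that contrasts observing a supplier delay with forcing one. The three-node network, its variable names, and every probability are made up for illustration; the interventional quantity is computed with the standard backdoor adjustment over the common cause.

```python
# Hypothetical network: congestion -> supplier_delay, congestion -> late_delivery,
# and supplier_delay -> late_delivery. All numbers are invented for the example.
P_CONGESTION = 0.3
P_DELAY_GIVEN_CONGESTION = {True: 0.8, False: 0.2}
# P(late delivery | supplier_delay, congestion)
P_LATE = {
    (True, True): 0.7,
    (True, False): 0.3,
    (False, True): 0.5,
    (False, False): 0.1,
}


def p_congestion(value):
    """Marginal probability that congestion takes the given value."""
    return P_CONGESTION if value else 1 - P_CONGESTION


def p_late_given_delay_observed():
    """P(late | delay seen): conditioning also shifts belief about congestion."""
    joint_delay = 0.0
    joint_late = 0.0
    for congestion in (True, False):
        p_both = p_congestion(congestion) * P_DELAY_GIVEN_CONGESTION[congestion]
        joint_delay += p_both
        joint_late += p_both * P_LATE[(True, congestion)]
    return joint_late / joint_delay


def p_late_given_delay_forced():
    """P(late | do(delay)): congestion keeps its original distribution."""
    return sum(p_congestion(c) * P_LATE[(True, c)] for c in (True, False))


observed = p_late_given_delay_observed()
forced = p_late_given_delay_forced()
```

With these numbers the observed probability is higher than the forced one, because seeing a delay is also evidence of congestion, which independently makes deliveries late.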

Coming soon!

Business Intelligence Dashboards

I leverage Power BI and Tableau to solve specific business problems and recommend actionable solutions for measurable success.

Dashboard for Customer Clusters and Sales Forecasts

Advanced statistical methods are known for two traits: 1) second-to-none power and versatility for analysis across essentially any domain, and 2) a wall of complicated notation that seemingly takes years of experience to master. In this project, I explore how this challenge can be overcome with the design of an intuitive business intelligence dashboard. The goal here is not a particular piece of analysis but easy access to techniques otherwise reserved for experts.

NBA Statistics Dashboard

Leveraging the BallDontLie API, this dashboard provides up-to-date information on NBA games for the past 75 years. Through a careful orchestration of data slicers, the dashboard allows for all sorts of comparisons: compare Abdul-Jabbar's time with the Lakers against the stats of the entire Celtics team, or select your favorite combination of players to build your super team and compare it against alternatives.

Customer Segmentation via k-means, k-modes, & k-proto Clustering Methods

k-means clustering is a classic way to segment customers by their tendencies, but it only works for numerical data (sales amounts, etc.). k-modes clustering groups data based on categorical features (location, etc.), but it cannot handle numerical data. k-prototypes (k-proto) is a hybrid of the two. In this project, I apply k-proto clustering to customer segmentation, which in turn facilitates optimizing marketing and shipping strategies.
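The hybrid can be seen in the distance measure itself. Below is a minimal sketch of the mixed dissimilarity commonly used by k-prototypes: squared Euclidean distance on the numerical features plus a weight, here called `gamma`, times the number of categorical mismatches. The function names, the toy customer, and the value of `gamma` are assumptions for illustration, not the project's code.

```python
def mixed_dissimilarity(numeric_a, categorical_a, numeric_b, categorical_b, gamma):
    """k-means-style distance on numbers plus k-modes-style mismatch count."""
    numeric_part = sum((x - y) ** 2 for x, y in zip(numeric_a, numeric_b))
    categorical_part = sum(x != y for x, y in zip(categorical_a, categorical_b))
    return numeric_part + gamma * categorical_part


def nearest_prototype(customer, prototypes, gamma):
    """Index of the prototype with the smallest mixed dissimilarity."""
    numeric, categorical = customer

    def distance(index):
        proto_numeric, proto_categorical = prototypes[index]
        return mixed_dissimilarity(
            numeric, categorical, proto_numeric, proto_categorical, gamma
        )

    return min(range(len(prototypes)), key=distance)


# Hypothetical customer: (scaled sales features, (region, channel)).
customer = ((1.0, 2.0), ("north", "retail"))
prototypes = [
    ((2.0, 4.0), ("north", "wholesale")),
    ((1.0, 2.5), ("south", "wholesale")),
]
assigned = nearest_prototype(customer, prototypes, gamma=0.5)
```

The weight `gamma` controls how much a categorical mismatch counts relative to numerical distance, so numerical features are usually scaled before clustering.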

Coming soon!

Educational Projects

I break down seemingly daunting concepts in intuitive ways. These projects include full write-ups of concepts like variance and clustering as well as interactive apps that you can use right in your browser! They are valuable for both data analysts and business decision makers who want a strong grasp of the basics.

Visualizing 2D Normal Distributions

As data becomes more prevalent in everyone's lives, it's vital to know the basic concepts. However, many people have real trouble any time math-talk pops up. I built this app to demonstrate three of the most important concepts in all of statistics and data analysis: mean, variance, and correlation. Seeing how these parameters determine the shape of data provides an intuitive, practical understanding that sidesteps technical jargon.
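For readers who do want to see the mechanics, here is a minimal sketch of how the three parameters generate a 2D normal cloud. It is not the app's code; the function names and parameter values are assumptions, and only the Python standard library is used.

```python
import math
import random


def covariance_matrix(std_x, std_y, correlation):
    """Variances on the diagonal, correlation-scaled covariance off it."""
    cov = correlation * std_x * std_y
    return [[std_x ** 2, cov], [cov, std_y ** 2]]


def sample_bivariate_normal(mean, std_x, std_y, correlation, n, seed=0):
    """Draw n points by mixing two independent standard normal draws."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mean[0] + std_x * z1
        # y shares z1 with x in proportion to the correlation.
        y = mean[1] + std_y * (correlation * z1 + math.sqrt(1 - correlation ** 2) * z2)
        points.append((x, y))
    return points


def sample_correlation(points):
    """Pearson correlation of the sampled points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)


points = sample_bivariate_normal((1.0, -2.0), std_x=2.0, std_y=0.5, correlation=0.8, n=20000)
```

The mean shifts the cloud, the two standard deviations stretch it along each axis, and the correlation tilts it; `sample_correlation(points)` should land close to the value passed in.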

Visualizing Conditional Probability & Bayes' Rule

Probability theory is the cornerstone of data science and decision making, and the cornerstone of probability theory is Bayes' Rule. Understanding Bayes' Rule is key to understanding how to update one's strategies based on evidence. In this project, I provide a simple yet effective visualization of probability theory to show how Bayes' Rule works without the formulas.
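As a companion to the visual approach, the same update can be sketched in code in two ways: once with the formula and once by simply counting cases. The function names and the screening-test numbers are hypothetical and are not taken from the project.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(hypothesis | evidence) from the prior and the two evidence rates."""
    evidence = prior * likelihood + (1 - prior) * false_positive_rate
    return prior * likelihood / evidence


def posterior_by_counting(population, prior, likelihood, false_positive_rate):
    """The same answer via head counts instead of a formula."""
    with_hypothesis = population * prior
    true_positives = with_hypothesis * likelihood
    false_positives = (population - with_hypothesis) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Hypothetical screening test: 1% base rate, 90% hit rate, 5% false alarms.
by_formula = bayes_posterior(0.01, 0.9, 0.05)
by_counts = posterior_by_counting(10_000, 0.01, 0.9, 0.05)
```

In the counting version, 100 of 10,000 people have the condition and 90 of them test positive, while 495 of the remaining 9,900 test positive by mistake, so only 90 of the 585 positives are real.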

Comparing Three Methods of Data Clustering: K-Means, GMM, & DBSCAN

Clustering is a family of unsupervised machine learning methods that partition data. The point is to find naturally occurring subsets of the data, to carve it at its joints, you might say. In this project, I discuss the differences between three clustering methods: K-means, Gaussian Mixture Modelling (GMM), and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). This includes an app that allows you to compare the clustering in real time!

What is Decision Science?

Decision Science ⊃ Data Science ⊃ Machine Learning
(Read '⊃' as 'includes'.)

Machine learning is one of those concepts that's hard to define, but at least one crucial aspect is that it deals with helping machines find more insights about data, often using neural networks. A machine learning engineer is concerned with improving a computer's abilities.

Data science includes machine learning as well as other statistical techniques (which don't always involve machine learning) to improve our abilities. A data scientist is concerned with helping people find more insights in data, and using more insightful machines is just one way to do it.

Decision science is the combination of data science with the science of making decisions, called decision theory. While data science is largely passive and focuses on observing, decision science is active and focuses on what to do. As a decision scientist, I am concerned with helping people make better decisions, and I use decision theory along with statistical and machine learning techniques to do so.
