Case Study

With a working understanding of tools like the pandas Python package and Mathematica, Cornell engineers can play each tool to its strengths. Here is a simple case that uses both pandas and Mathematica to find the probability distribution that best fits a messy dataset.

The Dataset

In certain medical emergencies, helicopters (medi-vacs) are dispatched to air-lift the patient to a hospital for urgent medical care. The dataset medivac_data.csv contains simulated (but still realistic) calls to a medi-vac dispatch center. For this case study, we will look at the "Scene time" column in the dataset, which tells us how long each helicopter spent at the patient's scene for a given call. (As we will see, if this field is 0, that means no helicopter was dispatched to the scene.)

A quick look at the CSV file shows over 16,000 data points, so we will use pandas, which handles datasets of this size quickly, to remove the rows with a scene time of 0 and the rows with outlier scene times. After this, the goal of the case study will be to fit a probability distribution to the cleaned scene times.

This dataset and case study were adapted from a project in the Cornell course ORIE 4580 in Fall 2021. Special thanks to Professor Shane Henderson for letting us use his dataset!

Step 1: Clean the data

To get a trustworthy probability distribution fit, we first need to drop the rows with a scene time of 0 and remove the outliers. The best tool for this is the pandas Python package, which provides functions designed for these data-cleaning tasks. Follow the guide below to see how we isolate the time that the medi-vac spends in the field:

By the end of this Python tutorial, you should have a file that looks like scene_time.csv.
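The cleaning steps described in this section can be sketched roughly as follows. This is a minimal illustration, not the tutorial's own code: the helper name clean_scene_times, the 1.5 × IQR outlier rule, and the tiny made-up sample are all assumptions, since the text does not say which outlier criterion the tutorial uses.

```python
import pandas as pd


def clean_scene_times(df: pd.DataFrame, column: str = "Scene time", k: float = 1.5) -> pd.Series:
    """Return the scene times with zero entries and IQR outliers removed.

    Hypothetical helper: the k * IQR outlier rule is an assumed choice.
    """
    # Coerce to numbers and discard anything that cannot be parsed.
    times = pd.to_numeric(df[column], errors="coerce").dropna()
    # A scene time of 0 means no helicopter was dispatched, so drop those rows.
    times = times[times > 0]
    # Assumed outlier rule: keep values within k * IQR of the quartiles.
    q1, q3 = times.quantile(0.25), times.quantile(0.75)
    iqr = q3 - q1
    return times[times.between(q1 - k * iqr, q3 + k * iqr)]


# Made-up sample standing in for medivac_data.csv (values are illustrative only).
sample = pd.DataFrame(
    {"Scene time": [0, 12.5, 15.0, 0, 14.2, 13.1, 16.4, 240.0, 11.8]}
)
cleaned = clean_scene_times(sample)
print(cleaned.tolist())
```

With the real file, the same helper could be applied to pd.read_csv("medivac_data.csv"), and the result written out with cleaned.to_csv("scene_time.csv", index=False) for the Mathematica step.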

Step 2: Import into Mathematica and apply fits

Now, we switch to a tool that is much better suited to fitting data to complex distributions. While we know that this data will fit a known probability distribution (the beta or gamma distribution), Mathematica would also let us symbolically define a new probability function for less well-behaved datasets.
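For reference, the two candidate families named above have the standard densities below. The shape and scale parameterization shown for the gamma is an assumed convention (some sources use a rate parameter instead), and the beta density lives on the interval (0, 1), so scene times would need to be rescaled before a beta fit.

```latex
% Gamma density with shape \alpha > 0 and scale \beta > 0, for x > 0
\[
f_{\mathrm{gamma}}(x;\alpha,\beta)
  = \frac{x^{\alpha-1}\, e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}
\]

% Beta density with shape parameters \alpha, \beta > 0, for 0 < x < 1
\[
f_{\mathrm{beta}}(x;\alpha,\beta)
  = \frac{x^{\alpha-1}\,(1-x)^{\beta-1}}{B(\alpha,\beta)}
\]
```

Fitting then amounts to choosing the parameters of each family that best match the cleaned scene times and comparing the two fits.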


Follow the Mathematica tutorial below (or run it on your own machine) to see why Mathematica is the superior tool for this task.

fitting-medi-vac-dur-times.pdf