Abstract

The purpose of this experiment was to simulate and analyze random graphs synthesized to model the statistical properties of datasets for computer diagnostics. The procedure began with the creation of a naive model. After this preliminary model was designed, the properties of the model and of the datasets were analyzed, and the findings were recorded. The relationship between the mean number of symptoms per diagnosis (mSpD) and the mean number of diagnoses per symptom (mDpS) was then determined with the bipartite proof. The proof showed that the two averages are linearly related: mDpS = (M/N) × mSpD, in which M is the number of diagnoses and N is the number of symptoms. The bipartite proof therefore supported the hypothesis that mSpD and mDpS depend upon each other. Afterwards, the naive model was modified to reflect the properties of the datasets, and the properties of this updated model were analyzed. Next, six sets of Kolmogorov-Smirnov (KS) tests were run to compare the distributions of the models and the datasets. KS test 5 indicated that the SpD distributions of datasets 1 and 2 (but not 3 or 4) come from Poisson distributions, and thus partially supported the second hypothesis. Finally, the skewness statistic was calculated for each dataset. All of the values were positive, which supported the hypothesis that the number of diagnoses per symptom comes from a unimodal distribution that is skewed to the right.
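The linear relationship between the two averages can be checked numerically. The following is a minimal sketch, not the model used in the experiment: it assumes a hypothetical random bipartite graph in which each diagnosis-symptom pair is linked independently with a fixed probability. The function names, the edge probability, and the graph sizes are all illustrative assumptions. Because every edge contributes one symptom to a diagnosis and one diagnosis to a symptom, both averages share the same edge total, so mDpS = (M/N) × mSpD holds for any such graph.

```python
import random
from collections import Counter


def random_bipartite_edges(num_diagnoses, num_symptoms, edge_prob, seed=0):
    """Return (diagnosis, symptom) pairs for a hypothetical random bipartite graph.

    Assumption: each possible pair is linked independently with probability
    edge_prob. This is an illustrative stand-in, not the experiment's model.
    """
    rng = random.Random(seed)
    return [
        (d, s)
        for d in range(num_diagnoses)
        for s in range(num_symptoms)
        if rng.random() < edge_prob
    ]


def mean_degrees(edges, num_diagnoses, num_symptoms):
    """Compute (mSpD, mDpS) from per-node degree counts."""
    symptoms_per_diagnosis = Counter(d for d, _ in edges)
    diagnoses_per_symptom = Counter(s for _, s in edges)
    # Nodes with no edges count as degree 0, so divide by the full node counts.
    m_spd = sum(symptoms_per_diagnosis.values()) / num_diagnoses
    m_dps = sum(diagnoses_per_symptom.values()) / num_symptoms
    return m_spd, m_dps


if __name__ == "__main__":
    M, N = 40, 25  # hypothetical numbers of diagnoses and symptoms
    edges = random_bipartite_edges(M, N, edge_prob=0.2, seed=1)
    m_spd, m_dps = mean_degrees(edges, M, N)
    print("mSpD:", m_spd)
    print("mDpS:", m_dps)
    print("(M/N) * mSpD:", (M / N) * m_spd)
```

The last two printed values agree up to floating-point rounding regardless of the seed or edge probability, which is the content of the bipartite proof: the identity is a counting fact about edges, not a property of any particular dataset.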

Presenter Information

Carla del Río is a high school freshman at American Heritage School in Plantation, Florida.
