Investigate a Dataset - NoShowAppointment

No-Show Appointment Project

The goal of this project is to investigate a dataset of medical appointment records from public hospitals in Brazil. The data includes several patient attributes and records whether each patient showed up to their appointment.


The analysis focuses on finding trends that influence whether patients show up to their appointments. Descriptive statistics and appropriate visualizations were used to showcase relationships, and the research questions were answered using Python libraries such as pandas, Plotly, datetime, and NumPy.

The following observations were documented after the analysis was completed and all questions had been attempted:

The medical appointment show/no-show data was used to answer a number of questions.

We were able to explore the data set to understand all its features and we went further to wrangle and clean up the data in line with what we intended to explore with the data.

Of the patients in the data, 80% showed up for their appointment while 20% did not. See chart for visuals. There could be several reasons for these no-shows, some of which we attempt to identify through our research questions.
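The show/no-show split can be computed with a normalized value count. The following is a minimal sketch, not the project's actual code: the column name `No-show` and its `Yes`/`No` values are assumptions, and the five rows are made up purely for illustration.

```python
import pandas as pd

# Tiny made-up sample, NOT the real dataset.
# Assumption: a "No-show" column where "No" means the patient showed up
# and "Yes" means the patient missed the appointment.
df = pd.DataFrame({"No-show": ["No", "No", "No", "No", "Yes"]})

# Share of each outcome, expressed as a percentage of all appointments.
show_share = df["No-show"].value_counts(normalize=True) * 100
showed_pct = show_share["No"]
missed_pct = show_share["Yes"]

print(f"showed up: {showed_pct:.0f}%, did not show: {missed_pct:.0f}%")
```

On the real data, the same two lines would be applied to the cleaned DataFrame rather than to a hand-built sample.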

A number of research questions were considered in our analysis, and my observations are as follows:

For the first research question, more females than males booked appointments. See figure for visuals. The data does not state the reasons for this. However, comparing the gender information with the ailments recorded in the data might give more insight into this result.

For the second research question, the data wrangling and analysis showed that a large share of the awaiting-days values, about 25%, were 0 days. This was a major limitation, as that share is substantial relative to the whole analysis. I decided to work with this figure on the assumption that these patients booked their appointment on the same day as the appointment itself. However, that assumption can only be accurate if all of the patients in this category showed up for their appointments. Checking this requires further analysis, which I did not carry out.
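One way to derive awaiting days is to subtract the booking date from the appointment date after dropping the time of day. This is a hedged sketch only: the column names `ScheduledDay` and `AppointmentDay`, the derived `awaiting_days` column, and the three sample rows are assumptions for illustration and may not match the project's wrangling code.

```python
import pandas as pd

# Made-up rows for illustration; column names are assumptions.
df = pd.DataFrame({
    "ScheduledDay": [
        "2016-04-29T08:02:16Z",
        "2016-04-27T15:05:12Z",
        "2016-04-29T16:19:04Z",
    ],
    "AppointmentDay": [
        "2016-04-29T00:00:00Z",
        "2016-05-03T00:00:00Z",
        "2016-04-29T00:00:00Z",
    ],
})

# Parse the timestamps and reset them to midnight, so that an appointment
# booked at any hour of the appointment day yields 0 awaiting days.
scheduled = pd.to_datetime(df["ScheduledDay"]).dt.normalize()
appointment = pd.to_datetime(df["AppointmentDay"]).dt.normalize()
df["awaiting_days"] = (appointment - scheduled).dt.days

# Percentage of appointments booked for the same day.
same_day_share = (df["awaiting_days"] == 0).mean() * 100
print(df["awaiting_days"].tolist(), round(same_day_share, 1))
```

Normalizing both timestamps first matters: without it, a same-day booking made in the afternoon would produce a negative difference against a midnight appointment timestamp.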

Furthermore, in our analysis no single feature stood out as having an effect on whether patients showed up for their appointments. More analysis of each feature would be needed to gain enough insight to answer that question.

Finally, for the fourth research question, no relationship was found between patients' age and whether they had a scholarship. More data would be required to establish this, such as when each patient obtained the scholarship and the patient's age at that time.
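A simple way to look for such a relationship is to compare the age distribution of the two groups and compute a correlation. The sketch below assumes an `Age` column and a binary `Scholarship` column (1 = has a scholarship); both names and the sample values are illustrative assumptions, not the project's data.

```python
import pandas as pd

# Made-up sample for illustration; column names are assumptions.
df = pd.DataFrame({
    "Age": [5, 23, 41, 60, 34, 18],
    "Scholarship": [0, 1, 0, 0, 1, 0],
})

# Summary of age for patients without (0) and with (1) a scholarship.
age_by_scholarship = df.groupby("Scholarship")["Age"].agg(["count", "mean", "median"])

# Pearson correlation between age and the binary scholarship flag.
# Values close to 0 suggest no linear relationship.
corr = df["Age"].corr(df["Scholarship"])

print(age_by_scholarship)
print(f"correlation: {corr:.2f}")
```

Similar group means together with a correlation near zero would be consistent with the finding above, although neither rules out a non-linear relationship.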

For the full project documentation, see GitHub or Kaggle.