Dataset.
Analysis Based on the Dataset.
Interpretations.
Age
Gender
BMI
Children
Smoker
Region
Medical Cost
Associations.
Between 'Smoker' & 'Gender'.
Between 'BMI' & 'Medical Cost'.
This dataset contains detailed information about medical costs for individuals over the period from 2010 to 2020. It includes attributes such as age, gender, BMI, number of children, smoking status, and region. This is a continuation of the analysis of the same dataset discussed in Activities 1 and 2.
The above-mentioned variable is represented as a histogram, which makes it easier to distinguish the different BMI classes of individuals in the dataset. Because it is a numerical variable, measures of central tendency and dispersion can be calculated for it.
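As a minimal sketch of what calculating these measures involves, the snippet below computes the mean, median, mode, range, variance, and standard deviation for a short list of numbers. The BMI values and the helper name `summarize` are hypothetical and chosen only for illustration; they are not taken from the dataset. The sample (n - 1) form of variance is assumed.

```python
import statistics

# Hypothetical BMI values for illustration only; not taken from the dataset.
bmi_values = [22.5, 27.1, 31.4, 24.8, 27.1, 35.0, 19.9]


def summarize(values):
    """Return measures of central tendency and dispersion for a numeric list."""
    return {
        # Central tendency
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Dispersion
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
    }


summary = summarize(bmi_values)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The same helper would apply unchanged to any other numerical variable, such as number of children or medical cost.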
The above-mentioned variable is represented as a bar graph, which makes it easier to distinguish how many children the individuals in the dataset have. Because it is a numerical variable, measures of central tendency and dispersion can be calculated for it.
From the bar chart, the longest bar marks the mode, which is "No". From the pie chart, the widest slice also marks the mode, again "No", since both charts represent the same data.
From the pie chart, the widest slice marks the mode, which is "Northeast".
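Reading the mode off the longest bar or the widest slice amounts to counting how often each category occurs and taking the largest count. A minimal sketch of that check follows. The responses in `smoker` and the helper name `category_mode` are hypothetical, not taken from the dataset.

```python
from collections import Counter

# Hypothetical responses for illustration only; not taken from the dataset.
smoker = ["No", "Yes", "No", "No", "Yes", "No"]


def category_mode(values):
    """Return the most frequent category, its count, and all counts."""
    counts = Counter(values)
    label, count = counts.most_common(1)[0]
    return label, count, counts


label, count, counts = category_mode(smoker)
print(f"Mode: {label} ({count} of {len(smoker)} individuals)")
```

The full `counts` mapping holds the bar heights of the bar chart, and each count divided by the total gives the corresponding pie-slice share.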
The above-mentioned variable is represented as a histogram, which makes it easier to see which class of expenditure on medical bills an individual in this dataset belongs to. Because it is a numerical variable, measures of central tendency and dispersion can be calculated for it.
(The calculation of the covariance and correlation coefficient is in the Dataset document.)
From the stacked bar chart, it can be concluded that there is no association between the variables, as a change in one variable is not reflected by a change in the other. Whether an individual is a smoker has no direct relation to the individual's gender.
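One way to back up the visual reading of a stacked bar chart is to compare the share of smokers within each gender: if the shares are about equal, the two variables show no association. The sketch below assumes hypothetical (gender, smoker) pairs and a hypothetical helper `smoker_share_by_gender`; none of the values come from the dataset.

```python
from collections import Counter

# Hypothetical (gender, smoker) pairs for illustration only.
records = [
    ("Male", "Yes"), ("Male", "No"), ("Male", "No"), ("Male", "No"),
    ("Female", "Yes"), ("Female", "No"), ("Female", "No"), ("Female", "No"),
]


def smoker_share_by_gender(pairs):
    """Return the proportion of smokers within each gender."""
    totals = Counter(gender for gender, _ in pairs)
    smokers = Counter(gender for gender, status in pairs if status == "Yes")
    return {gender: smokers[gender] / totals[gender] for gender in totals}


shares = smoker_share_by_gender(records)
for gender, share in shares.items():
    print(f"{gender}: {share:.0%} smokers")
```

In this made-up example both genders have the same share of smokers, which is the pattern a stacked bar chart with equally proportioned segments would show.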
According to the spreadsheet below, there is no correlation between the variables BMI and Medical Cost, as indicated by the covariance and correlation coefficient values of 400.108 and 0.00921 respectively. Because the correlation coefficient is very close to zero, it can be inferred that there is no linear relationship between the variables.
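The two statistics quoted above can be sketched as follows. This is an illustrative implementation using the sample (n - 1) form of covariance; whether the spreadsheet used the sample or the population form is not stated, so that choice is an assumption. The `bmi` and `cost` values are hypothetical and do not reproduce the reported figures.

```python
import math


def covariance(xs, ys):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation coefficient: covariance scaled by both std devs."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical values for illustration only; not taken from the dataset.
bmi = [20.0, 25.0, 30.0, 35.0]
cost = [3000.0, 1000.0, 4000.0, 2100.0]

cov = covariance(bmi, cost)
r = correlation(bmi, cost)
print(f"covariance = {cov:.3f}, correlation = {r:.5f}")
```

Covariance depends on the units of both variables, so a value in the hundreds can still correspond to a negligible relationship. The correlation coefficient is unit-free and always lies between -1 and 1, which is why it is the value used to judge the strength of the association.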
From the scatter plot, it is evident that there is no association between BMI and Medical Cost, because the points are widely scattered with large gaps between them. Although one might expect a relationship between the two, the current data show no association between them. All age groups need medical facilities.