Week 0 : Activity 5
| Fitting a distribution (Continuous)
WEEK - 0
Extra Activity 5
Directive :
(i) Collect a dataset that follows a continuous distribution or can be approximated by a continuous distribution.
(ii) Check if your data fits some distribution.
(iii) Find the probability density function or cumulative distribution function of the data.
(iv) Check if the actual probability matches the one you have calculated using the distribution.
(v) If not, can you think of some other distribution which is a better fit?
INTRODUCTION
The data was collected from data.world; the dataset contains details about various comic book heroes.
It provides a structured collection of attributes for various superheroes, including their name, demographics (gender, race), physical characteristics (height, weight, hair/eye color, skin color), publisher affiliation (Marvel, DC Comics, etc.), and even their moral alignment (hero or villain).
Because each hero is described by the same set of attributes, the dataset is well suited to data analysis and visualization tasks.
Model: Height of heroes
Total heroes = 107
A random variable X is defined as follows :
X = Height of the hero
PLEASE REFER TO THE GOOGLE DOC FOR DETAILED ANALYSIS
SUMMARY :
This doc analyzes the heights of comic book heroes from a dataset containing various hero attributes. Here are the key takeaways:
Data Description: The data includes height information for 107 superheroes, measured in centimeters.
Distribution: The height distribution appears to be approximately normal based on the histogram's bell shape, symmetry, and tapering tails. However, there's a slight positive skew towards taller heights.
Normal Distribution Model (Claim 1): The report argues that the height data can be modeled by a normal distribution due to the characteristics mentioned above. Calculations for mean (177 cm) and standard deviation (30.35 cm) are provided.
Validation: The report compares the empirical cumulative distribution function (CDF) values (calculated from data points) with the theoretical CDF values from a normal distribution for specific heights. The close match suggests the data aligns well with a normal distribution.
Triangular Distribution Model (Claim 2): The report explores the possibility of modeling the data with a triangular distribution. The mode (183 cm) coincides with the median, which is what a symmetric triangular distribution would show, and the mean falls between the minimum and maximum values. However, the empirical CDF values deviate significantly from the theoretical ones. This suggests the data does not follow a triangular distribution over the observed range.
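The comparison described in the Validation and Triangular Distribution items can be sketched in a few lines of Python. This is only an illustrative sketch, not the calculation from the Google Doc: the `heights` list holds placeholder values rather than the 107 real observations, the helper names `empirical_cdf` and `triangular_cdf` are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the report does not say which form it used.

```python
from statistics import NormalDist, mean, mode, stdev


def empirical_cdf(data, x):
    """Fraction of observations less than or equal to x."""
    return sum(1 for v in data if v <= x) / len(data)


def triangular_cdf(x, a, c, b):
    """CDF of a triangular distribution with minimum a, mode c, maximum b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    if x <= c:
        return (x - a) ** 2 / ((b - a) * (c - a))
    return 1.0 - (b - x) ** 2 / ((b - a) * (b - c))


# Placeholder heights in cm (NOT the real dataset); substitute the
# 107 observed hero heights to reproduce the report's comparison.
heights = [150.0, 165.0, 170.0, 175.0, 178.0, 180.0, 183.0,
           183.0, 185.0, 188.0, 191.0, 198.0, 210.0]

# Normal model: parameters estimated from the data.
# Assumption: sample standard deviation (n - 1 denominator).
normal = NormalDist(mean(heights), stdev(heights))

# Triangular model: minimum, mode, and maximum taken from the data.
a, c, b = min(heights), mode(heights), max(heights)

# Compare the empirical CDF with both theoretical CDFs at a few heights.
print("height  empirical  normal  triangular")
for x in (170.0, 183.0, 198.0):
    print(f"{x:6.1f}  {empirical_cdf(heights, x):9.3f}  "
          f"{normal.cdf(x):6.3f}  {triangular_cdf(x, a, c, b):10.3f}")
```

With the real data, a model is a plausible fit when its column stays close to the empirical column across the chosen heights; large gaps, as the report found for the triangular model, indicate a poor fit.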
Conclusion:
The analysis provides evidence that the heights of comic book heroes in this dataset can be reasonably approximated by a normal distribution. The triangular distribution, however, is not a suitable model for this data.