-By Uday
Extra Activity 5, Week 6: Collecting a dataset to approximate a variable with a continuous random variable, then modelling the dataset with that random variable and verifying the fit.
This dataset is different from the examples given under the Extra Activity 5, Week 6 heading on our course page. The sample contains people's data: name, gender, age, weight and height. This activity deals only with weight, which is continuous data, and it turns out to fit a normal distribution.
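To make concrete what it means to model weight as a continuous random variable, here is a minimal sketch. The mean of 70 and standard deviation of 10 are made-up illustrative values, not figures from the dataset. For a continuous variable, probabilities belong to ranges of values and come from the CDF, while any single exact value has probability zero.

```python
from scipy.stats import norm

# Hypothetical parameters for illustration only (not taken from the dataset)
mu, sigma = 70.0, 10.0
weight = norm(mu, sigma)


def prob_between(low, high):
    """P(low <= X <= high) under the normal model above."""
    return weight.cdf(high) - weight.cdf(low)


# Probability that a weight falls within one standard deviation of the mean
p_one_sigma = prob_between(mu - sigma, mu + sigma)

# Probability of one exact value: the range has zero width
p_point = prob_between(70.0, 70.0)

print(p_one_sigma, p_point)
```

With the real data, the two parameters would be the mean and standard deviation estimated from the weight column.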
=> All the calculations are done in the Google Document linked below:
https://docs.google.com/document/d/1ItAwW-6jlmv_G6eVjowspDKM_2besb8jvGztrsQPhS4/edit?usp=sharing
=> The dataset is in the spreadsheet linked below:
https://docs.google.com/spreadsheets/d/15G6-AN4JaDJ684JpqDQE5sz55l_Qx92QSnvtOrlH78s/edit?usp=sharing
Our main aim is to verify that the data follows a known distribution. This is shown in the Google Document; the histograms below are for reference only. One more thing: the embedded spreadsheet on the left shows only 100 rows, but the dataset contains more, so open it through the spreadsheet link provided above.
These graphs were generated from the spreadsheet, which is why they look the way they do. The ones actually computed in Python with a few libraries look like the images attached below.
Female weight Histogram
Male weight Histogram
Female Height Normal Distribution
The graphs above were generated with Python. This week's activity does not ask for an embedded Google Colab notebook, so the code is not there; I'm pasting it here instead.
The code is given below:
(********************************************************************************)
The whole code is below, but some lines may be only partially visible because of the text color. On my side it is written in bold black, yet it may still not show up. If you are reading it, please select the whole code; it will be much easier to see.
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from scipy.stats import norm

# Load the dataset and keep only the male rows
df = pd.read_csv('week6-data.csv')
print(df.head(3))
df = df[df['Gender'] == 'Male']
print(df.head(3))

# Histogram of the raw weights (counts per bin)
weights = df['Weight'].tolist()
plt.hist(weights, bins=20)
plt.show()

# Mean and standard deviation of the weights
# (np.std uses the population formula, ddof=0, by default)
mu = np.mean(weights)
sigma = np.std(weights)
print("mean = ", mu, "\nS.D = ", sigma)

# Normal distribution with the estimated parameters
distribution = norm(mu, sigma)

# Evaluate the density on a grid spanning the observed weights
values = np.linspace(min(weights), max(weights), 200)
probabilities = distribution.pdf(values)

# Density-normalised histogram with the fitted normal curve on top
plt.hist(weights, bins=20, density=True)
plt.plot(values, probabilities)
plt.show()
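A rough numerical check can complement the visual comparison of histogram and curve. The sketch below is not the calculation from the Google Document. It uses a synthetic sample, generated with made-up parameters, in place of week6-data.csv so that it runs on its own. It compares the share of values within 1, 2 and 3 standard deviations of the mean against the 68-95-99.7 rule, and then runs SciPy's Shapiro-Wilk test.

```python
import numpy as np
from scipy.stats import shapiro

# Synthetic stand-in for the 'Weight' column (assumed parameters, not real data)
rng = np.random.default_rng(0)
weights = rng.normal(loc=70.0, scale=10.0, size=1000)

mu = np.mean(weights)
sigma = np.std(weights)


def share_within(k):
    """Fraction of the sample within k standard deviations of the mean."""
    return float(np.mean(np.abs(weights - mu) <= k * sigma))


# For normally distributed data these are roughly 0.68, 0.95 and 0.997
shares = {k: share_within(k) for k in (1, 2, 3)}
print(shares)

# Shapiro-Wilk test: a small p-value (for example below 0.05)
# is evidence against normality
statistic, p_value = shapiro(weights)
print(statistic, p_value)
```

To try this on the real data, `weights` could be replaced with `df['Weight'].to_numpy()` from the code above.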
From the documents provided above (the Google Document and the spreadsheet), we conclude that the weight data follows a normal distribution.
Thank You :)