Week 6

Extra

Activity

-By Uday

Extra Activity 5: Week 6 ----> Collecting a dataset to approximate a variable with a continuous random variable, modelling the dataset with that continuous random variable, and verifying the fit.

This dataset is different from the examples given under the Extra Activity 5, Week 6 heading on our course page. The sample contains people's data: name, gender, age, weight and height. This activity deals only with the weight, which is continuous data. The weight data fits a Normal distribution.

=> All the calculations are done in the Google Document given below

👇🏼 👇🏼 👇🏼

https://docs.google.com/document/d/1ItAwW-6jlmv_G6eVjowspDKM_2besb8jvGztrsQPhS4/edit?usp=sharing

=> The dataset is in the spreadsheet provided below

👇🏼 👇🏼 👇🏼

https://docs.google.com/spreadsheets/d/15G6-AN4JaDJ684JpqDQE5sz55l_Qx92QSnvtOrlH78s/edit?usp=sharing

week6-data

Our main aim is to verify that the data follows some distribution. This is shown in the Google Document; the histograms below are just for reference. One more thing: the spreadsheet embedded on the left shows only 100 rows, but it contains more. To see all of them, open it through the spreadsheet link provided above.

These graphs are generated from the spreadsheet, which is why they look like this. The ones actually computed with Python code, using some libraries, look like the images attached below.

Female weight Histogram

Male weight Histogram

Female Height Normal Distribution

The graphs above were generated with Python. This week's activity does not ask for an embedded Google Colab notebook, so the code was not written there; I'm pasting it here instead.

The code is given below:

(********************************************************************************)

The whole code is below, but some lines may be only partially visible because of the text color. On my side it is written in bold black, yet it still does not show clearly. If you are reading it, please select the whole code; it will be much easier to read.

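import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from scipy.stats import norm

# Load the dataset exported from the spreadsheet
df = pd.read_csv('week6-data.csv')
print(df.head(3))

# Keep only the male rows and extract the weights
df = df[df['Gender'] == 'Male']
print(df.head(3))
weights = df['Weight'].tolist()

# Raw-count histogram of the weights
plt.hist(weights, bins=20)
plt.show()

# Mean and standard deviation (NumPy's default, ddof=0)
# Named mu/sigma so the NumPy functions are not overwritten
mu = np.mean(weights)
sigma = np.std(weights)
print("mean = ", mu, "\nS.D = ", sigma)

# Normal distribution with the estimated parameters
distribution = norm(mu, sigma)

# Evaluate the density at integer weights across the observed range
min_weight = min(weights)
max_weight = max(weights)
values = list(range(int(min_weight), int(max_weight) + 1))
probabilities = [distribution.pdf(v) for v in values]

# Density-normalized histogram with the fitted normal curve on top
plt.hist(weights, bins=20, density=True)
plt.plot(values, probabilities)
plt.show()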

From the documents provided above (the Google Document and the spreadsheet), we conclude that the data follows a Normal distribution.
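A conclusion like this can also be sanity-checked in code with the 68-95-99.7 rule: for normally distributed data, roughly 68%, 95% and 99.7% of the values lie within 1, 2 and 3 standard deviations of the mean. The sketch below is only an illustration and is not necessarily the method used in the Google Document. The function name `empirical_rule_check` is my own, and the weights are synthetic values generated with NumPy (mean 70, S.D 10 are assumed numbers), not the real dataset.

```python
import numpy as np


def empirical_rule_check(samples):
    """Return the fraction of samples within 1, 2 and 3 S.D of the mean."""
    data = np.asarray(samples, dtype=float)
    mu = data.mean()
    sigma = data.std()
    # For each k, count how many values fall inside [mu - k*sigma, mu + k*sigma]
    return {k: float(np.mean(np.abs(data - mu) <= k * sigma)) for k in (1, 2, 3)}


# Synthetic stand-in for the weight column (assumed values, NOT the real data)
rng = np.random.default_rng(0)
synthetic_weights = rng.normal(loc=70.0, scale=10.0, size=5000)

fractions = empirical_rule_check(synthetic_weights)
for k, frac in fractions.items():
    print(f"within {k} S.D: {frac:.3f}")
```

To try it on the real data, pass the `weights` list from the code above to `empirical_rule_check` and compare the three fractions with 0.68, 0.95 and 0.997. Fractions that are far off would suggest the Normal model is a poor fit.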

Thank You :)

Get in touch at [ 21f1004267@student.onlinedegree.iitm.ac.in ]