Statistics is a set of mathematical methods and tools that enable us to answer important questions about data. You have probably heard of Machine Learning and Deep Learning, which are built on data and statistics and are used to solve real-world problems.
Statistics is divided into two categories:
Descriptive Statistics - offers methods to summarize data by transforming raw observations into meaningful information that is easy to interpret and share.
Inferential Statistics - offers methods to study experiments done on small samples of data and extend the inferences to the entire population (the entire domain).
In isolation, raw observations are just data. We use descriptive statistics to transform these observations into insights that make sense.
Then we can use inferential statistics to study small samples of data and extrapolate our findings to the entire population.
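The two categories can be illustrated with a small sketch. The population below is made up for illustration (hypothetical heights in cm), and the names `population` and `sample` are assumptions, not part of the course material. The sample summary is the descriptive step; using it as an estimate of the population mean is the inferential step.

```python
import random
import statistics

# Hypothetical population: 10,000 made-up heights in cm (illustration only).
rng = random.Random(0)
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Descriptive statistics: summarize the small sample we actually observed.
sample = rng.sample(population, 100)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)
print(f"sample mean = {sample_mean:.2f}, sample sd = {sample_sd:.2f}")

# Inferential statistics: use the sample mean as an estimate of the
# population mean, then compare it with the value we normally cannot see.
population_mean = statistics.mean(population)
print(f"population mean = {population_mean:.2f}")
print(f"estimation error = {abs(sample_mean - population_mean):.2f}")
```

In practice the population mean is unknown, which is why the estimate from the sample matters.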
Finally, as we go deeper into the course, we will learn more new concepts in Statistics.
This week was a recap of Statistics 1 with some new concepts. We revised basic concepts such as experiments, outcomes, sample space, and events. The week also covered probability and related results such as conditional probability, the law of total probability, and Bayes' Theorem. Then we revised distributions such as the Binomial and Geometric distributions. One new and interesting thing we learned is programming in Python in a Google Colab notebook, which requires some knowledge of Python. One more new concept was introduced: Poisson random variables.
Along with this brush-up, there are practice assignments that help us clear up the points where we get stuck. All these basic concepts are used whenever one tries to solve real-life problems using Machine Learning or Deep Learning. To show what we have learnt, we maintain a Student Portfolio where we put all the weekly extra activities and our skills.
Extra Activity 1: Creating a Student Portfolio and a Statistics page.
Extra Activity 2: Collecting some data and populating a spreadsheet with it.
docs.google.com/spreadsheets/d/1GU2SFMN8BrpQUOLnPixPT4hJXSXMQXCdlQB_uI-0LRs/edit#gid=2084073395
(Link to the spreadsheet 👆🏼)
___________________________________________________
The spreadsheet above consists of 200 rows and 8 columns. It holds data about one country, say the U.S.A.,
where the sample contains details from 20-30 randomly selected private or government sector industries or workplaces.
From this data we can summarize many things, such as:
The ratio of males to females in the sample (using a pie chart)
Counts of the school subjects they had taken (using a column chart)
Counts of the different countries the working professionals belong to (using a column chart)
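The counts behind charts like these can be sketched in a few lines of Python. The rows and the column names (`gender`, `subject`, `country`) below are hypothetical stand-ins, since the real spreadsheet headers are not listed here.

```python
from collections import Counter

# Hypothetical rows standing in for the spreadsheet; the column names
# are assumptions for illustration, not the real headers.
rows = [
    {"gender": "F", "subject": "Maths", "country": "U.S.A."},
    {"gender": "M", "subject": "Physics", "country": "India"},
    {"gender": "F", "subject": "Maths", "country": "U.S.A."},
    {"gender": "M", "subject": "Biology", "country": "Canada"},
    {"gender": "F", "subject": "Maths", "country": "India"},
]

def count_column(rows, column):
    """Count how often each value appears in one column."""
    return Counter(row[column] for row in rows)

# Shares for a pie chart: each count divided by the total.
gender_counts = count_column(rows, "gender")
total = sum(gender_counts.values())
gender_share = {value: count / total for value, count in gender_counts.items()}

# Raw counts for column charts.
subject_counts = count_column(rows, "subject")
country_counts = count_column(rows, "country")

print(gender_share)
print(subject_counts)
print(country_counts)
```

The same `count_column` helper serves all three summaries; only the column name changes.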
_____________________________________________________________
Extra Activity 3: Monte Carlo Simulation in a Python Notebook
colab.research.google.com/drive/1EilLomuTxx3YZ4oVIpD2NK9j9b1I0qL_?usp=sharing
(Link to the Python notebook 👆🏼)
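import random

def coin_toss():
    # Draw a uniform number in [0, 1] and map it to heads or tails
    x = random.uniform(0, 1)
    if x > 0.5:
        return 'H'
    else:
        return 'T'

# Toss the coin 10 times and print each outcome
for i in range(10):
    print(coin_toss())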
This is the first program, where we simulate only 10 tosses of a fair coin. In our run we got 6 Heads and 4 Tails,
which means
estimated probability of H = 6/10 = 0.6
and estimated probability of T = 4/10 = 0.4
These values are only roughly close to 0.5, which is expected with so few tosses.
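The counting step can also be done in code instead of by hand. This is a minimal sketch, not the notebook's own code: the helper `relative_frequencies` and its `seed` parameter are assumptions added here so that a run can be repeated.

```python
import random
from collections import Counter

def coin_toss(rng):
    """Return 'H' or 'T' with equal probability."""
    return 'H' if rng.random() < 0.5 else 'T'

def relative_frequencies(n, seed=None):
    """Toss the coin n times and return the share of heads and tails."""
    rng = random.Random(seed)  # seed is optional; it only makes runs repeatable
    counts = Counter(coin_toss(rng) for _ in range(n))
    return {face: counts[face] / n for face in ('H', 'T')}

# With few tosses the shares can be far from 0.5; with many they settle near it.
print(relative_frequencies(10, seed=1))
print(relative_frequencies(10_000, seed=1))
```

The two shares always add up to 1, so it is enough to track the share of heads.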
Then we check with larger numbers of tosses, i.e. 1000, 2000, 3000, 4000 and 5000, to see whether the estimated probabilities of Head and Tail stay nearly equal.
The program is:
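import random
import matplotlib.pyplot as plt

def toss():
    # Returns 0 or 1 with equal probability; 1 is counted as a Head
    return random.randint(0, 1)

def Simulation(n):
    # Fresh list per run, so earlier runs do not leak into this plot
    outcomes_list = []
    Outcomes = 0
    for i in range(n):
        outcome = toss()
        Outcomes = Outcomes + outcome
        # Running estimate of the probability after i + 1 tosses
        probability = Outcomes / (i + 1)
        outcomes_list.append(probability)
    # Plot the running estimate against the reference line at 0.5
    plt.axhline(y=0.5, color='r', linestyle='-')
    plt.xlabel("Iterations")
    plt.ylabel("Probability")
    plt.plot(outcomes_list)
    return Outcomes / n

X = Simulation(1000)
print("Result is :", X)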
With 1000 tosses, we got an estimated probability of 0.492.
X = Simulation(2000)
print("Result is :",X)
With 2000 tosses, we got 0.5115.
X = Simulation(3000)
print("Result is :",X)
With 3000 tosses, we got 0.495.
X = Simulation(4000)
print("Result is :",X)
With 4000 tosses, we got 0.5015.
X = Simulation(5000)
print("Result is :",X)
With 5000 tosses, we got 0.5026.
The conclusion: the result is close to 0.5 every time, which means that even over a large number of tosses of a fair coin, the estimated probabilities of Head and Tail are nearly equal, about 0.5 or 1/2.
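To put "close to 0.5" in numbers, the reported results can be compared with the standard error of a proportion, sqrt(p(1 - p)/n), which is the typical size of the random deviation after n tosses. This is a rough check under the assumption of a fair coin (p = 0.5); the helper name `standard_error` is introduced here for illustration.

```python
import math

# Estimates reported above for each number of tosses.
reported = {1000: 0.492, 2000: 0.5115, 3000: 0.495, 4000: 0.5015, 5000: 0.5026}

def standard_error(n, p=0.5):
    """Typical deviation of the estimated probability after n tosses."""
    return math.sqrt(p * (1 - p) / n)

for n, estimate in reported.items():
    deviation = abs(estimate - 0.5)
    print(f"n={n}: estimate={estimate}, deviation={deviation:.4f}, "
          f"standard error={standard_error(n):.4f}")
```

Every reported deviation is within two standard errors of 0.5, and the standard error itself shrinks as n grows, which is why larger runs tend to land nearer 0.5.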
Extra Activity 4: Working in a Google Document on a distribution from the spreadsheet data.
This activity requires us to collect an interesting data set, or we can reuse the previous spreadsheet. I chose another interesting data set about football players and their goal-scoring results. It contains at least 100 rows and 5 columns.
The spreadsheet:
docs.google.com/spreadsheets/d/1iDi0qrDYvy3j2kJeNdmC_-OeIGNrPHIaBVAvt2yRoo8/edit#gid=1985632832
Using this data, which fits a Binomial distribution, the related calculations are shown in the Google Doc.
Link to the Google Document:
docs.google.com/document/d/1ftjrUtW8meuQGYyUcDP99t2WYalkf9xNYgD0fx6z-dY/edit
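For readers who want to try a Binomial calculation themselves, here is a minimal sketch. The values `n_shots = 10` and `p_goal = 0.3` are hypothetical and are not taken from the football spreadsheet or the Google Doc; they only show the shape of the calculation.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical values for illustration only (not from the spreadsheet).
n_shots = 10   # number of attempts
p_goal = 0.3   # assumed chance of scoring on each attempt

# Probability of each possible number of goals, from 0 to n_shots.
pmf = [binomial_pmf(k, n_shots, p_goal) for k in range(n_shots + 1)]

# Mean and variance of a Binomial(n, p) random variable.
mean = n_shots * p_goal
variance = n_shots * p_goal * (1 - p_goal)

print(f"P(exactly 3 goals) = {pmf[3]:.4f}")
print(f"mean = {mean:.2f}, variance = {variance:.2f}")
```

The model assumes a fixed number of independent attempts with the same success probability each time, which is worth checking against the data before relying on it.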