Statistics
JilMac's favorite Statistics Resources:
Excellent variety of Calculators: https://goodcalculators.com/
The power of graphics http://www.utrend.tv/v/9-out-of-10-americans-are-completely-wrong-about-this-mind-blowing-fact/
Khan Academy Videos - his playlist has 68 https://www.youtube.com/watch?v=uhxtUt_-GyM&list=PLGcvh64d5a4HfGkm8O4SIqbrS5HFaaM3B
Patrick Just Math Tutorial Videos http://patrickjmt.com scroll down to Statistics
Stat Trek Web Site - http://stattrek.com
Easy Calculation to verify work http://easycalculation.com/
SmartR - because being a Kid is more fun: http://smartr.edc.org/mathstats
Understanding Real Life Statistics http://learner.org/resources/series65.html
Statistic Symbols: http://www.statistics.com/statistical-symbols/
On Line Stat Book http://onlinestatbook.com/2/
On-Line Statistics Calculators https://www.easycalculation.com/statistics/probability-and-distributions.php
Our very own CCV-Dan: http://www.youtube.com/channel/UC96crSQB1Ariz4W5GTMISYw/videos
Finding a Sample Size: http://youtu.be/s5hZiYaRro8
Statistics CALCULATORs: http://www.numberempire.com/statisticscalculator.php
Get all the stats http://www.hackmath.net/en/calculator/statistics
TED Talks on Statistics https://www.ted.com/read/ted-studies/statistics
On Line Math Book http://onlinestatbook.com
Dan's Statistics Book http://centerofmath.org/textbooks/stats/index.html
Open Learning Initiative - Carnegie Mellon University - grant from the Bill & Melinda Gates Foundation
Probability & Statistics https://oli.cmu.edu/jcourse/lms/students/syllabus.do?section=90d4c5a580020ca600de6c845c4a12ce
Statistical Reasoning https://oli.cmu.edu/jcourse/lms/students/syllabus.do?section=90d4feac80020ca600b7024ae2a70d38
Free College Books https://courses.candelalearning.com/catalog/lumen
Statistics: https://courses.candelalearning.com/introstats1xmaster/
Defining the variables so you can plug them into formulas is a little easier once you realize there are two sets of Variables, Formulas and Rules, depending on the data you are working with. These two sets are:
Population - all the data
Sample - a subset of the Population. There are also distinctions between kinds of samples: Large Sample, Small Sample & Random Sample
Formulas & Symbols are different for POPULATIONS and SAMPLES
Measure | Population | Sample
Mean (= average) | μ (mu) | x̄ (x-bar)
Standard Deviation | σ (sigma) | s
Size | N | n
Variance | σ^2 | s^2
Proportion w/attribute | P (or π, pi) | p
Proportion wo/attribute | Q = 1-P | q = 1-p
Population formulas:
σ = sqrt [ Σ ( Xi - μ )^2 / N ]
σ^2 = Σ ( Xi - μ )^2 / N
Sample formulas:
s = sqrt [ Σ ( xi - x̄ )^2 / ( n - 1 ) ]
s^2 = Σ ( xi - x̄ )^2 / ( n - 1 )
The sample mean has its own mean μx̄ and its own standard deviation (the standard error): σx̄ = σ / sqrt(n)
A Z-Score counts how many Standard Deviations a value is from the mean
The area between z = 0 and z = 1 is .3413, so .3413 x 2 = .6826 = 68% of the data falls within 1 Standard Deviation of the mean
Confidence Interval: α (alpha) = 1-CI
ex. CI of 95% = alpha of 1-.95 = .05
Stat Trek's overview http://stattrek.com/sampling/populations-and-samples.aspx?Tutorial=AP
So their formulas are slightly different, although when you are first learning it seems as though they are interchangeable
You can look up what a term means http://stattrek.com/statistics/dictionary.aspx
Statistic Formulas http://stattrek.com/statistics/formulas.aspx?Tutorial=Stat
Statistics Notation http://stattrek.com/statistics/notation.aspx
Standard Deviation
It has the formula for Standard Deviation from SAMPLE data (vs. Population data)
What is Standard Deviation and what do you use it for https://www.youtube.com/watch?v=N9LvWy1IGPY
Statisticians often use simple random samples to estimate the standard deviation of a population, based on sample data. Given a simple random sample, the best estimate of the standard deviation of a population is:
s = sqrt [ Σ ( xi - x̄ )^2 / ( n - 1 ) ]
s = sample standard deviation
x̄ = sample mean
xi = the ith element from the sample
n = the number of elements in the sample.
-----------------------------------------------------------------------------------------------
Let's use an example for Sample Standard Deviation to understand your (x-x)2 question, which was really about ( xi - x̄ )^2
Example: A sample consists of seven observations: {2, 3, 4, 5, 6, 7, 8}. What is the Standard Deviation?
How you can solve this: Our Standard Deviation formula needs the mean so let's find the mean:
x̄ = (2 + 3 + 4 + 5 + 6 + 7 + 8 ) / 7 = 35 / 7 = 5 ....... Sample Mean is denoted by the symbol x̄ (x-bar)
Then we plug all of the known values into Sample Standard Deviation
s = sqrt [ Σ ( xi - x̄ )^2 / ( n - 1 ) ]
s = sqrt [ ( (2-5)^2 + (3-5)^2 + (4-5)^2 + (5-5)^2 + (6-5)^2 + (7-5)^2 + (8-5)^2 ) / (7-1) ]
s = sqrt [ ( (-3)^2 + (-2)^2 + (-1)^2 + (0)^2 + (1)^2 + (2)^2 + (3)^2 ) / 6 ]
s = sqrt [ ( 9 + 4 + 1 + 0 + 1 + 4 + 9 ) / 6 ] = sqrt [ 28 / 6 ]
s = sqrt [ 4.667 ]
s = 2.16025
To verify use a Statistics Calculator http://easycalculation.com/statistics/standard-deviation.php
Sample Stat Calculator http://www.miniwebtool.com/sample-standard-deviation-calculator
Note: The difference between the Standard Deviation formulas is whether you divide by (N) or by (n-1)
Population Standard Deviation = σ = sqrt [ Σ ( Xi - μ )^2 / N ]
Sample Standard Deviation = s = sqrt [ Σ ( xi - x̄ )^2 / ( n - 1 ) ]
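If you like to "See the Math" in code, here is a small Python sketch of the same example. It is only an illustration, not part of the course material: the variable names are my own, and it uses Python's built-in statistics module to double check the hand calculation.

```python
import math
import statistics

# The seven observations from the example above
data = [2, 3, 4, 5, 6, 7, 8]

mean = sum(data) / len(data)                           # x-bar = 35 / 7
squared_deviations = [(x - mean) ** 2 for x in data]   # each (xi - x-bar)^2
sum_sq = sum(squared_deviations)                       # 9+4+1+0+1+4+9

# Sample formula divides by (n - 1); population formula divides by N
sample_sd = math.sqrt(sum_sq / (len(data) - 1))
population_sd = math.sqrt(sum_sq / len(data))

print("mean:", mean)
print("sum of squared deviations:", sum_sq)
print("sample standard deviation:", round(sample_sd, 5))
print("population standard deviation:", round(population_sd, 5))

# Cross-check against the standard library
assert math.isclose(sample_sd, statistics.stdev(data))
assert math.isclose(population_sd, statistics.pstdev(data))
```

Notice that the only difference between the two results is the number you divide by: 6 for the sample, 7 for the population.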
We would enjoy working with you one-on-one. I eTutor on Monday & Tuesday morning from 8:45-10pm and Dan eTutors Friday & Saturday nights from 7-9:30pm
If neither of these works - then continue to ask questions and I'll find you Training videos and Tutorials to help you. Start with the Intro to Statistics series - they will give you a solid foundation. EVEN IF as you begin to watch these videos you don't get every concept - keep up - the exposure and hearing another person besides your instructor talk about Math and Statistics is invaluable. You need to expose yourself to this - it is better than watching TV or playing video games - and if you do this each time you get nervous - I promise you will get more out of this course.
Khan Academy - Intro to Statistics there are 68 videos about 10 minutes each. http://www.youtube.com/watch?v=uhxtUt_-GyM&list=PLGcvh64d5a4HfGkm8O4SIqbrS5HFaaM3B
Standard Deviation http://www.youtube.com/watch?v=HvDqbzu0i0E
Statistics Sample vs. Population Mean http://www.youtube.com/watch?v=HvDqbzu0i0E
You may also want to experiment with listening and watching different teachers discuss Statistics. On the right hand column in YouTube they give you lots of SUGGESTIONS for different videos - there are MANY excellent teachers - if Khan is not the teacher for you or if you want some variation find the teacher that speaks to YOU! then subscribe to their channel.
NOTE: if your learning environment does not allow sound or if you are not an audio/voice learner, Khan and many other videos have Closed Captions. On the bottom right of each video is a .cc. that displays what is being said in text on the screen. I find this helpful even if I keep the sound on. It helps me recognize the words - especially when I am reading.
=================================================================================================================
Normal Distribution Bell Curve and Z-scores
Watch JilMac Loves Math http://www.youtube.com/watch?v=xcBSl06_Ai0
Simple Definition about the 68-95-99 rule http://www.youtube.com/watch?v=xgQhefFOXrM
There are many good videos that explain Statistics, both visually and conceptually. Most have CC - Closed Caption option if you learn better by seeing the words along with hearing the concepts explained to you along with seeing the math being done. The most famous - Math/Stats on line videos are from Salman Khan, founder of Khan Academy which has a play list of 68 - you can start at the beginning https://www.youtube.com/watch?v=uhxtUt_-GyM&list=PL1328115D3D8A2566&nohtml5=False CI https://www.youtube.com/watch?v=bekNKJoxYbQ&nohtml5=False
Z-score Formula: z = (x - mean) / standard deviation
Margin of Error or Error (could be what you are solving for): E = z*σ/√n
Sample size needed for a given Error: n = (z*σ/E)^2
Error is the discrepancy, or what you are trying to find the Confidence for.
CI of 50% = 0.674 z value
CI of 80% = 1.282 z value
CI of 90% = 1.645 z value
CI of 95% = 1.960 z value
CI of 98% = 2.326 z value
CI of 99% = 2.576 z value
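The z values in this list can be reproduced in Python instead of being looked up. This is a sketch under my own assumptions: the function names are made up for illustration, and the numbers in the sample-size call (σ = 15, Error = 2) are hypothetical, not from a homework problem.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical z value for a confidence level such as 0.95."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def margin_of_error(confidence, sigma, n):
    """E = z * sigma / sqrt(n)"""
    return z_critical(confidence) * sigma / math.sqrt(n)

def required_sample_size(confidence, sigma, error):
    """n = (z * sigma / E)^2, rounded UP to a whole observation."""
    return math.ceil((z_critical(confidence) * sigma / error) ** 2)

for level in (0.50, 0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"CI of {level:.0%} = {z_critical(level):.3f} z value")

# Hypothetical numbers: sigma = 15, Error tolerance = 2, 95% confidence
n_needed = required_sample_size(0.95, sigma=15, error=2)
print("sample size needed:", n_needed)
print("margin of error at that n:", round(margin_of_error(0.95, 15, n_needed), 3))
```

Rounding the sample size up (not to the nearest number) keeps the Margin of Error at or below the tolerance you asked for.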
CI Calculator http://www.mathcelebrity.com/chiconf.php?n=+70&variance=+4.84&conf=90&pl=Variance+Confidence+Interval
T-score is similar https://people.richland.edu/james/lecture/m170/tbl-t.html
Note: alpha = 1-CI, so for a CI of 90% alpha = 1-.90 = .10
Excel formula: =CHIINV(alpha,[n-1])
Animated Z-score Tables http://www.sfu.ca/personal/archives/richards/Table/Pages/Table1.htm
For Normal SMALL sample size of under 30 use a T-Score
for Normal Sample size of 30 or larger use the Z-Score
This Confidence Interval Calculator lets you choose from http://MathCelebrity.com
========================================================================================
Additional images of curves http://allpsych.com/researchmethods/distributions
Probability when mean & standard deviation are known
This link inspired me Probability when mean and standard deviation are known.
SAMPLE QUESTION: According to the car company Chevy, the Cobalt model car has a mean MPG of 32 with a standard deviation of 3.5.
What is the probability that a randomly selected Cobalt has a MPG that exceeds 34?
If 10 Cobalts are randomly selected, what is the probability that the MPG exceeds 34?
Mean μ = 32 mpg
Standard Deviation σ= 3.5
Observation or what you are trying to solve for X>34
X = what you are looking to compare - sometimes referred to as the "observed score" = 34
The first question asks you to find the probability P(x Greater Than 34 MPG) = P(X>34)
NORMAL RANDOM VARIABLE
If we consider this to be a Normal Distribution
With Mean μ=32 and a Standard Deviation σ = 3.5
To find the Probability of X where X > 34 which can be written as (X > 34)
You can use the Z Table to find the Probability of an occurrence (X sometimes referred to as the Observed Score) under a Bell Shaped curve
If you want to know how to Read a Z-Score Table to Compute Probability use this link http://www.had2know.com/academics/normal-distribution-table-z-scores.html
You can find the Z score with this formula z = (X - mean) / Standard Deviation
Z score = (34 - 32) / 3.5 = 2 / 3.5 = .57
A Z score of .57 gives an area of .2157 <-- you look this up on the Z-Score Table
http://statstutorstl.blogspot.com/2010/07/z-table-gives-probabilty-distribution.html
If your Z score table only shows half the curve (the area from the mean out to Z), then add .5
You want to add .5 + .2157 = .7157 This is almost 72%: the probability that you get at MOST 34mpg
But since we want to know the probability of getting MORE THAN 34mpg
We have to subtract this from 1: 1 - .7157 = ____
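Once you have worked the Cobalt problem with the Z table, you can check yourself with a few lines of Python. This is a sketch, not the textbook's method. For the second question I am assuming it asks about the MEAN mpg of the 10 cars, which uses the standard error σ/√n in place of σ. Because the code does not round z to .57, its answer can differ from the table answer in the third decimal place.

```python
import math
from statistics import NormalDist

mu = 32.0         # mean MPG
sigma = 3.5       # standard deviation
threshold = 34.0  # the observed score X

# Question 1: one randomly selected Cobalt, P(X > 34)
z_single = (threshold - mu) / sigma
p_single = 1 - NormalDist(mu, sigma).cdf(threshold)

# Question 2 (assumed to mean the sample MEAN of 10 cars), P(x-bar > 34)
n = 10
standard_error = sigma / math.sqrt(n)
z_mean = (threshold - mu) / standard_error
p_mean = 1 - NormalDist(mu, standard_error).cdf(threshold)

print(f"one car:   z = {z_single:.2f}, P(X > 34) = {p_single:.4f}")
print(f"ten cars:  z = {z_mean:.2f}, P(mean > 34) = {p_mean:.4f}")
```

The average of 10 cars is much less likely to exceed 34 than a single car, because averages vary less than individual values.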
I recommend that you watch some Video Training from:
Khan Academy - Probability & Stats videos
https://www.khanacademy.org/library#probability
Here are the videos that relate to NORMAL DISTRIBUTION
Definitions - http://math.bu.edu/people/nkatenka/MA113/Lecture_7_Sol.pdf
========================================================================================
Correlation Coefficients & Scatter Diagrams
is explained nicely by the web site Statistics How To
http://www.statisticshowto.com/articles/how-to-compute-pearsons-correlation-coefficients/
Sampling Distribution - formula is about 6 min in - but watch the whole thing it is good
. Khan Academy https://www.khanacademy.org/math/probability/statistics-inferential/sampling_distribution/v/sampling-distribution-example-problem
USING EXCEL to create ScatterPlot and Add Trendline http://youtu.be/6rOlGbLeQxI
Margin of Error
Margin of Error http://en.wikipedia.org/wiki/Margin_of_error
The graph shows that the Larger the Sample - the Smaller the Margin of Error
This gives you a "gut feeling" when we apply this to the chart you are asked to complete
for the same Sample Size, the Larger the Confidence Level (99%) the Larger the Margin of Error
as the Confidence Level goes down ... the Margin of Error goes down too, so to keep a small Error Tolerance at a high Confidence Level you need a Larger Sample
Statistical Symbols - http://www.statistics.com/statistical-symbols
shows us what the Formulas are for E and n:
E = zc * σ / sqrt(n) and n = ( zc * σ / E )^2 for large samples
and zc is the critical value for a confidence level c.
From the Z chart http://www.napce.org/documents/research-design-yount/93_CritValueTables_4th.pdf
We know that the critical value for a Confidence Level of 99% = 2.576, 95% = 1.960 and 90% = 1.645
Here is a calculator - so that you can check your work http://www.raosoft.com/samplesize.html
This is a "BALL PARK FIGURE" and it does not show you the formula to calculate the Number of Samples required for a particular Error Tolerance.
Confidence Intervals
Definition Stat http://stattrek.com/estimation/confidence-interval.aspx
Khan Academy: https://www.khanacademy.org/math/probability/statistics-inferential/confidence-intervals/v/confidence-interval-1
This video explains Confidence Intervals for One Mean: Determining the Required Sample Size. https://www.youtube.com/watch?v=7zcbVaVz_P8&src_vid=ktD7MX5tF7k&feature=iv&annotation_id=annotation_456215
http://www.youtube.com/watch?v=ktD7MX5tF7k
This M (Margin of Error) formula that I used is similar to the one in the video
If you plug in the .1 Error Tolerance you get very similar results
Confidence Interval for a Population Mean
x̄ ± z * s / sqrt(n)
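Here is a small Python sketch of that interval formula. The function name and the sample numbers (x̄ = 32, s = 3.5, n = 49) are made up for illustration, and it assumes a large sample (30 or more) so that the Z-Score applies.

```python
import math
from statistics import NormalDist

def confidence_interval_mean(sample_mean, sample_sd, n, confidence=0.95):
    """Return (low, high) for x-bar +/- z * s / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical z value
    margin = z * sample_sd / math.sqrt(n)     # margin of error
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample: mean 32, standard deviation 3.5, 49 observations
low, high = confidence_interval_mean(32.0, 3.5, 49, confidence=0.95)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

Try changing the confidence to 0.99 or n to 100 to see the interval get wider or narrower.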
=====================
Rejection Region & Critical Value from Dan Lamay
https://www.youtube.com/watch?v=008WTBZCIe8&list=TLUXWqmNVnnwg
===========================================
Word Problems
I encourage students to do the exercises in the textbook, that have the answers in the back of the book - to get practice. To do the problems that have the answers is an excellent way for you to learn if you are approaching a problem correctly.
When I think of word problems, I think of them as real life. Ex. Your friend comes to tell you about an event that happened and they include all kinds of other information that does not really affect how you would approach the problem.
I like the web site Purple Math. They explain things clearly. Here is a chart that I think may help you in understanding what MATH functions you need to apply to certain WORDS in a Word Problem.
http://www.purplemath.com/modules/translat.htm
I'll use the example of Addition below
Addition
increased by
more than
combined, together
total of
sum
added to
I also have several sites that help me Refresh Statistics Skills:
Stat Trek: http://StatTrek.com is a nice site that is laid out well.
Dictionary: Many times it is helpful to look up terms that are in word problems, I will check http://stattrek.com/statistics/dictionary.aspx to see if Stat Trek has a good definition.
This is a page with formulas http://stattrek.com/statistics/formulas.aspx?Tutorial=Stat
With Word Problems - I break up the question into the variables that have been stated, and I write down what you are trying to solve for. Then I look to see IF there is a formula that uses the variables I have and can solve for what the problem is asking.
Your topic of: Central Tendency, Variation, Position wasn't in the StatTrek dictionary, but I Googled it and found Cliff-Notes - which I like http://www.cliffsnotes.com/study_guide/Measures-of-Central-Tendency.topicArticleId-267532,articleId-267451.html
My Web page has my favorite links.
https://sites.google.com/site/jilmactraining/mathematics/statistics
I love training videos. So I use KhanAcademy & PatrickJustMath.
Another resource is the Calculators - the web based calculators let you "Try" things to see if you are close to the right formula/tool. It is not going to help you on a test, but it does speed up the learning process - by letting you try out different formulas.
And don't pass up the kids learning sites. They are a lot easier to understand and they are fun!
http://onlinestatbook.com/2 - has a very good Search Bar in the upper right hand side.
Let's see what they say about your topic: Central Tendency, Variation, Position
Look over these resources.
I recommend that you cut & paste them into your word processor, spreadsheet or website - so you can easily access these links and not have to retype them!
=======================================================================
Combinations & Permutations
Finding the Number of Combinations - withOUT regard to ORDER
nCr = n! / [ (n-r)! r! ]
Finding the Number of Permutations - when ORDER is important
nPr = n! / (n-r)! where r<=n
The ! is called "Factorial" - it means Multiply by each whole number down to 1, ex. 4! = 4*3*2*1 = 24
ex. nCr = 16C4 = 16! / [ (16-4)! 4! ] = 16! / ( 12! 4! )
= ( 16 * 15 * 14 * 13 * 12! ) / ( 12! * 4 * 3 * 2 * 1 ) <-- the 12! on top and bottom cancels
= ( 16 * 15 * 14 * 13 ) / ( 4 * 3 * 2 * 1 ) = 43680 / 24 = 1820
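The same counting can be checked in Python. This sketch writes the two formulas out with factorials (the function names are my own) and then compares them with math.comb and math.perm, which are built into Python 3.8 and later.

```python
import math

def combinations(n, r):
    """nCr = n! / ((n-r)! r!)  - order does NOT matter."""
    return math.factorial(n) // (math.factorial(n - r) * math.factorial(r))

def permutations(n, r):
    """nPr = n! / (n-r)!  - order DOES matter."""
    return math.factorial(n) // math.factorial(n - r)

print("4! =", math.factorial(4))
print("16C4 =", combinations(16, 4))
print("16P4 =", permutations(16, 4))

# Cross-check against the built-in functions
assert combinations(16, 4) == math.comb(16, 4)
assert permutations(16, 4) == math.perm(16, 4)
```

Notice that 16P4 is exactly 4! = 24 times bigger than 16C4, because each group of 4 can be arranged in 24 different orders.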
========================================================================
Frequencies
Frequency Distribution can be organized and viewed in a Frequency Table
This groups items into columns & rows where you can see the essential elements.
To see "How Often" the age range of 30-39 happens in this data, you need to add all the frequencies to get the total # of frequencies. Then you would select each range and its frequency and divide it by the total number of frequencies to get the percentage %.
Relative frequency = frequency ÷ number of observations
Percentage frequency = relative frequency X 100 = f ÷ n X 100
the sample data for the example above could be: 31, 33, 40,45,46, 51,52,52,52,55,57,57,66,66,67,68,68,71
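To see how the frequency table comes together, here is a Python sketch that builds it from the sample data above. It assumes age ranges that are 10 years wide (30-39, 40-49, and so on); the variable names are only for illustration.

```python
from collections import Counter

ages = [31, 33, 40, 45, 46, 51, 52, 52, 52, 55, 57, 57, 66, 66, 67, 68, 68, 71]

# Assumption: classes are 10 years wide, so 31 falls in the class starting at 30
frequency = Counter((age // 10) * 10 for age in ages)
total = sum(frequency.values())  # total number of observations

rows = []
for start in sorted(frequency):
    f = frequency[start]
    relative = f / total          # relative frequency = f / n
    percentage = relative * 100   # percentage frequency = f / n x 100
    rows.append((f"{start}-{start + 9}", f, relative, percentage))

print("Age range  Frequency  Relative  Percent")
for label, f, relative, percentage in rows:
    print(f"{label:9}  {f:9d}  {relative:8.3f}  {percentage:6.1f}%")
```

The relative frequencies should always add up to 1 (100%), which is a quick way to check your table.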
Degrees of Freedom formulas
The t distribution is a family of curves based on the Degrees of Freedom, which is a number related to the sample size.
How to find a t-score on the t-table when you don't have the Degrees of Freedom.
Chi -Square Tests
df (goodness-of-fit) = k-1 where k = the number of possible outcomes in the experiment
df (contingency table) = (r-1)(c-1) where r= #of rows in a table, c=#of columns in a table
F distribution & Tables
F = s1^2 / s2^2 with df1 = n1-1 (numerator) and df2 = n2-1 (denominator)
One-Way ANOVA table
df (between samples) = k-1 where k = number of samples
df (within samples) = N-k where N = total number of observations in all the samples combined
t Distribution values: df = n-1 where n = sample size
Queueing Models - how to Calculate using Excel Add-Ons
OnLine TextBook - Operations Management http://bcs.wiley.com/he-bcs/Books?action=index&bcsId=5869&itemId=0470525908
Chapter 5-Service Design bcs.wiley.com/he-bcs/Books?action=chapter&bcsId=5869&itemId=0470525908&chapterId=62384
Overview of Formulas:
YouTube Video explaining Concepts & how to for Excel https://www.youtube.com/watch?v=c2DymN34w04
Binomial Probability
A realistic estimate for the probability of an engine failing on a transatlantic flight is 1/14000. Using the probability and binomial probability formula to find the probabilities of 0,1,2,and 3 engine failures for a three-engine jet and the probabilities of 0,1, and 2 engine failures for two-engine jet.
Can you please explain how to find p ?
Mean = n*P
SD = SQRT( n * P * Q)
===========================================================
Hi
The key to understanding what a Binomial Distribution is, is to watch Khan Academy's series on Binomial Distribution
OK - now let's apply what we have learned from Khan to our question
The formula for a Binomial Probability Formula http://www.mathwords.com/b/binomial_probability_formula.htm
Formula:
Example:
P(k successes in n trials) = C(n, k) * p^k * q^(n-k), where C(n, k) = n! / [ k! (n-k)! ]
n = number of trials
k = number of successes
n – k = number of failures
p = probability of success in one trial
q = 1 – p = probability of failure in one trial
You are taking a 10 question multiple choice test. If each question has four choices and you guess on each question, what is the probability of getting exactly 7 questions correct?
n = 10
k = 7
n – k = 3
p = 0.25 = probability of guessing the correct answer on a question
q = 0.75 = probability of guessing the wrong answer on a question
0,1,2,and 3 engine failures for a three-engine jet.
This is a 3 engine plane - so it is the Probability of these engines failing on the same plane - they are related. We are asking ...
What is the probability that we have NO engine failures
our k = 0
n = 3 (this is the number of engines on the plane)
k = 0 (is how many of the engines fail) Khan called this X
n-k = number of engines that do NOT fail = 3-0 = 3
p = 1/14000
q = (1 - 1/14000) = 13999/14000
We plug this into our formula
P(k=0) = C(3,0) * (1/14000)^0 * (13999/14000)^(3-0)
C(n,k) = C(3,0) = 3! / ( 0! (3-0)! ), note: 0! = 1, so = 3! / ( 1 * 3! ) = 3!/3! = 1
P(k=0) = 1 * 1 * (13999/14000)^3
------------------------------------------------------
What is the probability that we have 1 engine failure
P(k=1) = first C(3,1) * (1/14000)^1 * (13999/14000)^(3-1)
C(3,1) = 3! / ( 1! (3-1)! ) = 3! / ( 1! * 2! ) = 3!/2!
3!/2! = (3*2*1) / (2*1) = 6/2 = 3
P(1) = 3 * (1/14000)^1 * (13999/14000)^2 = .000214 (rounded)
What is the probability that we have 2 engine failures
P(2) = first C(3,2) = 3! / ( 2! (3-2)! ) = 3! / ( 2! * 1! ) = (3*2*1) / (2*1) = 3
P(2) = 3 * (1/14000)^2 * (13999/14000)^1
What is the probability that we have 3 engine failures
P(3) = first C(3,3) = 3! / ( 3! (3-3)! ) = 3! / ( 3! * 0! ) = 1
P(3) = 1 * (1/14000)^3 * (13999/14000)^0
Now you plug in the next set and see what you get?
Test your answers with a Binomial Probability Calculator http://vassarstats.net/textbook/ch5apx.html
There are also 2 great videos on using EXCEL for Binomial Probability Distribution
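Another way to test your answers is a few lines of Python. This is just a sketch: the function name is my own, and it assumes each engine fails independently with the same probability, which is what the binomial formula requires.

```python
import math

def binomial_probability(n, k, p):
    """P(k successes in n trials) = C(n, k) * p^k * q^(n-k)"""
    q = 1 - p
    return math.comb(n, k) * p ** k * q ** (n - k)

p_fail = 1 / 14000  # probability that one engine fails

# 0, 1, 2 and 3 failures on a three-engine jet
three_engine = [binomial_probability(3, k, p_fail) for k in range(4)]
# 0, 1 and 2 failures on a two-engine jet
two_engine = [binomial_probability(2, k, p_fail) for k in range(3)]

for k, prob in enumerate(three_engine):
    print(f"3 engines, {k} failures: {prob:.12f}")
for k, prob in enumerate(two_engine):
    print(f"2 engines, {k} failures: {prob:.12f}")

# The multiple choice example: exactly 7 right out of 10 when guessing
quiz = binomial_probability(10, 7, 0.25)
print(f"10 questions, 7 correct: {quiz:.5f}")

# Mean = n*P and SD = SQRT(n*P*Q) for the three-engine jet
mean_failures = 3 * p_fail
sd_failures = math.sqrt(3 * p_fail * (1 - p_fail))
print("mean:", mean_failures, "sd:", sd_failures)
```

The probabilities for each plane should add up to 1, which is a good check that nothing was left out.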
Part 1 https://www.youtube.com/watch?v=JsDgLF0Npww
Watch the beginning up to 3:50 minutes to see how to set up the spreadsheet. It is the same formula, but if you are more familiar with Excel and Spreadsheets the math may be easier. Personally I love Excel and how you can set things up to "See the Math".
The next question is "Probability distribution graph. I tried to use excel but I coudn't. I also tried to use statcrunch, and didn't show anything."
=BINOM.DIST() I use Excel 2007 and the formula is =BINOMDIST (without the .)
Both work the same.
Displaying a graph - is all about the numbers you enter for Excel to create the graph.
Binomial Probability Distribution Excel Part 2
https://www.youtube.com/watch?v=KcywOUZDP8s
Watch the 1st half of this video, up to 5:50. It explains how to take the Binomial Probabilities that Excel calculated and create a histogram chart/graph.
==========================================================================
Continuous Uniform Random Variable
A tube of tooth paste is 4.2 oz
We are told this is uniformly distributed, which is different than a normal distribution
The formula is f(x) = 1/(b-a) for a < x < b
since the amount of tooth paste can be between 0 and 4.2
f(x) = 1/(4.2-0) = 1/4.2
I like the way that Math StackExchange explained how to think about information that is "Uniformly Distributed" vs. "Normally Distributed". http://math.stackexchange.com/questions/657254/normally-distributed-random-numbers-vs-uniformly-distributed-random-number
AND
I like the way ASK draws a picture of a Uniformly Distributed function (non-bell shaped curve)
The PDF=Probability Density Function of a Uniform Distribution will always be the same no matter what x is, as long as x is between a and b.
The PDF of the random variable Uniform(0, 4.2) at x = 3.0 is: .2380952380952381
The PDF of the random variable Uniform(0, 4.2) at x = 1.5 is: .2380952380952381
The formula for standard deviation is sqrt( (b-a)^2 / 12 )
http://www.itl.nist.gov/div898/handbook/eda/section3/eda3662.htm
and here is a video on it http://www.itl.nist.gov/div898/handbook/eda/section3/eda3662.htm although it might be a bit advanced - it does reinforce that the formula for the
standard deviation = sqrt( (b-a)^2 / 12 ) or (b-a) / sqrt(12)
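The tooth paste numbers can be tried out in Python too. This is a sketch with function names I made up. The probability question at the end (between 1.5 and 3.0 oz) is a hypothetical example, not from the original problem.

```python
import math

a, b = 0.0, 4.2  # the tube holds between 0 and 4.2 oz

def uniform_pdf(x, a, b):
    """f(x) = 1/(b-a) inside the interval, 0 outside it."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_probability(low, high, a, b):
    """P(low < X < high): the width of the overlap divided by (b-a)."""
    low, high = max(low, a), min(high, b)
    return max(high - low, 0.0) / (b - a)

print("PDF at x = 3.0:", uniform_pdf(3.0, a, b))
print("PDF at x = 1.5:", uniform_pdf(1.5, a, b))

# Standard deviation, written both ways
sd_one = math.sqrt((b - a) ** 2 / 12)
sd_two = (b - a) / math.sqrt(12)
print("standard deviation:", round(sd_one, 4), round(sd_two, 4))

# Hypothetical question: probability the amount is between 1.5 and 3.0 oz
print("P(1.5 < X < 3.0):", round(uniform_probability(1.5, 3.0, a, b), 4))
```

Because the PDF is flat, a probability is just the width of the range you care about divided by the whole width (b-a).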
Hypothesis Testing & P-value
https://sites.google.com/site/jilmacmath/statistics/hypothis-testing
Survey Sampling - Undercoverage
http://stattrek.com/statistics/dictionary.aspx?definition=Undercoverage
From this definition we see that Undercoverage is a type of selection bias. It occurs when some members of the population are inadequately represented in the sample.
A classic example of undercoverage is the Literary Digest voter survey, which predicted that Alfred Landon would beat Franklin Roosevelt in the 1936 presidential election. The survey sample suffered from undercoverage of low-income voters, who tended to be Democrats. Undercoverage is often a problem with convenience samples.
StatTrek also gives us a link to Bias in Survey Sampling: It has a link to a video, which is very good & very detailed. It explains the 1936 presidential election in much more detail. This video also has:
· CC - Closed Captioning - which I like because it types out everything the video says, so you can see the spelling and wording of terms.
· It has a Table of Contents/Menu, to find Specific Types of Bias: Random, Leading Questions, Social Desirability, Sample Size, Voluntary response bias, and so on - so you can go directly to a particular type of Bias.
· It can be enlarged to the size of your monitor - to clearly see