Statistics

JilMac's favorite Statistics Resources:

Defining the variables so you can plug them into formulas is a little easier once you realize there are two sets of variables, formulas and rules, depending on the data you are working with. These two sets are:

Population - all the data

Sample - a subset of the Population. There are also distinctions between some kinds of samples: Large Sample, Small Sample & Random Sample

Formulas & Symbols are different for POPULATIONS and SAMPLES

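Mean (= average): Population μ (mu); Sample x̄ (x-bar)

Standard Deviation: Population σ (sigma) = sqrt [ Σ ( Xi - μ )² / N ]; Sample s = sqrt [ Σ ( xi - x̄ )² / ( n - 1 ) ]

Size: Population N; Sample n

Variance: Population σ² = Σ ( Xi - μ )² / N; Sample s² = Σ ( xi - x̄ )² / ( n - 1 )

Proportion with the attribute: Population P or π (pi); Sample p

Proportion without the attribute: Population Q = 1 - P; Sample q = 1 - p

Confidence Interval: α (alpha) = 1 - CI, ex. a CI of 95% gives alpha = 1 - .95 = .05

Sampling distribution of the sample mean: its mean is μx̄ = μ and its standard deviation (the standard error of the mean) is σx̄ = σ / sqrt(n). Note that σ / sqrt(n) is NOT the sample standard deviation s.

Z-Score = the number of Standard Deviations a value lies from the mean

Z-table example: the area between the mean and z = 1 is .3413, and .3413 x 2 = .6826, so about 68% of the data lies within 1 Standard Deviation of the mean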

Stat Trek's overview http://stattrek.com/sampling/populations-and-samples.aspx?Tutorial=AP

So their formulas are slightly different, although when you are first learning them it seems as though they are interchangeable.

You can look up what a term means http://stattrek.com/statistics/dictionary.aspx

Statistic Formulas http://stattrek.com/statistics/formulas.aspx?Tutorial=Stat

Statistics Notation http://stattrek.com/statistics/notation.aspx

Standard Deviation

Here is the formula for Standard Deviation from SAMPLE data (vs. Population data)

What is Standard Deviation and what do you use it for? https://www.youtube.com/watch?v=N9LvWy1IGPY

Statisticians often use simple random samples to estimate the standard deviation of a population, based on sample data. Given a simple random sample, the best estimate of the standard deviation of a population is:

s = sqrt [ Σ ( xi - x̄ )² / ( n - 1 ) ]

s = sample standard deviation

x̄ = sample mean

xi = the ith element from the sample

n = the number of elements in the sample.

-----------------------------------------------------------------------------------------------

Let's use an example of Sample Standard Deviation to understand your (x - x̄)² question, which was really about ( xi - x̄ )²

Example: A sample consists of seven observations: {2, 3, 4, 5, 6, 7, 8}. What is the Standard Deviation?

How to solve this: our Standard Deviation formula needs the mean, so let's find the mean first:

x̄ = (2 + 3 + 4 + 5 + 6 + 7 + 8) / 7 = 35 / 7 = 5   (the Sample Mean is denoted by the symbol x̄, read "x-bar")

Then we plug all of the known values into the Sample Standard Deviation formula

s = sqrt [ Σ ( xi - x̄ )² / ( n - 1 ) ]

s = sqrt [ ( (2-5)² + (3-5)² + (4-5)² + (5-5)² + (6-5)² + (7-5)² + (8-5)² ) / (7-1) ]

s = sqrt [ ( (-3)² + (-2)² + (-1)² + (0)² + (1)² + (2)² + (3)² ) / 6 ]

s = sqrt [ ( 9 + 4 + 1 + 0 + 1 + 4 + 9 ) / 6 ] = sqrt [ 28 / 6 ]

s = sqrt [ 4.667 ]

s ≈ 2.16025

To verify, use a Statistics Calculator: http://easycalculation.com/statistics/standard-deviation.php

Sample Standard Deviation Calculator: http://www.miniwebtool.com/sample-standard-deviation-calculator

Note: the only difference between the two Standard Deviation formulas is whether you divide by N (population) or by n - 1 (sample)

Population Standard Deviation = σ = sqrt [ Σ ( Xi - μ )² / N ]

Sample Standard Deviation = s = sqrt [ Σ ( xi - x̄ )² / ( n - 1 ) ]
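To check the worked example, here is a minimal Python sketch (standard library only). The helper names sample_sd and population_sd are illustrative, not from any particular course; the only difference between them is the n - 1 vs. N divisor in the note above, and both are compared with Python's built-in statistics.stdev and statistics.pstdev.

```python
import math
import statistics


def sample_sd(data):
    """Sample standard deviation: squared deviations divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))


def population_sd(data):
    """Population standard deviation: squared deviations divided by N."""
    N = len(data)
    mu = sum(data) / N
    return math.sqrt(sum((x - mu) ** 2 for x in data) / N)


data = [2, 3, 4, 5, 6, 7, 8]
print(round(sample_sd(data), 5))      # 2.16025  (28 / 6, then square root)
print(round(population_sd(data), 5))  # 2.0      (28 / 7 = 4, then square root)

# Python's statistics module should agree with both helpers
print(round(statistics.stdev(data), 5), round(statistics.pstdev(data), 5))
```

If the same seven numbers were the whole population, dividing by N gives 2.0 instead of about 2.16.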

We would enjoy working with you one-on-one. I eTutor on Monday & Tuesday morning from 8:45-10pm and Dan eTutors Friday & Saturday nights from 7-9:30pm

If neither of these times works, continue to ask questions and I'll find you training videos and tutorials to help you. Start with the Intro to Statistics series; it will give you a solid foundation. EVEN IF you don't get every concept as you begin to watch these videos, keep going. The exposure, and hearing another person besides your instructor talk about Math and Statistics, is invaluable. You need to expose yourself to this. It is better than watching TV or playing video games, and if you do this each time you get nervous, I promise you will get more out of this course.

Khan Academy - Intro to Statistics: 68 videos, about 10 minutes each. http://www.youtube.com/watch?v=uhxtUt_-GyM&list=PLGcvh64d5a4HfGkm8O4SIqbrS5HFaaM3B

Standard Deviation http://www.youtube.com/watch?v=HvDqbzu0i0E

Statistics Sample vs. Population Mean http://www.youtube.com/watch?v=HvDqbzu0i0E

You may also want to experiment with listening to and watching different teachers discuss Statistics. In the right-hand column, YouTube gives you lots of SUGGESTIONS for other videos, and there are MANY excellent teachers. If Khan is not the teacher for you, or if you want some variety, find the teacher that speaks to YOU! Then subscribe to their channel.

NOTE: if your learning environment does not allow sound, or if you are not an audio/voice learner, Khan and many other videos have Closed Captions. The CC button at the bottom right of each video displays what is being said as text on the screen. I find this helpful even when I keep the sound on; it helps me recognize the words, especially when I am reading.

=================================================================================================================

Normal Distribution Bell Curve and Z-scores

Watch JilMac Loves Math http://www.youtube.com/watch?v=xcBSl06_Ai0

Simple Definition of the 68-95-99.7 rule http://www.youtube.com/watch?v=xgQhefFOXrM

There are many good videos that explain Statistics, both visually and conceptually. Most have a CC (Closed Caption) option, if you learn better by seeing the words while hearing the concepts explained and seeing the math being done. The most famous online Math/Stats videos are from Salman Khan, founder of Khan Academy, which has a playlist of 68 videos; you can start at the beginning: https://www.youtube.com/watch?v=uhxtUt_-GyM&list=PL1328115D3D8A2566&nohtml5=False For Confidence Intervals (CI): https://www.youtube.com/watch?v=bekNKJoxYbQ&nohtml5=False
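The 68-95-99.7 percentages can be checked with a short Python sketch, assuming Python 3.8 or later (where statistics.NormalDist exists). within_k_sd is a hypothetical helper name; it measures the area under the standard normal curve between -k and +k standard deviations.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def within_k_sd(k):
    """Area under the bell curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


# The area from the mean up to z = 1 is the .3413 read from a Z-table
print(round(standard_normal.cdf(1) - 0.5, 4))  # 0.3413

for k in (1, 2, 3):
    print(k, round(within_k_sd(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The exact value for k = 1 is .6827; the .6826 figure at the top of this page comes from doubling the rounded table value .3413.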

Z-score Formula: z = (x - mean) / standard deviation

Margin of Error (E, sometimes just called Error) = z * σ / √n. E could be what you are solving for.

Solving that for the sample size gives n = (z * σ / E)²

The Error is the largest discrepancy you are willing to accept between your estimate and the true value, at the level of Confidence you choose.

Critical z values:

CI of 50% = 0.674 z value

CI of 80% = 1.282 z value

CI of 90% = 1.645 z value

CI of 95% = 1.960 z value

CI of 98% = 2.326 z value

CI of 99% = 2.576 z value
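Instead of memorizing the table, the critical z values can be computed. This is a minimal Python sketch (Python 3.8+ for statistics.NormalDist); critical_z, margin_of_error and required_sample_size are hypothetical helper names, and the σ = 15, E = 5 numbers are made up purely to show the arithmetic of E = z * σ / √n and n = (z * σ / E)².

```python
import math
from statistics import NormalDist


def critical_z(confidence):
    """Two-sided critical z value, e.g. confidence 0.95 -> about 1.960."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def margin_of_error(z, sigma, n):
    """E = z * sigma / sqrt(n)"""
    return z * sigma / math.sqrt(n)


def required_sample_size(z, sigma, error):
    """n = (z * sigma / E)^2, rounded UP because n must be a whole number."""
    return math.ceil((z * sigma / error) ** 2)


for level in (0.50, 0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"CI of {level:.0%}: z = {critical_z(level):.3f}")

# Hypothetical example: sigma = 15, and we want an Error of at most 5 at 95%
z95 = critical_z(0.95)
print(required_sample_size(z95, sigma=15, error=5))  # 35
print(round(margin_of_error(z95, sigma=15, n=35), 2))
```

Rounding n up (never down) is the usual convention, so the Margin of Error ends up at or just below the target.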

CI Calculator http://www.mathcelebrity.com/chiconf.php?n=+70&variance=+4.84&conf=90&pl=Variance+Confidence+Interval

T-scores work in a similar way; here is a t-table: https://people.richland.edu/james/lecture/m170/tbl-t.html

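Excel formula for the chi-square critical value used in a Confidence Interval for a Variance (like the calculator above)

Note: alpha = 1 - CI, so for a CI of 90%, alpha = 1 - .90 = .10

Excel formula: =CHIINV(alpha, n-1)   (for a two-sided interval, use alpha/2 and 1 - alpha/2 in place of alpha to get the two critical values)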

Animated Z-score Tables http://www.sfu.ca/personal/archives/richards/Table/Pages/Table1.htm

For a SMALL sample of under 30 from a Normal population, use a T-Score

For a sample of 30 or larger, use the Z-Score

This Confidence Interval Calculator is from http://MathCelebrity.com

https://www.mathcelebrity.com/normconf.php?

========================================================================================

Probability when mean & standard deviation are known

This link inspired me: Probability when mean and standard deviation are known.

SAMPLE QUESTION: According to the car company Chevy, the Cobalt model car has a mean MPG of 32 with a standard deviation of 3.5.

What is the probability that a randomly selected Cobalt has an MPG that exceeds 34?

If 10 Cobalts are randomly selected, what is the probability that their mean MPG exceeds 34?

Mean μ = 32 mpg

Standard Deviation σ= 3.5

Observation or what you are trying to solve for X>34

Z-score Formula: z = (X - μ) / σ

X = what you are looking to compare - sometimes referred to as the "observed score" = 34

The first question asks you to find the probability P(x Greater Than 34 MPG) = P(X>34)

NORMAL RANDOM VARIABLE

If we consider this to be a Normal Distribution

With Mean μ=32 and a Standard Deviation σ = 3.5

To find the Probability that X > 34, which can be written as P(X > 34)

You can use the Z Table to find the Probability of an occurrence (X sometimes referred to as the Observed Score) under a Bell Shaped curve

If you want to know how to Read a Z-Score Table to Compute Probability use this link http://www.had2know.com/academics/normal-distribution-table-z-scores.html

You can find the Z score with this formula z = (X - mean)/Standard Deviation

Z score = (34 - 32) / 3.5 = 2 / 3.5 = .57

Z score of .57 = .2157 <-- you look this up on the Z-Score Table

http://statstutorstl.blogspot.com/2010/07/z-table-gives-probabilty-distribution.html

If your Z-score table only shows half of the curve (the area between the mean and z), then add .5

https://engineering.purdue.edu/~engr116/ENGR19500H_spr/General_Course_Information/Common/z-table2.jpg

You add .5 + .2157 = .7157. This means there is almost a 72% chance that a Cobalt gets 34 mpg or LESS.

But since we want to know the probability of getting MORE THAN 34 mpg,

we have to subtract this from 1: 1 - .7157 = ____
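Here is a Python sketch (Python 3.8+, statistics.NormalDist) that computes both Cobalt questions without a table. It assumes the second question asks about the MEAN mpg of the 10 cars, whose standard deviation is the standard error σ / sqrt(n). Because z is not rounded to .57 here, the first answer may differ slightly in the last digit from the table answer.

```python
from statistics import NormalDist

mu, sigma = 32, 3.5

# Question 1: a single randomly selected Cobalt
one_car = NormalDist(mu, sigma)
z = (34 - mu) / sigma
p_one = 1 - one_car.cdf(34)          # P(X > 34)

# Question 2 (assumption: it asks about the MEAN mpg of 10 cars)
n = 10
standard_error = sigma / n ** 0.5    # sigma / sqrt(n)
mean_of_10 = NormalDist(mu, standard_error)
p_mean = 1 - mean_of_10.cdf(34)      # P(sample mean > 34)

print(f"z = {z:.2f}")
print(f"P(one car > 34 mpg) = {p_one:.4f}")
print(f"P(mean of 10 cars > 34 mpg) = {p_mean:.4f}")
```

Averaging 10 cars shrinks the spread, so a mean above 34 mpg is much less likely than a single car above 34 mpg.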

I recommend that you watch some Video Training from:

Khan Academy - Probability & Stats videos

https://www.khanacademy.org/library#probability

Here are the videos that relate to NORMAL DISTRIBUTION

https://www.khanacademy.org/math/probability/statistics-inferential/normal_distribution/v/introduction-to-the-normal-distribution

Definitions - http://math.bu.edu/people/nkatenka/MA113/Lecture_7_Sol.pdf

========================================================================================

Correlation Coefficients & Scatter Diagrams

is explained nicely by the web site Statistics How To

http://www.statisticshowto.com/articles/how-to-compute-pearsons-correlation-coefficients/
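As a sketch of the kind of calculation that article walks through, here is Pearson's r in Python using the definitional form r = Σ(x - x̄)(y - ȳ) / sqrt[ Σ(x - x̄)² * Σ(y - ȳ)² ]. The hours/scores data are made up purely for illustration, and pearson_r is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: hours studied vs. test score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
print(round(pearson_r(hours, scores), 4))  # close to +1: strong positive correlation
```

r is always between -1 and +1; in Python 3.10 or later, statistics.correlation(hours, scores) should give the same value.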

Sampling Distribution: the formula is about 6 minutes in, but watch the whole thing; it is good.

Khan Academy https://www.khanacademy.org/math/probability/statistics-inferential/sampling_distribution/v/sampling-distribution-example-problem

USING EXCEL to create a Scatter Plot and add a Trendline for a Correlation Scatter Plot: http://youtu.be/6rOlGbLeQxI

Margin of Error

Margin of Error http://en.wikipedia.org/wiki/Margin_of_error

The graph shows that the Larger the Sample - the Smaller the Margin of error

This gives you a "gut feeling" when we apply this to the chart you are asked to complete

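For a fixed sample size, the higher the Confidence Level (e.g. 99%), the larger the z value and so the larger the Margin of Error (E = z * σ / √n).

As the Confidence Level goes down, the Margin of Error goes down too. To keep the same Error Tolerance at a higher Confidence Level, you need a larger sample.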

Statistical Symbols - http://www.statistics.com/statistical-symbols

shows us the Formula for E, the Margin of Error,

for large samples: E = zc * σ / √n

where zc is the critical value for a confidence level c.

From the Z chart http://www.napce.org/documents/research-design-yount/93_CritValueTables_4th.pdf

We know that the critical z value for a Confidence Level of 99% = 2.576, 95% = 1.960 and 90% = 1.645

Here is a calculator - so that you can check your work http://www.raosoft.com/samplesize.html

This gives a "ballpark figure," and it does not show you the formula to calculate the number of samples required for a particular Error Tolerance.

Confidence Intervals

Definition (Stat Trek): http://stattrek.com/estimation/confidence-interval.aspx

Khan Academy: https://www.khanacademy.org/math/probability/statistics-inferential/confidence-intervals/v/confidence-interval-1

This video explains Confidence Intervals for One Mean: Determining the Required Sample Size. https://www.youtube.com/watch?v=7zcbVaVz_P8&src_vid=ktD7MX5tF7k&feature=iv&annotation_id=annotation_456215

http://www.youtube.com/watch?v=ktD7MX5tF7k

The Margin of Error formula that I used is similar to the one in the video.

If you plug in the .1 Error Tolerance, you get very similar results.

Confidence Interval for a Population Mean

x̄ ± z * s / sqrt(n)
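A minimal Python sketch of x̄ ± z * s / sqrt(n), assuming a large sample (30 or more, per the z vs. t rule earlier on this page) and Python 3.8+. mean_confidence_interval is a hypothetical helper name, and x̄ = 50, s = 10, n = 100 are made-up numbers chosen only to show the arithmetic.

```python
from statistics import NormalDist


def mean_confidence_interval(x_bar, s, n, confidence=0.95):
    """Large-sample CI for a population mean: x_bar +/- z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # 1.960 for 95%
    error = z * s / n ** 0.5
    return x_bar - error, x_bar + error


low, high = mean_confidence_interval(50, 10, 100)  # hypothetical numbers
print(round(low, 2), round(high, 2))  # 48.04 51.96
```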

=====================

Rejection Region & Critical Value from Dan Lamay

https://www.youtube.com/watch?v=008WTBZCIe8&list=TLUXWqmNVnnwg

===========================================

Word Problems

I encourage students to do the textbook exercises that have answers in the back of the book, to get practice. Doing problems whose answers you can check is an excellent way to learn whether you are approaching a problem correctly.

When I think of word problems, I think of them as real life. For example, a friend tells you about an event that happened and includes all kinds of other information that does not really affect how you would approach the problem.

I like the web site Purple Math. They explain things clearly. Here is a chart that I think may help you in understanding what MATH functions you need to apply to certain WORDS in a Word Problem.

http://www.purplemath.com/modules/translat.htm

I'll use the example of Addition below

Addition

increased by

more than

combined, together

total of

sum

added to

I also have several sites that help me Refresh Statistics Skills:

Stat Trek: http://StatTrek.com is a nice site that is laid out well.

Dictionary: Many times it is helpful to look up terms that are in word problems, I will check http://stattrek.com/statistics/dictionary.aspx to see if Stat Trek has a good definition.

This is a page with formulas: http://stattrek.com/statistics/formulas.aspx?Tutorial=Stat

With Word Problems, I break the question up into the variables that have been stated, and I write down what I am trying to solve for. Then I look to see IF there is a formula that uses the variables I have and can solve for what the problem is asking.

Your topic of Central Tendency, Variation, Position wasn't in the StatTrek dictionary, but I Googled it and found CliffsNotes, which I like: http://www.cliffsnotes.com/study_guide/Measures-of-Central-Tendency.topicArticleId-267532,articleId-267451.html

My Web page has my favorite links.

https://sites.google.com/site/jilmactraining/mathematics/statistics

I love training videos. So I use KhanAcademy & PatrickJustMath.

Another resource is web-based calculators, which let you "try" things to see if you are close to the right formula/tool. They are not going to help you on a test, but they do speed up the learning process by letting you try out different formulas.

And don't pass up the kids learning sites. They are a lot easier to understand and they are fun!

http://onlinestatbook.com/2- has a very good Search Bar in the upper right hand side.

Let's see what they say about your topic: Central Tendency, Variation, Position

http://www.google.com/cse?cx=007785934265749776327%3Askmuxtecv14&ie=UTF-8&q=Central+Tendency+Variatin%2C+Position&sa=Search#gsc.tab=0&gsc.q=Central%20Tendency%20Variation%2C%20Position

Look over these resources.

I recommend that you cut & paste them into your word processor, spreadsheet or website - so you can easily access these links and not have to retype them!

=======================================================================

Combinations & Permutations

Finding the Number of Combinations - withOUT regard to ORDER

nCr = n! / ((n-r)! r!)

Finding the Number of Permutations - when ORDER is important

nPr = n! / (n-r)! where r<=n

The ! is called "Factorial" - it means multiply together every whole number from n down to 1, ex. 4! = 4*3*2*1 = 24

ex. 16C4 = 16! / ((16-4)! 4!) = 16! / (12! 4!)

= (16 * 15 * 14 * 13 * 12 * 11 * 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2) / ((12 * 11 * 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2) * (4 * 3 * 2))

= (16 * 15 * 14 * 13) / (4 * 3 * 2) = 43680 / 24 = 1820
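The same arithmetic in a short Python sketch: n_choose_r and n_permute_r are hypothetical helper names that follow the two factorial formulas literally, and Python 3.8+ also provides math.comb and math.perm, which should give the same results.

```python
import math


def n_choose_r(n, r):
    """Combinations (order does NOT matter): n! / ((n - r)! r!)"""
    return math.factorial(n) // (math.factorial(n - r) * math.factorial(r))


def n_permute_r(n, r):
    """Permutations (order matters): n! / (n - r)!"""
    return math.factorial(n) // math.factorial(n - r)


print(n_choose_r(16, 4))   # 1820
print(n_permute_r(16, 4))  # 43680  (= 16 * 15 * 14 * 13)
print(math.comb(16, 4), math.perm(16, 4))  # 1820 43680
```

Note that 16P4 = 43680 is the numerator in the worked example; dividing by 4! = 24 removes the orderings and leaves 16C4 = 1820.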

========================================================================

Frequencies

Frequency Distribution can be organized and viewed in a Frequency Table

This groups items into columns & rows where you can see the essential elements.

To see "How Often" the age range of 30-39 occurs in this data, first add up all the frequencies to get the total number of observations. Then divide each range's frequency by that total to get its relative frequency, and multiply by 100 to get the percentage %.

Relative frequency = frequency ÷ number of observations

Percentage frequency = relative frequency X 100 = f ÷ n X 100

The sample data for the example above could be: 31, 33, 40, 45, 46, 51, 52, 52, 52, 55, 57, 57, 66, 66, 67, 68, 68, 71
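Here is a Python sketch that builds the frequency table from that sample data. It assumes 10-year age ranges (30-39, 40-49, ...), since only the 30-39 range is named above; frequency_table is a hypothetical helper name.

```python
data = [31, 33, 40, 45, 46, 51, 52, 52, 52, 55, 57, 57, 66, 66, 67, 68, 68, 71]


def frequency_table(values, width=10):
    """Return (range label, frequency, relative frequency, percent) rows."""
    counts = {}
    for v in values:
        low = (v // width) * width  # 31 -> 30, 45 -> 40, ...
        counts[low] = counts.get(low, 0) + 1
    n = len(values)  # number of observations
    rows = []
    for low in sorted(counts):
        f = counts[low]
        rows.append((f"{low}-{low + width - 1}", f, f / n, f / n * 100))
    return rows


for label, f, rel, pct in frequency_table(data):
    print(f"{label}: f = {f}, relative = {rel:.3f}, percent = {pct:.1f}%")
# 30-39: f = 2, relative = 0.111, percent = 11.1%
# 40-49: f = 3, relative = 0.167, percent = 16.7%
# 50-59: f = 7, relative = 0.389, percent = 38.9%
# 60-69: f = 5, relative = 0.278, percent = 27.8%
# 70-79: f = 1, relative = 0.056, percent = 5.6%
```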

Degrees of Freedom formulas

The t distribution is a family of curves based on the Degrees of Freedom, which is a number related to the sample size.

How to find a t-score on the t-table when you don't have the Degrees of Freedom.

Chi-Square Tests

df (goodness-of-fit) = k-1 where k = the number of possible outcomes in the experiment

df (contingency table) = (r-1)(c-1) where r = # of rows in a table, c = # of columns in a table

F distribution & Tables

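F = s1² / s2²  (the F statistic is the ratio of two sample variances; its degrees of freedom are n1 - 1 for the numerator and n2 - 1 for the denominator)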

One-Way ANOVA table

df (between samples) = k-1 where k = number of samples

df (within samples) = T-k where T = the total number of observations in all the samples combined

t Distribution values: df = n-1 where n = sample size

Queueing Models - how to Calculate using Excel Add-Ons

OnLine TextBook - Operations Management http://bcs.wiley.com/he-bcs/Books?action=index&bcsId=5869&itemId=0470525908

Chapter 5 - Service Design: bcs.wiley.com/he-bcs/Books?action=chapter&bcsId=5869&itemId=0470525908&chapterId=62384

Overview of Formulas:

YouTube video explaining the concepts and how to do them in Excel: https://www.youtube.com/watch?v=c2DymN34w04

Binomial Probability

A realistic estimate for the probability of an engine failing on a transatlantic flight is 1/14000. Use this probability and the binomial probability formula to find the probabilities of 0, 1, 2, and 3 engine failures for a three-engine jet and the probabilities of 0, 1, and 2 engine failures for a two-engine jet.

Can you please explain how to find p ?

    • Mean = n*P

    • SD = SQRT( n * P * Q)

===========================================================

Hi

The key to understanding what a Binomial Distribution is, is to watch Khan Academy's series on Binomial Distribution

https://www.khanacademy.org/math/probability/random-variables-topic/binomial_distribution/v/binomial-distribution-1

OK - now let's apply what we have learned from Khan to our question

The Binomial Probability Formula: http://www.mathwords.com/b/binomial_probability_formula.htm

Formula:

P(k successes in n trials) = nCk * p^k * q^(n-k)

n = number of trials

k = number of successes

n – k = number of failures

p = probability of success in one trial

q = 1 – p = probability of failure in one trial

Example: You are taking a 10 question multiple choice test. If each question has four choices and you guess on each question, what is the probability of getting exactly 7 questions correct?

n = 10

k = 7

n – k = 3

p = 0.25 = probability of guessing the correct answer on a question

q = 0.75 = probability of guessing the wrong answer on a question
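Here is a minimal Python sketch of the formula above applied to the multiple-choice example (Python 3.8+ for math.comb, which computes nCk). binomial_probability is a hypothetical helper name.

```python
import math


def binomial_probability(k, n, p):
    """P(exactly k successes in n trials) = nCk * p^k * q^(n - k)"""
    q = 1 - p
    return math.comb(n, k) * p ** k * q ** (n - k)


p7 = binomial_probability(7, 10, 0.25)  # n = 10, k = 7, p = 0.25
print(round(p7, 5))  # 0.00309  (120 * 0.25^7 * 0.75^3)
```

Guessing 7 out of 10 correctly is very unlikely: roughly 3 chances in 1000.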

Now our question: 0, 1, 2, and 3 engine failures for a three-engine jet.

This is a 3-engine plane, so we want the Probability of these engines failing on the same plane. Each engine is one trial, and the binomial formula assumes the engines fail independently of one another. We are asking ...

What is the probability that we have NO engine failures?

Here a "success" is an engine failing, so:

n = 3 (this is the number of engines on the plane)

k = 0 (how many of the engines fail; Khan called this X)

n - k = number of engines that do NOT fail = 3 - 0 = 3

p = 1/14000

q = 1 - 1/14000 = 13999/14000

We plug this into our formula

P(k=0) = 3C0 * (1/14000)^0 * (13999/14000)^(3-0)

3C0 = 3! / (0! (3-0)!) = 3! / (1 * 3!) = 1   (note: 0! = 1)

P(k=0) = 1 * 1 * (13999/14000)^3

------------------------------------------------------

What is the probability that we have 1 engine failure?

First find 3C1 = 3! / (1! (3-1)!) = 3! / (1! 2!) = 3! / 2!

3! / 2! = (3*2*1) / (2*1) = 6/2 = 3

P(1) = 3 * (1/14000)^1 * (13999/14000)^2 ≈ .0002143

What is the probability that we have 2 engine failures?

First find 3C2 = 3! / (2! (3-2)!) = 3! / 2! = (3*2*1) / (2*1) = 3

P(2) = 3 * (1/14000)^2 * (13999/14000)^1

What is the probability that we have 3 engine failures?

First find 3C3 = 3! / (3! (3-3)!) = 3! / (3! * 0!) = 1

P(3) = 1 * (1/14000)^3 * (13999/14000)^0

Now plug in the next set (the two-engine jet: n = 2, with k = 0, 1, and 2) and see what you get.
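Once you have tried it by hand, this Python sketch (Python 3.8+ for math.comb) runs the whole engine question for both planes, assuming, as the binomial formula does, that engines fail independently. failure_table is a hypothetical helper name; the last line also prints the Mean = n*P and SD = SQRT(n*P*Q) from the question.

```python
import math

p = 1 / 14000  # probability that one engine fails on a transatlantic flight
q = 1 - p      # probability that one engine does NOT fail


def failure_table(engines):
    """P(k engines fail) for k = 0 .. engines, using nCk * p^k * q^(n - k)."""
    return {k: math.comb(engines, k) * p ** k * q ** (engines - k)
            for k in range(engines + 1)}


for engines in (3, 2):
    print(f"{engines}-engine jet")
    for k, prob in failure_table(engines).items():
        print(f"  P({k} failures) = {prob:.6e}")
    mean = engines * p
    sd = math.sqrt(engines * p * q)
    print(f"  mean = n*p = {mean:.6e}, SD = sqrt(n*p*q) = {sd:.6e}")
```

Scientific notation (e.g. 2.142551e-04) keeps the very small probabilities readable.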

Check your answers with a Binomial Probability Calculator: http://vassarstats.net/textbook/ch5apx.html

There are also 2 great videos on using EXCEL for the Binomial Probability Distribution

Part 1 https://www.youtube.com/watch?v=JsDgLF0Npww

Watch the beginning up to 3:50 minutes to see how to set up the spreadsheet. It is the same formula, but if you are more familiar with Excel and Spreadsheets the math may be easier. Personally I love Excel and how you can set things up to "See the Math".

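The next question is "Probability distribution graph. I tried to use Excel but I couldn't. I also tried to use StatCrunch, and it didn't show anything."

In Excel 2010 and later the function is =BINOM.DIST(); I use Excel 2007, where the formula is =BINOMDIST (without the .). For example, =BINOM.DIST(7, 10, 0.25, FALSE) gives the probability of exactly 7 correct guesses in the 10 question test example.

Both work the same.

Displaying a graph is all about the numbers you enter for Excel to use when it creates the graph.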

Binomial Probability Distribution Excel Part 2

https://www.youtube.com/watch?v=KcywOUZDP8s

Watch the first half of this video, up to 5:50, to see how to take the Binomial Probabilities that Excel calculated and create a histogram chart/graph.

==========================================================================

Continuous Uniform Random Variable

A tube of toothpaste holds 4.2 oz

We are told this is uniformly distributed, which is different from a normal distribution

The formula is f(x) = 1/(b-a) for a ≤ x ≤ b

Since the amount of toothpaste can be anywhere between 0 and 4.2 oz,

f(x) = 1/(4.2-0) = 1/4.2

I like the way that Math StackExchange explained how to think about information that is "Uniformly Distributed" vs. "Normally Distributed." http://math.stackexchange.com/questions/657254/normally-distributed-random-numbers-vs-uniformly-distributed-random-number

AND

I like the way ASK draws a picture of a Uniformly Distributed function (non-bell shaped curve)

http://www.ask.com/wiki/Uniform_distribution_%28continuous%29?o=2801&qsrc=999&ad=doubleDown&an=apn&ap=ask.com

The PDF (Probability Density Function) of a Uniform Distribution is the same for every x between a and b:

The PDF of the random variable Uniform(0, 4.2) at x = 3.0 is: .2380952380952381

The PDF of the random variable Uniform(0, 4.2) at x = 1.5 is: .2380952380952381

The formula for the standard deviation is sqrt((b-a)²/12)

http://www.itl.nist.gov/div898/handbook/eda/section3/eda3662.htm

That page might be a bit advanced, but it does reinforce that the formula for the

standard deviation = sqrt((b-a)²/12) or (b-a) / sqrt(12)
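A small Python sketch for the toothpaste example, assuming the amount is Uniform(0, 4.2) as above. uniform_pdf is a hypothetical helper name: it returns 1/(b-a) anywhere inside [a, b] and 0 outside, and the standard deviation is computed both ways to show the two forms agree.

```python
import math

a, b = 0.0, 4.2  # toothpaste amount in oz, assumed Uniform(0, 4.2)


def uniform_pdf(x, a=a, b=b):
    """f(x) = 1 / (b - a) for a <= x <= b, and 0 outside that range."""
    return 1 / (b - a) if a <= x <= b else 0.0


print(uniform_pdf(3.0), uniform_pdf(1.5))  # the same value, 1/4.2, for both x

sd_form1 = math.sqrt((b - a) ** 2 / 12)
sd_form2 = (b - a) / math.sqrt(12)
print(round(sd_form1, 4), round(sd_form2, 4))  # 1.2124 1.2124
```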

Hypothesis Testing & P-value

https://sites.google.com/site/jilmacmath/statistics/hypothis-testing

Survey Sampling - Undercoverage

http://stattrek.com/statistics/dictionary.aspx?definition=Undercoverage

From this definition we see that Undercoverage is a type of selection bias. It occurs when some members of the population are inadequately represented in the sample.

A classic example of undercoverage is the Literary Digest voter survey, which predicted that Alfred Landon would beat Franklin Roosevelt in the 1936 presidential election. The survey sample suffered from undercoverage of low-income voters, who tended to be Democrats. Undercoverage is often a problem with convenience samples.

StatTrek also links to Bias in Survey Sampling, which includes a very good, very detailed video. It explains the 1936 presidential election in much more detail. The video also has:

· CC - Closed Captioning - which I like because it types out everything the video says, so you can see the spelling and wording of terms.

· It has a Table of Contents/Menu, to find Specific Types of Bias: Random, Leading Questions, Social Desirability, Sample Size, Voluntary response bias, and so on - so you can go directly to a particular type of Bias.

· It can be enlarged to the size of your monitor, so you can see it clearly