Class Two: Describing Our Univariate Data (02.29)
Goals:
Create List of Stats Terms Used in Our Readings
Evaluate Our Two Different Graphs
standard deviation, IQR: http://www.mathsisfun.com/data/skewness.html
What happens as data changes over time
mean and sd -- when to use, when to think
median and IQR -- when to use and when to think
Histogram vs. Bar Plot
Overall Idea for the direction of the class:
Create a list of all the 'technical' or 'math' words used in the readings.
Difference between skewed data and 'normal' data
mean salary vs. median salary.
how do we know if something is an outlier?
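One way to make the outlier question concrete (a sketch only, and just one common convention, not necessarily the one in the readings): flag anything more than 1.5 x IQR below the first quartile or above the third quartile. The vector is the mlb.teams.2013 data from the data section of these notes; the names q, iqr, lower, upper and outliers are made up for this example.

```r
# The 2013 vector, copied from the data section of these notes
mlb.teams.2013 = c(28, 38, 68, 72, 72, 76, 80, 82, 83, 85, 87, 91, 94, 94, 102,
                   104, 106, 110, 112, 116, 121, 131, 141, 142, 142, 154, 162,
                   177, 240, 242)

# First and third quartiles (R's default quantile method)
q = unname(quantile(mlb.teams.2013, c(0.25, 0.75)))
iqr = q[2] - q[1]

# Fences under the 1.5 x IQR convention (an assumption, not a law)
lower = q[1] - 1.5 * iqr
upper = q[2] + 1.5 * iqr

# Values falling outside the fences
outliers = mlb.teams.2013[mlb.teams.2013 < lower | mlb.teams.2013 > upper]
outliers
```

With this vector the rule flags the two largest values, 240 and 242. Whether they are "really" outliers is still a judgment call about the data, not something the rule decides for us.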
Univariate Charts Used:
FOR QUANTITATIVE:
histogram
stem plot
box and whisker plot
FOR QUALITATIVE:
bar chart
pie chart
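A rough sketch of how each chart in the list above could be drawn in base R. The quantitative example uses the mlb.teams.2013 vector from these notes; the qualitative counts (pets) are HYPOTHETICAL numbers made up only so the bar and pie charts have something to draw.

```r
# Quantitative: the 2013 vector, copied from the data section of these notes
mlb.teams.2013 = c(28, 38, 68, 72, 72, 76, 80, 82, 83, 85, 87, 91, 94, 94, 102,
                   104, 106, 110, 112, 116, 121, 131, 141, 142, 142, 154, 162,
                   177, 240, 242)

# Histogram: bins a numeric variable, so the bars touch and the x axis is a scale
h = hist(mlb.teams.2013, main="Histogram", xlab="value")

# Stem plot: prints to the console rather than drawing a picture
stem(mlb.teams.2013)

# Box and whisker plot
boxplot(mlb.teams.2013, horizontal=TRUE, main="Box and whisker plot")

# Qualitative: HYPOTHETICAL category counts, invented for illustration only
pets = c(dog=12, cat=9, fish=4, none=5)

# Bar chart: one bar per category, so the order of the bars is arbitrary
barplot(pets, main="Bar chart")

# Pie chart: the same counts shown as shares of a whole
pie(pets, main="Pie chart")
```

The histogram vs. bar plot difference shows up in the inputs: hist() takes the raw numbers and chooses the bins, while barplot() takes counts that are already grouped by category.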
some data for us:
age guesses:
from my high school class:
MLB Data (the vectors below are named for 2013 and 2016):
mlb.teams.2013=c(28, 38, 68, 72, 72, 76, 80, 82, 83, 85, 87, 91, 94, 94, 102, 104, 106, 110, 112, 116, 121, 131, 141, 142, 142, 154, 162, 177, 240, 242)
mlb.teams.2016=c(248, 222, 197, 192, 165, 164, 164, 149, 142, 139, 136, 135, 133, 131, 124, 120, 103, 99, 98, 92, 92, 89, 87, 84, 80, 78, 77, 63, 60, 50)
http://www.spotrac.com/mlb/payroll/
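A quick sketch for comparing mean and sd against median and IQR on the two vectors above. The helper name describe and the table name stats.table are made up for this example.

```r
# Both vectors, copied from the notes above
mlb.teams.2013 = c(28, 38, 68, 72, 72, 76, 80, 82, 83, 85, 87, 91, 94, 94, 102,
                   104, 106, 110, 112, 116, 121, 131, 141, 142, 142, 154, 162,
                   177, 240, 242)
mlb.teams.2016 = c(248, 222, 197, 192, 165, 164, 164, 149, 142, 139, 136, 135,
                   133, 131, 124, 120, 103, 99, 98, 92, 92, 89, 87, 84, 80, 78,
                   77, 63, 60, 50)

# Hypothetical helper: the four summary numbers for one vector
describe = function(x) {
  c(mean = mean(x), sd = sd(x), median = median(x), IQR = IQR(x))
}

# One row per vector, rounded to one decimal place
stats.table = round(rbind("2013" = describe(mlb.teams.2013),
                          "2016" = describe(mlb.teams.2016)), 1)
stats.table
```

In both vectors the mean comes out above the median (about 111.7 vs. 103 for the 2013 vector), which is what a few very large values at the top tend to do. That is the mean salary vs. median salary question in miniature.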
HW:
WSJ: Chapter 2 "Chart Smart"
Naked Statistics: Chapter 3 "Deceptive Description"
Stephen Few: Save the Pies for Dessert
Use the 'pet' data to create some type of graphic for us to look at/use
find an argument that uses statistics somewhere in your literature:
what are the numbers used
what's the source
what other information can you find on that topic
---
the program I use to do these things is:
r-project.org <-- follow the steps for download
rstudio.com <-- download the free desktop version.
some things to do:
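mine = seq(0, 100, 0.1)   # x values from 0 to 100 in steps of 0.1
plot(mine, dnorm(mine, mean=30, sd=5), ylim=c(0,0.1))   # normal density with mean 30, sd 5
points(mine, dnorm(mine, mean=40, sd=5), col="blue")   # second density (mean 40) on the same axes; points() does not take ylim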
---
easier:
curve(dnorm(x,10,2),ylim=c(0,1),xlim=c(0,50),lwd=5)   # add=T is okay (it draws on top of an existing plot)
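A small sketch of the add=T idea, redoing the two-curve plot from "some things to do" with curve() instead of plot() and points(). The name first is made up; it just keeps what curve() returns so the values can be inspected.

```r
# First curve sets up the axes: normal density with mean 30, sd 5
first = curve(dnorm(x, mean=30, sd=5), from=0, to=100,
              ylim=c(0, 0.1), ylab="density")

# add=TRUE draws the second curve (mean 40) on the same axes
curve(dnorm(x, mean=40, sd=5), add=TRUE, col="blue")
```

Both curves have the same sd, so they have the same shape and the blue one is just shifted to the right by 10.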