Part One: Discovering outliers.
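There are two methods we use for finding an outlier: the 1.5*IQR rule (a value is an outlier if it sits more than 1.5*IQR below Q1 or above Q3) and the 2*sd rule (a value is an outlier if it is more than 2 standard deviations away from the mean).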
Some data for us to consider:
mlb.teams=c(228, 216, 165, 150, 148, 140, 127, 119, 117, 115, 114, 114, 107, 104, 90, 89, 89, 83, 81, 80, 78, 76, 73, 72, 72, 67, 60, 58, 36, 22)
...what is a standard deviation? REALLY?
let's break it down with 7 numbers: 1792 1666 1362 1614 1460 1867 1439
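One way to see what a standard deviation really is: build it by hand, one step at a time. This is a minimal sketch using the numbers above. It assumes the sample formula (divide by n - 1), which is the version R's sd() uses; the variable names are made up for illustration.

```r
x <- c(1792, 1666, 1362, 1614, 1460, 1867, 1439)

n      <- length(x)           # how many numbers we have
xbar   <- sum(x) / n          # x bar: the mean (works out to 1600)
devs   <- x - xbar            # how far each number sits from the mean
sq     <- devs^2              # square them so negatives don't cancel positives
ss     <- sum(sq)             # summation of the squared deviations
s.hand <- sqrt(ss / (n - 1))  # sample standard deviation

s.hand
sd(x)                         # R's built-in; should agree with s.hand
```

Roughly speaking, the result is a "typical" distance between a data point and the mean.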
bodytemp: 96.3 96.7 96.9 97.0 97.1 97.1 97.1 97.2 97.3 97.4 97.4 97.4 97.4 97.5 97.5 97.6 97.6 97.6 97.7 97.8 97.8 97.8 97.8 97.9 97.9 98.0 98.0 98.0 98.0 98.0 98.0 98.1 98.1 98.2 98.2 98.2 98.2 98.3 98.3 98.4 98.4 98.4 98.4 98.5 98.5 98.6 98.6 98.6 98.6 98.6 98.6 98.7 98.7 98.8 98.8 98.8 98.9 99.0 99.0 99.0 99.1 99.2 99.3 99.4 99.5 96.4 96.7 96.8 97.2 97.2 97.4 97.6 97.7 97.7 97.8 97.8 97.8 97.9 97.9 97.9 98.0 98.0 98.0 98.0 98.0 98.1 98.2 98.2 98.2 98.2 98.2 98.2 98.3 98.3 98.3 98.4 98.4 98.4 98.4 98.4 98.5 98.6 98.6 98.6 98.6 98.7 98.7 98.7 98.7 98.7 98.7 98.8 98.8 98.8 98.8 98.8 98.8 98.8 98.9 99.0 99.0 99.1 99.1 99.2 99.2 99.3 99.4 99.9 100.0 100.8
mundt=c(31, 34, 32, 32, 29, 37, 32, 32, 39, 35, 39, 35, 30, 27, 29, 34, 30, 32, 32) #this is your data
but what defines an outlier? Try this: http://stats.stackexchange.com/a/60240
most general rule of thumb: if the data is skewed, use 1.5*IQR (quartiles don't get dragged around by extreme values the way the mean and sd do). If the data is roughly "normal", try 2*sd.
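Both rules can be sketched as short R functions. The function names iqr.outliers and sd.outliers are hypothetical (not built into R), and the quartiles come from R's default quantile() method, which can differ slightly from quartiles worked out by hand in class.

```r
mlb.teams <- c(228, 216, 165, 150, 148, 140, 127, 119, 117, 115, 114, 114,
               107, 104, 90, 89, 89, 83, 81, 80, 78, 76, 73, 72, 72, 67,
               60, 58, 36, 22)

# 1.5*IQR rule: flag anything below Q1 - 1.5*IQR or above Q3 + 1.5*IQR
iqr.outliers <- function(x) {
  q    <- quantile(x, c(0.25, 0.75), names = FALSE)  # Q1 and Q3
  step <- 1.5 * (q[2] - q[1])                        # 1.5 * IQR
  x[x < q[1] - step | x > q[2] + step]
}

# 2*sd rule: flag anything more than 2 standard deviations from the mean
sd.outliers <- function(x) {
  x[abs(x - mean(x)) > 2 * sd(x)]
}

iqr.outliers(mlb.teams)
sd.outliers(mlb.teams)
```

For mlb.teams the two rules agree and flag 228 and 216. They will not always agree, which is why the shape of the data matters when picking one.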
In-class definitions:
5-number summary
IQR
determining outliers
standard deviation
summation
x bar
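Each definition in the list has a matching R call. A quick sketch using the mundt data from above; the variable names q, iqr, and fences are just for illustration, and fivenum() reports hinges, which can differ slightly from quantile()'s quartiles on some data sets.

```r
mundt <- c(31, 34, 32, 32, 29, 37, 32, 32, 39, 35, 39, 35, 30, 27, 29,
           34, 30, 32, 32)

fivenum(mundt)      # 5 point (five-number) summary: min, Q1, median, Q3, max

iqr <- IQR(mundt)   # IQR = Q3 - Q1

# determining outliers: anything outside these two fences
q      <- quantile(mundt, c(0.25, 0.75), names = FALSE)
fences <- c(q[1] - 1.5 * iqr, q[2] + 1.5 * iqr)
fences

sd(mundt)           # standard deviation
sum(mundt)          # summation: add up every value
mean(mundt)         # x bar = sum(mundt) / length(mundt)
```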
----
Day Two: Practice Removing Outliers.
...how do we know when to flag a data point and remove it from what we've collected? When do we just leave it? What method should we use? Why are we asking so many questions?
The best way to tackle this is to try a bunch of problems. To that end, here are the ones we are going to do in class.
----
Data Set 02: The first you will be graded on ...