Class 15 (11/01/2012):
NY Times Census explorer: http://projects.nytimes.com/census/2010/explorer
Problems with iteration:
Expected Value:
* a game to play I've created
* roulette and EV
* powerball and when powerball is a good idea (and when it is not)
power.odds=c(55.41,110.81,706.43,360.14,12244.83,19087.53,648975.96,5153632.65,175223510)
power.payout=c(4,4,7,7,100,100,10000,1000000,25000000) #this is where the jackpot starts, $25 million
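A minimal sketch of the expected value calculation with the two vectors above. It assumes each entry of power.odds means "1 in that many" for the matching prize in power.payout. The $2 ticket price (ticket.price) is an assumption that is not in these notes, and the sketch ignores taxes, split jackpots, and the lump-sum discount.

```r
# Odds ("1 in ...") and payouts copied from the notes above
power.odds <- c(55.41, 110.81, 706.43, 360.14, 12244.83, 19087.53,
                648975.96, 5153632.65, 175223510)
power.payout <- c(4, 4, 7, 7, 100, 100, 10000, 1000000, 25000000)

ticket.price <- 2  # ASSUMPTION: the ticket price is not given in the notes

# Expected winnings per ticket = sum of (payout * probability of that payout)
power.ev <- sum(power.payout / power.odds)
power.ev                 # expected winnings per ticket
power.ev - ticket.price  # expected net result per ticket

# Jackpot size at which expected winnings equal the ticket price
n <- length(power.odds)
small.ev <- sum(power.payout[-n] / power.odds[-n])  # every prize except the jackpot
breakeven.jackpot <- (ticket.price - small.ev) * power.odds[n]
breakeven.jackpot
```

Changing the last entry of power.payout and re-running shows how the expected value moves as the jackpot grows.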
The Monte Carlo situation
Designing an experiment:
* when you have more money and time.
* subjects, factors, treatments
---see example, p. 226
* control group
---drug company example
* Design for Coke vs. Pepsi. Many people claim they can tell the difference in taste between Coca Cola and Pepsi. Design an experiment that will allow you to test whether or not one person in your group has this ability.
Goal: What is the goal of your experiment? This will vary a little from group to group, but it is important when working with the rest of this information.
Hypothesis: What do you believe will be the outcome of your experiment?
Ho:
Ha:
alpha:
Cautions: What problems do we need to look out for? (For example, will tasting Pepsi first change the taste of Coke afterwards? How can we make sure the person isn’t a lucky guesser?)
Materials Needed: What will you need for this experiment? Make sure this list is detailed--I need to get these things before Monday.
Procedure: this should be written step-by-step with nothing left out.
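One possible way to handle the "lucky guesser" caution, sketched in r. The number of tastings (n.trials), the number of correct answers (n.correct), and alpha = 0.05 are all hypothetical values chosen for illustration; your group's design may differ.

```r
set.seed(1)       # only so the random cup order is reproducible
n.trials <- 10    # HYPOTHETICAL number of tastings
n.correct <- 9    # HYPOTHETICAL number of correct identifications
alpha <- 0.05     # ASSUMPTION: significance level

# Randomize which drink is served on each trial so the order cannot be guessed
cup.order <- sample(c("Coke", "Pepsi"), n.trials, replace = TRUE)
cup.order

# Ho: p = 0.5 (pure guessing)   Ha: p > 0.5 (can tell the difference)
result <- binom.test(n.correct, n.trials, p = 0.5, alternative = "greater")
result$p.value

# Same p-value by hand: P(X >= n.correct) when X ~ Binomial(n.trials, 0.5)
by.hand <- sum(dbinom(n.correct:n.trials, n.trials, 0.5))
by.hand

# TRUE means we reject Ho at the chosen alpha
result$p.value < alpha
```

More trials make it harder for a guesser to pass by luck, which is worth weighing against how many cups one person can taste.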
Class 13 (10/18):
Class 13 data set.
Class 12 (10/15): confidence intervals:
BMI data.
boxes data (scroll down if you need it again)
newts: 29 27 34 40 22 28 14 35 26 35 12 30 23 18 11 22 23 33
known pop sd = 8
seventh graders: 114 100 104 89 102 91 114 114 103 105 108 130 120 132 111 128 118 119 86 72 111 103 74 112 107 103 98 96 112 112 93
known pop sd = 15
NewtSalve2000: 25 26 30 21 22 19 23 33 22 21 20 19 22 28 29 28 22 18 18 21
known pop sd=8
comp screen (p 381): 23.2 21.2 28.9 27.7 29.1 27.3 16.1 22.6 25.6 34.2 23.9 26.8 20.5 34.3 21.4 32.6 26.2 34.1 31.5 24.6 23.0 28.6 24.4 28.1 41.3
known pop sd = 6
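# z-score: how many standard deviations x lies from the mean
zscore<-function(x,mean,sd){
zsc=(x-mean)/sd
print(zsc) # print() also returns zsc, so the result can be passed on
}
# pnorm() needs a number, not the function itself, so call zscore() with your values:
pnorm(zscore(x,mean,sd)) # area to the left of x; fill in numbers for x, mean, and sd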
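A sketch of a confidence interval for the newts data, using the known population sd of 8 from above. The 95% confidence level is an assumption; change conf.level for a different one.

```r
newts <- c(29, 27, 34, 40, 22, 28, 14, 35, 26, 35, 12, 30, 23, 18, 11, 22, 23, 33)
pop.sd <- 8          # known population sd, from the notes
conf.level <- 0.95   # ASSUMPTION: confidence level not stated in the notes

n <- length(newts)
xbar <- mean(newts)

# Critical value z* that leaves (1 - conf.level)/2 in each tail
z.star <- qnorm(1 - (1 - conf.level) / 2)

# Margin of error = z* times sd / sqrt(n)
margin <- z.star * pop.sd / sqrt(n)

newt.ci <- c(lower = xbar - margin, upper = xbar + margin)
newt.ci
```

The same steps work for the other data sets by swapping in the data vector and its known sd.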
Class Eleven (10/11): more r than you can shake a stick at.
goals in class:
* try to create a sweet table
* 2 way table
* importing libraries
* ggplot and the concepts of trying stuff out
* importing maps
* making a basic map out of census data and Vermont
* make a map of a different state with a different data issue
HW: re-read chapter 14. our plan is to hit this HARD in class on Monday.
find a data set to visualize. figure out how to visualize it. visualize it. (THIS IS DUE THURSDAY)
this is REALLY open ended.
I want an update as an e-mail before class Monday (I will send a reminder e-mail).
I will be available online all weekend for help, but at the same time, the point is to work through things when possible. So if you need help ask.
I may be able to help, or I might not (depending on what you want to do).
Use all tools (photoshop, r, excel, your best friend's coding ability, whatever).
Stay Positive.
I always want a great finished product, but the process here is just as important. With that in mind, here's what I want turned in:
1) What is your data set? Explain why this is interesting to you. Give me the source.
2) What do you believe visualizing this data will accomplish that just looking at a table doesn't show you? Explain why you chose this data set for visualization and what good that can do for the data.
3) What is your plan for visualizing it? This can be a couple of sentences or a rough sketch on the back of a bar napkin. But the object is to have in mind your goal of what this will look like.
4) What analysis are you doing of the information before you actually use it? Is this univariate or bivariate? Are there any influential points? Should you remove them or do you need them? Are there any considerations you have to take of the data before you start to prepare it?
5) Do that sweet thing you were thinking about in part 3. Bring something amazing forth into the world--or at least try to. But remember--amazing doesn't have to be Earth shattering. It can be something REALLY WELL DONE that's direct, to the point and simple.
if you're using r, give me finished code. If somewhere else, or in another fashion, explain what you did.
6) Post creation reflection:
Do you think your creation was a success? Explain why/why not.
What might you have done differently in the process that could have made your life easier?
Was there anything that could have been in these directions that would have made life any easier for you?
Class Ten (10/8):
HW: read chapter 11, do questions 11.14 - 11.21.
read chapter 14 (new stuff). do questions 14.24 - 14.33 <--this is where the ideas of the first four weeks start to REALLY be used.
Compare one sheets throughout the class.
first with own group
circulating different groups' data
Discuss Stephen Few's dislike of circular objects
Sample from the boxes data again. here it is for your personal enjoyment:
boxes<-c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,9,12,4,5,12,8,4,4,16,4,5,10,4,5,4,10,16,16,8,6,4,5,9,10,3,12,4, 10,12,10,6,16,16,8,4,5,18,4,3,9,12,16,3,6,8,4,2,5,18,4,12,4,12,8,3,16,5,9,6,10,3,18,8,10,16,6,15,8,4, 18,10,4,2,5,8,16,6,9,12,4,9,18,8,8,8)
look at data, describe the data (use script, sample 1, get graph)
let's create a data set of sampling 10 (one each)
let's create a data set of sampling 25 (one each)
50? (make sure to clear out of values as we go to each one, otherwise we will have troubles)
while our individual samples may not be perfect, multiple samples over multiple sets will yield strangely consistent values.
walk through the sample script (here it is):
samplebox<-c()
for(i in 1:1000){samplebox<-c(samplebox,mean(sample(boxes,10)))}
samplebox
the more samples we take, the closer the average of our sample means gets to the population mean.
IT IS IMPORTANT: THIS WORKS FOR NORMAL AND NON-NORMAL DATA EQUALLY BECAUSE WE ARE SETTLING AROUND THE MEAN. THIS MAY OR MAY NOT HAVE ACTUAL SIGNIFICANCE, HOWEVER.
consider: moneybag.town=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,100000)
sample.moneybag.town=c()
for(i in 1:1000){sample.moneybag.town<-c(sample.moneybag.town, mean(sample(moneybag.town,10000,replace=TRUE)))}
hist(sample.moneybag.town)
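A sketch of the 10 / 25 / 50 comparison from earlier in this class, using the boxes data. The helper name sample.means and the seed are made up for this example; the point is only to see how the spread of the sample means changes with sample size.

```r
set.seed(42)  # ASSUMPTION: any seed works; this just makes the run repeatable

boxes <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,9,12,4,5,12,8,4,4,16,4,5,10,4,5,4,
           10,16,16,8,6,4,5,9,10,3,12,4,10,12,10,6,16,16,8,4,5,18,4,3,9,12,
           16,3,6,8,4,2,5,18,4,12,4,12,8,3,16,5,9,6,10,3,18,8,10,16,6,15,8,4,
           18,10,4,2,5,8,16,6,9,12,4,9,18,8,8,8)

# Hypothetical helper: take `reps` samples of size n and keep each sample's mean
sample.means <- function(data, n, reps = 1000) {
  replicate(reps, mean(sample(data, n)))
}

means10 <- sample.means(boxes, 10)
means25 <- sample.means(boxes, 25)
means50 <- sample.means(boxes, 50)

mean(boxes)                                      # the population mean
c(mean(means10), mean(means25), mean(means50))   # centers of the sample means
c(sd(means10), sd(means25), sd(means50))         # spread shrinks as n grows

hist(means10)  # compare the shape with hist(means50)
```

Because each run starts from fresh vectors, there is no need to clear out old values between sample sizes.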
let's make the graph i did in class last week.
MAKE A PLAN OF WHAT I WANT IT TO LOOK LIKE.
start data:
put in excel
"" around words
numbers for Dem and Repub are guessed (no other real way).
want to get neutral
want to order in some way
create a table in r:
make a long list in the order that i want
create a table: http://www.cyclismo.org/tutorial/R/tables.html
label the table the way i want it to be: colnames(), rownames()
set it up in r and see what happens
...did we mess it up? check a basic barplot to look at it:
http://www.statmethods.net/graphs/bar.html
did it do what we wanted it to?
colors and legends and random oh my!
how to make an awesome graph in r (the trials and tribulations of Mundt in making that graph)
http://www.stats.bris.ac.uk/R/doc/contrib/refcard.pdf
stretch the ylim to accommodate the legend (or just make one in illustrator)
las=2
NOTHING WORKS PERFECTLY THE FIRST TIME.
after an hour ++:
barplot(a.m3, col=c("blue","white","red"), main="What Y'all Drivin?!", legend=rownames(a.m3), ylim=c(0,180), las=2, cex.names=0.8)
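A sketch of how a table like a.m3 could be built before the barplot call. The counts, the row names, and the column names below are all HYPOTHETICAL stand-ins (the real numbers were guessed from the NYT chart and are not in these notes); only the steps matter.

```r
# HYPOTHETICAL counts, typed as one long list in the order we want
counts <- c(40, 30, 50,
            35, 45, 30,
            60, 20, 25)

# matrix() fills column by column, so each group of three above is one bar
a.m3 <- matrix(counts, nrow = 3)

# Label the table (HYPOTHETICAL labels)
rownames(a.m3) <- c("Dem", "Neutral", "Repub")
colnames(a.m3) <- c("Sedan", "Hybrid", "Truck")
a.m3

# Check a basic barplot first: did it do what we wanted?
barplot(a.m3)

# Then add colors, a legend, extra headroom for the legend, and rotated labels
barplot(a.m3, col = c("blue", "white", "red"), main = "What Y'all Drivin?!",
        legend.text = rownames(a.m3), ylim = c(0, 180), las = 2, cex.names = 0.8)
```

The ylim upper bound needs to sit above the tallest stacked bar, with room left over for the legend.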
Class Nine (10/4):
HW:
Read "Save the Pies For Dessert" by Stephen Few (http://www.perceptualedge.com/articles/08-21-07.pdf). Be prepared to discuss this on Monday.
take the information from your surveys (make sure the person with the data shares it with everyone). Make a one sheet with the data. Do your own so we can compare them in class.
IN CLASS:
Check out the Graphs we all made for class
Some sampling methods (and the joys of r)
Big issues in the reading:
sampling the 'correct' way
undercoverage
stratified samples
Pass out the surveys
tally results
consider the art of the one sheet (dashboard)
how to make an awesome graph in r (the trials and tribulations of Mundt in making that graph)
http://www.stats.bris.ac.uk/R/doc/contrib/refcard.pdf
stretch the ylim to accommodate the legend (or just make one in illustrator)
las=2
NOTHING WORKS PERFECTLY THE FIRST TIME.
after an hour ++:
barplot(a.m3, col=c("blue","white","red"), main="What Y'all Drivin?!", legend=rownames(a.m3), ylim=c(0,180), las=2, cex.names=0.8)
Class Eight (10/1):
on surveys and bias, oh my!
HW:
take the data and the information from this website:
http://campaignstops.blogs.nytimes.com/2012/04/15/let-the-nanotargeting-begin/
specifically one of the graphs that looks like this:
http://graphics8.nytimes.com/images/2012/04/13/opinion/0416edsall-chart5/0416edsall-chart5-jumbo-v2.png
you may choose any of the graphs that look like this.
your job is to make that data presentable to the rest of the world. Be as accurate as you can, though because of the way they show the data, you may need to guess a little bit at the actual numbers.
read chapter 8. Check your understanding with questions 8.16 - 8.24
Class Seven: (9/27):
* Why extrapolation can be dangerous.
* On lurking variables
* playing with several data sets (comfort with the data) --
Class Six (9/24):
* find exactly one correlation by hand
* show why outliers are such a big deal when talking about correlation
* find a correlation in r
* definition of a residual
* definition of a least squares regression line
* creating a line of best fit (least squares regression line) in r
* creating a plot with all of the points AND the line of best fit in r
* answer any questions pertaining to the assignment
* take a minute and contemplate the wonder that is statistics
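A sketch of the r steps in the list above. It uses cars, a data set built into r (speed and stopping distance), as a stand-in because the class data is not in these notes. The extra outlier point is hypothetical.

```r
x <- cars$speed
y <- cars$dist

# Correlation "by hand": average product of the standardized values
r.by.hand <- sum(((x - mean(x)) / sd(x)) * ((y - mean(y)) / sd(y))) / (length(x) - 1)
r.by.hand
cor(x, y)   # same value from the built-in function

# One HYPOTHETICAL outlier is enough to drag the correlation down
x.out <- c(x, 5)
y.out <- c(y, 300)
cor(x.out, y.out)

# Least squares regression line
fit <- lm(dist ~ speed, data = cars)
coef(fit)          # intercept and slope

# Residual = observed y minus predicted y
head(resid(fit))

# Plot all of the points AND the line of best fit
plot(cars$speed, cars$dist, xlab = "speed", ylab = "dist")
abline(fit, col = "red")
```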
Class Five (9/20):
note: your assignment will be available by 1 pm tomorrow (Friday). It will be due by midnight the following Thursday (11:59 pm 9.27.2012)
Your scent graphs: a retrospective
Bad graphs:
* two bars when a sentence would do and other generalizations
Dealing with an excel file: test scores on stats tests
Boats and Manatees
Plotting Data with r
monsoon: 722.4 736.8 866.2 877.6 728.7 739.2 1020.5 887.0 852.4 784.7 792.2 806.4 869.4 803.8 958.1 793.3 810.0 653.1 735.6 785.0 861.3 784.8 823.5 976.2 868.6 923.4 858.1 913.6 955.9 896.6 750.6 898.5 863.0 913.8 920.8 885.8 922.8 748.3 836.9 938.4 716.8 781.0 804.0 843.9 911.3 930.5 709.6 963.0 760.0 826.4 885.5 951.1 903.1 908.7 904.0 983.6 740.2 857.0 743.2 857.3 777.9 1004.7 853.5 842.4 945.9 789.0 860.3 883.4 769.4 870.5 897.5 651.2 768.2 908.6 874.3 889.6 754.8 909.5 961.7 873.8 889.6 885.0 821.5 789.9 904.2 944.3 831.3 708.0 866.9 827.0 935.4 719.4 804.9 853.6 877.3 839.9 940.0 882.9 908.8 770.2
Class Four (9/17):
In the book: 3.15 - 3.24, 3.30 - 3.34
1) Find a graph that is too generalized. Send it to me, explaining what you think should be changed.
2) Complete question 3.50 in r. Make your graph look nice. E-mail it to me.
Use and abuse of graphs
The normal curve
things you can do to data in r
fruit fly thorax length (in mm):
0.64 0.64 0.64 0.68 0.68 0.68 0.72 0.72 0.72 0.72 0.74 0.76 0.76 0.76 0.76 0.76 0.76 0.76 0.76 0.78 0.80 0.80 0.80 0.80 0.80 0.82 0.82 0.82 0.84 0.84 0.84 0.84 0.84 0.84 0.84 0.84 0.84 0.84 0.88 0.88 0.88 0.88 0.88 0.88 0.88 0.88 0.92 0.92 0.92 0.94
Class Three (9/13):
HW:
BOOK QUESTIONS: 2.21 - 2.24
1) Find one graph that uses some method to lie to you. It could be 3D, not telling the whole truth, etc. Try to find an interesting one. In other words, don't just type in 'bad graph'. I've seen all of those.
2) If you are excited about r, awesome. Take this data set...
baseballteam<-c(.285,.283,.274,.270,.270,.270,.268,.268,.266,.264,.263,.263,.263,.263,.262,.261,.260,.260,.260,.259,.258,.258,.258,.257,.255,.253,.252,.247,.242)
these are the averages for a baseball team. Find the following:
a) 5 number summary and mean
b) make a histogram. here is a place to help you:
c) make a bar plot. Here is a place to help you:
d) which method do you suggest for checking for outliers? Be STRONG in your conviction, and then check for outliers.
e) did you find any points you want to remove? make sure to be able to back your decision
3) Be able to explain in class what the 68-95-99.7 rule is and why it is useful.
4) Take the data from table 2.3 on amount customers spend on food with different scents in the air. How can you make this interesting? We will analyze it later, but for now just create some graphic that shows the difference and would attract someone to viewing the data.
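For item 3, a quick way to see where the 68-95-99.7 numbers come from, sketched with pnorm on the standard normal curve. The helper name within.sd is made up for this example.

```r
# Area under the standard normal curve within k standard deviations of the mean
within.sd <- function(k) pnorm(k) - pnorm(-k)

rule <- sapply(1:3, within.sd)
round(100 * rule, 1)   # 68.3 95.4 99.7
```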
Illusions of Graphics
Finding the sd by hand
Finding the sd not by hand
Intro to r
pigs: 43 45 53 56 56 57 58 66 67 73 74 79 80 80 81 81 81 82 83 83 84 88 89 91 91 92 92 97 99 99 100 100 101 102 102 102 103 104 107 108 109 113 114 118 121 123 126 128 137 138 139 144 145 147 156 162 174 178 179 184 191 198 211 214 243 249 329 380 403 511 522 598
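A sketch of "sd by hand" versus "sd not by hand" using the pigs data above. It follows the usual sample standard deviation steps (deviations, squares, divide by n - 1, square root), which is also what r's sd() computes.

```r
pigs <- c(43, 45, 53, 56, 56, 57, 58, 66, 67, 73, 74, 79, 80, 80, 81, 81, 81,
          82, 83, 83, 84, 88, 89, 91, 91, 92, 92, 97, 99, 99, 100, 100, 101,
          102, 102, 102, 103, 104, 107, 108, 109, 113, 114, 118, 121, 123,
          126, 128, 137, 138, 139, 144, 145, 147, 156, 162, 174, 178, 179,
          184, 191, 198, 211, 214, 243, 249, 329, 380, 403, 511, 522, 598)

n <- length(pigs)

# By hand: distance of each value from the mean
deviations <- pigs - mean(pigs)

# Square them, add them up, divide by n - 1, take the square root
sd.by.hand <- sqrt(sum(deviations^2) / (n - 1))
sd.by.hand

# Not by hand
sd(pigs)
```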
Class Two:
HW:
read Ch 2 - 3 (pay special attention to standard deviation and what defines a normal curve)
2.15 - 2.20, 2.25
use the data below in the file labelled 'pets' (it's in a .csv -- all spreadsheet programs should be able to read it) to create some type of visual to share with the class on Thursday. Consider the interest of the user as well as being able to access the data.
use the hot dog data (also below) to:
* check poultry for any possible outliers using 1.5*IQR method. If there are any outliers, remove them before making the box plots below.
* create three stacked box plots. Use those to compare and contrast the differences in the three types. they should all have the same scale.
HOT DOG! (calories per dog)
Beef
186 181 176 149 184 190 158 139 175 148 152 111 141 153 190 157 131 149 135 132
Meat
173 191 182 190 172 147 146 139 175 136 179 153 107 195 135 140 138
Poultry
129 132 102 106 94 102 87 99 170 113 135 142 86 143 152 146
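One way the hot dog steps could look in r, as a sketch. Note that quantile() uses r's default quartile method, which can differ slightly from quartiles found by hand with the book's method, so the fences may not match a hand calculation exactly.

```r
beef <- c(186, 181, 176, 149, 184, 190, 158, 139, 175, 148,
          152, 111, 141, 153, 190, 157, 131, 149, 135, 132)
meat <- c(173, 191, 182, 190, 172, 147, 146, 139, 175, 136,
          179, 153, 107, 195, 135, 140, 138)
poultry <- c(129, 132, 102, 106, 94, 102, 87, 99, 170, 113,
             135, 142, 86, 143, 152, 146)

# 1.5*IQR method: anything beyond the fences is a possible outlier
q <- quantile(poultry, c(0.25, 0.75))
iqr <- IQR(poultry)
lower.fence <- q[1] - 1.5 * iqr
upper.fence <- q[2] + 1.5 * iqr

is.outlier <- poultry < lower.fence | poultry > upper.fence
outliers <- poultry[is.outlier]
outliers

# Keep only the non-outliers before plotting
poultry.clean <- poultry[!is.outlier]

# Three stacked box plots on the same scale
boxplot(list(Beef = beef, Meat = meat, Poultry = poultry.clean),
        horizontal = TRUE, xlab = "calories per dog")
```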
Class One:
* Introduction of Course
* Gathering of Data
* Qualitative vs. Quantitative Data
* Histograms, Stem Plots, Bar Plots, Pie Charts
* Data Set to Deal With
HW:
Thursday, September 6: read Chapter 1 and 2. questions: 1.13 - 1.22 (these are multiple choice). 1.26, 1.33, 1.38