Question One: Data from classmates
Ask 25 different BBA students what time they went to bed and what time they woke up on one specific night (pick a single day, such as Monday or Tuesday, so that all of the data describes the same night; a Saturday and a Tuesday usually look very different, and we don't want to mix them in one graph).
a. Using the information collected, create a third set of data of how long each person slept.
b. Create a histogram, five-number summary, and boxplot for the data created in part a (how long each person slept). If there are any influential points, identify them. If you believe they are outliers, make a case and then remove them from your data.
c. Create ONE GRAPHIC that easily shows when each person went to sleep, when they woke up, and how long they slept. Try to do this in R, though it will stretch your abilities. If that isn't possible, send me the data and your work from parts a and b, then make the graphic by hand and give it to me by the end of the day Friday.
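One possible way to approach parts a through c in R is sketched below. The bedtimes and wake times are made-up placeholder values (not real survey data), the variable names are hypothetical, and the sketch assumes times are recorded as decimal hours on a 24-hour clock (22.5 means 10:30 pm). Treat it as a starting point, not the required method.

```r
# Placeholder data only: replace with the 25 responses you actually collect.
# Assumption: times are decimal hours on a 24-hour clock (22.5 = 10:30 pm).
bed  <- c(22.5, 23.0, 21.75, 0.5, 23.5)   # 0.5 = 12:30 am, after midnight
wake <- c(6.5, 7.0, 6.0, 7.5, 6.25)

# Part a: hours slept. The %% 24 handles sleep that crosses midnight.
slept <- (wake - bed) %% 24

# Part b: histogram, five-number summary, and boxplot.
hist(slept, main = "Hours slept", xlab = "Hours")
fivenum(slept)
boxplot(slept, horizontal = TRUE, main = "Hours slept")

# Part c: one horizontal bar per student, running from bedtime to wake time.
# Bedtimes after midnight are shifted by 24 so every bar runs left to right.
start <- ifelse(bed < 12, bed + 24, bed)
end   <- start + slept
n     <- length(slept)

# Empty plot frame; one extra hour on the right leaves room for the labels.
plot(c(min(start), max(end) + 1), c(0.5, n + 0.5), type = "n", xaxt = "n",
     xlab = "Clock time (hour)", ylab = "Student",
     main = "Bedtime, wake time, and hours slept")
segments(start, 1:n, end, 1:n, lwd = 4)

# Label the x axis with clock hours instead of the shifted values.
ticks <- seq(floor(min(start)), ceiling(max(end)) + 1, by = 2)
axis(1, at = ticks, labels = ticks %% 24)

# Write each student's hours slept at the right end of their bar.
text(end, 1:n, labels = paste(slept, "h"), pos = 4)
```

Each bar's left end is the bedtime, its right end is the wake time, and its length (also printed as a label) is the time slept, so all three pieces of information appear in a single graphic.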
Question Two: Baseball Part One
Here are the home run totals of baseball's season leaders from 1876 to 2003 (one value per season, in order):
5 3 4 9 6 7 7 10 27 11 11 17 14 20 14 16 13 19 18 17 13 11 15 25 12 16 16 13 10 9 12 10 12 9 10 21 14 19 19 24 12 12 11 29 54 59 42 41 46 39 47 60 54 46 56 46 58 48 49 36 49 46 58 35 43 37 36 34 33 28 44 51 40 54 47 42 37 47 49 51 52 44 47 46 41 61 49 45 49 52 49 44 44 49 45 48 40 44 36 38 38 52 46 48 48 31 39 40 43 40 40 49 42 47 51 44 43 46 43 50 52 58 70 65 50 73 57 47
a) Put this data into a list (in R, a vector built with c()).
b) Create a time plot of this data with plot(data). Fill in the main title, and then describe the overall trend of the data.
* for additional fun: plot(data, pch=16, col="red")
c) Create a summary of the data like we are used to. Are there any suspected outliers? Is there a way to figure out which year each possible outlier happened (if there are any)?
d) U.S. involvement in WWII ran from late 1941 to 1945, so the 1942 to 1945 seasons were played during the war. What effect did that have on home run totals?
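The steps in parts a through d could be sketched in R roughly as follows. The names hr and years are hypothetical, and the sketch assumes the list holds exactly one value per season in order, so the first value is 1876 and the last is 2003. Plotting against years instead of plain plot(data) is a small variation so that the x axis shows the season directly.

```r
# Season-leading home run totals, 1876 to 2003, copied from the list above.
hr <- c(5, 3, 4, 9, 6, 7, 7, 10, 27, 11, 11, 17, 14, 20, 14, 16, 13, 19, 18, 17,
        13, 11, 15, 25, 12, 16, 16, 13, 10, 9, 12, 10, 12, 9, 10, 21, 14, 19, 19, 24,
        12, 12, 11, 29, 54, 59, 42, 41, 46, 39, 47, 60, 54, 46, 56, 46, 58, 48, 49, 36,
        49, 46, 58, 35, 43, 37, 36, 34, 33, 28, 44, 51, 40, 54, 47, 42, 37, 47, 49, 51,
        52, 44, 47, 46, 41, 61, 49, 45, 49, 52, 49, 44, 44, 49, 45, 48, 40, 44, 36, 38,
        38, 52, 46, 48, 48, 31, 39, 40, 43, 40, 40, 49, 42, 47, 51, 44, 43, 46, 43, 50,
        52, 58, 70, 65, 50, 73, 57, 47)

# Assumption: one value per season, in order, starting in 1876.
years <- 1876:2003

# Part b: time plot with the season on the x axis.
plot(years, hr, pch = 16, col = "red",
     main = "Season-leading home run totals, 1876 to 2003",
     xlab = "Season", ylab = "Home runs")

# Part c: summary, then the 1.5 * IQR rule to flag suspected outliers.
summary(hr)
q          <- quantile(hr, c(0.25, 0.75))
fence_low  <- q[1] - 1.5 * IQR(hr)
fence_high <- q[2] + 1.5 * IQR(hr)
suspect    <- hr < fence_low | hr > fence_high
years[suspect]          # seasons of any flagged values (may be empty)

# Any value can be matched to its season the same way.
years[which.max(hr)]    # season with the highest total

# Part d: totals for the wartime seasons.
hr[years >= 1942 & years <= 1945]
```

Because years and hr line up position by position, indexing years with a logical test on hr answers the "what year did that happen" question in part c.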
Question Three: More Homeruns
Let's talk Barry Bonds. Here are his homerun totals from 1986 - 2007:
bonds=c(16,25,24,19,33,25,34,46,37,33,42,40,37,34,49,73,46,45,45,5,28,26)
a) Run a summary of the data, and find any possible outliers.
b) The 5 in the data set is really low. Find a reason that the total was so low, and decide whether or not you should remove the data point, regardless of whether or not it is an outlier. (Hint: internet research is your friend.)
c) Compare the mean and the median of this data with the 73 included and excluded. Write a few sentences explaining the effect of high values on these two measures of central tendency.
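A minimal sketch of the R work for this question is below. The names seasons and no73 are hypothetical, and seasons assumes the totals are listed in order from 1986 to 2007. Note that boxplot.stats() flags values beyond the boxplot whiskers using hinges, which can differ slightly from the quartiles that summary() reports.

```r
# Barry Bonds' home run totals, 1986 to 2007, as given above.
bonds <- c(16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42,
           40, 37, 34, 49, 73, 46, 45, 45, 5, 28, 26)

# Assumption: one total per season, in order, starting in 1986.
seasons <- 1986:2007

# Part a: summary, plus any values beyond the boxplot whiskers.
summary(bonds)
boxplot.stats(bonds)$out

# Part b: which season produced the total of 5?
seasons[bonds == 5]

# Part c: mean and median with and without the 73.
no73 <- bonds[bonds != 73]
c(mean = mean(bonds), median = median(bonds))
c(mean = mean(no73),  median = median(no73))
```

Printing the two pairs side by side makes it easy to see which measure moves when the single high value is dropped.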
Question Four: Fuel Economy
As of 2008, the fuel economy of cars in America was roughly normally distributed, N(18.7, 4.3): a mean of 18.7 MPG and a standard deviation of 4.3 MPG. [High and low outliers were removed in finding these values.] Data found here: http://www.fueleconomy.gov/feg/feg2008.pdf
Here are a couple of cars and their MPG:
2008 Chevy Malibu: 25 MPG
2008 Aston Martin V8 Vantage: 14 MPG
2008 Dodge Durango (4WD): 10 MPG
2008 Honda Fit: 30 MPG
For each car:
a) Find the z-score for its gas mileage.
b) Find its percentile (what percent of cars does it get better gas mileage than?).
c) What percentage of cars get gas mileage between the Durango and the Aston Martin?
d) In the data, find a car you might be interested in. What percent of cars get better gas mileage than the one you selected?
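The calculations in parts a through c can be sketched in R as follows. This assumes N(18.7, 4.3) means a mean of 18.7 and a standard deviation (not a variance) of 4.3; the variable names are hypothetical.

```r
# Assumption: N(18.7, 4.3) gives the mean and the standard deviation.
mu    <- 18.7
sigma <- 4.3

# MPG for the four cars listed above.
mpg <- c(Malibu = 25, Vantage = 14, Durango = 10, Fit = 30)

# Part a: z-score = (value - mean) / standard deviation.
z <- (mpg - mu) / sigma
z

# Part b: percentile = percent of cars with LOWER mpg than this car.
pct_below <- 100 * pnorm(z)
pct_below

# Part c: percent of cars between the Durango (10) and the Vantage (14).
between <- 100 * (pnorm(14, mean = mu, sd = sigma) -
                  pnorm(10, mean = mu, sd = sigma))
between
```

For part d the question is reversed (cars that do better than yours), so the quantity of interest would be 100 minus the percentile of the car you pick.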
Questions from your book (these are also part of your problem set):
p. 127: 34, 35, 37, 43