Question One: Considering How Things Look.
Ask 25 different students who go to BBA what time they go to bed and what time they wake up on a specific day (specify Monday, Tuesday, etc. so the data is all about one evening--there are big differences usually between a Saturday and a Tuesday, so we're trying to avoid graphing that).
a. Using the information collected, create a third set of data of how long each person slept.
b. Create a histogram, five-number summary, and boxplot for the data created in part a (how long each person slept). If there are any influential points, identify them. If you believe them to be outliers, make a case and then remove them from your data.
c. Create ONE GRAPHIC that easily shows when each person went to sleep, when they woke up, and how long they slept. I do NOT suggest you use R to do this--instead, consider a way so that we can see ALL of the information in one graphic. It should be:
no larger than an 8.5 x 11 sheet of paper
organized in such a way that the data makes sense: for example, by length of sleep, by time to bed, or by whatever. The data should not be hard to pull from the graph.
clear and concise
labelled properly, with all correct info
colorful
I should be able to look at your graphic and see--when people went to bed, when they woke up, and ALSO how long that person slept. HOW you do that is up to you, but by looking at your graph I should be able to say "Person 15 went to bed at 8 pm, woke up at 7 am, and slept 11 hours."
SOME POSSIBLE INSPIRATIONS (not directly related to the information at hand, but a possible way to look at the data. Ask: what does each graph tell me?):
http://www.danasilver.org/2014/02/25/sochi-2014-athletes/
http://i.imgur.com/eBXtDBa.jpg
http://media.economist.com/sites/default/files/cf_images/20050625/CUS159.gif
http://i.imgur.com/O609WgA.png
not labelled well, but is for 'ride share' bikes in Chicago: http://i.imgur.com/yAbVe3e.png
http://road.cc/sites/default/files/imagecache/galleria_900_nocrop/images/News/Eurobarometer%20cycling%20chart%202013%20(source%20TNS,%20Eurostat).jpg
http://i.imgur.com/BrUbsVW.jpg
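For part a of Question One, the one tricky step is that most people go to bed before midnight and wake up after it, so a plain subtraction gives negative numbers. Here is a minimal sketch of one way to handle that in R. The times are made up for illustration (they are NOT survey data), and the names bed, wake, and slept are just placeholders; it also assumes times are typed as decimal hours on a 24-hour clock (10:30 pm is 22.5).

```r
# Hypothetical bed and wake times as decimal hours on a 24-hour clock.
# These four values are invented for illustration only.
bed  <- c(22.5, 23.0, 0.5, 21.75)
wake <- c(6.5, 7.0, 8.0, 6.0)

# Add 24 so bedtimes before midnight do not go negative,
# then wrap back into the 0-24 range with %% 24.
slept <- (wake - bed + 24) %% 24
slept

# Five-number summary (min, lower hinge, median, upper hinge, max)
fivenum(slept)
```

From here, hist(slept) and boxplot(slept) give the two graphs asked for in part b.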
Question Two: Baseball Part One
Here is a list of the number of home runs hit by each season's home run leader from 1876 to 2003:
5 3 4 9 6 7 7 10 27 11 11 17 14 20 14 16 13 19 18 17 13 11 15 25 12 16 16 13 10 9 12 10 12 9 10 21 14 19 19 24 12 12 11 29 54 59 42 41 46 39 47 60 54 46 56 46 58 48 49 36 49 46 58 35 43 37 36 34 33 28 44 51 40 54 47 42 37 47 49 51 52 44 47 46 41 61 49 45 49 52 49 44 44 49 45 48 40 44 36 38 38 52 46 48 48 31 39 40 43 40 40 49 42 47 51 44 43 46 43 50 52 58 70 65 50 73 57 47
a) put this data into a list.
b) create a time plot of this data by doing plot(data). Fill in the main title, and then describe the trend of the data overall.
* for additional fun: plot(data, pch=16, col="red").
c) create a summary for the data like we are used to (graphically and numerically). Are there any suspected outliers? Is there a way to figure out what year those possible outliers happened (if there are any)?
d) The United States was in WWII from late 1941 to 1945, which covers the 1942 to 1945 seasons. What effect did that have on home run totals?
e) Describe in complete sentences the differences between: a histogram, a time plot, and a box plot. Describe at least one thing for each that it does better than the others.
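One possible way to approach parts a through c in R is sketched below. plot(data) on its own puts 1, 2, 3, ... on the x-axis, so this version pairs each value with its season. The names hr, years, and flagged are placeholders, the title is only an example, and the sketch assumes the list has exactly one value per season with none missing. boxplot.stats() uses the same 1.5 x IQR rule the boxplot does; whether a flagged value is really an outlier is still your call.

```r
# Home run totals from the assignment, in season order (1876 to 2003)
hr <- c(5, 3, 4, 9, 6, 7, 7, 10, 27, 11,
        11, 17, 14, 20, 14, 16, 13, 19, 18, 17,
        13, 11, 15, 25, 12, 16, 16, 13, 10, 9,
        12, 10, 12, 9, 10, 21, 14, 19, 19, 24,
        12, 12, 11, 29, 54, 59, 42, 41, 46, 39,
        47, 60, 54, 46, 56, 46, 58, 48, 49, 36,
        49, 46, 58, 35, 43, 37, 36, 34, 33, 28,
        44, 51, 40, 54, 47, 42, 37, 47, 49, 51,
        52, 44, 47, 46, 41, 61, 49, 45, 49, 52,
        49, 44, 44, 49, 45, 48, 40, 44, 36, 38,
        38, 52, 46, 48, 48, 31, 39, 40, 43, 40,
        40, 49, 42, 47, 51, 44, 43, 46, 43, 50,
        52, 58, 70, 65, 50, 73, 57, 47)

# One year per value; assumes no seasons are missing from the list
years <- 1876:2003
stopifnot(length(hr) == length(years))

# Time plot with real seasons on the x-axis instead of positions 1, 2, 3, ...
plot(years, hr, pch = 16, col = "red",
     main = "Home runs by the season leader, 1876 to 2003",
     xlab = "Season", ylab = "Home runs")

# Values the boxplot's 1.5 x IQR rule would flag, matched back to their years
flagged <- boxplot.stats(hr)$out
data.frame(year = years[hr %in% flagged], home_runs = hr[hr %in% flagged])
```

The same years[...] trick works for any question of the form "when did this value happen?", for example years[hr == max(hr)].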
Question Three: More Home Runs.
Let's talk Barry Bonds. Here are his home run totals from 1986 to 2007:
bonds=c(16,25,24,19,33,25,34,46,37,33,42,40,37,34,49,73,46,45,45,5,28,26)
a) run a summary of the data, and find any possible outliers.
b) the 5 in the data set is really low. Find a reason that the total was so low, and decide whether or not you should remove the data point, regardless of whether or not it is an outlier. (Hint: internet research is your friend. Double hint: http://www.baseball-reference.com/).
c) compare the mean and the median of this data with the 73 included and excluded. Give a few sentences explaining the effect of high values on these two measures of central tendency.
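For part c, one way to set up the comparison is to build a second vector with the 73 dropped and run the same functions on both. This is only a sketch: the names no73 and seasons are placeholders, and seasons assumes the totals are listed in order, one per year.

```r
bonds <- c(16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42,
           40, 37, 34, 49, 73, 46, 45, 45, 5, 28, 26)
seasons <- 1986:2007          # assumes one total per season, in order

# Same data with the 73 removed
no73 <- bonds[bonds != 73]

# Mean and median side by side, with and without the high value
rbind(with_73    = c(mean = mean(bonds), median = median(bonds)),
      without_73 = c(mean = mean(no73),  median = median(no73)))

# Which season does a particular total belong to?
seasons[bonds == 5]
```

The table only gives the numbers; the explanation of WHY one measure moves more than the other is the part to write up.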
Question Four: Fuel Economy
As of 2008, the fuel economy (in MPG) of cars in America was roughly Normally distributed, N(18.7, 4.3). [Outliers high and low were removed in finding this distribution.] Data found here: http://www.fueleconomy.gov/feg/feg2008.pdf
Here are a couple of cars and their MPG:
2008 Chevy Malibu: 25 MPG
2008 Aston Martin V8 Vantage: 14 MPG
2008 Dodge Durango (4W): 10 MPG
2008 Honda Fit: 30 MPG
for each car:
a) find the z-score for gas mileage
b) find its percentile (what percent of cars does it get better gas mileage than?)
c) what percentage of cars get gas mileage between the Durango and the Aston Martin?
d) in the data, find a car you might be interested in. What percent of cars get better gas mileage than the one you selected?
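If you want to check your z-table work for Question Four in R, a sketch like the one below may help. It uses a made-up car at 22 MPG and a made-up range of 15 to 22 MPG, not the cars listed above. It also ASSUMES that the 4.3 in N(18.7, 4.3) is the standard deviation (not the variance). The names mu, sigma, mpg, below, above, and between are placeholders.

```r
mu    <- 18.7   # mean MPG from the N(18.7, 4.3) model
sigma <- 4.3    # assumed to be the standard deviation, not the variance

mpg <- 22       # hypothetical car, not one from the list above

# z-score: how many standard deviations the car is from the mean
z <- (mpg - mu) / sigma

# pnorm(z) is the area to the LEFT of z under the standard Normal curve
below <- pnorm(z)        # proportion of cars with worse mileage (the percentile)
above <- 1 - below       # proportion of cars with better mileage

# Proportion of cars between two mileages (15 and 22 are both hypothetical)
between <- pnorm(22, mu, sigma) - pnorm(15, mu, sigma)

round(c(z = z, below = below, above = above, between = between), 3)
```

pnorm() does the same job as looking z up in the table, so your table answers should agree with it to about two or three decimal places.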
Question(s) Five: From the book.
You will be able to answer these using a z-table and without R. Make sure to show the pictures asked for in each problem, and any work that you do (for example, finding a z-score). Please do the following problems:
p. 126: 29, 30, 34, 36, 38 (make sure you understand the idea of percentiles), 43