There are not many men in the world who are over 7 feet tall - less than 0.0003%. However, almost 2% of the NBA is >7 feet tall! You have a much higher probability of making the NBA if you are tall, even if you are a terrible athlete.
Is the same true for women and the WNBA? Your goal with this coding exercise is to try and figure out this probability.
Pictured: Margo Dydek, the tallest woman ever to play in the WNBA, measuring 7'2".
First, we are going to look at an entirely different dataset to learn the skills needed to answer that question. Here are the finishers of the 2017 NJ Marathon, which was a marathon that I competed in! I was trying to qualify for the Boston Marathon when I ran this. The qualifying time for men under the age of 35 was 3:05 (185 min), and the time for women under the age of 35 was 3:35 (215 min). Which of those is more unusual within its own distribution? Here is how I'd go about that problem... The code you will use in this exercise is all old Bulls vs. Warriors code that you should update - that's what good coders do!
1. First, get a subset of men under (not equal to) 35 and another subset of women under 35. Here is some subset code that you should modify to do this work. **THIS IS OLD CODE FOR A DIFFERENT DATA SET, BUT GOOD CODERS USE OLD CODE TO DO NEW TASKS, SO YOU SHOULD MODIFY THIS CODE.**
bulls <- subset(greatestNBA, greatestNBA$team == "96CHI")
bigRebounders <- subset(bulls, bulls$rebounds > 5)
# combining both subsets
bullsRebounders <- subset(greatestNBA, greatestNBA$team == "96CHI" & greatestNBA$rebounds > 5 )
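For the marathon data, the same two-condition pattern might look like the sketch below. The data frame `marathon` and its `sex` and `age` columns are made up for illustration (only `finishTime` is named in this exercise), so check your real column names with names() or head() first.

```r
# Toy stand-in for the marathon data; the real file will have different
# column names and values. `sex` and `age` are assumed names.
marathon <- data.frame(
  sex        = c("M", "F", "M", "F", "M", "F"),
  age        = c(28, 34, 35, 41, 22, 30),
  finishTime = c(182, 210, 240, 265, 300, 221)  # minutes
)

# Strictly under 35, so use < rather than <=
menU35   <- subset(marathon, marathon$sex == "M" & marathon$age < 35)
womenU35 <- subset(marathon, marathon$sex == "F" & marathon$age < 35)

nrow(menU35)    # number of rows kept in each subset
nrow(womenU35)
```

Note that the 35-year-old runner in the toy data is dropped, which is the "not equal to" part of the instruction.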
2. Then, find the mean and standard deviation of each group's finishTime.
mean(bullsRebounders$rebounds)
sd(bullsRebounders$rebounds)
3. Use that to calculate the z-score of the qualifying times. Use pnorm() to determine the "percentile" of the qualifying time.
## (you write some code / do some calculations)
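As a reminder of the mechanics, a z-score is (value - mean) / sd, and pnorm() turns it into the share of a normal distribution at or below that value. The mean and SD in this sketch are invented numbers, not the marathon results, so the real calculation is still yours to do.

```r
# Made-up summary statistics, for illustration only
groupMean <- 240   # minutes
groupSD   <- 45    # minutes
cutoff    <- 185   # men's qualifying time, in minutes

# z-score: how many standard deviations the cutoff sits from the mean
z <- (cutoff - groupMean) / groupSD

# Share of a normal distribution that is faster (lower) than the cutoff
pctFaster <- pnorm(z)

# Same answer without computing z by hand
pctFaster2 <- pnorm(cutoff, mean = groupMean, sd = groupSD)

z
pctFaster
```

A negative z-score here means the cutoff is faster than the group's average finish time.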
4. Also, do some subsetting to find out how many people in each of the two groups actually DID get the qualifying time. You can either subset your subsets or make a new subset with another &. Calling length() on a column will tell you how many data points are in it.
length(bullsRebounders$rebounds)
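Counting qualifiers could follow the same subset-then-length() pattern. The sketch below uses a small invented vector of finish times rather than the real data, and `womenTimes` is a hypothetical name.

```r
# Invented finish times (minutes) standing in for one group's finishTime column
womenTimes <- c(198, 214, 215, 230, 251, 262, 305)
cutoff <- 215  # women's qualifying time in minutes

# Keep only the times that beat the cutoff; whether a time exactly equal
# to the cutoff counts is a judgment call (here it does not)
qualified <- womenTimes[womenTimes < cutoff]

length(qualified)                       # how many qualified
length(qualified) / length(womenTimes)  # proportion of the group
```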
Which qualifying time seems "easier"? Then check all your answers to the above in the box below.
CHECK ANSWERS HERE:
The women's qualifying time has a slightly lower z-score (-1.11 vs. -1.09). Both of these z-scores correspond to about 13.5% of people qualifying (13.3% vs. 13.8%).
In the dataset, 37 women and 56 men qualified, which was 11.7% and 15.3% of the runners in each group, respectively.
The women's qualifying time seems slightly harder, but more men did qualify.
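The percentages in the answer box can be reproduced from the z-scores with pnorm(). This is just a quick check using the rounded z-scores quoted above, so expect small rounding differences from your own numbers.

```r
# Rounded z-scores from the answer box
zWomen <- -1.11
zMen   <- -1.09

# pnorm() gives the share of a standard normal below each z-score;
# multiply by 100 to read it as a percentage
pctWomen <- 100 * pnorm(zWomen)
pctMen   <- 100 * pnorm(zMen)

pctWomen
pctMen
```

Both values land between 13% and 14%, which is why the two qualifying times look almost equally hard on paper.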
Okay, now we are ready to investigate the WNBA players. Download the data here. We should have the tools to be able to investigate the situation now.
Using the WNBA data, and the information we learned yesterday about how to use pnorm() and qnorm() on distributions to find percentages, determine the following:
What percentage of WNBA players are taller than 6'5"? What percentage of women in the US should be taller than 6'5" (mean: 5'5", SD: 3.5")?
What percentage of WNBA players are between 6' and 6'5"? What percentage of women in the US should be in that range?
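For the US-population half of these questions, one possible setup is sketched below, assuming heights are roughly normal with the mean and SD given above and converting everything to inches first. The helper `toInches` is a hypothetical name, not part of base R. The WNBA half needs the real data, so it is shown only on an invented vector of heights; check what units the real height column uses before comparing.

```r
# Hypothetical helper: convert feet and inches to total inches
toInches <- function(feet, inches = 0) feet * 12 + inches

usMean <- toInches(5, 5)  # 65 inches
usSD   <- 3.5

# Share of US women expected to be taller than 6'5" (upper tail)
pTaller <- 1 - pnorm(toInches(6, 5), mean = usMean, sd = usSD)

# Share expected to be between 6'0" and 6'5"
pBetween <- pnorm(toInches(6, 5), mean = usMean, sd = usSD) -
  pnorm(toInches(6, 0), mean = usMean, sd = usSD)

# For the players, count directly instead of assuming a normal curve.
# These heights (inches) are invented, not the WNBA data.
toyHeights <- c(68, 71, 72, 74, 75, 76, 78, 80)
shareTaller  <- mean(toyHeights > toInches(6, 5))
shareBetween <- mean(toyHeights >= toInches(6, 0) & toyHeights <= toInches(6, 5))
```

Taking mean() of a TRUE/FALSE vector gives the proportion of TRUE values, which saves a subset-and-length() step.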
Does height correlate with any of the other stats? Try this sort of plot:
plot(wnbaPlayers$HEIGHT, wnbaPlayers$GP)
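To put a number on what the scatterplot shows, cor() gives the correlation coefficient between two columns. The data frame below is invented so the sketch runs on its own; only the column names HEIGHT and GP come from the plot call above.

```r
# Invented stand-in for wnbaPlayers; real values will differ
toyPlayers <- data.frame(
  HEIGHT = c(68, 70, 72, 74, 76, 78),   # assumed to be in inches
  GP     = c(30, 25, 34, 28, 31, 22)    # games played
)

plot(toyPlayers$HEIGHT, toyPlayers$GP,
     xlab = "Height", ylab = "Games played")

# Correlation runs from -1 to 1; values near 0 mean little linear relationship
r <- cor(toyPlayers$HEIGHT, toyPlayers$GP)
r
```

Swapping GP for another column name lets you repeat the check for each stat in turn.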