The 1996 Bulls and the 2017 Warriors are the two teams that many claim may be the best in NBA history (though, go Celtics!). But the teams themselves were very different. Your task is to try to answer two questions:
Who performed better: the typical 1996 Chicago Bull or the typical 2017 Golden State Warrior?
How has the game of basketball changed in the 20ish years between the two greatest teams of all time?
First, download the dataset "greatestNBA.csv" from here. Then import it into RStudio by clicking the button on the top right of RStudio that says "Import Dataset." Select "From Text (base)..." and then find the file on your computer. If you're getting a bunch of "V1, V2" etc. in the top row instead of column names, just check the little box that says "Heading - Yes".
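If you'd rather import with code than with the button, here is a minimal sketch. It assumes that read.csv() does roughly the same job as the "From Text (base)" import. The tiny file it writes is a made-up stand-in so the example runs anywhere; it is not the real dataset.

```r
# Made-up stand-in file (NOT the real data). With the real file, skip these
# two lines and point `path` at your downloaded greatestNBA.csv instead.
path <- file.path(tempdir(), "greatestNBA.csv")
writeLines(c("team,points", "96CHI,10", "17GSW,12"), path)

# header = TRUE plays the role of the "Heading - Yes" checkbox:
# the first row becomes the column names instead of V1, V2, ...
greatestNBA <- read.csv(path, header = TRUE)
str(greatestNBA)
```

Either way, you should end up with a dataset called greatestNBA in your Environment pane.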
Let's check out the data set...
Copy and paste the following into the console and run it.
str(greatestNBA)
summary(greatestNBA)
The first line shows what variables are in the dataset and what types they are. The second shows some statistics about different columns. Scroll through to see what variables are available to look at for the two teams.
Next, let's split the dataset into two smaller datasets, one for each team. Don't worry about the details here - we will practice that later - but after these two lines the '96 Bulls players will all be in "bulls" and the '17 Warriors players will all be in "warriors" (all lowercase, because R is case-sensitive).
bulls <- subset(greatestNBA, greatestNBA$team=="96CHI")
warriors <- subset(greatestNBA, greatestNBA$team=="17GSW")
Nothing will show up in the console because we are just making variables. But we can see what we did! Take a look at each dataset by just running "bulls" and "warriors" each on their own.
bulls
warriors
Things look good if the teams are separated.
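If you want something quicker than eyeballing the printout, here is one possible check. It is only a sketch: the rows below are invented stand-ins (only the team codes and the team and points column names come from this activity), and the last line assumes the file holds only these two teams.

```r
# Invented stand-in rows, not real stats.
greatestNBA <- data.frame(
  team   = c("96CHI", "96CHI", "17GSW", "17GSW", "17GSW"),
  points = c(10, 4, 12, 6, 3)
)
bulls    <- subset(greatestNBA, greatestNBA$team == "96CHI")
warriors <- subset(greatestNBA, greatestNBA$team == "17GSW")

table(greatestNBA$team)   # how many players each team has in the full data
unique(bulls$team)        # should list only 96CHI
unique(warriors$team)     # should list only 17GSW

# TRUE if every player landed in exactly one of the two datasets
nrow(bulls) + nrow(warriors) == nrow(greatestNBA)
```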
Okay, now we need to take some statistics... Here's how to calculate the mean of the Warriors players' points as an example:
mean(warriors$points)
mean() is the function, warriors specifies the dataset, and $points brings us to the points column. If you just tried mean(warriors), R would give back NA with a warning, because the dataset has many columns and mean() needs a single column of numbers. If you just tried mean(points), it wouldn't work either, because you haven't said what dataset you are working with.
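To see the difference for yourself, here is a small sketch on a made-up dataset. The player column and all the numbers are invented for illustration; your real warriors dataset will have different columns and values.

```r
# Invented mini-dataset standing in for the real warriors data.
warriors <- data.frame(
  player = c("A", "B", "C"),   # hypothetical column
  points = c(12, 6, 3)         # invented values
)

# Dataset + $ + column name: the mean of one column of numbers.
mean(warriors$points)
# 7

# The whole dataset at once: R can't average it, so it returns NA.
# (suppressWarnings() just hides the warning message here.)
whole <- suppressWarnings(mean(warriors))
whole
# NA
```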
Okay, let's start recording what's happening. Instead of just putting everything in the console, let's keep track of our work in a script. Go to File >>> New File >>> R Script. Let's record our result as a comment. The computer treats anything after a # sign as a note rather than code, so we can make little notes to ourselves.
mean(warriors$points)
#8.182353
If those are in lines 1 and 2 of your script, copy and paste the line of code in line 1 to line 3 and change warriors to bulls. You can run code right from a script by putting your cursor anywhere on that line and clicking the little [ ]-> Run button to the top right of the script window. Once you get that to work, comment your answer on line 4.
Who performed better: the typical 1996 Chicago Bull or the typical 2017 Golden State Warrior?
How has the game of basketball changed in the 20ish years between the two greatest teams of all time?
Gather pieces of evidence by exploring. You can use a bunch of different functions to measure things about the Bulls and the Warriors. Try these different functions below, and use them on various different columns to come up with an argument about which team had better players, and how the game has changed.
Functions to call on columns: mean(), median(), quantile(), IQR(), range()
Visualizations: You can also use boxplot() and hist() to make boxplots and histograms. To plot two things next to each other on a boxplot, try boxplot(thing1, thing2) (this only works with boxplots).
Scatter plot: You can do plot(numericalVariable1, numericalVariable2) to make a scatter plot (columns of the data will go in those spots).
More data to make the case, especially for point #2: Team stats for the NBA in 1996 and 2017 (use the same process of downloading them and importing them into R)
Keep track of everything that you do with comments showing the results!
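Here is a sketch of what a few lines of your script might look like, using the exploration tools listed above. Everything in it is made up for illustration: the numbers are invented, and minutes is a hypothetical column name, so swap in the real columns you found with str() and summary().

```r
# Invented stand-in data, not real stats. `minutes` is a hypothetical column.
bulls    <- data.frame(minutes = c(8, 15, 22, 30, 38), points = c(2, 4, 6, 8, 30))
warriors <- data.frame(minutes = c(10, 18, 25, 33, 36), points = c(3, 5, 7, 9, 25))

mean(bulls$points)
#10
median(bulls$points)
#6   (one 30-point star pulls the mean up, but not the median)
IQR(bulls$points)
#4
range(bulls$points)
#2 30
quantile(warriors$points)   # min, quartiles, and max in one call

# Side-by-side boxplots to compare the two teams on one column
boxplot(bulls$points, warriors$points,
        names = c("96 Bulls", "17 Warriors"), ylab = "points")

# Histogram of one column for one team
hist(warriors$points, main = "17 Warriors", xlab = "points")

# Scatter plot of two numerical columns from the same team
plot(warriors$minutes, warriors$points, xlab = "minutes", ylab = "points")
```

Notice how the mean and the median can tell different stories about the "typical" player. That gap is worth thinking about when you make your case.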