In 1941, Joe DiMaggio recoreded a hit in 56 straight games. This was an unbelievable feat - of both skill and luck!
Below is some code that I used to evaluate how likely Joe DiMaggio's win streak was, just to see how lucky he was.
install.packages("runner") # installing a package - only run once ever
library(runner) # loading the package - only run once per session
JoeD <- c(rep(1,83),rep(0,17)) # creating a base probability
streaks <- c() # empty bucket to store seasons
for(i in 1:10000){
season<- sample(JoeD,162,replace=TRUE) # simulating his batting in 162 games
streaks[i] <- max(streak_run(season)) # seeing the maximum streak in each of the seasons
}
hist(streaks) # seeing a histogram of the max streaks
length(streaks[streaks>=56]) # testing how often this streak happened
# happened 7 times out of 10000, so 0.07%!!
Simulate the streak that you researched. How likely is the streak?
What's the most likely maximum streak according to your model?
What's the maximum streak that you found in your simulations?
What percentage of simulations had a streak as extreme as the one that happened in real life?
Make some changes in the situation and note how it changes the likeliness of the streak - for example, if Joe DiMaggio were a worse or better better, how would that change the odds of the streak?
Add a complication into your modeling to better account for reality - for example, I could simulate the other 750 players in the MLB at the time for the season and see if it's likely that just someone had a streak that long. I could also vary the probability he gets a hit based on who he is playing.
Here are some examples below of complications that have a few new coding techniques:
badPitcher <- c(rep(1,93), rep(0,7)) # making separate vectors for a bad or good pitcher
goodPitcher <- c(rep(1,73), rep(0,27))
streaks <- c()
for(i in 1:1000){
seasonBad <- sample(goodPitcher,81,replace=TRUE) # then drawing 81 games from each
seasonGood <- sample(badPitcher,81,replace=TRUE)
season <- sample(c(seasonBad,seasonGood)) # and shuffling them together before I find streaks
streaks[i] <- max(streak_run(season))
}
hist(streaks)
length(streaks[streaks>=56])
Same idea but using If/Then statements and a nested loop:
badPitcher <- c(rep(1,93), rep(0,7)) # making separate vectors for a bad or good pitcher
goodPitcher <- c(rep(1,73), rep(0,27))
streaks <- c()
season <- c()
for(i in 1:1000){
for (j in 1:162){ # outside loop to do 1000 seasons, inside loop for 162 games
flipCoin <- sample(c("H","T"), 1)
if (flipCoin == "H") {
season[j] <- sample(goodPitcher,1)
} else if (flipCoin == "F") {
season[j] <- sample(badPitcher,1)
}
streaks[i] <- max(streak_run(season))
}
}
hist(streaks)
length(streaks[streaks>=56])
Testing by at bats instead!
JoeD <- c(rep(1,400),rep(0,600)) # Using his batting average instead
streaks <- c()
hits <- c()
season <- c()
for(i in 1:1000){
for(j in 1:162){
for(k in 1:3){
hits[k] <- sample(JoeD, 1)
}
if (sum(hits) >=1 ) {
season[j] <- 1
} else {
season[j] <- 0
}
streaks[i] <- max(streak_run(season))
}
}
hist(streaks)
length(streaks[streaks>=56])
R MARKDOWN TEMPLATE >> Open up a new RMarkdown file (File >> New File >> RMarkdown) and then delete everything below the top part with your name, and copy and paste this below it.
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## 0. Background.
Joe DiMaggio got a hit in 56 straight games in 1941. This is a very unlikely long streak, a record that hasn't been broken since. In fact, the closest anyone has gotten to this was the person whose streak Joe DiMaggio broke - 45 games in 1897.
To simulate this streak, let's assume that Joe DiMaggio is a 0.357 hitter, which was his batting average in that season in 1941. In that season, he got a hit in 83% of games, so we will use that as our underlying probability that he gets a hit in any game. We are going to simulate 10000 seasons and see how often in random trials that he got a hit in that many games in a row.
## 1. Simulating This Streak.
```{r}
library(runner)
JoeD <- c(rep(1,83),rep(0,17))
streaks <- c()
for(i in 1:10000){
season<- sample(JoeD,162,replace=TRUE)
streaks[i] <- max(streak_run(season))
}
hist(streaks, col="navy")
length(streaks[streaks>=56])
```
#### Summary:
This happens SUPER rarely. Almost never! It is very likely that he would get around a 20 game streak, and even a 40 game one seems like it is reasonable, but 56 times rarely or never happened when I would run the code.
## 2. Testing Assumptions.
Let's change the situation a bit, and make him a better hitter to see if that makes the streak even possible. Let's change it to a 90% probability that he
```{r}
library(runner)
JoeD <- c(rep(1,90),rep(0,10))
streaks <- c()
for(i in 1:10000){
season<- sample(JoeD,162,replace=TRUE)
streaks[i] <- max(streak_run(season))
}
hist(streaks, col="navy")
length(streaks[streaks>=56])
```
#### Summary:
This makes it more likely, but it is still rare (only about 3% of the time)
## 3. Coding a Complication.
In order to make the simulation more accurate to real life, let's try making Joe DiMaggio face pitchers who are sometimes good and sometimes bad. Maybe he only got the streak because he faced a bunch of bad pitchers in a row! In order to accomplish this, we will sample 81 games from a lower probability of getting a hit (73%) and 81 games from a higher probability of getting a hit (93%).
```{r}
badPitcher <- c(rep(1,93), rep(0,7))
goodPitcher <- c(rep(1,73), rep(0,27))
streaks <- c()
for(i in 1:1000){
seasonBad <- sample(goodPitcher,81,replace=TRUE)
seasonGood <- sample(badPitcher,81,replace=TRUE)
season <- sample(c(seasonBad,seasonGood))
streaks[i] <- max(streak_run(season))
}
hist(streaks, col="navy")
length(streaks[streaks>=56])
```
#### Summmary:
This doesn't really change the simulation very much. The peak of the histogram is still in the same spot and the tail looks pretty similar.
## 4. Conclusion.
Joe DiMaggio's streak is truly unbelievable! Even if we assume he's a much better hitter than he was, it was super unlikely that he would get a hit in 56 games in a row. Even if we assume there are bad pitchers in the mix, it's still very unlikely.