Contact me using the buttons in the footer!
Experienced Data Scientist
#RStats
%>% it up
a subsidiary of Tokio Marine Group
Master of Business Administration (MBA)
Simpson's paradox in statistics: an effect that occurs when the marginal association between two categorical variables is qualitatively different from the partial association between the same two variables after controlling for one or more other variables.
Example:
Two baseball players in two seasons
Player 1 Batting Average in 2020 : .253
Player 1 Batting Average in 2021 : .321
Player 2 Batting Average in 2020 : .250
Player 2 Batting Average in 2021 : .314
Player 1 Combined 2020-2021 Batting Average : .270
Player 2 Combined 2020-2021 Batting Average : .310
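How can this happen? Player 1 has the higher average in each season, yet the lower combined average. Player 1 had far more at-bats in 2020, his weaker season, than in 2021, while Player 2 had most of his at-bats in 2021. The two seasons therefore carry different weights in each player's combined average.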
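The weighting effect can be sketched in a few lines of Python. The text above gives only the averages, so the hit and at-bat counts below are hypothetical, chosen solely so that the rounded averages match the quoted figures. The function names are illustrative as well.

```python
# Hypothetical (hits, at_bats) per season. These counts are NOT from the
# text; they are picked only so the rounded averages match the quoted ones.
seasons = {
    "Player 1": {2020: (104, 411), 2021: (45, 140)},
    "Player 2": {2020: (12, 48), 2021: (183, 582)},
}


def batting_average(hits, at_bats):
    """Hits divided by at-bats for a single season."""
    return hits / at_bats


def combined_average(by_year):
    """Pool hits and at-bats across seasons before dividing.

    This equals an average of the season averages weighted by at-bats,
    which is why a season with many at-bats dominates the result.
    """
    total_hits = sum(hits for hits, _ in by_year.values())
    total_at_bats = sum(at_bats for _, at_bats in by_year.values())
    return total_hits / total_at_bats


for player, by_year in seasons.items():
    for year, (hits, at_bats) in by_year.items():
        avg = batting_average(hits, at_bats)
        print(f"{player} {year}: {avg:.3f} over {at_bats} at-bats")
    print(f"{player} combined: {combined_average(by_year):.3f}")
```

With these assumed counts, Player 1's combined figure sits close to his 2020 average and Player 2's sits close to his 2021 average, which is enough to reverse the season-by-season ordering.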
Three gentlemen walk into a hotel late one evening for a stay. They ask for a double-bed room for the night. The hotel clerk informs them that the room will cost $30 for the night. The three men decide to split the fee between them, and each gentleman gives the hotel clerk $10. Later that evening, the hotel manager speaks with the hotel clerk, and together they realize that they were running a weeknight special of $25 for the room. Immediately the hotel clerk reaches into the register and grabs $5 in one-dollar bills for the gentlemen's refund. The clerk gives the hotel bellhop the $5 in single bills, and the bellhop makes his way to the elevator and up to the gentlemen's room. The bellhop, already upset about his wage, decides that he is going to skim $2 from the refund for himself as a tip. As nefarious as this act may be, he still gives the remaining $3 to the gentlemen. Altogether, the gentlemen each got $1 back as a refund, so they each paid $9 for the room for the evening. This totals $27. If the total is $27 and the bellhop has $2, for a total of $29, what happened to the other $1 from the initial $30?
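One way to check the bookkeeping in the riddle is to track where each dollar ends up. The short Python sketch below uses illustrative variable names and only the amounts stated in the story. It suggests that the bellhop's $2 is already part of the $27 the guests paid, so adding it to $27 counts it twice. The sum that reaches $30 is the $27 paid plus the $3 refunded.

```python
# Amounts taken directly from the story; variable names are illustrative.
paid_initially = 3 * 10      # three guests hand over $10 each
room_price = 25              # the weeknight special kept by the hotel
refund_to_guests = 3 * 1     # $1 returned to each guest
bellhop_tip = 2              # skimmed by the bellhop

# What the guests are out of pocket after the refund.
net_paid_by_guests = paid_initially - refund_to_guests

# The guests' $27 is split between the hotel and the bellhop.
assert net_paid_by_guests == room_price + bellhop_tip

# The original $30 is fully accounted for: hotel, bellhop, and guests.
assert paid_initially == room_price + bellhop_tip + refund_to_guests

# The riddle's misleading sum adds the tip to a total that already holds it.
misleading_total = net_paid_by_guests + bellhop_tip

print(net_paid_by_guests, misleading_total)
```

The $29 in the riddle does not correspond to any real pile of money; it only appears because the tip is added where the refund should be.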