Most of the data analysis packages that trip off the tongue today incorporate Student's t-test, among them Python, R, Microsoft Excel, MATLAB, Stata, and SPSS.
This statistical approach was developed by William Sealy Gosset. He studied mathematics and chemistry at New College, Oxford, and in 1899 began working for Guinness Breweries in Dublin. The following year, Guinness launched the "Guinness Research Laboratory" to explore scientific methods for improving standardization and reducing variability in quality and cost. Despite the potato famine of some 50 years earlier, Ireland was, as it is now, an agricultural hub that supplied the Industrial Revolution, and Dublin was an important cog in supplying fresh and processed food to Britain. A good deal of cereal was harvested in Ireland, water was never really in short supply, and Dublin had ready access to all the major urban population centers with ports within the United Kingdom.
In 1907 Gosset took up the reins at Guinness' Experimental Brewery, where he used the "Student tables" to identify the blend of barley best suited to brewing. Writing under the pseudonym Student, Gosset solved a problem we still meet today when testing new medicines or techniques: establishing statistical significance when only a few trial participants can be reliably measured. When we want to draw conclusions about the mean of an approximately normally distributed population, the sample is small, and the population standard deviation is unknown, the t-distribution still lets us set thresholds for judging whether a standard is being met. Gosset posited that "Tables can be given by which it can be judged whether a series of experiments, however short, have given a result which conforms to any required standard of accuracy or whether it is necessary to continue the investigation." This was pivotal for brewers, who had to test limited batches of beer to ascertain quality.
The t-distribution is widely employed in contemporary data science: for testing the statistical significance of the difference between two sample means, for constructing confidence intervals for the difference between two population means, and in linear regression analysis.
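To make the small-sample idea concrete, here is a minimal sketch in R of a one-sample t-test worked by hand and then checked against the built-in t.test(). The six batch measurements and the target value of 5.0 are made-up numbers used purely for illustration; they are not Gosset's data.

```r
# Hypothetical measurements from six small batches (invented for illustration)
batch  <- c(4.9, 5.3, 5.1, 4.7, 5.4, 5.2)
target <- 5.0   # assumed required standard, also hypothetical

n <- length(batch)

# t statistic: distance of the sample mean from the target,
# measured in estimated standard errors
t_stat <- (mean(batch) - target) / (sd(batch) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_val <- 2 * pt(-abs(t_stat), df = n - 1)

# 95% confidence interval for the population mean
t_crit <- qt(0.975, df = n - 1)
ci <- mean(batch) + c(-1, 1) * t_crit * sd(batch) / sqrt(n)

# The built-in test should reproduce the hand calculation
res <- t.test(batch, mu = target)
all.equal(unname(res$statistic), t_stat)
all.equal(res$p.value, p_val)
all.equal(as.numeric(res$conf.int), ci)
```

With only six observations the critical value from qt() is noticeably larger than the 1.96 of the normal distribution, which is exactly the correction for short series that the tables were meant to supply.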
The River Liffey was an important artery for transporting beer by barge until 1961. St. James's Gate was located on a shallower part of the river, and larger craft could not negotiate the waterways beyond Dublin Port. In an earlier incarnation, Guinness was also made with water from the Liffey, which flows from the Wicklow Mountains. A major benefit to consumers was that brewing subjected the water to a process not unlike pasteurization, at a time when potable water could not always be assured.
# https://www.statmethods.net/advgraphs/probability.html
# Display the Student's t distributions with various
# degrees of freedom and compare to the normal distribution
x <- seq(-4, 4, length=100)
hx <- dnorm(x)
degf <- c(1, 3, 8, 30)
colors <- c("red", "blue", "darkgreen", "gold", "black")
labels <- c("df=1", "df=3", "df=8", "df=30", "normal")
plot(x, hx, type="l", lty=2, xlab="x value",
     ylab="Density", main="Comparison of t Distributions")
for (i in 1:4){
  lines(x, dt(x, degf[i]), lwd=2, col=colors[i])
}
legend("topright", inset=.05, title="Distributions",
       legend=labels, lwd=2, lty=c(1, 1, 1, 1, 2), col=colors)
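# https://www.youtube.com/watch?v=ANMuuq502rE&feature=youtu.be
# install.packages("gapminder")
library(gapminder)
data("gapminder")
summary(gapminder)
# Overall mean life expectancy across all countries and years
mean_life <- mean(gapminder$lifeExp)
mean_life
# Distribution of life expectancy, overall and by continent
# (columns are referenced explicitly rather than via attach())
hist(gapminder$lifeExp)
boxplot(lifeExp ~ continent, data = gapminder)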
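# install.packages("dplyr")
library(dplyr)
# Mean life expectancy for the two countries being compared
gapminder %>%
  select(country, lifeExp) %>%
  filter(country %in% c("South Africa", "France")) %>%
  group_by(country) %>%
  summarise(Average_Life = mean(lifeExp))
# Keep only the two countries for the test
dflife <- gapminder %>%
  select(country, lifeExp) %>%
  filter(country %in% c("South Africa", "France"))
View(dflife)  # opens the data viewer in an interactive session (e.g. RStudio)
# Two-sample t-test of life expectancy by country (Welch's test by default)
t.test(lifeExp ~ country, data = dflife)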
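By default, t.test() does not assume equal variances in the two groups and runs Welch's version of the test. As a rough sketch of what happens under the hood, the same statistic, degrees of freedom, and p-value can be rebuilt by hand from the gapminder data. The variable names france, safrica, v1, v2, and df_welch are my own and appear nowhere in the script above.

```r
library(gapminder)  # same data set as in the script above

# Life expectancy series for the two countries
france  <- gapminder$lifeExp[gapminder$country == "France"]
safrica <- gapminder$lifeExp[gapminder$country == "South Africa"]

# Squared standard error of each group mean
v1 <- var(france)  / length(france)
v2 <- var(safrica) / length(safrica)

# Welch t statistic: difference in means over its standard error
t_stat <- (mean(france) - mean(safrica)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v1 + v2)^2 /
  (v1^2 / (length(france) - 1) + v2^2 / (length(safrica) - 1))

# Two-sided p-value
p_val <- 2 * pt(-abs(t_stat), df = df_welch)

# Compare with the built-in test (var.equal = FALSE is the default)
res <- t.test(france, safrica)
all.equal(unname(res$statistic), t_stat)
all.equal(unname(res$parameter), df_welch)
all.equal(res$p.value, p_val)
```

The degrees of freedom are generally not a whole number here, which is why the output of t.test() reports a fractional df for this comparison.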