1. Concepts & Definitions
1.1. A Review on Parametric Statistics
1.2. Parametric tests for Hypothesis Testing
1.3. Parametric vs. Non-Parametric Test
1.4. One sample z-test and its relation with the two-sample z-test
1.5. One sample t-test and its relation with the two-sample t-test
1.6. Welch's two-sample t-test: two populations with different variances
1.7. Non-Parametric test for Hypothesis Testing: Mann-Whitney U Test
1.8. Non-Parametric test for Hypothesis Testing: Wilcoxon Sign-Rank Test
1.9. Non-Parametric test for Hypothesis Testing: Wilcoxon Sign Test
1.10. Non-Parametric test for Hypothesis Testing: Chi-Square Goodness-of-Fit
1.11. Non-Parametric test for Hypothesis Testing: Kolmogorov-Smirnov
1.12. Non-Parametric tests for comparing machine learning models
2. Problem & Solution
2.1. Using Wilcoxon Sign Test to compare clustering methods
2.2. Using Wilcoxon Sign-Rank Test to compare clustering methods
2.3. What is A/B testing and how to combine with hypothesis testing?
2.4. Using Chi-Square fit to check if Benford-Law holds or not
2.5. Using Kolmogorov-Smirnov fit to check if Pareto principle holds or not
What is a T-test?
A T-test is a statistical test used to determine whether the mean of a sample is significantly different from a known population mean when the population standard deviation is unknown. It is particularly useful when the sample size is small (n < 30) [1].
This kind of statistical test, and the proper situation in which to apply it, was described in Track 08, section 1.6.
A numerical example of a T-test
The average service time of a company in 2018 was 12.44 minutes. Management wants to know whether the current arithmetic mean is different from 12.44 minutes. A sample with 25 values had an arithmetic mean of 13.71 minutes and a standard deviation of 2.65 minutes. Using α = 5%, can you conclude whether the time is currently different?
Let's recall the steps used to solve the numerical example presented above.
Null hypothesis (Ho): The mean has not changed, i.e., μ = 12.44.
Alternate hypothesis (Ha): The mean has changed, i.e., μ ≠ 12.44.
From the table described in the step to choose a statistical test, the signs of the hypotheses Ho and Ha indicate that a two-tailed test should be carried out.
Since the sample size is smaller than 30, a Student's t-distribution should be employed.
The next table helps to understand the relation between the confidence level, alpha (α), and the critical value t α/2 for the given degrees of freedom (sample size - 1).
Using α = 5% and n = 25 (df = 25 - 1 = 24) leads to t α/2 = 2.064.
To compute the test statistic, it is necessary to convert the observed sample mean (x̄) to the scale of the Student's t-distribution (tobs). This can be done using the following equation:
tobs = (x̄ - μ)/(s/(n^0.5))
This equation will result in the following numbers:
tobs = (13.71 - 12.44)/(2.65/(25^0.5)) = (13.71-12.44)/0.53 = 2.40
Since tobs = 2.40 is greater than the upper critical value t α/2 = 2.064, we can reject the null hypothesis.
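The computation above can be reproduced with a short Python sketch, using SciPy to obtain the critical value instead of a table:

```python
import numpy as np
from scipy import stats

# Values taken from the service-time example above
mu0 = 12.44   # hypothesized population mean (2018 average, in minutes)
xbar = 13.71  # sample mean
s = 2.65      # sample standard deviation
n = 25        # sample size
alpha = 0.05

# Test statistic: tobs = (x̄ - μ)/(s/√n)
t_obs = (xbar - mu0) / (s / np.sqrt(n))
# Two-tailed critical value for df = n - 1
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)

print(f"t_obs = {t_obs:.2f}")     # 2.40
print(f"t_crit = ±{t_crit:.3f}")  # ±2.064
print("Reject Ho" if abs(t_obs) > t_crit else "Fail to reject Ho")
```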
When to use a T-test
The following conditions must hold to apply a T-test:
The sample size should be less than 30. Otherwise, we should use the z-test.
Samples should be drawn at random from the population.
The standard deviation of the population should be unknown (when it is known, the z-test applies).
Samples that are drawn from the population should be independent of each other.
The underlying data should be approximately normally distributed.
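The normality condition can be checked in practice, for instance with a Shapiro-Wilk test. A minimal sketch, using synthetic service-time data (the sample values below are assumed for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical sample of 25 service times (minutes)
rng = np.random.default_rng(42)
sample = rng.normal(loc=12.5, scale=2.5, size=25)

# Shapiro-Wilk tests Ho: "the sample comes from a normal distribution"
stat, p = stats.shapiro(sample)
print(f"Shapiro-Wilk: W = {stat:.3f}, p = {p:.3f}")
if p > 0.05:
    print("No evidence against normality; a t-test is reasonable")
else:
    print("Normality is doubtful; consider a non-parametric test")
```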
Types of T-test
Assuming that [1]:
Null Hypothesis: The null hypothesis is a statement that the value of a population parameter (such as proportion, mean, or standard deviation) is equal to some claimed value. We either reject or fail to reject the null hypothesis. The null hypothesis is denoted by H0.
Alternate Hypothesis: The alternative hypothesis is the statement that the parameter has a value that is different from the claimed value. It is denoted by HA.
Level of significance: The probability threshold at which we reject the null hypothesis. Since 100% certainty in accepting or rejecting a hypothesis is not possible in most experiments, we select a level of significance. It is denoted by alpha (α).
Then:
Two-tailed T-test: The region of rejection is located at both extremes of the distribution. Here our null hypothesis is that the population mean is equal to the claimed value.
Left-tailed T-Test: The region of rejection is located at the extreme left of the distribution. Here our null hypothesis is that the population mean is greater than or equal to the claimed value.
Right-tailed T-Test: The region of rejection is located at the extreme right of the distribution. Here our null hypothesis is that the population mean is less than or equal to the claimed value.
[Figures: rejection regions for the two-tailed, right-tailed, and left-tailed T-tests]
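The critical values that separate the rejection regions in the three cases can be obtained from the t-distribution. A minimal sketch with SciPy, assuming α = 5% and df = 24 as in the earlier example:

```python
from scipy import stats

df = 24       # degrees of freedom (n - 1)
alpha = 0.05

# Two-tailed: reject Ho when |tobs| > t_{alpha/2}
two_tailed = stats.t.ppf(1 - alpha / 2, df)
# Right-tailed: reject Ho when tobs > t_{alpha}
right_tailed = stats.t.ppf(1 - alpha, df)
# Left-tailed: reject Ho when tobs < -t_{alpha} (ppf returns the negative value directly)
left_tailed = stats.t.ppf(alpha, df)

print(f"two-tailed:   ±{two_tailed:.3f}")   # ±2.064
print(f"right-tailed: {right_tailed:.3f}")  # 1.711
print(f"left-tailed:  {left_tailed:.3f}")   # -1.711
```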
The Python code, with the data and detailed computations to generate the three types of T-test, is given at:
https://colab.research.google.com/drive/1_nR2eMhvXxVY-HCVDjpRiQ2-2SV7TeqT?usp=sharing
Two-sample T-test
Suppose we are given two independent, approximately normally distributed populations, and we have drawn samples at random from both. Here, we consider μ1 and μ2 to be the population means, and x̄1 and x̄2 to be the observed sample means. Our hypotheses could be stated as follows:
Null hypothesis (Ho): There is no difference between the means, i.e., μ1 - μ2 = 0.
Alternate hypothesis (Ha): The means differ, i.e., μ1 - μ2 ≠ 0.
The formula for calculating the t-test statistic tobs is given by [2, 3]:
tobs = (x̄1 - x̄2) / √(s1²/n1 + s2²/n2)
where x̄1 = first sample mean, x̄2 = second sample mean, s1 = first sample standard deviation, s2 = second sample standard deviation, n1 = first sample size, n2 = second sample size, and df (degrees of freedom) = min(n1, n2) - 1.
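This computation is also available in SciPy as `ttest_ind_from_stats`, which works directly from summary statistics. Note that SciPy uses the pooled degrees of freedom (n1 + n2 - 2) rather than the conservative min(n1, n2) - 1 used here, so the p-value differs slightly, although the statistic itself matches. A sketch using the factory numbers from the example below:

```python
from scipy import stats

# Summary statistics of the two factories (from the example below)
t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=10.5, std1=1.0, nobs1=21,  # Factory 1
    mean2=9.5, std2=1.0, nobs2=25,   # Factory 2
    equal_var=True,                  # pooled-variance two-sample t-test
)
print(f"t = {t_stat:.3f}")  # 3.378
print(f"p = {p_value:.4f}")
```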
A numerical example of a two-sample T-test
Let’s consider that the first factory shares 21 samples of ball bearings where the mean diameter of the sample comes out to be 10.5 cm. On the other hand, the second factory shares 25 samples with a mean diameter of 9.5 cm. Both have a standard deviation of 1 cm [3]:
Step 1: Pose the research question and determine the proper statistical test.
The company wants to determine whether the mean diameter of the ball bearings produced by Factory 1 is different from that of Factory 2. To do this, we will use a two-sample t-test for means.
Step 2: Obtain the samples statistics from the two factories (populations).
Factory 1: x̄1 = 10.5, s1 = 1, n1 = 21.
Factory 2: x̄2 = 9.5, s2 = 1, n2 = 25.
Step 3: Formulate the null and alternate hypotheses and set the level of significance for the test.
Null hypothesis (Ho): There is no difference between the mean diameters of the ball bearings from the two factories, i.e., μ1 - μ2 = 0.
Alternate hypothesis (Ha): There is a difference between the mean diameters, i.e., μ1 - μ2 ≠ 0.
We will perform a two-tailed test using significance level α = 5%.
df (degrees of freedom) = min(n1, n2) - 1 = 21 - 1 = 20
Step 4: Use the formula for the two-sample t-test for means to calculate the test statistic tobs.
tobs = (x̄1 – x̄2 ) / √((s1 )²/n1 + (s2)²/n2)
tobs = (1) / √((1)²/21 + (1)²/25))
tobs = 3.378
Step 5: Compare tobs with the critical value t α/2 from the previous table.
Step 6: Since tobs = 3.378 is greater than the critical value t α/2 = 2.086, we can reject the null hypothesis.
A numerical example of a two-sample T-test in Python
The following Python code automates the manual computation made previously [2, 3].
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
# Step 2: Sample statistics
x1 = 10.5 # Sample mean of Factory 1
s1 = 1 # Sample standard deviation of Factory 1
n1 = 21 # Sample size of Factory 1
x2 = 9.5 # Sample mean of Factory 2
s2 = 1 # Sample standard deviation of Factory 2
n2 = 25 # Sample size of Factory 2
# Degrees of freedom
df = min(n1, n2) - 1
# Step 4: Calculate the t-test statistic
t_obs = (x1 - x2) / np.sqrt((s1 ** 2) / n1 + (s2 ** 2) / n2)
print(f"Tobs: {t_obs:.3f}")
# Step 5: Critical values for a two-tailed test at alpha = 0.05
alpha = 0.05
t_critical = stats.t.ppf(1 - alpha/2, df)
print(f"Critical value (t_alpha/2): ±{t_critical:.3f}")
# Step 6: Conclusion
if abs(t_obs) > t_critical:
    conclusion = "Reject the null hypothesis"
else:
    conclusion = "Fail to reject the null hypothesis"
print(conclusion)
# Plotting the results
x = np.linspace(-4, 4, 1000)
y = stats.t.pdf(x, df)
plt.figure(figsize=(10, 6))
plt.plot(x, y, label=f't-Distribution (df={df})')
# Critical regions
plt.fill_between(x, y, where=(x < -t_critical) | (x > t_critical), color='red', alpha=0.3, label='Critical regions')
# Tobs line
plt.axvline(t_obs, color='blue', linestyle='--', label=f'Tobs = {t_obs:.3f}')
plt.text(t_obs - 0.1, max(y)*0.5, f'Tobs = {t_obs:.3f}', color='blue', ha='right')
# Critical region text
plt.text(-t_critical, max(y)*0.1, f'Critical region: -{t_critical:.3f}', color='red', ha='center')
plt.text(t_critical, max(y)*0.1, f'Critical region: {t_critical:.3f}', color='red', ha='center')
# Formatting the plot
plt.title('Two-Tailed t-Test')
plt.xlabel('t-value')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.show()
Tobs: 3.378
Critical value (t_alpha/2): ±2.086
Reject the null hypothesis
The Python code, with the data and detailed computation to employ the two-sample T-test to verify whether the two classes have the same mean, is given at:
https://colab.research.google.com/drive/1igHfmxU5P7veSXIA0eatBsnuBdvun8jY?usp=sharing