1. Concepts & Definitions
1.1. A Review on Parametric Statistics
1.2. Parametric tests for Hypothesis Testing
1.3. Parametric vs. Non-Parametric Test
1.5. One-sample z-test and its relation with the two-sample z-test
1.6. One-sample t-test and its relation with the two-sample t-test
1.6. Welch's two-sample t-test: two populations with different variances
1.7. Non-Parametric test for Hypothesis Testing: Mann-Whitney U Test
1.8. Non-Parametric test for Hypothesis Testing: Wilcoxon Sign-Rank Test
1.9. Non-Parametric test for Hypothesis Testing: Wilcoxon Sign Test
1.10. Non-Parametric test for Hypothesis Testing: Chi-Square Goodness-of-Fit
1.11. Non-Parametric test for Hypothesis Testing: Kolmogorov-Smirnov
1.12. Non-Parametric tests for comparing machine learning models
2. Problem & Solution
2.1. Using Wilcoxon Sign Test to compare clustering methods
2.2. Using Wilcoxon Sign-Rank Test to compare clustering methods
2.3. What is A/B testing and how to combine with hypothesis testing?
2.4. Using Chi-Square fit to check if Benford-Law holds or not
2.5. Using Kolmogorov-Smirnov fit to check if Pareto principle holds or not
What is a Z-test?
A Z-test is a statistical test used to determine whether the mean of a sample is significantly different from a known population mean when the population standard deviation is known. It is particularly useful when the sample size is large (>30) [1].
This kind of statistical test, and the proper situation in which to apply it, was described in Track 08, section 1.3.
A numerical example of a Z-test
The average service time of a company in 2018 was 12.44 minutes. Management wants to know whether the current arithmetic mean is different from 12.44 minutes. A sample with 150 values had an arithmetic mean of 13.71 minutes and a standard deviation of 2.65 minutes. Using α = 5%, can you conclude whether the time is currently different?
Null hypothesis (Ho): The mean has not changed, i.e., μ = 12.44.
Alternate hypothesis (Ha): The mean has changed, i.e., μ ≠ 12.44.
From the table described in the step to choose a statistical test, the signs of the hypotheses Ho and Ha indicate that a two-tailed test should be carried out.
Since the sample size is larger than 30, the normal distribution can be employed.
The next table helps to understand the relation between confidence level, alpha (α), and the critical value z α/2.
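If the table is not at hand, the same relation between confidence level, α, and the critical value z α/2 can be reproduced with scipy (a sketch; scipy is assumed to be available):

```python
import scipy.stats as stats

# Critical values z_alpha/2 for common confidence levels (two-tailed test)
for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence                   # significance level
    z_crit = stats.norm.ppf(1 - alpha / 2)   # upper critical value
    print(f"confidence = {confidence:.0%}, alpha = {alpha:.2f}, "
          f"z_alpha/2 = {z_crit:.3f}")
```

For a 95% confidence level this yields the familiar z α/2 ≈ 1.960 used below.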
To compute the test statistic, it is necessary to convert the observed sample mean (x̄) to the scale of a standard normal distribution (Zobs). This can be done using the following equation:
zobs = (x̄ - μ)/(s/(n^0.5))
This equation will result in the following numbers:
zobs = (13.71 - 12.44)/(2.65/(150^0.5)) = (13.71-12.44)/0.2164 = 5.87
Since Zobs = 5.87 is higher than the upper critical value z α/2 = 1.96, we can reject the null hypothesis.
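The manual computation above can be reproduced with a short Python sketch (the numbers are the ones from the service-time example):

```python
import math
import scipy.stats as stats

# Sample statistics from the service-time example
x_bar = 13.71   # sample mean (minutes)
mu0   = 12.44   # hypothesized population mean (minutes)
s     = 2.65    # sample standard deviation (minutes)
n     = 150     # sample size

# One-sample z statistic: (x_bar - mu0) / (s / sqrt(n))
z_obs = (x_bar - mu0) / (s / math.sqrt(n))
z_crit = stats.norm.ppf(1 - 0.05 / 2)   # two-tailed, alpha = 5%

print(f"z_obs = {z_obs:.2f}, critical value = ±{z_crit:.2f}")
if abs(z_obs) > z_crit:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```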
When to use a Z-test
The following conditions must hold to apply a Z-test:
The sample size should be equal to or greater than 30. Otherwise, we should use the t-test.
Samples should be drawn at random from the population.
The standard deviation of the population should be known.
Samples that are drawn from the population should be independent of each other.
The data should be normally distributed; however, for a large sample size, the sampling distribution of the mean is approximately normal because of the central limit theorem.
Types of Z-test
Assuming that [1]:
Null Hypothesis: The null hypothesis is a statement that the value of a population parameter (such as proportion, mean, or standard deviation) is equal to some claimed value. We either reject or fail to reject the null hypothesis. The null hypothesis is denoted by H0.
Alternate Hypothesis: The alternative hypothesis is the statement that the parameter has a value that is different from the claimed value. It is denoted by HA.
Level of significance: It is the threshold of evidence at which we reject the null hypothesis. Since in most experiments 100% certainty is not possible for accepting or rejecting a hypothesis, we therefore select a level of significance. It is denoted by alpha (α).
Then:
Two-tailed Z-test: The region of rejection is located at both extremes of the distribution. Here our null hypothesis is that the population mean is equal to the claimed value.
Left-tailed Z-Test: The region of rejection is located at the extreme left of the distribution. Here our null hypothesis is that the population mean is greater than or equal to the claimed value.
Right-tailed Z-Test: The region of rejection is located at the extreme right of the distribution. Here our null hypothesis is that the population mean is less than or equal to the claimed value.
[Figure: rejection regions for the two-tailed, right-tailed, and left-tailed Z-tests]
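The three rejection regions correspond to different critical values; a sketch at α = 5% using scipy:

```python
import scipy.stats as stats

alpha = 0.05

# Two-tailed: reject if |z_obs| > z_{alpha/2}
z_two = stats.norm.ppf(1 - alpha / 2)   # ≈ 1.960

# Right-tailed: reject if z_obs > z_{alpha}
z_right = stats.norm.ppf(1 - alpha)     # ≈ 1.645

# Left-tailed: reject if z_obs < -z_{alpha}
z_left = stats.norm.ppf(alpha)          # ≈ -1.645

print(f"two-tailed:   reject if |z| > {z_two:.3f}")
print(f"right-tailed: reject if z > {z_right:.3f}")
print(f"left-tailed:  reject if z < {z_left:.3f}")
```

Note that for the same α, a one-tailed test has a less extreme critical value than a two-tailed test, because all of α is concentrated in a single tail.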
Two-sampled Z-test
Suppose we are provided with two normally distributed and independent populations, and we have drawn samples at random from both. Here, we consider μ1 and μ2 to be the population means, and x̄1 and x̄2 to be the observed sample means. Our null hypothesis could be:
Null hypothesis (Ho): There is no difference between the means, i.e., μ1 - μ2 = 0.
Alternate hypothesis (Ha): The means are different, i.e., μ1 - μ2 ≠ 0.
And the formula for calculating the z-test score zobs is given by:
zobs = (x̄1 – x̄2) / √((σ1)²/n1 + (σ2)²/n2)
where σ1 and σ2 are the standard deviations and n1 and n2 are the sample sizes of the populations whose means are μ1 and μ2, respectively.
A numerical example of a two-sample Z-test
A company wanted to compare the performance of its factory employees in two different factories located in two different parts of a country – Factory 1 and Factory 2 – in terms of the number of products manufactured in a day. The company randomly selected 30 employees from Factory 1 and 30 employees from Factory 2. The following data was collected [2]:
Step 1: Pose the research question and determine the proper statistical test.
The company wants to determine whether the performance of the employees in Factory 1 is different from the performance of the employees in Factory 2. To do this, we will use a two-sample z-test for means.
Step 2: Obtain the samples statistics from the two factories (populations).
Factory 1: x̄1 = 750, σ1 = 20.
Factory 2: x̄2 = 780, σ2 = 25.
Step 3: Formulate the null and alternate hypotheses and set the level of significance for the test.
Null hypothesis (Ho): There is no difference between the performance of employees at different Factories. There is no difference between the means, i.e., μ1 - μ2 = 0.
Alternate hypothesis (Ha): There is a difference in the performance of the employees, i.e., μ1 - μ2 ≠ 0.
We will perform a two-tailed test using a significance level α = 5%.
Step 4: Use the formula for two-sample z-test for means to calculate the z-test statistic zobs.
zobs = (x̄1 – x̄2 ) / √((σ1 )²/n1 + (σ2)²/n2)
zobs = (750 – 780) / √((20)²/30 + (25)²/30) = -30 / 5.845
zobs = -5.13
Step 5: Compare zobs with the critical value z α/2 from the following table:
Step 6: Conclude. Since zobs = -5.13 is lower than the lower critical value -z α/2 = -1.96, we can reject the null hypothesis.
A numerical example of a two-sample Z-test in Python
The next Python code is useful to automate the manual computation made previously [3, 4].
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
# Step 2: Sample statistics
x1 = 750 # Sample mean of Factory 1
sigma1 = 20 # Population standard deviation of Factory 1
n1 = 30 # Sample size of Factory 1
x2 = 780 # Sample mean of Factory 2
sigma2 = 25 # Population standard deviation of Factory 2
n2 = 30 # Sample size of Factory 2
# Step 4: Calculate the z-test statistic
z_obs = (x1 - x2) / np.sqrt((sigma1 ** 2) / n1 + (sigma2 ** 2) / n2)
print(f"Zobs: {z_obs:.3f}")
# Step 5: Critical values for a two-tailed test at alpha = 0.05
alpha = 0.05
z_critical = stats.norm.ppf(1 - alpha/2)
print(f"Critical value (z_alpha/2): ±{z_critical:.3f}")
# Step 6: Conclusion
if abs(z_obs) > z_critical:
    conclusion = "Reject the null hypothesis"
else:
    conclusion = "Fail to reject the null hypothesis"
print(conclusion)
# Plotting the results
x = np.linspace(-4, 4, 1000)
y = stats.norm.pdf(x)
plt.figure(figsize=(10, 6))
plt.plot(x, y, label='Standard Normal Distribution')
# Critical regions
plt.fill_between(x, y, where=(x < -z_critical) | (x > z_critical), color='red', alpha=0.3, label='Critical regions')
# Zobs line
plt.axvline(z_obs, color='blue', linestyle='--', label=f'Zobs = {z_obs:.3f}')
plt.text(z_obs + 0.1, max(y)*0.5, f'Zobs = {z_obs:.3f}', color='blue', ha='left')
# Critical region text
plt.text(-z_critical, max(y)*0.1, f'Critical region: {z_critical:.3f}', color='red', ha='center')
plt.text(z_critical, max(y)*0.1, f'Critical region: {z_critical:.3f}', color='red', ha='center')
# Formatting the plot
plt.title('Two-Tailed Z-Test')
plt.xlabel('Z-value')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.show()
Zobs: -5.132
Critical value (z_alpha/2): ±1.960
Reject the null hypothesis
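An equivalent conclusion can be reached by computing a p-value instead of comparing zobs with a critical value (a sketch reusing the zobs = -5.132 obtained above):

```python
import scipy.stats as stats

z_obs = -5.132   # z statistic from the two-sample factory example
alpha = 0.05

# Two-tailed p-value: probability of observing a |z| at least this extreme
# under the null hypothesis; sf is the survival function, 1 - cdf
p_value = 2 * stats.norm.sf(abs(z_obs))

print(f"p-value = {p_value:.2e}")
if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

The decision rule p-value < α always agrees with the critical-value comparison for the same α and tail configuration.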
The Python code, with the data and the detailed computation, to employ the two-sample Z-test to verify whether the two samples have the same mean is given at:
https://colab.research.google.com/drive/1_nR2eMhvXxVY-HCVDjpRiQ2-2SV7TeqT?usp=sharing