Welcome to Foundation of Data Science Laboratory
Assignment no. 4. Practical Assignments on Statistical / Algorithmic Data Modeling
Objective:
To develop skills in statistical data modeling, hypothesis testing, classification and regression
algorithms, model evaluation techniques, and hands-on exercises with the scikit-learn library.
4.1: Hypothesis Testing and Probability Distributions
1. Hypothesis Testing:
o Perform a hypothesis test to determine if the mean of a sample differs significantly from a known population mean.
We use a one-sample t-test, which compares the sample mean against the known population mean while accounting for the sample's variability and size.
Program:
import numpy as np
from scipy.stats import ttest_1samp
import matplotlib.pyplot as plt
import seaborn as sns
# Sample data (e.g., heights of a sample of people in cm)
sample_data = [170, 168, 174, 171, 169, 167, 172, 176, 173, 175]
# Known population mean (e.g., average height of people in the population)
population_mean = 170
# Step 1: Perform t-test
t_stat, p_value = ttest_1samp(sample_data, population_mean)
# Step 2: Print results
print(f"T-Statistic: {t_stat:.2f}")
print(f"P-Value: {p_value:.4f}")
# Step 3: Conclusion based on p-value
alpha = 0.05 # Significance level
if p_value < alpha:
    print("Reject the null hypothesis (H₀): There is a significant difference between the sample mean and the population mean.")
else:
    print("Fail to reject the null hypothesis (H₀): There is no significant difference between the sample mean and the population mean.")
# Step 4: Visualize the sample data distribution
sns.histplot(sample_data, kde=True, color='skyblue', bins=5)
plt.title("Sample Data Distribution")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
Output:
T-Statistic: 1.57
P-Value: 0.1516
Fail to reject the null hypothesis (H₀): There is no significant difference between the sample mean and the population mean.
Explanation:
Import Required Libraries:
numpy: For handling numerical operations (imported for convenience; this program does not call it directly).
scipy.stats: For performing the t-test.
matplotlib.pyplot and seaborn: For visualizing data.
Sample Data and Known Population Mean:
The sample data represents a collection of measurements (e.g., heights).
population_mean is the known average of the population.
Perform t-Test:
ttest_1samp(sample_data, population_mean) tests the null hypothesis (H₀) that the mean of the population the sample was drawn from equals population_mean.
It returns a t-statistic and a p-value; by default the test is two-sided.
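To see where the t-statistic comes from, the calculation can be reproduced by hand: t = (sample mean - population mean) / (sample standard deviation / sqrt(n)). The sketch below uses only the Python standard library and the same sample data; the variable names (sample_std, standard_error, t_manual) are illustrative and not part of the original program.

```python
import math
import statistics

# Same data as in the program above
sample_data = [170, 168, 174, 171, 169, 167, 172, 176, 173, 175]
population_mean = 170

n = len(sample_data)
sample_mean = statistics.mean(sample_data)

# statistics.stdev uses the n - 1 denominator (sample standard deviation)
sample_std = statistics.stdev(sample_data)

# Standard error of the mean
standard_error = sample_std / math.sqrt(n)

# One-sample t-statistic and its degrees of freedom
t_manual = (sample_mean - population_mean) / standard_error
degrees_of_freedom = n - 1

print(f"Sample mean: {sample_mean:.2f}")
print(f"Standard error: {standard_error:.4f}")
print(f"t-statistic: {t_manual:.2f} (df = {degrees_of_freedom})")
```

The value of t_manual should agree with the t-statistic returned by ttest_1samp for the same data.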
Print Results and Conclusion:
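The p-value is the probability of observing a t-statistic at least this extreme if the null hypothesis (H₀) were true. If p_value < alpha, we reject H₀ and conclude there is a significant difference; otherwise we fail to reject H₀, which means the data do not provide enough evidence of a difference (it does not prove the means are equal).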
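The decision rule can also be wrapped in a small reusable function. This is a minimal sketch: interpret_p_value is a hypothetical helper name, and the p-values passed to it below are made-up illustrative numbers, not results from the sample data.

```python
def interpret_p_value(p_value, alpha=0.05):
    """Return the test decision for a p-value at significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value < alpha:
        return "Reject H0"
    return "Fail to reject H0"


# Illustrative p-values only
print(interpret_p_value(0.03))              # Reject H0
print(interpret_p_value(0.20))              # Fail to reject H0
print(interpret_p_value(0.03, alpha=0.01))  # Fail to reject H0
```

Keeping alpha as a parameter makes it explicit that the conclusion depends on the chosen significance level, not only on the data.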
Visualize Sample Data Distribution:
A histogram with a kernel density estimate (KDE) is plotted to show how the data is distributed.