Welcome to Foundation of Data Science Laboratory
Assignment no. 4. Practical Assignments on Statistical / Algorithmic Data Modeling
Objective:
To develop skills in statistical data modeling, hypothesis testing, classification and regression
algorithms, model evaluation techniques, and hands-on exercises with the scikit-learn library.
4.1: Hypothesis Testing and Probability Distributions
2. Probability Distributions:
o Visualize the probability distribution of a dataset using histograms and probability density functions (PDFs).
The following Python program visualizes the probability distribution of a dataset by drawing a histogram and overlaying a kernel density estimate (KDE) of the probability density function (PDF).
Program:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Step 1: Generate or provide sample data
data = np.random.normal(loc=50, scale=10, size=1000) # Example: 1000 data points with mean 50 and standard deviation 10
# Step 2: Create a histogram and overlay a KDE plot
sns.histplot(data, bins=30, kde=True, color='purple')
# Step 3: Label the graph
plt.title("Probability Distribution of Sample Data")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
Output:
The output is a single plot with two elements:
Histogram: The bars represent how frequently each range of values appears in the dataset.
KDE Plot (Smooth Curve): The curve overlays the histogram and shows a smoothed estimate of the probability density function (PDF), indicating where the data values are most concentrated. Because sns.histplot plots counts by default, the curve is scaled to match the bar heights rather than to enclose an area of 1.
When you run this program, you will see a smooth bell-shaped curve overlaid on a histogram. This shape is typical of data that follows a normal distribution: the peak sits near the mean (about 50 here), and the width of the bell reflects the standard deviation (about 10 here).
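The link between the bell shape and the mean and standard deviation can also be checked numerically. The sketch below is an illustration rather than part of the assignment program: it assumes a fixed seed (0) so the run is repeatable, and the helper name fraction_within is hypothetical. It computes the sample mean and standard deviation and the share of points lying within 1, 2, and 3 standard deviations of the mean, which for normal data should be roughly 68%, 95%, and 99.7%.

```python
import numpy as np

# Assumption: a fixed seed is used here only so the numbers are repeatable.
rng = np.random.default_rng(seed=0)
data = rng.normal(loc=50, scale=10, size=1000)

# Sample estimates of the centre and spread of the distribution
sample_mean = data.mean()
sample_std = data.std(ddof=1)  # ddof=1 gives the sample standard deviation


def fraction_within(values, center, spread, k):
    """Return the fraction of values within k * spread of center."""
    return np.mean(np.abs(values - center) <= k * spread)


within_1 = fraction_within(data, sample_mean, sample_std, 1)
within_2 = fraction_within(data, sample_mean, sample_std, 2)
within_3 = fraction_within(data, sample_mean, sample_std, 3)

print(f"Sample mean: {sample_mean:.2f}")
print(f"Sample standard deviation: {sample_std:.2f}")
print(f"Within 1, 2, 3 standard deviations: "
      f"{within_1:.3f}, {within_2:.3f}, {within_3:.3f}")
```

The printed values will be close to, but not exactly, the theoretical ones, because they are estimated from a finite random sample.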
Explanation:
Generate Sample Data:
np.random.normal(loc=50, scale=10, size=1000): This generates 1000 data points from a normal distribution with a mean of 50 and a standard deviation of 10. Because no random seed is set, the exact values differ on each run. You can replace this with your own dataset.
Histogram and KDE Plot:
sns.histplot(data, bins=30, kde=True, color='purple'): This line creates a histogram with 30 bins and overlays a Kernel Density Estimate (KDE) curve, which is a smooth estimate of the probability density function (PDF) computed from the data.
Labeling and Display:
Titles and axis labels are added for better understanding, and the plot is displayed using plt.show().
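As a complement to the explanation above, the histogram can be normalized so that its bars show density instead of counts, which allows the theoretical normal PDF to be drawn on the same scale. The following is a minimal sketch under stated assumptions, not the assignment's own program: it uses matplotlib directly instead of seaborn, the seed (42) is arbitrary, and the function names normal_pdf and plot_hist_with_pdf are hypothetical helpers introduced for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))


def plot_hist_with_pdf(data, mu, sigma, bins=30):
    """Draw a density-normalized histogram and overlay the N(mu, sigma^2) PDF."""
    fig, ax = plt.subplots()
    # density=True rescales the bar heights so the total bar area equals 1
    heights, edges, _ = ax.hist(data, bins=bins, density=True,
                                color="purple", alpha=0.5,
                                label="Histogram (density)")
    # Evaluate the theoretical PDF over the range covered by the histogram
    x = np.linspace(edges[0], edges[-1], 200)
    ax.plot(x, normal_pdf(x, mu, sigma), color="black",
            label="Theoretical normal PDF")
    ax.set_title("Density Histogram with Theoretical PDF")
    ax.set_xlabel("Value")
    ax.set_ylabel("Density")
    ax.legend()
    return ax, heights, edges


# Assumption: a fixed seed is used only to make the example repeatable.
rng = np.random.default_rng(seed=42)
data = rng.normal(loc=50, scale=10, size=1000)

ax, heights, edges = plot_hist_with_pdf(data, mu=50, sigma=10)

# The bar areas (height * width) of a density histogram sum to 1
total_area = np.sum(heights * np.diff(edges))
print(f"Total area under the histogram: {total_area:.3f}")
```

Call plt.show() afterwards to display the figure. Here the y-axis is labeled "Density" because the bar heights are no longer counts; the printed area should equal 1 up to floating-point rounding.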