This project applies key statistical analysis techniques using Python. The goal is to explore a dataset, fit a known distribution, and estimate its parameters using different methods.
Histogram for the Data: Visualize the distribution of the data to see its shape (center, spread, and skew).
Fit a Known Distribution: Model the data using a Normal distribution.
Parameter Estimation: Calculate estimates using:
Method of Moments (MoM)
Maximum Likelihood Estimation (MLE)
Bootstrap Confidence Intervals: Form 95% confidence intervals using the bootstrap method to quantify the uncertainty in the parameter estimates.
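The first two steps can be sketched as follows. This is a minimal illustration, not the project's actual code: it uses only NumPy, the function names `normal_pdf` and `histogram_vs_normal` are hypothetical, and the sample is a synthetic stand-in rather than the Iris measurements. It compares the histogram density in each bin with the density of a Normal distribution fitted to the same data.

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    """Density of a Normal(mu, sigma^2) distribution evaluated at x."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))


def histogram_vs_normal(data, bins=10):
    """Return bin centres, histogram density, and fitted Normal density.

    Hypothetical helper for illustration only.
    """
    data = np.asarray(data, dtype=float)
    # density=True rescales the counts so the bars integrate to 1,
    # which makes them directly comparable with a probability density.
    density, edges = np.histogram(data, bins=bins, density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    # Fit the Normal with the sample mean and the divide-by-n standard deviation.
    mu, sigma = data.mean(), data.std()
    return centres, density, normal_pdf(centres, mu, sigma)


# Synthetic stand-in sample (assumption: NOT the Iris data).
rng = np.random.default_rng(0)
sample = rng.normal(loc=5.8, scale=0.8, size=150)

centres, density, fitted = histogram_vs_normal(sample, bins=10)
for c, d, f in zip(centres, density, fitted):
    print(f"centre={c:5.2f}  histogram={d:.3f}  normal={f:.3f}")
```

For a visual check, the same arrays could be passed to a plotting library, for example drawing the histogram with matplotlib's `plt.hist(sample, density=True)` and overlaying the fitted curve with `plt.plot`.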
The Iris dataset contains 150 samples of iris flowers (50 per species) with measurements of sepal length, sepal width, petal length, and petal width across three species: Setosa, Versicolor, and Virginica. This analysis uses the sepal length measurements.
Dataset: Iris_Data
Fitted Normal Distribution
A Normal distribution was fitted to the Sepal Length data. The parameters estimated by the two approaches are:
Method of Moments Estimates (MoM):
Mean: 5.843
Standard Deviation: 0.825
Maximum Likelihood Estimates (MLE):
Mean: 5.843
Standard Deviation: 0.825
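The two sets of estimates above agree because, for a Normal distribution, both methods reduce to the sample mean and the divide-by-n standard deviation. The sketch below illustrates this under stated assumptions: the function names `mom_normal`, `mle_normal`, and `log_likelihood` are hypothetical, the MLE uses the closed-form solution rather than a numerical optimizer, and the sample is a synthetic stand-in, not the Iris data.

```python
import numpy as np


def mom_normal(data):
    """Method of Moments: match the first two sample moments."""
    data = np.asarray(data, dtype=float)
    m1 = np.mean(data)        # first raw moment, E[X]
    m2 = np.mean(data**2)     # second raw moment, E[X^2]
    return m1, np.sqrt(m2 - m1**2)


def mle_normal(data):
    """Maximum Likelihood: closed-form maximizer of the Normal log-likelihood."""
    data = np.asarray(data, dtype=float)
    mu = np.mean(data)
    sigma = np.sqrt(np.mean((data - mu) ** 2))  # divides by n, not n - 1
    return mu, sigma


def log_likelihood(data, mu, sigma):
    """Normal log-likelihood of the sample at (mu, sigma)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    return (-0.5 * n * np.log(2.0 * np.pi * sigma**2)
            - np.sum((data - mu) ** 2) / (2.0 * sigma**2))


# Synthetic stand-in sample (assumption: NOT the Iris data).
rng = np.random.default_rng(0)
sample = rng.normal(loc=5.8, scale=0.8, size=150)

mom = mom_normal(sample)
mle = mle_normal(sample)
print(f"MoM: mean={mom[0]:.3f}, std={mom[1]:.3f}")
print(f"MLE: mean={mle[0]:.3f}, std={mle[1]:.3f}")
```

Note that the divide-by-n standard deviation is slightly smaller than the usual sample standard deviation, which divides by n - 1; with 150 observations the difference is small.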
Bootstrap Confidence Intervals
Using the Bootstrap Method (with 1000 resamples), approximate 95% confidence intervals were generated for both parameters.
Mean: [5.715, 5.969]
Standard Deviation: [0.760, 0.913]
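Intervals of this kind can be produced with a percentile bootstrap, sketched below. The source does not say which bootstrap variant was used, so the percentile method is an assumption here, as are the function name `bootstrap_ci`, the fixed seed, and the synthetic stand-in sample (not the Iris data).

```python
import numpy as np


def bootstrap_ci(data, statistic, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for `statistic`.

    Hypothetical helper: resamples the data with replacement, recomputes
    the statistic each time, and returns the central `level` share of the
    resulting estimates.
    """
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    n = data.size
    estimates = np.empty(n_resamples)
    for i in range(n_resamples):
        # Each resample has the same size as the original sample.
        resample = rng.choice(data, size=n, replace=True)
        estimates[i] = statistic(resample)
    tail = 100.0 * (1.0 - level) / 2.0  # 2.5 for a 95% interval
    lower, upper = np.percentile(estimates, [tail, 100.0 - tail])
    return lower, upper


# Synthetic stand-in sample (assumption: NOT the Iris data).
rng = np.random.default_rng(0)
sample = rng.normal(loc=5.8, scale=0.8, size=150)

mean_ci = bootstrap_ci(sample, np.mean)
std_ci = bootstrap_ci(sample, np.std)  # np.std divides by n by default
print(f"95% CI for the mean: [{mean_ci[0]:.3f}, {mean_ci[1]:.3f}]")
print(f"95% CI for the std:  [{std_ci[0]:.3f}, {std_ci[1]:.3f}]")
```

Because the resampling is random, the interval endpoints shift slightly from run to run unless the seed is fixed; more resamples reduce that variation.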
The Normal distribution provides a good fit for the Sepal Length data.
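MoM and MLE gave identical parameter estimates, as expected: for a Normal distribution both reduce to the sample mean and the standard deviation computed with a divisor of n.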
The 95% bootstrap intervals quantify the uncertainty in the estimates. Both are fairly narrow (about ±0.13 around the mean, and about 0.15 wide for the standard deviation), which suggests the estimates are stable for a sample of 150.
Overall, the analysis supports using the Normal distribution as a model for Sepal Length in this dataset.
This project fitted a Normal distribution to the Sepal Length data from the Iris dataset using Python. A histogram was used to inspect the shape of the distribution. The parameters were estimated with the Method of Moments (MoM) and Maximum Likelihood Estimation (MLE), both yielding a mean of 5.843 and a standard deviation of 0.825. The bootstrap method with 1000 resamples was then used to form 95% confidence intervals: [5.715, 5.969] for the mean and [0.760, 0.913] for the standard deviation. The analysis concluded that the Normal distribution provides a good fit for the data, with the narrow confidence intervals indicating stable parameter estimates.