This section walks through the data preparation steps for Principal Component Analysis (PCA) and shows the dataset before and after it is transformed. By the end of this section, you should understand how PCA reduces the dimensionality of data while retaining as much of its variance, and therefore its most important structure, as possible.
In its original form, the dataset consists of three weather-related variables, each measured in different units and on a different scale:
Temperature (in °C)
Humidity (as a fraction, ranging from 0 to 1)
Wind Speed (in meters per second, m/s)
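These variables were chosen because together they describe basic weather conditions and are related to one another in meteorological studies. That relationship is what makes PCA useful here: correlated variables can often be summarized by fewer components. Here's a snapshot of the dataset before applying PCA: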
Observations:
Temperature is measured in degrees Celsius, humidity as a fraction between 0 and 1 (not a percentage), and wind speed in meters per second.
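Because the variables are on different scales, their variances differ widely. PCA looks for the directions of maximum variance, so without standardization the variable with the largest numeric spread (typically temperature, whose values span tens of degrees, compared with humidity's 0 to 1 range) would dominate the principal components regardless of how informative it actually is.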
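To make the scale problem concrete, here is a minimal NumPy sketch. The values in `X` are hypothetical, made up purely for illustration, and are not the dataset described above. The code computes the variance of each raw column and the first principal direction of the unstandardized covariance matrix; with values like these, temperature's much larger variance pulls the first component almost entirely onto the temperature axis.

```python
import numpy as np

# Hypothetical illustrative values (NOT the article's dataset).
# Columns: Temperature (°C), Humidity (fraction 0-1), Wind Speed (m/s)
X = np.array([
    [22.5, 0.65, 3.2],
    [25.1, 0.58, 4.1],
    [18.3, 0.80, 2.5],
    [30.2, 0.45, 5.6],
    [27.8, 0.52, 4.8],
])

# Sample variance of each raw column: these differ by orders of magnitude
variances = X.var(axis=0, ddof=1)
print("raw variances:", np.round(variances, 4))

# PCA on the RAW data: eigendecomposition of the unscaled covariance matrix
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
first_pc = eigvecs[:, -1]               # eigenvector with the largest eigenvalue

# Absolute loadings (eigenvector signs are arbitrary); temperature dominates
print("first PC |loadings|:", np.round(np.abs(first_pc), 3))
```

Rescaling humidity to a percentage would multiply its variance by 10,000 and could change which variable dominates, even though the information it carries is identical. Standardization removes that dependence on the choice of units.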
Standardized Dataset
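Because the original variables are measured on different scales, we standardize them before applying PCA. Standardization (z-score scaling) subtracts each variable's mean and divides by its standard deviation, so that every variable ends up with a mean of 0 and a standard deviation of 1. This matters because PCA is sensitive to the scale of the data: it ranks directions by variance, and variance depends on the units in which a variable is measured.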
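The standardization step can be written as follows, where $x_{ij}$ is the value of feature $j$ in observation $i$ and $n$ is the number of observations (this notation is introduced here for illustration):

```latex
z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
s_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2}
```

For example, with a hypothetical temperature mean of 24 °C and standard deviation of 4 °C, a reading of 30 °C becomes $z = (30 - 24)/4 = 1.5$. The formula above uses the population standard deviation (dividing by $n$), which is what scikit-learn's `StandardScaler` uses; dividing by $n - 1$ instead rescales every z-score by the same constant factor and leaves the principal directions unchanged.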
Here’s a snapshot of the dataset after standardization:
In the standardized table, every feature has a mean of 0 and a standard deviation of 1. As a result, no variable dominates the PCA simply because of its larger range or its units: each feature enters the analysis with equal variance, and the principal components reflect the correlations among the features rather than their original measurement scales.
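The sketch below, again using hypothetical values rather than the article's data, standardizes each column with NumPy, checks that the result has mean 0 and standard deviation 1, and then runs PCA through an eigendecomposition of the standardized data's covariance matrix, keeping two of the three components. Computing PCA with `np.linalg.eigh` is one common approach and is not necessarily how the original analysis was done; scikit-learn's `StandardScaler` and `PCA` classes are an alternative.

```python
import numpy as np

# Hypothetical illustrative values (NOT the article's dataset).
# Columns: Temperature (°C), Humidity (fraction 0-1), Wind Speed (m/s)
X = np.array([
    [22.5, 0.65, 3.2],
    [25.1, 0.58, 4.1],
    [18.3, 0.80, 2.5],
    [30.2, 0.45, 5.6],
    [27.8, 0.52, 4.8],
])

# Z-score standardization using the population std (ddof=0)
mean = X.mean(axis=0)
std = X.std(axis=0)
Z = (X - mean) / std

print("column means:", np.round(Z.mean(axis=0), 6))  # approximately 0
print("column stds: ", np.round(Z.std(axis=0), 6))   # 1 for every column

# Covariance of the standardized data (bias=True divides by n),
# which equals the correlation matrix of the original data
cov = np.cov(Z, rowvar=False, bias=True)
eigvals, eigvecs = np.linalg.eigh(cov)

# Sort components from largest to smallest variance
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Share of total variance captured by each component
explained_ratio = eigvals / eigvals.sum()
print("explained variance ratio:", np.round(explained_ratio, 3))

# Dimensionality reduction: project onto the first two components (3 -> 2)
scores = Z @ eigvecs[:, :2]
print(scores.shape)  # (5, 2)
```

Because every standardized column has unit variance, the diagonal of `cov` is all ones, which is why PCA on standardized variables is often described as PCA on the correlation matrix.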