In statistics, standardization (sometimes called data normalization or feature scaling) refers to the process of rescaling the values of the variables in a dataset so they share a common scale. Often performed as a pre-processing step, particularly for cluster analysis, standardization is important when working with data where each variable has a different unit (e.g., inches, meters, tons, and kilograms), or where the scales of the variables differ greatly from one another (e.g., 0-1 vs. 0-1000). This matters especially in cluster analysis because groups are defined by the distances between points in mathematical space.
K-Means clustering is sensitive to these distances, so we standardize the data using the Turbo Prep view in RapidMiner.
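To make the idea concrete: standardization (the z-transformation) rescales each variable to mean 0 and standard deviation 1 via z = (x - mean) / standard deviation, so no single variable dominates the distance calculation. The following is a minimal Python sketch of the same operation using hypothetical example values (the column names gpa and event_points are invented for illustration, not taken from the real dataset):

import pandas as pd

# Hypothetical example: two variables on very different scales (0-1 vs. 0-1000).
df = pd.DataFrame({"gpa": [0.2, 0.5, 0.9], "event_points": [120, 540, 980]})

# z-transformation: each column ends up with mean 0 and standard deviation 1,
# so distances between rows are no longer dominated by event_points.
standardized = (df - df.mean()) / df.std()
print(standardized)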
In this step, we use the cleaned StudentEvent dataset, which contains 35 rows and 11 columns.
First, we need to import the StudentEvent.xlsx file into our local repository.
Select the Excel file and choose all columns.
Ignore this option and click Next.
Save the file in the local repository as StudentEvent.
Select all the columns.
Choose Normalization.
Choose Standardization.
Click Apply.
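For comparison, if this step were done in Python rather than in Turbo Prep, scikit-learn's StandardScaler applies the same z-transformation. A sketch, assuming all 11 columns of StudentEvent are numeric and the file sits in the working directory:

import pandas as pd
from sklearn.preprocessing import StandardScaler

# Load the dataset (reading .xlsx files requires the openpyxl package).
df = pd.read_excel("StudentEvent.xlsx")

# Rescale every column to mean 0 and standard deviation 1.
# Note: scikit-learn uses the population standard deviation, so results
# may differ marginally from tools that use the sample standard deviation.
scaler = StandardScaler()
standardized = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
print(standardized.head())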
We export the dataset to Excel because the subsequent analysis will be done in Python (see the sketch after these steps).
Choose the Excel file type.
Select a location and save the file as StudentEvent.xlsx.
Select the Repository option to save the processed file in the local repository.
Save the file as StudentEvent in the local repository.
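Once exported, the standardized file can be loaded back in Python for the clustering analysis. A minimal sketch, assuming the exported StudentEvent.xlsx is in the working directory:

import pandas as pd

# Load the standardized dataset exported from RapidMiner.
df = pd.read_excel("StudentEvent.xlsx")

# Sanity check: after standardization, each column should have
# mean approximately 0 and standard deviation approximately 1.
print(df.describe().loc[["mean", "std"]])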
After standardization, we can see that all the values have changed from those in the cleaned dataset. We can now use this standardized dataset for the K-Means clustering analysis in Python.