Since our dataset is small (35 rows) and the class attribute has 6 distinct values, a conventional split into training and test sets would leave too few examples for each class. We therefore resample the dataset. Resampling consists of drawing repeated samples from the original data. It is a nonparametric method of statistical inference: instead of deriving a sampling distribution analytically, it builds one empirically from the actual data. Because every example has the same chance of being drawn, the samples are not biased toward any particular part of the data.
Bootstrapping is one of the most widely used resampling methods. It draws repeated samples, with replacement, from the original data sample in order to calculate the test statistic. We therefore resample the data in RapidMiner using bootstrap sampling.
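The idea of building a sampling distribution empirically can be illustrated with a minimal Python sketch. It is not part of the RapidMiner process: the `scores` list is made-up data standing in for one numeric column of 35 rows, and the function name `bootstrap_means` is only illustrative.

```python
import random
import statistics


def bootstrap_means(data, n_resamples=1000, seed=0):
    """Return the mean of each bootstrap resample of `data`."""
    rng = random.Random(seed)
    n = len(data)
    means = []
    for _ in range(n_resamples):
        # Draw n values with replacement from the original sample.
        resample = rng.choices(data, k=n)
        means.append(statistics.fmean(resample))
    return means


# Hypothetical values standing in for one numeric column of 35 rows.
scores = [float(50 + (7 * i) % 41) for i in range(35)]
means = bootstrap_means(scores)

# The spread of the bootstrap means approximates the standard error of the mean.
standard_error = statistics.stdev(means)
print(f"sample mean = {statistics.fmean(scores):.2f}, bootstrap SE = {standard_error:.2f}")
```

Each resample has the same size as the original data, so the collection of means shows how much the statistic would vary from sample to sample without any analytical formula.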
In this step, we use the cleaned StudentEvent dataset, which contains 35 rows and 11 columns. Our objective is to resample the dataset so that the total number of rows increases from 35 to 100.
This is our resampling process in RapidMiner.
First, we need to import the StudentEvent.xlsx file into our local repository.
Drag the StudentEvent dataset from the local repository into a blank process. A Retrieve operator is created automatically.
Add a Select Attributes operator and connect the Retrieve operator to it. Set the parameter to select all the attributes.
Add a Sample (Bootstrapping) operator and connect the Select Attributes operator to it. The Sample (Bootstrapping) operator creates a bootstrapped sample from an ExampleSet. Bootstrapped sampling draws examples with replacement, so the same example can appear more than once in the sample. The size of the sample can be specified on an absolute or a relative basis.
Set the sample parameter to absolute and set the sample size to 100.
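What the Sample (Bootstrapping) operator does at this point can be sketched in plain Python. This is only an illustration under stated assumptions, not RapidMiner's implementation: the function `bootstrap_sample`, its argument names, and the placeholder rows (an `id` and one of 6 class labels) are hypothetical and merely mirror the absolute and relative options described above.

```python
import random


def bootstrap_sample(rows, sample="absolute", sample_size=100, sample_ratio=1.0, seed=42):
    """Draw rows with replacement; the size is a row count or a fraction of the input."""
    if sample == "absolute":
        k = sample_size
    elif sample == "relative":
        k = round(sample_ratio * len(rows))
    else:
        raise ValueError(f"unknown sample mode: {sample!r}")
    rng = random.Random(seed)
    # random.choices samples with replacement, so duplicates are expected.
    return rng.choices(rows, k=k)


# Placeholder data: 35 rows, each with an id and one of 6 class labels.
original = [{"id": i, "label": f"class_{i % 6}"} for i in range(35)]

resampled = bootstrap_sample(original, sample="absolute", sample_size=100)
distinct_ids = {row["id"] for row in resampled}
print(len(resampled), "rows drawn,", len(distinct_ids), "distinct original rows")
```

Because 100 rows are drawn from only 35, every resampled row is a copy of an original row and many originals appear several times. No new information is created; only the row count grows.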
We need to write the Sample output to an Excel file. Because we also want to display the result, we use a Multiply operator to copy the Sample output. Connect the example set output of the Sample operator to the Multiply operator. From the Multiply operator, one output goes to the result port and the other goes to the Write Excel operator.
Choose the file location and save the file as StudentEvent.xlsx.
After resampling, we can see that the total number of rows has increased from 35 to 100.
Choose the Repository option to save the dataset in the local repository.
Save it as StudentEvent_Resample.xlsx.
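The output side of the process, where one copy of the sample is written to a file and another is kept for inspection, can be approximated with the sketch below. Several things here are assumptions made for illustration: CSV stands in for Excel because the Python standard library has no Excel writer, the file name `resampled_rows.csv` and the temporary directory are hypothetical, and the rows are placeholder data rather than the StudentEvent columns.

```python
import csv
import random
import tempfile
from pathlib import Path


def write_rows(rows, path):
    """Write a list of dict rows to a CSV file with a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


# Placeholder data and a bootstrap sample of 100 rows drawn with replacement.
rng = random.Random(42)
original = [{"id": i, "label": f"class_{i % 6}"} for i in range(35)]
resampled = rng.choices(original, k=100)

# Branch 1: write the sample to a file (hypothetical name, temporary directory).
out_path = Path(tempfile.mkdtemp()) / "resampled_rows.csv"
write_rows(resampled, out_path)

# Branch 2: the same rows stay available in memory for display.
print(len(original), "->", len(resampled), "rows")

# Read the file back to confirm the row count on disk.
with open(out_path, newline="") as handle:
    rows_on_disk = list(csv.DictReader(handle))
```

Reading the file back is a simple way to confirm that the saved dataset has the expected 100 rows before it is used in later steps.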