Data reduction procedures are of vital importance to machine learning and data mining. To avoid excessive storage and time complexity, and to improve generalisation accuracy by avoiding noise and overfitting, it is often advisable to reduce the original training set by selecting the most representative information.
Data reduction is a process that reduces the volume of the original data and represents it in a much smaller form. Data reduction techniques preserve the integrity of the data while reducing its volume.
Data reduction strategies include:
dimensionality reduction,
numerosity reduction, and
data compression.
Effectively handles the curse of dimensionality. This issue states that, with a fixed number of training samples, the average (expected) predictive power of a classifier or regressor first increases as the number of dimensions or features used increases, but beyond a certain dimensionality it starts deteriorating instead of improving steadily.
Removes noise from the data; noise can cause overfitting.
Reduces training time.
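A small numeric sketch (hypothetical, not from the original text) of the curse-of-dimensionality effect described above: with a fixed number of random points, the relative gap between the nearest and farthest distances shrinks as dimensionality grows, which is one reason distance-based learners degrade in high dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_contrast(n_points=200, n_dims=2):
    # Random points in the unit hypercube.
    X = rng.random((n_points, n_dims))
    # Euclidean distances from the first point to all others.
    d = np.linalg.norm(X[1:] - X[0], axis=1)
    # Relative contrast: how much farther the farthest point is than the nearest.
    return (d.max() - d.min()) / d.min()

for dims in (2, 10, 100, 1000):
    print(dims, distance_contrast(n_dims=dims))
```

The printed contrast falls sharply as the number of dimensions grows, illustrating why adding features beyond a point stops helping.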
Dimensionality reduction is the process of reducing the number of random variables or attributes under consideration. Refer to the references below for details.
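As a minimal sketch of dimensionality reduction (an assumed example, not from the original text), the following projects correlated 3-attribute data onto its top two principal components via SVD, keeping most of the variance in fewer attributes:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
# Make the third attribute nearly redundant with the first.
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=100)

Xc = X - X.mean(axis=0)                 # centre each attribute
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_reduced = Xc @ Vt[:2].T               # project onto the 2 strongest directions

explained = (S[:2] ** 2).sum() / (S ** 2).sum()
print(X_reduced.shape)                  # fewer attributes than the original
print(explained)                        # fraction of variance retained
```

Because one attribute is almost a copy of another, two components retain nearly all the variance, so the reduced set loses little predictive information.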
Numerosity reduction techniques replace the original data volume with alternative, smaller forms of data representation.
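One common non-parametric form of numerosity reduction is a histogram; a sketch (assumed example, not from the original text) that summarises a large numeric column with a small set of bin counts:

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(loc=50, scale=10, size=100_000)  # original data volume

# Replace 100,000 raw values with 20 equal-width bin counts.
counts, edges = np.histogram(values, bins=20)

print(values.size)   # 100000 raw numbers
print(counts.size)   # 20 counts (plus 21 bin edges)
print(counts.sum())  # every original value falls into some bin
```

The bin counts and edges approximate the original distribution in a representation thousands of times smaller, at the cost of per-record detail.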
Data compression is applied so as to obtain a reduced or “compressed” representation of the original data. It can be lossless or lossy. Refer to the references below for details.
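A minimal sketch of the lossless case (an assumed example using Python's standard zlib module): the compressed form is much smaller, and decompression recovers the original bytes exactly, so no information is lost.

```python
import zlib

# Highly redundant data compresses well.
data = b"machine learning " * 1000
compressed = zlib.compress(data, 9)      # 9 = maximum compression level

print(len(data), len(compressed))        # compressed form is far smaller
# Lossless: decompression restores the data byte-for-byte.
assert zlib.decompress(compressed) == data
```

A lossy scheme would shrink the data further but could only reconstruct an approximation of the original.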
https://www.sciencedirect.com/topics/computer-science/dimensionality-reduction
https://towardsdatascience.com/the-curse-of-dimensionality-50dc6e49aa1e
https://en.wikipedia.org/wiki/Curse_of_dimensionality#Machine_Learning
https://link.springer.com/chapter/10.1007/978-3-540-69052-8_29
https://binaryterms.com/data-reduction.html
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/role-of-redundant-features-in-machine-learning
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/role-of-data-compression-in-machine-learning-1