Data reduction procedures are of vital importance to machine learning and data mining. To avoid excessive storage and time complexity, and to improve generalisation accuracy by avoiding noise and overfitting, it is often advisable to reduce the original training set by selecting the most representative information.
Data reduction is a process that reduces the volume of the original data and represents it in a much smaller form. Data reduction techniques preserve the integrity of the data while reducing its volume.
Data reduction strategies include:
dimensionality reduction,
numerosity reduction, and
data compression.
Effectively handles the curse of dimensionality. This issue states that, with a fixed number of training samples, the average (expected) predictive power of a classifier or regressor first increases as the number of dimensions or features used increases, but beyond a certain dimensionality it starts deteriorating instead of improving steadily.
Removes noise from the data; noise can cause overfitting.
Reduces training time.
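A small numeric sketch (hypothetical, not from the original text) of the curse-of-dimensionality effect described above: with a fixed number of random points, the relative gap between the nearest and farthest distances shrinks as dimensionality grows, which is one reason distance-based learners degrade in high dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_contrast(n_points=200, n_dims=2):
    # Random points in the unit hypercube.
    X = rng.random((n_points, n_dims))
    # Euclidean distances from the first point to all others.
    d = np.linalg.norm(X[1:] - X[0], axis=1)
    # Relative contrast: how much farther the farthest point is than the nearest.
    return (d.max() - d.min()) / d.min()

for dims in (2, 10, 100, 1000):
    print(dims, distance_contrast(n_dims=dims))
```

The printed contrast falls sharply as the number of dimensions grows, illustrating why adding features beyond a point stops helping.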
Dimensionality reduction is the process of reducing the number of random variables or attributes under consideration. Refer to the references below for details.
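As a minimal sketch of dimensionality reduction (an assumed example, not from the original text), the following projects correlated 3-attribute data onto its top two principal components via SVD, keeping most of the variance in fewer attributes:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
# Make the third attribute nearly redundant with the first.
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=100)

Xc = X - X.mean(axis=0)                 # centre each attribute
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_reduced = Xc @ Vt[:2].T               # project onto the 2 strongest directions

explained = (S[:2] ** 2).sum() / (S ** 2).sum()
print(X_reduced.shape)                  # fewer attributes than the original
print(explained)                        # fraction of variance retained
```

Because one attribute is almost a copy of another, two components retain nearly all the variance, so the reduced set loses little predictive information.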
Numerosity reduction techniques replace the original data volume with alternative, smaller forms of data representation.
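One common non-parametric form of numerosity reduction is a histogram; a sketch (assumed example, not from the original text) that summarises a large numeric column with a small set of bin counts:

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(loc=50, scale=10, size=100_000)  # original data volume

# Replace 100,000 raw values with 20 equal-width bin counts.
counts, edges = np.histogram(values, bins=20)

print(values.size)   # 100000 raw numbers
print(counts.size)   # 20 counts (plus 21 bin edges)
print(counts.sum())  # every original value falls into some bin
```

The bin counts and edges approximate the original distribution in a representation thousands of times smaller, at the cost of per-record detail.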
Data compression is applied so as to obtain a reduced or “compressed” representation of the original data. It can be lossless or lossy. Refer to the references below for details.
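A minimal sketch of the lossless case (an assumed example using Python's standard zlib module): the compressed form is much smaller, and decompression recovers the original bytes exactly, so no information is lost.

```python
import zlib

# Highly redundant data compresses well.
data = b"machine learning " * 1000
compressed = zlib.compress(data, 9)      # 9 = maximum compression level

print(len(data), len(compressed))        # compressed form is far smaller
# Lossless: decompression restores the data byte-for-byte.
assert zlib.decompress(compressed) == data
```

A lossy scheme would shrink the data further but could only reconstruct an approximation of the original.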
https://www.sciencedirect.com/topics/computer-science/dimensionality-reduction
https://towardsdatascience.com/the-curse-of-dimensionality-50dc6e49aa1e
https://en.wikipedia.org/wiki/Curse_of_dimensionality#Machine_Learning
https://link.springer.com/chapter/10.1007/978-3-540-69052-8_29
https://binaryterms.com/data-reduction.html
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/role-of-redundant-features-in-machine-learning
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/role-of-data-compression-in-machine-learning-1