It is impossible to estimate precisely the minimum amount of data an AI project requires. The nature of the project strongly influences how much data you will need. For example, projects that work with text, images, or video usually require more data.
In a real-world setting, you often only have a small dataset to work with. Models trained on a small number of observations tend to overfit and produce inaccurate results.
The dataset should be large enough to represent the population it is drawn from. In other words, the sample should not be biased. For more on bias, see the bias article in the link list at the end of this page.
As a rule of thumb, at least 10 samples per variable are needed. Refer here for more detail.
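The rule of thumb above is simple arithmetic, and it can be written as a quick sanity check. This is a minimal sketch; the function names and the example numbers are made up for illustration, and the factor of 10 is only the heuristic stated above, not a guarantee.

```python
def min_samples_rule_of_thumb(n_variables: int, samples_per_variable: int = 10) -> int:
    """Rough lower bound on dataset size: samples_per_variable * n_variables."""
    if n_variables < 1:
        raise ValueError("n_variables must be at least 1")
    return n_variables * samples_per_variable


def has_enough_samples(n_samples: int, n_variables: int) -> bool:
    """True if the dataset meets the 10-samples-per-variable heuristic."""
    return n_samples >= min_samples_rule_of_thumb(n_variables)


# Hypothetical project with 12 input variables and 80 labelled rows.
print(min_samples_rule_of_thumb(12))  # 120
print(has_enough_samples(80, 12))     # False
```

A dataset that fails this check is not necessarily unusable, but it is a signal to consider the techniques listed on this page.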
Generalisation issue
Underfitting
Less data for the validation and test phases
Augmentation using real data modification
Artificial data synthesis
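The two ideas above can be sketched with plain Python lists. The helper names, the toy image, and the noise scale are assumptions for illustration only: flipping and jittering stand in for "modifying real data", and interpolating between two samples of the same class stands in for a very simple form of synthesis. Real projects would normally pick transformations that are known to preserve the label.

```python
import random


def flip_horizontal(image):
    """Augmentation: mirror a 2D image (list of rows) left to right."""
    return [list(reversed(row)) for row in image]


def add_jitter(sample, scale, rng):
    """Augmentation: copy a numeric sample and add small Gaussian noise."""
    return [x + rng.gauss(0.0, scale) for x in sample]


def interpolate(a, b, t):
    """Synthesis: create a point on the segment between two same-class samples."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

image = [[0, 1, 2], [3, 4, 5]]
flipped = flip_horizontal(image)

sample_a = [1.0, 2.0]
sample_b = [3.0, 6.0]  # assumed to have the same label as sample_a
jittered = add_jitter(sample_a, scale=0.05, rng=rng)
synthetic = interpolate(sample_a, sample_b, t=0.5)

print(flipped)    # [[2, 1, 0], [5, 4, 3]]
print(synthetic)  # [2.0, 4.0]
```

Each helper returns a new object, so the original samples stay available alongside the augmented ones.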
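The k-fold cross-validation approach works well without needing a separate validation set, because every sample is used for both training and validation across the folds. For details, see the cross-validation article in the link list at the end of this page.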
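To make the k-fold idea concrete, here is a minimal standard-library sketch. The toy data, the threshold "model", and all function names are assumptions for illustration; in practice a library such as scikit-learn (for example its `KFold` and `cross_val_score`) would usually be used instead.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx); each sample lands in validation exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, folds[i]


def cross_validate(xs, ys, k, fit, score):
    """Train on k-1 folds, score on the held-out fold, repeat k times."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return scores


def fit_threshold(xs, ys):
    """Toy model: threshold halfway between the two class means."""
    mean0 = mean(x for x, y in zip(xs, ys) if y == 0)
    mean1 = mean(x for x, y in zip(xs, ys) if y == 1)
    return (mean0 + mean1) / 2.0


def accuracy(threshold, xs, ys):
    """Fraction of samples on the correct side of the threshold."""
    return mean(1.0 if (x > threshold) == (y == 1) else 0.0 for x, y in zip(xs, ys))


# Tiny, well-separated toy dataset with one feature.
xs = [0.1, 0.4, 0.5, 0.9, 1.1, 2.0, 2.2, 2.5, 2.9, 3.1]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

scores = cross_validate(xs, ys, k=5, fit=fit_threshold, score=accuracy)
print(scores, mean(scores))
```

The mean of the per-fold scores is the usual summary; the spread across folds gives a rough sense of how unstable the estimate is on a small dataset.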
With a fixed number of training samples, the average (expected) predictive power of a classifier or regressor first increases as the number of dimensions or features grows, but beyond a certain dimensionality it starts to deteriorate instead of improving steadily. This is an aspect of the curse of dimensionality, also known as the Hughes (or peaking) phenomenon.
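A small experiment can illustrate the deteriorating part of this effect. The sketch below is an assumption-laden toy, not a proof: it builds a synthetic two-class dataset with one informative feature, pads it with a growing number of pure-noise features, and measures the leave-one-out accuracy of a 1-nearest-neighbour classifier. All names, sizes, and distributions are made up for illustration, and the exact numbers depend on the random seed.

```python
import random


def make_dataset(n_per_class, n_noise, rng):
    """One informative feature (class means 0 and 3) plus n_noise noise features."""
    data = []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = rng.gauss(3.0 * label, 1.0)
            noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            data.append(([informative] + noise, label))
    return data


def loo_accuracy_1nn(data):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i, (xi, yi) in enumerate(data):
        best_dist, best_label = float("inf"), None
        for j, (xj, yj) in enumerate(data):
            if i == j:
                continue
            dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
            if dist < best_dist:
                best_dist, best_label = dist, yj
        correct += best_label == yi
    return correct / len(data)


rng = random.Random(42)
results = {}
for n_noise in (0, 5, 50, 200):
    # The sample count stays fixed at 40 while the dimensionality grows.
    results[n_noise] = loo_accuracy_1nn(make_dataset(20, n_noise, rng))
    print(f"{n_noise:>3} noise features -> accuracy {results[n_noise]:.2f}")
```

With the sample count held fixed, accuracy tends to fall as noise dimensions are added, because distances become dominated by uninformative features. This toy does not show the initial rise described above, since every added feature here is noise.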
This is about combining existing features into a new, more useful feature that can carry more importance in the model, so that the model has more appropriate features to train on. (To verify: whether this actually helps.)
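As a hedged illustration of combining existing features, the sketch below derives a body-mass-index column from weight and height. The field names `weight_kg`, `height_m`, and `bmi`, and the records themselves, are hypothetical; the point is only that one derived feature can express a relationship the model would otherwise have to learn from very few samples.

```python
def add_bmi(record):
    """Return a copy of the record with a derived 'bmi' feature added."""
    enriched = dict(record)
    enriched["bmi"] = record["weight_kg"] / record["height_m"] ** 2
    return enriched


# Hypothetical raw records with two existing features.
records = [
    {"weight_kg": 80.0, "height_m": 2.0},
    {"weight_kg": 50.0, "height_m": 1.0},
]

engineered = [add_bmi(r) for r in records]
print(engineered[0]["bmi"])  # 20.0
print(engineered[1]["bmi"])  # 50.0
```

Whether a derived feature helps depends on the problem, so it is worth comparing validation scores with and without it.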
When using a small dataset, outliers and noise can have a huge impact on the model. Noise in a small dataset can cause overfitting, because there are too few clean observations to average it out. So, when working with scarce data, you will need to identify and remove outliers and noise.
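One common way to flag outliers in a single numeric feature is the 1.5 x IQR rule. The text above does not prescribe a method, so this choice, the function name, and the sample values are assumptions for illustration.

```python
from statistics import quantiles


def remove_outliers_iqr(values, factor=1.5):
    """Keep values inside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if lower <= v <= upper]


# Hypothetical measurements with one obviously extreme reading.
readings = [10, 12, 11, 13, 12, 11, 95, 12, 10, 13]
cleaned = remove_outliers_iqr(readings)
print(cleaned)  # [10, 12, 11, 13, 12, 11, 12, 10, 13]
```

On a small dataset every removed row is costly, so it is worth checking whether a flagged value is a measurement error or a rare but genuine observation before dropping it.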
Understand how it helps (refer to https://arxiv.org/abs/1207.0580, the paper that introduced dropout).
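The paper linked above (arXiv:1207.0580) introduced dropout, which randomly omits hidden units during training so that units cannot rely on specific other units being present. The sketch below uses the "inverted dropout" variant that is common in current libraries: surviving activations are rescaled during training. That is an assumption of this example and differs from the paper, which instead scales the weights at test time. The function name and values are illustrative only.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout: zero each unit with probability drop_prob while training."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)  # evaluation mode: pass through unchanged
    keep_prob = 1.0 - drop_prob
    # Survivors are scaled by 1/keep_prob so the expected activation is unchanged.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)
hidden = [1.0] * 8

train_out = dropout(hidden, drop_prob=0.5, rng=rng)
eval_out = dropout(hidden, drop_prob=0.5, rng=rng, training=False)

print(train_out)  # each entry is either 0.0 or 2.0
print(eval_out)   # identical to the input
```

Because a different random subset of units is dropped on every training step, the network behaves somewhat like an average of many thinned networks, which is one intuition for why it can reduce overfitting on limited data.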
https://www.kdnuggets.com/2019/06/5-ways-lack-data-machine-learning.html
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/role-of-cross-validation-data-in-machine-learning
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/understanding-bias-in-machine-learning
https://hackernoon.com/7-effective-ways-to-deal-with-a-small-dataset-2gyl407s
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/handling-large-training-dataset-in-machine-learning#TOC-Point-to-remember
https://en.wikipedia.org/wiki/Curse_of_dimensionality#Machine_Learning
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/role-of-redundant-features-in-machine-learning
https://towardsdatascience.com/problems-in-machine-learning-models-check-your-data-first-f6c2c88c5ec2
https://images.app.goo.gl/AY58czEYzXof5Pjv8
https://coursera.org/share/fd4a7f1f6feee1d6921b803e86340d96