Have you ever tried working with a large dataset on a machine with 4 GB of RAM? The machine starts heating up on even the simplest machine learning tasks. This is a common problem data scientists face when working with restricted computational resources.
Exploring and applying machine learning algorithms to datasets that are too large to fit into memory is pretty common.
Working with large amounts of data causes several problems:

- Training will be slow.
- You may run into out-of-memory errors and sluggish behaviour (plots redrawing slowly, for example).
- Comparing multiple algorithms becomes much slower, since cross-validation retrains each candidate algorithm several times.
Enhancing resources is a reactive action. Before doing that, it is better to work on the data itself.
Look for redundancy in the data. Redundancy can be:

- In the form of samples: a data condensation technique can be used. k-means clustering is one approach; this paper (linked in the references below) describes another. The effectiveness of a condensation approach is measured by the accuracy of the model trained on the condensed data. A sketch of the k-means idea is shown after this list.
- In the form of features: apply a feature selection approach; the articles linked below can help. A sketch of this follows as well.
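As an illustration of the k-means idea, here is a minimal sketch, assuming synthetic data and scikit-learn's MiniBatchKMeans; the cluster count and the majority-label assignment are choices made for this example, not prescriptions from the paper:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Toy stand-in for a dataset too large to train on comfortably.
rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Condense 100k samples into 1k cluster centres; the centres
# become a smaller, representative training set.
kmeans = MiniBatchKMeans(n_clusters=1_000, batch_size=10_000, random_state=0)
cluster_ids = kmeans.fit_predict(X)
X_small = kmeans.cluster_centers_

# Give each centre the majority class of its cluster, so the
# condensed set can still be used for supervised training.
y_small = np.array([
    np.bincount(y[cluster_ids == c]).argmax() if (cluster_ids == c).any() else 0
    for c in range(1_000)
])
```

Comparing a model trained on (X_small, y_small) against one trained on the full data is exactly the accuracy-based effectiveness check mentioned above.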
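Similarly, a minimal sketch of feature selection, assuming synthetic data and scikit-learn's SelectFromModel with a random forest (both are illustrative choices; the linked articles discuss other methods):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Toy data: 50 features, of which only 10 are informative.
X, y = make_classification(n_samples=5_000, n_features=50,
                           n_informative=10, random_state=0)

# Keep only the features whose importance exceeds the mean
# importance (SelectFromModel's default threshold).
selector = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))
X_reduced = selector.fit_transform(X, y)
print(X.shape, "->", X_reduced.shape)  # fewer columns to train on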
You can also remove noisy samples, and you can drop records with missing values, etc. However, be careful not to hurt the generalisation capability of the model. A short sketch follows.
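A minimal sketch of both clean-ups with pandas; the file name train.csv and the 3-standard-deviation cut-off are assumptions for illustration:

```python
import numpy as np
import pandas as pd

df = pd.read_csv("train.csv")  # hypothetical file

# Drop rows containing missing values. Only safe when such rows
# are a small fraction of the data; otherwise consider imputing.
df = df.dropna()

# A crude noise filter: drop rows whose numeric values lie more
# than 3 standard deviations from the column mean.
numeric = df.select_dtypes(include=np.number)
z = (numeric - numeric.mean()) / numeric.std()
df = df[(z.abs() < 3).all(axis=1)]
```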
The required training set size varies with the problem at hand. As a rule of thumb, there should be at least 10 observations/samples per variable (note that in a categorical feature, each category counts as one variable). So, depending on how many variables you have, you may only need a modest fraction of a large dataset to train reliably.
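For example, with 15 numeric features and one categorical feature of 5 categories (made-up numbers), the rule works out as:

```python
n_numeric = 15
n_categories = 5                         # each category counts as one variable
n_variables = n_numeric + n_categories   # 20 variables in total
min_samples = 10 * n_variables           # at least 200 samples needed
```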
https://machinelearningmastery.com/large-data-files-machine-learning/
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/role-of-redundant-features-in-machine-learning
https://machinelearningmastery.com/feature-selection-to-improve-accuracy-and-decrease-training-time/
https://www.isical.ac.in/~sankar/paper/PAMI_02_PM_CAM_SKP_2.pdf
https://youtu.be/q8gVpKl1f-4
https://www.analyticsvidhya.com/blog/2018/08/dask-big-datasets-machine_learning-python/
http://www.fekete.com/san/webhelp/welltest/webhelp/content/html_files/procedures/preparing_data_for_analysis/handling_large_datasets.htm
https://images.app.goo.gl/1fH1aDuBWQctSS957
https://www.statisticssolutions.com/sample-size-formula/