Choosing training dataset and test dataset in machine learning