Preprocessing is the cleaning and structuring of data to prepare it for a certain machine learning algorithm. If you were to simply feed a model a list of tweets, it wouldn’t know what to do. We must first simplify each tweet, separate into its individual words, and reduce each term to its stem word for efficiency; this is where preprocessing comes in.
In order to find the optimal machine learning algorithm for a dataset, you first need to play around with it and understand its features, and what about the dataset stands out to you. In doing so, you will eventually discover the machine learning algorithm that would best satisfy your purposes.