Stemming = reducing a word to its word stem (ex: "running" and "runs" to "run")
Lowercase conversion = converting all characters to lowercase
Tokenization = splitting a phrase into smaller units (ex: "The cat sat on the mat" to ["The", "cat", "sat", "on", "the", "mat"])
Stop-words removal = removing words that occur very commonly (ex: "the", "is", "on") but carry little contextual meaning; all four steps are sketched in the code below
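A minimal sketch of these four steps using NLTK (an assumption — the original may use a different library, and PorterStemmer stands in for whichever stemmer was actually used):

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# One-time data downloads (newer NLTK versions may also need "punkt_tab")
nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

text = "The Cats were Running on the mats"

# Lowercase conversion
text = text.lower()

# Tokenization
tokens = word_tokenize(text)  # ['the', 'cats', 'were', 'running', 'on', 'the', 'mats']

# Stop-words removal
stop_words = set(stopwords.words("english"))
tokens = [t for t in tokens if t not in stop_words]  # ['cats', 'running', 'mats']

# Stemming
stemmer = PorterStemmer()
print([stemmer.stem(t) for t in tokens])  # ['cat', 'run', 'mat']
```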
Creating a lemmatizer and defining regex patterns
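What this step might look like with NLTK's WordNetLemmatizer; the two regex patterns are illustrative assumptions, since the original patterns aren't shown:

```python
import re
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet", quiet=True)  # dictionary data the lemmatizer relies on

# Lemmatization maps a word to its dictionary form ("mice" -> "mouse")
lemmatizer = WordNetLemmatizer()

# Hypothetical patterns: characters to strip, and whitespace runs to collapse
NON_ALPHA = re.compile(r"[^a-z\s]")
MULTI_SPACE = re.compile(r"\s+")
```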
Data cleaning: replacing unnecessary characters
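A sketch of the cleaning step under the same assumptions, applying the patterns defined above (repeated here so the snippet runs on its own):

```python
import re

NON_ALPHA = re.compile(r"[^a-z\s]")  # anything that is not a lowercase letter or space
MULTI_SPACE = re.compile(r"\s+")     # runs of whitespace

def clean_text(text: str) -> str:
    """Lowercase the text and replace characters that carry no signal."""
    text = text.lower()
    text = NON_ALPHA.sub(" ", text)           # digits, punctuation, symbols -> space
    return MULTI_SPACE.sub(" ", text).strip()

print(clean_text("The cat, sat on the mat!!"))  # the cat sat on the mat
```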
Lemmatizing non-stop words
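Putting it together, one plausible version of the final step: tokenize the cleaned text, drop stop words, and lemmatize what remains (the function and variable names here are my own, not from the original):

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

for pkg in ("punkt", "stopwords", "wordnet"):
    nltk.download(pkg, quiet=True)

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def lemmatize_non_stop_words(text: str) -> list[str]:
    """Tokenize, drop stop words, and lemmatize the remaining tokens."""
    tokens = word_tokenize(text.lower())
    return [lemmatizer.lemmatize(t) for t in tokens if t not in stop_words]

print(lemmatize_non_stop_words("The cats were sitting on the mats"))
# ['cat', 'sitting', 'mat']
```

Note that `lemmatize` defaults to the noun part of speech, so verb forms like "sitting" pass through unchanged; passing `pos="v"` would reduce them to "sit".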