Below is an example from the Consumer Reviews of Amazon Products data set showing the review rating (1-5) and the corresponding review text for a single product in the data set. Amazon reviews should be at least 20 words long and are limited to 5,000 words. They are typically written in informal English, contain irregular expressions, and can include abbreviations and slang. Sentiment analysis approaches for Amazon reviews can generally be grouped into two main categories: machine learning approaches and lexicon-based approaches.
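To make the lexicon-based approach concrete, here is a minimal Python sketch. The lexicon `TOY_LEXICON` and the helpers `lexicon_score` and `lexicon_label` are hypothetical and for illustration only; a real analysis would use an established sentiment lexicon rather than a handful of hand-picked words. A machine learning approach would instead learn word weights from labeled reviews, such as the 1-5 ratings in this data set.

```python
import re

# Hypothetical toy lexicon (word -> polarity), illustrative only,
# not a published sentiment lexicon.
TOY_LEXICON = {
    "great": 1.0, "love": 1.0, "excellent": 1.0, "easy": 0.5,
    "bad": -1.0, "broke": -1.0, "terrible": -1.0, "slow": -0.5,
}

def lexicon_score(text, lexicon=TOY_LEXICON):
    """Return the mean polarity of the lexicon words found in text (0.0 if none)."""
    # Lowercase and keep runs of letters/apostrophes, which tolerates the
    # informal punctuation common in reviews.
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits) / len(hits) if hits else 0.0

def lexicon_label(text):
    """Map the mean polarity score to a coarse sentiment label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `lexicon_label("Love this Kindle, great screen!")` returns `"positive"`. Note that this sketch ignores negation, so "not great" is also scored as positive; handling cases like this is one reason lexicon-based systems add extra rules.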
The data set used in this study has a total of 34,626 records.
The data is imbalanced, as most reviews have a rating of 4 or 5. This was addressed by oversampling the reviews with the less frequent, lower rating labels.
To deal with the imbalanced data set, the second model is trained on data in which the class labels with lower counts have been oversampled.
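The oversampling step can be sketched as random oversampling using only Python's standard library. This assumes plain random duplication of minority-class reviews until every rating class matches the size of the largest one; the document does not say which oversampling method was used, and `random_oversample` is a hypothetical helper name.

```python
import random
from collections import Counter

def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen examples of each smaller class until every
    class has as many examples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    # Group example indices by class label so we can sample within each class.
    by_label = {}
    for i, y in enumerate(labels):
        by_label.setdefault(y, []).append(i)
    out_texts, out_labels = list(texts), list(labels)
    for y, idxs in by_label.items():
        # Draw, with replacement, as many extra copies as this class needs
        # (zero for the largest class).
        for i in rng.choices(idxs, k=target - counts[y]):
            out_texts.append(texts[i])
            out_labels.append(y)
    return out_texts, out_labels
```

Oversampling is usually applied only to the training portion, after the data is split, so that duplicated reviews cannot end up in both the training and test sets.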
Certain words appear in the reviews more often than others.
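Word frequencies like these can be counted with `collections.Counter`. This is a minimal sketch: the tokenizer is a simple regular expression, and `STOP_WORDS` and `top_words` are hypothetical, with a deliberately tiny stop-word list so that common function words do not dominate the counts.

```python
import re
from collections import Counter

# Hypothetical minimal stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "to", "of", "this", "i", "for"}

def top_words(reviews, n=10):
    """Return the n most frequent non-stop-words across all reviews
    as (word, count) pairs, most frequent first."""
    counts = Counter()
    for text in reviews:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)
```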
The data is split into training, validation, and test sets.
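A reproducible three-way split can be sketched as follows. The 80/10/10 proportions and the `train_val_test_split` helper are assumptions for illustration, since the split ratios are not stated here; scikit-learn's `train_test_split` (called twice, optionally with `stratify=` to preserve the rating distribution) is a common library alternative.

```python
import random

def train_val_test_split(items, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle a copy of items and split it into train/validation/test lists.
    Whatever remains after train_frac and val_frac goes to the test set."""
    if not 0 < train_frac + val_frac < 1:
        raise ValueError("train_frac + val_frac must be between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```

For example, `train_val_test_split(range(100))` returns lists of 80, 10, and 10 items. A plain shuffle does not guarantee the same rating proportions in each split, which matters for imbalanced data like this; a stratified split addresses that.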
This data set is a list of over 34,000 consumer reviews of Amazon products such as the Kindle and Fire TV Stick, provided by Datafiniti's Product Database. For each product, it includes basic product information, the rating, the review text, and more.
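Loading the review text and rating from the CSV export can be sketched with the standard `csv` module. The column names `reviews.rating` and `reviews.text` are assumptions based on Datafiniti's field naming and should be checked against the header of the actual file; `load_reviews` is a hypothetical helper.

```python
import csv

# Assumed column names; verify them against the CSV header you downloaded.
RATING_COL = "reviews.rating"
TEXT_COL = "reviews.text"

def load_reviews(lines):
    """Read (text, rating) pairs from an iterable of CSV lines (e.g. an open
    file), skipping rows with a missing or non-numeric rating or empty text."""
    pairs = []
    for row in csv.DictReader(lines):
        text = (row.get(TEXT_COL) or "").strip()
        try:
            # Ratings may be stored as "5" or "5.0".
            rating = int(float(row.get(RATING_COL) or ""))
        except ValueError:
            continue  # rating absent or malformed
        if text:
            pairs.append((text, rating))
    return pairs
```

Pass it a file opened with `newline=""` (as the `csv` module documentation recommends) or any iterable of CSV lines.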