Electronic commerce has grown rapidly with the spread of the internet and network technologies, leading large platforms such as Amazon, Target, and Walmart to move much of their customer traffic into online retail. Positive and negative customer reviews play a major role in the decision a consumer makes before purchasing something online: they describe a product and its functionality in more detail and allow comparison with similar products. The seller, in turn, receives information about the product and how its services could be improved. Sentiment analysis, also known as opinion mining, is a major task of NLP (Natural Language Processing) that identifies feeling, thought, judgement, or expression and can be used to study a target audience's sentiment towards an entity. It detects negative and positive opinions by analyzing the whole text and assigning it to a sentiment category. Nowadays, consumers have far more freedom to express their thoughts and opinions, which helps companies customize their products and services. With the improvements in machine learning and automation, companies build systems that analyze customer feedback and interactions across different platforms to serve their customers' needs.
Sentiment analysis is extremely useful in social media monitoring, as it allows us to gain an overview of the wider public opinion behind certain topics.
An Amazon product review is simply a mirror of the consumer's thoughts about that product. Analyzing product reviews can help sellers understand consumers' demands and interests, and provides marketing intelligence about the types of products consumers are most willing to purchase. This valuable insight also helps sellers better understand their own product and why it may (or may not) be working.
Why Are Amazon Reviews Important for Sellers?
Amazon Safeguards the Trustworthiness of Reviews
Reviews Give Important Market Insight
Buyers Cherish Online Reviews
Amazon Product Review Leads to Higher Rankings
Amazon Seller Feedback Facilitates "Amazon Buy Box" Wins
Reviews Increase Conversion Rates
Amazon Product Review as a Free Marketing Tool
Reviews Engage Customers
Increase Rankings
In this project we focus on text and analyze a dataset of real Amazon customer reviews collected from the Datafiniti website. We perform a sentiment analysis on these reviews in order to understand what customers are saying about the electronic products they purchased. The following pipeline illustrates the stages of NLP.
Figure 1
Natural Language Processing Pipeline
The dataset has been collected from https://data.world/datafiniti/amazon-and-best-buy-electronics and is a .csv file of 8.29 MB, consisting of over 7,000 online reviews for 50 electronic products from websites like Amazon and Best Buy, provided by Datafiniti's Product Database. The dataset has 20 attributes. Each review contains textual feedback along with a 1-to-5-star rating (1 being least satisfied and 5 being most satisfied).
Note that this is a sample of a large dataset. The full dataset is available through Datafiniti.
Column                 Type
id                     string
asins                  string
brand                  string
categories             string
colors                 string
dateAdded              datetimestamp
dateUpdated            datetimestamp
dimension              string
manufacturerNumber     string
name                   string
primaryCategories      string
reviews_dateSeen       datetimestamp
reviews_doRecommend    boolean
reviews_numHelpful     integer (numerical)
reviews_rating         integer (numerical)
reviews_sourceURLs     url
reviews_text           string
reviews_title          string
reviews_username       string
weight                 string
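As a minimal sketch of loading and inspecting this dataset with pandas (the file name below is an assumption; use whatever name the Datafiniti download gives you, and note that the column names follow the table above):

import pandas as pd

# Assumed local file name for the Datafiniti sample; adjust to your download.
df = pd.read_csv("DatafinitiElectronicsProductReviews.csv")

print(df.shape)             # expect roughly 7,000 rows and 20 columns
print(df.columns.tolist())  # the attributes listed in the table above
print(df[["reviews_rating", "reviews_text"]].head())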
Based on the customer product review (textual feedback) along with the 1-to-5-star rating (1 being least satisfied and 5 being most satisfied), we need to quantify customer satisfaction to help with future business decisions. The target is to build models that can identify the sentiment (positive or negative) of each of these interactions. The goal of this project is to show how sentiment analysis can be performed on Amazon customer electronic product reviews using Python. For each textual review, we want to predict whether it corresponds to a good review (the customer is happy) or a bad one (the customer is not satisfied). The overall ratings of the reviews range from 1 to 5. Here are the steps we follow:
Read and analyze the input text data and the corresponding response variables.
Perform basic pre-processing to prepare the data for modeling.
Learn and apply various ways of featurizing the review text (Bag of Words, TF-IDF, and Word Embeddings features).
Build supervised machine learning models (Logistic Regression and Linear Support Vector Machine).
To perform the text classification (NLP), we applied different feature extraction models (Bag of Words, TF-IDF, and Word Embeddings) and compared their prediction accuracies by applying the two following models to classify text as either exhibiting positive or negative sentiment (1 or 0):
Logistic Regression (LR)
Linear Support Vector Machine (LSVM)
The chance of getting a perfectly cleaned dataset that meets all of our requirements is slim to none. So, we need to clean the data before we start modeling. Here is a checklist when it comes to cleaning the data:
Check the data types to make sure they are correct.
Make sure the column names are correct. This makes the process of selecting data easier.
Check for missing values. This helps control for errors.
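A minimal pandas sketch of this checklist, continuing with the dataframe df loaded above:

# Check the data types of every column.
print(df.dtypes)

# Make sure the column names are what we expect (strip stray whitespace).
df.columns = [c.strip() for c in df.columns]

# Check for missing values per column.
print(df.isnull().sum())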
The most important step in analyzing the data is to clean and prepare the dataset. A good model depends not only on the algorithm but, above all, on a clean dataset. There are many tactics in text data processing, such as:
Remove non-alphanumeric characters, except for white space.
Convert all characters to lowercase, in order to treat all the words the same.
Consider Tokenization, Stemming, and lemmatization.
and so on.
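A minimal sketch of the first two tactics, applied to the reviews_text column (the exact cleaning rules in the original notebook may differ):

import re

def basic_clean(text: str) -> str:
    # Keep letters, digits, and whitespace only, then lowercase everything.
    text = re.sub(r"[^A-Za-z0-9\s]", " ", str(text))
    return text.lower()

df["clean_text"] = df["reviews_text"].apply(basic_clean)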
Figure 2 shows the steps to prepare the data for the final analysis.
Figure 2
Figure 3
Figure 3 displays the distribution of the number of words per review (average: 46.5385 words, skewness: 8.2461).
The distribution is positively skewed and shows that almost all the reviews have fewer than 200 words, which is important for us: to analyze the reviews we need to know what this dimensional space looks like (at most about 200 words per review, with around 46 words in the average review).
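A small sketch of how these statistics can be computed with pandas, continuing from the cleaning sketch above:

# Number of words per review, based on simple whitespace splitting.
df["n_words"] = df["clean_text"].str.split().str.len()

print(df["n_words"].mean())          # average words per review
print(df["n_words"].skew())          # skewness of the word-count distribution
print((df["n_words"] < 200).mean())  # share of reviews shorter than 200 words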
Figure 4 displays the rating distribution. Looking at the results, 62.84% of reviews have 5 stars, 25.61% have 4 stars, 5.77% have 3 stars, 3.34% have 1 star, and 2.42% have 2 stars.
The distribution is quite skewed, with a large number of 5s and very few 3s, 2s, and 1s.
Figure 5 displays the distribution of the number of positive and negative reviews. As you can see, the number of positive reviews (3,571) far exceeds the number of negative reviews (233).
Figure 4
Figure 5
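The binary target is derived from the star ratings; the exact cut-off is not spelled out in the text, so the threshold below (3 stars and above counted as positive) is an assumption used only for illustration:

# Assumed mapping from star rating to binary sentiment (1 = positive, 0 = negative).
df["sentiment"] = (df["reviews_rating"] >= 3).astype(int)

print(df["sentiment"].value_counts())  # compare with the counts shown in Figure 5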
As a visual representation of the distribution of the text, a word cloud displays words in different sizes, indicating the frequency of each word in the text. Looking at the word cloud, it shows that the reviews are related to electronic products, with words such as "Speakers", "Headphones", "Keyboard", etc.
Some words are related to customer experience, such as "good", "love", "best", and "great", while others, like "much" (determiner) and "something" (pronoun), are not desired and might not give us any insight about the reviews. So, we need to use techniques such as stop-word removal to drop these words.
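A sketch of generating such a word cloud with the wordcloud package (the plotting details are assumptions, not the original figure code):

from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

# Join all cleaned reviews into a single string and build the cloud.
all_text = " ".join(df["clean_text"].astype(str))
wc = WordCloud(width=800, height=400, background_color="white",
               stopwords=STOPWORDS).generate(all_text)

plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()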
As we can see, increasing the number of grams increases both the number of features and the computational expense (7,260 vs. 192,317 features). We have to balance having a lot of features against the computational cost of processing them.
Text preprocessing and normalization is crucial before building a proper NLP model. Some of the important steps are:
Converting words to lower/upper case
Removing special characters
Removing stop words and high/low-frequency words
Stemming/lemmatization
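A sketch of the stop-word removal and lemmatization steps using NLTK (whether the original notebook used stemming, lemmatization, or both is not stated, so this is one reasonable choice), continuing from the cleaning sketch above:

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords")
nltk.download("wordnet")

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def normalize(text: str) -> str:
    # Text is already cleaned and lowercased; drop stop words and lemmatize the rest.
    tokens = [lemmatizer.lemmatize(t) for t in text.split() if t not in stop_words]
    return " ".join(tokens)

df["norm_text"] = df["clean_text"].apply(normalize)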
To extract features from the text reviews we use BOW (Bag of Words), TF-IDF (Term Frequency - Inverse Document Frequency), and Word Embeddings models. These features can then be used as training data for machine learning algorithms. The Bag of Words model creates a vocabulary of all the unique words occurring in all the documents in the dataset.
Creating a 1-gram bag of words from the reviews using a word-document matrix
We can see that a sparse matrix with 3,804 rows (the number of reviews) and 7,260 columns (words) has been built; the columns correspond to the features extracted from the 1-gram counts of the user reviews.
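A sketch of this step using scikit-learn's CountVectorizer (the exact vectorizer settings are assumptions, so the resulting feature count may differ slightly from 7,260):

from sklearn.feature_extraction.text import CountVectorizer

bow_vectorizer = CountVectorizer(ngram_range=(1, 1))
X_bow = bow_vectorizer.fit_transform(df["norm_text"])

print(X_bow.shape)  # (number of reviews, vocabulary size), roughly (3804, 7260)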
Once the bag of words is prepared, to model the dataset we divide the data into training (80%) and testing (20%) sets.
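A sketch of the split and of the first Logistic Regression model, continuing from the sketches above (the hyperparameters are scikit-learn defaults, not necessarily those of the original notebook):

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

X_train, X_test, y_train, y_test = train_test_split(
    X_bow, df["sentiment"], test_size=0.2, random_state=42, stratify=df["sentiment"])

lr = LogisticRegression(max_iter=1000)
lr.fit(X_train, y_train)

print(f1_score(y_test, lr.predict(X_test)))  # f1 score on the test set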
As we can see, the f1 score is 97.3%, which is considerably high.
To check whether we can increase the f1 score by increasing the n-grams, we compared the outcomes of 1-grams and 3-grams; the result shows that the f1 score increased slightly, from 97.3% to 97.8%. So, we have to weigh the computational expense of going from 7,260 features to 192,317 against a 0.5% increase in our f1 score.
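A short sketch of how the two feature spaces can be compared (using ngram_range=(1, 3) is an assumption about how the 3-gram representation was configured):

from sklearn.feature_extraction.text import CountVectorizer

ngram_vectorizer = CountVectorizer(ngram_range=(1, 3))
X_ngram = ngram_vectorizer.fit_transform(df["norm_text"])

# Compare vocabulary sizes: unigrams only vs. unigrams through trigrams.
print(X_bow.shape[1], X_ngram.shape[1])  # roughly 7,260 vs. 192,317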
Comparing the f1 scores of the LR and LSVM models shows that the LSVM, with a value of 98.0%, is more accurate than the LR, with a value of 97.0%.
The ROC curve comparison shows that the area under the curve for the LSVM classifier is larger than for LR (0.92 vs. 0.91). Looking at the f1 scores of these models, we can also see that the LSVM model has a higher score than LR (98.0% vs. 97.3%).
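A sketch of training the Linear SVM on the same split and computing both AUC values, continuing from the previous sketches (LinearSVC with default settings is an assumption):

from sklearn.svm import LinearSVC
from sklearn.metrics import roc_auc_score

svm = LinearSVC()
svm.fit(X_train, y_train)

# ROC AUC needs a continuous score: probabilities for LR, decision values for LinearSVC.
print(roc_auc_score(y_test, lr.predict_proba(X_test)[:, 1]))
print(roc_auc_score(y_test, svm.decision_function(X_test)))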
Here we have the same number of features/columns as in the bag of words model, since TF-IDF uses the same vocabulary; the difference is that each raw count is weighted by the term frequency times a logarithmic inverse-document-frequency factor.
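A sketch of the TF-IDF features with scikit-learn's TfidfVectorizer (default settings assumed):

from sklearn.feature_extraction.text import TfidfVectorizer

tfidf_vectorizer = TfidfVectorizer(ngram_range=(1, 1))
X_tfidf = tfidf_vectorizer.fit_transform(df["norm_text"])

print(X_tfidf.shape)  # same number of columns as the bag of words matrix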
Comparing the f1 scores of the Logistic Regression model for the Bag of Words and TF-IDF features shows that the score for TF-IDF, at 97.0%, is less than for BOW, at 98.0%. Depending on the context, in some cases we might get better accuracy for TF-IDF than for BOW.
Comparing the f1 scores of the Linear Support Vector Machine model for the Bag of Words and TF-IDF features also shows that the score for TF-IDF, at 97.2%, is less than for BOW, at 98.0%.
The ROC curve comparison shows that the area under the curve is the same for the SVC classifier and LR (0.90), which means both models have almost the same performance. Looking at the f1 scores of these models, we can see that the SVC model has a slightly higher score than LR (97.2% vs. 97.0%).
In general, for imbalanced classification with a severe skew and few examples of the minority class, the ROC AUC can be misleading. This is because a small number of correct or incorrect predictions can result in a large change in the ROC Curve or ROC AUC score.
Applying logistic regression to our word embeddings representation.
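The text does not say which embedding model was used, so the sketch below assumes pre-trained GloVe vectors loaded through gensim and represents each review as the average of its word vectors before fitting Logistic Regression (continuing from the sketches above):

import numpy as np
import gensim.downloader as api
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

word_vectors = api.load("glove-wiki-gigaword-100")  # assumed pre-trained embeddings

def embed(text: str) -> np.ndarray:
    # Average the vectors of the words that exist in the embedding vocabulary.
    vecs = [word_vectors[w] for w in text.split() if w in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(word_vectors.vector_size)

X_emb = np.vstack(df["norm_text"].apply(embed).values)

Xe_train, Xe_test, ye_train, ye_test = train_test_split(
    X_emb, df["sentiment"], test_size=0.2, random_state=42, stratify=df["sentiment"])

lr_emb = LogisticRegression(max_iter=1000).fit(Xe_train, ye_train)
print(f1_score(ye_test, lr_emb.predict(Xe_test)))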
Comparing the f1 scores of the Logistic Regression ML model for the Bag of Words, TF-IDF, and Word Embeddings features shows that the score for Word Embeddings is the highest (97.5%) and TF-IDF the lowest (97.0%).
Applying the Linear Support Vector Machine to our word embeddings representation.
Comparing the f1 scores of the Linear Support Vector Machine ML model for the Bag of Words, TF-IDF, and Word Embeddings features shows that BOW has the highest score (98.0%) and Word Embeddings the lowest (94.07%).
The ROC curve comparison shows that the area under the curve for LR is larger than for the LSVM classifier (0.67 vs. 0.56). Looking at the f1 scores of these models, we can also see that the LR model has a higher score than LSVM (97.5% vs. 94.06%).
GitHub Repository : https://github.com/srahmani7/NLP_Project/tree/main