Exploratory Data Analysis (EDA)
In the EDA phase, we will undertake the following key steps:
Data Cleaning:
Addressing missing values and inconsistencies in the dataset.
Feature Engineering:
Adding a 'detected_language' column to identify the language of each comment.
Creating a 'sentiment' column based on the provided rating.
Translating non-English comments for a comprehensive analysis.
Text Preprocessing:
Converting emojis and emoticons into words.
Standardizing comments to lowercase.
Removing links, HTML tags, numbers, and special characters.
Tokenization and Lemmatization:
Breaking down comments into tokens.
Reducing words to their base or root form through lemmatization.
Visualisation:
Together, these EDA tasks prepare the data for subsequent modeling, so that our sentiment analysis model works from clean, consistent text when extracting insights from MOOC learner reviews.
1-Data Cleaning:
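As a rough illustration of this step, the sketch below drops rows with missing values and normalises a few common inconsistencies (stray whitespace, ratings stored as strings, exact duplicates). The field names 'comment' and 'rating' and the 1 to 5 rating scale are assumptions made for the example, not details confirmed by the dataset.

```python
def clean_reviews(rows):
    """Drop unusable rows and normalise obvious inconsistencies.

    `rows` is a list of dicts with the assumed keys 'comment' and 'rating'.
    """
    cleaned = []
    seen = set()
    for row in rows:
        comment = row.get("comment")
        rating = row.get("rating")
        # Missing values: skip rows without a comment or a rating.
        if comment is None or rating is None:
            continue
        # Collapse runs of whitespace and trim the ends.
        comment = " ".join(str(comment).split())
        if not comment:
            continue
        # Inconsistent types: ratings may arrive as "5", 5 or 5.0.
        try:
            rating = int(float(rating))
        except (TypeError, ValueError, OverflowError):
            continue
        # Assumed 1-5 star scale; anything else is treated as invalid.
        if not 1 <= rating <= 5:
            continue
        # Exact duplicates are kept only once.
        key = (comment, rating)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"comment": comment, "rating": rating})
    return cleaned


raw = [
    {"comment": "  Great   course! ", "rating": "5"},
    {"comment": None, "rating": 4},
    {"comment": "Great course!", "rating": 5.0},
    {"comment": "Too fast", "rating": "n/a"},
    {"comment": "Too fast", "rating": 2},
]
cleaned = clean_reviews(raw)
print(cleaned)
```

Only two of the five sample rows survive: the row with no comment, the unparseable rating, and the duplicate are all discarded. Whether dropping or imputing is the better choice depends on how much data is missing.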
2-Feature Engineering:
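One possible way to derive the 'sentiment' column from the rating is sketched below. The thresholds (4 and 5 as positive, 3 as neutral, 1 and 2 as negative) are an assumption for illustration; the actual mapping should follow the dataset's rating scale. Language detection and translation depend on third-party tools and are not shown here.

```python
def rating_to_sentiment(rating):
    """Map a 1-5 star rating to a sentiment label (assumed thresholds)."""
    if rating >= 4:
        return "positive"
    if rating == 3:
        return "neutral"
    return "negative"


# Hypothetical sample rows; 'comment' and 'rating' are assumed field names.
reviews = [
    {"comment": "Great course!", "rating": 5},
    {"comment": "It was fine", "rating": 3},
    {"comment": "Too fast", "rating": 2},
]

# Add the new 'sentiment' field to every review.
for review in reviews:
    review["sentiment"] = rating_to_sentiment(review["rating"])

print([review["sentiment"] for review in reviews])
```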
3-Text Preprocessing:
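The preprocessing steps can be sketched with the standard library alone. The emoticon table below is a tiny hypothetical stand-in for a full emoji dictionary, and the final regular expression assumes the comments have already been translated to English, since it keeps only the letters a to z.

```python
import html
import re

# Tiny, hypothetical lookup; a real project would use a much fuller mapping.
EMOTICONS = {":)": " smile ", ":(": " sad ", "👍": " thumbs up "}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"<[^>]+>")
# Assumes English text: anything that is not a-z or whitespace is dropped.
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def preprocess(text):
    """Apply the preprocessing steps in a fixed order."""
    # Convert emojis and emoticons first, before punctuation is stripped.
    for symbol, word in EMOTICONS.items():
        text = text.replace(symbol, word)
    text = html.unescape(text)           # decode entities such as &amp;
    text = text.lower()                  # standardise to lowercase
    text = URL_RE.sub(" ", text)         # remove links
    text = TAG_RE.sub(" ", text)         # remove HTML tags
    text = NON_LETTER_RE.sub(" ", text)  # remove numbers and special characters
    return " ".join(text.split())        # collapse leftover whitespace


sample = "Loved it :) <br> See https://example.com for 10 more tips!"
print(preprocess(sample))
```

The order matters: emoticons are made of punctuation, so they have to be converted before special characters are removed, and links are removed before the generic character filter would break them into fragments.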
4-Tokenization and Lemmatization:
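To make the two operations concrete, here is a minimal sketch that tokenizes with a regular expression and lemmatizes with a small lookup table. The table is hand-written and hypothetical; it stands in for a proper lemmatizer from an NLP library, which would cover the whole vocabulary.

```python
import re

# Hypothetical lemma table standing in for a real lemmatizer.
LEMMAS = {
    "videos": "video",
    "courses": "course",
    "was": "be",
    "were": "be",
    "explained": "explain",
    "better": "good",
}

# A token is a run of letters, optionally followed by an apostrophe part.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Break a comment into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def lemmatize(tokens):
    """Replace each token by its base form when the table knows it."""
    return [LEMMAS.get(token, token) for token in tokens]


tokens = tokenize("The videos were explained better")
lemmas = lemmatize(tokens)
print(tokens)
print(lemmas)
```

Unlike stemming, which chops suffixes, lemmatization maps irregular forms such as "were" and "better" to dictionary entries, which is why a lookup or a linguistic model is needed.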
5-Visualisation: