In this project, I set out to understand and predict human emotions from text using Natural Language Processing (NLP). The work centred on the International Survey on Emotion Antecedents and Reactions (ISEAR) dataset, which captures basic emotions reported by individuals across 37 nations and so provides a rich, multicultural basis for analysis. Using a range of NLP techniques and machine learning models, including the gradient-boosting libraries XGBoost, LightGBM, and CatBoost, I examined how specific words are linked to emotional expressions. The findings were revealing: words associated with joy often carried social and positive elements, such as "friend" and "time", while words related to fear highlighted vulnerability and threat, such as "alone" and "afraid". The project code is available on my GitHub, and the report is available on my ResearchGate.
A critical part of the study was sentiment analysis, which showed a predominant skew towards negative expressions in the dataset. This skew may reflect the emotional state of the respondents when they took part, and it suggests that negative sentiment is common in the situations they described.
The methodology also included rigorous feature selection, which had a clear effect on the accuracy of emotion prediction. Model performance was assessed quantitatively, and the Chi-Square and Mutual Information selection techniques both improved predictive performance.
This project advanced my understanding of how emotions are articulated in text and provided knowledge that can inform the design of emotion-aware systems. The implications are broad, ranging from AI-driven communication interfaces to tools for psychological analysis and support. I look forward to exploring the intersection of language, emotion, and machine learning further, building on these insights to better understand the emotional currents in textual data.
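To make the Chi-Square and Mutual Information scoring mentioned above more concrete, here is a minimal sketch of how a word can be scored against an emotion label. It is not the project's actual pipeline: the six sentences are invented for illustration and are not ISEAR records, and the helper names (`contingency`, `chi_square`, `mutual_information`, `top_terms`) are hypothetical. It uses only the Python standard library and treats each word as simply present or absent in a text.

```python
from math import log2

# Invented toy examples for illustration only; these are NOT ISEAR records.
docs = [
    ("i met an old friend and we had a great time", "joy"),
    ("my friend visited and we spent time together", "joy"),
    ("i passed the exam and celebrated with a friend", "joy"),
    ("i was alone at night and felt afraid", "fear"),
    ("walking home alone i was afraid of the dark", "fear"),
    ("i heard a noise while alone in the house", "fear"),
]


def contingency(term, label, docs):
    """Return the 2x2 counts of term presence versus class membership."""
    a = b = c = d = 0
    for text, y in docs:
        present = term in text.split()
        if present and y == label:
            a += 1  # term present, text has the label
        elif present:
            b += 1  # term present, text has another label
        elif y == label:
            c += 1  # term absent, text has the label
        else:
            d += 1  # term absent, text has another label
    return a, b, c, d


def chi_square(term, label, docs):
    """Chi-square statistic of the 2x2 table (no continuity correction)."""
    a, b, c, d = contingency(term, label, docs)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def mutual_information(term, label, docs):
    """Mutual information (in bits) between term presence and the label."""
    a, b, c, d = contingency(term, label, docs)
    n = a + b + c + d
    cells = (
        (a, a + b, a + c),
        (b, a + b, b + d),
        (c, c + d, a + c),
        (d, c + d, b + d),
    )
    # Empty cells contribute nothing to the sum.
    return sum(j / n * log2(j * n / (r * col)) for j, r, col in cells if j)


def top_terms(label, docs, k=3):
    """Rank words that are positively associated with the label."""
    vocab = sorted({w for text, _ in docs for w in text.split()})
    scored = []
    for w in vocab:
        a, b, c, d = contingency(w, label, docs)
        # Chi-square is symmetric, so keep only positive associations.
        if a * d > b * c:
            scored.append((w, chi_square(w, label, docs)))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:k]


if __name__ == "__main__":
    for emotion in ("joy", "fear"):
        print(emotion, top_terms(emotion, docs))
```

On real data, a library routine is usually the better choice; scikit-learn, for example, provides `chi2` and `mutual_info_classif` in `sklearn.feature_selection`. Whether the project used those exact functions is an assumption on my part.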