As the use of digital resources continues to expand in education, an unprecedented amount of new data is becoming available to educational researchers and practitioners. Among these new data sources, unstructured data such as text represents a significant share. This introductory course in text mining is designed to prepare researchers and practitioners to use this data more efficiently, effectively, and ethically. The course provides students with an overview of text mining as an analytical approach, examples of its use in educational contexts, and applied experience with widely adopted tools and techniques. As participants gain experience in the collection, analysis, and reporting of data throughout the course, they will be better prepared to help educational organizations understand and improve both online and blended learning environments.
For first-time R learners, this course is undoubtedly full of challenges and novelties. Every challenge I met in the learning process left a lasting impression and gave me a sense of accomplishment. I am especially grateful that I did not give up when the work became difficult, and I am also grateful for the professors' encouragement and for my classmates' inspiration and cooperation. After independently completing three projects covering sentiment analysis, topic modeling, bigrams, and word networks, I completed the final project: an analysis of the audience's emotional tendencies and focus in the comment section of a YouTube documentary about AI. Throughout this project, I used R to mine unstructured text and data visualization to make the results both interesting and meaningful. These experiences have greatly enriched my learning and given me a better understanding of, and greater enthusiasm for, data analytics.
In this case study, I draw on an AI documentary on YouTube that has garnered significant attention since its 2019 release. I employ three sentiment lexicons (Bing, NRC, and Loughran) to analyze emotional trends and distributions in its comments. Word clouds and network analyses pinpoint the primary public concerns reflected in the comments. During data cleaning, tokenization and bigrams help retain essential information. Additionally, I used ChatGPT, a machine-learning-based tool, to analyze the data and produce dynamic bar and line charts. Sentiment distributions show notable fluctuations following specific events; for instance, sentiment was most positive in March 2023 and lowest in late November and early December 2022. Overall, the timeline analysis suggests a future shift toward more positive and neutral attitudes toward AI. Compared to R, ChatGPT offered faster data analysis and visualization, with more vibrant images and easier value adjustments.
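To make the lexicon-based approach concrete, here is a minimal sketch of tokenization, lexicon scoring, and bigram counting. It is written in Python rather than the R used in the project, and everything in it is an assumption for illustration: the `TOY_LEXICON` and `STOP_WORDS` sets are tiny hypothetical stand-ins (not the Bing, NRC, or Loughran word lists), and the sample comments are invented, not taken from the YouTube data.

```python
import re
from collections import Counter

# Hypothetical toy lexicon for illustration only;
# NOT the Bing, NRC, or Loughran word lists.
TOY_LEXICON = {
    "amazing": 1, "great": 1, "hope": 1,
    "scary": -1, "dangerous": -1, "fear": -1,
}

# Small illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {"the", "is", "a", "and", "but", "of", "to", "this", "it"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(tokens):
    """Sum lexicon values; words missing from the lexicon count as 0."""
    return sum(TOY_LEXICON.get(token, 0) for token in tokens)


def bigrams(tokens):
    """Pair adjacent tokens, dropping any pair that contains a stop word."""
    return [
        (first, second)
        for first, second in zip(tokens, tokens[1:])
        if first not in STOP_WORDS and second not in STOP_WORDS
    ]


# Invented sample comments, not real data from the case study.
comments = [
    "Artificial intelligence is amazing and gives me hope",
    "Artificial intelligence is scary and dangerous",
    "Great film but the future is scary",
]

# One net sentiment score per comment.
scores = [sentiment_score(tokenize(comment)) for comment in comments]

# Bigram frequencies across all comments.
bigram_counts = Counter(
    pair for comment in comments for pair in bigrams(tokenize(comment))
)

print(scores)
print(bigram_counts.most_common(3))
```

A score above zero marks a comment as net positive and below zero as net negative under this toy lexicon. In a real analysis, scores like these would be grouped by comment date to produce the kind of sentiment timeline described above, and the most frequent bigrams would feed the word network.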