On Automatically Classifying Reviews

CS 615 Machine Learning

Group Members:

Kopal Rastogi [ 1711003 ]
Rohan Shewale [ 1711011 ]

Abstract

User feedback is essential to improving software quality. Most apps and websites today include a user feedback system through which users can help improve a service or product by sharing useful feedback. For example, services such as Google Play for Android apps, or online discussion forums, let users submit feedback on downloaded apps or discussion topics in the form of star ratings and text reviews. These reviews are a rich source of information for app vendors and developers, as they include information about bugs, ideas for new features, or documentation of released features. Current requirements engineering practices for gathering user input are characterized by a number of communication gaps between users and engineers, which might lead to wrong requirements. The problem situations and context that underlie user input are either gathered retrospectively or submitted at the wrong level of detail. We think that making user input a first-order concern of both software processes and software systems harbors many innovation opportunities. We propose and discuss a continuous and context-aware approach for communicating user input to engineering teams and other users. The problem we address is defining techniques to classify reviews. We propose a generalized system, usable in any similar feedback system, that classifies reviews into six toxic behaviors: toxic, severe_toxic, obscene, threat, insult, and identity_hate. This classification helps us extract useful user feedback.

Introduction

In modern product development, the input of users and their acceptance of the product are of high importance for market success. This holds especially true for the software industry, which is characterized by rapid development cycles and high competition. In software projects, user input receives enormous attention, e.g. through various activities in requirements engineering, short feedback cycles in agile methodologies, or user-focused events such as user conferences or online forums. Surprisingly, there is no common and comprehensive theory of user input in software engineering. While user input is decently incorporated in software processes, feedback mechanisms in software systems themselves are not standardized and are thus rather ad hoc, if they exist at all. We argue that user input is a concern that is currently highly fragmented but harbors huge potential for innovation.

The Internet has grown into a collaborative platform, not only for using different services and products but also for posting individual reviews and feedback. This has generated a very large amount of information available in the form of online documents. Evaluating user needs is a subtle process, and even companies with elaborate processes for gathering user input are not always successful. The problem may lie in what information was accumulated, how it was gathered, how it was processed, or how it was translated into product requirements. No matter which part of the process is “buggy”, it is crucial that the engineering team understands the users' needs. In other words, understanding the needs means identifying the “pains” of the users and answering the question: why is this pain a real pain?

Technical advancements have opened new possibilities, such as analysing data at a very large scale within economic and time constraints. As part of the effort to better organize this information for users, researchers have been actively investigating the problem of automatic text categorization. One branch of the analysis of such large data is sentiment analysis: the process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic, product, etc. is positive, negative, or neutral.

Sentiment classification would also be helpful in business intelligence applications and recommender systems (e.g., Terveen et al. (1997), Tatemura (2000)), where user input and feedback could be quickly summarized; indeed, in general, free-form survey responses given in natural language format could be processed using sentiment categorization. Moreover, there are also potential applications to message filtering; for example, one might be able to use sentiment information to recognize and discard “flames” (Spertus, 1997).

Wikipedia is a large, open collaborative project: people from all over the world can make changes, suggest and create new content, and communicate through a feedback and review system that controls the quality of the content. For our project we consider this user feedback and review system. The reviews on this system serve as a communication channel between developers and users, where users can provide relevant information to guide developers in accomplishing software maintenance and evolution tasks, and help other users understand a specific topic, page, or functionality.

Background and Problem Description

Billions of users regularly view, use, contribute to, and review articles on Wikipedia. The reviews written by users are a rich source of information, not only about the topic but also about updates, requested changes, and feedback. They are also helpful to developers, as this feedback includes user requirements, ideas for improvements, user sentiments about specific features, and descriptions of experiences with these features.

However, the number of reviews is too large to be processed manually, and their quality varies widely.

Moreover, many reviews are of low quality: they contain senseless information, insulting comments, spam, or a mere acknowledgment or thumbs-up of the parent comment, and do not contribute to the actual conversation. With hundreds of reviews submitted per day for popular articles, it becomes difficult for the community, developers, and analysts to filter and process useful information from these reviews.

The (solution) Approach

User feedback is imperative in improving software quality. Many software companies collect data on user satisfaction through various means including store feedback, focus groups, surveys, error reports and other interaction networks.

Our approach uses a simple neural network architecture, a convolutional neural network (CNN), a type of network widely used for multi-label classification. The system also uses fastText to efficiently learn text representations and text classifiers. The CNN learns patterns from the training data, represented as word embeddings built from fastText's English word vectors. For the selected dataset of Wikipedia comments, the sub-word information helps the model capture the behavior, emotion, or sentiment of a comment and arrive at a classification. The link to the dataset is provided under External Links at the end of this report.
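As a rough illustration of the architecture described above, the following sketch runs the forward pass of a tiny one-dimensional CNN over a sequence of word vectors: a convolution over windows of consecutive words, max-pooling over positions, and one sigmoid output per label. The layer sizes, the random weights, and the names conv1d_relu, global_max_pool, dense_sigmoid, predict, and random_params are assumptions made for illustration; the report does not specify the actual layers or hyperparameters, and a real model would be trained with a deep learning library.

```python
import math
import random

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def conv1d_relu(seq, filters, bias):
    """Slide each filter over windows of consecutive word vectors, then apply ReLU.

    seq:     list of T word vectors, each of length D
    filters: list of F filters, each a list of K weight vectors of length D
    returns: list of (T - K + 1) feature vectors, each of length F
    """
    k = len(filters[0])
    out = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        feats = []
        for f, b in zip(filters, bias):
            s = b + sum(w * x for wv, xv in zip(f, window) for w, x in zip(wv, xv))
            feats.append(max(0.0, s))
        out.append(feats)
    return out


def global_max_pool(feature_maps):
    """Keep the strongest response of each filter over all positions."""
    return [max(col) for col in zip(*feature_maps)]


def dense_sigmoid(x, weights, bias):
    """One independent sigmoid per label, so several labels can be active at once."""
    return [1.0 / (1.0 + math.exp(-(b + sum(w * v for w, v in zip(ws, x)))))
            for ws, b in zip(weights, bias)]


def predict(seq, params):
    """Map a sequence of word vectors to one probability per label."""
    pooled = global_max_pool(conv1d_relu(seq, params["filters"], params["conv_bias"]))
    return dict(zip(LABELS, dense_sigmoid(pooled, params["out_w"], params["out_b"])))


def random_params(dim, kernel=3, n_filters=4, seed=0):
    """Untrained, randomly initialised weights (illustrative only)."""
    rng = random.Random(seed)
    return {
        "filters": [[[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                     for _ in range(kernel)] for _ in range(n_filters)],
        "conv_bias": [0.0] * n_filters,
        "out_w": [[rng.uniform(-0.5, 0.5) for _ in range(n_filters)] for _ in LABELS],
        "out_b": [0.0] * len(LABELS),
    }


# A made-up comment of 7 tokens with 5-dimensional vectors (real vectors have 300).
rng = random.Random(1)
comment = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(7)]
scores = predict(comment, random_params(dim=5))
for label, p in scores.items():
    print(f"{label:14s} {p:.3f}")
```

Because the weights are random, the printed probabilities carry no meaning; the point is only the shape of the computation, in which each label gets its own output rather than competing in a single softmax.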

FastText is a free, open-source, lightweight library that allows users to learn text representations and text classifiers. It works on standard, generic hardware, and models can later be reduced in size to fit even on mobile devices. It provides pre-trained word vectors for 294 languages, trained on Wikipedia using fastText, of which we used “Simple English”. The CNN itself is fairly general and may also give good results if these vectors are replaced with the word vectors of any other provided language. These 300-dimensional vectors were obtained using the skip-gram model described in Bojanowski et al. (2016) with default parameters.
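The sub-word information used by the skip-gram model of Bojanowski et al. comes from character n-grams: a word is wrapped in the boundary markers < and > and split into overlapping n-grams, and its vector is built from the vectors of those n-grams. The sketch below only extracts the n-grams, using lengths 3 to 6 as in the paper; the function name char_ngrams and the example words are illustrative assumptions, and this is not the fastText implementation itself.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of a word wrapped in boundary markers."""
    wrapped = "<" + word + ">"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


# The example used in the paper: the 3-grams of "where".
print(char_ngrams("where", 3, 3))

# A misspelled word still shares n-grams with the correct spelling,
# which is why sub-word vectors help on noisy user comments.
correct = set(char_ngrams("stupid"))
misspelt = set(char_ngrams("stupidd"))
shared = correct & misspelt
print(len(shared), "shared n-grams out of", len(correct | misspelt))
```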

Fig. User comment analysis

The main goal of our approach is to automatically identify application features mentioned in user reviews, as well as the sentiments and opinions associated with these features. Each comment is assigned to one or more of the following six categories:

toxic
severe_toxic
obscene
threat
insult
identity_hate

The approach merges three techniques:

Natural Language Processing,
Text Analysis, and
Sentiment Analysis

to automatically classify reviews into the categories listed above.
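Since a single comment can belong to several of the six categories at once, the target for each comment can be written as a binary vector with one entry per category. The sketch below shows this encoding together with the mean binary cross-entropy, a common loss for multi-label problems. The report does not state which loss was used, so the loss, the helper names encode_labels and binary_cross_entropy, and the example probabilities are all assumptions.

```python
import math

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def encode_labels(active):
    """Turn a collection of category names into a 0/1 vector ordered like LABELS."""
    active = set(active)
    unknown = active - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in active else 0 for name in LABELS]


def binary_cross_entropy(targets, probs, eps=1e-7):
    """Mean binary cross-entropy over the labels of one comment."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(targets)


target = encode_labels({"toxic", "insult"})   # a comment that is toxic and insulting
clean = encode_labels(set())                  # a comment with no toxic behavior

good_guess = [0.9, 0.1, 0.1, 0.1, 0.9, 0.1]   # made-up model outputs
bad_guess = [0.1, 0.9, 0.9, 0.9, 0.1, 0.9]
print(target, clean)
print(binary_cross_entropy(target, good_guess) < binary_cross_entropy(target, bad_guess))
```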

(src: "When Users Become Collaborators: Towards Continuous and Context Aware User Input")

Evaluation Study

The CNNs are fed pre-processed, pre-trained English-language word vectors.
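The pre-trained vectors are distributed as text files in which the first line gives the vocabulary size and the dimension, and every following line holds a word followed by its components. A minimal sketch of reading such a file and turning a tokenized comment into the matrix a CNN consumes is given below. The function names load_vec and embed, the tiny four-dimensional sample, and the choice to map unknown words to a zero vector are assumptions for illustration; the report does not describe how out-of-vocabulary words were handled.

```python
import io


def load_vec(handle, limit=None):
    """Parse word vectors in the text (.vec) layout: header line, then 'word v1 v2 ...'."""
    header = handle.readline().split()
    dim = int(header[1])  # header[0] is the vocabulary size
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
        if limit is not None and len(vectors) >= limit:
            break
    return dim, vectors


def embed(tokens, vectors, dim):
    """Look up each token; unknown words become a zero vector (an assumption)."""
    zero = [0.0] * dim
    return [vectors.get(tok, zero) for tok in tokens]


# A made-up sample in the same layout; real files have 300 components per word.
SAMPLE = "3 4\nyou 0.1 0.2 0.3 0.4\nare 0.0 -0.1 0.2 0.5\nkind -0.3 0.4 0.1 0.0\n"
dim, vectors = load_vec(io.StringIO(SAMPLE))
matrix = embed("you are very kind".split(), vectors, dim)
print(len(matrix), "tokens x", dim, "dimensions")
```

With a real download, the io.StringIO object would be replaced by an opened file with UTF-8 encoding, and the limit argument can cap memory use by keeping only the first words of the file.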

Fig. Comment lengths in the train and test data after normalization
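If the length normalization shown in the figure means bringing every comment to a common number of tokens before it is fed to the CNN (an assumption; the report does not describe this step), it could look like the following sketch. The function name pad_or_truncate, the padding id 0, and the example lengths are hypothetical.

```python
def pad_or_truncate(token_ids, max_len, pad_id=0):
    """Cut long comments and right-pad short ones so all have max_len entries."""
    clipped = token_ids[:max_len]
    return clipped + [pad_id] * (max_len - len(clipped))


short_comment = [5, 6, 7]
long_comment = [1, 2, 3, 4, 5, 6]
batch = [pad_or_truncate(c, max_len=4) for c in (short_comment, long_comment)]
print(batch)
```

A fixed length lets comments be stacked into one batch; the cost is that text beyond the cut-off is ignored, so the cut-off is usually chosen from the observed length distribution.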

Results

Fig.

Related Work

In [1] and [2], the authors approach the same problem in an app-store-specific scenario, using the proprietary MATLAB environment together with common libraries (such as NLTK). Their motive was to achieve better organization and classification of platform feedback and specific reviews. We have also used the sentiment score described in the paper for pre-processing. The authors of [3] present a design that is friendlier to the development team of the app or program, in which reviews are classified specifically as an app request, a bug report, or a simple comment. Our project aligns more closely with [4], which generalizes user feedback systems and takes a software engineering approach to software modeling and evolution.

Conclusions and Future Direction

The current scope of the project is to provide the toxicity level of user comments in the system. This is very helpful for separating useful reviews from toxic ones. It can be used as the first layer of filtration for applications such as app review classification. Further, because the resulting system gives the toxicity level of user comments, it could also be extended to help tackle fake news based on context.
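Used as a first filtering layer, the classifier's per-category scores can be thresholded to set toxic comments aside before any further review analysis. The sketch below assumes each comment already comes with a dictionary of category probabilities; the names is_toxic and split_reviews, the 0.5 threshold, and the example scores are hypothetical and would need tuning on real data.

```python
def is_toxic(scores, threshold=0.5):
    """A comment is flagged if any category probability reaches the threshold."""
    return any(p >= threshold for p in scores.values())


def split_reviews(scored_reviews, threshold=0.5):
    """Separate comments worth processing further from those flagged as toxic."""
    keep, flagged = [], []
    for text, scores in scored_reviews:
        (flagged if is_toxic(scores, threshold) else keep).append(text)
    return keep, flagged


# Made-up scores standing in for model output.
scored = [
    ("The table in section 2 is outdated.", {"toxic": 0.02, "insult": 0.01}),
    ("You are an idiot.", {"toxic": 0.91, "insult": 0.88}),
]
keep, flagged = split_reviews(scored)
print(keep)
print(flagged)
```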

References

[1] W. Maalej and H. Nabil, “Bug report, feature request, or simply praise? On automatically classifying app reviews,” in Proc. of the International Requirements Engineering Conference, 2015, pp. 116–125.
[2] E. Guzman and W. Maalej, “How do users like this feature? A fine grained sentiment analysis of app reviews,” in Requirements Engineering Conference (RE), 2014 IEEE 22nd International, 2014, pp. 153–162.
[3] S. Panichella, A. Di Sorbo, E. Guzman, C. Visaggio, G. Canfora, and H. Gall, “How can I improve my app? Classifying user reviews for software maintenance and evolution,” in Software Maintenance and Evolution (ICSME), 2015 IEEE International Conference on, Sept 2015, pp. 281–290.
[4] L. V. Galvis Carreño and K. Winbladh, “Analysis of user comments: An approach for software requirements evolution,” in ICSE ’13 Proceedings of the 2013 International Conference on Software Engineering, IEEE Press, May 2013, pp. 582–591.
[5] K. Herzig, S. Just, and A. Zeller, “It’s not a bug, it’s a feature: How misclassification impacts bug prediction,” in Proceedings of the 2013 International Conference on Software Engineering, IEEE Press, 2013, pp. 392–401.
[6] W. Maalej, H.-J. Happel, and A. Rashid, “When users become collaborators: Towards continuous and context aware user input,” in Proceeding of the 24th ACM SIGPLAN conference companion on Object oriented programming systems languages and applications (OOPSLA ’09), ACM Press, 2009, p. 981.
[7] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, “Enriching word vectors with subword information,” 2016.
[8] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up? Sentiment classification using machine learning techniques,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2002, pp. 79–86.
[9] P. D. Turney, “Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews,” in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 417–424.
[10] L. Terveen, W. Hill, B. Amento, D. McDonald, and J. Creter, “PHOAKS: A system for sharing recommendations,” Communications of the ACM, 40(3):59–62, 1997.
[11] R. González-Ibáñez, S. Muresan, and N. Wacholder, “Identifying sarcasm in Twitter: A closer look,” in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (HLT ’11): Short Papers, Volume 2, Association for Computational Linguistics, Stroudsburg, PA, USA, 2011, pp. 581–586.

External Links

Wiki info page for open datasets - https://meta.wikimedia.org/wiki/Research:Detox/Data_Release
Selected dataset for project - https://figshare.com/articles/Wikipedia_Talk_Labels_Toxicity/4563973
Fasttext - https://fasttext.cc/
Fasttext Word Vectors - https://fasttext.cc/docs/en/pretrained-vectors.html
