2.2 Sentiment Analysis

Sentiment analysis is the study of the opinions, emotions, and attitudes of respondents. It often relies on polarity, which refers to the binary subjective impression a respondent expresses. For instance, it can identify whether a respondent's review is positive or negative.

Sentiment analysis still requires many of the usual text analytics steps, such as tokenization, part-of-speech (POS) tagging, and word filtering. This study used the default settings for both the rule-based model and the statistical model:

Statistical Model Settings

Rule-based Model Settings

Corpus

Under the corpus, rename the project, for example to “Hotel_Reviews”. Then import the reviews into the “positive” and “negative” directories respectively.

Statistical Models

It is a form of supervised learning, typically performed at the document level. It generates basic rules that can be further modified or expanded with custom rules, and it is effective at predicting the overall sentiment of a document.
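To make the idea of supervised, document-level classification concrete, here is a minimal sketch in Python. It uses a multinomial Naive Bayes classifier with add-one smoothing, which is an assumption chosen for illustration and not necessarily the algorithm SAS Sentiment Analysis Studio uses. The training reviews and the names `train`, `predict`, and `TRAINING` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(docs):
    """Count words per label and documents per label from (text, label) pairs."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, doc_counts


def predict(text, word_counts, doc_counts):
    """Return the label with the highest Naive Bayes log-probability."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        counts = word_counts[label]
        total_words = sum(counts.values())
        # Prior: share of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        for token in tokenize(text):
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[token] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled reviews, standing in for the "positive" and
# "negative" directories of the corpus.
TRAINING = [
    ("The room was clean and the staff were friendly", "positive"),
    ("Great location and wonderful breakfast", "positive"),
    ("The room was dirty and the staff were rude", "negative"),
    ("Terrible service and noisy room", "negative"),
]
model = train(TRAINING)
```

Calling `predict("friendly staff and wonderful location", *model)` assigns one label to the whole review, which is what document-level sentiment means here.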

Rule-Based Model

It is a form of unsupervised learning, typically performed at the sentence level. It extracts sentiment at the product and feature levels; feature-level sentiment shows exactly what the customer likes or dislikes. Rule-based sentiment analysis has two approaches: 1) lexicon-based and 2) syntactic-based.

The lexicon-based approach relies on attribute words mentioned in the sentence; for example, the hotel quality is "outstanding". The syntactic-based approach involves defining part-of-speech tags (nouns, verbs, adverbs, etc.) and conjunction words (AND, NOR, OR, NOT, BUT, etc.).
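The lexicon-based idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word list `LEXICON`, the negator set `NEGATORS`, and the function `sentence_polarity` are hypothetical, and the single-word negation check is a much simpler stand-in for real syntactic rules.

```python
import re

# Hypothetical sentiment lexicon: +1 for positive words, -1 for negative ones.
LEXICON = {
    "outstanding": 1, "good": 1, "clean": 1, "friendly": 1,
    "bad": -1, "dirty": -1, "rude": -1, "noisy": -1,
}
# Words that flip the polarity of the sentiment word right after them.
NEGATORS = {"not", "no", "never"}


def sentence_polarity(sentence):
    """Sum lexicon scores over a sentence, flipping a score after a negator."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = 0
    for i, token in enumerate(tokens):
        if token in LEXICON:
            value = LEXICON[token]
            if i > 0 and tokens[i - 1] in NEGATORS:
                value = -value
            score += value
    return score
```

With this sketch, "The hotel quality is outstanding" scores above zero, while "The room was not clean" scores below zero because the negator flips the lexicon entry.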

Rule-Based (Rule Types)

There are six types of rules in SAS Sentiment Analysis Studio:

CLASSIFIER
- The rule matches specific words or strings found in the test documents. When a match occurs, the entire word or string is highlighted and assigned the specified weight.
CONCEPT
- The rule offers the same functions as CLASSIFIER. In addition, it can match different forms of a word, such as “go”, “goes”, or “went”, and it can reference part-of-speech tags. The “_def” marker can be used to match a defined category.
  - Go@ -> Go, goes, going etc.
  - I love _def{places} -> I love Singapore, I love Orchard etc.
C_CONCEPT
- The rule matches words in a particular order or context, by specifying that the match be preceded by, followed by, or surrounded by other text.
  - _c{I love} Sentosa -> I love Sentosa etc.
  - Fabulous _c{Hotels} -> Fabulous Hotels
CONCEPT_RULE
- The rule uses wildcards to match words that occur together. Several Boolean operators (AND, OR, ORD, DIST_n, ORDDIST_n, SENT, ALIGNED, UNLESS) can be used to specify how the words are linked. With “AND”, all the specified arguments must be present for the match to be valid.
  - (AND, “best”, “_c{place}”) -> This is the best place to stay etc.
- With “DIST_n”, all the specified arguments must occur within n words of one another for the match to be valid.
  - (DIST_3, “wonderful”, “_c{staycation}”) -> Wonderful place for staycation etc.
PREDICATE_RULE
- The rule uses multiple markers to define the relationship between two or more concepts, in no specific order. Boolean operators are used to link the words together.
  - (SENT, “_a{good}”, “_b{restaurant}”) -> The restaurant is good etc.
  - (DIST_3, “_a{shopping}”, “_b{paradise}”) -> Paradise for shopping lovers etc.
REGEX
- The rule uses a regular expression to match specific words, strings, or patterns.
  - [abc] -> Any single character a, b, or c in lowercase
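
The matching behaviour of the AND and DIST_n operators and of a REGEX character class can be approximated in plain Python, as sketched below. This is not SAS rule syntax: the helpers `matches_and` and `matches_dist` are hypothetical, and counting distance as the difference between token positions is an assumption about how n is measured.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def matches_and(text, *terms):
    """AND-like check: every term must appear somewhere in the text."""
    tokens = set(tokenize(text))
    return all(term.lower() in tokens for term in terms)


def matches_dist(text, n, first, second):
    """DIST_n-like check: the two terms occur within n token positions."""
    tokens = tokenize(text)
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(abs(a - b) <= n
               for a in first_positions
               for b in second_positions)


# REGEX-like check: [abc] matches any single lowercase a, b, or c.
single_chars = re.findall(r"[abc]", "cab ride")
```

Under these assumptions, `matches_dist("Wonderful place for staycation", 3, "wonderful", "staycation")` holds because the two words are three positions apart, while a sentence that puts more words between them would not match.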
