Sentiment Analysis Using NLTK
The modules used for this basic sentiment analysis project are:
Using the Pre-Loaded data from NLTK
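import matplotlib.pyplot as plt
from nltk.corpus import twitter_samples

# Requires the corpus to be available locally: nltk.download('twitter_samples')
# Load the pre-labelled positive and negative tweets bundled with NLTK
all_positive_tweets = twitter_samples.strings('positive_tweets.json')
all_negative_tweets = twitter_samples.strings('negative_tweets.json')

# Plot the class balance as a pie chart
fig = plt.figure(figsize=(5, 5))
labels = 'Positive', 'Negative'
sizes = [len(all_positive_tweets), len(all_negative_tweets)]
plt.pie(sizes, labels=labels, autopct='%1.1f%%', shadow=True, startangle=90)
plt.axis('equal')  # equal aspect ratio so the pie is drawn as a circle
plt.show()

Most of the data we receive in the real world is dirty. It needs a lot of pre-processing and cleaning before it can be used.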
Regular expressions are used to pre-process the data:
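import re
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import TweetTokenizer

# `tweet` is assumed to hold one raw tweet string, e.g. an element of all_positive_tweets
# Remove the old-style retweet marker "RT" at the start of the tweet
tweet2 = re.sub(r'^RT[\s]+', '', tweet)
# Remove hyperlinks (this pattern also drops any text after the link on the same line)
tweet2 = re.sub(r'https?:\/\/.*[\r\n]*', '', tweet2)
# Remove only the hash sign, keeping the hashtag word itself
tweet2 = re.sub(r'#', '', tweet2)

Tokenization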
# Lowercase the text, strip @handles, and shorten long runs of repeated characters
tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True, reduce_len=True)
tweet_tokens = tokenizer.tokenize(tweet2)
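
To try the regex cleaning on its own, the substitutions can be wrapped in a small helper that needs only the standard library. This is a minimal sketch, not part of the original project: the function name clean_tweet and the sample tweet are made up for illustration, and the link pattern is swapped for a narrower one (`\S+`) so that text following a link is kept rather than dropped. It also collapses the extra whitespace left behind by the removals.

```python
import re


def clean_tweet(tweet: str) -> str:
    """Hypothetical helper: strip retweet markers, links, and hash signs."""
    # Drop the old-style retweet marker "RT" at the start of the tweet.
    text = re.sub(r'^RT[\s]+', '', tweet)
    # Drop hyperlinks; \S+ stops at the first whitespace,
    # so any text after the link survives (an assumption of this sketch).
    text = re.sub(r'https?://\S+', '', text)
    # Drop only the hash sign, keeping the hashtag word itself.
    text = re.sub(r'#', '', text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r'\s+', ' ', text).strip()


# Made-up sample tweet, for illustration only.
sample = "RT @user: Loving this #sunny day https://example.com/abc :)"
print(clean_tweet(sample))
```

The @user handle is left in place here on purpose, since TweetTokenizer removes handles in the tokenization step when strip_handles=True is set.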