For Students

Notice: Twitter Sentiment has changed to Sentiment140. The documentation on this page is deprecated.

Is the code open source?

Unfortunately the code isn't open source. There are a few tutorials with open source code that have similar implementations to ours:

Want to discuss ideas?

We have a special forum for discussing ideas here:

Do you have any project ideas?

If you are new to the field of sentiment analysis, I recommend reading the following by Pang and Lee:

There are still many unsolved problems in sentiment analysis. If you're interested, you can help us by working on one of the problems below.
  • Building a classifier for subjective vs. objective tweets. We've focused mostly on classifying positive vs. negative correctly. We haven't looked at classifying tweets with sentiment vs. no sentiment very closely.
  • Handling negation. Words like no, not, and never are difficult to handle properly.
    • Relevant papers:
      • Isaac G. Councill, Ryan McDonald, and Leonid Velikovich. 2010. What's great and what's not: learning to classify the scope of negation for improved sentiment analysis. [pdf]
      • Potts, Christopher. 2010. On the negativity of negation. [pdf]
  • Handling comparisons. Our bag of words model doesn't handle comparisons very well. For example, the tweet "Stanford is better than Berkeley" would be considered positive for both Stanford and Berkeley, because a bag of words model doesn't take into account how each entity relates to the word "better".
  • The "aboutness" problem. Given a tweet, automatically detect whether the sentiment is directed at a given entity.
    • Example:
      • about the term [Google]: "I love Google."
      • not about the term [Google]: "You should Google that."
    • Relevant papers:
      • Target-dependent Twitter Sentiment Classification [pdf]
  • Determining context switches. Sometimes tweets contain two different ideas. It would be good to be able to separate these ideas. Here's an example: "Just chomped my way through a massive apple, was pretty tasty. Now for work. Business revision."
  • Building an accurate parser for tweets. Dependency parsers, like the Stanford Parser, don't handle ungrammatical text very well because they were trained on corpora like the Wall Street Journal. It would be great to develop a parser that can handle informal text better.
  • Sarcasm detection.
  • Topic classification for tweets.
  • Tag clouds. Given a list of positive and negative tweets, what are the most meaningful words to put in a tag cloud?
  • Applying sentiment analysis to Facebook messages. Facebook messages don't have the same character limitations as Twitter, so it's unclear if our methodology would work on Facebook messages.
  • Internationalization. We focus only on English sentences, but Twitter has many international users. It should be possible to use our approach to classify sentiment in other languages.
  • Sentiment as it relates to religion. Please contact Greg Troxell (gtroxell65 [at] if you're interested in this.
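Two of the problems above, negation and comparisons, can be illustrated with a tiny bag of words scorer. This is only a minimal sketch and not our classifier: the word list, the function names, and the scoring rule are all assumptions made up for the example. The negation fix shown is a common heuristic (mark every word between a negation word and the next punctuation mark, then flip its polarity), not a solution to the scope problem discussed in the papers above.

```python
import re

# Hypothetical toy lexicon, for illustration only (not the Sentiment140 model).
LEXICON = {"good": 1, "great": 1, "love": 1, "better": 1, "tasty": 1,
           "bad": -1, "awful": -1, "hate": -1, "worse": -1}
NEGATIONS = {"no", "not", "never"}
PUNCTUATION = set(".,!?;")


def tokenize(text):
    """Lowercase the text and split it into words and punctuation marks."""
    return re.findall(r"[a-z']+|[.,!?;]", text.lower())


def bow_score(tokens):
    """Bag of words score: sum of word polarities, ignoring word order."""
    return sum(LEXICON.get(tok, 0) for tok in tokens)


def mark_negation(tokens):
    """Append _NEG to each word between a negation word and the next punctuation mark."""
    marked, negated = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False          # punctuation ends the assumed negation scope
            marked.append(tok)
        elif tok in NEGATIONS:
            negated = True           # start a new negation scope
            marked.append(tok)
        else:
            marked.append(tok + "_NEG" if negated else tok)
    return marked


def negation_aware_score(tokens):
    """Like bow_score, but flip the polarity of words marked with _NEG."""
    score = 0
    for tok in mark_negation(tokens):
        if tok.endswith("_NEG"):
            score -= LEXICON.get(tok[:-len("_NEG")], 0)
        else:
            score += LEXICON.get(tok, 0)
    return score


def entity_score(text, entity):
    """Score a tweet 'for' an entity the way a plain bag of words model would:
    if the entity is mentioned, it simply inherits the whole tweet's score."""
    tokens = tokenize(text)
    return bow_score(tokens) if entity.lower() in tokens else None


# Negation: the plain model sees "good" and ignores "not".
tokens = tokenize("This movie is not good.")
print(bow_score(tokens))             # 1
print(negation_aware_score(tokens))  # -1

# Comparison: both entities get the same positive score.
tweet = "Stanford is better than Berkeley"
print(entity_score(tweet, "Stanford"))  # 1
print(entity_score(tweet, "Berkeley"))  # 1
```

The comparison case is left unsolved on purpose: because word order is discarded, nothing in this representation can tell which entity "better" applies to, which is why that problem is on the list.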
If you want more details or want to brainstorm with us, please let us know.