Lab 1 - Twitter Client

Due - Thursday 9/2/2010 - 12:45pm

In this lab you will write a basic client program to access and update Twitter. Your program will query for a term and attempt to identify the most-discussed topics associated with the query. For example, you might query for "san francisco" and discover that a large percentage of the tweets returned refer to an upcoming marathon, while several also refer to a recent earthquake. You will then post an update with your findings -- for example, "popular topics related to san francisco: marathon, earthquake".

You will use raw sockets for part of the communication. You will also practice parsing data in either XML or JSON format.

Functionality

    1. Search - Your program will use raw sockets (java.net.Socket) to issue a query to the Twitter web service. The query phrase is configurable, and will be specified as described below. You may request results be returned in either JSON or Atom (XML) format, whichever you prefer to parse. See: http://apiwiki.twitter.com/Twitter-API-Documentation for more information about the Twitter Search API. Keep in mind that you should use HTTP 1.1 for communication, which requires you to specify the Host header field.
    2. Processing - Your program will extract only the status updates returned (ignore user ids, geo tags, etc.) and process them to find the words that appear in multiple tweets. You may use a basic algorithm for this purpose; for example, simply count the number of times each word appears. You may, however, want to ignore common words like "and" and "if".
    3. Update - Your program will post the results as a new tweet. You may use java.net.HttpURLConnection rather than raw sockets for the post.
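
The Search and Processing steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a reference solution: the class name SearchSketch, its method names, and the short stop-word list are hypothetical, and the host and path are left as parameters because the real values must come from the Twitter API documentation. The sketch builds an HTTP 1.1 GET request with the required Host header, sends it over a java.net.Socket, and counts how many tweets each word appears in.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.Socket;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper class; all names here are illustrative assumptions.
class SearchSketch {

    // Illustrative stop-word list (an assumption); extend as needed.
    private static final Set<String> STOP_WORDS = new HashSet<String>(
            Arrays.asList("and", "if", "the", "a", "an", "of", "to", "in", "is"));

    /** Builds the text of an HTTP/1.1 GET request. Host and path are supplied by the caller. */
    static String buildSearchRequest(String host, String path, String query)
            throws UnsupportedEncodingException {
        String encoded = URLEncoder.encode(query, "UTF-8"); // spaces become '+'
        return "GET " + path + "?q=" + encoded + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"       // mandatory in HTTP 1.1
                + "Connection: close\r\n"        // ask the server to close when done
                + "\r\n";                        // blank line ends the headers
    }

    /** Sends the request over a raw socket on port 80 and returns the raw response text. */
    static String send(String host, String request) throws IOException {
        Socket socket = new Socket(host, 80);
        try {
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes("UTF-8"));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), "UTF-8"));
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                response.append(line).append('\n');
            }
            return response.toString(); // status line, headers, and body together
        } finally {
            socket.close();
        }
    }

    /** Counts, for each non-stop word, the number of tweets it appears in. */
    static Map<String, Integer> countWords(List<String> tweets) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String tweet : tweets) {
            Set<String> seen = new HashSet<String>(); // count a word once per tweet
            for (String word : tweet.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
                if (word.isEmpty() || STOP_WORDS.contains(word) || !seen.add(word)) {
                    continue;
                }
                Integer old = counts.get(word);
                counts.put(word, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }
}
```

The string returned by send still contains the status line and response headers, and an HTTP 1.1 server may use chunked transfer encoding for the body, so a real client would need to separate the headers from the body before handing the JSON or Atom text to a parser. Counting each word once per tweet is one possible reading of "words that appear in multiple tweets"; counting every occurrence is equally acceptable under the description above.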

Requirements and Hints

    1. Java is the recommended programming language for this assignment. You may choose to implement your program in another language (C, python, ruby) if you meet the following conditions:
      1. you receive explicit permission, via email, from the instructor;
      2. you provide explicit instructions for running your program in Linux (on the lab machines), ideally using a run script;
      3. you understand that the instructor and TA may or may not assist you with bugs, library installs, or other difficulties you encounter. Regardless of language, you must use raw sockets for the Search portion of the assignment.
    2. You will be graded on your code design and documentation, as well as the functionality of your program.
    3. In order to post status updates, you will need to use the OAuth authentication mechanism appropriately.
      1. Offline you will need to request a consumer key and a consumer secret from Twitter: http://twitter.com/oauth_clients
      2. You will then need to request a token and token secret. Signpost will help you with this. It is recommended you see this example: http://oauth-signpost.googlecode.com/files/OAuthTwitterExample.zip
      3. You will use your consumer key, consumer secret, token, and token secret every time you run your program. (Configuration details are below.) See this page: http://code.google.com/p/oauth-signpost/issues/detail?id=15 for a workaround to help you correctly authenticate a POST request. Note: you will also have to use the setTokenWithSecret method to set the token and token secret appropriately.
    4. When your program begins, it will read configuration information from a config file. You may use either the JSON-formatted config file or the XML-formatted config file. Examples of both are attached below. You may not change the format or name of either file.
    5. If you choose to use JSON for either the config file or the return format of the search, it is recommended that you use the Argo JSON Parser (http://argo.sourceforge.net/). If you choose to use a different parser, you must provide the appropriate jar files when you submit your assignment.
    6. You must submit all .java and .class files in a jar named lab1.jar. Submit this file in a directory cs682/lab1 in your svn repository. Assuming config.json or config.xml is available in the directory where the program is run and that the jars available in /home/public/cs682/lab1 are in my classpath, I must be able to run your program as follows:

java -cp lab1.jar TwitterClient