Project 1 - Twitter v1

Due Tuesday 9/27/2011, 11:59PM

For Project 1, you will implement version 1 of a Twitter-like microblogging service. Your service will use a two-tier architecture. The replicated front end will consist of one or more hosts, each running a multithreaded HTTP server. You must implement the HTTP server yourself. The back end will consist of a data server that communicates with the front end, stores tweets, and provides tweets to the front end when requested.

User Interaction

Your service will provide very minimal functionality. There will be no user accounts; users can post status messages that may contain hashtags and can search the entire database by hashtag. Valid user requests will be formatted as follows:

    1. POST /status/update?status=<a status containing one or more #tag items> HTTP/1.1 - The POST URI is /status/update and the POST must include a status parameter. The parameter may appear in the URI as shown here, or in the body of the POST request as described in the HTTP specification. The parameter is the actual text of the tweet, and may contain one or more hashtags. The HTTP version must be specified as HTTP/1.1. If a user issues a correctly formatted POST request, the server will reply with HTTP status code 204 No Content.
    2. GET /search?q=<tagnameForQuery> HTTP/1.1 - The GET URI is /search and the GET must include a q parameter in the URI. The value of the parameter is the hashtag on which the user wishes to search. Again, the HTTP version must be specified as HTTP/1.1. If a user issues a correctly formatted GET request, the server will reply with HTTP status code 200 OK and the body of the response will be XML formatted as shown below:

<tweets query="tag1" cached="yes/no">
<tweet>first status for #tag1 today</tweet>
<tweet>second status for #tag1 today</tweet>
<tweet>third status for #tag1 today</tweet>
<tweet>#tag1 #tag2</tweet>
</tweets>

The document is a collection of tweets. The root element has a query attribute that repeats the search term and a cached attribute that indicates whether the data served was cached at the front end or retrieved from the back end (see below for details regarding caching). Each child of the root represents one tweet. If you prefer to use JSON as your response format, speak with the instructor; you may use JSON only if you receive explicit permission via email.
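One way to produce this document is sketched below. TweetXml and its methods are hypothetical names, not part of any required API. Because tweet text is arbitrary user input, characters such as & and < must be escaped, or a status like "fish & chips #food" would yield malformed XML.

```java
import java.util.List;

/** Hypothetical helper that renders search results in the XML format above. */
final class TweetXml {
    private TweetXml() {}

    /** Escapes the five characters that are special in XML text and attribute values. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds the <tweets> document; cached selects the "yes" or "no" attribute value. */
    static String render(String query, boolean cached, List<String> tweets) {
        StringBuilder sb = new StringBuilder();
        sb.append("<tweets query=\"").append(escape(query))
          .append("\" cached=\"").append(cached ? "yes" : "no").append("\">\n");
        for (String t : tweets) {
            sb.append("<tweet>").append(escape(t)).append("</tweet>\n");
        }
        sb.append("</tweets>\n");
        return sb.toString();
    }
}
```

Building the string by hand keeps the sketch dependency-free. A DOM or StAX writer from the JDK would handle escaping for you at the cost of more setup.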

Front End HTTP Server

Your front end will be a multithreaded HTTP server that receives requests from a web client. You may use Apache HttpComponents (http://hc.apache.org/). You may not use a web framework such as Servlets, Restlets, Rails, Django, or Pylons, though you may reuse code from 601 or other courses you have taken at USF. Your server must provide the following functionality:

    1. Process multiple requests concurrently. You may use any of the classes in java.util.concurrent for this purpose.
    2. Ensure that all requests are correctly formatted.
    3. Respond using correctly-formatted HTTP.
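A minimal sketch of request validation follows, assuming the server has already read the request line and, for a POST, the body. RequestCheck, statusFor, and statusLine are hypothetical names. The 200 and 204 codes come from this handout. The error codes (400 for malformed requests, 404 for an unknown URI, 405 for methods such as PUT and DELETE, 505 for a version other than HTTP/1.1) are assumed choices, not requirements. Note that a client must send a hashtag in a URI as %23, because an unescaped # starts the URI fragment, which clients do not transmit.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical validator for the two request types in this spec. */
final class RequestCheck {
    private RequestCheck() {}

    /** Parses "a=1&b=2" into a map, URL-decoding names and values ('+' and %20 become spaces). */
    static Map<String, String> parseParams(String query) {
        Map<String, String> params = new HashMap<>();
        if (query == null || query.isEmpty()) return params;
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) continue; // ignore pairs with no '=' or an empty name
            params.put(URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8),
                       URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8));
        }
        return params;
    }

    /** Status code for a request line such as "GET /search?q=tag1 HTTP/1.1"; body may be empty. */
    static int statusFor(String requestLine, String body) {
        String[] parts = requestLine.trim().split(" ");
        if (parts.length != 3) return 400;                 // malformed request line
        String method = parts[0], target = parts[1];
        if (!parts[2].equals("HTTP/1.1")) return 505;      // only HTTP/1.1 is accepted
        if (!method.equals("GET") && !method.equals("POST")) return 405; // e.g. PUT, DELETE
        int qmark = target.indexOf('?');
        String path = qmark < 0 ? target : target.substring(0, qmark);
        try {
            Map<String, String> params = parseParams(qmark < 0 ? "" : target.substring(qmark + 1));
            if (method.equals("GET")) {
                if (!path.equals("/search")) return 404;
                return params.containsKey("q") ? 200 : 400;
            }
            if (!path.equals("/status/update")) return 404;
            // For a POST, the status parameter may be in the URI or in the body.
            if (!params.containsKey("status")) params.putAll(parseParams(body));
            return params.containsKey("status") ? 204 : 400;
        } catch (IllegalArgumentException e) {
            return 400;                                    // invalid %-escape in a parameter
        }
    }

    /** First line of the response, e.g. "HTTP/1.1 204 No Content\r\n". */
    static String statusLine(int code) {
        return "HTTP/1.1 " + code + " " + reason(code) + "\r\n";
    }

    private static String reason(int code) {
        switch (code) {
            case 200: return "OK";
            case 204: return "No Content";
            case 400: return "Bad Request";
            case 404: return "Not Found";
            case 405: return "Method Not Allowed";
            case 505: return "HTTP Version Not Supported";
            default:  return "Internal Server Error";
        }
    }
}
```

If you do return 405, HTTP/1.1 requires the response to carry an Allow header listing the supported methods (here GET, POST). A 204 response must not include a body.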

Below is the primary set of responses your server will generate:

In addition, your front end will cache results returned from the back end and respond using the cached results when appropriate. In order to support this functionality, the recommended implementation is as follows:

    1. The back end will maintain a version number that keeps track of the number of updates to the data store.
    2. When the front end executes a query on the back end, it sends the hashtag along with the version number of its cached result for that query, if it has one.
    3. If the back end has not been updated since the version specified by the version number, it will respond with a message indicating that the cached data is fresh.
    4. Otherwise, the back end will reply with the data, and the front end will cache a copy of the data to service later requests.
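The recommended steps can be sketched on the front end as follows, assuming a ReentrantReadWriteLock guards the cache. QueryReply, BackEnd, and FrontEndCache are hypothetical names; BackEnd stands in for whatever protocol you choose, and -1 is an assumed marker for "no cached copy".

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical back-end reply: fresh == true means "your cached copy is current". */
final class QueryReply {
    final boolean fresh;
    final long version;         // back-end version this reply corresponds to
    final List<String> tweets;  // null when fresh

    QueryReply(boolean fresh, long version, List<String> tweets) {
        this.fresh = fresh;
        this.version = version;
        this.tweets = tweets;
    }
}

/** Hypothetical stand-in for the front end/back end protocol. */
interface BackEnd {
    /** cachedVersion is -1 when the front end has no cached result for tag. */
    QueryReply query(String tag, long cachedVersion);
}

/** Front-end cache keyed by hashtag: concurrent readers, one writer at a time. */
final class FrontEndCache {
    static final class Result {
        final List<String> tweets;
        final boolean cached;   // becomes the cached="yes/no" attribute
        Result(List<String> tweets, boolean cached) { this.tweets = tweets; this.cached = cached; }
    }

    private static final class Entry {
        final long version;
        final List<String> tweets;
        Entry(long version, List<String> tweets) { this.version = version; this.tweets = tweets; }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final BackEnd backEnd;

    FrontEndCache(BackEnd backEnd) { this.backEnd = backEnd; }

    Result search(String tag) {
        Entry cachedEntry;
        lock.readLock().lock();
        try {
            cachedEntry = entries.get(tag);
        } finally {
            lock.readLock().unlock();
        }
        // Steps 2-3: send the tag and cached version; the back end may answer "fresh".
        QueryReply reply = backEnd.query(tag, cachedEntry == null ? -1 : cachedEntry.version);
        if (reply.fresh) {
            if (cachedEntry == null) throw new IllegalStateException("fresh reply without a cached copy");
            return new Result(cachedEntry.tweets, true);
        }
        // Step 4: cache the new data unless a concurrent search already stored newer data.
        lock.writeLock().lock();
        try {
            Entry current = entries.get(tag);
            if (current == null || reply.version >= current.version) {
                entries.put(tag, new Entry(reply.version, reply.tweets));
            }
        } finally {
            lock.writeLock().unlock();
        }
        return new Result(reply.tweets, false);
    }
}
```

No lock is held during the back-end call, so a slow query does not block other searches; the version comparison under the write lock keeps an older reply from overwriting newer data.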

Back End Data Server

The back end will be a separate, centralized process/program that maintains a database of all tweets. The design of the back end is largely up to you. The only requirement for the back end is that it must correctly handle concurrent requests to access and/or update data. It must support concurrent reads of the data but must prevent a read or write from happening during another write.
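The requirement "concurrent reads, but no read or write during another write" is exactly the guarantee of a read-write lock. The sketch below assumes java.util.concurrent.locks.ReentrantReadWriteLock and an in-memory HashMap. TweetStore is a hypothetical name, and the hashtag rule ('#' followed by letters, digits, or underscores) is an assumption. Each tweet is indexed under every tag it contains, matching the "#tag1 #tag2" result in the sample XML.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical back-end store: shared access for queries, exclusive access for posts. */
final class TweetStore {
    /** fresh == true means the caller's cached copy is still current. */
    static final class Reply {
        final boolean fresh;
        final long version;
        final List<String> tweets; // null when fresh
        Reply(boolean fresh, long version, List<String> tweets) {
            this.fresh = fresh;
            this.version = version;
            this.tweets = tweets;
        }
    }

    // Assumption: a hashtag is '#' followed by one or more word characters.
    private static final Pattern TAG = Pattern.compile("#(\\w+)");

    private final Map<String, List<String>> byTag = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private long version = 0; // number of updates so far; guarded by lock

    /** Distinct tags in a status, without the '#', in order of first appearance. */
    static Set<String> tagsIn(String status) {
        Set<String> tags = new LinkedHashSet<>();
        Matcher m = TAG.matcher(status);
        while (m.find()) tags.add(m.group(1));
        return tags;
    }

    /** Stores a tweet under each of its tags; no other read or write can run meanwhile. */
    void post(String status) {
        lock.writeLock().lock();
        try {
            for (String tag : tagsIn(status)) {
                byTag.computeIfAbsent(tag, k -> new ArrayList<>()).add(status);
            }
            version++;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Many queries may hold the read lock at once, but never during a post. */
    Reply query(String tag, long cachedVersion) {
        lock.readLock().lock();
        try {
            if (cachedVersion == version) return new Reply(true, version, null);
            List<String> hits = byTag.getOrDefault(tag, Collections.emptyList());
            return new Reply(false, version, new ArrayList<>(hits)); // copy, not a live view
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Because the version counts every update, any POST makes every cached tag stale, even tags it did not touch. That follows from the global version number described above and keeps the back end simple at the cost of some extra refetches.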

As you design the back end, consider the following:

    1. Communication with the front end - You may design your own communication protocol for front end/back end communication. Options include HTTP with XML/JSON, Java RMI, XML/JSON over raw sockets, or a custom format of your design.
    2. Data storage - For this version, your data need not be persistent. You may use a HashMap and/or write to a flat file.
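As one illustration of "a custom format of your design", the sketch below assumes a single-line text protocol in which every free-text field is URL-encoded, so fields never contain spaces or newlines and can be split on single spaces. The message names (QUERY, POST, FRESH, DATA, OK) and the WireFormat class are hypothetical; only the encoding and parsing are shown, not the socket code.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical line-based protocol (one message per line):
 *   QUERY <cachedVersion> <tag>  ->  FRESH <version>  or  DATA <version> <tweet> <tweet> ...
 *   POST <status>                ->  OK <version>
 */
final class WireFormat {
    private WireFormat() {}

    static String enc(String s) { return URLEncoder.encode(s, StandardCharsets.UTF_8); }
    static String dec(String s) { return URLDecoder.decode(s, StandardCharsets.UTF_8); }

    // Front end -> back end
    static String queryRequest(String tag, long cachedVersion) {
        return "QUERY " + cachedVersion + " " + enc(tag);
    }
    static String postRequest(String status) { return "POST " + enc(status); }

    /** Back-end side: splits a request line into its fields and decodes them. */
    static String[] parseRequest(String line) {
        String[] parts = line.split(" ", -1);
        for (int i = 1; i < parts.length; i++) parts[i] = dec(parts[i]);
        return parts;
    }

    // Back end -> front end
    static String freshReply(long version) { return "FRESH " + version; }
    static String dataReply(long version, List<String> tweets) {
        StringBuilder sb = new StringBuilder("DATA ").append(version);
        for (String t : tweets) sb.append(' ').append(enc(t));
        return sb.toString();
    }

    /** The version field of a FRESH, DATA, or OK reply. */
    static long parseVersion(String reply) { return Long.parseLong(reply.split(" ")[1]); }

    /** Tweets carried by a DATA reply, or null for a FRESH reply. */
    static List<String> parseTweets(String reply) {
        String[] parts = reply.split(" ", -1); // -1 keeps trailing empty fields
        if (parts[0].equals("FRESH")) return null;
        if (!parts[0].equals("DATA")) throw new IllegalArgumentException("unexpected reply: " + reply);
        List<String> tweets = new ArrayList<>();
        for (int i = 2; i < parts.length; i++) tweets.add(dec(parts[i]));
        return tweets;
    }
}
```

URL encoding was chosen only because the JDK already provides it; JSON or XML over the same socket would work equally well.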

Testing

You will implement a test client that thoroughly tests all elements of concurrency in your system. Your test client will, at minimum, do the following:

    1. Ensure that the web server handles multiple concurrent HTTP requests correctly.
    2. Ensure that the web server cache supports concurrent reads and disallows concurrent writes.
    3. Ensure that your data server handles multiple concurrent HTTP requests correctly.
    4. Ensure that your back end data store supports concurrent reads and disallows concurrent writes.
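Items 2 and 4 can be checked in-process with an instrumented store. The sketch below assumes a ReentrantReadWriteLock-guarded component and uses a hypothetical LockCheck class in its place. A CyclicBarrier that every reader must reach while holding the read lock proves that reads really overlap, because the barrier would time out if reads were serialized. Counters of active readers and writers detect any write that overlaps another access. The same pattern (a CountDownLatch releasing many threads at once) applies to items 1 and 3 with HTTP requests in place of direct calls.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical instrumented store that records any write overlapping another access. */
final class LockCheck {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final AtomicInteger readers = new AtomicInteger();
    private final AtomicInteger writers = new AtomicInteger();
    final AtomicInteger violations = new AtomicInteger();

    void read(Runnable action) {
        lock.readLock().lock();
        readers.incrementAndGet();
        try {
            if (writers.get() != 0) violations.incrementAndGet(); // read during a write
            action.run();
        } finally {
            readers.decrementAndGet();
            lock.readLock().unlock();
        }
    }

    void write(Runnable action) {
        lock.writeLock().lock();
        int active = writers.incrementAndGet();
        try {
            if (active != 1 || readers.get() != 0) violations.incrementAndGet(); // overlap
            action.run();
        } finally {
            writers.decrementAndGet();
            lock.writeLock().unlock();
        }
    }

    /** True if n readers manage to hold the read lock at the same time. */
    static boolean readsOverlap(LockCheck store, int n) {
        CyclicBarrier barrier = new CyclicBarrier(n);
        AtomicInteger met = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(n);
        for (int i = 0; i < n; i++) {
            pool.execute(() -> store.read(() -> {
                try {
                    barrier.await(2, TimeUnit.SECONDS); // trips only if all n are inside at once
                    met.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (BrokenBarrierException | TimeoutException e) {
                    // The barrier never filled, so reads were serialized.
                }
            }));
        }
        awaitQuietly(pool, 10);
        return met.get() == n;
    }

    /** Starts reader and writer threads together; returns the number of violations seen. */
    static int hammer(LockCheck store, int threads, int opsPerThread) {
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            boolean writer = t % 2 == 0; // half writers, half readers
            pool.execute(() -> {
                try {
                    start.await(); // release every thread at once to maximize contention
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < opsPerThread; i++) {
                    if (writer) store.write(() -> Thread.yield());
                    else store.read(() -> Thread.yield());
                }
            });
        }
        start.countDown();
        awaitQuietly(pool, 30);
        return store.violations.get();
    }

    private static void awaitQuietly(ExecutorService pool, long seconds) {
        pool.shutdown();
        try {
            pool.awaitTermination(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A zero violation count shows only that no overlap was observed in this run, so repeating the test with more threads and operations increases confidence. The barrier check, by contrast, is deterministic.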

This will likely require that you implement careful logging in all parts of your system. Consider using log4j.

You will also implement an auto-launch script. This script will be a Unix shell script that logs into three remote machines and launches the three components of your system: two web servers and one data server. This script must run on the Linux machines in HRN 235 or 530.

Submission Instructions

    1. All code and instructions for running must be submitted in a jar file named project1.jar. This file must be placed in your svn repository /cs682/project1.
    2. All students must sign up for a demonstration. A sign-up sheet will be made available closer to the deadline. During your demonstration, you will be asked to do the following:
      1. Run at least two instances of your web server on two different machines.
      2. Run your data server on a third machine.
      3. Demonstrate the following test cases - be prepared with an appropriate test client:
        1. Several correctly formatted POST requests.
        2. Several correctly formatted GET requests, including requests that retrieve cached data and requests that require data be fetched from the data server.
        3. An incorrectly formatted GET and POST.
        4. A GET/POST with an incorrect URI.
        5. A PUT request and a DELETE request.
      4. Provide an overview of your code design.
      5. Show specific elements of your code and be prepared to answer questions about any portion of your code.