
Project 1 - A Search Engine

Due May 17, 2011

The goal of this assignment is to practice design and implementation of a large web application.  For this project, you will develop a personalized search engine that authenticates users, allows them to perform searches, and maintains a history of all prior searches each user has performed. 

Your application will support the following features for a total of 100 points plus possible extra credit:
  1. Search (15 points) - The main page of your site will present the user with a text box where he/she can type a search query.  When the user submits the query, your application will search a backend InvertedIndex for the crawled pages that contain the query terms and will return an HTML page that lists links to all of those pages.
  2. User Registration (15 points) - In order to track a user's query history, you will allow users to register with your search application.  When a user visits the registration page, he/she will be asked to enter (at minimum) a username and password.  Your application will store this information and use it for login/logout.
  3. User Login/Logout (10 points) - Your site will allow a user to log in by entering a username and password.  If the user is not registered, you will ask him/her to register in order to log in.  If the user is registered, your site will use cookies to track the user's session. If the user logs out, you will clear the session.
  4. Search History (10 points) - Once a user is logged in, your site will save any queries the user enters.  Your site will provide a mechanism (e.g., a link) that will allow the user to view his/her search history.
  5. Account Maintenance (10 points) - Your site will allow a user to change his/her password and to clear his/her search history.
  6. Extra Features (40 points + possible extra credit)
    1. Page Preview (15 points) - In the search results provided by your site, you will show a few lines of the text from each result page.  This will require that you save a copy of each page you have crawled.   
    2. Page Visit History (15 points) - In addition to tracking a user's query history, maintain a list of the links a user has followed.  This will require that your search results page provide links that direct the user back to your site.  Your site will then record that the user has followed the link and redirect the user to the real page.
    3. Advanced Search (up to 15 points) - Allow the user to require that the query words appear next to one another in the result documents (i.e., allow the user to specify the query in quotation marks.)  Allow the user to specify a set of words such that pages containing the given words are not returned as part of the result set.
    4. Results Per Page (5 points) - Allow the user to select the number of results displayed on each page.
    5. Administrator Interface - New Crawl (10 points) - Provide an administrator interface that lets an administrator enter a new seed URL to start a new crawl.  Newly crawled pages will be added to the InvertedIndex.
    6. Administrator Interface - Shutdown (10 points) - Provide an administrator interface that lets an administrator gracefully shut down the server.
    7. Performance Testing (10 points) - Implement a client-side program that will determine how much load your server will tolerate.  For full credit, produce a graph showing a performance metric, for example the response time for each request.
    8. Choose Your Own (up to 15 points) - See me to suggest a feature and I will tell you how many points you will earn for implementing it.
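
As a starting point for the Search feature, the lookup step can be sketched as below.  This is only an illustration under stated assumptions: the class name SimpleInvertedIndex and its methods are hypothetical and are not the course's InvertedIndex, and the sketch assumes that a page matches only if it contains every query term (AND semantics), which the feature description does not strictly require.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a backend index; all names here are illustrative.
class SimpleInvertedIndex {
    // Maps each term to the sorted set of URLs whose text contains it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Lowercase the text, split on non-alphanumeric characters,
    // and record every resulting term for the given URL.
    void addPage(String url, String text) {
        for (String term : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(url);
            }
        }
    }

    // Return the URLs that contain every query term (assumed AND semantics),
    // in sorted order.  An empty query returns an empty list.
    List<String> search(String query) {
        Set<String> result = null;
        for (String term : query.toLowerCase().split("[^a-z0-9]+")) {
            if (term.isEmpty()) {
                continue;
            }
            Set<String> urls = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(urls);   // first term seeds the result
            } else {
                result.retainAll(urls);         // intersect with later terms
            }
        }
        return result == null ? new ArrayList<>() : new ArrayList<>(result);
    }
}
```

A servlet handling the query could call search with the submitted text and write one HTML link per returned URL.  Features such as Advanced Search would need more than this sketch stores, for example word positions for quoted phrases.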

Implementation Requirements

  1. Your user account information and search histories will be stored in a MySQL database.
  2. You will use Servlets to dynamically generate web content and handle requests.
  3. Your server will take a seed URL as a command line parameter.  At startup, it will begin crawling the web starting at the given URL.  You may restrict the total number of pages crawled so that you do not run out of memory.
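
One possible way to restrict the number of pages crawled is sketched below.  It is a minimal illustration, not a required design: the class BoundedCrawler, the parameter maxPages, and the fetchLinks function (a placeholder for downloading a page and extracting its links) are all hypothetical names, and the sketch assumes a breadth-first crawl order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of a crawl loop with a hard page limit.
class BoundedCrawler {
    // Breadth-first crawl from seedUrl that visits at most maxPages distinct URLs.
    // fetchLinks stands in for "download the page and return the links it contains".
    static List<String> crawl(String seedUrl, int maxPages,
                              Function<String, List<String>> fetchLinks) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();        // URLs already queued or visited
        Queue<String> frontier = new ArrayDeque<>();
        if (maxPages > 0) {
            seen.add(seedUrl);
            frontier.add(seedUrl);
        }
        while (!frontier.isEmpty()) {
            String url = frontier.remove();
            visited.add(url);
            for (String link : fetchLinks.apply(url)) {
                // Stop queuing new URLs once the limit is reached, so the
                // frontier and the seen set never hold more than maxPages entries.
                if (seen.size() < maxPages && seen.add(link)) {
                    frontier.add(link);
                }
            }
        }
        return visited;
    }
}
```

In a real server the seed would come from the command line parameter, and each visited page would be added to the index as it is fetched.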

Submission Requirements

  1. You must demonstrate a first release of your project on April 28, 2011 during the class period.  You need not demonstrate all functionality, but points will be deducted from your final score if you fail to show progress at the first release.
    1. It is recommended that you implement Search, User Registration, and User Login/Logout by the first release.
  2. You must schedule a demonstration appointment during the final exam period.  For your demonstration, you will show the functionality of your project, give an overview of your design, and respond to questions regarding your code.  Up to 20 points will be deducted from your final score for failure to complete this portion of the assignment.