Lab 3 - HTML

Due Friday 10/16, 5PM

The goal of this assignment is to give you practice writing and processing HTML. You will (1) hand-write an HTML site and (2) write an HTML tag stripper.

Part 1

For Part 1 of this assignment, you will familiarize yourself with HTML by hand-writing a web site with at least two pages. The first page will provide a Google-like interface where a user can enter a search query. Its "Search" button will link to the second page, which displays the results of a query. You will not actually generate results; the "Search" button can simply link to a static page that shows the results for one fixed query. The goal is to design the interface, not to implement search.

For submission, you will copy your HTML files to /home/web/your-user-name/searchsite. This will make your pages available at http://www.cs.usfca.edu/~your-user-name/searchsite.

If you really want to impress me, design a logo for your site!

Part 2

For Part 2 of this assignment, you will further familiarize yourself with HTML by writing a tag stripper. Your program will work as follows:

    1. Your program will take as input the name of an HTML file stored on disk.
    2. Your program will open the HTML file and remove all tags found in the document. A tag is everything from a < through the next >, including the angle brackets themselves.
    3. All text inside a script element (everything from <script> through </script>) will also be removed.
    4. The value of every href attribute (the link target in an <a> tag) will be saved for future processing.
    5. The text of the stripped document will be saved to a file text.txt.
    6. The links found in the document will be saved in a file links.txt.
    7. Your program will be run as follows, where /file/name.html is replaced with the actual path and filename:
          java -cp tagstripper.jar Driver /file/name.html
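
The steps above can be sketched in Java roughly as follows. This is one possible starting point, not a required design. The class name TagStripperSketch and the methods stripTags and extractLinks are hypothetical names chosen for illustration; your submission still needs a Driver class as shown in the command above. The sketch assumes the file is UTF-8, that href values are quoted, and that regular expressions are an acceptable way to find tags. Regular expressions will mishandle some real-world HTML (for example, a > inside an attribute value or an unquoted href).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the lab's Driver could delegate to it.
public class TagStripperSketch {

    // A whole script element, <script ...> through </script>, possibly spanning lines.
    private static final Pattern SCRIPT = Pattern.compile(
            "<script\\b[^>]*>.*?</script\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // A tag: everything from a < through the next >.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // The quoted href value inside an <a ...> tag (assumes the value is quoted).
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Removes script elements first, then every remaining tag.
    public static String stripTags(String html) {
        String withoutScripts = SCRIPT.matcher(html).replaceAll("");
        return TAG.matcher(withoutScripts).replaceAll("");
    }

    // Collects href values in document order. Links are read from the
    // original HTML, so this must run on the text before tags are stripped.
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TagStripperSketch /file/name.html");
            return;
        }
        // Assumption: the input file is encoded as UTF-8.
        String html = new String(
                Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);

        // Output file names come from steps 5 and 6 above.
        Files.write(Paths.get("text.txt"),
                stripTags(html).getBytes(StandardCharsets.UTF_8));
        Files.write(Paths.get("links.txt"),
                extractLinks(html), StandardCharsets.UTF_8);
    }
}
```

Keeping the stripping and link extraction in separate methods that take and return plain strings, rather than reading and writing files directly, should make the code easier to reuse when it is integrated with other code later.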

As you design your program, keep in mind that you will eventually integrate this code with your Lab 2 code.

Grading

  1. (25 points) HTML page design and implementation.
  2. (30 points) Program correctly strips tags from HTML documents.
  3. (25 points) Program correctly extracts links.
  4. (20 points) Code design.

Submission Instructions