Lab 3 - HTML
Due Friday 10/16, 5PM
The goal of this assignment is to give you practice writing and processing HTML. You will (1) hand-write an HTML site and (2) write an HTML tag stripper.
Part 1
For Part 1 of this assignment, you will familiarize yourself with HTML by hand-writing a web site. You will develop at least two web pages. The first will provide a Google-like interface where a user can enter a search query. Its "Search" button will link to the second page, which displays the results of a query. You will not actually generate results; the second page can simply show hard-coded results for one fixed query. The goal is only to design the interface.
For submission, you will copy your HTML files to /home/web/your-user-name/searchsite. This will make your pages available at http://www.cs.usfca.edu/~your-user-name/searchsite.
If you really want to impress me, design a logo for your site!
Part 2
For Part 2 of this assignment, you will further familiarize yourself with HTML by writing a tag stripper. Your program will work as follows:
- Your program will take as input the name of an HTML file stored on disk.
- Your program will open the HTML file and remove all tags found in the document. A tag is everything from a < through the next >, including both characters.
- All text occurring in a script element, <script> to </script>, will be removed.
- The value of every href attribute (the link target in an <a> tag) will be saved for future processing.
- The text of the stripped document will be saved to a file text.txt.
- The links found in the document will be saved in a file links.txt.
- Your program will be run as follows (where /file/name.html will be replaced with the correct path and filename):
- java -cp tagstripper.jar Driver /file/name.html
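The stripping rules in the list above can be sketched with two regular expressions. This is only a minimal illustration, not a required design: the class name TagStripper and the method strip are hypothetical, and the sketch assumes the whole file has already been read into a String. It also assumes no > appears inside an attribute value and does not treat HTML comments specially.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name and API are assumptions for illustration.
final class TagStripper {
    // (?is): match case-insensitively and let '.' span line breaks, so a
    // multi-line <SCRIPT> ... </script> element is removed as one unit.
    private static final Pattern SCRIPT =
            Pattern.compile("(?is)<script\\b[^>]*>.*?</script\\s*>");

    // A tag is taken to be everything from '<' through the next '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    private TagStripper() {}

    // Returns the document text with script elements and all tags removed.
    static String strip(String html) {
        // Remove script elements first, so their contents do not survive
        // once the <script> and </script> tags themselves are stripped.
        String withoutScripts = SCRIPT.matcher(html).replaceAll("");
        return TAG.matcher(withoutScripts).replaceAll("");
    }
}
```

Removing script elements before stripping tags matters: if the order were reversed, the script text would be left behind in the output.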
As you design your program, keep in mind that you will eventually integrate this code with your Lab 2 code.
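Keeping link extraction in its own small class may make that later integration easier. The following is a rough sketch under stated assumptions: the class name LinkExtractor and the method extract are hypothetical, the input is the original (unstripped) HTML as a String, and only quoted href values inside <a> tags are recognized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and API are assumptions for illustration.
final class LinkExtractor {
    // Matches <a ... href="value"> or <a ... href='value'>, case-insensitively.
    // Group 1 is the opening quote character; group 2 is the link value.
    // Unquoted attribute values are not handled by this sketch.
    private static final Pattern HREF = Pattern.compile(
            "(?is)<a\\b[^>]*?\\shref\\s*=\\s*([\"'])(.*?)\\1");

    private LinkExtractor() {}

    // Returns the href values of all <a> tags, in document order.
    static List<String> extract(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }
}
```

A Driver could then write the returned list to links.txt, one link per line, for example with java.nio.file.Files.write.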
Grading
- (25 points) HTML page design and implementation.
- (30 points) Program correctly strips tags from HTML documents.
- (25 points) Program correctly extracts links.
- (20 points) Code design.