Lab 3 - HTML
Due - Tuesday 10/26, 11:59pm
The goal of this assignment is to give you practice with using and processing HTML. You will (1) hand-write an HTML site and (2) write an HTML tag stripper.
Part 1
For Part 1 of this assignment, you will familiarize yourself with HTML by hand-writing a web site of at least two pages. The first page will provide a Google-like interface where a user can enter a search query. Its "Search" button will link to the second page, which displays the results of a query. You will not actually be generating results. Instead, the second page can simply show the results for one fixed, static query. The goal is really just to design the interface.
For submission, you will copy your HTML files to /home/web/your-user-name/searchsite. This will make your pages available at http://www.cs.usfca.edu/~your-user-name/searchsite.
If you really want to impress me, design a logo for your site!
Part 2
For Part 2 of this assignment, you will further familiarize yourself with HTML by writing a tag stripper. Your program will work as follows:
- Your program will take as input the name of an HTML file stored on disk.
- Your program will open the HTML file and remove all tags found in the document. A tag is everything from a < through the next >, including the angle brackets themselves.
- All text occurring in a script element, <script> to </script>, will be removed.
- All text occurring in a style element, <style> to </style>, will be removed.
- href attributes (links appearing in <a> tags) meeting the following criteria will be saved for future processing.
  - It must start with http://. This means that you will NOT be handling relative links. This is not ideal, but you will not be penalized for not dealing with relative links for this assignment.
  - It must end with no extension, the extension htm, or the extension html. For example, google.nl is valid -- it specifies only the domain.
- The text of the stripped document will be saved to a file text.txt.
- The links found in the document will be saved in a file links.txt.
- Your program will be run as follows (where /file/name.html will be replaced with the correct path and filename):
  java -cp tagstripper.jar Driver /file/name.html
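The stripping rules above can be sketched with regular expressions. This is only one possible approach under stated assumptions, not the required design: the class name TagStripperSketch and the method strip are hypothetical, tag names are matched case-insensitively, and removed tags are replaced with nothing (the handout does not say whether a space should be left behind). Script and style elements are removed first so that their contents never reach the output.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the assignment only fixes the name Driver.
class TagStripperSketch {
    // (?is): ignore case, and let '.' match newlines so multi-line blocks are caught.
    private static final Pattern SCRIPT =
            Pattern.compile("(?is)<script\\b[^>]*>.*?</script\\s*>");
    private static final Pattern STYLE =
            Pattern.compile("(?is)<style\\b[^>]*>.*?</style\\s*>");
    // Any remaining tag: a '<', then everything up to and including the next '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String strip(String html) {
        // Remove whole script and style elements before touching other tags.
        String text = SCRIPT.matcher(html).replaceAll("");
        text = STYLE.matcher(text).replaceAll("");
        return TAG.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<html><head><style>p { color: red; }</style>"
                + "<script>var x = 1 < 2;</script></head>"
                + "<body><p>Hello, <b>world</b>!</p></body></html>";
        System.out.println(strip(html)); // prints: Hello, world!
    }
}
```

Because tags are replaced with the empty string, text from adjacent elements is joined without a separator (for example, <p>a</p><p>b</p> becomes ab). Real pages may also contain a stray < or > in ordinary text, which a regex-based sketch like this one does not handle specially.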
As you design your program, keep in mind that you will eventually integrate this code with your Lab 2 code.
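Keeping link handling in its own small class is one way to make later reuse easier. The sketch below illustrates the two link criteria from Part 2. It is a minimal example built on assumptions: the names LinkExtractorSketch, extract, and isAcceptable are hypothetical, only quoted href values are recognized, any query string or fragment is ignored when checking the extension, and extensions are compared case-insensitively. None of those details are specified in this handout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class illustrating the link criteria.
class LinkExtractorSketch {
    // Finds href="..." or href='...' inside an <a ...> tag; group 2 is the value.
    private static final Pattern HREF = Pattern.compile(
            "(?i)<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])([^\"'<>]*)\\1");

    static boolean isAcceptable(String link) {
        // Criterion 1: absolute links starting with http:// only.
        if (!link.startsWith("http://")) return false;
        // Assumption: drop any query string or fragment before checking the extension.
        String rest = link.substring("http://".length()).split("[?#]", 2)[0];
        if (rest.isEmpty()) return false;
        int slash = rest.indexOf('/');
        if (slash < 0) return true; // domain only, e.g. http://google.nl
        // Criterion 2: the last path segment has no extension, or htm/html.
        String last = rest.substring(rest.lastIndexOf('/') + 1);
        int dot = last.lastIndexOf('.');
        if (dot < 0) return true;
        String ext = last.substring(dot + 1).toLowerCase(Locale.ROOT);
        return ext.equals("htm") || ext.equals("html");
    }

    static List<String> extract(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String link = m.group(2).trim();
            if (isAcceptable(link)) links.add(link);
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://google.nl\">a</a>"
                + "<a href='http://example.com/docs/index.html'>b</a>"
                + "<a href=\"/relative/page.html\">c</a>"
                + "<a href=\"http://example.com/report.pdf\">d</a>";
        // Prints http://google.nl and http://example.com/docs/index.html, one per line.
        for (String link : extract(html)) System.out.println(link);
    }
}
```

Note that links must be collected from the original HTML before the tags are stripped, since the href values live inside the tags themselves.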
Attached are several sample test cases. We will design some similar, simple test cases and also test your program on several real web pages (e.g., google.com).
Grading (still subject to change...)
- (25 points) HTML page design and implementation.
- (30 points) Program correctly strips tags from HTML documents.
- (25 points) Program correctly extracts links.
- (20 points) Code design.