Web Surfing

The Good, The Bad, and The Ugly: A Guide to URLs

Copyright 2005 by Jeff Suzuki

Introduction

The web was originally designed to facilitate research, and can be an incredibly rich source of information. But anyone can put up a web page, so how can you distinguish a good source from a bad one?

To answer this question, let’s think about the bibliographic reference style for printed works. A typical reference style includes the author, title, publisher, and copyright date. Why? Contrary to popular belief, this is not so that your teacher can check your quotations. There are two main reasons to cite your sources in this fashion. First, if you say something that the reader finds very interesting, the bibliographic information gives the reader a path to further information. Second, if you say something the reader finds questionable, the bibliographic information gives the reader a means to assess the credibility of the source.

The author is the first clue: if the book is written by someone in a position to know (for example, a book on the 2000 election by Al Gore), then it is probably a credible source. The title gives you a second clue. A sober title like “Habsburg Reaction to the Defenestration of Prague” does not make the work any better than “The Empire Strikes Back,” but it gives more indication of what the work is actually about, so the reader can assess its relevance to your work. In this case, if your work is about the Thirty Years’ War, then the book is relevant; if you’re writing about food and diet in twentieth-century Germany, the book is probably less relevant, and its conclusions, with respect to your thesis, are less credible.

But what if you don’t know the author’s credentials? This is where the publisher’s credentials come in. A paleontology book by John Smith sounds a lot less credible if it was published by Creationism Press. Finally, the publication date is another way to judge a source: Women in Politics (1997) is probably much more credible than Women in Politics (1897).

With these factors in mind, let’s consider web resources.

URLs

Every web page has an address called the URL (for “Uniform Resource Locator”), which you can find by looking at the location bar. The http:// part just tells your web browser that it’s looking for a web page; the part that follows is the address itself. Some (made up) URLs might be:

    • www.US.politics.com/election/2000/gore.html

    • www.creationism.org/paleontology/smith.html

    • www.lucystone.net/suffrage.htm

The last component (everything from the final “/” onwards) is the name of the actual web page: in the above, “gore.html”, “smith.html”, “suffrage.htm”. The web page might also end in “.asp”, “.jsp”, or have characters like “?” or “+” or “&”. These indicate that the web page was custom-generated at the time you visited the page. To use a print analogy, the web page is a page in a book; extending the analogy, the rest of the URL tells you which chapter of which book by which publisher, with each logical component separated by a “/”.
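The breakdown described above can be sketched in a few lines of Python. This is only an illustration, not a definitive tool: the function name split_url and the address www.example.com are made up for this example, and the "generated" test is a rough guess based only on the signs mentioned above (an “.asp” or “.jsp” ending, or a “?” in the address).

```python
from urllib.parse import urlsplit

def split_url(url):
    """Break a URL into publisher (host), chapters (folders), and page."""
    # urlsplit only recognizes the host when a scheme is present,
    # so add http:// to bare addresses like the ones in the text.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # The page is the last component, unless the address ends in "/".
    page = segments[-1] if segments and not parts.path.endswith("/") else ""
    folders = segments[:-1] if page else segments
    # Rough signs that the page was custom-generated (an assumption).
    generated = bool(parts.query) or page.lower().endswith((".asp", ".jsp"))
    return {
        "host": parts.netloc,
        "folders": folders,
        "page": page,
        "generated": generated,
    }

print(split_url("www.US.politics.com/election/2000/gore.html"))
print(split_url("www.example.com/search.asp?q=milton"))
```

For the first made-up URL, the host is the “publisher,” the folders “election” and “2000” are the book and chapter, and “gore.html” is the page.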

Domain Name

In our publishing analogy, the first part of the URL gives you the publisher (in webspeak, it’s the “domain name” of the “host” of the web page or website). For the above examples, the domain names were www.US.politics.com, www.creationism.org, and www.lucystone.net. The first question you might want to ask is, “Is this a credible publisher, or a purveyor of cheap foolscap?”

To find out, we must break down the domain name. You can think of the first part as the actual name of the domain (publisher): for example, “www.US.politics” or “www.lucystone”. So long as the name is not in use, anyone can register any name. For example, you could try to register the domain name “www.bardcollege”. As long as no one else has the name, you could buy (actually rent) it for under $20 a year.

Because anyone can register any name, the important part of the domain name is the “domain name extension.” This is the very last part of the domain name: the .com in www.US.politics.com, the .edu in www.yale.edu, and the .net in www.lucystone.net. The extensions tell you something about who the domain is registered to. Some of the important ones:

    • .com (pronounced “dot-com”): This indicates a business. Anyone can get a .com site.

    • .net (pronounced “dot-net”): Usually indicates an internet service provider (ISP), though anyone can get a .net site.

    • .org (pronounced “dot-org”): Intended for non-profit organizations, though in practice anyone can get a .org site.

    • .edu (pronounced “dot-edu”): Reserved for schools, colleges, or universities.

    • .gov (you get the idea): Reserved for the US government.

    • .uk, .ca, .fr, .it, .se: Country codes for the United Kingdom, Canada, France, Italy, Sweden, etc.

For example, food.francaise.fr is a site based in France, www.anarchy.gov is a US government website, www.encyclopedia.net is an internet service provider, hot.cars.edu is a school, and so on. If you wanted to register the domain www.bardcollege, you would have to take www.bardcollege.net, since www.bardcollege.com and www.bardcollege.edu are already taken (and in any case, .edu is reserved for actual educational institutions). Anyone reading the URL would see the .net extension and know that the site is probably not an official Bard College site.
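As a rough illustration, the extension list can be turned into a small lookup table. The names EXTENSIONS, COUNTRY_CODES, and describe_extension are invented for this sketch, and the table covers only the handful of extensions discussed here; it says what an extension is meant to indicate, not who actually runs the site.

```python
# What each extension is meant to indicate, per the list in the text.
EXTENSIONS = {
    "com": "business (anyone can get one)",
    "net": "internet service provider (anyone can get one)",
    "org": "non-profit organization",
    "edu": "school, college, or university",
    "gov": "US government",
}

# Only the country codes named in the text; many more exist.
COUNTRY_CODES = {
    "uk": "the United Kingdom",
    "ca": "Canada",
    "fr": "France",
    "it": "Italy",
    "se": "Sweden",
}

def describe_extension(domain):
    """Return (extension, description) for a domain name such as www.yale.edu."""
    # The extension is whatever follows the last dot in the domain name.
    ext = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    if ext in EXTENSIONS:
        return ext, EXTENSIONS[ext]
    if ext in COUNTRY_CODES:
        return ext, "country code for " + COUNTRY_CODES[ext]
    return ext, "not in this short list"

for name in ["food.francaise.fr", "www.anarchy.gov", "hot.cars.edu"]:
    print(name, describe_extension(name))
```

A lookup like this only reads the label on the tin; deciding whether the site deserves the label still takes the detective work described in the rest of this guide.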

Now consider publishers. University presses generally have some guidelines over what they will publish, and the fact that a book is published by a university press gives it a veneer of credibility. Likewise, the fact that a particular domain hosts a particular website can give you some information about the credibility of the information on the website.

Company (.com) and government (.gov) sites maintain the strictest control over what they host (which, of course, is not the same as guaranteeing accuracy or objectivity). Generally speaking, organizations (.org) will also maintain strict control, though again, this is not the same thing as accuracy or objectivity: www.whitesupremacy.org might host pages on the biology of race, but you’d be wise not to trust their findings. ISPs (.net) are the least reliable: in general, they sell space to anyone able to pay, and as a result anyone can put up anything. (Just to muddy the issue, some commercial ISPs are also .com: www.yahoo.com, for example.)

Educational institutions (.edu) fall into a middle category. Remember that at most institutions, anyone—students, faculty, or staff—can have personal web pages hosted by the institution. Unless the web page is in incredibly bad taste (for example, a web page advocating compulsory euthanasia for anyone over 40), it is unlikely to be censored—and even then, it will only be removed if somebody notices and complains.

As a rule of thumb, shorter URLs indicate a greater degree of domain sponsorship and sanction: www.college.edu/repulsivepolitics.html is probably institutionally sponsored and sanctioned, while www.college.edu/people/adams/misc/ideas/repulsivepolitics.html is more likely to reflect the ideas of someone at the institution.
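This rule of thumb can be made concrete by counting the folders that sit between the domain name and the page. The sketch below is just that, a sketch: the function name folder_depth is made up, and a low count is only a hint of sponsorship, never proof.

```python
from urllib.parse import urlsplit

def folder_depth(url):
    """Count the folders between the domain name and the page itself."""
    # Add http:// to bare addresses so the host is parsed correctly.
    if "://" not in url:
        url = "http://" + url
    segments = [s for s in urlsplit(url).path.split("/") if s]
    # Everything except the last component (the page) is a folder.
    return max(len(segments) - 1, 0)

urls = [
    "www.college.edu/people/adams/misc/ideas/repulsivepolitics.html",
    "www.college.edu/repulsivepolitics.html",
]
# Shallowest first: the likeliest to be institutionally sanctioned.
for url in sorted(urls, key=folder_depth):
    print(folder_depth(url), url)
```

Here the first example from the text sits directly under the domain (depth 0), while the second is buried four folders down, which suggests a personal page.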

Bottom to Top

The domain name is useful, but not necessarily reliable. Moreover, for a variety of reasons, all countries besides the US have country codes appended: a French site will end with .fr, a British site with .uk, and so on. British universities, for example, have domain names ending in ac.uk, which indicates that they are educational institutions (ac) in the United Kingdom (uk). In addition, appropriately assigning domain name extensions is voluntary.

The easiest way to look at the domain names is that they tell you when a web page is not likely to be reliable. What you want to be able to do is to work your way back to the author of the web page. If the author of a page on civil war tactics is a Ph.D. military historian teaching at a major university, then the web page itself is credible.

How can we find out more about the author of a web page? In our publishing analogy, the URL gives publisher, book, and page. Good HTML style mandates a link to the home page on every page that is part of a site. Maybe 10% of the web pages in existence have these links; for the rest, you’ll need to find your own way to the author’s home page.

Consider a web page like www.college.edu/faculty/smith/goodstuff.html. Using our book analogy, we see that the page “goodstuff.html” is in the chapter titled “smith” in the “faculty” book published by “www.college.edu”. Since an educational institution is hosting the web page, the web page itself has a veneer of credibility. But who is Smith, and why should he or she be a credible source of information?

Note that the URL is http://www.college.edu/faculty/smith/goodstuff.html. If we drop the last item, we obtain the URL for the “smith” chapter:

http://www.college.edu/faculty/smith

If Smith has a home page, entering this into the location bar would take us to it (if you’re already at http://www.college.edu/faculty/smith/goodstuff.html, you can just delete the “goodstuff.html” and hit ENTER to go to http://www.college.edu/faculty/smith).

Smith’s home page tells us what Smith thinks about Smith. Anyone can claim any credentials they want, so we might be skeptical of the information on the page (especially if it’s hosted by a commercial ISP). If we go up one level to

http://www.college.edu/faculty

we can probably find what the faculty think about Smith. (Note: you might or might not have access to all levels of a site; if you don’t have permission, your browser will tell you. It’s sort of like the rare books room of a library: not everyone can get in, but there’s nothing wrong with asking. Also, just because you don’t have permission at one level does not mean you won’t have permission at another level.) Finally, the college’s home page is:

http://www.college.edu

This will tell you what type of college Smith works at, and will provide a further way to help you assess the validity of the source.
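The “bottom to top” walk is mechanical enough to sketch in Python: drop the last item from the URL, then drop the next, until only the domain name is left. The function name parent_urls is made up for this illustration, and the sketch only builds the addresses; whether each level actually has a page (or lets you in) is something you find out by visiting it.

```python
from urllib.parse import urlsplit

def parent_urls(url):
    """List the URLs above a page, from the nearest level up to the home page."""
    # Add http:// to bare addresses so the host is parsed correctly.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    base = parts.scheme + "://" + parts.netloc
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    # Drop one trailing component at a time, ending at the bare domain.
    for end in range(len(segments) - 1, -1, -1):
        parents.append("/".join([base] + segments[:end]))
    return parents

for level in parent_urls("http://www.college.edu/faculty/smith/goodstuff.html"):
    print(level)
```

For the Smith example this produces the same three addresses worked out by hand above: Smith’s level, the faculty level, and the college’s home page.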

Practice Problems

Try your hand at the following.

    1. A search for the English poet Milton turned up the following links. Rank them in terms of likely credibility:

        • www.msu.edu/user/kilpela/ywy09.htm

        • www.2020site.org/milton/family.html

        • www.infoplease.com/cgi-bin/id/CE034505.html

        • www.smith.edu/english/courses/fall02/308.html

    2. Based just on the URLs, write down as much information as you can about each web page.

        • www.csus.edu/portfolio/prog/engl/2SyllabiEng145i.stm

        • home.earthlink.net/~glenn_scheper/Edmund_Spenser.htm

        • www.shef.ac.uk/english/modules/lit207/site/rflect1.html
