CS8803 AIA @ Georgia Institute of Technology

The idea here is to let readers read my commentaries for CS 8803 AIA at Georgia Institute of Technology.

MENU

1) Steps for installing Firestarter (firewall) on Ubuntu

2) Artificial Intelligence Project 1: Uninformed and Informed Search Algorithms.

3) Artificial Intelligence Project on Alpha Beta Pruning.

4) Computer Security (Bell-LaPadula Model)

5) Critical Essay and Analysis Part 2

6) Critical Essay and Analysis Part 1.

7) Explicit and Implicit DLL Linking

8) Field Hiding and Method Overriding

9) FunWithJava

10) Fun With Pointers in C++

 

 

Paper #: 1.1 SE 1
Title: The Anatomy of a Large-Scale Hypertextual Web Search Engine

The major problem the paper attempts to address is retrieving high-precision results from a search engine. To address this issue, a novel algorithm, Page Rank, is introduced. The Page Rank algorithm is Google's attempt to deal with the challenges of information retrieval. The paper presents various attributes of hyperlinks, such as anchor text and font size, that together with Page Rank help Google produce better-quality results than its competitors. Later in the paper, the authors present the essential issues of scalability and search quality in Google and how both have been significantly improved; improved search quality and scalability are the central issues around which the paper revolves. Lastly, the paper discusses the idea that Google can also be used as an academic research search engine.
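
For reference, the Page Rank formula given in the paper is:

PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + ... + PR(Tn)/C(Tn) )

where T1 ... Tn are the pages that link to page A, C(T) is the number of links going out of page T, and d is the damping factor, which the paper sets to 0.85.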



The central idea of harnessing the attributes of hyperlinks to improve search quality (Page Rank) is a novel and unique method. None of the search engines of 1992-1997, including Yahoo, used this methodology the way Google did. This alone is a major strength of the paper: it looks at information retrieval in a different light. Other strengths of the paper are that the authors acknowledge the distinction between the web and a controlled collection of documents, the limitations of the computer hardware of the time, and the reasons why web crawling is a challenging task. Taking these issues into consideration, the authors present their algorithm. The paper succinctly covers the central issues and critical components of the authors' novel retrieval system from a high-level perspective, which allows the reader to appreciate the algorithm without getting bogged down in low-level details. The authors also present storage statistics (page 14/20) that show how Google will scale effectively as the web grows by using storage efficiently through a repository. These statistics make it clear that in 1997 the authors already grasped the coming explosion of the internet, which is another strength of the paper.
 

The paper fails to address any shortcomings of the Page Rank algorithm. For example, spammers can create websites whose derogatory anchor text all points to a victim site, which raises the ranking of the victim site for that derogatory anchor text. The paper also does not discuss the special case of web pages that have no outgoing links: how is Page Rank calculated for such pages? Furthermore, the authors assume a constant damping factor ('d') of 0.85 in their algorithm. This is hard to fathom, because sites vary in content from plain text to pages full of images and animated text, which argues for a wider distribution of the damping factor rather than keeping it at a constant 0.85. An extension of this idea would be to combine the Page Rank algorithm with Latent Semantic Analysis and see whether that yields better results.
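
To make the damping factor and the dangling-page question concrete, below is a minimal iterative Page Rank sketch in Python. It is not the authors' implementation: the example graph, the choice to spread the rank of dangling pages uniformly over all pages, and the convergence tolerance are illustrative assumptions, and it uses the variant of the formula normalized so that the ranks sum to 1.

# Minimal iterative Page Rank sketch (illustrative only, not the authors' code).
# 'links' maps each page to the list of pages it links to.

def pagerank(links, d=0.85, tol=1e-8, max_iter=100):
    pages = list(links)
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}   # start from a uniform distribution
    for _ in range(max_iter):
        # Rank held by dangling pages (no outgoing links) is spread over all
        # pages -- one common workaround for the case the paper leaves open.
        dangling = sum(ranks[p] for p in pages if not links[p])
        new_ranks = {}
        for p in pages:
            incoming = sum(ranks[q] / len(links[q])
                           for q in pages if p in links[q])
            new_ranks[p] = (1 - d) / n + d * (incoming + dangling / n)
        converged = sum(abs(new_ranks[p] - ranks[p]) for p in pages) < tol
        ranks = new_ranks
        if converged:
            break
    return ranks

# Tiny example graph: page C is a dangling page with no outgoing links.
example = {"A": ["B", "C"], "B": ["C"], "C": []}
print(pagerank(example))

With this setup, the criticism above is also easy to experiment with: replacing the constant d = 0.85 with a per-page value is a one-line change in the update step.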