Coursework 1 Example Outline

Search Engines, Optimisation (SEO) and Web Search.
Overview and Critique
By Alessandro Ballarin - Autumn 2002
Abstract
What are people truly seeking when they approach a web search engine? And what determines
‘relevant information’? In the field of Information Retrieval restricted to the web, these are complex
questions, if only because two or more people might consider different answers relevant when they
submit exactly the same query.
Search engines attempt to use automated scientific methods and techniques to deliver relevant results to
match a search query. It is a complex process that sometimes works wonderfully and other times can be
frustrating for any user.
But is there anything that can be changed to improve it? With this in mind, a main focus of this research
has been an understanding of the search technologies available today and a brief analysis of the emerging
field of search engine optimisation (SEO), with the aim of improving users’ web searches.
This paper gives an overview of how search engines work, takes a critical look at various
developments in the search environment, and puts together a proposal on how to make better use of
relevancy systems. Search engines rank by their relevance scores, and even for those that offer other
advanced search options, relevance ranking is the default. Why not make it an option?
Finally, it is widely believed that search engines are in an excellent position to continually adapt their
relevance techniques and achieve even better results. It would be a big step forward for all of us as users
to have a more transparent and better regulated environment in which to search for information, and a
way to personalise the subjective concept of relevancy in our web searches.
Page 1 of 16
Contents
Abstract ............................................................................................................................ 1
Contents ........................................................................................................................... 2
Discussion notes ............................................................................................................... 3
Questions for discussion ................................................................................................... 4
Introduction ...................................................................................................................... 4
1. How search engines work ............................................................................................. 5
   1.1 Overview ................................................................................................................. 5
   1.2 Spiders and indexes ................................................................................................. 5
   1.3 Search interface ....................................................................................................... 6
   1.4 Search features ........................................................................................................ 7
   1.5 Search results .......................................................................................................... 8
   1.6 Relevancy algorithms .............................................................................................. 8
2. What is SEO? ................................................................................................................ 9
   2.1 Overview ................................................................................................................. 9
   2.2 Origins and establishment of SEO industry .............................................................. 9
   2.3 Optimising ............................................................................................................. 10
   2.4 Link analysis approach .......................................................................................... 10
   2.5 Spamming ............................................................................................................. 10
3. How to improve web searches? ................................................................................... 11
   3.1 The need for regulation .......................................................................................... 11
   3.2 An idea: ranking algorithm customisation .............................................................. 12
Conclusions .................................................................................................................... 13
References ...................................................................................................................... 14
Discussion notes
Nowadays we are all aware that search engines have come a long way since their first appearance on the
web back in the mid-1990s. Today’s search engines are not only far more likely to deliver accurate
results, but they also make use of images, audio files and databases in the process of delivering them.
Even though we give search engines so little information, we still expect the “ideal search engine” to
understand exactly what we mean and give back exactly what we are looking for. In this respect too, we
are all aware that search engines have a long way to go.
With search currently the second most popular internet activity after e-mail, the race towards creating the
"ultimate" search engine is being highly contested in a packed marketplace.
Possibly, the future of search lies in creating personal profiles of searchers. Search company Inktomi
(inktomi.com) has just started experimenting with personalised searches within the corporate arena, using
information that is already available about employees to create customised searches within a closed
environment. However, privacy and technological issues stand in the way for this kind of solution.
As the web gets bigger, an entire industry is springing up around the business of how to get a website
listed high on a search engine results page, now that advertising has become a regular feature on such
results pages. Some companies, such as LookSmart, specialise in pay-for-performance content.
Meanwhile, companies such as iProspect and NetBooster specifically help advertisers find marketing
strategies for getting to the top of the results listing, practising what is called SEO: search engine
optimisation.
For some search providers, such as Convera (formerly Excalibur Technologies) and iPhrase
(iphrase.com), the future of search lies in the automation of human thought and linguistic processes.
iPhrase’s and Convera’s products aim to abstract the meaning of documents rather than just look at their
syntactic properties, such as matching keywords.
Many are sceptical about the success of such "semantic engines". "The history of trying to bridge the
syntactic-semantic cut in artificial intelligence has been a history of ignominy," says Anil Seth, postdoctoral fellow in theoretical neurobiology at the Neurosciences Institute in San Diego. "Semantics
cannot simply be encoded or decoded from a syntactic foundation. Too many other factors, such as
culture and natural language, get in the way."
Generally, search engines move in the direction of improving the relevance of the information given to
users in the results listings by continuously updating their ranking algorithms, partly in order to deter
attempts to cheat their way into the mythical 'top ten' positions.
But for all their improvements in relevance ranking techniques, there are plenty of searches where the
techniques fail significantly. The technology will continue to improve. But while the science of relevance
ranking may retrieve ever better possibilities, finding accurate and comprehensive answers will remain an
art for some time to come1 (Greg R. Notess).
1. Greg R. Notess, “The Never-Ending Quest: Search Engine Relevance”, ONLINE magazine, May 2000
(www.notess.com). Greg R. Notess is a reference librarian at Montana State University.
Questions for discussion
1. The user, as a consumer, should certainly have adequate rights and protection within the
context of web search as well. Between pay-for-placement, adverts and unethical or excessive
SEO techniques, the user cannot tell whether the results shown are the best possible ones out there.
Could a ‘better regulated search environment’ be a key solution for ensuring transparency
of results for users in web searches?
2. SEO stands for ‘search engine optimisation’. It is essentially the art and science of
increasing a web site's visibility in the major search engines and directories across a
strategically defined list of keyword phrases that relate to the products, services, or
information offered on the web site.
Do you think Search Engine Optimisation (SEO) professionals will grow in
number or disappear entirely in 10 years’ time? Why?
3. What would you think of personalising the search engine ranking algorithm in each of your
web searches? Perhaps it could be offered as a super-advanced feature for expert search users.
Introduction
The Web is largely lacking in organisation and is so vast that even long-experienced searchers
express great frustration when using it to find information.
In fact, many people who use search engines do not really understand what they are and do not bother to
learn how to take full advantage of their capabilities. Most of them simply type one or two words
into a search form and are unpleasantly surprised when presented with thousands or even
millions of pages.
People easily get the impression that it is possible to ask their favourite search engine anything and get
the relevant answer in seconds. They are also not at all aware of how the transparency and relevancy of
the results listing, obtained by matching the search query, can be affected by a pay-for-placement policy
adopted by the search engine or by search engine optimisation (SEO) practices.
The main focus of this report is to help search users improve their
web search experience by giving them the know-how to obtain those improvements. Understanding what
search engines do, how they work, and what effect search engine optimisation has on the results listing is
the key to better web searches.
The next chapter gives a brief overview of available search engine technologies; a chapter explaining
what the SEO industry is there for follows, and the report concludes with a theoretical approach to how
web search could be improved from a user’s perspective.
1. How search engines work
1.1 Overview
Generally speaking, search engines work by matching users’ queries against indexes previously
built from web pages; they then rank the relevant documents and end the process by displaying a results
listing accordingly.
In other words, search engines are tools that let you explore databases containing the text of hundreds of
millions of web pages. They are designed to make searching as easy as possible for users, or at least the
major search engines2 are.
But with millions and millions of web pages out there on the Web, and more being added all the time,
how can search engines possibly collect them all?
The answer to that question is spiders, even though they do not collect them all, just a part of the Web3,
depending on the different approaches search engines adopt.
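The match-then-rank pipeline described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a tiny hand-made document set, whitespace tokenisation and a count-based score; real engines use far richer indexes and ranking formulas.

```python
from collections import defaultdict

# Tiny hand-made "web": document id -> page text (all hypothetical).
documents = {
    "doc1": "search engines rank web pages by relevance",
    "doc2": "spiders crawl the web and build an index",
    "doc3": "users type a query into the search form",
}

# Build an inverted index: term -> set of documents containing it.
index = defaultdict(set)
for doc_id, text in documents.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def search(query):
    """Match a query against the index, then rank by number of matching terms."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id in index.get(term, set()):
            scores[doc_id] += 1
    # Highest-scoring documents first: this ordered list is the "results listing".
    return sorted(scores, key=scores.get, reverse=True)

print(search("web search"))
```

Here doc1 contains both query terms, so it tops the listing; doc2 and doc3 each match only one term.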
1.2 Spiders and indexes
Search engines thus gather their data through the use of spiders (or crawlers): robot software designed
to track down web pages, follow the links they contain, and add any new information they
encounter to a master search engine database, or index. Each search engine, as mentioned before,
has its own way of doing things. Some, for example, program their spiders to index only the titles of
web pages and the first few lines of text. Others index every single word contained in a web
page so that all of them are searchable. The spiders’ work is, in some cases, complemented by the
work of human beings, who spend time visiting, selecting and classifying web sites based on their
content. This approach is used by Yahoo (yahoo.com), which still maintains its own staff of surfers
to perform this function.
The great thing about spiders is that they operate tirelessly around the clock. However, some take
longer than others to visit web pages, so one search engine may index a page while another does so only
later; this is also one of the reasons why the same query on different search engines can give different
results.
There are also some problems with indexing: application software such as
Shockwave and Flash, and text embedded in graphics, are invisible to indexing spiders, so any page
that makes extensive use of these features will not be indexed properly unless it also carries the text in HTML.
2. The major search engines are Google, AllTheWeb, Yahoo, and MSN Search, followed by Lycos, AskJeeves and AOL
Search. “The Major Search Engines”, October 12, 2002, www.searchenginewatch.com/links/major.html
3. Statistics about the sizes of search engines at www.searchenginewatch.com/reports/sizes.html
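The crawl-follow-index loop described in this section can be sketched as follows; the link graph is an in-memory stand-in for real HTTP fetching so the example stays self-contained, and all page names are hypothetical.

```python
# Hypothetical in-memory "web": page name -> its text plus outgoing links.
pages = {
    "home":  {"text": "welcome to the site", "links": ["about", "news"]},
    "about": {"text": "who we are",          "links": ["home"]},
    "news":  {"text": "latest site news",    "links": ["home", "about"]},
}

def crawl(start):
    """Breadth-first crawl: visit pages, index their text, follow their links."""
    index = {}
    queue, seen = [start], {start}
    while queue:
        page = queue.pop(0)
        index[page] = pages[page]["text"]   # add the page's content to the index
        for link in pages[page]["links"]:   # follow every outgoing link
            if link not in seen:            # but never revisit a page
                seen.add(link)
                queue.append(link)
    return index

print(sorted(crawl("home")))  # ['about', 'home', 'news']
```

Starting from "home", every linked page is eventually visited and indexed; a page with no in-links would never be found, which is exactly why engines also accept direct submissions.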
1.3 Search interface
Increasingly, personalisation of web search is what all major search engines are trying to achieve at
the moment. They do this by providing a set of configurable search pages, traditionally two: a simple form
with a few checkboxes or radio buttons, and an advanced form with more complex options.
• Basic form
The simplest basic search interface is a small form with a text field and a Search (submit) button,
as in the screenshot below:
[Screenshot: a basic search form with a single text field and a Search button]
This form usually includes a number of default fields hidden from end-users; a common example is a
hidden field that determines whether the search engine should match pages containing all of the search
terms or any one of them.
• Advanced form
The advanced search form is usually used by only a small group of expert users. It can display
many possible search options, such as limiting a search to headers, limiting it to a specific
date range, and whether to use substrings. The following screenshot shows the advanced search form of
Yahoo:
[Screenshot: the Yahoo advanced search form]
This form lets search users choose whether to search the entire site, certain sections, or just the
newest files. The available options usually depend on the capabilities of the search engine and indexer.
1.4 Search features
Search engines have a number of search features for users.
The following are some of the most common:
• Boolean operators and emerging standards (+ for a required word, - for a disallowed word,
and "quotes" for an exact phrase)
• Graphical interfaces using radio buttons, checkboxes, etc.
• Specific search zones (e.g. geography, criminology, computing sciences, etc.)
• Date-range searching
• Parentheses or multiple text-search fields
• Field searching (host name, title, URL, date, size, metadata)
• Language: limiting searches to documents in specific languages, or even cross-language
retrieval (e.g. translating searches to match other vocabularies)
• And more …
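The +/−/"quotes" conventions in the first item above can be interpreted with a few lines of Python; the parsing rules below are a simplified assumption for illustration, not any particular engine's grammar.

```python
import re

def parse_query(query):
    """Split a query into required (+), disallowed (-), "phrase" and plain terms."""
    phrases = re.findall(r'"([^"]+)"', query)   # pull out quoted phrases first
    rest = re.sub(r'"[^"]+"', "", query)        # then strip them from the query
    required, disallowed, plain = [], [], []
    for token in rest.split():
        if token.startswith("+"):
            required.append(token[1:])
        elif token.startswith("-"):
            disallowed.append(token[1:])
        else:
            plain.append(token)
    return {"required": required, "disallowed": disallowed,
            "phrases": phrases, "plain": plain}

print(parse_query('+python -snake "search engine" tutorial'))
```

For this hypothetical query, "python" becomes required, "snake" disallowed, "search engine" an exact phrase and "tutorial" an ordinary term.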
As advanced features, some search engines also advertise natural-language queries, thesaurus and
synonym lists, graphical results visualisation, concept mapping and so forth. In practice, most web search
users are used to submitting queries without them, but for a more complete and accurate search experience,
starting to use them would not be a bad idea.
1.5 Search Results
The final step for a search engine, after a user’s query has been matched against the indexed database and
the most relevant documents have been found, is to display a search results listing. This consists mainly of
a list ordered in the way the search engine ranks its results. It is also generally possible to customise the
results in order to arrange them in a clear, useful and personalised manner. This screenshot shows an
example results listing for the web search “web page ranking” on Google:
[Screenshot: Google results listing for the query “web page ranking”]
Generally, all results are listed in order of relevancy. Each result includes the page title, which is a link to
that web page, plus other information that varies between search engines, such as the first few lines of the
text, the last-modified date, the URL, and possibly an evaluation of how closely the page matches the
search term (not shown in Google), again according to different criteria and ranking algorithms.
1.6 Relevancy algorithms
These are the methods or procedures by which search engines calculate and rank search results.
Usually called ranking algorithms, they differ for each search engine, but all seem to follow the same
pattern to some degree. They can depend on a wide variety of factors, including the domain name,
matching keywords appearing near the top of the web page, spiderable content, submission practices,
HTML code, and link popularity. They are generally kept top secret by each search engine and are
periodically changed for various reasons. The main reason is to guard against extreme SEO professionals,
who try to reverse-engineer the ranking algorithms in order to get their clients’ web sites positioned well
on the top results page.
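As an illustration of how several of the factors just listed might be combined, here is a toy scoring function in Python; the chosen factors (keyword near the top, keyword frequency, link popularity) come from the list above, but the weights and the sample pages are invented for the example and do not reflect any real engine's secret algorithm.

```python
def score_page(text, inbound_links, keyword):
    """Toy relevancy score combining three of the factors named above."""
    words = text.lower().split()
    keyword = keyword.lower()
    frequency = words.count(keyword)              # how often the keyword appears
    near_top = 1 if keyword in words[:10] else 0  # keyword near the top of the page?
    # Invented weights: position matters most, then frequency, then link popularity.
    return 2.0 * near_top + 1.0 * frequency + 0.5 * inbound_links

page_a = "search engines rank pages using many signals about search terms"
page_b = "this page barely mentions the topic of search once"
print(score_page(page_a, inbound_links=4, keyword="search"))
```

A page that mentions the keyword early and often, and that many pages link to, ends up with a higher score than one that barely mentions it; in a real engine, dozens of such factors are blended and the weights are secret.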
2 What is SEO?
2.1 Overview
SEO, which stands for search engine optimisation, is effectively the process of manipulating ranking
algorithms to improve search result positioning for a given web page.
When performing a web search, the majority of users still naively believe that any search engine simply
delivers the best possible results/matches from the web according to its ranking algorithm. In fact, it
does not take long to find out that, chances are, many company sites with high rankings will
have paid for the privilege: either directly to the search engine for a placement or, through the use of
search engine optimisation (SEO) specialists, to appear in the top results listing.
2.2 Origins and establishment of SEO industry
SEO is one of the new realities of the internet economy, a field that was probably brought to light by the
American company iProspect when it was founded in 1996 (iprospect.com). They intuitively built a
huge business on the basis that 80% of web users rely on the top six search engines (Yahoo,
Google, MSN, AOL, Lycos and AltaVista) to look for information; and obviously, which company would
not wish to be ‘positioned’ on their first page of top results?
The discipline of SEO is also difficult and tricky: through a simple lack of understanding of what the
search engine "spiders" that trawl the web are looking for, site owners risk non-placement
in the top six search engines, with the relative consequences and loss of benefits.
In fact, a survey by the SEO pioneer company iProspect found that a large number of search engine
users assign brand value, or equity, to a top-ranked web site, disregarding the fact that the
search engine’s mathematical algorithm is the cause. This demonstrates, especially for new internet users,
that top search engine listings transmit brand equity, so that, for example, lesser known brands or reseller
companies can increase their perceived brand equity just by improving their positioning among the top
search matches.4
SEO experts say that to effectively optimise a site, it is important to understand that search engines
essentially do two basic things: index text and follow links. If a site offers neither, it can be
considered ‘invisible’ from a web search perspective.
4. Survey results at www.iprospect.com/branding-survey, by Dr Amanda Watlingon and Fredrick Marckini, May
2002.
In the USA the SEO industry is at a very advanced stage, and in Europe, too, the market already boasts
dedicated SEO professionals, including Sticky Eyes, NetBooster, Search Engineers and Web Gravity, to
mention just a few.
2.3 Optimising
An understanding of the work of the search engine optimisation industry helps in interpreting the
“relevance ranking” the search engines deliver. Basically, any company that owns a web site would
like people to find it easily; especially for e-commerce companies, the more customers they can attract the
better.
The usual first step is to ensure the web site is indexed, and so included in the databases of the
directories and search engines. Optimising, also synonymous with ‘positioning’, goes far beyond just
getting a website included in those databases: it consists in bringing it all the
way into the “top ten” results listing!
2.4 Link analysis approach
In a way, search engines do their best to ensure that the relevancy of documents has nothing to do with
extreme optimisation. In fact, Google was the first search engine to make use of ‘link analysis’, which now
plays an important role in all of the major search engines.
The basic principle of link analysis is to rank highest those pages that most other pages point to using the
search term. In other words, if 100 pages point to the sbu.ac.uk homepage when they make a hyperlink on
the words sbu.ac.uk, then that page will rank higher than wmin.ac.uk (the University of Westminster
homepage) if only 20 pages point to that web site.
The strength and positive aspect of the link analysis approach, from a search user perspective, is that it
makes it much more difficult to optimise inappropriately, since that would require changing other people's
web pages. Previous relevance criteria were all determined by words and word patterns on the page itself.
Link analysis looks at many other pages to see where they link.
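The link-counting principle can be sketched as follows; the link graph is hypothetical, and unlike a real system such as PageRank (which also weights each link by the rank of its source page) this toy version only counts in-links.

```python
from collections import Counter

# Hypothetical link graph: source page -> pages it links to.
links = {
    "pageA": ["sbu.ac.uk", "wmin.ac.uk"],
    "pageB": ["sbu.ac.uk"],
    "pageC": ["sbu.ac.uk"],
}

# Count the in-links for every target, then rank the targets by that count.
in_links = Counter(target for targets in links.values() for target in targets)
ranking = sorted(in_links, key=in_links.get, reverse=True)
print(ranking)  # sbu.ac.uk, with 3 in-links, ranks above wmin.ac.uk with 1
```

Because the score comes from other people's pages, a site owner cannot raise it just by editing their own page, which is exactly the robustness argument made above.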
There are some drawbacks as well: link analysis seems to be an excellent ranking mechanism for some
searches, but newly created sites are at a distinct disadvantage. When someone puts up a new web site,
they can submit it to several search engines, but it takes time for a search engine to spider new sites. And
even after the site is indexed, there may not be any other pages linking to it yet.
2.5 Spamming
Spamming, in general, is an attempt to feed search engines misleading information, or different web pages
from the actual ones, in order to gain favourable positioning.
Search engines take spamming seriously, as it compromises the quality of their results and, especially, of
their users’ results. Unfortunately there is no exact definition of what ‘spam’ is and what it is not; it again
differs between search engines, which often change their own definition of spamming several times a
year.
Some examples of spam are listed here5:
5. From the Inktomi spam policy at www.inktomi.com/products/web_search/guidelines.html
• Pages which harm the accuracy, diversity or relevance of search results
• Pages whose sole purpose is to direct the user to another page
• Pages which have substantially the same content as other pages
• Sites with numerous, unnecessary virtual host names
• Pages in great quantity, automatically generated or of little value
• Pages using methods to artificially inflate search engine ranking
• The use of text that is hidden from the user
• Giving the search engine a different page than the public sees (cloaking)
• Cross-linking sites excessively, to inflate a site's apparent popularity
• Pages built primarily for the search engines
• Misuse of competitor names
• Multiple sites offering the same content
• Pages which use excessive pop-ups, interfering with user navigation
• Pages that are deceptive, fraudulent or provide a poor user experience
• And more…
3 How to improve web searches?
3.1 The need for regulation
Seen from a user’s, and thus a consumer’s, perspective, it could be argued that SEO unfairly skews
search results in favour of companies rather than users; yet there seems to be a general consensus
nowadays that ethical SEO can actually help consumers find the services they want without affecting
consumers’ rights.
I personally would not take that for granted, and would put the web search environment up for discussion:
as the use of search engines and the web continues to grow, I believe that the lack of strict and specific
regulation in this matter gives both search engines and the SEO industry too much room to act out of sight.
Here I have collected two very interesting views on the web search environment:
Henrik Hansen, director of marketing for enterprise search with Inktomi (Inktomi.com), says his
company's engineers actively work to combat unethical behaviour through increasingly sophisticated
anti-spamming algorithms and regular human intervention by a team of editors, who check search results
for accuracy and relevance.
Danny Sullivan, a well-known search specialist and editor of the online publication
SearchEngineWatch.com, says: "Paid listings are not going to go away for a long time; people are being
shown too many paid links in comparison to editorial content. Search engines are going to have to
provide a filter for search results as well as ads."
Hansen’s view clearly indicates, from a search engine’s perspective, the need for a better regulated search
environment: ideally, a search engine’s job is to give us end users the best possible answers in terms of
relevancy, not to worry about investing in anti-spamming policies.
Sullivan’s opinion also gives some idea of what could and should be regulated regarding pay-for-placement
results, or advertisements, that are not clearly highlighted; otherwise, how is any user supposed to know
what is and what is not a ‘real’ relevant result in the listing proposed?
And more from Danny Sullivan: “Using techniques that try to trick the engines into doing what they want
is not where companies should be putting their web development effort. Properly done, SEO can be
highly effective, generating qualified traffic for site owners, improving search engine accuracy and
delivering relevant, useful information to users."
That is ideally the point where everyone would like to be; but can it really be reached without a specific
regulation?
3.2 An idea: ranking algorithm customisation
What I propose here is just a hypothetical idea of how to personalise the subjective concept
of relevancy during web searches.
In order to do that, one concept has to be clear: there is no better judge than ourselves of which
information is most relevant to our queries.
As previously described, the way search engines currently work is to judge, through a ranking
algorithm, which documents are most relevant to a query. A ranking algorithm, for example
Google’s PageRank6, is a very complex formula that the average user is unlikely to be able to
understand deeply.
My idea consists in letting the user choose, in a customised fashion, the ranking algorithm for each web
search. This would mean offering a choice between a number of different algorithms, each of which
emphasises one or more different variables.
6. This paper is where I learnt how complex PageRank is: Larry Page, Sergey Brin, R. Motwani, T. Winograd,
“The PageRank Citation Ranking: Bringing Order to the Web” (1998).
This is how it could look:
ALGO 1: The order in which the keyword terms appear. If a keyword appears early in the web
page, the page is ranked higher.
ALGO 2: Keyword frequency. The more times a keyword appears, the higher the rank.
ALGO 3: Occurrence of a keyword in the title. If the keyword entered appears in the
document's title or meta-tag fields, the page is ranked higher.
ALGO 4: Rare or unusual keywords. Keywords that do not appear often in the index are ranked higher
than common terms.
ALGO 5: Link-popularity-based algorithm. The more links point at a particular site or
keyword, the higher the rank.
ALGO 6: Natural-language-processing-based algorithm. Tries to guess the meaning of the query
and ranks accordingly.
ALGO 7: Text-analysis-based algorithm. Ranks by the text or content of the web page.
ALGO 8: Latest date. Documents with the latest update are ranked higher.
ALGO 9: Size of the document. Documents of bigger (or smaller) size are ranked higher.
And so forth…
where the ALGOs are the different ranking algorithms for the user to choose from.
The idea then extends to allowing more than one ALGO to be selected simultaneously for the same
search, e.g. a web search with a combination of ranking ALGOs 2, 3 and 7.
Furthermore, the user would have a range of different weights to select for each ALGO.
In the previous example combination, the user could increase or decrease the weight given to ALGOs
2, 3 and 7 by opting to rank each of them high or low.
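As a sketch of the proposal, each ALGO can be treated as a small scoring function and the results ranked by a user-chosen weighted sum; the documents, the ALGO internals and the weights below are all hypothetical simplifications for illustration.

```python
def algo2(doc, keyword):   # ALGO 2: keyword frequency
    return doc["text"].lower().split().count(keyword)

def algo3(doc, keyword):   # ALGO 3: keyword in the title (or not)
    return 1 if keyword in doc["title"].lower() else 0

def algo7(doc, keyword):   # ALGO 7: crude text-analysis stand-in (amount of text)
    return len(doc["text"].split()) / 100

def custom_rank(docs, keyword, weighted_algos):
    """Rank docs by a user-chosen weighted sum of ALGO scores."""
    def score(doc):
        return sum(weight * algo(doc, keyword) for algo, weight in weighted_algos)
    return sorted(docs, key=score, reverse=True)

docs = [
    {"title": "Search tips", "text": "how to search the web with a search engine"},
    {"title": "Cooking", "text": "recipes that never mention search engines, except once: search"},
]
# The user combines ALGOs 2, 3 and 7, weighting the title match highest.
top = custom_rank(docs, "search", [(algo2, 1.0), (algo3, 3.0), (algo7, 0.5)])
print(top[0]["title"])
```

Changing the weight list is exactly the "rank high or low" choice described above: the same documents and the same ALGOs can yield a different ordering for each user.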
Simply, as I have already mentioned, this is just a hypothetical idea, but one that could point towards
solutions to the nowadays extremely sophisticated problem, for search engines and users alike, of
automating and personalising the concept of relevancy.
Conclusions
Throughout this report I have tried to follow the same path I took once I was assigned this research task:
in order, I acquired a broad understanding of search engines, gained knowledge of the developments and
commercial sides of the field, and then sketched ideas and conclusions about what could or should be done.
Those of us who use the "open web" as a research tool want timely and authoritative answers, without
advertising or other kinds of influence getting in the way of the best possible answer available.
Using the Web effectively without general-purpose search engines would be difficult, time-consuming,
and in many cases impossible.
Once we are aware of that, the question is: can the needs of all the communities (search engines, SEO,
advertisers and search users) coexist?
It is in everybody’s interest to make this happen, and I am positive that it will; but knowledge and
continuing education, for both information professionals and users, is the key to continuing to use
general-purpose web search tools as enjoyable and effective resources.
References
Papers & Articles
• Rappoport, Avi, “Site Search That Doesn't Stink”, Internet World Conference, December 11, 2001.
Very clear presentation slides about search engines and web search issues.
http://www.searchtools.com/slides/iw2001/index.html
• Notess, Greg R., “The Never-Ending Quest: Search Engine Relevance”, ONLINE, vol. 24, no. 3, May 2000.
An article about relevancy.
http://www.infotoday.com/online/OLtocs/OLtocmay00.html
• Watlingon, Amanda & Marckini, Fredrick, “Branding Survey”, May 2002.
An interesting survey result demonstrating how psychologically powerful search
engines’ listing results are on users.
www.iprospect.com/branding-survey
• Grossan, Bruce, “Search Engines. What They Are, How They Work, and Practical
Suggestions for Getting the Most Out of Them”, February 21, 1997.
A very good paper to get started with all the main issues regarding the search environment.
http://webreference.com/content/search/index.html
• Price, Gary, “Web Search Engine FAQs: Questions, Answers, and Issues”, Searcher, vol. 9, no. 9, October 2001.
A complete article about features, secrets and all a user needs to know when using
search engines.
http://www.infotoday.com/searcher/oct01/price.htm
• Ensor, Pat, “Toolkit for the Expert Web Searcher”, LITA.
A useful collection of resources about the search engines field.
http://www.lita.org/committe/toptech/toolkit.htm#engines
• Fifield, Craig, “Effective Search Engine Design”, SearchDay, no. 394, November 7, 2002.
Daily newsletter from SearchEngineWatch about the search engine design issues of
Google, Yahoo and Lycos.
http://searchenginewatch.com/searchday/02/sd1107-seusers.html
• SearchEngineWatch staff, “The Major Search Engines”, October 12, 2002.
An article about the top search engines and the ones to watch.
http://www.searchenginewatch.com/webmasters/intro.html
• Sullivan, Danny, “Intro to Search Engine Optimisation”, October 14, 2002.
A one-screen article about SEO.
http://www.searchenginewatch.com/webmasters/intro.html
• Sullivan, Danny, “How Search Engines Work”, October 14, 2002.
An introduction to how search engines work.
http://www.searchenginewatch.com/webmasters/work.html
• Sullivan, Danny, “How Search Engines Rank Web Pages”, October 14, 2002.
An introduction to search engines’ automation of the relevancy concept.
http://www.searchenginewatch.com/webmasters/rank.html
• Price, Gary, “Specialized Search Engine FAQs”, Searcher, vol. 10, no. 9, October 2002.
A very recent article on the specialized resources offered by three major search engines:
Google, AllTheWeb, and AltaVista.
http://www.infotoday.com/searcher/oct02/price.htm
• Turau, Volker, “Internationalization, Accessibility, and Ranking of Web Pages”,
Technischer Report 0598, June 1998.
A very interesting survey-based report on how the accessibility of web pages on
the WWW could be improved through new web design techniques.
http://www-1.informatik.fh-wiesbaden.de/~turau/reports/fortune.html
• Bianchini, M., Gori, M. & Scarselli, F., “PageRank: A Circuital Analysis”, 2002.
A research paper about the pros and cons of PageRank.
http://www2002.org/CDROM/poster/165.pdf
• Jansen, B. J. & Pooch, U., “A Review of Web Searching Studies and a Framework for
Future Research”, Journal of the American Society of Information Science, 2000.
An extensive research paper on what has been done in the field of web search research.
http://jimjansen.tripod.com/academic/pubs/wus.pdf
• Inman, Dave, “Introduction to IR”.
Presentation slides giving an overview of the various topics of Information Retrieval.
http://www.scism.sbu.ac.uk/inmandw/ir/IRintro.ppt
• Page, Larry, Brin, Sergey, Motwani, R. & Winograd, T., “The PageRank Citation Ranking:
Bringing Order to the Web”, Stanford Digital Library Technologies Project, 1998.
The paper in which PageRank is proposed and fully explained by its popular creators.
http://citeseer.nj.nec.com/cache/papers/cs/7144/http:zSzzSzwwwdb.stanford.eduzSz~backrubzSzpageranksub.pdf/page98pagerank.pdf
• Lifantsev, Maxim, “Rank Computation Methods for Web Documents”, 1999.
The author reviews the available web ranking systems and proposes the ‘voting model’,
another way to estimate relevancy on the web.
http://citeseer.nj.nec.com/cache/papers/cs/14194/http:zSzzSzwww.ecsl.cs.sunysb.eduzSztrzSzTR76.pdf/lifantsev99rank.pdf
URLs
• A brief but complete tutorial on all the main search engine issues.
http://www.internet-handbooks.co.uk/izone/search/intro_search.htm
• A web guide with search tools, reviews, surveys and interesting material about search
environments.
http://www.searchtools.com
• Information page about PageRank, Google’s ranking algorithm.
http://www.google.com/technology/index.html
• Spam policy of the Inktomi search company.
www.inktomi.com/products/web_search/guidelines.html
Books
• Glossbrenner, Alfred & Emily, “Search Engines for the World Wide Web”, 3rd ed.,
Peachpit Press, 2001.
This recently revised book gives a detailed description of the major search engines.