Searching The World Wide Web

A Web search engine is an application that enables users to locate Web pages based on search parameters that the user provides. Search engines are complex systems built around large databases that store information relating to Web pages.
An American phone survey of Internet users in early 2004 found that 84% of online Americans had used search engines at some stage. The survey found that on any given day, more than half of those using the Internet were using search engines, with most Internet users indicating that they used search engines several times a week. Search engine use is second only to email in terms of online activity.
This large amount of use is mirrored among Australian users, with Web searching being a very common application in this country. One of the reasons why it is so popular is that the engines are so powerful and work so successfully in locating the information people are searching for on the Web.
Figure 1: A Web search engine showing the results from a keyword search.
1 Types of Search Engine
There are so many search engines on the Web that it is difficult to know which one to use. One way to choose might be to consider the form of information being sought. There are over 500 conventional search engines that can be used on the Web. Each has a particular characteristic or form that distinguishes it from others. Some have been designed for very narrow searching while others can be used for any kind of searching. There are 2 main forms of search engine: directories and lists.
a. Directory Search Engines: These are search engines that contain information organised as one might find in a large encyclopaedia, in a directory form. Information is located through the use of words and terms, which locate the information precisely. Some examples of directory search engines are shown in Table 1 below.
Table 1: Directory Search Engines
Engine Name       URL
Hotbot            www.hotbot.com
Lycos             www.lycos.com
Excite            www.excite.com
Google            www.google.com
AltaVista         www.altavista.com
All the Web       www.alltheweb.com
Ask Jeeves        www.askjeeves.com
Figure 2: Ask Jeeves
A Directory Search Engine showing the results from a keyword search http://www.askjeeves.com
b. Topic Search Engines: The other form of search engine is organised around topics, and information is located by a process of refinement, moving from one topic to the next until the required information is found. Some examples of topic search engines are shown in Table 2.
Table 2: Topic Search Engines
Engine Name       URL
Yahoo             www.yahoo.com
Open Directory    dmoz.org
About             www.about.com
Galaxy            www.einet.net
Lii               www.lii.org
LookSmart         www.looksmart.com
Figure 2: Topic Search Engine
A Topic Search Engine showing the topics through which the search is conducted http://www.lii.org
c. Children’s Search Engines: A number of search engines have specially designed features and capabilities to enable their successful use by young Web users. Even adults can find these engines useful in the way they assist the user. Some of these are shown in Table 3.
Table 3: Children’s Search Engines
Engine Name            URL
Kids Click!            www.kidsclick.org
Yahooligans            www.yahooligans.com
Ask Jeeves for Kids    www.askjeeves.com
Figure 3: Ask Jeeves for Kids
A children’s search engine is designed for the younger Web user eg. www.ajkids.com
d. Metasearch Engines: A number of search engines perform what is called meta-searching. They enable a single search string to be used in a number of different search engines simultaneously. Well-known search engines of this form are shown in Table 4.
Table 4: Meta Search Engines
Engine Name       URL
Metacrawler       www.metacrawler.com
Monstercrawler    www.monstercrawler.com
Dogpile           www.dogpile.com
Info.com          www.info.com
Even though metasearch engines use multiple sites for any one search, this does not necessarily mean that better information is returned or that the best sites have been discovered. It is still up to the user to determine how successful a search has been.
Figure 4: A metasearch engine
A metasearch engine enables the user to search many sites simultaneously eg. www.info.com
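The meta-searching described above, where one search string is sent to several engines and the results are combined, can be sketched in a few lines of Python. This is only an illustration: engine_a and engine_b are hypothetical stand-ins that return fixed lists, and the 1/rank scoring used to merge the lists is an assumption, not the method of any engine named in Table 4.

```python
from typing import Callable, Dict, List


# Hypothetical stand-ins for real engines: each maps a query to a ranked list of URLs.
def engine_a(query: str) -> List[str]:
    return ["http://example.org/a", "http://example.org/b", "http://example.org/c"]


def engine_b(query: str) -> List[str]:
    return ["http://example.org/b", "http://example.org/d"]


def metasearch(query: str, engines: Dict[str, Callable[[str], List[str]]]) -> List[str]:
    """Send one query to every engine and merge the ranked lists.

    Each URL scores 1/rank in every engine that returns it. The scores are
    summed, so pages found by several engines tend to rise to the top.
    """
    scores: Dict[str, float] = {}
    for search in engines.values():
        for rank, url in enumerate(search(query), start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


merged = metasearch("distance learning", {"A": engine_a, "B": engine_b})
print(merged)
```

In this toy data the page returned by both engines comes first, but as noted above, a merged list is not automatically a better list. The user still has to judge the results.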
2 Search Engine Features
Search engines offer many different features. Every search engine seems to offer some feature or capability that distinguishes it from others, and collectively they offer a wide range of features.
a. Filters. Hotbot allows the user to preset filters. Filters, as the name suggests, are restrictions on the search which limit its scope. Typical filters include the language the pages are written in, the age of the page, etc. Once a filter has been set and saved, it is applied automatically to whatever search is undertaken with the search engine.
Figure 5: Filters
Filters available in Hotbot to limit the scope of the search to particular pages and page types.
b. Display Options. Most search engines provide a facility so that users can customise how the results will be returned. Users can usually select what information will be shown and how it will be displayed.
Figure 6: Display Options
Hotbot also provides a facility for the user to customise how results appear when they are returned.
c. Information organisation. Some search engines provide visual and semantic displays to make the information they return more easily investigated. Mooter provides a semantic net which arranges the information into topics to facilitate its access.
Figure 7: Information Organisation
Mooter.com returns information organised into themes.
3 Choosing a Search Engine
One’s choice of search engine tends to depend on the familiarity one has with any particular engine and the success one has achieved in the past using a search engine. In 2005, it appeared from general information that the most commonly used search engine was Google. This search engine has the largest number of indexed Web pages and the largest number of users worldwide.
Figure 8: Excite
Excite is a typical search engine. It offers far more than simple keyword searching. It has many other features to attract users. http://www.excite.com
Different people will have different needs when it comes to selecting a search engine. Some of the features that may influence a person’s choice of search engine include:
• Search engine type eg. directory, index, metasearch engine;
• Number of pages indexed;
• Extra features eg. filters, customisations;
• Amount of advertising;
• Quality of relevance ranking;
• Search options eg. advanced searching features;
• Types of resources indexed eg. pages, images etc;
• Source of content eg. availability of local content;
• Currency of resources ie. age of pages;
• Speed of feedback ie. how long it takes to access, search and retrieve the pages.
4 How Search Engines Work
The databases that comprise search engines are
typically created through a process of Web crawling.
The Search Engines employ software applications
called robots that crawl the Web seeking Web pages.
The robots extract information from each Web page
they encounter and enter this information into their
databases. Once a page has been indexed in this
fashion, it forms part of the set which can be accessed
by users using the search engine.
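The crawl-and-index process described above can be sketched with a toy example. Everything here is an assumption made for illustration: the "Web" is a small in-memory dictionary (WEB), the page addresses are invented, and the index simply maps each word to the pages that contain it. A real robot fetches pages over the network and stores far more information.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# A toy "Web": URL -> (page text, outgoing links). Purely illustrative data.
WEB: Dict[str, Tuple[str, List[str]]] = {
    "a.example": ("search engines index web pages", ["b.example", "c.example"]),
    "b.example": ("robots crawl the web", ["a.example"]),
    "c.example": ("pages link to other pages", ["b.example"]),
    "d.example": ("nobody links to this page", []),
}


def crawl_and_index(seed: str) -> Dict[str, Set[str]]:
    """Follow links breadth-first from the seed and build a word -> URLs index."""
    index: Dict[str, Set[str]] = {}
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        # Extract information from the page (here, just its words).
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
        # Queue every linked page that has not been visited yet.
        for link in links:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl_and_index("a.example")
print(sorted(index["pages"]))  # pages that contain the word "pages"
print("nobody" in index)       # d.example is never reached by following links
```

Note that d.example never enters the index because no crawled page links to it. In this sketch, as on the real Web, a page that cannot be reached by the robot cannot be found through the search engine.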
To gain some sense of how this all works, it is useful to consider a particular search engine. Google's robot, called Googlebot, is able to crawl the entire Web in about 6 weeks. This crawl adds new pages and enables Google to update information on existing pages and to remove pages that no longer exist. The Google database contains information on nearly 5 billion Web resources, including over 4 billion Web pages.
Web Searching
When a user enters a search into the Google search engine, the request is processed in a number of stages, as shown in Figure 9. The query goes to the Google Web Server, where it is checked and sent to the Index Servers. The Index Servers do a lookup and send the query off to the place where the page information is stored, the Doc Servers. Information about the pages retrieved from the Doc Servers is sent back to the user.
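The stages just described can be mimicked with three small functions. This is a rough sketch under stated assumptions, not Google's implementation: INDEX and DOCS are invented sample data, the function names are hypothetical, and requiring every query word to match is a simplification chosen for the example.

```python
from typing import Dict, List, Set

# Hypothetical index server data: word -> IDs of documents containing it.
INDEX: Dict[str, Set[int]] = {
    "poverty": {1, 2},
    "crime": {2, 3},
}

# Hypothetical doc server data: document ID -> stored page information.
DOCS: Dict[int, Dict[str, str]] = {
    1: {"url": "http://example.org/poverty", "title": "Poverty overview"},
    2: {"url": "http://example.org/both", "title": "Poverty and crime"},
    3: {"url": "http://example.org/crime", "title": "Crime statistics"},
}


def web_server(raw_query: str) -> List[str]:
    """Check and normalise the query before passing it on."""
    return raw_query.lower().split()


def index_server(terms: List[str]) -> List[int]:
    """Look up each term and keep the documents that contain all of them."""
    if not terms:
        return []
    matches = set(INDEX.get(terms[0], set()))
    for term in terms[1:]:
        matches &= INDEX.get(term, set())
    return sorted(matches)


def doc_server(doc_ids: List[int]) -> List[Dict[str, str]]:
    """Fetch the stored page information for each matching document."""
    return [DOCS[doc_id] for doc_id in doc_ids]


def search(raw_query: str) -> List[Dict[str, str]]:
    return doc_server(index_server(web_server(raw_query)))


results = search("Poverty crime")
print([page["title"] for page in results])
```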
Page Ranking
When Google locates the various pages that match
the search string, it needs to work out a ranking by
which it will show the page. Ranking is very
important in terms of the success of a search
engine. Typically there will be many thousands of
pages that match the search string completely. The
best search engines show the most relevant pages
first. The different engines have different ways of
ranking the pages they locate.
Google uses a process called PageRank Technology
to decide which of the located pages will be the
most relevant. In the process, it considers not only
the content on the page but also how many pages
point to this page and through this means has a way
of calculating the most relevant page. In Google’s
own words:
PageRank Technology: PageRank performs
an objective measurement of the importance of
web pages by solving an equation of more than
500 million variables and 2 billion terms. Instead
of counting direct links, PageRank interprets a
link from Page A to Page B as a vote for Page B
by Page A. PageRank then assesses a page's
importance by the number of votes it receives.
PageRank also considers the importance of
each page that casts a vote, as votes from some
pages are considered to have greater value,
thus giving the linked page greater value.
Important pages receive a higher PageRank and
appear at the top of the search results. Google's
technology uses the collective intelligence of the
web to determine a page's importance. There is
no human involvement or manipulation of
results, which is why users have come to trust
Google as a source of objective information
untainted by paid placement.
Hypertext-Matching Analysis: Google's search
engine also analyzes page content. However,
instead of simply scanning for page-based text
(which can be manipulated by site publishers
through meta-tags), Google's technology
analyzes the full content of a page and factors in
fonts, subdivisions and the precise location of
each word. Google also analyzes the content of
neighbouring web pages to ensure the results
returned are the most relevant to a user's query.
http://www.google.com.au/corporate/tech.html
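The idea of links acting as weighted votes can be illustrated with the simplified PageRank calculation that is widely described in textbooks. The sketch below is an assumption-laden toy, not Google's production method: the damping value of 0.85 and the fixed number of iterations are conventional choices that do not come from this handout, and the four-page link graph is invented.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Repeatedly pass each page's rank along its outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its vote equally among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links spreads its vote over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Invented example: A, C and D all link to B, and B links only to C.
links = {"A": ["B"], "B": ["C"], "C": ["B"], "D": ["B"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page B receives the most votes and ends up with the highest rank. Page C comes second even though only one page links to it, because that one vote comes from the important page B.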
Search Criteria – Metadata, Keywords
Web crawlers search through the Web looking for selected criteria, such as metadata, keywords, taglines, page content, page links and so on. Metadata is the information contained in the head section of a Web page that helps the search technology to identify it.
Figure 9: How Google locates pages during a Web
search. http://www.google.com.au/corporate/tech.html
You can view the HTML code for each page and look at the metadata from the browser. In the browser, click VIEW and then SOURCE and you will see the HTML for that page. The metadata is in the head section.
<head>
<META NAME="description" CONTENT="For astrology charts, horoscope reports and star signs.">
<META NAME="keywords" CONTENT="astrology, horoscope compatibility, daily horoscopes, predictions, horoscope, signs, zodiac, soul mate, birthday horoscope, compatibility, face reading, vedic astrology, new age, Australian news, dreams, horoscope, love signs, zodiac sign, compatibility, astrological, money, love, business, career, daily, monthly, weekly, yearly, horoscopes, aries, taurus, gemini, cancer, leo, virgo, libra, scorpio, sagittarius, capricorn, aquarius, pisces, dadhichi, the sun, moon, zodiac, compatibility charts">
</head>
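A program can read these META tags in much the same way a crawler might. The following sketch uses Python's standard html.parser module. The class name MetaTagParser is made up for this example, and the keyword list is shortened from the page shown above to keep the example brief.

```python
from html.parser import HTMLParser
from typing import Dict


class MetaTagParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser converts tag and attribute names to lower case.
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content:
                self.meta[name.lower()] = content


# Shortened version of the head section shown above.
html = """<head>
<META NAME="description" CONTENT="For astrology charts, horoscope reports and star signs.">
<META NAME="keywords" CONTENT="astrology, horoscope, zodiac">
</head>"""

parser = MetaTagParser()
parser.feed(html)
keywords = [word.strip() for word in parser.meta["keywords"].split(",")]
print(parser.meta["description"])
print(keywords)
```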
Based on this metadata and other information, the search engine will return a result such as:
Astrology, Daily Horoscope & Zodiac Signs by Astrology.com.au
For astrology charts, horoscope reports and star signs.
www.astrology.com.au/
The search returns the page name, the page description (taken from the metadata) and the URL. It has found this page based on the metadata keywords and by searching the first few lines of the page. Sometimes it might trawl the whole page. Therefore, it is vital when designing Web pages that very specific keywords are used to identify the page and give it a higher rank in the search returns.
Often ranking is based on updates, number of links and so on, so it is important to ensure that the site is dynamic (updated regularly) and has links from and to external sources that are not broken.
Improving Search Strategy
There are many options when doing a simple search using Google. Apart from choosing the search string or keywords, the user can choose to search for images, to search within Australia, or to use any of a raft of other search options. When the command to search has been given, Google takes very little time to locate the matching pages and to display them in its preferred order of relevance.
Figure 10: A Google Web search. The callouts mark the search string, the number of hits and time taken, Web news on the topic, and links sorted by relevance.
Some strategies to improve your search returns:
• Determine the likely organisation that will provide the information sought, eg. if it is education then check the education databases and online journals
• Guess the organisation's URL before trying search engines (type www. into the address bar, followed by the most likely name and extension such as .com or .edu)
• Otherwise, search for the organisation's URL in search engines
• Use "phrase searching" and unique words as much as possible
• Use Boolean searching
• Use truncation techniques
• Use directories (Google, Yahoo!, Ask Jeeves) for broad, general topics
• Use a multiple step approach; it can take several clicks to reach an answer
• Use advanced search facilities wherever possible; most search engines provide simple and advanced search
• Use specialised search engines for news, education, research and so on.
Boolean Searching
Search engines support different searching techniques, and most support Boolean searching. Boolean searching uses a small set of additional words (operators) in the search string to make the search return more precise.
The Internet is a vast computer database. As such,
its contents must be searched according to the rules
of computer database searching. Much database
searching is based on the principles of Boolean
logic. Boolean logic refers to the logical
relationship among search terms, and is named for
the British-born Irish mathematician George Boole.
Boolean searching limits the search returns by applying relationships between search terms, using three logical operators:
• OR
• AND
• NOT
Suppose you were looking for information on poverty and crime and how they relate to gender. If you typed poverty crime rates gender, for example, your returns would show all pages containing the keywords in isolation, so you would have a lot of pages that show poverty, pages showing crime, pages showing gender and so on. If you are lucky, the top of your search may show pages that contain 2 or more of your keywords, but how can you search more specifically?
Use of the Boolean – AND
Type into the search window – poverty AND crime
Figure 11: A Boolean search demonstrating how AND works with 2 keywords
Table 5: Search Return on specific Keywords
Search terms                    Results
poverty                         783,447
crime                           2,962,165
poverty AND crime               1,677
AND will only return documents that contain both keywords, not pages with only one, so it limits the search and provides a much more relevant list of pages.
By now adding gender to our Boolean search – poverty AND crime AND gender – we narrow down our search even more, and the pages now being returned should be far more relevant to what we are looking for. Obviously we can continue this process by adding more search criteria such as year, country and so on, and each time our search return will be more specific and relevant. Table 6 shows that the search has now narrowed our returns to 76, down from the 1,677 shown in Table 5.
Figure 12: A Boolean search demonstrating how AND works with 3 keywords
Table 6: Search Return on specific Keywords
Search terms                    Results
poverty                         783,447
crime                           2,962,165
poverty AND crime               1,677
poverty AND crime AND gender    76
Use of the Boolean – NOT
Used to limit a search where logical relationships may exist, but we are only interested in one part of that relationship. For example, quite often if you type dogs into the keyword search, a lot of pages would return with cats and dogs, as the most common domestic pets. To limit the search to just dogs, use the NOT operator, so the keywords become dogs NOT cats.
Figure 13: A Boolean search demonstrating how NOT works with 2 keywords
Use of the Boolean – OR
OR allows either one keyword or the other to match, which is what would normally occur in a plain keyword search. When used with other Boolean operators it becomes very powerful, eg. cats OR dogs AND domestic, cats OR dogs NOT wild etc.
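The three operators correspond directly to set operations, which a short Python sketch can show. The index below is invented sample data (page numbers are arbitrary), and the helper name pages is hypothetical. Grouping OR before AND in the last line is an assumption about how the combined example is meant to be read.

```python
from typing import Dict, Set

# Hypothetical inverted index: keyword -> IDs of pages containing it.
INDEX: Dict[str, Set[int]] = {
    "cats": {1, 2, 5},
    "dogs": {2, 3, 4},
    "domestic": {2, 3, 5},
    "wild": {4, 5},
}


def pages(term: str) -> Set[int]:
    """Return the set of pages containing the term (empty if unknown)."""
    return INDEX.get(term, set())


both = pages("cats") & pages("dogs")        # cats AND dogs: pages with both words
either = pages("cats") | pages("dogs")      # cats OR dogs: pages with at least one
only_dogs = pages("dogs") - pages("cats")   # dogs NOT cats: dogs pages without cats
combined = (pages("cats") | pages("dogs")) & pages("domestic")  # (cats OR dogs) AND domestic

print(sorted(both), sorted(either), sorted(only_dogs), sorted(combined))
```

AND and NOT shrink the result set while OR enlarges it, which matches the falling result counts in the poverty and crime tables.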
Phrase Searching
Phrase searching is used to search for words as phrases. That is, the words must be side by side and in the order given. In keyword searching, by contrast, the words do not have to be next to each other.
For example, compare two searches for distance learning:
Phrase searching – “distance learning”
Keyword search – distance and learning
By using quotes around the two words the database searches for the exact phrase. Without the quotes the database can search for the two words in any order and may put an AND or OR between the two words.
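The difference between the two kinds of search can be seen in a small Python sketch. The function names and the two sample sentences are made up for illustration, and the word splitting is deliberately naive (it ignores punctuation).

```python
from typing import List


def tokenize(text: str) -> List[str]:
    """Split text into lower-case words (punctuation is not handled)."""
    return text.lower().split()


def keyword_match(text: str, query: str) -> bool:
    """True if every query word appears somewhere in the text, in any order."""
    words = set(tokenize(text))
    return all(term in words for term in tokenize(query))


def phrase_match(text: str, phrase: str) -> bool:
    """True only if the phrase words appear side by side and in order."""
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


page_1 = "Distance learning courses are offered online"
page_2 = "Learning to measure distance with a map"

print(keyword_match(page_1, "distance learning"), phrase_match(page_1, "distance learning"))
print(keyword_match(page_2, "distance learning"), phrase_match(page_2, "distance learning"))
```

Both pages satisfy the keyword search, but only the first contains the exact phrase.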
Truncation
Truncation will expand your search and can save time. Truncation means using a "root word" or partial word (the beginning letters that the words have in common) along with a "wildcard character".
For example, instead of doing the search (educator OR educators OR educational), type educat* and the database will retrieve any article with a word starting with the letters educat, such as educate, education, educator, educators…
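Truncation amounts to matching on the start of a word. A minimal Python sketch, assuming a small invented vocabulary and a hypothetical helper named expand_truncation, looks like this. Only a trailing * is supported here; individual databases may use other wildcard characters or rules.

```python
from typing import List

# Invented sample vocabulary standing in for the words in a database.
VOCABULARY = ["educate", "education", "educator", "educators", "edit", "learning"]


def expand_truncation(pattern: str, vocabulary: List[str]) -> List[str]:
    """Return the words matched by a pattern such as educat*."""
    if pattern.endswith("*"):
        root = pattern[:-1].lower()
        # Keep every word that begins with the root letters.
        return [word for word in vocabulary if word.lower().startswith(root)]
    # Without a wildcard, only the exact word matches.
    return [word for word in vocabulary if word.lower() == pattern.lower()]


matches = expand_truncation("educat*", VOCABULARY)
print(matches)
```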
5 How to Reference Web Material
It is important to remember that any information you
access from any source, including the Web, cannot be
used in its original form in any work you create
and/or submit unless you make it perfectly clear from
where you have sourced it. If you cut and paste even
a sentence from someone else’s work and put it in
your work without acknowledging its source, you are
deemed to have plagiarised.
Referencing Web Resources
If you want to include some work you have found on the Internet (or anywhere else) in an assignment or piece of work, you can cut and paste it into your work, but you must clearly identify the quotation or piece of work and reference the Web page where you found it. For example, when writing about someone else’s ideas and they are paraphrased, the referencing follows the format below:
a. identify the source from which the ideas were obtained in the body of the work
… Jones (2003) describes plagiarism as a very serious offence for students….
b. cite the publication in the references section. A typical Web reference looks like this:
Jones, A. (2003). How to use the Web in your assignments. Retrieved February 7, 2004 from http://www.ecu.edu.au/lds/docs/webreferences.htm
Formatting a Web Reference
There is a standard format that must be used when referencing a Web source.
Name of author, (year document was written). Title of document (in italics). Retrieved (date source was accessed) from (URL of the document).
Sometimes documents have no authors but these must still be referenced. Referencing a document with no apparent author, and even no apparent date of publication, is still quite easy. For example, if the document referenced above had no such details it would be referenced in text as (How to use the Web in your assignments, n.d.) and entered into the reference list as:
How to use the Web in your assignments. (n.d.) Retrieved February 7, 2004 from http://www.ecu.edu.au/lds/docs/webreferences.htm
Including Images in Own Work
Frequently, a person will want to use images and diagrams sourced from the Web in his or her assignment work. Once again, it is important to show the source of the object. Usually when such an object is used, it is sufficient to show the URL of the object in the document where it is used.
Figure 11: The formation structure of a hurricane
http://www.nhoem.state.nh.us/mitigation/section_iii.htm
AltaVista
AltaVista provides the most comprehensive search
experience on the Web!
http://www.altavista.com/
Google
Enables users to search the Web, Usenet, and
images.
http://www.google.com/
Dogpile Web Search Home Page
Parallel searcher that queries a customizable list of
search engines and the Open Directory, then
displays results from each search source.
http://www.dogpile.com/
WiseNut
Index of 1.5 billion pages. Search results are
clustered into categories.
http://www.wisenut.com/
AlltheWeb.com
Search with a simple interface and huge database.
MetaCrawler Web Search Home Page
Searches the leading engines in one click and
returns only the best results from those search
engines.
http://www.metacrawler.com/
Mamma Metasearch
Mamma.com collects only the top results from the
best search engines on the Internet.
http://www.mamma.com/
Teoma - Search with Authority
Teoma, delivers three types of search responses.
Results: Relevant web pages.
http://www.teoma.com/
ProFusion
Select from a list of search engines or let ProFusion
choose the fastest.
http://www.profusion.com/
Ixquick Metasearch
Ixquick submits your search to the major search
engines and finds sites that are universally ranked in
the top ten!
http://www.ixquick.com/
Beaucoup! 2,000+ Search Engines
A directory listing thousands of engines, directories
and indices.
http://www.beaucoup.com/
WebCrawler Web Search Home Page
Returns the best results from these leading engines:
http://www.webcrawler.com/
Excite
Portal offering a search service including search of a
directory from the ODP, news, and links.
http://www.excite.com/
All Search Engines.Com
Lists all major search engines and hundreds of other
search engines by category.
http://www.allsearchengines.com/
Welcome to Lycos!
Portal with search powered by Fast, channels, and a
directory.
http://www.lycos.com/
Search.com - Metasearch Search Engine
Search.com searches the best Search Engines
http://www.search.com/
Search Engine Colossus
Directory of hundreds of search engines, organised
by country and topic.
http://www.searchenginecolossus.com/
Homepage HotBot Web Search
A powerful conventional search engine
http://hotbot.lycos.com/
Australia and New Zealand Web Enquiry Research
System
Search engine powered by Yahoo and Google.
http://www.anzwers.com.au/
Web Wombat
Search engine featuring topic searches. Free web-based email and hourly weather.
http://www.webwombat.com.au/
Ask Jeeves - Ask.com
Find it faster with Smart Search. Introducing Map
Search.
http://www.askjeeves.com/
AllSearchEngines.co.uk !! Index of Internet Search
Engines and Web
Metasearch with a choice of English UK or
worldwide search engines.
http://www.allsearchengines.co.uk/
Internet Sleuth Web Search Yellow Pages Find
People
The Internet Sleuth...find things faster using several
different search engines in one meta-search.
http://www.isleuth.com/
Search Engine Watch: Tips About Internet Search
Engines & Search
Guide to search engine registration and ranking
issues, providing current news and analysis.
http://www.searchenginewatch.com/
Boolean searching – how it works
http://library.albany.edu/internet/boolean.html
Find UK :: UK search plus free SMS and Email
Find UK information and search all the UK search
engines from one site...
http://www.find-uk.com/
Netscape.co.uk
Web portal including directory, search engine, news,
web tools, and other resources.
http://www.netscape.co.uk/
Oneupweb
Parallel search tool using AltaVista, Thunderstone,
Wisenut, Yahoo, Lycos, Looksmart and Fast/All the
Web.
http://www.1blink.com/
The Amazing Picture Machine
Web sites designed for students ... Kid's Image
Search Tools: http://www.kidsclick.org/psearch.html.
Classroom Clipart: http://classroomclipart.com/
http://www.ncrtec.org/picture.htm
Department of Information & Communications:
Internet Search Tools
Collection of search tools from the Department of
Information and Communications at the Manchester
Metropolitan University.
http://www.mmu.ac.uk/h-ss/dic/main/search.htm
OneSeek.com
No per-click charges! Lock in your keywords at this
special introductory rate now. Search Keyword(s)
http://www.oneseek.com/
SearchEngines.com
Finding Credible Info. Search Engines 101. Optimal
Design. Keywords: Titles, Meta tags and more.
http://www.searchengines.com/
Test Your Knowledge
1. Describe how search engines work to index the
pages on the WWW.
2. Describe the process a search engine undertakes
when it completes a search.
3. What makes one search engine better than
another?
4. What aspects need to be considered when
comparing search engines against each other?
5. What is meant by the term page ranking?
Describe how Google's PageRank works.
6. Describe the various types of search engine.
7. List some of the important features a search
engine needs to display.
8. Is it permissible to copy material found on the
Web into university assignments? Explain your
answer.
9. Describe some of the options that search engines
provide to enable searches to be made more
specific than simply keywords alone.
10. Describe the correct procedure for referencing
material sourced from the Web. Give an
example to illustrate your answer.