Topic 1: Web Searching

Topic 2: Search Engines
A Web search engine is an application that enables users to locate Web pages based on search parameters that the user provides. Search engines are complex entities made up of large databases that store information relating to Web pages.
An American phone survey of Internet users in early 2004 found that 84% of online Americans had used search engines at some stage. The survey found that on any given day, more than half of those using the Internet were using search engines, and most Internet users indicated that they used search engines several times a week. Search engine use is second only to email in terms of online activity.
This heavy use is mirrored among Australian users, for whom Web searching is a very common application. One reason for its popularity is that the engines are powerful and very successful at locating the information people are searching for on the Web.
Figure 2.1: A Web search engine showing the results from a keyword search.
2.1 Types of Search Engine
There are so many search engines on the Web that it is difficult to know which one to use. One way to choose is to consider the form of information being sought. There are over 500 conventional search engines that can be used on the Web. Each has a particular characteristic or form that distinguishes it from the others. Some have been designed for very narrow searching, while others can be used for any kind of searching. There are 2 main forms of search engine: directories and lists.
a. Directory Search Engines: These are search engines that contain information organised in a directory form, as one might find in a large encyclopedia. Information is located through the use of words and terms which locate the information precisely. Some examples of directory search engines are shown in Table 2.1.
Table 2.1: Directory Search Engines
Engine Name       URL
Hotbot            www.hotbot.com
Lycos             www.lycos.com
Excite            www.excite.com
Google            www.google.com
AltaVista         www.altavista.com
All the Web       www.alltheweb.com
Ask Jeeves        www.askjeeves.com
Figure 2.2: Ask Jeeves
A directory search engine showing the results from a keyword search. http://www.askjeeves.com
b. Topic Search Engines: The other form of search engine is organised around topics. Information is located by a process of refinement, moving from one topic to the next until the required information is found. Some examples of topic search engines are shown in Table 2.2.
Table 2.2: Topic Search Engines
Engine Name       URL
Yahoo             www.yahoo.com
Open Directory    dmoz.org
About             www.about.com
Galaxy            www.einet.net
Lii               www.lii.org
LookSmart         www.looksmart.com
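The process of refinement used by topic search engines can be pictured as a walk down a tree of topics, where each choice narrows what remains. The sketch below is illustrative only: the category names, the example.org URLs and the refine function are hypothetical, and they do not reflect how any engine in Table 2.2 actually stores its directory.

```python
# A hypothetical topic hierarchy. Real directories such as the Open Directory
# are far larger; these category names and URLs are made up for illustration.
TOPICS = {
    "Science": {
        "Earth Sciences": {
            "Meteorology": ["http://example.org/hurricanes", "http://example.org/climate"],
        },
        "Biology": {
            "Zoology": ["http://example.org/animals"],
        },
    },
    "Arts": {
        "Music": ["http://example.org/jazz"],
    },
}


def refine(tree, path):
    """Follow a sequence of topic choices down the tree.

    Returns the sub-topics still available to choose from (sorted names) or,
    once a leaf topic is reached, the list of pages filed under it.
    """
    node = tree
    for topic in path:
        node = node[topic]  # each choice narrows the scope of the search
    return sorted(node) if isinstance(node, dict) else node


if __name__ == "__main__":
    print(refine(TOPICS, []))                    # top-level topics
    print(refine(TOPICS, ["Science"]))           # sub-topics of Science
    print(refine(TOPICS, ["Science", "Earth Sciences", "Meteorology"]))  # pages
```

An empty path lists the top-level topics, a partial path lists the sub-topics at that level, and a path ending at a leaf returns the pages filed under that topic.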
Figure 2.2: Topic Search Engine
A topic search engine showing the topics through which the search is conducted. http://www.lii.org
c. Children's Search Engines: A number of search engines have specially designed features and capabilities to enable their successful use by young Web users. Even adults can find these engines useful in the way they assist the user. Some of these are shown in Table 2.3.
Table 2.3: Children's Search Engines
Engine Name           URL
Kids Click!           www.kidsclick.org
Yahooligans           www.yahooligans.com
Ask Jeeves for Kids   www.askjeeves.com
Figure 2.3: Ask Jeeves for Kids
A children's search engine is designed for the younger Web user, eg. www.ajkids.com
d. Metasearch Engines: A number of search engines perform what is called metasearching. They enable a single search string to be used in a number of different search engines simultaneously. Well-known search engines of this form are shown in Table 2.4.
Table 2.4: Metasearch Engines
Engine Name       URL
Metacrawler       www.metacrawler.com
Monstercrawler    www.monstercrawler.com
Dogpile           www.dogpile.com
Info.com          www.info.com
Figure 2.4: A metasearch engine
A metasearch engine enables the user to search many sites simultaneously, eg. www.info.com
Even though metasearch engines use multiple sites for any one search, this does not necessarily mean that better information is returned or that the best sites have been discovered. It is still up to the user to determine how successful a search has been.
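The idea of sending one search string to several engines at once can be sketched as follows. This is a minimal illustration under stated assumptions: the three engine functions are hypothetical stand-ins that return fixed results instead of making network requests, and the Borda-style merge (points by rank, summed across engines) is one simple way of combining ranked lists, not the method used by any engine in Table 2.4.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real engines: each takes a query and returns a
# ranked list of URLs, best first. A real metasearch engine would send HTTP
# requests to each engine instead.
def engine_a(query):
    return ["http://a.example/1", "http://shared.example/x", "http://a.example/2"]


def engine_b(query):
    return ["http://shared.example/x", "http://b.example/1"]


def engine_c(query):
    return ["http://c.example/1", "http://shared.example/x", "http://b.example/1"]


def metasearch(query, engines, top_n=10):
    """Send one query to every engine simultaneously and merge the results."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))

    # Borda-style merge: a result earns more points the higher each engine
    # ranks it, and pages found by several engines accumulate points.
    scores = {}
    for results in result_lists:
        for rank, url in enumerate(results):
            scores[url] = scores.get(url, 0) + (len(results) - rank)

    # Highest score first; ties broken alphabetically so the order is stable.
    ranked = sorted(scores, key=lambda url: (-scores[url], url))
    return ranked[:top_n]


if __name__ == "__main__":
    print(metasearch("hurricanes", [engine_a, engine_b, engine_c]))
```

With these stand-ins, http://shared.example/x comes first because all three engines return it near the top. Agreement between engines raises a page's score, but it does not guarantee that the best sites on the Web have been found.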
2.2 Search Engine Features
Search engines offer many different features. Every search engine seems to offer some feature or capability that distinguishes it from the others, and collectively a great many different features are found across them.
a. Filters: Hotbot allows the user to preset filters. Filters, as the name suggests, are restrictions that limit the scope of the search. Typical filters include the language the pages are written in, the age of the page, etc. Once a filter has been set and saved, it is automatically applied to every search undertaken with the engine.
Figure 2.5: Filters
Filters available in Hotbot to limit the scope of the search to particular pages and page types.
b. Display Options: Most search engines provide a facility so that users can customise how the results will be returned. Users can usually select what information will be shown and how it will be displayed.
Figure 2.6: Display Options
Hotbot also provides a facility for the user to customise how results appear when they are returned.
c. Information Organisation: Some search engines provide visual and semantic displays to make the information they return easier to investigate. Mooter provides a semantic net which arranges the information into topics to make it easier to access.
Figure 2.7: Information Organisation
Mooter.com returns information organised into themes.
2.3 Choosing a Search Engine
One's choice of search engine tends to depend on one's familiarity with a particular engine and the success one has had with it in the past. In 2005, general information suggested that the most commonly used search engine was Google. At that time, this search engine had the largest number of indexed Web pages and the largest number of users worldwide.
Figure 2.8: Excite
Excite is a typical search engine. It offers far more than simple keyword searching, with many other features to attract users. http://www.excite.com
Different people will have different needs when it comes to selecting a search engine. Some of the features that may influence a person's choice of search engine include:
• Search engine type, eg. directory, index, metasearch engine;
• Number of pages indexed;
• Extra features, eg. filters, customisations;
• Amount of advertising;
• Quality of relevance ranking;
• Search options, eg. advanced searching features;
• Types of resources indexed, eg. pages, images etc.;
• Source of content, eg. availability of local content;
• Currency of resources, ie. age of pages;
• Speed of feedback, ie. how long it takes to access, search and retrieve the pages.
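Because different people weigh these features differently, one way to compare engines is a weighted score: rate each engine against the criteria that matter to you, multiply each rating by how much you care about that criterion, and add the results. In this sketch the weights, the 1 to 5 ratings and the engine names engine_x and engine_y are all hypothetical; they are not measurements of real search engines.

```python
# Hypothetical weights expressing how much one particular user cares about
# some of the criteria listed above (another user would choose differently).
WEIGHTS = {
    "pages_indexed": 3,
    "relevance_ranking": 5,
    "advertising": 2,   # rated so that *less* advertising earns a higher rating
    "speed": 4,
}

# Made-up 1-5 ratings for two unnamed engines; these are not measurements.
RATINGS = {
    "engine_x": {"pages_indexed": 5, "relevance_ranking": 4, "advertising": 2, "speed": 5},
    "engine_y": {"pages_indexed": 3, "relevance_ranking": 5, "advertising": 5, "speed": 3},
}


def weighted_score(ratings, weights):
    """Sum of rating * weight over the criteria the user cares about."""
    return sum(weight * ratings.get(criterion, 0) for criterion, weight in weights.items())


def best_engine(all_ratings, weights):
    """Name of the engine with the highest weighted score."""
    return max(all_ratings, key=lambda name: weighted_score(all_ratings[name], weights))


if __name__ == "__main__":
    print(best_engine(RATINGS, WEIGHTS))
```

With these weights engine_x wins (59 against 56), but a user who raises the advertising weight to 5 would prefer engine_y (71 against 65). The choice of engine depends on the user's needs rather than on any single "best" engine.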
2.4 How Search Engines Work
The databases that comprise search engines are
typically created through a process of Web crawling.
Search engines employ software applications
called robots that crawl the Web seeking Web pages.
The robots extract information from each Web page
they encounter and enter this information into their
databases. Once a page has been indexed in this
fashion, it forms part of the set which can be accessed
by users using the search engine.
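The crawl-and-index cycle described above can be sketched in a few lines. The sketch assumes a tiny in-memory "Web" (the WEB dictionary and its URLs are hypothetical) instead of fetching pages over HTTP, and it stores the extracted information as an inverted index that maps each word to the pages containing it. Real robots extract far more than the words on a page.

```python
import re
from collections import deque

# A tiny hypothetical "Web": each URL maps to (page text, outgoing links).
# A real robot would download pages over HTTP and parse links out of the HTML.
WEB = {
    "http://site.example/": ("Welcome to our search engine guide",
                             ["http://site.example/robots"]),
    "http://site.example/robots": ("Robots crawl the Web and index pages",
                                   ["http://site.example/", "http://site.example/index"]),
    "http://site.example/index": ("An index maps words to pages",
                                  ["http://site.example/old"]),  # dead link
}


def crawl(start_url, web):
    """Breadth-first crawl from start_url, building an inverted index.

    The index maps each lower-cased word to the set of URLs containing it.
    """
    index = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url not in web:  # the page no longer exists, so nothing is indexed
            continue
        text, links = web[url]
        # Extract information from the page: here, simply its words.
        for word in re.findall(r"[a-z]+", text.lower()):
            index.setdefault(word, set()).add(url)
        # Queue any linked pages the robot has not already seen.
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


if __name__ == "__main__":
    idx = crawl("http://site.example/", WEB)
    print(sorted(idx["index"]))
```

Looking up a word in the finished index gives every crawled page containing it, which is what makes later searches fast. The link to http://site.example/old is skipped because that page does not exist.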
To gain some sense of how this all works, it is useful to consider a particular search engine. Google's robot, called Googlebot, is able to crawl the entire Web in about 6 weeks. This crawl adds new pages and enables Google to update information on existing pages and to remove pages that no longer exist. The Google database contains information on nearly 5 billion Web resources, including over 4 billion Web pages.
When a user enters a search into the Google search engine, the request is processed in a number of stages, as shown in Figure 2.9. The query goes to the Google Web Server, where it is checked and sent to the Index Servers. The Index Servers do a lookup and pass the query on to the place where the page information is stored, the Doc Servers. Information about the pages retrieved from the Doc Servers is sent back to the user.
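The three stages can be mimicked with plain functions. This is a simplified sketch, not Google's implementation: the stage names follow Figure 2.9, but the dictionaries standing in for the Index Servers and Doc Servers, the document IDs, and the rule that a page must contain every query word are assumptions made for illustration. Ranking is left out, so results come back in document-ID order.

```python
# Hypothetical data standing in for the two kinds of server. The index
# servers map words to document IDs; the doc servers hold page details.
INDEX_SERVERS = {
    "search": {1, 2},
    "engine": {1, 3},
    "robots": {2},
}
DOC_SERVERS = {
    1: ("http://one.example/", "How a search engine works"),
    2: ("http://two.example/", "Search robots explained"),
    3: ("http://three.example/", "Engine repair manual"),
}


def web_server(query):
    """Check the query, then pass it through the index and doc stages."""
    words = query.lower().split()
    if not words:  # reject an empty query
        return []
    doc_ids = index_lookup(words)
    return fetch_documents(doc_ids)


def index_lookup(words):
    """Return the IDs of documents containing every query word."""
    matches = [INDEX_SERVERS.get(word, set()) for word in words]
    return set.intersection(*matches)


def fetch_documents(doc_ids):
    """Fetch the stored page details (URL, title) for each document ID."""
    return [DOC_SERVERS[doc_id] for doc_id in sorted(doc_ids)]


if __name__ == "__main__":
    print(web_server("search engine"))
```

Searching for "search engine" returns only the page about how a search engine works, since it is the only document listed under both words; putting the matches into a useful order is the job of page ranking.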
Page Ranking
When Google locates the various pages that match the search string, it needs to work out the order in which it will show them. Ranking is very important to the success of a search engine. Typically there will be many thousands of pages that match the search string completely, and the best search engines show the most relevant pages first. Different engines have different ways of ranking the pages they locate.
Google uses a process called PageRank Technology to decide which of the located pages are the most relevant. In the process, it considers not only the content of a page but also how many pages point to it, and by this means it calculates which pages are the most relevant. In Google's own words:
PageRank Technology: PageRank performs
an objective measurement of the importance of
web pages by solving an equation of more than
500 million variables and 2 billion terms. Instead
of counting direct links, PageRank interprets a
link from Page A to Page B as a vote for Page B
by Page A. PageRank then assesses a page's
importance by the number of votes it receives.
PageRank also considers the importance of
each page that casts a vote, as votes from some
pages are considered to have greater value,
thus giving the linked page greater value.
Important pages receive a higher PageRank and
appear at the top of the search results. Google's
technology uses the collective intelligence of the
web to determine a page's importance. There is
no human involvement or manipulation of
results, which is why users have come to trust
Google as a source of objective information
untainted by paid placement.
Hypertext-Matching Analysis: Google's search
engine also analyzes page content. However,
instead of simply scanning for page-based text
(which can be manipulated by site publishers
through meta-tags), Google's technology
analyzes the full content of a page and factors in
fonts, subdivisions and the precise location of
each word. Google also analyzes the content of
neighboring web pages to ensure the results
returned are the most relevant to a user's query.
http://www.google.com.au/corporate/tech.html
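The "voting" idea in the quotation can be illustrated with the simplified PageRank calculation published by Google's founders, computed here by repeated iteration. In the normalised form used below, PR(p) = (1 - d)/N + d * (sum of PR(q)/L(q)), where the sum runs over the pages q that link to p, L(q) is the number of links leaving q, N is the number of pages and d is a damping factor. The value d = 0.85, the fixed 50 iterations and the four-page link graph are assumptions for illustration; the quotation does not give them, and Google's production system is far more elaborate.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively compute simplified PageRank for a small link graph.

    links maps each page to the list of pages it links to (its "votes").
    Every link target must also appear as a key in links.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal importance
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if targets:
                # A page splits its vote evenly among the pages it links to,
                # so a vote from an important page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page Web: A links to B and C, B and D link to C, C links to A.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

if __name__ == "__main__":
    for page, score in sorted(pagerank(LINKS).items(), key=lambda item: -item[1]):
        print(page, round(score, 3))
```

Page C ranks highest (roughly 0.39) because three pages vote for it. Page A receives only one vote but still ranks second, because that vote comes from C, the most important page. D, which no page links to, receives only the minimum (1 - 0.85)/4 = 0.0375.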
There are many options when doing a simple search using Google. Apart from choosing the search string or keywords, the user can choose to search for images, to search only pages from Australia, or to use any of a raft of other search options. When the command to search has been given, Google takes very little time to locate the matching pages and display them in its preferred order of relevance.
Figure 2.9: How Google locates pages during a Web
search. http://www.google.com.au/corporate/tech.html
Figure 2.10: A Google Web search
The screenshot labels the search string, the number of hits and time taken, Web news on the topic, and the links sorted by relevance.
2.5 How to Reference Web Material
It is important to remember that any information you
access from any source, including the Web, cannot be
used in its original form in any work you create
and/or submit unless you make it perfectly clear from
where you have sourced it. If you cut and paste even
a sentence from someone else’s work and put it in
your work without acknowledging its source, you are
deemed to have plagiarised.
Referencing Web Resources
If you want to include some work you have found on the Internet (or anywhere else) in an assignment or piece of work, you can cut and paste it into your work, but you must clearly identify the quotation or piece of work and reference the Web page where you found it. When you paraphrase someone else's ideas, the referencing follows the format below:
a. identify the source from which the ideas were obtained in the body of the work:
… Jones (2003) describes plagiarism as a very serious offence for students….
b. cite the publication in the references section. A typical Web reference looks like this:
Jones, A. (2003). How to use the Web in your assignments. Retrieved February 7, 2004 from http://www.ecu.edu.au/lds/docs/webreferences.htm
Formatting a Web Reference
There is a standard format that must be used when referencing a Web source:
Name of author (year document was written). Title of document (in italics). Retrieved (date source was accessed) from (URL of the document).
Sometimes documents have no author, but they must still be referenced. Referencing a document with no apparent author, and even no apparent date of publication, is still quite easy. For example, if the document referenced above had no such details, it would be referenced in text as (How to use the Web in your assignments, n.d.) and entered into the reference list as:
How to use the Web in your assignments. (n.d.). Retrieved February 7, 2004 from http://www.ecu.edu.au/lds/docs/webreferences.htm
Including Images in Own Work
Frequently, a person will want to use images and diagrams sourced from the Web in his or her assignment work. Once again, it is important to show the source of the object. Usually, when such an object is used, it is sufficient to show its URL in the document where it is used.
Figure 2.11: The formation structure of a hurricane
http://www.nhoem.state.nh.us/mitigation/section_iii.htm
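The standard reference format can also be expressed as a small function, which makes the two cases (with and without an author or date) explicit. This is a minimal sketch that assumes the reference-list style of the Jones (2003) example; the function name web_reference and its parameters are hypothetical, and because plain text cannot show italics, the title is left unformatted for the word processor to italicise.

```python
from datetime import date

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def web_reference(title, url, accessed, author=None, year=None):
    """Return a reference-list entry for a Web document.

    Follows the pattern of the Jones (2003) example: author, (year), title,
    then "Retrieved <date accessed> from <URL>". With no author, the title
    moves to the front; with no date, "(n.d.)" is used instead of the year.
    """
    when = f"({year})" if year else "(n.d.)"
    retrieved = (f"Retrieved {MONTHS[accessed.month - 1]} {accessed.day}, "
                 f"{accessed.year} from {url}")
    if author:
        return f"{author} {when}. {title}. {retrieved}"
    return f"{title}. {when}. {retrieved}"


if __name__ == "__main__":
    url = "http://www.ecu.edu.au/lds/docs/webreferences.htm"
    title = "How to use the Web in your assignments"
    print(web_reference(title, url, date(2004, 2, 7), author="Jones, A.", year=2003))
    print(web_reference(title, url, date(2004, 2, 7)))  # no author, no date
```

The first call reproduces the Jones (2003) reference-list entry; the second produces the entry for the same document when it has no apparent author or date.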
AltaVista
AltaVista provides the most comprehensive search
experience on the Web!
http://www.altavista.com/
Google
Enables users to search the Web, Usenet, and
images.
http://www.google.com/
Dogpile Web Search Home Page
Parallel searcher that queries a customizable list of
search engines and the Open Directory, then
displays results from each search source.
http://www.dogpile.com/
WiseNut
Index of 1.5 billion pages. Search results are
clustered into categories.
http://www.wisenut.com/
AlltheWeb.com
Search with a simple interface and huge database.
MetaCrawler Web Search Home Page
Searches the leading engines in one click and
returns only the best results from those search
engines.
http://www.metacrawler.com/
Mamma Metasearch
Mamma.com collects only the top results from the
best search engines on the Internet.
http://www.mamma.com/
Teoma - Search with Authority
Teoma delivers three types of search responses.
Results: Relevant web pages.
http://www.teoma.com/
ProFusion
Select from a list of search engines or let ProFusion
choose the fastest.
http://www.profusion.com/
Ixquick Metasearch
Ixquick submits your search to the major search
engines and finds sites that are universally ranked in
the top ten!
http://www.ixquick.com/
Beaucoup! 2,000+ Search Engines
A directory listing thousands of engines, directories
and indices.
http://www.beaucoup.com/
WebCrawler Web Search Home Page
Returns the best results from these leading engines:
http://www.webcrawler.com/
Excite
Portal offering a search service including search of a
directory from the ODP, news, and links.
http://www.excite.com/
All Search Engines.Com
Lists all major search engines and hundreds of other
search engines by category.
http://www.allsearchengines.com/
Welcome to Lycos!
Portal with search powered by Fast, channels, and a
directory.
http://www.lycos.com/
Search.com - Metasearch Search Engine
Search.com searches the best Search Engines
http://www.search.com/
Search Engine Colossus
Directory of hundreds of search engines, organised
by country and topic.
http://www.searchenginecolossus.com/
Homepage HotBot Web Search
A powerful conventional search engine
http://hotbot.lycos.com/
Australia and New Zealand Web Enquiry Research
System
Search engine powered by Yahoo and Google.
http://www.anzwers.com.au/
Web Wombat
Search engine featuring topic searches, free Web-based email and hourly weather.
http://www.webwombat.com.au/
Ask Jeeves - Ask.com
Find it faster with Smart Search. Introducing Map
Search.
http://www.askjeeves.com/
AllSearchEngines.co.uk !! Index of Internet Search
Engines and Web
Metasearch with a choice of English UK or
worldwide search engines.
http://www.allsearchengines.co.uk/
Internet Sleuth Web Search Yellow Pages Find
People
The Internet Sleuth...find things faster using several
different search engines in one meta-search.
http://www.isleuth.com/
Search Engine Watch: Tips About Internet Search
Engines & Search
Guide to search engine registration and ranking
issues, providing current news and analysis.
http://www.searchenginewatch.com/
Find UK :: UK search plus free SMS and Email
Find UK information and search all the UK search
engines from one site...
http://www.find-uk.com/
Netscape.co.uk
Web portal including directory, search engine, news,
web tools, and other resources.
http://www.netscape.co.uk/
Oneupweb
Parallel search tool using AltaVista, Thunderstone,
Wisenut, Yahoo, Lycos, Looksmart and Fast/All the
Web.
http://www.1blink.com/
The Amazing Picture Machine
Web sites designed for students ... Kid's Image
Search Tools: http://www.kidsclick.org/psearch.html.
Classroom Clipart: http://classroomclipart.com/
http://www.ncrtec.org/picture.htm
Department of Information & Communications:
Internet Search Tools
Collection of search tools from the Department of
Information and Communications at the Manchester
Metropolitan University.
http://www.mmu.ac.uk/h-ss/dic/main/search.htm
OneSeek.com
No per-click charges! Lock in your keywords at this
special introductory rate now.
http://www.oneseek.com/
SearchEngines.com
Finding Credible Info. Search Engines 101. Optimal
Design. Keywords: Titles, Meta tags and more.
http://www.searchengines.com/
Test Your Knowledge of Topic 2
1. Describe how search engines work to index the
pages on the WWW.
2. Describe the process a search engine undertakes
when it completes a search.
3. What makes one search engine better than
another?
4. What aspects need to be considered when
comparing search engines against each other?
5. What is meant by the term page ranking?
Describe how Google's PageRank works.
6. Describe the various types of search engine.
7. List some of the important features a search
engine needs to display.
8. Is it permissible to copy material found on the
Web into university assignments? Explain your
answer.
9. Describe some of the options that search engines
provide to enable searches to be made more
specific than simply keywords alone.
10. Describe the correct procedure for referencing
material sourced from the Web. Give an
example to illustrate your answer.