Topic 1: Web Searching
Topic 2: Search Engines

A Web search engine is an application that enables users to locate Web pages based on search parameters that the user provides. Search engines are complex entities made up of large databases that store information about Web pages.

An American phone survey of Internet users in early 2004 found that 84% of online Americans had used search engines at some stage. The survey found that on any given day, more than half of those using the Internet were using search engines, and most Internet users indicated that they used search engines several times a week. Search engine use is second only to email in terms of online activity. This heavy use is mirrored among Australian users, for whom Web searching is a very common application. One reason for this popularity is that search engines are powerful and very successful at locating the information people are searching for on the Web.

Figure 2.1: A Web search engine showing the results from a keyword search.

2.1 Types of Search Engine

There are so many search engines on the Web that it can be difficult to know which one to use. One way to choose is to consider the form of information being sought. There are over 500 conventional search engines that can be used on the Web. Each has a particular characteristic or form that distinguishes it from the others: some have been designed for very narrow searching, while others can be used for any kind of searching. There are two main forms of search engine: directory search engines and topic search engines.

a. Directory search engines. These are search engines that contain information organised as one might find in a large encyclopedia, in a directory form. Information is located through the use of words and terms which locate the information precisely. Some examples of directory search engines are shown in Table 2.1.

Table 2.1: Directory Search Engines
Engine Name      URL
Hotbot           www.hotbot.com
Lycos            www.lycos.com
Excite           www.excite.com
Google           www.google.com
AltaVista        www.altavista.com
All the Web      www.alltheweb.com
Ask Jeeves       www.askjeeves.com

Figure 2.2: Ask Jeeves. A directory search engine showing the results from a keyword search. http://www.askjeeves.com

b. Topic search engines. The other form of search engine is organised around topics, and information is located by a process of refinement, moving from one topic to the next until the required information is found. Some examples of topic search engines are shown in Table 2.2.

Table 2.2: Topic Search Engines
Engine Name      URL
Yahoo            www.yahoo.com
Open Directory   dmoz.org
About            www.about.com
Galaxy           www.einet.net
Lii              www.lii.org
LookSmart        www.looksmart.com

Figure 2.2: Topic Search Engine. A topic search engine showing the topics through which the search is conducted. http://www.lii.org

c. Children's search engines. A number of search engines have specially designed features and capabilities that enable their successful use by young Web users. Even adults can find these engines useful in the way they assist the user. Some of these are shown in Table 2.3.

Table 2.3: Children's Search Engines
Engine Name           URL
Kids Click!           www.kidsclick.org
Yahooligans           www.yahooligans.com
Ask Jeeves for Kids   www.askjeeves.com

Figure 2.3: Ask Jeeves for Kids. A children's search engine designed for the younger Web user, e.g. www.ajkids.com

d. Metasearch engines. A number of search engines perform what is called meta-searching: they enable a single search string to be used in a number of different search engines simultaneously.

Figure 2.4: A metasearch engine. A metasearch engine enables the user to search many sites simultaneously, e.g. www.info.com

Even though metasearch engines use multiple sites for any one search, this does not necessarily mean that better information is returned or that the best sites have been discovered. It is still up to the user to determine how successful a search has been.
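The meta-searching idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any of the listed metasearch engines actually works: each "search engine" is a hypothetical stand-in function returning a canned, ranked list of URLs (a real metasearch engine would query the engines over the Web), and the lists are merged by simple round-robin interleaving with duplicates removed. Real metasearch engines may use quite different merging rules.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# A search engine is modelled as a function: search string -> ranked list of URLs.
SearchEngine = Callable[[str], List[str]]


# Hypothetical stand-in engines with canned results (not real search APIs).
def engine_a(query: str) -> List[str]:
    return ["http://example.com/a1", "http://example.com/shared", "http://example.com/a2"]


def engine_b(query: str) -> List[str]:
    return ["http://example.com/shared", "http://example.com/b1"]


def metasearch(query: str, engines: List[SearchEngine]) -> List[str]:
    """Send one search string to several engines at once and merge the results."""
    if not engines:
        return []
    # Query every engine concurrently, as a metasearch engine does.
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = [pool.submit(engine, query) for engine in engines]
        result_lists = [future.result() for future in futures]

    # Round-robin merge: every engine's 1st result, then every 2nd, and so on,
    # skipping URLs that an earlier engine has already contributed.
    merged: List[str] = []
    seen = set()
    longest = max(len(results) for results in result_lists)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged


print(metasearch("search engines", [engine_a, engine_b]))
# ['http://example.com/a1', 'http://example.com/shared', 'http://example.com/b1', 'http://example.com/a2']
```

The merge step only combines what the engines return; it does not judge the quality of the pages, which is why the user still has to assess how successful the search has been.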
Well-known metasearch engines are shown in Table 2.4.

Table 2.4: Metasearch Engines
Engine Name      URL
Metacrawler      www.metacrawler.com
Monstercrawler   www.monstercrawler.com
Dogpile          www.dogpile.com
Info.com         www.info.com

2.2 Search Engine Features

Search engines offer many different features. Every search engine seems to offer some feature or capability that distinguishes it from the others, and collectively there are many different features across them all.

a. Filters. Hotbot allows the user to preset filters. Filters, as the name suggests, are restrictions that limit the scope of the search. Typical filters include the language the pages are written in, the age of the page, etc. Once a filter has been set and saved, it is automatically applied to every search made with that search engine.

Figure 2.5: Filters. Filters available in Hotbot to limit the scope of the search to particular pages and page types.

b. Display options. Most search engines provide a facility so that users can customise how the results will be returned. Users can usually select what information will be shown and how it will be displayed.

Figure 2.6: Display Options. Hotbot also provides a facility for the user to customise how results appear when they are returned.

c. Information organisation. Some search engines provide visual and semantic displays to make the information they return easier to investigate. Mooter provides a semantic net which arranges the information into topics to facilitate its access.

Figure 2.7: Information Organisation. Mooter.com returns information organised into themes.

2.3 Choosing a Search Engine

One's choice of search engine tends to depend on one's familiarity with a particular engine and the success one has had with it in the past. In 2005, general information suggested that the most commonly used search engine was Google, which had the largest number of indexed Web pages and the largest number of users worldwide.

Figure 2.8: Excite. Excite is a typical search engine. It offers far more than simple keyword searching and has many other features to attract users. http://www.excite.com

Different people will have different needs when it comes to selecting a search engine. Features that may influence a person's choice include:
• Search engine type, e.g. directory, index, metasearch engine;
• Number of pages indexed;
• Extra features, e.g. filters, customisations;
• Amount of advertising;
• Quality of relevance ranking;
• Search options, e.g. advanced searching features;
• Types of resources indexed, e.g. pages, images;
• Source of content, e.g. availability of local content;
• Currency of resources, i.e. age of pages;
• Speed of feedback: how long it takes to access, search and retrieve the pages.

2.4 How Search Engines Work

The databases that make up search engines are typically created through a process of Web crawling. Search engines employ software applications called robots that crawl the Web seeking Web pages. The robots extract information from each Web page they encounter and enter this information into their databases. Once a page has been indexed in this fashion, it forms part of the set that can be accessed by users of the search engine.

To gain some sense of how this all works, it is useful to consider a particular search engine. Google's robot, called Googlebot, is able to crawl the entire Web in about 6 weeks. This crawl adds new pages, enables Google to update information on existing pages, and removes pages that no longer exist. The Google database contains information on nearly 5 billion Web resources, including over 4 billion Web pages.
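The crawl-and-index process described above can be illustrated with a toy robot. This is a minimal sketch under loud assumptions: the "Web" is a small hypothetical dictionary of made-up pages rather than pages fetched over HTTP, and the database is a simple inverted index (a table from each word to the pages that contain it). The inverted index is a standard data structure for keyword search, but this is not a description of Googlebot or of Google's actual systems.

```python
import re
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical miniature "Web": URL -> (page text, outgoing links).
# "gone.example" is linked to but does not exist, like a deleted page.
WEB: Dict[str, Tuple[str, List[str]]] = {
    "a.example": ("Search engines index the Web", ["b.example", "c.example", "gone.example"]),
    "b.example": ("Robots crawl the Web seeking pages", ["a.example"]),
    "c.example": ("Directories organise pages by topic", []),
}


def words_in(text: str) -> List[str]:
    """Split text into lower-case words."""
    return re.findall(r"[a-z]+", text.lower())


def crawl(seed: str) -> Dict[str, Set[str]]:
    """Visit every page reachable from `seed`, building an inverted index."""
    index: Dict[str, Set[str]] = {}
    visited: Set[str] = set()
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url in visited or url not in WEB:
            continue  # already indexed, or the page no longer exists
        visited.add(url)
        text, links = WEB[url]
        # Extract each word on the page and record it in the database.
        for word in words_in(text):
            index.setdefault(word, set()).add(url)
        queue.extend(links)  # follow links to discover new pages
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the indexed pages that contain every word of the query."""
    words = words_in(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = crawl("a.example")
print(sorted(search(index, "the web")))  # ['a.example', 'b.example']
```

Only pages the robot has reached and indexed can be returned by `search`, which mirrors the point above that a page becomes searchable once it has been indexed.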
When a user enters a search into the Google search engine, the request is processed in a number of stages, as shown in Figure 2.9. The query goes to the Google Web Server, where it is checked and sent to the Index Servers. The Index Servers do a lookup and send the query on to the Doc Servers, where the page information is stored. Information about the retrieved pages is then sent from the Doc Servers back to the user.

Figure 2.9: How Google locates pages during a Web search. http://www.google.com.au/corporate/tech.html

Page Ranking

When Google locates the various pages that match the search string, it needs to work out a ranking that determines the order in which the pages are shown. Ranking is very important to the success of a search engine: typically there will be many thousands of pages that match the search string completely, and the best search engines show the most relevant pages first. Different engines have different ways of ranking the pages they locate. Google uses a process called PageRank Technology to decide which of the located pages will be the most relevant. In the process, it considers not only the content of a page but also how many pages point to it, and in this way it calculates which pages are the most relevant. In Google's own words:

PageRank Technology: PageRank performs an objective measurement of the importance of web pages by solving an equation of more than 500 million variables and 2 billion terms. Instead of counting direct links, PageRank interprets a link from Page A to Page B as a vote for Page B by Page A. PageRank then assesses a page's importance by the number of votes it receives. PageRank also considers the importance of each page that casts a vote, as votes from some pages are considered to have greater value, thus giving the linked page greater value. Important pages receive a higher PageRank and appear at the top of the search results. Google's technology uses the collective intelligence of the web to determine a page's importance. There is no human involvement or manipulation of results, which is why users have come to trust Google as a source of objective information untainted by paid placement.

Hypertext-Matching Analysis: Google's search engine also analyzes page content. However, instead of simply scanning for page-based text (which can be manipulated by site publishers through meta-tags), Google's technology analyzes the full content of a page and factors in fonts, subdivisions and the precise location of each word. Google also analyzes the content of neighboring web pages to ensure the results returned are the most relevant to a user's query.

http://www.google.com.au/corporate/tech.html

There are many options when doing a simple search using Google. Apart from choosing the search string or keywords, the user can choose to search for images, to search only pages from Australia, and to select from a raft of other search options. Once the search command has been given, Google takes very little time to locate the matching pages and display them in its preferred order of relevance.

Figure 2.10: A Google Web search, showing the search string, the number of hits and time taken, Web news on the topic, and links sorted by relevance.

2.5 How to Reference Web Material

It is important to remember that any information you access from any source, including the Web, cannot be used in its original form in any work you create or submit unless you make it perfectly clear where you sourced it. If you cut and paste even a sentence from someone else's work into your own without acknowledging its source, you are deemed to have plagiarised.

A Web reference takes the following general form:

Author's surname, initial. (Year document was written). Title of document (in italics). Retrieved (date source was accessed) from (URL of the document).

Sometimes documents have no authors, but these must still be referenced.
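The general form above can be turned into a small formatting helper. This is a hedged sketch, not an official referencing tool: the function name and its parameters are hypothetical, the title is left as plain text because italics cannot be shown, and it assumes the convention used for documents without an author or date, where the title takes the author's position and "n.d." (no date) replaces the missing year.

```python
from datetime import date
from typing import Optional

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def format_web_reference(title: str, retrieved: date, url: str,
                         author: Optional[str] = None,
                         year: Optional[int] = None) -> str:
    """Build a reference-list entry for a Web document (hypothetical helper).

    Plain text cannot show italics, so italicise the title by hand
    when the entry is pasted into a document.
    """
    # "n.d." (no date) replaces a missing year of publication.
    year_part = f"({year})" if year is not None else "(n.d.)"
    # Spell out the access date, e.g. "February 7, 2004".
    accessed = f"{MONTHS[retrieved.month - 1]} {retrieved.day}, {retrieved.year}"
    if author:
        # Author, year, then title: "Jones, A. (2003). Title."
        lead = f"{author} {year_part}. {title}."
    else:
        # No author: the title moves into the author's position.
        lead = f"{title}. {year_part}."
    return f"{lead} Retrieved {accessed} from {url}"


print(format_web_reference(
    "How to use the Web in your assignments",
    date(2004, 2, 7),
    "http://www.ecu.edu.au/lds/docs/webreferences.htm",
    author="Jones, A.",
    year=2003,
))
# Jones, A. (2003). How to use the Web in your assignments. Retrieved February 7, 2004 from http://www.ecu.edu.au/lds/docs/webreferences.htm
```

Calling the same function without `author` and `year` produces the no-author form, beginning "How to use the Web in your assignments. (n.d.). Retrieved February 7, 2004 from ...".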
Referencing a document that has no apparent author, and even no apparent date of publication, is still quite easy. For example, if the document How to use the Web in your assignments had no such details, it would be referenced in text as (How to use the Web in your assignments, n.d.) and entered into the reference list as:

How to use the Web in your assignments. (n.d.). Retrieved February 7, 2004 from http://www.ecu.edu.au/lds/docs/webreferences.htm

Including Images in Own Work

Frequently, a person will want to use images and diagrams sourced from the Web in his or her assignment work. Once again, it is important to show the source of the object. Usually when such an object is used, it is sufficient to show the URL of the object in the document where it is used.

Figure 2.11: The formation structure of a hurricane. http://www.nhoem.state.nh.us/mitigation/section_iii.htm

Referencing Web Resources

If you want to include some work you have found on the Internet (or anywhere else) in an assignment or other piece of work, you can cut and paste it into your work, but you must clearly identify the quotation or piece of work and reference the Web page where you found it. For example, when you paraphrase someone else's ideas, the referencing follows the format below:

a. identify the source from which the ideas were obtained in the body of the work:
   … Jones (2003) describes plagiarism as a very serious offence for students …
b. cite the publication in the references section, using the standard format that must be used when referencing a Web source. A typical Web reference looks like this:
   Jones, A. (2003). How to use the Web in your assignments. Retrieved February 7, 2004 from http://www.ecu.edu.au/lds/docs/webreferences.htm

• AltaVista: AltaVista provides the most comprehensive search experience on the Web! http://www.altavista.com/
• Google: Enables users to search the Web, Usenet, and images. http://www.google.com/
• Dogpile Web Search Home Page: Parallel searcher that queries a customizable list of search engines and the Open Directory, then displays results from each search source. http://www.dogpile.com/
• WiseNut: Index of 1.5 billion pages. Search results are clustered into categories. http://www.wisenut.com/
• AlltheWeb.com: Search with a simple interface and huge database.
• MetaCrawler Web Search Home Page: Searches the leading engines in one click and returns only the best results from those search engines. http://www.metacrawler.com/
• Mamma Metasearch: Mamma.com collects only the top results from the best search engines on the Internet. http://www.mamma.com/
• Teoma - Search with Authority: Teoma delivers three types of search responses. Results: Relevant web pages. http://www.teoma.com/
• ProFusion: Select from a list of search engines or let ProFusion choose the fastest. http://www.profusion.com/
• Ixquick Metasearch: Ixquick submits your search to the major search engines and finds sites that are universally ranked in the top ten! http://www.ixquick.com/
• Beaucoup! 2,000+ Search Engines: A directory listing thousands of engines, directories and indices. http://www.beaucoup.com/
• WebCrawler Web Search Home Page: Returns the best results from leading search engines. http://www.webcrawler.com/
• Excite: Portal offering a search service including search of a directory from the ODP, news, and links. http://www.excite.com/
• All Search Engines.Com: Lists all major search engines and hundreds of other search engines by category. http://www.allsearchengines.com/
• Welcome to Lycos!: Portal with search powered by Fast, channels, and a directory. http://www.lycos.com/
• Search.com - Metasearch Search Engine: Search.com searches the best search engines. http://www.search.com/
• Search Engine Colossus: Directory of hundreds of search engines, organised by country and topic. http://www.searchenginecolossus.com/
• HotBot Web Search: A powerful conventional search engine. http://hotbot.lycos.com/
• Australia and New Zealand Web Enquiry Research System: Search engine powered by Yahoo and Google. http://www.anzwers.com.au/
• Web Wombat: Search engine featuring topic searches, free Web-based email and hourly weather. http://www.webwombat.com.au/
• Ask Jeeves - Ask.com: Find it faster with Smart Search. Introducing Map Search. http://www.askjeeves.com/
• AllSearchEngines.co.uk: Index of Internet search engines and Web metasearch with a choice of English UK or worldwide search engines. http://www.allsearchengines.co.uk/
• Internet Sleuth (Web Search, Yellow Pages, Find People): The Internet Sleuth... find things faster using several different search engines in one meta-search. http://www.isleuth.com/
• Search Engine Watch: Tips About Internet Search Engines & Search: Guide to search engine registration and ranking issues, providing current news and analysis. http://www.searchenginewatch.com/
• Find UK (UK search plus free SMS and Email): Find UK information and search all the UK search engines from one site. http://www.find-uk.com/
• Netscape.co.uk: Web portal including directory, search engine, news, web tools, and other resources. http://www.netscape.co.uk/
• Oneupweb: Parallel search tool using AltaVista, Thunderstone, Wisenut, Yahoo, Lycos, Looksmart and Fast/All the Web. http://www.1blink.com/
• The Amazing Picture Machine: Web sites designed for students ... Kid's Image Search Tools: http://www.kidsclick.org/psearch.html; Classroom Clipart: http://classroomclipart.com/. http://www.ncrtec.org/picture.htm
• Department of Information & Communications: Internet Search Tools: Collection of search tools from the Department of Information and Communications at the Manchester Metropolitan University. http://www.mmu.ac.uk/h-ss/dic/main/search.htm
• OneSeek.com: No per-click charges! Lock in your keywords at this special introductory rate now. http://www.oneseek.com/
• SearchEngines.com: Finding Credible Info. Search Engines 101. Optimal Design. Keywords: Titles, Meta tags and more. http://www.searchengines.com/

Test Your Knowledge of Topic 2

1. Describe how search engines work to index the pages on the WWW.
2. Describe the process a search engine undertakes when it completes a search.
3. What makes one search engine better than another?
4. What aspects need to be considered when comparing search engines against each other?
5. What is meant by the term page ranking? Describe how Google's PageRank works.
6. Describe the various types of search engine.
7. List some of the important features a search engine needs to display.
8. Is it permissible to copy material found on the Web into university assignments? Explain your answer.
9. Describe some of the options that search engines provide to enable searches to be made more specific than keywords alone.
10. Describe the correct procedure for referencing material sourced from the Web. Give an example to illustrate your answer.