Searching The World Wide Web

A Web search engine is an application that enables users to locate Web pages based on search parameters that the user provides. Search engines are complex entities built on large databases that store information about Web pages.

An American phone survey of Internet users in early 2004 found that 84% of online Americans had used search engines at some stage. The survey found that on any given day, more than half of those using the Internet were using search engines, and most Internet users said they used search engines several times a week. Search engine use is second only to email as an online activity. Web searching is similarly common among Australian users. One reason for this popularity is that the engines are powerful and very successful at locating the information people are looking for on the Web.

Figure 1: A Web search engine showing the results from a keyword search.

1 Types of Search Engine

There are so many search engines on the Web that it is difficult to know which one to use. One way to choose is to consider the form of information being sought. There are over 500 conventional search engines that can be used on the Web. Each has a particular characteristic or form that distinguishes it from the others: some have been designed for very narrow searching, while others can be used for any kind of searching. There are two main forms of search engine: directory search engines and topic search engines.

a. Directory Search Engines. These search engines contain information organised in directory form, much as one might find in a large encyclopaedia. Information is located through the use of words and terms, which locate the information precisely. Some examples of directory search engines are shown in Table 1.

Table 1: Directory Search Engines
Engine Name        URL
Hotbot             www.hotbot.com
Lycos              www.lycos.com
Excite             www.excite.com
Google             www.google.com
AltaVista          www.altavista.com
All the Web        www.alltheweb.com
Ask Jeeves         www.askjeeves.com

Figure 2: Ask Jeeves. A directory search engine showing the results from a keyword search (http://www.askjeeves.com)

b. Topic Search Engines. The other form of search engine is organised around topics. Information is located by a process of refinement, moving from one topic to the next until the required information is found. Some examples of topic search engines are shown in Table 2.

Table 2: Topic Search Engines
Engine Name        URL
Yahoo              www.yahoo.com
Open Directory     dmoz.org
About              www.about.com
Galaxy             www.einet.net
Lii                www.lii.org
LookSmart          www.looksmart.com

Figure 2: Topic Search Engine. A topic search engine showing the topics through which the search is conducted (http://www.lii.org)

c. Children's Search Engines. A number of search engines have specially designed features and capabilities that enable young Web users to use them successfully. Even adults can find these engines useful because of the way they assist the user. Some of these are shown in Table 3.

Table 3: Children's Search Engines
Engine Name           URL
Kids Click!           www.kidsclick.org
Yahooligans           www.yahooligans.com
Ask Jeeves for Kids   www.ajkids.com

Figure 3: Ask Jeeves for Kids. A children's search engine is designed for the younger Web user, e.g. www.ajkids.com

d. Metasearch Engines. A number of search engines perform what is called metasearching: they enable a single search string to be used in a number of different search engines simultaneously. Even though metasearch engines use multiple sites for any one search, this does not necessarily mean that better information is returned or that the best sites have been discovered. It is still up to the user to determine how successful a search has been.

Figure 4: A metasearch engine. A metasearch engine enables the user to search many sites simultaneously, e.g. Info.com (www.info.com)
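The metasearch idea (one search string sent to several engines at the same time, with the results combined) can be sketched in a few lines. This is a minimal offline sketch, not how Dogpile, MetaCrawler or any other real engine works: `engine_a` and `engine_b` are hypothetical stand-ins that return fixed ranked lists, and the merging rule, reciprocal rank fusion with a constant `k = 60`, is one common technique chosen here as an assumption. Pages ranked well by several engines rise to the top, although that alone does not guarantee better results.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A search engine is modelled as a function: query string -> ranked URLs.
Engine = Callable[[str], List[str]]


def engine_a(query: str) -> List[str]:
    """Hypothetical engine returning a fixed ranked list (best match first)."""
    return ["www.example.org/a", "www.example.org/b", "www.example.org/c"]


def engine_b(query: str) -> List[str]:
    """A second hypothetical engine with a different, overlapping ranking."""
    return ["www.example.org/b", "www.example.org/d"]


def metasearch(query: str, engines: Dict[str, Engine], k: int = 60) -> List[str]:
    """Send one query to every engine at once and merge the ranked lists.

    Merging uses reciprocal rank fusion: a URL earns 1 / (k + rank) from
    each engine that returns it, so pages ranked well by several engines
    rise to the top of the combined list.
    """
    # Query all engines concurrently, mirroring a "simultaneous" search.
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines.values()))

    scores: Dict[str, float] = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / (k + rank)

    # Highest combined score first; ties broken alphabetically for stability.
    return sorted(scores, key=lambda url: (-scores[url], url))


if __name__ == "__main__":
    for url in metasearch("distance learning", {"A": engine_a, "B": engine_b}):
        print(url)
```

Here `www.example.org/b`, which both hypothetical engines return, comes first in the merged list. A real metasearch engine would also have to cope with engines that are slow, fail, or return differently formatted results.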
Well-known metasearch engines are shown in Table 4.

Table 4: Meta Search Engines
Engine Name        URL
Metacrawler        www.metacrawler.com
Monstercrawler     www.monstercrawler.com
Dogpile            www.dogpile.com

2 Search Engine Features

Search engines offer many different features. Every search engine seems to offer some feature or capability that distinguishes it from the others, and collectively there are many different features across them all.

a. Filters. Hotbot allows the user to preset filters. Filters, as the name suggests, are restrictions that limit the scope of the search. Typical filters include the language the pages are written in, the age of the page, and so on. Once a filter has been set and saved, it is automatically applied to every search made with that search engine.

Figure 5: Filters. Filters available in Hotbot to limit the scope of the search to particular pages and page types.

b. Display Options. Most search engines let users customise how the results are returned. Users can usually select what information will be shown and how it will be displayed.

Figure 6: Display Options. Hotbot also provides a facility for the user to customise how results appear when they are returned.

c. Information Organisation. Some search engines provide visual and semantic displays that make the information they return easier to investigate. Mooter, for example, provides a semantic net that arranges the information into topics to make it easier to access.

Figure 7: Information Organisation. Mooter.com returns information organised into themes.

3 Choosing a Search Engine

One's choice of search engine tends to depend on one's familiarity with a particular engine and on the success one has had with it in the past. In 2005, general information suggested that the most commonly used search engine was Google, which had the largest number of indexed Web pages and the largest number of users worldwide.

Figure 8: Excite. Excite is a typical search engine. It offers far more than simple keyword searching and has many other features to attract users. (http://www.excite.com)

Different people will have different needs when it comes to selecting a search engine. Features that may influence a person's choice include:
• Search engine type, e.g. directory, index, metasearch engine;
• Number of pages indexed;
• Extra features, e.g. filters, customisations;
• Amount of advertising;
• Quality of relevance ranking;
• Search options, e.g. advanced searching features;
• Types of resources indexed, e.g. pages, images;
• Source of content, e.g. availability of local content;
• Currency of resources, i.e. the age of pages;
• Speed of feedback, i.e. how long it takes to access, search and retrieve the pages.

4 How Search Engines Work

The databases behind search engines are typically created through a process of Web crawling. Search engines employ software applications called robots that crawl the Web looking for Web pages. The robots extract information from each page they encounter and enter it into the search engine's database. Once a page has been indexed in this way, it becomes part of the set that users can reach through the search engine.

To get a sense of how this works, it is useful to consider a particular search engine. Google's robot, called Googlebot, is able to crawl the entire Web in about six weeks. This crawl adds new pages, updates the information on existing pages and removes pages that no longer exist. The Google database contains information on nearly 5 billion Web resources, including over 4 billion Web pages.

When a user enters a search into the Google search engine, the request is processed in a number of stages, as shown in Figure 9.
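Before following a query through those stages, the crawl-and-index process described above can be sketched in a few lines. This is a minimal offline sketch, not Googlebot's actual design: the three-page "Web" in the `WEB` dictionary, its `.example` URLs, and the names `crawl` and `search` are all hypothetical. The robot follows links breadth-first from a seed page and builds an inverted index (word → set of URLs). `search` then plays the part of the index lookup, returning the pages that contain every query word.

```python
import re
from collections import deque
from typing import Dict, List, Set, Tuple

# A tiny made-up "Web": each URL maps to (page text, outgoing links).
# A real robot would fetch pages over HTTP; this stand-in keeps the sketch offline.
WEB: Dict[str, Tuple[str, List[str]]] = {
    "site1.example/home": ("Search engines index Web pages", ["site2.example/crawl"]),
    "site2.example/crawl": ("Robots crawl the Web seeking pages",
                            ["site1.example/home", "site3.example/rank"]),
    "site3.example/rank": ("Ranking decides which pages appear first", []),
}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def crawl(seed: str) -> Dict[str, Set[str]]:
    """Visit every page reachable from `seed` (breadth-first) and build an
    inverted index mapping each word to the set of URLs that contain it."""
    index: Dict[str, Set[str]] = {}
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        text, links = WEB.get(url, ("", []))  # unknown URLs are treated as empty pages
        for word in tokenize(text):
            index.setdefault(word, set()).add(url)
        for link in links:  # follow links to discover pages not yet seen
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Index lookup: return the URLs whose pages contain every query word."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))  # copy so the index is not modified
    for word in words[1:]:
        results &= index.get(word, set())
    return results


if __name__ == "__main__":
    index = crawl("site1.example/home")
    print(sorted(search(index, "web pages")))
```

Starting from `site1.example/home`, the robot reaches all three pages through their links. A search for `web pages` matches the two pages that contain both words.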
The query goes to the Google Web server, where it is checked and sent to the index servers. The index servers perform a lookup and pass the query on to the place where the page information is stored: the document servers (Doc Servers). Information about the matching pages is retrieved from the Doc Servers and sent back to the user.

Page Ranking

When Google locates the various pages that match the search string, it needs to work out the order in which it will show them. Ranking is very important to the success of a search engine: typically there will be many thousands of pages that match the search string completely, and the best search engines show the most relevant pages first. Different engines have different ways of ranking the pages they locate. Google uses a process called PageRank Technology to decide which of the located pages will be the most relevant. In the process, it considers not only the content of a page but also how many pages point to it, and in this way it calculates which pages are most relevant. In Google's own words:

PageRank Technology: PageRank performs an objective measurement of the importance of web pages by solving an equation of more than 500 million variables and 2 billion terms. Instead of counting direct links, PageRank interprets a link from Page A to Page B as a vote for Page B by Page A. PageRank then assesses a page's importance by the number of votes it receives. PageRank also considers the importance of each page that casts a vote, as votes from some pages are considered to have greater value, thus giving the linked page greater value. Important pages receive a higher PageRank and appear at the top of the search results. Google's technology uses the collective intelligence of the web to determine a page's importance. There is no human involvement or manipulation of results, which is why users have come to trust Google as a source of objective information untainted by paid placement.
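The voting idea in the quotation can be illustrated with the widely published iterative form of PageRank. This is a minimal sketch under stated assumptions, not Google's production system. The four-page link graph is made up. The damping factor 0.85 is a conventional choice, not something this document specifies. Pages with no outgoing links share their rank equally among all pages. In each iteration, every page splits its current rank evenly among the pages it links to, so a vote from a highly ranked page is worth more.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Iteratively compute PageRank scores for a small link graph.

    `links` maps each page to the pages it links to (every target must
    also be a key). A link from q to p is a "vote" for p worth
    rank(q) / (number of links on q), scaled by the damping factor:
        PR(p) = (1 - d) / N + d * sum(PR(q) / L(q) for q linking to p)
    """
    pages = list(links)
    n = len(pages)
    if n == 0:
        return {}
    rank = {page: 1.0 / n for page in pages}  # start with equal importance
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is shared by all pages.
        dangling = sum(rank[page] for page in pages if not links[page])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {page: base for page in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share  # cast a weighted vote
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(pagerank(graph).items(), key=lambda item: -item[1]):
        print(f"{page}: {score:.3f}")
```

In the example graph, C is linked from three pages and ranks highest. A ranks second even though only C links to it, because its single vote comes from the most important page. D, which nothing links to, ranks lowest.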
Hypertext-Matching Analysis: Google's search engine also analyzes page content. However, instead of simply scanning for page-based text (which can be manipulated by site publishers through meta-tags), Google's technology analyzes the full content of a page and factors in fonts, subdivisions and the precise location of each word. Google also analyzes the content of neighbouring web pages to ensure the results returned are the most relevant to a user's query.
(http://www.google.com.au/corporate/tech.html)

Figure 9: How Google locates pages during a Web search (http://www.google.com.au/corporate/tech.html)

Search Criteria – Metadata, Keywords

Web crawlers search through the Web looking for selected criteria, such as metadata, keywords, taglines, page content and page links. Metadata is information contained in the head section of a Web page that helps search technology identify the page.

You can view the HTML code for any page, including its metadata, from the browser: click View and then Source, and the HTML for that page is displayed. The metadata is in the head section.
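A crawler reads these head-section tags programmatically rather than through View Source. The sketch below uses Python's standard `html.parser` module. The `SAMPLE` markup is adapted from the astrology example in this section with a shortened keyword list, and `MetaTagParser` and `extract_meta` are hypothetical names. Because `HTMLParser` reports tag and attribute names in lower case, uppercase tags such as `<META NAME=...>` are matched too.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class MetaTagParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # HTMLParser reports tag and attribute names in lower case,
        # so <META NAME="..."> is matched as well as <meta name="...">.
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content


def extract_meta(html: str) -> Dict[str, str]:
    """Return a dict such as {"description": ..., "keywords": ...}."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return parser.meta


SAMPLE = """<head>
<META NAME="description" CONTENT="For astrology charts, horoscope reports and star signs.">
<META NAME="keywords" CONTENT="astrology, horoscope, zodiac, daily horoscopes">
</head>"""

if __name__ == "__main__":
    meta = extract_meta(SAMPLE)
    print(meta["description"])
    print([word.strip() for word in meta["keywords"].split(",")])
```

A search engine could store the description for display in its results and add the keywords to its index, although, as the Google quotation notes, engines do not rely on meta-tags alone.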
For example, the head section of the astrology.com.au home page contains the following metadata:

<head>
<META NAME="description" CONTENT="For astrology charts, horoscope reports and star signs.">
<META NAME="keywords" CONTENT="astrology, horoscope compatibility, daily horoscopes, predictions, horoscope, signs, zodiac, soul mate, birthday horoscope, compatibility, face reading, vedic astrology, new age, Australian news, dreams, horoscope, love signs, zodiac sign, compatibility, astrological, money, love, business, career, daily, monthly, weekly, yearly, horoscopes, aries, taurus, gemini, cancer, leo, virgo, libra, scorpio, sagittarius, capricorn, aquarius, pisces, dadhichi, the sun, moon, zodiac, compatibility charts">
</head>

Based on this and other information, the search engine will return the following result:

Astrology, Daily Horoscope & Zodiac Signs by Astrology.com.au
For astrology charts, horoscope reports and star signs.
www.astrology.com.au/

Figure 10: A Google Web search (labels: search string; Web; number of hits and time taken; news on the topic; links sorted by relevance)

• The search result shows the page name, the page description (taken from the metadata) and the URL. The search engine found this page using the metadata keywords and by searching the first few lines of the page; sometimes it may trawl the whole page. It is therefore vital, when designing Web pages, to use very specific keywords that identify the page and give it a higher rank in the search results.
• Ranking is often based on how recently a page was updated, the number of links to it, and so on. It is therefore important to keep a site dynamic (updated regularly) and to make sure its links to and from external sources are not broken.

Improving Search Strategy

There are many options when doing a simple search using Google. Apart from choosing the search string or keywords, the user can choose to search for images, to search only pages from Australia, or to use a range of other search options. Once the command to search has been given, Google takes very little time to locate the matching pages and display them in its preferred order of relevance.

Some strategies to improve your search returns:
• Determine the organisation likely to provide the information sought; for example, if the topic is education, check the education databases and online journals.
• Guess the organisation's URL before trying search engines (type www. into the address bar, followed by the most likely name and extension, such as .com or .edu).
• Otherwise, search for the organisation's URL in a search engine.
• Use "phrase searching" and unique words as much as possible.
• Use Boolean searching.
• Use truncation techniques.
• Use directories (Google, Yahoo!, Ask Jeeves) for broad, general topics.
• Use a multiple-step approach; it can take several clicks to reach an answer.
• Use advanced search facilities wherever possible; most search engines provide both simple and advanced search.
• Use specialised search engines for news, education, research and so on.

Boolean Searching

Search engines support different searching techniques, and most support Boolean searching. Boolean searching adds special words (operators) to the search string to make the search results more precise.

The Internet is a vast computer database, and as such its contents must be searched according to the rules of computer database searching. Much database searching is based on the principles of Boolean logic. Boolean logic refers to the logical relationships among search terms and is named after the British mathematician George Boole.

Boolean searching controls the search returns by defining relationships between search terms with three logical operators:
• OR
• AND
• NOT

Figure 12: A Boolean search demonstrating how AND works with 3 keywords

Suppose you were looking for information on poverty and crime and how it relates to gender.
If you typed poverty crime rates gender, for example, your results would include pages showing each keyword in isolation, so you would get many pages about poverty, pages about crime, pages about gender and so on. With luck, the top of the results might include pages containing two or more of your keywords, but how can you search more specifically?

Use of the Boolean – AND

Type into the search window: poverty AND crime

AND returns only documents that contain both keywords, not pages with only one of them, so it limits the search and provides a much more relevant list of pages (Table 5).

Figure 11: A Boolean search demonstrating how AND works with 2 keywords

Table 5: Search Return on Specific Keywords
Search terms                    Results
poverty                         783,447
crime                           2,962,165
poverty AND crime               1,677

By now adding gender to the Boolean search (poverty AND crime AND gender), we narrow the search even further, and the pages returned should be far more relevant to what we are looking for. We can continue this process by adding more search criteria, such as year or country, and each time the search returns will become more specific and relevant. Table 6 shows that the search has now narrowed our returns from 1,677 to 76.

Table 6: Search Return on Specific Keywords
Search terms                    Results
poverty                         783,447
crime                           2,962,165
poverty AND crime               1,677
poverty AND crime AND gender    76

Use of the Boolean – NOT

NOT is used to limit a search where logical relationships may exist but we are interested in only one part of that relationship. For example, typing dogs into the keyword search often returns many pages about both cats and dogs, as these are the most common domestic pets. To limit the search to dogs alone, use the NOT operator, so the keywords become dogs NOT cats.

Figure 13: A Boolean search demonstrating how NOT works with 2 keywords
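In set terms, AND keeps only the documents that contain both keywords (an intersection), NOT removes the documents containing the next keyword (a difference), and OR, covered next, keeps documents containing either keyword (a union). The sketch below shows this on a tiny hypothetical index. The document IDs are made up and are not the search counts in Tables 5 and 6. As a simplifying assumption, the query is evaluated strictly left to right with no brackets or operator precedence, and operators are accepted in any letter case. Real search engines' rules differ from one engine to another.

```python
from typing import Dict, Set

# Hypothetical inverted index: keyword -> IDs of the documents containing it.
INDEX: Dict[str, Set[int]] = {
    "poverty": {1, 2, 3, 4},
    "crime": {3, 4, 5, 6, 7},
    "gender": {4, 7},
    "dogs": {8, 9, 10},
    "cats": {9, 10, 11},
}


def docs(term: str) -> Set[int]:
    """Documents containing `term` (an empty set if the term is unknown)."""
    return INDEX.get(term.lower(), set())


def boolean_search(query: str) -> Set[int]:
    """Evaluate e.g. 'poverty AND crime AND gender' strictly left to right.

    AND keeps documents in both sets (intersection), OR keeps documents in
    either set (union) and NOT removes documents containing the next term
    (difference). Brackets and operator precedence are not supported.
    """
    tokens = query.split()
    if not tokens:
        return set()
    result = set(docs(tokens[0]))  # copy so the index itself is never changed
    for i in range(1, len(tokens) - 1, 2):
        operator, term_docs = tokens[i].upper(), docs(tokens[i + 1])
        if operator == "AND":
            result &= term_docs
        elif operator == "OR":
            result |= term_docs
        elif operator == "NOT":
            result -= term_docs
        else:
            raise ValueError(f"unknown operator: {tokens[i]}")
    return result


if __name__ == "__main__":
    for q in ["poverty AND crime", "poverty AND crime AND gender",
              "dogs NOT cats", "cats OR dogs"]:
        print(q, "->", sorted(boolean_search(q)))
```

Each extra AND term can only keep or shrink the set of matching documents, which is why the counts fall from Table 5 to Table 6.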
Use of the Boolean – OR

OR allows either one keyword or the other, which is what would normally happen anyway. However, it becomes very powerful when combined with other Boolean operators, e.g. cats OR dogs AND domestic, or cats OR dogs NOT wild.

Phrase Searching

Phrase searching is used to search for words as a phrase: the words must be side by side and in the order given. (In keyword searching, by contrast, the words do not have to be next to each other.) For example, compare:
Phrase search: "distance learning"
Keyword search: distance and learning
With quotes around the two words, the database searches for the exact phrase. Without the quotes, the database can search for the two words in any order and may put an AND or OR between them.

Truncation

Truncation expands your search and can save time. Truncation means using a "root word" or partial word (the beginning letters that several words have in common) together with a "wildcard" character. For example, instead of searching for (educator OR educators OR educational), type educat* and the database will retrieve any article with a word starting with the letters educat, such as educate, education, educator and educators.

5 How to Reference Web Material

It is important to remember that any information you access from any source, including the Web, cannot be used in its original form in any work you create or submit unless you make it perfectly clear where you sourced it. If you cut and paste even a sentence from someone else's work into your own without acknowledging its source, you are deemed to have plagiarised.

Referencing Web Resources

If you want to include some work you have found on the Internet (or anywhere else) in an assignment or other piece of work, you can cut and paste it into your work, but you must clearly identify the quotation or piece of work and reference the Web page where you found it. For example, when someone else's ideas are paraphrased, the referencing follows the format below:
a. Identify the source from which the ideas were obtained in the body of the work:
… Jones (2003) describes plagiarism as a very serious offence for students …
b. Cite the publication in the references section. A typical Web reference looks like this:
Jones, A. (2003). How to use the Web in your assignments. Retrieved February 7, 2004 from http://www.ecu.edu.au/lds/docs/webreferences.htm

Formatting a Web Reference

There is a standard format that must be used when referencing a Web source:
Name of author (year document was written). Title of document (in italics). Retrieved (date source was accessed) from (URL of the document).

Sometimes documents have no author, but they must still be referenced. Referencing a document with no apparent author, and even no apparent date of publication, is still quite easy. For example, if the document referenced above had no such details, it would be referenced in the text as (How to use the Web in your assignments, n.d.) and entered into the reference list as:
How to use the Web in your assignments. (n.d.). Retrieved February 7, 2004 from http://www.ecu.edu.au/lds/docs/webreferences.htm

Including Images in Own Work

Frequently, a person will want to use images and diagrams sourced from the Web in his or her assignment work. Once again, it is important to show the source of the object. Usually, when such an object is used, it is sufficient to show the URL of the object in the document where it is used.

Figure 11: The formation structure of a hurricane (http://www.nhoem.state.nh.us/mitigation/section_iii.htm)

A selection of search engines and search resources:

AltaVista: AltaVista provides the most comprehensive search experience on the Web!
http://www.altavista.com/

Google: Enables users to search the Web, Usenet, and images.
http://www.google.com/

Dogpile Web Search Home Page: Parallel searcher that queries a customizable list of search engines and the Open Directory, then displays results from each search source.
http://www.dogpile.com/

WiseNut: Index of 1.5 billion pages. Search results are clustered into categories.
http://www.wisenut.com/

AlltheWeb.com: Search with a simple interface and huge database.

MetaCrawler Web Search Home Page: Searches the leading engines in one click and returns only the best results from those search engines.
http://www.metacrawler.com/

Mamma Metasearch: Mamma.com collects only the top results from the best search engines on the Internet.
http://www.mamma.com/

Teoma – Search with Authority: Teoma delivers three types of search responses. Results: relevant Web pages.
http://www.teoma.com/

ProFusion: Select from a list of search engines or let ProFusion choose the fastest.
http://www.profusion.com/

Ixquick Metasearch: Ixquick submits your search to the major search engines and finds sites that are universally ranked in the top ten!
http://www.ixquick.com/

Beaucoup! 2,000+ Search Engines: A directory listing thousands of engines, directories and indices.
http://www.beaucoup.com/

WebCrawler Web Search Home Page: Returns the best results from leading search engines.
http://www.webcrawler.com/

Excite: Portal offering a search service including search of a directory from the ODP, news, and links.
http://www.excite.com/

All Search Engines.Com: Lists all major search engines and hundreds of other search engines by category.
http://www.allsearchengines.com/

Lycos: Portal with search powered by Fast, channels, and a directory.
http://www.lycos.com/

Search.com – Metasearch Search Engine: Search.com searches the best search engines.
http://www.search.com/

Search Engine Colossus: Directory of hundreds of search engines, organised by country and topic.
http://www.searchenginecolossus.com/

HotBot Web Search: A powerful conventional search engine.
http://hotbot.lycos.com/

Australia and New Zealand Web Enquiry Research System: Search engine powered by Yahoo and Google.
http://www.anzwers.com.au/

Web Wombat: Search engine featuring topic searches. Free Web-based email and hourly weather.
http://www.webwombat.com.au/

Ask Jeeves – Ask.com: Find it faster with Smart Search. Introducing Map Search.
http://www.askjeeves.com/

AllSearchEngines.co.uk: Index of Internet search engines and Web metasearch, with a choice of English UK or worldwide search engines.
http://www.allsearchengines.co.uk/

Internet Sleuth: The Internet Sleuth... find things faster using several different search engines in one meta-search.
http://www.isleuth.com/

Search Engine Watch: Tips About Internet Search Engines & Search. Guide to search engine registration and ranking issues, providing current news and analysis.
http://www.searchenginewatch.com/

Boolean searching – how it works
http://library.albany.edu/internet/boolean.html

Find UK: UK search plus free SMS and email. Find UK information and search all the UK search engines from one site.
http://www.find-uk.com/

Netscape.co.uk: Web portal including directory, search engine, news, Web tools, and other resources.
http://www.netscape.co.uk/

Oneupweb: Parallel search tool using AltaVista, Thunderstone, Wisenut, Yahoo, Lycos, Looksmart and Fast/All the Web.
http://www.1blink.com/

The Amazing Picture Machine: Web sites designed for students ... Kid's Image Search Tools: http://www.kidsclick.org/psearch.html. Classroom Clipart: http://classroomclipart.com/
http://www.ncrtec.org/picture.htm

Department of Information & Communications: Internet Search Tools. Collection of search tools from the Department of Information and Communications at the Manchester Metropolitan University.
http://www.mmu.ac.uk/h-ss/dic/main/search.htm

OneSeek.com: No per-click charges!
Lock in your keywords at this special introductory rate now.
http://www.oneseek.com/

SearchEngines.com: Finding Credible Info. Search Engines 101. Optimal Design. Keywords: Titles, Meta tags and more.
http://www.searchengines.com/

Test Your Knowledge

1. Describe how search engines work to index the pages on the WWW.
2. Describe the process a search engine undertakes when it completes a search.
3. What makes one search engine better than another?
4. What aspects need to be considered when comparing search engines against each other?
5. What is meant by the term page ranking? Describe how Google's PageRank works.
6. Describe the various types of search engine.
7. List some of the important features a search engine needs to display.
8. Is it permissible to copy material found on the Web into university assignments? Explain your answer.
9. Describe some of the options that search engines provide to make searches more specific than keywords alone.
10. Describe the correct procedure for referencing material sourced from the Web. Give an example to illustrate your answer.