Module 3: Internet Browsers, Search Engines, and the Web

Overview

In the previous module, we discussed communicating over the Internet with e-mail, chat, newsgroups, and weblogs. In this module, we continue our discussion of Internet services, particularly those that allow us to find and view information through the Internet. We will then discuss some important advice for critically evaluating those resources.

Report broken links or any other problems on this page. Copyright © by University of Maryland University College.

Objectives

After completing this module, you should be able to:

    describe what search engines are and how they work
    evaluate various search engines
    demonstrate proficiency in using the Internet for research and understanding how to evaluate those resources
    demonstrate the correct methodology for citing Internet resources using the APA style guide

Commentary

Topics

1. The World Wide Web
2. Finding Information on the Web
3. Evaluating Resources from the Web
4. Citing Internet Resources

The World Wide Web

We should first emphasize that although the Internet and the World Wide Web are related and work together, they are different entities. The Internet is the vast network of networks that connects the world's computers through various protocols. The Internet had always been a text-based environment. The World Wide Web is the graphical interface of the Internet.

Through the Internet we can:

    use e-mail, which relies on SMTP
    connect to newsgroups, which use NNTP or RSS
    transfer files, which use FTP

The World Wide Web is just one of the ways in which information can be disseminated over the Internet. It uses HTTP, one of the protocols of the Internet.
The hypertext transfer protocol (HTTP) describes how Web pages are transferred between computers on the World Wide Web. The World Wide Web, which we will refer to simply as the Web, uses Web browsers to allow us to access and view Web pages that are connected through hyperlinks. Web documents can contain text, graphics, sounds, and even video.

Many Web browser programs have been created through the years; Tim Berners-Lee, who developed the HTTP protocol, wrote the first Web browser on a NeXT computer in 1990 and named it WorldWideWeb. Most people, or at least those few who had Internet access in the early 1990s, likely didn't experience a Web browser until 1993, when Marc Andreessen and Eric Bina released Mosaic, the first browser with cross-platform support. Mosaic introduced support for sound, small video clips, forms, bookmarks, and history files and quickly became the most popular noncommercial Web browser.

Marc Andreessen left the Mosaic group to form his own company. That company was Netscape. The Netscape browser, introduced in December of 1994, became the most commonly used browser worldwide.

Very soon after Mosaic and Netscape, other browsers were introduced: Arena, Lynx, Cello, Opera, Internet in a Box, Navipress (which became AOLPress), Mozilla (originally a Netscape creation), and then Internet Explorer (with the introduction of Microsoft Windows 95) (Stewart, 2004). Many browser applications were created throughout the 1990s to assist with proprietary operating systems and applications.
Below is a list of some of these Web browsers:

Historical Web Browsers (Stewart, 2004)

    Active Worlds             NetAttache
    Air_Mosaic                NETCOMplete
    Amiga                     NetCruiser
    EI*Net                    NetManage Chameleon
    EmailSiphon               NetPositive
    Enhanced NCSA Mosaic      PlanetWeb
    GetRight                  Quarterdeck WebC
    HotJava                   SPRY_Mosaic
    IBM WebExplorer           Spyglass Enhanced Mosaic
    internetMCI               TueV Mosaic for X
    IWENG                     WWWC
    MacWeb

(To experience seven of the older browser interfaces, go to the Déjà Vu Web site and try the browser emulator.)

In 2002, the Netscape Corporation released an open source browser called Mozilla. The newest version of the Mozilla browser was named Firefox in 2004.

Today, Internet Explorer and Netscape are considered the most popular browsers, but Mozilla Firefox is beginning to increase its market presence. According to 2005 monthly browser usage statistics from W3Schools, Firefox has moved into second place behind Internet Explorer.

Figure 3.1 Average Browser Usage (January–October 2005) (Statistics from W3Schools, 2005)

Internet Explorer 6 is still the dominant browser, which is understandable because Windows is currently the most popular operating system and IE comes bundled with it. In addition to using IE 6, according to W3Schools browser statistics, most users set their displays to 800 x 600 pixels or more with a color depth of at least 65K colors. This is important to note when building Web pages because your design and interactive features may appear differently when a user has different screen settings or is using a different browser from the one you used to design your pages. We will discuss this more in Module 4: Creating Web Pages.

Finding Information on the Web

Many tools are available to search the Internet; some are old technologies and others are new. Finding the information you are looking for can, at various times, either be very easy or seem nearly impossible. One reason for this is that the Web is so vast.
Another reason is that, unlike the Library of Congress or your local library, the Web is not indexed according to any standard methodology. In addition, when you search the Web, you are usually guessing what words will be in the document or file. Keep in mind that currently there is no way for anyone to search the entire Web. However, there are tools and technologies available to help you to search large portions of it.

Two of the oldest search tools are Gopher and WAIS. Gopher was developed at the University of Minnesota and named after the school's mascot; it predates the World Wide Web. With Gopher, you found files by looking through a sequence of menus until you found something that appealed to you. Gopher allowed you to find and access a file on the Internet no matter where it resided. WAIS (Wide Area Information Servers), invented by Brewster Kahle, did essentially the same thing, but it did the searching for you.

Today, the Web has essentially replaced Gopher and WAIS as a means of searching the Internet, using tools such as search engines, subject directories, meta-search engines, and specialized search tools. UMUC's Library Services link contains a discussion of the different types of search tools and basic Internet-searching techniques. Because UMUC makes this information available to you, please use the link to learn more about Internet search tools.

We will discuss three tools used today to search the Web: search engines, Web directories, and the invisible Web.

Search Engines and Web Directories

On the Web, there are hundreds of millions of pages of information available in a wide variety of formats and about many thousands of topics. However, finding the information you are after and, more important, determining if what you find is valid and accurate can be a challenge. To help you find information on the Web, you can use search engines.
Initially, we had only two types of search engines: crawler-based search engines such as Google and human-indexed directories such as Open Directory. Each crawler-based search engine may provide different results depending on the sites it has indexed, but they all follow the same basic process:

    They search the Web by using words as the search criteria.
    They store an index of the words they find and then match the words with the location (URL) where the words were found.
    They allow you to look for words or word combinations stored in the index.

In other words, to find information on Web sites, these crawler-based search engines send out spiders (also called crawlers or bots). Essentially, a spider goes to the Web site, gets a list of words on the site, brings this list (along with the site's URL) back to the search engine, and the search engine then indexes the words in a database.

These standard search engines find results by calculating the mathematical relevancy of the term, by the frequency and placement of the term in documents, and by the occurrence of the term in the descriptions of the Web pages. Standard search engines may therefore be difficult to use for broad topics. All of this implies that when you use a search engine, you are not searching the Web in real time; you are actually searching the search engine's database.

Human-indexed directories, also known as Web-based directories, do not use spiders to find their Web site resources. A human-indexed directory depends on the webmaster to submit his or her site with a brief description. The directory editor reviews and edits that description and decides its relevance in the directory database. We then search the directory by keyword or word combinations to find our results. Many people argue that these human-indexed directories may provide fewer unscrupulous results because the sites are actually reviewed before being included in the directory.
However, changes to the site after its listing would not be updated in the directory description unless a new submission was made.

Web-based directories use categories and subcategories that humans create. The humans have reviewed the material and assigned subject headings. Web-based directories are good for broad searches and for comparing and contrasting Web pages on similar subjects.

A special form of human-indexed directory is called a Rated Subject Guide. Rated Subject Guides are reviewed and rated on content, and the pages are checked for authenticity. One of these is Teoma, which ranks sites not only by general popularity but also by the number of same-subject pages that reference them, to determine a site's level of authority.

Many search engines today use a combination of the crawler-based and human-indexed formats. For example, according to SearchEngineWatch.com, MSN's search feature is "more likely to present human-powered listings from LookSmart. However, it does also present crawler-based results (as provided by Inktomi), especially for more obscure queries" (Sullivan, 2002).

We all have a favorite search engine or directory we frequently use when searching online. As of November 2005, Google.com dominates the search engine market. Below is a chart from Search Engine Watch showing the most-used search engines as of November 2005.

Figure 3.2 Most-Used Internet Search Engines as of November 2005 (Sullivan, 2005)

We may have a favorite search engine that we use for all of our searches, but it may not be providing as complete a set of resources as we believe we are getting. Each search engine may find different results for the same search topic. As we have mentioned before, there is no way to search the entire Web at one time. Some search engines search only the meta-tags posted at the top of the Web site's HTML coding, others search the text contents, and still others search only the page titles.
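The crawler-based process described in this section (gather the words on each page, record the URL where each word was found, then answer queries from the stored index) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three pages and their example.org URLs are made up, and `build_index` and `lookup` are hypothetical helper names, not part of any real search engine.

```python
import re
from collections import defaultdict

# Hypothetical pages a spider might have fetched (URL -> page text).
PAGES = {
    "http://example.org/a": "Women in computing: a short history",
    "http://example.org/b": "Buying computers on a budget",
    "http://example.org/c": "Women and computers in the workplace",
}

def build_index(pages):
    """Map each lowercase word to the set of URLs where it appears."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def lookup(index, word):
    """Answer a one-word query from the stored index, not from the live pages."""
    return sorted(index.get(word.lower(), set()))

INDEX = build_index(PAGES)
print(lookup(INDEX, "computers"))
```

Because `lookup` reads only the index, a page that changes after it was crawled is not reflected until the spider visits it again. That is the sense in which a search queries the engine's database rather than the Web itself.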
Although many argue that Web directories are more relevant, they list only sites that their human catalogers choose to include. To get more results, we may want to use a meta-search engine.

Meta-Search Engines

Metacrawlers or meta-search engines allow searches of multiple engines simultaneously. Duplicate "hits" are then consolidated before the search results are displayed. Meta-search engines do not maintain their own databases, but instead access other search engines and directories already available. Some common meta-search engines are WebMetaSearch, Dogpile, Metacrawler, Search.com, and Mamma.com.

It should be noted, however, that meta-search engines do not bring back all of the pages from each of the individual search engines they search; only the top 10 to 100 hits from each may be listed. Although this is sufficient for most searches, you must consult individual search engines if you want to go beyond these top hits.

So, should we use a meta-search engine instead of an individual search engine? This really depends on what you are seeking. For a specific, obscure search term, start with a meta-search engine because it will search many sites at the same time. On the other hand, if you are reasonably confident that your favorite search engine will return the results you are looking for, starting with that search engine may be faster. Meta-search engines use the least common denominator, and this makes them inferior for complex searches.

For current information on search engines and related directories, go to Search Engine Watch. It provides useful information about search engines and how they work.

Boolean Logic and the Boolean Operators

Most search engines use some form of Boolean logic. Boolean logic uses the three Boolean operators AND, OR, and NOT.

AND narrows the number of items found. For example, searching for the key words women AND computers will find all articles that mention both women and computers.
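One way to picture the three operators AND, OR, and NOT is as set operations on the lists of articles that contain each term. The following Python sketch assumes a tiny, made-up collection of article IDs; the variable names and data are hypothetical and do not reflect how any particular search engine stores its results.

```python
# Hypothetical article IDs that mention each term.
women = {1, 2, 3}
computers = {2, 3, 4, 5}

both = women & computers        # women AND computers: intersection narrows
either = women | computers      # women OR computers: union broadens
women_only = women - computers  # women NOT computers: difference narrows

print(sorted(both))        # [2, 3]
print(sorted(either))      # [1, 2, 3, 4, 5]
print(sorted(women_only))  # [1]
```

The AND and NOT results are never larger than the set for women alone, while the OR result is never smaller, which is why the first two are said to narrow a search and the last to broaden it.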
The operator AND is the default for search engines when multiple terms are entered.

The Boolean operator OR broadens a search. Searching for women OR computers will find all articles that mention women, all articles that mention computers, and all articles that mention both women and computers.

The Boolean operator NOT narrows searches. The search phrase women NOT computers will find all articles that mention women but do not also mention computers. The search phrase computers NOT women will find all articles that mention computers but do not also mention women.

To see an excellent explanation of the Boolean operators and how search engines work, go to Search Engines by Debbie Flanagan of Learn the Web Skills.

Advanced Search Features

Whether you use a search engine or a Web-based directory, advanced search options are available to you. Going to the advanced options page allows you to set parameters for your search. Examples of parameters you can set include:

    limit by date
    limit by media type
    limit by language
    limit by domain (.gov or .com or .net)
    limit by pages (must include types of pages, or media on pages)

Some search engines have tabs on the main page that allow you to easily look for particular media types. For example, in AltaVista, you can choose the Images tab to find only images, the MP3/Audio tab to see only links to sound files, or the Video tab to see links to movies and videos.

Using the Internet for Research

We cannot deny that the Internet is extremely useful when doing research. Internet usage around the world has increased dramatically, not only in numbers but also in frequency of use. According to the table below, 70 million people in the United States log on to the Internet every day! This is nearly 20 million more than just a few years ago in 2000.
Growth of Daily Activities on the Internet (United States) (Pew, 2005)

    Activity                                        2000            2004
                                                    (in millions)   (in millions)
    Go online                                       52              70
    Use e-mail                                      45              58
    Get news                                        19              35
    Check the weather                               14              25
    Do research for a job                           14              24
    Research a product before buying it             12              19
    Look for political news or information           9              24
    Send an instant message                         10              15
    Do research for school or training               9              14
    Get travel information                           6              10
    Get health or medical information                6               7
    Look for religious and spiritual information     3               6
    Buy a product                                    3               4
    Participate in an online auction                 3               4

The Invisible Web

Finding the academic resources you need may not be as easy as doing a search on your favorite search engine or Web directory. This is where the "Invisible Web" comes in. The "Invisible Web" is made up of thousands of searchable databases that are not readily accessible from search engines or subject directories.

Many of these specialized searchable databases are accessible through specific Web page search boxes. Databases are considered specialized subject guides, and some may require a fee or membership before you can search. For example, Librarians Index, AcademicInfo, and Infomine are particularly relevant databases for academic research, but their specific results will not appear if the same search term is used on a regular search engine. Other important databases include MdUSA, the University System of Maryland's database of peer-reviewed resources, and The EServer, now based at Iowa State University.

The Lexis/Nexis database is the premier business and legal database and is an invaluable resource for business and law students. Although it is quite expensive to use this database, UMUC students can gain access to it via the UMUC library database's link to Lexis/Nexis Academic.

Some search engines may find these specialized resources if we remember to add the term database to our search terms.
If the resource database uses the word database in its own pages, you are likely to find it in Google. The word database is also useful in searching a topic in many directories because they frequently use it to describe searchable databases in their listings. Some examples of databases that can be found by adding the word database to our search terms include "plane crash database," "languages database," and "toxic chemicals database."

A directory of these "Invisible Web" resources can also be found at these sites:

    Invisible Web
    Internet Directories
    Internet Bibliographies

Vortals

A unique type of search engine is a vortal. The term vortal comes from the words vertical portal. A vortal searches for information from a designated slice of the Web. There are specialty vortals for just about any type of industry, field, or topic. Examples include:

    Bpubs.com: a vortal dedicated exclusively to indexing free business-related articles and publications
    DealTime: a search engine used for comparing products, prices, and stores
    ECalibration.com: a search engine that allows searches for calibration professionals
    Medical Matrix: a ranked and rated peer-reviewed medical search engine

Evaluating Resources from the Web

It has been said that the best feature of the Web is its ability to quickly find thousands of resources. It has also been said that the worst thing about the Web is its ability to quickly find thousands of resources. How could this be both its best and its worst characteristic? The answer is in the questionable credibility and relevance of all the information available. Anyone can claim to be an expert and post facts and statistics that are completely inaccurate. How do we know if the information we find is credible?
We review the material by asking the following questions, based on the guidelines provided by UMUC's Evaluate Internet Resources page:

Authority

    Is it clear who is sponsoring the creation and maintenance of the page?
    Is information available describing the purpose of the sponsoring organization?
    Is there a way of verifying the legitimacy of the page's sponsor? For instance, is a phone number or address available to contact for more information?
    Is it clear who developed and wrote the material?
    Are his or her qualifications for writing on this topic clearly stated?
    Is there contact information for the author of the material?

Accuracy

    Are the sources for factual information given so they can be verified?
    Is it clear who is responsible for the accuracy of the information presented?
    If statistical data are presented in graphs or charts, are they labeled clearly?
    Are there errors you can substantiate in the data presented?

Objectivity

    Are the page and the information it includes provided as a public service?
    Is it free of advertising?
    If there is advertising on the page, is it clearly separated from the informational content?
    Are there any other signs of bias?

Currency

    Are there dates on the page to indicate the following:
        when the page was written?
        when the page was first placed online?
        when the page was last revised or edited?
    Are there any other indications that the material is updated frequently to ensure currency of the data?
    If the information is published in print in different editions, is it clear what edition the page is from?
    Are the links on the page up-to-date?

Coverage

    Is there an indication that the page has been completed and is not still under construction?
    If there is a print equivalent to the Web page, is there clear indication of whether the entire work or only a portion of it is available on the Web?
    If the material is from a work that is out of copyright (as is often the case with a dictionary or thesaurus), has there been an effort to update the material to make it more current?
    Is there any other evidence of omissions?
    Does it cover the subject adequately?

Citing Internet Resources

After we have found the resources we were looking for and have deemed them to be relevant and credible, we must know how to document them in our writing. Just as you would cite printed resources when you write, you must cite your Internet resources. Your citation format will depend on your resource and whether you are citing in the American Psychological Association (APA) or Modern Language Association (MLA) format. All students taking UMUC computer classes are expected to use the APA format.

If you are citing an article from an online database, your citation will be very similar to the print version of the article with the addition of the electronic information, including the date you accessed the resource online and the name of the database you used to find the article.

If you are citing a Web page, you must try to determine the name of the author, the title of the page, the publication date or revision date, the exact URL, and any sponsoring organizations or publishers.

UMUC's Office of Information and Library Services provides references on citing specific Internet resources in both APA and MLA style formats. Click on the following links for specific guidelines to cite your resource:

    Internet citations in APA style
    Internet citations in MLA style

Realize that any use of material from a Web page must be acknowledged. All text (in whole or paraphrased), images, or multimedia that you find on the Internet and use on a Web page of your own creation must be cited. As a student, you have a lot of leeway.
Basically, you may use any text, image, or other multimedia as long as you give the proper citation information, unless there are specific instructions that you may not use the material. An example of not being able to use images can be found on the Universal Studios site. Under no circumstances is anyone allowed to use the images on this site without specific permission from Universal.

If you create everything on your site yourself, you should place a copyright statement on the page. An example of a copyright statement may be seen on the SureThing site.

References

Pew Internet and American Life Project. (2005). Internet: The mainstreaming of online life. Retrieved December 16, 2005, from http://www.pewinternet.org/PPF/r/148/report_display.asp

Stewart, W. (2004, Nov.). Web browser history. The living Internet. Retrieved November 17, 2005, from http://www.livinginternet.com/w/wi_browse.htm

Sullivan, D. (2002, Oct. 14). How search engines work. Retrieved November 15, 2005, from http://searchenginewatch.com/webmasters/article.php/2168031

Sullivan, D. (2005, Aug. 23). Nielsen NetRatings search engine ratings. Retrieved November 18, 2005, from http://searchenginewatch.com/reports/article.php/2156451

W3Schools. (2005). Browser statistics. Browser information. Retrieved November 17, 2005, from http://www.w3schools.com/browsers/browsers_stats.asp

These are the popups for this section, in order of their appearance.

Popup 1: SMTP (Simple Mail Transfer Protocol). A protocol for sending e-mail messages between servers. Most e-mail systems that send mail over the Internet use SMTP to send messages from one server to another; the messages can then be retrieved with an e-mail client using either POP or IMAP.

Popup 2: Network News Transfer Protocol (NNTP). An Internet protocol used to post and retrieve messages on newsgroups.
NNTP servers manage the worldwide network of newsgroups and include the server at your ISP. An NNTP client is included as part of your browser or e-mail client. You can also use a separate program called a newsreader. Newsgroups allow you to find the answer to virtually any question you may have. NNTP is the main protocol used by computers for managing the notes posted on newsgroups.

Popup 3: RSS. RDF Site Summary (formerly called Rich Site Summary or Really Simple Syndication) is a method of describing news or other Web content that is available for "feeding" (distribution or syndication) from an online publisher to Web users. News is only one form of content that can be distributed with an RSS feed. Other possibilities include discussion forum excerpts, software announcements, and any form of content retrievable with a URL.

Popup 4: File Transfer Protocol (FTP). The simplest, but not the only, way to exchange files between computers on the Internet. Its main purpose is to upload programs and other files from your computer to other computers and to download them from other computers to yours.

Popup 5: HyperText Transfer Protocol (HTTP). The set of rules for exchanging text, graphic images, sound, video, and other multimedia files between a Web browser and Web server. The main purpose of HTTP is to transfer displayable Web pages and related files to your Web browser and other devices. A key concept of HTTP is that a file can contain references (or links) to other files. When you click on the links, the file referenced is sent to your browser.

Popup 6: Berners-Lee, Tim. Invented the World Wide Web in 1989. A graduate of Oxford University in England, Berners-Lee currently holds the 3Com Founders chair at the Laboratory for Computer Science at MIT. He also directs the World Wide Web Consortium. In 1990, he wrote the first Web browser.
Popup 7: Andreessen, Marc. Cofounder of Netscape Communications, Inc., and its vice president of technology. At the University of Illinois, Andreessen created the National Center for Supercomputing Applications' (NCSA) Mosaic graphical browser for the World Wide Web. After leaving NCSA in 1994, he worked for Enterprise Integration Technologies/Terisa Systems before joining with James Clark to form Netscape Communications. Andreessen led the company in creating Netscape Navigator, a widely used Internet browser. Netscape Communications has also created a complementary pair of World Wide Web servers for the Unix platform, one of which uses Netscape's Secure Sockets Layer (SSL) technology to offer fully secure two-way communications via the Web.

Popup 8: Mosaic. An Internet information browser and World Wide Web client developed in 1993. NCSA Mosaic was developed at the National Center for Supercomputing Applications at the University of Illinois in Urbana-Champaign.

Popup 9: Cross-platform. A term that refers to applications, formats, or devices that work on different computer hardware and software systems.

Popup 10: http://www.dejavu.org/

Popup 11: Open source. Any program whose source code is made available for use or modification as users or other developers see fit.

Popup 12: http://www.w3schools.com/

Popup 13: Gopher. An Internet protocol that allows an Internet user to receive text files or lists from servers all over the world; these lists contain the location of files on the Internet.

Popup 14: WAIS. Wide Area Information Server (WAIS) lets you search through Internet archives and look for articles containing groups of words.

Popup 15: http://www.umuc.edu/library/guides/internet.html

Popup 16: Internet search engines. Unique sites on the Web specifically designed to help you find information located on other sites.
Popup 17: http://www.google.com/

Popup 18: http://dmoz.org/

Popup 19: Spiders. Small programs that go from one Web site to another, gathering information about the site. This process is called Web crawling.

Popup 20: http://teoma.com/

Popup 21: http://search.looksmart.com/

Popup 22: http://www.inktomi.com/

Popup 23: http://searchenginewatch.com/webmasters/article.php/2168031

Popup 24: Meta-tags. HTML tags used to describe the contents of a Web page.

Popup 25: http://searchenginewatch.com/links/article.php/2156241

Popup 26: http://www.webmetasearch.com/

Popup 27: http://www.dogpile.com/

Popup 28: http://www.metacrawler.com/

Popup 29: http://www.search.com/

Popup 30: http://www.mamma.com/

Popup 31: http://www.learnwebskills.com/search/engines.html

Popup 32: http://lii.org/

Popup 33: http://www.academicinfo.net/

Popup 34: http://infomine.ucr.edu/search.phtml

Popup 35: http://www.umuc.edu/library/database/

Popup 36: http://eserver.org/

Popup 37: http://www.lexis.com/

Popup 38: http://www.ire.org/inthenews_archive/aviation.html

Popup 39: http://www.ethnologue.com/

Popup 40: http://www.epa.gov/tri/

Popup 41: http://www.invisible-web.net/

Popup 42: http://www.mlb.ilstu.edu/ressubj/subject/intrnt/research.htm#Using Internet Directories

Popup 43: http://www.mlb.ilstu.edu/ressubj/subject/intrnt/research.htm#Using Internet Bibliographies

Popup 44: http://bpubs.com/

Popup 45: http://dealtime.com/

Popup 46: http://www.ecalibration.com/ecalibration/index.htm

Popup 47: http://medmatrix.org/reg/login.asp

Popup 48: http://www.umuc.edu/library/guides/evaluate.html

Popup 49: http://www.umuc.edu/library/citationguides.html#apa

Popup 50: http://www.umuc.edu/library/citationguides.html#mla

Popup 51: http://www.universalstudios.com/

Popup 52: http://www.surething.com/ST/Page.asp?PageCode=TERMS

Due dates for the following assignments and directions for submitting assignments are listed in the course syllabus.

Tasks

1. Using a search tool of your choice, perform a search for a site that provides information about local politics and one that has information about national politics. List the URLs and provide a brief description of each.
2. Use two different search engines to find information pertaining to software development in India. List the top five sites found in each of the search engines. What appears to be the difference in the two resulting lists?
3. Use a search engine of your choice to find a searchable database of your choice. What database did you find? Provide the URL and a short explanation of what materials the database provides.
4. Using a search tool of your choice, perform a search and identify three Web sites that provide "free stuff." List the sites and briefly describe what they provide.
5. Provide an example of an Internet resource citation in APA format.
6. Find the Firefox browser Web site and download the browser. Find a Web page that reviews the Firefox browser. What features does the review say Firefox has that Internet Explorer does not have? Install and open Firefox and test some of the features mentioned in the review you read. Write a short paper that describes your experience in using the Firefox browser and your opinion of it.

Review Questions

1. Please explain the following statement: "Search engines do not search the Web, but rather search indexes."
2. What is a meta-search engine?
3. What are Web directories and how do they compare with search engines?
4. What are the characteristics of AltaVista that differentiate it from other search engines?