CMP 101: Searching the Internet Learning Assignment

Before you start
This tutorial assumes you have completed the Introduction to the Internet assignment and that you know how to navigate the web and use browser commands.

Objectives
Students will be able to:
- Identify current web search tools and describe the benefits and drawbacks of each.
- Use a directory to narrow a topic; navigate within a directory using breadcrumbs and cross-references.
- Identify keywords and create a search query using basic and advanced keyword search techniques.
- Analyze search results and refine search queries.
- Locate and use specialized sites designed for searching deep web resources.

Introduction FAQ

What is the World Wide Web?
The World Wide Web refers to the billions of documents stored and accessed via the Internet and viewed with a special program called a web browser. Web documents contain hyperlinks that allow readers to jump from one document to another with a click of the mouse. Hyperlinks can also be gateways to audio and video broadcasts, animations, and other types of media.

How do you find information on the Web?
This depends a lot on the type of information you are looking for. For example, if you need reliable or primary sources, your search will be very different from a search for stats on your favorite football team. Today’s common surface web search tools include search sites, metasearch sites, and directories.

What is the difference between the surface web and the deep web?
The surface web (or visible web) refers to material that you can find using most search sites currently available on the web. The deep web refers to information sources that cannot be found using a typical search engine. They may be subscription-based resources or pages that are created dynamically based on certain criteria specified by the user (username and password, table of data, etc.).
Special sites and directory listings are used to access deep web content.

I already know how to use a search engine. Why do I need to know about anything else?
Good research involves the use of multiple resources in multiple formats. In this exercise you will learn more about performing multiple searches for electronic resources on the web. Some of what you learn here you will be able to apply to other formats (other electronic databases, print, audio, etc.).

NOTE: The instructions in this packet are based on the Internet Explorer installation on campus computers. If you are using a different computer, your screens may be different.

Revised: 2/8/2016
Page 1 of 19

Directories

Figure 1: Directory Example – Yahoo!

A directory is a categorical list of resources. In the case of web directories, the categories and the resources presented are selected by human reviewers and editors rather than by computer programs.

When to use:
- Narrow down a topic
- Collect synonyms or keywords for unfamiliar or broad topics
- Seek resources for specialized information

Benefits:
- Sites included in a directory are generally more popular and more reliable.
- Categorized listings make it easy to narrow down unfamiliar topics.
- Very good for generalized questions.

Drawbacks:
- Compared to most search engines, directories include a very small number of web sites.
- Not good for finding specific topics or information.
- Usually less searching capability.

Figure 2: Directory Example – Open Directory

Examples: Yahoo! (dir.yahoo.com), Open Directory (dmoz.org), Yahoo! Kids (kids.yahoo.com).

When you see this icon, write or type your answers on the attached answer sheet. If you see questions without this icon, your instructor may ask you to discuss your findings in class or online.

Page 2 of 19

Using Directories

Figure 3: Yahoo! Directory Results List

1. Log on to a computer, open a web browser, and go to the Yahoo! directory at dir.yahoo.com.
2. Click on the Science category link. The results list consists of a list of related sub-categories and a list of related sites.
3. Click on the Agriculture link. Each click gives a narrower topic. This is referred to as drilling down, focusing in on, or narrowing a topic.
4. Continue to narrow the topic by clicking on Biotechnology and then Genetic Engineering (see Figure 3). How many sites are listed in this category? Notice the breadcrumbs above the results list. You can use these links to jump back to a previous category.
5. In the breadcrumbs, click Agriculture to return to the Agriculture category.
6. Click Food Safety@. The “@” symbol indicates a cross-reference, which means the subcategory is listed in multiple places within the directory. Note the change to the breadcrumbs.

Figure 4: Browser History

7. In the breadcrumbs, click Directory to return to the main list of categories. Click and hold the Back button until the history list appears (see Figure 4). Click Biotechnology < Agriculture… to return to the Biotechnology category.
8. Open a new tab. Go to dmoz.org and use the same category links from steps 2, 3, and 4 to locate the Biotechnology subcategory. What are the differences in the results of the two searches? Close the dmoz.org tab when finished.

Page 3 of 19

Search Engines FAQ

What is a search engine?
A search engine is a collection of software designed to collect words from web pages, rank and index the words, and create a database that can be searched. This means that when using a search engine, you are actually searching the database created from the web.

Clarification: Many people use the term “search engine” when referring to a search site and/or the software associated with creating a search database.
In this handout the term search site will refer to the web site you visit to perform a search, and search engine will refer to the software used to create and query the database.

How does a search engine work?
There are four main parts to a search engine:
- Spiders or crawlers: Small programs that find new and modified web pages and collect information to send back to the search engine’s indexing programs.
- Indexing programs: Store and index pages in the search engine’s database based on words on the page as well as other meta-information, which might include URLs on the page, metatags or special terms embedded in the page’s programming code, and other much more sophisticated data.
- The search engine itself: Identifies and retrieves the information queried by the user. It also ranks pages based on many factors (popularity, advertisements, location of searched words, etc.), not on reliability.
- The interface: How the user interacts with the search engine.

How do you access a search engine?
There are currently four main large-scale search engines: Yahoo! (yahoo.com), Google (google.com), Bing (bing.com), and Ask (ask.com). Other search engines typically use one of these to perform specialized searches.

How do you use a search engine?
Begin by defining what you are looking for. Searches typically fall into one of three categories:
- Answer a specific question. Example: In what city will the 2014 Olympics be held?
- Personal research. Example: Research products before making a large purchase.
- Academic research. Example: Write a paper for science class.

Then determine keywords to use. Keywords are nouns, verbs, and sometimes adjectives that describe the information you are searching for.
- Choose words that most clearly define your question or topic.
- Leave out words like “a”, “the”, and “for” unless they are part of a phrase.
- Consider synonyms or related terms.

Finally, enter your search query.
The search query is the actual keywords and search commands submitted to the search site.
- Type words in the order you would expect them to appear.
- Use advanced search techniques like phrase searching, wildcard searching, and Boolean searching.
- Use focus options such as images, news, or videos.
- Try multiple searches, intermixing synonyms and modifying word order.

Page 4 of 19

Using Search Engines

Figure 5: Bing search results

When to use:
- Looking for specific information
- Quick overview of a topic
- Need advanced searching capabilities

Benefits:
- Fast answers (but are they complete/reliable?)
- Advanced / intuitive search capabilities
- Automatic synonym and Boolean searches

Drawbacks:
- May not provide access to complete information on a topic
- May be difficult to distinguish sponsored links and advertisements from search results
- Ranking can be misleading

Let’s first take a look at the behavior of search engines based on a simple query. We want to know in what city the 2014 Winter Olympics will be held.

1. Decide on keywords by choosing the most important words that identify the question. They include, in no specific order: city, Olympics, Winter, and 2014.
2. Use your web browser to go to bing.com. Enter keywords in the order you would expect them to appear in the results list.
3. Type 2014 Olympics in the search box and press Enter. Review the descriptions for the first two pages of the results list, and comment on what you find (see Figure 6).

Figure 6: Commenting on search results
When asked to comment on search results, consider the following:
- The number of sites found.
- Do all sites in the results list answer your question?
- Are answers contradictory or are they all the same?
- Do you recognize what might be reliable sources?

4. Change your query to 2014 Olympics city. Does that change the results? Try 2014 Winter Olympics.

Page 5 of 19

Using Search Engines

1. This time we want to discover Shakespeare’s birthday. First, decide on the keywords (include synonyms or related terms).

Figure 7: Yahoo! search

2. Go to Yahoo! and perform the search using the keywords you selected. Take a moment to review the results list.
3. Try the search using a different search site and compare the results. When is Shakespeare’s birthday?
4. Practice: on the answer sheet, underline the keywords that could be used to search for the answer to each of the following. Then perform the search to locate and write down the answer.
   a. In which stock exchange is the FTSE 100 index used to measure stock market performance?
   b. According to legend, an acorn kept on your window sill will supposedly keep your house safe from what?
   c. In 1856-1857 Mark Twain wrote several letters to the Keokuk Post in Iowa. What name did he sign on those letters?
   d. Patent #2,026,082 is based on what board game patented in 1904?
      TIPS: Leave out symbols and punctuation. For this question, you may have to perform more than one search. Write down any extra keywords used.

Page 6 of 19

Advanced search

Advanced searching involves making use of special commands or forms to force the search engine to provide more specific or more general results. Advanced search techniques supported by most search sites include:
- Wildcard searching: Use an asterisk to replace unknown words or spellings.
- Phrase searching: Type the search query in quotes to have the search engine find the exact phrase.
- Boolean searching: Use “and”, “or”, “and not”, “+”, and “-”.

Figure 8: Wildcard search

1. Go to google.com, type a penny * earned for the search query, and press Enter. How many of the results on the first two pages reference the saying “A penny saved is a penny earned”?
2. Try again with the query a penny * * * * earned with a space between each asterisk. Are the results different?
Type one asterisk for each missing word for better results.
3. Try the same searches on at least one other search engine. Do you get the same results?

Figure 9: Phrase search

4. Phrase searching is used to locate an exact phrase, like the title of an article. It’s also good for forcing the search engine to include keywords it might otherwise ignore (like “Star Wars I”). Return to Google and type “Organic foods: Are they safer? More nutritious?” including the quotes. What is the original source of the article? The results list many sites with this phrase. You should always try to find the original source of the information rather than relying on an author’s “claim” that an article came from a particular source.

Page 7 of 19

Boolean search

Some search engines support the use of Boolean operators (AND, OR, NOT or AND NOT) or symbols (+ or -) to broaden or narrow a search. Many search engines perform Boolean searches automatically or provide a form for users to customize Boolean searches. Many research database search tools are not yet that sophisticated.
- Use AND or + to narrow a search and force the search engine to include all keywords in the search results. This is the default for many search engines.
- Use OR to broaden a search and allow synonyms to be included in the results.
- Use NOT, AND NOT, or - to narrow a search and force the search engine to exclude keywords from the search results.

What’s the point?
Knowing how to identify keywords and synonyms and modify searches using Boolean operators can save you a lot of time when researching a topic. While Internet search engines have become very adept at interpreting your search query, online libraries and databases are not as sophisticated. As search technologies progress, you will always benefit from being able to identify keywords and modify your search to make more efficient use of whatever technology is available.

Figure 10: Boolean search

1. Say, for example, you are writing a paper examining the relationship between poverty and crime. Return to Google, search poverty AND crime, and review the results. Try removing the “AND” and just search for poverty crime. Tip: Use the Back and Forward buttons to compare the results of both searches. You should see that all of the results are very similar. Many search engines use the AND operator by default.
2. Now go to Ask.com and search for poverty AND crime AND kids. Review the results that appear under “More Answers”. Notice that the search engine automatically searches for synonyms for “kids” (example: child and juvenile).
3. Add -“single parent” to the search query (see Figure 10). Use the minus sign (-) to exclude keywords from the search results. In this case you are not interested in cases involving single-parent households, so you remove results that include the phrase “single parent”.

Page 8 of 19

Metasearch Engines

Metasearch engines perform keyword searches on multiple search engines and provide the results in one list.

When to use:
- Determine the scope of a topic (how much/what kind of information is available)
- Perform a quick search across several search engines

Benefits:
- Fast method for conducting a broad search
- Some have special features such as clustering

Drawbacks:
- Usually only return the first 10 or 20 results from each source.
- Some rely heavily on paid listings.
- Advanced search operators may not be supported.

Figure 11: Learn more about a search engine

Examples: Dogpile (www.dogpile.com), Metacrawler (www.metacrawler.com), Ixquick (ixquick.com), Yippy (yippy.com). There are many, many more.

1. Go to dogpile.com. Click About Dogpile (page bottom). With any search tool, it’s a good idea to learn something about how the tool works. You can avoid duplication and save time.
2. In the About page (see Figure 11) you learn (at the time of this writing) that Dogpile uses Google, Bing, and Yahoo!. Now you know that if you do a search on Dogpile, you may not need to repeat the search at these sites separately. However, remember that Dogpile will limit the results list, so if you need to dig deeper another search may be required.
3. Click the links for Metasearch 101 and FAQs and any other links you might find useful. Write down something you learned about Dogpile or searching in general.

Page 9 of 19

Metasearch Engines

Figure 12: Dogpile search results

1. Click in the search box and type the keywords video game addiction. In the results list, sponsored (paid) sites are listed first, followed by the non-sponsored sites. Each item also displays the source search engine.
2. Click the News tab above the search box (see Figure 12). Many search sites and metasearch sites allow you to filter search results to a specific type.
3. Click the Web tab to return to web results. In the Are you looking for? section you may find related terms to help you in your search.
4. Click the Dependence on Technology link. If you were writing a paper on video game addiction, the information from this search may help relate video game addiction to other social behaviors.
5. Now go to ixquick.com. Click About at the top of the page. On the About page, in the More accurate search results section, you learn, among other things, that stars are used to rank the search results based on the number of search engines that returned each result.

Figure 13: Ixquick search results

6. Use the Back button to return to the Ixquick home page and search for video game addiction.
7. Point to the stars next to the first non-sponsored site (see Figure 13). A screen tip appears showing the search engines used. Note that some of the search engines are metasearch engines too.
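
To picture what a metasearch site does with the lists it gets back, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how Dogpile or Ixquick is actually built: the function name merge_results, the engine names, and the example URLs are all hypothetical. It keeps only the first few results from each engine (the drawback noted earlier), counts how many engines returned each page (similar in spirit to the stars), and lists the pages found by the most engines first.

```python
from collections import defaultdict


def merge_results(results_by_engine, per_engine_limit=10):
    """Merge ranked URL lists from several engines into one list.

    results_by_engine maps a (hypothetical) engine name to its ranked
    list of URLs. Only the first per_engine_limit results from each
    engine are considered.
    """
    engines_for_url = defaultdict(list)  # URL -> engines that returned it
    best_rank = {}                       # URL -> best (lowest) position seen

    for engine, urls in results_by_engine.items():
        for rank, url in enumerate(urls[:per_engine_limit]):
            if engine not in engines_for_url[url]:
                engines_for_url[url].append(engine)
            best_rank[url] = min(rank, best_rank.get(url, rank))

    # Pages returned by more engines come first; ties are broken by the
    # best position in any single engine, then alphabetically by URL.
    ordered = sorted(
        engines_for_url,
        key=lambda u: (-len(engines_for_url[u]), best_rank[u], u),
    )
    return [(url, len(engines_for_url[url]), engines_for_url[url]) for url in ordered]


# Made-up result lists for the query "video game addiction".
sample = {
    "EngineA": ["a.example/addiction", "b.example/games", "c.example/study"],
    "EngineB": ["b.example/games", "a.example/addiction", "d.example/news"],
    "EngineC": ["b.example/games", "e.example/forum"],
}

merged = merge_results(sample)
for url, stars, engines in merged:
    print("*" * stars, url, "(" + ", ".join(engines) + ")")
```

In this made-up data, b.example/games is returned by all three engines, so it is listed first with three stars. Real metasearch sites weigh many more factors, including paid listings.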
Page 10 of 19

Clustered Search

Figure 14: Yippy search results

When doing research, some metasearch sites can help by displaying and grouping related terms.

1. Go to Yippy.com. Search for video game addiction. Yippy not only searches the web, it presents logical “groups” or “categories” called “clouds” to organize the information. This is sometimes referred to as clustering.
2. Click the details link above the results list (see Figure 14). Information about the source of the results is listed.
3. Click the details link again to collapse the list. In the results list, point to each icon next to a result site to learn its function. Try using the Preview icon to view the result site directly in the results list.
4. In the clusters panel on the left, review the “clouds”. Click the plus sign (+) next to any one of the categories listed. The search results now focus on the category you selected.
5. Click the Sources tab in the clusters panel. Now you can see which sites come from which search engines and can filter your search results to a specific search engine.
6. Explore the other two tabs, sites and time, to view changes to the results.

Page 11 of 19

Specialized Search Tools

Many government and private sector sites provide access to documents and data that are not available from the most common search engines. For example, if you wanted to gather statistics about a particular school, you might search the web for Education Statistics. You would quickly find the National Center for Education Statistics, where you can build queries on the data available there. One such query tool is the Elementary/Secondary Information System (ElSi).

Example: You want to compare pupil/teacher ratios for your elementary school (or one in your area).

Figure 15: NCES Search

1. Go to nces.ed.gov (National Center for Education Statistics).
Click in the search box in the upper right corner of the page, type ELSI, and press Enter. Many web sites allow you to search within the site.
2. Click ELSI - Elementary and Secondary Information System. Review the page for any notices you should be aware of (sometimes data is not up-to-date or contains errors).
3. Click the Begin button next to quickFacts. Use the options presented to learn more about the pupil/teacher ratio for a school you attended or one in your area for the most recent years available. Compare that to the oldest dates available. Since the page is generated dynamically (when you ask for it), surface web search tools cannot index it.

Page 12 of 19

Find Specialized Search Tools

Figure 16: Many Search Tools are available on the Web

Some deep web resources are indexed in directories maintained by specialists in the field. To find some of these deep web search tools, use your favorite search engine and add the following keywords (or synonyms) to the topic you are interested in:
- “web directory”
- “resources”, “internet resources”, or “web resources”
- “library” or “portal”
- “pathfinder” (for lists of printed resources)

Remember, anybody can post anything on the Internet, so always be skeptical of your search results.

1. Use bing.com to perform a search for each of the following. Use different tabs so you can compare results.
   agriculture
   agriculture portal
   agriculture “web resources”
   agriculture pathfinder
   Are all of the search results the same or different? How would this help if you were writing a paper about agriculture?
2. Find your own search tools: Use a search engine of your choice to search for the following types of search tools. Write the URL of a site for each and a brief description of how the search site works or how search results are arranged.
   a. Good search engines for students
   b. Visual search engines (search results or related topics are presented visually)
   c. Deep web search engine
   d. Reference tools (dictionary, thesaurus, encyclopedia, almanac, etc.)

Page 13 of 19

Use Research Databases

Most colleges now offer electronic databases that allow you to search periodicals. You can use some of the same techniques to search these databases.

Figure 17: Academic One File

1. Go to the Wor-Wic web site (www.worwic.edu), point to Quick Links, click Library Services, and then under Research Databases, click By Subject.
2. We’ll be looking for information about Video Game Addiction, so let’s start with Social Sciences and choose the first database on the list, Academic OneFile.
3. Perform a keyword search on Video Game Addiction.
4. When the results are displayed, change the search options to limit the search results to peer-reviewed publications published in the last year.
5. Open another tab, repeat step 1 to go to the Library Services page, select the Health, Medicine & Nursing category, and choose the Health and Wellness… database. Use the Advanced search to find full-text articles on Video Game Addiction published in the last year.

Figure 18: Health & Wellness Resource Center

6. How do the two searches compare? Practice modifying your search by removing the date filter (look for “revise search” or “remove limit”).
7. Click one of the article titles to open the full document. Look for a “print” link on the page; it opens the article in a format more suitable for printing. Use it to print page 1 only. NOTE that some articles may open in Adobe Reader, which is also suitable for printing.

Page 14 of 19

More search examples and suggestions

Historical search (chronicled, retrospective, varied opinions)
Identify keywords and use a metasearch site such as DogPile or IxQuick. Consider synonyms, plural vs. singular spellings, different spellings, etc.
Examples: Emergence of smartphones
- Consider synonyms for “emergence”: “rise”, “development”, and “evolution”.
- Consider different spellings: “smartphone” (singular), “smart phones” (two words), “smart phone” (two words, singular).
- You might add the keywords “history” or “historical” to obtain background information. Keywords like “trends” or “discoveries” may provide more current data.

Personal research (large purchases, medical information, restoration project)
Identify keywords and use either a metasearch site or a search site. With personal research you may rely on “comparisons” (keyword “compare” or wildcard “compar*”), “consumer feedback”, and professional “reviews”, so include those keywords in your search. If you know of a reliable source of information on a product, you can include that too.
Examples: smartphone review, smartphone consumer feedback, consumer reports smartphone

Perform multiple searches: Make note of other related terms and keywords you find. Make use of clustering search engines to help you find more related terms.

Identify keywords: Write your topic in a sentence to identify the keywords. Make a list of synonyms or related topics.

Search for search tools: Remember that experts in the field sometimes create directories of materials related to your topic. Use some of the suggestions on the previous page(s) to locate those resources. You can also add keywords like “tutorial”, “beginner”, or “guide” to lead you towards introductory information on some topics.

Need reliable sources? Make use of specialized search tools like sweetsearch.com, ipl2.org, or your school’s research databases.

Feeling overwhelmed? Research is gathering information, all kinds of information. It can be overwhelming at times. It may help to organize and categorize the information you have found (just like a directory) and decide whether or not it fits into the scope of your project.

More practice.
Complete the More Practice questions at the end of the answer sheet as directed by your instructor.

Page 15 of 19

Resources:

Reference Materials
Columbia Encyclopedia: encyclopedia.com
Wikipedia: wikipedia.org
Dictionary: yourdictionary.com, merriam-webster.com
Spanish Dictionary: diccionarios.com
Reference Material: refdesk.com, infoplease.com

Search Engines
Google: google.com
Yahoo!: yahoo.com
Bing: bing.com
Ask: ask.com

Directories
Internet Public Library: ipl2.org
Open Directory: dmoz.org
Yahoo!: dir.yahoo.com
Yahoo! Kids: kids.yahoo.com
Best of the Web: botw.org

A few specialized directories (there are so many out there on just about any topic)
Virtual Library: vlib.org
Open Access Books: www.doabooks.org
Queen Victoria’s Journals: www.queenvictoriasjournals.org
Business.com: business.com
Online books: onlinebooks.library.upenn.edu

Research tools & search engines
Sweet Search: sweetsearch.com
Finding Dulcinea: findingdulcinea.com
Jstor: jstor.org
Noodle: noodletools.com (several tools and helpful information on conducting research)

Page 16 of 19

Searching the Internet – Answer Sheet

Type or write your answers below each description.

Pg, Step   Description
3, 4       Number of sites in Genetic Engineering category
3, 8       Compare category listings (Yahoo! and dmoz.org)
5, 3       Comment on results for 2014 Olympics keyword search
6, 1       Keywords for Shakespeare’s birthday
6, 4a      In which stock exchange is the FTSE 100 index used to measure stock market performance?
6, 4b      According to legend, an acorn kept on your window sill will supposedly keep your house safe from what?
6, 4c      In 1856-1857 Mark Twain wrote several letters to the Keokuk Post in Iowa. What name did he sign on those letters?
6, 4d      Patent #2,026,082 is based on what board game patented in 1904 (remember to write down any extra keywords used)?
7, 4       Source of the article Organic foods: Are they safer? More nutritious?
Page 17 of 19

9, 3       Write down something you learned about Dogpile or searching
13, 1      Are all of the search results the same or different? How would this help if you were writing a paper about agriculture?
13, 2a     Search engine for students
13, 2b     Visual search engine
13, 2c     Deep web search engine
13, 2d     Reference site

Page 18 of 19

More Practice

1. For each of the following, indicate what kind of search (keyword, wildcard, phrase, directory, deep web) would be most appropriate and why. Try your search strategy to see how well it works.

   Searching for: Ideas for fun things to do on the weekend.
   Answer: Directory search
   Reasoning: You can focus in on activities of interest. With the other searches, many different activities would be included in the search results. It would be harder to weed out the ones you aren’t interested in.

   Searching for: Title of a poem when you only know a few words that aren’t necessarily next to one another in the poem.
   Answer: Keyword or wildcard
   Reasoning: You are looking for something specific and have some of the words in the poem. If you know the order of the words, you can use wildcards; otherwise, just use a keyword search.

   Your turn…
   1. How to treat a bee sting
   2. Write a paper for astronomy class
   3. Obtain education statistics for US schools by state or region
   4. Information on Richard I

2. Go to www.agoogleaday.com and see how many questions you can answer. Be sure to use the Google a day search site so as not to give away the answer.

Page 19 of 19