Conquering the Invisible Web

Presented by Bonnie Shucha
University of Wisconsin Law Library
bjshucha@wisc.edu
August 10, 2005

The Problem
- Most searchers only locate 0.03% (1 in 3,000) of the Web pages available to them
- Even advanced searchers, using largest search engines, can only access about 16% of Web content
Diagrams from http://brightplanet.com/technology/deepweb.asp

The Invisible Web
- Why? Because 84% of the information available on the Internet is found only on the “invisible Web,” a.k.a. “deep Web,” and is not searchable using a general search engine such as Google
- Chart: Invisible 84%, Visible 16%
Statistics from The Deep Web: Surfacing Hidden Value, http://www.press.umich.edu/jep/07-01/bergman.html

The Visible Web
- Visible Web page exists in “static” or unchanging form
- Exists as a “physical” file on a computer
- Most in .htm or .html format
- Similar to a word processed document in .doc or .wpd format
- Static Web pages considered “visible” because standard search engines can index them and display them as search results

Indexing & the Visible Web
1. Search engine spider crawls Web starting with already indexed static pages
2. Spider encounters link to a new static Web page and follows link, or Webmaster registers new static Web page with search engine
3. Spider adds new Web page to search engine’s index
4. Content rendered “visible”

The Invisible Web
- Invisible Web content is “dynamic” or changing
- Contains bits of information stored in a database and pulled together on-the-fly into a Web page at your request
- Page doesn’t exist until you request it
- Similar to a mail merged document

Database:
Author        Title               Publication
B. Shucha     Searching Smarter   Wisconsin Lawyer
J. Doe        Common Law          Marquette Law Review
J.Q. Public   Legal Tech Tips     ABA Journal

Dynamic Web Page (your search results):
1. B. Shucha, “Searching Smarter,” Wisconsin Lawyer.
2. J.Q. Public, “Legal Tech Tips,” ABA Journal.

- Because this content is dynamic, or “physically” nonexistent, most search engines are unable to retrieve it, thereby rendering it “invisible”

Indexing & the Invisible Web
1. Spider crawls Web starting with already indexed static pages
2. Spider encounters database
3. Query is required to access “dynamic” data
4. Spider incapable of generating query
5. Spider stops and cannot index data in database
6. Content rendered “invisible”

Other types of Invisible Web Content
- Very recent static pages which haven’t yet been indexed
- Password protected data

Invisible Web Content
- 95% of invisible Web content is free and available to the public
- Quality of content often exceeds that of visible Web content
From The Deep Web: Surfacing Hidden Value, http://www.press.umich.edu/jep/07-01/bergman.html

Invisible Web Content: Legal & Governmental Materials Available in the Public Domain
- Case law
- Statutes
- Bills
- Regulations
- Patents
- Briefs
- Census Data
- Government Reports

Invisible Web Content: Business Data
- SEC filings
- Stock quotes
- Company profiles

Invisible Web Content: General Information
- Address & phone directories
- Flight schedules
- Dictionaries
- Maps

Invisible Web Content: NOT freely available on Web (usually)
- For Profit Publications
- Public domain documents with editorial enhancements
- Other material that is someone’s intellectual property

Finding Invisible Web Content
- To find ANY information, consider where an authoritative source might be found
  - Print?
  - Visible Web?
  - Invisible Web?
  - Subscription Database?
  - Phone Call?
- Next, consider the quickest, most cost-effective way to get the information
- If you determine that it may be available on the invisible Web, how do you find it?
- By knowing where to look!
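
The indexing slides above can be made concrete with a small sketch. The Python below is an illustration only, not a description of how any real search engine works: the page paths, the function names `crawl` and `dynamic_results_page`, and the keyword query are assumptions made for this example, and only the three article records come from the database slide. A toy spider follows links between static pages and indexes their text, but it never submits a query, so the records stay out of its index until a person asks for them.

```python
import sqlite3
from collections import deque

# Static pages (hypothetical paths): text the spider can read, plus outgoing links.
STATIC_PAGES = {
    "/index.html": ("Law library home page", ["/guides.html", "/articles.html"]),
    "/guides.html": ("Legal research guides", []),
    # Static page holding the database's search box; results need a query.
    "/articles.html": ("Search the article database", []),
}


def build_database():
    """Load the three records shown on the database slide."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE articles (author TEXT, title TEXT, publication TEXT)")
    db.executemany(
        "INSERT INTO articles VALUES (?, ?, ?)",
        [
            ("B. Shucha", "Searching Smarter", "Wisconsin Lawyer"),
            ("J. Doe", "Common Law", "Marquette Law Review"),
            ("J.Q. Public", "Legal Tech Tips", "ABA Journal"),
        ],
    )
    return db


def dynamic_results_page(db, keyword):
    """Assemble a results page on the fly; it exists only for this request."""
    rows = db.execute(
        "SELECT author, title, publication FROM articles "
        "WHERE title LIKE ? ORDER BY rowid",
        (f"%{keyword}%",),
    ).fetchall()
    return [
        f'{n}. {author}, "{title}," {publication}.'
        for n, (author, title, publication) in enumerate(rows, start=1)
    ]


def crawl(start):
    """Toy spider: follow links between static pages and index their text."""
    index = {}
    queue = deque([start])
    seen = {start}
    while queue:
        url = queue.popleft()
        text, links = STATIC_PAGES[url]
        index[url] = text
        for link in links:
            if link not in seen and link in STATIC_PAGES:
                seen.add(link)
                queue.append(link)
    return index


if __name__ == "__main__":
    index = crawl("/index.html")
    # The page with the search box is static, so the spider indexes it.
    print(sorted(index))
    # The article records never appear in the spider's index.
    print(any("Legal Tech Tips" in text for text in index.values()))
    # A person's query produces the page the spider could not see.
    print(dynamic_results_page(build_database(), "Tech"))
```

The spider reaches the search-box page because it is an ordinary static file, which is why a general search engine can still lead you to a database even though it cannot see inside it.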
Finding Invisible Web Content
- A great deal of excellent legal and business information is freely available on the Internet
- Much of it is contained within databases and is, therefore, invisible to most conventional search engines
- The most effective way to access this information is using the database’s own search box
- The search box is usually found on a static, visible Web page that is accessible using a conventional search engine

Finding Invisible Web Content: Search Strategy
- DON’T search for specific information using a conventional search engine
- DO use a conventional search engine to search for a database that may contain the information you seek
- THEN use the search box for that database to search for the specific information

“The point is that often the key to the answer is not locating the answer itself as the first step, but locating the right database in which to search for it.”
Diana Botluk, Mining Deeper into the Invisible Web, http://www.llrx.com/features/mining.htm

Search Exercises
Attempt to locate the following:
1. Wisconsin Statute 758.01
2. My email address
3. Brief from WI Court of Appeals case, Docket 99-2588
4. The name of the person next to you
5. Subtitle C of the U.S. Internal Revenue Code
6. “Contract Law in Wisconsin” (State Bar of Wisconsin CLE Book)

Reviewing Search Exercises
- Were you able to find the information?
- Was it from an authoritative source?
- What was your search strategy?
- Why did you choose this strategy?
- What were the costs associated with your search?
- Did you choose the quickest, most cost-effective method to locate the information?
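
The DO/THEN search strategy can be sketched as a two-step lookup. This is a toy model under stated assumptions, not a real search tool: `VISIBLE_INDEX`, `DATABASE_RECORDS`, `find_database`, and `search_database` are hypothetical names, the descriptions paraphrase the resource listings in this presentation, and the stored record is a placeholder rather than real data. The search term "992588" is the form of docket 99-2588 used in the group exercise.

```python
# Toy stand-in for a general search engine: it has indexed only the static
# landing pages of three databases named in this presentation.
VISIBLE_INDEX = {
    "Wisconsin Briefs": "wisconsin supreme court and court of appeals briefs",
    "Wisconsin Legislature Infobases": "wisconsin statutes acts bills",
    "GPO Access": "u.s. code c.f.r. government printing office",
}

# Toy stand-in for each database's own search box. The record below is a
# placeholder for illustration, not real data.
DATABASE_RECORDS = {
    "Wisconsin Briefs": {"992588": "(placeholder) brief for docket 99-2588"},
}


def find_database(topic):
    """Step 1 (DO): find the landing page whose description matches the most words."""
    words = topic.lower().split()
    scored = [
        (sum(word in description.split() for word in words), name)
        for name, description in VISIBLE_INDEX.items()
    ]
    best_score, best_name = max(scored)
    return best_name if best_score > 0 else None


def search_database(name, term):
    """Step 2 (THEN): use the chosen database's own search box."""
    return DATABASE_RECORDS.get(name, {}).get(term)


if __name__ == "__main__":
    # DON'T: the specific item is not in the general index.
    print(find_database("99-2588"))
    # DO: search for a database that may contain the item.
    database = find_database("Wisconsin briefs")
    print(database)
    # THEN: search inside that database.
    print(search_database(database, "992588"))
```

Searching the general index for the docket number finds nothing, while searching it for the kind of database succeeds, which mirrors the DON’T and DO lines of the strategy.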
Group Exercise - #3
Locate brief from WI Court of Appeals case, Docket 99-2588
One strategy:
1. Open Google or another search engine
2. Search for database that may contain the brief: “Wisconsin briefs” or “Wisconsin court of appeals brief”
3. In the Wisconsin Briefs database, search for the brief (follow instructions): “992588”

Group Exercise - #5
Locate Subtitle C of U.S. Internal Revenue Code
One strategy:
1. Open Google or another search engine
2. Search for database that may contain the code: “Internal Revenue Code”
3. On the Internal Revenue Code page, browse to Subtitle C
Another strategy:
1. Open Google or another search engine
2. Search for database that may contain the code: “United States Code”
3. Search the GPO Access U.S. Code database: “Internal Revenue Code subtitle C”
4. In the IRC section, note citation to subtitle C
5. Go back to search box, and enter citation: “26USC3101”

Invisible Web Resources: Federal Law
- American Factfinder, http://factfinder.census.gov
  Population, housing, economic, and geographic data from the U.S. Census
- FedStats, http://www.fedstats.gov
  Statistics from United States government agencies
- FindLaw's Cases & Codes, http://findlaw.com/casecode/
  Links to databases of federal and state cases and legislation
- FindLaw's Supreme Court Center, http://supreme.lp.findlaw.com/supreme_court/resources.html
  Recent U.S. Supreme Court opinions, orders, briefs, docket, and more
- GPO Access, http://www.gpoaccess.gov
  U.S. Code, C.F.R., and so on, from the Government Printing Office
- Thomas, http://thomas.loc.gov
  U.S. legislation and other congressional information

Invisible Web Resources: Wisconsin Law
- WisBar State and Federal Legal Resources, http://www.wisbar.org/AM/Template.cfm?Section=Legal_Research
  Links to Wisconsin and federal resources
- Wisconsin Briefs, http://library.law.wisc.edu/elecresources/databases/wb/index.php
  Supreme Court & Court of Appeals briefs
- Wisconsin Legislative Drafting Records, http://library.law.wisc.edu/~draftingrecords
  Written materials, letters, and memoranda given to or created by the legislative drafting attorney
- Wisconsin Legislature Infobases, http://folio.legis.state.wi.us/
  Wisconsin statutes, acts, bills, and more
- Wisconsin Online Court Records, http://www.wicourts.gov/casesearch.htm
  Status information for Wisconsin cases (WSCCA.i & CCAP)

Invisible Web Resources: Journals, News, and More
- Badgerlink, http://www.badgerlink.net
  Scholarly & popular journals and newspapers
  Available to Wisconsinites
- Legaltrac, http://wsll.state.wi.us/enterlt.html
  Index of legal periodicals
  Available to WSLL card holders
- MPL Database for Remote Use, http://www.mpl.org/files/great/bookmark.cfm?Category=82
  D&B Million Dollar Database, CQ Researcher, etc.
  Available to Milwaukee PL card holders
- Yahoo Search Subscriptions, http://search.yahoo.com/subscriptions
  WSJ, LexisNexis, Consumer Reports, etc.
  Text available by subscription

Invisible Web Resources: General Invisible Web Directories
- CompletePlanet, http://www.completeplanet.com
- Direct Search, http://www.freepint.com/gary/direct.htm
- ProFusion, http://www.profusion.com
- Librarian's Index to the Internet, http://lii.org

Presentation based on the article: Bonnie Shucha, Searching Smarter: Finding Legal Resources on the Invisible Web, Wisconsin Lawyer, September 2004, at 19, at http://tinyurl.com/dthen.

© Bonnie Shucha
Reference & Electronic Services Librarian
University of Wisconsin Law Library
bjshucha@wisc.edu
http://wisblawg.blogspot.com