Invisible Web - Bucks County Community College

Exploring the
Invisible Web
Kevin R. Morgan Ed.D
Professor/Designer
St Petersburg College
Introduction: The Visible and Invisible Web
“I readily believe there are more invisible than visible things
in the universe.”
(Burnett, 1692) motto in Coleridge’s The Rime of the Ancient Mariner
There are vast amounts of information available online,
but even more information exists beyond the grasp of the
general search engine.
There is a much larger universe of invisible information in
databases and directories which can’t be accessed by general
purpose search engines, but it is nevertheless online, free, and
of the highest academic standards.
The Information Age and the WWW
The Internet: Network to Knowledge
• A 21st century library of Alexandria
• A content-rich and interactive information network
• Internet: information resources for teaching and learning
Utilizing the WWW for research and learning
The World Wide Web is estimated to contain
over 3 billion documents. (Barker, 2003)
The Invisible Web is estimated to be
2 to 50 times bigger than the visible web.
What is the Invisible Web?
The “Invisible Web” is a metaphor used to describe
the vast depth or domain of information that lies beyond
the visibility of our tools for gathering information.
It is not really invisible, just passed over or missed.
The Invisible Web includes the following and more:
• content that has been excluded from general purpose
search engines and Web directories.
• examples include databases from universities, libraries,
organizations, businesses, and government agencies.
• a substantial part of the total Internet.
How Big is the Invisible Web?
One study, conducted by the search company BrightPlanet,
estimated that the inaccessible part of the web is about
500 times larger than what search engines provide
access to.
The study estimated that about 500 billion pages of information
were available on the web, and that only 1/500 of that information
could be reached via general search engines. (Sullivan, 2000)
More conservative estimates place the Invisible Web
at 2-50 times bigger than the visible web. (Barker, 2003)
Even using the most conservative estimates, the Invisible
Web represents a considerable quantity of information that lies
beneath the surface of the Web. It is deeper than we thought!
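The two estimates above can be worked through as simple arithmetic. The sketch below is only an illustration: it takes the 3 billion document figure for the visible Web and the 2 to 50 times multiplier attributed to Barker (2003), plus the 500 billion pages and 1/500 share reported from the BrightPlanet study. The function and variable names are made up for this example.

```python
# Figures quoted in the slides; the names below are illustrative only.
VISIBLE_WEB_DOCS = 3_000_000_000          # visible Web estimate (Barker, 2003)
BRIGHTPLANET_TOTAL_PAGES = 500_000_000_000  # total pages in the BrightPlanet estimate


def invisible_web_range(visible_docs, low_multiple=2, high_multiple=50):
    """Return the (low, high) size implied by a 'times bigger' estimate."""
    return visible_docs * low_multiple, visible_docs * high_multiple


def reachable_pages(total_pages, share_denominator=500):
    """Return the number of pages reachable if only 1/share_denominator is indexed."""
    return total_pages // share_denominator


low, high = invisible_web_range(VISIBLE_WEB_DOCS)
reachable = reachable_pages(BRIGHTPLANET_TOTAL_PAGES)

print(f"Conservative estimate: {low:,} to {high:,} documents")
print(f"BrightPlanet estimate: {reachable:,} of {BRIGHTPLANET_TOTAL_PAGES:,} pages reachable")
```

Under these assumptions the conservative estimate spans 6 billion to 150 billion documents, and the BrightPlanet figures imply roughly 1 billion pages within reach of general search engines.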
[Diagram: the Visible Web (3 billion documents, captured in general purpose
search engines) shown inside the larger Invisible Web (100-500% larger than
the Visible Web; 6 billion-30 billion documents ???). Invisible Web areas
labeled: Government Directories, Educational Research, Library of Congress,
Institutional Directories, ERIC, Scientific Research, Colleges and
Universities, Specialized Search Engines and Directories, Public/Private
Organizations.]
The Web: Visible and Invisible Information
The visible Web is made up of HTML Web pages
that search engines have chosen to include in their indices.
Google, AltaVista, LookSmart, and other general purpose
search engines all cover the surface of the Web but are limited
in reaching the deeper parts of cyberspace.
There is an even greater amount of invisible information
in databases which can’t be directly accessed by general
purpose search engines, but it is nevertheless online and
freely available to the savvy searcher.
Search Engines: Robots, Knowbots and Spiders
Search engines do not really search the Web directly.
Computer robot programs, referred to sometimes as
"crawlers" or "knowledge-bots" or "knowbots" are used
by search engines to roam the World Wide Web.
Most large search engines operate several robots or
spiders all the time. Even so, the Web is so enormous
that it can take six months for spiders to cover it, resulting
in a degree of "out-of-datedness" in results. (Barker, 2003)
Spiders or crawlers are programmed to retrieve general
information while avoiding unfriendly or dangerous URLs that
can trap them in endless loops of links, known as spider traps.
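One common way a crawler can avoid looping forever is to remember which URLs it has already visited and to stop after a fixed page budget. The sketch below is a minimal illustration of that idea over a made-up, in-memory set of pages. It is not how any particular search engine works, and the page names, the `crawl` function, and the `max_pages` limit are all assumptions for this example.

```python
from collections import deque


def crawl(link_graph, start, max_pages=100):
    """Walk a toy link graph breadth-first, never visiting the same URL twice.

    link_graph maps each page to the list of pages it links to.
    max_pages is a budget that also guards against traps which keep
    generating new, never-seen URLs.
    """
    visited = set()
    queue = deque([start])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue  # already fetched: following it again would loop
        visited.add(url)
        order.append(url)
        for link in link_graph.get(url, []):
            if link not in visited:
                queue.append(link)
    return order


# Hypothetical pages: b.html links back to a.html, forming a loop.
toy_web = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["a.html"],
    "c.html": ["b.html"],
}

print(crawl(toy_web, "a.html"))
```

Each page is visited once even though the links form a loop, and lowering `max_pages` cuts the crawl short.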
Reasons for Invisibility of Some Pages
There are certain types of pages that search engine companies
routinely exclude by policy to save time and money.
Some pages present technical barriers to web crawlers and
are passed over by general purpose search engines to save time.
For example, a spider or crawler will back off when
encountering a question mark (?) in a URL.
To save time and money, spiders are programmed to
avoid or exclude many sites, including educational,
governmental, and organizational databases.
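The question mark rule described above can be pictured as a simple filter. This is a sketch of the behavior as the slide states it, not the actual code of any search engine. The function name `is_crawlable` and the example.edu address are hypothetical; the other two URLs are gateways listed later in this presentation.

```python
def is_crawlable(url):
    """Return False for URLs containing '?', which the slide says spiders skip."""
    return "?" not in url


urls = [
    "http://www.eduref.org/",
    "http://www.alllearn.org/er/directories.cgi",
    "http://example.edu/search?subject=history",  # hypothetical database query URL
]

for url in urls:
    status = "crawl" if is_crawlable(url) else "skip"
    print(f"{status}: {url}")
```

A page reached only through a query URL like the last one would stay out of the index under this rule, which is one way database content ends up in the Invisible Web.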
Visibility and Invisibility
Visible Web                                   Invisible Web
Educator’s Reference Desk                     ERIC Database
The Library of Congress                       Special Collections
URLs ending in edu, org, gov                  a page has a ? in its URL
General Search Engines and Subject Gateways   Institutions and Organizations: Internal directories
It is very difficult to predict what sites will or won't be part of the
Invisible Web. As search engines change their policies, what is
invisible today can become visible tomorrow. Many sites are
already hybrids, with both visible and invisible components.
The Value in Using the Invisible Web
Invisible Web resources offer the highest level of authority
as educational institutions and government organizations
maintain a high level of quality control over their information.
Specialized search interfaces provide more control
over search input and output with increased precision.
Comprehensive resources allow searchers to perform
exhaustive searches within a specific subject area and
stay up to date.
The search can yield exhaustive results of timely content.
Invisible Web databases have the most current
information available online as they are updated often.
Understanding the Invisible Web
The data found in the Invisible Web cannot be accessed
easily via general purpose search engines.
The Invisible Web is not the sole solution to all one’s
information needs. It should be used in conjunction with
other informational sources, including general searches.
Invisible Web resources clearly identify who is providing
the information, making it easy to judge the authority of the
content and its provider.
Targeted crawlers offer more comprehensive coverage of
their subjects than general purpose search engines.
Finding the Subject Databases and Directories
Much of the Invisible Web is made up of the contents of
thousands of specialized databases accessible online.
Have a clear subject in mind to find the best specialized
databases for your subject of study or field of research.
Many databases can be found by adding the word “database”
after a subject term, such as “humanities database” or
“history database.”
Another tip is to search using the words “web directory” and
then your topic. If a directory web page refers to itself
using the words “web directory,” you will locate it.
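The two query patterns above are easy to generate for any subject. The short sketch below only builds the search strings to type into a general search engine. The function names are made up for this example, and it does not contact any search service.

```python
def database_query(subject):
    """Build a query of the form '<subject> database'."""
    return f"{subject} database"


def directory_query(topic):
    """Build a query of the form 'web directory <topic>'."""
    return f"web directory {topic}"


for subject in ["humanities", "history"]:
    print(database_query(subject))
    print(directory_query(subject))
```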
Searching Tip: Use Subject Gateways
Searching through subject databases and web directories may
be unfruitful for the novice searcher or student. Many of
these independent searches can end in blocked access.
Problem: Many of the databases are password protected.
Solution: An easier and more fruitful method for finding
databases relating to a specific subject area is to use some
of the gateway sites that have already been organized by
subject and content.
These subject gateways are organized from general to specific,
enabling students, educators, and researchers to find valuable
visible and invisible sources on the Internet.
Educational Gateways
Infomine provides a gateway to scholarly Internet
resource collections: http://infomine.ucr.edu/
Academic Info also provides an educational subject directory
and subject gateways: http://www.academicinfo.net/
The Educator’s Reference Desk has become the new access
gateway to the ERIC databases: http://www.eduref.org/
The Alliance for Life-Long Learning offers online classes
from Stanford, Yale, and Oxford Universities and provides
a library of online resources through its Academic Subject
directories that meet the highest academic standards:
http://www.alllearn.org/er/directories.cgi
General Purpose Subject Gateways
Use the Invisible Web Directory from Sherman and Price’s
companion site to The Invisible Web:
http://www.invisible-web.net/
See this multi-subject guide to specialized search engines:
http://www.searchability.com/
Explore CompletePlanet to link to over 103,000 searchable
databases and specialty search engines:
http://www.completeplanet.com/
Evaluating Invisible Web Resources
The Librarians’ Index to the Internet provides an annotated
directory with cross-reference links to both visible and
invisible content: http://lii.org/
ResearchBuzz provides daily updates on search engines,
new software, browser technology, Web directories, and
databases: http://researchbuzz.com
The Scout Report provides academics, researchers, librarians,
and the K-12 community with valuable online information:
http://scout.cs.wisc.edu/index.php
The Internet Resources Newsletter is a monthly newsletter
for academics, students, scientists, and social scientists:
http://www.hw.ac.uk/libWWW/irn/irn.html
References
Barker. (2003). “Recommended Search Engines: Table of Features.” UC Berkeley.
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html
Sherman & Price. (2001). The Invisible Web: Uncovering Information
Sources Search Engines Can’t Find. CyberAge.
Sullivan. (2000). “Invisible Web Gets Deeper.” The Search Engine Report,
August 2000. http://searchenginewatch.com/sereport/article.php/2162871
Exploring the Invisible Web
Contact Information
Dr. Kevin R. Morgan
St. Petersburg College: eCampus
Seminole, Florida
morgank@spcollege.edu