Artifact #6 - Kim Patton Capstone

A. McCoy, S. Velasquez, K. Patton. Understanding Online Search Engines
Exploring the history, the users and how they work.
Introduction
Search engines are an everyday part of modern life. We search for information
on a whim, for work, for school, and for various other reasons. This phenomenon has
become so commonplace we have even turned the name of one of the most widely
used search engines into an action verb for searching the web, causing the term
“Google” to be part of everyday conversation. The Pew Research Center reported in “Search Engine
Use 2012” that 73% of all Americans use search engines, a share that has increased by over
one-third since 2002 (Purcell, Brenner, & Rainie, 2012). Additionally, 54% of U.S. adults
report utilizing a search engine at least once a day.
The World Wide Web (WWW) itself is vast. The Internet
Systems Consortium reported in their July 2012 domain survey that there were over 900
million domains in the WWW (Internet Systems Consortium, 2012). On top of that, there
are hundreds of search engines to find and organize these domains in a useful way.
With the Web growing so rapidly, and so much information being produced, it is
important to understand how individuals find the information they need.
History of Search Engines
Although the World Wide Web seems like it has been around forever, the history
of search engines is actually quite young. The first tool used to search the Internet was
the program Archie, which was created in 1990 by computer science students, Alan
Emtage, Bill Heelan, and J. Peter Deutsch, at McGill University (Asadi & Jamali, 2004;
Seymour, Frantsvog, & Kumar, 2011). The name was derived from the word archive
without the “V”. The program worked by downloading directory listings from all files
located on FTP sites, and arranging them in a database that could be searched by
name.
In 1991 Mark McCahill from the University of Minnesota created Veronica and
Jughead, which searched file names in the Gopher index system (Seymour et al.,
2011). Veronica facilitated the keyword search of menu titles, and Jughead obtained
menu information from the Gopher servers.
After the Web materialized, robots were created to keep up with all the emerging
websites (Seymour et al., 2011; Vlachynsky, 2010). The first robot, Wanderer, was
developed in 1993 by Matthew Gray (Asadi & Jamali, 2004). The function of Wanderer
was to count and index all the pages of the Web in Wandex, but it was not intended as
a search tool (Seymour et al., 2011; Vlachynsky, 2010). Also in 1993, at the University
of Geneva, Oscar Nierstrasz created a series of scripts that would mirror and rewrite the
pages of the Web in a standard format, which developed into the W3Catalog (Seymour
et al., 2011).
The next search engine of the Web was also developed in 1993 and was called
Aliweb (Seymour et al., 2011). Aliweb depended on web administrators to notify the
system and submit an index file instead of using a robot to index the information. This
helped with reducing bandwidth overload and provided more information to users for
searching. The problem with this system was that many administrators failed to keep
information up to date, which resulted in an incomplete and small database
(Vlachynsky, 2010).
After this failed attempt, in December 1993 Jump Station was released (Seymour
et al., 2011). This system again used a robot to find webpages. It was the first Web tool
to combine crawling, indexing, and searching. The index could be searched by
keyword with a simple Web form, although the results were not ranked. Although the
project was innovative, it could not be sustained due to lack of funding (Vlachynsky, 2010).
Ranking became the next feature that search engines started touting. RBSE,
created by NASA, was one of the first to start ranking websites (Vlachynsky, 2010).
However, it was not intended as a public use tool. Excite was a more complicated tool
that could detect relationships between words, which improved searching. WebCrawler,
released in 1994, further enhanced searching by scanning the whole webpage for
keywords instead of just titles or descriptions (Seymour et al., 2011; Vlachynsky, 2010).
Lycos was the first advanced search engine that offered features such as a large
index, links, pieces of the websites, and ranking (Seymour et al., 2011; Vlachynsky,
2010). Similar systems followed like AltaVista, AskJeeves, Infoseek, Magellan, Northern
Light, and OpenText (Vlachynsky, 2010). In 1995 Yahoo! joined the group. The flaw
with Yahoo! was that it searched only its own web directory rather than full webpages, and
the directory was exclusive and charged sites a fee to be included.
1996 saw competition between search engines when Netscape offered a deal for
a single search engine use in their web browser. However, interest was so large that
Netscape contracted with the five major search engines on a rotation for $5 million per
year, including Excite, Infoseek, Lycos, Magellan, and Yahoo! (Seymour et al., 2011).
Less than 5 years later Google joined the group as a leader (Seymour et al.,
2011). Google became very popular due to its ability to rank websites based on
association with backlinks. Over the years Google has perfected its algorithm using
backlinks, relevancy, age, and many other indicators. It has also been favored due
to its simple design. As of 2010, Google accounted for about 2/3 of the searches in
the United States (Vlachynsky, 2010). By 2012, 83% of users reported Google as their
most common search engine, with Yahoo! in second at 6% (Purcell et al., 2012).
The most recent popular search engine to date is Bing, which was released in
2009 by Microsoft (Seymour et al., 2011). By July of that year Yahoo! and Microsoft
signed a deal where Yahoo! Search would use the Bing technology.
How Do They Work?
Search engines for the Web and for the pre-Web Internet worked differently (Vlachynsky,
2010). On the pre-Web Internet, users had to know the exact names of files because
the search engines of the past were only indexers, listing files on an FTP server. This
caused problems as the number of files increased, slowing down searches and causing
confusion between similarly named files.
Initial Web search engines were based on the old FTP retrieval method of
indexing. This worked initially because there were only a few websites on the Web.
However, as the number of websites grew exponentially, it became harder and harder to
keep the indexes up to date (Asadi & Jamali, 2004).
Common search engines work by using web crawlers to follow all links and
collect information for their indexes. Some store part of the websites in the system for
quick and easy retrieval, whereas others store every aspect of every page, which is useful
if pages should change or update (Kuyoro et al., 2012). Modern search engines have evolved
from specific keyword searches to combinations of Boolean operators, then to proximity
searches, and finally to concept-based searching.
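The indexing and Boolean keyword searching described in this section can be illustrated with a small sketch. It is not the implementation of any search engine named above. The page addresses, page text, and function names are hypothetical, and a short dictionary of pages stands in for what a crawler would collect. The sketch builds an inverted index (each word mapped to the pages that contain it) and answers Boolean AND and OR queries against it.

```python
import re
from collections import defaultdict

# Hypothetical mini-corpus standing in for crawled pages (address -> page text).
PAGES = {
    "example.org/a": "Archie searched FTP file names",
    "example.org/b": "Web crawlers follow links and index pages",
    "example.org/c": "Crawlers index FTP and Web pages",
}


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: word -> set of addresses containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search_and(index, *terms):
    """Boolean AND: pages that contain every term."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index, *terms):
    """Boolean OR: pages that contain at least one term."""
    return set().union(*(index.get(term.lower(), set()) for term in terms))


INDEX = build_index(PAGES)
print(sorted(search_and(INDEX, "crawlers", "index")))
print(sorted(search_or(INDEX, "archie", "links")))
```

Because the index is built once and each query only looks up words in it, searching does not require rereading every page, which is one reason indexes are stored for quick retrieval.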
Evaluation
Because of both the way sites are indexed and the method used for retrieval,
most online search engines are only somewhat effective. The crawlers used for
most of the indexing can interpret only text, so pictures and graphics are lost unless
they are accompanied by a caption (Taylor & Joudry, 2009). This automated method of
indexing also fails to discern a site’s purpose, history, policies, and bias.
As previously mentioned, retrieval for online search engines is usually completed
by using keyword searching. Sites that have more instances of the searched term or
more users who have chosen the site when using similar terms are ranked higher,
whether or not the content is fitting. This type of retrieval can work well for topical
searches, but falls short when a specific document or site is desired.
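The weakness of ranking by the number of keyword instances can be shown with a minimal sketch. The pages and the function name below are hypothetical, and real search engines combine many more signals than this. The example simply counts how often the searched term appears on each page and sorts the pages by that count.

```python
import re
from collections import Counter


def rank_by_term_count(pages, query):
    """Rank pages by how many times the query term appears (highest first)."""
    term = query.lower()
    scores = {}
    for url, text in pages.items():
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        if counts[term]:
            scores[url] = counts[term]
    # Sort by descending count; ties are broken alphabetically by address.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical pages: one repeats the keyword without offering useful content.
SAMPLE = {
    "example.org/spam": "jaguar jaguar jaguar jaguar buy now",
    "example.org/zoo": "The jaguar is a large cat native to the Americas",
    "example.org/cars": "Jaguar cars and jaguar dealerships",
}

print(rank_by_term_count(SAMPLE, "jaguar"))
```

Under this scheme the page that merely repeats the keyword is listed first, ahead of the pages that actually describe the animal or the car.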
Despite this logical lack of effectiveness, perceived effectiveness by the users of
online search engines is quite a different story. In a study conducted by the Pew
Research Center, 91% of respondents said that when using an online search engine
they always or almost always find the information for which they are searching (Purcell et al.,
2012). This is no small thing, given that 91% of online adults use a search engine to find
information. There is no doubt this method of search is widely used and growing more
accessible by the day. Websites are also capable of changing minute-by-minute as
information changes happen or news unfolds. This makes it possible for users of online
search engines to cull from the most up-to-date resources available.
It has already been established that keyword searching is not the most effective
method of search, but online search engines also suffer from a lack of ability to tell the
difference between homographs or connect synonymous search terms (Taylor &
Joudry, 2009). Another, much more ominous weakness is also emerging: targeted
searching. Companies are beginning to appreciate the number of people using online
search engines on a regular basis and the boon it would be to get their products or
services in front of all these potential customers. These companies are paying to get
their results listed first, no matter how relevant they are to the user’s search. This profit-maximizing trend by the search engines is causing the quality of results to suffer (Ahuja,
2010).
The evolution of search engines now allows them to remember users
and modify their experiences, which includes directing users to certain sites and
organizing search results based on past usage. As a result, search engine development and
assessment is ongoing, requiring companies to constantly evaluate search engine
users. Understanding what users need and want, and how they use the internet, is critical in
keeping search engines fresh and competitive.
Exploring Search Engine Users
Student Users
Ismail (2011) studied ways to identify the information needs of novice
researchers in order to create a supportive research environment. His subjects for the
study were first year postgraduate students. He considered these students to be early
stage researchers because their experiences are confined to conducting small scale
research projects for class assignments and final year projects during their
undergraduate studies, which is considered limited.
Students overwhelmingly prefer to use search engines, even when it means
sometimes being overwhelmed by the results or not knowing how to discern the
credibility of the sites that were found (Ismail, 2011). Participants in Ismail’s (2011)
study typically began their search process by searching for keywords or subject matter.
Their second choice was the title or author of a specific document. These same
participants did not spend much time looking for resources, and quickly became
discouraged if they either found too many resources or could not find enough. The
results of this study confirm that students need more instruction and guidance on search
engine usage and web searches (Ismail, 2011).
Georgas (2013) explored the use of both federated searches and Google in a
comparison study of student information seekers at Brooklyn College. Federated
searching is the use of a program that enables users to search several databases at a
time using a single search term. This method was once thought to be the library’s
answer to Google because it allowed a one-stop shopping method for users (Georgas,
2013). However, federated search technology is a costly tool, and as Google continues
to provide a constantly evolving free or inexpensive search method, libraries must be
able to justify the cost of providing this service. The question is, which method do
students really prefer? Georgas (2013) examined literature that addressed if students
prefer federated searching or Google; if students are able to identify relevant research
resources using both a federated search tool and Google; and if students possess
adequate information literacy skills to use each of these search tools effectively. One
study asked librarians to respond about their students’ preferences. The librarians
surveyed stated that federated searching did have its drawbacks such as not providing
seamless searching, being slow, and needing improvements. Even so, much like the
instructors at Brooklyn College, they thought it was the best rival to Google (Georgas,
2013). However, there is often the problem of information overload with search engines
such as Google.
A study by the Research Libraries Group found that federated searching was
viewed as a good tool for students to “get started finding stuff,” and not a good tool for
“advanced research.” Other studies mentioned by Georgas (2013) looked at what
students think about federated searching; student feedback on the implementation of
cross-database searching; and satisfaction in individual databases and online search
engines. Students stated that they actually preferred using a federated search tool over
Google, as they found it more efficient and would recommend it to a friend. Students
want efficiency and ease-of-use, but they realize the limitations of Google.
General Users
o Most users are confident in their search abilities.
o Only 6% say they are not too or not at all confident.
o 38% have gotten so much information in a set of results that they felt overwhelmed.
o According to Pew, all age groups, races, and sexes use search engines.
Daily use is more common among the young and educated.
User Perceptions of Search Engines
Beyond how search engine users utilize search engines, many have opinions
about how search engines perform and employ their private information. The Pew
Research Center reported that most users are happy with their search engine experiences, and
two-thirds feel that search engines are fair and unbiased, with younger users having
more faith in the purity of their results (Purcell et al., 2012). However, only 28% of users
feel that the information provided in searches is accurate or trustworthy.
Over half of adult searchers said that their search results had gotten more useful and
relevant over time, although it is not known if this is a result of search engines collecting
information and tailoring search results, or if searchers’ skills improve as they learn what
search methods are most effective.
Many users are not excited about search engines collecting user information and
tailoring searches based on this information. Sixty-five percent of participants in the Pew
report view personalized search results as a bad idea, citing the exclusion of potentially
useful or important information as the main reason (Purcell et al., 2012). When asked
about their thoughts on targeted advertising as a result of personalized searching, 68%
of internet users have negative feelings.
Future
The future of online search engines and how they will affect users is unknown.
Changes in the world of online search engines are swift, making their future difficult to
predict. However, there are a few possibilities rising to the top. The push to create a
Semantic Web, where information on the internet would be semantically defined and
connected to relevant data, would give online search engines a much more structured
source from which to pull, drastically improving search results (Taylor & Joudry, 2009).
Additionally, as noted by Palatnik (2007), social bookmarking is driving up the
effectiveness of online search. Search engines are beginning to factor these user-created tags into their results. In a similar vein, Google now “recruits hundreds of
individuals to manually assess the quality of content on specific URLs” (Purtell, 2012).
These assessments are also used to alter the algorithms used by online search
engines. Both of these mark a trend back toward human-indexed content, the method
employed by the successfully indexed collections.
References
Ahuja, B. (2010). The future of search engines if paid search is given more importance
than organic search. Search Engine Journal. Retrieved from
http://www.searchenginejournal.com/future-of-search-engines-if-paidsearch/24394/
Asadi, S., & Jamali, H. R. (2004). Shifts in search engine development: A review of
past, present and future trends in research on search engines. Webology, 1(2).
Retrieved from http://www.webology.org/2004/v1n2/a6.html
Georgas, H. (2013). Google vs. the Library: Student Preferences and Perceptions When
Doing Research Using Google and a Federated Search Tool. portal: Libraries
and the Academy 13(2), 165-185.
Internet Systems Consortium. (2012). Internet Domain Survey, July 2012: Number of
Hosts advertised in the DNS. Retrieved from
http://ftp.isc.org/www/survey/reports/current/
Ismail, M., & Kareem, S. (2011). Identifying how novice researchers search, locate,
choose and use web resources at the early stage of research. Malaysian Journal
of Library & Information Science, 16(3), 67-85.
Kuyoro, S. O., Okolie, S. O., Kanu, R. U., & Awodele, O. (2012). Trends in Web-Based
Search Engine. Journal of Emerging Trends in Computing and Information
Sciences, 3(6), 942-948.
Palatnik, P. (2009). Are social powered search engines the future of search? Search
Engine Journal. Retrieved from http://www.searchenginejournal.com/are-socialpowered-search-engines-the-future-of-search/4912/
Purcell, K., Brenner, J. & Rainie, L. (2012). Search engine use 2012. Pew Internet and
American Life Project. Retrieved from
http://www.pewinternet.org/Reports/2012/Search-Engine-Use-2012.aspx
Purtell, M. (2012). Reverse engineering human rating to predict the future of search.
Search Engine Journal. Retrieved from
http://www.searchenginejournal.com/predict-the-future-of-search/52930/
Seymour, T., Frantsvog, D., & Kumar, S. (2011). History of search engines.
International Journal of Management & Information Systems, 15(4), 47-58.
Taylor, A. & Joudry, D. (2009). The organization of information. Westport, CT: Libraries
Unlimited.
Vlachynsky, M. (2010). Principles and History of the Web Search Engines. Retrieved
from http://www.econoir.sk/web/stuff/Principles-History-Web-Search-Engines.pdf