International Research Journal of Computer Science and Information Systems (IRJCSIS) Vol. 2(2) pp. 34-39, March, 2013
Available online http://www.interesjournals.org/IRJCSIS
Copyright © 2013 International Research Journals
Full Length Research Paper
Arabic users’ attitudes toward Web searching using
paraphrasing mechanisms
Abeer ALDayel1 and Mourad Ykhlef2
1 Department of Information Technology, CCIS, King Saud University, Riyadh, Kingdom of Saudi Arabia
2 Department of Information System, CCIS, King Saud University, Riyadh, Kingdom of Saudi Arabia
Accepted March 29, 2013
The main purpose of this article is to study the Arabic web users’ behaviors toward search engines and
the need for query paraphrasing as an enhanced tool for Arabic information retrieval. We need to know
more about how Arabic users search the Web and what information they need from search engines in
order to improve the effectiveness of Arabic information retrieval. Hundreds of online Arabic users
responded to an online survey. The users provided information on the type of search engines they used
(Arabic and/or Multilanguage), search frequency, and the factors affecting their usage of search
engines. The results show that when searching the Web to find information in Arabic, users tend to use
query paraphrasing while searching and that they mostly use Multilanguage search engines to find
information in Arabic instead of using Arabic-specific search engines. As far as we know, this study is
the first to investigate Arabic Web users’ searching needs and behaviours from a user perspective. We
believe that the findings from this study will provide some insight for further research regarding Arabic
Web searching.
Keywords: Arabic Web searching, information retrieval, query paraphrasing.
INTRODUCTION
The Arabic language is the religious language of more
than 1.5 billion Muslims worldwide. It is ranked 6th
among the top 10 languages on the Internet, including
English, Spanish, Japanese, German, French, Russian,
Korean, and Chinese (see Figure 1). Figure 2 shows that
about 17.5% of all Internet users speak Arabic (Internet
World Stats, 2011). However, while the number of Arabic
Web users is growing rapidly, Arabic language search
engines, which would target this growing audience, are
still very underdeveloped.
The information retrieval process is affected by the
language used and how a search engine handles the
characteristics of this language (Moukdad, 2004). The
Arabic language has around 5 million words that are
derived from approximately 11,300 roots, compared to
400,000 key words and a total of 1.3 million words in
English. Therefore, the Arabic language has almost three
times more words than English because it has hundreds
of derivatives that can be taken from a single root
(Al-Maimani et al., 2011). Another distinct property of the
Arabic language is polysemy, which means the words
have multiple meanings. This fact raises many
challenges in the retrieval process, as users may not
have an effective search experience or may quickly
become discouraged and not invest additional effort into
thinking of other key words that would help their search.
*Corresponding author. Email: aabeer@ksu.edu.sa
Another problem of using the Arabic language online is
that different regions within the Arab World use different
words to explain the same thing. This has caused
difficulty in finding and retrieving relevant documents on
the Web. For example, for the term “newspaper,” the
word “صحيفة” is used in some Arabic countries while the
word “جريدة” is used in others (Al-Maimani et al., 2011).
Information retrieval researchers have realized that it
is hard for Web users to formulate effective search
requests. Also, most of the available search engines
often return search results that have no relation to the
user’s request, meaning that many of the results that
contain the words used in the query are not relevant. A
hypothesis emerged stating that “the search is likely to be
more accurate and precise if it is based on meanings
rather than on words" (Apresjan et al., 2009).
Paraphrasing is one way to improve the retrieval process
and to move the querying to a new level, “the meaning
level”. Query paraphrasing for information retrieval is also
known as query reformulation and can be described as a
restatement of a text. This can be accomplished by
replacing words with their synonyms, using hyponyms, or
changing the order of the words.
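The three paraphrasing operations just named (synonym replacement, hyponym substitution, and word reordering) can be illustrated with a minimal sketch. The toy dictionaries `SYNONYMS` and `HYPONYMS` and the function name `paraphrase_query` are hypothetical placeholders introduced here for illustration; a working system would draw on a real lexical resource for Arabic, and this is not the method of any particular search engine.

```python
from itertools import permutations

# Hypothetical toy lexicons; a real system would use a proper lexical resource.
SYNONYMS = {"newspaper": ["gazette"], "population": ["inhabitants"]}
HYPONYMS = {"newspaper": ["tabloid"]}


def paraphrase_query(query, max_reorderings=3):
    """Return paraphrases of a query built by word substitution and reordering."""
    words = query.split()
    paraphrases = []

    # Replace each word, one at a time, with a synonym or a hyponym.
    for i, word in enumerate(words):
        for alt in SYNONYMS.get(word, []) + HYPONYMS.get(word, []):
            paraphrases.append(" ".join(words[:i] + [alt] + words[i + 1:]))

    # Change the order of the words (capped to keep the list short).
    reorderings = 0
    for perm in permutations(words):
        if reorderings >= max_reorderings:
            break
        candidate = " ".join(perm)
        if candidate != query and candidate not in paraphrases:
            paraphrases.append(candidate)
            reorderings += 1

    return paraphrases


print(paraphrase_query("newspaper archive"))
# ['gazette archive', 'tabloid archive', 'archive newspaper']
```

Each candidate would then be submitted as a separate query, or merged with the original, depending on how the retrieval system combines results.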
In this article, we investigate the need for Arabic
search engines and users' overall perceptions of query
paraphrasing as a query refinement approach. A total of
463 survey responses were collected from an online
questionnaire, and the answers of the respondents
helped us to gain an understanding of how Arabic users’
experiences affect the individual use of search engines.
The remainder of this article is structured as follows:
First, we present some background information about the
Arabic language and Web search engines. Next, we
present the related research. Then, we outline the
methods and data used in this research. Next, we
discuss the results and the analysis strategy. Finally, we
summarize our study and suggest some future directions
for research.
Arabic language specifications
Figure 1. Top 10 languages on the Internet (Internet World Stats, 2011)
Figure 2. Internet use of Arabic speakers (Internet World Stats, 2011)
Certainly, the current search engines such as Google
or Yahoo offer an efficient way to browse Arabic web
content. However, the retrieval quality is highly reliant
upon the users developing potential query paraphrases
and searching the retrieved documents to determine
which are relevant.
Arabic is the fourth most widely spoken language
worldwide after Chinese, Spanish, and English. There are
more than 221 million speakers of Arabic from 57
different countries, which is about 3.2% of the world’s
population (Lewis et al., 2009). With the spread of Islam
at the beginning of the 7th century, Arabic also spread rapidly
throughout the Middle East and North Africa, as it is the
language of Islam and occupies a sacrosanct place in the
religious psyche of Muslims. The Arabic script evolved
from the Nabataean–Aramaic model, and its vocabulary
is continuing to evolve through the integration of words
from various dialects and the creation of new words
necessary to express new concepts or objects (European
Commission, 2011).
Arabic is considered to be the largest member of the
Semitic language branch. The Semitic languages are
distinguished by their nonconcatenative morphology, in which
word roots are not themselves syllables or words, but
instead are isolated sets of consonants. Words are
composed from roots by filling in the vowels between the
root consonants. For example, in Arabic, the root
meaning "eat" has the form A – K – L. From this root,
words are formed by filling in the vowels, e.g. AKLAH is
"food", yaAKL is "he eats," etc. (European Commission,
2011).
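The root-and-pattern idea can be illustrated with a small sketch that reuses the Latin transliteration of the example above (root A-K-L). The template notation, in which the digits 1, 2, and 3 stand for the first, second, and third root consonant, and the function name `apply_pattern` are assumptions made for illustration only; the two templates simply reproduce the transliterated forms quoted in the text and are not a linguistic inventory.

```python
def apply_pattern(root, template):
    """Fill a word template with root consonants.

    Digits 1-3 in the template are replaced by the corresponding root
    consonant; every other character is copied unchanged.
    """
    out = []
    for ch in template:
        if ch.isdigit():
            out.append(root[int(ch) - 1])
        else:
            out.append(ch)
    return "".join(out)


root = ("A", "K", "L")  # transliterated root meaning "eat"
print(apply_pattern(root, "123AH"))  # AKLAH
print(apply_pattern(root, "ya123"))  # yaAKL
```

Because many surface forms share one root, a search engine that indexes only surface strings will treat AKLAH and yaAKL as unrelated terms unless it normalizes them to a common root or stem.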
There are three forms of Arabic: classical, modern
standard and colloquial (spoken). Classical and
modern standard Arabic are both called al-fusha
(الفصحى), which means “the most eloquent.” In pre-Islamic times in Arabia, classical Arabic was the most
prevalent form of the language. With the rise of Islam,
classical Arabic became the prominent language of
scholarship and religious devotion as it was the language
of the Qur'an. Modern standard Arabic is derived from
classical Arabic and is often used in conversations
between Arabic speakers of different dialects but not in
ordinary conversation. Most printed matter in Arabic
books and newspapers is written in modern standard
Arabic. Colloquial (spoken) Arabic refers to different
dialects of Arabic that vary from country to country or
between regions within a country (Lewis et al., 2009).
The Arabic alphabet consists of 28 letters and is written from right
to left. The written form is based on consonants (letters)
and vowel signs (diacritics); marks placed over and under
letters indicate proper pronunciation
(Moukdad and Large, 2007). One of the most challenging
characteristics of Arabic is that some conjunctions are
not separated from the following word by a space, which
results in a large number of entries being clustered
together alphabetically in index files. Another distinct
characteristic is that Arabic plurals are formed irregularly
through a complete reformulation of the word (Moukdad
and Large, 2007).
Web search engines
Information retrieval technology provides knowledge
management that guarantees access to large corpora of
unstructured text and is the basic technology behind Web
search engines (Mandl, 2008). The emergence of Web
search engines was driven by the rapidly growing
number of Web sites in the mid-1990s. Alan Emtage
created Archie, the first tool for Internet searching. Archie
was based on creating a database of filenames for all
files located on public FTP sites. Mark McCahill, at the
University of Minnesota, then created Gopher, which was
based on indexing documents’ plain text instead of the
file name. In 1993, Matthew Gray developed the first real
search engine called Wandex. This program was the first
to crawl the Web, index files, and allow users to search
for them. Another one of the first search engines was
Aliweb, which was developed by the United Kingdom’s
Martijn Koster in 1993 and is still in use today (Inkpen,
2007).
By analyzing various points of Website entry, it can be
seen that search engines receive the most Internet traffic.
Figure 3 compares the types of visitor sources for
Websites: search engines, direct entry, and referring site links.
Eleven percent of Internet users type a specific Website
address directly into their browser. Six percent of Internet
users access Websites via a link from a different site.
Eighty-three percent of Internet users use search engines
to search for a Website; this percentage demonstrates
the necessity of search engines as the main portal to the
Internet (OWL STAT, 2013a).
Figure 3. Analysis of various points of Website entry
Figure 4. Search engine usage statistics
In 2000, Google's search engine was dominating the
Internet search market. With an innovation called “page
ranking” and the concept of “link popularity,” Google was
able to provide better results compared with other search
engines. Based on Inktomi's search engine model, Yahoo
established a search engine in 2004 using the combined
technologies of its acquisitions. Microsoft had also
launched the MSN search engine in the fall of 1998 using
search results from Inktomi. In 2004, Microsoft started
using its own crawler called MSNBOT. In 2009, Bing was
launched by Microsoft (Kuyoro Shade, Okolie Samuel,
and Kanu Richmond, 2012). Figure 4 shows the market
share for the top search engines in use today. This
analysis shows that Google is the most utilized search
engine with 81.11% of the market (OWL STAT, 2013b).
Related work
Analyzing the quality of the current search
engines is important and has been discussed in many
studies. These studies can be divided into three
categories: analyzing the search engines’ log files,
examining Web users’ behaviours, and evaluating the
performance of the existing search engines. The most
popular approach is to analyze the log file of each search
engine. This approach falls under Web usage mining and
focuses on how users use the search engines on the
Web to satisfy their information needs. Several papers
have been published in this area, including the work of
(Pu, Chuang, and Yang, 2002) and (Chau, Fang, and
Yang, 2007). The study conducted by (Chau et al., 2007)
analyzed the search-query logs of a search engine that
focused on Chinese Web users.
Many researchers have investigated the searching
behaviour of Web users. A number of studies have
looked at English search engines, such as the work of
(Spink, Bateman, and Jansen, 1999), who investigated
Web users’ searching behaviours on the EXCITE Web
search engine. They focused on successive searching
behaviour, in which a user conducts related searches over
time on the same or an evolving topic, and they used an
interactive survey accessed through EXCITE’s
homepage. In addition, (Liaw and Huang, 2003) developed
an individual attitude model toward search engines
and examined its effectiveness via a questionnaire
distributed to 120 search engine users. They found that
computer experience and quality of the search systems
can predict individual motivation and technology
acceptance toward search engines. In 2006, these same
researchers used a questionnaire with 161 responses to
study individual information retrieval behaviours and their
evaluations on the usability and effectiveness of search
engines.
Another line of investigation has been the non-English
searching behaviour of Web users. (Zhang and
Lin, 2007) and (Bar-Ilan and Gutman, 2003) investigated
the multiple language support features of specific Internet
search engines. (Bar-Ilan and Gutman, 2003) examined
four languages: Russian, French, Hungarian, and
Hebrew. For each language, a set of search terms was
run on three general search engines (AltaVista, FAST,
and Google) as well as some local search engines.
In addition, a few studies have investigated the
searching behaviour of Arabic Web users. (Moukdad,
2004) compared the performance of three general Arabic
search engines based on their ability to retrieve
morphologically related Arabic terms. (Tawileh et al.,
2010) evaluated the performance of Arabic information
retrieval by using 50 randomly selected Arabic queries to test 5
Web search engines: Araby, Ayna, Google, MSN, and
Yahoo.
Most of the studies to date have focused on English
search engines. However, the information needs and
search behaviours of Arabic users can be very different
from those of English users because of the different
natures of the languages and cultural differences. To
address this gap in the research, we present a study of
overall Arabic Web user behaviour toward search
engines.
RESEARCH DESIGN
The online questionnaire was written in Arabic and
directed at Arabic online users. The questionnaire was
composed of three components: search engine usability,
search engine effectiveness, and the need for query
paraphrasing in Arabic information retrieval. It also included
demographic information such as gender, age, and
educational status. To measure how satisfied Arabic
users are with current search engines, the users were
also asked about the type of search engine they usually
used (i.e. specific Arabic or Multilanguage) and the
frequency that they used search engines to search for
Arabic content. The questionnaire was designed to
collect data related to possible factors affecting the usage
of search engines. In general, a powerful information
retrieval system should perform well on two
measures: precision and recall.
Precision can be defined as the fraction of retrieved
documents that are relevant to a user’s query, and recall
is the fraction of the relevant documents that are
successfully retrieved. Prior researchers have offered
evidence that these measurements are important factors
for determining the efficiency of a search engine’s
retrieval capabilities.
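As a worked illustration of the two definitions above, the following minimal sketch computes precision and recall for a single query. The document identifiers and the function name `precision_recall` are hypothetical, and the relevance judgments are assumed to be known in advance.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved AND relevant| / |retrieved|
    recall    = |retrieved AND relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents returned, 3 documents actually relevant.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d1", "d3", "d5"]
p, r = precision_recall(retrieved_docs, relevant_docs)
print(p, r)  # precision is 2/4, recall is 2/3
```

Returning more documents tends to raise recall while lowering precision, which is why the two measures are usually reported together.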
The questionnaire also included two multiple-choice
questions to examine the third component (the need for
query paraphrasing in Arabic information retrieval). The
first question focused on discovering the user’s reaction if
(s)he did not obtain the search result (s)he was looking
for. The second question was to discover the user’s
paraphrasing capability if (s)he wanted to conduct a
search using the term “عدد سكان منطقة الرياض”, meaning the
“number of people in Riyadh area”.
Data collection
To calculate the required sample size for the survey, we
used the approach proposed by (Cochran, 1977). The
auxiliary quantity X, the sample size n, and the margin of
error E are given by Equations 1, 2, and 3, respectively:
$$X = Z\!\left(\frac{c}{100}\right)^{2} r\,(100 - r) \tag{1}$$

$$n = \frac{N \times X}{(N - 1)\,E^{2} + X} \tag{2}$$

$$E = \sqrt{\frac{(N - n)\,X}{n\,(N - 1)}} \tag{3}$$
where N is the population size, r is the fraction of
responses (expressed as a percentage), and Z(c/100) is the
critical value of the standard normal distribution for the
confidence level c. We
set the fraction of responses to the most conservative
assumption, which is 50%. The goal of the
questionnaire was to measure the Arabic users’
perceptions toward search engines; in order to fulfill this
goal, we used Equations 1 and 2 with a population size
equal to the number of Arabic online users, which is
347,002,991 (Internet World Stats, 2011). The confidence
level was 95%, for which Z equals 1.96.
The margin of error was 4.56%; the
recommended sample size was 463. According to this
sample, there is a 95% chance that we are within the
margin of error of the correct answer.
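The calculation can be reproduced with a short sketch. It assumes the three equations have the standard form X = Z² · r · (100 − r), n = N · X / ((N − 1) · E² + X), and E = sqrt((N − n) · X / (n · (N − 1))), with r and E expressed in percent; the square root in the last formula is an assumption that follows from solving the second equation for E. The function names are hypothetical.

```python
import math


def x_term(z, r):
    """Equation 1: X = Z^2 * r * (100 - r), with r as a percentage."""
    return z ** 2 * r * (100 - r)


def sample_size(population, margin, z=1.96, r=50.0):
    """Equation 2: n = N*X / ((N - 1)*E^2 + X)."""
    x = x_term(z, r)
    return population * x / ((population - 1) * margin ** 2 + x)


def margin_of_error(population, n, z=1.96, r=50.0):
    """Equation 3: E = sqrt((N - n)*X / (n*(N - 1)))."""
    x = x_term(z, r)
    return math.sqrt((population - n) * x / (n * (population - 1)))


N = 347_002_991  # population size reported in the text
print(margin_of_error(N, 463))  # margin of error (in percent) for n = 463
print(sample_size(N, 4.56))     # sample size for a 4.56% margin of error
```

Under these assumptions the sketch yields values within rounding distance of the reported margin of error and sample size. For a population this large, the finite-population terms have almost no effect, so n is close to X / E².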
RESULTS
All data were analyzed using the Statistical Package for
Social Sciences (SPSS, 16.0). Overall, 463 Arabic online
users participated in the study; 31% of the participants
were between the ages of 26 and 36, as shown in Figure
5. The users were 23% female and 77% male.
Table 1 presents a detailed summary of the
questionnaire responses for the first component (search
engine usability), represented by frequency and
percentage. The results
show that 73% of the Arabic Web users used search
engines on a daily basis. This indicates the importance of
search engines as an information retrieval tool. Also,
most Arabic Web users were not very satisfied with the
Arabic search engines: 88% of users tended to use
multilanguage search engines to search for Arabic
content, while only 12% of the users used Arabic-specific
search engines. In addition, 55% of users mainly used
the search engines to search for Arabic content, while
40% were looking for combined Arabic and English Web
content. As an evaluation of the current search engines’
support for the Arabic language, 47% of users found that it
was good, while 59% stated that they only sometimes
found what they were looking for when using search engines to
search for Arabic content.
Figure 6 represents the Arabic user’s behavior when
(s)he received no relevant search results. Query
paraphrasing was the most common strategy (it was used by
84% of Arabic Web users). The questionnaire asked the
user to suggest possible query paraphrases for the
original query “عدد سكان منطقة الرياض”. Only 12% of the
users were able to come up with possible
paraphrases for the original query (Figure 7).
From the Arabic Web user’s point of view, the most
important factors affecting search engine usage are shown in
Figure 8. The factors that the Arabic Web users gave are
listed in the following order of priority:
(1) Large number of retrieved documents
(2) Precision of results
(3) Fame and reputation of search engine name
(4) Availability of search support tools (e.g., correction,
auto prediction, etc.)
Figure 5. Age structure of questionnaire respondents
Table 1. Summary of questionnaire results

Question and response options                    Response count   Percentage

Search engine usage rate
  Daily                                          339              73%
  Every 2 or 3 days                              77               17%
  Sometimes                                      47               10%
  Never                                          0                0%

Type of search engine used to search for Arabic content
  Arabic search engine                           56               12%
  Multilanguage search engine                    407              88%

Searching for Arabic content frequency rate
  Always                                         352              76%
  Sometimes                                      104              22%
  Never                                          7                2%

Type of content Arabic users mostly use search engine to search for
  Arabic language                                255              55%
  English language                               11               2%
  Arabic and English languages                   185              40%
  Multiple languages                             12               3%

The results of search engine match for what the user was looking for
  Always                                         180              39%
  Sometimes                                      271              59%
  Never                                          7                2%
  Don’t know                                     5                1%

Evaluation of the services that the search engines offered to support the Arabic content search
  Excellent                                      182              39%
  Good                                           216              47%
  Acceptable                                     58               13%
  Weak                                           7                2%
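The percentages in Table 1 follow directly from the response counts and the 463 respondents. The short sketch below recomputes them for the first question; the helper name `to_percentages` is hypothetical, and rounding to the nearest whole percent is an assumption about how the table was prepared.

```python
def to_percentages(counts):
    """Convert response counts to whole-number percentages of the total."""
    total = sum(counts.values())
    return {label: round(100 * count / total) for label, count in counts.items()}


# Response counts for "Search engine usage rate" in Table 1.
usage_rate = {"Daily": 339, "Every 2 or 3 days": 77, "Sometimes": 47, "Never": 0}
print(to_percentages(usage_rate))
# {'Daily': 73, 'Every 2 or 3 days': 17, 'Sometimes': 10, 'Never': 0}
```

Because each value is rounded independently, the percentages for a question need not sum to exactly 100, which is visible in some rows of the table.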
Figure 6. User behavior when (s)he does not retrieve suitable results
Figure 7. Query paraphrases for the original query (عدد سكان منطقة الرياض)
Figure 8. Factors affecting the usage of a search engine
CONCLUSION
This article has shed some light on Arabic Web searching
from the user’s perspective. A total of 463 Arabic user
responses were collected by an online questionnaire.
Based on the questionnaire results, users tend to use
query paraphrasing during the search session. The task
of paraphrasing queries is still complicated for
the majority of non-expert users, who have difficulty
expressing their needs through an accurate query. This
gives insight into the need for automatic query
paraphrasing in search engines. It is clear that users
see a large number of retrieved documents as an
important factor for search engine usage. Also, this article
indicates that Arabic users rarely use Arabic-specific
search engines to retrieve Arabic content; rather, they
use multilanguage search engines. This result illustrates
the need to improve Arabic search engines.
In the future, we plan to build a query paraphrasing
technique that will improve information retrieval and
search capabilities of Arabic Web documents.
ACKNOWLEDGMENTS
This work was supported by the Research Center of the
College of Computer and Information Sciences at King
Saud University. The authors are grateful for this support.
REFERENCES
Al-Maimani MR, Naamany AA, Bakar AZA (2011). Arabic Information
Retrieval: Techniques, tools and challenges. In GCC Conference and
Exhibition (GCC), 2011 IEEE (pp. 541-544).
Apresjan J, Boguslavsky I, Iomdin L, Cinman L, Timoshenko S (2009).
Semantic Paraphrasing for Information Retrieval and Extraction.
Flexible Query Answering Systems, 5822/2009, 512-523.
Bar-Ilan J, Gutman T (2003). How do search engines handle non-English queries? A case study. In (pp. 415-424).
Chau M, Fang X, Yang CC (2007). Web searching in Chinese:
A study of a search engine in Hong Kong. JASIST, 58, 1044-1054.
Cochran WG (1977). Sampling Techniques. John Wiley and Sons.
European Commission (2011). Lingua Franca: Chimera or Reality.
Studies on Translation and Multilingualism.
Inkpen D (8-12-2007). Information Retrieval on the Internet.
http://www.si-te.uottawa.ca/-diana/csi4107L 1.
Internet World Stats (2011). Internet World Stats. Miniwatts Marketing
Group. http://internetworldstats.com/
Kuyoro Shade O, Okolie Samuel O, Kanu Richmond U (2012). Trends
in Web-Based Search Engine. CIS Journal, 3, 942-948.
Lewis MP, Grimes BF, Simons GF, Huttar G (2009). Ethnologue:
Languages of the world. (vols. 9) SIL international Dallas, TX.
Liaw SS, Huang HM (2003). An investigation of user attitudes toward
search engines as an information retrieval tool. CHB, 19, 751-765.
Mandl T (2008). Recent developments in the evaluation of information
retrieval systems: Moving towards diversity and practical relevance.
INFORMATICA-LJUBLJANA-, 32, 27.
Moukdad H (2004). Lost in cyberspace: How do search engines handle
Arabic queries. In the 32nd Annual Conference of the Canadian
Association for Information Science (pp. 1-7).
Moukdad H, Large A (2007). Information Retrieval from Full-Text Arabic
Databases: Can Search Engines Designed for English Do the Job?
Libri, 51, 63-74.
OWL STAT (2013a). Analysis of various points of web site entry. Retrieved 1-2-2013a, from
http://www.statowl.com/network_visitor_source.php?1=1andtimeframe=last_6andinterval=monthandchart_id=4andfltr_br=andfltr_os=andfltr_se=andfltr_cn=andtimeframe=last_12
OWL STAT (2013b). Search Engine Usage Statistics. Retrieved 2-2-2013b, from
http://www.statowl.com/search_engine_market_share.php?1=1andtimeframe=last_6andinterval=monthandchart_id=4andfltr_br=andfltr_os=andfltr_se=andfltr_cn=andtimeframe=last_12
Pu HT, Chuang SL, Yang C (2002). Subject categorization of query
terms for exploring Web users' search interests. JASIST, 53, 617-630.
Spink A, Bateman J, Jansen BJ (1999). Searching the Web: A survey of
Excite users. Internet research, 9, 117-128.
Tawileh W, Mandl T, Griesbaum J (2010). Evaluation of Five Web
Search Engines in Arabic Language. In Proceedings of LWA (pp.
221-228).
Zhang J, Lin S (2007). Multiple language supports in search engines.
Online Information Review, 31, 516-532.