
Online Spanish-English Dictionaries:
A Comparative Usability Study
By Elena Winzeler
Dr. Min Liu
Fall 2013 Designs and Strategies for New Media
Table of Contents

INTRODUCTION
  1.1 Purpose and Research Questions
  1.2 Target Audience
  1.3 Site Selection and Descriptions
    WordReference.com
    SpanishDict.com
    CollinsDictionary.com
METHODOLOGY
  2.1 Participants
  2.2 Tasks
  2.3 Instruments
  2.4 Testing Procedures
RESULTS
  3.1 Pretest
  3.2 Site Analysis
    WordReference
    SpanishDict
    Collins
  3.3 Comparative Analysis
    Completion Rate
    Accuracy
    Number of Searches
    Completion Time
    Site Characteristics
    Perceived Value for Accomplishing Tasks
    Overall Satisfaction
DISCUSSION
  4.1 Website Performance and User Perceptions
  4.2 Interaction, Information, and Interface
  4.3 Recommendations
    Searching
    Results Pages
    Navigation
SUMMARY
URLs FOR SITES TESTED
REFERENCES
APPENDIX
AUTHOR INFORMATION
List of Images and Tables
Image 1: WordReference homepage
Image 2: SpanishDict homepage
Image 3: Collins homepage
Table 1: Summary of traffic information for the three sites tested
Table 2: Website Traffic Patterns
Table 3: Research questions matched to data collection methods
Table 4: Frequency of use and preference for online bilingual tools
Table 5: Perceived value of online bilingual dictionaries
Table 6: Summary of participant performance on Tasks 1 & 2 for WordReference
Table 7: Summary of participant performance on Tasks 3, 4 & 5 for WordReference
Image 4: Language auto-detect and word auto-suggest features on WordReference
Image 5: Multi-word search queries on WordReference
Image 6: Text-rich results page on WordReference
Image 7: Spanish-heavy results page on WordReference
Image 8: Getting “stuck” in the monolingual Spanish dictionary of WordReference
Image 9: Misleading visual cues on the results page of WordReference
Image 10: Low visibility for the conjugation tool on WordReference
Image 11: Finding compound forms on WordReference
Table 8: Summary of participant performance on Tasks 1 & 2 for SpanishDict
Table 9: Summary of participant performance on Tasks 3, 4 & 5 for SpanishDict
Image 12: Side-by-side comparison of dictionary and translator on SpanishDict
Image 13: Results pages for a verb in inflected and infinitive forms
Image 14: Two ways to access the phrasebook and forum on SpanishDict
Table 10: Summary of participant performance on Tasks 1 & 2 for Collins
Table 11: Summary of participant performance on Tasks 3, 4 & 5 for Collins
Image 15: No language auto-detect on Collins dictionary
Image 16: No language auto-detect on Collins translator
Image 17: “Related terms” and “Browse nearby words” features on Collins
Image 18: Low visibility for the conjugation tool on Collins
Image 19: Distracting advertisement on Collins
Table 12: Completion rate
Table 13: Accuracy rate
Table 14: Average number of searches per item
Table 15: Completion time
Table 16: Ratings on site characteristics related to the “three I’s”
Table 17: Perceived value of site for specific tasks
Table 18: Percent change in perceived value from pretest to posttest
Table 19: Overall satisfaction
Appendix Guide
A Task List (Version 1)
B Task List (Version 2)
C Task List (Version 3-Final)
D Sample Page of Observation Record Form
E Background Questionnaire
F Pretest Questionnaire
G Posttest Questionnaire
H Orientation Script
I Time Schedule for Testing Event
Online Spanish-English Dictionaries:
A Comparative Usability Study
INTRODUCTION
With today’s technology, it is increasingly common for language learners to consult Internet
resources before turning to a textbook or other printed resource. However, not
all websites are equal in the quality of information they provide to language learners.
Specifically, online machine translators (MTs) can often provide incomplete or misleading
information, especially in the cases of words with multiple meanings, verb conjugations,
dialectal varieties, and idioms. Online bilingual dictionaries can provide more complete and
accurate information to the language learner, but can be more cumbersome to use.
1.1
Purpose and Research Questions
The purpose of this study is to examine three of the highest quality online dictionaries for
Spanish-English translation to discover the degree to which novice language learners can take
advantage of the information provided by these sites. This study was guided by research
questions on two levels: the site level and the comparative level (across all websites). The
relationship between the answers to these questions and the ensuing implications will be
explored in the Discussion section, along with best practice recommendations.
Site Level:
1. How easy was it to use the site to complete the tasks?
2. What were the most helpful and obstructive features that participants encountered when
using the site?
3. What were participants’ reactions to the site?
Comparative Level:
4. Which website produced superior performance results?
5. Which website was viewed most favorably by participants?
1.2
Target Audience
Those for whom this study might be of interest fall into three general categories. First, the
creators and developers of online bilingual dictionaries might find the data, analysis, and
recommendations presented to be of use in improving their own websites. Second, language
learners could use the information presented to inform their selection of an online bilingual
dictionary to best suit their needs. Similarly, language teachers could glean from this report an
understanding of which website would be best to recommend to their students for a particular
task.
1.3
Site Selection and Descriptions
Site selection was guided by the desire to include websites of the highest information quality, in
order that the findings would be of maximum interest to the groups described above. It was
considered a priority to hold information quantity and quality as near-constants, so that the
effects of information architecture, interaction, and interface could be closely examined.
An original list of three sites was compiled based on the results of the 2013 Reader’s Choice
Awards on About.com. Online readers were invited to nominate their preferred online Spanish-English dictionary. WordReference.com (hereafter “WordReference”), SpanishDict.com
(hereafter “SpanishDict”), and the American Heritage Spanish Dictionary at Yahoo.com
(hereafter “Yahoo”) received the most nominations.
An initial task list was developed with the dual purpose to test both the basic and more
supplemental functionalities of the online dictionaries. In order to facilitate a true comparison,
the decision was made to standardize tasks across websites. At this point, it was discovered
that Yahoo lacked some of the more advanced features of the other two sites, including
dialectal information and verb conjugation tables. Therefore, Yahoo was removed from the list
of sites based on its dearth of information compared to WordReference and SpanishDict.
A replacement, Collinsdictionary.com (hereafter “Collins”), was found by using an online search
engine. Collins was then tested with the initial task list to ensure it provided information
comparable to that found in WordReference and SpanishDict. The information pertaining to the
task list was found to be as complete and accurate as that provided by the other two websites,
and so Collins was retained as the final website for review.
Below follows a description of the three websites tested in this study:
1. WordReference (http://www.wordreference.com/)
2. SpanishDict (http://www.spanishdict.com/)
3. Collins (http://www.collinsdictionary.com/)
WordReference
WordReference was begun by Michael Kellogg in 1999 with the purpose to “provide free online
bilingual dictionaries and tools to the world”.1 WordReference is a dictionary site only, and does
not provide a translator feature. It does, however, contain a very active language forum. Over
time, the site has grown from its original six language pairs of English-French, English-Italian,
English-Spanish, French-Spanish, Spanish-Portuguese, and English-Portuguese, to fifteen
language pairs including languages from Eastern Europe, the Middle East, and Asia. The site
won the 2013 Reader’s Choice Award for best Spanish-English dictionary, receiving 50% of the
votes.2 According to Alexa.com, WordReference is ranked as the 225th most visited site
(calculation based on average daily visitors and average pageviews) in the world. Its US rank is
786.
Image 1: The WordReference home page, set to Spanish-English dictionary search.
1. http://www.wordreference.com/english/AboutUs.aspx
2. http://spanish.about.com/gi/pages/poll.htm?poll_id=1074077045&linkback=http://spanish.about.com/b/2013/02/19/what-is-the-best-online-spanish-english-dictionary.htm&rc=1
SpanishDict
SpanishDict came in a close second to WordReference in the 2013 About.com Reader’s Choice
Awards, with 42% of votes cast. As can be gleaned from its name, SpanishDict devotes itself
entirely to English-Spanish translation. In contrast to WordReference, the site includes both a
bilingual dictionary and a translator integrated into a single search, along with video tutorials,
games, flashcards, and other tools of use to Spanish language learners. The site boasts over six
million visitors per month.3 According to Alexa.com, its US ranking among all websites is
1,567, whereas globally it ranks 3,941. For a side-by-side comparison of all three sites, please
refer to Table 1.
Image 2: SpanishDict homepage.
3. http://www.spanishdict.com/company/about
Collins
The Collins site, including its Spanish-English dictionary, launched on December 31, 2011.4
Collins contains both a Spanish-English dictionary and a translator tool. Unlike the other two
sites reviewed, Collins owes its reputation to its respected paper dictionary, published by
HarperCollins. In fact, WordReference uses the Collins Spanish-English dictionary as one of its
sources. It is worth noting that the Collins site is not nearly as popular as the other two in this
report, with a US ranking of 10,798th among all internet sites, according to Alexa.com. For more
traffic information on Collins, please refer to Table 1.
Image 3: Homepage of Collins, set to Spanish-English dictionary search.
4. http://www.wired.co.uk/news/archive/2012-01/03/collins-dictionary-online
Table 1: Summary of traffic information for the three sites tested in this study.5

Data Category | Data Type | WordReference | SpanishDict | Collins
Rankings | Global Rank | 226 | 3,957 | 10,804
Rankings | US Rank | 784 | 1,583 | 10,811
Traffic patterns | Bounce Rate | 41.80% | 43.20% | 55.90%
Traffic patterns | Daily Page Views per Visitor | 4.03 | 3.92 | 2.75
Traffic patterns | Daily Time on Site | 5 min, 7 sec | 5 min | 3 min, 34 sec
Demographics* | Gender | Female | Female | Female
Demographics* | Education | Some college | Grad school | Grad school or college
Demographics* | Access Location | Work | School | Work
Load Time | Absolute | 1.365 sec | 0.961 sec | 2.339 sec
Load Time | Relative | Fast, 60% of sites are slower | Fast, 76% of sites are slower | Slow, 67% of sites are faster

*Indicates which demographic groups are overrepresented among site visitors, when compared
to the internet population in general.
5
Information gathered from www.alexa.com
10
METHODOLOGY
2.1 Participants
Participants were nine graduate students studying educational technology at a large research
university in the southwest. They ranged from 23-32 years of age and six of the nine were
female. It should be observed that the educational background and gender ratio of the
participant group fit well with the demographic patterns of the three sites, as depicted in Table
1.
Participants were asked about the frequency with which they used the internet and in what
ways. Regarding the former, responses by frequency were “more than 8 hours” (n=2), “4-7
hours” (n=3), and “2-4 hours” (n=4), thus indicating frequent internet use as a common
characteristic. Regarding the latter, participants as a group reported between 3 and 6 different
types of internet use each, with an average of 4.78 (see Background Questionnaire in Appendix
E for a list of possible responses).
Participants also reported on their experience with language learning in general and Spanish in
particular. All participants reported at least one year of foreign language instruction.
Participants also reported moderate familiarity with Spanish, but generally low skill level. The
most common responses regarding exposure were “up to 1 year” (n=3) and “1-3 years” (n=3) of
Spanish instruction, but no participant reported conversational ability in Spanish, and only four
participants reported the ability to form simple sentences in Spanish.
2.2 Tasks
Tasks were created that would fit the following criteria:
1) Authentic to the Spanish language learning context
2) Accomplishable by learners with little to no background in Spanish
3) Accomplishable within the chosen websites without using outside resources or prior
knowledge
Additionally, care was taken to provide a balance between the most basic dictionary capabilities
– finding the meaning of individual words – and the most advanced features provided by all
three sites.
Omission of an Exploratory Task
In usability testing, it is quite common for the initial task to be exploratory in nature, with
participants given free rein to peruse the site, discover its features, and provide commentary on
first impressions (Nielsen et al. 2000, Liu et al. 2008, Stevensen & Liu 2010). However, the
decision was made to omit an initial exploratory task. As illustrated in Table 1 (the relevant
portion is reproduced below in Table 2), visitors on average spend only a short amount of time
on these sites (mean=4 min, 34 sec). Likewise, they spend little effort exploring the site, as
evidenced by a low number of page views per visitor (mean=3.567) and a high bounce rate
(defined as the percentage of visits to the site that consist of a single page view,
mean=46.97%). Thus, in an authentic usage scenario, it is unlikely that a user would spend
much time familiarizing himself with the site before attempting to achieve his goal.
Table 2: Website Traffic Patterns

Data Category | Data Type | WordReference | SpanishDict | Collins
Traffic patterns | Bounce Rate | 41.8% | 43.2% | 55.9%
Traffic patterns | Daily Page Views per Visitor | 4.03 | 3.92 | 2.75
Traffic patterns | Daily Time on Site | 5 min, 7 sec | 5 min | 3 min, 34 sec
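The aggregate figures cited above (mean=3.567 page views, mean=46.97% bounce rate, mean=4 min, 34 sec on site) are simple means over the three sites. As an illustration, a minimal Python sketch that recomputes them, with the values from Table 2 hard-coded:

```python
from statistics import mean

# Traffic figures from Table 2, in site order: WordReference, SpanishDict, Collins.
bounce_rates = [41.8, 43.2, 55.9]           # percent of single-page-view visits
page_views = [4.03, 3.92, 2.75]             # daily page views per visitor
time_on_site = [5*60 + 7, 5*60, 3*60 + 34]  # daily time on site, in seconds

print(round(mean(bounce_rates), 2))   # 46.97 (%)
print(round(mean(page_views), 3))     # 3.567
secs = mean(time_on_site)
print(f"{int(secs // 60)} min, {round(secs % 60)} sec")  # 4 min, 34 sec
```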
Pilot Testing
The initial task list was tested in a pilot study, resulting in two significant changes. First, the
number of items per task was decreased to a more manageable number. Second, the final,
open-ended task was removed because it was found to be beyond the skill of a novice language
learner. For a reproduction of the initial task list, please see Appendix A.
A new open-ended task was developed, and the entire revised task list (see Appendix B) was
tested in a second pilot study. As a result of the second pilot, time limits for each task were
determined. Again, the open-ended task was found to require knowledge of Spanish beyond
that which could be expected of the novice learner, and this task was removed, resulting in a
finalized list of five tasks, reproduced below (for the full version that includes the content of
individual items, please refer to Appendix C).
Final Task List
Task 1: Spanish-English translation
Directions: Find an appropriate English translation of a Spanish word, given the context
of a sentence.
Sample item:
1. Soy alérgico a las plumas en esta almohada.
a. I am allergic to the ______________ in this pillow.
Task 2: English-Spanish translation.
Directions: Find an appropriate Spanish translation of an English word, given the context
of a sentence.
Sample item:
1. The soccer game ended in a tie.
a. El partido de fútbol terminó en un _______________________.
Task 3: Verb conjugation
Directions: You are writing a sentence in Spanish but you need to look up a specific verb
and how to conjugate it.
Sample item:
1. Henry gave me a surprise birthday present.
a. Verb (él/preterit): ____________________
Task 4: Dialects
Directions: You have a pen pal in Mexico. Find out how he would say the following
words in his dialect.
Sample item:
1. Trunk (of a car): ________________
Task 5: Idioms
Directions: You are writing a sentence in Spanish and need to find a culturally appropriate word or phrase to replace the following English idioms.
Sample item:
1. That test was a piece of cake.
a. Ese examen _____________________________________________.
2.3 Instruments
Table 3 provides an overview of the research questions investigated and their corresponding
data sources. Each data collection method will be described in detail below.
Table 3: Research Questions Matched to Data Collection Methods

1. How easy is it to use the site to complete the tasks? (Site level)
   Data sources: completion rate, accuracy rate, searches per item, completion time, observations, post-task usefulness rating
2. What are the most helpful and obstructive features that participants encounter when using the site? (Site level)
   Data sources: observations, participant comments, post-test satisfaction ratings
3. What are participants' reactions to the site? (Site level)
   Data sources: participant comments, post-task usefulness rating, post-test satisfaction ratings
4. Which website produced superior performance results? (Comparative level)
   Data sources: completion rate, accuracy rate, searches per item, completion time
5. Which website is viewed most favorably by participants? (Comparative level)
   Data sources: post-task usefulness rating, post-test satisfaction ratings
Observation Record Form
A form was created to facilitate the process of taking notes during observation of testing. For
each task, this form included spaces to record the following (see Appendix D for a sample
page):
1. Prompts used
2. Completion time
3. Accuracy per item
a. Accuracy was determined based on the participants’ ability to interpret the site
without the use of prior knowledge of Spanish. As such, certain morphological
mistakes regarding the number (singular vs. plural) of a noun or conjugation of a
verb (except in Task 3) were accepted as correct. Full (2 points), partial (1 point),
and no credit were given as follows:
i. Tasks 1 & 2
1. Full credit: correct meaning for the context
2. Partial credit: a correct translation, but incorrect meaning for the
context
3. No credit: did not complete or answer is not a translation of the
target.
ii. Task 3
1. Full credit: correct meaning for the context and correct tense
2. Partial credit: correct tense + incorrect meaning OR incorrect
tense + correct meaning
3. No credit: did not complete or incorrect tense + incorrect
meaning
iii. Task 4
1. Full credit: correct word in the specified dialect
2. Partial credit: a correct translation, but not the dialectal form
specified
3. No credit: did not complete or incorrect translation
iv. Task 5 (no partial credit given)
1. Full credit: an appropriate Spanish word or phrase
2. No credit: did not complete or inappropriate word or phrase
(literal translation or different meaning)
4. Observations and comments made by participants
a. All search queries made were recorded in this section.
5. Participants’ post-task rating of the usefulness of the site
a. Immediately after completing a task, participants orally responded to the
question, “On a scale of 1 (useless) to 5 (very useful), how would you rate the
usefulness of this site in performing this task?”
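As an illustration, the point values above (full = 2, partial = 1, none = 0) can be rolled up into a per-task accuracy rate. The sketch below assumes, as the report's percentages suggest, that accuracy rate is points earned as a share of points possible; the function and variable names are ours, not part of the study's instruments:

```python
# Rubric point values from the observation record form: full = 2, partial = 1, none = 0.
POINTS = {"full": 2, "partial": 1, "none": 0}

def accuracy_rate(judgments):
    """Accuracy rate for one task attempt: points earned / points possible, as a percent.

    `judgments` holds the rater's per-item credit decisions, e.g. ["full", "partial"].
    Note: Task 5 awards no partial credit, so its judgments are only "full" or "none".
    """
    earned = sum(POINTS[j] for j in judgments)
    possible = 2 * len(judgments)
    return 100 * earned / possible

# Four items judged full, full, partial, none earn 5 of 8 possible points.
print(accuracy_rate(["full", "full", "partial", "none"]))  # 62.5
```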
Questionnaires
Three questionnaires were created to test the participants’ background knowledge, skills, and
habits; pretest attitudes; and posttest attitudes and impressions. Much of these questionnaires
was modeled after those presented in Liu et al. (2008).
Background
The background questionnaire aimed to establish an understanding of participants’ background
in three key areas (sample questions follow, but for a full reproduction of the Background
Questionnaire, see Appendix E):
1. Internet use
2. Multilingualism
3. Spanish
For the first, it was reasoned that inexperience with the Internet would negatively affect
testing results. The pretest screener served to exclude participants without the
requisite experience and skill in using the Internet for general purposes.
For the second, it was hypothesized that language learning experience might influence the
results of the usability test. Rather than try to control for this variable, it was decided to
measure participants’ experience with languages other than their native languages in order to
investigate its relationship, if any, with usability testing outcomes.
Finally, the tasks were created for participants with little to no Spanish ability.
Conversational ability in Spanish would have excluded potential participants from the study.
Pretest
In the pretest questionnaire, participants self-reported their language translation habits in
terms of frequency and preferences. In addition, they reported their attitudes on how useful
they perceived an online multilingual dictionary to be for specific tasks related to the task list
used in this study. Sample questions from each group are reproduced below, but the full
pretest questionnaire can be found in Appendix F.
Posttest
The posttest questionnaire served to measure participants’ attitudes and impressions regarding
specific characteristics and functionalities of the site. Participants rated their degree of
satisfaction with attributes related to the information, interaction, and interface of the site. The
pretest question regarding their perceptions of usefulness for specific tasks was reproduced
verbatim in order to see how participants’ impressions of their particular site related to their
expectations of online bilingual dictionary sites in general. Participants were given an openended forum to describe the best and worst features of the site. Finally, they rated the
likelihood that they would return to the site when presented with a task similar to those
completed during the usability testing. Samples of three different question formats are shown
below. The entire posttest is provided in Appendix G.
2.4 Testing Procedures
Participants were contacted about participating in this study via email. They were informed of
the general topic of the study and responded to a short survey about their interest in
participating, availability, and experience with Spanish. At this time, any potential participant
who reported conversational ability or fluency with Spanish was excluded from the study.
Participants who were retained met with the investigator for one hour to complete the usability
test. After introductions, the participant completed the background and pretest questionnaires.
The orientation script was read to the participants, and they gave their verbal permission for
the session to be recorded. The script described the general goal of the study, the role of the
moderator, their role as participants, and the procedures to be followed for the rest of the
session. Participants were asked to think aloud during the test, and behave as they normally
would under non-testing conditions. The orientation script was modified from a model
provided by Rubin & Chisnell (2008) and can be found in Appendix H.
Participants were randomly assigned a website to review, and asked if they had ever used the
site before. If they had, they would have been reassigned to a different website, but this did not
occur during testing.
During testing, the moderator read each question out loud. Participants had access to the full
task list (Appendix C), and used this document to record their answers to each item. The
moderator kept time on a per-item basis and recorded observations. There were two general
instances in which the moderator would interject with a prompt:
1. The participant had abandoned the think-aloud technique.
2. The participant had completed all of the items for a particular task with time to spare,
but had achieved below 50% accuracy.
In the first case, the moderator used discretion so as not to badger participants with repeated
requests to think aloud, or to interrupt them while they were deep in concentration.
In the second case, this intervention was deemed necessary in order to stimulate the level of
motivation that an actual user might feel. As discussed in Nielsen et al. (2000), the usability
testing situation is inherently artificial. No matter the effort put forth in creating authentic
tasks, the tester will never be as motivated to complete them as they would real tasks they
came across in their daily lives. To add to this general weakness of usability studies, this
particular study involved participants who were not active learners of Spanish. Therefore their
motivation was considered a cause for concern. The second prompt scenario described served
to modify participants’ behavior to hopefully bring it into closer alignment with how actual,
motivated learners would use the site. Hence, when they had completed a task with no attempt
at quality (a behavior distinct from that of motivated users), they were prompted to revisit the
task with more thoroughness (a behavior congruent with that of motivated learners). The
actual prompt used was, “If you were going to check your work, how would you do that?”
After completing the tasks, the participant completed the posttest questionnaire. They were
asked to think out loud as they completed the questionnaire to elaborate on their responses.
The moderator took notes on their elaborations and asked clarifying and follow-up questions
when necessary. In this way the posttest and debriefing were combined into a seamless
experience. The time schedule for the entire testing event can be found in Appendix I.
RESULTS
All participant data can be provided upon request. See the Author Information section for
contact details.
3.1 Pretest
Participants reported more frequent use of language translators (m=2.56) than language
dictionaries (m=2.33). Likewise, responses showed a clear preference for online translators
(m=2.44) over multilingual dictionaries (m=2.17).
Table 4: Frequency of use and preference for online translators vs. online multilingual
dictionaries.

 | WordReference | SpanishDict | Collins | Average
Frequency: Translators | 2.33 | 2.67 | 2.67 | 2.56
Frequency: Dictionaries | 2.00 | 2.67 | 2.33 | 2.33
Preference: Translators | 2.33 | 2.67 | 2.34 | 2.45
Preference: Dictionaries | 2.17 | 2.50 | 1.83 | 2.17
In keeping with this trend, they perceived online multilingual dictionaries to have more value in
common tasks such as translating a single word (m=4.67), differentiating between multiple
meanings of a word (m=4.44), and conjugating verbs (m=4.56) than in less common tasks
including finding a term belonging to a particular dialect (m=4.11), translating phrases or idioms
(m=4.33), and clarifying a confusing usage or grammar point (m=3.78).
Table 5: Perceived value of online bilingual dictionaries for specific tasks.

 | WordReference | SpanishDict | Collins | Average
Translating a single word | 4.67 | 4.67 | 4.67 | 4.67
Differentiating between multiple meanings | 4.67 | 4.00 | 4.67 | 4.45
Conjugating verbs | 4.33 | 4.67 | 4.67 | 4.56
Finding a term belonging to a dialect | 3.33 | 4.00 | 5.00 | 4.11
Translating idioms | 4.00 | 4.67 | 4.33 | 4.33
Clarifying usage and grammar | 3.67 | 3.67 | 4.00 | 3.78
3.2 Site Analysis
This section will analyze the three websites one at a time from the perspective of the five tasks
explored during usability testing in order to answer the first three research questions.
1. How easy is it to use the site to complete the tasks?
2. What are the most helpful and obstructive features that participants encounter when
using the site?
3. What are participants’ reactions to the site?
In the first section, participant performance on the basic functionality of online bilingual
dictionaries – translating a single word from Spanish to English (Task 1) and English to Spanish
(Task 2) – is summarized. The second section summarizes performance on the more advanced
functionality of these sites – conjugating verbs (Task 3), finding dialectal variations (Task 4), and
translating idioms (Task 5). The third section details the most helpful and obstructive features
of the site in performing these tasks, concentrating on the three main elements common to all
five – searching, interpreting the results page, and navigating the site.
WordReference
Basic Functionality
Using WordReference to translate a word from Spanish to English (Task 1) and English to
Spanish (Task 2), the basic functionality of a bilingual dictionary, proved to be a relatively easy
task for the participants. Data from these tasks are summarized in Table 6, below. All
participants completed all items in these tasks within the time allotted. The accuracy rates for
Tasks 1 and 2 were 88.33% and 79.17%, respectively, while average usefulness ratings were
2.67 and 3.33. The fact that usefulness ratings were inversely related to accuracy may seem
surprising, but can possibly be explained by other factors. Specifically, while accuracy on Task 1
was higher than on Task 2, Task 1 took longer on average to complete (m=79.10% of total time
vs. m=64.31%) and required more average searches per item (m=1.92 vs. m=1.25). Throughout
our data, usefulness ratings bear a direct relationship to searches per item and completion
time, but little to no relationship to accuracy, a trend suggesting that participants have little
awareness of the accuracy of their responses.
Table 6: Summary of participant performance on Tasks 1 & 2 for WordReference.

Task | Num. of Items | Avg. Num. of Items Completed | Avg. Searches per Item | Completion Time (percent of max.) | Accuracy Rate | Usefulness Rating
1 | 4 | 4.00 | 1.92 | 79.10% | 88.33% | 2.67
2 | 4 | 4.00 | 1.25 | 64.31% | 79.17% | 3.33
Usefulness was measured on a 5-point Likert scale from useless (1) to very useful (5).
Advanced Functionality
Tasks 3, 4, and 5, which involved conjugating verbs, finding dialectal variations, and translating
idioms, were a greater challenge to the participants using WordReference (see Table 7 below).
Finding dialectal variations of words (Task 4) was the easiest task, averaging only 1.11 searches
per item. On average, participants completed this task in 52.32% of the allotted time with
72.22% accuracy. In accordance with these other indicators of task ease, the average usefulness
rating given was 4.00, the highest rating of all tasks for this website. Conjugating verbs and
translating idioms proved much more difficult for the participants. Compared to Task 4, Tasks 3
and 5 required more searches (m=1.56 and m=1.94) and more time to complete (m=82.83%
and m=89.44%). Not surprisingly, accuracy rates (m=54.17% and m=66.67%) and usefulness
ratings (m=2.67 and m=2.00) were much lower.
Table 7: Summary of participant performance on Tasks 3, 4, & 5 for WordReference.

Task | Num. of Items | Avg. Num. of Items Completed | Avg. Searches per Item | Completion Time (percent of max.) | Accuracy Rate | Usefulness Rating
3 | 4 | 3.33 | 1.56 | 82.83% | 54.17% | 2.67
4 | 3 | 3.00 | 1.11 | 52.32% | 72.22% | 4.00
5 | 3 | 2.67 | 1.94 | 89.44% | 66.67% | 2.00
Summary of Features
Searching
Searching on WordReference was enhanced by the site’s auto-detection capabilities.
Specifically, once the user had accessed a certain bilingual dictionary (in our case English-Spanish), the site was able to detect the language of the search query and respond accordingly.
This meant that participants did not have to bother with switching from one language to the
other, which saved time and minimized searching errors. Additionally, the site provided
automatic suggestions as letters were entered into the search box, which again saved the
participants time in not having to type the whole word and also minimized errors due to
incorrect spelling.
Image 4: Illustration of the language auto-detect feature and word auto-suggest on
WordReference.
Despite these helpful features, searching on WordReference did produce some frustration in
the participants. This occurred when the site returned a results page that was related to, but
did not exactly match the query. This would happen when a participant typed in a conjugated
verb (site returns the infinitive form), a plural (site returns the singular form), or more than one
word at a time (site returns the first word in the search query). These search returns are
understandable in light of the standard function of a dictionary – dictionaries have one entry
for a verb, not dozens for each inflectional form – but they ran contrary to participants’
expectations, most likely because of the group’s previous familiarity with online machine
translators. Notably, the reason for the search return was not provided to the user, and it often
took participants some time to realize that the results did not match their expectations.
Image 5: Illustration of searching for multiple words on WordReference.
In the first image, “between a rock and a hard place” is typed into the search box. The second
image displays the results page returned. Notice the site automatically redirects to the results
page for the first word in the query, “between”.
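The fallback behavior described above – returning the headword for an inflected query – together with a feedback message explaining the redirect, can be sketched as follows. The headword and inflection tables are hypothetical stand-ins for real morphology data, not either site's actual logic.

```python
# Sketch of headword fallback with explanatory feedback: when a query is
# not a headword itself, return the nearest headword and tell the user why.
# Both tables below are hypothetical examples.
HEADWORDS = {"hablar": "to speak", "libro": "book"}
INFLECTED = {"hablo": "hablar", "hablas": "hablar", "libros": "libro"}

def lookup(query: str) -> tuple[str, str]:
    """Return (entry, feedback) for a query, falling back to the headword."""
    word = query.lower()
    if word in HEADWORDS:
        return HEADWORDS[word], ""
    if word in INFLECTED:
        lemma = INFLECTED[word]
        note = f"'{word}' is a form of '{lemma}'; showing the entry for '{lemma}'."
        return HEADWORDS[lemma], note
    return "", f"No entry found for '{word}'."

entry, note = lookup("hablo")
print(entry)  # to speak
print(note)   # 'hablo' is a form of 'hablar'; showing the entry for 'hablar'.
```

The feedback string is the piece WordReference omits: the redirect itself is sound lexicography, but without an explanation it defeats user expectations.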
Results Pages
The results pages for WordReference were information-rich, yet confusing for the participants.
As participant 2 explained, “The information is there, but it’s not too accessible”. While
participants often found the usage examples, grammatical information, and
comprehensiveness of the results helpful in completing the tasks, they were also overwhelmed
by the sheer amount of information presented at once, and felt its organization lacked clarity.
With so many meanings displayed on the page, participants often overlooked the one they
were looking for, either because it required extensive scrolling to find (i.e. was located far
below the fold) or because the text was difficult to read at a glance. The information is
organized into columns, but these lack headings and the two equivalent words are located on
opposite sides of the page, separated by grammar information and usage examples. One
participant even had difficulty distinguishing the relevant Spanish word from all the other
information provided! The grammar explanations were often in the form of abbreviations that
participants found confusing, and in general the visual cues on the page provided inadequate
support to the novice user.
Image 6: Illustration of the text-rich results page on WordReference.
Finally, the language of the explanatory text varied automatically depending on whether the
searched word was in English or Spanish. Thus, participants found translating from English to
Spanish far easier than the other way around. This is true even in the conjugation tables, even
though it is far less likely that a Spanish speaker would look up the conjugation table for a
Spanish verb than an English speaker would. At one point participant 1 asked in frustration, “Is
this site written for people who speak Spanish?”
Image 7: The Spanish-heavy results pages on WordReference.
The left and right panels present a typical results page for a Spanish word and the conjugation
tables for a Spanish word, respectively.
Navigating
WordReference had some very helpful navigation features that the participants appreciated,
but that nonetheless had usability flaws that could be improved upon. First, participants liked
that the search bar was both present and prominently located on all results pages, minimizing
the necessary clicks to begin a new search. However, in certain cases the search narrowed the
results in a way that was not transparent to users. Specifically, if a user initiated a search from
the results page of a monolingual dictionary or the conjugation page, then the search would
return a result within that section, rather than referencing the wider site. This caused some
search error messages that misled the participants into thinking the word could not be found
on the site, when in fact they were “stuck” in the wrong part of the website.
Image 8: Illustration of a participant getting “stuck” in the monolingual Spanish dictionary
section of WordReference.
On the left, the user has typed an English noun into the search bar, expecting to be taken to the
English-Spanish results page. On the right, no results are returned because the search was
limited to the monolingual Spanish dictionary.
In another example of helpful functionality with room for improvement, the results pages
provided a deep-linked state that allowed participants to easily investigate the meaning of
words without additional typing. However, it was not always apparent to users which words
were clickable, indicating inadequate visual cues. Because of this, the deep-linked state
sometimes caused unintended navigation.
Image 9: Misleading visual cues on the results page of WordReference.
Every single word on this page is clickable, though only some appear so.
Finally, specific features of the site were difficult for the participants to find, including the
conjugation tables, compound forms of words, and the discussion forum. Because these
features are so helpful to the language learner, their lack of visibility represents a serious
usability flaw. The WordReference forum is highly regarded as perhaps the most reliable
Spanish language forum on the web (source: author expertise), yet none of the three
participants found the forum during testing. As participant 2 summarized, “It looks like a lot of
expertise and thought went into the initial iteration of this site. I am hopeful that they can
make further improvements to make things more navigable.”
Image 10: Low visibility for the conjugation tool on WordReference.
Participants did not immediately notice the Conjugator link at the top of the page. Clicking the
arrow would have also brought the user to the conjugation tables, but only one participant
discovered this by, in her words, “happy accident”.
Image 11: Finding the compound forms of words and the discussion forum required extensive
scrolling.
SpanishDict
Basic Functionality
Using SpanishDict, participants completed Tasks 1 & 2 with the highest combined accuracy
(m=79.17% and m=95.83%) in the shortest amount of time (m=62.50% and m=68.89% of
maximum allotted) of the three groups. The number of searches required was also low (m=1.42
and m=1.17). Consequently, SpanishDict received the highest usefulness ratings of any site on
any task for Tasks 1 & 2 (m=4.33 for both). However, it is worth noting that WordReference (on
Tasks 1 & 2) and Collins (on Task 2) produced accuracy rates equal to or higher than
SpanishDict’s on Task 1, and yet their usefulness ratings for these tasks were not as favorable.
Table 8: Summary of participant performance on Tasks 1 & 2 for SpanishDict.
| Task | Num. of Items | Avg. Num. of Items Completed | Avg. Searches per Item | Completion Time (percent of max.) | Accuracy Rate | Usefulness Rating |
|------|---------------|------------------------------|------------------------|-----------------------------------|---------------|-------------------|
| 1    | 4             | 4.00                         | 1.42                   | 62.50%                            | 79.17%        | 4.33              |
| 2    | 4             | 4.00                         | 1.17                   | 68.89%                            | 95.83%        | 4.33              |
Advanced Functionality
SpanishDict did not yield results as consistent on the more advanced tasks as it did with basic
dictionary searches. While accuracy on Tasks 3 & 5 was reasonably high (m=79.17% and
m=66.67%), usefulness ratings were much lower (m=3.00 and m=2.00). The usefulness ratings
are most likely explained by the extra time (m=86.95% and m=87.08% of maximum allotted)
and searches (m=2.33 and m=1.67 per item) required to complete these tasks. Task 4, finding
dialectal information, was clearly the most difficult to perform on this website, with an average
accuracy of 38.89%. Two out of three participants exceeded the maximum time allotted for this
task. These data make plain why the website received the lowest usefulness rating of all tasks
and all websites (m=1.67) on Task 4.
Table 9: Summary of participant performance on Tasks 3, 4, & 5 for SpanishDict.
| Task | Num. of Items | Avg. Num. of Items Completed | Avg. Searches per Item | Completion Time (percent of max.) | Accuracy Rate | Usefulness Rating |
|------|---------------|------------------------------|------------------------|-----------------------------------|---------------|-------------------|
| 3    | 4             | 3.33                         | 2.33                   | 86.95%                            | 79.17%        | 3.00              |
| 4    | 3             | 2.00                         | 1.78                   | 97.50%                            | 38.89%        | 1.67              |
| 5    | 3             | 2.67                         | 1.67                   | 87.08%                            | 66.67%        | 2.00              |
Summary of Features
Searching
The search function on SpanishDict received the highest participant ratings of the three
websites reviewed; however, it too was not without its flaws. Its language auto-detect and
word auto-suggest functioned similarly to those of WordReference, and participants viewed
them as helpful. In contrast to WordReference, SpanishDict has both a translator function and a
dictionary function. These are seamlessly integrated such that either the dictionary or the
translator is activated depending on whether the search is one word or multiple words. These
two functions are mutually exclusive and automatic, meaning that the user cannot choose
which mode to use. Nowhere on the site is this explained and as a result, even by the end of the
testing session none of the participants could articulate how the search function worked.
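The implicit rule participants encountered can be expressed in a single line. The sketch below is an illustrative guess at the observed behavior, not SpanishDict's actual routing code.

```python
# Sketch of the implicit mode switch described above: one word triggers
# dictionary mode, multiple words trigger translator mode. The function
# name and rule are assumptions based on observed behavior only.
def choose_mode(query: str) -> str:
    """Pick a search mode based solely on the number of words in the query."""
    return "dictionary" if len(query.split()) == 1 else "translator"

print(choose_mode("prueba"))                          # dictionary
print(choose_mode("between a rock and a hard place")) # translator
```

Because the rule is never surfaced in the interface, users receive dictionary entries and machine translations without knowing which mode produced them; an explicit toggle would make the distinction visible.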
While participants generally liked being able to search using more than one word (a common
strategy with this group was to type the entire sentence into the search box), the results were
often misleading or incomplete. For instance, Tasks 1 & 2 featured ambiguous words with many
meanings, which the dictionary clearly explained but were absent in the machine-produced
translations. Similarly, dialectal information was not available in translate mode, and idioms
were translated literally. In fact, the only task for which the translation mode proved to be
more helpful than the dictionary mode was in verb conjugation (Task 3), as evidenced by the
vastly higher accuracy rate for the SpanishDict group (m=79.17%) versus the WordReference
and Collins groups (m=54.17% and m=33.33%, respectively).
Image 12: A side-by-side comparison of the dictionary and translator results pages on
SpanishDict.
Results Pages
The results pages of the SpanishDict dictionary had flaws similar to those discussed for
WordReference, including an overwhelming amount of information, unclear organization, and
confusing abbreviations. One feature of mixed helpfulness was the verb table at the
bottom of the dictionary pages for verbs. One participant found it very useful, but the other
two did not find it at all because of its location at the bottom of the page. Additionally, the
table only appeared when the infinitive form of the verb was searched; a search for an
inflected form did not produce the table. The verb tables were laid out similarly to
WordReference’s, but with English headings and rollover help captions, which helped
participants interpret them.
Image 13: Side-by-side comparison of results pages for a verb in infinitive (left - verb table
present) and inflected (right - verb table absent) forms.
In contrast to the dictionary results page, the translator results page provided much less
information. This was sometimes regarded positively, if the participants felt that the answer
was clear. Participants liked that three different translators were used. Participant 4 explained
that she made her choices based on frequency: “If two out of three translators say it, that’s
probably right”. Participants felt especially confident when all three translators produced the
same result. However, these pages did not give participants enough information to
evaluate the quality of the results. Several times participants indicated that they were not
confident in their answers, yet could not figure out a way to verify them.
Navigating
Navigation on SpanishDict was generally viewed more favorably by its testers than navigation
on the other two sites. Participants made use of the Translate and Conjugate tabs located at
the top of the page, and the ever-present search bar. However, some of the most helpful
features of the site were a bit harder to find. SpanishDict featured both a phrasebook and a
language forum. Though these features were accessible through the top navigation tabs – the
forum through the “Q & A” tab and the phrasebook through “More > Phrasebook” – none of
the participants used these navigational features. Instead, the participants accessed the
phrasebook and forum through links posted at the bottom of the results pages. While this did
eventually lead to successful use of the features, participants were not always quick to notice
these links.
Image 14: Two ways to access the Phrasebook and Forum on SpanishDict.
On the left, the top navigation tabs used by none of the participants. On the right, links to the
phrasebook and forum placed below the fold.
Collins
Basic Functionality
On the basic tasks of supplying bilingual definitions of English and Spanish words, Collins
performed similarly to, though not quite as well as, the other two sites. Accuracy rates for Tasks
1 & 2 were 75.00% and 79.17%, but this website required the most time (m=86.32% and
m=72.50% of maximum allotted) and the most searches (m=1.67 and m=1.75 per item) of the
three sites for these two tasks. Despite this, usefulness ratings were actually higher for Collins
than for WordReference (m=3.67 for both tasks).
Table 10: Summary of participant performance on Tasks 1 & 2 for Collins.
| Task | Num. of Items | Avg. Num. of Items Completed | Avg. Searches per Item | Completion Time (percent of max.) | Accuracy Rate | Usefulness Rating |
|------|---------------|------------------------------|------------------------|-----------------------------------|---------------|-------------------|
| 1    | 4             | 4.00                         | 1.67                   | 86.32%                            | 75.00%        | 3.67              |
| 2    | 4             | 4.00                         | 1.75                   | 72.50%                            | 79.17%        | 3.67              |
Advanced Functionality
The advanced features of Collins were not easily usable by the participants. The numbers of
searches required for these tasks were the highest of any task on any site (m=2.22, m=2.44, and
m=3.62). On Task 3, all participants exceeded the maximum time allowance and on Task 5, two
out of three participants abandoned the task before completion. It is no surprise, then, that
accuracy (m=33.33% and m=44.44%) and usefulness ratings (m=1.67 and m=2.00) were quite
low. Task 4, however, proved somewhat easier. While one participant abandoned this task, the
other two completed it, resulting in an overall accuracy rate of 66.67% and an average
usefulness rating of 3.67.
Table 11: Summary of participant performance on Tasks 3, 4, & 5 for Collins.
| Task | Num. of Items | Avg. Num. of Items Completed | Avg. Searches per Item | Completion Time (percent of max.) | Accuracy Rate | Usefulness Rating |
|------|---------------|------------------------------|------------------------|-----------------------------------|---------------|-------------------|
| 3    | 4             | 2.00                         | 2.22                   | 100.00%                           | 33.33%        | 1.67              |
| 4    | 3             | 2.33                         | 2.44                   | 68.89%                            | 66.67%        | 3.67              |
| 5    | 3             | 2.00                         | 3.62                   | 77.64%                            | 44.44%        | 2.00              |
Summary of Features
Searching
Collins has both dictionary and translator search capabilities, but unlike on SpanishDict, these
functions are accessed separately. The dictionary has auto-suggest for words, but does not
automatically detect language. In fact, the site does not “remember” previous searches and so
the user must be careful to always select the correct bilingual dictionary and source language,
otherwise results will not be returned. For the most part, the dictionary only accepts single
words in the search query, but it does occasionally accept common phrases. This functionality is
hidden, however, and it was common among participants to overgeneralize negative
experiences and to presume that the dictionary would not accept multi-word searches.
Image 15: The dictionary function on Collins requires an initial language designation in order
to return a result (no auto-detect).
The translator function is more limited than that of SpanishDict, because it provides only a
single response. In addition, it is more cumbersome to use because, as with the dictionary, it
does not automatically detect the language of entry. Furthermore, since the site does not
“remember” previous searches, upon re-access the translator automatically reverts to its
default, English to French. This caused participants much frustration because they did not
always think to check the language settings before translating.
Image 16: The translator function requires an initial language designation in order to return a
useful result.
Results Pages
Collins’s dictionary results pages received the same complaints about information overload,
unclear organization, and confusing abbreviations as the other two websites tested. In addition,
the site’s use of blue font for some words created a misleading visual cue; participants thought
they could click on the words to be taken to other pages, but this was not possible.
One feature unique to the Collins dictionary results page is a pair of sections entitled “Browse
nearby words” and “Related terms”. The former displays the words (all hyperlinked) that would
appear before and after the searched word in an alphabetical dictionary, while the latter
displays terms that are morphologically related to the word and phrases that contain it. These
features have the potential to be quite helpful, but in fact only one participant found them,
because they are located below the fold and off to one side of the page. The one participant to
find these tools, participant 8, actually used them on three separate occasions, indicating that
she found them worthwhile.
Image 17: “Related terms” and “Browse nearby words” on Collins.
Navigating
Participants did not find Collins easy to navigate. Although the top tabs for Dictionaries and
Translator seemed clear at first, participants soon learned to their frustration that the
dictionary page required further navigation via tabs in the middle of the page, making accessing
the Spanish-English dictionary a three-click process. Participant 8 eventually decided that the
easiest way to access the Spanish-English dictionary was through the sitemap footer, but
neither of the other two participants utilized this feature. In addition, participants found the
top tabs misleading, believing they indicated drop-down menus when they did not.
Though Collins has verb tables just like WordReference and SpanishDict, none of the
participants found this feature, resulting in the lowest accuracy rate for Collins’s users on Task 3
of any task on any site.
Image 18: The verb conjugation tables on Collins were difficult to find.
Finally, though all three websites featured heavy advertising, only on Collins did users actually
mistake the ads as being part of the site and click on them. Both participant 7 and participant
9 clicked on the ad shown in Image 19.
Image 19: Distracting advertisement on Collins.
3.3 Comparative Analysis
In this section we will compare the three sites from quantitative and qualitative perspectives in
order to answer the last two research questions.
4. Which website produced superior performance results?
5. Which website was viewed most favorably by participants?
We will structure our quantitative findings in terms of completion rate, accuracy of responses,
number of searches, and completion time and our qualitative findings in terms of post-test
perceptions on site characteristics, site capabilities, and overall satisfaction.
Completion Rate
The completion rate per task is one indicator of the facility with which participants used the
features of the site to achieve objectives. An item was considered incomplete if the participant
failed to answer it within the allotted time or abandoned the task before completing the item.
WordReference and SpanishDict shared the highest completion rates for four out of five tasks,
with WordReference outperforming the other sites on Task 4. All participants completed all
items for the basic translation tasks, while completion rates for the tasks requiring use of
advanced functionality were lower; in the case of Collins, markedly so.
Table 12: Average percent of items completed, by task and site.
|                     | WordReference | SpanishDict | Collins | Average – all sites |
|---------------------|---------------|-------------|---------|---------------------|
| Task 1              | 100.00%       | 100.00%     | 100.00% | 100.00%             |
| Task 2              | 100.00%       | 100.00%     | 100.00% | 100.00%             |
| Task 3              | 83.25%        | 83.25%      | 50.00%  | 72.17%              |
| Task 4              | 100.00%       | 66.67%      | 77.67%  | 81.45%              |
| Task 5              | 89.00%        | 89.00%      | 66.67%  | 81.56%              |
| Average – all tasks | 94.45%        | 87.78%      | 78.87%  | 87.03%              |
A number in italics indicates one or more participants abandoned the task before exceeding the
maximum allotted time.
Accuracy
While accuracy would be one of the most important considerations for a user of the website, it
was also the metric that participants were least able to judge for themselves. WordReference
produced the highest accuracy rates on Tasks 1 and 4, translating a single word from Spanish to
English and finding dialectal variations. SpanishDict produced the highest accuracy rates on
Tasks 2 & 3, translating a word from English to Spanish and conjugating verbs. These two sites
performed equally well on Task 5, translating idioms, and their overall averages across tasks are
virtually identical, with just 0.16 percentage points separating them. Collins, however, clearly
produced the least accurate responses across all tasks.
Table 13: Percent of items correct, by task and site.
|                     | WordReference | SpanishDict | Collins | Average – all sites |
|---------------------|---------------|-------------|---------|---------------------|
| Task 1              | 88.33%        | 79.17%      | 75.00%  | 80.83%              |
| Task 2              | 79.17%        | 95.83%      | 79.17%  | 84.72%              |
| Task 3              | 54.17%        | 79.17%      | 33.33%  | 55.56%              |
| Task 4              | 72.22%        | 38.89%      | 66.67%  | 59.26%              |
| Task 5              | 66.67%        | 66.67%      | 44.44%  | 59.26%              |
| Average – all tasks | 72.11%        | 71.95%      | 59.72%  | 67.93%              |
Number of Searches
The number of searches required for each item provides another view of how usable the three
websites are. While WordReference required the fewest searches overall and on Tasks 3 and 4,
SpanishDict outperformed it on Tasks 1, 2, and 5. Indeed, a closer look at the data shows the
number of searches required on Task 3 for SpanishDict to be an outlier – related no doubt to
the factors discussed above – such that SpanishDict can be considered to generally require the
fewest searches of the three sites tested.
Table 14: Average number of searches per item, by task and site.
|                     | WordReference | SpanishDict | Collins | Average – all sites |
|---------------------|---------------|-------------|---------|---------------------|
| Task 1              | 1.92          | 1.42        | 1.67    | 1.67                |
| Task 2              | 1.25          | 1.17        | 1.75    | 1.39                |
| Task 3              | 1.58          | 2.33        | 2.22    | 2.04                |
| Task 4              | 1.11          | 1.78        | 2.44    | 1.78                |
| Task 5              | 1.94          | 1.67        | 3.61    | 2.41                |
| Average – all tasks | 1.56          | 1.67        | 2.34    | 1.86                |
Completion Time
As discussed above, the time needed to complete a task had a strong bearing on participant
satisfaction. On average, participants needed the least time on WordReference, followed by
SpanishDict and Collins.
Table 15: Percent of time (based on maximum time allowed) taken to complete each task, by
site.

|                     | WordReference | SpanishDict | Collins | Average – all sites |
|---------------------|---------------|-------------|---------|---------------------|
| Task 1              | 79.10%        | 62.50%      | 86.32%  | 75.97%              |
| Task 2              | 64.31%        | 68.89%      | 72.50%  | 68.57%              |
| Task 3              | 82.83%        | 86.95%      | 100.00% | 89.93%              |
| Task 4              | 52.32%        | 97.50%      | 68.89%  | 72.90%              |
| Task 5              | 89.44%        | 87.08%      | 77.64%  | 84.72%              |
| Average – all tasks | 73.60%        | 80.58%      | 81.07%  | 78.42%              |
Note: A number in italics indicates that one or more participants from this group abandoned the
task without completing all items. In this case, 100% was used as the value for percent of time
taken for that individual.
Site Characteristics
Participants rated site characteristics on a five-point Likert scale from 1 (very unsatisfied) to 5
(very satisfied). The characteristics have been grouped according to their relations to the “three
I’s” of web design – interaction, information, and interface. We can see that SpanishDict
received the highest ratings for interaction and interface characteristics. However,
WordReference received the highest ratings for information, a fact that aligns well with the
finding that WordReference produced the most accurate results of the three websites. It is also
interesting to note that WordReference received the lowest ratings of the three sites on
interaction and interface, indicating clear priorities for improvement for this site. Likewise,
Collins’s low ratings on both interaction and information align with the finding that this site
produced the lowest overall accuracy ratings.
Table 16: Ratings of site characteristics related to the “three I’s”.
|                      |                                       | WordReference | SpanishDict | Collins | Average – all sites |
|----------------------|---------------------------------------|---------------|-------------|---------|---------------------|
| Interaction          | Ease of searching                     | 2.67          | 3.67        | 2.67    | 3.00                |
|                      | Ease of navigating                    | 2.00          | 3.33        | 2.67    | 2.67                |
|                      | Average                               | 2.34          | 3.50        | 2.67    | 2.84                |
| Information          | Ease of finding information on a page | 3.00          | 2.67        | 2.00    | 2.56                |
|                      | Quality of information                | 3.67          | 3.33        | 2.67    | 3.22                |
|                      | Average                               | 3.34          | 3.00        | 2.34    | 2.89                |
| Interface            | Ease of reading the text              | 3.33          | 4.00        | 3.33    | 3.55                |
|                      | General appearance                    | 3.33          | 4.00        | 4.00    | 3.78                |
|                      | Average                               | 3.33          | 4.00        | 3.67    | 3.67                |
| Overall Satisfaction | 3 I Average                           | 3.00          | 3.50        | 2.89    | 3.13                |
Perceived Value for Accomplishing Tasks
Participants rated the value they perceived the site as having for a particular task on a
five-point Likert scale from 1 (useless) to 5 (very useful). The results are somewhat surprising.
Certainly it is understandable that for the majority of tasks SpanishDict received the highest
ratings. What is harder to explain is why WordReference did not receive the highest rating for
any task, while Collins received the highest ratings on two of the tasks. Somehow,
WordReference users finished the testing experience less satisfied with the website than data
such as completion rate, number of necessary searches, and completion time would indicate.
Table 17: Perceived value of site for specific tasks.
|                                                        | WordReference | SpanishDict | Collins | Average – all sites |
|--------------------------------------------------------|---------------|-------------|---------|---------------------|
| Translating a single word                              | 4.00          | 4.67        | 3.67    | 4.11                |
| Differentiating between multiple meanings of a word    | 3.67          | 4.33        | 2.67    | 3.56                |
| Conjugating a verb                                     | 2.67          | 3.67        | 1.33    | 2.56                |
| Finding a term belonging to a particular dialect       | 3.00          | 1.67        | 3.33    | 2.67                |
| Translating phrases or idioms                          | 1.67          | 2.33        | 2.33    | 2.11                |
| Clarifying a usage or grammar point I found confusing  | 1.33          | 3.33        | 1.00    | 1.89                |
| Average – all tasks                                    | 2.72          | 3.33        | 2.39    | 2.81                |
A comparison of the pretest and posttest data helps to elucidate the peculiar discrepancy
described above. In viewing the change in perceptions from pretest to posttest, it is clear that
the Collins group experienced the most dissatisfaction as a percentage of their initial
expectations. Notably, posttest impressions of usefulness for all three sites fell below
expectations, indicating general dissatisfaction with the sites for the tasks assigned.
Table 18: Percent change in perceived value of the sites tested from pretest to posttest.
|                                                       | WR Pre | WR Post | WR %Change | SD Pre | SD Post | SD %Change | C Pre | C Post | C %Change |
|-------------------------------------------------------|--------|---------|------------|--------|---------|------------|-------|--------|-----------|
| Translating a single word                             | 4.67   | 4.00    | -14.35%    | 4.67   | 4.67    | 0.00%      | 4.67  | 3.67   | -21.41%   |
| Differentiating between multiple meanings of a word   | 4.67   | 3.67    | -21.41%    | 4.00   | 4.33    | 8.25%      | 4.67  | 2.67   | -42.83%   |
| Conjugating a verb                                    | 4.33   | 2.67    | -38.34%    | 4.67   | 3.67    | -21.41%    | 4.67  | 1.33   | -71.52%   |
| Finding a term belonging to a particular dialect      | 3.33   | 3.00    | -9.91%     | 4.00   | 1.67    | -58.25%    | 5.00  | 3.33   | -33.40%   |
| Translating phrases or idioms                         | 4.00   | 1.67    | -58.25%    | 4.67   | 2.33    | -50.11%    | 4.33  | 2.33   | -46.19%   |
| Clarifying a usage or grammar point I found confusing | 3.67   | 1.33    | -63.76%    | 3.67   | 3.33    | -9.26%     | 4.00  | 1.00   | -75.00%   |
| Average*                                              | 4.11   | 2.72    | -34.34%    | 4.28   | 3.33    | -21.80%    | 4.56  | 2.39   | -48.39%   |
*The %Change cells in the Average row represent the average of all cells above them, not the
percent change of the average perceived value rating pre- and posttest.
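The distinction the table note draws can be checked against the WordReference column (values from Table 18 above): averaging the six per-task percent changes reproduces the reported -34.34%, while taking the percent change of the average ratings yields a slightly different figure.

```python
# The two computations the table note distinguishes, using the
# WordReference pretest/posttest ratings from Table 18.
pre  = [4.67, 4.67, 4.33, 3.33, 4.00, 3.67]
post = [4.00, 3.67, 2.67, 3.00, 1.67, 1.33]

# Average of the per-task percent changes (what the table reports):
per_task = [(b - a) / a * 100 for a, b in zip(pre, post)]
avg_of_changes = sum(per_task) / len(per_task)
print(round(avg_of_changes, 2))  # -34.34

# Percent change of the average ratings (a different quantity):
pre_avg = sum(pre) / len(pre)
post_avg = sum(post) / len(post)
change_of_avgs = (post_avg - pre_avg) / pre_avg * 100
print(round(change_of_avgs, 2))  # -33.77
```

The two quantities differ because percent change is not a linear function of the pretest rating, so the order of averaging matters.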
Overall Satisfaction
Table 19 below contains a variety of measures – including individual metrics and averages – of
overall satisfaction with the websites. Not surprisingly, SpanishDict stands out as the clear
favorite, no matter how satisfaction is measured.
Table 19: Summary of all satisfaction-related data.
|                                               | WordReference | SpanishDict | Collins |
|-----------------------------------------------|---------------|-------------|---------|
| 3 I Average                                   | 3.00          | 3.50        | 2.89    |
| Average perceived usefulness - During testing | 2.93          | 3.07        | 2.94    |
| Average perceived usefulness - Posttest       | 2.72          | 3.33        | 2.39    |
| Overall User Experience                       | 2.67          | 3.33        | 2.67    |
| Likelihood of returning                       | 2.00          | 2.67        | 2.00    |
| Average - all metrics                         | 2.66          | 3.18        | 2.58    |
Overall User Experience was rated on the same scale described in the Site Characteristics
section above. Likelihood of returning was rated on a five-point scale from 1 (no way) to 5 (I
would definitely use this site).
DISCUSSION
4.1 Website Performance and User Perceptions
This study produced some interesting findings that shed light on the relationship between
website performance and user perceptions. To sum up, the accuracy of results produced did
not evidence a direct relationship with user perceptions of a site. On the other hand, the
amount of time and number of searches needed to complete a task did correlate closely with
users’ perceptions of their experience. These findings indicate a potential weakness of usability
tests that rely on user perceptions alone. In the long run, a user who discovers that the
information he has been gathering from a bilingual dictionary website is not accurate will
abandon the site, no matter how positive his initial experience was. Because
usability testing takes place in an isolated context, researchers should make every effort to
gather a variety of types of data to provide the most accurate appraisal of a website’s usability.
4.2 Interaction, Information, and Interface
Interaction, information and interface are the three elements that make up any website design.
Information can be considered the substance of a website, whereas interaction is the medium
through which the user and the website communicate. Finally, interface provides the user with
access to the first two. A usability flaw in any of these “three I’s” will negatively impact the
others, just as an improvement in one domain may mitigate the problems experienced with
another.
Participants in the WordReference group showed some awareness of the high quality of its
information, rating this domain the highest of the site’s “three I’s”. However, their negative
experiences with the interaction and interface of the site caused their overall satisfaction to be
lower than that of SpanishDict’s participants, despite comparable or even superior performance
by WordReference in quantitative terms.
Collins provided access to the same information as the other two sites – the task development
process ensured that all items could be completed using any of the sites – yet participants in
this group rated its information far below the ratings received by WordReference and
SpanishDict. In Collins, then, we see another case of poor interaction and interface design
undermining the usability of the website’s information.
SpanishDict provides a very different example of the interaction among the “three I’s”. The
information it provided to participants was in fact equal in quality to that of WordReference,
yet participants’ overall positive experiences with the interaction and interface of the site
caused them to rate SpanishDict more highly as a whole.
4.3 Recommendations
The websites tested in this study all have their strengths and weaknesses in these “three I’s”.
Because these elements are integrated in the user experience and any design alteration will
have repercussions in all three, the recommendations in this study are organized around the
domains of user experience analyzed in the Results section: searching, results pages and
navigation.
Searching
Data gathered during this study clearly indicate that all three sites have room for improvement
in their search function. In all three cases, information and interaction seem to be the sources
of frustration, while the interface itself functions as intended by designers and expected by
users. The recommendation of this paper is to take the best elements of each site –
WordReference almost never returns an error message, SpanishDict accepts both single and
multi-word search queries, and Collins clearly differentiates between its dictionary and
translator functions – and improve them further according to recommendations developed by
Nielsen et al. (2000). Our proposed best practice, to be adopted by any and all of these sites,
would combine the following characteristics (sites listed in parentheses already have these
characteristics to varying degrees):
1. Provide a clearly visible search box on every page. (WordReference, SpanishDict, Collins)
2. Auto-detect the language of input. (WordReference, SpanishDict)
3. Auto-suggest search terms. (WordReference, SpanishDict, Collins)
4. Provide a relevant search result whenever possible. (WordReference, SpanishDict)
5. Accept both single and multi-word search queries. (SpanishDict)
6. Allow the user to choose between dictionary and translator modes. (Collins)
7. When a result differs from the original query (e.g. when the singular form of a noun is
   returned instead of the plural), provide detailed feedback on why that particular result
   was returned.
8. When no result can be returned, provide detailed feedback indicating why and suggesting
   alternative strategies, such as the other search mode (either dictionary or translator),
   the language forum, conjugation tables, or other features such as SpanishDict’s
   phrasebook.
9. Provide “Advanced Search” functionality so that users can narrow down the number of
meanings returned by the search by such criteria as part of speech, dialect, compound
forms, etc. This should be a separate feature from the main search function.
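The auto-detect, multi-word query, and no-result feedback recommendations above can be sketched as a small search-planning routine. This is a minimal illustration, not the implementation of any of the sites tested; all names and the heuristics (special characters and high-frequency function words as language signals) are the author's assumptions.

```typescript
// Sketch: detect the query language, accept single- or multi-word input,
// and choose a mode, so the site never has to return a bare error.
// The heuristics and names here are illustrative only.

type SearchMode = "dictionary" | "translator";

interface SearchPlan {
  mode: SearchMode;        // dictionary for single words, translator for phrases
  language: "es" | "en";   // best guess at the input language
  terms: string[];
}

// Characters unique to Spanish orthography, plus a few high-frequency
// Spanish function words, used as cheap language signals.
const SPANISH_CHARS = /[áéíóúüñ¿¡]/i;
const SPANISH_WORDS = new Set([
  "el", "la", "los", "las", "de", "que", "en", "un", "una", "y",
]);

function detectLanguage(query: string): "es" | "en" {
  if (SPANISH_CHARS.test(query)) return "es";
  const words = query.toLowerCase().split(/\s+/);
  return words.some((w) => SPANISH_WORDS.has(w)) ? "es" : "en";
}

function planSearch(query: string): SearchPlan {
  const terms = query.trim().split(/\s+/).filter((t) => t.length > 0);
  return {
    mode: terms.length > 1 ? "translator" : "dictionary",
    language: detectLanguage(query),
    terms,
  };
}
```

A real site would back this with proper language identification, but even a heuristic plan lets the interface route a failed dictionary lookup to the translator (or forum) with an explanation, rather than a dead end.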
Results Pages
Whereas the search functions of the three sites all differed in their strengths, the complaints
made by participants about the websites’ results pages were remarkably similar. They
complained that the information was poorly organized (information architecture) and
confusingly displayed (interface). Our proposed best practice addresses these
flaws by building more interactive features into the results pages, which in their current
forms are extremely static.
1. At the top of the results page, provide a hyperlinked menu of all the subcategories of
responses (e.g. transitive verb, intransitive verb, compound forms, conjugator,
phrasebook, forum, etc.).
2. Use accordion-style organization to minimize the amount of information initially
presented on a page, but still allow users to find out more information on an entry
without having to navigate to a new page (Tidwell 2011).
3. Provide rollover help on all abbreviations and grammatical terms.
4. Provide deep linking such that users can click on almost any piece of text on the
page. (WordReference)
5. Provide clear visual cues to indicate what is and is not clickable.
6. Provide “Advanced Search” functionality (described above) on each results page so that
users can narrow down results after the initial search.
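The jump-menu and accordion recommendations above can be sketched as a simple data model: group a flat list of dictionary senses by subcategory, emit a hyperlinked menu of anchors, and mark every section collapsed by default. The data shapes and names are hypothetical, not drawn from any of the sites tested.

```typescript
// Sketch: organize a results page into linked, collapsible subcategories.
// All types and names here are illustrative assumptions.

interface Sense {
  category: string; // e.g. "transitive verb", "compound forms", "phrasebook"
  gloss: string;
}

interface Section {
  category: string;
  anchor: string;    // hyperlink target for the jump menu at the top of the page
  expanded: boolean; // accordion state: collapsed until the user opens it
  senses: Sense[];
}

function buildResultsPage(senses: Sense[]): { menu: string[]; sections: Section[] } {
  // Group senses by subcategory, preserving first-seen order.
  const byCategory = new Map<string, Sense[]>();
  for (const s of senses) {
    const bucket = byCategory.get(s.category) ?? [];
    bucket.push(s);
    byCategory.set(s.category, bucket);
  }
  const sections = Array.from(byCategory.entries()).map(([category, group]) => ({
    category,
    anchor: "#" + category.replace(/\s+/g, "-"),
    expanded: false, // accordion-style: minimize information shown initially
    senses: group,
  }));
  return { menu: sections.map((s) => s.anchor), sections };
}
```

On the web, the collapsed sections would map naturally onto disclosure widgets, letting users expand an entry without navigating to a new page, as Tidwell (2011) recommends.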
Navigation
Adhering to the previous recommendations for search functionality and results pages
should reduce the amount of navigation users need to accomplish their goals. The most
serious usability flaws related to navigation in these sites involved poor
visibility of important features in the interface. These features are unique to each site, so our
recommendations will address them separately, though together they indicate the desirable
best practices for all of these sites.
WordReference
1. Provide a tab for the discussion forum on all pages.
2. Feature a link to the discussion forum above the fold on all results pages, as described
above.
SpanishDict
1. Provide more clearly labeled tabs for the phrasebook and discussion forum on all pages.
2. Feature links to the phrasebook and discussion forum above the fold on all results
pages, as described above.
Collins
1. Feature links to the “related terms” and “browse nearby words” functions above the
fold on all results pages, as described above.
SUMMARY
This study has examined three websites that provide English-Spanish bilingual dictionaries,
WordReference, SpanishDict, and Collins. Participants in this usability study used the websites
to complete five tasks. Two tasks approximated the most basic functionality for which a Spanish
learner would use an online dictionary. The other three tasks involved more advanced features
provided by these sites. In general, usability results were more favorable for the basic tasks
than the advanced. Qualitative analysis identified the most helpful and most obstructive
features of these sites, clarifying the factors behind the ease or difficulty of use shown
in the quantitative measures. The three websites were then compared in terms of performance and
user preferences. The discussion provided recommendations for improvement and best
practices based on these findings.
URLs FOR SITES TESTED
1. WordReference: http://www.wordreference.com/
2. SpanishDict: http://www.spanishdict.com/
3. Collins: http://www.collinsdictionary.com/
REFERENCES
Ambiguous Words. Dillfrog. Retrieved November 25, 2013, from
http://muse.dillfrog.com/ambiguous_words.php
Analytics for any Website. Alexa. Retrieved November 25, 2013, from http://www.alexa.com
Dominus, M. (2007, May 15). Ambiguous words and dictionary hacks. The Universe of
Discourse. Retrieved November 1, 2013, from
http://blog.plover.com/lang/ambiguous.html
Erichsen, G. (2013, February 1). Which Online Translator Is Best? About.com Spanish Language.
Retrieved November 25, 2013, from
http://spanish.about.com/od/onlinetranslation/a/online-translation.htm
Erichsen, G. (2013, February 19). What Is the Best Online Spanish-English Dictionary?
About.com Spanish Language. Retrieved November 25, 2013, from
http://spanish.about.com/b/2013/02/19/what-is-the-best-online-spanish-english-dictionary.htm
Gaspari, F., & Hutchins, J. (2007). Online and Free! Ten Years of Online Machine Translation:
Origins, Developments, Current Use and Future Prospects.
Gaspari, F., & Somers, H. (2007). Making a sow's ear out of a silk purse: (mis)using online MT
services as bilingual dictionaries. ASLIB.
Kellogg, M. (n.d.). About Us. WordReference.com. Retrieved November 25, 2013, from
http://www.wordreference.com/english/AboutUs.aspx
Liu, M., Traphagan, T., Huh, J., Koh, Y. I., Choi, G., & McGregor, A. (2008). Designing Websites
for ESL Learners: A Usability Testing Study. CALICO Journal, 25(2).
Nielsen, J., Molich, R., Snyder, C., & Farrell, S. (2000). Search. E-Commerce User Experience.
Nielsen Norman Group.
Regional Variations in Spanish Words Translated from English. (n.d.). Rennert: Breaking the
Language Barrier. Retrieved November 25, 2013, from
http://www.rennert.com/translations/resources/spanishvariations.htm
Rubin, J., & Chisnell, D. (2008). Handbook of usability testing: How to plan, design, and
conduct effective tests (2nd ed.). Indianapolis, IN: Wiley Pub.
Solon, O. (2012, January 3). Collins launches free dictionary site. Wired UK. Retrieved November
25, 2013, from
http://www.wired.co.uk/news/archive/2012-01/03/collins-dictionary-online
Stevenson, M. P., & Liu, M. (2010). Learning a Language with Web 2.0: Exploring the Use of
Social Networking Features of Foreign Language Learning Websites. CALICO Journal,
27(2), 1-27.
Tidwell, J. (2011). Designing interfaces (2nd ed.). Beijing: O'Reilly.
APPENDIX
Appendix A: Task List (Version 1)
Task 1: Spanish-English translation
Find an appropriate English translation of a Spanish word, given the context of a sentence.
2. El niño se quedó en su cuarto toda la noche.
a. The boy stayed in his ______________ all night.
3. Ella es celosa en su apoyo al Partido Republicano.
a. She is _________________________ in her support of the Republican Party.
4. ¡Este cuadro es una obra de arte!
a. This ____________________ is a work of art!
5. Las acciones que había comprado declinaron rápidamente en valor.
a. The ______________ he had bought quickly declined in value.
6. Cuando miró a su muñeca, se dio cuenta de que había olvidado su reloj en casa.
a. When he looked at his wrist, he realized he had forgotten his __________________ at
home.
7. Soy alérgico a las plumas en esta almohada.
a. I am allergic to the ______________ in this pillow.
8. Hazme el favor de no tocar el piano mientras el bebé está dormido.
a. Do me the favor of not ___________________ the piano while the baby is asleep.
Task 2: English-Spanish translation
Find an appropriate Spanish translation of an English word, given the context of a sentence.
2. Because it was her birthday, she received a free dessert.
a. Debido a que era su cumpleaños, recibió un postre _______________.
3. Why don't we take a break?
a. ¿Por qué no nos tomamos un ___________________?
4. I was not present for the presentation because I was sick.
a. Yo no estuve _______________ en la presentación porque estaba enfermo.
5. The soccer game ended in a tie.
a. El partido de fútbol terminó en un _______________________.
6. Without a match I cannot light the candle.
a. Sin un _____________ no puedo encender la vela.
7. The thief attempted to scale the castle walls.
a. El ladrón intentó _______________ los muros del castillo.
8. Would you please pour me a glass of that?
a. ¿Podría servirme un _____________ de eso?
9. Last night I saw a bat on the ground with a broken wing.
a. Ayer por la noche vi a un ________________ en el suelo con un ala rota.
10. I want to buy curtains that match my sheets.
a. Quiero comprar cortinas que combinan con mis ____________________.
Task 3: Verb conjugation
You are writing a sentence in Spanish but you need to look up a specific verb and how to conjugate it.
2. Henry gave me a surprise birthday present.
a. Verb:
3. My mom makes delicious cookies.
a. Verb:
4. We ran to the store.
a. Verb:
5. You draw better than anyone I know.
a. Verb:
6. Call Cecilia if you need directions.
a. Verb:
7. I took a shortcut.
a. Verb:
8. They turned in their papers yesterday.
a. Verb:
9. Go away.
a. Verb:
Task 4: Dialects
You have a pen pal in another country (listed in each item). Find out how he would say the following
words.
2. Jacket (Mexico): ____________________
3. Baby (Chile): ___________________________
4. Grocery store (Uruguay): ______________________
5. Swimming pool (Mexico): _______________________
Task 5: Idioms
You are writing a sentence in Spanish and need to find a culturally-appropriate word or phrase to
replace the following English idioms.
2. That test was a piece of cake.
a. Ese examen _____________________________________________.
3. Jenny felt like she was between a rock and a hard place.
a. Jenny se sintió _____________________________________________.
4. That dress fits you like a glove!
a. ¡Ese vestido _____________________________________________!
5. Jimmy went to cool off in the pool.
a. Jimmy fue a _____________________ en la piscina.
Task 6: Ill-structured problem
You have used Google translate to transcribe a sentence from Spanish to English. However, the sentence
doesn’t exactly make sense. Use the site to find more appropriate translations for the words in the
sentence.
1. Original Spanish: ¿Eres de las mujeres que durante los últimos meses de 2012 se inscribió en el
gimnasio para sudar la gota gorda y lograr el ansiado "verano sin pareo"?
a. Problematic translation: You are of the women that during the last months of 2012 was
recorded in the gymnasium to sweat the fat drop and to achieve the desired "summer
without matching"?
i. Problematic word/phrase:
1. Original Spanish word/phrase:
2. Proposed better translation: You’re one of the women who
ii. Problematic word/phrase:
1. Original Spanish word/phrase:
2. Proposed better translation: to work up a sweat or just to sweat
iii. Problematic word/phrase:
1. Original Spanish word/phrase:
2. Proposed better translation: bikini summer
iv. Problematic word/phrase:
1. Original Spanish word/phrase:
2. Proposed better translation: joined a gym
v. Problematic word/phrase:
1. Original Spanish word/phrase:
2. Proposed better translation:
vi. Problematic word/phrase:
1. Original Spanish word/phrase:
2. Proposed better translation:
2. Original Spanish: No cabe duda de que en los últimos cinco años, el destino de América Latina ha
sido influenciado fuertemente por tres de sus más visionarios y decididos líderes: Hugo Chávez,
Rafael Correa y Evo Morales.
a. Problematic translation: There is not doubt that in the last five years, the destination of
Latin America has been influenced hard by three of its most visionary and determined
leaders: Moral Hugo Chávez, Rafael Correa y Evo.
i. Problematic word/phrase:
1. Original Spanish word/phrase:
2. Proposed better translation:
ii. Problematic word/phrase:
1. Original Spanish word/phrase:
2. Proposed better translation:
iii. Problematic word/phrase:
1. Original Spanish word/phrase:
2. Proposed better translation:
iv. Problematic word/phrase:
1. Original Spanish word/phrase:
2. Proposed better translation:
v. Problematic word/phrase:
1. Original Spanish word/phrase:
2. Proposed better translation:
vi. Problematic word/phrase:
1. Original Spanish word/phrase:
2. Proposed better translation:
Appendix B: Task List for Usability Test (Version 2)
Task 1: Spanish-English translation
Find an appropriate English translation of a Spanish word, given the context of a sentence.
1. Soy alérgico a las plumas en esta almohada.
a. I am allergic to the ______________ in this pillow.
2. Hazme el favor de no tocar el piano mientras el bebé está dormido.
a. Do me the favor of not ___________________ the piano while the baby is asleep.
3. Ella es celosa en su apoyo al Partido Republicano.
a. She is _________________________ in her support of the Republican Party.
4. Las acciones que había comprado declinaron rápidamente en valor.
a. The ______________ he had bought quickly declined in value.
Task 2: English-Spanish translation
Find an appropriate Spanish translation of an English word, given the context of a sentence.
1. The soccer game ended in a tie.
a. El partido de fútbol terminó en un _______________________.
2. Without a match I cannot light the candle.
a. Sin un _____________ no puedo encender la vela.
3. The thief attempted to scale the castle walls.
a. El ladrón intentó _______________ los muros del castillo.
4. I want to buy curtains that match my sheets.
a. Quiero comprar cortinas que combinan con mis ____________________.
Task 3: Verb conjugation
You are writing a sentence in Spanish but you need to look up a specific verb and how to conjugate it.
1. Henry gave me a surprise birthday present.
Verb (él/preterit):
2. You draw better portraits than anyone I know.
Verb (tú/present):
3. They turned in their papers yesterday.
Verb (ellos/preterit):
4. Call Cecilia if you need directions.
Verb (tú/imperative):
Task 4: Dialects
You have a pen pal in Mexico. Find out how he would say the following words.
1. Trunk of a car:
2. Swimming pool:
3. Straw:
Task 5: Idioms
You are writing a sentence in Spanish and need to find a culturally-appropriate word or phrase to
replace the following English idioms.
1. That test was a piece of cake.
a. Ese examen _____________________________________________.
2. Jenny felt like she was between a rock and a hard place.
a. Jenny se sintió _____________________________________________.
3. That dress fits you like a glove!
a. ¡Ese vestido _____________________________________________!
Task 6: Open-Ended
Write two sentences in English about your favorite food below. Then, use the site to translate the
sentences to Spanish.
English Sentence 1:
English Sentence 2:
Spanish Sentence 1:
Spanish Sentence 2:
Appendix C: Task List for Usability Study (Version 3 – Final)
Task 1: Spanish-English translation
Find an appropriate English translation of a Spanish word, given the context of a sentence.
1. Soy alérgico a las plumas en esta almohada.
a. I am allergic to the ______________ in this pillow.
2. Hazme el favor de no tocar el piano mientras el bebé está dormido.
a. Do me the favor of not ___________________ the piano while the baby is asleep.
3. Ella es celosa en su apoyo al Partido Republicano.
a. She is _________________________ in her support of the Republican Party.
4. Las acciones que había comprado declinaron rápidamente en valor.
a. The ______________ he had bought quickly declined in value.
Task 2: English-Spanish translation
Find an appropriate Spanish translation of an English word, given the context of a sentence.
1. The soccer game ended in a tie.
a. El partido de fútbol terminó en un _______________________.
2. Without a match I cannot light the candle.
a. Sin un _____________ no puedo encender la vela.
3. The thief attempted to scale the castle walls.
a. El ladrón intentó _______________ las murallas del castillo.
4. I want to buy curtains that match my sheets.
a. Quiero comprar cortinas que combinan con mis ____________________.
Task 3: Verb conjugation
You are writing a sentence in Spanish but you need to look up a specific verb and how to conjugate it.
1. Henry gave me a surprise birthday present.
Verb (él/preterit):
2. You draw better portraits than anyone I know.
Verb (tú/present):
3. They turned in their papers yesterday.
Verb (ellos/preterit):
4. Call Cecilia if you need directions.
Verb (tú/imperative):
Task 4: Dialects
You have a pen pal in Mexico. Find out how he would say the following words in his dialect.
1. Trunk of a car:
2. Swimming pool:
3. Drinking straw:
Task 5: Idioms
You are writing a sentence in Spanish and need to find a culturally-appropriate word or phrase to
replace the following English idioms.
1. That test was a piece of cake.
a. Ese examen _____________________________________________.
2. Jenny felt like she was between a rock and a hard place.
a. Jenny se sintió _____________________________________________.
3. That dress fits like a glove!
a. ¡Ese vestido _____________________________________________!
Appendix D: Sample Page of Observation Record Form
Appendix E: Background Questionnaire
Appendix F: Pretest Questionnaire
Appendix G: Posttest Questionnaire
Appendix H: Orientation Script
Thank you for agreeing to participate in this usability study. I’ll be using a script to ensure that
my instructions to everyone who participates in the study are the same.
The purpose of this study is to compare the utility of three online Spanish-English dictionaries for
a variety of tasks that are typical of second language learning contexts. You will be testing one
of these sites. During the session, I will ask you to use the website to complete the tasks and will
observe you while you do them. As you do these things, try to do whatever you would normally
do.
Each task has several items. You will have a set amount of time to complete each task. I will
keep time and ask occasional questions, but in general I am not supposed to interfere with your
interactions with the site.
Please know that I’m not testing you, and there is no such thing as a wrong answer. All of the
tasks are possible to complete using the website. However, you may find that some tasks are
easier to complete than others. Remember, the relative difficulty of each task is largely due to
the design of the website.
I ask that you please think out loud while performing the tasks. Just tell me whatever is going
through your mind. Your doing this helps me understand how users interact with the site.
The whole session will take about 60 minutes. Do you have any questions before we begin?
Appendix I: Time Schedule for Testing Event
Event                                           Time allowed
Introduction + Pretest + Orientation Script     10 min
Task 1                                          8 min
Task 2                                          8 min
Task 3                                          10 min
Task 4                                          6 min
Task 5                                          8 min
Posttest + Debriefing                           10 min
Total time                                      60 min
AUTHOR INFORMATION
Elena Winzeler is an M.Ed. candidate in the Learning Technologies program at the University of
Texas at Austin. Her academic interests lie in second language acquisition, literacy
development, and instructional design. She is a fluent, non-native speaker of Spanish.
To contact the author via email, please write to emwinzeler@utexas.edu.
This paper is written by Elena Winzeler for the course EDC385G Designs & Strategies for New Media at the University of
Texas at Austin.