Online Spanish-English Dictionaries: A Comparative Usability Study

By Elena Winzeler
Dr. Min Liu
Fall 2013
Designs and Strategies for New Media

Table of Contents

INTRODUCTION
  1.1 Purpose and Research Questions
  1.2 Target Audience
  1.3 Site Selection and Descriptions
      WordReference.com
      SpanishDict.com
      CollinsDictionary.com
METHODOLOGY
  2.1 Participants
  2.2 Tasks
  2.3 Instruments
  2.4 Testing Procedures
RESULTS
  3.1 Pretest
  3.2 Site Analysis
      WordReference
      SpanishDict
      Collins
  3.3 Comparative Analysis
      Completion Rate
      Accuracy
      Number of Searches
      Completion Time
      Site Characteristics
      Perceived Value for Accomplishing Tasks
      Overall Satisfaction
DISCUSSION
  4.1 Website Performance and User Perceptions
  4.2 Interaction, Information, and Interface
  4.3 Recommendations
      Searching
      Results Pages
      Navigation
SUMMARY
URLs FOR SITES TESTED
REFERENCES
APPENDIX
AUTHOR INFORMATION

List of Images and Tables

Image 1: WordReference homepage
Image 2: SpanishDict homepage
Image 3: Collins homepage
Table 1: Summary of traffic information for the three sites tested
Table 2: Website traffic patterns
Table 3: Research questions matched to data collection methods
Table 4: Frequency of use and preference for online bilingual tools
Table 5: Perceived value of online bilingual dictionaries
Table 6: Summary of participant performance on Tasks 1 & 2 for WordReference
Table 7: Summary of participant performance on Tasks 3, 4 & 5 for WordReference
Image 4: Language auto-detect and word auto-suggest features on WordReference
Image 5: Multi-word search queries on WordReference
Image 6: Text-rich results page on WordReference
Image 7: Spanish-heavy results page on WordReference
Image 8: Getting “stuck” in the monolingual
Spanish dictionary of WordReference
Image 9: Misleading visual cues on the results page of WordReference
Image 10: Low visibility for the conjugation tool on WordReference
Image 11: Finding compound forms on WordReference
Table 8: Summary of participant performance on Tasks 1 & 2 for SpanishDict
Table 9: Summary of participant performance on Tasks 3, 4 & 5 for SpanishDict
Image 12: Side-by-side comparison of dictionary and translator on SpanishDict
Image 13: Results pages for a verb in inflected and infinitive forms
Image 14: Two ways to access the phrasebook and forum on SpanishDict
Table 10: Summary of participant performance on Tasks 1 & 2 for Collins
Table 11: Summary of participant performance on Tasks 3, 4 & 5 for Collins
Image 15: No language auto-detect on Collins dictionary
Image 16: No language auto-detect on Collins translator
Image 17: “Related terms” and “Browse nearby words” features on Collins
Image 18: Low visibility for the conjugation tool on Collins
Image 19: Distracting advertisement on Collins
Table 12: Completion rate
Table 13: Accuracy rate
Table 14: Average number of searches per item
Table 15: Completion time
Table 16: Ratings on site characteristics related to the “three I’s”
Table 17: Perceived value of site for specific tasks
Table 18: Percent change in perceived value from pretest to posttest
Table 19: Overall satisfaction

Appendix Guide

A: Task List (Version 1)
B: Task List (Version 2)
C: Task List (Version 3, Final)
D: Sample Page of Observation Record Form
E: Background Questionnaire
F: Pretest Questionnaire
G: Posttest Questionnaire
H: Orientation Script
I: Time Schedule for Testing Event

INTRODUCTION

With today’s technology, it is increasingly common for language learners to consult Internet resources before turning to a textbook or other physical book. However, not all websites are equal in the quality of information they provide to language learners.
Specifically, online machine translators (MTs) can often provide incomplete or misleading information, especially in the case of words with multiple meanings, verb conjugations, dialectal varieties, and idioms. Online bilingual dictionaries can provide more complete and accurate information to the language learner, but can be more cumbersome to use.

1.1 Purpose and Research Questions

The purpose of this study is to examine three of the highest-quality online dictionaries for Spanish-English translation to discover the degree to which novice language learners can take advantage of the information these sites provide. The study was guided by research questions on two levels: the site level and the comparative level (across all websites). The relationship between the answers to these questions and their implications is explored in the Discussion section, along with best-practice recommendations.

Site Level:
1. How easy was it to use the site to complete the tasks?
2. What were the most helpful and obstructive features that participants encountered when using the site?
3. What were participants’ reactions to the site?

Comparative Level:
4. Which website produced superior performance results?
5. Which website was viewed most favorably by participants?

1.2 Target Audience

Those for whom this study might be of interest fall into three general categories. First, the creators and developers of online bilingual dictionaries might find the data, analysis, and recommendations presented here useful in improving their own websites. Second, language learners could use this information to select the online bilingual dictionary that best suits their needs. Similarly, language teachers could glean from this report an understanding of which website would be best to recommend to their students for a particular task.
1.3 Site Selection and Descriptions

Site selection was guided by the desire to include websites of the highest information quality, so that the findings would be of maximum interest to the groups described above. It was considered a priority to hold information quantity and quality as near-constants, so that the effects of information architecture, interaction, and interface could be closely examined.

An original list of three sites was compiled based on the results of the 2013 Reader’s Choice Awards on About.com, in which online readers were invited to nominate their preferred online Spanish-English dictionary. WordReference.com (hereafter “WordReference”), SpanishDict.com (hereafter “SpanishDict”), and the American Heritage Spanish Dictionary at Yahoo.com (hereafter “Yahoo”) received the most nominations.

An initial task list was developed with the dual purpose of testing both the basic and the more advanced functionalities of the online dictionaries. To facilitate a true comparison, the decision was made to standardize tasks across websites. At this point, it was discovered that Yahoo lacked some of the more advanced features of the other two sites, including dialectal information and verb conjugation tables. Yahoo was therefore removed from the list based on its dearth of information compared to WordReference and SpanishDict. A replacement, Collinsdictionary.com (hereafter “Collins”), was found using an online search engine. Collins was then tested with the initial task list to ensure it provided information comparable to that found on WordReference and SpanishDict. The information pertaining to the task list was found to be as complete and accurate as that provided by the other two websites, and so Collins was retained as the final website for review.

Below follows a description of the three websites tested in this study:

1. WordReference (http://www.wordreference.com/)
2. SpanishDict (http://www.spanishdict.com/)
3.
Collins (http://www.collinsdictionary.com/)

WordReference

WordReference was begun by Michael Kellogg in 1999 with the purpose to “provide free online bilingual dictionaries and tools to the world”.[1] WordReference is a dictionary site only and does not provide a translator feature. It does, however, host a very active language forum. Over time, the site has grown from its original six language pairs (English-French, English-Italian, English-Spanish, French-Spanish, Spanish-Portuguese, and English-Portuguese) to fifteen language pairs, including languages from Eastern Europe, the Middle East, and Asia. The site won the 2013 Reader’s Choice Award for best Spanish-English dictionary, receiving 50% of the votes.[2] According to Alexa.com, WordReference is ranked the 225th most visited site in the world (a calculation based on average daily visitors and average pageviews); its US rank is 786.

Image 1: The WordReference home page, set to Spanish-English dictionary search.

[1] http://www.wordreference.com/english/AboutUs.aspx
[2] http://spanish.about.com/gi/pages/poll.htm?poll_id=1074077045&linkback=http://spanish.about.com/b/2013/02/19/what-is-the-best-online-spanish-english-dictionary.htm&rc=1

SpanishDict

SpanishDict came in a close second to WordReference in the 2013 About.com Reader’s Choice Awards, with 42% of votes cast. As its name suggests, SpanishDict devotes itself entirely to English-Spanish translation. In contrast to WordReference, the site includes both a bilingual dictionary and a translator integrated into a single search, along with video tutorials, games, flashcards, and other tools of use to Spanish language learners. The site boasts over six million visitors per month.[3] According to Alexa.com, its US ranking among all websites is 1,567, whereas globally it ranks 3,941. For a side-by-side comparison of all three sites, please refer to Table 1.

Image 2: SpanishDict homepage.
[3] http://www.spanishdict.com/company/about

Collins

The Collins site, including its Spanish-English dictionary, launched on December 31, 2011.[4] Collins contains both a Spanish-English dictionary and a translator tool. Unlike the other two sites reviewed, Collins owes its reputation to its respected paper dictionary, published by HarperCollins; in fact, WordReference uses the Collins Spanish-English dictionary as one of its sources. It is worth noting that the Collins site is not nearly as popular as the other two in this report, with a US ranking of 10,798th among all internet sites, according to Alexa.com. For more traffic information on Collins, please refer to Table 1.

Image 3: Homepage of Collins, set to Spanish-English dictionary search.

[4] http://www.wired.co.uk/news/archive/2012-01/03/collins-dictionary-online

Table 1: Summary of traffic information for the three sites tested in this study.[5]

Data Category     Data Type                     WordReference   SpanishDict    Collins
Rankings          Global Rank                   226             3,957          10,804
                  US Rank                       784             1,583          10,811
Traffic patterns  Bounce Rate                   41.80%          43.20%         55.90%
                  Daily Page Views per Visitor  4.03            3.92           2.75
                  Daily Time on Site            5 min, 7 sec    5 min          3 min, 34 sec
Demographics*     Gender                        Female          Female         Female
                  Education                     Grad school     Some college   Grad school or college
                  Access Location               Work            School         Work
Load Time         Absolute                      1.365 sec       0.961 sec      2.339 sec
                  Relative                      Fast (60% of    Fast (76% of   Slow (67% of
                                                sites slower)   sites slower)  sites faster)

*Indicates which demographic groups are overrepresented among site visitors, compared to the internet population in general.
[5] Information gathered from www.alexa.com

METHODOLOGY

2.1 Participants

Participants were nine graduate students studying educational technology at a large research university in the southwest. They ranged from 23 to 32 years of age, and six of the nine were female.
It should be observed that the educational background and gender ratio of the participant group fit well with the demographic patterns of the three sites, as depicted in Table 1.

Participants were asked about the frequency with which they used the internet and in what ways. Regarding the former, responses by frequency were “more than 8 hours” (n=2), “4-7 hours” (n=3), and “2-4 hours” (n=4), indicating frequent internet use as a common characteristic. Regarding the latter, participants reported between 3 and 6 different types of internet use each, with an average of 4.78 (see the Background Questionnaire in Appendix E for the list of possible responses).

Participants also reported on their experience with language learning in general and Spanish in particular. All participants reported at least one year of foreign language instruction. Participants reported moderate familiarity with Spanish but generally low skill: the most common responses regarding exposure were “up to 1 year” (n=3) and “1-3 years” (n=3) of Spanish instruction, no participant reported conversational ability in Spanish, and only four reported the ability to form simple sentences in Spanish.

2.2 Tasks

Tasks were created to fit the following criteria:

1) Authentic to the Spanish language learning context
2) Accomplishable by learners with little to no background in Spanish
3) Accomplishable within the chosen websites without using outside resources or prior knowledge

Additionally, care was taken to balance the most basic dictionary capability, finding the meaning of individual words, with the most advanced features provided by all three sites.

Omission of an Exploratory Task

In usability testing, it is quite common for the initial task to be exploratory in nature, with participants given free rein to peruse the site, discover its features, and provide commentary on first impressions (Nielsen et al. 2000, Liu et al.
2008, Stevenson & Liu 2010). However, the decision was made to omit an initial exploratory task. As illustrated in Table 1 (the relevant portion is reproduced below in Table 2), visitors on average spend only a short time on these sites (mean = 4 min, 34 sec). Likewise, they spend little effort exploring a site, as evidenced by a low number of page views per visitor (mean = 3.567) and a high bounce rate (defined as the percentage of visits to the site that consist of a single page view; mean = 46.97%). Thus, in an authentic usage scenario, it is unlikely that a user would spend much time familiarizing himself with the site before attempting to achieve his goal.

Table 2: Website traffic patterns

Data Type                     WordReference   SpanishDict   Collins
Bounce Rate                   41.8%           43.2%         55.9%
Daily Page Views per Visitor  4.03            3.92          2.75
Daily Time on Site            5 min, 7 sec    5 min         3 min, 34 sec

Pilot Testing

The initial task list was tested in a pilot study, resulting in two significant changes. First, the number of items per task was decreased to a more manageable number. Second, the final, open-ended task was removed because it was found to be beyond the skill of a novice language learner. For a reproduction of the initial task list, please see Appendix A. A new open-ended task was developed, and the entire revised task list (see Appendix B) was tested in a second pilot study. As a result of the second pilot, time limits for each task were determined. Again, the open-ended task was found to require knowledge of Spanish beyond that which could be expected of the novice learner, and it was removed, resulting in a finalized list of five tasks, reproduced below (for the full version, including the content of individual items, please refer to Appendix C).

Final Task List

Task 1: Spanish-English translation
Directions: Find an appropriate English translation of a Spanish word, given the context of a sentence.
Sample item: 1.
Soy alérgico a las plumas en esta almohada.
   a. I am allergic to the ______________ in this pillow.

Task 2: English-Spanish translation
Directions: Find an appropriate Spanish translation of an English word, given the context of a sentence.
Sample item:
1. The soccer game ended in a tie.
   a. El partido de fútbol terminó en un _______________________.

Task 3: Verb conjugation
Directions: You are writing a sentence in Spanish but you need to look up a specific verb and how to conjugate it.
Sample item:
1. Henry gave me a surprise birthday present.
   a. Verb (él/preterit): ____________________

Task 4: Dialects
Directions: You have a pen pal in Mexico. Find out how he would say the following words in his dialect.
Sample item:
1. Trunk (of a car): ________________

Task 5: Idioms
Directions: You are writing a sentence in Spanish and need to find a culturally appropriate word or phrase to replace the following English idioms.
Sample item:
1. That test was a piece of cake.
   a. Ese examen _____________________________________________.

2.3 Instruments

Table 3 provides an overview of the research questions investigated and their corresponding data sources. Each data collection method is described in detail below.

Table 3: Research questions matched to data collection methods

#   Analysis Level   Research Question
1   Site             How easy is it to use the site to complete the tasks?
2   Site             What are the most helpful and obstructive features that participants encounter when using the site?
3   Site             What are participants’ reactions to the site?
4   Comparative      Which website produced superior performance results?
5   Comparative      Which website is viewed most favorably by participants?

Data collection methods: completion rate, accuracy rate, searches per item, completion time, observations, participant comments, post-task usefulness rating, and posttest satisfaction ratings.

Observation Record Form

A form was created to facilitate the process of taking notes during observation of testing.
For each task, this form included spaces to record the following (see Appendix D for a sample page):

1. Prompts used
2. Completion time
3. Accuracy per item
   Accuracy was determined based on the participants’ ability to interpret the site without the use of prior knowledge of Spanish. As such, certain morphological mistakes regarding the number (singular vs. plural) of a noun or the conjugation of a verb (except in Task 3) were accepted as correct. Full (2 points), partial (1 point), and no credit were given as follows:
   Tasks 1 & 2
      Full credit: correct meaning for the context
      Partial credit: a correct translation, but incorrect meaning for the context
      No credit: did not complete, or answer is not a translation of the target
   Task 3
      Full credit: correct meaning for the context and correct tense
      Partial credit: correct tense + incorrect meaning, OR incorrect tense + correct meaning
      No credit: did not complete, or incorrect tense + incorrect meaning
   Task 4
      Full credit: correct word in the specified dialect
      Partial credit: a correct translation, but not the dialectal form specified
      No credit: did not complete, or incorrect translation
   Task 5 (no partial credit given)
      Full credit: an appropriate Spanish word or phrase
      No credit: did not complete, or inappropriate word or phrase (literal translation or different meaning)
4. Observations and comments made by participants
   All search queries made were recorded in this section.
5. Participants’ post-task rating of the usefulness of the site
   Immediately after completing a task, participants orally responded to the question, “On a scale of 1 (useless) to 5 (very useful), how would you rate the usefulness of this site in performing this task?”

Questionnaires

Three questionnaires were created to assess the participants’ background knowledge, skills, and habits; pretest attitudes; and posttest attitudes and impressions.
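The accuracy rubric above amounts to a small scoring function. The sketch below is illustrative only, not code used in the study: the function names and boolean inputs are ours, and the underlying judgments (whether a meaning fits the context, whether the tense is correct, and so on) were made by the researcher, not computed.

```python
# Sketch of the accuracy rubric from the Observation Record Form.
# Point values (2 full / 1 partial / 0 none) follow the rubric above;
# all function names and boolean flags are hypothetical.

def score_translation(correct_meaning: bool, correct_translation: bool) -> int:
    """Tasks 1 & 2: full credit for the right meaning in context,
    partial credit for a valid translation with the wrong meaning."""
    if correct_meaning:
        return 2
    if correct_translation:
        return 1
    return 0

def score_conjugation(correct_meaning: bool, correct_tense: bool) -> int:
    """Task 3: meaning and tense both correct for full credit,
    either one correct for partial credit."""
    if correct_meaning and correct_tense:
        return 2
    if correct_meaning or correct_tense:
        return 1
    return 0

def score_dialect(correct_dialect_form: bool, correct_translation: bool) -> int:
    """Task 4: the specified dialectal form earns full credit,
    any other valid translation earns partial credit."""
    if correct_dialect_form:
        return 2
    if correct_translation:
        return 1
    return 0

def score_idiom(appropriate_phrase: bool) -> int:
    """Task 5: no partial credit."""
    return 2 if appropriate_phrase else 0
```

Expressed this way, the rubric makes clear that every task except Task 5 distinguishes a contextually right answer from a merely plausible one.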
These questionnaires were largely modeled after those presented in Liu et al. (2008).

Background

The background questionnaire aimed to establish an understanding of participants’ background in three key areas (sample questions follow; for a full reproduction of the Background Questionnaire, see Appendix E):

1. Internet use
2. Multilingualism
3. Spanish

For the first, it was reasoned that inexperience with the Internet would negatively affect testing results; this portion served to screen out participants without the requisite experience and skill in using the Internet for general purposes. For the second, it was hypothesized that language-learning experience might influence the results of the usability test. Rather than try to control for this variable, it was decided to measure participants’ experience with languages other than their native language, in order to investigate its relationship, if any, with usability testing outcomes. Finally, the tasks were created for participants with little to no Spanish ability; conversational ability in Spanish would have excluded potential participants from the study.

Pretest

In the pretest questionnaire, participants self-reported their language translation habits in terms of frequency and preferences. In addition, they reported how useful they perceived an online multilingual dictionary to be for specific tasks related to the task list used in this study. Sample questions from each group are reproduced below, but the full pretest questionnaire can be found in Appendix F.

Posttest

The posttest questionnaire served to measure participants’ attitudes and impressions regarding specific characteristics and functionalities of the site. Participants rated their degree of satisfaction with attributes related to the information, interaction, and interface of the site.
The pretest question regarding perceived usefulness for specific tasks was reproduced verbatim in order to see how participants’ impressions of their particular site compared with their expectations of online bilingual dictionary sites in general. Participants were given an open-ended forum to describe the best and worst features of the site. Finally, they rated the likelihood that they would return to the site when presented with a task similar to those completed during the usability testing. Samples of three different question formats are shown below; the entire posttest is provided in Appendix G.

2.4 Testing Procedures

Participants were contacted about participating in this study via email. They were informed of the general topic of the study and responded to a short survey about their interest in participating, their availability, and their experience with Spanish. At this time, any potential participant who reported conversational ability or fluency in Spanish was excluded from the study.

Participants who were retained met with the investigator for one hour to complete the usability test. After introductions, the participant completed the background and pretest questionnaires. The orientation script was read to the participants, and they gave their verbal permission for the session to be recorded. The script described the general goal of the study, the role of the moderator, their role as participants, and the procedures to be followed for the rest of the session. Participants were asked to think aloud during the test and to behave as they normally would under non-testing conditions. The orientation script was modified from a model provided by Rubin & Chisnell (2008) and can be found in Appendix H.

Participants were randomly assigned a website to review and asked if they had ever used the site before. If they had, they would have been reassigned to a different website, but this did not occur during testing.
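The assignment rule just described (random assignment, with reassignment if the participant already knows the site) can be sketched in a few lines. This is a hypothetical illustration of the procedure, not code from the study; all names are ours.

```python
# Sketch of the site-assignment procedure: draw a random site, and if the
# participant has used it before, redraw from the sites they have not used.
import random

SITES = ["WordReference", "SpanishDict", "Collins"]

def assign_site(used_before: set, rng: random.Random) -> str:
    """Randomly assign one of the three sites, avoiding any site the
    participant reports having used before."""
    choice = rng.choice(SITES)
    if choice in used_before:
        unused = [s for s in SITES if s not in used_before]
        choice = rng.choice(unused)
    return choice
```

In the study itself, the fallback branch was never exercised: no participant reported prior use of their assigned site.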
During testing, the moderator read each question out loud. Participants had access to the full task list (Appendix C) and used this document to record their answers to each item. The moderator kept time on a per-item basis and recorded observations. There were two general instances in which the moderator would interject with a prompt:

1. The participant had abandoned the think-aloud technique.
2. The participant had completed all of the items for a particular task with time to spare, but had achieved below 50% accuracy.

In the first case, the moderator used discretion so as not to harass participants with repeated requests to think aloud, and did not interrupt them if they were deep in concentration. In the second case, intervention was deemed necessary in order to simulate the level of motivation that an actual user might feel. As discussed in Nielsen et al. (2000), the usability testing situation is inherently artificial: no matter the effort put into creating authentic tasks, testers will never be as motivated to complete them as they would real tasks encountered in their daily lives. Adding to this general weakness of usability studies, this particular study involved participants who were not active learners of Spanish, so their motivation was considered a cause for concern. The second prompt scenario served to modify participants’ behavior to bring it into closer alignment with how actual, motivated learners would use the site. Hence, when they had completed a task with no attempt at quality (a behavior distinct from that of motivated users), they were prompted to revisit the task with more thoroughness (a behavior congruent with that of motivated learners). The actual prompt used was, “If you were going to check your work, how would you do that?”

After completing the tasks, the participant completed the posttest questionnaire.
They were asked to think out loud as they completed the questionnaire and to elaborate on their responses. The moderator took notes on their elaborations and asked clarifying and follow-up questions when necessary. In this way, the posttest and debriefing were combined into a seamless experience. The time schedule for the entire testing event can be found in Appendix I.

RESULTS

All participant data can be provided upon request; see the Author Information section for contact details.

3.1 Pretest

Participants reported more frequent use of language translators (m=2.56) than language dictionaries (m=2.33). Likewise, responses showed a clear preference for online translators (m=2.44) over multilingual dictionaries (m=2.17).

Table 4: Frequency of use and preference for online translators vs. online multilingual dictionaries.

                          WordReference   SpanishDict   Collins   Average
Frequency   Translators   2.33            2.67          2.67      2.56
            Dictionaries  2.00            2.67          2.33      2.33
Preference  Translators   2.33            2.67          2.34      2.45
            Dictionaries  2.17            2.50          1.83      2.17

In keeping with this trend, participants perceived online multilingual dictionaries to have more value for common tasks, such as translating a single word (m=4.67), differentiating between multiple meanings of a word (m=4.44), and conjugating verbs (m=4.56), than for less common tasks, including finding a term belonging to a particular dialect (m=4.11), translating phrases or idioms (m=4.33), and clarifying a confusing usage or grammar point (m=3.78).

Table 5: Perceived value of online bilingual dictionaries for specific tasks.
                                            WordReference   SpanishDict   Collins   Average
Translating a single word                   4.67            4.67          4.67      4.67
Differentiating between multiple meanings   4.67            4.00          4.67      4.45
Conjugating verbs                           4.33            4.67          4.67      4.56
Finding a term belonging to a dialect       3.33            4.00          5.00      4.11
Translating idioms                          4.00            4.67          4.33      4.33
Clarifying usage and grammar                3.67            3.67          4.00      3.78

3.2 Site Analysis

This section analyzes the three websites one at a time, from the perspective of the five tasks explored during usability testing, in order to answer the first three research questions:

1. How easy is it to use the site to complete the tasks?
2. What are the most helpful and obstructive features that participants encounter when using the site?
3. What are participants’ reactions to the site?

For each site, the first section summarizes participant performance on the basic functionality of online bilingual dictionaries: translating a single word from Spanish to English (Task 1) and English to Spanish (Task 2). The second section summarizes performance on the more advanced functionality of these sites: conjugating verbs (Task 3), finding dialectal variations (Task 4), and translating idioms (Task 5). The third section details the most helpful and obstructive features of the site in performing these tasks, concentrating on the three main elements common to all five tasks: searching, interpreting the results page, and navigating the site.

WordReference

Basic Functionality

Using WordReference to translate a word from Spanish to English (Task 1) and English to Spanish (Task 2), the basic functionality of a bilingual dictionary, proved to be relatively easy for the participants. Data from these tasks are summarized in Table 6, below. All participants completed all items in these tasks within the time allotted. The accuracy rates for Tasks 1 and 2 were 88.33% and 79.17%, respectively, while average usefulness ratings were 2.67 and 3.33.
The fact that usefulness ratings were inversely related to accuracy may seem surprising, but can possibly be explained by other factors. Specifically, while accuracy on Task 1 was higher than on Task 2, Task 1 took longer on average to complete (m=79.10% of total time vs. m=64.31%) and required more searches per item (m=1.92 vs. m=1.25). Usefulness ratings thus track searches per item and completion time rather than accuracy, a trend in our data suggesting that participants had little awareness of the accuracy of their own responses.

Table 6: Summary of participant performance on Tasks 1 & 2 for WordReference.

Task   Num. of   Avg. Num. of      Avg. Searches   Completion Time     Accuracy   Usefulness
       Items     Items Completed   per Item        (percent of max.)   Rate       Rating
1      4         4.00              1.92            79.10%              88.33%     2.67
2      4         4.00              1.25            64.31%              79.17%     3.33

Usefulness was measured on a 5-point Likert scale from useless (1) to very useful (5).

Advanced Functionality

Tasks 3, 4, and 5, which involved conjugating verbs, finding dialectal variations, and translating idioms, posed a greater challenge to the participants using WordReference (see Table 7 below). Finding dialectal variations of words (Task 4) was the easiest of the three, averaging only 1.11 searches per item. On average, participants completed this task in 52.32% of the allotted time with 72.22% accuracy. In accordance with these other indicators of task ease, the average usefulness rating given was 4.00, the highest of all tasks for this website. Conjugating verbs and translating idioms proved much more difficult. Compared to Task 4, Tasks 3 and 5 required more searches (m=1.56 and m=1.94) and more time to complete (m=82.83% and m=89.44%). Not surprisingly, accuracy rates (m=54.17% and m=66.67%) and usefulness ratings (m=2.67 and m=2.00) were much lower.

Table 7: Summary of participant performance on Tasks 3, 4, & 5 for WordReference.

Task   Num. of   Avg. Num. of      Avg. Searches   Completion Time     Accuracy   Usefulness
       Items     Items Completed   per Item        (percent of max.)   Rate       Rating
3      4         3.33              1.56            82.83%              54.17%     2.67
4      3         3.00              1.11            52.32%              72.22%     4.00
5      3         2.67              1.94            89.44%              66.67%     2.00

Summary of Features

Searching

Searching on WordReference was enhanced by the site’s auto-detection capabilities. Specifically, once the user had accessed a certain bilingual dictionary (in our case English-Spanish), the site was able to detect the language of the search query and respond accordingly. This meant that participants did not have to bother with switching from one language to the other, which saved time and minimized searching errors. Additionally, the site provided automatic suggestions as letters were entered into the search box, which again saved participants time by not requiring them to type the whole word, and also minimized errors due to incorrect spelling.

Image 4: Illustration of the language auto-detect feature and word auto-suggest on WordReference.

Despite these helpful features, searching on WordReference did produce some frustration in the participants. This occurred when the site returned a results page that was related to, but did not exactly match, the query: when a participant typed in a conjugated verb (the site returns the infinitive form), a plural (the site returns the singular form), or more than one word at a time (the site returns the first word in the search query). These search returns are understandable in light of the standard function of a dictionary, which has one entry for a verb rather than dozens for each inflectional form, but they ran contrary to participants’ expectations, most likely because of the group’s previous familiarity with online machine translators. Notably, the reason for the search return was not provided to the user, and it often took participants some time to realize that the results did not match their expectations.
Image 5: Illustration of searching for multiple words on WordReference. In the first image, "between a rock and a hard place" is typed into the search box. The second image displays the results page returned. Notice that the site automatically redirects to the results page for the first word in the query, "between".

Results Pages

The results pages on WordReference were information-rich, yet confusing for the participants. As participant 2 explained, "The information is there, but it's not too accessible". While participants often found the usage examples, grammatical information, and comprehensiveness of the results helpful in completing the tasks, they were also overwhelmed by the sheer amount of information presented at once and felt its organization lacked clarity. With so many meanings displayed on the page, participants often overlooked the one they were looking for, either because it required extensive scrolling to find (i.e., it was located far below the fold) or because the text was difficult to read at a glance. The information is organized into columns, but these lack headings, and the two equivalent words are located on opposite sides of the page, separated by grammar information and usage examples. One participant even had difficulty distinguishing the relevant Spanish word from all the other information provided. The grammar explanations were often given as abbreviations that participants found confusing, and in general the visual cues on the page provided inadequate support to the novice user.

Image 6: Illustration of the text-rich results page on WordReference.

Finally, the language of the explanatory text varied automatically depending on whether the searched word was in English or Spanish. Thus, participants found translating from English to Spanish far easier than the reverse.
This held true even in the conjugation tables, although an English speaker is far more likely to look up the conjugation of a Spanish verb than a Spanish speaker is. At one point participant 1 asked in frustration, "Is this site written for people who speak Spanish?"

Image 7: The Spanish-heavy results pages on WordReference. The left and right panels present a typical results page for a Spanish word and the conjugation tables for a Spanish word, respectively.

Navigating

WordReference had some very helpful navigation features that the participants appreciated, but that nonetheless had usability flaws that could be improved upon. First, participants liked that the search bar was present and prominently located on all results pages, minimizing the clicks needed to begin a new search. However, in certain cases the search narrowed the results in a way that was not transparent to users. Specifically, if a user initiated a search from the results page of a monolingual dictionary or from the conjugation page, the search returned a result within that section rather than referencing the wider site. This produced error messages that misled participants into thinking a word could not be found on the site, when in fact they were "stuck" in the wrong part of the website.

Image 8: Illustration of a participant getting "stuck" in the monolingual Spanish dictionary section of WordReference. On the left, the user has typed an English noun into the search bar, expecting to be taken to the English-Spanish results page. On the right, no results are returned because the search was limited to the monolingual Spanish dictionary.

In another example of helpful functionality with room for improvement, the results pages provided deep-linked text that allowed participants to easily investigate the meanings of words without additional typing.
However, it was not always apparent to users which words were clickable, indicating inadequate visual cues. Because of this, the deep links sometimes caused unintended navigation.

Image 9: Misleading visual cues on the results page of WordReference. Every word on this page is clickable, though only some appear so.

Finally, specific features of the site were difficult for the participants to find, including the conjugation tables, compound forms of words, and the discussion forum. Because these features are so helpful to the language learner, their low visibility represents a serious usability flaw. In the author's experience, the WordReference forum is perhaps the most highly regarded Spanish language forum on the web, yet none of the three participants found it during testing. As participant 2 summarized, "It looks like a lot of expertise and thought went into the initial iteration of this site. I am hopeful that they can make further improvements to make things more navigable."

Image 10: Low visibility for the conjugation tool on WordReference. Participants did not immediately notice the Conjugator link at the top of the page. Clicking the arrow would also have brought the user to the conjugation tables, but only one participant discovered this by, in her words, "happy accident".

Image 11: Finding the compound forms of words and the discussion forum required extensive scrolling.

SpanishDict

Basic Functionality

Using SpanishDict, participants completed Tasks 1 & 2 with the highest combined accuracy (m=79.17% and m=95.83%) in the shortest amount of time (m=62.50% and m=68.89% of the maximum allotted) of the three groups. The number of searches required was also low (m=1.42 and m=1.17). Consequently, SpanishDict received the highest usefulness ratings of any site on any task for Tasks 1 & 2 (m=4.33 for both).
However, it is worth noting that WordReference (on Tasks 1 & 2) and Collins (on Task 2) produced accuracy rates equal to or higher than SpanishDict's accuracy on Task 1, and yet their usefulness ratings for those tasks were not as favorable.

Table 8: Summary of participant performance on Tasks 1 & 2 for SpanishDict.

Task | Num. of Items | Avg. Num. of Items Completed | Avg. Searches per Item | Completion Time (% of max.) | Accuracy Rate | Usefulness Rating
1 | 4 | 4.00 | 1.42 | 62.50% | 79.17% | 4.33
2 | 4 | 4.00 | 1.17 | 68.89% | 95.83% | 4.33

Advanced Functionality

SpanishDict did not yield results as consistent on the more advanced tasks as it did on basic dictionary searches. While accuracy on Tasks 3 & 5 was reasonably high (m=79.17% and m=66.67%), usefulness ratings were much lower (m=3.00 and m=2.00). These ratings are most likely explained by the extra time (m=86.95% and m=87.08% of the maximum allotted) and searches (m=2.33 and m=1.67) required to complete these tasks. Task 4, finding dialectal information, was clearly the most difficult to perform on this website, with an average accuracy of 38.89%. Two out of three participants exceeded the maximum time allotted for this task. These data make clear why the website received the lowest usefulness rating of any task on any website (m=1.67) for Task 4.

Table 9: Summary of participant performance on Tasks 3, 4, & 5 for SpanishDict.

Task | Num. of Items | Avg. Num. of Items Completed | Avg. Searches per Item | Completion Time (% of max.) | Accuracy Rate | Usefulness Rating
3 | 4 | 3.33 | 2.33 | 86.95% | 79.17% | 3.00
4 | 3 | 2.00 | 1.78 | 97.50% | 38.89% | 1.67
5 | 3 | 2.67 | 1.67 | 87.08% | 66.67% | 2.00

Summary of Features

Searching

The search function on SpanishDict received the highest ratings from participants of the three websites reviewed; however, it too was not without flaws. Its language auto-detect and word auto-suggest functioned similarly to those of WordReference, and participants viewed them as helpful.
In contrast to WordReference, SpanishDict has both a translator function and a dictionary function. These are integrated seamlessly: either the dictionary or the translator is activated depending on whether the search contains one word or multiple words. The two modes are mutually exclusive and automatic, meaning that the user cannot choose between them. Nowhere on the site is this explained, and as a result, even by the end of the testing session none of the participants could articulate how the search function worked.

While participants generally liked being able to search using more than one word (a common strategy in this group was to type an entire sentence into the search box), the results were often misleading or incomplete. For instance, Tasks 1 & 2 featured ambiguous words with many meanings, which the dictionary explained clearly but which were absent from the machine-produced translations. Similarly, dialectal information was not available in translate mode, and idioms were translated literally. In fact, the only task for which the translation mode proved more helpful than the dictionary mode was verb conjugation (Task 3), as evidenced by the vastly higher accuracy rate for the SpanishDict group (m=79.17%) versus the WordReference and Collins groups (m=54.17% and m=33.33%, respectively).

Image 12: A side-by-side comparison of the dictionary and translator results pages on SpanishDict.

Results Pages

The results pages of the SpanishDict dictionary had flaws similar to those discussed for WordReference, including an overwhelming amount of information, unclear organization, and confusing abbreviations. One feature of mixed helpfulness was the verb table featured at the bottom of the dictionary pages for verbs. One participant found it very useful, but two participants never found it at all because of its location at the bottom of the page.
Additionally, the table appeared only when the infinitive form of the verb was searched; a search for an inflected form did not produce the table. The verb tables were laid out similarly to WordReference's, but with English headings and rollover help captions, which helped participants interpret them.

Image 13: Side-by-side comparison of results pages for a verb in infinitive (left, verb table present) and inflected (right, verb table absent) forms.

In contrast to the dictionary results page, the translator results page provided much less information. This was sometimes regarded positively, when participants felt that the answer was clear. Participants liked that three different translators were used. Participant 4 explained that she made her choices based on frequency: "If two out of three translators say it, that's probably right". Participants felt especially confident when all three translators produced the same result. However, these pages did not give participants enough information to evaluate the quality of the results. Several times participants indicated that they were not confident in their answers, yet could not figure out a way to verify them.

Navigating

Navigation on SpanishDict was generally viewed more favorably by its testers than navigation on the other two sites. Participants made use of the Translate and Conjugate tabs located at the top of the page, and of the ever-present search bar. However, some of the most helpful features of the site were harder to find. SpanishDict featured both a phrasebook and a language forum. Though these features were accessible through the top navigation tabs (the forum through the "Q & A" tab and the phrasebook through "More → Phrasebook"), none of the participants used these navigational paths. Instead, the participants accessed the phrasebook and forum through links posted at the bottom of the results pages.
While this did eventually lead to successful use of these features, participants were not always quick to notice the links.

Image 14: Two ways to access the Phrasebook and Forum on SpanishDict. On the left, the top navigation tabs, used by none of the participants. On the right, links to the phrasebook and forum placed below the fold.

Collins

Basic Functionality

On the basic tasks of supplying bilingual definitions of English and Spanish words, Collins performed similarly to, though not quite as well as, the other two sites. Accuracy rates for Tasks 1 & 2 were 75.00% and 79.17%, but this website required the most time (m=86.32% and m=72.50% of the maximum allotted) and searches (m=1.67 and m=1.75) for these two tasks combined. Despite this, usefulness ratings were actually higher for Collins than for WordReference (m=3.67 for both tasks).

Table 10: Summary of participant performance on Tasks 1 & 2 for Collins.

Task | Num. of Items | Avg. Num. of Items Completed | Avg. Searches per Item | Completion Time (% of max.) | Accuracy Rate | Usefulness Rating
1 | 4 | 4.00 | 1.67 | 86.32% | 75.00% | 3.67
2 | 4 | 4.00 | 1.75 | 72.50% | 79.17% | 3.67

Advanced Functionality

The advanced features of Collins were not easily usable by the participants. The numbers of searches required for these tasks were the highest of any task on any site (m=2.22, m=2.44, and m=3.62). On Task 3, all participants exceeded the maximum time allowance, and on Task 5, two out of three participants abandoned the task before completion. It is no surprise, then, that accuracy (m=33.33% and m=44.44%) and usefulness ratings (m=1.67 and m=2.00) for Tasks 3 and 5 were quite low. Task 4, however, proved somewhat easier. While one participant abandoned this task, the other two completed it, resulting in an overall accuracy rate of 66.67% and an average usefulness rating of 3.67.

Table 11: Summary of participant performance on Tasks 3, 4, & 5 for Collins.

Task | Num. of Items | Avg. Num. of Items Completed | Avg. Searches per Item | Completion Time (% of max.) | Accuracy Rate | Usefulness Rating
3 | 4 | 2.00 | 2.22 | 100.00% | 33.33% | 1.67
4 | 3 | 2.33 | 2.44 | 68.89% | 66.67% | 3.67
5 | 3 | 2.00 | 3.62 | 77.64% | 44.44% | 2.00

Summary of Features

Searching

Collins has both dictionary and translator search capabilities, but unlike on SpanishDict, these functions are accessed separately. The dictionary has word auto-suggest but does not automatically detect language. In fact, the site does not "remember" previous searches, so the user must always take care to select the correct bilingual dictionary and source language; otherwise no results are returned. For the most part, the dictionary accepts only single words in the search query, though it does occasionally accept common phrases. This capability is hidden, however, and participants commonly overgeneralized from negative experiences and presumed that the dictionary would not accept multi-word searches.

Image 15: The dictionary function on Collins requires an initial language designation in order to return a result (no auto-detect).

The translator function is more limited than that of SpanishDict because it provides only a single response. It is also more cumbersome to use because, like the dictionary, it does not automatically detect the language of entry. Furthermore, since the site does not "remember" previous searches, the translator automatically reverts upon re-access to its default setting, English to French. This caused participants much frustration because they did not always think to check the language settings before translating.

Image 16: The translator function requires an initial language designation in order to return a useful result.

Results Pages

Collins's dictionary results pages received the same complaints about information overload, unclear organization, and confusing abbreviations as the other two websites tested.
In addition, the site's use of blue font for some words created a misleading visual cue: participants thought they could click on the words to be taken to other pages, but this was not possible.

One unique feature of the Collins dictionary results page is a pair of sections entitled "Browse nearby words" and "Related terms". The former displays the words (all hyperlinked) that would appear before and after the searched word in an alphabetical dictionary, while the latter displays terms that are morphologically related to the word and phrases that contain it. These features have the potential to be quite helpful, but only one participant found them, because they are located below the fold and off to one side of the page. The one participant to find these tools, participant 8, used them on three separate occasions, indicating that she found them worthwhile.

Image 17: "Related terms" and "Browse nearby words" on Collins.

Navigating

Participants did not find Collins easy to navigate. Although the top tabs for Dictionaries and Translator seemed clear at first, participants soon learned to their frustration that the dictionary page required further navigation via tabs in the middle of the page, making accessing the Spanish-English dictionary a three-click process. Participant 8 eventually decided that the easiest way to reach the Spanish-English dictionary was through the sitemap footer, but neither of the other two participants used this route. In addition, participants found the top tabs misleading, believing they indicated drop-down menus when they did not. Though Collins has verb tables just like WordReference and SpanishDict, none of the participants found this feature, resulting in the lowest accuracy rate of any task on any site for Collins users on Task 3.

Image 18: The verb conjugation tables on Collins were difficult to find.
Finally, though all three websites featured heavy advertising, only on Collins did users actually mistake the ads for part of the site and click on them. Both participant 7 and participant 9 clicked on the ad shown in Image 19.

Image 19: Distracting advertisement on Collins.

3.3 Comparative Analysis

In this section we compare the three sites from quantitative and qualitative perspectives in order to answer the last two research questions:

4. Which website produced superior performance results?
5. Which website was viewed most favorably by participants?

We structure our quantitative findings in terms of completion rate, accuracy of responses, number of searches, and completion time, and our qualitative findings in terms of posttest perceptions of site characteristics, site capabilities, and overall satisfaction.

Completion Rate

The completion rate per task is one indicator of the facility with which participants used the features of a site to achieve objectives. An item was considered incomplete if the participant failed to answer it within the allotted time or abandoned the task before completing the item. WordReference and SpanishDict shared the highest completion rates on four of the five tasks, with WordReference outperforming the other sites on Task 4. All participants completed all items for the basic translation tasks, while completion rates for the tasks requiring advanced functionality were lower; in the case of Collins, markedly so.

Table 12: Average percent of items completed, by task and site.

 | WordReference | SpanishDict | Collins | Average – all sites
Task 1 | 100.00% | 100.00% | 100.00% | 100.00%
Task 2 | 100.00% | 100.00% | 100.00% | 100.00%
Task 3 | 83.25% | 83.25% | 50.00% | 72.17%
Task 4 | 100.00% | 66.67% | 77.67% | 81.45%
Task 5 | 89.00% | 89.00% | 66.67% | 81.56%
Average – all tasks | 94.45% | 87.78% | 78.87% | 87.03%

A number in italics indicates that one or more participants abandoned the task before reaching the maximum allotted time.
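As a cross-check on Table 12, the completion rates there follow directly from the per-task averages reported in Tables 6 through 11: average items completed divided by the number of items in the task. The sketch below uses only values copied from those tables (the function names are ours, not part of the study's instruments), and reproduces the report's convention of rounding the per-participant average first:

```python
# Reproduces Table 12's completion rates from the per-task averages
# reported in Tables 6-11 (avg. items completed / items in the task).
# All numeric values are copied from those tables.

ITEMS_PER_TASK = {1: 4, 2: 4, 3: 4, 4: 3, 5: 3}

AVG_COMPLETED = {  # site -> task -> avg. number of items completed
    "WordReference": {1: 4.00, 2: 4.00, 3: 3.33, 4: 3.00, 5: 2.67},
    "SpanishDict":   {1: 4.00, 2: 4.00, 3: 3.33, 4: 2.00, 5: 2.67},
    "Collins":       {1: 4.00, 2: 4.00, 3: 2.00, 4: 2.33, 5: 2.00},
}

def completion_rate(site, task):
    """Percent of items completed, as reported in Table 12."""
    return round(100 * AVG_COMPLETED[site][task] / ITEMS_PER_TASK[task], 2)

def site_average(site):
    """Average completion rate across the five tasks (Table 12, last row)."""
    rates = [completion_rate(site, t) for t in ITEMS_PER_TASK]
    return round(sum(rates) / len(rates), 2)
```

For example, `completion_rate("WordReference", 3)` yields 83.25, and `site_average("Collins")` yields 78.87, matching Table 12.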
Accuracy

While accuracy would be one of the most important considerations for a user of these websites, it was also the metric that participants were least able to judge for themselves. WordReference produced the highest accuracy rates on Tasks 1 and 4, translating a single word from Spanish to English and finding dialectal variations. SpanishDict produced the highest accuracy rates on Tasks 2 & 3, translating a word from English to Spanish and conjugating verbs. The two sites performed equally well on Task 5, translating idioms, and their overall averages across tasks are virtually identical, with just 0.16 percentage points separating them. Collins, however, clearly produced the least accurate responses across all tasks.

Table 13: Percent of items correct, by task and site.

 | Task 1 | Task 2 | Task 3 | Task 4 | Task 5 | Average – all tasks
WordReference | 88.33% | 79.17% | 54.17% | 72.22% | 66.67% | 72.11%
SpanishDict | 79.17% | 95.83% | 79.17% | 38.89% | 66.67% | 71.95%
Collins | 75.00% | 79.17% | 33.33% | 66.67% | 44.44% | 59.72%
Average – all sites | 80.83% | 84.72% | 55.56% | 59.26% | 59.26% |

Number of Searches

The number of searches required per item provides another view of how usable the three websites are. While WordReference required the fewest searches overall and on Tasks 3 and 4, SpanishDict outperformed it on Tasks 1, 2, and 5. Indeed, the number of searches SpanishDict required on Task 3 appears to be an outlier, no doubt related to factors discussed above, such that in general use this website can be considered to require the fewest searches of the three sites tested.

Table 14: Average number of searches per item, by task and site.

 | Task 1 | Task 2 | Task 3 | Task 4 | Task 5 | Average – all tasks
WordReference | 1.92 | 1.25 | 1.58 | 1.11 | 1.94 | 1.56
SpanishDict | 1.42 | 1.17 | 2.33 | 1.78 | 1.67 | 1.67
Collins | 1.67 | 1.75 | 2.22 | 2.44 | 3.61 | 2.34
Average – all sites | 1.67 | 1.39 | 2.04 | 1.78 | 2.41 | 1.86

Completion Time

As discussed above, the time needed to complete a task had a strong bearing on participant satisfaction. On average, participants needed the least time on WordReference, followed by SpanishDict and Collins.

Table 15: Percent of time (based on maximum time allowed) taken to complete each task, by site.

 | WordReference | SpanishDict | Collins | Average – all sites
Task 1 | 79.10% | 62.50% | 86.32% | 75.97%
Task 2 | 64.31% | 68.89% | 72.50% | 68.57%
Task 3 | 82.83% | 86.95% | 100.00% | 89.93%
Task 4 | 52.32% | 97.50% | 68.89% | 72.90%
Task 5 | 89.44% | 87.08% | 77.64% | 84.72%
Average – all tasks | 73.60% | 80.58% | 81.07% | 78.42%

Note: A number in italics indicates that one or more participants from this group abandoned the task without completing all items. In such cases, 100% was used as that individual's percent of time taken.

Site Characteristics

Participants rated site characteristics on a five-point Likert scale from 1 (very unsatisfied) to 5 (very satisfied). The characteristics have been grouped according to the "three I's" of web design: interaction, information, and interface. SpanishDict received the highest ratings on the interaction and interface characteristics. However, WordReference received the highest ratings for information, a fact that aligns well with the finding that WordReference produced the most accurate results of the three websites. It is also notable that WordReference received the lowest ratings of the three sites on interaction and interface, indicating clear priorities for improvement for this site. Likewise, Collins's low ratings on both interaction and information align with the finding that this site produced the lowest overall accuracy rates.
Table 16: Ratings of site characteristics related to the "three I's".

 | WordReference | SpanishDict | Collins | Average – all sites
Interaction: Ease of searching | 2.67 | 3.67 | 2.67 | 3.00
Interaction: Ease of navigating | 2.00 | 3.33 | 2.67 | 2.67
Interaction: Average | 2.34 | 3.50 | 2.67 | 2.84
Information: Ease of finding information on a page | 3.00 | 2.67 | 2.00 | 2.56
Information: Quality of information | 3.67 | 3.33 | 2.67 | 3.22
Information: Average | 3.34 | 3.00 | 2.34 | 2.89
Interface: Ease of reading the text | 3.33 | 4.00 | 3.33 | 3.55
Interface: General appearance | 3.33 | 4.00 | 4.00 | 3.78
Interface: Average | 3.33 | 4.00 | 3.67 | 3.67
Overall 3 I Average | 3.00 | 3.50 | 2.89 | 3.13

Perceived Value for Accomplishing Tasks

Participants rated the value they perceived each site as having for a particular task on a five-point Likert scale from 1 (useless) to 5 (very useful). The results are somewhat surprising. It is understandable that SpanishDict received the highest ratings for the majority of tasks. It is harder to explain why WordReference did not receive the highest rating for any task, while Collins received the highest ratings on two of the tasks. Somehow, WordReference users finished the testing experience less satisfied with the website than data such as completion rate, number of necessary searches, and completion time would indicate.

Table 17: Perceived value of site for specific tasks.

 | WordReference | SpanishDict | Collins | Average – all sites
Translating a single word | 4.00 | 4.67 | 3.67 | 4.11
Differentiating between multiple meanings of a word | 3.67 | 4.33 | 2.67 | 3.56
Conjugating a verb | 2.67 | 3.67 | 1.33 | 2.56
Finding a term belonging to a particular dialect | 3.00 | 1.67 | 3.33 | 2.67
Translating phrases or idioms | 1.67 | 2.33 | 2.33 | 2.11
Clarifying a usage or grammar point I found confusing | 1.33 | 3.33 | 1.00 | 1.89
Average – all tasks | 2.72 | 3.33 | 2.39 |

A comparison of the pretest and posttest data helps to elucidate this peculiar discrepancy.
In viewing the change in perceptions from pretest to posttest, it is clear that the Collins group experienced the greatest dissatisfaction as a percentage of their initial expectations. Notably, posttest impressions of usefulness for all three sites fell below pretest expectations, indicating general dissatisfaction with the sites for the tasks assigned.

Table 18: Percent change in perceived value of the sites tested from pretest to posttest (WR = WordReference, SD = SpanishDict, C = Collins).

 | WR Pre | WR Post | WR %Change | SD Pre | SD Post | SD %Change | C Pre | C Post | C %Change
Translating a single word | 4.67 | 4.00 | -14.35% | 4.67 | 4.67 | 0.00% | 4.67 | 3.67 | -21.41%
Differentiating between multiple meanings of a word | 4.67 | 3.67 | -21.41% | 4.00 | 4.33 | +8.25% | 4.67 | 2.67 | -42.83%
Conjugating a verb | 4.33 | 2.67 | -38.34% | 4.67 | 3.67 | -21.41% | 4.67 | 1.33 | -71.52%
Finding a term belonging to a particular dialect | 3.33 | 3.00 | -9.91% | 4.00 | 1.67 | -58.25% | 5.00 | 3.33 | -33.40%
Translating phrases or idioms | 4.00 | 1.67 | -58.25% | 4.67 | 2.33 | -50.11% | 4.33 | 2.33 | -46.19%
Clarifying a usage or grammar point I found confusing | 3.67 | 1.33 | -63.76% | 3.67 | 3.33 | -9.26% | 4.00 | 1.00 | -75.00%
Average* | 4.11 | 2.72 | -34.34% | 4.28 | 3.33 | -21.80% | 4.56 | 2.39 | -48.39%

*The %Change cells in the Average row represent the average of the six row changes above them, not the percent change of the average perceived value rating from pretest to posttest. For WordReference, for example, the mean of the six row changes is -34.34%, whereas the change in the average rating would be (2.72 - 4.11) / 4.11 = -33.82%.

Overall Satisfaction

Table 19 below gathers a variety of measures of overall satisfaction with the websites, including individual metrics and averages. Not surprisingly, SpanishDict stands out as the clear favorite, no matter how satisfaction is measured.

Table 19: Summary of all satisfaction-related data.

 | WordReference | SpanishDict | Collins
3 I Average | 3.00 | 3.50 | 2.89
Average perceived usefulness – During testing | 2.93 | 3.07 | 2.94
Average perceived usefulness – Posttest | 2.72 | 3.33 | 2.39
Overall User Experience | 2.67 | 3.33 | 2.67
Likelihood of returning | 2.00 | 2.67 | 2.00
Average – all metrics | 2.66 | 3.18 | 2.58

Overall User Experience was rated on the same scale described in the Site Characteristics section above. Likelihood of returning was rated on a five-point scale from 1 (no way) to 5 (I would definitely use this site).

DISCUSSION

4.1 Website Performance and User Perceptions

This study produced some interesting findings that shed light on the relationship between website performance and user perceptions. In sum, the accuracy of the results produced showed no direct relationship with user perceptions of a site, while the amount of time and number of searches needed to complete a task correlated closely with users' perceptions of their experience. These findings indicate a potential weakness of usability tests that rely on user perceptions alone. In the long run, a user who discovers that the information he has been gathering from a bilingual dictionary website is inaccurate will abandon the site, no matter how positively he initially perceived the experience. Because usability testing takes place in an isolated context, researchers should make every effort to gather a variety of types of data to provide the most accurate appraisal of a website's usability.

4.2 Interaction, Information, and Interface

Interaction, information, and interface are the three elements that make up any website design. Information can be considered the substance of a website, whereas interaction is the medium through which the user and the website communicate. Interface, finally, provides the user with access to the first two. A usability flaw in any of these "three I's" will negatively impact the others, just as an improvement in one domain may mitigate the problems experienced with another.

Participants in the WordReference group showed some awareness of the high quality of its information, rating this domain the highest of the site's "three I's".
However, their negative experiences with the site's interaction and interface left their overall satisfaction lower than that of SpanishDict's participants, despite comparable or even superior performance by WordReference in quantitative terms.

Collins provided access to the same information as the other two sites (the task development process ensured that all items could be completed using any of the sites), yet participants in this group rated its information far below the ratings received by WordReference and SpanishDict. In Collins, then, we see another case of poor interaction and interface design undermining the usability of a website's information.

SpanishDict provides a very different example of the interplay among the "three I's". The information it provided to participants was in fact equal in quality to that of WordReference, yet participants' overall positive experiences with the site's interaction and interface led them to rate SpanishDict more highly as a whole.

4.3 Recommendations

The websites tested in this study all have their strengths and weaknesses across the "three I's". Because these elements are integrated in the user experience, and any design alteration will have repercussions in all three, the recommendations in this study are organized around the domains of user experience analyzed in the Results section: searching, results pages, and navigation.

Searching

Data gathered during this study clearly indicate that all three sites have room for improvement in their search functions. In all three cases, information and interaction seem to be the sources of frustration, while the interface itself functions as intended by designers and expected by users.
The recommendation of this paper is to take the best elements of each site – WordReference almost never returns an error message, SpanishDict accepts both single- and multi-word search queries, and Collins clearly differentiates between its dictionary and translator functions – and improve them further according to recommendations developed by Nielsen et al. (2000). Our proposed best practice, to be adopted by any and all of these sites, would combine the following characteristics (the sites listed in parentheses already have them to varying degrees):

1. Provide a clearly visible search box on every page. (WordReference, SpanishDict, Collins)
2. Auto-detect the language of input. (WordReference, SpanishDict)
3. Auto-suggest search terms. (WordReference, SpanishDict, Collins)
4. Provide a relevant search result whenever possible. (WordReference, SpanishDict)
5. Accept both single- and multi-word search queries. (SpanishDict)
6. Allow the user to choose between dictionary and translator modes. (Collins)
7. When a result differs from the original query (e.g. when the singular form of a noun is returned instead of the plural), provide detailed feedback on why that particular result was returned.
8. When no result can be returned, provide detailed feedback indicating why, and suggest alternative strategies, such as the other search mode (dictionary or translator), the language forum, the conjugation tables, or other features such as SpanishDict's phrasebook.
9. Provide "Advanced Search" functionality so that users can narrow down the number of meanings returned by such criteria as part of speech, dialect, and compound forms. This should be a separate feature from the main search function.

Results Pages

Whereas the search functions of the three sites differed in their strengths, the complaints participants made about the websites' results pages were remarkably similar.
They complained that the information was poorly organized (information architecture) and 41 confusingly displayed (interface). Our proposed best practice would try to improve upon these flaws by building more interactive features into the results pages, which in their current forms are extremely static. 1. At the top of the results page, provide a hyperlinked menu of all the subcategories of responses (e.g. transitive verb, intransitive verb, compound forms, conjugator, phrasebook, forum, etc.). 2. Use accordion-style organization to minimize the amount of information initially presented on a page, but still allow users to find out more information on an entry without having to navigate to a new page (Tidwell 2011). 3. Provide rollover help on all abbreviations and grammatical terms. 4. Provide a deep-linked state such that users can click on almost any piece of text on the page. (WordReference). 5. Provide clear visual cues to indicate what is clickable or not. 6. Provide “Advanced Search” functionality (described above) on each results page so that users can narrow down results after the initial search. Navigation Adhering to the previous recommendations regarding the search functionality and results pages should result in a decrease in the amount of navigation required of the user to accomplish his or her goal. The most serious usability flaws related to navigation in these sites involved poor visibility of important features in the interface. These features are unique to each site, so our recommendations will address them separately, though together they indicate the desirable best practices for all of these sites. WordReference 1. Provide a tab for the discussion forum on all pages. 2. Feature a link to the discussion forum above the fold on all results pages, as described above. SpanishDict 1. Provide more clearly labeled tabs for the phrasebook and discussion forum on all pages. 2. 
Feature links to the phrasebook and discussion forum above the fold on all results pages, as described above.

Collins
1. Feature links to the "related terms" and "browse nearby words" functions above the fold on all results pages, as described above.

SUMMARY

This study has examined three websites that provide English-Spanish bilingual dictionaries: WordReference, SpanishDict, and Collins. Participants in this usability study used the websites to complete five tasks. Two tasks approximated the most basic functionality for which a Spanish learner would use an online dictionary; the other three involved more advanced features provided by these sites. In general, usability results were more favorable for the basic tasks than for the advanced ones. Qualitative analysis revealed the most helpful and obstructive features of these sites, elucidating the factors contributing to the ease or difficulty of use revealed through the quantitative measures. The three websites were then compared in terms of performance and user preferences. The discussion provided recommendations for improvement and best practices based on these findings.

URLs FOR SITES TESTED

1. WordReference: http://www.wordreference.com/
2. SpanishDict: http://www.spanishdict.com/
3. Collins: http://www.collinsdictionary.com/

REFERENCES

Ambiguous words. (n.d.). Dillfrog. Retrieved November 25, 2013, from http://muse.dillfrog.com/ambiguous_words.php

Analytics for any website. (n.d.). Alexa. Retrieved November 25, 2013, from http://www.alexa.com

Dominus, M. (2007, May 15). Ambiguous words and dictionary hacks. The Universe of Discourse. Retrieved November 1, 2013, from http://blog.plover.com/lang/ambiguous.html

Erichsen, G. (2013, February 1). Which online translator is best? About.com Spanish Language. Retrieved November 25, 2013, from http://spanish.about.com/od/onlinetranslation/a/online-translation.htm

Erichsen, G. (2013, February 19). What is the best online Spanish-English dictionary? About.com Spanish Language.
Retrieved November 25, 2013, from http://spanish.about.com/b/2013/02/19/what-is-the-best-online-spanish-englishdictionary.htm

Gaspari, F., & Hutchins, J. (2007). Online and free! Ten years of online machine translation: Origins, developments, current use and future prospects.

Gaspari, F., & Somers, H. (2007). Making a sow's ear out of a silk purse: (Mis)using online MT services as bilingual dictionaries. ASLIB.

Kellogg, M. (n.d.). About us. WordReference.com. Retrieved November 25, 2013, from http://www.wordreference.com/english/AboutUs.aspx

Liu, M., Traphagan, T., Huh, J., Koh, Y. I., Choi, G., & McGregor, A. (2008). Designing websites for ESL learners: A usability testing study. CALICO Journal, 25(2).

Nielsen, J., Molich, R., Snyder, C., & Farrell, S. (2000). Search. In E-commerce user experience. Nielsen Norman Group.

Regional variations in Spanish words translated from English. (n.d.). Rennert: Breaking the Language Barrier. Retrieved November 25, 2013, from http://www.rennert.com/translations/resources/spanishvariations.htm

Rubin, J., & Chisnell, D. (2008). Handbook of usability testing: How to plan, design, and conduct effective tests (2nd ed.). Indianapolis, IN: Wiley Pub.

Solon, O. (2012, January 3). Collins launches free dictionary site. Wired UK. Retrieved November 25, 2013, from http://www.wired.co.uk/news/archive/2012-01/03/collins-dictionary-online

Stevenson, M. P., & Liu, M. (2010). Learning a language with Web 2.0: Exploring the use of social networking features of foreign language learning websites. CALICO Journal, 27(2), 1-27.

Tidwell, J. (2011). Designing interfaces (2nd ed.). Beijing: O'Reilly.

APPENDIX

Appendix A: Task List (Version 1)

Task 1: Spanish-English translation
Find an appropriate English translation of a Spanish word, given the context of a sentence.

2. El niño se quedó en su cuarto toda la noche.
   a. The boy stayed in his ______________ all night.
3. Ella es celosa en su apoyo al Partido Republicano.
   a.
She is _________________________ in her support of the Republican Party.
4. ¡Este cuadro es una obra de arte!
   a. This ____________________ is a work of art!
5. Las acciones que había comprado declinaron rápidamente en valor.
   a. The ______________ he had bought quickly declined in value.
6. Cuando miró a su muñeca, se dio cuenta de que había olvidado su reloj en casa.
   a. When he looked at his wrist, he realized he had forgotten his __________________ at home.
7. Soy alérgico a las plumas en esta almohada.
   a. I am allergic to the ______________ in this pillow.
8. Hazme el favor de no tocar el piano mientras el bebé está dormido.
   a. Do me the favor of not ___________________ the piano while the baby is asleep.

Task 2: English-Spanish translation
Find an appropriate Spanish translation of an English word, given the context of a sentence.

2. Because it was her birthday, she received a free dessert.
   a. Debido a que era su cumpleaños, recibió un postre _______________.
3. Why don't we take a break?
   a. ¿Por qué no nos tomamos un ___________________?
4. I was not present for the presentation because I was sick.
   a. Yo no estuve _______________ en la presentación porque estaba enfermo.
5. The soccer game ended in a tie.
   a. El partido de fútbol terminó en un _______________________.
6. Without a match I cannot light the candle.
   a. Sin un _____________ no puedo encender la vela.
7. The thief attempted to scale the castle walls.
   a. El ladrón intentó _______________ los muros del castillo.
8. Would you please pour me a glass of that?
   a. ¿Podría servirme un _____________ de eso?
9. Last night I saw a bat on the ground with a broken wing.
   a. Ayer por la noche vi a un ________________ en el suelo con un ala rota.
10. I want to buy curtains that match my sheets.
   a. Quiero comprar cortinas que combinan con mis ____________________.

Task 3: Verb conjugation
You are writing a sentence in Spanish but you need to look up a specific verb and how to conjugate it.

2.
Henry gave me a surprise birthday present.
   a. Verb:
3. My mom makes delicious cookies.
   a. Verb:
4. We ran to the store.
   a. Verb:
5. You draw better than anyone I know.
   a. Verb:
6. Call Cecilia if you need directions.
   a. Verb:
7. I took a shortcut.
   a. Verb:
8. They turned in their papers yesterday.
   a. Verb:
9. Go away.
   a. Verb:

Task 4: Dialects
You have a pen pal in another country (listed in each item). Find out how he would say the following words.

2. Jacket (Mexico): ____________________
3. Baby (Chile): ___________________________
4. Grocery store (Uruguay): ______________________
5. Swimming pool (Mexico): _______________________

Task 5: Idioms
You are writing a sentence in Spanish and need to find a culturally-appropriate word or phrase to replace the following English idioms.

2. That test was a piece of cake.
   a. Ese examen _____________________________________________.
3. Jenny felt like she was between a rock and a hard place.
   a. Jenny se sintió _____________________________________________.
4. That dress fits you like a glove!
   a. ¡Ese vestido _____________________________________________!
5. Jimmy went to cool off in the pool.
   a. Jimmy fue a _____________________ en la piscina.

Task 6: Ill-structured problem
You have used Google Translate to translate a sentence from Spanish to English. However, the sentence doesn't exactly make sense. Use the site to find more appropriate translations for the words in the sentence.

1. Original Spanish: ¿Eres de las mujeres que durante los últimos meses de 2012 se inscribió en el gimnasio para sudar la gota gorda y lograr el ansiado "verano sin pareo"?
   a. Problematic translation: You are of the women that during the last months of 2012 was recorded in the gymnasium to sweat the fat drop and to achieve the desired "summer without matching"?
      i. Problematic word/phrase:
         1. Original Spanish word/phrase:
         2. Proposed better translation: You're one of the women who
      ii. Problematic word/phrase:
         1.
Original Spanish word/phrase:
         2. Proposed better translation: to work up a sweat or just to sweat
      iii. Problematic word/phrase:
         1. Original Spanish word/phrase:
         2. Proposed better translation: bikini summer
      iv. Problematic word/phrase:
         1. Original Spanish word/phrase:
         2. Proposed better translation: joined a gym
      v. Problematic word/phrase:
         1. Original Spanish word/phrase:
         2. Proposed better translation:
      vi. Problematic word/phrase:
         1. Original Spanish word/phrase:
         2. Proposed better translation:
2. Original Spanish: No cabe duda de que en los últimos cinco años, el destino de América Latina ha sido influenciado fuertemente por tres de sus más visionarios y decididos líderes: Hugo Chávez, Rafael Correa y Evo Morales.
   a. Problematic translation: There is not doubt that in the last five years, the destination of Latin America has been influenced hard by three of its most visionary and determined leaders: Moral Hugo Chávez, Rafael Correa y Evo.
      i. Problematic word/phrase:
         1. Original Spanish word/phrase:
         2. Proposed better translation:
      ii. Problematic word/phrase:
         1. Original Spanish word/phrase:
         2. Proposed better translation:
      iii. Problematic word/phrase:
         1. Original Spanish word/phrase:
         2. Proposed better translation:
      iv. Problematic word/phrase:
         1. Original Spanish word/phrase:
         2. Proposed better translation:
      v. Problematic word/phrase:
         1. Original Spanish word/phrase:
         2. Proposed better translation:
      vi. Problematic word/phrase:
         1. Original Spanish word/phrase:
         2. Proposed better translation:

Appendix B: Task List for Usability Test (Version 2)

Task 1: Spanish-English translation
Find an appropriate English translation of a Spanish word, given the context of a sentence.

1. Soy alérgico a las plumas en esta almohada.
   a. I am allergic to the ______________ in this pillow.
2. Hazme el favor de no tocar el piano mientras el bebé está dormido.
   a. Do me the favor of not ___________________ the piano while the baby is asleep.
3.
Ella es celosa en su apoyo al Partido Republicano.
   a. She is _________________________ in her support of the Republican Party.
4. Las acciones que había comprado declinaron rápidamente en valor.
   a. The ______________ he had bought quickly declined in value.

Task 2: English-Spanish translation
Find an appropriate Spanish translation of an English word, given the context of a sentence.

1. The soccer game ended in a tie.
   a. El partido de fútbol terminó en un _______________________.
2. Without a match I cannot light the candle.
   a. Sin un _____________ no puedo encender la vela.
3. The thief attempted to scale the castle walls.
   a. El ladrón intentó _______________ los muros del castillo.
4. I want to buy curtains that match my sheets.
   a. Quiero comprar cortinas que combinan con mis ____________________.

Task 3: Verb conjugation
You are writing a sentence in Spanish but you need to look up a specific verb and how to conjugate it.

1. Henry gave me a surprise birthday present. Verb (él/preterit):
2. You draw better portraits than anyone I know. Verb (tú/present):
3. They turned in their papers yesterday. Verb (ellos/preterit):
4. Call Cecilia if you need directions. Verb (tú/imperative):

Task 4: Dialects
You have a pen pal in Mexico. Find out how he would say the following words.

1. Trunk of a car:
2. Swimming pool:
3. Straw:

Task 5: Idioms
You are writing a sentence in Spanish and need to find a culturally-appropriate word or phrase to replace the following English idioms.

1. That test was a piece of cake.
   a. Ese examen _____________________________________________.
2. Jenny felt like she was between a rock and a hard place.
   a. Jenny se sintió _____________________________________________.
3. That dress fits you like a glove!
   a. ¡Ese vestido _____________________________________________!

Task 6: Open-Ended
Write two sentences in English about your favorite food below. Then, use the site to translate the sentences to Spanish.
English Sentence 1:
English Sentence 2:
Spanish Sentence 1:
Spanish Sentence 2:

Appendix C: Task List for Usability Study (Version 3 – Final)

Task 1: Spanish-English translation
Find an appropriate English translation of a Spanish word, given the context of a sentence.

1. Soy alérgico a las plumas en esta almohada.
   a. I am allergic to the ______________ in this pillow.
2. Hazme el favor de no tocar el piano mientras el bebé está dormido.
   a. Do me the favor of not ___________________ the piano while the baby is asleep.
3. Ella es celosa en su apoyo al Partido Republicano.
   a. She is _________________________ in her support of the Republican Party.
4. Las acciones que había comprado declinaron rápidamente en valor.
   a. The ______________ he had bought quickly declined in value.

Task 2: English-Spanish translation
Find an appropriate Spanish translation of an English word, given the context of a sentence.

1. The soccer game ended in a tie.
   a. El partido de fútbol terminó en un _______________________.
2. Without a match I cannot light the candle.
   a. Sin un _____________ no puedo encender la vela.
3. The thief attempted to scale the castle walls.
   a. El ladrón intentó _______________ las murallas del castillo.
4. I want to buy curtains that match my sheets.
   a. Quiero comprar cortinas que combinan con mis ____________________.

Task 3: Verb conjugation
You are writing a sentence in Spanish but you need to look up a specific verb and how to conjugate it.

1. Henry gave me a surprise birthday present. Verb (él/preterit):
2. You draw better portraits than anyone I know. Verb (tú/present):
3. They turned in their papers yesterday. Verb (ellos/preterit):
4. Call Cecilia if you need directions. Verb (tú/imperative):

Task 4: Dialects
You have a pen pal in Mexico. Find out how he would say the following words in his dialect.

1. Trunk of a car:
2. Swimming pool:
3.
Drinking straw:

Task 5: Idioms
You are writing a sentence in Spanish and need to find a culturally-appropriate word or phrase to replace the following English idioms.

1. That test was a piece of cake.
   a. Ese examen _____________________________________________.
2. Jenny felt like she was between a rock and a hard place.
   a. Jenny se sintió _____________________________________________.
3. That dress fits like a glove!
   a. ¡Ese vestido _____________________________________________!

Appendix D: Sample Page of Observation Record Form

Appendix E: Background Questionnaire

Appendix F: Pretest Questionnaire

Appendix G: Posttest Questionnaire

Appendix H: Orientation Script

Thank you for agreeing to participate in this usability study. I'll be using a script to ensure that my instructions to everyone who participates in the study are the same.

The purpose of this study is to compare the utility of three online Spanish-English dictionaries for a variety of tasks that are typical of second language learning contexts. You will be testing one of these sites. During the session, I will ask you to use the website to complete the tasks and will observe you while you do them. As you do these things, try to do whatever you would normally do.

Each task has several items. You will have a set amount of time to complete each task. I will keep time and ask occasional questions, but in general I am not supposed to interfere with your interactions with the site.

Please know that I'm not testing you, and there is no such thing as a wrong answer. All of the tasks are possible to complete using the website. However, you may find that some tasks are easier to complete than others. Remember, the relative difficulty of each task is largely due to the design of the website.

I ask that you please think out loud while performing the tasks. Just tell me whatever is going through your mind. Your doing this helps me understand how users interact with the site.
The whole session will take about 60 minutes. Do you have any questions before we begin?

Appendix I: Time Schedule for Testing Event

Event                                          Time allowed
Introduction + Pretest + Orientation Script    10 min
Task 1                                         8 min
Task 2                                         8 min
Task 3                                         10 min
Task 4                                         6 min
Task 5                                         8 min
Posttest + Debriefing                          10 min
Total time                                     60 min

AUTHOR INFORMATION

Elena Winzeler is an M.Ed. candidate in the Learning Technologies program at the University of Texas at Austin. Her academic interests lie in second language acquisition, literacy development, and instructional design. She is a fluent, non-native speaker of Spanish. To contact the author via email, please write to emwinzeler@utexas.edu.

This paper was written by Elena Winzeler for the course EDC385G Designs & Strategies for New Media at the University of Texas at Austin.