2014 3rd International Conference on User Science and Engineering (i-USEr)
978-1-4799-5813-9/14/$31.00 ©2014 IEEE

Wikipedia Search Engine: Interactive Information Retrieval Interface Design

Amanjot Kaur Sandhu1 and Tiewei Liu2
College of Fine Arts1, School of Information2
The University of Texas at Austin, Texas, USA
{1kaur.amanjotsandhu, 2twliu}@utexas.edu

Abstract - The Wikipedia search interface was redesigned for this project on the basis of a literature review. An initial interface was designed with interactive information retrieval features covering the search box, categories, navigation, layout and search result views. Eight randomly selected subjects tested the interface in simulated search tasks and provided their feedback via a post-task questionnaire. A redesigned search user interface was then proposed based on the subjects’ feedback on the initial design. The new interface is expected to meet users’ search preferences and to inform further IIR (Interactive Information Retrieval) interface design.

Keywords - Wikipedia search, interactive information retrieval, search user interface

I. INTRODUCTION

Interactive search user interfaces are a hot topic in information science. Numerous features have been proposed over the past two decades to optimize the user experience. Although most of these features have been adopted by popular search engines, online communities and other websites, it remains questionable to what extent they help in the search process and whether users really like using them. The purpose of this study is to elicit users’ search preferences on a self-designed Wikipedia search interface in simulated search tasks. In the study, users compared and evaluated several key features of interactive search interface design, such as the search box, control features, navigation and layout. The search interface was then redesigned based on the subjects’ feedback. The redesigned interface is expected to be an ideal interactive search interface that meets users’ search preferences. The results of the user tests will also inform further IIR interface design.

The study was intended to answer the following questions based on the experimental results:

i. Are the auto-complete and query clarification features of the search box useful? Subjects were asked to compare a non-interactive search box with an interactive search box offering auto-complete and query clarification.

ii. What kind of categories helps users better in the search? Two sets of categories were shown to subjects when they performed the search tasks. One set was created with a clustering and sorting method. The other was designed according to traditional classification schemes. Subjects were asked which set of categories helped them most in completing the task and why one set was more helpful than the other.

iii. Is our navigation design flexible? Search is a long journey during which users’ information needs change frequently. Users usually jump back and forth from page to page. Subjects were asked to evaluate the flexibility of our crumb box navigation design and other navigation features.

iv. What kind of layout is preferred by users? Subjects were presented with several layout designs of the result page and were asked for their preference. The research question was whether the arrangement of the different elements displayed on the screen (categories, search results, etc.) affected the search experience.

v. Are the different search result views useful to users? Subjects were shown various formats of search results, including text, images and metadata, and were asked whether the different views were useful. This revealed users’ preferences among the formats of information display.

vi. Are there other interactive features users want to see in a search engine? Subjects were asked to recommend interactive features to be added to the interface being tested. The answers reflected users’ preferences and expectations for the IIR interface.

II. RELATED WORK

Information professionals and computer scientists have made many efforts to investigate features that can improve the usability of interactive search engines and the experience of their users.

Russell-Rose and Tate [1] argue that recognition over recall is one of the key principles in HCI. People are better at recognizing things they have previously experienced than at recalling them from memory. Auto-complete accommodates this aspect of human nature by transforming a problem of recall into one of recognition. The auto-complete function gives users the option of either selecting from the suggested list or entering the query in full. Selecting from the list has two advantages: it saves time and keystrokes, and it avoids spelling mistakes and typographic errors. Russell-Rose and Tate distinguish auto-complete from auto-suggest: auto-complete is used for lookup, while auto-suggest accommodates exploratory search [1].

The Scatter/Gather method was perceived by users as more difficult to use than classic information retrieval systems, although it helped subjects accomplish the tasks more efficiently. In fact, Scatter/Gather clustering was particularly useful when users were less familiar with the search tasks [10].

White and Roth [2] discuss query suggestion in the context of exploratory search systems (ESSs), which must offer users the ability to specify information needs. Users’ existing knowledge shapes user-defined queries, which may also limit the opportunity for exploratory search. Query suggestions help users to select additional query terms [2]. When suggestions are generated from historical query log data, they actually narrow a search to target a particular subtopic [2].
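The idea of turning recall into recognition by suggesting completions from historical query logs can be sketched in a few lines. The following is a minimal illustration only, not the mechanism used in the prototype described in this paper: the function names (`build_query_counts`, `suggest`) and the sample query log are hypothetical, and a simple prefix match ranked by frequency is assumed.

```python
from collections import Counter


def build_query_counts(query_log):
    """Count how often each normalized query appears in the log."""
    return Counter(q.strip().lower() for q in query_log if q.strip())


def suggest(prefix, query_counts, limit=5):
    """Return up to `limit` logged queries that start with `prefix`,
    most frequent first."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    matches = [(q, n) for q, n in query_counts.items() if q.startswith(prefix)]
    # Sort by descending frequency, then alphabetically for a stable order.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:limit]]


# Hypothetical query log, invented purely for illustration.
QUERY_LOG = [
    "apple inc", "apple inc", "apple inc",
    "apple fruit", "apple fruit",
    "apple ii games",
    "application software",
]

counts = build_query_counts(QUERY_LOG)
print(suggest("apple", counts))
```

Because the candidates come only from queries that were issued before, the list steers the user towards popular subtopics, which is the narrowing effect noted in [2]; an exploratory auto-suggest feature would need a broader source of candidates.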
Wilson [3] breaks the elements of a search user interface into four main groups (input, control, informational and personalizable) and discusses auto-complete within this framework. Auto-complete guides people towards queries that are likely to work. Since auto-complete provides information to searchers as they query, it “helps make the search box a better Informational feature as well as an Input feature.” Auto-complete can also be personalized with the queries a user has issued in his or her search history [3].

White and Marchionini [4] regard interactive query expansion (IQE) as a useful technique that helps users formulate improved query statements and ultimately retrieve better search results. They introduced a technique called Real-Time Query Expansion (RTQE), which “offers query expansion terms to searchers as they enter queries, and updates following each term to reflect potential completions of the search query”.

Classification systems aim to make information more findable and usable by removing some of the ambiguity of language [5]. Categories, as a manual classification, are sometimes discussed in comparison with the automated grouping approach of clustering. Hearst [6] argues that category systems are usually logical and consistent. They present well-understood and predictable meaning units. Category systems also navigate well in a hierarchical structure. However, documents need to be manually assigned to categories. By contrast, clustering methods are fully automatable, but they are less consistent, coherent and comprehensible. In addition, they usually lack predictability and mix different dimensions simultaneously. Current online clustering systems cannot produce understandable results in a hierarchical structure [6].

Faceted classifications are less applicable when collections are large and unmanaged [3], [7]. Wilson [3] argues that clustering, an approach that automatically identifies attributes of a collection or result set, is more important in this circumstance. However, he admits that the results of automated classification can be highly variable and that it is difficult to generate meaningful groups and effective labels [3].

Clustering has been a popular topic in recent years. It has achieved high precision, recall and efficiency in information retrieval because of its advantages in domain independence, scalability, and the potential to capture meaningful themes within a set of documents [3], [8]. Clustering enables the user to explore a collection through interaction and a form of query preview. However, as Tunkelang [9] states, the clustering-based Scatter/Gather work assumes that documents contain only unstructured text. This minimal data model limits the power of an exploratory interface [9].

Tidwell [11] describes page layout as the art of manipulating the user’s attention on a page to convey meaning, sequence, and points of interaction. Visual hierarchy is an important element of page layout. “A good visual hierarchy gives instant clues about the relative importance of page elements and the relationships among them” [11]. The large text block located in the center of the page is usually the primary content, while small but important items should be put at the top of the page, along the left side or in the top-right corner [11].

In their discussion of faceted search, Russell-Rose and Tate [1] describe three main layout choices: vertical, horizontal and hybrid. The most common vertical layout places facets on the left. “It provides visual coherence that helps reinforce the relationship between the selections made and the results returned” [1]. In addition, it helps maintain visibility if the browser is resized [1]. Some websites choose to display the facets on the right, such as the Harvard University Library and the Edinburgh University Library [12], [13]. In horizontal layouts, facets are placed at the top of the page, which gives them a more dominant and visible position. However, the number of facets shown on the page depends on the width of the page. The facet menu also becomes invisible when users scroll down the results. The hybrid layout combines the features of the vertical and horizontal configurations and arranges the facets both on the sides and at the top.

Tidwell [11] argues that navigating around a website or application is like commuting. A good navigation design should shorten the distance a user must travel in a search. Navigation incurs a cognitive cost, and interface designers should make sure that this cost is not too high [10].

Images and metadata are alternative formats for displaying information. They can also improve the visualization of text data. Russell-Rose and Tate [1] consider these applications at an aggregate level; that is, they are designed to aggregate, organize and summarize data from numerous sources by using data visualizations to communicate key metrics, patterns and overall status [1].

III. METHODOLOGY

Both qualitative and quantitative approaches were implemented in this project.

A. Literature Review
The initial interface was designed based on the literature review. Previous research results concerning users’ information needs and preferences were good resources that helped to understand user behavior [1]. Some frequently recommended features were adopted in the initial design, and the study was intended to obtain users’ feedback on these features.

B. User Test
Eight users were recruited to perform two simulated search tasks on the initial search interface. Each subject answered several post-task questions, and the answers informed how to improve the interface design.

C. Observation
The user tests were observed to better understand the difficulties subjects encountered when completing the tasks on the interface. The notes taken during the sessions provided additional support when analyzing the data collected in the user tests.

D. Statistical Analysis
Statistical methods were used to analyze the data collected in the user tests. The results of the statistical analysis also set the criteria for whether and how to modify the features in the initial design.

IV. RESEARCH PROCESS

There were five stages in the research process: project idea and literature review, initial interface design, user test, data analysis, and redesign of the user interface.

A. Stage 1: Literature Review
The initial idea of the project was to find out users’ preferences regarding the features of interactive search interfaces that are related to categories, including what kinds of categories users prefer and how to display those categories. Books and articles in the fields of interactive search interfaces, information seeking and user experience were read. The bibliography includes the various readings and literature from different resources. In this process, more IIR design features caught our attention. Consequently, an interactive search box and diverse formats of search results (text, image, metadata) were added to the research plan in order to make the search experience more complete for users.

B. Stage 2: Initial Interface Design
After the features to be studied were decided, the initial interface design was made. For convenience of data retrieval, we decided to design a Wikipedia search engine and used data retrieved from Wikipedia, one of the most popular open-access databases. The initial search interface contained the features to be studied, such as auto-complete, query clarification, categories, various search result views and flexible navigation. All the features were incorporated in the minimum number of web pages. Finally, a search engine consisting of 12 pages was created. Two versions of the prototype were created, one with InDesign and the other with Axure.

C. Stage 3: User Test
After the prototype was created, 8 subjects were invited to test the usability of the search interface. The subjects were randomly selected students, ranging from freshmen to PhD students. All are proficient in English. The subjects were informed of the purpose of the research and the procedure of the user study. Subjects were also informed that the study was voluntary and confidential. Each subject signed a consent form before participating in the study. Each subject was then asked to complete two search tasks on the interface.
• Search Task 1: Search for information about the Apple Company.
• Search Task 2: Search for the Apple Company’s Wikipedia images.
The subjects were asked to complete a post-task questionnaire after they had completed both search tasks. The questions covered the key features to be studied in this project. Data collection was anonymous. The subject’s name appeared only on the consent form. No other personal data was recorded. Each subject was assigned a number that was used to record the study result.

D. Stage 4: Data Analysis
The questionnaires collected in the user tests were carefully studied. Each question in the questionnaire was designed to obtain users’ feedback on one of the features studied in the project. Most objective questions were rated on a five-point scale: strongly agree, agree, neutral, disagree and strongly disagree. Subjects’ feedback proved to be stable and consistent in general. For almost all the objective questions, the responses fell overwhelmingly on either the “agree” or the “disagree” side, so users’ preferences regarding the interactive features were clearly visible. Answers to the 3 subjective questions were listed and analyzed one by one. Notes taken during the user tests provided additional support when interpreting the answers to the questionnaire.

E. Stage 5: Redesign of User Interface
The redesign of the Wiki search interface was based on the data collected in the user tests. Design features that received positive feedback were retained, while those that received negative feedback were abandoned. New features recommended by the users were adopted if they were compatible with parts of our original design that received good feedback or if they could replace original features that received bad feedback. The following changes were made to the initial design:

1) Search query clarification page
On this page, the user is asked to clarify the search query, and the various possible categories related to that query are displayed (see Fig. 1). We changed the terminology of some categories in response to the users’ feedback.

Fig. 1. Search query clarification page

2) Search results list page
This page shows the search results in the list view (see Fig. 2). The left side displays all the related categories. The user can also change the language. The user can navigate to three different views: List, Images and Keywords. After user testing, tooltip labels were added to the icons of these views. Users also preferred this layout over the other layouts used during the user testing. They want the categories to be displayed on the left side, as this is the traditional arrangement.

3) Search results images page
This page displays the Wikipedia images of the search results (see Fig. 3). When the mouse hovers over an image, the thumbnail displays a slide show of all the other images of that Wikipedia page. In response to users’ feedback, a snippet of the Wikipedia page was added alongside the image thumbnail. The heading of this page will also be refined so that users can easily recognize what the page is about.
4) Search image details page
The user navigates to this page after clicking on any image of the search results images page (see Fig. 4). All the images of that Wikipedia page are displayed here. The user can click on any thumbnail, and the selected image is displayed in a larger size with the image information below. In response to users’ feedback, a link to the Wikipedia page and a clear back navigation to the previous page were added.

5) Search results keyword page
This page shows all the related keywords of a category (see Fig. 5). It will help researchers and experts to see the full range of keywords related to any category they are searching. In response to users’ feedback, a clear heading will be added so that users can easily understand what the page is about.

Fig. 2. Search results list page
Fig. 3. Search results images page
Fig. 4. Search image details page
Fig. 5. Search results keyword page

V. RESULTS

Each design feature of the search interface was studied with specific objective questions in the questionnaire. In some cases, answers to the subjective questions explained the subjects’ preferences in the objective questions. They also helped the researchers learn users’ preferences regarding other features of interactive search user interfaces which were not considered in the initial design but are of significant importance in IIR study.

A. Search Box
The search box has two interactive features: query auto-complete and query clarification. When asked whether the auto-complete function is useful, 4 of the 8 subjects agreed that it is (see Fig. 6). When answering the subjective questions, one subject suggested adding two options, i.e. text search and image search, to the search box so that users can search within a limited scope from the very beginning.

Fig. 6. Search box query results (bar chart of response counts, from Strongly Agree to Strongly Disagree, for the statements “The auto-complete function is useful in the search task” and “The query clarification is useful in the search task”)

B. Categories
Two pages with two different sets of categories were shown to the subjects. One set was used by another popular Wikipedia search engine, SearchTechnologies, and was generated according to the frequency with which Wikipedia tags were clicked by users. In effect, this method of generating categories is a clustering and sorting of tags. The other set was designed by the researchers based on common understanding and traditional classification principles. The entries included in each set of categories are:

1) SearchTechnologies categories
Apple II Games, Commodore 64 Games, Year of Birth Missing (living people), DOS Games, American Film Actors, Amiga Games, American Television Actors, English-language Films, Mac OS Games, Windows Games, Atari ST Games, IOS Games, 2011 Singles and Atari 8-Bit Family Games.

2) Researchers’ categories
All, Corporation, People, Product, Eatables, Folklore, Books, Games, Places and Scholarly Papers.

Only 1 subject found both sets of categories useless, as that subject only used the search function and selected “All” to complete the task. The other 7 subjects unanimously agreed that the categories created by the researchers helped them most when performing the task. From the follow-up question asking why they preferred that set of categories, we can see that most subjects thought the researchers’ categories were more useful because they are simple and broad. Some also thought the categories used by SearchTechnologies were too specific and hard to understand. Another thought the SearchTechnologies categories would be useful when users narrow down the search results, but that they should not be displayed as the first-level filter.

C. Navigation
One key navigation feature of the initial interface is the crumb box shown at the top of the search results. In addition, the navigation function was tested throughout the website with commonly used features such as navigational labels and buttons. When asked whether the navigation is flexible and easy to use, 5 subjects held a neutral opinion, 2 subjects agreed and 1 strongly agreed that the navigation was well designed. Two subjects did not find a way to exit from the pages displaying images and suggested that an “Exit” or “Back” button be added to these pages.

D. Layout
Three different layouts of the search result page were presented to the subjects, with categories on the left side, on the right side and at the middle-top of the page respectively. Out of the 8 subjects, 6 preferred the layout with categories displayed on the left. The other two layouts each gained one preference.

E. Formats of Search Results
The search results were presented in three formats: text, image and metadata. Four pages were designed to test the usefulness of the diverse formats of results display. Of the 8 subjects, 3 agreed and 2 strongly agreed that the different views of the search results are helpful, while the other 3 found that the different formats of search results are not beneficial. Some subjects commented that the metadata view would be helpful for experts. Another subject said that the metadata page would be helpful when the user wants to explore everything that is available for a particular category.

F. Search Engine
One objective question asked subjects to evaluate the Wikipedia search engine as a whole. When asked whether the search engine is easy to use, 2 subjects strongly agreed and 4 agreed that it was easy to use (see Fig. 7). The other two subjects held a neutral opinion.

Fig. 7. Search engine as a whole query results (bar chart of response counts, from Strongly Agree to Strongly Disagree, for the statement “This search engine is easy to use”)

G. More Features
The two subjective questions, one asking subjects what features they want to add to the search engine and the other asking for other suggestions, received very good feedback. The suggestions can be grouped into two categories: labels and visualization.

1) Labels
Two subjects suggested adding to each picture a title or description indicating what the image is and where it was retrieved from. Three subjects suggested adding labels to the text, image and metadata icons displayed at the upper right of the search results. The researchers also observed that some subjects had difficulty finding the “Image View” button during the user test.

2) Visualization
Some subjects came up with suggestions that can improve the visualization of elements of the search interface. For example, the circles on the query clarification page could be dynamic and animated; the skip button should be better placed; the metadata view should have a smaller font and a clearer view; and instead of bold and highlighted words in the search result list, it would be better to show tags or categories.

VI. DISCUSSION

The study proved to be fruitful considering its reliability and validity. “Validity is the extent to which methods and measures allow a researcher to get at the essence of whatever it is that is being studied, while reliability is the extent to which the method and measures yield consistent findings” [14].

The study achieved comparatively high reliability. As the experiment was conducted in a lab environment, the situations subjects experienced were tightly controlled by the researchers: subjects received the same instructions and performed exactly the same tasks on the same interface in the same environment. The strictly controlled experimental conditions make consistent research results possible.
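As a rough illustration of how the consistency of the five-point responses could be summarized, the sketch below tallies the two questions whose full response distributions are stated in the Results (navigation and overall ease of use). The 1 to 5 scoring, the dictionary layout and the `summarize` helper are assumptions made for this example; they are not part of the analysis reported in this paper.

```python
# Scores assumed for illustration: 5 = strongly agree ... 1 = strongly disagree.
SCALE = {
    "strongly agree": 5,
    "agree": 4,
    "neutral": 3,
    "disagree": 2,
    "strongly disagree": 1,
}

# Response counts as reported in the Results section (8 subjects per question).
RESPONSES = {
    "navigation is flexible and easy to use": {
        "strongly agree": 1, "agree": 2, "neutral": 5,
    },
    "search engine is easy to use": {
        "strongly agree": 2, "agree": 4, "neutral": 2,
    },
}


def summarize(counts):
    """Return sample size, share of agreeing subjects and mean score."""
    total = sum(counts.values())
    agree = counts.get("strongly agree", 0) + counts.get("agree", 0)
    mean = sum(SCALE[label] * n for label, n in counts.items()) / total
    return {"n": total, "agree_share": agree / total, "mean_score": mean}


for question, counts in RESPONSES.items():
    s = summarize(counts)
    print(f"{question}: n={s['n']}, "
          f"agree share={s['agree_share']:.3f}, mean score={s['mean_score']:.2f}")
```

With only eight subjects per question, such figures describe the sample rather than support statistical inference.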
Several efforts were made to improve the validity of the study.

First of all, the subjects were randomly selected students. All are proficient in English and in the use of computers. Therefore, all the subjects were assumed to be able to fully understand the search tasks and to be competent to complete them.

Second, it may be questionable whether the behavior subjects exhibited when they were aware of being monitored with Morae would be the same as their behavior in a natural environment. However, subjects were informed that the study was anonymous. Each subject was assigned a number that was used to record the study result, and there is no way to link subjects to the data. A consent form was signed by each subject before participation. The strict protocol between subjects and researchers guaranteed that subjects could freely express their opinions in the study.

Third, the study was well focused on several key features of interactive search user interfaces, and the post-task questions were designed specifically to obtain subjects’ preferences regarding each feature. Thus the results spoke directly to the research questions, and the researchers could get at the essence of the issue being studied.

VII. CONCLUSION

The results of the user tests spoke well to the research questions. Users’ preferences regarding the key features of the search interface were stable in general. Most features designed on the basis of previous studies were welcomed by the subjects. This study contributes to further IIR study in several respects:

First of all, an ideal design of an interactive search user interface was proposed based on the results of the user study. This design contains features that can meet most users’ search preferences and thus can be used as a model for search interface development.

Second, categories generated automatically by search systems based on the frequency of clicks have become very popular in recent years. Many search engines, including some successful and popular ones, have adopted this clustering and sorting method to create categories. However, this study shows that users still prefer categories created according to traditional hierarchical classification schemes, which are usually characterized by systemization and generality. These findings are also supported by the study conducted by English et al., which suggested that the explicit exposure of hierarchical faceted metadata in a manner that is intuitive and inviting to users can strikingly improve the usability of a user interface [15].

Third, whether metadata should be used as a format for displaying information to general users, and how to display it, are questions to be answered in further studies. As we can see from this study, some subjects thought the metadata view was useless in the search, while some believed that metadata could only be used in some specific cases.

Fourth, the study discovered several features that deserve future investigation. Features such as labels, extensions and visualizations were not the concern of this study. However, users obviously have expectations in these respects. Future researchers should pay more attention to these issues.

In summary, this study invites researchers to explore several interesting topics in IIR, and it thus proved to be fruitful.

VIII. LIMITATIONS

Though the study achieved reliability and validity to some extent, the results of this research have certain limitations.

First, in order to complete the course project, the researchers had to finish the literature review, design the prototype, conduct the user tests and redesign the search engine within a limited time. Some features, such as “Exit” or “Back” buttons on the image pages, should have been considered by the researchers at the prototype design stage. Failing to include such a function might have affected subjects’ answers when they evaluated the navigation function and the general performance of the search engine.

Second, only two tasks were designed to test the Wikipedia search engine. Moreover, the two tasks tested different features. This greatly reduced the external validity of the results of this study.

REFERENCES

[1] T. Russell-Rose and T. Tate, Designing the Search Experience: The Information Architecture of Discovery. Newnes, 2013.
[2] R. W. White and R. A. Roth, “Exploratory search: Beyond the query-response paradigm,” Synthesis Lectures on Information Concepts, Retrieval, and Services, 2009, 1(1), pp. 1–98. doi:10.2200/S00174ED1V01Y200901ICR003
[3] M. L. Wilson, “Search user interface design,” Synthesis Lectures on Information Concepts, Retrieval, and Services, 2012, 3(3), pp. 1–143. doi:10.2200/S00371ED1V01Y201111ICR020
[4] R. W. White and G. Marchionini, “Examining the effectiveness of real-time query expansion,” Information Processing and Management, 2007, 43(3), pp. 685–704. doi:10.1016/j.ipm.2006.06.005
[5] G. Smith, Tagging: People-Powered Metadata for the Social Web. Berkeley, CA: New Riders Publishing, 2007.
[6] M. Hearst, Search User Interfaces. Cambridge University Press, 2009.
[7] J. Teevan, S. Dumais, and Z. Gutt, “Challenges for supporting faceted search in large, heterogeneous corpora like the Web,” HCIR, 2008, Redmond, WA, USA.
[8] M. Hassenzahl and N. Tractinsky, “User experience: A research agenda,” Behaviour & Information Technology, 2006, 25(2), pp. 91–97. doi:10.1080/01449290500330331
[9] D. Tunkelang, “Faceted search,” Synthesis Lectures on Information Concepts, Retrieval, and Services, 2009, 1(1), pp. 1–80. doi:10.2200/S00190ED1V01Y200904ICR005
[10] X. Gong, W. Ke, Y. Zhang, and R. Broussard, “Interactive search result clustering: A study of user behavior and retrieval effectiveness,” Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries, 07/2013, pp. 167–170.
[11] J. Tidwell, Designing Interfaces. O'Reilly Media, 2010.
[12] The Harvard University Library Hollis System. [Online]. Available: http://hollis.harvard.edu/?q=HCI
[13] The Edinburgh University Library OPAC. [Online]. Available: http://catalogue.lib.ed.ac.uk/vwebv/search?searchArg=hci&searchCode=GKEY%5E*&searchType=0
[14] D. Kelly, “Methods for evaluating interactive information retrieval systems with users,” Found. Trends Inf. Retr., 2009, 3(1–2), pp. 1–224. doi:10.1561/1500000012
[15] J. English, M. Hearst, R. Sinha, K. Swearingen, and P. Yee, “Flexible search and navigation using faceted metadata,” Technical Report, University of Berkeley, 2002.