TOWARD USABLE BROWSE HIERARCHIES FOR THE WEB

Kirsten Risden
Microsoft Research

1 Introduction

The World Wide Web presents both new challenges and new opportunities for Human Factors work, and browse hierarchies used to classify web content are a case in point. On the one hand, the size of the domain to be classified and its general-purpose intent (i.e., it is intended for all users and for retrieval of all types of information) make traditional techniques such as card sorts too unwieldy. On the other hand, the nature of the Web makes efficient data collection from a relatively large number of people possible. The purpose of this initial, small-scale study was to begin investigating usability methods that have the potential to scale to the full range of users and tasks while taking advantage of the data collection possibilities that exist for the Web (e.g., server log data).

A coherent, learnable category structure is a central goal for browse hierarchies such as those on Yahoo, Excite, msn.com, and other Internet portals. Such a structure allows users to find the information they need efficiently and to become increasingly proficient with the hierarchy over time. We know from cognitive psychology (Rosch 1975) that coherent, learnable category structures have high within-category similarity and high between-category discriminability. For abstract categories, such as those found in browse hierarchies for the Web, we also know that linguistic cues that highlight relevant features of the categories can be important (Horton and Markman 1980). Categories whose members go together only loosely, that overlap heavily with other categories, or that carry general labels that do not highlight reasons for category membership should therefore be difficult for people to use.
Beyond the difficulties that size and general-purpose intent pose for creating usable browse hierarchies, maintaining a good user experience is challenging because content continues to be added after a browse hierarchy is released to the Web. As categories change in makeup and size, the category structure continues to grow, evolving new categories and new category structure. Accommodating these changes may have unanticipated effects on users’ ability to find information. In the hypothetical example in Table 1 below, evolving an “Arts” category into an “Arts & Humanities” category may lead to confusion between this new category and a “Culture & Society” category within the same browse hierarchy. This is presumably because increases in the generality of a category demand more encompassing labels, which in turn allow for greater overlap between category content. So in addition to usability evaluation during the creation of a browse hierarchy, there is a need for usability “check-ups” on an ongoing basis.

ARTS: Art History, Artists, Design Arts, Museums, Theory, Visual Arts
ARTS & HUMANITIES: Architecture*, Art History, Artists, Culture*, Design Arts, Humanities*, Museums, Musical Arts*, Performing Arts*, Photography*, Theory, Visual Arts
CULTURE & SOCIETY: Architecture, Art History, Culture, Death and Dying, Fashion, Food and Drink, Gender, Religion, Holidays, Mythology

Table 1. Illustration of how changes within one category (Arts) may lead to greater similarity and user confusion across categories (Arts & Humanities versus Culture & Society). Added categories are marked with an asterisk.

It would therefore be valuable to have a way to continually monitor how effectively and efficiently users can locate information, as well as the source of any difficulties they experience.
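The claim that broadening Arts increases its overlap with Culture & Society can be checked directly by treating each category in Table 1 as a set of subcategory labels. The sketch below uses Jaccard overlap (shared members divided by all distinct members), a measure swapped in here purely for illustration; the paper itself does not use it, and the variable names are hypothetical.

```python
# Table 1 (the paper's hypothetical example) encoded as sets of subcategory labels.
ARTS = {"Art History", "Artists", "Design Arts", "Museums", "Theory", "Visual Arts"}
# Arts & Humanities = Arts plus the six added (asterisked) subcategories.
ARTS_AND_HUMANITIES = ARTS | {
    "Architecture", "Culture", "Humanities",
    "Musical Arts", "Performing Arts", "Photography",
}
CULTURE_AND_SOCIETY = {
    "Architecture", "Art History", "Culture", "Death and Dying", "Fashion",
    "Food and Drink", "Gender", "Religion", "Holidays", "Mythology",
}


def jaccard(a: set, b: set) -> float:
    """Illustrative overlap measure: |a & b| / |a | b| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


for name, category in [("Arts", ARTS), ("Arts & Humanities", ARTS_AND_HUMANITIES)]:
    shared = sorted(category & CULTURE_AND_SOCIETY)
    print(f"{name} vs. Culture & Society: shared={shared}, "
          f"Jaccard={jaccard(category, CULTURE_AND_SOCIETY):.3f}")
```

Arts shares only Art History with Culture & Society (1 of 15 distinct labels, about .07), whereas Arts & Humanities shares Architecture, Art History, and Culture (3 of 19, about .16), so the broadened category overlaps more than twice as much.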
Users should be able to take a direct path through the hierarchy to the content they want, easily identifying which categories lead to the information and distinguishing them from categories that will not. Traversals between major categories of the browse hierarchy on the same information retrieval task would be evidence of confusion, and areas where such traversals are common would indicate usability problems. Such data may be obtainable from server logs under certain circumstances. The goal of the following study was to determine the potential usefulness of tracking traversal patterns through a browse hierarchy as a way to monitor confusion and determine its source.

2 Study design and methodology

Five participants were asked to complete 35 information retrieval tasks using an experimental browse hierarchy. The hierarchy was presented within a software tool that displays categories in a simple hierarchical format and records user paths. The tool placed the tasks in a context in which user interface problems would not interfere with the specific interest in the usability of the categories. The 35 tasks were based on popular activities on the Web; participants did the tasks in different orders and were allowed to “back up” in the hierarchy if they thought they needed to use a different area.

Of particular interest were the top-level categories explored on a given task. This was partly to simplify analysis and partly because top-level categories pose particular challenges to finding information in a browse hierarchy. Top-level categories tend to be more general, so it is often quite difficult for users to determine which top-level category a particular topic is likely to be in. If the methodology I am proposing is useful, it should be sensitive to this difficulty and should expose the primary sources of user problems.
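Because the analysis focuses on which top-level categories were explored on each task, a recorded path first has to be reduced to that list. The recording format of the study’s tool is not described, so the sketch below assumes, hypothetically, that each logged step is the tuple of category labels from the root to the opened node, with an empty tuple standing for a back-up to the root; the function names and the subcategory names in the example path are likewise hypothetical.

```python
from typing import Iterable, List, Sequence


def top_levels_explored(path: Iterable[Sequence[str]]) -> List[str]:
    """Top-level categories entered during one task, in first-visit order.

    Hypothetical log format: each step is the sequence of labels from the root
    to the opened node, e.g. ("Health & Fitness", "Cycling"); an empty step
    stands for backing up to the root.
    """
    explored: List[str] = []
    for step in path:
        if not step:  # back at the root: no category opened on this step
            continue
        top = step[0]
        if top not in explored:
            explored.append(top)
    return explored


def crossed_top_levels(path: Iterable[Sequence[str]]) -> bool:
    """True if the task's path traversed more than one top-level category."""
    return len(top_levels_explored(path)) > 1


# Hypothetical path for the bike-riding task described in the Results.
bike_path = [
    ("Health & Fitness",),
    ("Health & Fitness", "Exercise"),    # hypothetical subcategory
    (),                                  # backed up to the root
    ("Interests & Lifestyles",),
    (),
    ("Sports & Recreation",),
    ("Sports & Recreation", "Cycling"),  # hypothetical subcategory
]
print(top_levels_explored(bike_path))
```

First-visit order is kept so the sequence can be inspected, although the pairwise scoring used in the Results ignores order.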
The top-level categories of the browse hierarchy used in this study are shown in Table 2 below.

Business & Finance
Computers & Internet
Entertainment & Media
Health & Fitness
Home & Family
Interests & Lifestyles
People & Communities
News & Information
Reference & Education
Sports & Recreation
Travel & Leisure

Table 2. Top-level categories of the browse hierarchy used in this study.

3 Results

Exploration of multiple top-level categories on the same task was assumed to indicate uncertainty or confusion about where the information would be located. Each top-level category explored on a given task was scored as “confused” with each of the others explored on that task. For example, if the task was to find information about bike riding and the user looked in Health & Fitness first, Interests & Lifestyles second, and finally settled on Sports & Recreation, then each of these three categories would be scored as confused with the other two on this trial. Confusability matrices were constructed by tallying the number of times each pair of categories was confused across tasks and subjects.

Analysis of these data was organized around three questions:
1) How prevalent is confusion between top-level categories in the browse hierarchy?
2) What is the structure of this confusion?
3) What is the source of the problem?

The average frequency with which top-level categories were confused with one another was 18.55 across the 35 trials. Of that, 44% occurred during the first ten tasks, for an average frequency of 8.18. The average frequency with which top-level categories were confused on the last ten tasks was 3.82. This indicates a substantial problem with differentiating the categories from one another, one that diminished but was still present after 25 trials.

To determine the structure of confusion, a network representation was created.
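The scoring rule, the tally, and the network step can be sketched as follows, assuming each trial has already been reduced to the list of top-level categories explored; all function names are hypothetical. Pairs are linked when confused at least four times, the cut-off given in the paper’s footnote, and the sketch also reports the share of all confusions those links cover, which is one reading of the footnote’s 56% figure. How the study derived its “average frequency” values is not specified; `category_frequency` computes per-category totals (row sums of the symmetric confusability matrix) as one plausible basis.

```python
from collections import Counter
from itertools import combinations


def score_trial(explored):
    """Unordered pairs of top-level categories scored as confused on one trial.

    Assumption: a category revisited within the same trial is counted once.
    A trial with a single category yields no pairs.
    """
    return list(combinations(sorted(set(explored)), 2))


def confusability_counts(trials):
    """Tally how often each category pair was confused across tasks and subjects."""
    counts = Counter()
    for explored in trials:
        counts.update(score_trial(explored))
    return counts


def category_frequency(counts):
    """Total confusions involving each category (row sums of the matrix)."""
    freq = Counter()
    for (a, b), n in counts.items():
        freq[a] += n
        freq[b] += n
    return freq


def confusion_network(counts, threshold=4):
    """Link pairs confused at least `threshold` times.

    Returns the linked edges with their counts, an adjacency map, and the share
    of all tallied confusions that the linked pairs account for.
    """
    edges = {pair: n for pair, n in counts.items() if n >= threshold}
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    total = sum(counts.values())
    share = sum(edges.values()) / total if total else 0.0
    return edges, adjacency, share


# The bike-riding trial from the text produces three confused pairs.
bike = ["Health & Fitness", "Interests & Lifestyles", "Sports & Recreation"]
print(score_trial(bike))
```

Keeping pairs as sorted tuples makes the matrix symmetric by construction, so (A, B) and (B, A) are never tallied separately.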
For other examples of using networks to understand concept relationships in complex information sets, see Chi and Koeske (1983) and Chen (1997). Categories that were confused were linked together in the network (footnote 1), which is shown in Figure 1. The overall pattern reveals that the vast majority of user confusion involved the Interests & Lifestyles and News & Information categories. A separate network constructed for just the last 10 tasks that users performed (not shown here) showed that Interests & Lifestyles continued to cause confusion even after a substantial number of trials.

Figure 1. Network representation of category confusion. Categories that were frequently confused with one another are linked.

Examination of the content of these and other categories in the browse hierarchy showed substantial redundancy: many subcategories were “members” of two or more top-level categories. The proportion of redundant subcategories ranged from a high of .89 to a low of .00, as shown in Table 3. Follow-up analyses showed a strong correlation between the proportion of redundant subcategories and the frequency with which a top-level category was confused with other categories (r(9) = .76, p < .05). This finding suggests that a high level of redundancy makes it very difficult for users to learn to differentiate one category from another.

Overly general labels may also fail to provide linguistic cues that highlight differences between categories. For example, most of the categories could be viewed as “interests” or “information”. The use of these highly general words in Interests & Lifestyles and News & Information may make it difficult for users to distinguish between these and other categories. In other words, the more similar a label is to other labels, the more frequently it should be confused with other categories during information retrieval tasks.
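Two quantities in the redundancy analysis can be made concrete. The sketch below assumes that a subcategory counts as redundant when it also appears under at least one other top-level category (the study does not spell out its exact rule), and it computes a Pearson correlation in plain Python; the function names are hypothetical. The notation r(9) gives the degrees of freedom, n - 2: with the 11 top-level categories of Tables 2 and 3, df = 11 - 2 = 9. Because the per-category confusion frequencies are not reported, the sketch does not recompute r; it only checks that the reported coefficients exceed the two-tailed .05 critical value of t for 9 df.

```python
import math

T_CRIT_05_DF9 = 2.262  # two-tailed critical t for alpha = .05, df = 9 (standard t table)


def proportion_redundant(hierarchy, category):
    """Share of `category`'s subcategories that also appear under another top-level category.

    Assumed definition; `hierarchy` maps each top-level label to a set of subcategory labels.
    """
    own = hierarchy[category]
    others = set().union(*(subs for name, subs in hierarchy.items() if name != category))
    return len(own & others) / len(own) if own else 0.0


def pearson_r(x, y):
    """Pearson correlation between two equal-length samples, plus df = n - 2."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy), n - 2


def t_from_r(r, df):
    """t statistic for testing r against zero: t = r * sqrt(df / (1 - r**2))."""
    return r * math.sqrt(df / (1 - r ** 2))


# Check the two correlations reported in the text (11 categories -> df = 9).
for label, r in [("redundancy", 0.76), ("label similarity", 0.71)]:
    t = t_from_r(r, 9)
    print(f"{label}: r(9) = {r:.2f}, t = {t:.2f}, p < .05: {abs(t) > T_CRIT_05_DF9}")
```

For the reported values, t is about 3.51 for r = .76 and about 3.02 for r = .71, both above 2.262, which is consistent with the reported p < .05.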
To test whether label similarity predicts confusion, a separate set of subjects was asked to rate the similarity of each pair of labels on a 7-point Likert scale, with higher numbers indicating greater similarity. Each label’s average similarity to the other labels is given in Table 3. The correlation between similarity ratings and frequency of confusion was strong and significant (r(9) = .71, p < .05), indicating that categories with more similar labels were more likely to be confused with other categories in the set.

Footnote 1: For clarity of presentation, only category pairs confused four or more times are linked. These pairs account for 56% of the total confusion data and clearly illustrate the major patterns in the data.

Top-level category       Proportion redundant subcategories   Average similarity to other labels
Business & Finance       .00                                  4.35
Computers & Internet     .14                                  4.12
Entertainment & Media    .20                                  4.30
Health & Fitness         .00                                  4.35
Home & Family            .80                                  4.75
Interests & Lifestyles   .89                                  5.15
People & Communities     .83                                  4.25
News & Information       .80                                  5.08
Reference & Education    .60                                  4.25
Sports & Recreation      .20                                  4.45
Travel & Leisure         .33                                  4.57

Table 3. Redundancy and similarity scores for each top-level category.

4 Discussion

The major conclusion that can be drawn from this study is that tracking traversal patterns through a browse hierarchy is a useful and insightful way to monitor user experience. This study showed that traversal data contained valuable information about usability problems in one browse hierarchy. More importantly, analysis of the traversal data revealed the source of confusion and permitted diagnosis of why it occurred. Specifically, the users in this study experienced a significant amount of confusion in using a browse hierarchy. The source of confusion was pinpointed to two major categories by mapping the data into a network representation that makes interrelations among categories explicit. Finally, the patterns observed in the network representation were validated against measures of learnable category structures.
This demonstrated that those patterns were indeed rooted in users’ psychological experience of the browse hierarchy; it explained why confusion occurred where it did and suggested what to do to alleviate it.

The next step is to generalize the methods and approach used in this small-scale study to data from large numbers of people carrying out their own information retrieval tasks in real-world settings. Tools for automatically collecting and analyzing such data will need to be developed to make such work tractable. However, the opportunity to continually monitor user experience in a dynamic and changing software environment is likely to be worth the effort.

5 References

Chen, C. (1997). Structuring and visualizing the WWW by generalized similarity analysis. In Proceedings of the 8th ACM Conference on Hypertext (Southampton, U.K., April), pp. 177-186.

Chi, M. T. H. & Koeske, R. D. (1983). Network representation of a child’s dinosaur knowledge. Developmental Psychology, 19, 29-39.

Horton, M. S. & Markman, E. M. (1980). Developmental differences in the acquisition of basic and superordinate categories. Child Development, 51, 708-719.

Rosch, E. H. (1975). Cognitive representations of semantic categories. Journal of Experimental Psychology: General, 104, 192-233.