Spatial Ability and Visual Navigation: An Empirical Study

Chaomei Chen¹ and Mary Czerwinski²
¹ Department of Information Systems and Computing, Brunel University, Uxbridge UB8 3PH, UK
² Microsoft Research, One Microsoft Way, 9N/2290, Redmond, WA 98052, USA

In this paper, we describe an empirical study of individuals' spatial navigation strategies and a number of performance and preference measures with regard to the design of a novel 3D visualisation. The underlying semantic space of the user interface consists of a collection of papers from the three most recent ACM SIGCHI conference proceedings, visualised as a virtual reality network. This network was automatically constructed based on semantic similarities derived from latent semantic analysis. We studied the search strategies and general preferences of eleven subjects who used this system to find papers on various topics. The study has led to a number of interesting findings, which should be valuable for designers and evaluators of 3D user interfaces. The results highlight the importance of structural elements in the design of a semantically based user interface, because search strategies of users relied heavily on these mechanisms in the design. The results of this study also demonstrate that we are able to characterise and learn from users' search strategies in a visual environment strongly shaped by semantic relationships of the information content. Implications for user interface design based on users’ psychological models of a semantic space are described.

INTRODUCTION

Individual differences are by far the most significant cause of differences in various human-computer interaction activities (14). However, it has also been stated that a well-designed interface is effective and crucial in accommodating individual differences (11).
Using visualisation and virtual reality techniques in the design of new generations of information systems has highlighted the need for a better understanding of a number of fundamental issues during the design process, including individual differences and the elements of good design that can accommodate these differences. It is well known that users approach these novel user interfaces with different domain knowledge, differential depths of computer experience and different styles of using an information system, as well. Navigation is a process whereby people determine where they are in relation to their surrounding environment, and how to get to particular objects or places given that location in space. Recently, researchers have identified a number of issues with regard to navigating in electronic worlds (15). The mapping between structure and semantics has been regarded with particular interest, as it is one of the most challenging issues facing designers of visual user interfaces today. What are the relations between semantic structures and visualisation structures? What visualisation structures are appropriate to tasks in visual navigation? What is the nature of the interaction between the inherent structure, the imposed structure and the user's cognitive model of the space? There is little data and few criteria for assessing the effectiveness and usability of 3D interfaces and a wide variety of visualisation techniques (but see 24, 9, 22). In virtual reality-based information systems, it is crucial for designers to understand the usability of 3D representations and how these visual representations affect people's ability to handle information systems. The goal of such research efforts is to develop guidelines and heuristics for design that are beneficial to the human-computer interaction (HCI) community. 
In this paper, we will describe an empirical study in an attempt to address some of these problems, and provide further empirical support for design principles in the realm of semantic visualisation. The aim of this empirical study is twofold: 1) to assess the usability of a particular virtual environment for visual navigation and thereby improve the design of this specific system, and 2) to examine essential aspects of 3D semantic visualisation and develop an integrated methodology for development and evaluation. Specifically, we will attempt to determine the following: Are spatial ability and user satisfaction with a 3D navigation structure correlated? Do users who have higher spatial ability search more efficiently in a novel, 3D environment? Do they visit fewer nodes during search tasks? Will users with higher spatial ability traverse longer distances? Will distinct, structural patterns be frequently accessed in a 3D virtual environment? Will they be memorable? What is the role of virtual reality in enhancing 3D semantic visualisation? How should we assess the usability of such systems?

In this study, we visualised 169 papers from the Association for Computing Machinery’s (ACM) Special Interest Group in Computer-Human Interaction (SIGCHI) according to semantic relatedness in a spatial environment in virtual reality. The semantic similarities were obtained automatically by Latent Semantic Indexing (12), and a user interface was designed to visualise patterns in these similarities. The method of design is described in (6, 7). The spatial navigation strategies and abilities of users were correlated with several different performance and preference measures. Implications for user interface design based on users’ psychological models of a semantic space are described.

Usability Issues on Visualisation

One basic research question in the context of information visualisation is simply how to assess the effectiveness of a new visualisation tool.
The problems with using traditional measures of recall and precision as means of evaluating interactive information retrieval are now well-known, especially the relevance as decided according to judgements from experts, rather than the searchers themselves (25). Veerasamy and Belkin (25) suggested that one should not take for granted the ease of use of visualisation tools in interactive information retrieval systems. They found that some topics appeared to be easier than others in terms of the extent to which they benefited from the visualisation tools. They also noted the problems of finding enough subjects to account for inter-subject differences, and of being able to account for inter-topic differences. The influence of spatial ability on users’ performance with hypertext systems has been investigated in earlier studies. For example, Vicente and Williges (26) found that users with high spatial ability completed their tasks quicker than users with lower spatial ability. Campagnoni and Ehrlich (5) reported that users with good visualisation ability used the top-level table of contents less frequently than users with lower visualisation ability, suggesting that a good spatial ability may help one to memorise how the information is organised. On the other hand, Leidig (18) only found a marginal main effect of spatial ability on the accuracy of performance, although there was a significant main effect of spatial ability on users' general satisfaction. More recently, Czerwinski and Larson (9) studied usability issues of 2.5D visual user interfaces. In their study it was noted that higher spatial abilities were significantly related to a user’s ability to initially navigate and find search items in worlds requiring “fly-through” with the mouse. Search time, specifically, was positively related to spatial ability. 
It was also observed that users with lower spatial abilities were hesitant to explore large numbers of categories, as if the mental effort required was higher, when compared to users with high spatial ability test scores. These less spatially adept users were also more likely to get lost, and to complete fewer search tasks. In addition to these findings, a review of the literature informed us that visual scanning abilities, as measured by cognitive battery subtests, have been correlated strongly with certain kinds of computer interaction behaviors, such as scanning for a document in a menu item search task (17). However, a spatial ability subtest was found to correlate highly with performance in other kinds of computer tasks, such as navigating a complex hierarchical user interface structure. So clearly, spatial ability subtests are capable of tapping into something more than a simple notion of "intelligence" and can differentially predict performance, depending on the task in question. We decided, therefore, to monitor this ability in our subjects, in the hope that it might help elucidate characteristics in subjects’ performance that would otherwise simply add variability to the data.

Wayfinding in Virtual Environments

Another factor that can influence subjects’ abilities to navigate 3D environments is their preferred navigation strategy. Darken and Sibert (10) studied whether people use physical world wayfinding strategies in large virtual worlds. They examined a complex searching task in a number of virtual worlds with different environmental cues. Their study showed that subjects were often disoriented in virtual worlds without any landmarks, paths or cues. Simply adding cues like borders, boundaries and gridlines significantly improved navigation performance. Since wayfinding strategies and behaviours were strongly influenced by the environmental cues in the Darken and Sibert study, the results reflected underlying principles for 3D user interface design.
In other words, an organisational metaphor, with landmarks and navigational cues, was of utmost importance in successfully navigating these virtual worlds. In fact, cue availability was shown to have a much stronger influence on subject behaviour than did individual differences in related work on this topic (11). Darken and Sibert (10) also investigated their subjects’ spatial memory in connection with using a virtual environment, by asking their subjects to sketch an overall organisation of the virtual environment for trials in which they searched for ships on the sea. They found that different organisational cues resulted in significant differences in terms of the recall accuracy of the spatial layout of the sketch and for individual targets in the environment.

Method

Subjects

Eleven subjects participated in the study, including three females and eight males. Nine of the subjects were enrolled in an M.Sc. in Information Systems course. On average, these subjects had used computers for three years or more and had used the WWW for less than 6 months. They had used the Internet more than 4 times a week prior to their involvement in this study. Email and WWW browsing were the most common activities they described as performing with their personal computers. We acknowledge that our user population was an advanced, computer-experienced group. Yet, even in this group we were able to observe a wide range of spatial abilities and navigational strategies during the study. We are keen to run more studies with less experienced users to further generalise the findings reported in this paper.

Visualisation of the Semantic Space

The underlying semantic space of the user interface used in this study consisted of 169 papers published in the three most recent ACM SIGCHI conference proceedings, namely CHI'95, CHI'96 and CHI'97 (1, 2, 3).
Papers were structured and visualised according to their semantic similarities, such that papers on similar topics tended to be grouped near to each other. A virtual link structure was automatically generated according to patterns of semantic similarities. Further details on underlying models and algorithms can be found in (6, 7). In this study, the user interface¹ was based on the Netscape Navigator 3.0 browser and its Live3D VRML 1.0 plug-in viewer. The screen was split into two frames. The virtual world was displayed in the left-hand side frame. The right-hand side frame was used to display the abstract of a paper selected from the virtual world. Papers were visualised as coloured spheres in a connected network. Each node in the user interface was labelled with the initials of the paper's authors. If the user moved the mouse cursor onto a sphere, the title of the paper would pop up next to the sphere. If the user clicked on the sphere, the abstract would appear in the right-hand side frame.

¹ The version used in this study is at http://www.brunel.ac.uk/~cssrccc2/vrml/acm/spatial/spatialx.html

Figure 1. The user interface of CHI Proceedings.

Users could manipulate the virtual world in a number of ways. For example, they could walk towards the visualisation model by moving their mouse cursor forward. The most important design rationale of the virtual space was to allow users to have greater control over the amount of information displayed in the frame. If they wanted to have an overview of the entire space, they could walk backwards until they could see the entire structure at a glance. If they wanted to examine local details, they could walk up closer to an area so that they could see labels clearly in that area. The actual abstracts of these papers were stored on a web server at Brunel University. The experiment was conducted on a PC with a 233 MHz CPU and 32 MB of RAM, with a 17-inch display monitor.
All the movements on the computer screen were videotaped for subsequent examination and analysis.

Procedure

This study included pre-test and post-test phases. In the pre-test, spatial ability scores were collected from a standard paper folding test (13), in which subjects were asked to answer multiple-choice questions about the consequence of punching a hole in a paper folded in a particular way. This test is widely included in a battery of tests used for determining overall cognitive ability, but is the specific subtest tapping into spatial ability. An overall spatial ability score on this test ranged from 0 to 20. The average of the spatial ability scores in our study was 10 with a standard deviation of 3.91. A short demonstration was given to subjects on how to use the Live3D VRML viewer. Then subjects were asked to complete two major tasks (named Task 1 and Task 2) and two minor tasks. Ten minutes were allowed for each major task. Three minutes were allowed for each minor task. After subjects completed these tasks, they were asked to complete a post-test questionnaire concerning usability and user satisfaction. The two major tasks required subjects to find articles related to specific topics. Once they found a relevant paper, they were instructed to save the abstract of the paper to a local directory on their PC. In Task 1, subjects were instructed to find as many papers as they could about a topic, whereas in Task 2, they were told to find only 5 papers on a different topic. For scoring user performance on Tasks 1 and 2, a relevance judgement was derived for each subject as follows. First, we generated a list of papers based on a full-text search for a task's topic across the entire collection of 169 papers. Then, irrelevant papers were removed from the list and additional relevant papers were added to the list based on the first author's own judgement. This process led to three types of relevancy, depending on the strategy used.
For highly relevant papers, one could retrieve the papers directly from an examination of their titles. For the intermediately relevant papers, one would have to read the papers’ abstracts and keywords. For the difficult determinations, one would need to explore the paper’s content more deeply. For example, if one looks for papers on "visualisation", it is likely that one will retrieve papers #1 and #2 below easily just from reading the titles, but the relevance of paper #3 is less obvious.

1. Tilebars: visualization of term distribution information in full text information access
2. Visualizing complex hypermedia networks through multiple hierarchical views
3. An organic user interface for searching citation links

There were 24 relevant papers for Task 1 and 18 for Task 2 in the semantic space of 169 papers. After Task 1 was completed, subjects were asked to complete a brief spatial memory test. For this memory test, subjects were asked to sketch the shape of the visualised semantic space as best they could from memory. This test was designed to find out what subjects could remember after having searched through the semantically designed user interface. Studies have shown that the more information processing resources applied to study materials, the better those materials are recalled (8). It was our hope that subjects were engaged in this information visualisation deeply enough that memory performance for the structure of the space would be high. Details from these sketches should highlight what subjects learned about the structure of the semantic space, and how deeply the visualised structure that they worked with was encoded into memory. Next, subjects performed Task 2, followed by a categorisation and abstraction exercise. For the latter task, subjects were asked to name the clusters of papers in the visualised semantic space.
We intended to use this task to find out whether subjects could summarise groups of papers associated with distinctive structural patterns and what naming schemes they might use. If meaningful patterns emerge from this task, they will be used in future designs based on these materials, with the hope that the added semantic structure (e.g., labels, landmarks, and signs) will make the user interface easier to use. Other task performance measures were the number of times a subject chose to read the abstract of a paper, the number of abstracts saved for each task, and the number of saved abstracts relevant to the study topics as per their instructions. These scores were calculated for Tasks 1 and 2 only. The entire session lasted approximately 30 minutes.

Results

As stated above, both qualitative and quantitative performance measures were collected during the test session. Correlations were computed between some measures of task performance and subjects’ spatial ability scores from the paper folding exercise. We also analysed the drawings of the memorised structure of the user interface, compared them to the parameters of the actual structure, and correlated the result with spatial ability. The videotapes of the search strategies of subjects who had the highest and lowest performance scores were also examined, in order to identify usability issues and strategies relevant to future user interface designs of this genre.

Task Performance

The number of abstracts saved by each individual was positively correlated with spatial ability in both Task 1 and Task 2 (r=0.45 and 0.27, respectively). In addition, recall was positively correlated with spatial ability in both Task 1 and Task 2 (r=0.42 and 0.37, respectively). Precision was considerably higher than recall in both tasks, which is not surprising given that no keyword or full-text search facility was provided.
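The recall and precision scores used here can be illustrated with a minimal Python sketch. It assumes the standard information retrieval definitions (precision is the share of saved abstracts that are relevant; recall is the share of relevant papers that were saved); the paper does not spell out its formulas, and the function name, the paper identifiers and the choice to score an empty saved set as zero are all hypothetical.

```python
def precision_recall(saved_ids, relevant_ids):
    """Return (precision, recall) for one subject on one task.

    Assumption: a subject who saved nothing scores 0.0 on both measures;
    the paper does not say how such cases were handled.
    """
    saved = set(saved_ids)
    relevant = set(relevant_ids)
    hits = len(saved & relevant)  # saved abstracts that are relevant
    precision = hits / len(saved) if saved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: 24 relevant papers (the Task 1 count), numbered 1..24,
# and a subject who saved five abstracts, three of them relevant.
relevant_task1 = range(1, 25)
saved_by_subject = [3, 7, 19, 101, 150]

precision, recall = precision_recall(saved_by_subject, relevant_task1)
print(f"precision={precision:.2f}, recall={recall:.3f}")
# prints: precision=0.60, recall=0.125
```

With only a handful of abstracts saved out of 24 relevant papers, recall is bounded well below precision, which matches the pattern reported for both tasks.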
The pattern of relationships between precision and the total number of abstracts viewed and spatial ability was not as meaningful, and sometimes went in a direction that was counter-intuitive.

                       Mean     S.D.    Correlation (r-value)
                                        with spatial ability
Task 1
  Recall               0.13     0.09     0.42
  Precision            0.64     0.35    -0.53
  Abstracts viewed    15.30     5.95     0.07
  Abstracts saved      5.30     4.00     0.45
Task 2
  Recall               0.13     0.07     0.37
  Precision            0.79     0.32    -0.18
  Abstracts viewed    14.60    11.26    -0.43
  Abstracts saved      4.20     3.71     0.27

Table 1. Task performance statistics and their correlations with spatial ability.

Navigation Strategies

In order to analyse navigational patterns in further detail, we superimposed the frequencies of accessing papers that were judged relevant in Task 1 over the visualised semantic structure (see Figure 2). In the figure, relevant papers are marked as boxes and the number of dots beside each box indicates how many different individuals successfully found that target in Task 1. Task performance scores suggested that subjects did reasonably well if relevant papers were located in some structurally significant areas of the user interface. However, if task-relevant papers were located in outskirts of the structure in the user interface, results demonstrated that subjects were less successful. In addition, subjects seemed to be affected by the varying visibility of topical keywords (i.e., whether a search word appears in the title, or was hidden in the abstract, or if there was a complete vocabulary mismatch) across the semantic space. This was a serious issue for users, especially when they were examining a node in a key position, such as a gateway or a branching point. (We found that subjects, in their first few moves, typically examined these positions, or hotspots; if a user failed to recognise a relevant paper at a hotspot, they typically abandoned local exploration in favour of another area of the network, potentially missing many local targets).
We will discuss this issue in more detail in later sections.

Figure 2. The locations of search targets in Task 1.

To understand how users actually navigated through the semantic space, we reviewed the videotapes of subjects who had the highest and lowest task performance scores. The findings were informative. First of all, regardless of task performance, the majority of the subjects regarded the central circle structure as a natural starting point. They tended to aim at the central circle as an initial user interface location and zoom into the virtual world in order to bring this circular area into focus. Outskirts of the central circle tended to be ignored during the initial search. Next, subjects would check a number of positions on the circle, especially points connecting to branches. Over time, subjects would gradually expand their search space outwards to reach nodes farther away from the central area. An example of a good strategy observed was when one subject sampled a single node in each cluster and moved on to other clusters quickly during the initial stage. This strategy maximised the likelihood of not becoming lost in a local minimum. It is interesting to note that sometimes, even when a subject actually hit a target, she would ignore it initially. However, when she came back to the same target after saving several other targets, she saved the previously ignored or unrecognised target immediately. This suggests that subjects may raise and lower their acceptance thresholds as they gain experience with the information space. Based on our initial observations, visual navigation strategies can be characterised by high- and low-level browsing modes. In a high-level browsing mode, the unit of browsing is a group of objects in the semantic map, and users jump from one cluster of papers to another. In a low-level browsing mode, the unit of browsing is individual nodes, i.e., users move carefully from one node to another.
In the latter case, their browsing paths were clearly confined by the visualisation structure. Some subjects hopped from one cluster to another in long jumps, whereas other subjects carefully examined each node along a path according to the virtual semantic structure. Subjects who made longer jumps apparently realised that they might be able to rely on the global structural cues to help with their navigation. The navigational patterns observed also confirmed the special role of distinctive structural patterns such as circles, stars, and long spikes, as we expected from earlier research (6). It appeared that good performers, as measured by recall rates, tended to have a balance between the two modes (exploiting global and local structural cues), whereas less efficient performers tended to exhibit local scanning earlier and stay there longer. Given that recall was correlated with spatial ability in both Tasks 1 and 2 (r=0.42 and 0.37, respectively), one may conjecture that subjects with good spatial ability scores were able to benefit from the visualisation structure more than subjects with lower spatial ability scores. This echoes the findings observed by Vicente and Williges (26) in a 2D user interface environment. They found that subjects with lower spatial ability scores returned to a hierarchical table of contents more frequently than subjects with higher spatial ability scores, suggesting that there were significant differences between individuals in learning and using the abstract structure in information search tasks. We will be analysing the videotapes of our users more thoroughly to gather detailed data about navigation strategies and report our findings in the near future.

Search Strategies

The videotapes of our users captured a number of interesting search strategies, although we haven't yet examined all of the subjects’ interactive sessions. Many subjects directed their initial focus towards the central circle in the visualisation model.
We found that each task session started with a pre-search stage. In this stage, subjects would adjust the visualisation model to a comfortable position to start navigation. It turned out to be a common choice for many subjects to devote all of the screen real estate to the display of the central circle before they started to examine titles and/or abstracts of papers. Branching points, where two branches were connected, were also found to be strategically significant in our initial analyses. Subjects often checked the title of a paper at such positions. The videotapes revealed that subjects were able to jump across a considerable distance from these branching points. This confirms our speculation that distinctive structural patterns do have a special place in guiding visual navigation. Figure 3 shows the first few moves in the virtual environment of a particularly effective subject. The large shaded area denotes the region in which he subsequently focused his search for a considerable time and retrieved several targets successfully. Note that this subject achieved the highest task performance scores for both tasks, but did not have the highest spatial ability score in the paper folding pre-test. People with lower spatial ability scores may develop good strategies, if they understand how the structure is organised, but this may take time. Some people only realised that they could benefit from the structure during the second half of their session. On the other hand, some subjects thought that the idea was simple enough to understand straightaway, but exhibited difficulty manipulating the virtual world in the Live3D viewer. Therefore, more research needs to be carried out looking at the longitudinal benefits of using these kinds of visualisations in information retrieval tasks, as well as examining better user interface controls and 3D input devices.

Figure 3. One subject's initial moves while searching the semantic space.
Spatial Memory

The spatial memory test provided an alternative viewpoint of the interaction between visualised semantic structures and users’ understanding of that space. By identifying what subjects learned about the structure and how their remembered details vary from one another, we were able to understand more about how to exploit various characteristics of visual semantic structures to support navigation. Figure 4 shows the sketches of the semantic space from four different subjects. These sketches show not only that these subjects have focused on similar features in the semantic space, but also what level of detail subjects can remember about the user interface. These figures are directly related to the differences between subjects’ navigation strategies and their emerging cognitive maps.

Figure 4. Subjects' sketches of the semantic information space searched during the study.

Most subjects clearly remembered the shape of the central circle. In (a), the subject highlighted the central circle and three sub-areas around the circle. The video analysis confirmed that these areas had been intensively visited during his search. In (b), the subject was able to remember more details about the branches surrounding the central circle. In addition, he added some strokes inside the circle, although they were not as accurate as other structural patterns in his sketch. In (c), the subject had been focused on the central circle and sub-structures within the circle. He vaguely indicated the existence of some branches outside the circle, but accurately outlined the shape of the branch inside the circle. In fact, the video shows that his initial search focused on the circle and the branch inside the circle, before he switched to a local search mode. Another factor may also be related to his memorised cognitive map.
At one point, he found a relevant paper, saved it and adjusted the view to search more carefully in the local area, keeping the inside-circle branch in the centre of the screen for about 2 minutes. In (d), the sketch becomes more conceptual. The subject probably had focused on the outskirts of the semantic space, as it is more accurately depicted than is the central area. It is also interesting to note that nodes differ in size in (d), and that there is a sense of symmetry, which is generally regarded as a desirable feature in graph drawing. The subjects who sketched (a) and (b) achieved higher recall in task performance, whereas the subjects who sketched (c) and (d) had higher precision scores. While this provides only a brief hint of how subjects’ spatial memory may be influenced by this information visualisation, as well as by their individual differences in ability and strategy, we will continue to analyse these structures for meaningful implications for 3D user interface design. We are currently examining the sketches across all the subjects in an attempt to identify structural patterns in these drawings associated with their spatial ability scores. Preliminary analyses have shown that all subjects remember the main cycle of the structure very well. Interestingly, spatial ability can be used to predict how closely a subject remembers the correct number of major and minor arteries in the structure (r=.77 and .75, respectively, for major and minor arteries). Clearly, subjects with higher spatial abilities are building a fairly veridical representation of the space in as short a period as 30 minutes, and these subjects were able to incorporate that knowledge in their navigation strategies (longer jumps, using global structural cues). One interesting question that awaits future research is whether all subjects’ maps would begin to converge over repeated exposures and longer use of the information space.
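As an illustration of how an r-value of this kind could be obtained, the following Python sketch computes a correlation coefficient between spatial ability scores and a sketch accuracy measure. The paper does not name the coefficient it used, so the Pearson product-moment coefficient is an assumption here; the function name and all data values are hypothetical and are not the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for eleven subjects (NOT the values from this study):
# paper folding scores on the 0-20 scale, and a made-up accuracy measure,
# the negated absolute error in the number of major arteries sketched.
spatial_ability = [4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16]
artery_accuracy = [-4, -3, -3, -2, -3, -2, -1, -2, -1, 0, 0]

print(f"r = {pearson_r(spatial_ability, artery_accuracy):.2f}")
```

With only eleven subjects, a coefficient computed this way is sensitive to individual data points, so it is best read alongside the sketches themselves.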
Categorisation and Abstraction

The categorisation and abstraction task was designed to help us understand how individuals would refer to distinctive structural patterns and what features associated with these patterns are likely to be most useful in design. Seven subjects were able to give names to structural patterns without requesting further details. However, some subjects found this task very difficult, and could not complete it. Some wanted to check particular spheres again before they could confidently provide a name for the given structures. One subject named the central circle a "General Issues" area and another named it "the M25", a motorway around Greater London. These names could make sense as personal landmarks. For example, the shape of the central circle is similar to a ring motorway in a road atlas. One subject named a cluster "Virtual Reality", apparently because he remembered some virtual reality papers in that area. (Note that virtual reality was not even a task topic.) Some clusters in the categorisation and abstraction task contained items that were relevant to the targeted areas for Task 1 and Task 2. For example, one cluster includes a number of papers on information visualisation, a search topic in Task 1. The following names were given by some of the subjects to this cluster: 22-point bug; coral; information models; information access; interface issues; searching, data, graphic; information, hypertext, cscw². The name "22-point bug" was obviously based on the fact that there are 22 objects in the cluster. The majority of the subjects, however, used a content-based naming scheme, which is usually more informative about the semantic nature of a group of papers from the users' point of view. Content-based names should also be useful in designing a search facility and a query formation user interface.
It appears that subjects with higher performance scores were able to give more informative names based on semantic characteristics, whereas subjects with lower performance scores relied more on the physical appearance of a cluster. There are clearly a number of fundamental questions we must answer in future research and development. For example, based on user requirements that we have gathered from various sources, users apparently would like to see clearly labelled clusters. How do we achieve this? If individual domain experts name clusters, will these names be generally meaningful to a variety of user groups? In terms of distinctive labelling, what is the upper limit on the number of names that can be practically handled manually? If we choose an automated labelling approach, should we develop or adapt a domain-specific naming algorithm, and what semantic attributes should be taken into account? These are some of the research questions that we will be investigating in our future projects.

Footnote 2: Computer-Supported Cooperative Work (CSCW).

User Satisfaction
A standard post-test questionnaire was used to assess usability issues and user satisfaction. The questionnaire consists of three blocks of questions and multiple-choice statements about overall satisfaction, usability issues and user interface design. The questionnaire was developed over two years by Microsoft usability engineers, as part of an effort to understand what defined an “engaging” user interface for the web. The questionnaire has been well validated. Details of the validation efforts are described in (16, 21). In general, user satisfaction ratings suggested areas of the user interface that worked well, and areas for improving the software. On average, subjects liked the system (Mean=3.36, S.D.=1.02). Subjects were less likely to recommend this system to others or state that they would use it on a regular basis (see Table 2).

Global Appeal                                     Mean   S.D.
I liked it.                                       3.36   1.02
I would recommend this software to others.        2.82   1.54
I would use this software on a regular basis.     2.27   1.56
(All ratings are on a 1-5 Likert scale, with 1=negative and 5=positive.)
Table 2. Global appeal ratings for the user interface.

Eight of the eleven subjects (73%) liked the visualisation idea for organising papers. Nine subjects (82%) thought the user interface was imaginative, but 36% felt that it was confusing and 19% thought it lacked predictability. None of the users rated the user interface as boring.

The design was…             Agree (%)
…imaginative                82
…fun                        55
…original                   36
…confusing                  36
…intuitive                  36
…not my type of program     18
…predictable                9
…boring                     0
Table 3. Design satisfaction ratings for the user interface.

The following four usability scores were slightly below average. During the post-test interview, a number of possible reasons for these ratings were identified, such as being unfamiliar with manipulating a VRML world, the incorporation of unlabelled structures and a lack of understanding of clustering models in general. These issues will be discussed further in the following sections.

Usability                                                                 Mean   S.D.
Right when I started, the purpose of the software was clear.              2.82   1.72
It was easy to get where I wanted to go.                                  2.45   1.44
Right when I started, I knew what I could do.                             2.36   1.63
Each area of the software was clearly marked to indicate my location.     2.18   1.60
Table 4. Usability satisfaction ratings for the user interface.

Online appeal factor ratings were mixed, although many subjects liked the unique user interface design (Mean=4.45, S.D.=0.52). In particular, simplicity, the ability to zoom and walk around topics, and navigating in topical clusters were among the named favourite features. On the other hand, average ratings were low in the areas of customisability, community and familiarity. These are clearly areas in which the authors will need to put serious effort into redesigning the user interface.

Online Appeal                                                       Mean   S.D.
This software feels unique (or different).                          4.45   0.52
This software is mentally challenging.                              3.64   0.92
This software has appealing graphics.                               3.45   1.13
This software is responsive (not too slow).                         3.36   1.50
This software provides valuable information.                        3.27   1.74
This software is easy to use.                                       2.82   1.72
This software uses cutting edge technology.                         2.73   1.62
This software provides a detailed environment to interact with.     2.64   1.63
This software is timely.                                            2.55   1.63
This software is personalised or customisable.                      2.18   1.72
This software provides a shared experience (or community).          1.91   1.75
This software feels familiar.                                       1.82   1.54
Table 5. Satisfaction ratings for the online appeal of the user interface.

When asked "Who would use this software?", subjects mentioned a number of possible users, including researchers who want to have an overview of topics and people who want to use it as a personalised digital library.

Discussion
This study has produced an interesting but complex set of findings and a large amount of data yet to be fully analysed. Based on the initial results of our analyses, we identified some significant implications for system design, usability evaluation and user behavioural modelling. We intend to incorporate these insights into our next versions of the system. An in-depth observation of individuals' navigation strategies should be of interest to the design and evaluation of other spatial semantic systems in the HCI community.

Future Design Directions: Index Search and Cluster Labels
An important issue examined in previous studies of hypertext information retrieval is how efficiently one can locate a primary target in hyperspace, and how that influences subsequent browsing (23). In this study, subjects relied heavily on popping up title information to assess the relevance of papers, at least at the initial stages of search in the virtual world.
Subjects confirmed in post-test interviews that it would have been much more helpful if they could have had some means of quickly narrowing down their search space. One natural solution to this problem could be to provide index search facilities that could help users locate a highly probable paper with which to start their navigation through the space. We are working on a new prototype that combines this notion with the structural visualisation approach. Figure 5 is a screen shot of how the hits of a search could be superimposed on the overall semantic structure as an initial step in this direction. With this improvement, it is hoped that users will be able to choose their starting point for navigating more easily. Cluster labelling is another improvement to the user interface that has been requested by a number of users. We will be conducting more empirical studies on refined versions of the system in the near future. It will be very interesting to analyse how the user-driven search enhancements will affect how individuals navigate and search through the virtual environment.
Figure 5. An initial prototype that superimposes search hits on the overall semantic structure of the user interface.
In this study, several subjects commented on the need for more accurate user interface controls so that the manipulation of the VRML world would be easier. For example, some users would have liked to be able to specify the degree of zooming accurately instead of having to use the walk control. A few subjects lost control of the VRML world while navigating. Once this occurred, these subjects had to reset the viewpoint to its initial position. An application-specific dashboard might be useful in these instances. Some subjects over-shot their targets in the virtual world; the virtual world is then behind the viewpoint. From our observations, it proved difficult for users to get the appropriate viewpoint back if they were new to using the Live3D viewer.
It was also disconcerting to the users when this happened. Therefore, a prudent guideline might be to avoid dramatic shifts in the point of view of a virtual world, at least in this type of system, for novice users. As reported in the Results section, we have observed that an understanding of the organisational principles in the user interface seemed to be reflected in the search strategies subjects used in this study. For instance, one successful search strategy was to focus on the central ring initially, and then sample documents outward from the ring. In this way, subjects were able to make optimal use of the clustering organisation inherent in the user interface, a finding that corroborates previous research (e.g., 10, 11). In addition, the spatial memory test in this study underlined the need for reinforcing strategically significant points, or structural hotspots, as well as larger structural patterns in the virtual environment. Strong cues (e.g., landmarks or signs) should be included in the design to reinforce users' cognitive map of the virtual space. For example, an animation of how papers were organised would help users to understand the nature of the organisation. This notion awaits future user interface design work.
Incremental Learning
A somewhat surprising behaviour was observed in a few subjects' sessions. On several occasions, especially when subjects encountered a paper that was judged to be relevant according to our list of correct papers, subjects carried on with their search as if they had not noticed the relevance of the paper. However, when they came back to the same paper, after successfully finding a few other papers, they saved the paper without hesitation. The reasons behind this phenomenon are still not clear, although it happened during several subjects’ sessions. One possible explanation is that these papers only seem relevant to the search at hand after fully investigating the space.
Other related papers that subjects found between their first encounter and the second encounter with these questionable items might fill in some conceptual gap. This consideration led us to the triangle inequality assumption in Pathfinder network scaling. If we use this assumption, we can conjecture a model as follows to explain this behaviour. To an individual user, the semantic similarity between the task topic and a subtle paper is not clear initially. Then, if she found another paper obviously related to the topic and somehow related to the first paper as well, she was able to associate the task topic with the first paper. Based on this argument, one would expect that, since the explicit links in the visualisation model were derived as a Pathfinder network (see (7) for details of the modelling), users would be able to identify such implicit connections more easily. However, this is a complex issue that is beyond the scope of this study.
Spatial Ability
Correlation between spatial ability and user behaviour was computed for a number of different tasks, as well as preference data. Although we found that recall was positively correlated with spatial ability, as were a number of other measures, the overall impact of spatial ability was not straightforward. Sometimes the direction of the correlation was unexpected. A few aspects of the design of this study could be improved in future research to help clarify the impact that spatial ability might have on the usability of such information visualisations. For example, the entire task session was very limited in terms of time, especially for subjects who had not used the VRML viewer prior to the test session. The sample size should be increased to minimise variability in the data resulting from extreme combinations of spatial ability and experience with computers. On the other hand, the spatial memory test and the categorisation task turned out to be very informative.
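The triangle inequality argument made in the Incremental Learning discussion can be illustrated with a small sketch of Pathfinder-style link pruning. This is a simplified illustration under stated assumptions, not the modelling procedure of (7): it assumes the Pathfinder parameters r = infinity (the length of a path is its largest single step) and q = n - 1 (paths of any length are considered), and the function name `pathfinder_links` and the three-item distance matrix are hypothetical.

```python
def pathfinder_links(dist):
    """Return the links kept by Pathfinder scaling, assuming r = infinity, q = n - 1.

    dist is a symmetric matrix of pairwise distances (dissimilarities).
    """
    n = len(dist)
    # minimax[i][j]: the smallest "largest step" over any path from i to j
    minimax = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(minimax[i][k], minimax[k][j])
                if via_k < minimax[i][j]:
                    minimax[i][j] = via_k
    # A direct link is kept only if no indirect path is shorter,
    # i.e. only if the link does not violate the triangle inequality.
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if dist[i][j] <= minimax[i][j]]


# Hypothetical distances (smaller = more similar), for illustration only:
# 0 = task topic, 1 = obviously related paper, 2 = subtle paper
dist = [
    [0, 1, 5],
    [1, 0, 2],
    [5, 2, 0],
]
for i, j in pathfinder_links(dist):
    print(f"link kept: {i} - {j}")
```

In this toy case the weak direct link between the topic and the subtle paper is dropped because the route through the obviously related paper is shorter, which mirrors the conjecture that a subject associates the subtle paper with the topic only after finding the intermediate paper.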
The videotape coding provided an additional means of understanding users' navigation strategies. We would recommend that usability studies of visualisation-based information systems include such exercises.
Conclusion
In this study, users searched through a virtual world of semantically organised HCI papers. Prior to their test session, they knew little about the underlying computational model from which the user interface was derived. For the most part, the participants were naive with regard to manipulating VRML worlds using an ordinary mouse. We predicted that people with higher spatial ability would navigate the virtual world better and more efficiently, and would be more satisfied with the 3D user interface, as had been reported in previous studies. The results with regard to spatial abilities were not as clear for 3D user interface visualisations, and demonstrated that users faced two major types of problems when navigating 3D user interfaces of this type:
Cognitive: the number of explicit navigational cues in the virtual world was suboptimal, and will be a ripe area for future design research. Being unfamiliar with the similarity-based organisation model used to develop the user interface, users needed to take advantage of user interface landmarks and directional cues to guide their initial search paths. This study observed different search and navigation strategies, from fast cluster dipping to paper mining, depending on the spatial abilities and computer expertise of subjects. In addition, users needed to develop an adequate spatial representation of the user interface for better wayfinding during their search tasks. Having strong cues in the environment will likely hasten the development of such representations so that they can be manipulated during navigation tasks.
Motor: the problems with directly manipulating the user interface in Live3D may have accounted for more performance variability than did subjects’ different spatial abilities.
Contrary to our prediction, no consistently strong correlation was found between spatial ability and task performance measures. Increased understanding from examining videotapes, as well as conducting more experimental studies, should provide us with useful design directions. We also intend to recruit a broader range of user types (i.e., less experienced computer users) in order to generalise our findings to larger segments of the general population.
What have we learned from this study? We now know several typical search strategies that users deploy in a virtual world, which have not been observed in textual or static graphical user interfaces. Some subjects' strategies demonstrated that leveraging semantic clustering in the user interface is advantageous: these subjects were able to retrieve information within a cluster much faster than between clusters. We now know what information would be good for users to recognise, including landmarks and directional cues. As presented above, our initial redesigns have already incorporated the findings from this research, with the goal of increasing the likelihood that subjects can memorise special structural patterns within an information visualisation, and thereby maximise their search performance.
Acknowledgements
This work was partially supported by EPSRC under the Multimedia and Networking Applications programme, Research Grant Number GB/L61088. Thanks to all the people who participated in the study as subjects. Thanks to anonymous reviewers for their helpful comments.
References
1. ACM. Human Factors in Computing Systems CHI95 Conference Proceedings. New York: ACM Press, 1995. Available at http://www.acm.org/sigchi/chi95/
2. ACM. Human Factors in Computing Systems CHI96 Conference Proceedings. New York: ACM Press, 1996. Available at http://www.acm.org/sigchi/chi96/
3. ACM. Human Factors in Computing Systems CHI97 Conference Proceedings. New York: ACM Press, 1997. Available at http://www.acm.org/sigchi/chi97/
4. ALTMANN, E., LARKIN, J. H. and JOHN, B. E. Display navigation by an expert programmer: A preliminary model of memory. In: Proceedings of CHI'95. Available at http://www.acm.org/sigchi/chi95/proceedings.
5. CAMPAGNONI, F. and EHRLICH, K. Information retrieval using a hypertext-based help system. ACM Transactions on Information Systems, 7, 1989, 271-291.
6. CHEN, C. Augmenting user interfaces for digital libraries with virtual reality. In: Proceedings of the 31st Hawaii International Conference on System Sciences. V.II. Digital Documents Track. New York: IEEE Computer Society, 1998, 148-157.
7. CHEN, C. Structuring and visualising the WWW with Generalised Similarity Analysis. In: Proceedings of Hypertext'97. New York: ACM Press, 1997, 77-186.
8. CRAIK, F. I. M. and LOCKHART, R. S. Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior, 11, 1972, 671-684.
9. CZERWINSKI, M. and LARSON, K. The new Web browsers: They're cool but are they useful? In: H. Thimbleby, B. O'Conaill and P. Thomas (eds.), People and Computers XII: Proceedings of HCI'97. Berlin: Springer Verlag, 1997.
10. DARKEN, R. P. and SIBERT, J. L. Wayfinding strategies and behaviors in large virtual worlds. In: Proceedings of CHI'96. Available at http://www.acm.org/sigchi/chi96/proceedings/papers/Darken/Rpd_txt.htm
11. DARKEN, R. and SIBERT, J. L. A toolset for navigation in virtual environments. In: User Interface Software and Technology. Atlanta, GA: ACM Press, 1993.
12. DEERWESTER, S., DUMAIS, S. T., LANDAUER, T. K., FURNAS, G. W. and HARSHMAN, R. A. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 1990, 391-407.
13. ECKSTROM, R. B., FRENCH, J. W., HARMAN, H. H. and DERMAN, D. Kit of factor-referenced cognitive tests. Princeton, N.J.: Educational Testing Service, 1976.
14. EGAN, D. Individual differences in human-computer interaction. In: M. Helander (ed.), Handbook of Human-Computer Interaction. Amsterdam: Elsevier Science Publishers, 1988, 543-568.
15. JUL, S. and FURNAS, G. W. Navigation in electronic worlds: A CHI 97 workshop. SIGCHI Bulletin, 29(4), 1997, 44-49.
16. KANERVA, A., KEEKER, K., RISDEN, K., SCHUH, E. and CZERWINSKI, M. Web usability research at Microsoft Corporation. In: J. Ratner, E. Grosse and C. Forsythe (eds.), Human Factors for World Wide Web Development. New York: Lawrence Erlbaum, 1997, 207-224.
17. LARSON, K. and CZERWINSKI, M. Web page design: Implications of memory, structure and scent for information retrieval (To appear in Proceedings of CHI'98).
18. LEIDIG, P. The relationship between cognitive styles and mental maps in hypertext assisted learning. Unpublished doctoral dissertation, Virginia Commonwealth University. UMI-92-25398, 1992.
19. PIROLLI, P. Computational models of information scent-following in a very large browsable text collection. In: Proceedings of CHI 97. New York, NY: ACM Press, 1997, 3-10.
20. PIROLLI, P., PITKOW, J. and RAO, R. Silk from a sow's ear: Extracting usable structures from the Web. In: Proceedings of CHI 96. New York, NY: ACM Press, 1996, 118-125.
21. RISDEN, R., HANNA, E. and KANERVA, A. Dimensions of children's intrinsic motivation in computer software products. Poster presented at the Meeting of the Society for Research in Child Development, Washington, DC, April, 1997.
22. SATALICH, G. Navigation and wayfinding in virtual reality: Finding the proper tools and cues to enhance navigational awareness. Unpublished doctoral dissertation, 1995. Available at http://www.hitl.washington.edu/publications/1995pub.html.
23. SMEATON, A. F. and MORRISSEY, P. J. Experiments on the automatic construction of hypertexts from texts. The New Review of Hypermedia and Multimedia, 1, 1995, 23-40.
24. SUTCLIFFE, A. and PATEL, U. 3D or not 3D: Is it nobler in the mind? In: M. A. Sasse, R. J. Cunningham and R. L. Winder (eds.), People and Computers XI: Proceedings of HCI'96. London: Springer-Verlag, 1996, 79-94.
25. VEERASAMY, A. and BELKIN, N. J. Evaluation of a tool for visualization of information retrieval results. In: Proceedings of SIGIR'96 (Zurich, Switzerland). New York: ACM Press, 1996, 85-92.
26. VICENTE, K. J. and WILLIGES, R. C. Accommodating individual differences in searching a hierarchical file system. International Journal of Man-Machine Studies, 29, 1988, 647-668.