When the Wait Isn’t So Bad: The Interacting Effects of Web Site Delay, Familiarity, and Breadth February 20, 2004 Dennis F. Galletta (contact author) Associate Prof. of Bus. Admin. Katz Graduate School of Business University of Pittsburgh Pittsburgh, PA 15260 Phone 412-648-1699; Fax 412-648-1693 galletta@katz.pitt.edu Scott McCoy Assistant Professor of MIS School of Business College of William & Mary Williamsburg, VA 23187-8795 Phone 757-221-2062; Fax: 757-221-2937 scott.mccoy@business.wm.edu Raymond M. Henry Assistant Professor Department of Management Clemson University Clemson, SC 29634 Phone 864-656-0591; Fax 864-656-2015 rhenry@clemson.edu Peter Polak Assistant Professor of CIS School of Business Administration University of Miami Coral Gables, FL 33124-6540 Phone 305-284-8044; Fax 305-284-5161 ppolak@miami.edu Acknowledgements: The authors would like to thank Enrique Mu, Dong-Gil Ko, and Alex Lopes for allowing their students to participate. The authors also appreciate the excellent comments from the attendees of the research workshop at the University of Illinois at Urbana-Champaign. Abstract Although its popularity is widespread, the World-Wide Web is well known for one particular drawback: its frequent delay when moving from one page to another. This experimental study examined whether delay and two other Web site design variables (site depth and content familiarity) have direct and interactive effects on user performance, attitudes, and behavioral intentions. An experiment was conducted with 160 undergraduate business majors in a completely counterbalanced fully factorial design that exposed them to two Web sites and asked them to browse the sites for 9 pieces of information. Results showed that all three factors have strong direct impacts on user attitudes and performance, leading to behavioral intentions, as predicted. A significant 3-way interaction was found between all three factors on attitudes and performance, and strong 2-way interactions were found between all pairs of factors on performance, also as predicted. The negative attitude and performance effects of delay appear to be reduced by breadth and familiarity together, but only performance is enhanced by all possible two-way interactions. Additional research is needed to discover other factors interacting with delay, other effects of delay, and under what amounts of delay these effects occur. Keywords: electronic commerce, response time, web site design, attitudes, performance ISRL Categories: AA01, AA02, AA05, AI0105, EI0206, FB0403, HA0902, HC0101 When the Wait Isn’t So Bad: The Interacting Effects of Web Site Delay, Familiarity, and Breadth In spite of large investments in business-to-consumer electronic commerce, online stores lag behind brick-and-mortar stores by a wide margin. Online consumer spending represents only 1.5% of the $872.5 billion in total retail sales for the most recent quarter (Commerce Department, 2003). Meanwhile, the majority of users report that they try to use the Web at least to collect information for purchase decisions, but have problems doing so. Practitioners have provided ample testimony that one of the most important problems is the pervasive delay with each attempt to view information. Studies have found that the enhanced usability of a fast site leads to increased usage (Nielsen, 1999a), and improving delay from 8 seconds or more to between 2 and 5 seconds doubled the traffic of some sites (Wonnacott, 2000). There is a potential strategic disadvantage resulting from slow service; losses from download times of 8 seconds and higher have been estimated to be $4.35 billion annually (Zona, 1999). The research community has also examined systems delay in many settings over the years, including, most recently, Web browsing. Building on that work, this study examines delay, along with factors that have not earlier been studied in the Web arena, or with delay, as they influence multiple outcomes such as attitudes, behavioral intentions, and performance of users. Delay: No Silver Bullet Delay is considered by some to be “the most commonly experienced problem with the Web” (Pitkow et al., 1998), leading many to call it the “World Wide Wait” (Nah & Kim, 2000). Although modal access speed has increased considerably since early user surveys, problems of delay continue to plague users (Selvidge, 2003; Nielsen, 1999b). Many are turning to broadband access for help, although high costs and limited access will continue to leave them in the minority of users both in the US and worldwide. 2 One should not be tempted, however, to assume that providing broadband to the everincreasing minority of people will solve delay problems, which have a variety of causes (Nah & Kim, 2000; Rose et al., 1999; Nielsen, 1997). The ultimate speed of loading a page is only as fast as the slowest link in the chain from the server itself, to the Internet backbone, to the ISP, and finally, to the user’s desktop. If they manage to avoid telecommunications congestion at the weakest link in the system, users could still find themselves in a global waiting line at the server or find themselves as victims of design or configuration error that in some cases triple page loading time (Koblentz, 2000). Network traffic growth continues to exceed upgrades in network bandwidth, painting a gloomy picture for the future (Sears & Jacko, 2000). An increase in the number of broadband users will also require a large number of server upgrades (Connolly, 2000). In the words of Sears and Jacko, “delays will be a concern for many years to come” (p. 49). Delay, Familiarity and Structure While neglected until now, a case will be made in this paper that familiarity and structure interact with delay in their effects on a user’s Web experiences. Slow loading of a single page or even a small number of pages might not be very noticeable or memorable. However, repeated delays are quite common for two reasons: Many sites have “deep” structures that require several clicks to arrive at a target page. Many users are unfamiliar with the hierarchical structure of the site, becoming confused or “lost,” choosing incorrect hyperlinks (Edwards & Hardman, 1989) by a trial-and error. A user-centered approach will align the links with user goals (Sawasdichair and Poggenpohl, 2002). Repeated clicks on each site will add up to a sizable time investment in viewing what might be a blank screen for a great deal of time.i The factors mentioned above lead to the following research question: How do delay, familiarity, and site depth interact to influence attitudes, performance, and finally, behavioral intentions in online product information searches? 3 This question should be of interest to researchers who wish to understand how those factors independently and interactively form a predictive model of Web attitudes and performance. It should also be of interest to designers who wish to attract and retain visitors to their site. Because designers seldom have complete control over delay, it is important to understand ways in which other features of the site might compensate for, or perhaps exacerbate, delay difficulties. Providing fast access to a target Web page is therefore a complex issue that cannot be remedied with a single technology or approach; for the foreseeable future, it appears that users will continue to confront issues of delay. Design choices such as increasing the depth of a site or making use of unfamiliar terminology can greatly magnify the delay (Shneiderman, 1998). Theoretical Foundations Individuals using the Web navigate within a structure to find information (Bernard, 2002). There are, however, several ways to view this structure. Korthauer and Koubek (1994) identify three types of structural variables that affect users’ ability to find information: inherent structure, internal structure, and navigational structure. Inherent structure refers to the underlying physical structure, which creates the overall form of a Web site. Internal structure is the user’s knowledge of the domain being accessed. Navigational structure refers to the structure of tangible navigational aids within a site. These three aspects of site structure impact the user’s ability to find information. Discussions of the ability to find information in a Web site are informed by Pirolli and Card’s (1995) Information Foraging Theory (Katz & Byrne, 2003; Larson & Czerwinski, 1998). This theory asserts that individuals searching for information employ strategies to maximize their rate of finding target information and minimize the costs associated with finding that information (Katz & Byrne, 2003). Costs pertain to cognitive effort required of users and penalties imposed by following incorrect paths. A key component of this theory is the concept of information scent, defined as “the 4 ‘imperfect’ perception of the value, cost, or access path of information sources obtained from proximal cues” (Pirolli & Card, 1995, p. 646). When the scent is strong, users can easily find target information. In a Web environment, scent would refer to the amount of information about the location of target information that is provided by the site’s structure. A site can be made more usable if it conforms to a known genre (Crowston and Williams, 2000), defined as “typified communicative actions characterized by similar substance and form and taken in response to recurrent situations” (Orlikowski and Yates, 1994, p. 299). If Web sites serving a similar function have similar structures (such as terminology or other cues about the desired path) that are implied by conventions within that genre, scent would increase and navigation will be simplified (Crowston and Williams, 2000). While genre may have some impact, there are still structural choices that can be made, including choices about the number of links and nodes used and the labeling of links, even within the same genre. These choices will impact scent and the ability of users to find information. The ability to find information will be mediated by the complexity of the structure being navigated (Bernard, 2002). Various structures impose different costs and provide differing degrees of information scent. This article presents an experimental study that looks at two aspects of structure (depth and familiarity) along with delay to determine the extent to which these factors have independent and interactive effects on user attitudes, performance, and finally, behavioral intentions. The research model is presented in Figure 1. In the following sections the elements of the model are described and hypotheses about the relationships among these elements are developed. Delay Our understanding of Web page download time is informed by research on computer response time. Response time has been shown consistently to be a major factor in overall usability 5 both in studies of the World-Wide Web (McKinney et al., 2002; Torkzadeh & Dillon, 2002; Rose et al., 2001, 1999; Rose & Straub, 2001; Nielsen, 1999b) and large-scale information systems (Shneiderman, 1998; Kuhmann, 1989). Familiarity H6 H7 Speed H1 H7 H3 H5 Depth Attitudes Performance H8 Behavioral Intentions H4 H2 main effect 2-way interaction 3-way interaction Figure 1: Research Model Effects of delay. Prior research has shown an inverse relationship between response time and user performance (Barber & Lucas, 1983; Butler, 1983) and productivity (Martin & Corl, 1986; Dannenbring, 1983). Long delays lead to frustration (Doherty & Kelisky, 1979), dissatisfaction (Lee & MacGregor, 1985), feelings of being lost (Sears et al., 2000), and giving up (Nah, 2002). Negative impacts are strongest when delays are longer than expected, occur in unpredictable patterns, or are unaccompanied by duration or status information (Dellaert & Kahn, 1999; Polak, 2002). Tolerable Level of Delay. Early studies suggested that the ideal response time is 2 seconds (Miller, 1968), although such short response times might be impractical to use as a general Web design standard (Shneiderman, 1998), given the abundant and varied sources for delay. The maximum delay varies widely in previous literature, where delays over certain limits led to a variety of outcomes. At 38 seconds, users will abandon a task (Nah, 2002). At 15 seconds, the delay becomes disruptive (Shneiderman, 1998). At 12 seconds, users will lose patience (Hoxmeier & DiCesare, 2000). At 10 seconds, they will lose interest (Ramsay et al., 1998). At 8 seconds they suffer psychological and performance consequences (Kuhmann, 1989). 6 A recent study by Galletta et al. (2004) imposed delays of 0, 2, 4, 6, 8, 10, and 12 seconds on 196 Web-browsing subjects to determine effects of lengthening delay times on performance, attitudes, and behavioral intentions. The results revealed significant effects on all three dependent variables as delays increased. Interestingly, by the time delays exceeded 6 seconds, further reductions in performance, attitudes, and behavioral intentions seemed to lose significance. It is striking that such relatively short delays in prior research have been shown to produce significant results in both user attitudes and performance. Users of the World Wide Web often must deal with much longer wait times (Ramsay et al., 1998). From a cognitive perspective, increased delays increase the cognitive effort that users must expend. The increased time between steps on the information path may cause users to lose track of the information scent. Delays also directly increase the cost that a user incurs for each choice that is made in finding target information. Therefore, we propose the following hypotheses: H1a: A fast site will result in more favorable user attitudes about the Web site than will a slow site. H1b: A fast site will result in higher user performance than will a slow site. Depth Many alternative Web site designs can be selected, with varying effects on the number of selections that must be made. Given a fixed number of end nodes in a Web site, a designer can elect to use a “broad” strategy, increasing the number of hyperlinks on each page while decreasing the number of clicks and page loads, or a “deep” strategy with fewer links per page but more hierarchical levels. This is a primary aspect of the inherent structure of the website. The tradeoff between breadth and depth has been widely studied in menu design for information acquisition (for example, see Norman & Chin, 1988; Parkinson et al., 1988; Landauer & Nachbar, 1985; Kiger, 1984; Miller, 1981) and hypermedia (Edwards & Hardman, 1989), but only recently in Web design (Katz & Byrne, 2003; Chae & Kim, 2003, Zaphiris & Mtei, 1997). 7 Much of the work concerning the tradeoff between breadth and depth has focused on determining the optimal number of items per level (Larson & Czerwinski, 1998). While there has been some variation in research findings, most of the studies listed above have determined that breadth is preferred to depth (Jacko et al., 1995). The theory of task complexity (Frese, 1987) can be used to explain this preference (Jacko & Salvendy, 1996): A deeper structure increases the number of decisions that must be made, which in turn increases complexity, response time and the number of errors. Additionally, the information scent, the certainty that the chosen path is the correct one, decreases between distant nodes (Bernard, 2002), perhaps because of the stronger association between the descriptor and the target (Larson & Czerwinski, 1998). These explanations are consistent with Shneiderman’s (1998) observation: “when the depth goes to four or five (levels), there is a good chance of users becoming lost or disoriented” (p. 249). Also, too few links per page will require more page loads, decisions, and further delay (Zaphiris & Mtei, 1997; Landauer & Nachbar, 1985) Although increasing the breadth of a site requires fewer clicks by the user to reach the desired page, providing too many links on a page can result in clutter, making it difficult to read or comprehend (Larson & Czerwinski, 1998). Most laboratory studies consider 8-9 items per menu to be a “broad” structure and 2-3 items per menu to be a “deep” structure, but even when those numbers are altered, the preference for breadth over depth remains (Katz & Byrne, 2003) To summarize, an ideal, universal depth or breadth of a Web site cannot be determined, although most studies have found that users prefer, and perform best with, a “broad” structure with 89 items per menu in interactive systems. These choices can have direct implications on the complexity of a site and information scent that is provided, both factors that impact the ability to find target information. Therefore, we propose: 8 H2a: A broad Web site will result in more favorable user attitudes about the site than will a deep Web site. H2b: A broad Web site will result in higher user performance than will a deep Web site. Familiarity Another aspect of structure is the internal structure, or the underlying user knowledge of the domain. Familiarity with Web site content has been shown to be an important determinant of shopping effectiveness, but familiarity is difficult to control because of widespread heterogeneity of users (Chau et al., 2000). We therefore define familiarity as the ability of users to discern, from the terminology presented on the site, their next move to reach their goal. Because users choose their paths based on the likelihood of finding target information, familiarity with the terms used to define those paths increases the information scent provided by the paths. Pages higher on the hierarchy often depict broad identifiers for which the user must make inclusion judgments in order to reach the appropriate page at lower levels (Paap & Roske-Hofstrand, 1988; Katz & Byrne, 2003; Pardue & Landry, 2001). If users do not understand the category names and partitions used by designers, they will take longer (Somberg & Picardi, 1983) and have more difficulty reaching their goal (Dumais & Landauer, 1983). The goal is similar to that in “wayfinding,” (Santosa, 2003; Jul & Furnas, 1997) but we make use of semantic, rather than graphical, cues. The disorientation that occurs when users become lost is so widespread and serious that Edwards and Hardman (1989) call it “hypertext’s major limiting factor” (p. 62). A cognitive explanation of this is that users spend some of their limited processing capabilities on navigation that could otherwise be spent on attending to comprehension of the content (Thuring et al., 1995). This cognitive overhead should obviously be minimized or the site will become unusable. Unfamiliarity requires that users locate information through trial and error (Schwartz & Norman, 1986), increasing the costs that users experience and the likelihood of choosing incorrect paths. When dealing with unfamiliar product categories it is possible for users to spend a great deal 9 of time lost in the hierarchy of the system (Norman & Chin, 1988; Robertson et al., 1981). Paap and Roske-Hofstrand (1988) suggested that user familiarity with category names should influence the design of hierarchical structures. Quite often, firms organize Web sites according to their own internally-generated product line categories. For example, one very large consumer electronics firm once included three paths to its camcorders. The links were “Camcorders,” “Handycams,” and “Professional Camcorders.” All were camcorders, but in different price, size, and/or feature groupings. A customer might be looking for a particular model or format, but lack of familiarity with the groupings would make it necessary to perform a “brute force” search through the site. The use of meaningful categories is one of the determinants of effective e-commerce Web sites (Turban & Gehrke, 2000) and of Web site ease of use (Lederer et al., 2000). These structural elements directly impact information acquisition costs, cognitive effort required, and information scent. We propose the following hypotheses regarding familiarity: H3a: A familiar Web site will result in more favorable user attitudes about the site than will an unfamiliar Web site. H3b: A familiar Web site will result in higher user performance than will an unfamiliar Web site. Interactions The model asserts that delay, depth, and familiarity have not only direct but also interactive effects on attitudes, performance, and finally, behavioral intentions. Though somewhat unusual, we also hypothesize all possible interactions because of expected synergistic relationships among the three conditions described below. The hypothesized interactions are based on the sharply rising amount of foraging required under unfavorable combinations of conditions. 10 Depth and Familiarity While increasing depth will decrease the rate of information acquired (Pirolli & Card, 1999), increasing the unfamiliarity of the terminology on the site will introduce trial-and-error foraging at the same time, decreasing dramatically the foraging rate as described below. A familiar site with L levels to traverse (from the main index page to a bottom-level page) would require an expected average of C clicks to reach the bottom node as follows: C fam L By definition, a perfectly familiar site causes no user errors in navigation; therefore the number of clicks to reach a bottom-level page is equal to the number of levels the user must traverse, with no backtracking. For example, if there are 5 levels from top to bottom, a user who starts at level 1 needs to traverse 4 levels to reach level 5, the goal. A perfectly unfamiliar site, however, without learning, would require a “brute force” systematic (next neighbor) search with backtracking. Such a search would require many more clicks to reach the correct bottom node. The expected number of clicks to reach the desired node is a function of the number of links per page and the number of levels: L 1 C unfam ( n i ) 1 i 1 where: n = the number of links per page (assumed to be constant throughout) L = the number of levels to traverse Counting from the level under the home page, at each level i, there are ni intermediate pages. In a “brute force” search, the user would traverse each of these pages, hoping to find a direct link to the bottom-level node sought. Each unsuccessful search would normally stop at level L-1, that is, one level above the bottom, in the absence of the desired link. In other words, if a user is looking for a certain model camcorder, and arrives at a page including only links to the pages of other mod- 11 els, he or she would not need to load each page containing the incorrect model, and would move on. Ultimately, when a link to the correct bottom page is found, he or she would need to click one last time to get there, hence the +1 at the end of the formula. Traversing through the structure requires backtracking after each “dead end,” suggesting two clicks per intermediate page (one down and one back up). However, on average, a randomly-placed goal would be found halfway through the search. Therefore, the effect of backtracking is cancelled by finding the goal halfway through the task (multiplying by 2 clicks per page, and then dividing by 2), so the formula provides the average number of clicks to arrive at the goal. If a site has 81 bottom level pages two options might be to arrange them into 4 levels each with 3 links (34=81) or into 2 levels each with 9 links (92=81). The differences between the expected number of clicks in a familiar and unfamiliar site using each option are striking (see Table 1), indicating theoretical support for interaction provided by the formula. Table 1: Average Number of Clicks Predicted for Navigating to a Given Node Familiar Site Unfamiliar Site Deep (4 levels to traverse, with 3 links each) 4 40 Broad (2 levels to traverse with 9 links each) 2 10 The pattern in Table 1 supports the following hypotheses: H4a: There will be an interaction between familiarity and depth on user attitudes. H4b: There will be an interaction between familiarity and depth on user performance. Depth and Delay According to Shneiderman (1998), deep systems with slow response time are “annoying to the user” because they result in “long and multiple delays” (p. 255). The formula and Table 2 support that assertion. The total expected delay time in an unfamiliar site is zero with instantaneous response time, and rises dramatically when an 8-second delay is imposed. Averaging across the famili- 12 arity treatment, the expected amount of delay jumps from 0 to 48 seconds [(16 + 80)/2] when delay is imposed in the sample broad site, and from 0 to 177 seconds [(32 + 320)/2] in the sample deep site. In short, Hypotheses 5a and 5b state that a “broad” design can partially make up for delay. Table 2: Delay Predicted for Navigating to a Given Node, 0 or 8 second delay Familiar Site Unfamiliar Site Deep (4 levels to traverse, with 3 links each) 0 second delay / 8-second delay 0 seconds / 32 seconds 0 seconds / 320 seconds Broad (2 levels to traverse with 9 links each) 0 second delay / 8-second delay 0 seconds / 16 seconds 0 seconds / 80 seconds H5a: There will be an interaction between delay and depth on user attitudes. H5b: There will be an interaction between delay and depth on user performance. Familiarity and Delay Likewise, averaging across the number of levels in Table 2, the worst case predicts an expected delay that jumps from 0 to 28 seconds for a perfectly familiar site and from 0 to 200 seconds for a perfectly unfamiliar site thanks to the “brute force” strategy that might be needed given the large differences in scent for the foraging task. Indeed, Galletta et al. (2004) found that attitudes “bottomed out” sooner for unfamiliar sites than familiar sites. H6a: There will be an interaction between familiarity and delay on user attitudes. H6b: There will be an interaction between familiarity and delay on user performance. Delay, Depth, and Familiarity The most striking conclusion of Table 2 lies in the likely three-way interaction that comes without abstracting across depth or familiarity dimensions. As the delay with each click increases, the differences among the cells increase dramatically. If pages loaded instantaneously, the expected delay (for all clicks to the goal) would be zero in all cells. However, if a site incurred an 8-second delay, a brute-force search in the deep site would impose an expected delay of 320 seconds (5.3 minutes), 13 while a brute-force search in the broad site would impose a delay of only 80 seconds (1.3 minutes). In a familiar site, the user would only need to click once at each level, with a delay of 32 seconds and 16 seconds in the deep and broad site, respectively. While previous researchers stopped short of examining these dimensions together, the mathematical and logical justification above suggests that: H7a: There will be an interaction between delay, familiarity, and depth on user attitudes. H7b: There will be an interaction between delay, familiarity, and depth on user performance. Is The Worst Case Realistic? To address the worst case, we examined logs of a subset of our subjects to see how closely they conformed to the idealistic numbers suggested by the formula and Tables above. Based on the first 128 participants (64 per cell) for whom working logs were available, Table 3 presents the actual number of clicks for each condition. As might be expected, user performance did not exactly match the predicted numbers, but was quite close. In general, participants in the familiar site performed at less than the expected rate, and participants in the unfamiliar site performed at better than “worst case” levels. The subjects in the familiar site might have had missed a few of the cues designed to be familiar, and participants in the unfamiliar site might have learned some of the paths along the way. Table 3: Average Actual versus Predicted Clicks for Navigation to a Given Node Familiar Site Unfamiliar Site Deep (4 levels to traverse, with 3 links each) 6.5 (predicted: 4) 26.5 (predicted: 40) Broad (2 levels to traverse with 9 links each) 3.8 (predicted: 2) 8.2 (predicted: 10) Attitudes and Behavioral Intentions One of the key Web design goals is to get potential users to visit and to return later. Recent research suggests that the use of behavioral intentions is appropriate in a Web context (Song & Zahedi, 2001), where a user’s behavioral intention (to return) serves as a useful summary variable indicating design success. Users form lasting attitudes about sites based on very brief encounters 14 (Chiravuri & Peracchio, 2003). Fishbein and Ajzen (1975) theorize attitudes to be a central antecedent of behavioral intentions. The relationship between attitudes and behavior has seen significant attention (Ajzen, 2001). Technology acceptance research (Davis, 1989; Venkatesh & Davis, 2000; Taylor & Todd, 1995) follows from this stream. Based on this work, we would expect a direct and positive relationship between attitudes and behavioral intentions: H8a: More favorable user attitudes will lead to more favorable behavioral intentions about the site. Performance and Behavioral Intentions The HCI literature has focused substantial attention on performance as the “ultimate concern” (p. 404) and central dependent variable for users of technology, whether evaluating designs of hardware or software (Card et al., 1983). Although their focus has been on choosing design factors based on their contributions to performance, HCI researchers have used quantitative performance models to make multi-million dollar adoption decisions (e.g., Gray et al., 1993). Over the last two decades, hundreds of HCI studies have provided quantitative analysis of user-system performance for adopting the best technological approach in a variety of situations. Design of both software and hardware almost always involves such analysis, whether the designer is considering how to enter formulas into a spreadsheet, how an external mouse compares to a laptop pointing device, or how much time is saved by using a pupil tracking system. In the MIS literature, Compeau and colleagues have posited a link between performance and behavior for several years, based on Social Cognitive Theory (SCT) (see Bandura, 1977 and Compeau & Higgins, 1995, for a review of SCT). In a recent study by Compeau et al. (1999), the two strongest predictors of usage behavior were performance and attitudes (in both cases r = .25 and p < .001). While behavior is not the same as behavioral intentions, we suspect that a link exists between performance and behavioral intentions, and we present the final hypothesis: H8b: Higher user performance will lead to more favorable behavioral intentions about the site. 15 Dependent Variables: Summary To summarize, attitudes and performance are theorized to be important antecedents of behavioral intentions, when a user has other alternatives to using the system or browsing on a particular site. In this study, we focus on delay, site depth, and familiarity as important Web site design factors, serving as direct and interacting antecedents of user attitudes and performance, and ultimately (for our purposes), behavioral intentions. Research Methodology The study was conducted in an experimental setting to control delay, site depth, and familiarity, as well as to allow measurement of outcome variables. Two levels of each of three factors provided a 2x2x2 design with two between-subjects factors (delay and depth) and one within-subjects factor (familiarity). We employed a completely counterbalanced, fully factorial design, providing all 32 combinations of order, delay, depth, and familiarity.ii Operationalization of Variables Delay Delay, the central factor in this study, was manipulated with Javascript code to provide a constant eight-second delay per page for the slow site, with no such delay for the fast site. An eightsecond delay is about the middle of the range of what others have considered serious (Hoxmeier & DiCesare, 2000; Ramsay et al., 1998; Zona, 1999; Shneiderman, 1998; Kuhmann, 1989), and has been shown to be well past the patience levels of laboratory subjects (Galletta et al., 2004). The eightsecond delay also fits the simplicity of the pages (Sears & Jacko, 2000). A manipulation check for the delay variable asked subjects to evaluate the speed of display- 16 ing the Web pages. On a 7-point Likert scale, “fast” subjects, on average, rated the speed as 6.0 (standard deviation 1.2) and “slow” subjects rated the speed as 2.2 (standard deviation 1.3). The difference was significant (t=312.3; one-tailed p<.001). Familiarity Two artificial Web sites were created. The “familiar” site contained images, prices, and descriptions of products found in grocery and/or hardware stores almost anywhere, to ensure that subjects would recognize these products and categories of products (for example, a top level category was “Health Care Products”). In contrast, the “unfamiliar” site contained fictitious software products and accessories and their categories were meaningless (for example, a top level category was “Novo Products”). Subjects could not make use of previous experience in searching through the unfamiliar site. A manipulation check for the familiarity variable asked subjects to evaluate their level of familiarity with the subject matter in the Web site. Subjects on average rated the familiar site 5.4 (standard deviation 1.4) and unfamiliar site as 2.2 (standard deviation 1.4) on a 7-point Likert scale. The difference was significant (t=21.3; one-tailed p<.001). Depth The experimental Web sites were created with two different hierarchical depths for the 81 bottom level pages. A “broad” structure was created with 2 levels to traverse (9 hyperlinks per page), and a “deep” structure contained 4 levels to traverse (3 hyperlinks per page). These levels were chosen after considering previous studies (Norman & Chin, 1988; Landauer & Nachbar, 1985; Kiger, 1984; Miller, 1981), which used very similar schemes. The Web sites utilized a linear design where lower child pages were accessible only through their parent page. We did not provide the ability to skip through levels by shortcuts to bottom nodes 17 or other non-linear methods of navigation. The only exception was a link to the site’s Home page located on every page to provide a lifeline to a well-known starting point. A manipulation check for the depth variable asked subjects to evaluate the number of further choices or links on each page visited in the Web site. On a 7-point Likert scale, subjects on average rated the broad site 4.1 (standard deviation 1.9) and the deep site 3.9 (standard deviation 1.6). Although in the expected direction, the difference was not significant (t=1.0; one-tailed p<.16), suggesting that either the condition or the manipulation check measure was inadequate. Further testing on the manipulation check for depth revealed that users had trouble judging “absolute” depth or breadth in this setting. Because the effect of depth has greatest implications in the unfamiliar site, effectively multiplying the number of required clicks by at least 5, we omitted the scores of subjects who went from the unfamiliar to the familiar sites, assuming that the extra two clicks in the deep familiar site would be too trivial to comprise a meaningful difference in perceived depth. For the subjects who browsed the unfamiliar site as the second site, the mean score was 4.6 for the broad site and 3.9 for the deep site, differing significantly on the depth variable (t=1.7; onetailed p<.046). Thus, there is some evidence that the subjects could perceive and judge the depth/breadth manipulation, though not as strongly as the other factors. Dependent Variables Table 4 provides a summary of the items and their scales. Table 4: Instrument Reliabilities Dependent Measure Attitudes Performance Behavioral Intentions Number of Items Type of Items 7 9 2 9-point scale Dichotomous 7-point scale Reliability alpha = .95 KR-20 = .90 alpha = .94 18 Attitudes Attitudes about the sites were measured averaging the responses to a set of seven 9-point Likert-type questions adapted from Part 3 of the long form of the QUIS (Questionnaire for User Interaction Satisfaction) (Shneiderman, 1998, p. 136), which has demonstrated reliability and validity (Chin et al., 1988). Cronbach’s alpha, was .95. Performance Performance was operationalized using an average score for nineiii search tasks that were developed for this study. The nine tasks required subjects to visit each third of the site‘s hierarchy the same number of times, providing a reasonably balanced overall view of the site. For accomplishing their search tasks the questions for each site were identical except for the search object names (general store products versus software products). Subjects needed to visit exactly the same position of the structure in each site, and the only differences between the familiar and unfamiliar sites were the products and product categories themselves. The nine tasks required browsing the site for facts (such as prices or packaging) about various products and received 1 point for each correct answer. Short, dichotomous instruments are not normally subjected to reliability analysis, so the results of such analysis should not be viewed with the same expectations as Likert-type instruments or dichotomous instruments with over 20 items (Nunnally, 1978). Nevertheless, the statistic of the Kuder-Richardson-20, or KR-20 test (analogous to alpha, but for dichotomous items), was quite high in this study (.90). Behavioral Intentions Behavioral intentions were measured using the average of two original questions that focus on two related future behaviors: how readily the subject would visit the site again and how likely he or she would recommend that others visit the site (7-point scales). The alpha score for this very 19 short instrument was also extremely high, at .94, as in a previous study (Galletta et al., 2004). Subjects Undergraduate business majors enrolled in sections of an upper-division undergraduate MIS course at a large Northeastern U.S. university were invited to participate in the study. Of 191 students invited, 177 (93%) volunteered to participate. The first 160 served to fill 5 completely counterbalanced groups of 32 each (77 females, 83 males). The last 17 subjects were discarded. Subjects were given the opportunity to participate during one of their class periods, and instructors offered to provide extra credit points to stimulate participation. The within- and between-subjects design provided an equal number of subjects (40) for each of the 8 cells. A pre-test employing a separate group of 32 subjects from the same sampling pool (1 for each of the 32 combinations of conditions) in a previous term helped us refine the instruments and procedures. Those data are not included in our results. Subjects were randomly assigned to one of the 32 counterbalanced conditions. In general, ANOVA revealed no differences among the 8 cells on gender, computer experience (measured by the instrument from Compeau & Higgins, 1995), and computer efficacy (measured by two scales adapted from Nelson, 1991; following the procedure of Hartzel, 1999). Of the 21 2x2x2 tests (3 main effects and 4 interactions for each of the three demographic variables), only one difference was significant: computer efficacy differed along a 2-way interaction of delay and depth (F=4.7; p=.032; 1,149 df). One test out of 21 is expected to be significant at the .05 level based purely by chance, and a Bonferroni adjustment renders it nonsignificant. Procedure The Web sites were created and their 32 combinations written on CDs to precisely control the browser’s response time.iv A computer laboratory containing 46 identical Pentium II computers 20 and 17” XGA screens further controlled the subjects’ environments. Upon their arrival, subjects were asked to sign an “informed consent” form, and were told that participation was voluntary and they could leave the study at any time. They were then instructed to enter the randomly-assigned code number (1-32) printed on their packet to trigger the system to load the correct treatment condition. Logs provided by background software and manipulation check questions described earlier provided assurance that the correct number was entered. A very small sample site provided a “warm-up” and the subjects were prompted to complete the required 9 tasks for each of the two main sites. At the conclusion of each main site, the subjects were instructed to close the browser window and complete the questions addressing attitudes and behavioral intentions. On average, subjects completed the experiment in about an hour. While page simplicity could endanger task realism, our site is typical of most web sites that categorize product pages into intermediate categories that are linked together from a top page, and the task typifies common questions for which users might seek information. Many sites require that users seek the best scent possible to navigate to pages lower in the hierarchy toward the foraging goal. Any failures in realism would be likely to mute our findings. Results Because the correlation between performance and attitudes was .50 (p<.001), our analysis of each hypothesis from H1 through H7 began with a multivariate ANOVA followed by a univariate ANOVA for each dependent variable. Because each MANOVA test was significant, univariate tests were run. SPSS version 10 was used for all of the analysis in this study. Tables 5 and 6 provide means for each dependent variable and Table 7 provides the multivariate tests of H1 through H7. The tests reveal that every main and every possible interaction effect is significant (p<.002). Therefore, univariate testing was performed for each hypothesis. 21 Individual ANOVA tests were run for both variables, and the amount of variance explained by each analysis (including interaction effects) was .557 for attitudes and .498 for performance (adjusted R2). Table 5: Cell Means for Attitudes Treatment Mean Deep Unfamiliar Familiar Broad Unfamiliar Familiar Slow Standard Deviation Mean Fast Standard Deviation 1.8 4.8 1.0 1.9 2.8 6.4 1.5 1.6 2.5 6.2 1.4 1.7 4.0 6.6 2.1 1.2 Table 6: Cell Means for Performance Treatment Deep Unfamiliar Familiar Broad Unfamiliar Familiar Slow Mean Standard Deviation Mean Fast Standard Deviation .42 .95 .29 .09 .80 .98 .30 .05 .81 1.0 .23 0 .93 1.0 .17 .02 Table 7: Overall MANOVA: Both Dependent Variables (for H1-H7) Factor Delay Familiarity Depth Delay * Familiarity Delay * Depth Depth * Familiarity Delay * Familiarity * Depth F 37.3 216.0 35.8 17.2 6.4 15.6 7.7 Sig .000 .000 .000 .000 .002 .000 .001 Means for the dependent variables, and results of hypothesis tests for the main effects are presented in Tables 8-10. All 6 tests are significant (in the predicted direction) at the .001 level, providing strong support for the main effects predicted by the model. It is notable that the 22 depth/breadth treatment shows strong results; it is apparent that the weakness in the depth/breadth treatment was in the manipulation check, and not the actual treatment itself. Table 8: The Delay Main Effect Dependent Measure Attitudes Performance Slow Mean Standard Deviation 3.8 2.3 .80 .30 Mean 5.0 .92 Fast Standard Deviation 2.3 .19 Difference (F, p) at 1,312 df F=42.1; p<.001 F=40.8; p<.001 Table 9: The Depth Main Effect Dependent Measure Attitudes Performance Deep Broad Mean Standard Mean Standard Deviation Deviation 3.9 2.4 4.8 2.3 .79 .31 .93 .16 Difference (F, p) at 1,312 df F=27.3; p<.001 F=52.0; p<.001 Table 10: The Familiarity Main Effect Dependent Measure Attitudes Performance Unfamiliar Mean Standard Deviation 2.8 1.7 .74 .31 Familiar Standard Deviation 6.0 1.8 .98 .06 Mean Difference (F, p) at 1,312 df F=331.7; p<.001 F=144.0; p<.001 Main Effects Attitudes and performance were significantly more positive in the faster site, broader site, and familiar site, providing clear support for Hypotheses H1a, H1b, H2a, H2b, H3a, and H3b. Interaction Effects All main (reported above) and interaction effects were assessed using the same model, resulting in analysis similar to the multivariate analysis in Table 7. Depth and Familiarity Hypotheses H4a and H4b predict that the positive effects of breadth are more important for 23 an unfamiliar site than for a familiar site (and conversely, the negative effects of depth are more severe for an unfamiliar site than a familiar site). Although the interaction effect was not significant for attitudes (H4a), the performance effect seems to hold very clearly, and precisely in the expected direction (H4b; F=31.4; p<.001). Depth and Delay Hypotheses H5a and H5b predict that the positive effects of breadth (rather than depth) are enhanced more for a slower site than for a faster site. As in the analysis of the previous interaction, such an interaction was found for performance (H5b; F=12.57; p<.001) but not attitudes (H5a). The means are in the expected direction, demonstrating that breadth indeed reduces the performance disadvantage of a slow site. Familiarity and Delay Hypotheses H6a and H6b predict that the positive effects of familiarity are enhanced more for a slower site than for a faster site. Once again, the interaction effect on performance was significant (H6b; F=34.5; p<.001) but not on attitudes (H6a). The results demonstrate a strong interaction, as predicted, indicating that familiarity reduces the performance disadvantage of a slow site. Delay, Depth, and Familiarity Hypotheses H7a and H7b predict that the combined positive effects of familiarity and breadth are enhanced more for a slower site than for a faster site. The results support both H7a and H7b, demonstrating a three-way interaction for both attitudes (F=5.9; p<.016) and task performance (F=7.98; p<.005). The cell means reveal that familiarity and breadth interact to reduce the performance and attitudinal disadvantage of a slow site. Antecedents of Behavioral Intentions H8a and H8b state that behavioral intentions can be predicted by attitudes and performance, 24 respectively. Intercorrelations among the constructs indicated general support for both hypotheses (p<.001), with a .774 correlation between attitudes and behavioral intentions and a .392 correlation between performance and behavioral intentions. To determine the relative contribution of each variable in predicting behavioral intentions, a multiple regression approach was also used. The regression model included intentions as a dependent variable and attitudes and performance as independent variables, and was found to explain significant variance in behavioral intentions (adjusted R2=.597). However, only the attitude measure contributed significantly to the model (t=18.8; p<.001). In a stepwise regression, only attitudes entered the model (adjusted R2=.598). Table 11 provides a summary of the Hypotheses and results. The results indicate that, perhaps unsurprisingly, there are indeed powerful main effects from Web site delay, depth, and familiarity on user attitudes and performance in searching through the site. However, beyond those intuitive main effects, for performance we found strong two-way interactions between depth and familiarity, depth and delay, and between familiarity and delay as expected. A strong three-way interaction among all three factors was found for both performance and attitudes. Finally, attitudes appear to be a powerful antecedent of behavioral intentions. Interestingly, performance effects were consistently stronger in every main and interaction test than effects on attitudes. Meanwhile, 2-way interaction effects were completely absent for attitudes, and while most of the hypotheses received strong support from manipulating the three factors, the strongest findings are in the arena of performance. Simultaneously, attitudes are the strongest predictor of behavioral intentions, overwhelming any additional impacts of performance. Therefore, a focus only on performance or attitudes appears to be rather incomplete. Table 11: Summary of Findings 25 H1a H1b H2a H2b H3a H3b H4a H4b H5a H5b H6a H6b H7a H7b H8a H8b Expectation Effect of Factors Result Attitudes: Fast > Slow Performance: Fast > Slow Attitudes: Broad > Deep Performance: Broad > Deep Attitudes: Familiar > Unfamiliar Performance: Familiar > Unfamiliar Attitudes: Interaction between Depth & Familiarity Performance: Interaction between Depth & Familiarity Attitudes: Interaction between Depth & Delay Performance Interaction between Depth & Delay Attitudes: Interaction between Familiarity & Delay Performance: Interaction between Familiarity & Delay Attitudes: Interaction between Delay, Depth, & Familiarity Performance: Interaction between Delay, Depth, & Familiarity Attitudes as an antecedent of Behavioral Intentions Performance as an antecedent of Behavioral Intentions Main Main Main Main Main Main 2-way 2-way 2-way 2-way 2-way 2-way 3-way 3-way direct direct Supported Supported Supported Supported Supported Supported Not Supported Supported Not Supported Supported Not Supported Supported Supported Supported Supported Mixed Discussion Our model predicted all possible main and interactive effects of delay, familiarity, and depth on users’ attitudes and performance on nine search tasks. The model also predicted that behavioral intentions would, in turn, be influenced by both attitudes and performance. The 3-way ANOVAs explained 56% of the variance in attitudes and 50% of the variance in performance. As expected, all main effects were highly significant on both of the hypothesized outcome variables. Attitudes and performance on the search task were strongly degraded by delay, unfamiliar structure and content, and deep structure. Those results are consistent with HCI research outside of the Web arena, and provide some assurance that the independent variables chosen in this study did have strong effects on the outcomes we examined. The interaction effects seemed to have the strongest effects on user performance. All possible 2-way interactions among the factors showed significant effects on performance, but not on attitudes. The 3-way interaction was significant for both performance and attitudes. Although it is rela- 26 tively rare to theorize and find 3-way interactions, it appears that the three factors indeed have a synergistic relationship. The failure of 2-way interactions for attitudes led us to examine the means graphically. Figures 2 and 3 provide 3-D bar charts of both performance and attitudes, and it becomes apparent that the unfamiliar treatment has such a devastating effect on attitudes that there is much less room for interaction effects to take hold. Stated another way, the familiarity main effect seems to overshadow the 2-way interactions for attitudes. It is possible that the 3-way interaction remains highly significant for attitudes because of the high strength of the synergies among the three factors. Further, examination of Figure 3 seems to indicate that familiarity has even stronger effects than delay on attitudes, suggesting that familiarity might actually be the core problem. 0.8 0.6 0.4 0.2 Unfam Fam 0 De ep -Sl De ow ep - Fa Bro s t ad Sl o Bro w ad - Fa st 7 6 5 4 3 2 1 0 Unfam 1 Figure 3: 3-Way Interaction for Attitudes Fam Figure 2: 3-Way Interaction for Performance De ep De -S l ep -Fa ow Br oa st d -S Br low oa d-F as t Behavioral intentions correlated significantly with both attitudes (r=.774; p<.001) and performance (r=.392; p<.001), suggesting that both H8a and H8b are supported. While performance explains 15.1% of the variation in behavioral intentions (p<.001), a multiple regression revealed that the effects of attitudes appeared to overshadow completely the effects of performance on behavioral intentions. In other words, if we did not measure attitudes, we would consider H8b to be supported rather strongly. We suspected multicollinearity between attitudes and performance, but the correlation between the two was .495, below the traditional threshold of .6 and the VIF score was 1.32, be- 27 low the general threshold of 4. Therefore, our conclusion is that support for H8a is strong and support for H8b is “mixed,” demonstrating that studying performance without attitudes could provide misleading results. Limitations As in most studies, there are several limitations that should be kept in mind when considering the results of this study. First, the subjects were college students, and the results might not generalize to the rest of the population. Fortunately, Voich (1995) found students to be particularly representative of values and beliefs of individuals employed in a variety of occupations. Also, we did not expect students to react differently to the stimuli than would other individuals (for example, students would probably not be appropriate for studying how loan officers make decisions). Thus, the materials were designed to tap rather “invariant” (Simon, 1990) activities such as aspects of Web use. Another limitation is that the task might not represent typical tasks that on-line shoppers undertake. To address this potential problem, we attempted to make the tasks as realistic as possible. It is perhaps frequent to find shoppers browsing through a site, looking for facts and details of products that they might buy. Our task required precisely this activity. The main departure from traditional usage was the lack of “search” facilities. A search option would have destroyed our control over the paths followed and would have confounded our results. Finally, incentives of laboratory browsers and home shoppers might differ. In short, home shoppers can choose a different store if they became frustrated searching for a desired product, while our users did not desire any of the products but participated to receive extra credit. While extra credit is indeed an incentive it does not reflect incentives of actual shopping. Tracking real shoppers on real on-line stores would have provided greater realism, but the cost would be a loss of con- 28 trol. Our sacrifice provided greater assurance that the effects we found were caused by the treatments. We probably muted our findings because students in a lab with a set of tasks placed in front of them would not be as likely to care very much about the task or be as likely to suffer reduced performance (quit before their tasks were all complete). Our findings are therefore perhaps more conservative than necessary. Implications and Conclusions This study is meant to provide implications for both researchers and practitioners. Web researchers can use, with confidence, two-decade-old findings of research on main effects delay, depth, and familiarity in the context of time-sharing menu systems. Interactions among those factors provide strong findings beyond the main effects that have been studied separately for so long, and studying both performance and attitudinal outcomes together provide a more complete story than studying only performance or attitude outcomes. Information foraging theory appears to be a promising basis for research in Web design, as users appear to have strong reactions as the costs of browsing for information increase, possibly exceeding its benefits. Furthermore, time is a potent component of foraging cost, but only considering delay time would omit other important factors such as familiarity and depth, two factors that add to foraging cost. These factors allow a researcher to perform the more complete analysis that is suggested by estimating foraging cost with a formula that predicts the nature of the users’ likely path to a goal and the distance to that goal. While our model appears to explain a large amount of variation in attitudes and task performance, there is ample opportunity to focus on other outcomes and other antecedents which might lead to greater understanding of the usability of Web sites. Also, while we were able to demonstrate correlations between behavioral intentions and both attitudes and task performance, other studies 29 should focus very closely on these three constructs to better understand their relationships under different situations. Practitioners should be advised to pay close attention to each of the three individual factors within their control to the greatest extent possible. Page loading time should be minimized, links to bottom level pages should use the most familiar possible terminology, and intermediate pages should have more, rather than fewer, links per page. More importantly, however, although practitioners are not usually made aware of interactions among experimental factors, the ones in this study provide some clear advice: First and foremost, designers should ascertain if the terminology used to categorize Web content is familiar to users. One promising tool suggested for testing a potential design is the cognitive walkthrough (Blackmon, et al., 2003). When it is difficult or impossible to expect a reasonable amount of familiarity in target users, then a great deal of effort should be expended to maximize breadth and speed (either by minimizing graphic content, streamlining page design, or assuring proper server configuration, redundancy, and/or bandwidth). When familiarity is expected to be high, pages can be made more attractive by using graphics, and the site can be made with greater depth than might otherwise have been considered advantageous. When it is likely that delay will be an issue because of the need for graphical content or likely bottlenecks, then breadth should be maximized and the site’s structure should employ very familiar, clear terminology. While analysis of single factors such as those investigated in this study are important for es- tablishing foundations for understanding basic design phenomena, analysis of interactions provide the basis for simultaneously more thorough understanding for researchers and more practical heuristics for designers. Refining and extending such interactive models would be a promising avenue for future work. 30 References Ajzen, I. “Nature and Operation of Attitudes,” Annual Review of Psychology, 52, 2001. 27–58 Bandura, A. “Self-Efficacy: Toward a Unifying Theory of Behavioral Change,” Psychological Review, 84:2, February 1977, 191-215. Barber, R.E. and Lucas, H.C., “System Response Time, Operator Productivity, and Job Satisfaction,” CACM, (26:11), November 1983, 972-986. Blackmon, M.H., Kitajima, M. and Polson, P.G. “Repairing usability problems identified by the cognitive walkthrough for the web,” SIGCHI 2003, Ft. Lauderdale, FL, pp. 497-504. Butler, T.W. “Computer Response Time and User Performance,” CHI 1983, Boston, 58-63. Card, S.K., Moran, T.P., and Newell, A. The Psychology of Human-Computer Interaction, 1983, Hillsdale, NJ: Lawrence Erlbaum Associates. Chae, M. and Kim, J. “An Empirical Study On The Breadth And Depth Tradeoffs In Very Small Screens: Focusing On Mobile Internet Phones,” AMCIS 2003, pp. 2083 – 2093. Chau, P.Y.K. Au, G. and Tam, K.Y. “Impact of information presentation modes on online shopping: An empirical evaluation of a broadband interactive shopping service,” Journal of Organizational Computing and Electronic Commerce, 10 (1), 2000, 1-22. Chi, E.H., Pirolli, P. and Pitkow, J. “The scent of a site: a system for analyzing and predicting information scent, usage, and usability of a Web site,” SIGCHI Proceedings, 2000, The Hague, The Netherlands, pp. 161-168. Chin, J.P., Diehl, V.A. and Norman, K.L. “Development of an Instrument Measuring User Satisfaction of the Human-Computer Interface,” CHI 1988, New York, 213-218. Chiravuri, A. and Peracchio, L. “Investigating Online Consumer Behavior Using Thin Slices Of Usability Of Web Sites,” AMCIS 2003, pp. 2105-2110 Compeau, D.R. & Higgins, C.A. “Computer Self-Efficacy: Development of a Measure and Initial Test,” MIS Quarterly, (19:2), 1995, 189-211. Compeau, D.R., Higgins, C.A., and Huff, S. “Social Cognitive Theory and Individual Reactions to Computing Technology: A Longitudinal Study,” MIS Quarterly, 23, 1999, 145-158. Connolly, P.J., “Growing Pains Won’t Slow Down the Broadband Revolution,” Infoworld, November 6, 2000, 66 and 68. Crowston, K. and Williams, M., 2000, "Reproduced and Emergent Genres of Communication on the World Wide Web," The Information Society, 16, 201-215. 31 Dannenbring, G.L. “The Effect of Computer Response Time on user Performance and Satisfaction: a Preliminary Investigation,” Behavior Research Methods & Instrumentation, 15(2), 1983, 213-216. Dataquest/Gartner, 2002. http://www3.gartner.com/Init. Davis, F.D. "Perceived Usefulness, Perceived Ease of Use and User Acceptance of Information Technology," MIS Quarterly (13:3), September, 1989, 319-340. Dellaert, B.G.C. and Kahn, B.E., “How Tolerable is Delay? Consumers’ Evaluations of Internet Web Sites after Waiting,” CentER Discussion Paper, MIT e-Commerce Research Forum, http://e-commerce.mit.edu/cgi-bin/viewpaper?id=8, 1999. Doherty, W. and Kelisky, R. “Managing VM/CMS Systems for User Effectiveness,” IBM Systems Journal, (18:1), 1979, 143-163. Dumais, S. T. and Landauer, T. K. “Using Examples to Describe Categories,” Proceedings of the CHI Conference on Human Factors in Computing Systems, Boston, 1983, 112-115. Edwards, D.M. and Hardman, L. 'Lost in hyperspace': cognitive mapping and navigation in a hypertext environment. Hypertext: theory into practice. Edited by R. McAleese. Oxford: BSP, 1989. Fishbein, M. and Ajzen, I. Belief, Attitude, Intention and Behavior: An Introduction to Theory and Research, Addison-Wesley Publishing Company, Reading, MA, (1975). Frese, M. “A theory of control and complexity: Implications for software design and integration of the computer system into the work place,” In Frese, M., Ulich, E. & Dzida, W. (eds.), Psychological issues of human-computer interaction at the work place. North-Holland, Amsterdam, 1987, 313-337. Galletta, D., Henry, R., McCoy, S., and Polak, P. (2004) “Web Site Delays: How Slow Can You Go?” Journal of AIS, Volume 5, Number 1, pp. 1-28. Gauch, S. and Smith, J.B. “Search Improvement via Automatic Query Reformulation,” ACM Transactions on Information Systems, (9:3), 1991, 249-280. Gray, W.D., John, B.E. & Atwood, M.C. (1993) "Project Ernestine: Validating GOMS for Predicting and Explaining Real-World Task Performance," Human-Computer Interaction, 8(3), 237-309. Grossberg, M., Weisner, R.A., and Yntema, D.B., “An Experiment on Problem Solving With Delayed Computer Responses,” IEEE Transactions on Systems, Man, and Cybernetics, 6, 1976, 219-222. Hartzel, K.S. “Understanding the Role of Client Characteristics, Expectations, and Desires in Packaged Software Satisfaction,” unpublished doctoral thesis, University of Pittsburgh, Katz Graduate School of Business, 1999. Hoxmeier, J. A. and DiCesare, C. “System Response Time and User Satisfaction: An Experimental Study of Browser Based Applications", AMCIS, 2000, 140-145. 32 In-Stat/MDR, 2002; http://www.instat.com/ Jacko, J. A., Salvendy, G. and Koubek, R. J. “Modeling of Menu Design in Computerized Work,” Interacting with Computers, (7:3), 1995, 304-330. Jacko J. A. and Salvendy, G. “Hierarchical menu design: Breadth, depth, and task complexity,” Perceptual and Motor Skills, (82:3), 1996, 1187-1202. Jul, S. and Furnas, G.W. Navigation in Electronic Worlds: ACM SIGCHI Bulletin (29:4), 1997 Kaindl, H., Kramer, S., and Afonso, L.M. “Combining Structure Search and Content Search for the World-Wide Web,” Ninth ACM Conference on Hypertext, Pittsburgh, PA, 1998, 217-224. Katz, M.A. and Byrne, M.D., “The Effect of Scent and Breadth on Use of Site-Specific Search on Ecommerce Web Sites”, ACM Transactions on Human-Computer Interaction, (10:3), 2003, 198-220. Kiger, J., “The depth/breadth trade-off in the design of menu-driven user interfaces,” International Journal of Man-Machine Studies, 20, 1984, 201-213. Koblentz, E. “Web Hosting Partner Keeps Stone Site Rolling,” Eweek, October 30, 2000, p. 27. Korthauer, R.D. and Koubek, R.J., 1994, "An Empirical Evaluation of Knowledge, Cognitive Style, and Structure upon the Performance of a Hypertext Taks," International Journal of HumanComputer Interaction, 6(4), 373-390. Kuhmann, W. “Experimental Investigation of Stress-inducing Properties of System Response Times,” Ergonomics, (32:3), 1989, 271-280. Landauer, T. and Nachbar, D. “Selection from alphabetic and numeric menu trees using a touch screen: Breadth, depth, and width,” Proceedings of CHI’85, New York 1985, 73-78. Larson, K. and Czerwinski, M., “Web Page Design: Implications of Memory, Structure and Scent for Information Retrieval,” Proceedings of CHI’98, Los Angeles, 1998, 25-31. Lederer, A.L., Maupin, D.J., Sena, M.P., and Zhuang, Y. “The Technology Acceptance Model and the World Wide Web,” Decision Support Systems, (29:3), 2000, 269-282. Lee, A.T., “Web Usability: A Review of the Research,” SIGCHI Bulletin, (31:1), 1999, 38-40. Lee, E. and MacGregor, J. “Minimizing User Search Time in Menu Retrieval Systems,” Human Factors, 27, 1985, 157-162. Lohse, G.L. and Spiller, P. “Electronic Shopping,” Communications of the ACM, (41:7), 1998, 81-87. Martin, G.L., and Corl, K.G., “System Response Time Effects on User Productivity,” Behavior and Information Technology, (5:1) 1986, 3-13. 33 McKinney, V., Yoon, K. & Zahedi, F. (2002). “The Measurement of Web-Customer Satisfaction: An Expectation and Disconfirmation Approach,” Information Systems Research, 13 (3), 296-315. Miller, D. “The Depth/Breadth Tradeoff in Hierarchical Computer Menus,” Proceedings of the 25th Annual Meeting of the Human Factors Society, New York, 1981, 296-300. Miller, R.B. “Response Time in Man-Computer Conversational Transactions,” Proceedings of the Fall Joint Computer Conference, 1968, 267-277. Muramatsu and Pratt, “Transparent Queries: investigation users' mental models of search engines,” Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, pages 217-224, 2001. Nah, F. “A Study of Web Users’ Waiting Time,” in Intelligent Support Systems Technology: Knowledge Management, Vijayan Sugumaran (editor), IRM Press, 2002, pp. 145-152. Nah, F.H. and Kim, K. “World Wide Wait,” in Managing Web-Enabled Technologies in Organizations: A Global Perspective, Medhi Khosrowpour (editor), Idea Publishing Group, (2000), pp. 146-161. Nelson, R.R. “Educational Needs as Perceived by IS and end-user personnel: A Survey of Knowledge and Skill Requirements,” MIS Quarterly, (15:4), 1991, 503-525. Nielsen, J. “Who Commits The "Top Ten Mistakes" of Web Design?,” http://www.useit.com/alertbox/990516.html, 1999a. Nielsen, J. “The Top Ten New Mistakes of Web Design,” http://www.useit.com/alertbox/990530.html, 1999b. Nielsen, J. “The Need for Speed,” http://www.useit.com/alertbox/9703a.html, 1997 Norman, K. and Chin, J. “The Effect of Tree Structure on Search in a Hierarchical Menu Selection System,” Behaviour and Information Technology, (7:1), 1988, 51-65. Nunnally, J. Psychometric Theory, McGraw-Hill Book Company, New York, MA, 1978. Olston, C. and Chi, E.H. “ScentTrails: Integrating Browsing and Searching on the Web,” ACM Transactions on Computer-Human Interaction, Vol. 10, No. 3, 2003, pp. 177–197. Paap, K. R. and Roske-Hofstrand, R. J., “Design of Menus”, In Handbook of Human-Computer Interaction, M. Helander (Ed.), Elsevier, Amsterdam, 205-235, 1988. Pardu, H. and Landry, J. “Evaluation Of Alternative Interface Designs For E-Tail Shopping: An Empirical Study Of Three Generalized Hierarchical Navigation Schemes,” AMCIS 2001, pp. 1335-1337. Parkinson, S., Hill, M., Sisson, N. and Viera, C. “Effects of Breadth, Depth, and Number of Responses On Computer Menu Search Performance,” International Journal of Man-Machine Studies, 34 28, 1988, 683-692. Pirolli, P. and Card, S.,1999, "Information Foraging," Psychological Review, 106(4), 643-675. Pitkow, C., Kehoe, J., and Rogers, J. GVU's 9th WWW User Survey, http://www.cc.gatech.edu/ gvu/user_surveys/survey-1998-04/reports/, 1998. Polak, P. (2002). “The Direct and Interactive Effects of Web Site Delay Length, Delay Variability, and Feedback on User Attitudes, Performance, and Intentions,” unpublished doctoral dissertation, University of Pittsburgh, Katz Graduate School of Business. Ramsay, J., Barbesi, A., Preece, J. “A Psychological Investigation of Long Retrieval Times on the World Wide Web,” Interacting with Computers, (10:1), 1998, 77-86. Robertson, G., McCracken, D., and Newell, A., “The ZOG Approach to Man-Machine Communication,” International Journal of Man-Machine Studies, 14, 1981, 461-488. Rose, G., Khoo, H., Straub, D.W., (1999) “Current Technological Impediments to Business-ToConsumer Electronic Commerce,” Communications of the AIS, 1, Article 16. Rose, G.M, Lees, J. & Meuter, M. (2001). “A refined view of download time impacts on e-consumer attitudes and patronage intentions toward e-retailers,” The International Journal on Media Management, 3 (2), 105-111. Rose, G.M. & Straub, D. (2001). “The Effect of Download Time on Consumer Attitude Toward the E-Service Retailer,” e-Service Journal, 1 (1), 55-76. Santosa, P.I. “Observing a User’s Mental Model of an Informational Website,” AMCIS 2003, pp. 2223-2230. Sawasdichair and Poggenpohl, “User purposes and information-seeking behaviors in web-based media: a user-centered approach to information design on websites,” Conference on Designing interactive systems: processes, practices, methods, and techniques, 2002, pp. 2-1-212. Schwartz, J. P. and Norman, K. L. “The Importance of Item Distinctiveness on Performance Using A Menu Selection System,” Behaviour and Information Technology, (5:2), 1986, 173-182. Sears, A. and Jacko, J.A. “Understanding the relation between network quality of service and the usability of distributed multimedia documents,” Human-Computer Interaction, (15:1), 2000, 43-68. Sears, A., Jacko, J.A., and Dubach, E.M. “International aspects of WWW usability and the role of high-end graphical enhancements,” International Journal of Human-Computer Interaction, (12:2), 2000, 243-263. Shneiderman, B. Designing the User Interface, Addison-Wesley, Boston, 1998. Simon, H.A., “Invariants of Human Behavior,” Annual Review of Psychology, (41), 1990, 1-19. 35 Somberg, B. and Picardi, M. “Locus of the Information Familiarity Effect in the Search of Computer Menus,” Proceedings, 27th Annual Meeting of the Human Factors Society, Virginia, 1983, 826-830. Song, J. and Zahedi, F., “Web Design in E-Commerce: A Theory and Empirical Analysis,” Proceedings of the International Conference on Information Systems, December 17-20, 2001, Session T3.3. Taylor, S. and Todd, P.A., “Understanding Information Technology Usage: A Test of Competing Models,” Information Systems Research, 6, 1995, 144-176. Teal, S.L. and Rudnicky, A.I., “A Performance Model of System Delay and User Strategy Selection,” Proceedings of the 1992 Conference on Computer-Human Interaction, ACM, New York, 295-305. Thuring, M., Hannemann, J., and Haake, J. M. 1995. Hypermedia and cognition: Designing for comprehension. Communications of the ACM, 38:8 57-66. Torkzadeh, G. & Dhillon, G. (2002). “Measuring Factors that Influence the Success of Internet Commerce,” Information Systems Research, 13 (2), 187-204. Turban, E. and Gehrke, D. “Determinants of e-commerce Website” (sic), Human Systems Management, 19, 2000, 111-120. Venkatesh, V. and Davis, F.D. "A Theoretical Extension of the Technology Acceptance Model: Four Longitudinal Field Studies," Management Science, (46:2), February 2000, 186-204. Voich, D., Comparative Empirical Analysis of Cultural Values and Perceptions of Political Economy Issues, Westport, Connecticut: Praeger, 1995. Wonnacott, L. “When User Experience is at Risk, Tier 1 Providers can Help,” Infoworld, November 6, 2000, p. 76. Zaphiris, P. and Mtei, L. “Depth vs Breadth in the Arrangement of Web Links,” http://www.otal.umd.edu/SHORE/, 1997. Zona Research Study, “The Need for Speed,” http://www.zonaresearch.com/info/press/99%2Djun30.htm, 1999. 36 End Notes Although a search engine could allow a user to jump directly to a needed resource, full-text searches cannot be counted on for exclusive access to documents (Olston & Chi, 2003). First, they are not always available, or when available, are often not very useful. Few people use such tools effectively, partly due to the unique syntax and behavior of each search engine (Lohse & Spiller, 1998), partly because searching is a difficult task for most users (Muramatsu & Pratt, 2001; Gauch & Smith, 1991), and partly because most search outcomes lack precision (Kaindl et al., 1998). In short, the typical search engine interface is “notorious for its lack of usability” (Lee, 1999, p. 38). Sometimes terminology itself is the problem, such as searching for the old term “PCMCIA” versus the new term “PC Card.” ii To prevent any learning effects in the second task, we counterbalanced completely the order for withinsubjects factors. 32 combinations = 2 familiarity orders (familiar-unfamiliar and unfamiliar-familiar) x 4 speed conditions (fast-fast, fast-slow, slow-fast, and slow-slow) x 4 depth conditions (deep-deep, deep-broad, broad-deep, broad-broad). iii We performed hypothesis tests (as shown later in the Analysis section) for all tasks separately and found nearly identical results. We also examined the progression of performance from one task to the next, and found that there was no systematic degradation of performance over time. iv The CD, as well as all instruments and materials, are available from the first author on request. i