Proceedings of the 8th Annual International Conference on Industrial Engineering – Theory, Applications and Practice, Las Vegas, Nevada, USA, November 10-12, 2003

WEB PAGE AESTHETICS AND PERFORMANCE: A SURVEY AND AN EXPERIMENTAL STUDY

Kristi E. Schmidt, Michael Bauerly, Yili Liu, and Srivatsan Sridharan
Department of Industrial and Operations Engineering
The University of Michigan
1205 Beal Avenue
Ann Arbor MI 48109-2117
Corresponding author’s e-mail: krischmi@umich.edu

Abstract: A dual-process research and evaluation methodology was used to identify the underlying clusters of design variables affecting aesthetic judgment of a Web page, and to examine user preference, ease of interaction, and interaction speed for Web pages with various font and graphic sizes. To identify the clusters of variables, 57 design variables were identified by conducting a content analysis of relevant literature and by conducting structured interviews. A balanced incomplete block survey of the 57 variables was administered. Cluster analysis of the results revealed 10 underlying clusters, two of which were selected to conduct a 2 × 10 experiment that explored Web pages with two levels of graphic size and ten levels of font size. User preference and ease of interaction increased as font size increased and graphic size decreased. There was no difference in interaction speed among Web pages with varying font or graphic sizes.

1. INTRODUCTION

The World Wide Web (WWW) has grown greatly in user population and breadth since its creation over a decade ago. It is now viewed as a backbone for several sectors, including advertising and marketing, business and e-commerce, entertainment, healthcare, communication, education, religion, and government (Nua, 2003). There were nearly 414 million home Internet users worldwide, each spending an average of over twelve hours online during the month of July 2003, up 1.46% from the previous month (Nielsen//NetRatings, 2003).
The breadth, volume, and accessibility of the Internet have made it popular for individuals and organizations to create and maintain Web pages that go beyond communication to collaboration and even Internet commerce. The surge in Internet presence, however, was not paired with widespread design savvy or consideration for usability. The increasing complexity of the Internet and the sheer volume of its users make the World Wide Web a complex and often competitive environment. If users are unable to find what they need on a given Web page because information is missing or navigation is too complex, they become frustrated and move on to another site. On average, users spend one minute viewing each Web page (Nielsen//NetRatings, 2003). This short viewing time requires a Web page to communicate critical information rapidly and places high information-processing demands on the user.

Variables affecting user judgments of a Web page have often been defined intuitively and communicated by Web page designers through instructional Web page design manuals. Oliver (2003) defines four principles of Web interface design and development: 1) usability: how intuitively or easily the media item can be navigated and processed; 2) visualization: creation of visually interesting and aesthetically pleasing media items while avoiding potentially distracting or unnecessary features; 3) functionality: the features of the media item and how useful they are in supporting a given task; and 4) accessibility: tools that help users access the site in alternative formats and that provide increased functionality. Burstein (2003) groups Web page design variables into fifteen design elements: links, color issues, images, image maps, animated images, spacing, tables, frames, style sheets, cookies, JavaScript, Java, plug-ins, screen size, and file distribution.
These online Web page design manuals are often based upon designer intuition and qualitative evaluation of existing Web pages. Research on Web page usability, preference, and performance has also been published in peer-reviewed journals, yet this research often consists of qualitative survey results and literature reviews of case studies rather than empirical quantitative research. Turner (2002) identified seven categories affecting Web page usability: navigation, page design, content, accessibility, media use, interactivity, and consistency. Cox and Dale (2002) developed a conceptual model to assess how a Web page can meet user expectations based upon six quality factors in Web page design and use: 1) clarity of purpose; 2) design; 3) accessibility and speed; 4) content; 5) customer service; and 6) customer relationships. Design was further broken down into five key issues: 1) links; 2) consistency, menus and site maps; 3) pages, text and clicks; 4) communication and feedback; and 5) search and fill-in forms. Beyond defining overall conceptual design considerations in a scientific and quantitative manner, researchers must also determine and quantify the relative importance of the defined design elements in order to prioritize Web page design efforts. Checklists such as the Heuristic Evaluation by Proxy (HEP test) of Web page usability (Turner, 2002) aim to evaluate several observations and criteria both quantitatively and qualitatively, yet such checklists are often based on surveys, compilations of existing literature, or case studies rather than on empirical research.
Clearly, the current literature lacks a definition of the factors affecting user judgments of a Web page that is grounded in rigorous measurement tools, and it lacks a quantitative account of the relationships among these factors. This study quantitatively identifies Web page design variables and determines the relationships among them. The questions addressed in this study were: 1) what is the ranked importance of the design variables, and what are the underlying variable clusters, affecting the user’s judgment of Web page aesthetics; and 2) what are the tradeoffs between Web page font size and Web page graphic size? A dual-process engineering aesthetics research and evaluation methodology (Liu, 2001, 2003) was used to investigate these issues scientifically and quantitatively.

This dual-process methodology uses two parallel but closely related types of research methods aimed at achieving a comprehensive, rigorous, and quantitative understanding of aesthetic response, in this case with respect to Web page design. The two types of research methods, which are carried out simultaneously, are multidimensional construct analysis and psychophysical analysis. Multidimensional construct analysis is a global top-down analysis that quantitatively addresses the conceptual and mathematical structure of the constructs involved in aesthetic judgment, the definition and measurement of the major psychological and physical dimensions involved, the relative importance of and relationships among these dimensions, and the development of a multidimensional evaluation scale that measures the aesthetic construct with sound validity and reliability.
In this study, multidimensional construct analysis is used to identify variables of importance to Web page design through content analysis and structured interviews, to rank those variables according to user preference by surveying participants, and then to cluster and factor the ranked variables with multivariate statistical data reduction, based on the relationships in user perception captured by the survey’s profile data. Psychophysical analysis is a local bottom-up analysis that establishes a quantitative view of how user preference changes as a function of specific aesthetic variables identified in the multidimensional construct analysis. Of particular interest are users’ ability to perceive and judge values, changes, and variations in design parameters, and their corresponding preferences among the levels of the aesthetic variables. In this study, a psychophysical experiment was conducted to quantitatively investigate user preference, ease of interaction, and performance tradeoffs between two variables drawn from two separate clusters identified in the multidimensional construct analysis, and to establish a quantitative relationship between these variables.

2. MULTIDIMENSIONAL CONSTRUCT ANALYSIS

2.1 Method

Texts were selected that addressed aesthetics, usability, and/or design guidelines for Web page design (Nielsen, 2003; De Graaff, 2003; Gibbs and Szentivanyi, 2003; Hom, 2003; Ericsson, 2003; Perlman, 2003; Marion, 2003; Burstein, 2003; and Instone, 2003). A content analysis was performed on these texts to obtain relevant variables affecting Web page aesthetics, Web page usability, and Web page design guidelines by extracting all meaningful words. Meaningful words are all words other than function words such as articles, conjunctions, prepositions, and transition words (e.g., a, and, then, with, or to). A parallel structured interview process was conducted with twenty undergraduate engineering students.
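The word-extraction step of the content analysis can be illustrated with a small stop-word filter and frequency tally. This is only a sketch under stated assumptions: the stop list below is hypothetical (the paper names only a few example words), and the paper does not describe how multiword variables such as "download time" were assembled from the extracted words, so that step is not shown. The `merge_sources` helper is likewise hypothetical and simply adds frequencies from the literature and interview sources.

```python
import re
from collections import Counter

# Hypothetical stop list. The paper gives only "a, and, then, with, to" as
# examples of words that were not treated as meaningful.
STOP_WORDS = {"a", "an", "the", "and", "or", "then", "with", "to", "of",
              "in", "on", "for", "is", "are", "be", "it", "that"}


def meaningful_word_counts(texts):
    """Tally words across texts, skipping stop words (function words)."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep hyphenated compounds such as "pop-up" intact.
        for word in re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts


def merge_sources(*source_counts):
    """Combine frequency tallies from several sources (e.g., texts and interviews)."""
    total = Counter()
    for counts in source_counts:
        total.update(counts)  # Counter.update adds counts rather than replacing them
    return total
```

Ranking `merge_sources(...)` with `Counter.most_common()` gives a frequency-ordered candidate list; the authors’ final cut to fifty-seven variables also weighed survey-length limits, which no automatic tally captures.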
Participants ranged in age from 18 to 22 years, each had at least five years of Internet experience, and each logged onto the Internet at least once daily. Each participant was instructed to list variables they thought were important in affecting Web page usability while browsing Web pages at their leisure. Participants were encouraged to list as many items as they could think of and not to limit their thought and creativity. The results of the content analysis and the structured interviews were combined to obtain a list of variables with their frequencies of appearance. Fifty-seven variables were ultimately extracted based upon a tradeoff among the total number of variables collected, the frequency of each variable, and the limitations of the survey design, including potential participant fatigue due to the length of the survey.

A survey was conducted to rank the fifty-seven variables that resulted from the content analysis and structured interviews. The survey used a Balanced Incomplete Block (BIB) design (Dunn-Rankin, 1983), in which a large set of ranking items is broken up into smaller groups (blocks). This design reduces the participant’s cognitive load, since only 8 variables are ranked at a time rather than the full list of fifty-seven. Every participant (n=20) ranked a total of fifty-seven blocks of 8 variables each; with these parameters, each variable appears in eight blocks and each pair of variables appears together in exactly one block, so every variable is compared once with each of the other fifty-six.
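A BIB design with v = b = 57 items and blocks, and k = 8 items per block, forces r = bk/v = 8 replications and λ = r(k − 1)/(v − 1) = 1, which is exactly the projective plane of order 7 (57 = 7² + 7 + 1). The paper does not say how its blocks were generated; the sketch below shows one standard construction with these parameters, plus a checker for the balance properties. The mapping of variable names to indices 0 to 56, and any randomization of block or within-block order, are assumptions left to the reader.

```python
from collections import Counter
from itertools import combinations, product


def projective_plane_blocks(q):
    """Blocks of PG(2, q) for prime q: v = b = q*q + q + 1, k = q + 1, lambda = 1."""
    # Points: one representative per 1-D subspace of GF(q)^3, chosen so that
    # its first nonzero coordinate equals 1.
    points = [p for p in product(range(q), repeat=3)
              if any(p) and next(x for x in p if x != 0) == 1]
    # By duality, lines are indexed by the same normalized vectors; a point
    # lies on a line when their dot product is 0 modulo q.
    return [
        [i for i, p in enumerate(points)
         if sum(a * b for a, b in zip(p, line)) % q == 0]
        for line in points
    ]


def summarize(blocks):
    """Return v, b and the sets of observed block sizes (k), replications (r), pair counts (lambda)."""
    replication = Counter(x for blk in blocks for x in blk)
    together = Counter(pair for blk in blocks
                       for pair in combinations(sorted(blk), 2))
    v = len(replication)
    return {
        "v": v,
        "b": len(blocks),
        "k": {len(blk) for blk in blocks},
        "r": set(replication.values()),
        "lambda": set(together.values()),
        # Pairs that never share a block; must be 0 for a balanced design.
        "pairs_missing": v * (v - 1) // 2 - len(together),
    }


if __name__ == "__main__":
    print(summarize(projective_plane_blocks(7)))
```

Each of the 57 blocks would then be presented as one ranking task, with the block’s indices replaced by (hypothetically shuffled) variable names.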
[Figure 1 (dendrogram). Cluster labels, top to bottom: Page Progression/Targeting Strategy; Image and Text Balance; Navigation; Information Value; Relevance/Speed; Trust (Security); Platform Independence; Marketing; Appeal/Diversion; Accessibility and Multimedia. Variables in dendrogram order, with overall BIB rank in parentheses (1 = most important): Frames (40), Non-Frames (38), Opening of New Browser Window (42), Visual Design Cues (34), Visual Groups (6), Coordinated Audio and Video (53), Pictures Instead of Description (36), Simple Images (32), Graphic File Size (22), Font Size (26), Graphics for Graphics and Text for Text (48), Position in the Screen (28), Clear Exits (24), Wait-Time Feedback (14), Printable Contents (29), Navigation Support (25), Back Button (30), Grouping and Subheadings (19), Simple Headlines and/or Titles (27), Innovative (23), Provide Search (18), Length of an Article (20), Interactive (17), Minimized Scrolling (21), Simple Uniform Resource Identifiers (URIs) (15), Accurate Plain-Language Error Messages (39), Server Response Times (2), Time to Load (3), Download Time (4), Speed (5), Timely Information (12), Updated Regularly (16), Information Layout (1), Location of Information (9), Credible and Original Information (8), Privacy (10), Security (7), Browser Independent (13), System Independent (11), Advertisements (57), Banners (52), Sudden Pop-Up Windows (55), Animations (46), Graphics (33), Background Images (35), Entertainment (43), Drop Down Menus (37), Free Service (31), Songs (56), Movies (54), Games (41), Icons (51), Logo (45), 3-D Images (50), Multiple Colors (49), Standard Colors for Links (44), Accessible for Users with Disabilities (47).]

Figure 1. Ten Clusters Consisting of 57 Ranked Variables and Corresponding Dendrogram

2.2 Results

An overall rank of the fifty-seven variables for each participant, as well as an overall rank across all participants’ responses, was obtained from the BIB survey profile data. Multivariate statistical data reduction, using K-means (nonhierarchical) and hierarchical cluster analyses as well as a traditional factor analysis and a factor analysis with Varimax rotation, was also carried out on the BIB survey profile data. The cluster analyses identify clusters, or groups, of similar variables according to the underlying user perception of the variables. The factor analyses describe the relationships among the observed variables in terms of a few underlying, unobservable constructs called factors. Figure 1 shows the fifty-seven variables that were identified, ranked, and clustered.

BIB ranking yielded the five most important variables affecting Web page design (most important first): information layout, server response time, time to load, download time, and speed. The five least preferred variables (least preferred first) were: advertisements, songs, sudden pop-up windows, movies, and coordinated audio and video. Hierarchical cluster analysis of the BIB ranking results produced the dendrogram, which illustrates the underlying relationships among the variables by grouping similarly quantified entities at successive stages of relationship formation. Ten clusters were identified based upon these relationships: page progression/targeting strategy, image and text balance, navigation, information value, relevance/speed, trust (security), platform independence, marketing, appeal/diversion, and accessibility and multimedia. Factor analysis yielded results similar to those of the hierarchical cluster analysis.
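The hierarchical clustering step can be sketched as agglomerative clustering of the variables’ rank profiles (one rank per participant). The paper does not state its distance measure or linkage rule, so Euclidean distance with average linkage is an assumption here, as is cutting the tree at a fixed number of clusters (ten in the study). K-means and the factor analyses are not shown.

```python
import math
from itertools import combinations


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with average linkage on Euclidean distance.

    profiles: dict mapping variable name -> list of ranks, one per participant.
    Returns a list of clusters, each a list of variable names.
    """
    names = list(profiles)
    # Precompute pairwise distances between variable rank profiles.
    dist = {frozenset((a, b)): math.dist(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Average of all between-cluster pairwise distances.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    # Repeatedly merge the two closest clusters until n_clusters remain.
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters
```

With the 57 variables and twenty participants’ BIB ranks as `profiles`, `average_linkage(profiles, 10)` would return ten groups; recording the merge order instead of discarding it is what a dendrogram plots.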
The similarity between the factor analysis and hierarchical cluster analysis results indicates consistency among the analyses and supports the validity of the method.

3. PSYCHOPHYSICAL EXPERIMENT

3.1 Method

Two clusters of variables identified in the multidimensional construct analysis were selected for the psychophysical experiment: 1) appeal/diversion; and 2) image and text balance. One variable from each cluster was selected for this illustrative psychophysical experiment: 1) graphics (from appeal/diversion); and 2) font size (from image and text balance). This phase of the study quantitatively examined how user preference, ease of interaction, and interaction speed vary as a function of Web page graphic size and Web page font size.

Twenty participants aged 22.3 to 29.3 years (mean 24.7 years, standard deviation 1.9 years) took part in the experiment. All participants had normal or corrected-to-normal vision and normal color vision. Each participant accessed the Internet for at least two hours a day on average and had at least four years of prior Internet experience. Participants were compensated $10.00 for approximately one hour of their time.

Each participant completed twenty experimental trials, viewing twenty Web pages designed after The New York Times on the Web (2003). The participant was instructed to read the article text displayed on the Web page and then to click a link at the bottom of the page when finished. At the conclusion of every trial, participants answered a four-choice multiple-choice comprehension question about the article and then rated their preference for the Web page and the ease of their interaction with it on a scale from 0 (low) to 10 (high). The total time the participant spent viewing each Web page was also recorded. Across the twenty Web pages each participant viewed, the article content, the graphic size, and the font size varied.
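Twenty articles crossed with two graphic sizes and ten font sizes give 400 article/graphic/font combinations, and twenty participants each completing twenty trials use each combination exactly once. The paper does not describe how conditions were assigned; a cyclic Latin square is one simple scheme that meets these constraints, sketched below. Randomizing the presentation order of each participant’s trials is also an assumption, not something the paper reports.

```python
GRAPHIC_SIZES = ["small", "large"]
FONT_SIZES_PT = [7.5, 8.5, 9, 9.5, 10, 10.5, 11, 12, 13, 14]
N_PARTICIPANTS = 20
N_ARTICLES = 20


def assign_conditions():
    """Cyclic Latin-square assignment (an assumed scheme, not the authors' documented one).

    Returns schedule[p] = list of (article, graphic_size, font_size) for participant p.
    The scheme needs N_PARTICIPANTS == N_ARTICLES == number of graphic/font cells (20).
    """
    n_cells = len(GRAPHIC_SIZES) * len(FONT_SIZES_PT)
    assert n_cells == N_ARTICLES == N_PARTICIPANTS
    schedule = []
    for p in range(N_PARTICIPANTS):
        trials = []
        for article in range(N_ARTICLES):
            # Shift the cell index by participant so each article rotates
            # through all 20 graphic/font cells across participants.
            cell = (article + p) % n_cells
            graphic = GRAPHIC_SIZES[cell // len(FONT_SIZES_PT)]
            font = FONT_SIZES_PT[cell % len(FONT_SIZES_PT)]
            trials.append((article, graphic, font))
        schedule.append(trials)
    return schedule
```

Under this scheme every participant sees every graphic/font cell once (hence each graphic size ten times and each font size twice), which also yields one observation per cell per participant for a fully within-subject analysis.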
A total of twenty news articles were selected as experimental stimuli and shortened to between 180 and 190 words. Two graphics corresponded to each article: a small graphic ranging from 124 to 300 pixels high by 175 to 200 pixels wide, and a large graphic ranging from 224 to 382 pixels high by 552 pixels wide. The two graphic sizes corresponded to the small-picture/large-picture format of The New York Times on the Web (2003). Ten font sizes were possible for each article: 7.5, 8.5, 9, 9.5, 10, 10.5, 11, 12, 13, or 14 point. Twenty articles, two graphic sizes, and ten font sizes combined to create four hundred unique Web pages. No article/graphic size/font size combination was repeated either within or between participants. Each participant viewed each article once, each graphic size ten times, and each font size twice. Each Web page was presented on a 17-inch CRT display with a 60 Hz refresh rate and 1280x1024 pixel resolution. Participants used a mouse with a scroll wheel as the input device.

[Figure 2: interaction time (seconds) and preference and ease-of-interaction ratings (0 = low to 10 = high) by graphic size (large, small).]
Figure 2. Average Interaction Time, Preference, and Ease of Interaction for Each Graphic Size Condition

[Figure 3: interaction time (seconds) and preference and ease-of-interaction ratings (0 = low to 10 = high) by font size (7.5 to 14 point).]
Figure 3. Average Interaction Time, Preference, and Ease of Interaction for Each Font Size Condition

3.2 Results

A 2 × 10 repeated measures analysis of variance (ANOVA) was performed on the 400 data points from the experiment (20 participants, 2 graphic sizes, 10 font sizes). Both factors were within-subject: Web page graphic size (small or large) and Web page font size (7.5, 8.5, 9, 9.5, 10, 10.5, 11, 12, 13, or 14 point).

Figure 2 displays the average interaction time, the average preference rating, and the average ease of interaction rating for each graphic size condition. The main effect of graphic size on interaction time was not significant (p=0.1771). The main effect of graphic size on user preference was significant (p=0.0027), as was its main effect on ease of interaction (p=0.0179).

Figure 3 displays the average interaction time, the average preference rating, and the average ease of interaction rating for each font size condition. The main effect of font size on interaction time was not significant (p=0.4913). The main effect of font size on user preference was significant (p<0.0001), as was its main effect on ease of interaction (p<0.0001).

4. DISCUSSION AND CONCLUSIONS

This study applied a dual-process engineering aesthetics research and evaluation methodology to Web page design evaluation. The multidimensional construct analysis (top-down) approach yielded 57 ranked variables affecting user judgments of a Web page and identified ten clusters grouping the 57 variables that reflect the underlying mental structure of user preferences. The ten clusters summarizing user judgment of a Web page are: page progression/targeting strategy, image and text balance, navigation, information value, relevance/speed, trust (security), platform independence, marketing, appeal/diversion, and accessibility and multimedia.
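For readers who want to reproduce the main-effect tests of the 2 × 10 repeated measures ANOVA in Section 3.2, the sketch below computes the F statistic for one within-subject factor, using the factor-by-participant interaction as the error term. It assumes one observation per participant per graphic/font cell, consistent with 400 data points from 20 × 2 × 10. The paper does not say whether a sphericity correction (e.g., Greenhouse-Geisser) was applied; none is applied here, and p-values are not computed.

```python
from statistics import mean


def rm_anova_main_effect_F(y, factor):
    """F statistic for one main effect in a two-way fully within-subject ANOVA.

    y[s][a][b] holds one observation for subject s at level a of factor A and
    level b of factor B. factor=0 tests A, factor=1 tests B. The error term is
    the factor-by-subject interaction; no sphericity correction is applied.
    """
    if factor == 1:  # swap the two factor axes so the tested factor comes first
        y = [[list(col) for col in zip(*subj)] for subj in y]
    n, a, b = len(y), len(y[0]), len(y[0][0])
    grand = mean(v for subj in y for row in subj for v in row)
    subj_mean = [mean(v for row in subj for v in row) for subj in y]
    level_mean = [mean(v for subj in y for v in subj[i]) for i in range(a)]
    cell_mean = [[mean(subj[i]) for i in range(a)] for subj in y]  # subject x level
    ss_effect = n * b * sum((m - grand) ** 2 for m in level_mean)
    ss_error = b * sum((cell_mean[s][i] - subj_mean[s] - level_mean[i] + grand) ** 2
                       for s in range(n) for i in range(a))
    df_effect, df_error = a - 1, (a - 1) * (n - 1)
    return (ss_effect / df_effect) / (ss_error / df_error)
```

With 20 participants, the graphic-size test has (1, 19) degrees of freedom and the font-size test (9, 171); a p-value follows from the upper tail of the F distribution, for example `scipy.stats.f.sf(F, 9, 171)`. For a two-level factor such as graphic size, this F equals the square of a paired t statistic on participants’ mean differences, which is a convenient cross-check.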
The 57 variables (Figure 1) and ten clusters identified using the dual-process engineering aesthetics research and evaluation methodology are relatively consistent with previous studies that were based upon intuition, case studies, and literature review alone. The ranked variables and the ten descriptive clusters give Web page designers quantitative, analytical insight into the underlying motivations and perceptions of Web page users, as opposed to an intuitively generated priority list or a literature review based upon case studies or qualitative surveys.

The psychophysical analysis (bottom-up) approach illustrated the ability to quantify relationships between variables that summarize user judgment of a Web page. Results showed that user preference and ease of interaction increased as font size increased and graphic size decreased; however, there was no difference in interaction speed among Web pages with varying font size or graphic size. This finding suggests to Web page designers that performance may not be the best indicator of Web page preference or ease of use. This psychophysical analysis also provides a starting point for further investigation of relationships among the variables underlying the clusters identified in the multidimensional construct analysis. Future studies may investigate issues such as user age, the task of Web page browsing (general-purpose browsing or a directed fact-finding search), the content of Web pages (user interest or choice of the researcher) (Marchionini and Shneiderman, 1988), and methods of achieving a high level of aesthetics without significantly degrading loading speed.

5. REFERENCES

1. Burstein, C.D. (2003). Viewable with any browser: Campaign. http://www.anybrowser.org/campaign/
2. Cox, J. and Dale, B.G. (2002). Key quality factors in Web site design and use: An examination. International Journal of Quality and Reliability Management, 19(7): 862-888.
3. De Graaff, H. (2003). HCI index. http://degraaff.org/hci/
4. Dunn-Rankin, P. (1983). Scaling Methods. Lawrence Erlbaum Associates, Hillsdale, New Jersey.
5. Ericsson, M. (2003). HCI resources: Guidelines, styleguides, standards. http://www.ida.liu.se/~miker/hci/guidelines/
6. Gibbs, S. and Szentivanyi, G. (2003). Index to multimedia information sources. http://viswiz.gmd.de/MultimediaInfo/
7. Hom, J. (2003). The usability methods toolbox. http://www.best.com/~jthom/usability/usable.htm
8. Instone, K. (2003). Usable web. http://www.usableweb.com/
9. Liu, Y. (2001). Engineering aesthetics and aesthetic ergonomics: A dual-process methodology and its applications. Proceedings of the International Conference on Affective Human Factors Design, pp. 248-255.
10. Liu, Y. (2003). Engineering aesthetics and aesthetic ergonomics: A dual-process methodology and its applications. Ergonomics (in press).
11. Marchionini, G. and Shneiderman, B. (1988). Finding facts vs. browsing knowledge in hypertext systems. IEEE Computer, 21(3): 70-79.
12. Marion, C. (2003). Software design smorgasbord. http://www.chesco.com/~cmarion/
13. The New York Times on the Web. (2003). http://nytimes.com/
14. Nielsen, J. (2003). Jakob Nielsen on Usability and Web Design. http://www.useit.com/
15. Nielsen//NetRatings. (2003). Nielsen//NetRatings: The global standard for digital media measurement and analysis. http://www.nielsennetratings.com/news.jsp?section=dat_gi
16. Nua. (2003). Online Internet surveys, demographics, statistics and market research. http://www.nua.ie/surveys/index.cgi
17. Oliver, K. (2003). Web Interface Design. http://www.edtech.vt.edu/edtech/id/interface/index.html
18. Perlman, G. (2003). HCI Sites. http://www.hcibib.org/hci-sites/
19. Turner, S. (2002). The HEP test for grading Web site usability. Computers in Libraries, 22(10): 37-39.