Optimizing Web Pages
Hyun Joo Lee
The University of Texas at Austin
Graduate School of Information
LIS 385 T: Information Architecture and Design
Prof. Don Turnbull, Spring 2003
February 20, 2003

Introduction

Making Web pages user-friendly

To make Web sites user-friendly, the first thing to consider is optimizing the Web pages. In his book Designing Web Usability: The Practice of Simplicity (2000), Jakob Nielsen says, “fast response times are the most important design criterion for web pages.” The increasing size of digital media and limited bandwidth make it difficult for users to access Web pages. The size of a Web page is determined by its HTML file and everything the page includes: graphics, background images, JavaScript files (.js), external style sheets (.css), and multimedia files such as sound, video, and Flash. By minimizing the amount of data that travels over the connection, Web pages can be kept simple and load faster.

Optimizing techniques make Web pages download faster and increase acceptability. According to Andrew B. King (2003, p. 5), it is important to make Web sites that people actually use, and speed is a key component of usability, which helps determine a site’s acceptability. Newsbytes News Network reported that 50 percent of online transactions are aborted before completion (May 11, 2001). The primary reason is poor Web performance. Users do not wait long, and most users access the Internet at 56 Kbps or less. Human factors research has shown that “users will wait at most 8 to 10 seconds for a Web page to display” (Newsbytes, 2001). According to Table 1, 2 KB takes almost 1 second to download at 56.6 Kbps, and eight to ten seconds of downloading allows a total page size of about 30-40 KB. Nowadays the thresholds are becoming shorter. Web pages that exceed this limit could lose intended users who do not wait for the pages to download.

Table 1.
Maximum Page Size for Various Connection Speeds and Attention Thresholds (King, 2003, p. 20)

                     Bandwidth
Attention Threshold  56.6 Kbps  ISDN    T1
1 second             2 KB       8 KB    100 KB
2 seconds            4 KB       16 KB   200 KB
10 seconds           34 KB      150 KB  2 MB

History

Since the birth of the Web, the online environment has been studied broadly. Network latency, the delay between requesting a resource and receiving it, is not always the same. “The more resources a page has (graphics, multimedia), the less predictable the response rate” (King, 2003, p. 8). HCI (human-computer interaction) researchers studied the effects of fixed response times on user satisfaction and simulated variable response rates for more real-world results. In the late 1990s and early 2000s, researchers began to look at “Shackel’s likability” dimension (Appendix A) by studying “the effects of download delays on user perceptions of web sites, flow states and emotional appeal” (King, p. 8).

There are several ways to optimize Web pages, such as optimization through coding (HTML, DHTML) techniques, simple graphical design, and use of the cache. Search engine optimization and Web server (e.g. HTTP) optimization are usually handled outside of IA, so this research paper mainly deals with optimization through markup coding and graphic design techniques.

Speeding Up

HTML Optimization

HTML optimization is “the process of minimizing HTML file size to maximize page display speed” (Jupitermedia, 2001). Typical Web pages contain extra characters (comments, spaces, returns, and redundant attributes) that can safely be removed with no change in appearance. There are several techniques for optimizing an HTML file, such as limiting line length to 255 characters, removing whitespace, omitting redundant tags and attributes, and minimizing alt values. Most of these techniques make little difference individually, but they do affect how Web pages are developed. Examples of HTML optimizations are given below.
HEAD minimization

Minimizing the size of the HEAD shortens the initial display time, because browsers interact with servers in discrete-sized packets and the HEAD is parsed before the rest of the page is rendered (Jupitermedia, 2001).

Tables

Display time can be reduced by simplifying complex tables: breaking them up into separate tables (for example, a simple, fast-loading table at the top of the page) and using colored table cells instead of graphics.

Height/Width

The height and width attributes tell the browser the size of a graphic. When they are present, the browser does not have to work out the image size, because the values are already given. When they are missing, the browser cannot reserve space for the image until it has loaded the image and calculated its size.

In addition, HTML editors such as FrontPage pollute the HTML page with a lot of extra and unwanted tags. Removing these redundant tags in a text editor like Notepad shortens the loading time. There are several applications that can help with this.

XHTML (Extensible HTML)

The “X” signifies that HTML has joined a large family of extensible, interoperable, and self-describing markup languages; XHTML acts as a bridge between HTML and XML (King, p. 99). XHTML structures content logically, works well with browsers, and can separate presentation into a style sheet. Properly designed XHTML documents are typically smaller and less complex. Also, using XHTML “saves on bandwidth and maintenance costs”, while it increases “accessibility and interoperability on alternate platforms” (King, p. 100).

DHTML Optimization

DHTML is basically a combination of Cascading Style Sheets (CSS) and JavaScript used to make Web pages more interactive (TheWebsEye, 2002). DHTML files are comparatively small because they are plain text, and they render faster than alternatives such as Flash and Java embedded in HTML.
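As a rough illustration of the HTML optimization techniques described above (removing comments and redundant whitespace), the following Python sketch shows the idea. The function name minify_html and the sample markup are hypothetical, and the approach is deliberately naive: it would also collapse whitespace inside pre elements or inline scripts, and removing the space between inline tags can change rendering, so it should be read as a sketch under those assumptions rather than a finished tool.

```python
import re


def minify_html(html: str) -> str:
    """Naive HTML size reduction (hypothetical helper, not a full minifier)."""
    # Remove HTML comments, including ones that span several lines.
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # Collapse every run of spaces, tabs, and returns into a single space.
    html = re.sub(r"\s+", " ", html)
    # Drop the single space left between two adjacent tags.
    html = re.sub(r">\s+<", "><", html)
    return html.strip()


# Hypothetical sample page with a comment and indentation.
sample = """<html>
  <!-- navigation table -->
  <body>
    <p>Hello,   world</p>
  </body>
</html>"""

small = minify_html(sample)
print(len(sample), "->", len(small), "characters")
```

The visible text is unchanged, while the comment and indentation no longer travel over the connection.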
Cascading Style Sheets (CSS)

Cascading Style Sheets (CSS) is a simple mechanism for adding style (e.g. fonts, colors, spacing) to Web documents (W3C, 2003). Cascading Style Sheets are one of the key components of DHTML and a good way to minimize markup file size and maximize speed. CSS offers more flexibility than HTML and was first supported in IE 3.0. Converting old-style table/font layouts into CSS can reduce file sizes by 25 to 50 percent and gives the benefit of an adaptable structure. For example, by using CSS instead of the font tag to set font display values, we can add style very simply and consistently. The font tag will let you create some of the same display effects, but it has been deprecated. Using CSS makes documents standards-compliant, and it also provides display control through code that works with both HTML and XML.

JavaScript

JavaScript is ideally suited to data validation, interactive forms, and enhancing navigation. JavaScript also offers rich opportunities for file-size and execution-speed optimization (King, p. 193). According to King’s analysis (pp. 194-246), JavaScript is relatively slow compared to compiled languages, since it is interpreted and must be downloaded over a network connection. However, most scripts are so fast and small that users do not notice the speed degradation. By using techniques like packing, compression, and obfuscation, the size of JavaScript files can be reduced by 50 to 90 percent. Also, using JavaScript saves both extra clicks and a trip to the server.

Caching files (Re-using files)

The cache is a small temporary storage area on the hard drive where browsers keep files while they display them. Any file that is in the cache can be displayed repeatedly without being downloaded again and again (“Use Cache”, retrieved February 5, 2003). The idea of using graphics on more than one page can be extended beyond the background or menu graphics using the cache.
In addition to reusing whole graphics, it is possible to reuse components that are shared by different graphics on more than one page. For example, if we have a company logo or title graphic that serves as the header for sections of the company's Web site, it is common to change only the part of the graphic that reflects the particular section it is labeling. The unchanged part can then be kept as a separate graphic and reused from the cache.

Graphical design optimization

Nowadays, images are everywhere in Web pages, and they largely determine the download time. The size of an HTML page depends on the number and size of its images and multimedia objects, which make up over 50 percent of the average page. Optimizing Web graphics means balancing the quality of the graphics against the download time. Effective graphics, which load fast and do not dither, come from optimizing factors such as color depth, resolution, and compression. Graphics programs optimize the file size and quality of GIFs, JPEGs, and PNGs to varying degrees. Each format has its own strengths and weaknesses. By “eliminating and replacing images with text and CSS, combining neighboring graphics and reusing graphics with the same URLs” (King, p. 250), the download time of graphics can be decreased.

Color

Color depth determines the number of colors present in the image. It is an important factor when optimizing GIF images. As the number of colors increases, the image size increases as well, so there is always a trade-off between image quality and size. Storing images in indexed or RGB color also affects the file size. Indexed color maps each pixel to an entry in a smaller color palette. RGB formats, known as true color, use 8 bits for each of the red, green, and blue values to form a 24-bit pixel, which allows 16.7 million colors. The native format of most image editing programs is RGB color. The GIF format is an indexed color format. There is a common, optimized “Web-safe” color palette.
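The relationship between color depth, number of colors, and uncompressed image size described above can be worked through in a few lines of Python. This is a minimal sketch: the function names and the 100 x 100 pixel example image are hypothetical, and the Web-safe palette is built on the usual assumption that it consists of six evenly spaced levels per channel (216 colors).

```python
def colors_for_depth(bits_per_pixel: int) -> int:
    """Number of distinct colors that a given color depth can represent."""
    return 2 ** bits_per_pixel


def raw_size_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed pixel data size, rounded up to whole bytes."""
    return (width * height * bits_per_pixel + 7) // 8


# Assumed Web-safe palette: levels 0, 51, 102, 153, 204, 255 per channel.
LEVELS = range(0, 256, 51)
WEB_SAFE = [f"#{r:02X}{g:02X}{b:02X}" for r in LEVELS for g in LEVELS for b in LEVELS]

print(colors_for_depth(8))            # indexed color, as in GIF
print(colors_for_depth(24))           # true color (RGB)
print(raw_size_bytes(100, 100, 8))    # hypothetical 100 x 100 indexed image
print(raw_size_bytes(100, 100, 24))   # the same image in 24-bit RGB
print(len(WEB_SAFE))
```

Before any compression is applied, the same image needs three times as much pixel data in 24-bit RGB as in 8-bit indexed color, which is the trade-off between quality and size noted above.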
Resolution

Web pages are viewed on computer monitors, which typically display data at 72 ppi (pixels per inch). Therefore, saving files at a resolution of 72 ppi is a good way to minimize file size. Graphics files that were originally created for print media are typically saved at a much higher resolution (300 ppi) and should be reduced before use on the Web (“Use Screen Resolution”, retrieved February 5, 2003).

Compression

Compression methods can be divided into two kinds: lossy and lossless (WebReference, 2000). Lossy compression creates smaller files by discarding some information about the original image (WebReference, 1996). It removes details and color changes it deems too small for the human eye to differentiate. Lossless compression, on the other hand, never discards any information about the original file.

There is always a trade-off between reducing the file size and preserving graphical effectiveness. In other words, matching the proper file format to the right image saves valuable download time and makes the image look better. Too much compression negatively impacts image quality, while too little compression can slow your site to a painful crawl. The goal is to give readers something pleasing to see within the time they are willing to wait.

The three graphic file formats widely supported by browsers are GIF, JPEG, and PNG (Appendix B). PNG was designed for the Web and is supported by newer browsers. Each of these formats compresses graphic data using a different method, and each method works best with a certain type of graphical data: GIF or PNG for line art with blocks of solid color, and JPEG for photos or graphics with continuous tones (“Select the Best File Format”, retrieved February 5, 2003). As for future trends, JPEG 2000, which has functionality such as “multi-resolution, superior compression efficiency at low bit rates, lossy to lossless progression, and embedded bit stream architecture,” is on the horizon (WebReference, 1996).
Also, Exif, PNP, ART, Wavelet, Bravo, and FlashPix challenge the GIF and JPEG monopoly in image compression.

In addition, there are several other ways to optimize graphics. “Cropping”, which cuts off unwanted parts of a picture, is a good way to sharpen the focus of the graphic content and reduce file size. “Anti-aliasing” can smooth out a graphic’s edges, but it often adds to the size of the graphic file and to the page loading time. A “thumbnail” is a miniaturized version of a full-sized image. As a design element, it lets you show many images without the overhead of loading the full-sized versions. “Interlacing” is a technique for gradually loading graphic files. Interlaced GIFs display on the screen in a non-contiguous manner and create the effect of a low-resolution version of the graphic appearing very quickly. Interlacing can therefore give readers the illusion of higher bandwidth, because they have something to look at while the rest of the file is downloading.

For multimedia content, it is helpful to use smaller MIDI and audio (.wav) files if possible. When providing interactive Web pages that use multimedia files, it is important to consider not only creating the multimedia content but also delivering it efficiently, since multimedia files dominate traffic. According to Maureen Chesire et al., a small fraction (3 percent) of streaming media is responsible for nearly half of the traffic, although “most streaming-media files are low in bit-rate (less than 56Kbps), small in size (less than 1MB), and short in duration (under 10minutes)” (King, p. 283). For video, provide a link to the file and leave the choice of viewing it to the visitor rather than forcing it to load in the page. Also, when optimizing Flash content, several variables should be considered: “audio, bit rate, reuse symbols, keep animation to a minimum, and generate a filesize report to test” (King, p. 324).
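A file-size report of the kind mentioned above can be checked against the attention thresholds in Table 1. The Python sketch below is only an illustration: the page assets and their sizes are invented, the function names are hypothetical, and the 2 MB entry for T1 is taken as 2000 KB for simplicity.

```python
# Maximum page size in KB by connection and attention threshold (seconds),
# copied from Table 1 (King, 2003, p. 20). 2 MB is assumed to mean 2000 KB.
MAX_PAGE_KB = {
    "56.6 Kbps": {1: 2, 2: 4, 10: 34},
    "ISDN": {1: 8, 2: 16, 10: 150},
    "T1": {1: 100, 2: 200, 10: 2000},
}


def page_weight_kb(assets: dict) -> int:
    """Total size of the HTML file plus everything it includes, in KB."""
    return sum(assets.values())


def fits_threshold(assets: dict, connection: str, seconds: int) -> bool:
    """True if the whole page stays within the Table 1 limit."""
    return page_weight_kb(assets) <= MAX_PAGE_KB[connection][seconds]


# Hypothetical page: file names and sizes (KB) are invented for illustration.
page = {"index.html": 12, "style.css": 4, "logo.gif": 9, "photo.jpg": 22}

print(page_weight_kb(page))
for connection in MAX_PAGE_KB:
    print(connection, fits_threshold(page, connection, 10))
```

For this invented page, the report shows that the 10-second threshold is missed on a 56.6 Kbps connection but met on ISDN and T1, which suggests trimming the photo or the markup first.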
Optimizing and IA

Between “speeding up” the download time of Web pages and graphically enriched design, we cannot say which is more important. Optimization does not mean simply speeding up Web pages. A user-centered balance between speed and graphical design can make Web pages more effective and usable.

The information architecture of a Web site is more closely related to user-centered aspects and usability than to media-rich graphical design. IAs should be able to “comprehend the whole picture rather than a single portion of it” and “seek design solutions from an objective point of view within the context of people’s needs, the content, the brand and the technology” (WebReference). Therefore, optimizing Web pages by balancing speed against graphically enriched design from a user-centered perspective is closely related to the tasks that information architects perform.

In his book Designing Web Usability: The Practice of Simplicity, usability specialist Jakob Nielsen emphasizes the simplicity of Web sites that respond to users’ needs. The importance of optimizing Web pages is becoming more emphasized even though high-bandwidth Internet services are spreading, possibly because M-commerce and popular wireless handheld services rely on Web page optimizing techniques.

Conclusion

As technology advances, Web pages become more interactive and multimedia-based, because designers believe that users want more complex and active Web sites. Usability can be determined by “the users’ perception of the quality, which is based on the users’ ease of use, ease of learning and relearning, the intuitiveness for the users and the user’s appreciation of the usefulness of it” (Barnum, 2002, p. 6).
From the users’ perspective, the primary thing that IAs should consider when they plan, design, and construct Web pages is optimization that balances usability against media-rich design. By optimizing Web pages with several techniques, such as HTML coding and image file saving methods, IAs can enhance usability by speeding up the Web pages and, at the same time, pursue the maximum effectiveness of graphical images.

References

King, A. B. (2003). Speed up your site. Indianapolis, IN: New Riders.

Nielsen, J. (2000). Designing web usability: The practice of simplicity. Indianapolis, IN: New Riders.

Chesire, M., Wolman, A., Voelker, G. M., & Levy, H. M. (2001). Measurement and analysis of a streaming-media workload. Berkeley, CA: USENIX Association.

Barnum, C. M. (2002). Usability testing and research. New York: Pearson Education, Inc. ISBN 0-205-31519-4.

Newsbytes News Network. (2001, May 11). Newsbytes Internet Week In Review: Almost 50 percent of online purchases aborted. Retrieved February 10, 2003, from http://www.webreference.com/authoring/languages/html/optimize/

WebReference Update Newsletter. (2000, December 21). Information Architecture - A New Opportunity. Retrieved February 12, 2003, from http://www.webreference.com/authoring/design/information/ia/

WebReference. (1996, June 10). Compression. Retrieved February 4, 2003, from http://www.webreference.com/dev/graphics/compress.html

Jupitermedia Corporation. (2001, March 19). Extreme HTML Optimization. Retrieved February 12, 2003, from http://webreference.com/authoring/languages/html/optimize/3.html and http://www.webreference.com/authoring/languages/html/optimize/8.html

TheWebsEye. (2002). Using DHTML-Dynamic HTML. Retrieved February 13, 2003, from http://www.thewebseye.com/DHTML.htm

W3C. (2003, February 10). Cascading Style Sheets. Retrieved February 13, 2003, from http://www.w3.org/Style/CSS/

W3C. (2003, January 9). PNG (Portable Network Graphics).
Retrieved February 13, 2003, from http://www.w3.org/Graphics/PNG/

Webdevfp. (2003, February 4). Use Cache. Retrieved February 5, 2003, from http://webdevfp.uwyo.edu/webdesign/optimizing/design/cache9.html

Webdevfp. (2003, February 4). Select the Best File Format. Retrieved February 5, 2003, from http://webdevfp.uwyo.edu/webdesign/optimizing/graphics/fileformat5.html

Webdevfp. (2003, February 4). Use Screen Resolution. Retrieved February 5, 2003, from http://webdevfp.uwyo.edu/webdesign/optimizing/graphics/resolution4.html

Web site optimization, LLC. (2003, March 1). Shackel's Acceptability Paradigm. Retrieved March 4, 2003, from http://www.websiteoptimization.com/speed/1/1-1.html

Roelofs, G. (1999). The Story of PNG. Retrieved February 5, 2003, from http://www.libpng.org/pub/png/slashpng-1999.html

Appendix A

Figure. Shackel’s Acceptability Paradigm (quoted from Web site optimization, LLC). Figure labels: Utility, Cost, Speed, Usability, Likability.

According to Shackel’s “system acceptability” paradigm, when users decide whether to use a Web site, they weigh how useful it will be, its perceived ease of use, its suitability to the task, and how much it will cost them financially and socially (King, 2003, p. 6).

Utility: perceived usefulness
Usability: perceived ease of use
Likability: the user’s subjective attitude about using the system

Speed is an important part of all three dimensions; therefore “it is an important determinant of system acceptability and usage” (King, p. 7).

Appendix B

GIF

The Graphics Interchange Format (GIF) was developed by CompuServe in 1987 to “store multiple bitmap images into a single file for easy exchange over computer networks” (WebReference, 2000). GIF is the oldest graphic file format on the Web. It supports up to 8 bits per pixel, which means a maximum of 256 colors (2^8 = 256), as well as 4-pass interlacing and transparency, and it uses a lossless compression algorithm.
GIFs compress by removing horizontal redundancy. For smaller GIFs, try not to introduce extra vertical detail or noise into the image. Horizontally oriented bands of color compress better than vertically oriented bands (FIG 1). When text is used as a graphic element, it is a good idea to save it in GIF format unless continuous-tone effects have been added to the text.

JPEG

JPEG is designed for compressing either full-color or gray-scale images of natural, real-world scenes. JPEGs work well on continuous-tone images such as photographs, natural artwork, photos of people, and skin tones; they work less well on sharp-edged or flat-color art such as lettering, simple cartoons, and line drawings. JPEGs support 24 bits of color depth, or 16.7 million colors (2^24 = 16,777,216). Progressive JPEGs (p-JPEGs) are typically a couple of percent smaller than baseline JPEGs, but their main advantage is that they appear in stages, similar to interlaced GIFs. JPEG is a lossy compression algorithm. It works by converting the spatial image representation into a frequency map. The greater the compression, the greater the degree of information loss. Many commercial vendors of JPEG offer enhancements to speed, color quantization, and quality.

PNG (Portable Network Graphics)

According to W3C’s definition (W3C, 2003), PNG is “an extensible file format for the lossless, portable, well-compressed storage of raster images.” PNG provides a patent-free replacement for GIF. Indexed-color, grayscale, and true-color images are supported, plus an optional alpha channel for transparency. Netscape Navigator and Microsoft Internet Explorer pretty much define what counts as acceptable Web technology, and they only began supporting PNG natively in the autumn of 1997. In The Story of PNG, Greg Roelofs said in 1999, “As various Slashdotters have noted, neither one really supports PNG well yet, at least with respect to alpha transparency and gamma correction, but that's coming”.
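The GIF note earlier in this appendix, that horizontally oriented bands compress better than vertically oriented ones, can be illustrated with a small Python sketch. It is a deliberate simplification: GIF compresses with LZW, not run-length encoding, so counting runs of identical pixels along each scan line is used here only as a rough stand-in for horizontal redundancy. The 8 x 8 two-color images and the function name count_runs are hypothetical.

```python
from itertools import groupby


def count_runs(rows) -> int:
    """Count runs of identical pixels when the image is read row by row."""
    pixels = [pixel for row in rows for pixel in row]
    return sum(1 for _ in groupby(pixels))


SIZE = 8  # hypothetical 8 x 8 image with two palette entries, 0 and 1

# Horizontal bands: the top half is color 0, the bottom half is color 1.
horizontal = [[0 if y < SIZE // 2 else 1 for _ in range(SIZE)] for y in range(SIZE)]

# Vertical bands: the left half is color 0, the right half is color 1.
vertical = [[0 if x < SIZE // 2 else 1 for x in range(SIZE)] for _ in range(SIZE)]

print(count_runs(horizontal))  # few long runs along the scan lines
print(count_runs(vertical))    # the color changes within every row
```

Both images contain the same number of pixels of each color, but the scan-line order breaks the vertical bands into many short runs, leaving less redundancy along each row for the compressor to exploit.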