Instructor: Brian Davison
Presented by: Hua Jiang, Titima Boondarig

Content covers:
- Web workload model
- Examples
- Hot topics
- Problem-based approach

"A workload consists of the set of all inputs a system receives over a period of time." (from the textbook)

A synthetic model
- Derives from an explicit mathematical model that can be inspected, analyzed, and criticized

Advantage
- Modeling the real world
- Controlled manner

Disadvantage
- Difficult to characterize
- Measurement differs
- Inter-affected
- Time-span effort

"To show something, to test something, to verify that the design meets the requirements, to document the design!"
Dr. John Hines, Air Force Research Laboratory

- Identifying parameters
- Analyzing measurement data
- Validating the model

Statistics
- Mean
- Median
- Variance

Probability distribution
- Complementary cumulative distribution function: $F(x) = P(X > x)$
- Exponential distribution (density): $f(x) = \lambda e^{-\lambda x}$
- Pareto: $F(x) = (k/x)^{a}$, $x \ge k$
- Zipf-like: $P(r) = k r^{-c}$

- HTTP message characteristics
- Resources characteristics
- User behavior

HTTP message characteristics

Request method
- GET: web page retrieval (majority)
- POST: web forms (small fraction)
- PUT: file upload
- DELETE: file deletion
- OPTIONS: capabilities
- HEAD: status

Response code
- 1xx: informational
- 2xx: success code (75% to 90%)
- 3xx: redirection (10% to 30%, 304)
- 4xx: client error
- 5xx: server error

Resources characteristics
- Content types
- Resource sizes: Pareto (tail), Lognormal (body)
- Response sizes: Pareto (tail), Lognormal (body)
- Resource popularity: Zipf-like
- Resource changes
- Temporal locality: Lognormal
- Number of embedded resources: Pareto

Arun K. Iyengar, Mark S. Squillante, Li Zhang, "Analysis and Characterization of Large-Scale Web Server Access Patterns and Performance," World Wide Web, vol. 2, Baltzer, 1999.

User behavior
- Session and request arrivals: Exponential
- Clicks per session: Pareto
- Request inter-arrival time: Pareto (tail)

Source: Iyengar, Squillante, and Zhang, 1999.
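The distributions named on these slides can be tried out in a few lines. The following is a minimal sketch, not the presenters' code: the function names and parameter values are assumptions made for illustration. It evaluates the complementary cumulative distribution functions for the exponential and Pareto cases, builds a normalized Zipf-like popularity table, and draws samples by inverting the tail function.

```python
import math
import random


def exponential_ccdf(x, lam):
    """P(X > x) for an exponential distribution with rate lam."""
    return math.exp(-lam * x) if x >= 0 else 1.0


def pareto_ccdf(x, k, a):
    """P(X > x) = (k / x) ** a for x >= k, and 1 below the minimum k."""
    return (k / x) ** a if x >= k else 1.0


def zipf_probabilities(n, c):
    """Zipf-like popularity P(r) proportional to r ** -c for ranks 1..n.

    The constant k is chosen so that the n probabilities sum to 1.
    """
    weights = [r ** -c for r in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_exponential(rng, lam):
    """Inverse-transform sample: solve exp(-lam * x) = u for x."""
    u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is defined
    return -math.log(u) / lam


def sample_pareto(rng, k, a):
    """Inverse-transform sample: solve (k / x) ** a = u for x."""
    u = 1.0 - rng.random()  # u lies in (0, 1]
    return k / u ** (1.0 / a)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    print("Pareto tail at 10 * k:", pareto_ccdf(10.0, 1.0, 1.2))
    print("Exponential tail at x = 10:", exponential_ccdf(10.0, 1.0))
    print("Top three Zipf-like ranks:", zipf_probabilities(100, 1.0)[:3])
    print("One Pareto sample:", sample_pareto(rng, 1.0, 1.2))
```

Comparing the two tail values shows why the Pareto distribution is used for the tail of sizes: its tail probability decays polynomially in x, while the exponential tail decays much faster.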
- Combining workload parameters
- Validating the workload model
- Generating synthetic traffic

- Log & privacy policies
- New technical developments
- Application of user-level data

Information available to software components
- Browser
- Proxy
- Server
WHY?

Access to user-level data
- HTTP header
- Configuring the browser
- Directing requests to an anonymizing proxy
- Using SSL and HTTPS

- Identify performance
- Benchmarking web components
- Capacity planning

ApacheBench
- Smaller and simpler
- Reports how fast it can retrieve content

httperf
- Flexible tool
- Fine-grained
- Similar methodology to SPECWeb99

WebStone
- Original and still popular web server benchmark
- Free benchmarking system

SPECWeb99
- A commercial-grade benchmark system
- $200-$800

SPECWeb96
- First World Wide Web server benchmark
- Standardized workload, agreed to by major players in the WWW market
- Retired on April 24, 2000

K. Kant and Y. Won, "Server Capacity Planning for Web Traffic Workload," IEEE Trans. on Knowledge and Data Engineering, Oct. 1999, pp. 731-747.

- Server application changes
- Client sophistication
- Large and distributed servers
- Social factors
- Others

- Overload control
- Dynamic content
- Locality
- Self-similarity: the property of an object whose appearance is unchanged regardless of the scale at which it is viewed

Web workload
- Modeling approach
- Characteristics
- Applications
- Future trends
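As a rough illustration of combining workload parameters to generate synthetic traffic, the sketch below builds a small request trace. It is an assumption-laden example, not a tool from the slides: the function name `generate_trace`, the tuple layout, and all default parameter values are hypothetical. It follows the distributions listed earlier: exponential session arrivals, a Pareto number of clicks per session, Pareto gaps between requests, and Zipf-like resource popularity.

```python
import bisect
import itertools
import random


def generate_trace(rng, n_sessions, n_resources,
                   session_rate=1.0, clicks_shape=1.5,
                   gap_shape=1.5, zipf_c=1.0):
    """Return a time-sorted list of (time, session_id, resource_rank).

    All parameter names and defaults are illustrative assumptions.
    """
    # Cumulative Zipf-like weights, used to pick a rank by binary search.
    weights = [r ** -zipf_c for r in range(1, n_resources + 1)]
    cumulative = list(itertools.accumulate(weights))
    total = cumulative[-1]

    trace = []
    session_start = 0.0
    for session_id in range(n_sessions):
        # Exponential gaps between session arrivals.
        session_start += rng.expovariate(session_rate)
        # Pareto clicks per session; paretovariate returns values >= 1.
        clicks = int(rng.paretovariate(clicks_shape))
        t = session_start
        for _ in range(clicks):
            # Zipf-like popularity: rank 1 is the most popular resource.
            pick = rng.random() * total
            rank = bisect.bisect_left(cumulative, pick) + 1
            trace.append((t, session_id, rank))
            # Heavy-tailed gap before the next request in this session.
            t += rng.paretovariate(gap_shape)
    trace.sort()
    return trace


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed so runs are repeatable
    for record in generate_trace(rng, n_sessions=5, n_resources=20):
        print(record)
```

A trace like this could be replayed against a server under test, and its statistics compared with measured logs as one way of validating the workload model.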