Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes
Mark E. Crovella and Azer Bestavros
Computer Science Dept, Boston University
Presented by Kalyan Boggavarapu, CSC 497, Lehigh University

Self-Similarity
    Def: a self-similar object is one whose appearance is unchanged regardless of the scale at which it is viewed.
    Heavy-tailed: a distribution whose tail follows a power law.
    E.g.: the geographical distribution of the people in the world.
    World Wide Web traffic can show self-similarity.

5/28/2016 Kalyan Boggavarapu CSC 497 Lehigh University

Data Set
    Traces from NCSA Mosaic, January and February 1995
    Logs: URL, session, user, and workstation ID
    Experiment environment: 37 SPARCstation 2 workstations

Self-Similarity Characteristics
    Parameter: degree of self-similarity, the Hurst parameter H
    H lies in the range (1/2, 1)
    The closer H is to 1, the higher the degree of self-similarity
    In this paper we would see

Analysis in two stages
    Stage 1: What is the appropriate value of H?
    Stage 2: Which traffic parameter accounts for this value of H?

Stage 1: Estimate the value of H
    Self-similarity for different time intervals
    Step 1: Estimate for short intervals (1 sec and above)
        Using: Web traffic data for a single hour
        Plots: variance-time plot, rescaled range plot, periodogram plot
    Step 2: Estimate for scaling to large intervals
        Whittle estimator

Self-Similarity Characteristics Graphs
    [Figure: variance-time, rescaled range, and periodogram plots; in each plot the fitted line (its slope) gives an estimate of H]

Whittle Estimator
    Estimates: H together with its confidence interval
    Based on: an assumed time-series model, FGN (Fractional Gaussian Noise)
    Now check: is the estimated H consistent as the time series is aggregated?
    Infer: WWW traffic at stub networks is self-similar when traffic demand is high.
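
The variance-time plot named in Stage 1 can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the standard aggregated-variance relation, in which the variance of the m-aggregated series decays like m^(-beta) and H = 1 - beta/2. The function names, the block sizes, and the synthetic input are all hypothetical. The input here is uncorrelated Gaussian noise rather than a real trace, so the estimate should land near H = 1/2, the no-self-similarity baseline.

```python
import math
import random
import statistics


def aggregate(series, m):
    """Average the series over non-overlapping blocks of size m."""
    n_blocks = len(series) // m
    return [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def hurst_variance_time(series, block_sizes):
    """Estimate H from the slope of log10(variance) against log10(m).

    Assumes Var(X^(m)) ~ m**(-beta), so that H = 1 - beta / 2.
    """
    log_m, log_var = [], []
    for m in block_sizes:
        agg = aggregate(series, m)
        if len(agg) < 2:
            continue  # too few blocks to compute a variance
        log_m.append(math.log10(m))
        log_var.append(math.log10(statistics.pvariance(agg)))
    beta = -fit_slope(log_m, log_var)
    return 1.0 - beta / 2.0


# Hypothetical input: uncorrelated noise standing in for per-second counts.
rng = random.Random(0)
iid_counts = [rng.gauss(0.0, 1.0) for _ in range(20000)]
h_iid = hurst_variance_time(iid_counts, [1, 2, 4, 8, 16, 32, 64, 128])
print(f"estimated H for uncorrelated noise: {h_iid:.2f}")
```

Running the same function on per-second byte or packet counts from a trace would give the kind of estimate the slides describe, with values well above 1/2 indicating self-similarity.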

Expected feature: aggregation => H
    Aggregation over a long range shows the stability of the hypothesis.
    The Whittle estimator confirms our earlier calculations of H.
    [Figure: Whittle estimates of H with 95% confidence intervals, from the fully busy hour to the least busy hour]
    H decreases as the traffic becomes less busy.

Stage 2: Which parameter is useful to estimate the value of H?
    Which parameter is responsible for self-similarity?
    File requests => file transfers => distribution of unique files
    Alpha = 1.2; H in the range 0.7 to 0.8

It's the available files
    Available files => heavy-tailed behavior of file transfers
    Conclusion: the distribution of available files explains the heavy-tailed distribution of file transfers, which in turn accounts for Web traffic self-similarity.

Sources:
    Mark Crovella and Azer Bestavros, "Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes" (1996). Proceedings of SIGMETRICS '96: The ACM International Conference on Measurement and Modeling of Computer Systems.