Comparing Performance Measurement Time Series
Matthew S. Allen, John Brevik, Rich Wolski

Network Measurement Tools
• A variety of existing tools measure a link’s bandwidth (nws, nttcp, iperf, netperf, treno, remos)
• Each uses slightly different techniques, which cause differences in their output
• There does not seem to be a clear methodology for comparing the output of these tools to see whether the results are, in fact, the same

Two Questions
• How can we tell whether measurements produced by two tools are consistent with each other?
• How can we tell whether measurements produced by two tools convey the same amount of information?

Why Do We Care?
• We are developing a new tool and we want to compare its output with that of a standard, more mature tool
• We are trying to aggregate measurements made by two different tools
• We want to understand the difference between intrusive, accurate probes and lightweight, less accurate probes

Experiment Methodology
[Diagram: NWS, NTTCP, Iperf, Netperf; 4Mb experiment size, 64Kb send size, 16Kb TCP buffer; 10,000 measurements]

Measurement Time Series
[Figure: bandwidth (Mb/s) versus time (UTC), 1097533686 to 1097737086]

Talk Outline
• Capture Percentage: one way to determine whether two time series are consistent with each other
• Autocorrelation of Differences: one way to determine whether two time series contain the same information

Capture Percentage
One way to tell whether measurements produced by two tools are consistent with each other

Confidence Intervals
$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$
$s^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2$
• In a set of data drawn from a normal distribution, we can say with 95% certainty that the average of the normal generating the values is between $\bar{x} + 1.96 s$ and $\bar{x} - 1.96 s$

Model Building
• We can calculate analogous values for a non-stationary series:
  – Prediction: $p_t = f_p(x_0, \ldots, x_t)$
  – Predicted variance: $e_t^2 = f_e((p_0 - x_1)^2, \ldots, (p_{t-1} - x_t)^2)$
• Use these to construct a prediction interval
from $p_t + 1.96 e_t$ and $p_t - 1.96 e_t$ that we hope will capture 95% of the next values

NWS Prediction Interval
[Figure: bandwidth (Mb/s) versus time (UTC), 1097533686 to 1097737086, with the NWS prediction interval]

Capture Percentage
• The capture percentage is the percentage of values that lie within the prediction interval
• Empirically, we’ve seen that the capture percentage is very close to 95%
• To determine whether one stream accurately models another, we compare the capture percentage of stream 1’s prediction interval on stream 2’s values, and vice versa

Capture Example
model \ values   nws-1   nws-2   iperf   nttcp
nws-1            96.22   95.97   92.91   81.84
nws-2            96.26   96.35   93.26   82.56
iperf            96.64   96.54   95.42   84.36
nttcp            98.50   98.36   97.51   95.23

Autocorrelation of Differences
One way to tell whether measurements produced by two tools convey the same information

Correlation
[Figure: two scatter plots of Y against X, both axes from -3.2 to 3.2]
• Correlation describes the relationship between pairs of variables
• The correlation coefficient (between -1 and 1) measures how closely the points of the scatter plot follow the best-fit line
• It is the basis for autocorrelation

Measurement Autocorrelation
[Figure: correlation (0 to 1) versus lag (0 to 40)]

Pair-wise Differencing
• Each element from one measurement series is paired with the element from another series that is temporally closest to it
• A new series is created from the difference between each pair of measurements
• If this new series is independent, then intuitively the difference is noise and the deterministic components of each series carry the same information

Difference Autocorrelation
[Figure: correlation (-0.2 to 1) versus lag (0 to 40) for the NWS-NWS and NWS-NTTCP differences]

Differencing Unrelated Series
[Figure: correlation (0 to 1) versus lag (1 to 41)]

Conclusion
• The heuristic tests shown here allow developers to gain some information about the relationships between measurement time
series
  – We can tell whether two tools produce output that is consistent with each other using the capture percentage
  – We can tell whether two tools produce equivalent information by looking at the autocorrelation structure of the differences

Future Work
• Although these techniques often work, they do not always work
• Different prediction techniques produce different results
• The techniques provide some insight as to whether two time series contain the same information, but not how two series are related
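
Sketch of the Two Heuristics
The capture percentage and the autocorrelation of pair-wise differences described in this talk can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation behind the results shown: the forecaster f_p and the error estimator f_e are both taken to be simple exponential smoothing (the slides do not say which forecasters were used), the names `nearest`, `prediction_intervals`, `capture_percentage`, `pairwise_differences`, `autocorrelation`, and `synthetic_pair` are hypothetical, and the data is synthetic rather than measured.

```python
import bisect
import math
import random


def nearest(times, t):
    """Index of the entry in the sorted list `times` that is closest to t."""
    i = bisect.bisect_left(times, t)
    if i == 0:
        return 0
    if i == len(times):
        return len(times) - 1
    return i if times[i] - t < t - times[i - 1] else i - 1


def prediction_intervals(xs, alpha=0.2):
    """Intervals p_t -/+ 1.96 e_t for xs[2:], each built only from earlier values.

    Assumption: f_p and f_e are both exponential smoothing with gain alpha.
    """
    p = xs[0]                              # forecast of the next value
    e2 = (p - xs[1]) ** 2                  # first squared forecast error
    p = (1 - alpha) * p + alpha * xs[1]
    out = []
    for x in xs[2:]:
        half = 1.96 * math.sqrt(e2)
        out.append((p - half, p + half))   # interval fixed before x is seen
        e2 = (1 - alpha) * e2 + alpha * (p - x) ** 2
        p = (1 - alpha) * p + alpha * x
    return out


def capture_percentage(model, values, alpha=0.2):
    """Percent of `values` inside the temporally closest interval of `model`.

    Both arguments are time-sorted lists of (timestamp, measurement) pairs.
    """
    intervals = prediction_intervals([x for _, x in model], alpha)
    times = [t for t, _ in model][2:]      # one timestamp per interval
    scored = [(t, v) for t, v in values if t >= times[0]]
    hits = 0
    for t, v in scored:
        low, high = intervals[nearest(times, t)]
        hits += low <= v <= high
    return 100.0 * hits / len(scored)


def pairwise_differences(a, b):
    """Difference between each element of a and the element of b closest in time."""
    times_b = [t for t, _ in b]
    return [x - b[nearest(times_b, t)][1] for t, x in a]


def autocorrelation(xs, lag):
    """Sample autocorrelation of xs at the given lag."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean)
              for i in range(len(xs) - lag))
    return cov / var


def synthetic_pair(n=2000, seed=0):
    """Two made-up series: one shared slowly varying level plus independent noise."""
    rng = random.Random(seed)
    level, a, b = 50.0, [], []
    for i in range(n):
        level = 50.0 + 0.95 * (level - 50.0) + rng.gauss(0, 1)
        a.append((10 * i, level + rng.gauss(0, 1)))
        b.append((10 * i + 3, level + rng.gauss(0, 1)))   # offset timestamps
    return a, b


if __name__ == "__main__":
    a, b = synthetic_pair()
    print("a's intervals on b's values: %.2f%%" % capture_percentage(a, b))
    print("b's intervals on a's values: %.2f%%" % capture_percentage(b, a))
    diffs = pairwise_differences(a, b)
    for lag in (1, 5, 10):
        print(lag, round(autocorrelation([x for _, x in a], lag), 3),
              round(autocorrelation(diffs, lag), 3))
```

Because the two synthetic series share the same underlying level, each series on its own should be strongly autocorrelated while the difference series should show autocorrelation near zero at every nonzero lag, which is the pattern the differencing heuristic looks for. As noted under Future Work, swapping in a different forecaster would change the capture percentages.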