Comparing Two Alternatives and One-Factor ANOVA
Copyright 2004 David J. Lilja

Comparing Two Alternatives
- Use confidence intervals for:
  - Before-and-after comparisons
  - Noncorresponding measurements

Comparing More Than Two Alternatives
- ANOVA: Analysis of Variance
- Partitions the total variation in a set of measurements into:
  - Variation due to real differences among the alternatives
  - Variation due to errors

Comparing Two Alternatives
1. Before-and-after: Did a change to the system have a statistically significant impact on performance?
2. Noncorresponding measurements: Is there a statistically significant difference between two different systems?

Before-and-After Comparison: Assumptions
- The before and after measurements are not independent
- The variances of the two sets of measurements may not be equal
- The measurements are related: they form obvious corresponding pairs
- So work with the mean of the differences

Before-and-After Comparison
- b_i = before measurement, a_i = after measurement
- d_i = b_i − a_i
- d̄ = mean of the d_i, s_d = standard deviation of the d_i
- (c₁, c₂) = d̄ ± t_{1−α/2; n−1} · s_d / √n

Before-and-After Comparison

Measurement (i) | Before (b_i) | After (a_i) | Difference (d_i = b_i − a_i)
1               | 85           | 86          | −1
2               | 83           | 88          | −5
3               | 94           | 90          |  4
4               | 90           | 95          | −5
5               | 88           | 91          | −3
6               | 87           | 83          |  4

Before-and-After Comparison
- Mean of the differences: d̄ = −1
- Standard deviation: s_d = 4.15
- From the mean of the differences, it appears that the change reduced performance. However, the standard deviation is large.

95% Confidence Interval for the Mean of Differences
- t_{1−α/2; n−1} = t_{0.975; 5} = 2.571
- Excerpt from the t table:

a \ n  | … | 5     | 6     | … | ∞
0.90   | … | 1.476 | 1.440 | … | 1.282
0.95   | … | 2.015 | 1.943 | … | 1.645
0.975  | … | 2.571 | 2.447 | … | 1.960

95% Confidence Interval for the Mean of Differences
- (c₁, c₂) = d̄ ± t_{1−α/2; n−1} · s_d / √n
- t_{1−α/2; n−1} = t_{0.975; 5} = 2.571
- (c₁, c₂) = −1 ± 2.571 × 4.15 / √6 = [−5.36, 3.36]

95% Confidence Interval for the Mean of Differences
- (c₁, c₂) = [−5.36, 3.36]
- The interval includes 0 → with 95% confidence, there is no statistically significant difference between the two systems.

Noncorresponding Measurements
- No direct correspondence between pairs of measurements (unpaired observations)
- n₁ measurements of system 1, n₂ measurements of system 2

Confidence Interval for the Difference of Means
1. Compute the means
2. Compute the difference of the means
3. Compute the standard deviation of the difference of the means
4. Find the confidence interval for this difference
5. There is no statistically significant difference between the systems if the interval includes 0

Confidence Interval for the Difference of Means
- Difference of means: x̄ = x̄₁ − x̄₂
- Combined standard deviation: s_x = √(s₁²/n₁ + s₂²/n₂)

Why Add Standard Deviations?
- (Figure: the distributions of x̄₁ and x̄₂, with spreads s₁ and s₂, and the resulting spread of the difference x̄₁ − x̄₂.)

Number of Degrees of Freedom
- Not simply n_df = n₁ + n₂ − 2
- n_df = (s₁²/n₁ + s₂²/n₂)² / [ (s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1) ]

Example: Noncorresponding Measurements
- n₁ = 12 measurements, x̄₁ = 1243 s, s₁ = 38.5
- n₂ = 7 measurements, x̄₂ = 1085 s, s₂ = 54.0

Example (cont.)
- x̄ = x̄₁ − x̄₂ = 1243 − 1085 = 158
- s_x = √(38.5²/12 + 54²/7) = 23.24
- n_df = (38.5²/12 + 54²/7)² / [ (38.5²/12)²/11 + (54²/7)²/6 ] = 9.62 ≈ 10

Example (cont.): 90% CI
- (c₁, c₂) = x̄ ± t_{1−α/2; n_df} · s_x
- t_{1−α/2; n_df} = t_{0.95; 10} = 1.813
- (c₁, c₂) = 158 ± 1.813 × 23.24 = [116, 200]
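The two worked examples above can be reproduced in a few lines of Python. This is a minimal sketch, not part of the original slides; it assumes SciPy is available for the t-distribution quantiles.

```python
import math
from scipy import stats

# Before-and-after comparison (paired measurements) from the example table above.
before = [85, 83, 94, 90, 88, 87]
after = [86, 88, 90, 95, 91, 83]
d = [b - a for b, a in zip(before, after)]               # d_i = b_i - a_i
n = len(d)
d_bar = sum(d) / n                                        # mean of differences: -1
s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))   # ~4.15
t = stats.t.ppf(0.975, n - 1)                             # t_{0.975;5} = 2.571
half = t * s_d / math.sqrt(n)
print((d_bar - half, d_bar + half))                       # approx (-5.36, 3.36); includes 0

# Noncorresponding (unpaired) measurements, using the summary statistics above.
n1, x1, s1 = 12, 1243.0, 38.5
n2, x2, s2 = 7, 1085.0, 54.0
diff = x1 - x2                                            # 158
v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
s_x = math.sqrt(v1 + v2)                                  # ~23.24
ndf = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))   # ~9.62, rounds to 10
t = stats.t.ppf(0.95, round(ndf))                         # 90% interval -> t_{0.95;10}
half = t * s_x
print((diff - half, diff + half))                         # approx (116, 200); excludes 0
```

The unpaired case is the standard unequal-variance (Welch) interval; given the raw samples instead of summary statistics, scipy.stats.ttest_ind(a, b, equal_var=False) performs the corresponding hypothesis test.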
A Special Case
- If n₁ < ≈30 or n₂ < ≈30, the errors are normally distributed, and s₁ = s₂ (the standard deviations are equal),
- OR if n₁ = n₂ and the errors are normally distributed, even if s₁ ≠ s₂,
- then the following special case applies…

A Special Case
- (c₁, c₂) = x̄ ± t_{1−α/2; n_df} · s_p · √(1/n₁ + 1/n₂)
- n_df = n₁ + n₂ − 2
- s_p = √{ [s₁²(n₁ − 1) + s₂²(n₂ − 1)] / (n₁ + n₂ − 2) }
- Typically produces a tighter confidence interval
- Sometimes useful after obtaining additional measurements, to tease out small differences

Comparing Proportions
- m₁ = number of events of interest in system 1; n₁ = total number of events in system 1
- m₂ = number of events of interest in system 2; n₂ = total number of events in system 2

Comparing Proportions
- Since this is a binomial distribution:
  - Mean: p_i = m_i / n_i
  - Variance: p_i(1 − p_i) / n_i

Comparing Proportions
- p = p₁ − p₂
- s_p = √[ p₁(1 − p₁)/n₁ + p₂(1 − p₂)/n₂ ]
- (c₁, c₂) = p ± z_{1−α/2} · s_p

OS Example
- Initial operating system: n₁ = 1,300,203 interrupts (3.5 hours); m₁ = 142,892 interrupts occurred in OS code; p₁ = 0.1099, i.e., 11% of the time executing in the OS
- Upgraded OS: n₂ = 999,382 interrupts; m₂ = 84,876; p₂ = 0.0849, i.e., 8.5% of the time executing in the OS
- Is this a statistically significant improvement?

OS Example (cont.)
- p = p₁ − p₂ = 0.0250
- s_p = 0.0003911
- 90% confidence interval: (0.0242, 0.0257)
- Statistically significant difference?

Important Points
- Use confidence intervals to determine whether there are statistically significant differences:
  - Before-and-after comparisons → find the interval for the mean of the differences
  - Noncorresponding measurements → find the interval for the difference of the means
  - Proportions
- If the interval includes zero → no statistically significant difference

Comparing More Than Two Alternatives
- Naïve approach: compare confidence intervals

One-Factor Analysis of Variance (ANOVA)
- A very general technique: look at the total variation in a set of measurements and divide it into meaningful components
- Also called one-way classification or a one-factor experimental design
- The basic concept is introduced here with one-factor ANOVA and generalized later with the design of experiments

One-Factor Analysis of Variance (ANOVA)
- Separates the total variation observed in a set of measurements into:
  1. Variation within one system, due to random measurement errors
  2. Variation between systems, due to real differences plus random error
- Is variation (2) statistically greater than variation (1)?

ANOVA
- Make n measurements of each of k alternatives
- y_ij = i-th measurement on the j-th alternative
- Assumes the errors are independent and Gaussian (normal)

Measurements for All Alternatives

Measurement  | Alt. 1 | Alt. 2 | … | Alt. j | … | Alt. k
1            | y11    | y12    | … | y1j    | … | y1k
2            | y21    | y22    | … | y2j    | … | y2k
…            | …      | …      | … | …      | … | …
i            | yi1    | yi2    | … | yij    | … | yik
…            | …      | …      | … | …      | … | …
n            | yn1    | yn2    | … | ynj    | … | ynk
Column mean  | ȳ.1    | ȳ.2    | … | ȳ.j    | … | ȳ.k
Effect       | α1     | α2     | … | αj     | … | αk

Column Means
- The column means are the average values of all measurements within a single alternative, i.e., the average performance of one alternative:
  ȳ.j = (Σ_{i=1}^{n} y_ij) / n
- (Measurement table as above.)
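As a concrete illustration (not part of the original slides), the column means, the overall mean, and the effects can be computed directly from the n × k measurement matrix. The sketch below uses the 3-alternative, 5-measurement data set from the ANOVA example later in this deck.

```python
import numpy as np

# Rows are the n measurements, columns are the k alternatives
# (data taken from the ANOVA example later in the deck).
y = np.array([
    [0.0972, 0.1382, 0.7966],
    [0.0971, 0.1432, 0.5300],
    [0.0969, 0.1382, 0.5152],
    [0.1954, 0.1730, 0.6675],
    [0.0974, 0.1383, 0.5298],
])

col_means = y.mean(axis=0)           # y.j     -> [0.1168, 0.1462, 0.6078]
overall_mean = y.mean()              # y..     -> 0.2903
effects = col_means - overall_mean   # alpha_j -> [-0.1735, -0.1441, 0.3175]
print(col_means, overall_mean, effects)
```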
Deviation From the Column Mean
- y_ij − ȳ.j = e_ij
- e_ij = deviation of y_ij from its column mean = the error in the measurement

Error = Deviation From the Column Mean
- (Measurement table as above; the errors are the deviations within each column.)

Overall Mean
- The average of all measurements made of all alternatives:
  ȳ.. = (Σ_{j=1}^{k} Σ_{i=1}^{n} y_ij) / (kn)
- (Measurement table as above.)

Deviation From the Overall Mean
- ȳ.j − ȳ.. = α_j
- α_j = deviation of the column mean from the overall mean = the effect of alternative j

Effect = Deviation From the Overall Mean
- (Measurement table as above; the effects are the deviations of the column means from the overall mean.)

Effects and Errors
- The effect is the distance from the overall mean (horizontally, across the alternatives)
- The error is the distance from the column mean (vertically, within one alternative); there is error across the alternatives, too
- Individual measurements are then: y_ij = ȳ.. + α_j + e_ij

Sum of Squares of Differences: SSE
- y_ij = ȳ.j + e_ij, so e_ij = y_ij − ȳ.j
- SSE = Σ_{j=1}^{k} Σ_{i=1}^{n} e_ij² = Σ_{j=1}^{k} Σ_{i=1}^{n} (y_ij − ȳ.j)²

Sum of Squares of Differences: SSA
- ȳ.j = ȳ.. + α_j, so α_j = ȳ.j − ȳ..
- SSA = n Σ_{j=1}^{k} α_j² = n Σ_{j=1}^{k} (ȳ.j − ȳ..)²

Sum of Squares of Differences: SST
- y_ij = ȳ.. + α_j + e_ij = ȳ.. + t_ij, where t_ij = α_j + e_ij = y_ij − ȳ..
- SST = Σ_{j=1}^{k} Σ_{i=1}^{n} t_ij² = Σ_{j=1}^{k} Σ_{i=1}^{n} (y_ij − ȳ..)²

Sum of Squares of Differences
- SSA = n Σ_j (ȳ.j − ȳ..)²
- SSE = Σ_j Σ_i (y_ij − ȳ.j)²
- SST = Σ_j Σ_i (y_ij − ȳ..)²

Sum of Squares of Differences
- SST measures the differences between each measurement and the overall mean
- SSA measures the variation due to the effects of the alternatives
- SSE measures the variation due to errors in the measurements
- SST = SSA + SSE

ANOVA: The Fundamental Idea
- Separate the variation in the measured values into:
  1. Variation due to the effects of the alternatives: SSA, the variation across columns
  2. Variation due to errors: SSE, the variation within a single column
- If the differences among the alternatives are due to real differences, SSA should be statistically greater than SSE

Comparing SSE and SSA
- Simple approach:
  - SSA / SST = fraction of the total variation explained by differences among the alternatives
  - SSE / SST = fraction of the total variation due to experimental error
- But is it statistically significant?

Statistically Comparing SSE and SSA
- Variance = mean square value = total variation / degrees of freedom
- s_x² = SSx / df(SSx)

Degrees of Freedom
- df(SSA) = k − 1, since there are k alternatives
- df(SSE) = k(n − 1), since there are k alternatives, each with (n − 1) degrees of freedom
- df(SST) = df(SSA) + df(SSE) = kn − 1

Degrees of Freedom for Effects
- (Measurement table as above.)
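To make the decomposition concrete, here is a small sketch (not from the original slides) that computes SSE, SSA, SST and their degrees of freedom for the example data used later in the deck, and checks that SST = SSA + SSE.

```python
import numpy as np

y = np.array([
    [0.0972, 0.1382, 0.7966],
    [0.0971, 0.1432, 0.5300],
    [0.0969, 0.1382, 0.5152],
    [0.1954, 0.1730, 0.6675],
    [0.0974, 0.1383, 0.5298],
])
n, k = y.shape                                        # n = 5 measurements, k = 3 alternatives

col_means = y.mean(axis=0)
overall_mean = y.mean()

SSE = ((y - col_means) ** 2).sum()                    # within-column (error) variation, ~0.0685
SSA = n * ((col_means - overall_mean) ** 2).sum()     # between-column (effect) variation, ~0.7585
SST = ((y - overall_mean) ** 2).sum()                 # total variation, ~0.8270
assert np.isclose(SST, SSA + SSE)                     # SST = SSA + SSE

df_a, df_e, df_t = k - 1, k * (n - 1), k * n - 1      # 2, 12, 14
print(SSE, SSA, SST, df_a, df_e, df_t)
```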
Degrees of Freedom for Errors
- (Measurement table as above.)

Variances from the Sums of Squares (Mean Square Values)
- s_a² = SSA / (k − 1)
- s_e² = SSE / [k(n − 1)]

Comparing Variances
- Use the F-test to compare the ratio of the variances:
  F = s_a² / s_e²
- F_{[1−α; df(num), df(denom)]} is the tabulated critical value

F-test
- If F_computed > F_table → we have (1 − α) × 100% confidence that the variation due to actual differences among the alternatives, SSA, is statistically greater than the variation due to errors, SSE.

ANOVA Summary

Variation    | Sum of squares | Deg. of freedom | Mean square           | Computed F | Tabulated F
Alternatives | SSA            | k − 1           | s_a² = SSA/(k − 1)    | s_a²/s_e²  | F_{[1−α; (k−1), k(n−1)]}
Error        | SSE            | k(n − 1)        | s_e² = SSE/[k(n − 1)] |            |
Total        | SST            | kn − 1          |                       |            |

ANOVA Example

Measurement  | Alt. 1  | Alt. 2  | Alt. 3
1            | 0.0972  | 0.1382  | 0.7966
2            | 0.0971  | 0.1432  | 0.5300
3            | 0.0969  | 0.1382  | 0.5152
4            | 0.1954  | 0.1730  | 0.6675
5            | 0.0974  | 0.1383  | 0.5298
Column mean  | 0.1168  | 0.1462  | 0.6078
Effect       | −0.1735 | −0.1441 | 0.3175

Overall mean: 0.2903

ANOVA Example

Variation    | Sum of squares | Deg. of freedom | Mean square   | Computed F           | Tabulated F
Alternatives | SSA = 0.7585   | k − 1 = 2       | s_a² = 0.3793 | 0.3793/0.0057 = 66.4 | F_{[0.95; 2, 12]} = 3.89
Error        | SSE = 0.0685   | k(n − 1) = 12   | s_e² = 0.0057 |                      |
Total        | SST = 0.8270   | kn − 1 = 14     |               |                      |

Conclusions from the Example
- SSA/SST = 0.7585/0.8270 = 0.917 → 91.7% of the total variation in the measurements is due to differences among the alternatives
- SSE/SST = 0.0685/0.8270 = 0.083 → 8.3% of the total variation in the measurements is due to noise in the measurements
- Computed F statistic > tabulated F statistic → 95% confidence that the differences among the alternatives are statistically significant
- (A code sketch reproducing this table appears below, after the contrast slides.)

Contrasts
- ANOVA tells us that there is a statistically significant difference among the alternatives, but it does not tell us where the difference is
- Use the method of contrasts to compare subsets of the alternatives: A vs. B, {A, B} vs. {C}, etc.

Contrasts
- A contrast is a linear combination of the effects of the alternatives:
  c = Σ_{j=1}^{k} w_j α_j, with Σ_{j=1}^{k} w_j = 0

Contrasts
- E.g., to compare the effect of system 1 to the effect of system 2:
  w₁ = 1, w₂ = −1, w₃ = 0
  c = (1)α₁ + (−1)α₂ + (0)α₃ = α₁ − α₂

Constructing Confidence Intervals for Contrasts
- Need an estimate of the variance and the appropriate value from the t table
- Compute the confidence interval as before
- If the interval includes 0, then no statistically significant difference exists between the alternatives included in the contrast

Variance of Random Variables
- Recall that, for independent random variables X₁ and X₂:
  Var[X₁ + X₂] = Var[X₁] + Var[X₂]
  Var[aX₁] = a² Var[X₁]

Variance of a Contrast
- Var[c] = Var[Σ_{j=1}^{k} w_j α_j] = Σ_{j=1}^{k} Var[w_j α_j] = Σ_{j=1}^{k} w_j² Var[α_j]
- s_c² = (Σ_{j=1}^{k} w_j² s_e²) / (kn), where s_e² = SSE / [k(n − 1)] and df(s_c²) = k(n − 1)
- This assumes the variation due to errors is equally distributed among the kn total measurements
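Here is the code sketch referenced in the conclusions above: a minimal, illustrative one-factor ANOVA computation (not from the original slides) that reproduces the example table, including the computed F of about 66.4 against the tabulated F_{[0.95; 2, 12]} ≈ 3.89. It assumes SciPy for the F-distribution quantile.

```python
import numpy as np
from scipy import stats

y = np.array([
    [0.0972, 0.1382, 0.7966],
    [0.0971, 0.1432, 0.5300],
    [0.0969, 0.1382, 0.5152],
    [0.1954, 0.1730, 0.6675],
    [0.0974, 0.1383, 0.5298],
])
n, k = y.shape

col_means = y.mean(axis=0)
overall_mean = y.mean()
SSA = n * ((col_means - overall_mean) ** 2).sum()     # ~0.7585
SSE = ((y - col_means) ** 2).sum()                    # ~0.0685

s_a2 = SSA / (k - 1)                                  # mean square for alternatives, ~0.3793
s_e2 = SSE / (k * (n - 1))                            # mean square for error, ~0.0057
F_computed = s_a2 / s_e2                              # ~66.4
F_table = stats.f.ppf(0.95, k - 1, k * (n - 1))       # ~3.89

# F_computed > F_table -> the differences among the alternatives are significant at 95%.
print(F_computed, F_table, F_computed > F_table)
```

The same F statistic (with a p-value) can also be obtained directly with scipy.stats.f_oneway(y[:, 0], y[:, 1], y[:, 2]).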
Confidence Interval for Contrasts
- (c₁, c₂) = c ± t_{1−α/2; k(n−1)} · s_c
- s_c = √[ (Σ_{j=1}^{k} w_j² s_e²) / (kn) ], with s_e² = SSE / [k(n − 1)]

Example: 90% Confidence Interval for the Contrast [System 1 − System 2]
- α₁ = −0.1735, α₂ = −0.1441, α₃ = 0.3175
- c_{[1−2]} = α₁ − α₂ = −0.1735 − (−0.1441) = −0.0294
- s_c = s_e √[ (1² + (−1)² + 0²) / (3·5) ] = 0.0275
- 90%: (c₁, c₂) = (−0.0784, 0.0196)
- Since this interval includes 0, there is no statistically significant difference between systems 1 and 2 at the 90% confidence level.

Important Points
- Use one-factor ANOVA to separate the total variation into:
  - Variation within one system, due to random errors
  - Variation between systems, due to real differences (plus random error)
- Is the variation due to real differences statistically greater than the variation due to errors?

Important Points
- Use contrasts to compare the effects of subsets of the alternatives
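As a closing illustration (again not part of the original slides), the 90% contrast interval for system 1 versus system 2 worked above can be reproduced as follows; it assumes SciPy for the t-distribution quantile.

```python
import numpy as np
from scipy import stats

y = np.array([
    [0.0972, 0.1382, 0.7966],
    [0.0971, 0.1432, 0.5300],
    [0.0969, 0.1382, 0.5152],
    [0.1954, 0.1730, 0.6675],
    [0.0974, 0.1383, 0.5298],
])
n, k = y.shape
col_means = y.mean(axis=0)
effects = col_means - y.mean()                          # alpha_j
s_e2 = ((y - col_means) ** 2).sum() / (k * (n - 1))     # SSE / [k(n-1)]

w = np.array([1.0, -1.0, 0.0])                          # contrast weights: system 1 vs. system 2
c = (w * effects).sum()                                 # ~ -0.0294
s_c = np.sqrt(s_e2 * (w ** 2).sum() / (k * n))          # ~ 0.0275
t = stats.t.ppf(0.95, k * (n - 1))                      # 90% interval -> t_{0.95;12} ~ 1.782
print((c - t * s_c, c + t * s_c))                       # approx (-0.078, 0.020); includes 0
```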