Illustration for Section 6.1 with R, using the data of Example 6.1

We would like to test whether an exam is too difficult, i.e. whether the location of the underlying distribution F of the exam results is smaller than 6 or not.

> x=c(3.7,5.2,6.9,7.2,6.4,9.3,4.3,8.4,6.5,8.1,7.3,6.1,5.8) #exam results

Estimate location:
> mean(x)
[1] 6.553846
> median(x)
[1] 6.5

Based on these numbers the exam seems OK, but a more precise investigation is needed. Two (equivalent) ways:
1) by performing a test;
2) by computing a confidence interval.

First investigate the shape of the distribution with respect to normality, symmetry and outliers, to see which test/conf.int could be appropriate:
> par(mfrow=c(2,2))
> hist(x,prob=TRUE)
> qqnorm(x)
> symplot(x) #symmetry plot; symplot is not a base R function and has to be loaded first
> boxplot(x)

These plots show that the underlying distribution F could very well be symmetric and even normal, and we do not see outliers. However, the number of observations is small, so we should not draw too strong conclusions.

1) Performing a test:
H0: location at most 6
H1: location larger than 6

t-test: parameter of location is the mean μ
Test with level α=0.05
H0: μ ≤ 6
H1: μ > 6
#Compute values for test:
> t.test(x,mu=6,alt="g")

        One Sample t-test

data:  x
t = 1.2569, df = 12, p-value = 0.1164
alternative hypothesis: true mean is greater than 6
95 percent confidence interval:
 5.768463      Inf
sample estimates:
mean of x
 6.553846

Conclusion: ??

Sign test: parameter of location is the median m
Test with level α=0.05
H0: m ≤ 6
H1: m > 6
#Compute value of test statistic:
> sum(x>6)
[1] 9
#Check for values equal to 6:
> sum(x==6)
[1] 0 #no observations equal to 6
#Compute values for test:
> binom.test(9,length(x),alt="g") #which p?

        Exact binomial test

data:  9 and length(x)
number of successes = 9, number of trials = 13, p-value = 0.1334
alternative hypothesis: true probability of success is greater than 0.5
95 percent confidence interval: #what is this?
 0.4273807 1.0000000
sample estimates:
probability of success
             0.6923077

Conclusion: ??
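The p-value of the sign test can also be checked by hand: under H0 (with m = 6) the number of observations above 6 behaves like a binomial variable with success probability 0.5, which is the default value of p in binom.test. The following is only a minimal sketch; the helper name sign_test_p is hypothetical and not part of R or of the course material.

```r
x <- c(3.7, 5.2, 6.9, 7.2, 6.4, 9.3, 4.3, 8.4, 6.5, 8.1, 7.3, 6.1, 5.8)  # exam results

# Hypothetical helper: one-sided sign test p-value for H0: m <= m0 against H1: m > m0.
sign_test_p <- function(x, m0) {
  x <- x[x != m0]      # drop observations equal to m0 (there are none for m0 = 6)
  n <- length(x)       # number of remaining observations
  s <- sum(x > m0)     # test statistic: number of observations above m0
  # P(S >= s) for S ~ Bin(n, 0.5)
  pbinom(s - 1, n, 0.5, lower.tail = FALSE)
}

p6 <- sign_test_p(x, 6)
print(p6)  # should agree with the p-value 0.1334 reported by binom.test
```

Calling sign_test_p(x, m0) for other values of m0 shows how the p-value changes with the hypothesised median.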
Signed rank test or symmetry test of Wilcoxon: parameter of location is the point of symmetry m
Test with level α=0.05
H0: m ≤ 6
H1: m > 6
#Compute values for test:
> wilcox.test(x,mu=6,alt="g")

        Wilcoxon signed rank test

data:  x
V = 64, p-value = 0.1082
alternative hypothesis: true location is greater than 6

Conclusion: ??

NB. R computes the statistic V+. For a confidence interval, include the additional parameter conf.int=TRUE.

Which test is to be preferred?
t-test: p-value = 0.12
sign test: p-value = 0.13
signed rank test: p-value = 0.11

In this case, for reasonable values of α the conclusions of the tests are the same and the p-values are not very different. Based on the plots, the t-test and the signed rank test are probably the best choices here. However, in view of the small number of observations, normality is perhaps too strong an assumption. Moreover, in the context of these data (exam grades) normality is not very realistic. Hence, for these data the result of the signed rank test should probably be trusted most.

2) Computing a confidence interval
Based on a test: the interval contains those values of the location for which H0 is not rejected.
t.test and wilcox.test(…, conf.int=T) give the interval corresponding to the test; for the sign test you have to check step by step which values in H0 are not rejected.
Given a confidence interval, when is the exam OK?

Illustration for Section 6.2: check significance level and power

i) A sample of size 100 from N(0,1) was generated 1000 times, and it was counted how many times each of the 3 tests rejected H0: μ = 0 while testing at significance level α=0.05. What do you expect?

ii) a. A sample of size 100 from N(0.1,1) was generated 1000 times, and it was counted how many times each of the 3 tests rejected H0: μ = 0. What do you expect?
b. A sample of size 100 from N(0.2,1) was generated 1000 times, and it was counted how many times each of the 3 tests rejected H0: μ = 0. What do you expect?
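The experiment described above could be set up roughly as follows. This is only a sketch under stated assumptions: the seed, the helper name count_rejections and in particular the one-sided alternative (location greater than 0) are choices made here for illustration, since the text does not say which alternative was used.

```r
set.seed(1)  # arbitrary seed (assumption), only to make the run reproducible

# Hypothetical helper: generate B samples of size n from N(mu, 1) and count,
# for each of the three tests, how often H0: location = 0 is rejected at level alpha.
count_rejections <- function(mu, n = 100, B = 1000, alpha = 0.05) {
  rejected <- replicate(B, {
    x <- rnorm(n, mean = mu, sd = 1)
    # One-sided alternatives are an assumption of this sketch.
    c(t.test   = t.test(x, mu = 0, alternative = "greater")$p.value < alpha,
      sign     = binom.test(sum(x > 0), n, p = 0.5,
                            alternative = "greater")$p.value < alpha,
      wilcoxon = wilcox.test(x, mu = 0, alternative = "greater")$p.value < alpha)
  })
  rowSums(rejected)  # one count per test
}

# Columns: true mean 0 (H0 true), 0.1 and 0.2 (H0 false)
counts <- sapply(c(0, 0.1, 0.2), count_rejections)
colnames(counts) <- c("mu=0", "mu=0.1", "mu=0.2")
print(counts)
```

Because the samples are random, the counts differ from run to run and from seed to seed; only their approximate size is meaningful.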
i) A sample of size 100 from N(0,1) was generated 1000 times, and it was counted how many times each of the 3 tests rejected H0: μ = 0 while testing at significance level α=0.05.

t-test   sign test   wilcoxon
  49        45          40      # number of times (out of 1000) the test rejected H0
  55        55          57      # (once more)

So in about 5% of the cases (a fraction of about 0.05) H0 is rejected when H0 is true. All tests have approximately the correct significance level of 0.05!

ii) a. A sample of size 100 from N(0.1,1) was generated 1000 times, and it was counted how many times each of the 3 tests rejected H0: μ = 0.

t-test   sign test   wilcoxon
 259       174         256      # number of times (out of 1000) the test rejected H0
 272       175         261      # (once more)

So the t-test and the Wilcoxon test reject H0 more often than the sign test when H0 is not true. The power of the t-test and the Wilcoxon test is better for the normal distribution; the t-test is best, but Wilcoxon is almost as good. This agrees with the theory (see Table 6.1 for N(0,1)):
are(t-test, sign test) = π/2 = 1.57;
are(wilcoxon, sign test) = 3/2 = 1.5;
are(t-test, wilcoxon) = π/3 = 1.05

b. A sample of size 100 from N(0.2,1) was generated 1000 times, and it was counted how many times each of the 3 tests rejected H0: μ = 0.

t-test   sign test   wilcoxon
 632       467         625      # number of times (out of 1000) the test rejected H0
 639       462         627      # (once more)

In this case the data came from a distribution that was shifted further away from H0 than in a., namely with a shift of 0.2 instead of 0.1. We see that the tests reject more often than in a. They can indeed distinguish the alternative from H0 better than in a.
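The counts above can be compared with a rough large-sample calculation. Assuming a one-sided test (an assumption of this sketch, not stated in the text) and known standard deviation 1, a level α test based on n observations has approximate power Φ(√(e·n)·δ − z), where δ is the shift, z is the upper α-quantile of N(0,1) and e is the efficiency relative to the t-test (e = 2/π for the sign test and e = 3/π for Wilcoxon, following the are values quoted above). The helper name approx_power is hypothetical.

```r
# Hypothetical helper: normal-approximation power of a one-sided level-alpha test
# of H0: mu = 0 for N(delta, 1) data; 'efficiency' is relative to the t-test.
approx_power <- function(delta, n = 100, alpha = 0.05, efficiency = 1) {
  pnorm(sqrt(efficiency * n) * delta - qnorm(1 - alpha))
}

shifts <- c(0, 0.1, 0.2)
power_table <- rbind(
  t.test   = approx_power(shifts),
  sign     = approx_power(shifts, efficiency = 2 / pi),  # are(sign, t-test) = 2/pi
  wilcoxon = approx_power(shifts, efficiency = 3 / pi)   # are(wilcoxon, t-test) = 3/pi
)
colnames(power_table) <- paste0("mu=", shifts)

# Expected number of rejections out of 1000 under this approximation
print(round(1000 * power_table))
```

For the t-test this approximation gives a power of about 0.26 for a shift of 0.1 and about 0.64 for a shift of 0.2. Since the approximation is asymptotic and ignores the discreteness of the sign test, only rough agreement with the simulated counts should be expected.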