Stat 921 Notes 4
Reading: Observational Studies, Chapters 2.5-2.7

I. Testing General Hypotheses (Section 2.6.1)

So far, we have considered testing the null hypothesis of no treatment effect. The same approach extends to testing any hypothesized treatment effect.

Let $\tau = r_T - r_C$ be the effect of treatment. Suppose we want to test $H_0: \tau = \tau_0$ (for no treatment effect, $\tau_0 = 0$). Under $H_0$, $r_{Csi} = R_{si} - \tau_{0si} Z_{si}$. Thus, we can compute $r_C$ under $H_0$ from the observed responses $R$ and the observed treatment assignment $Z$; call this value of $r_C$ under $H_0$, $r_C(\tau_0)$. Our test statistic is $t(Z, r_C(\tau_0))$ and we reject for large values of the test statistic. The null distribution of the test statistic can be computed because, under $H_0$, $r_C = r_C(\tau_0)$ is known.

Example: Suppose we want to test the additive treatment effect $H_0: r_T - r_C = 1$.

    Unit   Group   Observed response   Adjusted response   Rank of adjusted response
    i      Z_i     R_i                 R_i - 1*Z_i         q_i
    1      1        9                   8                  7
    2      0        1                   1                  1
    3      0        3                   3                  2
    4      0        4                   4                  3
    5      1        7                   6                  5
    6      1       11                  10                  8
    7      1        8                   7                  6
    8      0        5                   5                  4

Let $t(Z, r_C(\tau_0))$ be the Wilcoxon rank sum statistic, i.e., the sum of the ranks of the adjusted responses in the treated group. Then the observed test statistic is $T = 7 + 5 + 8 + 6 = 26$. This is the largest possible rank sum for $N = 8$ with four treated units, so the one-sided p-value is $1/\binom{8}{4} = 1/70 = 0.014$.

For the creativity study, suppose we want to test the hypothesis that intrinsic motivation increases scores by 2.

    intrinsic=c(12,12,12.9,13.6,16.6,17.2,17.5,18.2,19.1,19.3,19.8,20.3,20.5,20.6,21.3,
                21.6,22.1,22.2,22.6,23.1,24,24.3,26.7,29.7);
    extrinsic=c(5,5.4,6.1,10.9,11.8,12,12.3,14.8,15,16.8,17.2,17.2,17.4,17.5,18.5,18.7,
                18.7,19.2,19.5,20.7,21.2,22.1,24);
    wilcox.test(intrinsic-2,extrinsic,exact=TRUE,alternative="greater");

            Wilcoxon rank sum test with continuity correction

    data:  intrinsic - 2 and extrinsic
    W = 330, p-value = 0.1274
    alternative hypothesis: true location shift is greater than 0

    Warning message:
    In wilcox.test.default(intrinsic - 2, extrinsic, exact = TRUE, alternative = "greater") :
      cannot compute exact p-value with ties

There is no strong evidence against the hypothesis that the intrinsic treatment has an additive effect of 2.

For a two-sided test,

    wilcox.test(intrinsic-2,extrinsic,exact=TRUE);

            Wilcoxon rank sum test with continuity correction

    data:  intrinsic - 2 and extrinsic
    W = 330, p-value = 0.2548
    alternative hypothesis: true location shift is not equal to 0

Note that when there are ties, wilcox.test does not compute exact p-values and instead uses the normal approximation. To obtain exact p-values, we can install the exactRankTests package.

    library(exactRankTests);
    wilcox.exact(intrinsic-2,extrinsic,alternative="greater");

            Exact Wilcoxon rank sum test

    data:  intrinsic - 2 and extrinsic
    W = 330, p-value = 0.1276
    alternative hypothesis: true mu is greater than 0

II. Confidence Intervals by Inverting a Test

Under the model of an additive treatment effect, $r_{Tsi} = r_{Csi} + \tau$, a $1 - \alpha$ confidence set for $\tau$ is obtained by testing each value of $\tau$ and collecting all values not rejected into a set $A$.

For an effect increasing statistic, which includes all the tests from Chapter 2.4.3 of the book (e.g., Wilcoxon rank sum, Wilcoxon signed rank, difference in means, etc.), the test statistic is a decreasing function of $\tau$. The argument is the following: Let $\tau < \tau^*$.
Let $r_C(\tau)$ and $r_C(\tau^*)$ denote the potential responses under control implied by $\tau$ and $\tau^*$, respectively, based on the observed responses $R$, i.e., $r_C(\tau) = R - \tau Z$ and $r_C(\tau^*) = R - \tau^* Z$. Then, at the observed assignment $Z$, $(r_{Csi}(\tau) - r_{Csi}(\tau^*))(2Z_{si} - 1) = (\tau^* - \tau) Z_{si} \ge 0$ for all $s, i$ when $\tau < \tau^*$. Hence, for an effect increasing statistic, $t(Z, r_C(\tau)) \ge t(Z, r_C(\tau^*))$.

Since the test statistic is a decreasing function of $\tau$, we can find the confidence interval by a bracketing root search such as the bisection method (see Chapter 9 of Numerical Recipes in C, http://www.nrbook.com/a/bookcpdf.php ). The function uniroot in R finds a zero of a one-dimensional function on an interval whose endpoints give function values of opposite sign; it uses Brent's method, which combines bisection with interpolation. Because the exact p-value is a step function of the hypothesized $\tau$, the root returned is the location of the jump across $\alpha$, up to the tolerance of the search.

    # Find one-sided 95% lower confidence limit for tau.
    # Requires wilcox.exact from the exactRankTests package.
    library(exactRankTests);
    # Returns (p-value - alpha) for testing H0: tau = tau0;
    # side is the direction of the alternative ("greater" or "less").
    pval.minus.alpha.func=function(tau0,ytreated,ycontrol,alpha=.05,side="greater"){
      pval=wilcox.exact(ytreated-tau0,ycontrol,alternative=side)$p.value;
      pval-alpha;
    }
    # Search for the sign change over tau0 in (-10, 10).
    lower.ci.limit=uniroot(pval.minus.alpha.func,c(-10,10),ytreated=intrinsic,ycontrol=extrinsic)$root;
    > lower.ci.limit
    [1] 1.400071

A one-sided 95% confidence interval for $\tau$ is approximately $(1.40, \infty)$.

Two Sided Confidence Interval

A two-sided 95% confidence interval can be found by taking the intersection of two 97.5% one-sided confidence intervals (this is the shortest interval containing all $\tau$ that are rejected by neither of two one-sided, 0.025 level tests).

    # Find two-sided confidence interval
    lower.twosided.ci.limit=uniroot(pval.minus.alpha.func,c(-10,10),ytreated=intrinsic,ycontrol=extrinsic,alpha=.025)$root;
    upper.twosided.ci.limit=uniroot(pval.minus.alpha.func,c(-10,10),ytreated=intrinsic,ycontrol=extrinsic,alpha=.025,side="less")$root;
    > lower.twosided.ci.limit
    [1] 1.000062
    > upper.twosided.ci.limit
    [1] 6.599908

The two-sided 95% confidence interval for $\tau$ is therefore approximately $(1.00, 6.60)$.

III. Point Estimates: Unbiased Estimates of the Average Effect (Section 2.7.1)

Randomized experiments enable us to obtain unbiased estimates of the average treatment effect.
Suppose there are $N$ subjects and $m$ of the subjects are randomly assigned to treatment, the rest to control. Consider estimating the average causal effect of the treatment among the population of these $N$ subjects,

$$ACE = \frac{1}{N}\sum_{i=1}^{N} (r_{Tsi} - r_{Csi}),$$

by the difference between the sample mean of the outcomes in the treated group and the sample mean of the outcomes in the control group,

$$\widehat{ACE} = \frac{1}{m}\sum_{i=1}^{N} Z_i R_i - \frac{1}{N-m}\sum_{i=1}^{N} (1 - Z_i) R_i.$$

Proposition: $\widehat{ACE}$ is an unbiased estimator of $ACE$.

Proof: Taking the expectation over the distribution of $\binom{N}{m}$ equally likely random assignments, and using $Z_i R_i = Z_i r_{Tsi}$, $(1 - Z_i) R_i = (1 - Z_i) r_{Csi}$ and $E(Z_i) = m/N$, we have

$$
\begin{aligned}
E(\widehat{ACE}) &= E\left[\frac{1}{m}\sum_{i=1}^{N} Z_i R_i - \frac{1}{N-m}\sum_{i=1}^{N} (1 - Z_i) R_i\right] \\
&= E\left[\frac{1}{m}\sum_{i=1}^{N} Z_i r_{Tsi} - \frac{1}{N-m}\sum_{i=1}^{N} (1 - Z_i) r_{Csi}\right] \\
&= \frac{1}{m}\sum_{i=1}^{N} \frac{m}{N}\, r_{Tsi} - \frac{1}{N-m}\sum_{i=1}^{N} \frac{N-m}{N}\, r_{Csi} \\
&= \frac{1}{N}\sum_{i=1}^{N} (r_{Tsi} - r_{Csi}). \qquad \blacksquare
\end{aligned}
$$

Comment: The proposition says that a randomized experiment provides an unbiased estimate of the mean treatment effect among the subjects in the study, $\frac{1}{N}\sum_{i=1}^{N} (r_{Tsi} - r_{Csi})$. It follows that if the units are randomly sampled from an infinite population, a randomized experiment provides an unbiased estimate of the mean treatment effect in the population over repeated experiments (where each experiment consists of randomly sampling the units and then randomly assigning the sampled units to the treatments).

An unbiased estimate of the median treatment effect in the population cannot be obtained. To see this, consider two populations of units, one in which

$$P(r_{Ti} = 6, r_{Ci} = 4) = 1/3, \quad P(r_{Ti} = 8, r_{Ci} = 6) = 1/3, \quad P(r_{Ti} = 10, r_{Ci} = 8) = 1/3,$$

and another in which

$$P(r_{Ti} = 10, r_{Ci} = 4) = 1/3, \quad P(r_{Ti} = 8, r_{Ci} = 8) = 1/3, \quad P(r_{Ti} = 6, r_{Ci} = 6) = 1/3.$$

In the first population, the median treatment effect is 2, while in the second population, the median treatment effect is 0. But the marginal distributions of $r_{Ci}$ and $r_{Ti}$ are the same for the two populations, so the distribution of the treated and control subject outcomes will be the same in repeated experiments in which the units are randomly drawn from an infinite population. Any estimator computed from the observed outcomes therefore has the same distribution in both populations and cannot be unbiased for both medians.
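
The two-population counterexample above can be checked numerically. The following is a minimal sketch in R, not part of the original notes; the object names (pop1, pop2, effects1, effects2, same.rT, same.rC) are hypothetical and chosen only for illustration. Each population is represented by its three equally likely unit types, so the population mean and median of the treatment effect are just the mean and median of the three listed effects.

```r
# Three equally likely (rT, rC) pairs in each population, as given in the text.
pop1 = data.frame(rT = c(6, 8, 10), rC = c(4, 6, 8))
pop2 = data.frame(rT = c(10, 8, 6), rC = c(4, 8, 6))

# The marginal distributions of rT and of rC agree across the two populations.
same.rT = identical(sort(pop1$rT), sort(pop2$rT))
same.rC = identical(sort(pop1$rC), sort(pop2$rC))

# Unit-level treatment effects rT - rC in each population.
effects1 = pop1$rT - pop1$rC   # 2, 2, 2
effects2 = pop2$rT - pop2$rC   # 6, 0, 0

# The mean effect depends only on the marginals, so it is 2 in both populations.
mean(effects1)
mean(effects2)

# The median effect depends on the joint distribution: 2 versus 0.
median(effects1)
median(effects2)
```

The means agree because the mean of a difference is the difference of the marginal means, which is what the proposition exploits. The medians differ even though nothing observable in a randomized experiment distinguishes the two populations.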