Solutions exam 07

Exercise 1.
a) Null hypothesis: βpainrel = 0. Alternative hypothesis: βpainrel ≠ 0. A p-value is the
probability of observing the data in the sample, or something more extreme, if the
null hypothesis is true. The effect of treatment is significant, and the regression
coefficient means that patients using pain reliever 2 score on average 32 units higher
on the pain scale than patients using pain reliever 1.
b) A 4-year increase in age yields a 0.637*4 ≈ 2.5 unit increase in pain. Since the
confidence interval does not include 0, we reject the null hypothesis.
c) The constant is the mean pain score at hospital 1 (the baseline hospital). The coefficient
for hospital 2 means that pain scores are on average 22.8 units higher in hospital 2 than in
hospital 1. It does not seem necessary to have two dummies here, since the
difference between the regression coefficients of hospitals 2 and 3 is very small (22.8 vs.
24.4, but note that you don't get a p-value for this difference in the output). One
could consider collapsing hospitals 2 and 3 into one group.
d) The constant is now the pain level for a 0 year old patient in hospital 1, using pain
reliever 1. The prediction yields: 34.8+0.006*70+1.79*1+30.83*0=37.0. The
regression coefficients for age and hospital have changed because these variables
are correlated with other independent variables in the model. This is called
confounding, although it is difficult to say exactly which variables cause it:
age and hospital can be correlated with each other, but also with pain reliever.
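
The prediction in d) can be reproduced with a few lines of Python. This is only a sketch: the coefficients are the ones quoted in the solution, while the function and variable names are made up for illustration, and the values 1 (hospital) and 0 (pain reliever) are simply the ones plugged in above.

```python
# Coefficients as quoted in the solution to d); all names are hypothetical.
INTERCEPT = 34.8
B_AGE = 0.006
B_HOSPITAL = 1.79
B_PAINREL = 30.83


def predict_pain(age, hospital, painrel):
    """Evaluate the fitted linear predictor for one patient."""
    return INTERCEPT + B_AGE * age + B_HOSPITAL * hospital + B_PAINREL * painrel


# The patient from d): 70 years old, hospital term = 1, pain reliever term = 0.
prediction = predict_pain(age=70, hospital=1, painrel=0)
print(round(prediction, 1))
```

Setting all three arguments to 0 returns the constant, which is why it is read as the pain level of a 0 year old patient in the baseline groups.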
e) You can test whether the errors (or residuals) are autocorrelated, i.e. correlated
with one another. One of the basic assumptions in linear regression is that they are
not correlated. The number in the table is the value of a
Durbin-Watson test statistic (i.e. a certain boundary in the Durbin-Watson
distribution, just as 1.96 is a boundary in the standard normal distribution).
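
As a sketch of what the Durbin-Watson statistic measures, the standard formula (sum of squared successive differences of the residuals divided by the sum of squared residuals) can be coded directly. The residual series below are invented purely for illustration and are not the exam data.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals, NOT taken from the exam output.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # negatively autocorrelated
trending = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]     # positively autocorrelated

dw_alternating = durbin_watson(alternating)
dw_trending = durbin_watson(trending)
print(dw_alternating, dw_trending)
```

The statistic lies between 0 and 4. Values near 2 are what uncorrelated residuals would give, values well below 2 point to positive autocorrelation, and values well above 2 to negative autocorrelation.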
f) From the regression analysis of pain reliever, you know that pain reliever 1
appears to be more effective than pain reliever 2. From the additional information,
you see that pain reliever 1 is only used in hospital 1, not in the others. Hence,
hospital will seem important in the simple regression, but not in the multiple
regression. Also, pain reliever 1 is given to younger patients than pain reliever 2.
Again, it will appear that age has an effect on pain level in the simple regression,
but not in the multiple regression, where you adjust for type of pain reliever.
Exercise 2.
a) Null hypothesis: pnewtreat = poldtreat. Alternative hypothesis: pnewtreat ≠ poldtreat. In order to
calculate the test statistic, we need the common proportion (see p. 382 in textbook):
$$\hat{p} = \frac{80 \cdot 0.15 + 80 \cdot 0.05}{160} = 0.1 .$$
Test statistic:
$$z = \frac{0.15 - 0.05}{\sqrt{\frac{0.1\,(1 - 0.1)}{80} + \frac{0.1\,(1 - 0.1)}{80}}} = 2.11 .$$
The test statistic is approximately standard normal, and should be compared to the 2.5% and
97.5% percentiles of the standard normal distribution, +/-1.96. Since 2.11 > 1.96, the
conclusion is that the new medicine is significantly better than the old one. You can get
the same conclusion if you use a chi-square test instead.
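
The arithmetic in a) can be checked with a short Python sketch. It uses only the numbers given above (80 patients per group, proportions 0.15 and 0.05); the variable names are made up, and the chi-square statistic is computed without continuity correction, which is an assumption about the version of the test intended.

```python
import math

n1, n2 = 80, 80
p1, p2 = 0.15, 0.05

# Common (pooled) proportion under the null hypothesis.
p_pooled = (n1 * p1 + n2 * p2) / (n1 + n2)

# Two-sample z statistic for the difference in proportions.
se = math.sqrt(p_pooled * (1 - p_pooled) / n1 + p_pooled * (1 - p_pooled) / n2)
z = (p1 - p2) / se

# Pearson chi-square (no continuity correction) on the implied 2x2 table.
x1, x2 = round(n1 * p1), round(n2 * p2)  # 12 and 4 events
observed = [[x1, n1 - x1], [x2, n2 - x2]]
row_totals = [n1, n2]
col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
total = n1 + n2
chi2 = 0.0
for i in range(2):
    for j in range(2):
        expected = row_totals[i] * col_totals[j] / total
        chi2 += (observed[i][j] - expected) ** 2 / expected

print(round(p_pooled, 3), round(z, 2), round(chi2, 2))
```

For a 2x2 table the uncorrected chi-square statistic equals the square of the z statistic, which is why the two tests lead to the same conclusion.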
b) It is not very meaningful to give the confidence intervals for each group proportion
individually, since we are really testing for the difference in group proportions. Hence, a
better presentation would be to give the confidence interval for the difference. That way
one can easily get a picture of the uncertainty and magnitude of the difference between
the two groups. Another improvement is to state the actual p-value rather than just writing p<0.05.
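
A confidence interval for the difference could be computed as in the sketch below. It reuses the group sizes and proportions from a) and assumes the usual large-sample (Wald) interval with unpooled standard error; the solution itself does not say which interval method is intended, and the variable names are hypothetical.

```python
import math

n1, n2 = 80, 80
p1, p2 = 0.15, 0.05

diff = p1 - p2
# Unpooled standard error, as is customary for the interval (assumption).
se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

z_crit = 1.96  # 97.5% percentile of the standard normal distribution
lower = diff - z_crit * se_diff
upper = diff + z_crit * se_diff

print(round(lower, 3), round(upper, 3))
```

Under these assumptions the interval excludes 0, in line with the test in a), and it also shows how imprecisely the size of the difference is estimated.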
c) Chi-square test.
d) Null hypothesis: μnewtreat = μoldtreat. Alternative hypothesis: μnewtreat ≠ μoldtreat. Test statistic:
$$t = \frac{51500 - 51000}{\sqrt{\frac{100000}{80} + \frac{100000}{80}}} = 10 .$$
We have 158 d.f., and should reject H0 if the test statistic is smaller than -1.96 or larger
than +1.96 (the t-distribution is approx. normal for this many d.f.). Since 10 > 1.96, we
reject the null hypothesis. This is surprising, since we only observe a mean difference of
500 kroner, which is negligible when patients cost more than 50000 kroner to treat. The
result is due to the fact that we have a fairly large data set (80 in each group), and that
the pooled variance actually is pretty small compared to the magnitude of the costs. Of
course, one would still recommend the new treatment, since the difference in costs is so
small (even though the new treatment is significantly more expensive).
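
The test statistic in d) can be verified with a minimal Python sketch using the numbers from the solution (means 51500 and 51000, pooled variance 100000, 80 patients per group). The variable names are made up for illustration.

```python
import math

n_new, n_old = 80, 80
mean_new, mean_old = 51500, 51000
pooled_var = 100000

# Standard error of the difference in means under the pooled-variance t-test.
se = math.sqrt(pooled_var / n_new + pooled_var / n_old)
t = (mean_new - mean_old) / se
df = n_new + n_old - 2

print(se, t, df)
```

The standard error is only 50 kroner, so even a 500 kroner difference lies ten standard errors from zero. This is why a difference that is small in practical terms is still highly significant.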
e) One has to check that the variances are the same in both groups, since we use the
pooled variance in the test above. This is done with a test whose statistic follows the
F-distribution.
f) Mann-Whitney test (=Wilcoxon rank sum, NOT signed rank test).