Suggested solutions for the konte

Suggested solutions to the exam
Exercise 1: a) Significance level: The probability of rejecting a null hypothesis that is
true. If it is 5%, that means we accept at most a 5% probability of falsely rejecting a true
null hypothesis. If the p-value is above the significance level, we cannot reject the null
hypothesis, since data at least this extreme would occur more than 5% of the time if the
null hypothesis were true. If the p-value is below 5%, however, we reject the null hypothesis.
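The meaning of the significance level can be illustrated with a small simulation. The sketch below is not part of the exam. It assumes normally distributed data with a known standard deviation of 1 and uses a two-sided z-test rather than a t-test, and the function name and its parameters are made up for illustration. Every simulated study is drawn from a population where the null hypothesis is true, so every rejection is a false rejection.

```python
import random
from statistics import NormalDist


def false_rejection_rate(alpha=0.05, n=30, trials=4000, seed=1):
    """Share of simulated studies that reject H0 although H0 is true."""
    rng = random.Random(seed)
    # Two-sided critical value of the standard normal distribution.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # H0 is true by construction: the population mean is 0 (sd assumed 1).
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) / (1.0 / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    print(false_rejection_rate())
```

With a 5% significance level, the printed share should come out close to 0.05.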
b) The paired-samples t-test is appropriate, because it is based on the differences between
the two measurements on each individual. The differences may well be independent, since the
individuals are randomly sampled. Assumptions: The differences should be independent and
normally distributed.
c) H0: No difference between before and after measurements. Alternative: There is a
difference between before and after measurements.
$$t = \frac{\bar{d} - 0}{s_d/\sqrt{n}} = \frac{120.5 - 101.9}{12.6/\sqrt{30}} = 8.09$$
We have 30-1=29 d.f., and find from the table that we should reject H0 if the absolute value
of the test statistic is larger than 2.045 (two-sided test at the 5% level). Since 8.09 > 2.045,
we reject H0, and have shown that there is an effect of Drug
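The test statistic above can be checked with a few lines of Python. This is a minimal sketch using only the summary numbers quoted in the solution (means 120.5 and 101.9, standard deviation of the differences 12.6, n = 30). The function name is made up for illustration, and the critical value 2.045 is taken from the text rather than computed.

```python
import math


def paired_t_statistic(mean_1, mean_2, sd_diff, n):
    """t statistic for H0: mean difference = 0, from summary numbers."""
    mean_diff = mean_1 - mean_2
    standard_error = sd_diff / math.sqrt(n)
    return mean_diff / standard_error


# Summary numbers from the solution above.
t = paired_t_statistic(120.5, 101.9, 12.6, 30)

T_CRITICAL = 2.045  # table value for 29 d.f., as quoted in the text
reject_h0 = abs(t) > T_CRITICAL

print(f"t = {t:.2f}, reject H0: {reject_h0}")  # t = 8.09, reject H0: True
```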
Exercise 2: a) The line is the median of the blood pressure data (the observation for
which 50% of the other observations are smaller and 50% are larger). The lower edge of the
box is the lower quartile (25th percentile), and the upper edge is the upper quartile (75th
percentile). The blood pressure measurements look approximately normally distributed, since
the median is around the middle of the box.
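The three numbers that define the box can be computed directly. The sketch below uses Python's standard library on a made-up list of blood pressure values. The values are hypothetical and are not the exam data, and different software may use slightly different quartile definitions.

```python
import statistics

# Hypothetical blood pressure measurements (mmHg), for illustration only.
blood_pressure = [98, 104, 109, 112, 117, 121, 125, 130, 136, 143]

# quantiles(..., n=4) returns the three cut points that split the data in four.
lower_quartile, median, upper_quartile = statistics.quantiles(blood_pressure, n=4)

print("lower edge of box:", lower_quartile)
print("line in box:      ", median)
print("upper edge of box:", upper_quartile)

# In a roughly symmetric sample, the median lies near the middle of the box.
box_midpoint = (lower_quartile + upper_quartile) / 2
print("midpoint of box:  ", box_midpoint)
```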
b) The Pearson correlation is 0.763. The regression coefficient tells us that for every
additional year of age, blood pressure decreases by 1.23 mmHg (note: It is true that blood
pressure usually increases with age, but remember that in this study, all subjects had blood
pressure around 150 to begin with. The outcome is measured after a month’s use of the drugs,
and it is difficult to say in advance what the effect of age would be in this study). The
p-value tells us that this is significant (it is below 0.001). Difference between a
45-year-old and a 57-year-old: 194-1.23*45-(194-1.23*57)=14.8. Hence, the predicted blood
pressure for a 45-year-old is 14.8 mmHg higher than for a 57-year-old in this sample.
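The comparison of the two ages can be reproduced from the fitted line. This is a small sketch using the intercept 194 and slope -1.23 quoted above, with a function name chosen for illustration. Note that the intercept cancels, so the difference is just the slope times the age difference.

```python
def predicted_bp(age, intercept=194.0, slope=-1.23):
    """Predicted blood pressure (mmHg) from the simple regression on age."""
    return intercept + slope * age


difference = predicted_bp(45) - predicted_bp(57)

# The intercept cancels: the difference equals slope * (45 - 57).
print(f"{difference:.1f}")  # 14.8
```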
c) The blood pressure for females is 2.6mmHg lower than for males, but this is not
significant (p-value 0.513). By using the t-distribution with 60-2=58 d.f., we get the 95%
confidence interval:
-2.6 ± 2*3.9 = (-10.4, 5.2).
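The interval can be computed as estimate plus and minus the critical value times the standard error. The sketch below assumes that 3.9 is the standard error of the gender coefficient and uses the rounded critical value 2 for 58 d.f., as in the calculation above. The function name is made up for illustration.

```python
def confidence_interval(estimate, standard_error, critical_value=2.0):
    """Interval: estimate -/+ critical_value * standard_error."""
    margin = critical_value * standard_error
    return estimate - margin, estimate + margin


# Gender coefficient -2.6 with (assumed) standard error 3.9.
lower, upper = confidence_interval(-2.6, 3.9)
print(f"({lower:.1f}, {upper:.1f})")  # (-10.4, 5.2)

# The interval covers 0, in line with the non-significant p-value.
print(lower < 0 < upper)  # True
```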
d) Mean blood pressure for patients using Drug A: 111.8. Drug B: 111.8+23.0=134.8.
Hence, since all patients started out with blood pressure around 150, Drug A seems to
work better than Drug B.
e) In this analysis, the effect of Drug is controlled for Gender and Age. Hence, when
saying that patients using Drug B have 13.7mmHg higher blood pressure compared to
those using Drug A in this analysis, it is conditional on comparing two patients of the
same gender and age. Predicted blood pressure of a 60-year-old male using drug A:
159.8+13.7*0-1.67*0-0.75*60=114.8. For the regression coefficient for Drug to be
identical in the univariate and the multivariate analysis, Drug would have to be
independent of Gender and Age.
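The multivariate prediction can be written as a small function. This sketch uses the coefficients quoted above and assumes the coding implied by the worked example: Drug is 0 for Drug A and 1 for Drug B, and Gender is 0 for male and 1 for female. The function and argument names are made up for illustration.

```python
def predicted_bp_multivariate(drug_b, female, age):
    """Predicted blood pressure (mmHg) from the model with Drug, Gender and Age.

    Assumed coding: drug_b = 1 for Drug B (0 for Drug A),
    female = 1 for females (0 for males).
    """
    return 159.8 + 13.7 * drug_b - 1.67 * female - 0.75 * age


# 60-year-old male using Drug A.
male_60_drug_a = predicted_bp_multivariate(drug_b=0, female=0, age=60)
print(round(male_60_drug_a, 1))

# Same gender and age, but Drug B: the difference is the Drug coefficient.
male_60_drug_b = predicted_bp_multivariate(drug_b=1, female=0, age=60)
print(round(male_60_drug_b - male_60_drug_a, 1))
```

Changing only `drug_b` while holding `female` and `age` fixed is what "controlled for Gender and Age" means in practice.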
f) The tests are an independent-samples t-test and a Mann-Whitney/Wilcoxon test. For
the standard t-test, you assume data from two normally distributed, independent groups,
with the same variance in each group. For the Mann-Whitney test, you assume data from
two independent groups, with symmetric distributions for each group. For the t-test, you
should read the first line, since the variances in the two groups can be assumed equal (this
comes from the test of equal variances, where the null hypothesis is that the variances are
indeed identical. The p-value for this test is 0.807, hence we keep H0).
g) The histograms show that the first group is slightly skewed to the left, and the second
group is slightly skewed to the right. However, there are no serious deviations from the
normal distribution, so I use the t-test (it can also be argued that we should use the
Mann-Whitney test). The p-value for this test is 0.134, so there is no significant difference
in costs between the two drugs. Based on the total results from the study, it seems like
Drug A is the best to use.
h) Costs would be ordinal, since they are divided into ordered categories (there is no point
in putting e.g. the category 3500-4500kr before the category 2500-3500kr). The data could
then be analyzed by a chi-square test.
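As an illustration of the chi-square test on categorized costs, the sketch below computes the Pearson chi-square statistic and its degrees of freedom for a table of drug by cost category. The counts are hypothetical and are not taken from the study, and the function name is made up for illustration.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if drug and cost category were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows = Drug A, Drug B; columns = ordered cost categories.
counts = [
    [8, 14, 8],
    [6, 12, 12],
]
statistic, dof = chi_square_statistic(counts)
print(f"chi-square = {statistic:.2f} with {dof} d.f.")
```

The statistic is then compared with the chi-square table for the given degrees of freedom. Note that the ordinary chi-square test does not make use of the ordering of the categories.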