Suggested solutions to the exam

Exercise 1:

a) Significance level: the probability of rejecting a null hypothesis that is true. If it is 5%, the probability of falsely rejecting a true null hypothesis is at most 5%. If the p-value is above the significance level, we cannot reject the null hypothesis, since rejecting it would mean accepting a risk above 5% of falsely rejecting a true H0. If the p-value is below 5%, however, we reject the null hypothesis at the 5% significance level.

b) The paired-samples t-test is appropriate, because it is based on the differences between the two measurements on each individual. The differences can be assumed independent of each other, since the individuals are randomly sampled. Assumptions: the differences should be independent and normally distributed.

c) H0: there is no difference between the before and after measurements (mean difference 0). Alternative: there is a difference between the before and after measurements (mean difference not 0).

d) Test statistic:

$$t = \frac{\bar{d} - 0}{s_d/\sqrt{n}} = \frac{120.5 - 101.9}{12.6/\sqrt{30}} = 8.09$$

We have 30 - 1 = 29 d.f., and from the t-table we should reject H0 at the 5% level if the absolute value of the test statistic is larger than 2.045. Since 8.09 > 2.045, we reject H0 and conclude that there is a significant effect of Drug A.

Exercise 2:

a) The line in the box is the median of the blood pressure data (the observation for which 50% of the other observations are smaller and 50% are larger). The lower edge of the box is the 25% quartile (first quartile), and the upper edge is the 75% quartile (third quartile). The blood pressure measurements look roughly symmetric, since the median lies around the middle of the box, which is consistent with a normal distribution.

b) The Pearson correlation is 0.763 in absolute value (a correlation always has the same sign as the simple regression slope, so with the negative slope here r = -0.763). The regression coefficient tells us that for every year older, blood pressure decreases by 1.23 mmHg (note: it is true that blood pressure usually increases with age, but remember that in this study all subjects had blood pressure around 150 to begin with. The outcome is measured after a month's use of the drugs, and it is difficult to say in advance what the effect of age would be in this study). The p-value tells us that this is significant (it is below 0.001). Difference in predicted blood pressure between a 45-year-old and a 57-year-old: 194 - 1.23*45 - (194 - 1.23*57) = 1.23*12 = 14.8.
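The hand calculations in Exercise 1 d) and Exercise 2 b) can be checked with a minimal Python sketch. It uses only the summary numbers quoted in the text (the means 120.5 and 101.9, s_d = 12.6, n = 30, the table value 2.045, and the fitted line 194 - 1.23*age); the function names are illustrative and not part of the exam.

```python
import math


def paired_t_statistic(mean_diff: float, sd_diff: float, n: int) -> float:
    """Paired-samples t statistic: (d_bar - 0) / (s_d / sqrt(n))."""
    return (mean_diff - 0.0) / (sd_diff / math.sqrt(n))


# Exercise 1 d): mean difference 120.5 - 101.9, s_d = 12.6, n = 30 pairs
t_obs = paired_t_statistic(120.5 - 101.9, 12.6, 30)
t_crit = 2.045  # two-sided 5% critical value for 29 d.f., taken from the t-table
reject_h0 = abs(t_obs) > t_crit
print(f"t = {t_obs:.2f}, reject H0: {reject_h0}")  # t = 8.09, reject H0: True


def predicted_bp(age: float) -> float:
    """Fitted simple regression line from Exercise 2 b): 194 - 1.23 * age."""
    return 194 - 1.23 * age


# Difference in predicted blood pressure, 45-year-old minus 57-year-old
diff = predicted_bp(45) - predicted_bp(57)
print(f"45 vs 57 years: {diff:.1f} mmHg")  # 45 vs 57 years: 14.8 mmHg
```

The intercept cancels in the age difference, so the result is simply 1.23 times the 12-year age gap.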
That is, in this sample the predicted blood pressure for a 45-year-old is 14.8 mmHg higher than for a 57-year-old.

c) The blood pressure for females is 2.6 mmHg lower than for males, but this is not significant (p-value 0.513). Using the t-distribution with 60 - 2 = 58 d.f. (critical value approximately 2), we get the 95% confidence interval: -2.6 ± 2*3.9 = (-10.4, 5.2). The interval contains 0, in line with the non-significant p-value.

d) Mean blood pressure for patients using Drug A: 111.8. Drug B: 111.8 + 23.0 = 134.8. Since all patients started out with blood pressure around 150, Drug A gives the larger reduction and seems to work better than Drug B.

e) In this analysis, the effect of Drug is adjusted for Gender and Age. Hence, when we say that patients using Drug B have 13.7 mmHg higher blood pressure than those using Drug A, the comparison is between two patients of the same gender and age. Predicted blood pressure of a 60-year-old male using Drug A: 159.8 + 13.7*0 - 1.67*0 - 0.75*60 = 114.8. If the regression coefficient for Drug had been identical in the univariate and the multivariate analysis, this would suggest that Drug is uncorrelated with Gender and Age in the sample, so that adjusting for them changes nothing.

f) The tests are an independent-samples t-test and a Mann-Whitney (Wilcoxon rank-sum) test. For the standard t-test, you assume data from two independent, normally distributed groups with the same variance in each group. For the Mann-Whitney test, you assume data from two independent groups whose distributions have the same shape. For the t-test, you should read the first line of the output (equal variances assumed), since the variances in the two groups can be assumed equal: in the test for equal variances, the null hypothesis is that the variances are identical, and its p-value is 0.807, so we do not reject H0.

g) The histograms show that the first group is slightly skewed to the left, and the second group is slightly skewed to the right. However, there are no serious deviations from the normal distribution, so I use the t-test (one could also argue for using the Mann-Whitney test). The p-value for this test is 0.134, so there is no significant difference in costs between the two drugs.
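The arithmetic in Exercise 2 c), d) and e) can be reproduced with a second minimal sketch. It assumes that 3.9 is the standard error of the gender coefficient (as the interval formula implies), uses 2 as the approximate t critical value for 58 d.f. as the text does, and codes Drug B and female as 1, Drug A and male as 0, which is what the calculation in e) implies. The parameter names `drug_b` and `female` are hypothetical.

```python
def t_confidence_interval(estimate: float, std_error: float,
                          t_crit: float) -> tuple[float, float]:
    """Two-sided interval: estimate +/- t_crit * std_error."""
    half_width = t_crit * std_error
    return (estimate - half_width, estimate + half_width)


# c) Gender coefficient -2.6; 3.9 assumed to be its standard error;
#    t_crit ~ 2 for 58 d.f. (approximation used in the text)
lo, hi = t_confidence_interval(-2.6, 3.9, 2.0)
print(f"95% CI: ({lo:.1f}, {hi:.1f})")  # 95% CI: (-10.4, 5.2)

# d) Mean for Drug B = mean for Drug A + coefficient for Drug B
mean_a = 111.8
mean_b = mean_a + 23.0
print(f"Drug A: {mean_a:.1f}, Drug B: {mean_b:.1f}")  # Drug A: 111.8, Drug B: 134.8


def predicted_bp_multiple(drug_b: int, female: int, age: float) -> float:
    """Multiple regression from e).

    Assumed coding (hypothetical names): drug_b = 1 for Drug B, 0 for Drug A;
    female = 1 for females, 0 for males.
    """
    return 159.8 + 13.7 * drug_b - 1.67 * female - 0.75 * age


# e) 60-year-old male using Drug A
pred = predicted_bp_multiple(drug_b=0, female=0, age=60)
print(f"Predicted: {pred:.1f} mmHg")  # Predicted: 114.8 mmHg
```

Writing the model as a function makes the role of the 0/1 coding explicit: for a male on Drug A, both dummy terms vanish and only the intercept and the age term remain.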
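Based on the total results from the study, Drug A seems to be the better choice: it gives lower blood pressure, and the costs do not differ significantly between the two drugs.

h) Costs would be ordinal, since they are divided into ordered categories (the categories have a natural order; e.g. the category 2500-3500 kr comes before 3500-4500 kr, and it would make no sense to place them the other way around). The data could then be analyzed by a chi-square test, cross-tabulating drug against cost category.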
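A chi-square test on a drug-by-cost-category table can be sketched in a few lines. This is a minimal illustration of the Pearson chi-square statistic, not the analysis from the exam: the counts below are HYPOTHETICAL, invented only to show the mechanics, and the resulting statistic would be compared with a chi-square table value for the printed degrees of freedom.

```python
def chi_square_statistic(table: list[list[int]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / N
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# HYPOTHETICAL counts for illustration only (not data from the exam):
# rows = Drug A, Drug B; columns = three ordered cost categories
counts = [
    [10, 12, 8],
    [7, 13, 10],
]
stat, dof = chi_square_statistic(counts)
print(f"chi2 = {stat:.2f} on {dof} d.f.")  # chi2 = 0.79 on 2 d.f.
```

Note that this test treats the cost categories as unordered, so the ordering is not used. The chi-square approximation also requires reasonably large expected counts (a common rule of thumb is at least 5 in each cell).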