THE UNIVERSITY OF NEW SOUTH WALES
DEPARTMENT OF STATISTICS

Exercises for MATH3811, Statistical Inference

Contingency Tables

1. A sample of 46 students randomly selected from private high schools and a sample of 82 students randomly selected from public high schools were given standardized achievement tests with the following results:

                     Score 0-275   Score 276-350   Score 351-425   Score 426-500
   Private School          6             14              17               9
   Public School          30             32              17               3

   Test the null hypothesis that the distribution of test scores is the same for private and public high school students at α = 0.05 against a two-sided alternative.

2. The following (3 × 3) contingency table is of 500 psychiatric patients classified by their degree of depression and suicidal tendencies.

   Observed Frequencies
                            Not Depressed   Moderately Depressed   Severely Depressed
   Attempted Suicide              26                 39                    39
   Contemplated Suicide           20                 27                    27
   Neither                       195                 93                    34

   a) Assuming independence between the two factors, calculate the expected frequencies.
   b) Carry out a test for independence.

3. In an experiment to study the dependence of hypertension on smoking habits, the following data were taken on 180 individuals:

                      Non Smokers   Moderate Smokers   Heavy Smokers
   Hypertension            21              36                30
   No Hypertension         48              26                19

   Test the hypothesis that the presence or absence of hypertension is independent of smoking habit.

4. The following table was obtained as a result of Random Breath Testing during the Easter Holidays in NSW:

                    Tested and Charged   Tested and Not Charged
   Area: Sydney             99                   11669
   Area: Country           115                   15299

   Test whether there is any association between area of testing and outcome of random breath test during such a holiday period.

5. The following 2 × 3 × 4 contingency table presents data from the study of coronary heart disease. The first factor concerns the presence or absence of coronary heart disease, the second factor is serum cholesterol at three levels, and the third variable is systolic blood pressure with four categories.

   Coronary   Serum Cholesterol   Pressure   Pressure   Pressure   Pressure
   Disease    Level (mg/100cc)      <127     127-146    147-166      >166
   Present    < 200                    2          3          3          4
              200-259                 11         13          6          9
              ≥ 260                    7         12         11         11
   Absent     < 200                  117        121         47         22
              200-259                204        307        111         63
              ≥ 260                   67         99         46         33

   Construct appropriate two-way contingency tables and test the following three hypotheses:

   a) H01: No association between coronary disease and systolic blood pressure.
   b) H02: No association between serum cholesterol and systolic blood pressure.
   c) H03: No association between coronary disease and serum cholesterol.

6. Two batches of 15 experimental animals each, with one batch inoculated and the other not inoculated, were exposed to infection under comparable conditions. The results of the experiment are given in the following (2 × 2) table:

   Frequencies
                     Died   Survived
   Inoculated          5       10
   Not Inoculated      9        6

   On the hypothesis of independence, calculate the probability of a result at least as extreme as that actually observed by:

   a) The Chi-squared test without continuity correction.
   b) The Chi-squared test with Yates's continuity correction.
   c) Is Fisher's exact test suitable to be applied here? Give reasons for your answer.
   d) Do you accept the hypothesis of independence?

7. Fourteen newly hired business majors, 10 males and 4 females, all equally qualified, are being assigned by the bank president to their new jobs. Ten of the new jobs are tellers, and four are account representatives. The null hypothesis is that males and females have equal chances at getting the more desirable account representative job. The one-sided alternative of interest is that females are more likely than males to get the account representative jobs. Only one female is assigned a teller position. Using Fisher's exact test, can you reject the null hypothesis in favour of the one-sided alternative?

Answers

1. df = 3, Q = 17.3 > 7.815, reject H0.
2. df = 4, Q = 71.47 > 9.49, reject H0.
3. df = 2, Q = 14.464 > 5.99, reject H0.
4. df = 1, Q = 0.774 < 3.84, accept H0.
5. a) Q = 64.6641, df = 3, reject H0; b) Q = 19.48, df = 6, reject H0; c) Q = 31.3893, df = 2, reject H0.
6. a) Q = 2.1429, df = 1, p-value = .1432; b) Q = 1.2054, df = 1, p-value = .2723; c) Not suitable.
7. p-value = 0.041, hence the hypothesis is to be rejected at α = 0.05.
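
Computational check (a sketch, not part of the original sheet). The statistics quoted in Answers 1, 6 and 7 can be reproduced with a few lines of plain Python. The function names pearson_chi2, chi2_pvalue_df1 and fisher_upper_tail are illustrative names chosen here, not names from any library. The p-value helper relies on the identity P(χ² with 1 df > q) = erfc(sqrt(q/2)), so it covers only the df = 1 case of Exercise 6; for larger df the statistic is simply compared with the tabulated critical value, as in the Answers.

```python
from math import comb, erfc, sqrt


def pearson_chi2(table, yates=False):
    """Return (Q, df) for an r x c table of observed counts.

    Q is the sum over cells of (|O - E| - c)^2 / E with E = row total *
    column total / n, where c = 0.5 if yates is True (intended for 2 x 2
    tables only) and c = 0 otherwise.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    q = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            dev = abs(obs - exp)
            if yates:
                dev = max(dev - 0.5, 0.0)
            q += dev ** 2 / exp
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return q, df


def chi2_pvalue_df1(q):
    """Upper-tail probability of a chi-squared variable with 1 df."""
    return erfc(sqrt(q / 2.0))


def fisher_upper_tail(a, row1, col1, n):
    """One-sided Fisher p-value P(X >= a) for a 2 x 2 table.

    X is the count in cell (1, 1); row1 and col1 are the first row and
    first column totals, n is the grand total (hypergeometric margins).
    """
    hi = min(row1, col1)
    num = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, hi + 1))
    return num / comb(n, row1)


# Exercise 1: private vs public school scores (sheet quotes Q = 17.3, df = 3).
q1, df1 = pearson_chi2([[6, 14, 17, 9], [30, 32, 17, 3]])
print(round(q1, 2), df1)

# Exercise 6: inoculation table, without and with Yates's correction.
q6, _ = pearson_chi2([[5, 10], [9, 6]])
q6y, _ = pearson_chi2([[5, 10], [9, 6]], yates=True)
print(round(q6, 4), round(chi2_pvalue_df1(q6), 4))
print(round(q6y, 4), round(chi2_pvalue_df1(q6y), 4))

# Exercise 7: 3 of the 4 females are among the 4 account representatives.
p7 = fisher_upper_tail(a=3, row1=4, col1=4, n=14)
print(round(p7, 3))
```

The same pearson_chi2 call applies to the tables in Exercises 2 to 4; only the list of rows changes.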