SLE251 Research Methods and Data Analysis MOCK EXAM 2018

SLE251 Research Methods and Data Analysis
Trimester 1, 2018
Mock Exam

You have 90 minutes

Instructions:
• Calculators permitted.
• Answer all questions.
• Section 1 answers would be completed on the multiple choice answer sheet. All other answers would be written in your Answer Booklet(s).
• Total marks = 60.
• Section 1 (multiple choice) = 10 questions, 2 marks each (20 marks)
• Section 2 (4 multi-part short-answer questions), marks for each question as indicated in text (40 marks)

Section 1 (20 marks)

Multiple choice: choose the one letter which indicates the most correct answer for each of questions 1 to 10 and mark the appropriate letter for each question on your multiple choice answer sheet (2 marks each):

Q1. The Pearson correlation coefficient (r) can be interpreted as
(a) a measure of the unexplained variation in the response (Y) variable
(b) the significance of the test for zero slope
(c) the difference between group means
(d) the strength of the linear relationship between two variables

Q2. Which of the following is an assumption of chi-square tests of goodness-of-fit?
(a) straight line relationship
(b) equal variances
(c) data in each cell must be percentages
(d) average expected value across all cells is greater than 2

Q3. The assumption of independence in an analysis of variance:
(a) is only relevant for untransformed data
(b) means that one observation should not influence the value of any other observation
(c) only matters if the null hypothesis is true
(d) implies that the distribution of the dependent (response) variable must be normal

Q4. You get a significant result from the F-ratio test in a one factor ANOVA with four groups. What is considered the best way to accurately determine the differences between the group means?
(a) use a Tukey’s multiple comparison test
(b) do a correlation test of the relationship between group means and group variances
(c) look to see whether the boxplots overlapped for the four groups
(d) do a chi-square goodness-of-fit test

Q5. The slope of a regression equation measures the:
(a) value of Y when X equals zero
(b) unexplained variation in Y
(c) correlation between Y and X
(d) change in Y for a unit change in X

Questions 6 and 7 refer to the same example:

You are conducting a study to investigate the effects of a new drug used to treat high blood pressure during pregnancy. You have 50 women who are given the drug, and 50 women who are given a placebo. You also wish to know if there are any differences in the effect of the drug in relation to whether the women had previously given birth: 25 women in each drug treatment group had previously given birth, but for the other 25 women it was their first pregnancy. Each woman has her blood pressure measured 10 times during the study period.

Q6. What is the appropriate test to establish whether there are differences in blood pressure that are predicted by the drug treatment and pregnancy history?
(a) Chi-squared test of heterogeneity
(b) Single-factor Analysis of Variance
(c) Two-factor Analysis of Variance
(d) Linear regression

Q7. What is the appropriate independent unit of replication in the above study?
(a) The 1000 measures of blood pressure in the study
(b) The 100 women in the study
(c) The 2 drug treatment groups in the study
(d) The 2 pregnancy history groups in the study

Q8.
In a two-factor Analysis of Variance, a P-value of 0.497 associated with the interaction term indicates:
(a) There is no significant effect of either predictor factor on the response variable
(b) The effect of the first predictor on the response variable is the same regardless of the state of the second predictor
(c) The effect of the first predictor on the response variable is only significant when the effect of the second predictor is also significant
(d) The variances of the groupings are not different from each other

Q9. When is it appropriate to use a Fisher exact test?
(a) When investigating the relationship between two binary (i.e. two-state) categorical variables and you have a small sample size
(b) When investigating the relationship between two continuous variables and the scatterplot indicates curvilinearity
(c) When the continuous data in an analysis of variance are non-normally distributed
(d) When a chi-square test gives you a non-significant result

Q10. A researcher collates data from health records of 13,000 Australians, and observes a strongly significant (P<0.001) positive correlation between blood pressure and number of GP visits per year. He concludes that visiting a GP causes stress, elevating blood pressure. Why might this be a bad conclusion?
(a) Analysis of variance, not correlation, is the appropriate test that should be used
(b) The large sample size makes statistical analysis, with P-values, unnecessary
(c) Measures of blood pressure are likely to vary between individual GPs, making the results unreliable
(d) Correlation does not automatically imply causation; further evidence (experimental or indirect) is needed.

Section 2 (40 marks)

Answer all parts of questions 11 to 14 in your answer book. Marks per question are indicated.

Q11.
(10 marks)
A team of researchers wished to investigate whether being held in captivity affected stress levels in elephants. They compared three different groups of elephants: those found in the wild, those housed in confined enclosures in zoos, and those found in large free-ranging enclosures at safari parks. They took blood samples from drugged elephants and measured corticosterone levels (a hormone that is found in higher concentrations when the animal is stressed) in microlitres of corticosterone per litre of blood. The results are presented below:

R Output

Results using raw data

[Figure: corticosterone (y-axis) for each of the Conditions (Safaripark, Wild, Zoo), raw data]

               mean         sd  data:n
Safaripark 1758.279   336.3332      30
Wild       2889.206   602.5395      30
Zoo        3432.790  2531.3777      30

Levene's Test for Homogeneity of Variance (center = mean)
      Df F value    Pr(>F)
group  2  9.3022 0.0002182 ***
      87
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Results using log-transformed data

[Figure: logcorticosteron (y-axis) for each of the Conditions (Safaripark, Wild, Zoo), log-transformed data]

               mean         sd  data:n
Safaripark 3.237201 0.08486613      30
Wild       3.450229 0.10133602      30
Zoo        3.446849 0.13773478      30

Levene's Test for Homogeneity of Variance (center = mean)
      Df F value  Pr(>F)
group  2   2.938 0.05825 .
      87
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> summary(AnovaModel.2)
            Df Sum Sq Mean Sq F value   Pr(>F)
Conditions   2 0.8934  0.4467   36.77 2.66e-12 ***
Residuals   87 1.0568  0.0121
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple Comparisons of Means: Tukey Contrasts
Fit: aov(formula = logcorticosteron ~ Conditions, data = Dataset)
Linear Hypotheses:
                       Estimate Std. Error t value Pr(>|t|)
Wild - Safaripark == 0  0.21303    0.02846   7.486   <1e-06 ***
Zoo - Safaripark == 0   0.20965    0.02846   7.367   <1e-06 ***
Zoo - Wild == 0        -0.00338    0.02846  -0.119    0.992
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)

> cld(.Pairs) # compact letter display
Safaripark       Wild        Zoo
       "a"        "b"        "b"

(a) What null hypothesis was being tested by the ANOVA? (2 marks)
(b) The data were log-transformed. Why? (2 marks)
(c) What conclusions would you draw from the ANOVA F-test and Tukey’s test? (4 marks)
(d) Other than homogeneity of variances, what are the other 2 main assumptions about the data that must be met for this ANOVA test to be reliable? (2 marks)

Q12. (10 marks)
A study was done off Phillip Island to test whether the size of starfish varies with the colour of the animal (red, blue, albino) and its sex (male, female). Thirty starfish were randomly chosen within each combination of colour and sex and their size was measured. The results were (bar colour indicates starfish colour):

[Figure: bar graph of mean body size (y-axis) for Male and Female starfish, one bar per starfish colour]

ANOVA
Source of variation    df   F-ratio   P value
Colour                  2       5.7     0.015
Sex                     1       4.2     0.021
Colour x Sex            2       5.1     0.010
Residual              174

(a) What 3 null hypotheses are being tested in the ANOVA? (3 marks)
(b) What are the conclusions from the three statistical tests of the null hypotheses? (3 marks)
(c) How would you interpret this result, based on the bar graph of means? (4 marks)

Q13. (8 marks)
A biologist wished to model the relationship between a predictor variable, body weight (measured to nearest 100 g), and a response variable, number of eggs produced (measured to nearest 1000), for a species of fish called the cabezon (Scorpaenichthys marmoratus). He sampled 11 fish and recorded body weight and number of eggs spawned for each fish. He then did a linear regression analysis of number of eggs spawned against body weight. The results are presented below.

R output:

(a) What % of the variation in egg production was explained by body weight? (1 mark)
(b) Complete the regression equation by filling in the blanks and writing it out fully in your answer book. (2 marks)
    # eggs = ______ + ______ * body weight
(c) Using this regression equation calculate the predicted number of eggs that would be produced by a cabezon weighing 1000 g. (1 mark)
(d) What two null hypotheses are being tested with the output shown above? (2 marks)
(e) What biological conclusion would you draw from this analysis? (2 marks)

Q14. (12 marks)
A biologist was studying the amount of damage by koalas to juvenile leaves of eucalypts (gum trees). She collected samples of juvenile leaves from a number of different trees of five species (A, B, C, D, E) and recorded the numbers of leaves damaged and not damaged by koalas. The data were:

Species   No. leaves damaged   No. leaves not damaged
A                         10                       40
B                         38                        8
C                         15                       48
D                         15                       35
E                         68                       19

The χ² analysis output was as follows:

Pearson's Chi-squared test
data: .Table
X-squared = 90.4111, p-value < 2.2e-16

> .Test$expected # Expected Counts
         1        2        3        4        5
1 24.66216 22.68919 31.07432 24.66216 42.91216
2 25.33784 23.31081 31.92568 25.33784 44.08784

> round(.Test$residuals, 2) # Chi-square Components
      1     2     3     4     5
1 -2.95  3.21 -2.88 -1.95  3.83
2  2.91 -3.17  2.84  1.92 -3.78

(a) How many degrees of freedom are there for this test? (2 marks)
(b) What null hypothesis is being tested by the χ² test? (2 marks)
(c) What are the assumptions of the χ² test? Are they met in this analysis? (4 marks)
(d) What biological conclusions should the biologist draw from the χ² test and the standardised residuals? (4 marks)

END OF MOCK EXAM