2003 examination, question 6

(i) The two resampling methods she might use are randomisation and bootstrapping.

good -- I'd probably give two marks each, i.e. 4 of the 20 available for this part, just for getting the two names correct

(a) To randomise the influences on the cognitive test score, she would first create a table with column headings for: Subject Number; Age Group (Old/Young); Gender (Male/Female); Test Score.

good -- this is actually explaining the mechanics of what she would have to do

She would then run an analysis with a large number of samples, say 10,000.

good -- shows the need for large numbers; might say why (i.e. to get reliable distribution data for use later on)

For each sample she would take each subject and retain the Age Group and Gender, but would allocate a Test Score at random without replacement (i.e. the test scores would be shuffled amongst the subjects).

good -- might emphasise that this shuffling is at random

Over the 10,000 samples this would approximately reproduce the distribution of test scores that would occur under the complete null hypothesis, i.e. that test score is not related to age group or gender.

excellent -- explaining the logic; you might really make this clear by saying that, even if age and/or gender affected performance, shuffling the observations across these classifications would redistribute the age and/or gender effects randomly across the groupings, so removing the systematic effects of these IVs

For each sample she would calculate statistics for her hypotheses of interest:

- A main effect of Age Group, which could be calculated as (mean of Old group - mean of Young group) / (overall standard deviation)
- An interaction of Age Group x Gender, which could be calculated as [(mean of Old Males - mean of Young Males) - (mean of Old Females - mean of Young Females)] / (overall standard deviation)

(There are other statistics which could be calculated for each effect, but they would give equivalent p values.)

yes -- you could use these statistics or, more "simply", calculate the usual F statistics for the main effects and interactions; it doesn't much matter which statistics you decide to test, as long as they are affected by the hypotheses under test. (The reason the researcher used the resampling method is that she was worried the F statistics would not be distributed as an F distribution, so the p values would be biased!)

For each of these effects, she would arrange the randomised statistics in order of size and read off the values at the 2.5th and 97.5th percentiles (assuming that she is using a two-tailed hypothesis).

if you had used an F statistic then these would be one-tailed tests

If the results in her sample fall outside those limits for either or both hypotheses, she can reject the appropriate null hypothesis at an alpha level of .05. Alternatively, she could read off the percentile in the randomised statistics at which her observed value fell (doubling the figure for a two-tailed hypothesis) and quote this as the observed p level.

excellent
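The procedure in (a) can be sketched in a few lines of Python. Everything below -- the data, the group sizes, and the function names -- is invented for illustration and is not part of the model answer:

```python
import numpy as np

rng = np.random.default_rng(1)

def effect_stats(score, old, male):
    """Main effect of Age Group and the Age x Gender interaction,
    each divided by the overall standard deviation."""
    sd = score.std()
    age = (score[old].mean() - score[~old].mean()) / sd
    interaction = ((score[old & male].mean() - score[~old & male].mean())
                   - (score[old & ~male].mean() - score[~old & ~male].mean())) / sd
    return age, interaction

def randomisation_test(score, old, male, n_samples=10_000):
    """Shuffle the test scores across subjects (without replacement),
    keeping each subject's Age Group and Gender fixed, and return
    two-tailed p values for the two effects."""
    obs = effect_stats(score, old, male)
    null = np.empty((n_samples, 2))
    for i in range(n_samples):
        null[i] = effect_stats(rng.permutation(score), old, male)
    # proportion of randomised statistics at least as extreme as observed
    return tuple(np.mean(np.abs(null[:, j]) >= abs(obs[j])) for j in range(2))

# hypothetical data: 100 subjects in a 2 x 2 (Age x Gender) design
old = np.repeat([True, False], 50)
male = np.tile(np.repeat([True, False], 25), 2)
score = rng.normal(50, 10, 100) + 5 * old   # built-in age effect
p_age, p_interaction = randomisation_test(score, old, male)
print(p_age, p_interaction)
```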
(b) To randomise each of the correlations, she would select only the Old subjects, as those are the only ones covered by her hypothesis. She would create a table with the following headings: Subject Number; Test Score; Variable A (where Variable A is a variable of interest). Similarly to the previous analysis, she would run a large number of samples. For each sample she would take each subject and retain their Test Score, and shuffle the scores on Variable A. As before, this reproduces the distribution under the null hypothesis, where the Test Score is not related to Variable A. Also as before, she would find the 2.5th and 97.5th percentile readings for the correlation. If her observed correlation was outside these limits, she would reject the null hypothesis and conclude that there was a significant correlation at the 5% level.

excellent again

(c) To bootstrap the influences on the cognitive test score, she would sample from a pseudopopulation for each of the cells of her design separately (WE THINK), i.e. one each for Old Males, Young Males, Old Females, Young Females.

the answer below for bootstrapping is correct but involves the more complex type of bootstrapping which I didn't teach you (but which is covered in Howell's stats pages on his website). I would not expect you to know this. What I taught you about bootstrapping is a simpler alternative. Howell does have this in one of his examples but raises some issues with it. Basically it is just like randomisation, but you do the randomisation with replacement rather than without.

To do this she would start with the table she created for the randomisation ((a) above). Again, she would run a large number of samples (e.g. 10,000), but this time she would proceed as follows for each sample. For each cell, she would sample, with replacement, the same number of cases that were in the observations for that cell. For example, if the cases for Old Males were numbered 26-50, she would generate 25 random numbers between 26 and 50 inclusive, with no restrictions on how many times each number could occur. Each time, she would take the observations for the case with that number and include them in the randomised cell.

this aspect is correct here, but it is where the critical difference occurs with respect to the bootstrapping method I taught you -- bootstrapping here leaves the effect "in" (by randomising within IV categories), whereas my method removes it (by randomising across IV categories)

Having done this for each cell, she would calculate her statistics of interest, in the same way as for the randomisation. Once again, for each statistic she would arrange her 10,000 bootstrapped values in order of size and find the 2.5th and 97.5th percentiles. However, this time the figures give confidence limits on the true value of the statistic (the parameter).

the common method for bootstrapping (not what you were taught) does raise some issues for the statistic one chooses and calculates, in that it needs to have a null hypothesis value, so that one can tell whether the confidence intervals embrace that value. So you couldn't really use the F statistic, as the F values for the main effects and the interactions under the null hypothesis are not zero (and what they are is, to me, slightly unclear without parametric assumptions).
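A minimal sketch of this within-cell bootstrap, reusing the hypothetical 2 x 2 design from the earlier sketch; the statistic shown is the standardised age effect, and the data and names are again illustrative assumptions rather than the answer's own:

```python
import numpy as np

rng = np.random.default_rng(2)

def age_effect(score, old):
    """Standardised main effect of Age Group."""
    return (score[old].mean() - score[~old].mean()) / score.std()

def within_cell_bootstrap(score, old, male, n_samples=10_000):
    """Resample each Age x Gender cell with replacement (keeping the
    cell sizes fixed) and return the 2.5th and 97.5th percentiles of
    the bootstrapped age effect as confidence limits."""
    cells = [old & male, old & ~male, ~old & male, ~old & ~male]
    indices = [np.flatnonzero(cell) for cell in cells]
    stats = np.empty(n_samples)
    for i in range(n_samples):
        boot = np.concatenate([rng.choice(ix, size=ix.size, replace=True)
                               for ix in indices])
        stats[i] = age_effect(score[boot], old[boot])
    return np.percentile(stats, [2.5, 97.5])

# same hypothetical design as before
old = np.repeat([True, False], 50)
male = np.tile(np.repeat([True, False], 25), 2)
score = rng.normal(50, 10, 100) + 5 * old
lower, upper = within_cell_bootstrap(score, old, male)
print(lower, upper)   # the effect is significant if 0 lies outside these limits
```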
Opinions vary on whether the confidence limits should be given as calculated (e.g. Efron) or whether the range of the confidence limits should be reversed (e.g. Lunneberg). This argument arises because the bootstrapped figures are the distribution obtained by repeated sampling from the observed sample, which Lunneberg argues is displaced from the distribution which would be obtained from the true population.

To follow Lunneberg's method, one would proceed as follows:

- Subtract the bootstrapped 2.5% figure from the observed figure; call this a.
- Subtract the observed figure from the bootstrapped 97.5% figure; call this b.
- Report the confidence limits as being:
  o Lower limit: observed figure minus b
  o Upper limit: observed figure plus a.

The null hypothesis is rejected if the confidence limits do not include 0, the value that the statistic would have if the null hypothesis were true.

this is good, but too complex a topic for an MSc exam, which is why I didn't get into it and why it is not needed here

(d) To bootstrap each of the correlations, she would use the table she created in (b) above and again create a large number of samples. In each sample she would generate n random numbers from 1 to n (with replacement, i.e. no restrictions on how many times a given number can appear), where n is the number of subjects in the Old group. Her sample would contain the observations for the cases with these numbers. As in (c), she would calculate the correlation for each sample and arrange the correlations in order of size, to read off the confidence limits. Again, the null hypothesis is rejected if the confidence limits do not include zero, since this statistic also has a value of zero under the null hypothesis.

this part of bootstrapping is the same (single sample); I'd give this answer about 20 of the 20 marks available

(ii) For the influences on cognitive performance, she could use a method due to Conover and Iman.

good -- this is what I was after; it is always good to state at the start of your answer something which tells me you know what you are doing!

She would take all the observed scores and rank them from 1 to N (where N is the sample size) regardless of group. She would then carry out an ANOVA with the ranks as the DV (and of course Gender and Age Group as the IVs). She would report the p level from this analysis. Note, however, that any effect sizes reported from the ANOVA relate to the ranks, not to the scores.

good

For the correlation, she could calculate Spearman's rho in the normal manner. This is a nonparametric test (which is equivalent to calculating Pearson's r on the ranks of the observations).

excellent, although brief; this is good -- at least 8 out of 10. I suppose for 10/10 you could perhaps state that, although ranking creates a non-normal distribution of ranks, the work of people like Conover and Iman shows that many parametric methods (such as ANOVA) are robust enough to deal with the kind of uniform distributions produced by ranks.
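A brief sketch of both rank-based methods from (ii), assuming pandas, scipy, and statsmodels are available; the data are invented for illustration:

```python
import numpy as np
import pandas as pd
from scipy.stats import rankdata, spearmanr, pearsonr
import statsmodels.formula.api as smf
import statsmodels.api as sm

rng = np.random.default_rng(3)
df = pd.DataFrame({                           # hypothetical data
    "age": np.repeat(["Old", "Young"], 50),
    "gender": np.tile(np.repeat(["M", "F"], 25), 2),
    "score": rng.normal(50, 10, 100),
})

# Conover & Iman: rank all N scores regardless of group, then run the
# usual factorial ANOVA with the ranks as the DV
df["ranked"] = rankdata(df["score"])
model = smf.ols("ranked ~ age * gender", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))        # p values relate to the ranks

# Spearman's rho is equivalent to Pearson's r computed on the ranks
x, y = rng.normal(size=30), rng.normal(size=30)
print(spearmanr(x, y)[0], pearsonr(rankdata(x), rankdata(y))[0])
```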
good

The eigenvalues for components 1-3 exceed these critical values, so the null hypothesis can be rejected in relation to them. Those for 4-9 do not exceed the critical values, and hence the null hypothesis cannot be rejected for them. It is therefore concluded that (at an alpha level of .05, uncorrected) three components are supported by the randomisation.

maybe should just show me that you have read the right values (2.6, 2.0 and 1.3) exceed these critical values; again, state that the 95% figure for eigenvalue number 3 is 2.7 (and we are looking only for eigenvalues above this)

excellent -- that is pretty much all there is to say; thus 9/10

so the total answer is almost 100% -- an extremely strong distinction-level answer (shame I can't give extra credit for all the work in going beyond what I taught you about bootstrapping)
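Returning to parts (iii) and (iv), the whole procedure can be sketched as a small randomisation routine over the eigenvalues. Note that this sketch shuffles every test's column independently, which breaks the relationships between tests just as effectively as holding Test 1 fixed and shuffling the rest; the data and names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def randomised_eigenvalues(data, n_samples=1_000, percentile=95):
    """Shuffle each test's column of scores independently across
    subjects (reproducing the null hypothesis of part (iii), where
    each score is unrelated to the others) and return the
    95th-percentile eigenvalue for each component."""
    null_eigs = np.empty((n_samples, data.shape[1]))
    for i in range(n_samples):
        shuffled = np.column_stack([rng.permutation(col) for col in data.T])
        null_eigs[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(shuffled.T)))[::-1]
    return np.percentile(null_eigs, percentile, axis=0)

# hypothetical data: 200 subjects x 9 tests
data = rng.normal(size=(200, 9))
observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data.T)))[::-1]
critical = randomised_eigenvalues(data)
print(observed > critical)   # True marks components supported at alpha = .05 (uncorrected)
```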