2003 examination, question 6

(i)
The two resampling methods she might use are randomisation and bootstrapping.
good -- I’d probably give two marks each, i.e. 4 of the 20 available for this part, just for getting the two names correct
(a) To randomise the influences on the cognitive test score, she would first create a
table with column headings for:
Subject Number; Age Group (Old/Young); Gender (Male/Female); Test Score.
good -- this is actually explaining the mechanics of what she would have to do
She would then run an analysis with a large number of samples, say 10,000.
good -- shows the need for large numbers -- might say why (i.e. to get reliable distribution data for use later on)
For each sample she would take each subject and retain the Age Group and Gender,
but would allocate a Test Score at random without replacement (i.e. the test scores
would be shuffled amongst the subjects).
good -- might emphasize that this shuffling is at random
Over the 10,000 samples this would approximately reproduce the distribution of test
scores that would occur under the complete null hypothesis, i.e. that test score was not
related to age group or gender.
excellent -- explaining the logic; might really make this clear by saying that, even if age and/or gender affected performance, then shuffling the observations across these classifications would redistribute the age and/or gender effects randomly across the groupings, so removing the systematic effects of these IVs
For each sample she would calculate statistics for her hypotheses of interest:
- A main effect of Age Group, which could be calculated as (mean of Old group – mean of Young group)/(overall standard deviation)
- An interaction of Age Group × Gender, which could be calculated as [(mean of Old Males – mean of Young Males) – (mean of Old Females – mean of Young Females)]/(overall standard deviation)
(There are other statistics which could be calculated for each effect, but they would
give equivalent p values.)
yes -- you could use these statistics or more “simply” calculate the usual F statistics for the main effects and interactions; clarify that it doesn’t much matter which statistics you decide to test as long as they are affected by the hypotheses under test; (the reason the researcher used the resampling method is that she was worried that the F statistics would not be distributed as an F distribution and so the p values would be biased!)
For each of these effects, she would arrange the randomised statistics in order of size, and read off the values at the 2.5th and 97.5th percentile (assuming that she is using a two-tailed hypothesis).
if you had used an F statistic then these are one-tailed tests
If the results in her sample fall outside those limits for either or both hypotheses, she can reject the appropriate null hypothesis at an alpha level of .05. Alternatively, she could read off the percentile in the randomised statistics at which her observed value fell (doubling the figure for a two-tailed hypothesis) and quote this as the observed p level.
excellent
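[Editorial aside: the shuffling and percentile steps above can be made concrete with a minimal Python (NumPy) sketch. The array names score, age and gender are illustrative assumptions, not part of the exam data.]

```python
# Minimal sketch of the randomisation test described in (a), assuming
# `score` (float array), `age` ("Old"/"Young") and `gender` ("M"/"F")
# are NumPy arrays of equal length. All names are illustrative.
import numpy as np

rng = np.random.default_rng()

def age_effect(score, age):
    # (mean of Old group - mean of Young group) / overall SD
    return (score[age == "Old"].mean()
            - score[age == "Young"].mean()) / score.std()

def interaction(score, age, gender):
    # [(Old M - Young M) - (Old F - Young F)] / overall SD
    def age_diff(g):
        return (score[(age == "Old") & (gender == g)].mean()
                - score[(age == "Young") & (gender == g)].mean())
    return (age_diff("M") - age_diff("F")) / score.std()

def randomisation_test(score, age, gender, n_samples=10_000):
    null_age = np.empty(n_samples)
    null_int = np.empty(n_samples)
    for i in range(n_samples):
        shuffled = rng.permutation(score)      # shuffle WITHOUT replacement
        null_age[i] = age_effect(shuffled, age)
        null_int[i] = interaction(shuffled, age, gender)
    # two-tailed cut-offs under the complete null hypothesis
    for name, null, obs in [("Age", null_age, age_effect(score, age)),
                            ("Age x Gender", null_int,
                             interaction(score, age, gender))]:
        lo, hi = np.percentile(null, [2.5, 97.5])
        print(f"{name}: observed {obs:.3f}, null limits [{lo:.3f}, {hi:.3f}]")
```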
(b) To randomise each of the correlations, she would select only the Old subjects, as
those are the only ones covered by her hypothesis. She would create a table with the
following headings:
Subject Number; Test Score; Variable A
(where variable A is a variable of interest).
Similarly to the previous analysis she would run a large number of samples. For each
sample she would take each subject and retain their Test Score, and shuffle the scores
on Variable A. As before, this reproduces the distribution under the null hypothesis,
where the Test Score is not related to Variable A. Also as before, she would find the
2.5th and 97.5th percentile readings for the correlation. If her observed correlation was
outside these limits, she would reject the null hypothesis and conclude that there was
a significant correlation at the 5% level.
excellent again
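[Editorial aside: the single-variable shuffle in (b) reduces to a few lines; test_score and var_a are again illustrative names.]

```python
# Sketch of the correlation randomisation in (b), assuming `test_score`
# and `var_a` are NumPy arrays for the Old subjects only (illustrative).
import numpy as np

rng = np.random.default_rng()

def randomise_correlation(test_score, var_a, n_samples=10_000):
    observed = np.corrcoef(test_score, var_a)[0, 1]
    null = np.array([np.corrcoef(test_score,
                                 rng.permutation(var_a))[0, 1]
                     for _ in range(n_samples)])
    lo, hi = np.percentile(null, [2.5, 97.5])  # two-tailed 5% limits
    return observed, (lo, hi), not (lo <= observed <= hi)
```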
(c) To bootstrap the influences on the cognitive test score, she would sample from a
pseudopopulation for each of the cells of her design separately (WE THINK), i.e. one
each for Old Males, Young Males, Old Females, Young Females.
the answer below for bootstrapping is correct but involves the more complex type of
bootstrapping which I didn’t teach you (but which is covered in Howell’s stats pages
on his website). I would not expect you to know this. What I taught you about
bootstrapping is a simpler alternative. Howell does have this in one of his examples
but raises some issues with it. Basically it is just like randomisation but you do the
randomisation with replacement rather than without.
To do this she would start with the table she created for the randomisation ((a) above).
Again, she would run a large number of samples (e.g. 10,000) but this time she would
proceed as follows for each sample.
For each cell, she would sample, with replacement, the same number of cases that
were in the observations for that cell. For example, if the cases for Old Males were
numbered 26-50, she would generate 25 random numbers between 26 and 50
inclusive, with no restrictions on how many times each number could occur. Each
time, she would take the observations for the case with that number and include them
in the randomised cell.
this aspect is correct here but is where the critical difference occurs with respect to the bootstrapping method I taught you -- bootstrapping here leaves the effect “in” (by randomising within IV categories) whereas my method removes it (by randomising across IV categories)
Having done this for each cell, she would calculate her statistics of interest, in the
same way as for the randomisation. Once again, for each statistic she would arrange
her 10,000 bootstrapped values in order of size and find the 2.5th and 97.5th
percentiles.
However, this time the figures give confidence limits on the true value of the statistic
(the parameter).
the common method for bootstrapping (not what you were taught) does raise some
issues for the statistic one chooses and calculates, in that it needs to have a null
hypothesis value, so that one can tell if the confidence intervals embrace that value.
So you couldn’t really use the F statistic as the F values for the main effects and the
interactions under the null hypothesis are not zero (and what they are is, to me,
slightly unclear, without parametric assumptions).
Opinions vary on whether the confidence limits should be given as calculated (e.g. Efron) or whether the range of the confidence limits should be reversed (e.g. Lunneborg). This argument arises because the bootstrapped figures are the distribution obtained by repeated sampling from the observed sample, which Lunneborg argues is displaced from the distribution which would be obtained from the true population.
To follow Lunneborg’s method, one would proceed as follows:
- Subtract the bootstrapped 2.5% figure from the observed figure; call this a
- Subtract the observed figure from the bootstrapped 97.5% figure; call this b
- Report the confidence limits as being:
  - Lower limit: observed figure minus b
  - Upper limit: observed figure plus a.
The null hypothesis is rejected if the confidence limits do not include 0, the value that
the statistic would have if the null hypothesis were true.
this is good but too complex a topic for an MSc exam, which is why I didn’t get into it and why it is not needed here
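[Editorial aside: for completeness, a sketch of the within-cell bootstrap with both interval constructions, under the same illustrative array names as before and using the age main effect as the example statistic.]

```python
# Sketch of the within-cell bootstrap from (c). Both the percentile
# (Efron) and reversed (Lunneborg) limits are shown. Names illustrative.
import numpy as np

rng = np.random.default_rng()

def age_effect(score, age):
    return (score[age == "Old"].mean()
            - score[age == "Young"].mean()) / score.std()

def bootstrap_within_cells(score, age, gender, n_samples=10_000):
    cells = [(a, g) for a in ("Old", "Young") for g in ("M", "F")]
    boot = np.empty(n_samples)
    for i in range(n_samples):
        resampled = score.copy()
        for a, g in cells:
            idx = np.flatnonzero((age == a) & (gender == g))
            # draw WITH replacement within each cell, keeping its size,
            # so the age/gender effects are left "in"
            resampled[idx] = score[rng.choice(idx, size=idx.size)]
        boot[i] = age_effect(resampled, age)
    observed = age_effect(score, age)
    q_lo, q_hi = np.percentile(boot, [2.5, 97.5])
    a, b = observed - q_lo, q_hi - observed
    efron = (q_lo, q_hi)                      # limits as calculated
    lunneborg = (observed - b, observed + a)  # reversed limits
    return observed, efron, lunneborg   # reject H0 if 0 outside the CI
```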
(d) To bootstrap each of the correlations, she would use the table she created in (b)
above and again create a large number of samples.
In each sample she would generate n random numbers from 1 to n (with replacement,
i.e. no restrictions on how many times a given number can appear) where n is the
number of subjects in the Old group. Her sample would contain the observations for
the cases with these numbers. As in (c), she would calculate the correlation for each
sample and arrange the correlations in order of size, to read off the confidence limits.
Again, the null hypothesis is rejected if the confidence limits do not include zero,
since this statistic also has a value of zero under the null hypothesis.
this part of bootstrapping is the same (single sample)
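[Editorial aside: the single-sample case is just whole-case resampling, sketched below with the same illustrative names as in (b).]

```python
# Sketch of the bootstrapped correlation from (d): resample whole cases
# (pairs) with replacement. `test_score` and `var_a` as in (b).
import numpy as np

rng = np.random.default_rng()

def bootstrap_correlation(test_score, var_a, n_samples=10_000):
    n = test_score.size
    boot = np.empty(n_samples)
    for i in range(n_samples):
        idx = rng.integers(0, n, size=n)   # n draws WITH replacement
        boot[i] = np.corrcoef(test_score[idx], var_a[idx])[0, 1]
    lo, hi = np.percentile(boot, [2.5, 97.5])
    return lo, hi   # reject H0 at the 5% level if 0 lies outside [lo, hi]
```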
I’d give this answer about 20 of the 20 marks available
(ii)
For the influences on cognitive performance, she could use a method due to Conover
and Iman.
good this is what I was after -- it is always good to state at the start of your answer
something which tells me you know what you are doing!
She would take all the observed scores and rank them from 1 to N (where N is the
sample size) regardless of group. She would then carry out an ANOVA with the
ranks as the DV (and of course Gender and Age Group as the IVs). She would report
the p level from this analysis. Note, however, that any effect sizes reported from the
ANOVA relate to the ranks, not to the scores.
good
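[Editorial aside: a sketch of the rank-transform ANOVA using SciPy and statsmodels, on made-up illustrative data.]

```python
# Sketch of the Conover-Iman rank-transform approach from (ii): rank all
# scores regardless of group, then run the ordinary factorial ANOVA on
# the ranks. The data here are made up purely for illustration.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import rankdata
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": np.repeat(["Old", "Young"], 20),
    "gender": np.tile(np.repeat(["M", "F"], 10), 2),
    "score": rng.normal(size=40),
})
df["score_rank"] = rankdata(df["score"])   # ranks 1..N, ignoring group
model = ols("score_rank ~ C(age) * C(gender)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))     # report these p values; any
                                           # effect sizes relate to ranks
```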
For the correlation, she could calculate Spearman’s rho in the normal manner. This is
a nonparametric test (which is equivalent to calculating Pearson’s r on the ranks of the
observations).
excellent
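[Editorial aside: the stated equivalence is easy to verify directly.]

```python
# Quick check of the equivalence noted above: Spearman's rho equals
# Pearson's r computed on the ranks (illustrative data).
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = x + rng.normal(size=30)
rho, _ = spearmanr(x, y)
r_on_ranks, _ = pearsonr(rankdata(x), rankdata(y))
print(np.isclose(rho, r_on_ranks))         # True
```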
although brief this is good -- at least 8 out of 10
I suppose for 10/10 you could perhaps state that although ranking creates a non-normal distribution of ranks, the work of people like Conover and Iman shows that
many parametric methods (such as ANOVA) are robust enough to deal with the kind
of uniform distributions produced by ranks.
(iii)
She could have created a table showing:
Subject Number, Score on Test 1, Score on Test 2… Score on Test 9.
For each subject she could have retained their score on Test 1 but shuffled the scores
(i.e. randomised them without replacement) on the other tests between subjects. As
before, she would do a separate shuffle for each of the 10,000 randomisations. Also
as before, this replicates the situation under the null hypothesis where each score is
unrelated to the others.
good once again -- this is what I was after -- it is perfectly OK to refer back to previous parts of the question without repeating yourself. Maybe could have said “as described in part (i)”
at least 9/10 for this part
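[Editorial aside: one reading of the shuffle in (iii), sketched below, assumes a subjects-by-tests array with Test 1 in column 0; the layout is an illustrative assumption.]

```python
# Sketch of one randomisation for (iii), assuming `scores` is an
# (n_subjects, 9) NumPy array with Test 1 in column 0 (illustrative).
import numpy as np

rng = np.random.default_rng()

def one_randomisation(scores):
    shuffled = scores.copy()               # Test 1 (column 0) retained
    for j in range(1, scores.shape[1]):
        # independent shuffle, without replacement, of each other test
        shuffled[:, j] = rng.permutation(scores[:, j])
    return shuffled                        # repeat e.g. 10,000 times
```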
(iv)
In the percentiles table, the 95% column shows, for each component, the eigenvalue that would be exceeded only 5% of the time under the null hypothesis, i.e. if there were no real relationship and any component derived were due just to chance relationships between variables.
good
The eigenvalues for components 1-3
maybe should just show me that you have read the right values (2.6, 2.0 and 1.3)
exceed these critical values,
again state that the 95% for eigenvalue number 3 is 2.7 (and we are looking only for
eigenvalues above this)
so the null hypothesis can be rejected in relation to them.
Those for 4-9 do not exceed the critical values,
and hence the null hypothesis cannot be rejected for them.
It is therefore concluded that (at an alpha level of .05, uncorrected) three components
are supported by the randomisation.
excellent -- that is pretty much all there is to say thus 9/10
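[Editorial aside: the logic behind the percentiles table is essentially parallel analysis by randomisation; the sketch below uses an illustrative (n_subjects, n_tests) array name.]

```python
# Sketch of how the 95% column of the percentiles table in (iv) could be
# generated and used, assuming `data` is an (n_subjects, n_tests) array.
import numpy as np

rng = np.random.default_rng()

def eigenvalues(x):
    # eigenvalues of the correlation matrix, largest first
    return np.sort(np.linalg.eigvalsh(np.corrcoef(x, rowvar=False)))[::-1]

def parallel_analysis(data, n_samples=10_000):
    null_eigs = np.empty((n_samples, data.shape[1]))
    for i in range(n_samples):
        shuffled = np.column_stack([rng.permutation(col) for col in data.T])
        null_eigs[i] = eigenvalues(shuffled)
    crit = np.percentile(null_eigs, 95, axis=0)   # the 95% column
    return eigenvalues(data) > crit    # True where H0 can be rejected
```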
so the total answer is almost 100% -- an extremely strong distinction-level answer (shame I can’t give extra credit for all the work in going beyond what I taught you about bootstrapping)