Model Answers Summer exams 2003

SECTION A
Q1
Topic: Factor Analysis
(i)
[10% of marks] There are a number of differences. PCA derives components (composite “summary” variables formed from linear combinations of the measured variables) rather than factors (FA assumes latent unobserved variables which cause the variation in the observed variables). PCA analyses all variance (shared variance between variables, error variance and systematic variance unique to a particular variable) whereas FA analyses only covariance (ie just the shared variance). A consequence of the previous point is that if as many components are retained as there are variables, PCA exactly reproduces the data whereas FA merely approximates it. Mathematically, in PCA each variable contributes a unit of variance by placing a 1 on the leading diagonal of the variance-covariance matrix, whereas FA starts by estimating communalities and placing them on the leading diagonal (an estimate is made using the squared multiple correlation for each variable as predicted by all the other variables).
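The diagonal difference between the two techniques can be sketched in Python (the correlation matrix below is hypothetical, not taken from the exam data):

```python
import numpy as np

# Hypothetical correlation matrix for three measured variables
# (illustrative numbers only).
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

# PCA: analyse R as-is, with 1s (unit variance per variable) on the diagonal.
pca_eigenvalues = np.linalg.eigvalsh(R)[::-1]

# FA: replace the diagonal with estimated communalities -- the squared
# multiple correlation (SMC) of each variable predicted from all the others,
# obtainable from the diagonal of the inverse correlation matrix.
R_inv = np.linalg.inv(R)
smc = 1 - 1 / np.diag(R_inv)
R_fa = R.copy()
np.fill_diagonal(R_fa, smc)
fa_eigenvalues = np.linalg.eigvalsh(R_fa)[::-1]

print(pca_eigenvalues)  # sum equals 3 (total variance of 3 variables)
print(fa_eigenvalues)   # sum equals the total communality, less than 3
```

The eigenvalue sums illustrate the point above: PCA partitions all the variance, while FA analyses only the shared portion.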
(ii)
[30% of marks]

 Varimax rotation is a form of orthogonal rotation of the solution (ie all factors are uncorrelated) which is designed to maximize the variance of the factor loadings over the variables (hence the name). This simplifies the factors by having variables with either high or low loadings and avoiding variables with mid-loadings. This makes factor interpretation/labelling easier (hence the popularity of the method).

 A scree plot is a method for graphically determining the number of factors to be retained in the analysis. It is achieved by plotting the eigenvalues (which reflect the amount of total variance/covariance explained by the factor) for each factor in size order. The number of factors or components to be retained is determined by the elbow in the plot (where the size of the eigenvalues changes relatively little from one factor to another). The method is accurate to +/- a factor or so, and there is some debate about whether to include the factor at the elbow or not.
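The quantities plotted on a scree plot can be sketched as follows (simulated data, purely illustrative):

```python
import numpy as np

# Simulate 200 cases on five variables with some deliberately shared variance.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 5))
data[:, 1] += data[:, 0]
data[:, 3] += data[:, 2]
R = np.corrcoef(data, rowvar=False)

# Eigenvalues in descending size order -- the values plotted in a scree plot.
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]
proportion = eigenvalues / eigenvalues.sum()
for i, (ev, p) in enumerate(zip(eigenvalues, proportion), start=1):
    print(f"Component {i}: eigenvalue={ev:.2f}, variance explained={p:.1%}")
```

Plotting `eigenvalues` against component number and looking for the elbow gives the scree criterion described above.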

Exclusion of cases listwise means that if there are a set of variables to be
used in the PCA or FA then for a case’s data to be included that case must
contribute a datapoint for each variable. If the case has missing data for
any variable the case is deleted. The alternative would be to estimate the
correlation matrix (for FA or PCA) based on the maximum amount of data
for each variable concerned.
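The contrast between listwise deletion and the pairwise alternative can be sketched in pandas (the dataset is invented for illustration):

```python
import numpy as np
import pandas as pd

# Small hypothetical dataset with scattered missing values.
df = pd.DataFrame({"x": [1.0, 2.0, np.nan, 4.0, 5.0],
                   "y": [2.0, 1.0, 4.0, 3.0, np.nan],
                   "z": [1.0, 3.0, 2.0, np.nan, 5.0]})

# Listwise deletion: a case must contribute a datapoint for every variable,
# so any case with a missing value is dropped entirely.
listwise = df.dropna()
print(len(listwise))  # only the complete cases survive

# Pairwise alternative: pandas computes each correlation from the maximum
# amount of data available for that particular pair of variables.
pairwise_corr = df.corr()
print(pairwise_corr)
```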

 KMO measure of sampling adequacy (MSA) and Bartlett’s test of sphericity are means for establishing the factorisability (or factorability) of a correlation matrix -- in other words, for checking whether there are meaningful relationships between subsets of the variables which can cluster into factors/components. KMO has to be >0.6 to indicate factorisability. Bartlett’s test being significant means that the hypothesis that there are no factors can be rejected, but this test is overly sensitive.
The anti-image correlation matrix (AICM) is another means for
determining factorisability. To get the off-diagonal elements of the AICM
one calculates the partial correlation between variable X and Y partialling
out all the other variables (and then multiply this correlation by -1). Even
if X and Y are related then, if other variables covary with X and Y (ie can
form a factor with X and Y), the partial correlation between X and Y will
be small. So the off-diagonal elements of the AICM should be zero. The
KMO sampling adequacy measures for each variable are put on the
diagonal of the AICM and these values should be as close to 1 as possible.
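Both indices can be computed directly from a correlation matrix. A minimal sketch (the matrix and sample size are hypothetical, not the exam data):

```python
import numpy as np
from scipy import stats

# Hypothetical correlation matrix and sample size (illustrative only).
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
n, p = 211, R.shape[0]

# Bartlett's test of sphericity: chi-square test of H0 that R is an
# identity matrix (i.e. no factors present).
chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
df = p * (p - 1) / 2
p_value = stats.chi2.sf(chi2, df)

# KMO: summed squared correlations relative to summed squared correlations
# plus summed squared partial (anti-image) correlations.
R_inv = np.linalg.inv(R)
d = np.sqrt(np.outer(np.diag(R_inv), np.diag(R_inv)))
partial = -R_inv / d                    # off-diagonal partial correlations
off = ~np.eye(p, dtype=bool)
kmo = (R[off] ** 2).sum() / ((R[off] ** 2).sum() + (partial[off] ** 2).sum())
print(f"Bartlett chi2={chi2:.1f}, p={p_value:.4f}, KMO={kmo:.3f}")
```

The `partial` matrix here is exactly the off-diagonal content of the anti-image correlation matrix described above (partial correlations multiplied by -1).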
(iii) [30% of marks]
 The KMO MSA is above 0.6 which means that the matrix is factorisable, the same
conclusion is indicated by the less useful Bartlett’s test (which is significant and
thus rejects the null hypothesis that there are no factors). Notice that the MSAs for
the individual variables along the leading diagonal of the AICM are between
0.529 and 0.787, values which are reasonably close to 1. The off-diagonal values of the AICM are mostly close to zero, again indicating factorisability of the matrix, although there is a large negative value (-0.557) for EPQ-N and CogDis schizotypy.
These variables -- and only these variables among the 7 in the analysis -- are
expected to load on Eysenck’s neuroticism factor and so little of the shared
variance between EPQ-N and CD schizotypy will be accounted for by other
variables, thereby leaving the AICM partial correlation value large (and negative
because multiplied by -1). A similar explanation can account for the large positive
AICM correlation between Extraversion and IA schizotypy measure (E and IA are
negatively correlated and the only variables which load on Eysenck’s extraversion
factor). Having fewer than 3 variables for each factor will naturally produce these
kinds of findings.
 If we look at the eigenvalues for the 7 components that can be extracted (given 7
variables in the PCA) these give us an indication of the proportion of variance in
the data which is captured by the components. The first 3 components explain a
total of 79% of the variance in the variables (38.5%, 25%, and 15% respectively)
with small contributions from the next 4 components (6.5 to 3.5%). This is
reflected in the scree plot which has a sharp elbow at the 4th factor. This indicates
3 or 4 factors (there is debate in text books but logically retaining 3 factors is
indicated as if we retained the 4th component we should retain the 5th as it explains
the same amount of variance as the 4th). Notice that 3 components are predicted by
Eysenck’s theory. Note also that the 3 retained components have eigenvalues >1
(which is another criterion for retention, although these days this is not one which
is considered very useful).
 The rotated component matrix shows which variables load onto which
components. Component 1 is loaded on largely by 4 of the variables (EPQ-N, CD
schizotypy, IN schizotypy and RD schizotypy; absolute values of loadings >0.65
with other variables having [absolute] loadings below 0.12); component 2 is loaded on by Extraversion and IA schizotypy only (absolute loadings >0.87); and component 3 is loaded on by EPQ-P (0.936) and somewhat by IN schizotypy
(0.53). Thus the solution does not achieve “simple structure” because IN
schizotypy loads on more than 1 component. Nor does the solution fit closely with
Eysenck’s model (component 1 including more variables than expected and not
having a simple overall interpretation).
(iv) [20% of marks].
The second PCA adds two measures of anxiety which should load on the Neuroticism
factor. The obtained solution is a very close fit to Eysenck’s model. Looking at the
rotated component matrix, component 1 has four variables which load on it (loadings
>0.69). These are the two anxiety variables, EPQ-N and CD schizotypy just as
Eysenck would predict. Component 2 has 3 variables which load on it (EPQ-Psychoticism, IN and RD schizotypy), again exactly as Eysenck would predict (loadings >0.71). The 3rd component has 2 loading variables (Extraversion and IA schizotypy [negative], with absolute loadings >0.81). However, there are hints of
further factorial complexity (and lack of simple structure) as EPQ-Psychoticism has a
sizeable cross-loading (-0.45) on component 1. Perhaps other variables related to this,
or to the other variables related to the factor on which EPQ-P loads, would simplify
the factor structure further.
(v) [10% of marks]
 Good PCA and FA solutions are ones that are interpretable -- certainly this solution qualifies, as it has components that are easy to interpret and that were predicted by theory.
 The slight hint of a lack of simple structure (noted above) indicates that further
variables might have been profitably included (this can answer the “suggest
further analyses” part of this question element)
 On the numbers and indices calculated the solution should have a reasonable chance of behaving well. The requirements of sample size are met, noting that there is no single opinion on this matter (e.g. Comrey & Lee suggest a minimum of 300 cases for a good factor analysis; for the ratio of cases to variables, Nunnally suggests 10:1 and Guilford 2:1, while Barrett & Kline find 2:1 replicates structure and 3:1 is better). The current data look adequate -- 211 subjects and 9 variables.
 Could mention the ratio of variables to factors (e.g. Tabachnick & Fidell 5 or 6:1; Kline 3:1; Thurstone 3:1; Kim & Mueller 2:1). As hinted at above (3 factors from 9 variables) we are a little less satisfactory here.
 MSA is better here than in analysis 1 (0.76), as are the MSAs for the individual
variables.
 Solution still explains 75% of variance with 3 factors even though we’ve added 2
new variables.
 Some of the high AICM off-diagonal values in analysis 1 have been reduced by
addition of extra variables as one would expect (EPQ-N and CD schizotypy now
only -0.37 cf -0.56).
 In sum, this looks like a pretty good PCA
Q2
Topic: Multiple Regression
The answer must lead up to carrying out a hierarchical (aka sequential) multiple
regression. In this approach the 4 control variables (age, SES, road use experience, gender) are all simultaneously entered on block 1 of the analysis and the 4
independent variables of primary interest (anxiety, impulsivity, psychomotor speed,
distance estimation) are subsequently all simultaneously entered in block 2. This
approach is conservative wrt the variables of interest in that all the “credit” for DV
variance explained by the control variables is given to the control variables, and only
unique additional DV variance related to the variables of interest is ascribed to them.
To explain this more fully: It might be the case that one of the variables of interest [eg
distance estimation ability] causally affects the DV and one of the control variables
[eg road use experience]. That shared portion of the DV variance would be ascribed to the control variable rather than to the variable of interest.
 We should penalise anyone who talks about a stepwise or statistical MR. They have been strongly taught that this technique is unsafe (in terms of significance levels and replicability of findings).
 Anyone who mentions a single block forced-entry regression with all 8 predictors
can also get a reasonably good mark as long as they emphasize that they are
interested in the “coefficients table” for the 4 IVs of interest (which will give their
individual influence on the DV independent of the control variables, exactly as in
the hierarchical analysis). What this approach will not give the researcher is the
overall R-squared change for the 4 variables of interest as a set, so it will be less
informative.
To go through the analysis in more detail step-by-step, the answer must mention the importance of data screening steps:
 Note the sample size is adequate; neither too large nor too small (according to e.g. Green, 1991), although with 1000 participants we are likely to have power to detect fairly small effect sizes.
 Data is broadly adequate for MR – all ordinal/linear (scale) data or dummy
variables (a very good answer will probably explicitly mention how gender can be
coded -- basically with any two numbers, but typically 0 and 1 or 1 and 2).
 Will want to screen the data checking for normality of distribution, univariate and
bivariate outliers (frequency and scatterplots) and multivariate outliers
(Mahalanobis distance), illegal values.
 Collinearity between pairs of IVs can be checked by their bivariate correlations
(should be below 0.9) and multicollinearity within the set of IVs used (explain
what this is) should be assessed either by calculating tolerances (see below for
what these are and what are likely to be unsafe values) or by using collinearity
diagnostics. The Tolerance statistics are simply derived by treating each IV in the
model as a DV and performing a multiple regression using the remaining IVs.
Tolerance is (1-R2) from such a model. Multicollinearity occurs when an IV is
extremely well-predicted by a linear combination of the other IVs in the model,
thus Tolerance (TOL) should not be low (various figures are suggested by
different authorities -- not below 0.25 or 0.1). VIF (variance inflation factor) is
simply 1/TOL.

Multivariate normality can be assessed by scatterplots on selected pairs of
variables (checking for linearity, normality and homoscedasticity); variables with
very different skews may be useful to plot in this connection. Alternatively,
violations of multivariate normality can be revealed by examining plots of
residuals against predicted DVs
Next the answer needs to describe or reiterate the hierarchical nature of the analysis

The model summary table will show the overall findings for model 1 (i.e. the model entered on block 1 containing the control variables). This will give the R-squared (and thus the proportion of variance in the DV) explained by the control
variables. There will also be an F-test for this model which will indicate whether
the collective effect of the control variables explain a significant portion of DV
variance. (Given what the control variables are it seems likely, with such a large
sample that this overall model will be highly significant.) A good answer might
note that we should pay more attention to adjusted R2 statistics throughout as these
take account of the number of IVs included and also adjust for the fact that
unadjusted R2 overestimates the population value of R2.
 The next model shown in the table is model 2 (after inclusion of the 4 IVs of interest in block 2). The key statistics are the change statistics for block 2 relative
to block 1 (i.e. the improvement of model fit after adding in the new IVs on block
2). We are looking for the R-squared change and the associated F-change
statistics. If the F-change is significant this means that the 4 IVs of interest, as a
group, explain significant additional variance in the DV (risky road use behaviour
in the simulator). The R2 change shows what additional % of DV variance these
IVs explain over and above the control variables. With such a large sample the
change might be significant even with a small change in R2.
 The coefficients table will give the regression coefficients (B) for the constant
and each of the IVs in each of the 2 models. The table for Model 2 (with all 8 IVs)
is of interest and in particular the coefficients for the 4 IVs of interest. Each
coefficient is also reported with the std error for that coefficient and the t-test statistic which tests whether each coefficient is significantly different from zero.
Note that the t value is just the coefficient divided by its std error. A standardised
coefficient (beta) is also reported; this is simply the coefficient for the regression
equation had both the DV and the IVs been standardised (and allows a better comparison of the relative importance of the separate IVs). This table allows one
to determine which of the 4 IVs of interest explained a significant portion of DV
variance independent of the effects of the 4 control IVs and the other 3 IVs of
interest. These are the hypotheses that the researcher wanted to test. (A very good
answer might note that coefficients table is where the collinearity information is
printed out in SPSS and if any multicollinearity exists then some of the IVs need
to be dropped or recombined, and the analysis re-run.)
Q3
Topic: Logistic Regression
The easiest way to answer this question is to go through the printouts section by
section, describing the types of model fitted and leading to the recommendations
made as we go:
STUDY 1
The first output from Study 1 is a crosstab table which need not be discussed as
it is reproduced in the crosstabs given later on.
The model fitting information table includes information about the fit of two
models: one is a model with no effects just the intercept (intercept only model)
and the other (final model) is the model specified for this stage (for stage 1 the
model is as described in (i) above). The -2 * log likelihood (-2LL) values for
each model are given in the table with larger values indicating worse fitting
models. The difference between -2LL values for two models is a likelihood
ratio test statistic and this is distributed approximately as the chi-squared
distribution with degrees of freedom (df) equal to the difference in number of
parameters between the two models. This value is shown in the Chi-square
column (=65.488) and the statistic is highly significant for 5 df, indicating that there would be a statistically significant deterioration in fit in moving from the final model to the intercept-only model. This means that some or all of the parameters in the final model are useful in explaining variance in the outcome (i.e., the 2-category DV of chicken survival).
A good answer might also explain why there are 5 df. The final model contains terms for breed (3 levels), which requires 2 parameters (3-1); the food factor (2 levels) requires 1 parameter (2-1); and the breed*food interaction also requires 2 parameters. This explains the 5 df (=2+1+2).
A good answer will note that effects in logistic regression are really Effect*DV interactions, although with 2 levels of the DV this means that the number of parameters needed is the number of parameters for the factor (eg breed = 2) multiplied by (2-1) for the DV.
This shows that the model fitted for study 1 is a full factorial model (it has all
the possible factors and their interactions included). It is also a saturated model
as when only factorial IVs are included (ie no covariates), and the model is full,
there are no more degrees of freedom.
This is confirmed by the goodness-of-fit (GOF) table, which compares the
deterioration of fit between a saturated model (i.e., a model with 0 df, that
provides the best possible fit to the data) and the final model for that stage of the
analysis. The two statistics (Pearson and Deviance) calculate the goodness of fit
statistic in slightly different ways (Deviance is equivalent to a log likelihood
ratio test). The zero df for the GOF statistics indicates that the model tested is a
saturated one (as it contains the same number of parameters as the saturated
model with which it is compared).
The likelihood ratio tests table provides information on the deterioration in the
fit from the model fitted in this stage to reduced models in which particular
terms are removed from the model. It is possible to remove the breed*food
interaction but it is not possible independently to remove the intercept or main effects of breed and food (as these are nested under the breed*food interaction),
and so a zero df likelihood ratio test is reported for breed and food. The
breed*food row shows the -2LL value after removing the interaction effect from
the model. Note that the -2LL value is higher indicating a worse fitting model.
The difference in -2LL values is shown in the chi-square column of the table
(=256.124) and this is the likelihood ratio test statistic for the deterioration in
model fit and it has 2 df which is the number of parameters associated with the
breed*food interaction effect (n.b. really breed*food*survived, hence 2*1*1
parameters). This statistic is significant and so we would have a significantly
worse-fitting model if we were to remove the interaction term from the model.
The significant interaction term means that chicken survival varies significantly
as a function of the combination of breed and food.
Model fitting stops here as we can’t remove the highest level term from the
model without a significant deterioration.
The crosstabs and associated chi2 stats allow us to interpret the interaction. For
Breed 1 there was no significant effect of food on survival although there was a
10% trend for more to survive with food 1 than food 2. For breeds 2 and 3
significantly more chicks survived with food 2 than food 1 (and the effect was
numerically much more marked for breed 3). So the recommendation to the
farmer would be that if he wants to sell all breeds 2 and 3 (maybe these are very
popular with customers) then he should adopt food 2 with these birds (assuming
there was no huge price differential on the foods). If he wanted also to sell breed
1, then there was no strong evidence that it would do worse with food 2 than
food 1, but he should expect significantly poorer survival for this breed (relative
to the other 2 breeds) with food 2. If it wasn’t impractical he might be better
advised to use food 1 with breed 1.
STUDY 2

The first step of this analysis is again to fit a full factorial saturated model, but
this time there was absolutely no hint of a breed*food interaction (chi2=0.26,
df=2, p>0.8). Therefore, a reduced model could be fitted on the second step of
the analysis. This reduced model is called a main effects model because it
contains terms just for the main effects of breed and food (but not their
interaction).

In step 2 the likelihood ratio test table shows the consequences of removing each
main effect from the model. In each case (ie for breed and food) there was a
significant reduction in the fit of the model. This tells the researcher that there
are systematic differences in survival rate as a function of breed (irrespective of
food type given) and systematic differences in terms of food (irrespective of
breed). Further simplification of the model is therefore not warranted.

A really good answer might note that the Parameter estimates table (PET)
conveys largely the same information as the likelihood ratio tests table. The
difference is that the parameter estimates table provides a test of the effect
within a model containing the other terms, while the likelihood ratio tests
provide information on the comparison of two models -- one with the effect and
one without. The parameter estimates are more informative in that the effects are
broken down into each single df, whereas they are aggregated together in the
likelihood ratio tests.

In the PET, there is further confirmation that both breed and food are predictive
of survival. The rows marked “breed 4” compare breed 4 with breed 6, whereas the row marked “breed 5” compares breed 5 with breed 6 (breed 6 is noted as
being redundant when compared with itself). The effects are log odds ratios
(given by the B parameter) or the odds ratio (given by Exp(B)). The odds ratio
(OR) is the change in odds of not surviving (outcome ‘survived=0’, ie died,
shown in the table) relative to the other outcome (survived=1, not shown in the
table). The log odds ratio is significantly below zero (as indicated by the Wald
test: B^2/(std error)^2, which is distributed as chi-square with 1 df) for breeds 4 and
5 relative to breed 6. This means that these breeds are likely to die less often.
This means that the ORs are significantly below 1, about 0.5 for either breed 4
or 5, relative to breed 6. (This is confirmed by the fact that the 95% confidence
intervals for the breed ORs fall below 1.)
The PET also shows that odds of dying relative to surviving for food 1 are over
2 times higher than the same odds for food 2 (OR=2.29; significantly above 1).
The conclusions from the PET are supported by the cross-tabs which also show:
(a) that breeds 4 and 5 are more likely to survive than breed 6; and (b) that food
2 tends to produce higher survival than food 1. So if food 2 was not much more
expensive the researcher would recommend this food for breeds 4 to 6, but
would avoid breed 6 unless there was a strong demand for this particular breed.
Section B
Question 4
Topic: ANCOVA
(i)
[50% of marks for question]

The key point is that researcher A has used an experimental design in which
participants were randomised into the different groups, whereas Researcher B
has used a quasi-experimental design with naturally-forming groups
(schizophrenics and controls). In both studies the researcher wants to compare
specific cognitive variables between the groups, but a measure of IQ showed
that there were significant IQ differences between the two groups in each study.
Both researchers wanted to remove the possible influence of group IQ
differences on the specific cognitive performance DV. The problem is that
ANCOVA, despite its widespread use as such, is not generally a suitable tool for
controlling unwanted differences on a nuisance variable between groups.
ANCOVA is most safely used when there is no difference in the mean value of
the covariate between the groups (students might draw an overlapping circles
variance diagram to illustrate this scenario).
In the case of researcher B it is generally agreed that it is **unsound** to use
ANCOVA for this control function. The reason is that the IQ difference is likely
to be part of the intrinsic difference between chronic schizophrenic patients and
age-matched controls. By using ANCOVA to equate group IQ differences one
ends up evaluating the association between cognitive function and the “group
residual” variable (i.e. the group variable after removal of that variance which
overlaps with IQ differences). Students may draw another overlapping circles
variance diagram to illustrate this situation and the “group-res” variable. In addition, using ANCOVA in this situation means that one is making the comparison between brighter-than-average chronic schizophrenic patients and a
subset of the age-matched controls who were less bright than the sample
average. This is probably not the hypothesis which the researcher wanted to test.
Examples to illustrate the problem:
from Lord --- Research question was “do boys end up with a higher final weight
(DV) after following a specific diet than girls (gender=IV) even when including
initial weight as a covariate?” Part of the intrinsic gender difference is in weight
and so using ANCOVA here would end up comparing the weight gain for
relatively light boys with the weight gain for relatively heavy girls. This is not
the hypothesis we want to test, and there are issues about regression to the mean as well (if we sampled light boys at the start of an experiment then they would be likely to have gained more weight than heavy boys -- or indeed heavy girls -- at a later testing point by regression to the mean).
from Miller and Chapman -- Imagine using ANCOVA to answer the question “would six and eight year old boys differ in weight if they did not differ in height?” Once again ANCOVA would create a comparison of short 8 year olds with tall 6 year olds. Do we want to ask that question?
However, in the case of researcher A it is probably OK to use ANCOVA in this
situation. The reason is that the IQ differences between our randomly assigned
experimental groups are almost certainly just due to chance (from random sampling), and so removing the group differences on the covariate is unlikely to
systematically distort the IV by removing part of the IV’s intrinsic variance.
However, this should still not be the primary purpose of the ANCOVA but
would be an additional consequence of doing the ANCOVA. See part (ii)
(ii) [20% of marks] The other -- and primary -- use of ANCOVA is to remove the
effects of noise variables in experimental designs. As these are cognitive
experiments one might anticipate that performance on the tests would be
associated with IQ score and so one would want to remove the influence of IQ
performance. The intention of this approach is to increase the power of the
statistical tests for the effects of the experimental variables. (There may have
already been a diagram to illustrate the increase in power.) However, it is only
safe to do this when the groups either do not differ on the covariate or when one
can be confident that the covariate differences are due to chance. This means
that this use of ANCOVA could safely be advised to researcher A but NOT to
researcher B. In the case of researcher A applying ANCOVA to increase
experimental power in this way would have the additional benefit of removing
the effect of the between-groups IQ difference as well, which arose through
chance.
(iii) [20% of marks] Half of the marks for the explanation and half for the diagrams,
properly labelled. The assumption which the question is alluding to is the
homogeneity of regression (HOR) slopes assumption. To apply ANCOVA one
removes the effect of the covariate from the DV using a single regression
equation across all subjects in the study. To do this meaningfully the
relationship between the covariate and the DV has to be the same (within
statistical limits) in each group of the study; in other words the regression
(slope) must be homogeneous across groups. A really good answer (>70%) will
say that when one tests this assumption one includes a group*covariate
interaction term in the ANOVA model, and if this interaction is significant then
the HOR assumption has been violated. Diagrams should illustrate the parallel
linear regressions of covariate on DV for each group separately (Homogeneity),
and also illustrate the case where the regression lines are clearly not parallel
(homogeneity violated).
(iv) [10% of marks for question]. The other use of ANCOVA is in so-called Roy-Bargmann step-down analyses carried out after finding a significant effect in a MANOVA. (If they just say this, give 7 out of 10 for this part of the question -- for full marks they need a bit of elaboration.) The MANOVA might test for group
differences on a set of DVs. To do the step-down analysis, one has to have an a
priori priority ordering of DVs (based on theory or other considerations). One
begins with the highest priority DV and tests this in a simple ANOVA (adjusting
for the total number of comparisons). Then the next highest priority DV is tested via an ANCOVA with the higher priority DV acting as a covariate.
The procedure repeats down the priority order with all the higher-priority DVs acting as covariates at each step. The intention is to try to understand the relative
contribution of the DVs to the MANOVA effect -- one is seeing whether there is
an effect for a particular DV even after removing the influence of higher priority
DVs (rather like hierarchical or sequential multiple regression).
Question 5
Topic: Contrasts
This question has a lot of text to read and consequently the actual written answers are
not expected to be, nor do they need to be, very lengthy in order to get very good
marks. (Model Answer not written as only 1 candidate attempted this question.)
Question 6
Topic: MANOVA and Repeated measures (M)ANOVAs
(i)
[40 % of marks for question] The essence of the answer to this question is that
the researcher called John has to decide whether to conduct a single MANOVA
comparing the two groups on a composite DV formed from the two state anxiety
measures A and B, or to carry out two separate ANOVAs, one for each measure
separately. Advantages/disadvantages of these choices:

MANOVA's main advantage (c.f. 2xANOVA) is that it reduces the need to
correct for multiple comparisons. This means that the result can be tested
at 0.05 significance rather than 0.025 (Bonferroni corrected) for each of
the 2 ANOVAs. Obviously, this advantage is greater the more DVs are
being used. This, of course, is not an advantage over simply averaging the
two DVs and carrying out a single ANOVA. However, relative to the
simple averaging method MANOVA does have the advantage in that it
creates a weighted (linear) combination of the DVs (2 in this case) which
maximally separates the groups.

There are quite subtle and complex power issues in the choice between
carrying out one MANOVA and two ANOVAs. Under certain rare
situations MANOVA can show differences that would not show up with
ANOVA (illustrative diag -- Fig 9.1 in Tab and Fidell). More generally the
power of MANOVA depends on the relationship between the DVs being
combined (and the number of them), and in many situations ANOVA (of
the DVs or a single averaged DV) will be considerably more powerful.

MANOVA is a more complex technique which also carries a number of
additional assumptions relative to ANOVA (e.g., homogeneity of
variance-covariance matrices across groups rather than just homogeneity
of variances in the ANOVA case).
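The multiple-comparison point in the first bullet can be made concrete with a short sketch (plain Python; the function names and the values of m beyond the question's m = 2 are illustrative only): with m separate ANOVAs each tested at alpha, the familywise error rate under the null is 1 - (1 - alpha)^m, and the Bonferroni-corrected per-test level is alpha/m.

```python
# Familywise error rate and Bonferroni correction for m separate tests.
# Illustrative sketch; only m = 2 comes from the question itself.

def familywise_alpha(alpha, m):
    """P(at least one false positive) across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

def bonferroni_alpha(alpha, m):
    """Per-test significance level keeping the familywise rate at about alpha."""
    return alpha / m

for m in (2, 5, 10):
    print(m, round(familywise_alpha(0.05, m), 4), bonferroni_alpha(0.05, m))
```

For m = 2 this reproduces the 0.025 per-test level quoted above; the advantage of a single MANOVA grows with m.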
(ii)
[20% of marks] The researcher called Janet does not want to use a MANOVA as
she is interested in the differences in the effect of the group variable (the
training programme vs. control) on the two different measures, which tap
different aspects of anxiety. (At this point, as with part i, we are still not
concerned about the 3 timepoints of anxiety measurement and are working with
the average measure over the 3 timepoints, for index A and index B. Do not
penalise answers that unnecessarily add the time factor in the answers to parts i
and ii.) So Janet just needs to carry out a 2-by-2 mixed-design or split-plot
ANOVA with group as the between subjects factor and anxiety index (A or B)
as the repeated-measures factor. The most critical result from this analysis
would be the interaction between group and index type. If this were significant
Janet could conclude that the training programme (relative to the control
treatment) had a significantly different effect on the two different types of
anxiety. A really complete answer here would note that Janet might then
conduct simple main effects contrast analyses for the group factor on each
anxiety index separately making an appropriate correction for multiple
comparisons.
(iii) [20% of marks]. John could extend the ANOVAs (for each anxiety DV
separately, or for the single ANOVA on the averaged measure) by adding time
as a repeated-measures factor with 3 levels. As the repeated measures factor has
>2 levels, he would need to choose between analysing the repeated measures
factor using ANOVA or MANOVA methods (and use a test of sphericity to help
him to choose between the approaches). In either the MANOVA or ANOVA
case he would need to consider the group by time interaction from such analyses
to see if the group differences in anxiety varied significantly as a function of
time before the exam. Appropriately corrected follow-up contrasts might also be
made (at each timepoint separately). John could similarly extend his MANOVA
analysis (on the single composite anxiety DV) by adding a repeated-measures
time factor. If he added this factor using a multivariate repeated measures
analysis the analysis would become a so-called “doubly-multivariate” analysis.
(A really good answer might note that it is possible to have the composite DV to
be created by MANOVA processes and analyse the repeated-measures effect on
such a composite DV using repeated-measures ANOVA -- this would obviously
not then be doubly multivariate.) In both these multivariate cases the group-by-time interaction terms (and subsequent contrasts) would be critical to answering
John’s research question.
(iv) [20% of marks] Janet could extend her split-plot analysis by adding a further
repeated-measures factor (time, with 3 levels, and so she would again need to
choose whether to do this using repeated-measures MANOVA or ANOVA).
The result would be a 2 (group) x 2 (index type) x 3 (time) analysis with
repeated-measures on the last 2 factors. Give some credit for the above but it will
be hard to pass this part of the question unless the answer also shows a clear
recognition that Janet is proposing a simple linear trend over time on the effect
of the group variable. (The pattern of results she is proposing might be
illustrated by a graph: with the treatment group having lower anxiety than the
control group, but by a decreasing amount as the exam draws near.) So, she
should be looking in particular at the outcome of a trend analysis and be
particularly concerned with the interaction between the linear trend over time
and group. If these effects were more marked for one type of anxiety index
rather than the other, then she might find a significant (linear time trend X group
X anxiety measure type) 3-way interaction.
Section C
Question 7
Resampling and Nonparametric methods
(not written as only 1 candidate attempted question)
Question 8
Classical Test Theory
(i) [30% marks]
For a variable xobs the following expression is the basis of classical test theory (CTT):
xobs = xtrue + x
xobs is the observed (i.e. measured) value of a variable x; xtrue is the true value for the
variable; and x is the error term associated with the measurement of xobs. The error
term is random which means that it has zero mean (i.e. not a systematic bias) and is
also uncorrelated with xtrue. Thirdly, the error term is assumed to be drawn from a
normal distribution. We can represent the true variation in x and the error term with
independent normal variables, denoted G(mu, sigma) where mu is the mean of the
normal distribution and sigma is the s.d.
(ii)
[20% marks] We define 3 variances:
σ2obs is the variance associated with the observed score of x; σ2true is the variance
associated with the true score of x; and σ2error is the error variance. From the basics of
CTT we know that σ2obs = σ2true + σ2error.
Reliability is defined as the proportion of the observed variance in a measure which
reflects true (non-error) variance of the entity being measured.
Thus reliability = σ2true / σ2obs = σ2true / (σ2true + σ2error)
Let us assume xobs is the value measured at one time-point and yobs is the value of the
same variable measured at another time point. The correlation between xobs and yobs,
rxy, is defined as
rxy = Covar(xobs, yobs) / sqrt(Var(xobs)*Var(yobs))
where Covar(a, b) is the covariance between a and b.
From the information above, the expected values of sample variance of x and y can be
written as:
Exp{Var(xobs)} = p2 + errx2
Exp{Var(yobs)} = p2 + erry2
Given that the error terms for xobs and yobs are uncorrelated with each other and with
the true score, the covariance (shared variance) between xobs and yobs is p2. It follows
from the definition of the correlation between measures,
and the expected variance results, that we can obtain the following result for the
expected value of the correlation:
Exp{rxy} = p2 / sqrt((p2 + errx2)*( p2 + erry2))
If we assume that the measures at each of the two time-points have equal reliability
then errx2 = erry2 = err2. From this is then follows that
Exp{rxy} = p2 / (p2 + err2)
i.e. the test-retest correlation will approximate the reliability of the measure.
(iii) [20% marks]
Now xobs and yobs are two different measures of the same construct
The average score, Ave = (xobs + yobs)/2 = p + 0.5*error1 + 0.5*error2
Assume again that the reliability of each measure is the same,
i.e. error1=error2=err
The variance of 0.5*error1 (and likewise of 0.5*error2) is 0.25*σ2err.
Thus, as the two error terms are uncorrelated, their variances will sum together.
This means that the total error variance associated with Ave is 0.5*σ2err and so
the reliability of Ave is
σ2p /( σ2p + 0.5*σ2err) which is greater than the reliability of xobs or yobs.
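The gain from averaging is the k = 2 case of the Spearman-Brown formula, rel_k = k*rel / (1 + (k-1)*rel). A minimal sketch (function names and variance values are my own, illustrative choices) showing that the variance-based expression above and the standard formula agree:

```python
def reliability(var_true, var_err):
    """Reliability of a single measure: true variance over total variance."""
    return var_true / (var_true + var_err)

def reliability_of_average(var_true, var_err, k=2):
    # Averaging k parallel measures with uncorrelated errors divides the
    # error variance by k, as derived in part (iii) for k = 2.
    return var_true / (var_true + var_err / k)

def spearman_brown(rel, k=2):
    """Spearman-Brown prophecy formula for the average of k parallel tests."""
    return k * rel / (1 + (k - 1) * rel)

vt, ve = 4.0, 1.0
print(reliability(vt, ve))                  # 0.8
print(reliability_of_average(vt, ve))       # higher than 0.8
print(spearman_brown(reliability(vt, ve)))  # same value, via the standard formula
```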
(iv) [20% marks]
In the researcher's model of the tasks, RT in condition 1 is supposed to measure
process p1; RT in condition 2 is supposed to measure the combined effect of p1 plus
p2; RT in condition 3 supposedly measures the combination of all 3 processes p1, p2
and p3.
Assuming the processes combine additively, then we can estimate processes p2 and
p3 by subtractions, as the researcher proposed. CTT expressions for each observed
RTs in each condition (x1, x2, and x3) can be written:
x1 = G1(p1, p1) + G11(0, err1)
x2 = G2(p2, p2) + G1(p1, p1) + G12(0, err2)
x3 = G3(p3, p3) + G2(p2, p2) + G1(p1, p1) + G13(0, err3)
… (17)
where G1-G3 are the random normal variables associated with true the values of
processes p1-p3 and G11-G13 are the random normal variables (with zero mean) of the
associated error terms.
Thus, the estimates of p2 and p3 are given by:Est(p2) = G2(p2, p2) + G12(0, err2) - G11(0, err1)
Est(p3) = G3(p3, p3) + G13(0, err3) - G12(0, err2)
The above expressions show two important properties about these difference
measures which make them unlikely to be able to answer the researcher’s question
definitively:
They are less reliable than either of the constituent measures (as they contain the error
terms from each part of the subtraction, rather than just a single error term), and more
importantly, they share a common error term (G12; with opposite sign in each
expression). Thus, if the error variance was reasonably large, and even if the
processes p2 and p3 were really unrelated, then the difference measures would tend to
correlate (negatively). The chances of detecting a real positive correlation between the
processes would also be reduced by the artefactual negative correlation caused by the
shared error term (and when a positive correlation was detected it would be likely to
underestimate the true size of the correlation).
(v)
[10% marks] The correlation between the two difference measures would be
more informative if the reliability of the basic measures was very high (i.e. the
size of the error variances was small relative to the true score variances). Give
limited credit for answers which say “if variables were measured without error”
as this is an unlikely case in psychology and is a special (extreme) case of the
general scenario mentioned above. For excellent marks the answer should note
that the assumptions of classical test theory are that the error terms associated
with each measure are uncorrelated with one another. These assumptions are
necessary to the arguments presented in (iv). If the error terms were correlated
across RT measures then subtraction would remove not only the common
processes between measures (as illustrated in iv above) but also remove the
correlated part of the error. In this situation, the error term for the difference
measures could be smaller than the error terms of the constituent RT measures
from which the difference measures were formed. In this case, correlations
between difference measures are more likely to be informative. [The answer might
note whether the error terms are likely to be correlated across the different RT
measures. Assuming participants did all 3 measures in the same testing session, then
any error processes which were sustained across the whole testing session (e.g. a
participant feeling unusually tired or alert on that day, etc.) would contribute
systematically to all RT measures and thus lead to a degree of correlation of the
error terms.]
Question 9
Topic: Power
(i)
[20% marks]
Changing the type I error rate (alpha) changes power, all other things being held
constant. If one moves the critical value line to the right (decreasing alpha, the
area to the right of this line in the H0 distribution), then the area to the right of
this line in the H1 distribution (i.e. power) must also decrease. So to increase
power you can use a more lax (larger) alpha value. (This could be illustrated
with a second diagram.)
(ii)
[10% marks] d = expected mean difference between groups / common standard
deviation
= 2.5/7.5 = 0.33
(iii) [10% marks] 80% or 90% are usual for sample size determinations. Give 0
marks for anyone who mistakenly quotes Cohen's rules of thumb for effect
sizes (0.2 small; 0.5 medium; 0.8+ large).
(iv) [10% marks] Change the following line
COMPUTE zalpha=IDF.NORMAL(1-alpha/2,0,1) .
to read
COMPUTE zalpha=IDF.NORMAL(1-alpha,0,1) .
Give most of the marks for this. For 100% the answer has to explain why
it works. The original syntax converts the upper-tail of a two-tailed critical region
(1-0.025=0.975) to a z-score to use in the calculations. The revised syntax uses a
single tail (at 0.95) to compute the z-value. This makes the z value smaller than that
used in the two-tailed case so (appropriately) reduces the number of subjects required,
all other things being equal -- see 3rd line of syntax.
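The effect of the one- vs two-tailed z value on the required sample size can be sketched in Python (this is the standard normal-approximation two-group formula, n = 2*((z_alpha + z_beta)/d)^2 per group, not a reproduction of the SPSS syntax from the question; alpha and power values are the conventional choices mentioned in part (iii)):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80, two_tailed=True):
    """Normal-approximation sample size per group for a two-group comparison.

    Standard formula: n = 2 * ((z_alpha + z_beta) / d) ** 2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

d = 2.5 / 7.5   # effect size from part (ii)
print(n_per_group(d, two_tailed=True))    # two-tailed test
print(n_per_group(d, two_tailed=False))   # one-tailed: fewer participants
```

As the model answer notes, the one-tailed z value is smaller, so the required n drops, all other things being equal.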
(v)
[10% marks] You would need more participants in total because power is
maximal for equal sized groups; with unequal groups you reduce power and
therefore need more subjects to detect an effect of a certain size at a certain
power.
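The power cost of unequal groups can be quantified via the harmonic mean: a two-group comparison with groups of n1 and n2 behaves roughly like an equal-group design with the harmonic mean of n1 and n2 per group, which is maximal when n1 = n2. A minimal sketch (the split 150/50 is an illustrative number of my own):

```python
def harmonic_mean_n(n1, n2):
    """Effective per-group n for a two-group comparison with unequal groups."""
    return 2 * n1 * n2 / (n1 + n2)

# Same total of 200 participants, split evenly vs unevenly:
print(harmonic_mean_n(100, 100))  # even split: effective n = 100
print(harmonic_mean_n(150, 50))   # uneven split: effective n = 75, less power
```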
(vi) [10% marks] Non-centrality parameter (NCP)
(vii) [20% marks] The NCP is a very simple concept. Under the null hypothesis the
sampling distribution of the mean difference between the two groups would
have an expected value of zero. If the null hypothesis is false then the true mean
difference is non-zero. The NCP is simply the amount by which the sampling
distribution is displaced from the central (zero) value, when the alternative
hypothesis is true. It is therefore intimately linked to the effect size.
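For the two-group t test this link can be written down directly: with n participants per group, the NCP is delta = d * sqrt(n/2). The sketch below uses a normal approximation to the noncentral t (an assumption reasonable for large n; the n = 142 plugged in is an illustrative value consistent with d = 0.33, alpha = 0.05 two-tailed and 80% power):

```python
from math import sqrt
from statistics import NormalDist

def ncp(d, n_per_group):
    """Noncentrality parameter of the two-sample t statistic under H1."""
    return d * sqrt(n_per_group / 2)

def approx_power(d, n_per_group, alpha=0.05):
    # Normal approximation: shift the critical value by the NCP and take
    # the upper-tail area (the lower rejection region is negligible here).
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    return 1 - z.cdf(z_crit - ncp(d, n_per_group))

d, n = 1 / 3, 142
print(round(ncp(d, n), 2))          # displacement of the H1 distribution
print(round(approx_power(d, n), 3)) # should come out close to 0.80
```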
The diagram below covers parts (vii) and (viii).
(viii) [10% marks] see above diagram. As this diagram isn’t really any more than that
in part (i), the answer must place the line for tcrit (and shading for alpha) in
both tails of the distribution here (as a two-tailed test is specifically mentioned).
Give only 50% for 1-tailed shading otherwise correct. Drawings can look like
normal distributions although ideally should note that the distributions are t
distributions (slightly more platykurtic than normal distribution).