Model Answers Summer exams 2003

SECTION A
Q1
Topic: Factor Analysis
(i)
[10% of marks] There are a number of differences. PCA derives components (composite “summary” variables formed from linear combinations of the measured variables) rather than factors (FA assumes latent unobserved variables which cause the variation in the observed variables). PCA analyses all variance (shared variance between variables, error variance and systematic variance unique to a particular variable) whereas FA analyses only covariance (ie just the shared variance). A consequence of the previous point is that if as many components are retained as there are variables, PCA exactly reproduces the data whereas FA merely approximates it. Mathematically, in PCA each variable contributes a unit of variance by placing a 1 on the leading diagonal of the variance-covariance matrix, whereas FA starts by estimating communalities and placing them on the leading diagonal (an estimate is made using the squared multiple correlation for each variable as predicted by all the other variables).
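The diagonal difference between the two techniques can be sketched in Python (the correlation matrix below is hypothetical, not taken from the exam data):

```python
import numpy as np

# Hypothetical correlation matrix for three measured variables
# (illustrative numbers only).
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

# PCA: analyse R as-is, with 1s (unit variance per variable) on the diagonal.
pca_eigenvalues = np.linalg.eigvalsh(R)[::-1]

# FA: replace the diagonal with estimated communalities -- the squared
# multiple correlation (SMC) of each variable predicted from all the others,
# obtainable from the diagonal of the inverse correlation matrix.
R_inv = np.linalg.inv(R)
smc = 1 - 1 / np.diag(R_inv)
R_fa = R.copy()
np.fill_diagonal(R_fa, smc)
fa_eigenvalues = np.linalg.eigvalsh(R_fa)[::-1]

print(pca_eigenvalues)  # sum equals 3 (total variance of 3 variables)
print(fa_eigenvalues)   # sum equals the total communality, less than 3
```

The eigenvalue sums illustrate the point above: PCA partitions all the variance, while FA analyses only the shared portion.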
(ii)
[30% of marks]

 Varimax rotation is a form of orthogonal rotation of the solution (ie all factors are uncorrelated) which is designed to maximize the variance of the factor loadings over the variables (hence the name). This simplifies the factors by having variables with either high or low loadings and avoiding variables with mid-loadings. This makes factor interpretation/labelling easier (hence the popularity of the method).

 A scree plot is a method for graphically determining the number of factors to be retained in the analysis. It is achieved by plotting the eigenvalues (which reflect the amount of total variance/covariance explained by the factor) for each factor in size order. The number of factors or components to be retained is determined by the elbow in the plot (where the size of the eigenvalues changes relatively little from one factor to another). The method is accurate to +/- a factor or so, and there is some debate about whether to include the factor at the elbow or not.
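The quantities plotted on a scree plot can be sketched as follows (simulated data, purely illustrative):

```python
import numpy as np

# Simulate 200 cases on five variables with some deliberately shared variance.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 5))
data[:, 1] += data[:, 0]
data[:, 3] += data[:, 2]
R = np.corrcoef(data, rowvar=False)

# Eigenvalues in descending size order -- the values plotted in a scree plot.
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]
proportion = eigenvalues / eigenvalues.sum()
for i, (ev, p) in enumerate(zip(eigenvalues, proportion), start=1):
    print(f"Component {i}: eigenvalue={ev:.2f}, variance explained={p:.1%}")
```

Plotting `eigenvalues` against component number and looking for the elbow gives the scree criterion described above.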

Exclusion of cases listwise means that if there are a set of variables to be
used in the PCA or FA then for a case’s data to be included that case must
contribute a datapoint for each variable. If the case has missing data for
any variable the case is deleted. The alternative would be to estimate the
correlation matrix (for FA or PCA) based on the maximum amount of data
for each variable concerned.
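The contrast between listwise deletion and the pairwise alternative can be sketched in pandas (the dataset is invented for illustration):

```python
import numpy as np
import pandas as pd

# Small hypothetical dataset with scattered missing values.
df = pd.DataFrame({"x": [1.0, 2.0, np.nan, 4.0, 5.0],
                   "y": [2.0, 1.0, 4.0, 3.0, np.nan],
                   "z": [1.0, 3.0, 2.0, np.nan, 5.0]})

# Listwise deletion: a case must contribute a datapoint for every variable,
# so any case with a missing value is dropped entirely.
listwise = df.dropna()
print(len(listwise))  # only the complete cases survive

# Pairwise alternative: pandas computes each correlation from the maximum
# amount of data available for that particular pair of variables.
pairwise_corr = df.corr()
print(pairwise_corr)
```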

 KMO measure of sampling adequacy (MSA) and Bartlett’s test of sphericity are means for establishing the factorisability (or factorability) of a correlation matrix -- in other words, for checking whether there are meaningful relationships between subsets of the variables which can cluster into factors/components. KMO has to be >0.6 to indicate factorisability. Bartlett’s test being significant means that the hypothesis that there are no factors can be rejected, but this test is overly sensitive.
The anti-image correlation matrix (AICM) is another means for
determining factorisability. To get the off-diagonal elements of the AICM
one calculates the partial correlation between variable X and Y partialling
out all the other variables (and then multiply this correlation by -1). Even
if X and Y are related then, if other variables covary with X and Y (ie can
form a factor with X and Y), the partial correlation between X and Y will
be small. So the off-diagonal elements of the AICM should be zero. The
KMO sampling adequacy measures for each variable are put on the
diagonal of the AICM and these values should be as close to 1 as possible.
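Both indices can be computed directly from a correlation matrix. A minimal sketch (the matrix and sample size are hypothetical, not the exam data):

```python
import numpy as np
from scipy import stats

# Hypothetical correlation matrix and sample size (illustrative only).
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
n, p = 211, R.shape[0]

# Bartlett's test of sphericity: chi-square test of H0 that R is an
# identity matrix (i.e. no factors present).
chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
df = p * (p - 1) / 2
p_value = stats.chi2.sf(chi2, df)

# KMO: summed squared correlations relative to summed squared correlations
# plus summed squared partial (anti-image) correlations.
R_inv = np.linalg.inv(R)
d = np.sqrt(np.outer(np.diag(R_inv), np.diag(R_inv)))
partial = -R_inv / d                    # off-diagonal partial correlations
off = ~np.eye(p, dtype=bool)
kmo = (R[off] ** 2).sum() / ((R[off] ** 2).sum() + (partial[off] ** 2).sum())
print(f"Bartlett chi2={chi2:.1f}, p={p_value:.4f}, KMO={kmo:.3f}")
```

The `partial` matrix here is exactly the off-diagonal content of the anti-image correlation matrix described above (partial correlations multiplied by -1).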
(iii) [30% of marks]
 The KMO MSA is above 0.6 which means that the matrix is factorisable, the same
conclusion is indicated by the less useful Bartlett’s test (which is significant and
thus rejects the null hypothesis that there are no factors). Notice that the MSAs for
the individual variables along the leading diagonal of the AICM are between
0.529 and 0.787, values which are reasonably close to 1. The off-diagonal values of the AICM are mostly close to zero, again indicating factorisability of the matrix, although there is a large negative value (-0.557) for EPQ-N and CogDis schizotypy.
These variables -- and only these variables among the 7 in the analysis -- are
expected to load on Eysenck’s neuroticism factor and so little of the shared
variance between EPQ-N and CD schizotypy will be accounted for by other
variables, thereby leaving the AICM partial correlation value large (and negative
because multiplied by -1). A similar explanation can account for the large positive
AICM correlation between Extraversion and IA schizotypy measure (E and IA are
negatively correlated and the only variables which load on Eysenck’s extraversion
factor). Having fewer than 3 variables for each factor will naturally produce these
kinds of findings.
 If we look at the eigenvalues for the 7 components that can be extracted (given 7
variables in the PCA) these give us an indication of the proportion of variance in
the data which is captured by the components. The first 3 components explain a
total of 79% of the variance in the variables (38.5%, 25%, and 15% respectively)
with small contributions from the next 4 components (6.5 to 3.5%). This is
reflected in the scree plot which has a sharp elbow at the 4th factor. This indicates
3 or 4 factors (there is debate in text books but logically retaining 3 factors is
indicated as if we retained the 4th component we should retain the 5th as it explains
the same amount of variance as the 4th). Notice that 3 components are predicted by
Eysenck’s theory. Note also that the 3 retained components have eigenvalues >1
(which is another criterion for retention, although these days this is not one which
is considered very useful).
 The rotated component matrix shows which variables load onto which
components. Component 1 is loaded on largely by 4 of the variables (EPQ-N, CD
schizotypy, IN schizotypy and RD schizotypy; absolute values of loadings >0.65
with other variables having [absolute] loadings below 0.12); component 2 is loaded on by Extraversion and IA schizotypy only (absolute loadings >0.87); and component 3 is loaded on by EPQ-P (0.936) and somewhat by IN schizotypy
(0.53). Thus the solution does not achieve “simple structure” because IN
schizotypy loads on more than 1 component. Nor does the solution fit closely with
Eysenck’s model (component 1 including more variables than expected and not
having a simple overall interpretation).
(iv) [20% of marks].
The second PCA adds two measures of anxiety which should load on the Neuroticism
factor. The obtained solution is a very close fit to Eysenck’s model. Looking at the
rotated component matrix, component 1 has four variables which load on it (loadings
>0.69). These are the two anxiety variables, EPQ-N and CD schizotypy just as
Eysenck would predict. Component 2 has 3 variables which load on it (EPQ-Psychoticism, IN and RD schizotypy), again exactly as Eysenck would predict (loadings >0.71). The 3rd component has 2 loading variables (Extraversion and IA schizotypy [negative], with absolute loadings >0.81). However, there are hints of
further factorial complexity (and lack of simple structure) as EPQ-Psychoticism has a
sizeable cross-loading (-0.45) on component 1. Perhaps other variables related to this,
or to the other variables related to the factor on which EPQ-P loads, would simplify
the factor structure further.
(v) [10% of marks]
 Good PCA and FA solutions are ones that are interpretable -- certainly this solution qualifies, as it has components that are easy to interpret and that were predicted by theory.
 The slight hint of a lack of simple structure (noted above) indicates that further
variables might have been profitably included (this can answer the “suggest
further analyses” part of this question element)
 On the numbers and indices calculated the solution should have a reasonable chance of behaving well. The requirements of sample size are met, noting that there is no single opinion on this matter (e.g. Comrey & Lee suggest a minimum of 300 cases for a good factor analysis; for the ratio of cases to variables, Nunnally suggests 10:1 and Guilford 2:1, while Barrett & Kline find 2:1 replicates structure and 3:1 is better). The current data look adequate -- 211 subjects and 9 variables.
 Could mention the ratio of variables to factors (e.g. Tabachnick & Fidell 5 or 6:1; Kline 3:1; Thurstone 3:1; Kim & Mueller 2:1). As hinted at above (3 factors from 9 variables) we are a little less satisfactory here.
 MSA is better here than in analysis 1 (0.76), as are the MSAs for the individual
variables.
 Solution still explains 75% of variance with 3 factors even though we’ve added 2
new variables.
 Some of the high AICM off-diagonal values in analysis 1 have been reduced by
addition of extra variables as one would expect (EPQ-N and CD schizotypy now
only -0.37 cf -0.56).
 In sum, this looks like a pretty good PCA
Q2
Topic: Multiple Regression
The answer must lead up to carrying out a hierarchical (aka sequential) multiple
regression. In this approach the 4 control variables (age, SES, road use experience, gender) are all simultaneously entered on block 1 of the analysis and the 4
independent variables of primary interest (anxiety, impulsivity, psychomotor speed,
distance estimation) are subsequently all simultaneously entered in block 2. This
approach is conservative wrt the variables of interest in that all the “credit” for DV
variance explained by the control variables is given to the control variables, and only
unique additional DV variance related to the variables of interest is ascribed to them.
To explain this more fully: It might be the case that one of the variables of interest [eg
distance estimation ability] causally affects the DV and one of the control variables
[eg road use experience]. That shared portion of the DV variance would be ascribed to the control variable rather than to the variable of interest.
 We should penalise anyone who talks about a stepwise or statistical MR. They have been strongly taught that this technique is unsafe (in terms of significance levels and replicability of findings).
 Anyone who mentions a single block forced-entry regression with all 8 predictors
can also get a reasonably good mark as long as they emphasize that they are
interested in the “coefficients table” for the 4 IVs of interest (which will give their
individual influence on the DV independent of the control variables, exactly as in
the hierarchical analysis). What this approach will not give the researcher is the
overall R-squared change for the 4 variables of interest as a set, so it will be less
informative.
To go through the analysis in more detail step-by-step, the answer must mention the importance of data screening steps:
 Note the sample size is adequate; neither too large nor too small (according to e.g. Green, 1991), although with 1000 participants we are likely to have power to detect fairly small effect sizes.
 Data is broadly adequate for MR – all ordinal/linear (scale) data or dummy
variables (a very good answer will probably explicitly mention how gender can be
coded -- basically with any two numbers, but typically 0 and 1 or 1 and 2).
 Will want to screen the data checking for normality of distribution, univariate and
bivariate outliers (frequency and scatterplots) and multivariate outliers
(Mahalanobis distance), illegal values.
 Collinearity between pairs of IVs can be checked by their bivariate correlations
(should be below 0.9) and multicollinearity within the set of IVs used (explain
what this is) should be assessed either by calculating tolerances (see below for
what these are and what are likely to be unsafe values) or by using collinearity
diagnostics. The Tolerance statistics are simply derived by treating each IV in the
model as a DV and performing a multiple regression using the remaining IVs.
Tolerance is (1-R2) from such a model. Multicollinearity occurs when an IV is
extremely well-predicted by a linear combination of the other IVs in the model,
thus Tolerance (TOL) should not be low (various figures are suggested by
different authorities -- not below 0.25 or 0.1). VIF (variance inflation factor) is
simply 1/TOL.

Multivariate normality can be assessed by scatterplots on selected pairs of
variables (checking for linearity, normality and homoscedasticity); variables with
very different skews may be useful to plot in this connection. Alternatively,
violations of multivariate normality can be revealed by examining plots of
residuals against predicted DVs
Next the answer needs to describe or reiterate the hierarchical nature of the analysis

The model summary table will show the overall findings for model 1 (i.e. the model entered on block 1 containing the control variables). This will give the R-squared (and thus the proportion of variance in the DV) explained by the control
variables. There will also be an F-test for this model which will indicate whether
the collective effect of the control variables explain a significant portion of DV
variance. (Given what the control variables are it seems likely, with such a large
sample that this overall model will be highly significant.) A good answer might
note that we should pay more attention to adjusted R2 statistics throughout as these
take account of the number of IVs included and also adjust for the fact that
unadjusted R2 overestimates the population value of R2.
 The next model shown in the table is model 2 (after inclusion of the 4 IVs of interest in block 2). The key statistics are the change statistics for block 2 relative
to block 1 (i.e. the improvement of model fit after adding in the new IVs on block
2). We are looking for the R-squared change and the associated F-change
statistics. If the F-change is significant this means that the 4 IVs of interest, as a
group, explain significant additional variance in the DV (risky road use behaviour
in the simulator). The R2 change shows what additional % of DV variance these
IVs explain over and above the control variables. With such a large sample the
change might be significant even with a small change in R2.
 The coefficients table will give the regression coefficients (B) for the constant
and each of the IVs in each of the 2 models. The table for Model 2 (with all 8 IVs)
is of interest and in particular the coefficients for the 4 IVs of interest. Each
coefficient is also reported with the std error for that coefficient and the t-test statistic which tests whether each coefficient is significantly different from zero.
Note that the t value is just the coefficient divided by its std error. A standardised
coefficient (beta) is also reported; this is simply the coefficient for the regression
equation had both the DV and the IVs been standardised (and allows a better comparison of the relative importance of the separate IVs). This table allows one
to determine which of the 4 IVs of interest explained a significant portion of DV
variance independent of the effects of the 4 control IVs and the other 3 IVs of
interest. These are the hypotheses that the researcher wanted to test. (A very good
answer might note that coefficients table is where the collinearity information is
printed out in SPSS and if any multicollinearity exists then some of the IVs need
to be dropped or recombined, and the analysis re-run.)
Q3
Topic: Logistic Regression
The easiest way to answer this question is to go through the printouts section by
section, describing the types of model fitted and leading to the recommendations
made as we go:
STUDY 1
The first output from Study 1 is a crosstab table which need not be discussed as
it is reproduced in the crosstabs given later on.
The model fitting information table includes information about the fit of two
models: one is a model with no effects just the intercept (intercept only model)
and the other (final model) is the model specified for this stage (for stage 1 the
model is as described in (i) above). The -2 * log likelihood (-2LL) values for
each model are given in the table with larger values indicating worse fitting
models. The difference between -2LL values for two models is a likelihood
ratio test statistic and this is distributed approximately as the chi-squared
distribution with degrees of freedom (df) equal to the difference in number of
parameters between the two models. This value is shown in the Chi-square
column (=65.488) and the statistic is highly significant for 5 df, indicating that there would be a statistically significant deterioration in fit in moving from the final model to the intercept-only model. This means that some or all of the parameters in the final model are useful in explaining variance in the outcome (i.e., the 2-category DV of chicken survival).
A good answer might also explain why there are 5 df. The final model contains terms for breed (3 levels), which requires 2 parameters (3-1); the food factor (2 levels) requires 1 parameter (2-1); and the breed*food interaction also requires 2 parameters. This explains the 5 df (=2+1+2).
A good answer will note that effects in logistic regression are really Effect*DV interactions, although with 2 levels of the DV this means that the number of parameters needed is the number of parameters for the factor (eg breed = 2) multiplied by (2-1) for the DV.
This shows that the model fitted for study 1 is a full factorial model (it has all
the possible factors and their interactions included). It is also a saturated model
as when only factorial IVs are included (ie no covariates), and the model is full,
there are no more degrees of freedom.
This is confirmed by the goodness-of-fit (GOF) table, which compares the
deterioration of fit between a saturated model (i.e., a model with 0 df, that
provides the best possible fit to the data) and the final model for that stage of the
analysis. The two statistics (Pearson and Deviance) calculate the goodness of fit
statistic in slightly different ways (Deviance is equivalent to a log likelihood
ratio test). The zero df for the GOF statistics indicates that the model tested is a
saturated one (as it contains the same number of parameters as the saturated
model with which it is compared).
The likelihood ratio tests table provides information on the deterioration in the
fit from the model fitted in this stage to reduced models in which particular
terms are removed from the model. It is possible to remove the breed*food
interaction but it is not possible independently to remove the intercept or main effects of breed and food (as these are nested under the breed*food interaction),
and so a zero df likelihood ratio test is reported for breed and food. The
breed*food row shows the -2LL value after removing the interaction effect from
the model. Note that the -2LL value is higher indicating a worse fitting model.
The difference in -2LL values is shown in the chi-square column of the table
(=256.124) and this is the likelihood ratio test statistic for the deterioration in
model fit and it has 2 df which is the number of parameters associated with the
breed*food interaction effect (n.b. really breed*food*survived, hence 2*1*1
parameters). This statistic is significant and so we would have a significantly
worse-fitting model if we were to remove the interaction term from the model.
The significant interaction term means that chicken survival varies significantly
as a function of the combination of breed and food.
Model fitting stops here as we can’t remove the highest level term from the
model without a significant deterioration.
The crosstabs and associated chi2 stats allow us to interpret the interaction. For
Breed 1 there was no significant effect of food on survival although there was a
10% trend for more to survive with food 1 than food 2. For breeds 2 and 3
significantly more chicks survived with food 2 than food 1 (and the effect was
numerically much more marked for breed 3). So the recommendation to the
farmer would be that if he wants to sell all breeds 2 and 3 (maybe these are very
popular with customers) then he should adopt food 2 with these birds (assuming
there was no huge price differential on the foods). If he wanted also to sell breed
1, then there was no strong evidence that it would do worse with food 2 than
food 1, but he should expect significantly poorer survival for this breed (relative
to the other 2 breeds) with food 2. If it wasn’t impractical he might be better
advised to use food 1 with breed 1.
STUDY 2

The first step of this analysis is again to fit a full factorial saturated model, but
this time there was absolutely no hint of a breed*food interaction (chi2=0.26,
df=2, p>0.8). Therefore, a reduced model could be fitted on the second step of
the analysis. This reduced model is called a main effects model because it
contains terms just for the main effects of breed and food (but not their
interaction).

In step 2 the likelihood ratio test table shows the consequences of removing each
main effect from the model. In each case (ie for breed and food) there was a
significant reduction in the fit of the model. This tells the researcher that there
are systematic differences in survival rate as a function of breed (irrespective of
food type given) and systematic differences in terms of food (irrespective of
breed). Further simplification of the model is therefore not warranted.

A really good answer might note that the Parameter estimates table (PET)
conveys largely the same information as the likelihood ratio tests table. The
difference is that the parameter estimates table provides a test of the effect
within a model containing the other terms, while the likelihood ratio tests
provide information on the comparison of two models -- one with the effect and
one without. The parameter estimates are more informative in that the effects are
broken down into each single df, whereas they are aggregated together in the
likelihood ratio tests.

In the PET, there is further confirmation that both breed and food are predictive
of survival. The rows marked “breed 4” compare breed 4 with breed 6, whereas the row marked “breed 5” compares breed 5 with breed 6 (breed 6 is noted as
being redundant when compared with itself). The effects are log odds ratios
(given by the B parameter) or the odds ratio (given by Exp(B)). The odds ratio
(OR) is the change in odds of not surviving (outcome ‘survived=0’, ie died,
shown in the table) relative to the other outcome (survived=1, not shown in the
table). The log odds ratio is significantly below zero (as indicated by the Wald
test: B^2/(std error)^2, which is distributed as chi-square with 1 df) for breeds 4 and
5 relative to breed 6. This means that these breeds are likely to die less often.
This means that the ORs are significantly below 1, about 0.5 for either breed 4
or 5, relative to breed 6. (This is confirmed by the fact that the 95% confidence
intervals for the breed ORs fall below 1.)
The PET also shows that odds of dying relative to surviving for food 1 are over
2 times higher than the same odds for food 2 (OR=2.29; significantly above 1).
The conclusions from the PET are supported by the cross-tabs which also show:
(a) that breeds 4 and 5 are more likely to survive than breed 6; and (b) that food
2 tends to produce higher survival than food 1. So if food 2 was not much more
expensive the researcher would recommend this food for breeds 4 to 6, but
would avoid breed 6 unless there was a strong demand for this particular breed.
Section B
Question 4
Topic: ANCOVA
(i)
[50% of marks for question]

The key point is that researcher A has used an experimental design in which
participants were randomised into the different groups, whereas Researcher B
has used a quasi-experimental design with naturally-forming groups
(schizophrenics and controls). In both studies the researcher wants to compare
specific cognitive variables between the groups, but a measure of IQ showed
that there were significant IQ differences between the two groups in each study.
Both researchers wanted to remove the possible influence of group IQ
differences on the specific cognitive performance DV. The problem is that
ANCOVA, despite its widespread use as such, is not generally a suitable tool for
controlling unwanted differences on a nuisance variable between groups.
ANCOVA is most safely used when there is no difference in the mean value of
the covariate between the groups (students might draw an overlapping circles
variance diagram to illustrate this scenario).
In the case of researcher B it is generally agreed that it is **unsound** to use
ANCOVA for this control function. The reason is that the IQ difference is likely
to be part of the intrinsic difference between chronic schizophrenic patients and
age-matched controls. By using ANCOVA to equate group IQ differences one
ends up evaluating the association between cognitive function and the “group
residual” variable (i.e. the group variable after removal of that variance which
overlaps with IQ differences). Students may draw another overlapping circles
variance diagram to illustrate this situation and the “group-res” variable. In addition, using ANCOVA in this situation means that one is making the comparison between brighter-than-average chronic schizophrenic patients and a
subset of the age-matched controls who were less bright than the sample
average. This is probably not the hypothesis which the researcher wanted to test.
Examples to illustrate the problem:
from Lord --- Research question was “do boys end up with a higher final weight
(DV) after following a specific diet than girls (gender=IV) even when including
initial weight as a covariate?” Part of the intrinsic gender difference is in weight
and so using ANCOVA here would end up comparing the weight gain for
relatively light boys with the weight gain for relatively heavy girls. This is not
the hypothesis we want to test, and there are issues about regression to the mean as well (if we sampled light boys at the start of an experiment then they would be likely to have gained more weight than heavy boys -- or indeed heavy girls -- at a later testing point by regression to the mean).
from Miller and Chapman -- Imagine using ANCOVA to answer the question “would six and eight year old boys differ in weight if they did not differ in height?” Once again ANCOVA would create a comparison of short 8 year olds with tall 6 year olds. Do we want to ask that question?
However, in the case of researcher A it is probably OK to use ANCOVA in this
situation. The reason is that the IQ differences between our randomly assigned
experimental groups are almost certainly just due to chance (from random sampling), and so removing the group differences on the covariate is unlikely to
systematically distort the IV by removing part of the IV’s intrinsic variance.
However, this should still not be the primary purpose of the ANCOVA but
would be an additional consequence of doing the ANCOVA. See part (ii)
(ii) [20% of marks] The other -- and primary -- use of ANCOVA is to remove the
effects of noise variables in experimental designs. As these are cognitive
experiments one might anticipate that performance on the tests would be
associated with IQ score and so one would want to remove the influence of IQ
performance. The intention of this approach is to increase the power of the
statistical tests for the effects of the experimental variables. (There may have
already been a diagram to illustrate the increase in power.) However, it is only
safe to do this when the groups either do not differ on the covariate or when one
can be confident that the covariate differences are due to chance. This means
that this use of ANCOVA could safely be advised to researcher A but NOT to
researcher B. In the case of researcher A applying ANCOVA to increase
experimental power in this way would have the additional benefit of removing
the effect of the between-groups IQ difference as well, which arose through
chance.
(iii) [20% of marks] Half of the marks for the explanation and half for the diagrams,
properly labelled. The assumption which the question is alluding to is the
homogeneity of regression (HOR) slopes assumption. To apply ANCOVA one
removes the effect of the covariate from the DV using a single regression
equation across all subjects in the study. To do this meaningfully the
relationship between the covariate and the DV has to be the same (within
statistical limits) in each group of the study; in other words the regression
(slope) must be homogeneous across groups. A really good answer (>70%) will
say that when one tests this assumption one includes a group*covariate
interaction term in the ANOVA model, and if this interaction is significant then
the HOR assumption has been violated. Diagrams should illustrate the parallel
linear regressions of covariate on DV for each group separately (Homogeneity),
and also illustrate the case where the regression lines are clearly not parallel
(homogeneity violated).
(iv) [10% of marks for question]. The other use of ANCOVA is in so-called Roy-Bargmann step-down analyses carried out after finding a significant effect in a MANOVA. (If they just say this, give 7 out of 10 for this part of the question -- for full marks they need a bit of elaboration.) The MANOVA might test for group
differences on a set of DVs. To do the step-down analysis, one has to have an a
priori priority ordering of DVs (based on theory or other considerations). One
begins with the highest priority DV and tests this in a simple ANOVA (adjusting
for the total number of comparisons). Then the next highest priority DV is tested via an ANCOVA with the higher priority DV acting as a covariate.
The procedure repeats down the priority order with all the higher-priority DVs acting as covariates at each step. The intention is to try to understand the relative
contribution of the DVs to the MANOVA effect -- one is seeing whether there is
an effect for a particular DV even after removing the influence of higher priority
DVs (rather like hierarchical or sequential multiple regression).
Question 5
Topic: Contrasts
This question has a lot of text to read and consequently the actual written answers are
not expected to be, nor do they need to be, very lengthy in order to get very good
marks. (Model Answer not written as only 1 candidate attempted this question.)
Question 6
Topic: MANOVA and Repeated measures (M)ANOVAs
(i)
[40 % of marks for question] The essence of the answer to this question is that
the researcher called John has to decide whether to conduct a single MANOVA
comparing the two groups on a composite DV formed from the two state anxiety
measures A and B, or to carry out two separate ANOVAs, one for each measure
separately. Advantages/disadvantages of these choices:

MANOVA's main advantage (c.f. 2xANOVA) is that it reduces the need to
correct for multiple comparisons. This means that the result can be tested
at 0.05 significance rather than 0.025 (Bonferroni corrected) for each of
the 2 ANOVAs. Obviously, this advantage is greater the more DVs are
being used. This, of course, is not an advantage over simply averaging the
two DVs and carrying out a single ANOVA. However, relative to the
simple averaging method MANOVA does have the advantage in that it
creates a weighted (linear) combination of the DVs (2 in this case) which
maximally separates the groups.

There are quite subtle and complex power issues in the choice between
carrying out one MANOVA and two ANOVAs. Under certain rare
situations MANOVA can show differences that would not show up with
ANOVA (illustrative diag -- Fig 9.1 in Tab and Fidell). More generally the
power of MANOVA depends on the relationship between the DVs being
combined (and the number of them), and in many situations ANOVA (of
the DVs or a single averaged DV) will be considerably more powerful.

MANOVA is a more complex technique which also carries a number of
additional assumptions relative to ANOVA (e.g., homogeneity of
variance-covariance matrices across groups rather than just homogeneity
of variances in the ANOVA case).
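The multiple-comparison point in the first bullet can be made concrete with a short sketch (plain Python; the function names and the values of m beyond the question's m = 2 are illustrative only): with m separate ANOVAs each tested at alpha, the familywise error rate under the null is 1 - (1 - alpha)^m, and the Bonferroni-corrected per-test level is alpha/m.

```python
# Familywise error rate and Bonferroni correction for m separate tests.
# Illustrative sketch; only m = 2 comes from the question itself.

def familywise_alpha(alpha, m):
    """P(at least one false positive) across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

def bonferroni_alpha(alpha, m):
    """Per-test significance level keeping the familywise rate at about alpha."""
    return alpha / m

for m in (2, 5, 10):
    print(m, round(familywise_alpha(0.05, m), 4), bonferroni_alpha(0.05, m))
```

For m = 2 this reproduces the 0.025 per-test level quoted above; the advantage of a single MANOVA grows with m.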
(ii)
[20% of marks] The researcher called Janet does not want to use a MANOVA as
she is interested in the differences in the effect of the group variable (the
training programme vs. control) on the two different measures, which tap
different aspects of anxiety. (At this point, as with part i, we are still not
concerned about the 3 timepoints of anxiety measurement and are working with
the average measure over the 3 timepoints, for index A and index B. Do not
penalise answers that unnecessarily add the time factor in the answers to parts i
and ii.) So Janet just needs to carry out a 2-by-2 mixed-design or split-plot
ANOVA with group as the between subjects factor and anxiety index (A or B)
as the repeated-measures factor. The most critical result from this analysis
would be the interaction between group and index type. If this were significant
Janet could conclude that the training programme (relative to the control
treatment) had a significantly different effect on the two different types of
anxiety. A really complete answer here would note that Janet might then
conduct simple main effects contrast analyses for the group factor on each
anxiety index separately making an appropriate correction for multiple
comparisons.
(iii) [20% of marks]. John could extend the ANOVAs (for each anxiety DV
separately, or for the single ANOVA on the averaged measure) by adding time
as a repeated-measures factor with 3 levels. As the repeated measures factor has
>2 levels, he would need to choose between analysing the repeated measures
factor using ANOVA or MANOVA methods (and use a test of sphericity to help
him to choose between the approaches). In either the MANOVA or ANOVA
case he would need to consider the group by time interaction from such analyses
to see if the group differences in anxiety varied significantly as a function of
time before the exam. Appropriately corrected follow-up contrasts might also be
made (at each timepoint separately). John could similarly extend his MANOVA
analysis (on the single composite anxiety DV) by adding a repeated-measures
time factor. If he added this factor using a multivariate repeated measures
analysis the analysis would become a so-called “doubly-multivariate” analysis.
(A really good answer might note that it is possible to have the composite DV to
be created by MANOVA processes and analyse the repeated-measures effect on
such a composite DV using repeated-measures ANOVA -- this would obviously
not then be doubly multivariate.) In both these multivariate cases the group-by-time interaction terms (and subsequent contrasts) would be critical to answering
John’s research question.
(iv) [20% of marks] Janet could extend her split-plot analysis by adding a further
repeated-measures factor (time, with 3 levels, and so she would again need to
choose whether to do this using repeated-measures MANOVA or ANOVA).
The result would be a 2 (group) x 2 (index type) x 3 (time) analysis with
repeated-measures on the last 2 factors. Give some credit for the above but it will
be hard to pass this part of the question unless the answer also shows a clear
recognition that Janet is proposing a simple linear trend over time on the effect
of the group variable. (The pattern of results she is proposing might be
illustrated by a graph: with the treatment group having lower anxiety than the
control group, but by a decreasing amount as the exam draws near.) So, she
should be looking in particular at the outcome of a trend analysis and be
particularly concerned with the interaction between the linear trend over time
and group. If these effects were more marked for one type of anxiety index
rather than the other, then she might find a significant (linear time trend X group
X anxiety measure type) 3-way interaction.
Section C
Question 7
Resampling and Nonparametric methods
(not written as only 1 candidate attempted question)
Question 8
Classical Test Theory
(i) [30% marks]
For a variable xobs the following expression is the basis of classical test theory (CTT):
xobs = xtrue + x
xobs is the observed (i.e. measured) value of a variable x; xtrue is the true value for the
variable; and x is the error term associated with the measurement of xobs. The error
term is random which means that it has zero mean (i.e. not a systematic bias) and is
also uncorrelated with xtrue. Thirdly, the error term is assumed to be drawn from a
normal distribution. We can represent the true variation in x and the error term with
independent normal variables, denoted G(mu, sigma) where mu is the mean of the
normal distribution and sigma is the s.d.
(ii)
[20% marks] We define 3 variances:
σ2obs is the variance associated with the observed score of x; σ2true is the variance
associated with the true score of x; and σ2error is the error variance. From the basics of
CTT we know that σ2obs = σ2true + σ2error.
Reliability is defined as the proportion of the observed variance in a measure which
reflects true (non-error) variance of the entity being measured.
Thus reliability = σ2true / σ2obs = σ2true / (σ2true + σ2error)
Let us assume xobs is the value measured at one time-point and yobs is the value of the
same variable measured at another time point. The correlation between xobs and yobs,
rxy, is defined as
rxy = Covar(xobs, yobs) / sqrt(Var(xobs)*Var(yobs))
where Covar(a, b) is the covariance between a and b.
From the information above, the expected values of sample variance of x and y can be
written as:
Exp{Var(xobs)} = p2 + errx2
Exp{Var(yobs)} = p2 + erry2
Given that the error terms for xobs and yobs are uncorrelated with each other and with
the true score, the covariance (shared variance) between xobs and yobs is p2. It follows
from the definition of the correlation between measures,
and the expected variance results, that we can obtain the following result for the
expected value of the correlation:
Exp{rxy} = p2 / sqrt((p2 + errx2)*( p2 + erry2))
If we assume that the measures at each of the two time-points have equal reliability
then errx2 = erry2 = err2. From this is then follows that
Exp{rxy} = p2 / (p2 + err2)
i.e. the test-retest correlation will approximate the reliability of the measure.
(iii) [20% marks]
Now xobs and yobs are two different measures of the same construct
The average score, Ave = (xobs + yobs)/2 = p + 0.5*error1 + 0.5*error2
Assume again that the reliability of each measure is the same,
i.e. error1=error2=err
The variance of 0.5*error1 (and likewise of 0.5*error2) is 0.25*σ2err.
Thus, as the two error terms are uncorrelated, their variances will sum together.
This means that the total error variance associated with Ave is 0.5*σ2err and so
the reliability of Ave is
σ2p /( σ2p + 0.5*σ2err) which is greater than the reliability of xobs or yobs.
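The gain from averaging is the k = 2 case of the Spearman-Brown formula, rel_k = k*rel / (1 + (k-1)*rel). A minimal sketch (function names and variance values are my own, illustrative choices) showing that the variance-based expression above and the standard formula agree:

```python
def reliability(var_true, var_err):
    """Reliability of a single measure: true variance over total variance."""
    return var_true / (var_true + var_err)

def reliability_of_average(var_true, var_err, k=2):
    # Averaging k parallel measures with uncorrelated errors divides the
    # error variance by k, as derived in part (iii) for k = 2.
    return var_true / (var_true + var_err / k)

def spearman_brown(rel, k=2):
    """Spearman-Brown prophecy formula for the average of k parallel tests."""
    return k * rel / (1 + (k - 1) * rel)

vt, ve = 4.0, 1.0
print(reliability(vt, ve))                  # 0.8
print(reliability_of_average(vt, ve))       # higher than 0.8
print(spearman_brown(reliability(vt, ve)))  # same value, via the standard formula
```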
(iv) [20% marks]
In the researcher's model of the tasks, RT in condition 1 is supposed to measure
process p1; RT in condition 2 is supposed to measure the combined effect of p1 plus
p2; RT in condition 3 supposedly measures the combination of all 3 processes p1, p2
and p3.
Assuming the processes combine additively, then we can estimate processes p2 and
p3 by subtractions, as the researcher proposed. CTT expressions for each observed
RTs in each condition (x1, x2, and x3) can be written:
x1 = G1(p1, p1) + G11(0, err1)
x2 = G2(p2, p2) + G1(p1, p1) + G12(0, err2)
x3 = G3(p3, p3) + G2(p2, p2) + G1(p1, p1) + G13(0, err3)
… (17)
where G1-G3 are the random normal variables associated with true the values of
processes p1-p3 and G11-G13 are the random normal variables (with zero mean) of the
associated error terms.
Thus, the estimates of p2 and p3 are given by:Est(p2) = G2(p2, p2) + G12(0, err2) - G11(0, err1)
Est(p3) = G3(p3, p3) + G13(0, err3) - G12(0, err2)
The above expressions show two important properties about these difference
measures which make them unlikely to be able to answer the researcher’s question
definitively:
They are less reliable than either of the constituent measures (as they contain the error
terms from each part of the subtraction, rather than just a single error term), and more
importantly, they share a common error term (G12; with opposite sign in each
expression). Thus, if the error variance was reasonably large, and even if the
processes p2 and p3 were really unrelated, then the difference measures would tend to
correlate (negatively). The chances of detecting a real positive correlation between the
processes would also be reduced by the artefactual negative correlation caused by the
shared error term (and when a positive correlation was detected it would be likely to
underestimate the true size of the correlation).
(v)
[10% marks] The correlation between the two difference measures would be
more informative if the reliability of the basic measures was very high (i.e. the
size of the error variances was small relative to the true score variances). Give
limited credit for answers which say “if variables were measured without error”
as this is an unlikely case in psychology and is a special (extreme) case of the
general scenario mentioned above. For excellent marks the answer should note
that the assumptions of classical test theory are that the error terms associated
with each measure are uncorrelated with one another. These assumptions are
necessary to the arguments presented in (iv). If the error terms were correlated
across RT measures then subtraction would remove not only the common
processes between measures (as illustrated in iv above) but also remove the
correlated part of the error. In this situation, the error term for the difference
measures could be smaller than the error terms of the constituent RT measures
from which the difference measures were formed. In this case, correlations
between difference measures are more likely to be informative. [The answer might
note whether the error terms are likely to be correlated across the different RT
measures. Assuming participants did all 3 measures in the same testing session, then
any error processes which were sustained across the whole testing session (e.g. a
participant feeling unusually tired or alert on that day, etc.) would contribute
systematically to all RT measures and thus lead to a degree of correlation of the
error terms.]
Question 9
Topic: Power
(i)
[20% marks]
Changing the type I error rate (alpha) changes power, all other things being held
constant. If one moves the critical value line to the right (decreasing alpha, the
area to the right of this line in the H0 distribution), then the area to the right of
this line in the H1 distribution (i.e. power) must also decrease. So to increase
power you can use a more lax (larger) alpha value. (This could be illustrated
with a second diagram.)
(ii)
[10% marks] d = expected mean difference between groups / common standard
deviation
= 2.5/7.5 = 0.33
(iii) [10% marks] 80% or 90% are usual for sample size determinations. Give 0
marks for anyone who mistakenly quotes Cohen's rules of thumb for effect
sizes (0.2 small; 0.5 medium; 0.8+ large).
(iv) [10% marks] Change the following line
COMPUTE zalpha=IDF.NORMAL(1-alpha/2,0,1) .
to read
COMPUTE zalpha=IDF.NORMAL(1-alpha,0,1) .
Give most of the marks for this. For 100% the answer has to explain why
it works. The original syntax converts the upper-tail of a two-tailed critical region
(1-0.025=0.975) to a z-score to use in the calculations. The revised syntax uses a
single tail (at 0.95) to compute the z-value. This makes the z value smaller than that
used in the two-tailed case so (appropriately) reduces the number of subjects required,
all other things being equal -- see 3rd line of syntax.
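The effect of the one- vs two-tailed z value on the required sample size can be sketched in Python (this is the standard normal-approximation two-group formula, n = 2*((z_alpha + z_beta)/d)^2 per group, not a reproduction of the SPSS syntax from the question; alpha and power values are the conventional choices mentioned in part (iii)):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80, two_tailed=True):
    """Normal-approximation sample size per group for a two-group comparison.

    Standard formula: n = 2 * ((z_alpha + z_beta) / d) ** 2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

d = 2.5 / 7.5   # effect size from part (ii)
print(n_per_group(d, two_tailed=True))    # two-tailed test
print(n_per_group(d, two_tailed=False))   # one-tailed: fewer participants
```

As the model answer notes, the one-tailed z value is smaller, so the required n drops, all other things being equal.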
(v)
[10% marks] You would need more participants in total because power is
maximal for equal sized groups; with unequal groups you reduce power and
therefore need more subjects to detect an effect of a certain size at a certain
power.
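The power cost of unequal groups can be quantified via the harmonic mean: a two-group comparison with groups of n1 and n2 behaves roughly like an equal-group design with the harmonic mean of n1 and n2 per group, which is maximal when n1 = n2. A minimal sketch (the split 150/50 is an illustrative number of my own):

```python
def harmonic_mean_n(n1, n2):
    """Effective per-group n for a two-group comparison with unequal groups."""
    return 2 * n1 * n2 / (n1 + n2)

# Same total of 200 participants, split evenly vs unevenly:
print(harmonic_mean_n(100, 100))  # even split: effective n = 100
print(harmonic_mean_n(150, 50))   # uneven split: effective n = 75, less power
```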
(vi) [10% marks] Non-centrality parameter (NCP)
(vii) [20% marks] The NCP is a very simple concept. Under the null hypothesis the
sampling distribution of the mean difference between the two groups would
have an expected value of zero. If the null hypothesis is false then the true mean
difference is non-zero. The NCP is simply the amount by which the sampling
distribution is displaced from the central (zero) value, when the alternative
hypothesis is true. It is therefore intimately linked to the effect size.
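For the two-group t test this link can be written down directly: with n participants per group, the NCP is delta = d * sqrt(n/2). The sketch below uses a normal approximation to the noncentral t (an assumption reasonable for large n; the n = 142 plugged in is an illustrative value consistent with d = 0.33, alpha = 0.05 two-tailed and 80% power):

```python
from math import sqrt
from statistics import NormalDist

def ncp(d, n_per_group):
    """Noncentrality parameter of the two-sample t statistic under H1."""
    return d * sqrt(n_per_group / 2)

def approx_power(d, n_per_group, alpha=0.05):
    # Normal approximation: shift the critical value by the NCP and take
    # the upper-tail area (the lower rejection region is negligible here).
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    return 1 - z.cdf(z_crit - ncp(d, n_per_group))

d, n = 1 / 3, 142
print(round(ncp(d, n), 2))          # displacement of the H1 distribution
print(round(approx_power(d, n), 3)) # should come out close to 0.80
```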
The diagram below covers parts (vii) and (viii).
(viii) [10% marks] see above diagram. As this diagram isn’t really any more than that
in part (i), the answer must place the line for tcrit (and shading for alpha) in
both tails of the distribution here (as a two-tailed test is specifically mentioned).
Give only 50% for 1-tailed shading otherwise correct. Drawings can look like
normal distributions although ideally should note that the distributions are t
distributions (slightly more platykurtic than normal distribution).