Statistics Portfolio

Section A: Issues in Research Design

Q1. What are differential carry-over effects? [50 words]
Differential carry-over effects occur when the magnitude of the residual (or carry-over) effect of one condition on another differs from one testing order to another (Pesarin & Salmaso, 2010). For example, the residual effect of condition 1 on condition 2 is different from the residual effect of condition 2 on condition 1.

Q2. Table 1 contains information on the complete counterbalancing sequences for a within-participants design that has 6 conditions. The arrangement is digram-balanced. Explain what this means. [20 words]
Each condition appears once in each ordinal position and precedes and follows every other condition once (Myers & Hansen, 2012).

Q3. How does a cyclic counterbalancing arrangement differ from one that is digram-balanced? [40 words]
A cyclic counterbalancing arrangement, unlike a digram-balanced arrangement, does not ensure that each condition precedes and follows every other condition once (Myers & Hansen, 2012). For example, in a six-condition cyclic counterbalancing arrangement, each condition is preceded by one particular condition, and followed by one particular condition, five times.

Q4. State one advantage and one disadvantage of using a cyclic counterbalancing arrangement. [45 words]
As the number of possible orders grows factorially with the number of conditions, a cyclic counterbalancing arrangement is more practical to use than complete counterbalancing (Girden, 1992). However, each condition is preceded and followed by the same condition more than once, which does little to reduce carry-over effects (Girden, 1992).

Q5. State one advantage and one disadvantage of using a digram-balanced arrangement. [45 words]
As each condition precedes and follows every other condition once, a digram-balanced arrangement substantially reduces carry-over effects (Girden, 1992).
However, the reduction of carry-over effects is not absolute, as each condition does not precede and follow every other condition equally often at each ordinal position (Girden, 1992).

Q6. Why should a researcher be interested in effect sizes? [4 arguments: 60 words]
Relying on null hypothesis testing alone is arbitrary, dependent on sample size and severely limits what can be concluded (Cohen, 1990). Moreover, the null hypothesis is never literally true in social science data (Cohen, 1990). Effect size provides a measure of the magnitude of an observed effect (Field, 2005). Effect size can be standardised, and thus compared across different measures (Field, 2005). Effect size allows one to estimate the likely size of an effect in the target population (Field, 2001).

Q7. Supporting your comments with references, explain what statistical power refers to and why this is important for psychological research. [60 words]
Power is the probability of detecting a significant effect, if one exists (Cohen, 1990). Power is important because an underpowered study may fail to detect a genuine effect, and it is influenced by a number of factors (Cohen, 1990), including effect size (e.g. a smaller effect requires a larger sample to achieve the same power), the accepted probability level (e.g. a more stringent probability level reduces power) and the number of participants tested (e.g. the larger the sample, the greater the power).

Section B: Factorial ANOVA and Interactions

Q1. Inspect Table 2. The data are taken from an experiment examining visual search performance in which participants searched for a target among distractors. The within-participants factors were trial type (the target was either present or absent from the display) and target type (digits or letters) and the between-participants factor was working memory group. Are the patterns of means suggestive of any main effects? Explain your answer.
[60 words]
Visual search performance was considerably quicker on target present trials (M = 1403.75) relative to target absent trials (M = 1887.25), when participants searched for letters (M = 1579.25) as opposed to digits (M = 1711.75), and for participants with high working memory (M = 1362) relative to those with low working memory (M = 1929). The means are therefore suggestive of a main effect for all three independent variables.

Q2. The data on Blackboard are taken from an experiment that examined visual search performance in men and women. Participants searched for a target item among distractors in displays with a small set size (12 items in the display) and a large set size (24 items in the display). The target was present in the display on 50% of trials but absent on the remaining trials. The dependent variable was search times recorded in milliseconds (ms). Using the descriptive data available, produce an APA table of means and standard deviations. Remember to be consistent with decimal places and to include an appropriate table number and title.

Table 1. Mean (Standard Deviation) Visual Search Times (in ms) as a Function of Trial Type, Set Size and Gender

Trial Type                  Set Size (12 items)    Set Size (24 items)
Male
  Target Present Trials      993.26 (155.49)       1223.51 (145.10)
  Target Absent Trials      1698.25 (252.71)       2116.26 (191.66)
Female
  Target Present Trials      994.88 (152.62)       1256.45 (181.84)
  Target Absent Trials      1510.25 (229.03)       1985.38 (265.04)

Q3. Using the SPSS data output on Blackboard, report the results of the study (all main effects and interactions) in a narrative. You should include F ratios, MSEs, p values and effect sizes. State the exact p value unless this is < .001. You should refer to all effects by variable name (i.e. the main effect of set size; the interaction between set size and trial type) and ensure that you make it clear whether each effect is significant.
[160 words]
The ANOVA revealed a significant main effect of trial type [F(1, 38) = 715.459, MSE = 28224.170, p < .001, ηp² = .950] and of set size [F(1, 38) = 422.092, MSE = 11360.995, p < .001, ηp² = .917]. However, no significant main effect was found for gender [F(1, 38) = 1.706, MSE = 118475.617, p = .199, ηp² = .043]. The three-way interaction between trial type, set size and gender was not significant [F(1, 38) = .380, p = .541, ηp² = .010]. A significant interaction was found between trial type and set size [F(1, 38) = 91.919, MSE = 4381.081, p < .001, ηp² = .708] and between trial type and gender [F(1, 38) = 11.064, p = .002, ηp² = .226]. However, no significant interaction was found between gender and set size [F(1, 38) = 1.722, p = .197, ηp² = .043].

Q4. Using the SPSS data file on Blackboard, run the appropriate post hoc tests to examine the trial type x set size interaction and report the results of the post hocs in an appropriate narrative. [80 words]
Within-subjects t-tests revealed that participants’ visual search times were significantly quicker for the small set size (12 items) than for the large set size (24 items) on both target present trials [t(39) = 15.636, p < .001] and target absent trials [t(39) = 19.098, p < .001]. Moreover, visual search times were significantly quicker on target present relative to target absent trials for both the small set size [t(39) = 19.413, p < .001] and the large set size [t(39) = 25.603, p < .001].

Q5. Summarise the trial type x set size interaction using the descriptive information where necessary to support your explanation. [120 words]
The difference in visual search times between target absent and target present trials was greater for the large set size (MD = 810.85, SD = 200.30) than for the small set size (MD = 610.18, SD = 198.79). This suggests that trial type has a greater impact on visual search times for the large set size than for the small set size.
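As a rough check on the mean differences above, they can be recovered from the cell means in Table 1. The sketch below assumes equal numbers of men and women, so that the two gender cells can simply be averaged; the source does not state the group sizes, and all variable names are illustrative.

```python
# Cell means (ms) copied from Table 1: (gender, trial type, set size) -> mean.
cell_means = {
    ("male", "present", 12): 993.26,
    ("male", "present", 24): 1223.51,
    ("male", "absent", 12): 1698.25,
    ("male", "absent", 24): 2116.26,
    ("female", "present", 12): 994.88,
    ("female", "present", 24): 1256.45,
    ("female", "absent", 12): 1510.25,
    ("female", "absent", 24): 1985.38,
}


def mean_over_gender(trial_type, set_size):
    """Average the male and female cells (assumes equal group sizes)."""
    male = cell_means[("male", trial_type, set_size)]
    female = cell_means[("female", trial_type, set_size)]
    return (male + female) / 2


# Simple effect of trial type (absent minus present) at each set size.
trial_effect = {
    size: mean_over_gender("absent", size) - mean_over_gender("present", size)
    for size in (12, 24)
}

# The interaction contrast is the difference between the two simple effects.
interaction_contrast = trial_effect[24] - trial_effect[12]

for size, diff in trial_effect.items():
    print(f"Set size {size}: absent - present = {diff:.2f} ms")
print(f"Difference of differences = {interaction_contrast:.2f} ms")
```

The two simple effects agree with the reported mean differences to within rounding. Because a difference of differences is symmetric, the same contrast is obtained when the set size effect is computed separately for each trial type.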
Moreover, the difference in visual search times between the small and large set sizes was greater for target absent trials (MD = 446.58, SD = 147.89) than for target present trials (MD = 245.91, SD = 99.47). This suggests that set size has a greater impact on visual search times for target absent relative to target present trials.

Q6. The post hoc tests for the trial type x gender interaction have been conducted for you (see SPSS output on Blackboard). With reference to this information and the Bonferroni correction, explain what the interaction shows. Use descriptive information where necessary to support your explanation. [120 words]
Accounting for the Bonferroni correction, there was no significant difference in visual search times between males and females on either target absent trials (p = .031) or target present trials (p = .720). Descriptively, however, females (M = 1747.82, SD = 237.04) were quicker than males (M = 1907.26, SD = 211.34) on target absent trials, whereas males (M = 1108.39, SD = 142.08) were marginally quicker than females (M = 1125.66, SD = 160.19) on target present trials. Thus, a small, albeit non-significant, effect of gender was found only for target absent trials. Visual search times were significantly quicker on target present relative to target absent trials for both females (p < .001) and males (p < .001). Consistent with the means above, the difference in visual search times between target absent and target present trials was greater for males (MD = 798.87, SD = 192.60) relative to females (MD = 622.16, SD = 139.11), suggesting that, relative to females, trial type has a greater impact on male visual search times.

Q7. Write three to four sentences detailing what the data in Figure 1 suggests about the potential results (see Appendix 1). [80 words]
The data suggest a main effect of task type: mean completion times are lower for motor task 1 than for motor task 2 for both women (20 s vs. 50 s) and men (80 s vs. 90 s).
The data suggest a main effect of gender: mean completion times are lower for women than for men on both motor task 1 (20 s vs. 80 s) and motor task 2 (50 s vs. 90 s). The data also suggest an interaction: the difference in mean completion times between women and men is larger for motor task 1 (60 s) than for motor task 2 (40 s).

Section C: Linear and Logistic Regression

Q1. Comment briefly on the assumptions for both hierarchical linear regression and logistic regression. [40 words]
According to Field (2009), hierarchical linear regression requires: normality, linearity and homoscedasticity of residuals, the criterion variable to be interval/ratio level, significant outliers to be removed and very little or no multicollinearity between predictors. At least 15 cases per predictor are recommended (Stevens, 1996). For logistic regression, 50 cases per predictor are recommended, and the criterion variable must be dichotomous, with categories that are mutually exclusive and exhaustive (Field, 2009).

Q2. The SPSS data output on Blackboard shows a hierarchical regression analysis in which the criterion was spelling performance in children. Non-verbal reasoning skills (Raven’s) were entered first, followed by two language measures (TROG & expressive vocabulary), and phoneme awareness (spoonerisms and alliteration) was entered in the final model (see Appendix A for a summary of the tests). Present an APA formatted table to show the B, SE of B and Beta values for the data shown. Remember to number the table and to provide an appropriate table title.

Table 2. Hierarchical Multiple Regression with Spelling Performance as the Criterion and Non-verbal Reasoning Skills (Raven’s), Language Measures (Test of Reception of Grammar & Expressive Vocabulary) and Phoneme Awareness (Spoonerisms & Alliteration) as Predictors.

Predictor                           b         SE b      β
Block 1
  Constant                          15.236    2.977
  Raven’s                           .556      .135      .411**
Block 2
  Constant                          -.899     6.460
  Raven’s                           .465      .135      .343**
  Test of Reception of Grammar      .298      .116      .384*
  Expressive Vocabulary             -.166     .168      -.146
Block 3
  Constant                          8.417     4.307
  Raven’s                           .236      .101      .174*
  Test of Reception of Grammar      .101      .085      .130
  Expressive Vocabulary             -.246     .120      -.216*
  Spoonerisms                       .521      .073      .639**
  Alliteration                      .190      .122      .139

R² = .169, Adjusted R² = .159 (Block 1); R² = .247, Adjusted R² = .220 (Block 2); R² = .630, Adjusted R² = .607 (Block 3); R² change = .169 (Block 1); R² change = .079 (Block 2); R² change = .383 (Block 3). ** p < .001; * p < .05.

Q3. Using the SPSS output on Blackboard, report all the results of the multiple regression analysis (see lecture and guidance notes). Provide an interpretation for these results including a statement of what beta values represent and therefore what they show for this particular study. [300 – 350 words]
The ANOVA revealed that the regression model was significant for block 1 [F(1, 84) = 17.042, MSE = 19.161, p < .001], block 2 [F(3, 82) = 8.983, MSE = 17.771, p < .001] and block 3 [F(5, 80) = 27.227, MSE = 8.958, p < .001]. Based on adjusted R², the proportion of variance in spelling explained by block 1 was 15.9% (R² = .169; Adjusted R² = .159). This rose to 22% for block 2 (R² = .247; Adjusted R² = .220) and considerably further to 60.7% for block 3 (R² = .630; Adjusted R² = .607). The change statistics showed that the proportion of variance explained by block 1 (16.9%) was significant [F Change (1, 84) = 17.042, p < .001]. The additional 7.9% of variance explained by block 2 [F Change (2, 82) = 4.387, p = .017] and the additional 38.3% of variance explained by block 3 [F Change (2, 80) = 41.336, p < .001] were both significant. Thus, the inclusion of Phoneme Awareness measures in block 3 explained substantially more variance in spelling on top of the combined 22% of variance explained by Non-verbal Reasoning Skills and Language Measures in block 2.
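The adjusted R² and F change values reported above can be approximately reproduced from the R² values alone. The sketch below uses the standard textbook formulas; the sample size of 86 is an assumption inferred from the block 1 residual degrees of freedom (84) rather than stated in the source, and the function names are illustrative. Because the inputs are rounded to three decimal places, the results match the SPSS figures only to within rounding.

```python
def adjusted_r2(r2, n_cases, n_predictors):
    """Adjusted R-squared for a model with n_predictors fitted to n_cases."""
    return 1 - (1 - r2) * (n_cases - 1) / (n_cases - n_predictors - 1)


def f_change(r2_change, r2_full, df_num, df_den):
    """F statistic for the increase in R-squared when a block is added."""
    return (r2_change / df_num) / ((1 - r2_full) / df_den)


N_CASES = 86  # assumption: inferred from the block 1 residual df of 84

# (label, R2, R2 change, predictors in model, df numerator, df denominator)
blocks = [
    ("Block 1", 0.169, 0.169, 1, 1, 84),
    ("Block 2", 0.247, 0.079, 3, 2, 82),
    ("Block 3", 0.630, 0.383, 5, 2, 80),
]

results = {}
for label, r2, r2_delta, k, df_num, df_den in blocks:
    adj = adjusted_r2(r2, N_CASES, k)
    f_stat = f_change(r2_delta, r2, df_num, df_den)
    results[label] = (adj, f_stat)
    print(f"{label}: adjusted R2 = {adj:.3f}, "
          f"F change({df_num}, {df_den}) = {f_stat:.2f}")
```

The small discrepancies in F change (for example, roughly 41.4 here against the reported 41.336 for block 3) are what would be expected from working with rounded R² values.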
On block 1, Raven’s emerged as a significant predictor of spelling (β = .411, t = 4.128, p < .001) and remained so on block 2 (β = .343, t = 3.439, p < .001) and block 3 (β = .174, t = 2.342, p = .022). The Test of Reception of Grammar emerged as a significant predictor on block 2 (β = .384, t = 2.579, p = .012), but not on block 3 (β = .130, t = 1.188, p = .238). Expressive Vocabulary was not a significant predictor on block 2 (β = -.146, t = -.989, p = .325), but was on block 3 (β = -.216, t = -2.056, p = .043). On block 3, Spoonerisms was a significant predictor (β = .639, t = 7.089, p < .001), but Alliteration was not (β = .139, t = 1.560, p = .123).
Beta represents the number of standard deviations by which the criterion variable will change as a result of a 1 standard deviation change in a predictor, with the other predictors held constant. On block 3, Spoonerisms is the most important predictor of spelling performance; for every 1 standard deviation increase in the Spoonerisms score, spelling performance improves by .639 of a standard deviation. The second most important predictor is Expressive Vocabulary, which is inversely related to spelling performance. The other 3 predictors, in order of importance, are Raven’s, Alliteration and Test of Reception of Grammar.

Q4. Following the format of the table used in the lecture, present an APA table showing the relevant statistics for the logistic regression analysis. In this analysis, the criterion was spelling group (poor/good speller) with phoneme awareness (spoonerisms) entered first and then TROG entered next (i.e. hierarchical). The table should be numbered and have an appropriate title.

Table 3. Binary Logistic Regression with Spelling Group as the Criterion and Spoonerisms and Test of Reception of Grammar as Predictors.

                                                        95% CI for EXP (B)
Model                               B         SE        Lower     EXP (B)   Upper
Constant                            -.140     .216                .870
Block 1
  Spoonerisms                       .330*     .075      1.202     1.392     1.610
Block 2
  Spoonerisms                       .363*     .081      1.228     1.438     1.684
  Test of Reception of Grammar      -.068     .052      .843      .934      1.035

Hosmer and Lemeshow (final model): χ²(8) = 11.192, p = .191; R² = .361 (Cox & Snell); R² = .481 (Nagelkerke); Model: χ²(2) = 38.449, p < .001. *p < .001.

Q5. Using the SPSS output on Blackboard, report all the results of the logistic regression analysis (see lecture and guidance notes). Provide an interpretation for these results including a comparison of the models and an explanation of the relevant statistical values. [200 words]
The binary logistic regression revealed that, prior to any predictors being entered into the model, the -2LL value was 118.802 and the overall classification accuracy was 53.5% [EXP (B) = .870, p = .518]. When Spoonerisms was entered on block 1, the -2LL value was 82.128 and the overall classification accuracy was 81.4%. The model was significant [χ²(1) = 36.674, p < .001; Spoonerisms: EXP (B) = 1.392, 95% CI: 1.202 to 1.610]. The percentage of variance explained in spelling group membership on block 1 was 34.7% (R² = .347) according to Cox and Snell and 46.7% (R² = .467) according to Nagelkerke. On block 2, the -2LL value was 80.353 and the overall classification accuracy was 82.6%. The overall model was significant [χ²(2) = 38.449, p < .001], but the block 2 statistic revealed a non-significant effect [χ²(1) = 1.775, p = .183; Spoonerisms: EXP (B) = 1.438, 95% CI: 1.228 to 1.684; Test of Reception of Grammar: EXP (B) = .934, 95% CI: .843 to 1.035]. The percentage of variance explained on block 2 was 36.1% (R² = .361) according to Cox and Snell and 48.1% (R² = .481) according to Nagelkerke.
The introduction of Spoonerisms on block 1 considerably improved the fit of the model, as is evident from the significance of the model, the notable reduction in the -2LL value and the increase in the percentage of cases that can be correctly classified as good or poor spellers.
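Several of the values reported above are linked by simple arithmetic: each chi-square is the drop in -2LL between successive models, and EXP (B) with its confidence interval follows from B and its standard error. The sketch below illustrates this, assuming a Wald-type 95% interval (z = 1.96); the source does not say how SPSS computed the interval, and the function and variable names are illustrative.

```python
import math


def odds_ratio_with_ci(b, se, z=1.96):
    """Return (lower, exp(B), upper) assuming a Wald-type interval."""
    return math.exp(b - z * se), math.exp(b), math.exp(b + z * se)


# -2 log-likelihood values reported for each step.
neg2ll = {"baseline": 118.802, "block 1": 82.128, "block 2": 80.353}

# Each chi-square is the reduction in -2LL relative to an earlier model.
chi2_block1 = neg2ll["baseline"] - neg2ll["block 1"]
chi2_block2 = neg2ll["block 1"] - neg2ll["block 2"]
chi2_model = neg2ll["baseline"] - neg2ll["block 2"]
print(f"Block 1 chi-square = {chi2_block1:.3f}")
print(f"Block 2 chi-square = {chi2_block2:.3f}")
print(f"Final model chi-square = {chi2_model:.3f}")

# B and SE values as reported for the block 2 model.
coefficients = {
    "Spoonerisms": (0.363, 0.081),
    "Test of Reception of Grammar": (-0.068, 0.052),
}

odds_ratios = {}
for name, (b, se) in coefficients.items():
    lower, odds_ratio, upper = odds_ratio_with_ci(b, se)
    odds_ratios[name] = (lower, odds_ratio, upper)
    print(f"{name}: EXP(B) = {odds_ratio:.3f}, "
          f"95% CI {lower:.3f} to {upper:.3f}")
```

An interval that spans 1, as it does for Test of Reception of Grammar, indicates that the direction of the effect on the odds cannot be established with confidence, which is consistent with the non-significant block 2 step.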
The introduction of Test of Reception of Grammar in block 2 only slightly improved the fit of the model, as is evident from the non-significance of the block 2 step, the marginal decrease in the -2LL value and the marginal increase in the percentage of cases that can be correctly classified as good or poor spellers. In block 2, Spoonerisms is the most important predictor of spelling group membership; for every 1 unit increase in Spoonerisms score, the odds of being classified as a good speller increase by a factor of 1.438. For every 1 unit increase in Test of Reception of Grammar, individuals are marginally more likely to be classified as a poor speller (EXP (B) = .934), although the confidence interval for this effect includes 1.

Q6. Supporting your answer with references, provide a brief discussion of the influence that residuals play in multiple regression analysis and the extent to which they may compromise the integrity of the results. [150 words]
According to Gonzalez’s (2013) overview of residuals, outliers are data points with large residuals (a residual being the deviation of an observed value from its predicted value on the regression line) that can ‘push’ or ‘pull’ the regression line in one direction, leading to biased regression coefficients. As a result, and in line with the assumptions of multiple regression, one should omit or rescore outliers. The exclusion of just one outlier can have a substantial impact on the regression model. Detecting outliers is difficult in multiple regression due to the need to analyse all variables concurrently. Cook’s Distance (Cook, 1977) assesses the role of potential outliers by measuring the difference between the regression obtained when a data point is included and the regression obtained when it is omitted. It is influenced by both the residual and the leverage of the data point (leverage being the amount by which the predicted value would change if the observation were shifted one unit in the y-direction).

References

Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45, 1304-1312.
Cook, R.D. (1977). Detection of influential observation in linear regression. Technometrics, 19(1), 15-18.
Field, A.P. (2001). Meta-analysis of correlation coefficients: a Monte Carlo comparison of fixed- and random-effects methods. Psychological Methods, 6(2), 161-180.
Field, A.P. (2005). Meta-analysis. In J. Miles & P. Gilbert (Eds.), A handbook of research methods in clinical and health psychology (pp. 295-308). Oxford: Oxford University Press.
Field, A.P. (2009). Discovering statistics using SPSS: and sex and drugs and rock ‘n’ roll (3rd edition). London: Sage.
Girden, E. (1992). ANOVA: Repeated Measures. California: SAGE.
Gonzalez, R. (2013). Residual Analysis and Multiple Regression (lecture notes). Retrieved 10 December, 2014, from http://www-personal.umich.edu/~gonzo/coursenotes/file7.pdf
Myers, A., & Hansen, C. (2012). Experimental Psychology. California: Thomson/Wadsworth.
Pesarin, F., & Salmaso, L. (2010). Permutation tests for complex data. Chichester: John Wiley and Sons Ltd.
Stevens, J. (1996). Applied multivariate statistics for the social sciences (3rd edition). Mahwah: Lawrence Erlbaum Associates.