CPSY 501: Lecture 11, Nov 14
Please download “relationships.sav”

Non-Parametric Tests
Between subjects: Mann-Whitney U; Kruskal-Wallis; factorial variants
Within subjects: Wilcoxon Signed-rank; Friedman’s test
Chi-square (χ2) & loglinear analysis
Misc. notes: Exact tests (small samples, rare events or groups); constructing APA-formatted tables

Non-parametric Analysis
Analogues to ANOVA & regression: IV & DV parallels
Between-subjects: analogous to one-way ANOVA, t-tests, & factorial ANOVA
Within-subjects: analogous to repeated-measures ANOVA
Correlation & regression: chi-square (χ2) & loglinear analysis for categorical variables

Non-parametric “Between-Cell” or “Levels” Comparisons
Non-parametric tests are based on ranks rather than raw scores: SPSS converts the raw data into rankings before comparing groups.
These tests are advised when (a) scores on the DV are ordinal; or (b) scores are interval, but ANOVA is not robust enough to deal with the existing deviations from its assumptions for the DV distribution (review: “assumptions of ANOVA”). If the underlying data meet the assumptions of parametricity, parametric tests have more power.

Between-Subjects Designs: Mann-Whitney U
Design: Non-parametric, continuous DV; two comparison groups (IV); different participants in each group (“between-subjects” cells; cf. t-tests & χ2). Examples of research designs needing this statistic?
Purpose: To determine if there is a significant “difference in level” between the two groups.
Data structure (= entry format): 1 variable to represent the group membership for each participant (IV) & 1 variable representing scores on the DV.

Mann-Whitney U in SPSS: Relationships data set
Running the analysis: Analyze > Nonparametric > 2 Independent Samples
Select the “Grouping variable” (IV: “Had any counselling”) & “Test variable” (DV: “Quality of Communication”) & check “Mann-Whitney U”.
Note: the “define groups” function can be used to select any two groups within the IV (if there are more than two comparison groups).
(If available) to switch from the “asymptotic” method of calculation to “exact”: Analyze > Nonparametric > 2 Independent Samples > “Exact” (requires an optional SPSS module; see Notes at end of outline).

Mann-Whitney U in SPSS (cont.)
Test Statistics (Grouping Variable: Had any counselling) for Quality of Communication: Mann-Whitney U = 311.000; Wilcoxon W = 1014.000; Z = -1.890; Asymp. Sig. (2-tailed) = .059.
There was no significant effect of Having Counselling on Quality of Communication, U = 311.00, p = .059, MdnH = 4.0, MdnN = 3.0. [try Descriptive Stats > Explore]

Effect Size in Mann-Whitney U
Must be calculated manually, using the following formula: r = Z / √N. Here, r = -1.89 / √60 = -.24.
Use existing research or Cohen’s effect-size “estimates” to interpret the meaning of the r score: “There is a small difference between the therapy and no-therapy groups, r = -.24.”

Review & Practice: Mann-Whitney U
… There was no significant effect of Having Counselling on Quality of Communication, U = 311.00, p = .06, MdnH = 4.0, MdnN = 3.0. There is a small difference between the therapy and no-therapy groups, r = -.24. …
Try: Is there a significant difference between spouses who report communication problems and spouses who do not (“Com_prob”), in terms of the level of conflict they experience (“Conflict”)? What is the size of the effect?

Between-Subjects Designs: Kruskal-Wallis
Design: Non-parametric, continuous DV; two or more comparison groups; different participants in each group (parallel to the one-way ANOVA). Examples of research designs needing this statistic?
Purpose: To determine if there is an overall effect of the IV on the DV (i.e., if at least 2 groups are different from each other), while controlling for experiment-wise inflation of Type I error.
Data structure: 1 variable to represent the groups in the IV; 1 variable of scores on the DV.
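To make the mechanics concrete, here is a minimal Python sketch of the rank-based U statistic, its normal-approximation Z, and the effect size r = Z / √N described above. The function names and data are hypothetical, not part of any SPSS procedure, and the sketch handles ties only via midranks (without the variance correction statistics packages typically apply), so its Z can differ slightly from SPSS output.

```python
import math

def midranks(values):
    """Convert raw scores to ranks; tied scores share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        i = j + 1
    return r

def mann_whitney_u(x, y):
    """Return (U, Z, r): SPSS-style smaller U, normal-approximation Z,
    and effect size r = Z / sqrt(N)."""
    n1, n2 = len(x), len(y)
    all_ranks = midranks(list(x) + list(y))
    r1 = sum(all_ranks[:n1])                      # rank sum of group 1
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u = min(u1, n1 * n2 - u1)                     # report the smaller U
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, z, z / math.sqrt(n1 + n2)
```

With the slide’s values, r = -1.89 / √60 ≈ -.24, matching the reported small effect.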
Running Kruskal-Wallis in SPSS
Running the analysis: Analyze > Nonparametric > K Independent Samples > “Kruskal-Wallis H”
Enter the highest and lowest group numbers in the “define groups” box.
(If available) switch from the “asymptotic” method of calculation to “exact”: Analyze > Nonparametric > K Independent Samples > “Exact” (requires an optional SPSS module & may require longer computing time).
For illustration in our data set: IV = Type of Counselling & DV = Level of Conflict.

Kruskal-Wallis H in SPSS (cont.)
Test Statistics (Kruskal-Wallis test; Grouping Variable: Type of Counselling) for Level of Conflict: Chi-Square = 7.094; df = 2; Asymp. Sig. = .029.
Type of counselling has a significant effect on participants’ level of conflict, χ2(2) = 7.09, p = .029. Specifically… [report medians & post hoc results…]

Following-up a Significant K-W Result
If the overall K-W test is significant, conduct a series of Mann-Whitney tests to compare the groups, but with corrections to control for inflation of Type I error. There is no option for this in SPSS, so manually conduct a Bonferroni correction (α = .05 / number of comparisons) and use the corrected α-value to interpret the results.
Consider comparing only some groups, chosen according to (a) theory; (b) your research question; or (c) listing the groups from lowest to highest mean ranks, and comparing each group to the next highest group.

Effect Size in Kruskal-Wallis
SPSS has no option to calculate effect size, so it must be done manually (by us…). Instead of calculating the overall effect of the IV, it is more useful to calculate the size of the difference for every pair of groups that is significantly different from each other (i.e., from the Mann-Whitney Us): r = Z / √n, where n = the number of participants in that pair of groups.

Reporting Kruskal-Wallis analyses
… Type of Counselling has a significant effect on participants’ level of conflict, χ2(2) = 7.09, p = .029.
Specifically, the No Counselling group had higher conflict scores, MdnN = 4.0, than did the Couples Counselling group, MdnC = 3.0, Mann-Whitney U = 176.5, Z = -2.61, p = .009, r = -.37. …
Also note that Field uses another notation for the K-W statistic: … H(2) = 7.09 …
Note: Bonferroni correction: .05 / 3 = .017

Reporting K-W analyses (cont.)
Note: The median for the Individual Counselling group is Mdn = 3.0, & sometimes this is not reported; we often include this kind of information to give readers a more complete “picture” or description of the results. In this case, though, we would need to give more detailed descriptions of the medians, and that would be too much detail.

“Checking” nonparametrics…
Comparing these results with the corresponding ANOVA may lend more confidence in the overall adequacy of the patterns reported. Nonparametric analyses tend to have less power for well-distributed DVs, but they can be more sensitive to effects when the DV is truly bimodal, for instance!

“Checking” nonparametrics: Example
E.g., Type of Counselling (IV) and Level of Conflict (DV) with a one-way ANOVA (run the Levene test & Bonferroni post hoc comparisons) shows comparable results: F(2, 57) = 4.05, p = .023, with the No Counselling group showing more conflict than the Couples Counselling group, MN = 3.87 and MC = 3.04 (“fitting” well with the nonparametric results). Such “approximations” help check our conclusions.

Non-Significant Kruskal-Wallis analyses
If the research question behind the analysis is “important,” we may need to explore the possibility of low power or other potential problems. In those cases, a descriptive follow-up analysis can be helpful. See the illustration in the Friedman’s ANOVA discussion below for some clues.

Non-Parametric Options for Factorial Between-Subjects Comparisons
SPSS does not provide non-parametric equivalents to factorial ANOVA (i.e., 2 or more IVs at once).
One option is to convert each combination of the IVs into a single group, and run a Kruskal-Wallis comparing groups on the newly created variable.
Disadvantages: (a) reduced ability to examine interaction effects; (b) can end up with many groups.
Advantages: (a) allows “planned comparison” approaches to interactions, drawing on clear conceptualization; (b) groups can be redefined flexibly.
Alternatives: Separate K-W tests for each IV; convert to ranks and do a loglinear analysis; and others.

Example: Nonparametric “Factorial” Analysis
Research question: How do Marital Status & Type of Counselling relate to conflict levels? Type of Counselling & Marital Status (IVs) and Level of Conflict (DV).
Crosstabs for the 2 IVs show that cell sizes are “good enough” (the smallest cells, for individual counselling, have 5 & 6 people per group).

Nonparametric “Factorial” …
6 groups: Individual counselling & married; Individual & divorced; Couples counselling & married; Couples counselling & divorced; No counselling & married; No counselling & divorced.
Create a new IV with these 6 groups, coded as 6 separate groups (using Transform > Recode into Different Variables & “If” conditions, for instance).
Test Statistics (Kruskal-Wallis test; Grouping Variable: Counselling Type & Marital Status) for Level of Conflict: Chi-Square = 8.753; df = 5; Asymp. Sig. = .119.
The K-W test for the combined variable is not significant. This result suggests that the significant effect for Counselling Type is masked when combined with Marital Status.
Test Statistics (Grouping Variable: Marital Status) for Level of Conflict: Mann-Whitney U = 337.000; Wilcoxon W = 802.000; Z = -1.752; Asymp. Sig. (2-tailed) = .080.
The idea of a “masking effect” of Marital Status shows as well when we test that main effect alone.
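For intuition, the Kruskal-Wallis H reported above can be reproduced from ranks alone. The sketch below uses hypothetical data and function names and omits the tie correction, so SPSS output can differ slightly; the Bonferroni helper gives the corrected per-comparison α used when following up with Mann-Whitney tests.

```python
import math

def midranks(values):
    """Average ranks; tied scores share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def kruskal_wallis(groups):
    """H statistic (approx. chi-square distributed) and df = k - 1."""
    data = [v for g in groups for v in g]
    n = len(data)
    r = midranks(data)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(r[start:start + len(g)])  # rank sum for this group
        h += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * h - 3 * (n + 1), len(groups) - 1

def bonferroni_alpha(alpha, n_comparisons):
    """Corrected per-test cutoff, e.g. .05 / 3 = .017 as on the slides."""
    return alpha / n_comparisons
```

For example, `kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])` gives H ≈ 7.2 with df = 2, and `bonferroni_alpha(0.05, 3)` reproduces the .017 cutoff used above.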
Interaction issues: A note
The Divorced / No counselling group, assumed to have high conflict levels, can be compared with some of the other 5 groups using Mann-Whitney U tests, as a “theoretically guided” replacement for interaction tests in non-parametric analysis. The choice depends on the conceptual relations between the IVs.

More Practice: Kruskal-Wallis
Is there a significant effect of number of children (“children,” with scores ranging from 0 to 3) on quality of marital communication (“quality”)?

Within-Subjects 2-cell Designs: Wilcoxon Signed-rank test
Requirements: Non-parametric, continuous DV; two comparison cells/times/conditions; related (or the same) participants in both repetitions. This analysis parallels the “paired-samples” t-test. Examples of research designs needing this statistic?
Purpose: To determine if there is a significant difference between the two times/groups.
Data entry: Separate variables to represent each repetition of scores on the DV.

Running Wilcoxon Signed-rank in SPSS
Running the analysis: Analyze > Nonparametric > 2 Related Samples > “Wilcoxon”
Select each pair of repetitions that you want to compare. Multiple pairs can be compared at once (but with no correction for doing multiple tests).
(If available) switch from the “asymptotic” method of calculation to “exact”: Analyze > Nonparametric > 2 Related Samples > “Exact”
Practice: does level of conflict decrease from pre-therapy (Pre_conf) to post-therapy (Conflict)?

Running Wilcoxon Signed-rank in SPSS (cont.)
Ranks (Pre-therapy Level of Conflict - Level of Conflict): Negative Ranks, N = 1, Mean Rank = 4.50, Sum of Ranks = 4.50; Positive Ranks, N = 13, Mean Rank = 7.73, Sum of Ranks = 100.50; Ties, N = 46; Total N = 60. (Negative: pre-therapy < post-therapy; positive: pre-therapy > post-therapy; ties: equal.)
There was a significant reduction in level of conflict after therapy, T = 4.5, p = .002 [OR Z = -3.09, p = .002]. [effect size added here]
Test Statistics (Wilcoxon Signed Ranks Test) for Pre-therapy Level of Conflict - Level of Conflict: Z = -3.094 (based on negative ranks); Asymp. Sig. (2-tailed) = .002.

Effect Size in Wilcoxon Signed-rank test
Must be calculated manually, using the following formula: r = Z / √(N observations). Here, r = -3.09 / √120 = -.28.
The N here is the total number of observations that were made (typically, participants × 2 when you have two levels of the within-subjects variable [times], & so on).

Wilcoxon Signed-rank: Practice
Is there a significant change between pre-therapy levels of conflict (Pre_conf) and level of conflict 1 year after therapy (Follow_conf)? If so, calculate the size of the effect. Note that participant attrition at time 3 (i.e., Follow_conf) changes the total number of observations that are involved in the analysis.
EX: “There was a significant reduction in level of conflict after therapy, T = 4.5, p = .002 [OR Z = -3.09, p = .002], r = -.28.”

Within-Subjects Designs for 3 or more cells: Friedman’s ANOVA
Requirements: Non-parametric, continuous DV; several comparison groups/times; related (or the same) participants in each group (repeated measures). Examples of research designs needing this statistic?
Purpose: To determine if there is an overall change in the DV among the different repetitions (i.e., if scores in at least 2 repetitions are different from each other), while controlling for inflated Type I error.
Data entry: A separate variable for each repetition of scores on the DV (= each “cell”).

Running Friedman’s in SPSS
Running the analysis: Analyze > Nonparametric > K Related Samples > “Friedman”
Move each repetition, in the correct order, into the “test variables” box.
(If available) switch from the “asymptotic” method of calculation to “exact”: Analyze > Nonparametric > K Related Samples > “Exact” (requires an optional SPSS module).

Running Friedman’s ANOVA in SPSS (cont.)
Test Statistics (Friedman Test): N = 57; Chi-Square = 9.065; df = 2; Asymp. Sig. = .011.
There was a significant change in levels of conflict over time, χ2(2, N = 57) = 9.07, p = .011. Specifically… [report of post hoc results goes here]

Following-up a Significant Friedman’s Result: Post hoc tests
If Friedman’s is significant, one may conduct a series of Wilcoxon Signed-rank tests to identify where the specific differences lie, but with corrections to control for inflation of Type I error. Calculate a Bonferroni correction to the significance level (α = .05 / number of comparisons) and use the corrected α-value to guide your interpretation of the results. Reminder: Bonferroni corrections are overly conservative, so some comparisons might not reach significance.

Post hoc Median comparisons following a Friedman’s Test: 2
If you have many levels of the IV (“repetitions,” “times,” etc.), consider comparing only some of them, chosen according to (a) theory or your research question; or (b) time 1 vs. time 2, time 2 vs. time 3, time 3 vs. time 4, etc.
Strategy for the number of comparisons: For instance, make only k − 1 comparisons (max), where k = # of levels of the IV. This suggestion for restricting comparisons is more important if the effect sizes or power are low, or the # of cells is large, thus exaggerating Type II error.

Our Example: 3 cells
Post hoc analyses: 3 Wilcoxon tests at .05 overall p; the Bonferroni-corrected significance cutoff is .017.
Pre-to-Post comparison: Z = -3.09, p = .002, r = -.28; Pre-to-One year later: Z = -2.44, p = .015, r = -.22; Post-to-One year later: ns.
Thus, improvement after therapy is maintained at the follow-up assessment.

REPORT in article…
There was a significant change in levels of conflict over time, χ2(2, N = 57) = 9.07, p = .011. Specifically, conflict reduced from pre-therapy levels at post-therapy observations, Z = -3.09, p = .002, r = -.28, and levels remained below pre-therapy conflict levels one year later, Z = -2.44, p = .015, r = -.22.
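Both related-samples procedures above can be sketched in a few lines of Python. This is a hypothetical illustration, not SPSS’s algorithm: zero differences are dropped (as in the Ties row of the SPSS output) and tie corrections to the variance are omitted, so Z can differ slightly from SPSS.

```python
import math

def midranks(values):
    """Average ranks; tied scores share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def wilcoxon_signed_rank(pre, post):
    """T = smaller sum of like-signed ranks; Z via the normal approximation."""
    d = [b - a for a, b in zip(pre, post) if b != a]   # drop ties (no change)
    n = len(d)
    r = midranks([abs(v) for v in d])                  # rank |differences|
    w_pos = sum(ri for ri, di in zip(r, d) if di > 0)
    t = min(w_pos, n * (n + 1) / 2 - w_pos)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return t, (t - mu) / sigma

def friedman(repetitions):
    """repetitions: k lists (one per time/condition), same participants in each.
    Returns the chi-square-distributed statistic and df = k - 1."""
    k, n = len(repetitions), len(repetitions[0])
    rank_sums = [0.0] * k
    for i in range(n):                       # rank each participant's k scores
        row_ranks = midranks([rep[i] for rep in repetitions])
        for j in range(k):
            rank_sums[j] += row_ranks[j]
    chi2 = 12 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3 * n * (k + 1)
    return chi2, k - 1
```

As on the effect-size slide, r = Z / √(number of observations); e.g. -3.09 / √120 ≈ -.28.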
Following-up a Non-significant Friedman’s Result
If Friedman’s is not significant, we often need to consider whether the results reflect low power or some other source of Type II error. This holds for any analysis, but we can illustrate the process here. Conduct a series of Wilcoxon Signed-rank tests, but the focus of attention is on effect sizes, not on significance levels (to describe effects in this sample). If the effect sizes are in a “moderate” range, say > .25, then the results could be worth reporting. Enough detail should be reported to be useful for future meta-analyses.

Friedman’s Practice
Load the “Looks or Personality” data set (Field). Is there a significant difference between participants’ judgements of people who are of average physical appearance, but present as dull (“ave_none”), somewhat charismatic (“ave_some”), or as having high charisma (“ave_high”)? If so, conduct post hoc tests to identify where the specific differences lie.

Between-Subjects Designs (non-parametric vs. parametric):
Mann-Whitney / Wilcoxon rank-sum — vs. — Independent-samples t-test (1 IV, 1 DV)
Kruskal-Wallis (1 IV w/ >2 levels, 1 DV); if significant (H or χ2), follow up with Mann-Whitney tests — vs. — One-way ANOVA; further post hoc tests if the F-ratio is significant
(no direct non-parametric equivalent) — vs. — Factorial ANOVA (≥2 IVs, 1 DV); further post hoc tests if the F-ratio is significant

Within-Subjects Designs (non-parametric vs. parametric):
Wilcoxon Signed-rank — vs. — Paired/related-samples t-test
Friedman’s ANOVA; further post hoc tests if significant — vs. — Repeated-measures ANOVA; further investigation needed if significant

Categorical Data Analyses
Chi-square (χ2): Two categorical variables. Identifies whether there is non-random association between the variables. (review)
Loglinear Analysis: More than two categorical variables. Identifies the relationship among the variables and the main effects and interactions that contribute significantly to that relationship.
McNemar / Cochran’s Q: One dichotomous categorical DV, measured two or more times. Identifies if there are any significant differences between the repetitions. Both are related-samples (repeated-measures) tests: McNemar is used for two related measurements, Cochran’s Q for three or more.

Assumptions & Requirements to Conduct a χ2 Analysis
Usually two variables: each variable may have two or more categories within it.
Independence of scores: each observation/person should be in only one category for each variable and, therefore, in only one cell of the contingency table.
Minimum expected cell sizes: for data sets with fewer cells, all cells must have expected frequencies of > 5 cases; for data sets with a larger number of cells, 80% of cells (rounded up) must have expected frequencies of > 5 cases AND no cells can be empty.
Analyze > Descriptives > Crosstabs > Cells > “Expected”

Doing χ2 Analysis in SPSS
Data entry: it is often better to enter the data as raw scores, not weighted cases (for small data sets).
Assess for data-entry errors and systematic missing data (but not outliers).
Assess the assumptions and requirements of chi-square.
(If available, change the estimation method to the exact test: Analyze > Descriptives > Crosstabs > Exact… > “Exact”. This requires an additional SPSS module.)
Run the main χ2 analysis: Analyze > Descriptives > Crosstabs > Statistics > “Chi-square”

Types of χ2 Tests
Pearson Chi-square: compares the actual scores you observed in each cell against the frequencies you would have expected due to chance.
Yates’ Continuity Correction: an adjustment to the Pearson chi-square, to correct for inflated estimates when you have a 2 x 2 contingency table. However, it can overcorrect, leading to underestimation of χ2.
Likelihood Ratio Chi-square (Lχ2): an alternative way to calculate chi-square, based on maximum-likelihood methods. A slightly more accurate method of estimation for small samples, but less well known.

Interpreting a χ2 Result
Ideally, all three types of χ2 will yield the same conclusion.
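The Pearson and likelihood-ratio χ2, expected frequencies, and the effect sizes discussed in this outline (Cramér’s V; the 2 x 2 odds ratio) can all be computed directly from a contingency table. A hypothetical sketch (Yates’ correction and p-values omitted), useful for checking SPSS crosstabs output:

```python
import math

def chi_square_tests(table):
    """table: list of rows of observed counts.
    Returns (Pearson chi2, likelihood-ratio chi2, df, expected counts)."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    pearson = sum((o - e) ** 2 / e
                  for orow, erow in zip(table, expected)
                  for o, e in zip(orow, erow))
    lratio = 2 * sum(o * math.log(o / e)
                     for orow, erow in zip(table, expected)
                     for o, e in zip(orow, erow) if o > 0)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return pearson, lratio, df, expected

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V; for a 2 x 2 table this equals the phi coefficient."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))

def odds_ratio(table_2x2):
    """Cross-product odds ratio for a 2 x 2 table of counts."""
    (a, b), (c, d) = table_2x2
    return (a * d) / (b * c)
```

The expected counts returned here are also what the “minimum expected cell sizes” assumption is checked against.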
When they differ, the Likelihood Ratio is the preferred method (esp. for 2 x 2 contingency tables).
Chi-Square Tests (N of valid cases = 60; 0 cells (.0%) have expected count less than 5; minimum expected count = 5.50):
Pearson Chi-Square: Value = 7.459, df = 2, Asymp. Sig. (2-sided) = .024, Exact Sig. (2-sided) = .022
Likelihood Ratio: Value = 7.656, df = 2, Asymp. Sig. (2-sided) = .022, Exact Sig. (2-sided) = .022
Fisher’s Exact Test: Exact Sig. (2-sided) = .022
Linear-by-Linear Association: Value = 1.992, df = 1, Asymp. Sig. (2-sided) = .158, Exact Sig. (2-sided) = .216, Exact Sig. (1-sided) = .108, Point Probability = .053 (standardized statistic = 1.411)
There is a significant association between marital status and type of therapy, Lχ2(2, N = 60) = 7.66, p = .022, with [describe strength of association or odds ratios].

Effect Sizes in χ2
Strength of association: there are several ways to convert a χ2 to run from 0 to 1, so it can be interpreted like a correlation (r, not r2): (a) Phi coefficient (accurate for 2 x 2 designs only); (b) Cramér’s V (accurate for all χ2 designs); (c) Contingency coefficient (estimates can be too conservative… normally, do not use this one).
Analyze > Descriptives > Crosstabs > Statistics > “Phi and Cramer’s V”
Odds ratio: for a 2 x 2 contingency table, calculate the odds of getting a particular category on one variable, given a particular category on the other variable. Must be done “by hand” (see p. 694 of text).

From χ2 to Loglinear Analysis
χ2 is commonly used with two categorical variables. Loglinear analysis is usually recommended for three or more categorical variables.

Preview: Loglinear Analysis …
…Used as a parallel “analytic strategy” to factorial ANOVA when the DV is categorical rather than ordinal (but a conceptual DV is not required). So the general principles also parallel those of multiple regression for categorical variables. Conceptual parallel: e.g., interactions = moderation among relationships.

Journals: Loglinear Analysis
Fitzpatrick et al. (2001). Exploratory design with 3 categorical variables.
Coding systems for session recordings & transcripts: counsellor interventions, client good moments, & strength of working alliance.
Therapy process research: 21 sessions; male & female clients & therapists; expert therapists; diverse models.

Abstract: Interpreting a study
Client “good moments” did not necessarily increase with Alliance. Different interventions fit with Client Information good moments at different Alliance levels. “Qualitatively different therapeutic processes are in operation at different Alliance levels.” Explain each statement & how it summarizes the results.

Research question
What associations are there between WAI, TVRM, & CGM for experts?
Working Alliance Inventory (Observer version: low, moderate, high sessions)
Therapist Verbal Response Modes (8 categories: read from tables)
Client Good Moments (Significant Information [I], Exploratory [E], Affective-Expressive [A]) (following T statements)

Analysis Strategy
Loglinear analysis starts with the most complex interaction (“highest order”) and tests whether it adds incrementally to the overall model fit (cf. the idea of ΔR2 in regression analysis). The 3-way interaction can be dropped in a couple of analyses, but not in one. Interpretation thus focuses on 2-way interactions & a 3-way interaction.

Sample Results
Exploratory Good Moments tended to occur more frequently in High Alliance sessions (2-way interaction).
Alliance x Interventions interaction: structured interventions (guidance) take place in Hi or Lo Alliance sessions, while unstructured interventions (reflection) are higher in Moderate Alliance sessions (see figure).

Explain: What does it mean?
The Alliance x Interventions interaction (structured interventions in Hi or Lo Alliance sessions; unstructured interventions higher in Moderate Alliance sessions) describes shared features of “working through” and “working with” clients, and the different functions of safety & guidance.
Explaining “practice”:
(a) Explain: Exploratory Good Moments tended to occur more frequently in High Alliance sessions (2-way interaction). (b) How does the article show that this effect is significant? Relatively “easy” questions.

Appendixes
Slides with information on Exact tests; a slide on ways to format tables in accord with APA style.

Exact tests: for small samples & rare occurrences
Assumptions. Asymptotic methods assume that the dataset is reasonably “large,” and that tables are densely populated and well balanced. If the dataset is small, or tables are sparse or unbalanced, the assumptions necessary for the asymptotic method have not been met, & we can benefit by using the “exact” or the Monte Carlo methods.

Exact Tests
Exact. The probability of the observed outcome, or an outcome more extreme, is calculated exactly. Typically, a significance level less than 0.05 is considered significant, indicating that there is some relationship between the row and column variables. Moreover, an exact test is often more appropriate than an asymptotic test because randomization rather than random sampling is the norm, for example in biomedical research.

Monte Carlo Estimates
Monte Carlo Estimate. An unbiased estimate of the exact significance level, calculated by repeatedly sampling from a reference set of tables with the same dimensions and row and column margins as the observed table. The Monte Carlo method allows you to estimate exact significance without relying on the assumptions required for the asymptotic method. This method is most useful when the data set is too large to compute exact significance, but the data do not meet the assumptions of the asymptotic method.

From SPSS help files
Example. Asymptotic results obtained from small datasets or sparse or unbalanced tables can be misleading. Exact tests enable you to obtain an accurate significance level without relying on assumptions that might not be met by your data.
For example, results of an entrance exam for 20 fire fighters in a small township show that all five white applicants received a pass result, whereas the results for Black, Asian, and Hispanic applicants are mixed. A Pearson chi-square testing the null hypothesis that results are independent of race produces an asymptotic significance level of 0.07. This result leads to the conclusion that exam results are independent of the race of the examinee. However, because the data contain only 20 cases and the cells have expected frequencies of less than 5, this result is not trustworthy. The exact significance of the Pearson chi-square is 0.04, which leads to the opposite conclusion: based on the exact significance, you would conclude that exam results and race of the examinee are related. This demonstrates the importance of obtaining exact results when the assumptions of the asymptotic method cannot be met. The exact significance is always reliable, regardless of the size, distribution, sparseness, or balance of the data.

SPSS exact stats
SPSS has Exact stats options for the NPAR TESTS and CROSSTABS commands. You may have to use syntax commands to use this option. See the SPSS help files for further information.

Formatting of Tables (for Project, Thesis, etc.)
Use the “insert table” and “table properties” functions of MS Word to build your tables; don’t do it manually. General guidelines for table formatting can be found on pages 147-176 of the APA manual. Additional tips, instructions, and examples for how to construct tables can be downloaded from the NCFR web-site: http://oregonstate.edu/~acock/tables/ In particular, pay attention to the column-alignment article, for how to get your numbers to align on the decimal point (which is where they should be).
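As a supplement to the exact-test appendix above, the Monte Carlo idea can be made concrete with a short sketch: rebuild the raw cases from a contingency table, repeatedly shuffle one variable’s labels (which preserves both row and column margins, the “reference set of tables” described above), and count how often a permuted table’s Pearson χ2 meets or exceeds the observed one. All names and data here are hypothetical; SPSS’s Exact Tests module uses more sophisticated algorithms than this illustration.

```python
import math
import random

def pearson_chi2(table):
    """Pearson chi-square for a table of observed counts."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    return sum((table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
               / (row_totals[i] * col_totals[j] / n)
               for i in range(len(table)) for j in range(len(table[0])))

def monte_carlo_p(table, n_samples=2000, seed=1):
    """Estimated proportion of margin-preserving permutations whose
    chi-square is at least as large as the observed one."""
    rows, cols = [], []
    for i, row in enumerate(table):            # rebuild raw cases
        for j, count in enumerate(row):
            rows += [i] * count
            cols += [j] * count
    observed = pearson_chi2(table)
    rng = random.Random(seed)                  # seeded for reproducibility
    hits = 0
    for _ in range(n_samples):
        rng.shuffle(cols)                      # break any real association
        perm = [[0] * len(table[0]) for _ in table]
        for r, c in zip(rows, cols):
            perm[r][c] += 1
        if pearson_chi2(perm) >= observed - 1e-9:
            hits += 1
    return hits / n_samples
```

A perfectly balanced table yields an estimate of 1.0 (every permutation is at least as extreme), while a strongly associated table yields an estimate near 0, mirroring the sparse-table example above where the exact p disagreed with the asymptotic p.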