Lecture 10 – Single Factor Designs

Factor: a new name for a nominal/categorical independent variable. In the ANOVA literature, IVs are called Factors, and the values of a factor are called Levels of the factor. So, a Factor is a nominal (aka categorical) independent variable.

One-Factor design: research involving only one nominal IV, i.e., one factor.

Three general types of design
1. Between-subjects, no matching. Different groups of participants; no attempt to match.
2. Between-subjects, participants matched. With two groups, fairly easy; with more than two groups, it gets harder. The matching variable must be correlated with the DV.
3. Within-subjects design: the same people serve at all levels of the factor. These are sometimes called repeated measures designs.

This should seem familiar, because it's the same trichotomy we encountered in the Comparing Two Groups lecture.

Single Factor Designs - 1  2/8/2016

The Various Tests Comparing K Research Conditions, by Design and Dependent Variable Characteristics

Dependent Variable      Independent Groups /                  Matched Participants or
Characteristics         Between-Subjects Design               Within-Subjects / Repeated Measures
----------------------  ------------------------------------  -----------------------------------
Interval / Ratio        U&S:    One-Way Between-Subjects      Repeated Measures ANOVA
                                Analysis of Variance
                        Skewed: Kruskal-Wallis
Ordinal                 Kruskal-Wallis                        Friedman ANOVA
Categorical             Crosstabulation with Chi-square Test  Advanced analyses

If this looks familiar, it should. It's the same table presented in Lecture 9 on two-group comparisons, except that it now covers comparisons of two or more groups.

One-Way Between-Subjects Analysis of Variance
Comparing the means of 3 or more groups. Suppose there are three groups – Group A, Group B, and Group C. Why not just perform multiple t-tests?
t-test comparing mean of Group A with mean of Group B
t-test comparing mean of Group A with mean of Group C
t-test comparing mean of Group B with mean of Group C

The above 3 t-tests exhaust the possible pairwise comparisons among 3 groups.

Problem with the above method: it's very difficult to compute the correct overall p-value for the collection of tests, which makes the method difficult to use in hypothesis testing. What is needed is a single omnibus test. Such a test was provided by Sir R. A. Fisher. It's based on the following idea.

Consider 3 populations whose means are all equal. Now consider samples from each of those populations, and finally, consider the means of the three samples.

[Figure: dot plots of three samples drawn from populations with equal means. The three sample means fall close together – the sample means are not very variable, so N*S²X-bar is about equal to σ².]

Now think about the variability in the above dots.

Within-group variability
There is variability of the individual scores in each sample. The variance of the scores within each sample would be an estimate of the population variance, σ². So the average of the 3 sample variances, (S1² + S2² + S3²)/3, would be a really good estimate of σ².

Between-group variability
But there is more variability in the above situation: the variability of the sample means, S²X-bar. From our study of the standard error of the mean, S²X-bar would be equal to σ²/N. Equivalently, N*S²X-bar would be approximately equal to σ². That is, N times the variance of the sample means would be about equal to the population variance.

So, we have two estimates of the population variance in the above scenario:
1) The estimate based on the average of the variances of the three samples.
2) The estimate based on N times the variance of the sample means.

When the samples are from populations with equal means, the two estimates will be about equal.

Now consider 3 populations whose means are NOT equal.
Now consider samples from each of these populations, and then the means of those samples.

[Figure: dot plots of three samples drawn from populations with unequal means. The three sample means are widely separated – the sample means ARE very variable, so N*S²X-bar is greater than σ².]

Note that the variability of the individual scores within each sample is about the same as above. BUT, when the population means are not equal, the means of samples from those populations are quite variable – much more so than they were when the population means were equal. This means that in this case S²X-bar would be LARGER THAN σ²/N. Equivalently, N*S²X-bar would be LARGER THAN σ².

This suggests that N*S²X-bar is an indicator of whether or not the population means are equal. If the means are equal, N*S²X-bar will be about equal to σ². If the means are not equal, N*S²X-bar will be larger than σ².

Since the variability of individual scores within samples was the same in both situations, Fisher proposed the ratio

              N*S²X-bar             N times the variance of the sample means
    F = ------------------------- = ----------------------------------------
        Mean of sample variances          Mean of the sample variances

as a test statistic. If the population means are equal, F will be about equal to 1. If the population means are not equal, F will be larger than 1. Fisher computed the sampling distribution of F and proposed it as an omnibus test of the equality of population means. (He did not name the statistic F after himself.)

Specifics of the One-Way Between-Subjects Analysis of Variance
The research design employs two or more independent conditions (no pairing). The groups are identified by different levels of a single factor.
The dependent variable is interval / ratio scaled.
The distribution of scores within groups is unimodal and symmetric.
The variances of the populations being compared are equal.

Hypotheses:
H0: All population means are equal.
H1: At least one inequality is present.
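Fisher's two-estimates logic above can be sketched numerically. This is only an illustration with made-up data (three groups of N = 5 scores each, not from any dataset in this lecture): with equal group sizes, N times the variance of the sample means divided by the mean of the sample variances reproduces the ANOVA F statistic exactly.

```python
# Minimal numerical sketch of Fisher's two variance estimates.
# The data are fabricated for illustration only.
import numpy as np
from scipy import stats

a = np.array([4.0, 5.0, 6.0, 5.5, 4.5])
b = np.array([5.0, 6.0, 7.0, 6.5, 5.5])
c = np.array([8.0, 9.0, 10.0, 9.5, 8.5])
n = len(a)

# Estimate 1: average of the within-sample variances (estimates sigma^2
# whether or not the population means are equal).
within = np.mean([a.var(ddof=1), b.var(ddof=1), c.var(ddof=1)])

# Estimate 2: N times the variance of the sample means (estimates
# sigma^2 only when the population means are equal; larger otherwise).
between = n * np.var([a.mean(), b.mean(), c.mean()], ddof=1)

F = between / within
print(F)

# With equal group sizes, this hand-computed ratio matches the F from
# scipy's one-way ANOVA.
F_scipy, p = stats.f_oneway(a, b, c)
print(F_scipy, p)
```

Here the third group's mean is well above the other two, so both F values come out far above 1, in line with the "values larger than 1 when the null is false" rule.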
Test Statistic:

        Estimate of the population variance based on differences between sample means
    F = -----------------------------------------------------------------------------
        Estimate of the population variance based on differences between individual
        scores within samples

Likely values if the null is true: values around 1.
Likely values if the null is false: values larger than 1.

Example problem
Michelle Hinton Watson, a '95 graduate of the program, interviewed employees and former employees of a local company, Company X (cxfile.sav in In-class Datasets). A set of 7 questions assessing overall job satisfaction was given to all respondents. She interviewed 107 persons who had left the company prior to her contacting them, 49 persons who left the company within a year after she contacted them, and 51 persons who were still working for the company a year after the initial contact. The interest here is in whether the three groups are distinguished by their average job satisfaction – persons who had previously left the company, persons who left after the initial interview, or persons who stayed with the company after the initial interview.

Specifying the analysis
Analyze -> General Linear Model -> Univariate
Click on the Plots… button to specify a graph of means.
Click on the Post Hoc… button to specify post hoc comparisons of means.
Click on the Options… button to specify descriptive statistics and estimates of effect size.

Specifying a plot of means –

Specifying Post Hocs
If the overall F statistic is significant, post hoc comparisons are often used to determine exactly which pairs of means are significantly different. Post hoc tests vary on a dimension of liberalness vs. conservativeness.
Liberal Test: Tends to find differences, some of them Type I errors. Most powerful – able to find small differences. (For the Affordable Care Act.)

Conservative Test: Tends not to find differences, even those that exist. Least powerful – unable to see small differences. (Supports Big Business.)

The LSD test is the most liberal. The Scheffé test is the most conservative. The Tukey-b test is a compromise between the two extremes.

LSD --------------------------------- Tukey-b --------------------------------- Scheffé
(most liberal)                                                        (most conservative)

Strategy: If a conservative test rejects the null, there is most likely a difference. If a liberal test fails to reject the null, there is most likely not a difference.

The Options: Specifying printing of effect size and observed power.

The output
The p-value of .000 for Levene's test of equality of error variances means that we should be particularly cautious when interpreting the comparisons of means that follow. We should inspect the distributions for each group. We should also consider a nonparametric test of equality of location, which I will do below.

For this semester, ignore the "Corrected Model" and the "Intercept" lines.

Partial Eta squared: This is the effect size for one-way ANOVA. See effect sizes for ANOVA in the Power lecture for a characterization. Recall: Eta² = .01 is small; Eta² = .059 is medium; Eta² = .138 is large. Eta² = .262 means we have a SuperSized-with-Fries effect size.

Observed Power
Observed power is the power the test would have if the population means were as different as the sample means. The value, 1.000, means that if the population means were as different as the sample means, and many independent tests of the null hypothesis of equality of population means were run, the F would be significant in about 100% of those tests.

Start here on 11/17/15.
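For readers working outside SPSS, the same one-way ANOVA, effect size, and a post hoc comparison can be sketched in Python. The cxfile.sav data are not reproduced here, so the three arrays below are fabricated stand-ins for the three final-destination groups; note that scipy offers Tukey's HSD rather than the Tukey-b procedure SPSS lists.

```python
# Hedged sketch: one-way between-subjects ANOVA, eta squared, and a
# Tukey HSD post hoc. Data are fabricated stand-ins, not cxfile.sav.
import numpy as np
from scipy import stats

left_before = np.array([2.1, 3.0, 2.5, 3.4, 2.8, 3.1])
left_after  = np.array([4.0, 4.4, 3.9, 4.8, 4.2, 4.5])
stayed      = np.array([3.8, 4.1, 4.6, 4.0, 4.3, 3.9])
groups = [left_before, left_after, stayed]

# Omnibus F test.
F, p = stats.f_oneway(*groups)

# Eta squared = SS_between / SS_total; in a one-way design this equals
# the partial eta squared that SPSS reports.
grand = np.concatenate(groups)
ss_total = np.sum((grand - grand.mean()) ** 2)
ss_between = sum(len(g) * (g.mean() - grand.mean()) ** 2 for g in groups)
eta_sq = ss_between / ss_total

print(f"F = {F:.3f}, p = {p:.4f}, eta squared = {eta_sq:.3f}")

# Post hoc pairwise comparisons via Tukey's HSD.
res = stats.tukey_hsd(*groups)
print(res.pvalue)  # 3 x 3 matrix of pairwise p-values
```

With these illustrative numbers the "left before" group sits well below the other two, so the omnibus F is significant and the Tukey comparisons localize the difference to the pairs involving that group.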
Homogeneous Subsets
If two means are in the same column, they are not significantly different. If two means appear only in different columns, they ARE significantly different.

Profile Plots
A picture is worth 1000 words.

Kruskal-Wallis One-Way Analysis of Variance by Ranks
The research design employs two or more independent conditions (no pairing). The groups are distinguished by different levels of the independent variable.
The dependent variable is ordinal or better. This test is also used when the DV is interval/ratio scaled but the distributions within groups are skewed or the variances are unequal between groups.

Hypotheses:
H0: All population locations are equal.
H1: At least one inequality is present.

From Howell, D. (1997). Statistical Methods for Psychology. 4th ed., p. 658: "It tests the hypothesis that all samples were drawn from identical populations and is particularly sensitive to differences in central tendency."

Test Statistic: the Kruskal-Wallis H statistic. The probability distribution of the H statistic when the null is true is the chi-square distribution with degrees of freedom equal to the number of groups being compared minus 1.

Example problem (same problem as above). The interest here is in whether the three groups are distinguished by their overall job satisfaction – persons who had previously left the company, persons who left after the initial interview, or persons who stayed with the company after the initial interview. This test is appropriate since the variances were not homogeneous in the above analysis.

Specifying the analysis
Analyze -> Nonparametric Tests -> Legacy Dialogs -> K Independent Samples
Put the name(s) of the dependent variable(s) in the Test Variable List box. Click on the Define Range button to invoke the range dialog box, and put the minimum group number and maximum group number in the two boxes.
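The same comparison can be sketched in Python with scipy's Kruskal-Wallis function. The three arrays are the same fabricated stand-ins used above, not the cxfile.sav data.

```python
# Hedged sketch of the Kruskal-Wallis H test on fabricated stand-in data.
import numpy as np
from scipy import stats

left_before = np.array([2.1, 3.0, 2.5, 3.4, 2.8, 3.1])
left_after  = np.array([4.0, 4.4, 3.9, 4.8, 4.2, 4.5])
stayed      = np.array([3.8, 4.1, 4.6, 4.0, 4.3, 3.9])

# H is referred to the chi-square distribution with k - 1 = 2 df.
H, p = stats.kruskal(left_before, left_after, stayed)
print(f"H = {H:.3f}, df = 2, p = {p:.4f}")
```

Because every score in the "left before" group falls below every score in the other two groups, the ranks separate sharply and H is well into the rejection region of the chi-square(2) distribution.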
The Results

Kruskal-Wallis Test

Ranks
finaldest                         N      Mean Rank
 .00  Left Co. before Q given     107     74.59
1.00  Left Co. after Q given       49    142.67
2.00  Stayed w. Co.                51    128.55
Total                             207

Test Statistics(a,b)
               ovsat
Chi-Square     54.996
df             2
Asymp. Sig.    .000
a. Kruskal Wallis Test
b. Grouping Variable: finaldest

Ranks are assigned from smallest to largest, so group 0 appears to have the smallest scores. The Asymp. Sig. value of .000 is the probability of a chi-square value as large as the obtained value of 54.996 if the null hypothesis of equal distributions were true.

Alas, there are no post hoc tests of which I'm aware for the Kruskal-Wallis situation. Some investigators follow up with Mann-Whitney U tests, using that test as a substitute for a true post hoc test.

Chi-Square Analysis of a Dichotomous Dependent Variable
The research design employs two or more independent conditions (no pairing). The groups are distinguished by categories of an independent variable or factor.
The dependent variable is categorical. This test may be used when the DV is interval/ratio scaled or ordinal but you are uncomfortable with the numeric values. But you definitely should not categorize a variable that can be analyzed as a quantitative variable: categorize only in emergencies. Categorizing represents the most conservative assumption you can make about your dependent variable – that its values can only be classified into High and Low.

Hypotheses:
H0: The percentages in each category are equal across populations.
H1: At least one inequality is present.

Test Statistic: the two-way chi-square. If the null is true, its probability distribution is the chi-square distribution with degrees of freedom equal to the product (No. of DV categories - 1) x (No. of groups - 1).

Example problem
Same problem as above. Each OVSAT score was categorized as 0 if it was less than or equal to the median of all the OVSAT scores, or 1 if it was greater than the overall median. This is called performing a median split.
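A median split followed by a two-way chi-square can be sketched in Python. The `ovsat` and `finaldest` arrays below are fabricated stand-ins for the cxfile.sav variables (20 cases per group rather than the real 107/49/51).

```python
# Hedged sketch: median split plus chi-square test of independence,
# on fabricated stand-in data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
finaldest = np.repeat([0, 1, 2], [20, 20, 20])
ovsat = np.concatenate([rng.normal(3.0, 1.0, 20),    # left before
                        rng.normal(4.5, 1.0, 20),    # left after
                        rng.normal(4.2, 1.0, 20)])   # stayed

# Median split: 0 if at or below the overall median, 1 if above.
satgroup = (ovsat > np.median(ovsat)).astype(int)

# Build the 2 x 3 crosstabulation and test independence;
# df = (2 - 1) * (3 - 1) = 2.
table = np.array([[np.sum((satgroup == s) & (finaldest == g))
                   for g in (0, 1, 2)] for s in (0, 1)])
chi2, p, df, expected = stats.chi2_contingency(table)
print(table)
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p:.4f}")
```

The degrees of freedom come out as 2, matching the (categories - 1) x (groups - 1) rule stated above.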
The categorized variable is called SATGROUP.

frequencies variable=ovsat /sta=median.

Frequencies

Statistics
ovsat
N  Valid      207
   Missing      0
Median       3.8571

recode ovsat (lowest thru 3.8571=0)(else=1) into satgroup.
frequencies variable=satgroup.

satgroup
              Frequency   Percent   Valid Percent   Cumulative Percent
Valid  .00      101        48.8       48.8            48.8
       1.00     106        51.2       51.2           100.0
       Total    207       100.0      100.0

Specifying the analysis
Analyze -> Descriptive Statistics -> Crosstabs
Put the dependent variable in the Row(s) box. Put the independent variable in the Column(s) box. Click on the "Cells" button and check "Column" percentages. Click on the "Statistics" button and check the Chi-square box.

The Results

Crosstabs
All three tests – analysis of variance, Kruskal-Wallis, and chi-square – led to the same conclusion, suggesting that there are significant differences among the satisfaction scores of the three groups. It appears that members of group 0 – those who had left prior to the interview – were generally least satisfied.

One-Way Repeated Measures ANOVA
In Mike Clark's thesis, three versions of the Big Five questionnaire were given to participants under three instructional conditions:
1) Honest: Respond honestly.
2) Dollar: Respond honestly, but participants who score highest will be entered into a drawing.
3) Instructed: Respond to maximize your chances of obtaining a customer service job.
These three conditions are called the Honest, Dollar, and Instructed – H, D, and I – conditions, respectively. The question here concerns the mean score on the Conscientiousness scale across the three conditions. If the participants were not paying attention to the instructions, we'd expect the means to be equal. But if participants faked in the second two conditions, we'd expect an increase in Conscientiousness scores across the three conditions.
The data are in G:\MdbR\Clark\ClarkDataFiles\ClarkAndNewDataCombined070710.sav. The data should be in 3 columns – an H column, a D column, and an I column.

Analysis
Menu sequence: Analyze -> General Linear Model -> Repeated Measures
Enter a name for the repeated measures factor, enter the number of levels of the factor, and click the [Add] button. Then highlight the name of each of the variables to be included in the analysis and click on the [>] button.
Click on the [Plots] button in the main dialog box and put the name of the repeated measures factor as the Horizontal Axis of the plot. Click on the [Options] button in the main dialog box and check the three boxes shown below.

The Output

General Linear Model
[DataSet1] G:\MdbR\Clark\ClarkDataFiles\ClarkAndNewDataCombined070710.sav

Within-Subjects Factors
Measure: MEASURE_1
condit   Dependent Variable
1        hc
2        dc
3        ic

Descriptive Statistics
      Mean     Std. Deviation   N
hc    4.4029    .92630          249
dc    4.7979   1.05333          249
ic    5.4779    .96747          249

The GLM procedure first prints Multivariate Tests of the hypothesis of no difference between means. These tests are robust with respect to violations of the various assumptions of the analysis, although less powerful than the tests below if those tests meet the assumptions.

Multivariate Tests(c)
Effect: condit
                     Value   F          Hyp. df   Error df   Sig.   Partial Eta Sq.   Noncent.   Observed Power(a)
Pillai's Trace       .471    110.015b   2.000     247.000    .000   .471              220.031    1.000
Wilks' Lambda        .529    110.015b   2.000     247.000    .000   .471              220.031    1.000
Hotelling's Trace    .891    110.015b   2.000     247.000    .000   .471              220.031    1.000
Roy's Largest Root   .891    110.015b   2.000     247.000    .000   .471              220.031    1.000
a. Computed using alpha = .05
b. Exact statistic
c. Design: Intercept; Within Subjects Design: condit

Mauchly's test should be nonsignificant.
If it is significant, as it is below, then the most powerful test, labeled "Sphericity Assumed" below, should not be reported.

Mauchly's Test of Sphericity(b)
Measure: MEASURE_1
Within Subjects Effect: condit
                                                            Epsilon(a)
Mauchly's W   Approx. Chi-Square   df   Sig.   Greenhouse-Geisser   Huynh-Feldt   Lower-bound
.909          23.571               2    .000   .917                 .923          .500
Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.
a. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table.
b. Design: Intercept; Within Subjects Design: condit

Since Mauchly's test was significant, only the last 3 tests below should be used. It happens, though, that for these data all tests give the same result, so in this particular case it doesn't make a difference.

Tests of Within-Subjects Effects
Measure: MEASURE_1
Source                              Type III SS   df        Mean Square   F         Sig.   Partial Eta Sq.   Noncent.   Observed Power(a)
condit       Sphericity Assumed     147.241       2          73.620       121.599   .000   .329              243.197    1.000
             Greenhouse-Geisser     147.241       1.833      80.321       121.599   .000   .329              222.909    1.000
             Huynh-Feldt            147.241       1.846      79.756       121.599   .000   .329              224.487    1.000
             Lower-bound            147.241       1.000     147.241       121.599   .000   .329              121.599    1.000
Error(condit) Sphericity Assumed    300.297       496          .605
             Greenhouse-Geisser     300.297       454.622      .661
             Huynh-Feldt            300.297       457.840      .656
             Lower-bound            300.297       248.000     1.211
a. Computed using alpha = .05

Tests of Within-Subjects Contrasts (ignore this table for this class)
Measure: MEASURE_1
Source          condit      Type III SS   df    Mean Square   F         Sig.   Partial Eta Sq.   Noncent.   Observed Power(a)
condit          Linear      143.866       1     143.866       217.487   .000   .467              217.487    1.000
                Quadratic     3.374       1       3.374         6.142   .014   .024                6.142     .695
Error(condit)   Linear      164.050       248      .661
                Quadratic   136.246       248      .549
a. Computed using alpha = .05

Tests of Between-Subjects Effects (ignore this table for this situation)
Measure: MEASURE_1
Transformed Variable: Average
Source      Type III SS   df    Mean Square   F           Sig.   Partial Eta Sq.   Noncent.(a)   Observed Power
Intercept   17883.568     1     17883.568     10565.248   .000   .977              10565.248     1.000
Error         419.784     248       1.693
a. Computed using alpha = .05

Profile Plots
Again, worth 1000 words. Mean Conscientiousness scores increased significantly from the 1 (Honest) to the 2 (Dollar) to the 3 (Instructed) condition. The participants responded to the instructions in the expected fashion.
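The opening table listed the Friedman ANOVA as the repeated-measures counterpart for ordinal or skewed data. As a closing sketch, here is how that test could be run in Python on fabricated H, D, and I columns (one row per simulated participant, with means shifted to mimic the increase seen in Clark's data; these are not the ClarkAndNewDataCombined070710.sav scores).

```python
# Hedged sketch of the Friedman test on fabricated repeated-measures data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 30
hc = rng.normal(4.4, 0.9, n)          # Honest condition
dc = hc + rng.normal(0.4, 0.5, n)     # Dollar condition, slightly higher
ic = hc + rng.normal(1.1, 0.5, n)     # Instructed condition, highest

# Friedman's chi-square is referred to the chi-square distribution
# with k - 1 = 2 degrees of freedom.
chi2, p = stats.friedmanchisquare(hc, dc, ic)
print(f"Friedman chi-square = {chi2:.3f}, df = 2, p = {p:.4f}")
```

Because the same simulated participants supply all three scores, the test ranks the three conditions within each row, paralleling the within-subjects logic of the repeated measures ANOVA above.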