Week 10
Chapter 10 - Hypothesis Testing III: The Analysis of Variance (ANOVA)
Chapter 11 - Hypothesis Testing IV: Chi Square

Chapter 10: Hypothesis Testing III - The Analysis of Variance (ANOVA)

In This Presentation
- The basic logic of analysis of variance (ANOVA)
- A sample problem applying ANOVA
- The five-step model
- Limitations of ANOVA
- Post hoc techniques

How This Chapter Extends Earlier Tests
- Chapter 8 (one sample): Population = Penn State University; Group 1 = all education majors; random sample (RS) of 100 education majors
- Chapter 9 (two samples): Population = Pennsylvania; Group 1 = all males (RS of 100 males); Group 2 = all females (RS of 100 females)
- This chapter (three or more samples): Population = Pennsylvania; Group 1 = all Protestants (RS of 100 Protestants); Group 2 = all Catholics (RS of 100 Catholics); Group 3 = all Jews (RS of 100 Jews)

Basic Logic
- ANOVA can be used in situations where the researcher is interested in the differences in sample means across three or more categories
- Examples:
  - How do Protestants, Catholics, and Jews vary in terms of number of children?
  - How do Republicans, Democrats, and Independents vary in terms of income?
  - How do older, middle-aged, and younger people vary in terms of frequency of church attendance?

Basic Logic
ANOVA is used when:
- The independent variable has more than two categories
- The dependent variable is measured at the interval or ratio level

Basic Logic
- ANOVA can be thought of as an extension of the t test to more than two groups; the t test can be used only when the independent variable has exactly two categories
- ANOVA asks: "are the differences between the samples large enough to reject the null hypothesis and justify the conclusion that the populations represented by the samples are different?" (p.
243)
- The null hypothesis is that the population means are all the same: H0: μ1 = μ2 = μ3 = ... = μk

Basic Logic
- If H0 is true, the sample means should be about the same value, with little difference between them
- If H0 is false, there should be substantial differences between categories combined with relatively little difference within categories: large differences between sample means together with small sample standard deviations

Basic Logic
- The larger the differences between the sample means, the more likely H0 is false, especially when there is little difference within categories
- When we reject H0, we are saying there are differences between the populations represented by the samples

Example 1
- We have administered the support-for-capital-punishment scale to a sample of 20 people who are equally divided across five religious categories

Step 1: Make assumptions and meet test requirements
- Independent random samples
- Interval-ratio level of measurement
- Normally distributed populations
- Equal population variances

Step 2: State the null hypothesis
- H0: μ1 = μ2 = μ3 = μ4 = μ5
- H1: At least one of the population means is different

Step 3: Select the sampling distribution and establish the critical region
- Sampling distribution = F distribution
- Alpha = 0.05
- dfw = 15, dfb = 4
- F(critical) = 3.06

Step 4: Compute the test statistic
- F(obtained) = 2.57

Step 5: Make a decision and interpret the results
- F(critical) = 3.06; F(obtained) = 2.57
- The test statistic does not fall in the critical region, so we fail to reject the null hypothesis: support for capital punishment does not differ across the populations of religious affiliations
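The arithmetic behind an F ratio like the one in Example 1 can be sketched in Python. The scores below are hypothetical (the slides do not list the raw data), but the layout matches Example 1: five groups of four cases each, so dfb = 4 and dfw = 15. The helper `one_way_anova` is an illustrative name, not a library function.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of interval-ratio samples."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Sum of squares between: weighted squared distance of group means from grand mean
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Sum of squares within: squared distance of each score from its own group mean
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

    df_between = k - 1            # number of categories minus 1
    df_within = n_total - k       # number of cases minus number of categories
    f = (ssb / df_between) / (ssw / df_within)
    return f, df_between, df_within

# Hypothetical capital-punishment scores for 5 religious categories, 4 cases each
groups = [
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [3, 4, 5, 6],
    [4, 5, 6, 7],
    [5, 6, 7, 8],
]
f, dfb, dfw = one_way_anova(groups)   # dfb = 4, dfw = 15, as in Step 3
```

With these made-up scores F works out to 6.0, which would exceed F(critical) = 3.06; the actual Example 1 data yield F(obtained) = 2.57, so the decision there is to fail to reject H0.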
Limitations of ANOVA
1. Requires interval-ratio level measurement of the dependent variable and roughly equal numbers of cases in the categories of the independent variable
2. Statistically significant differences are not necessarily important
3. The alternative (research) hypothesis is not specific: it asserts only that at least one of the population means differs from the others; use post hoc techniques to locate the specific differences

Using SPSS
- On the top menu, click "Analyze"
- Select "Compare Means"
- Select "One-Way ANOVA"
- (Analyze / Compare Means / One-Way ANOVA, then complete the ANOVA dialog box)

ANOVA output

                 Sum of Squares   df   Mean Square       F    Sig.
Between Groups        309.600      2      154.800    14.221   .000
Within Groups         293.900     27       10.885
Total                 603.500     29

Chapter 11: Hypothesis Testing IV - Chi Square

In This Presentation
- Bivariate (crosstabulation) tables
- The basic logic of chi square
- The terminology used with bivariate tables
- The computation of chi square, with an example problem
- The five-step model
- Limitations of chi square

The Bivariate Table
- Bivariate tables display the scores of cases on two different variables at the same time
- Note the two dimensions: rows and columns. What is the independent variable? What is the dependent variable? Where are the row and column marginals? Where is the total number of cases (N)?

Chi Square
Chi square can be used:
- with variables measured at any level (nominal, ordinal, interval, or ratio)
- with variables that have many categories or scores
- when we don't know the shape of the population or sampling distribution

Basic Logic
- Independence: "Two variables are independent if the classification of a case into a particular category of one variable has no effect on the probability that the case will fall into any particular category of the second variable" (p.
274)

Basic Logic
- Chi square, as a test of statistical significance, is a test for independence
- Chi square is a test of significance based on bivariate, crosstabulation tables (also called crosstabs)
- We are looking for significant differences between the cell frequencies actually OBSERVED in a table (fo) and those that would be EXPECTED by random chance if the variables were independent (fe)

Computation of Chi Square: Example
- Research question: Is the probability of securing employment in the field of social work dependent on the accreditation status of the program?
- Null hypothesis: The probability of securing employment in the field of social work is NOT dependent on the accreditation status of the program (the variables are independent)
- Research hypothesis: The probability of securing employment in the field of social work is dependent on the accreditation status of the program (the variables are dependent)

Computation of Chi Square
- Expected frequency (fe) for the top-left cell:
  fe = (row marginal × column marginal) / N = (40 × 55) / 100 = 22

Step 1: Make assumptions and meet test requirements
- Independent random samples
- Level of measurement is nominal
- Note the minimal assumptions; in particular, no assumption is made about the shape of the sampling distribution.
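The expected-frequency formula can be sketched directly; the marginals 40 and 55 and N = 100 are the values given for the top-left cell of the accreditation table, and `expected_frequency` is an illustrative helper name.

```python
def expected_frequency(row_marginal, column_marginal, n):
    """Cell frequency expected if the two variables were independent."""
    return row_marginal * column_marginal / n

# Top-left cell of the accreditation table: fe = (40 * 55) / 100
fe = expected_frequency(40, 55, 100)   # 22.0
```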
The chi square test is nonparametric, or distribution-free.

Step 2: State the null hypothesis
- H0: The variables are independent (stated more consistently with previous tests: H0: fo = fe)
- H1: The variables are dependent (equivalently: H1: fo ≠ fe)

Step 3: Select the sampling distribution and establish the critical region
- Sampling distribution = chi square, χ2
- Alpha = 0.05
- df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1
- χ2(critical) = 3.841

Step 4: Calculate the test statistic
- χ2(obtained) = 10.78

Step 5: Make a decision and interpret the results of the test
- χ2(critical) = 3.841; χ2(obtained) = 10.78
- The test statistic falls in the critical region, so reject H0
- There is a significant relationship between employment status and accreditation status in the population from which the sample was drawn

Interpreting Chi Square
- The chi square test tells us only whether the variables are independent or not; it does not tell us the pattern or nature of the relationship
- To investigate the pattern, compute percentages within each column and compare across the columns

Computation of Chi Square: Second Example
- Are the homicide rate and volume of gun sales related for a sample of 25 cities? (Problem 11.4, p. 295)
- The bivariate table shows the relationship between homicide rate (columns) and gun sales (rows); this 2 x 2 table has 4 cells

Step 1: Make assumptions and meet test requirements
- Independent random samples
- Level of measurement is nominal
- Note the minimal assumptions; in particular, no assumption is made about the shape of the sampling distribution.
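The full chi-square computation for this second example can be sketched in Python; the observed counts are the ones shown in the percentage table later in the example (rows = gun sales, columns = homicide rate).

```python
# Observed counts: rows = gun sales (high, low), columns = homicide rate (low, high)
observed = [
    [8, 5],
    [4, 8],
]
n = sum(sum(row) for row in observed)              # 25 cities
row_totals = [sum(row) for row in observed]        # [13, 12]
col_totals = [sum(col) for col in zip(*observed)]  # [12, 13]

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n     # expected under independence
        chi_sq += (fo - fe) ** 2 / fe

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (2 - 1)(2 - 1) = 1
# chi_sq comes out to about 1.99, which the slides round to 2.00 --
# below the critical value of 3.841, so the decision is to fail to reject H0
```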
The chi square test is nonparametric, or distribution-free.

Step 2: State the null hypothesis
- H0: The variables are independent (equivalently: H0: fo = fe)
- H1: The variables are dependent (equivalently: H1: fo ≠ fe)

Step 3: Select the sampling distribution and establish the critical region
- Sampling distribution = χ2
- Alpha = 0.05
- df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1
- χ2(critical) = 3.841

Step 4: Calculate the test statistic
- χ2(obtained) = 2.00

Step 5: Make a decision and interpret the results of the test
- χ2(critical) = 3.841; χ2(obtained) = 2.00
- The test statistic is not in the critical region, so fail to reject H0
- There is no significant relationship between homicide rate and gun sales in the population from which the sample was drawn

Interpreting Chi Square
- Cities low on homicide rate were high in gun sales, and cities high in homicide rate were low in gun sales: as homicide rates increase, gun sales decrease
- We found this relationship not to be statistically significant, but it does have a clear pattern

                    Homicide Rate
Gun Sales        Low          High        Total
High          8 (66.7%)    5 (38.5%)       13
Low           4 (33.3%)    8 (61.5%)       12
Total        12 (100%)    13 (100%)        25

Limitations of Chi Square
1. Difficult to interpret when variables have many categories; works best when variables have four or fewer categories
2. With a small sample size, we cannot assume that the chi square sampling distribution will be accurate (small sample: a high percentage of cells have expected frequencies of 5 or less)
3. Like all tests of hypotheses, chi square is sensitive to sample size: as N increases, the obtained chi square increases, and with large samples trivial relationships may be statistically significant
- It is important to remember that statistical significance is not the same as substantive significance

Chi Square in SPSS
- Step 4: computing the test statistic in SPSS
- Step 5: making a decision and interpreting the results of the test

overweight_1 * urban Crosstabulation

                                     urban
                                 0        1      Total
overweight_1  0   Count         329      468       797
                  Expected    385.7    411.3     797.0
              1   Count         155       48       203
                  Expected     98.3    104.7     203.0
Total             Count         484      516      1000
                  Expected    484.0    516.0    1000.0

Chi-Square Tests

                                Value    df   Asymp. Sig. (2-sided)
Pearson Chi-Square            79.699b    1           .000
Continuity Correction(a)      78.301     1           .000
Likelihood Ratio              82.696     1           .000
Fisher's Exact Test                           Exact Sig. (2-sided) .000; (1-sided) .000
Linear-by-Linear Association  79.619     1           .000
N of Valid Cases                1000

a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 98.25.

The Pearson Chi-Square value is χ2(obtained).

Chi Square in SPSS: Symmetric Measures

                                            Value   Approx. Sig.
Nominal by Nominal  Contingency Coefficient  .272       .000
N of Valid Cases                             1000

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

The nominal symmetric measures indicate both the strength and the significance of the relationship between the row and column variables of a crosstabulation.
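As a check on the SPSS output, the Pearson chi square and the contingency coefficient can be recomputed by hand from the observed counts in the crosstabulation. This is a sketch assuming the standard contingency-coefficient formula C = sqrt(χ2 / (χ2 + N)).

```python
import math

# Observed counts from the overweight_1 * urban crosstabulation
observed = [
    [329, 468],   # overweight_1 = 0: urban 0, urban 1
    [155, 48],    # overweight_1 = 1
]
n = sum(sum(row) for row in observed)              # 1000 valid cases
row_totals = [sum(row) for row in observed]        # [797, 203]
col_totals = [sum(col) for col in zip(*observed)]  # [484, 516]

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n     # e.g. 797 * 484 / 1000 = 385.748
        chi_sq += (fo - fe) ** 2 / fe
# chi_sq matches the Pearson Chi-Square of 79.699 in the SPSS table

contingency_coefficient = math.sqrt(chi_sq / (chi_sq + n))   # about .272
```

Reproducing both the test statistic and the symmetric measure from the raw cell counts is a useful way to confirm you are reading the SPSS output correctly.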