Chi Square Tests
Chapter 17

Nonparametric Statistics
• A special class of hypothesis tests
• Used when assumptions for parametric tests are not met
– Review: What are the assumptions for parametric tests?

Assumptions for Parametric Tests
• Dependent variable is a scale variable (interval or ratio)
– If the dependent variable is ordinal or nominal, it is a nonparametric test
• Participants are randomly selected
– If there is no randomization, it is a nonparametric test
• The underlying population distribution is normal
– If the shape is not normal, it is a nonparametric test

When to Use Nonparametric Tests
• When the dependent variable is nominal
– What are ordinal, nominal, interval, and ratio scales of measurement?
• Used when either the dependent or independent variable is ordinal
• Used when the sample size is small
• Used when underlying population is not normal

Limitations of Nonparametric Tests
• Cannot easily use confidence intervals or effect sizes
• Have less statistical power than parametric tests
• Nominal and ordinal data provide less information
• More likely to commit type II error
– Review: What is type I error? Type II error?

Chi-Square Test for Goodness-of-Fit
• Nonparametric test when we have one nominal variable
– These variables, also called "attribute variables" or "categorical variables," classify observations into a small number of categories. A good rule of thumb is that an individual observation of a nominal variable is usually a word, not a number.
– Examples of nominal variables include sex (the possible values are male or female), genotype (values are AA, Aa, or aa), or ankle condition (values are normal, sprained, torn ligament, or broken).

Chi-Square Test for Goodness-of-Fit
• Nonparametric test when we have one nominal variable
– Measurement v. Nominal: Imagine recording each observation in a lab notebook.
If you record a number (width, height, speed, errors), it's a measurement; if you record a label (sex, popularity, beauty), it's nominal.

Examples of When to Use Chi-Square
• The observed counts in each category are compared with the expected counts, which are calculated from a theoretical expectation, such as a 1:1 sex ratio or the roughly 4:2:1 ratio in the following example.
• Example: Consider an area of shore with 59% of its area covered in sand, 28% in mud, and 13% in rocks (roughly 4:2:1). If seagulls were standing in random places, your null hypothesis would be that 59% of the seagulls were standing on sand, 28% on mud, and 13% on rocks.

Examples of Chi-Square
• Does the count of the Observed match the count of the Expected?
• Mendel crossed peas that were heterozygotes for Smooth/wrinkled, where Smooth is dominant. The expected ratio in the offspring is 3 Smooth : 1 wrinkled. He observed 423 Smooth and 133 wrinkled.
• The expected frequency of Smooth is calculated by multiplying the sample size (556) by the expected proportion (0.75) to yield 417. The same is done for wrinkled (556 × 0.25) to yield 139. The number of degrees of freedom when an extrinsic hypothesis is used is the number of values of the nominal variable minus one. In this case, there are two values (Smooth and wrinkled), so there is one degree of freedom.
• The result is chi-square = 0.35, 1 d.f., P = 0.557, indicating that the null hypothesis cannot be rejected; there is no significant difference between the observed and expected frequencies.

Examples of Chi-Square
• Does the count of the Observed match the count of the Expected?
• Mannan and Meslow (1984) studied bird foraging behavior in a forest in Oregon. In a managed forest, 54% of the canopy volume was Douglas fir, 40% was ponderosa pine, 5% was grand fir, and 1% was western larch.
They made 156 observations of foraging by red-breasted nuthatches: 70 observations (45% of the total) in Douglas fir, 79 (51%) in ponderosa pine, 3 (2%) in grand fir, and 4 (3%) in western larch. The biological null hypothesis is that the birds forage randomly, without regard to what species of tree they're in; the statistical null hypothesis is that the proportions of foraging events are equal to the proportions of canopy volume. The difference in proportions is significant (chi-square = 13.593, 3 d.f., P = 0.0035).

How the test works
• The test statistic is calculated by taking an observed number (O), subtracting the expected number (E), then squaring this difference. The larger the deviation from the null hypothesis, the larger the difference between observed and expected.

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

• Squaring the differences makes them all positive. Each squared difference is divided by the expected number, and these standardized ratios are summed: the larger the differences between what you expect and what you observe, the larger the statistic.

Chi-Square Test for Goodness-of-Fit
• The six steps of hypothesis testing
Question: Are the best soccer players born early rather than later in the year?
1. Identify
2. State the hypotheses
3. Characteristics of the comparison distribution
4. Critical values
5. Calculate
6. Decide

Chi-Square Test for Goodness-of-Fit
• The six steps of hypothesis testing
1. Identify Pop. Distribution & Assumptions
a) Two populations: one whose distribution matches the expected outcomes and one whose distribution matches the observed outcomes. E.g., great soccer players born evenly throughout the year versus great soccer players born in the first half of the year.
b) Comparison distribution is chi-square
c) Assumptions: First, the variable of interest (birth month) is nominal. Second, observations are independent; that is, each observation fits in only one category (no soccer player has two birth months). Third, random selection from the population (in this case, they are only Germans, and only elite players).
Fourth, a large enough sample size, ideally at least 5 times the number of cells (in this case, N = 56 > 10 = 2 × 5).

Chi-Square Test for Goodness-of-Fit
• State the hypotheses: Does the Observed count of elite soccer player birth months match the Expected count of elite soccer player birth months?
• Null: Match
• Alternative: No match

Chi-Square Test for Goodness-of-Fit
• Characteristics of the comparison distribution

$$df_{\chi^2} = k - 1$$
$$df_{\chi^2} = 2 - 1 = 1$$

• Only two categories of soccer players

Chi-Square Test for Goodness-of-Fit
• Critical values

Chi-Square Test for Goodness-of-Fit
• Calculate

Making a Decision

A more typical Chi-Square
• Evenly divided expected frequencies
– Can you think of examples where you would expect evenly divided expected frequencies in the population?
• Chi-square test for independence
– Analyzes 2 nominal variables
– The six steps of hypothesis testing
1. Identify
2. State the hypotheses
3. Characteristics of the comparison distribution
4. Critical values
5. Calculate
6. Decide

The Cutoff for a Chi-Square Test for Independence

The Decision

Cramer's V (phi)
• The effect size for chi-square test for independence

$$V = \sqrt{\frac{\chi^2}{(N)(df_{row/column})}}$$

where $df_{row/column}$ is the smaller of the row and column degrees of freedom.

Graphing Chi-Squared Percentages

Relative Risk
• We can quantify the size of an effect with chi square through relative risk, also called relative likelihood.
• By making a ratio of two conditional proportions, we can say, for example, that one group is three times as likely to show some outcome or, conversely, that the other group is one-third as likely to show that outcome.

Adjusted Standardized Residuals
• The difference between the observed frequency and the expected frequency for a cell in a chi-square research design, divided by the standard error; also called adjusted residual.

Formulae

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
$$df_{row} = k_{row} - 1$$
$$df_{column} = k_{column} - 1$$
$$df_{\chi^2} = (df_{row})(df_{column})$$

Determining the Cutoff for a Chi-Square Statistic
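
As a rough illustration of how a cutoff can be determined, and of the goodness-of-fit arithmetic in the pea and nuthatch examples earlier in the deck, here is a minimal Python sketch using only the standard library. The function names (`chi_square_statistic`, `chi_square_p_value`, `chi_square_cutoff`) are illustrative and do not come from the slides. The p-value uses the closed-form upper tail of the chi-square distribution for whole-number degrees of freedom, and the cutoff is found by simple bisection rather than read from a printed table, so treat this as a sketch under those assumptions and not as the course's method.

```python
import math


def chi_square_statistic(observed, expected_proportions):
    """Goodness-of-fit chi-square: sum of (O - E)^2 / E, with df = k - 1."""
    n = sum(observed)
    expected = [n * p for p in expected_proportions]
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi_sq, len(observed) - 1


def chi_square_p_value(chi_sq, df):
    """Upper-tail probability P(X >= chi_sq) for a positive integer df."""
    half = chi_sq / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum (x/2)^(i - 1/2) / Gamma(i + 1/2)
    terms = sum(
        half ** (i - 0.5) / math.gamma(i + 0.5)
        for i in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms


def chi_square_cutoff(df, alpha=0.05):
    """Critical value: the chi-square whose upper-tail probability is alpha."""
    lo, hi = 0.0, 1.0
    while chi_square_p_value(hi, df) > alpha:  # widen until the tail is below alpha
        hi *= 2.0
    for _ in range(100):  # bisection on the monotonically decreasing tail
        mid = (lo + hi) / 2.0
        if chi_square_p_value(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    examples = {
        # Mendel's peas: 423 Smooth, 133 wrinkled, expected 3:1
        "peas": ([423, 133], [0.75, 0.25]),
        # Nuthatch foraging vs. canopy volume (Mannan and Meslow 1984)
        "nuthatches": ([70, 79, 3, 4], [0.54, 0.40, 0.05, 0.01]),
    }
    for name, (observed, proportions) in examples.items():
        chi_sq, df = chi_square_statistic(observed, proportions)
        p = chi_square_p_value(chi_sq, df)
        cutoff = chi_square_cutoff(df, alpha=0.05)
        decision = "reject" if chi_sq > cutoff else "fail to reject"
        print(f"{name}: chi-square={chi_sq:.3f}, df={df}, P={p:.4f}, "
              f"cutoff={cutoff:.3f} -> {decision} the null hypothesis")
```

Comparing the statistic with the cutoff is the "Decide" step: the null hypothesis is rejected only when the calculated chi-square exceeds the critical value for the chosen alpha and degrees of freedom.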