Chi-square, Goodness of fit, and Contingency Tables

What is the χ² distribution?

  The distribution of a sum of squared standard normal deviates; basically, a distribution of squared differences.
  Useful for detecting categorical differences.

  Calculate the test statistic: χ² = Σ (observed − expected)² / expected, summed over all categories.
  Degrees of freedom = number of categories − 1.
  Look up the χ² value for that number of degrees of freedom and the chosen alpha value.
  If the test statistic is greater than the table value, the result is significant.

  1. Two-sided test:
     a. Find the column corresponding to α/2 in the table of upper critical values and reject the null hypothesis if the test statistic is greater than the tabled value.
     b. Use 1 − α/2 in the table of lower critical values and reject the null hypothesis if the test statistic is less than the tabled value.
  2. Upper one-sided test: find the column corresponding to α in the table of upper critical values. If the test statistic is greater, reject the null hypothesis.

Also useful for model fitting

  Assume you have fit a model to some data and have some residual errors left over.
  You want to check whether the residuals are normally distributed.
  Bin the residuals in a histogram.
  Estimate the proportion of residuals expected in each bin under a normal distribution, and compare with the counts actually observed.

Model Fitting Example

  Consider a classic genetics experiment. The offspring of a cross between the F1 brassicas were 53 dark green and 11 yellow. If the F1 plants are heterozygous for color, a ratio of 3 dark green to 1 yellow is expected.

                          Dark green      Yellow          Total
  Observed numbers (O)    53              11              64
  Expected numbers (E)    48              16              64
  O − E                   5               −5              0
  (O − E)²                25              25
  (O − E)² / E            25/48 = 0.52    25/16 = 1.56    2.08

  With df = 2 − 1 = 1, χ² = 2.08 is below the critical value of 3.84 at α = 0.05, so the 3:1 ratio is not rejected.

Compound Hypotheses and Directionality

  With multiple categories, compound hypotheses are possible.
  H0: Pr(cat 1) = 0.25, Pr(cat 2) = 0.50 and Pr(cat 3) = 0.25
  HA: at least one of the above is not the case.
  Where there are only 2 categories, a "directional alternative" is possible.

Directional Alternatives

  Possible only in the case of "dichotomous variables", that is, variables with two categories.
  Step 1: Check whether the data deviate from H0 in the direction specified by HA.
     If not, the p-value is > 0.5 by necessity.
     If so, proceed to step 2.
  Step 2: The p-value is half what it would be if HA were nondirectional.

Directional Alternative Example

  The records of two football teams are compared against the average number of wins by an NFL team per year, 9.
  Team 1 won 14 games this year, and several players were caught doping with HGF.
  Team 2 won 11 games this year and tested clean.
  Is there evidence that doping increased the number of wins by team 1?

Contingency Tables

  Use the χ² test statistic as above, but calculate the expected value for each cell in the table from
  E = (row total) × (column total) / (grand total).
  For a 2x2 table, df = 1.

2x2 Contingency Tables

  Can represent either:
     two independent samples with one dichotomous observed variable, or
     one sample with two dichotomous observed variables.

                  Female    Male    Row total
  HIV test        9         8       17
  No HIV test     52        51      103
  Column total    61        59      120

Relation to Independence of Data

  You can interpret contingency tables in terms of conditional probabilities. From the table above:
     Pr(HIV test | female) = 9/61
     Pr(female | HIV test) = 9/17
  The test becomes H0: the likelihood of taking an HIV test is independent of sex.

r x k Contingency Tables

  Same as above, but for a table with r rows and k columns, degrees of freedom = (r − 1) × (k − 1).

Corrections to the Chi-Squared Test

  A chi-squared test must be applied to discrete data. Counts are appropriate; continuous measurements are not.
  The test statistic is computed from discrete counts but compared against a continuous distribution. This approximation distorts the p-value and may make false positives more likely.
  Frank Yates proposed a correction to the chi-squared formula: subtract 0.5 from each |O − E| before squaring.
  This tends to increase the p-value and makes the test more conservative, so false positives become less likely.
  However, the test may now be *too* conservative.
  Additionally, the chi-squared test should not be used when the expected count in any cell is below 5.
  At times it is acceptable to pad an empty cell with a small value, since one can only assume the result would be even more significant with the cell left empty.
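
The goodness-of-fit and contingency-table calculations described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the method used to prepare these slides. The function names (chi_square_statistic, p_value_df1, expected_counts) are hypothetical helpers introduced here. The p-value helper is valid only for df = 1, where the upper-tail χ² probability equals erfc(sqrt(χ²/2)). Clamping the Yates-corrected difference at zero is a choice made for this sketch.

```python
import math


def chi_square_statistic(observed, expected, yates=False):
    """Return the sum of (O - E)^2 / E over all cells.

    With yates=True, 0.5 is subtracted from each |O - E| before squaring
    (clamped at zero: an assumption of this sketch).
    """
    total = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e)
        if yates:
            diff = max(diff - 0.5, 0.0)
        total += diff ** 2 / e
    return total


def p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic, valid for df = 1 only."""
    return math.erfc(math.sqrt(stat / 2.0))


def expected_counts(table):
    """Expected cell counts: (row total) * (column total) / (grand total)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Goodness of fit: brassica offspring against a 3:1 ratio.
observed = [53, 11]                      # dark green, yellow
n = sum(observed)
expected = [n * 3 / 4, n * 1 / 4]        # 48 and 16
gof_stat = chi_square_statistic(observed, expected)
gof_p = p_value_df1(gof_stat)            # df = 2 categories - 1 = 1

# 2x2 contingency table: HIV test (rows) by sex (columns).
table = [[9, 8],
         [52, 51]]
exp_table = expected_counts(table)
flat_obs = [cell for row in table for cell in row]
flat_exp = [cell for row in exp_table for cell in row]
df = (len(table) - 1) * (len(table[0]) - 1)
ct_stat = chi_square_statistic(flat_obs, flat_exp)
ct_yates = chi_square_statistic(flat_obs, flat_exp, yates=True)
ct_p = p_value_df1(ct_stat)

print(f"goodness of fit: chi2 = {gof_stat:.2f}, p = {gof_p:.3f}")
print(f"contingency: df = {df}, chi2 = {ct_stat:.4f}, p = {ct_p:.3f}")
print(f"contingency with Yates correction: chi2 = {ct_yates:.4f}")
```

The goodness-of-fit statistic reproduces the 2.08 from the brassica table. For tables with more than one degree of freedom the p-value helper does not apply; a statistics library or a printed table of critical values would be needed instead.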
