The chi-squared distribution, the goodness-of

The 2 test • Sections 19.1 and 19.2 of Howell • This section actually includes 2 totally separate tests • goodness-of-fit test • contingency table analysis • Each has its own point, and requires different things • Only thing in common - same formula • Keep them separate in your mind! 1 Return to hypothesis testing • We can test statistical significance, no prob • need p and alpha (and a computer) • Sometimes, no computer available • can use tables to test statistical significance • Little more work, but works just as well • This method uses the same logic as the p value method 2 Testing Ho without a PC • • • • • The strategy (new stuff is underlined) Step1: Set up Ho, Ha and decide on alpha Step 2: Calculate the statistic and df Step 3: Get the critical value from the table Step 4: Compare critical value to statistic 3 4 Step 1 • Set up Ho and alpha - already know • Ha - the alternative hypothesis • If Ho is false, what do we believe then? (Ha) • Ha represents the opposite of Ho • eg. if Ho: r = 0 then Ha: r  0 • If we reject Ho (because its false), then we must accept Ha as being true. 5 Step 2 • Nothing different • use appropriate formulas for stat and df! Step 3 • Get the critical value • from the table (back of Howell) • Use alpha and df to look it up • Critical value: the value of your statistic at which p = alpha (the edge of the rejection region) 6 Step 4 • Compare your stat to the crit value: • Ignore any minuses (look only at value) • If your calculated stat is more than the crit value, then p < alpha (ie. significance!) • The test is significant if calculated value is greater than the crit value • Reject the Ho, and accept the Ha. • Pretty easy! 7 8 Example • Lets use an r value: • We get r = 0.61 with df = 10, alpha = 0.05 • Is this significant? 
• Critical value: use df and alpha on table D2 in Howell (significant values of the correlation coefficient)
• for alpha = 0.05 and df = 10, critical value = 0.576

Example
• Now we have the calculated value and the critical value
• Calculated = 0.61
• Critical = 0.576
• Check:
• if calculated > critical, reject Ho
• 0.61 > 0.576, so we reject Ho
• The result is statistically significant!

Return to χ²
• Note: χ² only works with discrete data
• What is the point of χ²?
• Goodness-of-fit: used to see if data matches a hypothetical distribution
• Are there the same number of men as women?
• Are about 25% of South Africans unemployed?
• Contingency table analysis (independence test): used like a correlation for discrete data (are the variables related?)

Goodness-of-fit χ²
• Used to test a model distribution of data
• Have an idea of how data should be distributed
• e.g. There should be 60% blondes, 40% brunettes
• Collect data, check to see if our idea (model) is supported by the data
• Does the data fit the model?
• Before starting a goodness-of-fit test, always be sure of what the model is

Creating a model
• We put our expectations as percentages on a table
• One cell of the table for each possible value of the variable
• Each cell has the percentage of observations we expect

Example model
• We expect 40% brunettes, 60% blondes, so

Blondes      60%
Brunettes    40%

Observed scores and Expected scores
• Strategy: we want to see if our observation matches our model
• We collect some data (Observed scores)
• We work out what the data would look like if our model were correct (Expected scores)
• Compare the two: do the observed scores show the same pattern as the expected scores?
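
As a small illustration of the model table described above, here is a minimal Python sketch. The dictionary layout and the helper name `is_valid_model` are assumptions made for this example; they do not come from Howell or from the slides. The check that the percentages add up to 100 is also our addition, and follows from the table having one cell for every possible value of the variable.

```python
# Hypothetical representation of a goodness-of-fit model:
# one entry per possible value of the variable, each holding
# the percentage of observations we expect in that category.
model = {"Blondes": 60, "Brunettes": 40}


def is_valid_model(model):
    """Return True if no percentage is negative and they total 100."""
    percentages = list(model.values())
    no_negatives = all(p >= 0 for p in percentages)
    totals_100 = abs(sum(percentages) - 100) < 1e-9
    return no_negatives and totals_100


print(is_valid_model(model))
```

Checking the model before collecting data is one way to follow the advice to always be sure of what the model is.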
Converting the model to expected scores
• We have our model as percentages

Blondes      60%
Brunettes    40%

• We must now convert % to actual values (frequencies) - use n (number of observations)
• If we collected 134 observations, then

Blondes      (60/100) x 134 = 80.4
Brunettes    (40/100) x 134 = 53.6

Converting % to frequency
• To do this:
• (percentage / 100) x n
• Keep the decimals!
• You cannot work with % for χ² - you must have frequencies (number of observations)

Beginning the χ² analysis
• To begin, need Ho
• For χ², it is always “observed data = expected data”
• Need to state the model (in %)
• Collect the data
• Create an expected frequency table (using your model and n)
• Calculate χ² to see if the observed = expected

χ² formula
• χ² = Σ (O - E)² / E
• O = observed score
• E = expected score

χ² formula, step by step
• Step 1: for each category, take that category's O minus that category's E
• Step 2: for each category, square the result of step 1
• Step 3: for each category, take the result of step 2 and divide it by that category's E
• Step 4: sum all the step 3 results

Table method for χ²
• Use the following columns:

O    E    O-E    (O-E)²    (O-E)²/E

• Add up the last column to get χ²

Degrees of freedom (df)
• The df for goodness-of-fit tests is easy to calculate:
• df = k-1
• k is the number of possible values for your variable (categories)
• using males and females, k = 2
• using coke, pepsi, sprite, k = 3
• using easy, moderate, hard, awesome, k = 4

Worked example 1
• We suspect that there is a 50%/50% gender distribution at UCT. We observed 147 people: 68 male, 79 female. Do we really have a 50%/50% distribution?
• Set up (step 1)
• Ho: Distribution is 50%/50%
• Ha: Distribution is not 50%/50%
• alpha = 0.05

Example: work out expected scores
• (What would we have seen if Ho were true?)
• Model:
• Males 50%
• Females 50%
• Convert to scores
• n = 147
• Males expected: (50/100) x 147 = 73.5
• Females expected: (50/100) x 147 = 73.5

Example: O and E values
• Now we have our values

Value     O     E       O-E    (O-E)²    (O-E)²/E
Male      68    73.5
Female    79    73.5

Example - Work out the columns

Value     O     E       O-E     (O-E)²    (O-E)²/E
Male      68    73.5    -5.5    30.25     0.411
Female    79    73.5    5.5     30.25     0.411

Example - Add up the values in the last column

Value     O     E       O-E     (O-E)²    (O-E)²/E
Male      68    73.5    -5.5    30.25     0.411
Female    79    73.5    5.5     30.25     0.411
Sum                                       0.823

Example - df
• Now we have our χ² value: 0.823
• Is it statistically significant? (does the model explain the population?)
• Need the critical value for this!
• Degrees of freedom: k-1
• 2 categories (male, female)
• so df = 1

Example: critical value
• What is the critical value for our male/female example?
• df: k = 2 (male and female), so df = 1
• For df = 1 and alpha = 0.05, the table says:
• crit = 3.84
• To be significant, our value must be more than 3.84

Example: conclusions
• Calculated < critical
• (0.823 < 3.84), so we cannot reject Ho
• (this means: we keep “distribution is 50%/50%”)
• Conclusion: it seems that at UCT there are as many males as there are females.

Interpreting χ² findings
• χ² findings are interpreted a little differently
• Rejecting Ho (significance) means we cannot accept the model (the model is wrong for this population)
• Not rejecting Ho (non-significance) means we go on assuming that the model applies to this population
• This is the case for goodness-of-fit tests

Contingency table analysis with χ²
• Pearson's product-moment correlation allowed us to establish a relationship between 2 continuous variables
• doesn't work for discrete data (categories)
• E.g.
“is there a relationship between gender and owning a dog or cat?” (2 discrete variables)
• Contingency table analysis is used for this
• can work with nominal variables

Something old, something new
• Quite similar to goodness-of-fit tests
• Work out the expected values
• Use the chi square formula
• Work out df
• Get a critical value from the table
• Differences:
• Slightly different O table
• New way of working out expected values
• New way of working out df

Observed values
• For each person, we ask 2 questions (2 variables)
• “are you male/female” and “do you have a dog or a cat” (let's assume we sample only pet owners)
• We end up with:

Subject    Gender    Pet
1          M         D
2          M         C
3          F         D
etc.

O table
• We need to convert those data into a frequency table that looks like:

              GENDER
              Male    Female
PET    Dog
       Cat

Filling in the O table
• Each cell has only one number in it
• the number of people fitting that condition

              GENDER
              Male    Female
PET    Dog    1       2
       Cat    3       4

• In cell 1: number of people who are Male AND have a dog
• In cell 2: number of people who are Female AND have a dog
• In cell 3: number of people who are Male AND have a cat
• etc.

The finished O table
• An O table usually looks like:

              GENDER
              Male    Female
PET    Dog    36      34
       Cat    7       32

• We had 7 males with cats
• We had 34 females with dogs
• This table is a 2x2 table - 2 rows (pet) and 2 columns (gender)

Notes about O tables
• The numbers inside the cells are frequencies (just like goodness-of-fit)
• You can have as many levels of a variable as you like
• e.g. dog, cat, parakeet, moose, hamster, other (6 levels)
• BUT you can only have 2 variables
• e.g.
not gender, pet AND car type

E values
• Expected values are a bit more tricky
• We want to finish with an E table, of the same form as the O table

Expected    Male    Female
Dog
Cat

• Need to calculate a value for each cell
• We will use the O values to do this

E values, step by step
• Step 1: work out the grand total from the O table (N)
• Step 2: work out the marginal totals from the O table
• Step 3: use a formula (Ri x Cj / N) to get a value for each cell of the E table

Step 1: Grand total (N)
• How many people did we use?
• Same idea as the usual n
• called capital N (for some reason)
• To calculate: add up the numbers in all of the cells
• So in the gender/pet example:
• 36 + 34 + 7 + 32 = 109
• N = 109

Step 2: Marginal totals
• We can work out the totals of the margins of the O table
• The marginal totals are written on the edges of the O table

O        Male    Female    Total
Dog      36      34        70
Cat      7       32        39
Total    43      66

Step 2: Calculating marginals
• For each marginal, add up the numbers in that line, so:
• Dog row: 36 + 34 = 70
• Cat row: 7 + 32 = 39
• Male column: 36 + 7 = 43
• Female column: 34 + 32 = 66
• Do the rows AND the columns!

Step 3: Work out E table
• Write your marginals around your blank E table - in the right places!

E        Male    Female    Total
Dog                        70
Cat                        39
Total    43      66

• We will now use the marginals to compute one E value for each cell
• The formula for E: E = (Ri x Cj) / N

Step 3: Work out a single cell
• For each cell, look at the cell's row and column marginals (Ri and Cj)
• For Male/Dog: Ri = 70, Cj = 43
• E = (70 x 43) / 109 = 27.614
• Do the same for each cell

Ready to calculate χ²
• Now we have O and E, ready to calculate χ² (using the same formula as before)

O        Male      Female
Dog      36        34
Cat      7         32

E        Male      Female
Dog      27.614    42.385
Cat      15.385    23.614

Calculate χ²
• This is almost the same as for goodness-of-fit, but be careful in building your table (the O and the E columns)

Cell          O     E         O-E    (O-E)²    (O-E)²/E
Male/Dog      36    27.614
Female/Dog    34    42.385
Male/Cat      7     15.385
Female/Cat    32    23.614

Matching up the O and E columns
• Be careful!!
Each type of response has an O and an E - match up the correct ones!
• Male/Cat has O = 7 and E = 15.385
• Female/Dog has O = 34 and E = 42.385
• If you get the wrong E for an O, all your results are wrong!! Do it slowly.

Working out the table
• Step 1: O-E (go row by row, slowly)

O     E         O-E
36    27.614    8.385
34    42.385    -8.385
7     15.385    -8.385
32    23.614    8.385

Working out the table
• Step 2: square the differences

O     E         O-E       (O-E)²
36    27.614    8.385     70.3136
34    42.385    -8.385    70.3136
7     15.385    -8.385    70.3136
32    23.614    8.385     70.3136

Working out the table
• Step 3: divide the squares by E

O     E         O-E       (O-E)²     (O-E)²/E
36    27.614    8.385     70.3136    2.546
34    42.385    -8.385    70.3136    1.658
7     15.385    -8.385    70.3136    4.57
32    23.614    8.385     70.3136    2.977

Working out the table
• Step 4: sum the divisions to get chi squared

O     E         O-E       (O-E)²     (O-E)²/E
36    27.614    8.385     70.3136    2.546
34    42.385    -8.385    70.3136    1.658
7     15.385    -8.385    70.3136    4.57
32    23.614    8.385     70.3136    2.977
Sum                                  11.7528

Df for contingency tables
• Need to check the statistical significance of our chi square value!
• Use df and alpha (exactly as in goodness-of-fit)
• df = (R-1)(C-1)
• (number of rows - 1) x (number of columns - 1)
• In our example, R = 2 (2 rows) and C = 2 (2 columns)
• (2-1)(2-1) = (1)(1) = 1
• df = 1

Testing significance
• Look up the critical value in the chi square table using alpha and df
• if your calculated chi square is more than the critical value, then there is a relationship between the variables
• remember that Ho is “no relationship”
• if it is significant, then Ho is false - there is a relationship

Example: conclusion
• We calculated χ² to be 11.75
• df = 1, alpha set to 0.05
• Crit value = 3.84
• Calc > crit, so reject Ho
• There is a relationship between gender and pet ownership!
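
Both worked examples (the UCT gender goodness-of-fit test and the gender/pet contingency table) can be re-checked with a short Python sketch. This is only an illustration under stated assumptions: the function and variable names are made up for this example, and the critical value 3.84 is simply copied from the slides (df = 1, alpha = 0.05) instead of being looked up by the code.

```python
def chi_squared(observed, expected):
    """Sum (O - E)^2 / E over matched O and E values."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_from_model(percentages, n):
    """Goodness-of-fit: (percentage / 100) x n for each category."""
    return [(p / 100) * n for p in percentages]


def expected_from_table(table):
    """Contingency table: E = (Ri x Cj) / N for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Critical value quoted in the slides for df = 1, alpha = 0.05
# (hard-coded assumption, not computed here).
CRITICAL_VALUE = 3.84

# Goodness-of-fit: 68 males and 79 females against a 50%/50% model.
gof_observed = [68, 79]
gof_expected = expected_from_model([50, 50], sum(gof_observed))
gof_chi2 = chi_squared(gof_observed, gof_expected)
gof_df = len(gof_observed) - 1  # df = k - 1

# Contingency table: rows are pet (dog, cat), columns are gender (male, female).
pets_observed = [[36, 34], [7, 32]]
pets_expected = expected_from_table(pets_observed)
# Flatten both tables in the same order so each O is matched with its own E.
flat_o = [o for row in pets_observed for o in row]
flat_e = [e for row in pets_expected for e in row]
pets_chi2 = chi_squared(flat_o, flat_e)
pets_df = (len(pets_observed) - 1) * (len(pets_observed[0]) - 1)  # (R-1)(C-1)

print("goodness-of-fit:", round(gof_chi2, 3), "df =", gof_df,
      "significant:", gof_chi2 > CRITICAL_VALUE)
print("contingency:", round(pets_chi2, 2), "df =", pets_df,
      "significant:", pets_chi2 > CRITICAL_VALUE)
```

The code keeps full precision in the E values, so its contingency result can differ from the hand-calculated 11.7528 in the last decimal places; the conclusion is the same.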
