The chi-squared distribution, the goodness-of

The χ² test
• Sections 19.1 and 19.2 of Howell
• This section actually covers 2 completely separate tests:
• goodness-of-fit test
• contingency table analysis
• Each has its own purpose and requires different things
• The only thing they have in common is the formula
• Keep them separate in your mind!
Return to hypothesis testing
• We can already test statistical significance, no problem
• need p and alpha (and a computer)
• Sometimes, no computer is available
• can use tables to test statistical significance
• A little more work, but works just as well
• This method uses the same logic as the p-value method
Testing Ho without a PC
• The strategy (the new parts are Ha in Step 1, and Steps 3 and 4):
• Step 1: Set up Ho, Ha and decide on alpha
• Step 2: Calculate the statistic and df
• Step 3: Get the critical value from the table
• Step 4: Compare the critical value to the statistic
Step 1
• Set up Ho and alpha - already know
• Ha - the alternative hypothesis
• If Ho is false, what do we believe then? (Ha)
• Ha represents the opposite of Ho
• eg. if Ho: r = 0 then Ha: r  0
• If we reject Ho (because its false), then we must
accept Ha as being true.
5
Step 2
• Nothing different
• use appropriate formulas for stat and df!
Step 3
• Get the critical value
• from the table (back of Howell)
• Use alpha and df to look it up
• Critical value: the value of your statistic at
which p = alpha (the edge of the rejection
region)
Step 4
• Compare your stat to the crit value:
• Ignore any minus signs (look only at the size of the value)
• If your calculated stat is more than the crit
value, then p < alpha (ie. significance!)
• The test is significant if calculated value is
greater than the crit value
• Reject the Ho, and accept the Ha.
• Pretty easy!
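The Python sketch below shows Step 4 on its own. It assumes you already have the calculated statistic (Step 2) and the critical value from the table (Step 3). The names `is_significant` and `decision` are illustrative only and do not come from Howell. `abs()` plays the role of "ignore any minus signs".

```python
def is_significant(calculated: float, critical: float) -> bool:
    """Step 4: the test is significant (p < alpha) when the size of the
    calculated statistic is greater than the critical value."""
    return abs(calculated) > critical  # abs() ignores any minus sign


def decision(calculated: float, critical: float) -> str:
    """Turn the comparison into the conclusion about Ho and Ha."""
    if is_significant(calculated, critical):
        return "significant: reject Ho, accept Ha"
    return "not significant: do not reject Ho"


# Usage with the numbers from the r example that follows (crit = 0.576)
print(decision(0.61, 0.576))
print(decision(-0.61, 0.576))  # same answer: the minus sign is ignored
print(decision(0.50, 0.576))
```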
Example
• Let's use an r value:
• We get r = 0.61 with df = 10, alpha = 0.05
• Is this significant?
• Critical value: use df and alpha on table D2 in
Howell (significant values of the correlation
coefficient)
• for alpha = 0.05 and df = 10, crit value = 0.576
Example
• Now we have the calculated value and crit
value
• Calculated = 0.61
• Critical = 0.576
• Check:
• if calculated > critical, reject Ho
• 0.61 > 0.576, so we reject Ho
• The result is statistically significant!
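Howell does not ask you to do this; it is only a check on the table. As a sketch, the critical value of 0.576 can be reproduced from the null distribution of r, which is the distribution of r when the population correlation is 0. For df = n - 2, its density is (1 - r²)^((df - 2)/2) / B(1/2, df/2).

The code works in two steps:
- It integrates that density numerically with Simpson's rule to get a two-tailed p.
- It uses bisection to find the r at which p = alpha, which is the edge of the rejection region.

The function names are hypothetical, and the integration as written needs df ≥ 2. The critical-value method and the p-value method should agree on the decision for r = 0.61.

```python
import math


def r_null_density(r: float, df: int) -> float:
    """Density of Pearson's r when the population correlation is 0 (df = n - 2)."""
    beta = math.gamma(0.5) * math.gamma(df / 2) / math.gamma((df + 1) / 2)
    return (1 - r * r) ** ((df - 2) / 2) / beta


def p_value_r(r: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p for r: chance of |r| at least this big if Ho (r = 0) is true.

    Integrates the density from |r| to 1 with Simpson's rule (assumes df >= 2).
    """
    a, b = abs(r), 1.0
    h = (b - a) / steps
    total = r_null_density(a, df) + r_null_density(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights: 4 for odd points, 2 for even
        total += weight * r_null_density(a + i * h, df)
    return 2 * total * h / 3  # doubled because the test is two-tailed


def critical_r(df: int, alpha: float) -> float:
    """The r at which p = alpha (the edge of the rejection region), by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if p_value_r(mid, df) > alpha:
            lo = mid  # p still too big: the edge is further out
        else:
            hi = mid
    return (lo + hi) / 2


print(round(critical_r(10, 0.05), 3))  # compare with the table value 0.576
print(p_value_r(0.61, 10) < 0.05)      # the p-value method reaches the same decision
```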
Return to χ²
• Note: χ² only works with discrete (categorical) data
• What is the point of χ²?
• Goodness-of-fit: used to see if data match a hypothetical distribution
• Are there the same number of men as women?
• Are about 25% of South Africans unemployed?
• Contingency table analysis (independence test): used like a correlation for discrete data (are the variables related?)
Goodness-of-fit χ²
• Used to test a model distribution of data
• Have an idea of how data should be distributed
• eg. There should be 60% blondes, 40% brunettes
• Collect data, check to see if our idea (model) is supported by the data
• Does the data fit the model?
• Before starting a goodness-of-fit test, always be sure of what the model is
Creating a model
• We put our expectations as percentages on a
table
• One cell of the table for each possible value
of the variable
• Each cell has the percentage of observations
we expect
Example model
• We expect 40% brunettes, 60% blondes, so
Blondes | Brunettes
60%     | 40%
Observed scores and Expected scores
• Strategy: Want to see if our observation
matches our model
• We collect some data (Observed scores)
• We work out what the data would look like if
our model were correct (Expected scores)
• Compare the two: do the observed scores show
the same pattern as the expected scores?
Converting the model to expected scores
• We have our model as percentages:
Blondes | Brunettes
60%     | 40%
• We must now convert % to actual values (frequencies) using n (the number of observations). If we collected 134 observations, then:
Blondes                  | Brunettes
(60/100) x 134 = 80.4    | (40/100) x 134 = 53.6
Converting % to frequency
• To do this:
• (percentage / 100) x n
• Keep the decimals!
• You cannot work with % for 2 - you must
have frequencies (number of observations)
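The conversion above, (percentage / 100) x n, can be sketched in Python as follows. The function name `expected_frequencies` is hypothetical. The check that the model adds up to 100% is an extra sanity check we added; it is not part of the notes. Decimals are kept, not rounded.

```python
def expected_frequencies(model_percentages: dict[str, float], n: int) -> dict[str, float]:
    """Convert a model given in % into expected frequencies: (percentage / 100) x n."""
    total = sum(model_percentages.values())
    if abs(total - 100) > 1e-9:  # sanity check (our addition): a model must add up to 100%
        raise ValueError(f"model percentages must add up to 100, got {total}")
    # keep the decimals: no rounding here
    return {category: (pct / 100) * n for category, pct in model_percentages.items()}


# The blondes/brunettes model with 134 observations
expected = expected_frequencies({"Blondes": 60, "Brunettes": 40}, n=134)
print(expected)  # roughly 80.4 blondes and 53.6 brunettes
```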
Beginning the χ² analysis
• To begin, we need Ho
• For χ², Ho is always “observed data = expected data”
• Need to state the model (in %)
• Collect the data
• Create an expected frequency table (using your model and n)
• Calculate χ² to see if observed = expected
χ² formula
χ² = Σ (O - E)² / E
O = observed score
E = expected score
χ² formula, step by step
• Step 1: for each category, that category’s O minus that category’s E
• Step 2: for each category, square the result of Step 1
• Step 3: for each category, divide its Step 2 result by that category’s E
• Step 4: sum all the Step 3 results
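The four steps above translate almost line for line into the Python sketch below. The name `chi_square` is illustrative. The function assumes the O and E lists are given in the same category order. The usage line reuses the numbers from Worked example 1.

```python
def chi_square(observed: list[float], expected: list[float]) -> float:
    """chi-squared = sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("need one E for every O")
    total = 0.0
    for o, e in zip(observed, expected):
        diff = o - e              # Step 1: O minus E
        squared = diff ** 2       # Step 2: square it
        total += squared / e      # Step 3: divide by E; Step 4: running sum
    return total


# Worked example 1 (UCT): 68 males and 79 females observed, 73.5 of each expected
print(round(chi_square([68, 79], [73.5, 73.5]), 3))  # 0.823
```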
Table method for 2
• Use the following columns:
• O E
O-E
(O-E)2
(O-E)2
E
Add up here
21
Degrees of freedom (df)
• The df for goodness-of-fit tests is easy to
calculate:
• df = k-1
• k is the number of possible values for your variable
(categories)
• using males and females k = 2
• using coke, pepsi, sprite k = 3
• using easy, moderate, hard, awesome k = 4
Worked example 1
• We suspect that there is a 50%/50% gender
distribution at UCT. We observed 147
people, 68 male, 79 female. Do we really
have a 50%/50% distribution?
• Set up (step 1)
• Ho: Distribution is 50%/50%
• Ha: Distribution is not 50%/50%
• alpha = 0.05
Example: work out expected scores
• (What would we have seen if Ho were true?)
• Model:
• Males 50%
• Females 50%
• Convert to scores (n = 147):
• Males expected: (50/100) x 147 = 73.5
• Females expected: (50/100) x 147 = 73.5
Example: O and E values
• Now we have our values:
Value  | O  | E    | O - E | (O - E)² | (O - E)²/E
Male   | 68 | 73.5 |       |          |
Female | 79 | 73.5 |       |          |
Example
- Work out the columns, then add up the values in the last column
Value  | O  | E    | O - E | (O - E)² | (O - E)²/E
Male   | 68 | 73.5 | -5.5  | 30.25    | 0.4116
Female | 79 | 73.5 | 5.5   | 30.25    | 0.4116
Total  |    |      |       |          | 0.823
Example - df
• Now we have our 2 value: 0.823
• Is it statistically significant? (does the model
explain the population?)
• Need the critical value for this!
• Degrees of freedom: k-1
• 2 categories (male, female)
• so df = 1
Example: critical value
• What is the critical value for our
male/female example?
• Df: k = 2 (male and female), so df = 1
• For df = 1 and alpha = 0.05, the table says:
• crit = 3.84
• To be significant, our value must be more than 3.84
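With df = 1, the tabled value of 3.84 is no accident. A χ² value with one degree of freedom is the square of a standard normal (z) value, so the critical value is the two-tailed z cut-off (1.96 for alpha = 0.05) squared. The sketch below uses Python's `statistics.NormalDist` for the z cut-off and `math.erfc` for the matching p-value. This z² shortcut holds only for df = 1, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_critical_df1(alpha: float) -> float:
    """Critical chi-squared value for df = 1: the two-tailed z cut-off, squared."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z ** 2


def chi2_p_value_df1(chi2: float) -> float:
    """p = P(chi-squared >= chi2) for df = 1, which equals P(|z| >= sqrt(chi2))."""
    return math.erfc(math.sqrt(chi2 / 2))


critical = chi2_critical_df1(0.05)
print(round(critical, 2))              # 3.84, matching the table
print(0.823 > critical)                # False: the UCT result is not significant
print(chi2_p_value_df1(0.823) > 0.05)  # True: the p-value method agrees
```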
Example: conclusions
• Calculated < critical
• (0.823 < 3.84), so we do not reject Ho
• (this means: the data are consistent with “distribution is 50%/50%”)
• Conclusion: it seems that at UCT there are as many males as there are females.
Interpreting χ² findings
• χ² findings are interpreted a little differently
• Rejecting Ho (significance) means we cannot accept the model (the model is wrong for this population)
• Not rejecting Ho (non-significance) means we have no evidence against the model, so we continue to assume that the model applies to this population
• This is the case for goodness-of-fit tests
Contingency table analysis with 2
• Pearson’s product moment allowed us to
establish a relationship between 2
continuous variables
• doesn’t work for discrete data (categories)
• Eg. “is there are relationship between gender
and owning a dog or cat?” (2 discrete variables)
• Contingency table analysis is used for this
• can work with nominal variables
32
Something old, something new
• Quite similar to goodness-of-fit tests
• Work out the expected values
• Use the chi square formula
• Work out df
• get a critical value from the table
• Differences:
• Slightly different O table
• New way of working out expected values
• New way of working out df
Observed values
• For each person, we ask 2 questions (2 variables):
• “are you male/female?” and “do you have a dog or a cat?” (let’s assume we sample only pet owners)
• We end up with:
Subject | Gender | Pet
1       | M      | D
2       | M      | C
3       | F      | D
etc.
O table
• We need to convert those data into a frequency table that looks like this (PET in the rows, GENDER in the columns):
PET \ GENDER | Male | Female
Dog          |      |
Cat          |      |
Filling in the O table
• Each cell has only one number in it
• the number of people fitting that condition
PET \ GENDER | Male | Female
Dog          | 1    | 2
Cat          | 3    | 4
• In cell 1: number of people who are Male AND have a dog
• In cell 2: number of people who are Female AND have a dog
• In cell 3: number of people who are Male AND have a cat
• etc.
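Filling in the O table is just counting, as this sketch using Python's `collections.Counter` shows. Each record is one subject's (pet, gender) answer pair, and each cell counts the people who fit both conditions. The function name `o_table` is hypothetical. The three records mirror subjects 1 to 3 listed earlier; a real data set would have many more.

```python
from collections import Counter


def o_table(records, rows, columns):
    """Build an O (observed frequency) table from raw (row, column) answer pairs.

    Each cell counts the people who fit BOTH conditions, e.g. Dog AND Male.
    """
    counts = Counter(records)  # missing combinations count as 0
    return {r: {c: counts[(r, c)] for c in columns} for r in rows}


# Subjects 1-3 from the raw data: M/dog, M/cat, F/dog, stored as (pet, gender)
records = [("Dog", "Male"), ("Cat", "Male"), ("Dog", "Female")]
table = o_table(records, rows=["Dog", "Cat"], columns=["Male", "Female"])
print(table)  # {'Dog': {'Male': 1, 'Female': 1}, 'Cat': {'Male': 1, 'Female': 0}}
```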
The finished O table
• An O table usually looks like:
PET \ GENDER | Male | Female
Dog          | 36   | 34
Cat          | 7    | 32
• We had 7 males with cats, and 34 females with dogs
• This table is a 2x2 table: 2 rows (pet) and 2 columns (gender)
Notes about O tables
• The numbers inside the cells are frequencies
(just like goodness-of-fit)
• You can have as many levels of a variable
as you like
• eg. dog, cat, parakeet, moose, hamster, other (6
levels)
• BUT you can only have 2 variables
• eg. not gender, pet AND car type
E values
• Expected values are a bit more tricky
• We want to finish with an E table of the same form as the O table:
Expected | Male | Female
Dog      |      |
Cat      |      |
• We need to calculate a value for each cell; we will use the O values to do this
E values, step by step
• Step 1: work out the grand total from the O
table (N)
• Step 2: work out the marginal totals from
the O table
• Step 3: use a formula (Ri x Cj / N) to get a value for each cell of the E table
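The three steps above (grand total, marginal totals, then Ri x Cj / N) can be sketched in a few lines of Python. The function name `expected_table` is hypothetical. The O table is assumed to be a list of rows (here Dog, Cat), each holding the column values (Male, Female).

```python
def expected_table(observed):
    """E table from an O table given as a list of rows: E = (Ri x Cj) / N."""
    n = sum(sum(row) for row in observed)              # Step 1: grand total N
    row_totals = [sum(row) for row in observed]        # Step 2: row marginals Ri
    col_totals = [sum(col) for col in zip(*observed)]  # Step 2: column marginals Cj
    # Step 3: one E value per cell
    return [[r * c / n for c in col_totals] for r in row_totals]


# O table from the gender/pet example: rows Dog, Cat; columns Male, Female
observed = [[36, 34],
            [7, 32]]
for row in expected_table(observed):
    print([round(e, 3) for e in row])
# [27.615, 42.385]
# [15.385, 23.615]
```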
Step 1: Grand total (N)
• How many people did we use?
• Same idea as the usual n
• called capital N (for some reason)
• To calculate: add up the numbers in all of the cells
• So in the gender/pet example, N is:
• 36+34+7+32 = 109
• N = 109
Step 2: Marginal totals
• We can work out the totals along the margins of the O table
• The marginal totals are written on the edges of the O table:
O            | Male | Female | Row total
Dog          | 36   | 34     | 70
Cat          | 7    | 32     | 39
Column total | 43   | 66     |
Step 2: Calculating marginals
• For each marginal, add up the numbers in that line, so:
O            | Male        | Female       | Row total
Dog          | 36          | 34           | 36 + 34 = 70
Cat          | 7           | 32           | 7 + 32 = 39
Column total | 36 + 7 = 43 | 34 + 32 = 66 |
• Do the rows AND the columns!
Step 3: Work out E table
• Write your marginals around your blank E table - in the right places!
E            | Male | Female | Row total
Dog          |      |        | 70
Cat          |      |        | 39
Column total | 43   | 66     |
• We will now use the marginals to compute one E value for each cell
• The formula for E: E = (Ri x Cj) / N
Step 3: Work out a single cell
• For each cell, look at the cell’s row and column marginals (Ri and Cj)
• For Male/Dog: Ri = 70 (the Dog row) and Cj = 43 (the Male column)
E            | Male   | Female | Row total
Dog          |        |        | R = 70
Cat          |        |        | 39
Column total | C = 43 | 66     |
• The formula for E: E = (70 x 43) / 109 = 27.615
• Do the same for each cell
Ready to calculate χ²
• Now we have O and E, ready to calculate χ² (using the same formula as before)
O   | Male | Female
Dog | 36   | 34
Cat | 7    | 32

E   | Male   | Female
Dog | 27.615 | 42.385
Cat | 15.385 | 23.615
Calculate χ²
• This is almost the same as for goodness-of-fit, but be careful in building your table (the O and the E columns)
O  | E      | O - E | (O - E)² | (O - E)²/E
36 | 27.615 |       |          |
34 | 42.385 |       |          |
7  | 15.385 |       |          |
32 | 23.615 |       |          |
Matching up the O and E columns
• Be careful!! Each type of response has an O
and an E - match up the correct ones!
• Male/Cat has O = 7 and E = 15.385
• Female/Dog has O=34 and E = 42.385
• If you get the wrong E for an O, all your
results are wrong!! Do it slowly.
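One way to make mismatches impossible is to store O and E under the same cell label and look E up by that label, never by position. The Python sketch below does this with dictionaries keyed by (pet, gender). The E values are the rounded ones from the E table, so the total comes out at about 11.75 instead of the exact value.

```python
# O and E stored under the same (pet, gender) label, so each O meets its own E
observed = {("Dog", "Male"): 36, ("Dog", "Female"): 34,
            ("Cat", "Male"): 7, ("Cat", "Female"): 32}
expected = {("Dog", "Male"): 27.615, ("Dog", "Female"): 42.385,
            ("Cat", "Male"): 15.385, ("Cat", "Female"): 23.615}

if observed.keys() != expected.keys():
    raise ValueError("every cell needs both an O and an E")

chi2 = 0.0
for cell, o in observed.items():
    e = expected[cell]  # the E for THIS cell, looked up by label
    chi2 += (o - e) ** 2 / e
    print(cell, "O =", o, "E =", e)
print(round(chi2, 2))  # about 11.75
```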
Working out the table
• Step 1: O - E (go row by row, slowly)
• Step 2: square the differences
• Step 3: divide the squares by E
• Step 4: sum the divisions to get chi squared
Cell       | O  | E      | O - E  | (O - E)² | (O - E)²/E
Male/Dog   | 36 | 27.615 | 8.385  | 70.3136  | 2.546
Female/Dog | 34 | 42.385 | -8.385 | 70.3136  | 1.659
Male/Cat   | 7  | 15.385 | -8.385 | 70.3136  | 4.570
Female/Cat | 32 | 23.615 | 8.385  | 70.3136  | 2.978
Sum        |    |        |        |          | 11.753
Df for contingency tables
• Need to check the statistical significance of our χ² value!
• Use df and alpha (exactly as in goodness-of-fit)
• df = (R-1)(C-1)
• (number of rows - 1) x (number of columns - 1)
• In our example, R = 2 (2 rows) and C = 2 (2 columns)
• (2-1)(2-1) = (1)(1) = 1
• df = 1
Testing significance
• Look up the critical value in the chi square
table using alpha and df
• if your calculated χ² is more than the critical value, then there is a relationship between the variables
• remember that Ho is “no relationship”
• if it is significant, then Ho is false: there is a relationship
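The notes say to look up the critical value in the χ² table at the back of Howell. As a sketch under our own choice of method (the notes do not use it), the same tabled values can be computed from the χ² cumulative distribution. Here that distribution is evaluated with the standard power series for the lower regularized incomplete gamma function, P(df/2, x/2). The function names are hypothetical. The block also computes df = (R-1)(C-1) for the 2x2 gender/pet table.

```python
import math


def chi2_cdf(x: float, df: int) -> float:
    """P(chi-squared <= x), via the series for the lower regularized gamma P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    s, z = df / 2, x / 2
    term = 1.0 / s
    total = term
    n = 0
    while term > 1e-15 * total:  # add series terms until they stop mattering
        n += 1
        term *= z / (s + n)
        total += term
    return math.exp(s * math.log(z) - z - math.lgamma(s)) * total


def chi2_critical(df: int, alpha: float) -> float:
    """The chi-squared value with an upper-tail p of alpha (what the table lists)."""
    lo, hi = 0.0, 1.0
    while 1 - chi2_cdf(hi, df) > alpha:  # widen the search until the tail is small enough
        hi *= 2
    for _ in range(100):  # then bisect
        mid = (lo + hi) / 2
        if 1 - chi2_cdf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


rows, cols = 2, 2
df = (rows - 1) * (cols - 1)              # (R-1)(C-1) = 1 for the gender/pet table
print(round(chi2_critical(df, 0.05), 2))  # 3.84
print(1 - chi2_cdf(11.753, df) < 0.05)    # True: p < alpha, same decision as the table
```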
Example: conclusion
• We calculated 2 to be 11.75
• df = 1, alpha set to 0.05
• Crit value = 3.84
• Calc > crit, so reject Ho
• There is a relationship between gender and pet
ownership!