Chapter_12_13

Chapter 12
Two-by-two table for the survey questions “What is your gender?” and “Do you feel that female
sports should receive the same level of monetary support as males get?”
(NOTE: To start, just look at the counts and ask what students believe. Then point out that
comparing groups from counts alone is difficult unless the group totals are equal, so we need
percentages.)
Tabulated statistics: Gender, SameLevel

Rows: Gender   Columns: SameLevel

              No      Yes      All
Female        86      556      642
           13.40    86.60   100.00
           181.5    460.5    642.0

Male         201      172      373
           53.89    46.11   100.00
           105.5    267.5    373.0

All          287      728     1015
           28.28    71.72   100.00
           287.0    728.0   1015.0

Cell Contents:  Count
                % of Row
                Expected count

Chi-Sq = 190.735, DF = 1, P-Value = 0.000
Commonly the explanatory variable is placed in the rows and the response variable in the
columns. So for these data, Gender would be explaining the attitude toward monetary support.
Conditional percents for rows are found by dividing a cell count by its row total; for columns,
the conditional percent is the cell count divided by the column total. For example, of Females,
the conditional percent who said “Yes” to women receiving equal support is 556/642 = 86.60%;
for Males, the conditional percent who said “Yes” is 172/373 = 46.11%. As to columns, of those
who said “Yes,” the conditional percent who were Female is 556/728 = 76.37%.
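The row and column conditional percents described above can be reproduced with a few lines of Python. This is only an illustrative sketch: the dictionary layout and the helper names `row_percents` and `column_percents` are assumptions made for this example, not part of any course software.

```python
# Observed counts from the Gender x SameLevel table
# (rows: Gender, columns: SameLevel).
counts = {
    "Female": {"No": 86, "Yes": 556},
    "Male": {"No": 201, "Yes": 172},
}

def row_percents(table):
    """Each cell as a percent of its row total."""
    return {
        row: {col: 100 * n / sum(cells.values()) for col, n in cells.items()}
        for row, cells in table.items()
    }

def column_percents(table):
    """Each cell as a percent of its column total."""
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {
        row: {col: 100 * n / col_totals[col] for col, n in cells.items()}
        for row, cells in table.items()
    }

row_pct = row_percents(counts)
col_pct = column_percents(counts)

print(f"Of Females, percent saying Yes: {row_pct['Female']['Yes']:.2f}%")
print(f"Of Males, percent saying Yes:   {row_pct['Male']['Yes']:.2f}%")
print(f"Of Yes answers, percent Female: {col_pct['Female']['Yes']:.2f}%")
```

Row percents answer "of this gender, what share gave each answer?", while column percents answer "of this answer, what share came from each gender?", so the choice depends on which variable is treated as explanatory.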
Probability, Risk, and Odds
If we randomly selected a student, what is the probability that the student said “Yes”?
728/1015 = 0.7172
What is the risk that a randomly selected student said “Yes”? Again, 728/1015 = 0.7172
What is the proportion of those who said “Yes”? Again, 728/1015 = 0.7172
What is the percent of those who said “Yes”? Similarly, 728/1015 = 0.7172, times 100% = 71.72%
As we can see, these four say the same thing, just with different phrasing.
What are the odds a person says “Yes”? This is now 728/287, which is approximately 2.5 to 1.
Questions to class:
What is the probability that a Male says “Yes”? 172/373 = 0.4611
What is the risk that a Male says “Yes”? 172/373 = 0.4611
How about the proportion and percentage? The proportion is also 0.4611, while the percentage is
46.11%
What are the odds that a Male says Yes? 172/201 (about 0.86 to 1), or roughly 7 to 8. How are
the odds interpreted? We would say that the odds of a Male saying Yes are about 7 to 8; in other
words, roughly 7 of every 15 Males would say Yes and 8 of every 15 would say No.
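To make the difference between risk and odds concrete, here is a small Python sketch using the counts from the table. The function names `risk`, `odds`, and `risk_from_odds` are illustrative choices for this example only.

```python
def risk(with_trait, without_trait):
    """Risk (probability, proportion): with trait / total."""
    return with_trait / (with_trait + without_trait)

def odds(with_trait, without_trait):
    """Odds: with trait / without trait."""
    return with_trait / without_trait

def risk_from_odds(o):
    """Convert odds back to a risk: odds / (1 + odds)."""
    return o / (1 + o)

# All students: 728 said Yes, 287 said No.
print(f"Risk of Yes, all students: {risk(728, 287):.4f}")
print(f"Odds of Yes, all students: {odds(728, 287):.2f} to 1")

# Males only: 172 said Yes, 201 said No.
male_odds = odds(172, 201)
print(f"Risk of Yes, Males: {risk(172, 201):.4f}")
print(f"Odds of Yes, Males: {male_odds:.2f} to 1")
print(f"Risk recovered from the Male odds: {risk_from_odds(male_odds):.4f}")
```

The last line shows that odds and risk carry the same information: odds below 1 (such as 0.86 to 1) correspond to a risk below 0.5.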
Relative Risk is a comparison of two risks. For instance, the relative risk of Females saying
Yes compared to Males saying Yes is the ratio of the risk for Females saying Yes to the risk of
Males saying Yes. The Female-Yes risk is 556/642 and the Male-Yes risk is 172/373. This makes
the relative risk (556/642)/(172/373), which computes to 0.8660/0.4611, or about 1.9.
The interpretation is that the risk of agreeing that female sports deserve the same monetary
support as male sports is about 1.9 times (nearly twice) as great for Females as for Males.
Baseline risk is the risk to which another risk is compared in a relative risk. That is, it is
the risk in the denominator of the relative risk. In the previous example, the Female risk of
Yes was compared to the Male risk of Yes, putting the Male risk in the denominator. This makes
the baseline risk the risk of Males saying Yes.
Odds ratio is similar to relative risk except it is the ratio of two odds. The odds ratio of
Females saying Yes to Males saying Yes is (556/86)/(172/201) = 7.56, meaning the odds of
Females saying Yes are about 7.6 times the odds of Males saying Yes.
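The relative risk and odds ratio can be checked with a short Python sketch. The function names and argument order below are assumptions made for illustration; the second group passed in is treated as the baseline (denominator) group.

```python
def relative_risk(yes_a, no_a, yes_b, no_b):
    """Risk in group A divided by risk in group B (B is the baseline)."""
    risk_a = yes_a / (yes_a + no_a)
    risk_b = yes_b / (yes_b + no_b)
    return risk_a / risk_b

def odds_ratio(yes_a, no_a, yes_b, no_b):
    """Odds in group A divided by odds in group B."""
    return (yes_a / no_a) / (yes_b / no_b)

# Group A: Females (556 Yes, 86 No). Baseline group B: Males (172 Yes, 201 No).
rr = relative_risk(556, 86, 172, 201)
oratio = odds_ratio(556, 86, 172, 201)

print(f"Relative risk (Female vs. Male): {rr:.2f}")
print(f"Odds ratio (Female vs. Male):    {oratio:.2f}")
```

Note that the odds ratio is much larger than the relative risk here. The two are close only when the risks involved are small, which is not the case for these data.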
Important To Interpret
When reading a report that provides a relative risk, it is extremely important to know or be
given the baseline risk. For instance, say a study reported that women who binge drink are 3
times more likely to develop liver disease than women who do not binge drink. This may alarm
some women, understandably. But what if the risk of getting liver disease for women who do not
binge drink (i.e. the baseline risk) was 0.001, meaning 1 out of 1000 women who do not binge
drink is likely to develop liver disease? This would mean that the risk of developing liver
disease for women who do binge drink is 3/1000, or 0.003. Not that alarming!
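The arithmetic in this example can be written as a tiny Python sketch. The baseline risk of 0.001 is the hypothetical value used in the paragraph above, not a real figure, and the function name `risk_in_comparison_group` is an illustrative choice.

```python
def risk_in_comparison_group(baseline_risk, relative_risk):
    """Risk implied for the comparison group: baseline risk x relative risk."""
    return baseline_risk * relative_risk

baseline = 0.001  # hypothetical baseline risk from the example (1 in 1000)
rr = 3            # reported relative risk

exposed = risk_in_comparison_group(baseline, rr)
print(f"Risk for the comparison group: {exposed:.3f} ({exposed * 1000:.0f} per 1000)")
print(f"Absolute increase in risk:     {(exposed - baseline) * 1000:.0f} per 1000")
```

The same relative risk of 3 applied to a baseline of 0.10 would give a risk of 0.30, which is why a relative risk cannot be judged without its baseline.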
To calculate:
Odds are (number in the group of interest with the trait)/(number in the group of interest without the trait)
Risk is (number in the group of interest with the trait)/(total number in the group of interest)
Chapter 13
Ho: The two variables are not related in the population (i.e. they are independent)
Ha: The two variables are related in the population (i.e. they are dependent)
Keeping with Gender and equal monetary support for male and female athletics, how would the
tables look as the variables went from independent to dependent?
What do you think this table indicates?
Rows: Gender   Columns: SameLevel

              No      Yes      All
Female       321      321      642
           50.00    50.00   100.00

Male         187      186      373
           50.00    50.00   100.00

All          508      507     1015
How about this table?
Rows: Gender   Columns: SameLevel

              No      Yes      All
Female       193      449      642
           30.00    70.00   100.00

Male         112      261      373
           30.00    70.00   100.00

All          305      710     1015
How about now?
Tabulated statistics: Gender, SameLevel

Rows: Gender   Columns: SameLevel

              No      Yes      All
Female        86      556      642
           13.40    86.60   100.00
           181.5    460.5    642.0

Male         201      172      373
           53.89    46.11   100.00
           105.5    267.5    373.0

All          287      728     1015
           28.28    71.72   100.00
           287.0    728.0   1015.0

Cell Contents:  Count
                % of Row
                Expected count

Chi-Sq = 190.735, DF = 1, P-Value = 0.000
What indicates a relationship is a difference in the conditional percents from one level of the
explanatory variable to another. In the third 2x2 table, the percentage of Yes for Females is
very different from that for Males. But what determines HOW different is different enough?
We apply what is called a Chi-square Test of Independence. This is done by first using the
observed counts to compute what you would expect to see if the two variables were independent.
These expected counts are created for each cell of the table by using the row and column
totals relative to the overall total.
Finding Expected Table (i.e. based on the data collected, these are the counts we would
expect to find if the variables were independent)
          Female                    Male                      Total
No        (642x287)/1015 = 181.5    (373x287)/1015 = 105.5      287
Yes       (642x728)/1015 = 460.5    (373x728)/1015 = 267.5      728
Total     642                       373                        1015
If the variables were independent, we would therefore expect to see the same conditional
percentages for both genders: about 71.7% for Yes (found by 460.5/642 and 267.5/373) and about
28.3% for No (by 181.5/642 and 105.5/373).
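The rule used for each cell, (row total x column total)/overall total, can be sketched in Python as follows. The function name `expected_counts` and the nested-dictionary layout are assumptions made for this illustration.

```python
observed = {
    "Female": {"No": 86, "Yes": 556},
    "Male": {"No": 201, "Yes": 172},
}

def expected_counts(table):
    """Expected count per cell under independence:
    (row total * column total) / grand total."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    grand_total = sum(row_totals.values())
    return {
        row: {col: row_totals[row] * col_totals[col] / grand_total
              for col in col_totals}
        for row in table
    }

expected = expected_counts(observed)
for row, cells in expected.items():
    for col, value in cells.items():
        print(f"{row:6s} {col:3s} expected = {value:.1f}")
```

Under independence every row has the same conditional percents as the "All" row, which is why the expected Yes percent is the same for Females and Males.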
The next step is to statistically compare what we observed to what we expected. To do this we
calculate a chi-square test statistic and its associated p-value (for this class the p-value
will be provided).
The general formula is:
$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$
Applying that to this data:
$$\chi^2 = \frac{(86 - 181.5)^2}{181.5} + \frac{(556 - 460.5)^2}{460.5} + \frac{(201 - 105.5)^2}{105.5} + \frac{(172 - 267.5)^2}{267.5} = 190.735$$
The p-value for this is reported as 0.000, meaning it is smaller than 0.0005.
Decision and conclusion: As with the test for a correlation, the p-value is the probability
that sample data would produce a result at least as extreme as the one observed if the null
hypothesis were true. A small p-value indicates that the variables are related. Again we will
use 0.05 as the level of significance to compare with our p-value. Here the p-value of 0.000 is
less than 0.05, so we reject the null hypothesis and conclude that Gender and attitude about
equal level of support are related, with Females more likely than Males to believe in equal
monetary support for the genders.
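For anyone who wants to verify the test statistic, here is a minimal Python sketch that uses only the standard library. It works from unrounded expected counts, which is what yields 190.735; plugging in the expected counts rounded to one decimal gives about 190.6 instead. The p-value shortcut `math.erfc(sqrt(x/2))` is valid only for 1 degree of freedom (a 2x2 table); in practice a statistics package would normally be used. The function names are illustrative.

```python
import math

observed = [[86, 556],    # Female: No, Yes
            [201, 172]]   # Male:   No, Yes

def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - exp) ** 2 / exp
    return stat

def p_value_df1(stat):
    """Upper-tail chi-square probability; valid for 1 degree of freedom only."""
    return math.erfc(math.sqrt(stat / 2))

stat = chi_square_statistic(observed)
p = p_value_df1(stat)
print(f"Chi-Sq = {stat:.3f}")
print(f"P-Value = {p:.3f}")
print("Reject Ho" if p < 0.05 else "Do not reject Ho")
```

Printed to three decimals, the p-value shows as 0.000, which matches the software output even though the actual value is not exactly zero.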
NOTE: Keep in mind that the results can only be extended beyond the sample to a larger
population if the data were randomly selected, and we cannot conclude a causal relationship
unless random assignment was involved.
Effect of Confounding Variables in Categorical Data Relationships
When a confounding variable is present (i.e. a variable that affects the relationship but was
not considered in the study), the statistical results may not reflect the true relationship.
This can lead to what is called Simpson’s Paradox.
Simpson’s Paradox occurs when combined data lead to one result, but when we separate the data
by a lurking variable we get the opposite result.
Example: Following a 1972 Supreme Court ruling to eliminate racial disparities in capital cases,
several studies were conducted to follow up on the sentences of those found guilty of capital
offenses. One such study considered homicides in Florida between 1976 and 1977 to examine
whether a relationship existed between race and assignment of the death penalty (see Michael
Radelet, American Sociological Review, 1981, vol. 46).
Overall table:
Defendant/Death Penalty    White    Black    Total
No                           141      149      290
Yes                           19       17       36
Total                        160      166      326
% Yes                      11.9%    10.2%
This table shows that White defendants who were found guilty were slightly more likely to get
the death penalty than Black defendants who were found guilty.
However, considering a lurking variable, the victim’s race, provides a different look:
Victim: White
Defendant/Death Penalty    White    Black    Total
No                           132       52      184
Yes                           19       11       30
Total                        151       63      214
% Yes                      12.6%    17.5%

Victim: Black
Defendant/Death Penalty    White    Black    Total
No                             9       97      106
Yes                            0        6        6
Total                          9      103      112
% Yes                         0%     5.8%
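As we can see, once the victim’s race is considered, in both cases the percentage of White
defendants who received the death penalty is lower than the percentage of Black defendants who
did. The reversal arises because most White defendants had White victims (151 of 160), and
cases with White victims received the death penalty far more often (30 of 214, or 14.0%) than
cases with Black victims (6 of 112, or 5.4%).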
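The reversal can be checked directly from the counts in the two victim tables. The sketch below is illustrative only: the data layout and the helper name `pct_yes` are assumptions for this example. It computes the percent receiving the death penalty within each victim group and then again after combining the groups.

```python
# (death penalty Yes, death penalty No) by defendant race within victim race.
by_victim = {
    "White victim": {"White": (19, 132), "Black": (11, 52)},
    "Black victim": {"White": (0, 9), "Black": (6, 97)},
}

def pct_yes(yes, no):
    """Percent of defendants who received the death penalty."""
    return 100 * yes / (yes + no)

# Within each victim group, Black defendants have the higher percent.
for victim, defendants in by_victim.items():
    for race, (yes, no) in defendants.items():
        print(f"{victim}, {race} defendant: {pct_yes(yes, no):.1f}% Yes")

# Combine the two victim groups to recover the overall table.
combined = {}
for defendants in by_victim.values():
    for race, (yes, no) in defendants.items():
        prev_yes, prev_no = combined.get(race, (0, 0))
        combined[race] = (prev_yes + yes, prev_no + no)

# Overall, White defendants have the higher percent: the direction reverses.
for race, (yes, no) in combined.items():
    print(f"Overall, {race} defendant: {pct_yes(yes, no):.1f}% Yes")
```

Running this shows the Black-defendant percent higher in both victim groups but lower in the combined table, which is exactly the pattern Simpson’s Paradox describes.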