20 - Odds Ratio

Statistics 312 – Dr. Uebersax
20 - Odds Ratio
Final Exam confirmation: Final will be Monday 3/17 from 1:10-4:00 in 02-204, confirmed by
the class scheduling office.
1. Effect-Size Statistics and Test Statistics
Some statistics we've studied fall into the category of test statistics (e.g., z, t). The actual value
of a test statistic has no meaning in itself; it is merely something we use to evaluate a statistical
hypothesis (i.e., accept or reject H0).
Other statistics, like mean and standard deviation, have an intrinsic meaning. We can attach a
precise meaning and interpretation to their magnitude.
Best of all are statistics that have both properties: their magnitude has a specific meaning, and
we can also use them for statistical inference. This means we don't have to calculate two
separate statistics (e.g., a mean and a z-score) to make an inference. The odds ratio is one
such statistic.
2. Odds Ratio
In the previous lecture we considered the Pearson r, which measures the degree of association
between two ratio- or interval-level variables. The odds ratio is a statistic used to measure and
test the association between two binary variables. If two binary variables are associated, they
are not statistically independent.
Odds
The formal definition of odds is the ratio of the expected probabilities of X and ~X; i.e.,
P(X):P(~X) or P(X) / P(~X)
Example. The odds of rain tomorrow are 3:1.
Rain is three times as likely as no rain.
P(rains tomorrow) = 0.75
P(doesn't rain tomorrow) = 0.25
Odds = 0.75:0.25 = 3:1
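The conversion between probability and odds in the rain example can be sketched in a few lines of Python. The function names `odds` and `probability` are illustrative only, not part of the course material; `Fraction` is used simply to keep the arithmetic exact.

```python
from fractions import Fraction


def odds(p):
    """Return the odds P(X) / P(~X) for a probability p with 0 <= p < 1."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


def probability(odds_value):
    """Invert the odds: P(X) = odds / (1 + odds)."""
    return odds_value / (1 + odds_value)


# Rain example from the text: P(rain) = 0.75, so the odds are 3:1.
p_rain = Fraction(3, 4)
print(odds(p_rain))      # 3
print(probability(3))    # 0.75
```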
Odds Ratio and Log-Odds Ratio
For two binary variables {X, ~X} and {Y, ~Y}, the odds ratio is the ratio of the odds of X if Y is
true to the odds of X if Y is not true.
Consider the first part. We start with these definitions of conditional probability:
$$P(X \mid Y) = \frac{P(X \cap Y)}{P(Y)}$$
$$P({\sim}X \mid Y) = \frac{P({\sim}X \cap Y)}{P(Y)}$$
So the odds of X given Y are:
$$\frac{P(X \mid Y)}{P({\sim}X \mid Y)} = \frac{P(X \cap Y) / P(Y)}{P({\sim}X \cap Y) / P(Y)} = \frac{P(X \cap Y)}{P({\sim}X \cap Y)}$$
Similarly, the odds of X given ~Y are:
$$\frac{P(X \mid {\sim}Y)}{P({\sim}X \mid {\sim}Y)} = \frac{P(X \cap {\sim}Y)}{P({\sim}X \cap {\sim}Y)}$$
And the odds ratio is
$$\text{Odds ratio} = \frac{P(X \cap Y) / P({\sim}X \cap Y)}{P(X \cap {\sim}Y) / P({\sim}X \cap {\sim}Y)}$$
Calculating the odds ratio for tabled frequencies is very simple:
                    Variable 1
Variable 2      X       ~X      Total
Y               a       b       a+b
~Y              c       d       c+d
Total           a+c     b+d     N
$$\text{Odds ratio} = OR = \frac{a/b}{c/d} = \frac{ad}{bc}$$
For mathematical reasons, it is better to work with the natural log of the odds ratio, or log-odds
ratio.
$$\ln(OR) = \ln\left(\frac{ad}{bc}\right)$$
Interpretation of OR and ln(OR)
OR = 625/625 = 1, ln(OR) = 0: No association of X and Y
OR = 2000/24 = 83.33, ln(OR) = 4.42: Strong positive association of X and Y
OR = 25/1800 = 0.014, ln(OR) = –4.28: Strong negative/reverse association of X and Y
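The three cases above can be reproduced with a short Python sketch. The (ad, bc) pairs are the products quoted in the examples; the dictionary labels and the helper `log_or` are illustrative names only. The last check shows one reason the log scale is convenient: swapping ad and bc changes only the sign of ln(OR), so positive and negative associations of equal strength are symmetric about 0.

```python
import math

# (a*d, b*c) products taken from the three examples in the text.
examples = {
    "no association": (625, 625),
    "strong positive": (2000, 24),
    "strong negative": (25, 1800),
}


def log_or(ad, bc):
    """Natural log of the odds ratio ad / bc."""
    return math.log(ad / bc)


for label, (ad, bc) in examples.items():
    print(f"{label}: OR = {ad / bc:.3f}, ln(OR) = {log_or(ad, bc):.2f}")

# Inverting the odds ratio flips the sign of ln(OR) but not its size.
symmetric = math.isclose(log_or(24, 2000), -log_or(2000, 24))
```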
Confidence Interval and Hypothesis Testing
There is no formula for the standard error of the OR, but there is for ln(OR):
$$s_{\ln(OR)} = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$
Further, the sampling distribution of ln(OR) is approximately normal. This gives us
a convenient way to produce a credible/confidence interval for ln(OR):
$$LL_{\ln(OR)} = \ln(OR) - z_{crit} \cdot s_{\ln(OR)}$$
$$UL_{\ln(OR)} = \ln(OR) + z_{crit} \cdot s_{\ln(OR)}$$
Where, as previously, zcrit is the z-value that defines the width of our credible/confidence
interval. For example, zcrit = 1.96 defines a 95% CI.
Once we have the LL and UL of the CI for ln(OR), we can take their anti-logs to get the CI for
OR:
$$LL_{OR} = \exp[LL_{\ln(OR)}] = e^{LL_{\ln(OR)}}$$
$$UL_{OR} = \exp[UL_{\ln(OR)}] = e^{UL_{\ln(OR)}}$$
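The interval calculation can be sketched in Python as follows. The function name `odds_ratio_ci` and the counts are assumptions made for illustration (they are not the homework data); the steps follow the formulas above: compute ln(OR), its standard error, the limits on the log scale, and then the anti-logs.

```python
import math


def odds_ratio_ci(a, b, c, d, z_crit=1.96):
    """Return ln(OR), its standard error, the CI for ln(OR), and the CI for OR.

    Cells follow the lecture's layout; z_crit = 1.96 gives a 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    ln_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ll_ln = ln_or - z_crit * se
    ul_ln = ln_or + z_crit * se
    # Anti-logs convert the limits back to the odds-ratio scale.
    return ln_or, se, (ll_ln, ul_ln), (math.exp(ll_ln), math.exp(ul_ln))


# Hypothetical counts, chosen only for illustration.
ln_or, se, ci_ln, ci_or = odds_ratio_ci(30, 10, 15, 20)
print(f"ln(OR) = {ln_or:.3f}, SE = {se:.3f}")
print(f"95% CI for ln(OR): ({ci_ln[0]:.3f}, {ci_ln[1]:.3f})")
print(f"95% CI for OR:     ({ci_or[0]:.3f}, {ci_or[1]:.3f})")
```

Note that the interval for OR is not symmetric about OR itself, because it is symmetric on the log scale and then exponentiated.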
We can also test a null hypothesis of no association between X and Y
H0: ln(OR) = 0 (no association of X and Y)
H1: ln(OR) ≠ 0 (association of X and Y)
by calculating a z-score for ln(OR).
$$z = \frac{\ln(OR)}{s_{\ln(OR)}}$$
Given this z, we can compute a p-value for a one- or two-tailed significance test. If the p-value is
less than the specified α, we reject the null hypothesis.
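A minimal Python sketch of this test is shown below. The name `log_or_z_test` and the counts are hypothetical. The p-values use the standard normal tail area, computed here with `math.erfc` from the standard library rather than a z-table.

```python
import math


def log_or_z_test(a, b, c, d):
    """z-test of H0: ln(OR) = 0, returning z and its one- and two-tailed p-values."""
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    ln_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = ln_or / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    # One-tailed p-value in the direction of the observed association.
    p_one_tailed = p_two_tailed / 2
    return z, p_one_tailed, p_two_tailed


# Hypothetical counts, chosen only for illustration.
alpha = 0.05
z, p_one, p_two = log_or_z_test(30, 10, 15, 20)
print(f"z = {z:.3f}, two-tailed p = {p_two:.4f}")
print("reject H0" if p_two < alpha else "do not reject H0")
```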
Homework
A quality engineer wants to measure the association of customer satisfaction (low vs. high) and
product design (old vs. new) and collects the following results for a sample of 100 cases.
                Customer Satisfaction
Design          Low             High
Old             45 (= a)        8 (= b)
New             12 (= c)        35 (= d)
Calculate and report:
a. The odds ratio
b. The log of the odds ratio
c. The upper and lower limits for a 95% CI of ln(OR)
d. The upper and lower limits for a 95% CI of OR
Remember to use the natural log.
Show formulas!
You can check your answer here:
http://www.medcalc.org/calc/odds_ratio.php