Analyzing Proportions: Binomial Distribution

Analyzing Proportions
Binomial Distribution
- Example
Binomial Test
- Example
Binomial Distribution
There are many applications in biology where one is confronted
with data that fall into one of two categories: male/female,
dead/alive, left-handed/right-handed, etc.
This situation is analogous to our previous coin-flip example
(heads/tails). In the population, a fixed proportion p falls into
one category, and the balance, by definition, is 1 - p.
The proportion p is often referred to as the probability of
“success” (its complement, 1 - p, being the probability of “failure”).
If we take a random sample of n individuals from this type
of population, the sampling distribution for the number of
successes X is described by the binomial distribution:

$$\Pr[X \text{ successes}] = \binom{n}{X}\, p^{X} (1-p)^{n-X}$$
The term $\binom{n}{X}$ is read as “n choose X” and can be expanded to:
$$\binom{n}{X} = \frac{n!}{X!\,(n-X)!}$$
Binomial Distribution
- Example -
Suppose you are evaluating an orchid species whose flowers have
either a left-handed or a right-handed lip.
Suppose you sample 27 flowers (say, all the ones that occurred
in a random quadrat) from a population in which the proportion
of left-handed flowers is p = 0.25.
The binomial coefficient for X = 6 left-handed flowers, “27 choose 6”,
is 296,010, and the previous equation gives Pr[6 successes] = 0.1719.
You can then calculate the probability for each possible value of X
(0 through 27) in turn to generate the binomial distribution.
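The numbers in this example can be reproduced in R. The sketch below assumes n = 27, p = 0.25 and X = 6 as in the text; the variable names and plot labels are illustrative.

```r
n <- 27      # flowers sampled
p <- 0.25    # proportion of left-handed flowers

choose(n, 6)                     # binomial coefficient "27 choose 6"
dbinom(6, size = n, prob = p)    # Pr[6 left-handed flowers]

# Probability of every possible number of left-handed flowers, 0 through 27
probs <- dbinom(0:n, size = n, prob = p)
sum(probs)                       # the probabilities sum to 1

# Bar plot of the full binomial distribution
barplot(probs, names.arg = 0:n,
        xlab = "Number of left-handed flowers (X)",
        ylab = "Probability")
```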
Note the Law of Large Numbers: larger samples yield more precise
estimates (a larger sample exhibits less spread in the data).
> hist(rbinom(100, 100, 0.25))  # 100 random draws from a binomial with n = 100, p = 0.25
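To make the point about sample size concrete, the sketch below simulates sample proportions for two sample sizes and compares their spread. The sample sizes (20 and 100), the number of replicates, and the seed are arbitrary assumptions chosen for illustration.

```r
set.seed(1)   # arbitrary seed so the simulation is repeatable
p    <- 0.25
reps <- 10000

# Sample proportion (successes / n) from many repeated samples
prop_small <- rbinom(reps, size = 20,  prob = p) / 20
prop_large <- rbinom(reps, size = 100, prob = p) / 100

# Simulated spread of the estimates
sd(prop_small)
sd(prop_large)

# Theoretical standard deviation of a sample proportion: sqrt(p(1-p)/n)
sqrt(p * (1 - p) / 20)
sqrt(p * (1 - p) / 100)
```

The larger sample should show the smaller standard deviation, which is the narrowing seen in the histograms.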
Binomial Test
The Binomial test applies the binomial sampling distribution to
hypothesis testing for a proportion. Generally, these tests take
the form:
H0: The relative frequency of successes in the population is p0.
HA: The relative frequency of successes in the population is not p0.
Binomial Test
- Example 7.2 -
A study of 25 genes involved in spermatogenesis (sperm
formation) found their locations in the mouse genome. The study
was carried out to test a prediction of evolutionary theory that
such genes should occur disproportionately often on the X
chromosome. As it turned out, 10 of the 25 spermatogenesis
genes (40%) were on the X chromosome (Wang et al. 2001).
If genes for spermatogenesis occurred “randomly” throughout the
genome, then we would expect only 6.1% of them to fall on the X
chromosome because the X chromosome contains 6.1% of the
genes in the genome. Do the results support the hypothesis that
spermatogenesis genes occur preferentially on the X
chromosome?
> sum(dbinom(10:25,25,0.061))*2
[1] 1.987976e-06
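The calculation above sums the probabilities of 10 or more X-linked genes out of 25 under the null proportion and doubles the result. As a hedged sketch, the same upper-tail sum can also be obtained from the cumulative function `pbinom()`; the variable names are illustrative.

```r
n  <- 25      # genes studied
x  <- 10      # genes found on the X chromosome
p0 <- 0.061   # proportion expected under the null hypothesis

# Upper tail: Pr[X >= 10] summed term by term
upper_sum <- sum(dbinom(x:n, size = n, prob = p0))

# The same tail from the cumulative distribution: Pr[X > 9]
upper_cdf <- pbinom(x - 1, size = n, prob = p0, lower.tail = FALSE)

all.equal(upper_sum, upper_cdf)

# Doubling the one-tailed probability gives the two-tailed P-value
2 * upper_sum
```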
A P-value of 0.00000198 << 0.05; thus, we reject the null
hypothesis and conclude that there is a disproportionate
number of spermatogenesis genes on the X chromosome.
Our best estimate of the proportion of spermatogenesis
genes that are located on the X chromosome is:
$$\hat{p} = \frac{10}{25} = 0.40$$
which is much greater than the proportion of 0.061 stated
in the null hypothesis.
> binom.test(10,25,p=0.061)
Exact binomial test
data: 10 and 25
number of successes = 10, number of trials = 25,
p-value = 9.94e-07
alternative hypothesis: true probability of success is not
equal to 0.061
95 percent confidence interval:
0.2112548 0.6133465
sample estimates:
probability of success
0.4
Clopper, C. J. & Pearson, E. S. (1934). The use of confidence or fiducial limits illustrated
in the case of the binomial. Biometrika, 26, 404–413. (Confidence Interval in R)
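For readers who want to see where these numbers come from, the sketch below recomputes the Clopper-Pearson interval from beta quantiles and compares it with `binom.test()`. It also shows that the P-value reported by `binom.test()` here equals the one-tailed probability Pr[X >= 10]. R's two-sided exact test sums the probabilities of all outcomes no more likely than the observed one rather than doubling one tail, and in this example no outcome in the lower tail qualifies. Variable names are illustrative.

```r
n <- 25; x <- 10; p0 <- 0.061
alpha <- 0.05

# Clopper-Pearson (exact) limits from beta quantiles
lower <- qbeta(alpha / 2,     x,     n - x + 1)
upper <- qbeta(1 - alpha / 2, x + 1, n - x)
c(lower, upper)

res <- binom.test(x, n, p = p0)
res$conf.int          # interval reported by binom.test
res$p.value           # two-sided P-value reported by binom.test

# One-tailed probability Pr[X >= 10] under the null hypothesis
sum(dbinom(x:n, size = n, prob = p0))
```

Either P-value is far below 0.05, so the conclusion of the test is the same.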