Analyzing Proportions

Binomial Distribution - Example
Binomial Test - Example

Binomial Distribution

There are many applications in biology where one is confronted with data that fall into one of two categories: male/female, dead/alive, left-handed/right-handed, etc. This situation matches our previous coin-flip example (heads/tails). In the population, a fixed proportion p falls into one category, and the balance is by definition 1 - p. The category with proportion p is often called a “success” (and its complement a “failure”), so p is the probability of success for a randomly sampled individual.

If we take a random sample of n individuals from this type of population, the sampling distribution for the number of successes X is described by the binomial distribution:

$$\Pr[X \text{ successes}] = \binom{n}{X} p^{X} (1 - p)^{n - X}$$

The term $\binom{n}{X}$ is read as “n choose X” and can be expanded to:

$$\binom{n}{X} = \frac{n!}{X!\,(n - X)!}$$

Binomial Distribution - Example

Suppose you are evaluating an orchid species whose flowers have either a left-handed or a right-handed lip. Suppose the proportion of left-handed flowers in the population is p = 0.25, and you sample n = 27 flowers (say, all the ones that occurred in a random quadrat). For the probability that exactly X = 6 of them are left-handed, the binomial coefficient “27 choose 6” is 296,010, and the previous equation evaluates to 0.1719. You can then calculate the probability for each value of X from 0 to 27 in turn to generate the binomial distribution.

Note the Law of Large Numbers: larger samples yield more precise estimates (the estimated proportion shows less spread in larger samples).

# histogram of 100 simulated counts of successes, each from n = 100 trials with p = 0.25
> hist(rbinom(100, 100, 0.25))

Binomial Test

The binomial test applies the binomial sampling distribution to hypothesis testing for a proportion. Generally, these tests take the form:

H0: The relative frequency of successes in the population is p0.
HA: The relative frequency of successes in the population is not p0.
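Returning briefly to the orchid example, the hand calculation can be checked in R. The sketch below assumes the reading used there (n = 27 flowers, population proportion p = 0.25, X = 6 left-handed flowers); the variable names are illustrative only and do not come from the slides.

```r
# Orchid example: n = 27 flowers sampled, assumed population proportion p = 0.25
n <- 27
p <- 0.25
x <- 6

# Binomial coefficient "27 choose 6"
n_choose_x <- choose(n, x)

# Probability of exactly 6 left-handed flowers, written out from the formula
pr_formula <- n_choose_x * p^x * (1 - p)^(n - x)

# The same probability from R's built-in binomial probability function
pr_dbinom <- dbinom(x, size = n, prob = p)

# Full binomial distribution for X = 0, 1, ..., 27
pr_all <- dbinom(0:n, size = n, prob = p)

print(n_choose_x)            # binomial coefficient
print(round(pr_formula, 4))  # probability of X = 6
print(sum(pr_all))           # probabilities over all X sum to 1
```

Plotting `pr_all` against `0:n` (for example with `barplot(pr_all, names.arg = 0:n)`) would draw the whole distribution rather than a single probability.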
Binomial Test - Example 7.2

A study of 25 genes involved in spermatogenesis (sperm formation) found their locations in the mouse genome. The study was carried out to test a prediction of evolutionary theory that such genes should occur disproportionately often on the X chromosome. As it turned out, 10 of the 25 spermatogenesis genes (40%) were on the X chromosome (Wang et al. 2001). If genes for spermatogenesis occurred “randomly” throughout the genome, then we would expect only 6.1% of them to fall on the X chromosome, because the X chromosome contains 6.1% of the genes in the genome. Do the results support the hypothesis that spermatogenesis genes occur preferentially on the X chromosome?

> sum(dbinom(10:25,25,0.061))*2
[1] 1.987976e-06

The P-value of about 0.00000199 is far below 0.05, so we reject the null hypothesis and conclude that a disproportionate number of spermatogenesis genes occur on the X chromosome. Our best estimate of the proportion of spermatogenesis genes located on the X chromosome is

$$\hat{p} = \frac{10}{25} = 0.40$$

which is much greater than the proportion of 0.061 stated in the null hypothesis.

> binom.test(10,25,p=0.061)

        Exact binomial test

data:  10 and 25
number of successes = 10, number of trials = 25, p-value = 9.94e-07
alternative hypothesis: true probability of success is not equal to 0.061
95 percent confidence interval:
 0.2112548 0.6133465
sample estimates:
probability of success
                   0.4

Confidence interval method used in R:
Clopper, C. J. & Pearson, E. S. (1934). The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika, 26, 404–413.
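The two R results in Example 7.2 differ by a factor of two (1.987976e-06 from doubling the upper tail, 9.94e-07 from binom.test). The sketch below reproduces both side by side. The difference arises because binom.test does not double a tail: its two-sided P-value sums the probabilities of all outcomes that are no more probable under the null than the observed one, and here no outcome in the lower tail qualifies. Variable names are illustrative only.

```r
# Spermatogenesis example: 10 of 25 genes on the X chromosome, null p0 = 0.061
x  <- 10
n  <- 25
p0 <- 0.061

# Upper-tail probability Pr[X >= 10] under the null hypothesis
p_upper <- sum(dbinom(x:n, size = n, prob = p0))

# Two-sided P-value obtained by doubling the tail (the first calculation in the example)
p_doubled <- 2 * p_upper

# Two-sided P-value and 95% confidence interval as reported by binom.test
bt      <- binom.test(x, n, p = p0)
p_exact <- bt$p.value
ci      <- bt$conf.int  # Clopper-Pearson interval

print(signif(p_doubled, 3))
print(signif(p_exact, 3))
print(round(as.numeric(ci), 4))
```

Either P-value is far below 0.05, so the conclusion of the example does not depend on which two-sided convention is used.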