1MA01: Probability
Sinéad Ryan
TCD
November 29, 2013

Binomial theory

If a probability problem asks for the number of successful outcomes when a trial is repeated \(n\) times, it is described by a Binomial \((n, p)\) distribution.

Definition: a binomial experiment is one in which
- a trial is repeated \(n\) times
- the trials are independent
- the outcome of each trial is either success or failure
- each trial has the same probability of success, \(p\), and probability of failure, \(q = 1 - p\)
- the goal of the experiment is to count the number of successes in the \(n\) trials

Example: rolling a die, repeated 3 times, and asking how many sixes are rolled. This is a binomial experiment since
- the rolls are independent
- each roll is either a success (i.e. a 6) or a failure (i.e. not a 6)
- the probability of success in each roll is \(p = 1/6\) and of failure \(q = 5/6\)
- \(n = 3\)
- we are counting the number of successes (sixes)

Random variables

A random variable \(X\) is a variable whose value depends on the outcome of a random experiment. E.g. let \(X\) be the number of sixes in 3 rolls of a die. This can change from one set of rolls to the next.

A discrete random variable may only assume a finite (or countably infinite) number of values. E.g. in 3 rolls, \(X = 0, 1, 2\) or \(3\), i.e. a finite number of values.

The possible values of \(X\) and their probabilities give the distribution of \(X\).

Example: Roll a die once. If it lands on 6 you get $4, otherwise you lose $1. Let \(M\) be the money you win. Find the distribution of \(M\).

\[ P(M = 4) = \frac{1}{6}, \qquad P(M = -1) = \frac{5}{6}. \]

[Bar chart of \(P(M)\) for \(M = -1\) and \(M = 4\).]

Binomial (n,p) and Binomial coefficients

Theorem: Let \(X\) be the number of successes in a binomial experiment with \(n\) trials and probability of success \(p\). Then

\[ P(X = k) = \binom{n}{k} p^k q^{n-k}, \qquad k = 0, 1, \ldots, n, \]

and

\[ \binom{n}{k} = \frac{n!}{k!\,(n-k)!}, \]

where \(n! = 1 \times 2 \times 3 \times \cdots \times n\) for \(n \geq 1\), and \(0! = 1\).

Calculating binomial coefficients

\[ \binom{6}{0} = \frac{6!}{0!\,(6-0)!} = 1, \qquad \binom{14}{8} = \frac{14!}{8!\,(14-8)!} = 3003. \]

See the tutorial sheets for more examples.

Calculating probabilities in binomial experiments

Example 1: Consider a multiple choice test.
Each question has 5 possible answers, of which only 1 is correct. A student guesses on 6 questions. What is the probability that exactly 2 of those guesses are correct?

You can verify that this is a binomial experiment with \(n = 6\), \(p = 1/5 = 0.2\) and \(q = 4/5 = 0.8\). Then

\[ P(X = 2) = \binom{6}{2} p^2 q^{6-2} = \binom{6}{2} (0.2)^2 (0.8)^4 = 15\,(0.04)(0.4096) = 0.24576, \]

i.e. there is a probability of about 25% of guessing exactly 2 correctly.

Example 2: Consider a box with 5 tickets inside, each labelled with a 1 or a 0 as follows:

0 0 0 0 1

6 tickets are drawn with replacement (i.e. a ticket is drawn, its number recorded, and the ticket replaced before the next is drawn). What is the probability that the sum of the tickets drawn is two?

The sum equals the number of 1s drawn, so you can verify that this is a binomial experiment with \(n = 6\) and \(p = 1/5\). Then

\[ P(X = 2) = \binom{6}{2} (0.2)^2 (0.8)^4 = 0.24576. \]

Note: consider the random sampling of a population. This is akin to drawing from a box without replacement, since once a person has been sampled they are not asked again in the same poll. In principle this means that the probability of success is not constant, since one person is removed from the experiment at each trial. However, the binomial description still works if the population is large enough that removing one person does not change the probabilities by very much.

Application in genetics

Consider 2 heterozygous pea plants, e.g. with alleles (Ff), which are crossed. Find the probability that the offspring will also be heterozygous.

Let \(X\) be the number of F alleles in the offspring. The probability that each gamete is F is \(1/2\). The 2 gametes are selected from the parents independently, so \(X\) has a binomial \((n, p)\) distribution with \(n = 2\) and \(p = 1/2\), and then

\[ P(X = 1) = \binom{2}{1} \left(\frac{1}{2}\right)^1 \left(\frac{1}{2}\right)^1 = \frac{1}{2}. \]

Note that here allele means a unique form of a gene, so e.g. blood has genotypes AA, AO, BB, BO, OO, AB and phenotypes A, B, O, AB.

Hardy-Weinberg Law

Suppose a trait has 2 alleles, A and a, and the frequency of A in the sperm and egg gametes of a population is \(p\), so that the frequency of a is \(1 - p = q\). Now consider a sperm and an egg randomly selected and crossed. Find the probability of having 0 A, 1 A and 2 A.
With \(X\) the number of A alleles in the offspring, \(X\) has a binomial \((2, p)\) distribution, so

\[ P(0\,\mathrm{A}) = P(aa) = P(X = 0) = \binom{2}{0} p^0 q^{2-0} = q^2 \]
\[ P(1\,\mathrm{A}) = P(Aa) = P(X = 1) = \binom{2}{1} p^1 q^{2-1} = 2pq \]
\[ P(2\,\mathrm{A}) = P(AA) = P(X = 2) = \binom{2}{2} p^2 q^{2-2} = p^2 \]

Then, e.g. if A is dominant, the proportion of the population showing the dominant phenotype is \(p^2 + 2pq\).

A population whose genotypes AA : Aa : aa occur in the proportions \(p^2 : 2pq : q^2\) is in equilibrium. This law can be generalised to 3 alleles etc.

Example: albinism, a recessively inherited trait with a rate of approximately 1/10,000. Assuming the population follows the Hardy-Weinberg distribution, find the proportion of heterozygotes and homozygotes.

\[ P(aa) = q^2 = \frac{1}{10{,}000} \quad\text{so}\quad q = 0.01. \]

Therefore \(p = 1 - q = 0.99\). The probability

\[ P(AA) = p^2 = (0.99)^2 = 0.9801 \]

gives the proportion that is homozygous dominant. The probability

\[ P(Aa) = 2pq = 2(0.99)(0.01) = 0.0198 \]

is the proportion of heterozygotes (carriers). This implies that the allele for albinism is present but not expressed in about 2% of the population.

Histograms

If \(X\) has a binomial \((n, p) = (6, 1/2)\) distribution then

P(X = 0) = 1/64
P(X = 1) = 3/32
P(X = 2) = 15/64
P(X = 3) = 5/16
P(X = 4) = 15/64
P(X = 5) = 3/32
P(X = 6) = 1/64

[Histogram of \(P(X = k)\) against \(k\) for \(k = 0, 1, \ldots, 6\), with one rectangle of width 1 per value.]

- the value of \(k\) whose rectangle is highest is called the mode, i.e. the most likely value (here \(k = 3\))
- area of a rectangle = height × width = \(P(X = k) \times 1 = P(X = k)\)
- since \(\sum_k P(X = k) = 1\), the area of the histogram is 1.
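
The worked numbers in these notes can be checked numerically. The following is a minimal Python sketch, not part of the lecture: the function names `binomial_pmf` and `hardy_weinberg` are illustrative choices, and exact fractions are used (an assumption made here for convenience) so that results such as 5/16 appear exactly rather than as rounded decimals.

```python
from fractions import Fraction
from math import comb  # binomial coefficient n! / (k! (n-k)!), Python 3.8+


def binomial_pmf(n, p, k):
    """P(X = k) for X ~ binomial(n, p): C(n, k) * p^k * q^(n-k), with q = 1 - p."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)


def hardy_weinberg(p):
    """Genotype proportions (AA, Aa, aa) = (p^2, 2pq, q^2), where p is the
    frequency of allele A. Each is a binomial(2, p) probability."""
    return tuple(binomial_pmf(2, p, k) for k in (2, 1, 0))


# Histogram for binomial(6, 1/2): one probability per k = 0, ..., 6.
histogram = [binomial_pmf(6, Fraction(1, 2), k) for k in range(7)]
for k, prob in enumerate(histogram):
    print(f"P(X = {k}) = {prob}")

# Each rectangle has width 1, so the total area is the sum of the heights.
total_area = sum(histogram)
# The mode is the k with the tallest rectangle.
mode = max(range(7), key=lambda k: histogram[k])
print("total area:", total_area, "mode:", mode)

# Multiple-choice example: 6 guesses, p = 1/5, exactly 2 correct.
guess = binomial_pmf(6, Fraction(1, 5), 2)
print("P(2 correct guesses) =", float(guess))

# Albinism example: q^2 = 1/10,000 gives q = 1/100 and p = 99/100.
q = Fraction(1, 100)
AA, Aa, aa = hardy_weinberg(1 - q)
print("AA:", float(AA), "Aa:", float(Aa), "aa:", float(aa))
```

Passing ordinary floats such as `0.2` instead of `Fraction(1, 5)` also works, at the cost of small rounding errors in the last digits.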