PBG 650 Advanced Plant Breeding

Mathematical Statistics Concepts
– Probability Laws
– Probability Distributions
– Binomial distributions
– Mean and Variance of Linear Functions
– Frequency data
– Regression and Correlation

Probability Laws
• Union: The probability that A or B (or both) occurs
$$\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$$
• Intersection: The probability that both A and B occur simultaneously. Also called the joint probability
$$\Pr(A \cap B) = \Pr(A, B)$$
• Complement: The probability that A does not occur
$$\Pr(\bar{A}) = 1 - \Pr(A)$$

Conditional Probability and Independence
• The conditional probability of A given B
$$\Pr(A \mid B) = \frac{\Pr(A, B)}{\Pr(B)}$$
• If events A and B are independent, then
$$\Pr(A \mid B) = \Pr(A), \qquad \Pr(B \mid A) = \Pr(B), \qquad \Pr(A, B) = \Pr(A)\Pr(B)$$

Probabilities

Plant Height          B1B1    B1B2    B2B2    Marginal Prob.
height ≤ 50 cm        0.10    0.14    0.06    0.30
50 < height ≤ 75 cm   0.04    0.18    0.10    0.32
height > 75 cm        0.02    0.16    0.20    0.38
Marginal Prob.        0.16    0.48    0.36    1.00

Marginal Probability: Pr(Genotype = B1B1) = 0.16
Joint Probability: Pr(height ≤ 50, Genotype = B1B1) = 0.10
Conditional Probability:
$$\Pr(\text{height} \le 50 \mid \text{Genotype} = B_1B_1) = \frac{\Pr(\text{height} \le 50,\ \text{Genotype} = B_1B_1)}{\Pr(\text{Genotype} = B_1B_1)} = \frac{0.10}{0.16} = 0.625$$

Statistical Independence
If X is statistically independent of Y, then their joint probability is equal to the product of the marginal probabilities of X and Y.

If independent:
Pr(height ≤ 50, Genotype = B1B1) = Pr(height ≤ 50) × Pr(Genotype = B1B1) = 0.30 × 0.16 = 0.0480

Joint probabilities expected if height and genotype were independent:

Plant Height          B1B1      B1B2      B2B2      Marginal Prob.
height ≤ 50 cm        0.0480    0.1440    0.1080    0.30
50 < height ≤ 75 cm   0.0512    0.1536    0.1152    0.32
height > 75 cm        0.0608    0.1824    0.1368    0.38
Marginal Prob.        0.16      0.48      0.36      1.00

Joint Probability (observed): Pr(height ≤ 50, Genotype = B1B1) = 0.10
The observed joint probability (0.10) differs from the value expected under independence (0.0480), so height and genotype are not independent in this example.

Bayes' Theorem (Bayes' Rule)
• Conditional probability
$$\Pr(A, B) = \Pr(A \mid B)\Pr(B)$$
• Bayes' Theorem
$$\Pr(A \mid B) = \frac{\Pr(B \mid A)\Pr(A)}{\Pr(B)}$$
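As a minimal sketch (not part of the original slides), the height-by-genotype table above can be used to check the conditional probability and Bayes' theorem numerically. The variable and function names are illustrative assumptions; only the probabilities come from the table.

```python
# Joint probabilities Pr(height class, genotype), copied from the table above.
genotypes = ["B1B1", "B1B2", "B2B2"]
joint = {
    "height <= 50":      [0.10, 0.14, 0.06],
    "50 < height <= 75": [0.04, 0.18, 0.10],
    "height > 75":       [0.02, 0.16, 0.20],
}

# Marginal probabilities: sum the joint table across rows or down columns.
height_marginal = {h: sum(row) for h, row in joint.items()}
genotype_marginal = {
    g: sum(row[j] for row in joint.values()) for j, g in enumerate(genotypes)
}


def pr_height_given_genotype(height, genotype):
    """Pr(height | genotype) = Pr(height, genotype) / Pr(genotype)."""
    j = genotypes.index(genotype)
    return joint[height][j] / genotype_marginal[genotype]


def pr_genotype_given_height_bayes(genotype, height):
    """Bayes' theorem: Pr(G | H) = Pr(H | G) Pr(G) / Pr(H)."""
    return (
        pr_height_given_genotype(height, genotype)
        * genotype_marginal[genotype]
        / height_marginal[height]
    )


cond = pr_height_given_genotype("height <= 50", "B1B1")
posterior = pr_genotype_given_height_bayes("B1B1", "height <= 50")
direct = joint["height <= 50"][0] / height_marginal["height <= 50"]

print(round(cond, 3))       # 0.625
print(round(posterior, 4))  # 0.3333
print(round(direct, 4))     # 0.3333, same value computed directly from the table
```

The posterior from Bayes' theorem agrees with the value read directly from the table (0.10 / 0.30), which is the point of the identity.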
Pr(A) is called the prior probability
Pr(A|B) is called the posterior probability

General form, for k mutually exclusive events A_1, …, A_k:
$$\Pr(A_j \mid B) = \frac{\Pr(B \mid A_j)\Pr(A_j)}{\sum_{i=1}^{k} \Pr(B \mid A_i)\Pr(A_i)}$$

Bayes' Theorem Example
• Pr(A) = 0.20 is the probability that a plant will get a disease (prior)
• Pr(B) = 0.30 is the frequency of a genetic marker
• Pr(B|A) = 0.60 is the frequency of the marker that is observed in a sample of diseased plants
• We would like to know the chance of getting the disease if a plant has the marker (posterior probability)
$$\Pr(A \mid B) = \frac{\Pr(B \mid A)\Pr(A)}{\Pr(B)} = \frac{(0.6)(0.20)}{0.3} = \frac{0.12}{0.3} = 0.40$$
There is a 40% probability that a plant will get the disease if it has the marker.

Discrete probability distributions
• Let X be a discrete random variable that can take on a value X_i, where i = 1, 2, 3, … (a countable number of values)
• The probability distribution of X is described by specifying P_i = Pr(X_i) for every possible value of X_i
• 0 ≤ Pr(X_i) ≤ 1 for all values of X_i
• $\sum_i P_i = 1$
• The expected value of X is $E(X) = \sum_i X_i \Pr(X_i) = \mu_X$

What would the probability distribution be for rolling a single die? (This is an example of a uniform distribution.) What would the expected value be?

Binomial Probability Function
• A Bernoulli random variable can have a value of one or zero. Pr(X=1) = p, which can be viewed as the probability of success. Pr(X=0) = 1 − p.
• A binomial distribution is derived from a series of independent Bernoulli trials. Let n be the number of trials and y be the number of successes.
– Calculate the number of ways to obtain that result:
$$\binom{n}{y} = \frac{n!}{y!\,(n-y)!}$$
– Calculate the probability of that result:
$$\Pr(y) = \frac{n!}{y!\,(n-y)!}\, p^{y} (1-p)^{n-y}$$
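The binomial probability function above can be evaluated directly. The following is a minimal sketch; the function name `binomial_pmf` and the values n = 20, p = 0.5 are illustrative choices, not part of the formula itself. It also checks numerically that the probabilities sum to 1 and computes the mean and variance from the definition of an expected value.

```python
from math import comb


def binomial_pmf(y, n, p):
    """Pr(y) = n! / (y! (n - y)!) * p**y * (1 - p)**(n - y)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


n, p = 20, 0.5  # illustrative values chosen for this sketch
pmf = [binomial_pmf(y, n, p) for y in range(n + 1)]

total = sum(pmf)                                    # should equal 1
mean = sum(y * pr for y, pr in enumerate(pmf))      # E(Y) = sum of y * Pr(y)
variance = sum((y - mean) ** 2 * pr for y, pr in enumerate(pmf))

print(total, mean, variance)
```

For these values the mean and variance computed from the distribution match the closed forms np and np(1 − p).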
Binomial Distribution
Average = np = 20 × 0.5 = 10
Variance = np(1 − p) = 20 × 0.5 × (1 − 0.5) = 5

[Figure: Binomial Distribution (n=20, p=0.5). x-axis: Number of successes (0 to 20); y-axis: Probability (0.00 to 0.20)]

For a normal distribution, the variance is independent of the mean.
For a binomial distribution, the variance changes with the mean.

Binomial Example
• Patricia has developed some doubled haploids of barley from a cross between two pure lines.
• She needs at least 3 plants that have a particular SNP marker found in one of the parents.
• She decides to grow 10 plants because that seems like plenty, and space in the greenhouse is limited.
• Assuming that all of her plants survive, what are the chances that she will meet her objective?

Mean and variance of linear functions
• Mean and variance of a constant (c)
$$E(c) = c, \qquad \sigma_c^2 = 0$$
• Adding a constant (c) to a random variable X
$$E(X + c) = E(X) + c$$
the mean increases by the value of the constant
$$\sigma_{X+c}^2 = \sigma_X^2$$
the variance remains the same
• Multiplying a random variable by a constant
$$E(cX) = c\,E(X)$$
multiply the mean by the constant
$$\sigma_{cX}^2 = c^2 \sigma_X^2$$
multiply the variance by the square of the constant
• Adding two random variables X and Y
$$E(X + Y) = E(X) + E(Y)$$
the mean of the sum is the sum of the means
$$\sigma_{X+Y}^2 = \mathrm{VAR}(X) + \mathrm{VAR}(Y) + 2\,\mathrm{COV}_{XY}$$
the variance of the sum is the sum of the variances if the variables are independent (so that the covariance is zero)

Variance - definition
• The variance of variable X
$$V(X) = E\left[(X_i - \mu_X)^2\right] = E(X_i^2) - \mu_X^2$$
• Usual formula
$$\sigma_X^2 = \frac{\sum (X_i - \mu_X)^2}{n} = \frac{\sum X_i^2 - \left(\sum X_i\right)^2 / n}{n}$$
• Formula for frequency data (weighted), where f_i is the relative frequency of X_i
$$V(X) = \sigma_X^2 = \sum_i f_i X_i^2 - \mu_X^2$$

Covariance - definition
• The covariance of variable X and variable Y
$$\mathrm{Cov}(X, Y) = E\left[(X - \mu_X)(Y - \mu_Y)\right] = E(XY) - \mu_X \mu_Y$$
• Usual formula
$$\sigma_{XY} = \frac{\sum (X_i - \mu_X)(Y_i - \mu_Y)}{n} = \frac{\sum X_i Y_i - \left(\sum X_i\right)\left(\sum Y_i\right) / n}{n}$$
• Formula for frequency data (weighted)
$$\mathrm{Cov}(X, Y) = \sigma_{XY} = \sum_i f_i X_i Y_i - \mu_X \mu_Y$$

Linear Regression and Correlation
• Regression coefficient
$$b = \frac{\sum (X_i - \mu_X)(Y_i - \mu_Y)}{\sum (X_i - \mu_X)^2} = \frac{SCP_{X,Y}}{SS_X} = \frac{\sigma_{X,Y}}{\sigma_X^2}$$
• Sum of squares due to regression
$$SS_{Reg} = \frac{\left[\sum (X_i - \mu_X)(Y_i - \mu_Y)\right]^2}{\sum (X_i - \mu_X)^2} = \frac{SCP_{X,Y}^2}{SS_X}$$
• Squared correlation coefficient
$$r^2 = \frac{\left[\sum (X_i - \mu_X)(Y_i - \mu_Y)\right]^2}{\sum (X_i - \mu_X)^2 \sum (Y_i - \mu_Y)^2} = \frac{SCP_{X,Y}^2}{SS_X\, SS_Y} = \frac{\sigma_{X,Y}^2}{\sigma_X^2\, \sigma_Y^2}$$
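The regression and correlation quantities above can be computed from sums of squares and cross products. This is a minimal sketch with made-up data; the `x` and `y` values and all variable names are hypothetical and serve only to illustrate the arithmetic.

```python
from math import sqrt

# Hypothetical paired observations, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and sum of cross products about the means.
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)
scp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b = scp_xy / ss_x                # regression coefficient of Y on X
r = scp_xy / sqrt(ss_x * ss_y)   # correlation coefficient
ss_reg = scp_xy**2 / ss_x        # sum of squares due to regression

# The same slope from the variance and covariance (dividing both by n).
var_x = ss_x / n
cov_xy = scp_xy / n
b_from_cov = cov_xy / var_x

print(round(b, 4), round(r, 4), round(ss_reg, 4))  # 0.6 0.7746 3.6
```

Because the divisor n cancels, the slope is the same whether it is written with sums of squares or with the variance and covariance, and r squared equals the fraction of SS_Y accounted for by the regression.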