PBG 650 Advanced Plant Breeding Mathematical Statistics Concepts

– Probability Laws
– Probability Distributions
– Binomial distributions
– Mean and Variance of Linear Functions
– Frequency data
– Regression and Correlation
Probability Laws
• Union: The probability that A or B (or both) occurs
  $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$
• Intersection: The probability that both A and B occur simultaneously; also called the joint probability
  $\Pr(A \cap B) = \Pr(A, B)$
• Complement: The probability that A does not occur
  $\Pr(\bar{A}) = 1 - \Pr(A)$
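The three laws can be checked by enumerating a small sample space. This Python sketch assumes one roll of a fair six-sided die and two hypothetical events, A (the roll is even) and B (the roll is greater than 3); neither comes from the slides.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (illustrative, not from the slides)
omega = {1, 2, 3, 4, 5, 6}

def pr(event):
    """Probability of an event (a set of outcomes) when all outcomes are equally likely."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}  # hypothetical event: the roll is even
B = {4, 5, 6}  # hypothetical event: the roll is greater than 3

union_direct = pr(A | B)                  # count the outcomes in A or B directly
union_rule = pr(A) + pr(B) - pr(A & B)    # Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A ∩ B)
complement = pr(omega - A)                # Pr(not A), which should equal 1 - Pr(A)

print(union_direct, union_rule, complement)  # 2/3 2/3 1/2
```

Subtracting Pr(A ∩ B) matters here because the outcomes 4 and 6 belong to both events and would otherwise be counted twice.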
Conditional Probability and Independence
• The conditional probability of A given B:
  $\Pr(A \mid B) = \dfrac{\Pr(A, B)}{\Pr(B)}$
• If events A and B are independent, then
  $\Pr(A \mid B) = \Pr(A)$
  $\Pr(B \mid A) = \Pr(B)$
  $\Pr(A, B) = \Pr(A)\Pr(B)$
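A minimal sketch of the conditional probability and independence rules, again assuming a fair die with hypothetical events `even`, `high` and `low`. The helper names `pr`, `pr_given` and `independent` are illustrative only.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (illustrative)

def pr(event):
    return Fraction(len(event & omega), len(omega))

def pr_given(a, b):
    """Conditional probability Pr(a | b) = Pr(a, b) / Pr(b)."""
    return pr(a & b) / pr(b)

def independent(a, b):
    """Events are independent when Pr(a, b) = Pr(a) Pr(b)."""
    return pr(a & b) == pr(a) * pr(b)

even = {2, 4, 6}
high = {4, 5, 6}
low = {1, 2}

print(pr_given(even, high), independent(even, high))  # 2/3 False
print(pr_given(even, low), independent(even, low))    # 1/2 True
```

Knowing that the roll is high changes the chance that it is even (2/3 versus 1/2), so those events are dependent; knowing that the roll is low does not, so `even` and `low` are independent.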
Probabilities
                       Genotype
Plant Height           B1B1   B1B2   B2B2   Marginal Prob.
height ≤ 50 cm         0.10   0.14   0.06   0.30
50 < height ≤ 75 cm    0.04   0.18   0.10   0.32
height > 75 cm         0.02   0.16   0.20   0.38
Marginal Prob.         0.16   0.48   0.36   1.00
Marginal Probability:
Pr(Genotype= B1B1) = 0.16
Joint Probability:
Pr(height ≤ 50, Genotype= B1B1) = 0.10
Conditional Probability:
$\Pr(\text{height} \le 50 \mid \text{Genotype} = B_1B_1) = \dfrac{\Pr(\text{height} \le 50,\ \text{Genotype} = B_1B_1)}{\Pr(\text{Genotype} = B_1B_1)} = \dfrac{0.10}{0.16} = 0.625$
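The marginal and conditional probabilities can be computed directly from the joint table. In this sketch the dictionary keys such as "<=50" and "50-75" are hypothetical labels for the height classes.

```python
# Joint probabilities from the plant height x genotype table; keys are (height class, genotype)
joint = {
    ("<=50", "B1B1"): 0.10, ("<=50", "B1B2"): 0.14, ("<=50", "B2B2"): 0.06,
    ("50-75", "B1B1"): 0.04, ("50-75", "B1B2"): 0.18, ("50-75", "B2B2"): 0.10,
    (">75", "B1B1"): 0.02, (">75", "B1B2"): 0.16, (">75", "B2B2"): 0.20,
}
heights = ["<=50", "50-75", ">75"]
genotypes = ["B1B1", "B1B2", "B2B2"]

# Marginal probabilities: sum the joint probabilities over the other variable
pr_height = {h: sum(joint[(h, g)] for g in genotypes) for h in heights}
pr_geno = {g: sum(joint[(h, g)] for h in heights) for g in genotypes}

# Conditional probability: Pr(height <= 50 | B1B1) = Pr(height <= 50, B1B1) / Pr(B1B1)
cond = joint[("<=50", "B1B1")] / pr_geno["B1B1"]

print(round(pr_geno["B1B1"], 2))  # 0.16
print(round(cond, 3))             # 0.625
```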
Statistical Independence
If X is statistically independent of Y, then their joint probability
is equal to the product of the marginal probabilities of X and Y
If independent:
Pr(height ≤ 50, Genotype = B1B1) = Pr(height ≤ 50) × Pr(Genotype = B1B1) = 0.30 × 0.16 = 0.0480

Expected joint probabilities if plant height and genotype were independent:

                       Genotype
Plant Height           B1B1     B1B2     B2B2     Marginal Prob.
height ≤ 50 cm         0.0480   0.1440   0.1080   0.30
50 < height ≤ 75 cm    0.0512   0.1536   0.1152   0.32
height > 75 cm         0.0608   0.1824   0.1368   0.38
Marginal Prob.         0.16     0.48     0.36     1.00
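Joint Probability (observed): Pr(height ≤ 50, Genotype = B1B1) = 0.10, not the 0.0480 expected under independence, so plant height and genotype are not statistically independent.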
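Under independence, the expected table is the product of every pair of marginal probabilities. A sketch, reusing hypothetical height-class labels as dictionary keys:

```python
# Marginal probabilities from the table
pr_height = {"<=50": 0.30, "50-75": 0.32, ">75": 0.38}
pr_geno = {"B1B1": 0.16, "B1B2": 0.48, "B2B2": 0.36}

# Under independence, each joint probability is the product of its two marginals
expected = {(h, g): ph * pg for h, ph in pr_height.items() for g, pg in pr_geno.items()}

observed = 0.10  # observed Pr(height <= 50, B1B1) from the table
exp_cell = expected[("<=50", "B1B1")]
print(f"{exp_cell:.4f}")  # 0.0480
print("consistent with independence" if abs(observed - exp_cell) < 1e-9 else "not independent")
```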
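Bayes’ Theorem (Bayes’ Rule)
• Conditional probability:
  $\Pr(A \mid B) = \dfrac{\Pr(A, B)}{\Pr(B)}$
• Bayes’ Theorem:
  $\Pr(A \mid B) = \dfrac{\Pr(B \mid A)\Pr(A)}{\Pr(B)}$
• Pr(A) is called the prior probability
• Pr(A|B) is called the posterior probability
• When A1, …, Ak are mutually exclusive and together cover all possible outcomes:
  $\Pr(A_j \mid B) = \dfrac{\Pr(B \mid A_j)\Pr(A_j)}{\sum_{i=1}^{k} \Pr(B \mid A_i)\Pr(A_i)}$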
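The general form of Bayes’ Theorem divides each product Pr(B|Aj)Pr(Aj) by the sum of all of them. As an illustration (not an example from the slides), this sketch treats the three genotypes from the plant height table as A1, A2, A3 and takes B to be "height ≤ 50 cm"; the function name `posterior` is hypothetical.

```python
def posterior(priors, likelihoods):
    """Bayes' Theorem over mutually exclusive, exhaustive events A_1..A_k.

    priors[j] = Pr(A_j); likelihoods[j] = Pr(B | A_j).
    Returns a list with Pr(A_j | B) for each j.
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]  # Pr(B | A_j) Pr(A_j)
    pr_b = sum(joint)                                     # denominator: total probability of B
    return [j / pr_b for j in joint]

# Partition: genotypes B1B1, B1B2, B2B2; event B: height <= 50 cm
priors = [0.16, 0.48, 0.36]
likelihoods = [0.10 / 0.16, 0.14 / 0.48, 0.06 / 0.36]  # Pr(height <= 50 | genotype)
post = posterior(priors, likelihoods)
print([round(x, 4) for x in post])
```

The posteriors should come out at 0.10/0.30, 0.14/0.30 and 0.06/0.30, which is the same as reading the height ≤ 50 cm row of the table and dividing by its marginal.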
Bayes’ Theorem Example
• Pr(A) = 0.20 is the probability that a plant will get a disease (prior)
• Pr(B) = 0.30 is the frequency of a genetic marker
• Pr(B|A) = 0.60 is the frequency of the marker that is observed in a sample of diseased plants
• We would like to know the chance of getting the disease if a plant has the marker (posterior probability)
  $\Pr(A \mid B) = \dfrac{\Pr(B \mid A)\Pr(A)}{\Pr(B)} = \dfrac{(0.60)(0.20)}{0.30} = \dfrac{0.12}{0.30} = 0.40$
There is a 40% probability that a plant will get the disease if it has the marker
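The same arithmetic as a short Python check (variable names are illustrative):

```python
pr_a = 0.20          # prior: Pr(A), probability that a plant gets the disease
pr_b = 0.30          # Pr(B), frequency of the marker
pr_b_given_a = 0.60  # Pr(B | A), marker frequency among diseased plants

# Bayes' Theorem: Pr(A | B) = Pr(B | A) Pr(A) / Pr(B)
pr_a_given_b = pr_b_given_a * pr_a / pr_b
print(f"{pr_a_given_b:.2f}")  # 0.40
```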
Discrete probability distributions
• Let X be a discrete random variable that can take on a value Xi, where i = 1, 2, 3, … (a countable number of values)
• The probability distribution of X is described by specifying Pi = Pr(Xi) for every possible value of Xi
• 0 ≤ Pr(Xi) ≤ 1 for all values of Xi
• $\sum_i P_i = 1$
• The expected value of X is $E(X) = \sum_i X_i \Pr(X_i) = \mu_X$
What would the probability distribution be for rolling a single die?
(this is an example of a uniform distribution)
What would the expected value be?
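One way to answer the die questions, assuming a fair die so that each face has probability 1/6:

```python
from fractions import Fraction

# Uniform distribution for one roll of a fair die: Pr(X_i) = 1/6 for X_i = 1..6
dist = {x: Fraction(1, 6) for x in range(1, 7)}

# The two requirements of a discrete probability distribution
assert all(0 <= p <= 1 for p in dist.values())
assert sum(dist.values()) == 1

# Expected value E(X) = sum_i X_i Pr(X_i)
mean = sum(x * p for x, p in dist.items())
print(mean, float(mean))  # 7/2 3.5
```

The expected value, 3.5, is not itself a face of the die; it is the long-run average of many rolls.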
Binomial Probability Function
• A Bernoulli random variable can have a value of one or zero. Pr(X = 1) = p, which can be viewed as the probability of success; Pr(X = 0) = 1 − p.
• A binomial distribution is derived from a series of independent Bernoulli trials. Let n be the number of trials and y be the number of successes.
  – Calculate the number of ways to obtain that result:
    $\binom{n}{y} = \dfrac{n!}{y!\,(n-y)!}$
  – Calculate the probability of that result:
    $\Pr(y) = \dfrac{n!}{y!\,(n-y)!}\, p^{y} (1-p)^{n-y}$
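The binomial probability function translates directly into Python. This sketch uses the standard-library `math.comb` (Python 3.8 or later) for the number of ways; `binom_pmf` is a hypothetical helper name.

```python
from math import comb

def binom_pmf(y, n, p):
    """Pr(y successes in n independent Bernoulli trials with success probability p)."""
    # comb(n, y) = n! / (y! (n - y)!) counts the ways to obtain y successes
    return comb(n, y) * p**y * (1 - p) ** (n - y)

# Example: 2 successes in 4 trials with p = 0.5
print(comb(4, 2), binom_pmf(2, 4, 0.5))  # 6 0.375
```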
Binomial Distribution
Average = np = 20 × 0.5 = 10
Variance = np(1 − p) = 20 × 0.5 × (1 − 0.5) = 5
[Figure: Binomial Distribution (n = 20, p = 0.5); probability function plotted against the number of successes, 0 to 20]
For a normal distribution, the variance is independent of the mean
For a binomial distribution, the variance changes with the mean
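A sketch that computes the mean and variance of the n = 20, p = 0.5 distribution directly from its probabilities, then shows how np(1 − p) changes as p, and therefore the mean np, changes:

```python
from math import comb

n, p = 20, 0.5
# Probability of y = 0, 1, ..., n successes
pmf = [comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(n + 1)]

# Mean and variance computed directly from the distribution
mean = sum(y * pr for y, pr in enumerate(pmf))
var = sum((y - mean) ** 2 * pr for y, pr in enumerate(pmf))
print(round(mean, 6), round(var, 6))  # should match np = 10 and np(1 - p) = 5

# The binomial variance np(1 - p) depends on p, so it shifts as the mean np shifts
for p_alt in (0.1, 0.3, 0.5):
    print(p_alt, n * p_alt, n * p_alt * (1 - p_alt))
```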
Binomial Example
• Patricia has developed some doubled haploids of barley from a cross between two purelines.
• She needs at least 3 plants that have a particular SNP marker found in one of the parents.
• She decides to grow 10 plants because that seems like plenty, and space in the greenhouse is limited.
• Assuming that all of her plants survive, what are the chances that she will meet her objective?
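One way to work the example, under a stated assumption: with doubled haploids from a cross between two purelines, each plant is taken to carry the marker allele with probability 0.5 (1:1 segregation, no segregation distortion). The number of marker plants out of 10 is then binomial, and the chance of at least 3 is 1 − Pr(0) − Pr(1) − Pr(2).

```python
from math import comb

def binom_pmf(y, n, p):
    return comb(n, y) * p**y * (1 - p) ** (n - y)

n = 10    # plants grown
p = 0.5   # ASSUMPTION: 1:1 segregation of the SNP marker among doubled haploids
need = 3  # minimum number of plants with the marker

# Pr(Y >= 3) = 1 - [Pr(0) + Pr(1) + Pr(2)]
pr_success = 1 - sum(binom_pmf(y, n, p) for y in range(need))
print(f"{pr_success:.4f}")  # 0.9453
```

Under that assumption the chance is 968/1024, or about 95%, leaving roughly a 5.5% chance of falling short.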
Mean and variance of linear functions
• Mean and variance of a constant (c)
  $E(c) = c$
  $\sigma^2_c = 0$
• Adding a constant (c) to a random variable Xi
  $E(X + c) = E(X) + c$ (the mean increases by the value of the constant)
  $\sigma^2_{X+c} = \sigma^2_X$ (the variance remains the same)
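These rules can be verified exactly with fractions. The sketch assumes a fair die as the random variable X and an arbitrary constant c = 10; `mean_var` is a hypothetical helper.

```python
from fractions import Fraction

def mean_var(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var

die = {x: Fraction(1, 6) for x in range(1, 7)}  # illustrative X
c = 10
shifted = {x + c: p for x, p in die.items()}    # distribution of X + c
constant = {c: Fraction(1)}                     # a constant puts all its probability on c

print(mean_var(die))       # (Fraction(7, 2), Fraction(35, 12))
print(mean_var(shifted))   # mean increases by c, variance unchanged
print(mean_var(constant))  # mean c, variance 0
```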
Mean and variance of linear functions
• Multiplying a random variable by a constant
  $E(cX) = c\,E(X)$ (multiply the mean by the constant)
  $\sigma^2_{cX} = c^2 \sigma^2_X$ (multiply the variance by the square of the constant)
• Adding two random variables X and Y
  $E(X + Y) = E(X) + E(Y)$ (the mean of the sum is the sum of the means)
  $\sigma^2_{X+Y} = \mathrm{VAR}(X) + \mathrm{VAR}(Y) + 2\,\mathrm{COV}_{XY}$
  The variance of the sum equals the sum of the variances if the variables are independent, because then $\mathrm{COV}_{XY} = 0$.
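A sketch checking the multiplication and sum rules on a small hypothetical joint distribution of X and Y (the probabilities are made up for illustration). Because X and Y are positively correlated here, Var(X + Y) = 19/25 exceeds Var(X) + Var(Y) = 12/25.

```python
from fractions import Fraction as F

# Hypothetical joint distribution of two 0/1 random variables X and Y: {(x, y): probability}
joint = {(0, 0): F(3, 10), (0, 1): F(1, 10), (1, 0): F(1, 10), (1, 1): F(1, 2)}

def E(f):
    """Expected value of f(x, y) under the joint distribution."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

mx, my = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - mx) ** 2)
var_y = E(lambda x, y: (y - my) ** 2)
cov_xy = E(lambda x, y: (x - mx) * (y - my))

c = 3
# Multiplying by a constant: E(cX) = c E(X) and Var(cX) = c^2 Var(X)
mean_cx = E(lambda x, y: c * x)
var_cx = E(lambda x, y: (c * x - mean_cx) ** 2)
print(mean_cx == c * mx, var_cx == c**2 * var_x)  # True True

# Adding two variables: E(X + Y) = E(X) + E(Y), Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
mean_sum = E(lambda x, y: x + y)
var_sum = E(lambda x, y: (x + y - mean_sum) ** 2)
print(mean_sum == mx + my, var_sum == var_x + var_y + 2 * cov_xy)  # True True
print(cov_xy, var_sum, var_x + var_y)  # 7/50 19/25 12/25
```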
Variance - definition
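• The variance of variable X
  $V(X) = E\left[(X_i - \mu_X)^2\right] = E(X_i^2) - \mu_X^2$
• Usual formula
  $\sigma^2_X = \dfrac{\sum (X_i - \bar{X})^2}{n} = \dfrac{\sum X_i^2 - \dfrac{\left(\sum X_i\right)^2}{n}}{n}$
• Formula for frequency data (weighted), where fi is the relative frequency of Xi and the fi sum to 1
  $V(X) = \sum f_i X_i^2 - \mu_X^2$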
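The definitional, computational and weighted formulas should agree. This sketch uses a hypothetical data set and divides by n, as the slide does, rather than by n − 1.

```python
def var_definitional(xs):
    """sigma^2 = sum (X_i - mean)^2 / n."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n

def var_computational(xs):
    """sigma^2 = [sum X_i^2 - (sum X_i)^2 / n] / n."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / n

def var_weighted(values, freqs):
    """V(X) = sum f_i X_i^2 - mu^2, with relative frequencies f_i summing to 1."""
    mu = sum(f * x for x, f in zip(values, freqs))
    return sum(f * x * x for x, f in zip(values, freqs)) - mu**2

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
print(var_definitional(data), var_computational(data))  # 4.0 4.0

# The same data summarized as distinct values with relative frequencies
values = [2, 4, 5, 7, 9]
freqs = [1 / 8, 3 / 8, 2 / 8, 1 / 8, 1 / 8]
print(var_weighted(values, freqs))  # 4.0
```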
Covariance - definition
• The covariance of variable X and variable Y
  $\mathrm{Cov}(X, Y) = E\left[(X - \mu_X)(Y - \mu_Y)\right] = E(XY) - \mu_X \mu_Y$
• Usual formula
  $\sigma_{XY} = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n} = \dfrac{\sum X_i Y_i - \dfrac{\sum X_i \sum Y_i}{n}}{n}$
• Formula for frequency data (weighted)
  $\mathrm{Cov}(X, Y) = \sigma_{XY} = \sum f_i X_i Y_i - \mu_X \mu_Y$
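The corresponding covariance calculations, on hypothetical paired data; the three function names are illustrative.

```python
def cov_definitional(xs, ys):
    """sigma_XY = sum (X_i - mean_X)(Y_i - mean_Y) / n."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def cov_computational(xs, ys):
    """sigma_XY = [sum X_i Y_i - (sum X_i)(sum Y_i) / n] / n."""
    n = len(xs)
    return (sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n) / n

def cov_weighted(pairs, freqs):
    """Cov(X, Y) = sum f_i X_i Y_i - mu_X mu_Y, with relative frequencies f_i summing to 1."""
    mx = sum(f * x for (x, _), f in zip(pairs, freqs))
    my = sum(f * y for (_, y), f in zip(pairs, freqs))
    return sum(f * x * y for (x, y), f in zip(pairs, freqs)) - mx * my

xs = [1, 2, 3, 4, 5]  # hypothetical paired observations
ys = [2, 4, 5, 4, 5]
print(cov_definitional(xs, ys), cov_computational(xs, ys))

# Each observed pair occurs once, so each has relative frequency 1/5
print(round(cov_weighted(list(zip(xs, ys)), [1 / 5] * 5), 10))
```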
Linear Regression and Correlation
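• Regression coefficient of Y on X:
  $b = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \dfrac{SCP_{X,Y}}{SS_X} = \dfrac{\sigma_{X,Y}}{\sigma^2_X}$
• Sum of squares due to regression:
  $SS_{Reg} = \dfrac{\left[\sum (X_i - \bar{X})(Y_i - \bar{Y})\right]^2}{\sum (X_i - \bar{X})^2} = \dfrac{SCP^2_{X,Y}}{SS_X}$
• Correlation coefficient:
  $r_{X,Y} = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}} = \dfrac{SCP_{X,Y}}{\sqrt{SS_X SS_Y}} = \dfrac{\sigma_{X,Y}}{\sqrt{\sigma^2_X \sigma^2_Y}}$
  $r^2_{X,Y} = \dfrac{SCP^2_{X,Y}}{SS_X SS_Y} = \dfrac{\sigma^2_{X,Y}}{\sigma^2_X \sigma^2_Y}$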