1MA01: Probability Sinéad Ryan November 29, 2013 TCD

Binomial theory
If a probability problem asks for the number of
successful outcomes when a trial is repeated n times, it
is described by a Binomial (n, p) distribution.
Definition
A binomial experiment is one in which:
a trial is repeated n times;
trials are independent;
the outcome of each trial is either success or failure;
each trial has the same probability of success, p, and probability of failure, q = 1 − p;
the goal of each experiment is to count the number of successes in n trials.
Example: roll a die 3 times and ask how many sixes are rolled. This is a binomial experiment since:
the rolls are independent;
each roll is either a success (i.e. a 6) or a failure (i.e. not a 6);
the probability of success in each roll is p = 1/6 and of failure is q = 5/6;
n = 3;
we are counting how many successes (sixes) occur.
Random variables
A random variable X is a variable whose value depends on
the outcome of a random experiment. E.g. let X be the
number of sixes in 3 rolls of a die. This can change with
different rolls.
A discrete random variable may only assume a finite (or
countably infinite) number of values. E.g. in 3 rolls,
X = 0, 1, 2 or 3, i.e. a finite number of values.
The possible values of X and their probabilities give the
distribution of X.
Example:
Roll a die once. If it lands on 6 you get $4, otherwise
you lose $1. Let M be the money you win. Find the
distribution of M.
Now, P(M = 4) = 1/6 and P(M = −1) = 5/6, and plotting this
distribution gives
[Figure: bar chart of P(M) against M, with bars at M = −1 and M = 4.]
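As a small check of this distribution, the sketch below enumerates the six faces of a fair die and accumulates the probability of each payoff. The function name `payoff` and the use of exact fractions are choices made here for illustration, not something taken from the notes.

```python
from fractions import Fraction

def payoff(face):
    """Payoff rule from the example: a 6 wins $4, any other face loses $1."""
    return 4 if face == 6 else -1

# Each of the six faces of a fair die has probability 1/6.
distribution = {}
for face in range(1, 7):
    m = payoff(face)
    distribution[m] = distribution.get(m, Fraction(0)) + Fraction(1, 6)

for m, prob in sorted(distribution.items()):
    print(f"P(M = {m}) = {prob}")
```

The probabilities of all possible values of M add up to 1, as they must for any distribution.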
Binomial (n,p) and Binomial coefficients
Theorem
Let X be the number of successes in a binomial experiment with n trials and probability of success, p. Then
$$P(X = k) = \binom{n}{k} p^k q^{n-k}, \qquad k = 0, 1, \dots, n,$$
with q = 1 − p, and
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!},$$
where n! = 1 × 2 × 3 × ... × n for n ≥ 1, and 0! = 1.
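The formula in the theorem can be sketched in a few lines of Python. The function name `binomial_pmf` is chosen here for illustration, and the die example (n = 3, p = 1/6) is the one used earlier in these notes. Exact fractions are used so that the results can be compared by hand.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) * p^k * q^(n - k), with q = 1 - p."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

# Number of sixes in 3 rolls of a fair die: n = 3, p = 1/6.
n, p = 3, Fraction(1, 6)
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(f"P(X = {k}) = {prob}")
print("total =", sum(pmf))
```

The probabilities over k = 0, 1, ..., n sum to 1, which is a useful sanity check on any hand calculation.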
Calculating binomial coefficients
$$\binom{6}{0} = \frac{6!}{0!\,(6-0)!} = 1$$
$$\binom{14}{8} = \frac{14!}{8!\,(14-8)!} = 3003.$$
See tutorial sheets for more examples.
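To check coefficients like these by machine, a minimal sketch is to evaluate the factorial definition directly and compare it with Python's built-in `math.comb`. The helper name `binomial_coefficient` is an illustrative choice.

```python
from math import comb, factorial

def binomial_coefficient(n, k):
    """n choose k, computed directly from n! / (k! (n - k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))

for n, k in [(6, 0), (14, 8)]:
    value = binomial_coefficient(n, k)
    # math.comb is the standard-library equivalent; the two should agree.
    assert value == comb(n, k)
    print(f"C({n}, {k}) = {value}")
```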
Calculating probabilities in binomial experiments
Example 1:
Consider a multiple choice test. Each question has 5
possible answers of which only 1 is correct. A student
guesses on 6 questions. What is the probability that 2
of those guesses are correct?
You can verify this is a binomial experiment with n = 6,
p = 1/5 = 0.2 and q = 4/5 = 0.8. Then,
$$P(X = 2) = \binom{6}{2} p^2 q^{6-2} = \binom{6}{2} (0.2)^2 (0.8)^4 = 15(0.04)(0.4096) = 0.24576,$$
i.e. there is a ≈ 25% probability of guessing exactly 2 correctly.
Example 2:
Consider a box with 5 tickets inside, each labelled with
a 1 or a 0 as follows:
0 0 0 0 1
6 tickets are drawn with replacement (i.e. a ticket is
drawn, the number recorded and the ticket replaced before
drawing the next). What is the probability that the sum
of the tickets drawn is two?
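You can verify this is a binomial experiment with n = 6 and
p = 1/5 (drawing a 1 counts as a success, so the sum of the
tickets equals the number of successes), so then
$$P(X = 2) = \binom{6}{2} (0.2)^2 (0.8)^4 = 0.24576.$$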
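The box model lends itself to a quick simulation. The sketch below draws 6 tickets with replacement many times and compares the observed fraction of experiments with sum 2 against the exact binomial value. The number of experiments and the fixed seed are arbitrary choices made here for repeatability.

```python
import random
from math import comb

box = [0, 0, 0, 0, 1]          # the five tickets
n_draws, target_sum = 6, 2

def simulate(n_experiments, seed=0):
    """Estimate P(sum of 6 draws = 2) by repeated sampling with replacement."""
    rng = random.Random(seed)  # fixed seed, an arbitrary choice
    hits = 0
    for _ in range(n_experiments):
        draws = rng.choices(box, k=n_draws)  # choices() samples with replacement
        if sum(draws) == target_sum:
            hits += 1
    return hits / n_experiments

exact = comb(6, 2) * 0.2**2 * 0.8**4
estimate = simulate(100_000)
print(f"exact    = {exact:.5f}")
print(f"estimate = {estimate:.5f}")
```

The estimate should lie close to the exact value, with the gap shrinking as the number of simulated experiments grows.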
Note: consider the random sampling of a population
(this is akin to drawing from a box without replacement,
since once a person has been sampled they are not
asked again in the same poll).
In principle this means that the probability of success is
not constant, since one person is removed from the
population at each trial. However, the binomial model still
works as an approximation if the population is large enough
that removing one person does not change the probability by
very much.
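One way to see how good this approximation is: compare the binomial probability with the exact probability for drawing without replacement. The exact calculation below uses the hypergeometric formula, which is not part of these notes and is brought in only for the comparison. The population sizes are illustrative assumptions, with one fifth of each population counted as "successes" to match p = 1/5.

```python
from math import comb

def binomial_prob(k, n, p):
    """P(k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def without_replacement_prob(k, n, population, successes):
    """P(k successes in n draws without replacement), hypergeometric formula."""
    return (comb(successes, k) * comb(population - successes, n - k)
            / comb(population, n))

n, k, p = 6, 2, 0.2
target = binomial_prob(k, n, p)
gaps = []
for population in (50, 500, 5000, 50000):
    successes = population // 5   # keeps the success fraction at 1/5
    exact = without_replacement_prob(k, n, population, successes)
    gaps.append(abs(exact - target))
    print(f"N = {population:>6}: without replacement {exact:.5f}, binomial {target:.5f}")
```

The difference between the two probabilities shrinks as the population grows, which is the sense in which the binomial model "still works" for large populations.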
Application in genetics
Consider 2 heterozygous pea plants, e.g. with alleles (Ff),
which are crossed. Find the probability that the offspring
will also be heterozygous.
Let X be the number of F alleles in the offspring. The
probability that each gamete is F is 1/2. Two gametes are
selected from the parents independently, so X has a
binomial (n, p) = (2, 1/2) distribution and then
$$P(X = 1) = \binom{2}{1} \left(\frac{1}{2}\right)^1 \left(\frac{1}{2}\right)^1 = \frac{1}{2}.$$
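The same answer can be reached by listing the four equally likely gamete pairs, as in a Punnett square. This is a minimal sketch; the variable names are illustrative, and it assumes each parent passes on either of its two alleles with probability 1/2, as stated above.

```python
from fractions import Fraction
from itertools import product

parent_1 = "Ff"
parent_2 = "Ff"

# One allele from each parent; the four (gamete, gamete) pairs are equally likely.
offspring = [a + b for a, b in product(parent_1, parent_2)]

# X = number of F alleles in the offspring; heterozygous means X = 1.
heterozygous = [g for g in offspring if g.count("F") == 1]
prob_heterozygous = Fraction(len(heterozygous), len(offspring))

print(offspring)
print("P(X = 1) =", prob_heterozygous)
```

Two of the four combinations (Ff and fF) are heterozygous, which matches the binomial coefficient of 2 in the formula.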
Note that here allele means a variant form of a gene, so
e.g. blood group has genotypes: AA, AO, BB, BO, OO, AB;
phenotypes: A, B, O, AB.
Hardy-Weinberg Law
Suppose a trait has 2 alleles A and a and the frequency
of A in sperm and egg gametes in a population is p so
the frequency for a is 1 − p = q.
Now, consider a sperm and egg randomly selected and
crossed. Find the probability of having 0 A, 1 A and 2 A.
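Let X be the number of A alleles in the offspring, so that X has a binomial (2, p) distribution:
$$P(0\,A) = P(aa) = P(X = 0) = \binom{2}{0} p^0 q^{2-0} = q^2$$
$$P(1\,A) = P(Aa) = P(X = 1) = \binom{2}{1} p^1 q^{2-1} = 2pq$$
$$P(2\,A) = P(AA) = P(X = 2) = \binom{2}{2} p^2 q^{2-2} = p^2$$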
Then, e.g., if A is dominant, the proportion of the
population showing the dominant phenotype is
$$p^2 + 2pq.$$
A population with proportions for trait
expression/allele distribution as
$$p^2 : 2pq : q^2$$
is in equilibrium.
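The notes state the equilibrium property without showing why. One way to see it, sketched below under the additional assumptions of random mating and no selection (not stated in the notes), is to check that offspring with these genotype proportions produce gametes with the same allele frequency p as before. The function names and the test value p = 3/10 are illustrative.

```python
from fractions import Fraction

def genotype_frequencies(p):
    """Return (P(AA), P(Aa), P(aa)) = (p^2, 2pq, q^2) for allele frequency p."""
    q = 1 - p
    return p * p, 2 * p * q, q * q

def next_allele_frequency(p):
    """Frequency of A among gametes produced by the offspring generation."""
    freq_AA, freq_Aa, _ = genotype_frequencies(p)
    # AA individuals always pass on A; Aa individuals pass on A half the time.
    return freq_AA + freq_Aa / 2

p = Fraction(3, 10)   # an arbitrary illustrative value
print(genotype_frequencies(p))
print(next_allele_frequency(p) == p)
```

Algebraically this is p^2 + pq = p(p + q) = p, so the allele frequency, and hence the genotype proportions, do not change from one generation to the next.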
This law can be generalised for 3 alleles etc.
Example: albinism, a recessively inherited trait with a
rate of approx. 1/10,000.
Assuming a population follows the Hardy-Weinberg
distribution, find the proportion of heterozygotes and
homozygotes.
P(aa) = q^2 = 1/10,000, so q = 0.01. Therefore
p = 1 − q = 0.99.
The probability P(AA) = p^2 = (0.99)^2 = 0.9801 gives
the proportion of homozygous dominant individuals.
The probability
P(Aa) = 2pq = 2(0.99)(0.01) = 0.0198 is the
proportion of heterozygotes (carriers). This implies
the allele for albinism is present but not expressed
in ∼ 2% of the population.
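The arithmetic of this example can be reproduced in a few lines, working backwards from the observed rate of the recessive phenotype to the allele frequencies. The variable names are illustrative choices.

```python
from math import isclose, sqrt

rate_aa = 1 / 10_000          # observed frequency of the recessive phenotype

q = sqrt(rate_aa)             # since P(aa) = q^2
p = 1 - q

homozygous_dominant = p**2    # P(AA)
carriers = 2 * p * q          # P(Aa)

print(f"q = {q:.2f}, p = {p:.2f}")
print(f"P(AA) = {homozygous_dominant:.4f}")
print(f"P(Aa) = {carriers:.4f}")

# The three genotype proportions should account for the whole population.
assert isclose(homozygous_dominant + carriers + rate_aa, 1.0)
```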
Histograms
If X has a binomial (n, p) = (6, 1/ 2) distribution then
P(X = 0) = 1/64
P(X = 1) = 3/32
P(X = 2) = 15/64
P(X = 3) = 5/16
P(X = 4) = 15/64
P(X = 5) = 3/32
P(X = 6) = 1/64
[Figure: histogram of P(X = k) against X for X = 0, 1, ..., 6.]
The value of X with the tallest rectangle is called the mode, i.e.
the most likely value (here X = 3).
The area of a rectangle = height × width =
P(X = k) × 1 = P(X = k).
Since $\sum_k P(X = k) = 1$, the area of the histogram is 1.
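These histogram properties can be checked numerically for the binomial (6, 1/2) case. The sketch below computes the bar heights exactly, prints a crude text histogram (one `#` per 1/64 of probability, a display choice made here), and then finds the mode and the total area.

```python
from fractions import Fraction
from math import comb

n, p = 6, Fraction(1, 2)
heights = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

# Crude text histogram: one '#' per 1/64 of probability.
for k, h in heights.items():
    print(f"{k}: {'#' * int(h * 64)} {h}")

mode = max(heights, key=heights.get)                 # k with the tallest bar
total_area = sum(h * 1 for h in heights.values())    # each bar has width 1

print("mode =", mode)
print("total area =", total_area)
```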