Bio/statistics Handout 9: Common probability functions

There are various standard probability functions that serve as models to compare
against when trying to decide whether a given phenomenon is ‘unexpected’ or not.
a) What does ‘random’ really mean? Consider a bacterial cell that moves one cell
length, either to the right or left, in each unit of time. If we start some large number of
these cells at x = 0 and wait t units of time, we can determine a function, pt(x), which is
the fraction of bacteria that end up at position x ∈ {0, ±1, ±2, …} at time t. We can now
ask: What do we expect the function x → pt(x) to look like?
Suppose that we think that the bacterium is moving ‘randomly’. Two questions
then arise:
How do we translate our intuitive notion of the English term ‘random’ into a
prediction for pt(x)?
Granted we have a prediction for each t and x, how far must pt(x) be from its
predicted value before we must accept that the bacterium is not moving
randomly?
(9.1)
These questions go straight to the heart of what is called the ‘scientific method’. We
made a hypothesis: ‘The bacterium moves left or right at random.’ We want to first
generate some testable predictions from the hypothesis (the first point in (9.1)), and then
compare these predictions with experiment. The second point in (9.1) asks for criteria to
use in evaluating whether the experiment confirms or rejects our hypothesis.
The first question in (9.1) is the province of ‘probability theory’ and the second
the province of ‘statistics’. This handout addresses the first question in (9.1), while
aspects of the second are addressed in subsequent handouts.
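Before developing any theory, the first question in (9.1) can be probed numerically. Here is a minimal simulation sketch in Python (the function name walk_positions and all parameter values are my own choices, not part of the handout): it runs many independent walkers for t steps and tabulates the fraction that end at each position, an empirical version of pt(x).

```python
import random
from collections import Counter

def walk_positions(num_cells=20000, t=10, seed=1):
    """Simulate num_cells independent cells, each taking t steps of +1 or -1
    with equal probability from x = 0; return the fraction ending at each x."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_cells):
        counts[sum(rng.choice((-1, 1)) for _ in range(t))] += 1
    return {x: c / num_cells for x, c in sorted(counts.items())}

p10 = walk_positions()   # empirical p_t(x) for t = 10
```

Comparing such an empirical table against a theoretical prediction is exactly the program laid out in (9.1).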
b) A mathematical translation of the term ‘random’
To say that an element is chosen from a given set at ‘random’ is traditionally
given the following mathematical definition:
Probabilities are defined using the probability function that assigns all elements the same
probability. Thus, if the set has some N elements, then the probability of any given
element appearing is 1/N.
(9.2)
The probability function that assigns this constant value to all elements is called the
uniform probability function.
Here is an archetypal example: A coin has probability 1/2 of landing heads up
and so probability 1/2 of landing tails up. The coin is flipped T times. Our sample space
is the set S that consists of the N = 2^T possible sequences (±1, ±1, …, ±1), where +1 is in
the k’th spot when the k’th flip landed heads up, while −1 sits in this slot when the k’th
flip landed tails up.
Note that after setting T = t, this same sample space describes all of the
possibilities for the moves of our bacterium from Section a), above.
You might think that the uniform probability distribution is frightfully dull—after
all, how much can you say about a constant?
c) Some standard counting solutions
The uniform probability distribution becomes interesting when you consider the
probabilities for certain subsets, or the probabilities for the values of certain random
variables. To motivate our interest in the uniform probability distribution, consider first
its appearance in the case of our bacterium. Here, our model of random behavior takes S as just
described. Now define f so that f(ε1, …, εN) = ε1 + ε2 + ··· + εN and ask for the probabilities
of the possible values of f. With regards to the bacterium, f tells us its position
after N steps in the model where it moves right or left with equal probability at
each step. The probabilities for the possible values of f provide the theoretical
predictions for the measured function pt(x).
In general, if S has N elements and a subset K ⊂ S has k elements, then the
uniform probability distribution gives probability k/N to the set K. Even so, it may be
some task to count the elements in any given set. Moreover, the degree of difficulty may
depend on the manner in which the set is described. There are, however, some standard
counting formulas available to facilitate things.
For example, let S denote the sample space with 2^N elements as described in
Section a) above. For n ∈ {0, 1, …, N}, let Kn denote the subset of elements in S with n
occurrences of +1, thus with N−n occurrences of −1. Note that Kn has an alternate
description, this as the set of elements on which f(ε1, …, εN) = ε1 + ε2 + ··· + εN has value
2n−N.
In any event, here is a basic fact:

The set Kn has N!/(n!(N−n)!) members.
(9.3)
In this regard, remember that k! is defined for any positive integer k as k(k−1)(k−2)···1.
Also, 0! is defined to be equal to 1. For those who don’t like to take facts without proof, I
explain in the last section below how to derive (9.3) and also the formulae that follow.
By the way, N!/(n!(N−n)!) arises often enough in counting problems to warrant its own
symbol, the binomial coefficient

 ( N )
 ( n ) .

(9.4)
Here is another standard counting formula: Let b ≥ 1 be given, and let S denote
the set of b^N elements of the form (η1, …, ηN) where each ηk is now in {1, …, b}. If N >
b, then there are no elements in S in which no two entries are alike. If N ≤ b, then such
elements can be present. Fix b and let Eb denote the subset of those N-tuples (η1, …, ηN)
where ηk ≠ ηk′ when k ≠ k′.

The set Eb has b!/(b−N)! members.
(9.5)
The case b = N in (9.5) provides the following:

There are N! ways to order a set of N distinct elements.
(9.6)

Here, a set of elements is ‘ordered’ simply by listing them one after the other. For
example, the set that consists of 1 apple and 1 orange has two orderings, (apple, orange)
and (orange, apple).
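These counting facts are easy to spot-check by brute-force enumeration on small sets. A sketch (the variable names are mine; math.comb and itertools are from Python’s standard library):

```python
import math
from itertools import permutations, product

# (9.3): N-tuples of +/-1 with exactly n entries equal to +1
N, n = 6, 2
by_count = sum(1 for s in product((+1, -1), repeat=N) if s.count(+1) == n)
assert by_count == math.factorial(N) // (math.factorial(n) * math.factorial(N - n))
assert by_count == math.comb(N, n)   # the symbol defined in (9.4)

# (9.5): M-tuples drawn from {1, ..., b} with no two entries alike
b, M = 4, 3
no_repeats = sum(1 for s in product(range(1, b + 1), repeat=M) if len(set(s)) == M)
assert no_repeats == math.factorial(b) // math.factorial(b - M)

# (9.6): N! orderings of N distinct elements
assert len(list(permutations(("apple", "orange")))) == 2
```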
d) Some standard probability functions
The counting formulae just presented can be used to derive some probability
functions that you will almost surely see again and again in your scientific career.
The Equal Probability Binomial: Let S denote the sample space with the 2^N
elements of the form (ε1, …, εN) where each εk can be +1 or −1. For any given integer n
∈ {0, 1, …, N}, let Kn denote the event that there are precisely n occurrences of +1 in
the N-tuple (ε1, …, εN). Then the uniform probability function assigns Kn the
probability

P(n) = 2^−N N!/(n!(N−n)!) .
(9.7)

The fact that Kn has probability P(n) follows from (9.3).
The assignment of P(n) to an integer n defines a probability function on the N+1
element set {0, 1, …, N}, this a probability function that is decidedly not uniform. We
will investigate some of its properties below.
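As a quick check, the probabilities in (9.7) can be computed directly and seen to sum to 1; a sketch (the function name P is mine):

```python
from math import comb

def P(n, N):
    """Equal-probability binomial (9.7): chance of exactly n heads in N fair flips."""
    return comb(N, n) / 2**N

N = 10
assert abs(sum(P(n, N) for n in range(N + 1)) - 1) < 1e-12
assert abs(P(5, 10) - 252 / 1024) < 1e-12   # comb(10, 5) = 252
```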
Equation (9.7) can be used to answer the first question in (9.1) with regards to our
random bacterium in Section a). To see how this comes about, let S denote the sample
space with the 2^N elements of the form (±1, ±1, …, ±1). Let f again denote the random
variable that assigns ε1 + ··· + εN to any given (ε1, …, εN). Then, the event f = 2n−N is
exactly our set Kn. This understood, P(f = x) = 0 unless both N and x are even, or
both are odd, in which case

P(f = x) = 2^−N N!/( ((N+x)/2)! ((N−x)/2)! ) .
(9.8)

If we believe that the bacterium chooses left and right with equal probability, then we
should be comparing our experimentally determined pt(x) with the N = t version of (9.8).
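The N = t version of (9.8) is what a measured or simulated pt(x) should be compared against. A sketch of the prediction (the name p_predicted is mine):

```python
from math import comb

def p_predicted(x, t):
    """Prediction (9.8): probability that the walker sits at x after t steps.
    Zero unless x and t have the same parity, since f = 2n - t."""
    if abs(x) > t or (t + x) % 2 != 0:
        return 0.0
    return comb(t, (t + x) // 2) / 2**t   # n = (t + x)/2 occurrences of +1

assert p_predicted(1, 4) == 0.0                    # wrong parity
assert abs(p_predicted(0, 4) - 6 / 16) < 1e-12     # comb(4, 2)/2^4
```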
The Binomial Probability Function: The probability function P in (9.7) is an
example of what is called the binomial probability function on the set {0, …, N}. The
‘generic’ version of the binomial probability distribution requires the choice of a number,
q ∈ [0, 1]. With q chosen, the q-version assigns the following probability to
an integer n:

Pq(n) = N!/(n!(N−n)!) q^n (1−q)^(N−n) .
(9.9)
The latter can also be seen as stemming ultimately from a uniform distribution. In
particular, when q is a rational number, say a/b where a ∈ {0, …, b}, this comes about in
the following manner: Take the b^N element sample space of N-tuples (η1, …, ηN) where
each ηk can be any integer from the set {1, …, b}. Give this sample space the uniform
distribution. Thus, each of its elements has probability b^−N. Now, define a function, ga,
on this sample space that assigns to any given such N-tuple the number of its entries ηk
that obey ηk ≤ a. For example, g1 counts the number of entries that equal 1, while g2
counts the number that are either 1 or 2. As is explained next, Pa/b(n) is the probability
as decreed by the uniform probability function that the random variable ga takes value n.
To see how the q = a/b version of (9.9) comes about, use Sb to denote the just
described sample space with b^N elements. Let S denote our old sample space with the 2^N
N-tuples of the form (±1, …, ±1). Now, define a map, Fa, from Sb to S as follows: Take
Fa(η1, …, ηN) = (ε1, …, εN) where each εk is set equal to +1 when ηk ≤ a. Otherwise,
εk = −1. This understood, the random variable ga has value n on any given element if
and only if Fa maps the element to the set Kn. Now, there are precisely a^n (b−a)^(N−n) points in
Sb that map to any given element in Kn. This is because there are a possibilities for each
ηk where the corresponding εk is +1, and (b−a) possibilities for each ηk where the
corresponding εk is −1. Since there are a^n (b−a)^(N−n) elements in Sb that map to any given
element in Kn and N!/(n!(N−n)!) elements in Kn, the set of points in Sb where ga = n must have
probability

b^−N N!/(n!(N−n)!) a^n (b−a)^(N−n) .
(9.10)

To obtain the q = a/b version of (9.9), take the factor b^−N and write it as b^−n b^−(N−n), and then
write b^−n a^n as q^n while writing b^−(N−n) (b−a)^(N−n) as (1−q)^(N−n).
Even in the case that q is not rational, the probability function in (9.9) arises from
a related problem on a sample space with 2^N elements. Take the sample space to be our
friend S whose elements are the N-tuples of the form (±1, …, ±1). However, now take
not the uniform probability, but the probability function whereby +1 occurs in any given
entry with probability q and −1 occurs with probability (1−q). Define the random
variable, g, to assign to any given element the number of appearances of +1. Then
(9.9) gives the probability that g = n. To see why, note that the event that g = n is just
our set Kn. With this new probability, each element in Kn has probability q^n (1−q)^(N−n). As
(9.3) gives the number of elements in Kn, its probability is therefore given by (9.9).
The probability function in (9.9) is relevant to our bacterial walking scenario
when we make the hypothesis that the bacterium moves to the right at any given step with
probability q, thus to the left with probability 1−q. I’ll elaborate on this in a subsequent
handout.
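The counting argument behind the q = a/b case can be verified by enumerating the whole sample space Sb for small b and N; a sketch (the names are mine):

```python
from itertools import product
from math import comb

def P_q(n, N, q):
    """The binomial probability (9.9)."""
    return comb(N, n) * q**n * (1 - q)**(N - n)

# Enumerate all b^N tuples with the uniform distribution and count the
# entries that are <= a (the random variable g_a from the text).
a, b, N = 1, 3, 4   # so q = a/b = 1/3
for n in range(N + 1):
    hits = sum(1 for s in product(range(1, b + 1), repeat=N)
               if sum(1 for eta in s if eta <= a) == n)
    assert abs(hits / b**N - P_q(n, N, a / b)) < 1e-12
```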
The Poisson probability function: This is a probability function on the sample
space {0, 1, …}, the non-negative integers. As you can see, this sample space has
an infinite number of elements. Even so, I trust that you find it reasonable that we define
a probability function on this sample space to be a function, f, with f(n) ∈ (0, 1) for each n, and such that

∑n=0,1,… f(n) = f(0) + f(1) + ···
(9.11)

is a convergent series with limit equal to 1. As with the binomial probability function,
there is a whole family of Poisson functions, one for each choice of a positive real
number. Let λ > 0 denote the given choice. The λ version of the Poisson function assigns
to any given non-negative integer n the probability

Pλ(n) = (1/n!) λ^n e^−λ .
(9.12)
You can see that ∑n=0,1,… Pλ(n) = 1 if you know about power series expansions of
the exponential function. In particular, the function λ → e^λ has the power series
expansion

e^λ = 1 + λ + (1/2) λ² + (1/6) λ³ + ··· + (1/n!) λ^n + ··· .
(9.13)

Granted (9.13), then ∑n=0,1,… (1/n!) λ^n e^−λ = (∑n=0,1,… (1/n!) λ^n) e^−λ = e^λ e^−λ = 1.
The Poisson probability enters when trying to decide if an observed pattern is or is
not random. For example, suppose that on average, some number, λ, of newborns in the
United States exhibit a certain birth defect each year. Suppose that some number, n, of such
births are observed in 2004. Does this constitute an unexpected clustering that should be
investigated? If the defects are unrelated and if the causative agent is similar in all cases
over the years, then the probability of n occurrences in a given year should be very close
to the value of the λ version of the Poisson function, Pλ(n).
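In code, the Poisson function and this sort of clustering question look as follows (the numbers λ = 4 and the threshold of 9 cases are my own illustrative choices, not data from the handout):

```python
from math import exp, factorial

def poisson(n, lam):
    """Poisson probability (9.12): lam^n e^(-lam) / n!."""
    return lam**n * exp(-lam) / factorial(n)

# Suppose lam = 4 such births occur per year on average. How unlikely
# are 9 or more in a single year, if the Poisson model holds?
lam = 4
tail = 1 - sum(poisson(n, lam) for n in range(9))   # P(n >= 9)
assert abs(sum(poisson(n, lam) for n in range(100)) - 1) < 1e-12
```

A small tail probability would suggest that the observed clustering deserves a closer look.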
The Poisson function is an N → ∞ limit of the binomial function. To be more
precise, the λ version of the Poisson probability Pλ(n) is the N → ∞ limit of the versions
of (9.9) with q set equal to 1 − e^(−λ/N). This is to say that

(1/n!) λ^n e^−λ = limN→∞ N!/(n!(N−n)!) (1 − e^(−λ/N))^n e^(−(λ/N)(N−n)) .
(9.14)

The proof that (9.14) holds takes us in directions that we don’t have time for here. Let
me just say that it uses the approximation to the factorial known as Stirling’s formula:

ln(k!) ≈ k ln(k) – k ,
(9.15)

with an error of size on the order of ln(k).
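The limit in (9.14) can be checked numerically without Stirling’s formula; a sketch (the names are mine):

```python
from math import comb, exp, factorial

lam, n = 2.0, 3
poisson_value = lam**n * exp(-lam) / factorial(n)   # left side of (9.14)

def binom_value(N):
    q = 1 - exp(-lam / N)           # the choice of q used in (9.14)
    return comb(N, n) * q**n * (1 - q)**(N - n)

err_small_N = abs(binom_value(10) - poisson_value)
err_big_N = abs(binom_value(10000) - poisson_value)
assert err_big_N < err_small_N      # the binomial value approaches the Poisson value
assert err_big_N < 1e-3
```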
I give some further examples of how the Poisson function is used in a separate
handout.
e) Means and standard deviations
Let me remind you that if P is a probability function on some subset S of the set
of integers, then its mean is

μ ≡ ∑n∈S n P(n)
(9.16)

and the square of its standard deviation, σ, is

σ² = ∑n∈S (n−μ)² P(n) .
(9.17)

In this regard, keep in mind that when S has an infinite number of elements, then μ and σ
are defined only when the corresponding sums on the right sides of (9.16) and (9.17) are
those of convergent series. The mean and standard deviation characterize any given
probability function to some extent. More to the point, both the mean and standard
deviation are often used in applications of probability and statistics.
The mean and standard deviation for the binomial probability on {0, 1, …, N}
are:

μ = Nq and σ² = N q (1−q) .
(9.18)

For the Poisson probability function on {0, 1, …}, they are

μ = λ and σ² = λ .
(9.19)
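Both sets of formulas can be confirmed by computing (9.16) and (9.17) as direct sums; a sketch (the Poisson sum is truncated at n = 80, which is harmless for λ = 2.5):

```python
from math import comb, exp, factorial

# binomial: mean Nq and variance Nq(1-q), as in (9.18)
N, q = 12, 0.3
pmf = [comb(N, n) * q**n * (1 - q)**(N - n) for n in range(N + 1)]
mu = sum(n * p for n, p in enumerate(pmf))                # (9.16)
var = sum((n - mu)**2 * p for n, p in enumerate(pmf))     # (9.17)
assert abs(mu - N * q) < 1e-9 and abs(var - N * q * (1 - q)) < 1e-9

# Poisson: mean and variance both equal to lam, as in (9.19)
lam = 2.5
pmf = [lam**n * exp(-lam) / factorial(n) for n in range(80)]
mu = sum(n * p for n, p in enumerate(pmf))
var = sum((n - mu)**2 * p for n, p in enumerate(pmf))
assert abs(mu - lam) < 1e-9 and abs(var - lam) < 1e-9
```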
I describe a slick method for computing the relevant sums in the next section.
To get more of a feel for the binomial probability function, note first that the
mean for the q = 1/2 version is N/2. This conforms to the expectation that half of the entries
in the ‘average’ N-tuple (ε1, …, εN) should be +1 and half should be –1. Meanwhile, in
the general version, the assertion that the mean is Nq suggests that the fraction q of the
entries in the ‘average’ N-tuple should be +1 and the fraction (1−q) should be –1.
To get a sense for the standard deviation, one can ask for the value of n that
makes Pq(n) largest. To see where this is, note that

Pq(n+1)/Pq(n) = (N−n)/(n+1) · q/(1−q) .
(9.20)

This ratio is less than 1 if and only if

n > Nq – (1−q) .
(9.21)

Since 0 ≤ (1−q) ≤ 1, this then means that Pq peaks at a value of n that is within
±1 of the mean.
As I remarked in Handout 3, the standard deviation indicates the extent to which
the probabilities concentrate about the mean. To see this, consider the following basic
fact:
Theorem: Suppose that P is a probability function on a subset of {…, -1, 0, 1, …} with
a well defined mean μ and standard deviation σ. For any R ≥ 1, the probability assigned to
the set where |n − μ| > Rσ is less than 1/R².

For example, this says that the probability of being 2σ away from the mean is less than
1/4, and the probability of being 3σ away is less than 1/9.
This theorem justifies the focus in the literature on the mean and standard
deviation, since knowing these two numbers gives you rigorous bounds for probabilities.
The probability bound stated in the theorem is known as the Chebyshev
inequality. Here is the proof: Let S denote the sample space here, and let E ⊂ S
denote the set where |n − μ| > Rσ. The probability of E is then ∑n∈E P(n). However, since
|n − μ| > Rσ for n ∈ E, one has

1 ≤ (n−μ)² / (R²σ²)
(9.22)

on E. Thus,

∑n∈E P(n) ≤ ∑n∈E (n−μ)²/(R²σ²) P(n) .
(9.23)

To finish the story, note that the right side of (9.23) is even larger when we allow the sum
to include all points in S instead of restricting only to points in E. Thus, we learn that

∑n∈E P(n) ≤ ∑n (n−μ)²/(R²σ²) P(n) .
(9.24)

The definition of σ from (9.17) can now be invoked to identify the sum on
the right hand side of (9.24) with 1/R².
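The Chebyshev bound is easy to confirm for a concrete probability function; a sketch using the N = 20, q = 1/2 binomial (my own choice of example):

```python
from math import comb, sqrt

N, q = 20, 0.5
pmf = {n: comb(N, n) * q**n * (1 - q)**(N - n) for n in range(N + 1)}
mu, sigma = N * q, sqrt(N * q * (1 - q))

for R in (1.5, 2, 3):
    tail = sum(p for n, p in pmf.items() if abs(n - mu) > R * sigma)
    assert tail < 1 / R**2   # the Chebyshev bound from the Theorem
```

The actual tails here are far smaller than 1/R²; Chebyshev trades sharpness for complete generality.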
f) Characteristic polynomials
The slick computation of the mean and standard deviation that I mentioned
involves the introduction of the notion of the characteristic polynomial. The latter is an
often useful way to encode any given probability function on a subset of {…, -1, 0, 1,
…}. In the case when the subset is {0, 1, 2, … , N} and the probability function is the
binomial function from (9.9), the characteristic polynomial is the function of x given by
φ(x) = Pq(0) + x Pq(1) + x² Pq(2) + · · · + x^N Pq(N) .
(9.25)

Thus, φ is a degree N polynomial in the variable x. As it turns out, the polynomial in
(9.25) can be factored completely:

φ(x) = (q x + (1−q))^N .
(9.26)
Indeed, to see why (9.26) is true, consider multiplying out an N-fold product of the form:

(a1 x + b1)(a2 x + b2) ··· (aN x + bN) .
(9.27)

A given term in the resulting sum can be labeled as (ε1, …, εN), where εk = 1 if the k’th
factor in (9.27) contributed ak x, while εk = -1 if the k’th factor contributed bk. The power
of x for such a term is equal to the number of εk that are +1. Thus, the number of terms
contributing to the coefficient of x^n in (9.27) is N!/(n!(N−n)!), the number of elements in
the set Kn that appears in (9.3). In the case of (9.26), all versions of ak are equal to q, and
all versions of bk are equal to (1−q), so each term that contributes to x^n is q^n (1−q)^(N−n) x^n. As
there are N!/(n!(N−n)!) of them, the coefficient of x^n in (9.25) is Pq(n) as claimed.
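The factoring claim (9.26) amounts to repeated polynomial multiplication, which can be carried out directly; a sketch (poly_mul is my own helper, with coefficient lists listing the constant term first):

```python
from math import comb

def poly_mul(p, r):
    """Multiply two polynomials given as coefficient lists, constant term first."""
    out = [0.0] * (len(p) + len(r) - 1)
    for i, a in enumerate(p):
        for j, c in enumerate(r):
            out[i + j] += a * c
    return out

N, q = 5, 0.4
poly = [1.0]
for _ in range(N):                # multiply out (q x + (1-q))^N as in (9.27)
    poly = poly_mul(poly, [1 - q, q])

for n, coeff in enumerate(poly):  # the coefficient of x^n should be Pq(n)
    assert abs(coeff - comb(N, n) * q**n * (1 - q)**(N - n)) < 1e-12
```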
Now, in general the characteristic polynomial for a probability function, P, on a
subset of {0, 1, …} has the form

φ(x) = P(0) + P(1) x + P(2) x² + ··· = ∑n P(n) x^n .
(9.28)

Here are two of the salient features of this polynomial: First, the values of φ and its
derivative and second derivative at x = 1 are:

1 = φ(1)
μ = (dφ/dx)|x=1
σ² = (d²φ/dx²)|x=1 − μ(μ − 1) .
(9.29)

Second, the values of φ, its derivative, and its higher order derivatives at x =
0 determine P, since

(1/n!) (d^n φ/dx^n)|x=0 = P(n) .
(9.30)

To explain (9.29), note that φ(1) = P(0) + P(1) + ··· = 1. Meanwhile, the derivative of
φ at x = 1 is 1·P(1) + 2·P(2) + ···, and this is the mean μ. With the help of (9.17), a very
similar argument establishes the third point in (9.29).
In the case of the binomial distribution,

dφ/dx = Nq (qx + (1−q))^(N−1) .
(9.31)

Set x = 1 here to find the mean equal to Nq as claimed. Meanwhile,

d²φ/dx² = N(N−1) q² (qx + (1−q))^(N−2) .
(9.32)

Setting x = 1 here finds the right hand side of the third line of (9.29) equal to

N(N−1)q² – N²q² + Nq = N q(1−q) ,
(9.33)

which is the asserted value for σ².
For the Poisson probability function, the characteristic polynomial is the infinite
power series

φ(x) = Pλ(0) + x Pλ(1) + x² Pλ(2) + ··· = (1 + λx + (1/2)(λx)² + (1/6)(λx)³ + ··· + (1/n!)(λx)^n + ···) e^−λ .
(9.34)

As can be seen by replacing λ in (9.13) with λx, the sum on the right here is e^(λx). Thus,

φ(x) = e^(λ(x−1)) .
(9.35)

In particular, the first derivative of this function at x = 1 is λ and the second derivative is
λ². With (9.29), this last fact serves to justify the claim in (9.19) that the mean for the
Poisson probability is μ = λ and that σ² = λ² − λ(λ − 1) = λ.
The characteristic polynomial for a probability function is often used to simplify
seemingly hard computations in the manner just illustrated.
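The identities in (9.29) can also be checked without calculus by noting that φ′(1) = ∑ n P(n) and φ″(1) = ∑ n(n−1) P(n); a sketch for the binomial case (the names are mine):

```python
from math import comb

N, q = 10, 0.25
pmf = [comb(N, n) * q**n * (1 - q)**(N - n) for n in range(N + 1)]

phi1 = sum(pmf)                                           # phi(1)
dphi1 = sum(n * p for n, p in enumerate(pmf))             # phi'(1)
d2phi1 = sum(n * (n - 1) * p for n, p in enumerate(pmf))  # phi''(1)

mu = dphi1
var = d2phi1 - mu * (mu - 1)      # the third line of (9.29)
assert abs(phi1 - 1) < 1e-9
assert abs(mu - N * q) < 1e-9
assert abs(var - N * q * (1 - q)) < 1e-9
```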
g) Loose ends about counting elements in various sets
My purpose in this last section is to explain where the formulas in (9.3) and (9.5)
come from. To start, consider (9.5). There are b choices for η1. With η1 chosen, there
are b−1 choices for η2, one less than for η1 since we are not allowed to have these two
equal. Given choices for η1 and η2, there are b−2 choices for η3. Continuing in this vein
finds b−k choices available for ηk+1 if (η1, …, ηk) have been chosen. Thus, the total
number of choices is b·(b−1)···(b−N+1), and this is the claim in (9.5).
To see how (9.3) arises, let me introduce the following notation: Let mn(N)
denote the number of elements in the (ε1, …, εN) version of Kn. If we are counting
elements in this set, then we can divide this version of Kn into two subsets, one where ε1
= 1 and the other where ε1 = -1. The number of elements in the first is mn-1(N−1), since
the (N−1)-tuple (ε2, …, εN) must have n−1 occurrences of +1. The number in the second
is mn(N−1), since in this case the (N−1)-tuple (ε2, …, εN) must have all of the n
occurrences of +1. Thus, we see that

mn(N) = mn-1(N−1) + mn(N−1) .
(9.36)
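The recursion (9.36), together with the observation that m0(N) = 1 and the convention that mn(N) = 0 when n is out of range, already pins down every mn(N); a sketch checking it against (9.3):

```python
from math import comb

def m(n, N):
    """Count the elements of K_n via the recursion (9.36)."""
    if n < 0 or n > N:
        return 0
    if N == 0:
        return 1   # the empty tuple: one way to have zero occurrences of +1
    return m(n - 1, N - 1) + m(n, N - 1)

assert all(m(n, N) == comb(N, n) for N in range(8) for n in range(N + 1))
```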
This formula looks much like a matrix equation. Indeed, fix some integer T ≥ 1
and make a T-component vector, m⃗(N), whose coefficients are the values of mn for the
cases that 1 ≤ n ≤ T. This equation asserts that m⃗(N) = A m⃗(N−1), where A is the matrix
with Ak,k and Ak,k-1 both equal to 1 and all other entries equal to zero. Iterating this
equation then finds

m⃗(N) = A^(N−1) m⃗(1) ,
(9.37)

where m⃗(1) is the vector with top component 1 and all others equal to zero.
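The matrix iteration can be carried out numerically. In the sketch below I index the vector components by n = 0, …, T (rather than 1, …, T as in the text) and start from m(1) = (1, 1, 0, …, 0); these are my own adjustments so that the m0 contribution to the recursion is tracked as well:

```python
from math import comb

T = 4
# A[n][n] = A[n][n-1] = 1 and all other entries are zero, as in the text
A = [[1 if j in (i, i - 1) else 0 for j in range(T + 1)] for i in range(T + 1)]

def apply(M, v):
    return [sum(M[i][j] * v[j] for j in range(T + 1)) for i in range(T + 1)]

m_vec = [comb(1, n) for n in range(T + 1)]   # m(1) = (1, 1, 0, ..., 0)
for N in range(2, T + 1):
    m_vec = apply(A, m_vec)                  # m(N) = A m(N-1), iterating to (9.37)
    assert m_vec == [comb(N, n) for n in range(T + 1)]
```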
Now, we don’t have the machinery to realistically compute AN-1, so instead, lets
just verify that the expression in (9.3) gives the solution to (9.36). In this regard, note
that m (N) is uniquely determined by (9.37) for each N > 1 from m (1) and so if we
believe that we have a set { m (1), m (2), …} of solutions, then we need only plug in our
candidate and see if (9.37) holds. This is to say that in order to verify that (9.3) is the
correct, one need only check that the formula in (9.34) holds. This amounts to verifying
that
N!
n!(N n)!
=
(N1)!
(n 1)!(N n)!
+
(N 1)!
n!(N  n1)!
.
(9.38)
I leave this to you as an exercise.
Exercises:
1. Let A denote the 4×4 version of the matrix in (9.37). Thus,

A =
( 1 0 0 0 )
( 1 1 0 0 )
( 0 1 1 0 )
( 0 0 1 1 ) .

a) Present the steps of the row reduction of A to reduced echelon form to verify that it is
invertible.
b) Find A^−1 using Fact 2.3.5 of LA&A.
2. Let α denote a fixed number in (0, 1). Now define a probability function, P, on the set
{0, 1, 2, …} by setting P(n) = (1−α) α^n.
a) Verify that P(0) + P(1) + ··· = 1, and thus verify that P is a probability function.
b) Sum the series P(0) + x P(1) + x² P(2) + ··· to verify that the characteristic
function is φ(x) = (1−α)/(1−αx).
c) Use the formula in (9.29) to compute the mean and standard deviation of P.
d) In the case α = 1/2, the mean is 1 and the standard deviation is √2. As 6 ≥ μ + 3σ,
the Theorem in the section on means and standard deviations asserts that the probability for the set {6, 7, …} should
be less than 1/9. Verify this prediction by summing P(6) + P(7) + ···.
e) In the case α = 2/3, the mean is 2 and σ = √6. Verify the prediction of the Theorem
in the section on means and standard deviations that {7, 8, …} has probability less than 1/4 by summing P(7) + ···.
3. This exercise fills in some of the details in the verification of (9.3).
a) Multiply both sides of (9.38) by (n−1)!(N−n−1)! and divide both sides of the result
by (N−1)!. Give the resulting equation.
b) Use this last result to verify (9.38).