The Theory of Probability
Chapter 5, Sections 1 and 2
Probability is the name of a branch of mathematics which deals with random variation.
Probability is also the name of a numerical measure of the likelihood of occurrence of an event
where occurrences are random. Probabilities are numbers in the closed interval [0, 1]. Zero signifies that the event cannot occur, one signifies that it is certain to occur, and one-half indicates a 50-50 chance.
Statistics theory is based on Probability theory. The likelihoods of the various outcomes when a random sample is taken from a population are stated as probabilities.
A major concept in probability theory is that of a random variable. Every random variable X is defined on a
specified population. X obtains its value by taking a random sample of size 1 from the population
(i.e., getting a number (element) such that every element is equally likely to be the one selected).
We say that this is the value assumed by the random variable.
Populations on which random variables are defined fall into two categories. Discrete
populations are such that the set of unique values in the population does not constitute a continuum
(Binomial populations for example). Continuous populations are such that the unique elements
in the population form a continuum (Standard Normal population for example).
Any random variable defined on a discrete population is called a discrete random variable, and
we may consider the probability that the random variable assumes any one of the unique values
in the population. Any random variable defined on a continuous population is called a
continuous random variable, and we may consider the probability that the random variable
assumes a value in any specified interval of the real line. In each case, the probability is the
proportion of population elements satisfying the condition.
Populations and associated random variables that arise in theory (i.e., theoretical populations)
have accompanying functions which are used to find probabilities. Each discrete random
variable has an associated probability function which is evaluated to give a probability. Each continuous random variable has an associated probability density function which is integrated to give the probability associated with an interval.
Suppose X is a discrete random variable having probability function f(x), and Y is a continuous
random variable having probability density function h(y). We use the notation P(X = x) and P(Y
∈ (a, b) ) for the probabilities that, respectively, X assumes the value x and Y assumes a value in
the interval (a, b). Thus
P(X = x) = f(x)

P(Y ∈ (a, b)) = ∫_a^b h(y) dy.
Example
Consider the discrete population having proportions of values as follows:
Value        0      1      2
Proportion   1/10   7/10   2/10
Thus the unique elements of this population are 0, 1, and 2. Seven tenths of the elements are 1.
Define a random variable X on this population. X is a discrete random variable. It can assume the value 0, 1, or 2. The likelihood (probability) of each outcome is the associated proportion. Thus
x                  0      1      2
P(X = x) = f(x)    1/10   7/10   2/10
The probability function f(x) in this example is defined by the table of proportions.
* * * *
In the case of discrete random variables, the probability that the random variable assumes one of several given possible values x1, x2, …, xn is the proportion of population elements which are x1, x2, …, or xn. This probability is found as f(x1) + f(x2) + … + f(xn).
Thus for the previous example P(X = 0 or X = 1) = 1/10 + 7/10 = 8/10, P(X = 0 or X = 2) = 3/10, P(X = 0 or X = 1 or X = 2) = 1, and P(X = 3) = 0.
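A minimal sketch of this computation in Python (the dictionary simply encodes the table above; nothing here is from the textbook):

    # Probability function f for the discrete population above, encoded as a
    # dictionary mapping each unique population value to its proportion.
    f = {0: 1/10, 1: 7/10, 2: 2/10}

    def prob(values):
        """P(X is one of the given values): add f(x) over the listed values,
        treating any value not in the population (such as 3) as probability 0."""
        return sum(f.get(x, 0.0) for x in values)

    print(prob([0, 1]))     # P(X = 0 or X = 1) = 8/10
    print(prob([0, 2]))     # P(X = 0 or X = 2) = 3/10
    print(prob([0, 1, 2]))  # = 1
    print(prob([3]))        # P(X = 3) = 0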
Example
Consider the Standard Normal population. Define the random variable Z on this population. Z is
called a Standard Normal random variable. It is a continuous random variable. The
probability that Z assumes a value in a given interval (a, b) is found by integrating the probability
density function φ (z) defined as
φ(z) = (1/√(2π)) e^(−z²/2).

Thus
P(Z ∈ (0, 1)) = (1/√(2π)) ∫_0^1 e^(−y²/2) dy

P(Z ∈ (−∞, ∞)) = (1/√(2π)) ∫_−∞^∞ e^(−y²/2) dy = 1

P(Z ∈ (0, ∞)) = (1/√(2π)) ∫_0^∞ e^(−y²/2) dy = 1/2

P(Z ∈ (2, 2)) = (1/√(2π)) ∫_2^2 e^(−y²/2) dy = 0
Table B.3 gives probabilities for intervals (−∞, z).
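These integrals have no closed form, but they can be evaluated numerically. A minimal sketch using Python's standard library (statistics.NormalDist evaluates the same cumulative probabilities that Table B.3 tabulates):

    from statistics import NormalDist

    Z = NormalDist(mu=0.0, sigma=1.0)  # the Standard Normal distribution

    # P(Z in (a, b)) is the integral of phi over (a, b), which equals the
    # difference of cumulative probabilities F(b) - F(a).
    print(Z.cdf(1.0) - Z.cdf(0.0))  # P(Z in (0, 1)) ≈ 0.3413
    print(1.0 - Z.cdf(0.0))         # P(Z in (0, inf)) = 1/2
    print(Z.cdf(2.0) - Z.cdf(2.0))  # P(Z in (2, 2)) = 0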
Example
Your textbook, on page 232, defines what I will call a Binomial Experiment. This is an
activity which consists of a set of identical trials. The trials are such that the outcome of any one
is unpredictable and
1. Each trial results in one of two possible outcomes. Call the outcomes “success” and
“failure” for lack of better names.
2. The trials are independent in the sense that the outcome of any one trial (success or
failure) is in no way connected to the outcome of any other trial.
You can see that this kind of experiment will fit many different situations.
Mathematical theory gives the result that the number of successes obtained, in a Binomial
Experiment consisting of n trials, is the value assumed by a Binomial (n, p) random variable.
The value of p is the probability of success in a trial. Define, for a Binomial (n, p) experiment,
the Binomial random variable
X = the number of successes obtained.
That is to say define X on a Binomial (n, p) population.
Then we have, for this discrete random variable, that
f(x) = P(X = x) = [n!/(x!(n − x)!)] p^x (1 − p)^(n−x),  for x = 0, 1, 2, ..., n.
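A minimal sketch of this probability function in Python (math.comb gives n!/(x!(n − x)!); the n and p values below are made up for illustration):

    from math import comb

    def binomial_pmf(x, n, p):
        """f(x) = P(X = x) for a Binomial(n, p) random variable."""
        return comb(n, x) * p**x * (1 - p)**(n - x)

    # Illustration: n = 10 trials, success probability p = 0.3.
    print(binomial_pmf(3, 10, 0.3))                          # P(X = 3) ≈ 0.2668
    print(sum(binomial_pmf(x, 10, 0.3) for x in range(11)))  # probabilities sum to 1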
End of chapter exercises which feature a Binomial Experiment include 5.1, 5.12(c), 5.15(b),
5.45(b) and 5.46(b).
The related Geometric random variable is defined on page 237. Its value is the number of trials
required to obtain the first success, and its probability distribution is called the Geometric distribution.
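Assuming the usual parameterization (p is again the probability of success on each trial, so f(x) = P(X = x) = (1 − p)^(x−1) p for x = 1, 2, 3, …; check the definition on page 237 for the textbook's exact form), a minimal sketch:

    def geometric_pmf(x, p):
        """P(the first success occurs on trial x): x - 1 failures, then a success."""
        return (1 - p)**(x - 1) * p

    print(geometric_pmf(1, 0.3))  # success on the very first trial: 0.3
    print(geometric_pmf(3, 0.3))  # two failures, then a success: 0.7**2 * 0.3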
* * * *
It is important to note that a probability is always a proportion of population elements.
Continuing along this line, and thinking in terms of quantiles in a population, consider the
Cumulative Probability Function F(x) of a random variable X. This function is defined as
F(x) = P(X ≤ x) = p
(i.e., F(x) is the proportion p of population elements that don't exceed x). Thus x is by definition
the p quantile of the population, so x ≡ Q(p) using the notation from chapter 3. Thus,
summarizing we have
F(Q(p)) = p
Q(F(x)) = x
Q = F⁻¹
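These identities can be checked numerically for the Standard Normal case; a minimal sketch using the standard library, where inv_cdf plays the role of Q:

    from statistics import NormalDist

    Z = NormalDist()          # Standard Normal
    F, Q = Z.cdf, Z.inv_cdf   # cumulative probability function and quantile function

    p = 0.8413
    print(F(Q(p)))    # F(Q(p)) = p: ≈ 0.8413
    print(Q(F(1.0)))  # Q(F(x)) = x: ≈ 1.0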
Probability Theory, Subsection 5.5.5
Sampling Random Variables, Sample Mean Random Variable, and the
Central Limit Theorem
When a random sample is taken from a population P we can imagine that it is drawn:
1. Without replacement, i.e., each selected element is not a candidate for subsequent
selections. Each draw after the first is from a subset of the entire population.
or
2. With replacement, i.e., each draw is made from the entire population.
If the population P is very large compared to the sample size, as a practical matter, it doesn't
matter which sampling method is used because the likelihood of drawing any given element two
or more times is negligible. Let us henceforth assume that any population of interest to us is
very large, and we sample with replacement. Furthermore, let us say that the mean and variance
parameters of P are μ and σ².
The outcome of a random sample of size n taken with replacement from P can be thought of as
the values assumed by n random variables each defined on P. Call these random variables, X1,
X2, …, Xn. Since there is no connection between the values assumed by any two of these random
variables, we say they are independent. Also, since each Xi is defined on the same population P,
all of the random variables have the same probability distribution. The X1, X2, …, Xn are said to
be independently and identically distributed which we abbreviate (iid). The random variables
X1, X2, …, Xn are called sampling random variables.
Sometimes in real applications, we are only able to observe one value taken at random from a
continuous population of interest, and our question is of the kind “how likely is it that when we
get the value it will be in the interval (say) (1.65, 2.13)?”. Thus, the situation is that we can take
a random sample of size n = 1. The mathematical model is a single sampling random variable X1
≡ X (no need for a subscript here). In order to answer our question we need to compute P[X ∈
(1.65, 2.13)]. This is only possible if we know the probability distribution of X, i.e., we can compute
the proportion of population elements that lie in any given interval. In almost every instance this
requires that we assume that our population of interest is sufficiently like a known theoretical
population so that we can reasonably assume our sample of n = 1 comes from that theoretical
population. End of chapter exercises 5.1(a), 5.12(a), 5.15(a), 5.24, 5.30(a), 5.36(a), 5.42(a),
5.44(a) and 5.45(a) all fit this situation.
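For instance, a minimal sketch under the purely illustrative assumption that the theoretical population is Standard Normal:

    from statistics import NormalDist

    # Assume (for illustration only) that the population is Standard Normal.
    X = NormalDist()

    # P(X in (1.65, 2.13)) as a difference of cumulative probabilities.
    print(X.cdf(2.13) - X.cdf(1.65))  # ≈ 0.033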
Now consider the sample mean x̄ for a random sample of size n from P. We may think of x̄ as the value assumed by the random variable X̄ which is the linear combination

X̄ = (1/n)(X1 + X2 + ... + Xn)

of sampling random variables.
We have said that every random variable is defined on some population. Consider the population on which X̄ is defined. Call it the derived population Pd. Obviously the elements in Pd are all the possible means x̄ of samples of n elements from P, selected with replacement. If, for example, N is the number of elements in P, then the number of elements in Pd is:
n                   2     3     4     -etc-
# elements in Pd    N²    N³    N⁴
The population mean and variance of Pd are given in Equations (5.55) and (5.56) on page 309 as
E X̄ = (1/n) ∑_{i=1}^{n} E Xi = μ

Var X̄ = (1/n)² ∑_{i=1}^{n} Var Xi = σ²/n.

This uses the facts that E Xi = μ and Var Xi = σ².
To summarize, we may think of the mean of a random sample of size n (from a population P) as the value assumed by a random variable X̄ defined on the derived population Pd. We have that E X̄ = μ and Var X̄ = σ²/n, where μ and σ² are parameters of P.
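Both facts are easy to check by simulation. A minimal sketch, reusing the small discrete population from the first example (the sample size n and replication count are made-up values):

    import random
    from statistics import mean, pvariance

    population = [0, 1, 1, 1, 1, 1, 1, 1, 2, 2]  # proportions 1/10, 7/10, 2/10
    mu, sigma2 = mean(population), pvariance(population)

    n, reps = 5, 100_000
    # Each x-bar below is one element drawn at random from the derived population Pd.
    xbars = [mean(random.choices(population, k=n)) for _ in range(reps)]

    print(mu, mean(xbars))               # E X-bar should be close to mu
    print(sigma2 / n, pvariance(xbars))  # Var X-bar should be close to sigma^2 / n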
What is the probability distribution of X̄? I.e., what are the proportions of occurrences of values in Pd? Theory gives us that if P is a Normal(μ, σ²) population then Pd is a Normal(μ, σ²/n) population. That is to say X̄ ~ N(μ, σ²/n) when Xi ~ N(μ, σ²), i = 1, 2, …, n and the Xi are independent. When P is not a Normal population, and the sample size is large (say n ≥ 25), then the following theorem, called the Central Limit Theorem, says that Pd is approximately a Normal population and X̄ is approximately a Normal(μ, σ²/n) random variable (see Proposition 3, page 316) regardless of the distribution of elements in P.
* * * *
Central Limit Theorem
If X1, X2, …, Xn are iid random variables (with mean μ and variance σ²), then for large n, the random variable X̄ is approximately normally distributed as N(μ, σ²/n).
* * * *
I will write this approximation as X̄ ≈ N(μ, σ²/n).
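A minimal simulation sketch of the theorem, using a decidedly non-Normal population (an exponential with μ = 1 and σ² = 1, chosen only for illustration):

    import random
    from statistics import NormalDist, mean

    n, reps = 30, 50_000
    # Means of samples of size n from an exponential population (mu = 1, sigma^2 = 1).
    xbars = [mean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

    # Compare a simulated probability for X-bar with the N(mu, sigma^2/n) approximation.
    print(sum(xb <= 1.2 for xb in xbars) / reps)
    print(NormalDist(1.0, (1.0 / n) ** 0.5).cdf(1.2))  # the two should nearly agree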
As an example consider the outcomes of n identical Binomial (t, p) experiments. Each experiment results in a number of successes which we view as the value assumed by a Binomial (t, p) random variable Bi. The mean outcome for the n experiments is B̄ = (1/n) ∑_{i=1}^{n} Bi. We know (from Equations (5.4) and (5.5), page 236) that E Bi = tp and Var Bi = tp(1 − p), so E B̄ = tp and Var B̄ = tp(1 − p)/n. Then according to the Central Limit Theorem, if n is large

B̄ ≈ N(tp, tp(1 − p)/n).
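And a matching minimal sketch for this Binomial case (t, p, and n are made-up values):

    import random
    from statistics import NormalDist, mean

    t, p, n = 20, 0.4, 40  # illustrative values; E B-bar = tp = 8
    reps = 20_000

    def binomial_value(t, p):
        """One Binomial(t, p) outcome: the number of successes in t trials."""
        return sum(random.random() < p for _ in range(t))

    bbars = [mean(binomial_value(t, p) for _ in range(n)) for _ in range(reps)]

    approx = NormalDist(t * p, (t * p * (1 - p) / n) ** 0.5)  # N(tp, tp(1-p)/n)
    print(sum(bb <= 8.2 for bb in bbars) / reps)  # simulated P(B-bar <= 8.2)
    print(approx.cdf(8.2))                        # the two should nearly agree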
The Central Limit Theorem shows why the Normal distribution is of such general value. This theorem gives one of the most amazing results in all of mathematics (in my opinion).
End of chapter exercises 5.12(b), 5.13(b), 5.15(c), 5.20, 5.22(b), 5.36(b), 5.43(b), 5.44(b) and
5.45(d) all illustrate computation of, or approximation of, probability for X̄.
The general term "Sampling Distribution" is used to refer to the probability distribution of a random variable which is a function of sampling random variables. The distributions of the random variables X̄ and (n − 1)S²/σ² are examples of Sampling Distributions. Names of theoretical Sampling Distributions which we will encounter are Student's t, chi-squared, and F.