Statistics 2014, Fall 2001

Chapter 3 – Probability
Defn: An experiment is a test or series of tests in which purposeful changes are made to the input
variables of a process or system so that we may observe and identify reasons for changes in the output
response.
Defn: A random experiment is one whose outcome cannot be predicted with certainty.
Example: To determine optimum conditions for a plating bath, the effects of sulfone concentration and
bath temperature on the reflectivity of the plated metal are studied. Two levels of sulfone
concentration (in grams/liter) and five levels of temperature (degrees F) were used, with three
replications. In this case, there are two experimental factors – concentration and temperature – with
two levels of concentration and five levels of temperature. The (random) response variable is
reflectivity.
Defn: A set is a collection of elements.
Defn: A sample space is the set of all possible outcomes of a random experiment.
Example: If our random experiment is to flip a fair coin twice, then the sample space is S = {HH, HT,
TH, TT}.
Example: If our random experiment is to select a random sample of size 3 from a class of 23 students,
then the sample space is the set of all possible collections of 3 students from the class. There are 1771
elements in the sample space, corresponding to the number of ways that I could select three people
from the class.
If the random experiment has a finite number of possible outcomes, then the sample space will be
finite, as in the above two examples.
The sample space is said to be discrete if the number of possible outcomes of the random experiment is
either finite or countably infinite.
Example: I want to count the number of traffic accidents occurring on freeways in Duval County in
the course of a year. I cannot say ahead of time what the maximum count might be. Hence my sample
space is countably infinite: S = {0, 1, 2, 3, ....}.
The sample space is said to be continuous if the number of possible outcomes of the random
experiment is uncountably infinite.
Example: The random experiment is to measure the lifetime of a randomly selected AAA battery
coming off an assembly line. The measurement is in hours, and the set of possible values is all points
on an interval of real numbers starting at 0 hours as the left-hand endpoint. There is an uncountably
infinite number of possible values for the lifetime (of course, the set of possible recorded values
depends on the resolution of the measuring instrument).
Defn: Given a set S, another set A is called a subset of S, denoted A ⊆ S, if every element of A is
also an element of S.
Defn: An event is a subset of a sample space.
Defn: Given a set S, and two subsets, A and B, we say that A = B if
A ⊆ B and B ⊆ A; i.e., if every element of A is also an element of B and every element of B is also
an element of A.
Defn: Given a set S, and two sets A ⊆ S and B ⊆ S, we define the union of A and B, denoted by
A ∪ B, to be the set of all elements of S that are either elements of A or elements of B or elements of
both A and B.
Note: If S is the sample space of a random experiment, then A and B are events, and A ∪ B is the
event that either A or B (or both A and B) happens when we perform the experiment.
Defn: Given a set S, and two sets A ⊆ S and B ⊆ S, we define the intersection of A and B,
denoted by A ∩ B, to be the set of all elements of S that are elements of both A and B.
Note: If S is the sample space of a random experiment, then A and B are events, and A ∩ B is the
event that both A and B happen when we perform the experiment.
Defn: The empty set, or null set, ∅, is the set that contains no elements.
Note: The null set is a subset of every set.
Defn: Two sets A, B are said to be mutually exclusive if A ∩ B = ∅.
Defn: The complement A̅ of a set A ⊆ S is A̅ = {x ∈ S : x ∉ A}.
Example: Let the random experiment be to flip a fair coin three times. Then the sample space is
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
Let A = {exactly 2 heads occur when we flip a fair coin 3 times}, or
A = {HHT, HTH, THH}.
B = {at least one tail occurs when we flip a fair coin 3 times}, or
B = {HHT, HTH, HTT, THH, THT, TTH, TTT}.
C = {the same result occurs for every flip of the coin}, or
C = {HHH, TTT}.
Then A ∪ B = B, A ∪ C = {HHH, HHT, HTH, THH, TTT},
A ∩ B = A, A ∩ C = ∅, B ∪ C = S, and B ∩ C = {TTT}.
We see that events A and C are mutually exclusive - they cannot both happen at the same time. We
also see that A is a subset of B, and that B̅ = {HHH}.
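These set relationships are easy to check with Python's built-in set type; a minimal sketch for the
three-flip example above:

    # Sample space for three flips of a fair coin
    S = {"HHH", "HHT", "HTH", "HTT", "THH", "THT", "TTH", "TTT"}
    A = {s for s in S if s.count("H") == 2}   # exactly two heads
    B = {s for s in S if "T" in s}            # at least one tail
    C = {"HHH", "TTT"}                        # same result on every flip

    print((A | B) == B)     # True, since A is a subset of B
    print(A & C == set())   # True: A and C are mutually exclusive
    print(B | C == S)       # True
    print(B & C)            # {'TTT'}
    print(S - B)            # {'HHH'}, the complement of B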
A useful way to visualize relationships among events is through the use of Venn diagrams (see pp. 48-49).
Counting
In many situations, the size of the sample space, or of the events of interest, is so large that special
tools may be needed to find a) the total number of possible outcomes of the random experiment, and b)
the number of ways that a particular event can happen.
Theorem 3.1: If sets 𝐴1 , 𝐴2 , … , 𝐴𝑘 contain, respectively, 𝑛1 , 𝑛2 , … , 𝑛𝑘 elements, then the number of
ways that we can create another set by choosing one element from each of the k sets is
(𝑛1 )(𝑛2 ) … (𝑛𝑘 ).
Example: To determine optimum conditions for a plating bath, the effects of sulfone concentration and
bath temperature on the reflectivity of the plated metal are studied. Two levels of sulfone
concentration (in grams/liter) and five levels of temperature (degrees F) were used, with three
replications. In this case, there are two experimental factors – concentration and temperature – with
two levels of concentration and five levels of temperature. The (random) response variable is
reflectivity. The number of different possible combinations of sulfone concentration and temperature
is
(2)(5) = 10.
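Theorem 3.1 can be checked by listing the combinations directly with Python's itertools.product; the
level values below are hypothetical, since the actual concentrations and temperatures are not given
above. The same function also counts the outcomes of the ten-flip example that follows.

    from itertools import product

    concentrations = [4, 8]               # two sulfone levels (g/L); values hypothetical
    temperatures = [60, 70, 80, 90, 100]  # five temperature levels (deg F); values hypothetical

    runs = list(product(concentrations, temperatures))
    print(len(runs))                            # 10 = (2)(5), as Theorem 3.1 predicts
    print(len(list(product("HT", repeat=10))))  # 1024 = 2^10 outcomes for ten coin flips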
Example: Let the random experiment be to flip a fair coin 10 times.
We want to find the size of the sample space. There are two possible outcomes of each flip of the coin.
Therefore, the total number of possible outcomes (size of the sample space) is
2^10 = 1024.
Example: p. 51.
Defn: For any positive integer n, we define the factorial function by
𝑛! = (1)(2)(3) ⋯ (𝑛 − 1)(𝑛).
We also define 0! = 1.
Defn: If we have a set of n objects, and we select r (where r ≤ n) of them in a particular order, the
particular arrangement of the objects is called a permutation of n objects taken r at a time.
Theorem 3.2: The number of permutations of r objects selected from a set of n distinct objects is
nPr = (n)(n − 1) ⋯ (n − r + 1) = n! / (n − r)!.
Example: p. 52
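Python's math module (3.8+) computes this count directly; a minimal check of the formula, using
n = 5 and r = 3 as arbitrary values:

    import math

    n, r = 5, 3
    by_formula = math.factorial(n) // math.factorial(n - r)   # n!/(n - r)!
    print(by_formula, math.perm(n, r))                        # 60 60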
We are more often concerned with random sampling, i.e., with selecting a subset from a population
without regard to ordering of the subset.
Theorem 3.3: The number of ways in which r objects can be selected from a set of n distinct objects
is
C(n, r) = (n)(n − 1)(n − 2) ⋯ (n − r + 1) / r! = n! / (r! (n − r)!).
Example: I want to estimate the average height of students in the class by using a simple random
sample of size n = 3. I have a population of size N = 23. The number of possible samples that I could
select is
C(23, 3) = 23! / (3! 20!) = 1771.
If I do simple random sampling, then each of these possible samples has an equal chance, 1/1771 =
0.000564653, of being selected.
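A quick check of this count and the sampling probability with math.comb:

    import math

    n_samples = math.comb(23, 3)   # 23!/(3! 20!)
    print(n_samples)               # 1771 possible samples of size 3
    print(1 / n_samples)           # 0.000564653..., the chance of any one sample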
Probability
The concept of probability is fundamental to all of statistics, since statistical inference involves
drawing conclusions from incomplete information, implying that there is some degree of uncertainty
about the conclusions. Mathematically, a number called the probability of an event should be a
measure of our assessment of the likelihood of the occurrence of that event when the random
experiment is performed.
There are two primary interpretations of probability:
1) Subjective approach: Probability values are assigned based on educated guesses about the relative
likelihoods of the different possible outcomes of our random experiment. This approach involves
advanced concepts and principles, such as entropy.
2) Relative frequency approach: In this approach to assigning probabilities to events, we look at the
long-run proportion of occurrences of particular outcomes, when the random experiment is performed
many times. This long-run proportion tells us the approximate probability of occurrence of each
outcome.
We will consider only the relative frequency approach, since the other approach involves concepts
(such as entropy and information) that are beyond the scope of this course.
Example: If we flip a coin once, what is the likelihood that the outcome is a head? Why? For a single
coin flip, we cannot say with certainty what the outcome will be. However, if we flip a coin
1,000,000 times, we are fairly sure that approximately one-half of the outcomes will be heads.
This approach is based on the Law of Large Numbers, which says, in particular, that the relative
frequency of occurrence of a particular outcome of a random experiment approaches a specific limiting
number between 0 and 1 if we perform the experiment a very large number of times.
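A short simulation illustrates this long-run behavior; the 1,000,000 flips match the example above,
and random.random() < 0.5 models a single flip of a fair coin:

    import random

    flips = 1_000_000
    heads = sum(random.random() < 0.5 for _ in range(flips))   # count simulated heads
    print(heads / flips)   # relative frequency of heads; very close to 0.5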
The earliest work on probability theory as a mathematical discipline considered situations in which all
possible outcomes of a random experiment were equally likely to occur. This classical probability
concept may be stated as follows: if there are m equally likely possibilities, of which one must occur
and s are regarded as favorable, or as a "success," then the probability of a "success" is given by s/m.
In many situations encountered in statistics (such as simple random sampling), the concept of equally
likely outcomes applies.
Example: The random experiment is to flip a fair coin twice. The sample space of the experiment is S
= {HH, HT, TH, TT}.
Let A = {at least one head occurs} = {HH, HT, TH}. Since the coin is fair, each of the four possible
outcomes of the experiment has an equal chance of occurring. Therefore, the probability that event A
happens is P(A) = 3/4 = 0.75.
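With equally likely outcomes, P(A) is just the number of outcomes in A divided by the number of
outcomes in S, which is easy to verify by enumeration:

    from itertools import product

    S = ["".join(p) for p in product("HT", repeat=2)]   # HH, HT, TH, TT
    A = [s for s in S if "H" in s]                      # at least one head
    print(len(A) / len(S))                              # 0.75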
Example: A second example is my random sampling example. Since there are 1771 possible samples
that could be selected, the probability that I will select a particular sample of 3 people from the group
of 23 students is 1/1771 = 0.000564653.
Example: p. 56.
Sometimes, we cannot use the equally likely outcomes concept, but must find probabilities by some
other means. In any case, whenever we assign probabilities to events, there are certain conditions that
must be satisfied. These conditions are sometimes considered to define, in a concrete way, the concept
of probability.
Basic Laws of Probability (Kolmogorov’s Axioms):
For a random experiment with sample space S, we have
1) PS   1 ,
2) For any A  S , P A  0; .
3) If E1, E2, E3, ..., En  S are mutually exclusive (i.e., no two of the events can happen at the same
time), then
PE1  E2  E3  ...  En   PE1   PE2   PE3     PEn .
These axioms must be satisfied by any assignment of probabilities to events. The axioms by
themselves, however, do not tell us what the numerical values of those probabilities are. We must also
use other information that is specific to the particular random experiment being performed.
Theorem 3.5: If the sample space, S, of a random experiment is finite, and if A is any event, then
P(A) equals the sum of the probabilities of the individual outcomes comprising A.
The validity of this theorem follows directly from Kolmogorov's axioms.
Example: From handout.
Theorem 3.6: (Generalized Addition Rule) If S is the sample space of a random experiment, and if
A and B are any two events (not necessarily mutually exclusive), then
𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵).
It is easy to verify this result using a Venn diagram.
Note: If A and B are mutually exclusive, then the above equation reduces to Kolmogorov's Third
Axiom.
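The rule is also easy to verify numerically; a sketch using the events A and B from the three-flip
example earlier:

    from itertools import product

    S = ["".join(p) for p in product("HT", repeat=3)]
    A = {s for s in S if s.count("H") == 2}   # exactly two heads
    B = {s for s in S if "T" in s}            # at least one tail

    def prob(E):
        return len(E) / len(S)

    print(prob(A | B))                        # 0.875
    print(prob(A) + prob(B) - prob(A & B))    # 0.875, matching Theorem 3.6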
Example: From handout.
Example: p. 62.
Theorem 3.7: (Complement Rule) If S is the sample space of a random experiment, and if A is any
event, then 𝑃(𝐴̅) = 1 − 𝑃(𝐴).
Example: from handout.
Example: p. 63.
Conditional Probability and Independence
Sometimes (often) a random experiment is performed in several successive stages, and at each stage,
some information is gained about the ultimate outcome of the experiment. Before beginning the
experiment, we may find the sample space, S, of all possible outcomes. After performing the first
stage of the experiment, it may be that the information gained allows us to conclude that some of the
elements of S are no longer possible as ultimate outcomes of the experiment. We may then want to
recalculate the probabilities of events of interest.
Defn: If A and B are two events such that P(B) > 0, then the conditional probability that A occurs,
given that B occurs, is given by P(A | B) = P(A ∩ B) / P(B).
Note: The definition also says that P(A ∩ B) = P(A | B) P(B), provided that P(B) > 0.
Example: Suppose that our random experiment consists of rolling a pair of fair dice, one green and
one red. Let A = {the sum of the numbers showing on the top faces of the dice is 7}; let D = {neither
number showing on the top faces is greater than 4}. What is P(A)? What is P(A|D)?
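Both probabilities can be found by enumerating the 36 equally likely (green, red) outcomes; a minimal
sketch (with equally likely outcomes, P(A | D) is just the share of D's outcomes that also lie in A):

    from itertools import product

    S = list(product(range(1, 7), repeat=2))          # 36 ordered (green, red) rolls
    A = [t for t in S if sum(t) == 7]                 # sum of the two faces is 7
    D = [(g, r) for g, r in S if g <= 4 and r <= 4]   # neither face greater than 4

    print(len(A) / len(S))                            # P(A) = 6/36, about 0.1667
    print(len([t for t in A if t in D]) / len(D))     # P(A|D) = 2/16 = 0.125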
Example: p. 68.
Defn: Two events A and B are independent if the occurrence or non-occurrence of one does not
change the probability that the other occurs. If the events are not independent, then they are called
dependent.
Note: If two events (each having positive probability) are mutually exclusive, they cannot be
independent, and if they are independent, they cannot be mutually exclusive. However, if they are not
mutually exclusive, then they may or may not be independent.
Theorem 3.9 (Multiplication Rule): If the events A and B are independent, then P(A ∩ B) =
P(A)P(B). Alternatively, the events are independent if P(A | B) = P(A) and P(B | A) = P(B).
Example: Suppose that our random experiment is to flip a fair coin twice. The sample space of the
experiment is S = {HH, HT, TH, TT}. Let A = {first flip results in a head} = {HH, HT}, and let
B = {second flip results in a head} = {HH, TH}. Thus, the event
𝐴 ∩ 𝐵 = {𝐻𝐻}.
Our intuition tells us that the result of the second flip of the coin should be unrelated to the result of the
first flip. Since the coin is fair, we may assume that all outcomes of the random experiment are
equally likely to occur, so that
P(A) = 2/4 = 0.5 and P(B) = 2/4 = 0.5.
Also, P(A ∩ B) = 1/4 = 0.25. Thus, P(A ∩ B) = P(A)P(B). The multiplication rule confirms our
intuition that the two events are independent.
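The same check, done by enumeration:

    from itertools import product

    S = ["".join(p) for p in product("HT", repeat=2)]
    A = {s for s in S if s[0] == "H"}   # first flip is a head
    B = {s for s in S if s[1] == "H"}   # second flip is a head

    pA, pB = len(A) / len(S), len(B) / len(S)
    pAB = len(A & B) / len(S)
    print(pAB == pA * pB)               # True: the multiplication rule holds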
Example: from handout.
Example: p. 69, at bottom
Example: p. 70, at bottom
Important Note: Random sampling guarantees that members of the sample will be independent of each
other.
Bayes' Theorem
In many situations, certain conditional probabilities are relatively simple to calculate, but the
conditional probabilities that are of interest to the researcher are more difficult to calculate.
Sometimes, we may make use of a result first discovered by an 18th-century English clergyman,
Thomas Bayes, who also dabbled in mathematics and probability.
Theorem 3.10 (The Law of Total Probability): Assume that B1, B2, ..., Bn are a collection of events
satisfying the following conditions:
i) B1 ∪ B2 ∪ B3 ∪ ... ∪ Bn = S, and
ii) Bi ∩ Bj = ∅, for all i ≠ j.
In other words, the collection of events partitions the sample space.
Let A be any other event. Then
P(A) = ∑ P(Bi) P(A | Bi), where the sum runs over i = 1, 2, ..., n.
Theorem 3.11 (Bayes' Theorem): If B1, B2, ..., Bn are a partition of the sample space S, and if A is
any other event, then
P(Br | A) = P(Br) P(A | Br) / [∑ P(Bi) P(A | Bi)],
for r = 1, 2, 3, ..., n, where the sum in the denominator runs over i = 1, 2, ..., n.
Note that the order of the conditioning is reversed between the left-hand side of the equation and the
right-hand side. In the situations in which Bayes' Theorem is useful, the conditional probability that is
of interest to the researcher is 𝑃(𝐵𝑟 |𝐴), while the conditional probabilities 𝑃(𝐴|𝐵𝑖 ) are relatively
simple to calculate, or are easily known.
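A direct implementation of Theorems 3.10 and 3.11; the scenario and all numbers below are
hypothetical, chosen only to illustrate the computation:

    # Hypothetical scenario: parts come from suppliers B1, B2, B3, and
    # A = {part is defective}. All numbers are made up for illustration.
    priors = [0.5, 0.3, 0.2]           # P(B_i): share of parts from each supplier
    likelihoods = [0.01, 0.02, 0.05]   # P(A | B_i): defect rate for each supplier

    # Law of Total Probability: P(A) = sum of P(B_i) * P(A | B_i)
    p_A = sum(p * q for p, q in zip(priors, likelihoods))
    print(p_A)                         # approximately 0.021

    # Bayes' Theorem: P(B_r | A), the chance a defective part came from supplier r
    posteriors = [p * q / p_A for p, q in zip(priors, likelihoods)]
    print(posteriors)                  # approximately [0.238, 0.286, 0.476]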
Example: from handout
Example: p. 74, at bottom.