Lecture Notes 1
- Probability Space -
March 2, 2022
1 Probability Space (Ω, F, P)
Figure 1: Probability Space
Definition 1. Sample Space and Outcome
The sample space Ω is the set consisting of all the possible outcomes of a random experiment.
Definition 2. Event
The event E is a subset of Ω.
F is a collection of subsets of Ω, that is, a collection of events.
Definition 3. Probability
Probability or Probability measure P is a real-valued function defined on F which satisfies the
following axioms.
1. For every event E ∈ F, P(E) ≥ 0
2. P(Ω) = 1
3. For any sequence of disjoint events E1, E2, . . . ,
P(∪_{i=1}^{∞} Ei) = Σ_{i=1}^{∞} P(Ei)
Example 1. Tossing a fair coin three times
Ω = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
F = {φ, {HHH}, {HHT}, · · · , {HHH, HHT}, · · · , Ω}
P({HHH, HHT, HTH, THH}) = 1/2
⋮
Example 2. Tossing a fair coin until the first head appears
Ω = {H, TH, TTH, TTTH, TTTTH, . . . }
F = {φ, {H}, {H, TH}, {H, TH, TTH}, . . . }
P({H, TH}) = 3/4
⋮
Example 3. Romeo and Juliet have a date at a given time, and each will arrive at the meeting
place with a delay between 0 and 1 hour, with all pairs of delays being equally likely.
Ω = {(x, y)|0 ≤ x ≤ 1, 0 ≤ y ≤ 1}
F = {E|E ⊂ Ω}
P({(x, y) | |x − y| ≤ 1/12, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1}) = 23/144
⋮
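A quick check of the value above: the event is the diagonal band |x − y| ≤ 1/12 inside the unit square, and its area is 1 − (11/12)² = 1 − 121/144 = 23/144.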
2 Basic Theorems: Properties of Probability
Theorem 1.
P(φ) = 0    (1)
Proof. Let Ei = φ for all i. Then
1 = P(Ω) = P(Ω ∪ (∪_{i=1}^{∞} Ei)) = P(Ω) + Σ_{i=1}^{∞} P(Ei) = 1 + Σ_{i=1}^{∞} P(Ei)
∴ P(φ) = 0
Theorem 2. For any finite sequence of n disjoint events E1, . . . , En,
P(∪_{i=1}^{n} Ei) = Σ_{i=1}^{n} P(Ei)    (2)
Proof. Let Ei = φ for i > n. Then
P(∪_{i=1}^{n} Ei) = P(∪_{i=1}^{∞} Ei) = Σ_{i=1}^{∞} P(Ei) = Σ_{i=1}^{n} P(Ei) + Σ_{i=n+1}^{∞} P(Ei) = Σ_{i=1}^{n} P(Ei)
Theorem 3. For every event E, P(E^c) = 1 − P(E)
Proof.
Ω = E ∪ E^c,  E ∩ E^c = φ  and  P(Ω) = 1
∴ 1 = P(Ω) = P(E) + P(E^c)
P(E^c) = 1 − P(E)
Theorem 4. If Ei ⊂ Ej, then P(Ei) ≤ P(Ej)    (3)
Proof.
P(Ej) = P(Ei) + P(Ei^c ∩ Ej)  and  P(Ei^c ∩ Ej) ≥ 0
∴ P(Ei) ≤ P(Ej)
Theorem 5. For every event E, 0 ≤ P(E) ≤ 1
Proof. By Axiom 1, P(E) ≥ 0. Since E ⊂ Ω, Theorem 4 gives P(E) ≤ P(Ω) = 1.
∴ 0 ≤ P(E) ≤ 1
Theorem 6. For every two events Ei and Ej,
P(Ei ∪ Ej) = P(Ei) + P(Ej) − P(Ei ∩ Ej)    (4)
Proof.
P(Ei ∪ Ej) = P(Ei ∩ Ej^c) + P(Ei ∩ Ej) + P(Ei^c ∩ Ej)
= P(Ei) − P(Ei ∩ Ej) + P(Ei ∩ Ej) + P(Ej) − P(Ei ∩ Ej)
= P(Ei) + P(Ej) − P(Ei ∩ Ej)
3 Computation of Probability
Definition 4. Simple Sample Space
A sample space Ω containing N outcomes is called a simple sample space if the probability assigned to each of the outcomes is 1/N.
Remark. The probability P(E) of an event E containing N(E) outcomes in this simple sample space is
P(E) = N(E)/N
Remark. Methods of Counting
1. Multiplication Rule
If an experiment has k parts, where each part has ni possible outcomes regardless of the outcomes in the other parts, then the experiment has ∏_{i=1}^{k} ni outcomes.
2. Two crucial aspects when counting
There are two crucial aspects to consider when we count the outcomes. First, we should distinguish between sampling with replacement and sampling without replacement. Second, we should determine whether or not the ordering of the outcomes is important.
Taking these two considerations into account, the possible methods of counting can be categorized into four cases: without replacement/ordered, without replacement/unordered, with replacement/ordered, and with replacement/unordered. Consider selecting k out of n. The methods of counting all of the possible outcomes under each of the four cases can be represented as follows.
Figure 2: Methods of Counting
[1] With replacement / Ordered
In this case, the number of distinct outcomes is given by n^k.
Example 4. If a password is required to have 8 characters (letters or numbers), how many distinct passwords can we get?
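As a quick sanity check, here is a minimal computation, assuming an alphabet of the 26 letters plus the 10 digits (n = 36; a real password policy may allow more symbols):

```python
# With replacement / ordered: n^k strings of length k over n symbols.
# Assumption: 26 letters + 10 digits = 36 symbols per position.
n, k = 36, 8
print(n ** k)  # 2821109907456 distinct passwords
```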
[2] Without replacement / Ordered
In this case, the number of distinct outcomes is given by
nPk = n!/(n − k)! = n × (n − 1) × (n − 2) × · · · × (n − k + 1)
Example 5. A box contains n balls numbered 1, . . . , n. If k balls are selected with replacement, what is the probability of the event E that each of the k balls that are selected will have a different number (n ≥ k)?
Example 6. (Birthday Problem) What is the probability that at least two people in a group of k people (2 ≤ k ≤ 365) will have the same birthday?
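Both Example 5 and the birthday problem reduce to the same count: among the n^k equally likely ordered draws, nPk have all entries distinct. A small exact-arithmetic sketch, assuming 365 equally likely birthdays and ignoring leap years:

```python
from fractions import Fraction

def prob_all_different(n: int, k: int) -> Fraction:
    """P(all k draws distinct) = nPk / n^k when sampling with replacement."""
    p = Fraction(1)
    for i in range(k):
        p *= Fraction(n - i, n)
    return p

# Example 5 with illustrative values n = 10, k = 3: (10 * 9 * 8) / 10**3 = 18/25.
print(prob_all_different(10, 3))
# Example 6: the probability of a shared birthday first exceeds 1/2 at k = 23.
print(float(1 - prob_all_different(365, 23)))  # ~0.5073
```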
[3] Without replacement / Unordered
In this case, the order is not important; outcomes of the same kind are regarded as the same outcome. Reflecting this fact, the number of distinct outcomes is given by
nCk = nPk/k! = n!/((n − k)! k!)
Example 7. Suppose that an urn contains 8 red balls and 4 white balls. We draw 3 balls from the urn without replacement. Assuming that at each draw each ball in the urn is equally likely to be chosen, what is the probability that two of the three balls drawn are red?
Example 8. Suppose that a deck of 52 cards containing four aces is shuffled thoroughly and the
cards are then distributed among four players so that each player receives 13 cards. Determine
the probability that each player will receive one ace.
Figure 3
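A minimal sketch of both counts, using exact integer arithmetic (the numerical answers below follow from the combination formula above; they are not stated in the notes):

```python
from math import comb, factorial
from fractions import Fraction

# Example 7: choose 2 of 8 red and 1 of 4 white, out of C(12, 3) equally likely hands.
print(Fraction(comb(8, 2) * comb(4, 1), comb(12, 3)))  # 28/55

# Example 8: deals are multinomial; favorable deals place one ace in each hand.
def multinomial(counts):
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total

# 4! ways to assign the aces to the players, then deal the other 48 cards 12 apiece.
p = Fraction(factorial(4) * multinomial([12] * 4), multinomial([13] * 4))
print(p, float(p))  # 2197/20825 ~ 0.1055
```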
[4] With replacement / Unordered
To count in this case, the easiest way is to think of putting k balls in bins numbered from 1 to n. For example, one outcome can be shown as the following figure.
Figure 4
Thinking in this way, counting the number of distinct outcomes is equivalent to counting the number of arrangements of k balls and n − 1 interior walls (excluding the two end walls). Therefore, the number of distinct outcomes is given by
(n−1+k)Ck = (n − 1 + k)!/((n − 1)! k!)
Example 9. From the numbers 1, 2, . . . , 45, you may pick any six for your ticket. If the winning numbers are decided by randomly selecting six numbers from the forty-five with replacement, determine the probability of your winning.
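One caution: the C(45 − 1 + 6, 6) = 15,890,700 unordered outcomes are not equally likely under sampling with replacement, so the answer is not simply 1/15,890,700. A short exact sketch, assuming your ticket consists of six distinct numbers and that you win only if the drawn multiset equals your ticket:

```python
from math import comb, factorial
from fractions import Fraction

print(comb(45 - 1 + 6, 6))  # 15,890,700 multisets, NOT equally likely

# Of the 45^6 equally likely ordered draws, exactly 6! are permutations
# of a ticket made of six distinct numbers.
p_win = Fraction(factorial(6), 45 ** 6)
print(p_win, float(p_win))  # 720/8303765625 ~ 8.67e-08
```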
4 Conditional Probability
Definition 5. Conditional Probability
For an event F such that P(F) > 0, we define the conditional probability of E given F by
P(E|F) = P(E ∩ F)/P(F)    (5)
Figure 5: Conditional Probability
Remark. Conditional probability, P(E|F), is the name given to the new belief after receiving the new information, in this case that event F occurred.
Conditional probability is also a probability because it satisfies the axioms of probability:
P(E|F) ≥ 0
P(Ω|F) = P(Ω ∩ F)/P(F) = P(F)/P(F) = 1
and, for any sequence of disjoint events E1, E2, . . . ,
P(∪_{i=1}^{∞} Ei | F) = P((∪_{i=1}^{∞} Ei) ∩ F)/P(F) = P(∪_{i=1}^{∞} (Ei ∩ F))/P(F) = Σ_{i=1}^{∞} P(Ei ∩ F)/P(F) = Σ_{i=1}^{∞} P(Ei|F)
Example 10. We roll two fair 6-sided dice. Given that the two dice land on different numbers, find the conditional probability that at least one die roll is a 6.
Figure 6
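A brute-force enumeration sketch for Example 10 (all 36 ordered rolls are equally likely, so we can simply count):

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))      # 36 equally likely outcomes
different = [r for r in rolls if r[0] != r[1]]    # conditioning event: 30 outcomes
at_least_one_six = [r for r in different if 6 in r]
print(Fraction(len(at_least_one_six), len(different)))  # 10/30 = 1/3
```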
Theorem 7. Suppose P(F) > 0. Then
P(E ∩ F) = P(E|F)P(F)    (6)
Assuming that all of the conditioning events have positive probability, we get
P(∩_{i=1}^{n} Ei) = P(E1)P(E2|E1)P(E3|E1 ∩ E2) · · · P(En | ∩_{i=1}^{n−1} Ei)    (7)
Example 11. Three cards are drawn from an ordinary 52-card deck without replacement. Find the probability that none of the three cards is a heart.
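Writing out the multiplication rule (7) for this example (13 of the 52 cards are hearts):
P(no heart) = (39/52) · (38/51) · (37/50) = 703/1700 ≈ 0.4135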
Theorem 8. The Theorem of Total Probability
If F1, F2, . . . , Fn is a partition¹ of Ω and P(Fi) > 0 for all i, then
P(E) = Σ_{i=1}^{n} P(E|Fi)P(Fi)    (8)
Figure 7: Probability Space
Proof.
E = ∪_{i=1}^{n} (E ∩ Fi),  where (E ∩ Fi) ∩ (E ∩ Fj) = φ for any i ≠ j
P(E ∩ Fi) = P(E|Fi)P(Fi)
∴ P(E) = Σ_{i=1}^{n} P(E ∩ Fi) = Σ_{i=1}^{n} P(E|Fi)P(Fi)
1
It is said that F1 , F2 , . . . Fn is a partition when F1 , F2 , . . . Fn are disjoint and exhaustive,
that is,
S=
n
[
Fi
and
Fi
\
Fj = φ
f or
any
i, j
i=1
Example 12. Suppose that a box contains one fair coin and one coin with a head on each side. Suppose also that one coin is selected at random and it is tossed. What is the probability that a head will be obtained?
Figure 8
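Written out with the total probability theorem (taking F1 = the fair coin is selected, F2 = the two-headed coin is selected):
P(head) = P(head|F1)P(F1) + P(head|F2)P(F2) = (1/2)(1/2) + (1)(1/2) = 3/4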
Theorem 9. Bayes’ Theorem
If F1, F2, . . . , Fn is a partition of Ω, P(Fj) > 0 for all j, and P(E) > 0, then
P(Fi|E) = P(E|Fi)P(Fi) / Σ_{j=1}^{n} P(E|Fj)P(Fj)²    (9)
Proof.
P(Fi|E) = P(Fi ∩ E)/P(E) = P(E|Fi)P(Fi) / Σ_{j=1}^{n} P(E|Fj)P(Fj)
Example 13. If a person has the disease, the test results are positive with probability 0.95, and if the person does not have the disease, the test results are negative with probability 0.95. A random person drawn from a certain population has probability 0.001 of having the disease. Given that the person just tested positive, what is the probability of having the disease?
We will call the event that a person has the disease “D” and the event that the test results are negative “N”.
We know
P(D) = 0.001    P(D^c) = 0.999
P(N^c|D) = 0.95    P(N|D) = 0.05
P(N|D^c) = 0.95    P(N^c|D^c) = 0.05
By applying Bayes’ theorem, we get
P(D|N^c) = P(N^c|D)P(D) / (P(N^c|D)P(D) + P(N^c|D^c)P(D^c)) = (0.95 · 0.001)/(0.95 · 0.001 + 0.05 · 0.999) ≈ 0.0187
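A two-line numeric check of the posterior above (using only the values given in the example):

```python
p_d, p_pos_d, p_pos_dc = 0.001, 0.95, 0.05   # prior and test error rates from Example 13
posterior = p_pos_d * p_d / (p_pos_d * p_d + p_pos_dc * (1 - p_d))
print(posterior)  # ~0.0187: still below 2% despite the positive test
```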
² This way of updating the probability of event Fi is usually called Bayesian Updating.
5 Independence
Definition 6. Independence
E and F are called independent if
P(E ∩ F) = P(E) · P(F)    (10)
Remark. When P(E) > 0 and P(F) > 0, this definition of independence is equivalent to
P(E|F) = P(E)  and  P(F|E) = P(F)    (11)
Theorem 10. Suppose that P(E) > 0 and P(F) > 0.
1. If E and F are disjoint, then E and F are dependent.
2. If E and F are independent, then E and F are not disjoint.
Remark. Suppose that P(E) = 0 and/or P(F) = 0. Then, whether E and F are disjoint or not, the two events are independent.
Theorem 11. If E and F are independent, then E and F^c are also independent.
Proof.
P(E ∩ F^c) = P(E) − P(E ∩ F) = P(E) − P(E)P(F) = P(E)[1 − P(F)] = P(E) · P(F^c)
Example 14. If E and F are independent, then are E^c and F^c also independent?
P(E^c ∩ F^c) = P((E ∪ F)^c) = 1 − P(E ∪ F)
= 1 − (P(E) + P(F) − P(E ∩ F))
= 1 − (P(E) + P(F) − P(E) · P(F))  (since E and F are independent)
= 1 − P(F) − P(E)(1 − P(F)) = (1 − P(E)) · (1 − P(F))
= P(E^c) · P(F^c)
Example 15. A card is selected at random from an ordinary deck of 52 playing cards. E is the event that the selected card is an ace and F is the event that it is a spade. Are events E and F independent?
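One way to check, using the simple sample space of 52 equally likely cards:
P(E ∩ F) = P(ace of spades) = 1/52 = (4/52) · (13/52) = P(E) · P(F)
so E and F are independent.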
Definition 7. (Mutual) Independence
E1, E2, . . . , En are said to be (mutually) independent if for every sub-collection E_{i1}, E_{i2}, . . . , E_{ik} of k ≤ n events,
P(E_{i1} ∩ E_{i2} ∩ · · · ∩ E_{ik}) = P(E_{i1}) · P(E_{i2}) · · · P(E_{ik})    (12)
Example 16. Consider two independent fair coin tosses, and the following events:
H1 = {1st toss is a head}
H2 = {2nd toss is a head}
D = {the two tosses have different results}
Are these three events independent?
We know
P(H1) = P(H2) = P(D) = 1/2
P(H1 ∩ H2) = P(H1 ∩ D) = P(H2 ∩ D) = 1/4
P(H1 ∩ H2 ∩ D) = 0
From these facts,
P(H1 ∩ H2) = 1/4 = P(H1) · P(H2)
P(Hi ∩ D) = 1/4 = P(Hi) · P(D)  for i = 1, 2
P(H1 ∩ H2 ∩ D) = 0 ≠ 1/8 = P(H1) · P(H2) · P(D)
Therefore, H1 and H2 are independent, H1 and D are independent, H2 and D are independent, but H1, H2 and D are not (mutually) independent.
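A minimal enumeration over the four equally likely outcomes, confirming the pairwise-but-not-mutual pattern:

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=2))  # HH, HT, TH, TT: equally likely
def P(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

h1 = lambda w: w[0] == "H"
h2 = lambda w: w[1] == "H"
d  = lambda w: w[0] != w[1]
print(P(lambda w: h1(w) and h2(w)), P(h1) * P(h2))                  # 1/4 vs 1/4
print(P(lambda w: h1(w) and h2(w) and d(w)), P(h1) * P(h2) * P(d))  # 0 vs 1/8
```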
Remark. (Mutual) independence implies that the occurrence or non-occurrence of any number of the events from the collection E1, E2, . . . , En carries no information on the remaining events or their complements.
Example 17. Assume that E1, E2, E3, E4 are (mutually) independent and P(E3 ∩ E4) > 0. Show that
P(E1 ∪ E2 | E3 ∩ E4) = P(E1 ∪ E2)
We have
P(E1 | E3 ∩ E4) = P(E1 ∩ E3 ∩ E4)/P(E3 ∩ E4) = P(E1)P(E3)P(E4)/(P(E3)P(E4)) = P(E1)
We similarly obtain
P(E2 | E3 ∩ E4) = P(E2)
P(E1 ∩ E2 | E3 ∩ E4) = P(E1 ∩ E2)
From these facts, we get
P(E1 ∪ E2 | E3 ∩ E4) = P(E1 | E3 ∩ E4) + P(E2 | E3 ∩ E4) − P(E1 ∩ E2 | E3 ∩ E4)
= P(E1) + P(E2) − P(E1 ∩ E2) = P(E1 ∪ E2)
Definition 8. Conditional Independence
Given an event G, the events E and F are called conditionally independent if
P(E ∩ F | G) = P(E|G) · P(F|G)    (13)
Remark. Conditional independence is also characterized by
P(E | F ∩ G) = P(E ∩ F ∩ G)/P(F ∩ G) = (P(G) · P(E ∩ F | G))/(P(G) · P(F|G))    (14)
= (P(G) · P(E|G) · P(F|G))/(P(G) · P(F|G)) = P(E|G)    (15)
This relation states that if G is known to have occurred, the additional knowledge that F also occurred does not change the probability of E.
Remark. Independence of two events E and F does not imply conditional independence, and vice versa.
Example 18. There are two coins, a blue and a red one. We choose one of the two at random, each being chosen with probability 1/2, and proceed with two independent tosses. The coins are biased: with the blue coin, the probability of heads in any given toss is 0.99, whereas for the red coin, it is 0.01.
Let B (respectively R) be the event that the blue (red) coin was selected. Let also Hi be the event that the ith toss resulted in heads.
1. When the event B is known to have occurred, are the events H1 and H2 conditionally independent?
2. Are the events H1 and H2 independent?
We know
P(Hi|B) = 0.99  and  P(Hi|R) = 0.01  for all i
P(H1 ∩ H2 | B) = 0.99 · 0.99 = P(H1|B) · P(H2|B)
Thus, H1 and H2 are conditionally independent given B.
P(H1) = P(B) · P(H1|B) + P(R) · P(H1|R) = (1/2) · 0.99 + (1/2) · 0.01 = 0.5
P(H2) = P(B) · P(H2|B) + P(R) · P(H2|R) = (1/2) · 0.99 + (1/2) · 0.01 = 0.5
P(H1 ∩ H2) = P(B) · P(H1 ∩ H2 | B) + P(R) · P(H1 ∩ H2 | R) = (1/2) · 0.99² + (1/2) · 0.01² = 0.4901
From these facts,
P(H1 ∩ H2) = 0.4901 ≠ 0.25 = P(H1) · P(H2)
Thus, H1 and H2 are not independent.
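A short numeric check of the mixture computation above (values straight from the example):

```python
p_b = p_r = 0.5                    # coin choice
p_h_blue, p_h_red = 0.99, 0.01     # per-toss head probabilities
p_h1 = p_b * p_h_blue + p_r * p_h_red            # 0.5
p_h1h2 = p_b * p_h_blue**2 + p_r * p_h_red**2    # 0.4901
print(p_h1h2, p_h1 * p_h1)                       # 0.4901 vs 0.25
```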
6 Appendix
Theorem 12. Inclusion and Exclusion Formula
P(∪_{i=1}^{n} Ei) = Σ_{i=1}^{n} P(Ei) − Σ_{1≤i<j≤n} P(Ei ∩ Ej) + Σ_{1≤i<j<k≤n} P(Ei ∩ Ej ∩ Ek) − · · · + (−1)^{n−1} P(∩_{i=1}^{n} Ei)
Proof. We already proved this formula for n = 2. Now let’s prove that if this formula is true for k, then it is also true for k + 1.
P(∪_{i=1}^{k+1} Ei) = P((∪_{i=1}^{k} Ei) ∪ E_{k+1}) = P(∪_{i=1}^{k} Ei) + P(E_{k+1}) − P((∪_{i=1}^{k} Ei) ∩ E_{k+1})
= P(∪_{i=1}^{k} Ei) + P(E_{k+1}) − P(∪_{i=1}^{k} (Ei ∩ E_{k+1}))
= Σ_{i=1}^{k} P(Ei) − Σ_{1≤i<j≤k} P(Ei ∩ Ej) + Σ_{1≤i<j<m≤k} P(Ei ∩ Ej ∩ Em) − · · · + (−1)^{k−1} P(∩_{i=1}^{k} Ei) + P(E_{k+1})
− Σ_{i=1}^{k} P(Ei ∩ E_{k+1}) + Σ_{1≤i<j≤k} P(Ei ∩ Ej ∩ E_{k+1}) − · · · + (−1)^{(k+1)−1} P(∩_{i=1}^{k+1} Ei)
= Σ_{i=1}^{k+1} P(Ei) − Σ_{1≤i<j≤k+1} P(Ei ∩ Ej) + Σ_{1≤i<j<m≤k+1} P(Ei ∩ Ej ∩ Em) − · · · + (−1)^{(k+1)−1} P(∩_{i=1}^{k+1} Ei)
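A minimal sanity check of the formula on a small simple sample space (the events below are arbitrary illustrative choices, not from the notes):

```python
from fractions import Fraction
from itertools import combinations

# Simple sample space of 8 equally likely outcomes, with three overlapping events.
omega = set(range(8))
events = [{0, 1, 2, 3}, {2, 3, 4}, {3, 4, 5, 6}]
P = lambda e: Fraction(len(e), len(omega))

lhs = P(set().union(*events))
rhs = sum((-1) ** (r - 1) * sum(P(set.intersection(*c))
          for c in combinations(events, r))
          for r in range(1, len(events) + 1))
print(lhs, rhs, lhs == rhs)  # 7/8 7/8 True
```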
Remark. The principle of mathematical induction
Let n0 ∈ N, where N is the set of natural numbers, and let P(n) be a statement for each natural number n ≥ n0. Suppose that
1. The statement P(n0) is true.
2. For all k ≥ n0, the truth of P(k) implies the truth of P(k + 1).
Then P(n) is true for all n ≥ n0.
Example 19. The Monty Hall Problem
A prize is equally likely to be found behind any one of three closed doors in front of you. You point to one of the doors. A friend opens for you one of the remaining two doors, after making sure that the prize is not behind it. At this point, you can stick to your initial choice, or switch to the other unopened door. You win the prize if it lies behind your final choice of a door. Which door will you choose?
Figure 9: The Monty Hall Problem
Proof. First, the probability that the prize is behind door i, i = 1, 2, 3, is
P(1) = P(2) = P(3) = 1/3
Second, given that you have chosen door 1, the probability of your friend opening door 3 under the condition that the prize is behind door i is
P(opening 3|1) = 1/2,  P(opening 3|2) = 1,  P(opening 3|3) = 0
Now, we can compute the probability of the prize being behind door j given that door 3 was opened, P(j|opening 3), j = 1, 2, by applying Bayes’ rule.
P(1|opening 3) = P(opening 3|1)P(1) / (P(opening 3|1)P(1) + P(opening 3|2)P(2) + P(opening 3|3)P(3)) = ((1/2) · (1/3)) / ((1/2) · (1/3) + 1 · (1/3) + 0 · (1/3)) = 1/3
P(2|opening 3) = P(opening 3|2)P(2) / (P(opening 3|1)P(1) + P(opening 3|2)P(2) + P(opening 3|3)P(3)) = (1 · (1/3)) / ((1/2) · (1/3) + 1 · (1/3) + 0 · (1/3)) = 2/3
Therefore, you are better off switching.
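A minimal simulation sketch of the switching strategy, under the same protocol as above (the friend always opens a non-prize door other than your pick, uniformly at random when there is a choice):

```python
import random

def switch_win_rate(trials: int = 100_000) -> float:
    """Estimate P(win) for the always-switch strategy."""
    wins = 0
    for _ in range(trials):
        prize, pick = random.randrange(3), random.randrange(3)
        # Friend opens a door that is neither the pick nor the prize.
        opened = random.choice([d for d in range(3) if d not in (pick, prize)])
        # Switch to the one remaining unopened door.
        pick = next(d for d in range(3) if d not in (pick, opened))
        wins += (pick == prize)
    return wins / trials

print(switch_win_rate())  # ~0.667, matching P(2|opening 3) = 2/3
```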
Example 20. A new couple, known to have two children, has just moved into town. Suppose
that the mother is encountered walking with one of her children. If this is a girl, what is the
probability that both children are girls?
Proof. Let G be the event that the child seen with the mother is a girl, and define G1, G2, B1, B2 as follows:
G1 : The first child is a girl
G2 : The second child is a girl
B1 : The first child is a boy
B2 : The second child is a boy
The probability this example requires us to find is given by
P(G1 ∩ G2 | G) = P(G1 ∩ G2 ∩ G)/P(G) = P(G1 ∩ G2)/P(G)
Also
P(G) = P(G|G1 ∩ G2) · P(G1 ∩ G2) + P(G|G1 ∩ B2) · P(G1 ∩ B2) + P(G|B1 ∩ G2) · P(B1 ∩ G2) + P(G|B1 ∩ B2) · P(B1 ∩ B2)
= P(G1 ∩ G2) + P(G|G1 ∩ B2) · P(G1 ∩ B2) + P(G|B1 ∩ G2) · P(B1 ∩ G2)
where the final equation used the results P(G|G1 ∩ G2) = 1 and P(G|B1 ∩ B2) = 0.
Assuming that all 4 gender possibilities are equally likely, we get
P(G1 ∩ G2 | G) = (1/4) / (1/4 + P(G|G1 ∩ B2) · (1/4) + P(G|B1 ∩ G2) · (1/4))
Thus the answer depends on whatever assumptions we make about the conditional probabilities P(G|G1 ∩ B2) and P(G|B1 ∩ G2).
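For instance, under the natural (but not forced) assumption that the mother is equally likely to be seen with either child, P(G|G1 ∩ B2) = P(G|B1 ∩ G2) = 1/2, and the expression above gives
P(G1 ∩ G2 | G) = (1/4) / (1/4 + 1/8 + 1/8) = 1/2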