Conditional distributions (discrete case)

The basic idea behind conditional distributions is simple: Suppose (X, Y) is
a jointly-distributed random vector with a discrete joint distribution. Then
we can think of P{X = x | Y = y}, for every fixed y, as a distribution of
probabilities indexed by the variable x.
Example 1. Recall the following joint distribution from your text:
                    possible values for X
                      1       2       3
  possible     3     1/6     1/6      0
  values       2     1/6      0      1/6
  for Y        1      0      1/6     1/6
Now suppose we know that Y = 3. Then, the conditional distribution
of X, given that we know Y = 3, is given by the following:
P{X = 1 | Y = 3} = P{X = 1, Y = 3} / P{Y = 3} = (1/6)/(1/3) = 1/2,

P{X = 2 | Y = 3} = 1/2 similarly, and P{X = x | Y = 3} = 0 for all other values of x. Note that, in
this example, Σ_x P{X = x | Y = 3} = 1. Therefore, the “conditional distribution” of X given that Y = 3 is indeed a genuine probability distribution.
As such, it has an expectation, variance, etc., as well. For instance,
E(X | Y = 3) = Σ_x x · P{X = x | Y = 3} = 1 × (1/2) + 2 × (1/2) = 3/2.
That is, if Y = 3, then our best predictor of X is 3/2. Whereas, EX =
2, which means that our best predictor of X, in light of no additional
information, is 2. You should check that also
E(X | Y = 1) = 5/2    and    E(X | Y = 2) = 2.
Similarly,
E(X² | Y = 3) = 1² × (1/2) + 2² × (1/2) = 5/2,

whence

Var(X | Y = 3) = E(X² | Y = 3) − [E(X | Y = 3)]² = 5/2 − (3/2)² = 1/4.

You should compare this with the unconditional variance VarX = 2/3
(check the details!).
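These computations are easy to confirm numerically. The short Python sketch below simply re-encodes the joint table of Example 1 and recomputes the conditional and unconditional moments; the variable names are mine:

```python
# A quick numerical check of Example 1 (a sketch; the dictionary below
# re-encodes the joint table above).
joint = {  # (x, y) -> P{X = x, Y = y}
    (1, 3): 1/6, (2, 3): 1/6, (3, 3): 0,
    (1, 2): 1/6, (2, 2): 0,   (3, 2): 1/6,
    (1, 1): 0,   (2, 1): 1/6, (3, 1): 1/6,
}

p_y3 = sum(p for (x, y), p in joint.items() if y == 3)     # P{Y = 3} = 1/3
cond = {x: joint[(x, 3)] / p_y3 for x in (1, 2, 3)}        # P{X = x | Y = 3}

mean = sum(x * p for x, p in cond.items())                 # E(X | Y = 3) = 1.5
second = sum(x**2 * p for x, p in cond.items())            # E(X^2 | Y = 3) = 2.5
print(mean, second - mean**2)                              # 1.5 and 0.25

marginal = {x: sum(joint[(x, y)] for y in (1, 2, 3)) for x in (1, 2, 3)}
ex = sum(x * p for x, p in marginal.items())               # EX = 2
ex2 = sum(x**2 * p for x, p in marginal.items())           # E(X^2) = 14/3
print(ex, ex2 - ex**2)                                     # 2.0 and 0.666...
```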
Fix a number y. In general, the conditional distribution of X given that
Y = y is given by the following [a function of x]:

P{X = x | Y = y} = P{X = x, Y = y} / P{Y = y} = P{X = x, Y = y} / Σ_z P{X = z, Y = y}.

It follows easily from this that: (i) P{X = x | Y = y} ≥ 0; and (ii) Σ_x P{X =
x | Y = y} = 1. This shows that we really are studying a [here, conditional]
probability distribution. As such,

E[g(X) | Y = y] = Σ_x g(x) P{X = x | Y = y},

as long as the sum converges absolutely [or when g(x) ≥ 0 for all x].
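This recipe is also easy to code up in general. The following Python sketch (the helper names conditional_pmf and conditional_expectation are mine, not from the notes) takes a joint pmf stored as a dictionary and returns the conditional pmf and E[g(X) | Y = y]; it reproduces the answers of Example 1:

```python
def conditional_pmf(joint, y):
    """P{X = . | Y = y} from a joint pmf given as {(x, y): probability}."""
    p_y = sum(p for (a, b), p in joint.items() if b == y)   # P{Y = y}
    return {a: p / p_y for (a, b), p in joint.items() if b == y}

def conditional_expectation(g, joint, y):
    """E[g(X) | Y = y] = sum over x of g(x) P{X = x | Y = y}."""
    return sum(g(a) * p for a, p in conditional_pmf(joint, y).items())

joint = {(1, 3): 1/6, (2, 3): 1/6, (3, 3): 0,
         (1, 2): 1/6, (2, 2): 0,   (3, 2): 1/6,
         (1, 1): 0,   (2, 1): 1/6, (3, 1): 1/6}
print(conditional_expectation(lambda x: x, joint, 1))   # E(X | Y = 1) = 2.5
print(conditional_expectation(lambda x: x, joint, 2))   # E(X | Y = 2) = 2.0
```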
Example 2. Choose and fix 0 < p < 1, and two integers n, m ≥ 1. Let X
and Y be independent random variables; we suppose that X has a binomial
distribution with parameters n and p, and Y has a binomial distribution
with parameters m and p. Because X + Y can be thought of as the total
number of successes in n + m tosses of independent p-coins, it follows
that X + Y has a binomial distribution with parameters n + m and p. Our
present goal is to find the conditional distribution of X, given that X + Y = k,
for a fixed integer 0 ≤ k ≤ n + m.
P{X = j | X + Y = k} = P{X = j, X + Y = k} / P{X + Y = k}
                     = P{X = j} · P{Y = k − j} / P{X + Y = k},

by the independence of X and Y. The numerator is zero unless j = 0, 1, . . . , k. But if 0 ≤ j ≤ k, then

P{X = j | X + Y = k} = [C(n, j) p^j (1 − p)^{n−j} · C(m, k − j) p^{k−j} (1 − p)^{m−k+j}] / [C(n + m, k) p^k (1 − p)^{n+m−k}]
                     = C(n, j) C(m, k − j) / C(n + m, k),

where C(a, b) denotes the binomial coefficient “a choose b.”
That is, given that X + Y = k, the conditional distribution of X is
a hypergeometric distribution! We can now read off the mean and the
variance from facts we know about hypergeometrics. Namely, according
to Example 1 on page 61 of these notes,

E(X | X + Y = k) = kn/(n + m),

and Example 2 on page 61 tells us that

Var(X | X + Y = k) = k × (n/(n + m)) × (m/(n + m)) × ((n + m − k)/(n + m − 1)).

(Check the details! Some care is needed; on page 61, the notation was
slightly different than it is here: the variable B there is now n; N there is
now n + m, etc.)
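If you would like to double-check the algebra, the following Python sketch compares the ratio of binomial probabilities with scipy's hypergeometric pmf, and compares the conditional mean and variance with the two formulas above. The particular values of n, m, p and k are arbitrary choices of mine:

```python
# Numerical check of Example 2 (assumed parameter values, for illustration only).
from scipy.stats import binom, hypergeom

n, m, p, k = 5, 7, 0.3, 6

for j in range(k + 1):
    # P{X = j | X + Y = k}, computed directly from the binomial pmfs ...
    lhs = binom.pmf(j, n, p) * binom.pmf(k - j, m, p) / binom.pmf(k, n + m, p)
    # ... versus the hypergeometric pmf: population n + m, n "successes", k draws.
    rhs = hypergeom.pmf(j, n + m, n, k)
    assert abs(lhs - rhs) < 1e-12

print(hypergeom.mean(n + m, n, k), k * n / (n + m))
print(hypergeom.var(n + m, n, k),
      k * (n / (n + m)) * (m / (n + m)) * (n + m - k) / (n + m - 1))
```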
Example 3. Suppose X1 and X2 are independent, distributed respectively
as Poisson(λ1) and Poisson(λ2), where λ1, λ2 > 0 are fixed constants. What
is the distribution of X1, given that X1 + X2 = n for a positive [fixed] integer
n?
We will need the distribution of X1 + X2. Therefore, let us begin with
that: the possible values of X1 + X2 are 0, 1, . . . ; therefore the “convolution”
formula for discrete distributions implies that for all n ≥ 0,
P{X1 + X2 = n} = Σ_{k=0}^{∞} P{X1 = k} · P{X2 = n − k}

               = Σ_{k=0}^{n} (λ1^k e^{−λ1} / k!) · (λ2^{n−k} e^{−λ2} / (n − k)!)

               = (e^{−(λ1+λ2)} / n!) Σ_{k=0}^{n} C(n, k) λ1^k λ2^{n−k}

               = (e^{−(λ1+λ2)} / n!) · (λ1 + λ2)^n        [binomial theorem];

and P{X1 + X2 = n} = 0 for other values of n. In other words, X1 + X2 is
distributed as Poisson(λ1 + λ2).¹
¹Aside: This and induction together prove that if X1, . . . , Xm are independent and distributed
respectively as Poisson(λ1), . . . , Poisson(λm), then X1 + · · · + Xm is distributed according to a Poisson
distribution with parameter λ1 + · · · + λm.
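The convolution computation above is easy to confirm numerically; here is a minimal Python sketch, with the rates λ1 and λ2 being arbitrary choices of mine:

```python
from math import exp, factorial

lam1, lam2 = 2.0, 3.5                  # assumed rates, for illustration only

def poisson_pmf(k, lam):
    """P{Poisson(lam) = k} for an integer k >= 0."""
    return exp(-lam) * lam**k / factorial(k)

for n in range(20):
    # Convolution formula: sum over k of P{X1 = k} P{X2 = n - k}.
    conv = sum(poisson_pmf(k, lam1) * poisson_pmf(n - k, lam2)
               for k in range(n + 1))
    assert abs(conv - poisson_pmf(n, lam1 + lam2)) < 1e-12
```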
Now,

P{X1 = k | X1 + X2 = n} = P{X1 = k} · P{X2 = n − k} / P{X1 + X2 = n}

    = [ (λ1^k e^{−λ1} / k!) · (λ2^{n−k} e^{−λ2} / (n − k)!) ] / [ (λ1 + λ2)^n e^{−(λ1+λ2)} / n! ]

    = C(n, k) p^k q^{n−k},

where

p := λ1 / (λ1 + λ2)    and    q := 1 − p = λ2 / (λ1 + λ2).

That is, the conditional distribution of X1, given that X1 + X2 = n, is
Binomial(n, p).
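Once more, a quick numerical sanity check; this is a sketch, and the values of λ1, λ2 and n are arbitrary choices of mine:

```python
from scipy.stats import binom, poisson

lam1, lam2, n = 2.0, 3.5, 8
p = lam1 / (lam1 + lam2)

for k in range(n + 1):
    # P{X1 = k | X1 + X2 = n}, computed directly from the Poisson pmfs ...
    lhs = (poisson.pmf(k, lam1) * poisson.pmf(n - k, lam2)
           / poisson.pmf(n, lam1 + lam2))
    # ... versus the claimed Binomial(n, p) probability.
    assert abs(lhs - binom.pmf(k, n, p)) < 1e-12
```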
Continuous conditional distributions
Once we understand conditional distributions in the discrete setting, we
could predict how the theory should work in the continuous setting [although it is quite difficult to justify that what is about to be discussed is
legitimate].
Suppose (X, Y) is jointly distributed with joint density f(x, y). Then we
define the conditional density of X given Y = y [assuming that f_Y(y) ≠ 0]
as

f_{X|Y}(x | y) := f(x, y) / f_Y(y)    for all −∞ < x < ∞.
This leads to conditional expectations:

E[g(X) | Y = y] = ∫_{−∞}^{∞} g(x) f_{X|Y}(x | y) dx,

etc. As such, we now have E(X | Y = y), Var(X | Y = y), etc., in the usual way,
using the [now conditional] densities.
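To see these definitions in action numerically, here is a minimal Python sketch. The joint density f(x, y) = x + y on the unit square is an assumed example of mine (it is not from the notes); the script recovers f_{X|Y}(· | y), checks that it integrates to one, and computes E(X | Y = y) by quadrature:

```python
from scipy.integrate import quad

def f(x, y):
    """Assumed joint density on 0 <= x, y <= 1 (it integrates to 1)."""
    return x + y

y = 0.4                                                # condition on Y = 0.4
f_Y, _ = quad(lambda x: f(x, y), 0.0, 1.0)             # marginal density f_Y(y)

total, _ = quad(lambda x: f(x, y) / f_Y, 0.0, 1.0)     # integral of f_{X|Y}(x | y)
mean, _ = quad(lambda x: x * f(x, y) / f_Y, 0.0, 1.0)  # E(X | Y = y)
print(total)   # 1.0: the conditional density is a genuine density
print(mean)    # about 0.5926, the exact value being (1/3 + y/2)/(y + 1/2)
```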
Example 4 (Uniform on a triangle, p. 414 of your text). Suppose (X, Y)
is chosen uniformly at random from the triangle {(x, y) : x ≥ 0, y ≥ 0,
x + y ≤ 2}. Find the conditional density of Y, given that X = x [for a
fixed 0 ≤ x ≤ 2].
We know that f(x, y) = 1/2 if (x, y) is in the triangle [which has area 2]
and f(x, y) = 0 otherwise. Therefore, for all fixed 0 ≤ x ≤ 2,

f_X(x) = ∫_0^{2−x} f(x, y) dy = (2 − x)/2,
and f_X(x) = 0 for other values of x. This yields the following: for all
0 ≤ y ≤ 2 − x,

f_{Y|X}(y | x) = f(x, y) / f_X(x) = (1/2) / ((2 − x)/2) = 1/(2 − x),

and f_{Y|X}(y | x) = 0 for all other values of y. In other words, given that
X = x, the conditional distribution of Y is Uniform(0, 2 − x). Thus, for
instance, P{Y > 1 | X = x} = 0 if 2 − x < 1 [i.e., x > 1], and P{Y > 1 | X =
x} = (1 − x)/(2 − x) for 0 ≤ x ≤ 1. Alternatively, we can work things out
the longer way by hand:

P{Y > 1 | X = x} = ∫_1^∞ f_{Y|X}(y | x) dy.

Now the integrand is zero if y > 2 − x. Therefore, unless 0 ≤ x ≤ 1, the
preceding probability is zero. When 0 ≤ x ≤ 1, then we have

P{Y > 1 | X = x} = ∫_1^{2−x} 1/(2 − x) dy = (1 − x)/(2 − x).
As another example within this one, let us compute E(Y | X = x):

E(Y | X = x) = ∫_{−∞}^{∞} y f_{Y|X}(y | x) dy = ∫_0^{2−x} y/(2 − x) dy = (2 − x)/2.

[Alternatively, we can read this off from facts that we know about uniforms;
for instance, we should be able to tell, without computation, that
Var(Y | X = x) = (2 − x)²/12. Check this by direct computation also!]
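Finally, a Monte Carlo sketch of Example 4: sample (X, Y) uniformly from the triangle by rejection, keep only those points whose X-coordinate is near a fixed value x0 (an arbitrary choice of mine), and compare the empirical answers with the Uniform(0, 2 − x0) formulas. The agreement is of course only approximate:

```python
import numpy as np

rng = np.random.default_rng(0)
x0, eps, N = 0.5, 0.01, 2_000_000

# Rejection sampling: uniform points in [0, 2]^2, keep those inside the triangle.
pts = rng.uniform(0.0, 2.0, size=(N, 2))
x, y = pts[pts.sum(axis=1) <= 2.0].T

near = np.abs(x - x0) < eps                            # condition on X being near x0
print(np.mean(y[near] > 1.0), (1 - x0) / (2 - x0))     # both close to 1/3
print(y[near].mean(), (2 - x0) / 2)                    # both close to 0.75
print(y[near].var(), (2 - x0) ** 2 / 12)               # both close to 0.1875
```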