Math-UA.233: Theory of Probability
Lecture 21
Tim Austin
From last time: covariance

Let X and Y be RVs, and let µ_X = E[X] and µ_Y = E[Y]. The covariance of X and Y is (this is the notation)

    Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)].
First properties:

1. Symmetry: Cov(X, Y) = Cov(Y, X)
2. Generalizes variance: Cov(X, X) = Var(X)
3. Alternative formula: Cov(X, Y) = E[XY] − E[X]E[Y]
4. If X and Y are independent, then Cov(X, Y) = 0.
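These properties are easy to check numerically. Here is a minimal Monte Carlo sketch (assuming numpy; the distributions and sample size are arbitrary choices) illustrating properties 3 and 4:

```python
# Monte Carlo sanity check of properties 3 and 4.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Dependent pair: Y = X + noise, so Cov(X, Y) = Var(X) = 1.
X = rng.standard_normal(n)
Y = X + rng.standard_normal(n)

cov_def = np.mean((X - X.mean()) * (Y - Y.mean()))  # E[(X - µX)(Y - µY)]
cov_alt = np.mean(X * Y) - X.mean() * Y.mean()      # E[XY] - E[X]E[Y]
print(cov_def, cov_alt)                             # both ≈ 1

# Independent pair: covariance ≈ 0.
Z = rng.standard_normal(n)
print(np.mean(X * Z) - X.mean() * Z.mean())         # ≈ 0
```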
Here’s what makes covariance really useful: how it transforms under sums and products:

Proposition (Ross Prop 7.4.2)

1. For any RVs X and Y and any real value a, we have

       Cov(aX, Y) = Cov(X, aY) = a Cov(X, Y)

   (so Var(aX) = Cov(aX, aX) = a² Cov(X, X) = a² Var(X), a sanity check).
2. For any RVs X_1, ..., X_n and Y_1, ..., Y_m, we have

       Cov( Σ_{i=1}^n X_i , Σ_{j=1}^m Y_j ) = Σ_{i=1}^n Σ_{j=1}^m Cov(X_i, Y_j)

   (so it behaves just like multiplying out a product of sums of numbers).
In particular, taking m = n and Y_i = X_i in part 2 above, we get (Ross eqn (7.4.1))

    Var( Σ_{i=1}^n X_i ) = Σ_{i=1}^n Var(X_i) + 2 Σ_{1≤i<j≤n} Cov(X_i, X_j).

The first sum collects the ‘diagonal terms’ and the second the ‘cross terms’; the factor 2 appears because Cov(X_i, X_j) = Cov(X_j, X_i), so each off-diagonal pair occurs twice.
If X_1, ..., X_n are independent, then Cov(X_i, X_j) = 0 whenever i ≠ j, so in this case we’re left with

    Var( Σ_{i=1}^n X_i ) = Σ_{i=1}^n Var(X_i).
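As a quick numerical illustration of eqn (7.4.1), here is a sketch assuming numpy; the construction of correlated RVs via a shared component C is an arbitrary choice:

```python
# Check Var(X1 + X2 + X3) = sum of variances + 2 * sum of pairwise covariances.
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Correlated RVs: each X_i shares a common component C, so Cov(X_i, X_j) = 1.
C = rng.standard_normal(n)
Xs = [C + rng.standard_normal(n) for _ in range(3)]

S = Xs[0] + Xs[1] + Xs[2]
diag = sum(X.var() for X in Xs)                      # each Var(X_i) = 2
cross = 2 * sum(np.cov(Xs[i], Xs[j])[0, 1]
                for i in range(3) for j in range(i + 1, 3))
print(S.var(), diag + cross)                         # both ≈ 12
```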
Example (Ross E.g. 7.4b)

If X is binom(n, p), then Var(X) = np(1 − p).

IDEA: Recall that X = X_1 + ... + X_n, a sum of independent indicator variables.
Example (Special case of Ross E.g. 7.4f)

An experiment has three possible outcomes with respective probabilities p_1, p_2, p_3. After n independent trials are performed, we write X_i for the number of times outcome i occurred, i = 1, 2, 3.

Find Cov(X_1, X_2).
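The standard answer is Cov(X_1, X_2) = −n p_1 p_2: each trial can contribute to at most one of the two counts, so they are negatively correlated. A simulation sketch (assuming numpy; n and the p_i are arbitrary choices):

```python
# Monte Carlo check that Cov(X1, X2) = -n * p1 * p2 for multinomial counts.
import numpy as np

rng = np.random.default_rng(2)
n, p = 10, [0.5, 0.3, 0.2]

counts = rng.multinomial(n, p, size=1_000_000)   # each row is (X1, X2, X3)
print(np.cov(counts[:, 0], counts[:, 1])[0, 1])  # ≈ -n * p1 * p2 = -1.5
```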
Now some continuous examples...
Example (Ross E.g. 7.2ℓ)

A flea starts at the origin in the plane. Each second it jumps one inch. For each jump it chooses the direction uniformly at random, independently of its previous jumps. Find the expected square of its distance from the origin after n jumps.
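The answer works out to exactly n: writing the jumps as unit vectors (cos θ_i, sin θ_i), the cross terms E[cos θ_i cos θ_j + sin θ_i sin θ_j] vanish for i ≠ j by independence, and each diagonal term contributes cos² θ_i + sin² θ_i = 1. A simulation sketch (assuming numpy; n and the trial count are arbitrary):

```python
# Estimate E[D^2], the squared distance after n uniformly random unit jumps.
import numpy as np

rng = np.random.default_rng(3)
n, trials = 10, 500_000

theta = rng.uniform(0, 2 * np.pi, size=(trials, n))  # one angle per jump
D2 = np.cos(theta).sum(axis=1) ** 2 + np.sin(theta).sum(axis=1) ** 2
print(D2.mean())                                     # ≈ n = 10
```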
Example (Normal RVs)

If X_1, ..., X_n are independent normal RVs with respective parameters µ_i, σ_i for i = 1, ..., n, then

    E[ Σ_{i=1}^n X_i ] = Σ_{i=1}^n µ_i   and   Var( Σ_{i=1}^n X_i ) = Σ_{i=1}^n σ_i².

This checks out with our previous calculation that Σ_{i=1}^n X_i is still normal with those parameters.
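A quick numerical check (assuming numpy; the parameters are arbitrary choices):

```python
# Check that a sum of independent normals has mean sum(mu), variance sum(sigma^2).
import numpy as np

rng = np.random.default_rng(4)
mu, sigma = [1.0, -2.0, 0.5], [1.0, 2.0, 0.5]

S = sum(rng.normal(m, s, size=1_000_000) for m, s in zip(mu, sigma))
print(S.mean(), S.var())  # ≈ -0.5 and ≈ 5.25 = 1 + 4 + 0.25
```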
Conditioning with random variables

Our next topic is the use of conditional probability in connection with random variables, not just single events.

As with events, this can come up in various ways.

• Sometimes we know the joint distribution of RVs X and Y, and want to compute ‘updated’ probabilities for the behaviour of X in light of some information about Y.

• Sometimes there is a natural modeling choice for the conditional probabilities, and we must reconstruct the unconditioned probabilities from them.

• The tools we introduce below can give a convenient way to do a calculation, even when we start and end with ‘unconditional’ quantities.
Conditional distributions: discrete case (Ross Sec 6.4)

Suppose that X and Y are discrete RVs with joint PMF p. We have various events defined in terms of X, and others defined in terms of Y.

Sometimes we need the conditional probabilities of “X-events” given “Y-events”, or vice-versa.

This requires no new ideas beyond conditional probability, but some new notation can be convenient. The conditional PMF of X given Y is the function (this is the notation)

    p_{X|Y}(x|y) = P{X = x | Y = y} = p(x, y) / p_Y(y).

It is defined for any x, and any y for which p_Y(y) > 0 (we simply don’t use it for other choices of y).
Example (Ross E.g. 6.4a)

Suppose that X and Y have joint PMF given by

    p(0, 0) = 0.4,   p(0, 1) = 0.2,   p(1, 0) = 0.1,   p(1, 1) = 0.3.

Calculate the conditional PMF of X given that Y = 1.
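For reference, the arithmetic: p_Y(1) = p(0, 1) + p(1, 1) = 0.2 + 0.3 = 0.5, so

    p_{X|Y}(0|1) = 0.2 / 0.5 = 0.4   and   p_{X|Y}(1|1) = 0.3 / 0.5 = 0.6.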
Example (Ross E.g. 6.4b)

Let X and Y be independent Poi(λ) and Poi(µ) RVs. Calculate the conditional PMF of X given that X + Y = n.
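The answer here is a standard fact worth remembering: given X + Y = n, X is binomial(n, λ/(λ + µ)). The key steps: X + Y is Poi(λ + µ), and

    P{X = k | X + Y = n} = P{X = k} P{Y = n − k} / P{X + Y = n}
                         = C(n, k) (λ/(λ + µ))^k (µ/(λ + µ))^{n−k}.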
Conditioning can also involve three or more RVs. This requires care, but still no new ideas.

Example (Special case of Ross E.g. 6.4c)

An experiment has three possible outcomes with respective probabilities p_1, p_2, p_3. After n independent trials are performed, we write X_i for the number of times outcome i occurred, i = 1, 2, 3.

Find the conditional distribution (that is, the conditional joint PMF) of (X_1, X_2) given that X_3 = m.

WHAT THE QUESTION IS ASKING FOR:

    p_{X1,X2|X3}(k, ℓ | m) = P{X_1 = k, X_2 = ℓ | X_3 = m}

for all possible values of k, ℓ and m.
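For orientation, the standard answer: conditioned on X_3 = m, each of the remaining n − m trials produces outcome 1 or 2 with conditional probabilities p_1/(p_1 + p_2) and p_2/(p_1 + p_2), so

    p_{X1,X2|X3}(k, ℓ | m) = C(n − m, k) (p_1/(p_1 + p_2))^k (p_2/(p_1 + p_2))^{n−m−k}   when k + ℓ = n − m,

and 0 otherwise.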
Another fact to note: if X and Y are independent, then

    p_{X|Y}(x|y) = p_X(x) p_Y(y) / p_Y(y) = p_X(x).

So this is just like for events: if X and Y are independent, then knowing the value taken by Y doesn’t influence the probability distribution of X.
Conditional distributions: continuous case (Ross Sec 6.5)

When continuous RVs are involved, we do need a new idea.

THE PROBLEM: Suppose that Y is a continuous RV and E is any event. Then

    P{Y = y} = 0   for any real value y,

so we cannot define the conditional probability P(E | {Y = y}) using the usual formula.

However, we can sometimes make sense of this in another way...
Instead of assuming that Y takes the value y exactly, let us condition on Y taking a value in a tiny window around y:

    P(E | {y ≤ Y ≤ y + dy}) = P(E ∩ {y ≤ Y ≤ y + dy}) / P{y ≤ Y ≤ y + dy}
                            ≈ P(E ∩ {y ≤ Y ≤ y + dy}) / (f_Y(y) dy),

where we use the infinitesimal interpretation of f_Y(y). This makes sense provided f_Y(y) > 0.

Now let dy → 0. The conditional probability of E given that Y = y is defined to be

    P(E | Y = y) = lim_{dy→0} P(E ∩ {y ≤ Y ≤ y + dy}) / (f_Y(y) dy).
Theorem (Surprisingly hard)
The above limit always exists, except maybe for a “negligible” set of possible values y, which you can safely ignore.

“Negligible” sets are another idea from measure theory, so we won’t describe them here. Ross doesn’t mention this result at all. In this course you can always just assume that the limit exists.
Example (Ross E.g. 6.5b)

Suppose that X, Y have joint PDF

    f(x, y) = (1/y) e^{−x/y} e^{−y}   for 0 < x, y < ∞,
              0                       otherwise.

Find P{X > 1 | Y = y} for y > 0.
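A sketch of the computation using the limit definition: for the numerator,

    P({X > 1} ∩ {y ≤ Y ≤ y + dy}) ≈ dy ∫_1^∞ f(x, y) dx = dy · e^{−y} e^{−1/y},

while f_Y(y) = ∫_0^∞ (1/y) e^{−x/y} e^{−y} dx = e^{−y}. Dividing and letting dy → 0 gives

    P{X > 1 | Y = y} = e^{−1/y}.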
In this example, E is defined in terms of another RV, jointly continuous with Y. In fact there is a useful general tool for this situation.

Definition (Ross p250)
Suppose X and Y are jointly continuous with joint PDF f. The conditional PDF of X given Y is the function

    f_{X|Y}(x|y) = f(x, y) / f_Y(y).

It’s defined for all real values x and all y such that f_Y(y) > 0.

BEWARE: This is not a conditional probability, since it’s a ratio of densities, not of probability values.
INTUITIVE INTERPRETATION: Instead of conditioning {X = x} on {Y = y}, let’s allow very small windows around both x and y. We find:

    P{x ≤ X ≤ x + dx | y ≤ Y ≤ y + dy}
        = P{x ≤ X ≤ x + dx, y ≤ Y ≤ y + dy} / P{y ≤ Y ≤ y + dy}
        ≈ f(x, y) dx dy / (f_Y(y) dy) = f_{X|Y}(x|y) dx.

So f_{X|Y}(x|y) is a PDF which describes the probability distribution of X, given that Y lands in a very small window around y.
Also, just as in the discrete case, if X and Y are independent then

    f_{X|Y}(x|y) = f_X(x).

So knowing the value taken by Y (to arbitrary accuracy) doesn’t influence the probability distribution of X.
The conditional PDF enables us to compute probabilities of “X-events” given that Y = y, without taking a limit:

Proposition (See Ross p251)
Let a < b, where a may be −∞ and b may be ∞. Let y be a real value such that f_Y(y) > 0. Then

    P{a ≤ X ≤ b | Y = y} = ∫_a^b f_{X|Y}(x|y) dx.

MORAL: f_{X|Y}(x|y) is a new PDF. It describes the probabilities of “X-events” given that Y = y. It has all the other properties that we’ve already seen for PDFs.
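To tie the pieces together, here is a simulation sketch of Example 6.5b (assuming numpy). Under f(x, y) = (1/y) e^{−x/y} e^{−y} we can sample Y as an exponential with mean 1 and then X, given Y = y, as an exponential with mean y; conditioning on Y in a small window and counting {X > 1} should then reproduce e^{−1/y}:

```python
# Window-conditioning check of P{X > 1 | Y = y} = e^{-1/y} in Ross E.g. 6.5b.
import numpy as np

rng = np.random.default_rng(5)
trials, y, dy = 5_000_000, 2.0, 0.01

Y = rng.exponential(scale=1.0, size=trials)    # Y ~ Exp(1): f_Y(y) = e^{-y}
X = rng.exponential(scale=Y)                   # given Y = y, X ~ Exp(mean y)
window = (Y >= y) & (Y <= y + dy)              # tiny window around y
print((X[window] > 1).mean(), np.exp(-1 / y))  # both ≈ 0.6065
```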