
Lecture Notes 1
Brief Review of Basic Probability

1 Probability Review
I assume you know basic probability; Chapters 1-3 are a review, and I will assume you have
read and understood them. If not, you should be in 36-700.
1.1 Random Variables
A random variable is a map X from a set Ω (equipped with a probability P ) to R. We write
P (X ∈ A) = P ({ω ∈ Ω : X(ω) ∈ A})
and we write X ∼ P to mean that X has distribution P . The cumulative distribution
function (cdf ) of X is
FX (x) = F (x) = P (X ≤ x).
If X is discrete, its probability mass function (pmf ) is
pX (x) = p(x) = P (X = x).
If X is continuous, then its probability density function (pdf) satisfies
P(X ∈ A) = ∫_A p_X(x) dx = ∫_A p(x) dx
and p_X(x) = p(x) = F'(x). The following are all equivalent:
X ∼ P,
X ∼ F,
X ∼ p.
Suppose that X ∼ P and Y ∼ Q. We say that X and Y have the same distribution
if P (X ∈ A) = Q(Y ∈ A) for all A. In that case we say that X and Y are equal in
distribution and we write X =_d Y.

Lemma 1  X =_d Y if and only if F_X(t) = F_Y(t) for all t.
1.2 Expected Values
The mean or expected value of g(X) is
E(g(X)) = ∫ g(x) dF(x) = ∫ g(x) dP(x) =
    ∫_{−∞}^{∞} g(x) p(x) dx   if X is continuous,
    Σ_j g(x_j) p(x_j)         if X is discrete.
Recall that:
1. E(Σ_{j=1}^k c_j g_j(X)) = Σ_{j=1}^k c_j E(g_j(X)).
2. If X1, . . . , Xn are independent then
   E(∏_{i=1}^n X_i) = ∏_i E(X_i).
3. We often write µ = E(X).
4. σ 2 = Var (X) = E ((X − µ)2 ) is the Variance.
5. Var (X) = E (X 2 ) − µ2 .
6. If X1, . . . , Xn are independent then
   Var(Σ_{i=1}^n a_i X_i) = Σ_i a_i² Var(X_i)
   (see the numerical check after this list).
7. The covariance is
   Cov(X, Y) = E((X − µ_X)(Y − µ_Y)) = E(XY) − µ_X µ_Y
   and the correlation is ρ(X, Y) = Cov(X, Y)/(σ_X σ_Y). Recall that −1 ≤ ρ(X, Y) ≤ 1.
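A quick numerical check of properties 2 and 6 (an illustrative sketch only, assuming numpy is available; the distributions and coefficients below are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)
n_sim = 500_000

# Three independent random variables with known variances (arbitrary choices).
x1 = rng.normal(0.0, 2.0, n_sim)        # Var(X1) = 4
x2 = rng.exponential(3.0, n_sim)        # Var(X2) = 9
x3 = rng.uniform(0.0, 1.0, n_sim)       # Var(X3) = 1/12

# Property 2: E(X1 X2 X3) = E(X1) E(X2) E(X3) for independent variables.
print((x1 * x2 * x3).mean(), x1.mean() * x2.mean() * x3.mean())

# Property 6: Var(a1 X1 + a2 X2 + a3 X3) = a1^2 Var(X1) + a2^2 Var(X2) + a3^2 Var(X3).
a1, a2, a3 = 1.0, -2.0, 5.0
s = a1 * x1 + a2 * x2 + a3 * x3
print(s.var(), a1**2 * 4 + a2**2 * 9 + a3**2 * (1 / 12))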
The conditional expectation of Y given X is the random variable E(Y |X) whose
value, when X = x, is
E(Y | X = x) = ∫ y p(y|x) dy
where p(y|x) = p(x, y)/p(x).
The Law of Total Expectation or Law of Iterated Expectation:
E(Y) = E[E(Y|X)] = ∫ E(Y | X = x) p_X(x) dx.
The Law of Total Variance is
Var(Y) = Var(E(Y|X)) + E(Var(Y|X)).
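A small simulation illustrating both laws (a sketch only, assuming numpy; the model X ∼ Uniform(0, 1) with Y | X = x ∼ N(x, 1) is a made-up example):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 500_000)
y = rng.normal(loc=x, scale=1.0)      # Y | X = x ~ N(x, 1)

# Law of Total Expectation: E(Y) = E(E(Y|X)) = E(X) here.
print(y.mean(), x.mean())

# Law of Total Variance: Var(Y) = Var(E(Y|X)) + E(Var(Y|X)) = Var(X) + 1 here.
print(y.var(), x.var() + 1.0)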
The moment generating function (mgf ) is
M_X(t) = E(e^{tX}).
If M_X(t) = M_Y(t) for all t in an interval around 0 then X =_d Y.
Exercise: show that M_X^{(n)}(t)|_{t=0} = E(X^n).
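As a concrete instance of the exercise (a sketch, assuming sympy is installed): for X ∼ exp(1) the mgf is M_X(t) = 1/(1 − t) for t < 1, and E(X^n) = n!, so the nth derivative of the mgf at 0 should equal n!.

import sympy as sp

t = sp.symbols('t')
M = 1 / (1 - t)        # mgf of an exp(1) random variable, valid for t < 1

for n in range(1, 5):
    nth_deriv_at_0 = sp.diff(M, t, n).subs(t, 0)
    print(n, nth_deriv_at_0, sp.factorial(n))   # both columns equal n! = E(X^n)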
1.3 Transformations
Let Y = g(X). Then
F_Y(y) = P(Y ≤ y) = P(g(X) ≤ y) = ∫_{A_y} p_X(x) dx
where
A_y = {x : g(x) ≤ y}.
Then p_Y(y) = F_Y'(y).
If g is monotonic, then
p_Y(y) = p_X(h(y)) |dh(y)/dy|,   where h = g^{−1}.
Example 2 Let pX (x) = e−x for x > 0. Hence FX (x) = 1 − e−x . Let Y = g(X) = log X.
Then
F_Y(y) = P(Y ≤ y) = P(log(X) ≤ y) = P(X ≤ e^y) = F_X(e^y) = 1 − e^{−e^y}
and p_Y(y) = e^y e^{−e^y} for y ∈ R.
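A quick Monte Carlo check of this example (an illustrative sketch, assuming numpy): compare the empirical cdf of Y = log X with F_Y(y) = 1 − e^{−e^y} at a few points.

import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=500_000)   # p_X(x) = e^{-x}, x > 0
y = np.log(x)

for t in (-1.0, 0.0, 1.0):
    empirical = np.mean(y <= t)
    theoretical = 1 - np.exp(-np.exp(t))
    print(t, empirical, theoretical)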
Example 3 Practice problem. Let X be uniform on (−1, 2) and let Y = X 2 . Find the
density of Y .
Let Z = g(X, Y ). For example, Z = X + Y or Z = X/Y . Then we find the pdf of Z as
follows:
1. For each z, find the set A_z = {(x, y) : g(x, y) ≤ z}.
2. Find the CDF
   F_Z(z) = P(Z ≤ z) = P(g(X, Y) ≤ z) = P({(x, y) : g(x, y) ≤ z}) = ∫∫_{A_z} p_{X,Y}(x, y) dx dy.
3. The pdf is p_Z(z) = F_Z'(z).
Example 4 Practice problem. Let (X, Y ) be uniform on the unit square. Let Z = X/Y .
Find the density of Z.
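One way to check your answer to Example 4 is to estimate the cdf of Z by simulation and compare it with the cdf implied by the density you derive. A minimal sketch, assuming numpy (the evaluation points are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 1_000_000)
y = rng.uniform(0.0, 1.0, 1_000_000)   # (X, Y) uniform on the unit square
z = x / y

for t in (0.5, 1.0, 2.0, 5.0):
    print(t, np.mean(z <= t))          # empirical F_Z(t); compare with your derived cdf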
1.4 Independence
X and Y are independent if and only if
P(X ∈ A, Y ∈ B) = P(X ∈ A)P(Y ∈ B)
for all A and B.
Theorem 5 Let (X, Y) be a bivariate random vector with joint pdf p_{X,Y}(x, y). Then X and Y are independent iff p_{X,Y}(x, y) = p_X(x) p_Y(y).
X1 , . . . , Xn are independent if and only if
P(X1 ∈ A1, . . . , Xn ∈ An) = ∏_{i=1}^n P(Xi ∈ Ai).
Thus, p_{X1,...,Xn}(x1, . . . , xn) = ∏_{i=1}^n p_{Xi}(xi).
If X1, . . . , Xn are independent and identically distributed we say they are iid (or that they
are a random sample) and we write
X1, . . . , Xn ∼ P   or   X1, . . . , Xn ∼ F   or   X1, . . . , Xn ∼ p.

1.5 Important Distributions

Normal (Gaussian). X ∼ N(µ, σ²) if
p(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}.
If X ∈ R^d then X ∼ N(µ, Σ) if
p(x) = (2π)^{−d/2} |Σ|^{−1/2} exp(−(1/2) (x − µ)^T Σ^{−1} (x − µ)).
Chi-squared. X ∼ χ²_p if X = Σ_{j=1}^p Z_j² where Z1, . . . , Zp ∼ N(0, 1).
Bernoulli. X ∼ Bernoulli(θ) if P(X = 1) = θ and P(X = 0) = 1 − θ and hence
p(x) = θ^x (1 − θ)^{1−x},   x = 0, 1.
Binomial. X ∼ Binomial(n, θ) if
p(x) = P(X = x) = (n choose x) θ^x (1 − θ)^{n−x},   x ∈ {0, . . . , n}.
Uniform. X ∼ Uniform(0, θ) if p(x) = I(0 ≤ x ≤ θ)/θ.
Poisson. X ∼ Poisson(λ) if P(X = x) = e^{−λ} λ^x / x!,  x = 0, 1, 2, . . .. Then E(X) = Var(X) = λ
and M_X(t) = e^{λ(e^t − 1)}. We can use the mgf to show: if X1 ∼ Poisson(λ1) and X2 ∼ Poisson(λ2)
are independent, then Y = X1 + X2 ∼ Poisson(λ1 + λ2).
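A simulation check of this closure property (a sketch only, assuming numpy and scipy; λ1 = 2 and λ2 = 3 are arbitrary):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.poisson(2.0, 500_000) + rng.poisson(3.0, 500_000)    # X1 + X2 with lambda1 = 2, lambda2 = 3

for k in range(8):
    print(k, np.mean(y == k), stats.poisson.pmf(k, 5.0))     # empirical pmf vs Poisson(5) pmf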
Multinomial. The multivariate version of a Binomial is called a Multinomial. Consider
drawing a ball from an urn which has balls of k different colors labeled “color 1, color
2, . . . , color k.” Let p = (p1, p2, . . . , pk) where Σ_j pj = 1 and pj is the probability of
drawing color j. Draw n balls from the urn (independently and with replacement) and let
X = (X1, X2, . . . , Xk) be the count of the number of balls of each color drawn. We say that
X has a Multinomial(n, p) distribution. The pmf is
p(x) = (n choose x1, . . . , xk) p1^{x1} · · · pk^{xk}.
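A short sampling illustration (a sketch, assuming numpy; the color probabilities are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.2, 0.3, 0.5])                  # k = 3 colors
x = rng.multinomial(100, p, size=200_000)      # each row is one Multinomial(100, p) draw

print(x.mean(axis=0))                          # approximately n * p = (20, 30, 50)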
Exponential. X ∼ exp(β) if p_X(x) = (1/β) e^{−x/β}, x > 0. Note that exp(β) = Γ(1, β).
Gamma. X ∼ Γ(α, β) if
p_X(x) = (1/(Γ(α) β^α)) x^{α−1} e^{−x/β}
for x > 0 where Γ(α) = ∫_0^∞ (1/β^α) x^{α−1} e^{−x/β} dx.
More on the Multivariate Normal. Let Y ∈ Rd . Then Y ∼ N (µ, Σ) if
p(y) = (2π)^{−d/2} |Σ|^{−1/2} exp(−(1/2) (y − µ)^T Σ^{−1} (y − µ)).
Then E(Y ) = µ and cov(Y ) = Σ. The moment generating function is
M(t) = exp(µ^T t + (t^T Σ t)/2).
Theorem 6 (a). If Y ∼ N (µ, Σ), then E(Y ) = µ, cov(Y ) = Σ.
(b). If Y ∼ N(µ, Σ) and c is a scalar, then cY ∼ N(cµ, c²Σ).
(c). Let Y ∼ N(µ, Σ). If A is p × n and b is p × 1, then AY + b ∼ N(Aµ + b, AΣA^T).
Theorem 7 Suppose that Y ∼ N (µ, Σ). Let
Y = (Y1; Y2),   µ = (µ1; µ2),   Σ = (Σ11  Σ12; Σ21  Σ22),
where Y1 and µ1 are p × 1, and Σ11 is p × p.
(a). Y1 ∼ Np (µ1 , Σ11 ), Y2 ∼ Nn−p (µ2 , Σ22 ).
(b). Y1 and Y2 are independent if and only if Σ12 = 0.
(c). If Σ22 > 0, then the conditional distribution of Y1 given Y2 is
Y1 | Y2 ∼ Np(µ1 + Σ12 Σ22^{−1} (Y2 − µ2),  Σ11 − Σ12 Σ22^{−1} Σ21).    (1)
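Equation (1) is easy to evaluate numerically. A minimal sketch, assuming numpy; the 3-dimensional µ and Σ and the observed value of Y2 below are made up purely for illustration:

import numpy as np

mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[2.0, 0.3, 0.5],
                  [0.3, 1.0, 0.2],
                  [0.5, 0.2, 1.5]])
p = 2                                           # Y1 is the first p coordinates, Y2 the rest

mu1, mu2 = mu[:p], mu[p:]
S11, S12 = Sigma[:p, :p], Sigma[:p, p:]
S21, S22 = Sigma[p:, :p], Sigma[p:, p:]

y2 = np.array([2.5])                            # observed value of Y2

cond_mean = mu1 + S12 @ np.linalg.solve(S22, y2 - mu2)    # mu1 + Sigma12 Sigma22^{-1} (y2 - mu2)
cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)          # Sigma11 - Sigma12 Sigma22^{-1} Sigma21
print(cond_mean)
print(cond_cov)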
Lemma 8 Let Y ∼ N(µ, σ²I), where Y^T = (Y1, . . . , Yn), µ^T = (µ1, . . . , µn) and σ² > 0 is a
scalar. Then the Yi are independent, Yi ∼ N1(µi, σ²) and
||Y||²/σ² = Y^T Y/σ² ∼ χ²_n(µ^T µ/σ²).
Theorem 9 Let Y ∼ N (µ, Σ). Then:
(a). Y^T Σ^{−1} Y ∼ χ²_n(µ^T Σ^{−1} µ).
(b). (Y − µ)^T Σ^{−1} (Y − µ) ∼ χ²_n(0).
1.6 Sample Mean and Variance
The sample mean is
X̄_n = (1/n) Σ_i Xi
and the sample variance is
S_n² = (1/(n − 1)) Σ_i (Xi − X̄_n)².
Let X1, . . . , Xn be iid with E(Xi) = µ and Var(Xi) = σ². Then
E(X̄_n) = µ,   Var(X̄_n) = σ²/n,   E(S_n²) = σ².

Theorem 10 If X1, . . . , Xn ∼ N(µ, σ²) then
(a) X̄_n ∼ N(µ, σ²/n).
(b) (n − 1)S_n²/σ² ∼ χ²_{n−1}.
(c) X̄_n and S_n² are independent.
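A simulation check of Theorem 10 (a sketch only, assuming numpy; n = 10, µ = 3 and σ = 2 are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)
n, mu, sigma = 10, 3.0, 2.0
x = rng.normal(mu, sigma, size=(200_000, n))    # 200,000 samples, each of size n

xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)                      # sample variance with the 1/(n-1) factor
q = (n - 1) * s2 / sigma**2

print(xbar.mean(), mu, xbar.var(), sigma**2 / n)    # (a): mean mu, variance sigma^2/n
print(q.mean(), n - 1, q.var(), 2 * (n - 1))        # (b): chi^2_{n-1} has mean n-1, variance 2(n-1)
print(np.corrcoef(xbar, s2)[0, 1])                  # (c): correlation near 0, consistent with independence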
1.7 Delta Method
If X ∼ N(µ, σ²), Y = g(X) and σ² is small then
Y ≈ N(g(µ), σ²(g'(µ))²).
To see this, note that
Y = g(X) = g(µ) + (X − µ)g'(µ) + ((X − µ)²/2) g''(ξ)
for some ξ. Now E((X − µ)²) = σ², which we are assuming is small, and so
Y = g(X) ≈ g(µ) + (X − µ)g'(µ).
Thus
E(Y) ≈ g(µ),   Var(Y) ≈ (g'(µ))² σ².
Hence,
g(X) ≈ N(g(µ), (g'(µ))² σ²).
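A Monte Carlo illustration of the delta method (a sketch, assuming numpy; the choice g(x) = e^x with µ = 1 and a small σ = 0.05 is just an example):

import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.05                        # sigma^2 must be small for the approximation to hold
x = rng.normal(mu, sigma, 500_000)
y = np.exp(x)                                # g(X) = e^X, so g'(mu) = e^mu

print(y.mean(), np.exp(mu))                  # E(Y) ≈ g(mu)
print(y.var(), (np.exp(mu) * sigma)**2)      # Var(Y) ≈ (g'(mu))^2 sigma^2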