Introduction to probability
Stat 134
Fall 2005
Berkeley
Lectures prepared by:
Elchanan Mossel
Yelena Shvets
Follows Jim Pitman’s
book:
Probability
Section 3.3
Histo 1
X = 2*Bin(300,1/2) – 300, E[X] = 0
[Histogram over the range –50 to 50.]
Histo 2
Y = 2*Bin(30,1/2) – 30, E[Y] = 0
[Histogram over the range –50 to 50.]
Histo 3
Z = 4*Bin(10,1/4) – 10, E[Z] = 0
[Histogram on the same –50 to 50 scale.]
Histo 4
W = 0, E[W] = 0
[Histogram on the same –50 to 50 scale; a single bar of height 1 at 0.]
A natural question:
• Is there a good parameter that allows us to distinguish between these distributions?
• Is there a way to measure the spread?
Variance and Standard Deviation
• The variance of X, denoted by Var(X), is the mean squared deviation of X from its expected value μ = E(X):
Var(X) = E[(X − μ)²].
• The standard deviation of X, denoted by SD(X), is the square root of the variance of X:
SD(X) = √Var(X).
Computational Formula for Variance
Claim:
Var(X) = E(X²) − E(X)².
Proof:
E[(X − μ)²] = E[X² − 2μX + μ²]
= E[X²] − 2μ E[X] + μ²
= E[X²] − 2μ² + μ²
= E[X²] − E[X]².
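A quick numerical check of this identity, sketched in Python (the fair six-sided die here is just an example distribution, not part of the claim):

# Check Var(X) = E[(X - mu)^2] = E[X^2] - E[X]^2 for a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
p = 1 / 6                                             # probability of each face
mu = sum(x * p for x in values)                       # E[X] = 3.5
var_def = sum((x - mu) ** 2 * p for x in values)      # definition: E[(X - mu)^2]
var_short = sum(x * x * p for x in values) - mu ** 2  # computational formula
print(mu, var_def, var_short)                         # 3.5 2.9166... 2.9166...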
Variance and SD
For a general distribution, the Chebyshev inequality states that every random variable X is expected to be close to E(X), give or take a few SD(X).
Chebyshev Inequality:
For every random variable X and for all k > 0:
P(|X − E(X)| ≥ k SD(X)) ≤ 1/k².
Properties of Variance and SD
1. Claim: Var(X) ≥ 0.
Pf: Var(X) = Σ_x (x − μ)² P(X = x) ≥ 0.
2. Claim: Var(X) = 0 iff P[X = μ] = 1.
Chebyshev’s Inequality
P(|X − E(X)| ≥ k SD(X)) ≤ 1/k²
Proof:
• Let μ = E(X) and σ = SD(X).
• Observe that |X − μ| ≥ kσ if and only if |X − μ|² ≥ k²σ².
• The RV |X − μ|² is non-negative, so we can use Markov’s inequality:
P(|X − μ|² ≥ k²σ²) ≤ E[|X − μ|²] / (k²σ²)
P(|X − μ|² ≥ k²σ²) ≤ σ² / (k²σ²) = 1/k².
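A small simulation can illustrate how conservative the bound is; this is a sketch in Python assuming X ~ Binomial(100, 1/2), an example chosen here for illustration:

import random

# Compare the empirical tail P(|X - E(X)| >= k*SD(X)) with the Chebyshev bound 1/k^2
# for X ~ Binomial(100, 1/2), where E(X) = 50 and SD(X) = 5.
n, p, trials = 100, 0.5, 50_000
mu, sd = n * p, (n * p * (1 - p)) ** 0.5
samples = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]
for k in (1, 2, 3):
    tail = sum(abs(x - mu) >= k * sd for x in samples) / trials
    print(f"k={k}: empirical {tail:.4f}  <=  bound {1 / k ** 2:.4f}")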
Variance of Indicators
Suppose IA is the indicator of an event A with probability p. Observe that IA² = IA:
on Aᶜ, IA = 0 = IA²; on A, IA = 1 = IA².
So E(IA²) = E(IA) = P(A) = p, and:
Var(IA) = E(IA²) − E(IA)² = p − p² = p(1 − p).
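The same computation can be written out directly; a minimal sketch with p = 0.3 as an arbitrary example value:

# Var(I_A) = p(1 - p): direct check from the definition of variance,
# using an indicator that is 1 with probability p and 0 with probability 1 - p.
p = 0.3
mean = 1 * p + 0 * (1 - p)                        # E[I_A] = p
var = (1 - mean) ** 2 * p + (0 - mean) ** 2 * (1 - p)
print(mean, var, p * (1 - p))                     # 0.3 0.21 0.21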
Variance of a Sum of
Independent Random Variables
Claim: if X1, X2, …, Xn are independent then:
Var(X1+X2+…+Xn) = Var(X1)+Var(X2)+…+Var(Xn).
Pf: It suffices to prove the claim for two random variables.
Var(X+Y) = E[( X+Y − E(X+Y) )²] = E[( (X−E(X)) + (Y−E(Y)) )²]
= E[(X−E(X))²] + 2 E[(X−E(X))(Y−E(Y))] + E[(Y−E(Y))²]
= Var(X) + 2 E[X−E(X)] E[Y−E(Y)] + Var(Y)   (multiplication rule, by independence)
= Var(X) + 0 + Var(Y), since E[X−E(X)] = 0.
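The additivity can be checked by simulation; a sketch using two independent dice, with the fully dependent case Y = X included for contrast:

import random

# For independent dice X and Y, Var(X + Y) should match Var(X) + Var(Y).
# With Y = X (fully dependent), Var(X + Y) = 4 Var(X) instead.
def var(xs):
    m = sum(xs) / len(xs)
    return sum((v - m) ** 2 for v in xs) / len(xs)

trials = 200_000
x = [random.randint(1, 6) for _ in range(trials)]
y = [random.randint(1, 6) for _ in range(trials)]
print(var(x) + var(y), var([a + b for a, b in zip(x, y)]))  # both ~5.83 = 2 * 35/12
print(4 * var(x), var([2 * a for a in x]))                  # dependent case: both ~11.67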
Variance and Mean under scaling
and shifts
• Claim: SD(aX + b) = |a| SD(X)
• Proof:
Var[aX+b] = E[(aX + b − aμ − b)²]
= E[a²(X − μ)²] = a²σ²,
so SD(aX + b) = |a| σ = |a| SD(X).
Corollary: If a random variable X has
E(X) = μ and SD(X) = σ > 0, then
X* = (X − μ)/σ has
E(X*) = 0 and SD(X*) = 1.
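Both the scaling rule and the corollary can be verified numerically; a sketch using simulated draws (the choices a = −3, b = 7 and the Gaussian sample are arbitrary examples):

import random

# SD(aX + b) = |a| SD(X), and the standardized variable X* = (X - mu)/sigma
# has mean 0 and SD 1.
def mean_sd(xs):
    m = sum(xs) / len(xs)
    return m, (sum((v - m) ** 2 for v in xs) / len(xs)) ** 0.5

x = [random.gauss(10, 2) for _ in range(100_000)]
m, s = mean_sd(x)
a, b = -3, 7
print(mean_sd([a * v + b for v in x]))        # SD close to |a| * s = 3 * s
print(mean_sd([(v - m) / s for v in x]))      # close to (0, 1)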
Square Root Law
Let X1, X2, …, Xn be independent random variables with the same distribution as X, and let Sn be their sum:
Sn = Σ_{i=1}^{n} Xi,
and X̄n = Sn / n their average. Then:
E(Sn) = nμ and SD(Sn) = √n · σ,
E(X̄n) = μ and SD(X̄n) = σ / √n,
where μ = E(X) and σ = SD(X).
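A simulation sketch of the square root law, using dice sums as the example (for one die, σ² = 35/12):

import random

# Square root law: SD(S_n) grows like sqrt(n) * sigma, so the SD of the average
# X_bar_n = S_n / n shrinks like sigma / sqrt(n). Illustrated with dice sums.
def sd(xs):
    m = sum(xs) / len(xs)
    return (sum((v - m) ** 2 for v in xs) / len(xs)) ** 0.5

sigma = (35 / 12) ** 0.5
for n in (10, 100, 1000):
    sums = [sum(random.randint(1, 6) for _ in range(n)) for _ in range(5000)]
    print(n, round(sd(sums), 2), round(sigma * n ** 0.5, 2))  # empirical vs sqrt(n)*sigma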
Weak Law of Large Numbers
Thm: Let X1, X2, … be a sequence of independent random variables with the same distribution. Let μ denote the common expected value, μ = E(Xi), and let
X̄n = (X1 + X2 + … + Xn) / n.
Then for every ε > 0:
P(|X̄n − μ| < ε) → 1 as n → ∞.
Weak Law of Large Numbers
Proof: Let μ = E(Xi) and σ = SD(Xi). Then from the square root law we have:
E(X̄n) = μ and SD(X̄n) = σ / √n.
Now Chebyshev’s inequality gives us:
P(|X̄n − μ| ≥ ε) = P(|X̄n − μ| ≥ (ε√n / σ) · (σ / √n)) ≤ σ² / (ε² n).
For a fixed ε the right-hand side tends to 0 as n tends to ∞.
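The conclusion can be watched numerically; a sketch estimating P(|X̄n − μ| < ε) for dice averages, with ε = 0.1 chosen as an example:

import random

# Weak law in action: the chance that the average of n dice is within
# epsilon of mu = 3.5 approaches 1 as n grows.
mu, eps, reps = 3.5, 0.1, 1000
for n in (10, 100, 1000, 10_000):
    hits = sum(
        abs(sum(random.randint(1, 6) for _ in range(n)) / n - mu) < eps
        for _ in range(reps)
    )
    print(n, hits / reps)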
The Normal Approximation
• Let Sn = X1 + … + Xn be the sum of independent random variables with the same distribution.
• Then for large n, the distribution of Sn is approximately normal with mean E(Sn) = nμ and SD(Sn) = σ√n,
• where μ = E(Xi) and σ = SD(Xi).
In other words, for the standardized sum Sn* = (Sn − nμ) / (σ√n):
P(a ≤ Sn* ≤ b) ≈ Φ(b) − Φ(a),
where Φ is the standard normal cumulative distribution function.
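A sketch comparing the simulated distribution of a dice sum with the normal approximation (n = 30 dice and the threshold t = 110 are arbitrary example choices; the +0.5 is a continuity correction):

import math
import random

# Normal approximation: P(S_n <= t) should be close to Phi((t + 0.5 - n*mu) / (sigma*sqrt(n))).
n, reps = 30, 50_000
mu, sigma = 3.5, (35 / 12) ** 0.5
sums = [sum(random.randint(1, 6) for _ in range(n)) for _ in range(reps)]

t = 110
empirical = sum(s <= t for s in sums) / reps
z = (t + 0.5 - n * mu) / (sigma * n ** 0.5)
normal_approx = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # Phi(z)
print(empirical, normal_approx)  # both roughly 0.72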
Sums of iid random variables
Suppose Xi represents the number obtained on the i-th roll of a die.
Then Xi has a uniform distribution on the set
{1,2,3,4,5,6}.
Distribution of X1
[Histogram: uniform on {1, 2, 3, 4, 5, 6}, each bar of height 1/6.]
Sum of two dice
We can obtain the distribution of S2 = X1 + X2 by the convolution formula:
P(S2 = k) = Σ_{i=1}^{k−1} P(X1 = i) P(X2 = k − i | X1 = i)
= Σ_{i=1}^{k−1} P(X1 = i) P(X2 = k − i),   by independence.
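The convolution can be carried out explicitly; a short sketch for two fair dice:

# Distribution of S2 = X1 + X2 by convolution:
# P(S2 = k) = sum over i of P(X1 = i) * P(X2 = k - i).
px = {i: 1 / 6 for i in range(1, 7)}   # distribution of one die
ps2 = {}
for i, pi in px.items():
    for j, pj in px.items():
        ps2[i + j] = ps2.get(i + j, 0) + pi * pj
for k in sorted(ps2):
    print(k, round(ps2[k] * 36))       # counts out of 36: 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1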
Distribution of S2
[Histogram: values 2 through 12, peaked at 7.]
Sum of four dice
We can obtain the distribution of
S4 = X1 + X2 + X3 + X4 = S2 + S’2 again by the convolution formula:
P(S4 = k) = Σ_{i=1}^{k−1} P(S2 = i) P(S’2 = k − i | S2 = i)
= Σ_{i=1}^{k−1} P(S2 = i) P(S’2 = k − i),   by independence of S2 and S’2.
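Iterating the same step gives S4, S8, and so on; a sketch of the repeated convolution:

# Repeated convolution: S_{2n} = S_n + S'_n, where S_n and S'_n are independent copies.
def convolve(p, q):
    out = {}
    for i, pi in p.items():
        for j, qj in q.items():
            out[i + j] = out.get(i + j, 0) + pi * qj
    return out

px = {i: 1 / 6 for i in range(1, 7)}
s2 = convolve(px, px)
s4 = convolve(s2, s2)     # sum of four dice
s8 = convolve(s4, s4)     # sum of eight dice
print(max(s4, key=s4.get), max(s8, key=s8.get))  # most likely values: 14 and 28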
Distribution of S4
[Histogram: values 4 through 24.]
Distribution of S8
[Histogram: values 8 through 48.]
Distribution of S16
[Histogram: values 16 through 96.]
Distribution of S32
[Histogram: values 32 through 192.]
Distribution of X1
[Histogram: a second example, with X1 taking values in {1, 2, 3}.]
Distribution of S2
[Histogram: values 2 through 6.]
Distribution of S4
[Histogram: values 4 through 12.]
Distribution of S8
[Histogram: values 8 through 24.]
Distribution of S16
[Histogram: values 16 through 48.]
Distribution of S32
[Histogram: values 32 through 96.]
Distribution of X1
[Histogram: a third example, with X1 taking values in {0, 1, 2, 3, 4, 5}.]
Distribution of S2
[Histogram: values 0 through 10.]
Distribution of S4
[Histogram: values 0 through 20.]
Distribution of S8
[Histogram: values 0 through 40.]
Distribution of S16
[Histogram: values 0 through 80.]
Distribution of S32
[Histogram: values 0 through 160.]