Introduction to Probability
Stat 134, Fall 2005, Berkeley
Lectures prepared by: Elchanan Mossel, Elena Shvets
Follows Jim Pitman's book: Probability, Section 3.3

[Histogram 1: X = 2·Bin(300, 1/2) − 300, E(X) = 0]
[Histogram 2: Y = 2·Bin(30, 1/2) − 30, E(Y) = 0]
[Histogram 3: Z = 4·Bin(10, 1/4) − 10, E(Z) = 0]
[Histogram 4: W = 0, E(W) = 0]

A natural question:
• Is there a good parameter that allows us to distinguish between these distributions?
• Is there a way to measure the spread?

Variance and Standard Deviation
• The variance of X, denoted by Var(X), is the mean squared deviation of X from its expected value μ = E(X):
Var(X) = E[(X − μ)²].
• The standard deviation of X, denoted by SD(X), is the square root of the variance of X:
SD(X) = √Var(X).

Computational Formula for Variance
Claim: Var(X) = E(X²) − E(X)².
Proof:
E[(X − μ)²] = E[X² − 2μX + μ²]
            = E[X²] − 2μ·E[X] + μ²
            = E[X²] − 2μ² + μ²
            = E[X²] − E[X]².

Variance and SD
For a general distribution, Chebyshev's inequality states that every random variable X is likely to be close to E(X), give or take a few multiples of SD(X).
Chebyshev's Inequality: For every random variable X and for all k > 0:
P(|X − E(X)| ≥ k·SD(X)) ≤ 1/k².

Properties of Variance and SD
1. Claim: Var(X) ≥ 0.
   Proof: Var(X) = Σₓ (x − μ)² P(X = x) ≥ 0.
2. Claim: Var(X) = 0 iff P(X = μ) = 1.

Chebyshev's Inequality
P(|X − E(X)| ≥ k·SD(X)) ≤ 1/k²
Proof:
• Let μ = E(X) and σ = SD(X).
• Observe that |X − μ| ≥ kσ ⇔ |X − μ|² ≥ k²σ².
• The random variable |X − μ|² is non-negative, so we can use Markov's inequality:
P(|X − μ|² ≥ k²σ²) ≤ E[|X − μ|²] / (k²σ²) = σ² / (k²σ²) = 1/k².

Variance of Indicators
Suppose I_A is the indicator of an event A with probability p.
Observe that I_A² = I_A: on Aᶜ we have I_A = 0 = I_A², and on A we have I_A = 1 = I_A².
So E(I_A²) = E(I_A) = P(A) = p, and therefore:
Var(I_A) = E(I_A²) − E(I_A)² = p − p² = p(1 − p).

Variance of a Sum of Independent Random Variables
Claim: if X₁, X₂, …, Xₙ are independent, then:
Var(X₁ + X₂ + … + Xₙ) = Var(X₁) + Var(X₂) + … + Var(Xₙ).
Proof: It suffices to prove the claim for 2 random variables.
E[(X + Y − E(X + Y))²] = E[(X − E(X) + Y − E(Y))²]
= E[(X − E(X))²] + 2·E[(X − E(X))(Y − E(Y))] + E[(Y − E(Y))²]
= Var(X) + Var(Y) + 2·E[X − E(X)]·E[Y − E(Y)]   (by the multiplication rule for independent variables)
= Var(X) + Var(Y) + 0,
since E[X − E(X)] = 0.

Variance and Mean under Scaling and Shifts
• Claim: SD(aX + b) = |a|·SD(X).
• Proof: Var(aX + b) = E[(aX + b − aμ − b)²] = E[a²(X − μ)²] = a²σ², so SD(aX + b) = |a|σ.
• Corollary: If a random variable X has E(X) = μ and SD(X) = σ > 0, then X* = (X − μ)/σ has E(X*) = 0 and SD(X*) = 1.

Square Root Law
Let X₁, X₂, …, Xₙ be independent random variables with the same distribution as X, and let Sₙ = Σ_{i=1}^{n} Xᵢ be their sum and X̄ₙ = Sₙ/n their average. Then:
E(Sₙ) = n·E(X) and SD(Sₙ) = √n·SD(X);
E(X̄ₙ) = E(X) and SD(X̄ₙ) = SD(X)/√n.

Weak Law of Large Numbers
Thm: Let X₁, X₂, … be a sequence of independent random variables with the same distribution. Let μ = E(Xᵢ) denote the common expected value, and let
X̄ₙ = (X₁ + X₂ + … + Xₙ)/n.
Then for every ε > 0:
P(|X̄ₙ − μ| < ε) → 1 as n → ∞.

Weak Law of Large Numbers
Proof: Let μ = E(Xᵢ) and σ = SD(Xᵢ). Then from the square root law we have:
E(X̄ₙ) = μ and SD(X̄ₙ) = σ/√n.
Now Chebyshev's inequality, with k = ε√n/σ, gives us:
P(|X̄ₙ − μ| ≥ ε) = P(|X̄ₙ − μ| ≥ (ε√n/σ)·(σ/√n)) ≤ σ²/(ε²·n).
For a fixed ε, the right-hand side tends to 0 as n tends to ∞.
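To make the computational formula and Chebyshev's bound concrete, here is a minimal sketch in Python (not part of the original slides; the choice of a fair die as the distribution is an illustrative assumption) that computes Var(X) = E(X²) − E(X)² exactly and checks the Chebyshev bound for a few values of k:

# A minimal sketch, not from the slides: the computational formula for
# variance and Chebyshev's bound, for one roll of a fair die.
values = [1, 2, 3, 4, 5, 6]
probs = [1/6] * 6

mean = sum(x * p for x, p in zip(values, probs))      # E(X) = 3.5
ex2 = sum(x**2 * p for x, p in zip(values, probs))    # E(X^2) = 91/6
var = ex2 - mean**2                                   # Var(X) = 35/12
sd = var ** 0.5                                       # SD(X) ~ 1.7078

print(f"E(X) = {mean}, Var(X) = {var:.4f}, SD(X) = {sd:.4f}")

# Chebyshev: P(|X - E(X)| >= k*SD(X)) <= 1/k^2 for every k > 0.
for k in (1.0, 1.5, 2.0):
    tail = sum(p for x, p in zip(values, probs) if abs(x - mean) >= k * sd)
    print(f"k = {k}: P(|X - 3.5| >= {k * sd:.3f}) = {tail:.4f} <= {1/k**2:.4f}")

The exact tail probabilities come out well below 1/k², which illustrates that Chebyshev's inequality is valid for every distribution but often far from tight.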
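The weak law can also be watched numerically. The following sketch (again not from the slides; the trial counts and the choice ε = 0.1 are arbitrary illustrative assumptions) simulates averages of n die rolls and checks that SD(X̄ₙ) ≈ σ/√n and that P(|X̄ₙ − μ| < ε) climbs toward 1:

# A minimal sketch, not from the slides: the square root law and the
# weak law of large numbers for averages of fair die rolls.
import random

random.seed(0)
mu, sigma = 3.5, (35 / 12) ** 0.5   # E(X) and SD(X) for one die
eps = 0.1                            # arbitrary tolerance
trials = 1000                        # arbitrary number of repetitions

for n in (10, 100, 1000):
    # Simulate many sample averages of n rolls each.
    averages = [sum(random.randint(1, 6) for _ in range(n)) / n
                for _ in range(trials)]
    # Empirical SD of the average around the true mean mu.
    sd_avg = (sum((a - mu) ** 2 for a in averages) / trials) ** 0.5
    within = sum(abs(a - mu) < eps for a in averages) / trials
    # SD(avg) should be close to sigma / sqrt(n), and
    # P(|avg - mu| < eps) should approach 1 as n grows.
    print(f"n = {n:5d}: SD(avg) ~ {sd_avg:.4f} "
          f"(theory {sigma / n**0.5:.4f}), "
          f"P(|avg - mu| < {eps}) ~ {within:.3f}")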
The Normal Approximation
• Let Sₙ = X₁ + … + Xₙ be the sum of n independent random variables with the same distribution.
• Then for large n, the distribution of Sₙ is approximately normal with mean E(Sₙ) = nμ and SD(Sₙ) = σ√n, where μ = E(Xᵢ) and σ = SD(Xᵢ).
• In other words: for large n, P(a ≤ (Sₙ − nμ)/(σ√n) ≤ b) ≈ Φ(b) − Φ(a), where Φ is the standard normal cumulative distribution function.

Sums of iid Random Variables
Suppose Xᵢ represents the number obtained on the i-th roll of a die. Then Xᵢ has a uniform distribution on the set {1, 2, 3, 4, 5, 6}.
[Histogram: distribution of X₁, uniform on {1, …, 6}]

Sum of Two Dice
We can obtain the distribution of S₂ = X₁ + X₂ by the convolution formula:
P(S₂ = k) = Σ_{i=1}^{k−1} P(X₁ = i) P(X₂ = k − i | X₁ = i)
          = Σ_{i=1}^{k−1} P(X₁ = i) P(X₂ = k − i), by independence.
[Histogram: distribution of S₂ on {2, …, 12}]

Sum of Four Dice
We can obtain the distribution of S₄ = X₁ + X₂ + X₃ + X₄ = S₂ + S₂′ again by the convolution formula:
P(S₄ = k) = Σ_{i=1}^{k−1} P(S₂ = i) P(S₂′ = k − i | S₂ = i)
          = Σ_{i=1}^{k−1} P(S₂ = i) P(S₂′ = k − i), by independence of S₂ and S₂′.
[Histogram: distribution of S₄ on {4, …, 24}]

[Histograms: distributions of S₈ on {8, …, 48}, S₁₆ on {16, …, 96}, and S₃₂ on {32, …, 192} for die rolls; the histograms approach the normal curve as n grows.]

[Histograms: the same construction starting from a distribution on {1, 2, 3}: X₁, S₂, S₄, S₈, S₁₆, S₃₂; again the histograms approach the normal curve.]

[Histograms: the same construction starting from a distribution on {0, 1, …, 5}: X₁, S₂, S₄, S₈, S₁₆, S₃₂; again the histograms approach the normal curve.]
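The dice histograms above can be reproduced exactly with the convolution formula. Here is a minimal Python sketch (not from the slides; the helper convolve is hypothetical code written for illustration) that builds the exact distribution of Sₙ for fair dice by repeated convolution and compares a few probabilities with the normal curve of mean nμ and SD σ√n:

# A minimal sketch, not from the slides: exact distributions of S_n for
# fair dice via repeated convolution, compared with the normal curve.
import math

def convolve(p, q):
    # Distribution of the sum of two independent integer-valued variables,
    # each given as a list of probabilities starting at its minimum value.
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

die = [1/6] * 6                      # X_i uniform on {1,...,6}
mu, sigma = 3.5, math.sqrt(35 / 12)  # E(X_i) and SD(X_i)

dist, n = die, 1
for _ in range(5):                   # builds S_2, S_4, S_8, S_16, S_32
    dist = convolve(dist, dist)      # S_2n is the sum of two copies of S_n
    n *= 2
    m, s = n * mu, sigma * math.sqrt(n)
    # Compare P(S_n = k) with the normal density at k near the mean.
    for k in (round(m - s), round(m), round(m + s)):
        exact = dist[k - n]          # index 0 corresponds to S_n = n
        normal = math.exp(-((k - m) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))
        print(f"n = {n:2d}, k = {k:3d}: exact {exact:.5f}, normal approx {normal:.5f}")

As n doubles, the exact probabilities and the normal density values move closer together, which is the normal approximation that the histograms display.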