CSE 312, 2012 Autumn, W.L.Ruzzo
5. independence

independence
Defn: Two events E and F are independent if P(EF) = P(E) P(F)
If P(F) > 0, this is equivalent to: P(E|F) = P(E) (proof below)
Otherwise, they are called dependent

independence
Roll two dice, yielding values D1 and D2
1) E = { D1 = 1 }, F = { D2 = 1 }
P(E) = 1/6, P(F) = 1/6, P(EF) = 1/36
P(EF) = P(E)•P(F) ⇒ E and F independent
Intuitive; the two dice are not physically coupled
2) G = { D1 + D2 = 5 } = {(1,4),(2,3),(3,2),(4,1)}
P(E) = 1/6, P(G) = 4/36 = 1/9, P(EG) = 1/36
P(EG) ≠ P(E)•P(G) — not independent! E, G are dependent events
The dice are still not physically coupled, but "D1 + D2 = 5" couples them mathematically: info about D1 constrains D2. (But dependence/independence is not always intuitively obvious; "use the definition, Luke".)

independence
Two events E and F are independent if P(EF) = P(E) P(F)
If P(F) > 0, this is equivalent to: P(E|F) = P(E)
Otherwise, they are called dependent
Three events E, F, G are independent if
P(EF) = P(E) P(F), P(EG) = P(E) P(G), P(FG) = P(F) P(G), and P(EFG) = P(E) P(F) P(G)
Example: Let X, Y each be ±1 with equal probability, and let
E = {X = 1}, F = {Y = 1}, G = {XY = 1}
Then P(EF) = P(E)P(F), P(EG) = P(E)P(G), P(FG) = P(F)P(G),
but P(EFG) = 1/4 ≠ 1/8 = P(E)P(F)P(G)

independence
In general, events E1, E2, …, En are independent if for every subset S of {1, 2, …, n} we have
P(∩_{i∈S} Ei) = Π_{i∈S} P(Ei)
(Sometimes this property holds only for small subsets S. E.g., E, F, G on the previous slide are pairwise independent, but not fully independent.)

independence
Theorem: E, F independent ⇒ E, F^c independent
Proof: E = EF ∪ EF^c, and these are disjoint, so
P(EF^c) = P(E) – P(EF) = P(E) – P(E) P(F) = P(E) (1 – P(F)) = P(E) P(F^c)
[Venn diagram: sample space S with overlapping events E and F]
Theorem: if P(E) > 0 and P(F) > 0, then
E, F independent ⇔ P(E|F) = P(E) ⇔ P(F|E) = P(F)
Proof: Note P(EF) = P(E|F) P(F), regardless of in/dependence.
Assume independent. Then P(E)P(F) = P(EF) = P(E|F) P(F) ⇒ P(E|F) = P(E) (÷ by P(F))
Conversely, P(E|F) = P(E) ⇒ P(EF) = P(E|F) P(F) = P(E) P(F) (× by P(F))

biased coin
Suppose a biased coin comes up heads with probability p, independent of other flips
P(n heads in n flips) = p^n
P(n tails in n flips) = (1-p)^n
P(exactly k heads in n flips) = (n choose k) p^k (1-p)^(n-k)
Aside: note that the probability of some number of heads is
Σ_{k=0..n} (n choose k) p^k (1-p)^(n-k) = (p + (1-p))^n = 1,
as it should be, by the binomial theorem.

biased coin
Suppose a biased coin comes up heads with probability p, independent of other flips
P(exactly k heads in n flips) = (n choose k) p^k (1-p)^(n-k)
Note when p = 1/2, this is the same result we would have gotten by considering n flips in the "equally likely outcomes" scenario. But p ≠ 1/2 makes that inapplicable. Instead, the independence assumption allows us to conveniently assign a probability to each of the 2^n outcomes, e.g.:
Pr(HHTHTTT) = p^2 (1-p) p (1-p)^3 = p^#H (1-p)^#T
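The last identity is easy to check mechanically. Below is a small Python sketch (my own illustration, not from the slides; n and p are made-up values) that sums the per-sequence probabilities p^#H (1-p)^#T over all 2^n sequences with exactly k heads and compares the total to (n choose k) p^k (1-p)^(n-k).

```python
# Sanity check (not from the slides): the per-outcome probability p^#H (1-p)^#T,
# summed over all length-n sequences with exactly k heads, matches C(n,k) p^k (1-p)^(n-k).
from itertools import product
from math import comb, isclose

def p_exactly_k_heads(n, k, p):
    """Sum Pr(sequence) over all 2^n coin sequences with exactly k heads."""
    total = 0.0
    for seq in product("HT", repeat=n):
        h = seq.count("H")
        if h == k:
            total += p**h * (1 - p)**(n - h)
    return total

n, p = 7, 0.3   # made-up example values
for k in range(n + 1):
    assert isclose(p_exactly_k_heads(n, k, p), comb(n, k) * p**k * (1 - p)**(n - k))
print("brute-force sums match the binomial formula")
```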
hashing
A data structure problem: fast access to a small subset of data drawn from a large space.
[Figure: a (large) space D of potential data items, say names or SSNs, only a few of which are actually used, mapped into a (small) hash table with buckets 0, …, n-1 containing the actual data]
A solution: a hash function h: D → {0, …, n-1} crunches/scrambles names from the large space into the small one. E.g., if x is an integer: h(x) = x mod n
Good hash functions approximately randomize placement.

hashing
m strings hashed (uniformly) into a table with n buckets
Each string hashed is an independent trial
E = at least one string hashed to first bucket
What is P(E)?
Solution: Fi = string i not hashed into first bucket (i = 1, 2, …, m)
P(Fi) = 1 – 1/n = (n-1)/n for all i = 1, 2, …, m
Event F1 F2 ⋯ Fm = no strings hashed to first bucket
P(E) = 1 – P(F1 F2 ⋯ Fm)
     = 1 – P(F1) P(F2) ⋯ P(Fm)   (by independence)
     = 1 – ((n-1)/n)^m ≈ 1 – exp(-m/n)

hashing
m strings hashed (non-uniformly) to a table with n buckets
Each string hashed is an independent trial, with probability pi of getting hashed to bucket i
E = at least 1 of buckets 1 to k gets ≥ 1 string
What is P(E)?
Solution: Fi = at least one string hashed into the i-th bucket
P(E) = P(F1 ∪ ⋯ ∪ Fk) = 1 – P((F1 ∪ ⋯ ∪ Fk)^c)
     = 1 – P(F1^c F2^c ⋯ Fk^c)
     = 1 – P(no strings hashed to buckets 1 to k)
     = 1 – (1 - p1 - p2 - ⋯ - pk)^m

hashing
Let D0 ⊆ D be a fixed set of m strings, R = {0, …, n-1}. A hash function h: D → R is perfect for D0 if h: D0 → R is injective (no collisions). How hard is it to find a perfect hash function?
1) Fix h; pick the m elements of D0 independently at random from D
Suppose h maps ≈ (1/n)th of D to each element of R. This is like the birthday problem:
P(h is perfect for D0) ≈ (n/n)•((n-1)/n)•((n-2)/n)⋯((n-m+1)/n) ≈ exp(-m(m-1)/(2n))
(Caution: this analysis is heuristic, not rigorous, but still useful.)

hashing
Let D0 ⊆ D be a fixed set of m strings, R = {0, …, n-1}. A hash function h: D → R is perfect for D0 if h: D0 → R is injective (no collisions). How hard is it to find a perfect hash function?
2) Fix D0; pick h at random
E.g., if m = |D0| = 23 and n = 365, then there is a ~50% chance that h is perfect for this fixed D0. If it isn't, pick h', h'', etc. With high probability, you'll quickly find a perfect one!
"Picking a random function h" is easier said than done, but, empirically, picking at random from a set of functions like h(x) = (a•x + b) mod n works well.

network failure
Consider the following parallel network:
[Figure: n routers in parallel, with failure probabilities p1, p2, …, pn]
n routers, the ith has probability pi of failing, independently
P(there is a functional path) = 1 – P(all routers fail) = 1 – p1 p2 ⋯ pn

network failure
Contrast: a series network
[Figure: n routers in series, with failure probabilities p1, p2, …, pn]
n routers, the ith has probability pi of failing, independently
P(there is a functional path) = P(no routers fail) = (1 – p1)(1 – p2) ⋯ (1 – pn)

deeper into independence
Recall: Two events E and F are independent if P(EF) = P(E) P(F)
If E & F are independent, does that tell us anything about P(EF|G), P(E|G), P(F|G), when G is an arbitrary event? In particular, is P(EF|G) = P(E|G) P(F|G)?
In general, no.

deeper into independence
Roll two 6-sided dice, yielding values D1 and D2
E = { D1 = 1 }, F = { D2 = 6 }, G = { D1 + D2 = 7 }
E and F are independent
P(E|G) = 1/6, P(F|G) = 1/6, but P(EF|G) = 1/6, not 1/36
so E|G and F|G are not independent!

conditional independence
Definition: Two events E and F are called conditionally independent given G, if
P(EF|G) = P(E|G) P(F|G)
Or, equivalently (assuming P(F) > 0, P(G) > 0),
P(E|FG) = P(E|G)

do CSE majors get fewer A's?
Say you are in a dorm with 100 students
10 are CS majors: P(C) = 0.1
30 get straight A's: P(A) = 0.3
3 are CS majors who get straight A's: P(CA) = 0.03
P(CA) = P(C) P(A), so C and A are independent
At faculty night, only CS majors and A students show up, so 37 students arrive
Of the 37, 10 are CS majors ⇒ P(C | at fac night) = 10/37 ≈ 0.27, but only 3 of the 30 straight-A attendees are CS ⇒ P(C | A, at fac night) = 3/30 = 0.1. Equivalently, P(A | C, at fac night) = 3/10 = 0.3 < 30/37 ≈ 0.81 = P(A | at fac night).
Seems being a CS major lowers your chance of straight A's ☹
Weren't they supposed to be independent?
In fact, C and A are conditionally DEPENDENT given attendance at faculty night
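Both of these "conditioning breaks independence" examples can be checked by brute-force enumeration. Here is a small Python sketch (my own check, not from the slides) for the two-dice example: E and F are independent, but not conditionally independent given G.

```python
# Brute-force check of the two-dice example: E = {D1=1} and F = {D2=6} are independent,
# but not conditionally independent given G = {D1+D2=7}.
from fractions import Fraction

outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

def prob(event):
    """Probability of a set of (d1, d2) outcomes under the uniform measure."""
    return Fraction(len(event), len(outcomes))

E = {o for o in outcomes if o[0] == 1}
F = {o for o in outcomes if o[1] == 6}
G = {o for o in outcomes if sum(o) == 7}

print(prob(E & F) == prob(E) * prob(F))      # True: E, F independent
pEg = prob(E & G) / prob(G)
pFg = prob(F & G) / prob(G)
pEFg = prob(E & F & G) / prob(G)
print(pEg, pFg, pEFg)                        # 1/6 1/6 1/6
print(pEFg == pEg * pFg)                     # False: dependent given G
```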
conditioning can also break DEPENDENCE
Randomly choose a day of the week
A = { It is not a Monday }
B = { It is a Saturday }
C = { It is the weekend }
A and B are dependent events: P(A) = 6/7, P(B) = 1/7, P(AB) = 1/7 ≠ P(A)P(B)
Now condition both A and B on C:
P(A|C) = 1, P(B|C) = ½, P(AB|C) = ½
P(AB|C) = P(A|C) P(B|C) ⇒ A|C and B|C are independent
Dependent events can become independent by conditioning on additional information!
(Another reason why conditioning is so useful.)

independence: summary
Events E & F are independent if P(EF) = P(E) P(F), or, equivalently, P(E|F) = P(E) (if P(F) > 0)
More than 2 events are independent if, for all subsets, the joint probability = product of the separate event probabilities
Independence can greatly simplify calculations
For fixed G, conditioning on G gives a probability measure, P(E|G)
But "conditioning" and "independence" are orthogonal:
Events E & F that are (unconditionally) independent may become dependent when conditioned on G
Events that are (unconditionally) dependent may become independent when conditioned on G

CSE 312, 2012 Autumn, W.L.Ruzzo
6. random variables
[Figure: a sequence of coin flips, T T T T H T H H]

random variables
A random variable is some numeric function of the outcome, not the outcome itself. (Technically, neither random nor a variable, but...)
Ex. Let H be the number of Heads when 20 coins are tossed
Let T be the total of 2 dice rolls
Let X be the number of coin tosses needed to see the 1st head
Note: even if the underlying experiment has "equally likely outcomes," the associated random variable may not:

Outcome   H    P(H)
TT        0    P(H=0) = 1/4
TH        1    }
HT        1    } P(H=1) = 1/2
HH        2    P(H=2) = 1/4

numbered balls

first head
Flip a (biased) coin repeatedly until the 1st head is observed
How many flips? Let X be that number.
P(X=1) = P(H) = p
P(X=2) = P(TH) = (1-p)p
P(X=3) = P(TTH) = (1-p)^2 p
...
P(X=i) = (1-p)^(i-1) p   ← memorize me!
Check that it is a valid probability distribution:
1) (1-p)^(i-1) p ≥ 0 for every i ≥ 1
2) Σ_{i≥1} (1-p)^(i-1) p = p • 1/(1-(1-p)) = 1 (geometric series)

probability mass functions
The probability mass function (pmf) of a discrete random variable X gives p(x) = P(X = x) for each possible value x; Σ_x p(x) = 1.

head count
[Plots: pmf of the number of heads for n = 2 and n = 8 coin flips]

cumulative distribution function
[Plots: the pmf and the cdf of the same random variable]
NB: for discrete random variables, be careful about "≤" vs "<"

why random variables
Why use random variables?
A. Often we just care about numbers
If I win $1 per head when 20 coins are tossed, what is my average winnings? What is the most likely number? What is the probability that I win < $5? ...
B. It cleanly abstracts away from unnecessary detail about the experiment/sample space; the PMF is all we need.

Outcome   H    P(H)
TT        0    P(H=0) = 1/4
TH        1    }
HT        1    } P(H=1) = 1/2
HH        2    P(H=2) = 1/4

Flip 7 coins, roll 2 dice, and throw a dart; if the dart landed in sector = dice roll mod #heads, then X = ...

expectation
The expectation (expected value) of a random variable X is
E[X] = Σ_x x • p(x)
— the average of the random values, weighted by their respective probabilities.

first head
For X = the number of flips until the first head (the geometric distribution above),
E[X] = Σ_{i≥1} i (1-p)^(i-1) p = 1/p
(one derivation differentiates the geometric series Σ_{i≥0} y^i = 1/(1-y) term by term).
How much would you pay to play?

how many heads
How much would you pay to play?
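The claim E[X] = 1/p is easy to sanity-check empirically. Below is a small simulation sketch (my own illustration, not from the slides; p and the trial count are made-up parameters) that averages the number of flips until the first head and compares it to 1/p.

```python
# Simulation sketch: average number of flips until the first head, compared with
# the claimed E[X] = 1/p for a Geometric(p) random variable.
import random

def flips_until_first_head(p, rng):
    """Flip a p-biased coin until it comes up heads; return the number of flips."""
    count = 1
    while rng.random() >= p:   # probability (1-p) of tails, so keep flipping
        count += 1
    return count

p, trials = 0.3, 200_000       # made-up parameters
rng = random.Random(0)
avg = sum(flips_until_first_head(p, rng) for _ in range(trials)) / trials
print(f"simulated mean ≈ {avg:.3f}, 1/p = {1/p:.3f}")
```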
expectation of a function of a random variable
Let X = the sum of two fair dice and Y = g(X) = X mod 5. Computing E[Y] directly from the distribution of Y:

i       2     3     4     5     6     7     8     9     10    11    12
p(i)    1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
i•p(i)  2/36  6/36  12/36 20/36 30/36 42/36 40/36 36/36 30/36 22/36 12/36

E[X] = Σ_i i p(i) = 252/36 = 7

j   q(j) = P[Y = j]             j•q(j)
0   4/36 + 3/36        = 7/36   0/36
1   5/36 + 2/36        = 7/36   7/36
2   1/36 + 6/36 + 1/36 = 8/36   16/36
3   2/36 + 5/36        = 7/36   21/36
4   3/36 + 4/36        = 7/36   28/36

E[Y] = Σ_j j q(j) = 72/36 = 2

expectation of a function of a random variable
Alternatively, compute E[g(X)] directly from the distribution of X:

i         2     3     4     5     6     7     8     9     10    11    12
p(i)      1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
g(i)•p(i) 2/36  6/36  12/36 0/36  5/36  12/36 15/36 16/36 0/36  2/36  2/36

E[g(X)] = Σ_i g(i) p(i) = 72/36 = 2 — the same answer.

expectation of a function of a random variable
(B&T pp. 84-85)
[Figure: the map g from the values xi of X to the values yj of Y = g(X)]
Note that Sj = { xi | g(xi) = yj } is a partition of the domain of g, which is why the two computations above agree:
E[g(X)] = Σ_i g(xi) p(xi) = Σ_j Σ_{xi∈Sj} yj p(xi) = Σ_j yj q(yj) = E[Y]

properties of expectation
A & B each bet $1, then flip 2 coins:
HH → A wins $2
HT → each takes back $1
TH → each takes back $1
TT → B wins $2
Let X be A's net gain: +1, 0, -1, respectively:
P(X = +1) = 1/4, P(X = 0) = 1/2, P(X = -1) = 1/4
What is E[X]?  E[X] = 1•1/4 + 0•1/2 + (-1)•1/4 = 0
What is E[X^2]?  E[X^2] = 1^2•1/4 + 0^2•1/2 + (-1)^2•1/4 = 1/2
Note: E[X^2] ≠ E[X]^2

properties of expectation
Linearity of expectation, I
For any constants a, b: E[aX + b] = aE[X] + b
Proof: E[aX + b] = Σ_x (ax + b) p(x) = a Σ_x x p(x) + b Σ_x p(x) = aE[X] + b
Example:
Q: In the 2-person coin game above, what is E[2X + 1]?
A: E[2X + 1] = 2E[X] + 1 = 2•0 + 1 = 1

properties of expectation
Linearity, II
Let X and Y be two random variables derived from outcomes of a single experiment. Then
E[X + Y] = E[X] + E[Y]   — true even if X, Y are dependent
Proof: Assume the sample space S is countable. (The result is true without this assumption, but I won't prove it.) Let X(s), Y(s) be the values of these r.v.'s for outcome s ∈ S.
Claim: E[X] = Σ_{s∈S} X(s) p(s)
Proof of claim: similar to that for "expectation of a function of an r.v.," i.e., the events "X = x" partition S, so the sum above can be rearranged to match the definition of E[X].
Then:
E[X+Y] = Σ_{s∈S} (X(s) + Y(s)) p(s) = Σ_{s∈S} X(s) p(s) + Σ_{s∈S} Y(s) p(s) = E[X] + E[Y]

properties of expectation
Example
X = # of heads in one coin flip, where P(X=1) = p. What is E[X]?
E[X] = 1•p + 0•(1-p) = p
Let Xi, 1 ≤ i ≤ n, be the # of H in the flip of a coin with P(Xi=1) = pi. What is the expected number of heads when all are flipped?
E[Σi Xi] = Σi E[Xi] = Σi pi   ☜ Compare to slide 35
Special case: p1 = p2 = ... = p: E[# of heads in n flips] = pn

properties of expectation
Note: Linearity is special! It is not true in general that
E[X•Y] = E[X]•E[Y]   ← counterexample above
E[X^2] = E[X]^2
E[X/Y] = E[X] / E[Y]
E[asinh(X)] = asinh(E[X])
...

variance

risk
Alice & Bob are gambling (again). X = Alice's gain per flip (say, ±$1, each with probability ½):
E[X] = 0
. . . Time passes . . .
Alice (yawning) says "let's raise the stakes"
E[Y] = 0, as before.

E[X] measures the "average" or "central tendency" of X. What about its variability?
If E[X] = μ, then E[|X - μ|] seems like a natural quantity to look at: how much do we expect X to deviate from its average. Unfortunately, it's a bit inconvenient mathematically; the following is easier/more common.
Definition
The variance of a random variable X with mean E[X] = μ is Var[X] = E[(X - μ)^2], often denoted σ^2.
The standard deviation of X is σ = √Var[X]
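Like expectation, Var[X] = E[(X - μ)^2] can be computed mechanically from a pmf. Here is a small Python sketch (my own illustration, not from the slides), using X = the sum of two fair dice (μ = 7, σ^2 = 35/6 ≈ 5.83, an example that reappears below).

```python
# Compute E[X] and Var[X] = E[(X - mu)^2] directly from a pmf,
# for X = the sum of two fair dice.
from fractions import Fraction
from collections import Counter

counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
pmf = {x: Fraction(c, 36) for x, c in counts.items()}

mu = sum(x * px for x, px in pmf.items())                 # E[X] = 7
var = sum((x - mu) ** 2 * px for x, px in pmf.items())    # E[(X - mu)^2] = 35/6
print(mu, var, float(var))                                # 7 35/6 5.833...
```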
what does variance tell us?
The variance of a random variable X with mean E[X] = μ is Var[X] = E[(X - μ)^2], often denoted σ^2.
I: The square is always ≥ 0, and is exaggerated as X moves away from μ, so Var[X] emphasizes deviation from the mean.
II: Numbers vary a lot depending on the exact distribution of X, but typically X is within μ ± σ ~66% of the time, and within μ ± 2σ ~95% of the time. (We'll see the reasons for this soon.)

mean and variance
μ = E[X] is about location; σ = √Var(X) is about spread
[Plots: pmf of # heads in 20 flips, p = .5 (σ ≈ 2.2), and of # heads in 150 flips, p = .5 (σ ≈ 6.1), each marked with μ]
(Note σ is bigger in absolute terms in the second example, but smaller as a proportion of the maximum.)

risk
Alice & Bob are gambling (again). X = Alice's gain per flip:
E[X] = 0, Var[X] = 1
. . . Time passes . . .
Alice (yawning) says "let's raise the stakes"
E[Y] = 0, as before. Var[Y] = 1,000,000

example
Two games:
a) flip 1 coin, win Y = $100 if heads, $-100 if tails
b) flip 100 coins, win Z = (#(heads) - #(tails)) dollars
Same expectation in both: E[Y] = E[Z] = 0
Same extremes in both: max gain = $100; max loss = $100
But the variability is very different: σY = 100, σZ = 10
[Plots: distributions of Y and Z]

more variance examples
More examples, with increasing variance:
X1 = sum of 2 fair dice, minus 7:  σ^2 = 5.83
X2 = fair 11-sided die labeled -5, ..., 5:  σ^2 = 10
X3 = Y - 6•signum(Y), where Y is the difference of 2 fair dice, given no doubles:  σ^2 = 15
X4 = 3 pairs of dice all give …:  σ^2 = 19.7
(NB: X3 is kinda complex; see slide 29.)
[Plots: the four pmfs]

properties of variance
Var[X] = E[(X - μ)^2] = E[X^2] - (E[X])^2 (expand the square and use linearity)

properties of variance
Example: What is Var[X] when X is the outcome of one fair die?
E[X] = 7/2, so
Var[X] = E[X^2] - (E[X])^2 = (1+4+9+16+25+36)/6 - (7/2)^2 = 91/6 - 49/4 = 35/12 ≈ 2.92

properties of variance
Var is NOT linear; it is insensitive to location (b) and quadratic in scale (a):
Var[aX + b] = a^2 Var[X]
Example: E[X] = 0, Var[X] = 1; let Y = 1000 X.
E[Y] = E[1000 X] = 1000 E[X] = 0
Var[Y] = Var[1000 X] = 10^6 Var[X] = 10^6

properties of variance
NOT linear in another sense, either: in general Var[X + Y] ≠ Var[X] + Var[Y]
(e.g., Var[X + X] = Var[2X] = 4 Var[X] ≠ 2 Var[X]).
It is additive when X and Y are independent — see below.

r.v.s and independence
Defn: Random variable X and event E are independent if the event E is independent of the event {X = x} (for any fixed x), i.e.
∀x  P({X = x} & E) = P({X = x}) • P(E)
Defn: Two random variables X and Y are independent if the events {X = x} and {Y = y} are independent (for any fixed x, y), i.e.
∀x, y  P({X = x} & {Y = y}) = P({X = x}) • P({Y = y})
Intuition as before: knowing X doesn't help you guess Y or E, and vice versa.

r.v.s and independence
Random variable X and event E are independent if
∀x  P({X = x} & E) = P({X = x}) • P(E)
Ex 1: Roll a fair die to obtain a random number 1 ≤ X ≤ 6, then flip a fair coin X times. Let E be the event that the number of heads is even.
P({X = x}) = 1/6 for any 1 ≤ x ≤ 6, P(E) = 1/2
P({X = x} & E) = 1/6 • 1/2, so they are independent
Ex 2: As above, and let F be the event that the total number of heads = 6.
P(F) = 2^-6/6 > 0, and considering, say, X = 4, we have P(X = 4) = 1/6 > 0 (as above), but P({X = 4} & F) = 0, since you can't see 6 heads in 4 flips. So X & F are dependent. (Knowing that X is small renders F impossible; knowing that F happened means X must be 6.)

r.v.s and independence
Two random variables X and Y are independent if the events {X = x} and {Y = y} are independent (for any x, y), i.e.
∀x, y  P({X = x} & {Y = y}) = P({X = x}) • P({Y = y})
Ex: Let X be the number of heads in the first n of 2n coin flips, Y the number in the last n flips, and let Z = X + Y be the total.
X and Y are independent: they are determined by disjoint sets of independent flips.
But X and Z are not independent, since, e.g., knowing that X = 0 precludes Z > n. E.g., P(X = 0) and P(Z = n+1) are both positive, but P({X = 0} & {Z = n+1}) = 0.
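This last example is small enough to verify exhaustively. Below is a Python sketch (my own check, not from the slides; n is a made-up small value) that enumerates all 2^(2n) flip sequences, builds the joint distributions, and confirms that X, Y are independent while X, Z are not.

```python
# Enumeration check for small n: X = # heads in the first n of 2n fair flips,
# Y = # heads in the last n, Z = X + Y.  X,Y independent; X,Z not.
from itertools import product
from fractions import Fraction
from collections import Counter

n = 3
joint_xy, joint_xz = Counter(), Counter()
px, py, pz = Counter(), Counter(), Counter()
for flips in product((0, 1), repeat=2 * n):     # 0 = tails, 1 = heads, all equally likely
    x, y = sum(flips[:n]), sum(flips[n:])
    w = Fraction(1, 2 ** (2 * n))
    joint_xy[x, y] += w
    joint_xz[x, x + y] += w
    px[x] += w
    py[y] += w
    pz[x + y] += w

print(all(joint_xy[x, y] == px[x] * py[y] for x in px for y in py))   # True
print(all(joint_xz[x, z] == px[x] * pz[z] for x in px for z in pz))   # False
```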
joint distributions
Often, several random variables are simultaneously observed
X = height and Y = weight
X = cholesterol and Y = blood pressure
X1, X2, X3 = work loads on servers A, B, C
Joint probability mass function: fXY(x, y) = P({X = x} & {Y = y})
Joint cumulative distribution function: FXY(x, y) = P({X ≤ x} & {Y ≤ y})

examples
Two joint PMFs:

W\Z    1     2     3
1     2/24  2/24  2/24
2     2/24  2/24  2/24
3     2/24  2/24  2/24
4     2/24  2/24  2/24

X\Y    1     2     3
1     4/24  1/24  1/24
2       0   3/24  3/24
3       0   4/24  2/24
4     4/24    0   2/24

P(W = Z) = 3 • 2/24 = 6/24
P(X = Y) = (4 + 3 + 2)/24 = 9/24
Can look at arbitrary relationships among variables this way

sampling from a joint distribution
[Scatter plots of samples: top row, independent variables; bottom row, dependent variables (a simple linear dependence)]

another example
Flip n fair coins
X = # Heads seen in the first n/2 + k
Y = # Heads seen in the last n/2 + k
[Plot: samples of (X, Y); the 2k overlapping flips make X and Y dependent]

marginal distributions
The same two joint PMFs, with marginals:

W\Z     1     2     3    fW(w)
1      2/24  2/24  2/24  6/24
2      2/24  2/24  2/24  6/24
3      2/24  2/24  2/24  6/24
4      2/24  2/24  2/24  6/24
fZ(z)  8/24  8/24  8/24

X\Y     1     2     3    fX(x)
1      4/24  1/24  1/24  6/24
2        0   3/24  3/24  6/24
3        0   4/24  2/24  6/24
4      4/24    0   2/24  6/24
fY(y)  8/24  8/24  8/24

Marginal PMF of one r.v.: sum over the other (law of total probability):
fX(x) = Σy fXY(x,y)    fY(y) = Σx fXY(x,y)
Question: Are W & Z independent? Are X & Y independent?

joint, marginals and independence
Repeating the definition: Two random variables X and Y are independent if the events {X = x} and {Y = y} are independent (for any fixed x, y), i.e.
∀x, y  P({X = x} & {Y = y}) = P({X = x}) • P({Y = y})
Equivalent definition: Two random variables X and Y are independent if their joint probability mass function is the product of their marginal distributions, i.e.
∀x, y  fXY(x,y) = fX(x) • fY(y)
Exercise: Show that this is also true of their cumulative distribution functions
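The question on the marginals slide can be answered directly from the tables. Below is a Python sketch (my own illustration, not from the slides) that encodes the two joint pmfs above, computes marginals by summing over the other variable, and tests the product condition cell by cell.

```python
# Build the two joint pmfs from the tables above, compute marginals,
# and test independence: fXY(x,y) == fX(x) * fY(y) for every cell.
from fractions import Fraction

def F(k):
    return Fraction(k, 24)

# rows indexed 1..4, columns 1..3
f_WZ = {(w, z): F(2) for w in range(1, 5) for z in range(1, 4)}
f_XY = {(1, 1): F(4), (1, 2): F(1), (1, 3): F(1),
        (2, 1): F(0), (2, 2): F(3), (2, 3): F(3),
        (3, 1): F(0), (3, 2): F(4), (3, 3): F(2),
        (4, 1): F(4), (4, 2): F(0), (4, 3): F(2)}

def marginals(joint):
    fa, fb = {}, {}
    for (a, b), p in joint.items():
        fa[a] = fa.get(a, 0) + p
        fb[b] = fb.get(b, 0) + p
    return fa, fb

def independent(joint):
    fa, fb = marginals(joint)
    return all(p == fa[a] * fb[b] for (a, b), p in joint.items())

print(independent(f_WZ))   # True:  every cell equals 6/24 * 8/24 = 2/24
print(independent(f_XY))   # False: e.g., f(2,1) = 0 but fX(2) * fY(1) > 0
```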
expectation of a function of 2 r.v.'s
A function g(X, Y) defines a new random variable. Its expectation is:
E[g(X, Y)] = Σx Σy g(x, y) fXY(x,y)   ☜ like slide 38
Expectation is linear. E.g., if g is linear:
E[g(X, Y)] = E[a X + b Y + c] = a E[X] + b E[Y] + c
Example: g(X, Y) = 2X - Y, with the joint PMF of X, Y from above (recall both marginals are uniform):

X\Y        1          2          3
1       1 • 4/24   0 • 1/24  -1 • 1/24
2       3 • 0/24   2 • 3/24   1 • 3/24
3       5 • 0/24   4 • 4/24   3 • 2/24
4       7 • 4/24   6 • 0/24   5 • 2/24

E[g(X,Y)] = 72/24 = 3
E[g(X,Y)] = 2•E[X] - E[Y] = 2•2.5 - 2 = 3

products of independent r.v.s
Theorem: If X & Y are independent, then E[X•Y] = E[X]•E[Y]
Proof:
E[X•Y] = Σx Σy x y fXY(x,y)
       = Σx Σy x y fX(x) fY(y)   (independence)
       = (Σx x fX(x)) (Σy y fY(y))
       = E[X]•E[Y]
Note: NOT true in general; see the earlier example E[X^2] ≠ E[X]^2

a zoo of (discrete) random variables

bernoulli random variables
A single experiment with outcomes "Success" or "Failure"
X is a random indicator variable (1 = success, 0 = failure)
P(X=1) = p and P(X=0) = 1-p
X is called a Bernoulli random variable: X ~ Ber(p)
E[X] = E[X^2] = p
Var(X) = E[X^2] – (E[X])^2 = p – p^2 = p(1-p)
Examples:
coin flip
random binary digit
whether a disk drive crashed
(Jacob (aka James, Jacques) Bernoulli, 1654 – 1705)

binomial random variables
Consider n independent random variables Yi ~ Ber(p)
X = Σi Yi is the number of successes in n trials
X is a Binomial random variable: X ~ Bin(n,p)
P(X = k) = (n choose k) p^k (1-p)^(n-k), k = 0, 1, …, n
By the binomial theorem, Σk P(X = k) = (p + (1-p))^n = 1
Examples:
# of heads in n coin flips
# of 1's in a randomly generated length-n bit string
# of disk drive crashes in a 1000-computer cluster
E[X] = pn, Var(X) = p(1-p)n   (proof coming)

binomial pmfs
[Plots: Bin(n, p) pmfs for several values of n and p]

variance of independent r.v.s is additive
(Bienaymé, 1853)
Theorem: If X & Y are independent, then Var[X+Y] = Var[X] + Var[Y]
Proof: Let X̂ = X - E[X] and Ŷ = Y - E[Y]; note Var[X̂] = Var[X] and Var[Ŷ] = Var[Y] (by Var(aX+b) = a^2 Var(X)), and E[X̂] = E[Ŷ] = 0. Then
Var[X+Y] = Var[X̂ + Ŷ] = E[(X̂ + Ŷ)^2]
         = E[X̂^2] + 2 E[X̂ Ŷ] + E[Ŷ^2]
         = Var[X] + 2 E[X̂] E[Ŷ] + Var[Y]   (independence)
         = Var[X] + 0 + Var[Y]

mean, variance of the binomial (II)
X ~ Bin(n,p) is a sum X = Σi Yi of n independent Yi ~ Ber(p), so by linearity of expectation and additivity of variance for independent r.v.s:
E[X] = Σi E[Yi] = np
Var[X] = Σi Var[Yi] = np(1-p)

disk failures
A RAID-like disk array consists of n drives, each of which will fail independently with probability p. Suppose it can operate effectively if at least one-half of its components function, e.g., by "majority vote."
For what values of p is a 5-component system more likely to operate effectively than a 3-component system?
X5 = # failed in 5-component system ~ Bin(5, p)
X3 = # failed in 3-component system ~ Bin(3, p)

disk failures
X5 = # failed in 5-component system ~ Bin(5, p)
X3 = # failed in 3-component system ~ Bin(3, p)
P(5-component system effective) = P(X5 < 5/2) = Σ_{k=0..2} (5 choose k) p^k (1-p)^(5-k)
P(3-component system effective) = P(X3 < 3/2) = Σ_{k=0..1} (3 choose k) p^k (1-p)^(3-k)
Calculation: the 5-component system is better iff p < 1/2
[Plot: P(system effective) vs p for n = 1, 3, 5]

noisy channels
Goal: send a 4-bit message over a noisy communication channel. Say, 1 bit in 10 is flipped in transit, independently.
What is the probability that the message arrives correctly?
Let X = # of errors; X ~ Bin(4, 0.1)
P(correct message received) = P(X=0) = 0.9^4 ≈ 0.66
Can we do better? Yes: error correction via redundancy. E.g., send every bit in triplicate; use majority vote.
Let Y = # of errors in one triple; Y ~ Bin(3, 0.1)
P(a triple is OK) = P(Y ≤ 1) = 0.9^3 + 3•0.9^2•0.1 = 0.972
If X' = # errors in the triplicate msg, X' ~ Bin(4, 0.028), and P(X'=0) = 0.972^4 ≈ 0.89
Coding theory: good error correction with less wasted transmission
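The noisy-channel arithmetic is compact enough to script. Here is a short Python sketch (my own illustration; it uses the slide's setup of a 4-bit message and per-bit flip probability 0.1) computing the three probabilities above.

```python
# Noisy-channel example: 4-bit message, each bit flipped independently w.p. 0.1.
from math import comb

p = 0.1                                   # per-bit corruption probability
p_plain = (1 - p) ** 4                    # P(all 4 bits arrive intact) ≈ 0.656

# Send each bit in triplicate and decode by majority vote:
# a triple decodes correctly iff it contains at most 1 flipped bit.
p_triple_ok = sum(comb(3, k) * p**k * (1 - p)**(3 - k) for k in range(2))   # = 0.972
p_coded = p_triple_ok ** 4                # P(whole 4-bit message decoded correctly) ≈ 0.893

print(round(p_plain, 3), round(p_triple_ok, 3), round(p_coded, 3))
```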
models & reality
Sending a bit string over the network
In real networks, large bit strings (length n ≈ 10^4)
Corruption probability is very small: p ≈ 10^-6
X ~ Bin(10^4, 10^-6) is unwieldy to compute
Extreme n and p values arise in many cases
# of bit errors in a file written to disk
# of typos in a book
# of server crashes per day in a giant data center

poisson random variables
Suppose "events" happen, independently, at an average rate of λ per unit time. Let X be the actual number of events happening in a given time unit. Then X is a Poisson r.v. with parameter λ (denoted X ~ Poi(λ)) and has distribution (PMF):
P(X = i) = e^(-λ) λ^i / i!,  i = 0, 1, 2, …
(Siméon Poisson, 1781-1840)
Examples:
# of alpha particles emitted by a lump of radium in 1 sec.
# of traffic accidents in Seattle in one year
# of babies born in a day at UW Med center
# of visitors to my web page today
See B&T Section 6.2 for more on the theoretical basis for the Poisson.

poisson random variables
X is a Poisson r.v. with parameter λ if it has PMF P(X = i) = e^(-λ) λ^i / i!
Is it a valid distribution? Recall the Taylor series:
e^λ = Σ_{i≥0} λ^i / i!
So:
Σ_{i≥0} P(X = i) = Σ_{i≥0} e^(-λ) λ^i / i! = e^(-λ) e^λ = 1

expected value of poisson r.v.s
E[X] = Σ_{i≥0} i • e^(-λ) λ^i / i!
     = Σ_{i≥1} e^(-λ) λ^i / (i-1)!        (the i = 0 term is zero)
     = λ Σ_{j≥0} e^(-λ) λ^j / j!          (j = i-1)
     = λ
As expected, given the definition in terms of "average rate λ"
(Var[X] = λ, too; proof similar, see B&T example 6.20)

binomial random variable is poisson in the limit
Poisson approximates binomial when n is large, p is small, and λ = np is "moderate"
Formally, Binomial is Poisson in the limit as n → ∞ (equivalently, p → 0) while holding np = λ:
(n choose i) p^i (1-p)^(n-i) → e^(-λ) λ^i / i!

sending data on a network, again
Recall the example of sending a bit string over a network
Send a bit string of length n = 10^4
Probability of (independent) bit corruption is p = 10^-6
Number of errors: Y ~ Bin(10^4, 10^-6)
Number of errors is approximately X ~ Poi(λ = 10^4 • 10^-6 = 0.01)
What is the probability that the message arrives uncorrupted?
P(Y=0) = (1 - 10^-6)^(10^4) ≈ 0.990049829
P(X=0) = e^(-0.01) ≈ 0.990049834
The Poisson approximation (here) is accurate to ~5 parts per billion.

binomial vs poisson
[Plots: Bin(n, p) pmfs vs the Poi(λ = np) pmf]

more on conditioning
Recall: conditional probability P(X | A) = P(X & A) / P(A)
Notation: For a random variable X, take this as shorthand for "∀x P(X=x | A) ..."
Define: The conditional expectation of X is
E[X | A] = Σx x P(X=x | A)
I.e., the value of X averaged over outcomes where we know A happened

total expectation
Recall: the law of total probability
P(X) = P(X | A) P(A) + P(X | ¬A) P(¬A)
I.e., unconditional probability is the weighted average of conditional probabilities, weighted by the probabilities of the conditioning events
The Law of Total Expectation
E[X] = E[X | A] P(A) + E[X | ¬A] P(¬A)
I.e., unconditional expectation is the weighted average of conditional expectations, weighted by the probabilities of the conditioning events

total expectation
The Law of Total Expectation, more generally: if A1, …, An partition the sample space, then
E[X] = Σi E[X | Ai] P(Ai)

balls in urns – the hypergeometric distribution
(B&T, exercise 1.61)
Draw d balls (without replacement) from an urn containing N, of which w are white, the rest black.
Let X = number of white balls drawn.
P(X = k) = (w choose k) (N-w choose d-k) / (N choose d)
[note: (n choose k) = 0 if k < 0 or k > n]
E[X] = dp, where p = w/N (the fraction of white balls)
Proof: Let Xj be the 0/1 indicator for "the j-th ball drawn is white," X = Σ Xj.
The Xj are dependent, but E[X] = E[Σ Xj] = Σ E[Xj] = dp
Var[X] = dp(1-p)(1-(d-1)/(N-1))

random variables – summary
RV: a numeric function of the outcome of an experiment
Probability Mass Function p(x): prob that RV = x; Σ p(x) = 1
Cumulative Distribution Function F(x): probability that RV ≤ x
Generalize to joint distributions; independence & marginals
Expectation:
mean, average, "center of mass," fair price for a game of chance
(probability-)weighted average of a random variable: E[X] = Σx x p(x)
of a function: if Y = g(X), then E[Y] = Σx g(x) p(x)
linearity:
E[aX + b] = aE[X] + b
E[X+Y] = E[X] + E[Y], even if dependent
This interchange of "order of operations" is quite special to linear combinations. E.g., E[XY] ≠ E[X]•E[Y], in general (but see below).

random variables – summary
Conditional Expectation: E[X | A] = Σx x•P(X=x | A)
Law of Total Expectation: E[X] = E[X | A]•P(A) + E[X | ¬A]•P(¬A)
Variance: Var[X] = E[(X - E[X])^2] = E[X^2] - (E[X])^2
Standard deviation: σ = √Var[X]
"Variance is insensitive to location, quadratic in scale": Var[aX + b] = a^2 Var[X]
If X & Y are independent, then
E[X•Y] = E[X]•E[Y]
Var[X+Y] = Var[X] + Var[Y]
(These two equalities hold for independent r.v.'s, but not in general.)
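The "independent only" caveat in the last two identities is easy to see concretely. Below is an exact Python check (my own sketch, not from the slides) using two fair dice: X and Y (the two dice) are independent, while the pair (X, X) serves as the dependent counterexample.

```python
# Exact check of E[XY] = E[X]E[Y] and Var[X+Y] = Var[X]+Var[Y] for independent dice,
# and of how both fail for the dependent pair (X, X).
from fractions import Fraction
from itertools import product

pmf = {(a, b): Fraction(1, 36) for a, b in product(range(1, 7), repeat=2)}  # joint pmf

def E(f):
    """Expectation of f(outcome) under the joint pmf."""
    return sum(f(o) * p for o, p in pmf.items())

def var(f):
    return E(lambda o: f(o) ** 2) - E(f) ** 2

X = lambda o: o[0]
Y = lambda o: o[1]
print(E(lambda o: X(o) * Y(o)) == E(X) * E(Y))         # True:  E[XY] = E[X]E[Y]
print(E(lambda o: X(o) * X(o)) == E(X) ** 2)           # False: X is not indep. of itself
print(var(lambda o: X(o) + Y(o)) == var(X) + var(Y))   # True:  variance adds
print(var(lambda o: X(o) + X(o)) == 2 * var(X))        # False: it's 4*Var[X], not 2*Var[X]
```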
random variables – summary
Important examples:
Bernoulli: P(X = 1) = p, P(X = 0) = 1-p;  μ = p, σ^2 = p(1-p)
Binomial: P(X = k) = (n choose k) p^k (1-p)^(n-k), k = 0, …, n;  μ = np, σ^2 = np(1-p)
Poisson: P(X = i) = e^(-λ) λ^i / i!, i = 0, 1, 2, …;  μ = λ, σ^2 = λ
Bin(n,p) ≈ Poi(λ) where λ = np fixed, n → ∞ (and so p = λ/n → 0)
Geometric: P(X = k) = (1-p)^(k-1) p, k = 1, 2, …;  μ = 1/p, σ^2 = (1-p)/p^2
Many others, e.g., hypergeometric
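As a closing illustration of the Bin(n,p) ≈ Poi(λ = np) line in the summary, here is a short Python sketch (my own example, not from the slides; n and p are made-up values) that prints the two pmfs side by side for a large-n, small-p case.

```python
# Compare the Bin(n, p) pmf with the Poi(lambda = n*p) pmf for large n, small p.
from math import comb, exp, factorial

n, p = 1000, 0.005          # made-up parameters; lam = 5
lam = n * p

for k in range(8):
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    poisson = exp(-lam) * lam**k / factorial(k)
    print(f"k={k}: Bin={binom:.5f}  Poi={poisson:.5f}")
```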