CSE 312, 2012 Autumn, W.L.Ruzzo
5. independence
independence
Defn: Two events E and F are independent if
P(EF) = P(E) P(F)
If P(F)>0, this is equivalent to: P(E|F) = P(E) (proof below)
Otherwise, they are called dependent
2
independence
Roll two dice, yielding values D1 and D2
1) E = { D1 = 1 }
F = { D2 = 1 }
P(E) = 1/6, P(F) = 1/6, P(EF) = 1/36
P(EF) = P(E)•P(F) ⇒ E and F independent
Intuitive: the two dice are not physically coupled
2) G = {D1 + D2 = 5} = {(1,4),(2,3),(3,2),(4,1)}
P(E) = 1/6, P(G) = 4/36 = 1/9, P(EG) = 1/36
P(EG) = 1/36 ≠ 1/54 = P(E)•P(G) ⇒ not independent!
E, G are dependent events
The dice are still not physically coupled, but “D1 + D2 = 5”
couples them mathematically: info about D1 constrains D2.
(But dependence/independence not always intuitively obvious;
“use the definition, Luke”.)
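One way to double-check such claims is to enumerate the 36 equally likely outcomes directly. The Python sketch below (mine, not part of the slides) tests both pairs of events against the definition P(EF) = P(E)P(F).

from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # all 36 (D1, D2) pairs

def prob(event):
    # probability of an event under equally likely outcomes
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

E = lambda o: o[0] == 1          # D1 = 1
F = lambda o: o[1] == 1          # D2 = 1
G = lambda o: o[0] + o[1] == 5   # D1 + D2 = 5

# E, F independent: P(EF) == P(E) P(F)
print(prob(lambda o: E(o) and F(o)) == prob(E) * prob(F))   # True
# E, G dependent: P(EG) != P(E) P(G)  (1/36 vs 1/54)
print(prob(lambda o: E(o) and G(o)) == prob(E) * prob(G))   # False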
3
independence
Two events E and F are independent if
P(EF) = P(E) P(F)
If P(F)>0, this is equivalent to: P(E|F) = P(E)
Otherwise, they are called dependent
Three events E, F, G are independent if
P(EF) = P(E) P(F), P(EG) = P(E) P(G), P(FG) = P(F) P(G),
and
P(EFG) = P(E) P(F) P(G)
Example: Let X, Y each be -1 or +1 with equal probability, independently
E = {X = 1}, F = {Y = 1}, G = { XY = 1}
P(EF) = P(E)P(F), P(EG) = P(E)P(G), P(FG) = P(F)P(G),
but P(EFG) = 1/4 ≠ 1/8 = P(E)P(F)P(G)
4
independence
In general, events E1, E2, …, En are independent if
for every subset S of {1,2,…, n}, we have
P(∩i∈S Ei) = Πi∈S P(Ei)
(Sometimes this property holds only for small
subsets S. E.g., E, F, G on the previous slide are
pairwise independent, but not fully independent.)
5
independence
Theorem: E, F independent ⇒ E, Fc independent
Proof: E = EF ∪ EFc, a disjoint union, so
P(EFc) = P(E) – P(EF)
       = P(E) – P(E) P(F)
       = P(E) (1 – P(F))
       = P(E) P(Fc)
(Venn diagram: sample space S containing overlapping events E and F)
Theorem: if P(E)>0, P(F)>0, then
E, F independent ⇔ P(E|F) = P(E) ⇔ P(F|E) = P(F)
Proof: Note P(EF) = P(E|F) P(F), regardless of in/dependence.
Assume independent. Then
P(E)P(F) = P(EF) = P(E|F) P(F) ⇒ P(E|F) = P(E)   (÷ by P(F))
Conversely, P(E|F) = P(E) ⇒ P(E)P(F) = P(EF)   (× by P(F))
(The P(F|E) = P(F) equivalence follows by the same argument with E, F swapped.)
6
biased coin
Suppose a biased coin comes up heads with
probability p, independent of other flips
P(n heads in n flips) = p^n
P(n tails in n flips) = (1-p)^n
P(exactly k heads in n flips) = C(n,k) p^k (1-p)^(n-k)
Aside: note that the probability of some number of heads is
Σ(k=0..n) C(n,k) p^k (1-p)^(n-k) = (p + (1-p))^n = 1,
as it should be, by the binomial theorem.
7
biased coin
Suppose a biased coin comes up heads with
probability p, independent of other flips
P(exactly k heads in n flips) = C(n,k) p^k (1-p)^(n-k)
Note when p=1/2, this is the same result we would
have gotten by considering n flips in the “equally
likely outcomes” scenario. But p≠1/2 makes that
inapplicable. Instead, the independence assumption
allows us to conveniently assign a probability to each
of the 2^n outcomes, e.g.:
Pr(HHTHTTT) = p^2 (1-p) p (1-p)^3 = p^#H (1-p)^#T
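As a sanity check on the formula above, the following Python sketch (my addition, not from the slides) computes P(exactly k heads in n flips) two ways: via the closed form C(n,k) p^k (1-p)^(n-k), and by summing p^#H (1-p)^#T over all 2^n outcomes with k heads.

from math import comb
from itertools import product

def binom_pmf(k, n, p):
    # closed form: C(n,k) p^k (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_pmf_bruteforce(k, n, p):
    # sum p^#H (1-p)^#T over all 2^n outcomes with exactly k heads
    total = 0.0
    for flips in product("HT", repeat=n):
        if flips.count("H") == k:
            total += p**k * (1 - p)**(n - k)
    return total

n, p = 7, 0.3
for k in range(n + 1):
    assert abs(binom_pmf(k, n, p) - binom_pmf_bruteforce(k, n, p)) < 1e-12
print(sum(binom_pmf(k, n, p) for k in range(n + 1)))   # 1.0, by the binomial theorem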
8
hashing
A data structure problem: fast access to small subset
of data drawn from a large space.
(Diagram: D = a large space of potential data items, say names or SSNs,
only a few of which are actually used; h maps an item x ∈ D to a slot i
in R = {0, ..., n-1}, a small hash table containing the actual data.)
A solution: hash function h:D→{0,...,n-1}
crunches/scrambles names from large space into small
one. E.g., if x is integer:
h(x) = x mod n
Good hash functions approximately randomize placement.
10
hashing
m strings hashed (uniformly) into a table with n
buckets
Each string hashed is an independent trial
E = at least one string hashed to first bucket
What is P(E) ?
Solution:
Fi = string i not hashed into first bucket (i=1,2,…,m)
P(Fi) = 1 – 1/n = (n-1)/n for all i=1,2,…,m
Event (F1 F2 ⋯ Fm) = no strings hashed to first bucket
P(E) = 1 – P(F1 F2 ⋯ Fm)
     = 1 – P(F1) P(F2) ⋯ P(Fm)      (independence)
     = 1 – ((n-1)/n)^m
     ≈ 1 – exp(-m/n)
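A small Python sketch (mine, not from the slides) comparing the exact answer 1 - ((n-1)/n)^m, the exp approximation, and a direct simulation of hashing m strings uniformly into n buckets:

import math, random

def p_first_bucket_hit(m, n):
    # exact, assuming uniform, independent hashing
    return 1 - ((n - 1) / n) ** m

def p_first_bucket_hit_sim(m, n, trials=100_000):
    hits = 0
    for _ in range(trials):
        if any(random.randrange(n) == 0 for _ in range(m)):
            hits += 1
    return hits / trials

m, n = 100, 1000
print(p_first_bucket_hit(m, n))       # 0.0952...
print(1 - math.exp(-m / n))           # 0.0952..., close approximation
print(p_first_bucket_hit_sim(m, n))   # ~0.095 from simulation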
11
hashing
m strings hashed (non-uniformly) to table w/ n
buckets
Each string hashed is an independent trial, with
probability pi of getting hashed to bucket i
E = At least 1 of buckets 1 to k gets ≥ 1 string
What is P(E) ?
Solution:
Fi = at least one string hashed into i-th bucket
P(E) = P(F1 ∪ ⋯ ∪ Fk) = 1-P((F1 ∪ ⋯ ∪ Fk)c)
= 1 – P(F1c F2c … Fkc)
= 1 – P(no strings hashed to buckets 1 to k)
= 1 – (1-p1-p2-⋯-pk)^m
12
hashing
Let D0 ⊆ D be a fixed set of m strings, R = {0,...,n-1}. A
hash function h:D→R is perfect for D0 if h:D0→R is
injective (no collisions). How hard is it to find a perfect
hash function?
1) Fix h; pick the m elements of D0 independently at random from D
Suppose h maps ≈ (1/n)th of D to each element of R.
This is like the birthday problem:
P(h is perfect for D0) ≈ (n/n)((n-1)/n)((n-2)/n)⋯((n-m+1)/n) = Π(i=0..m-1) (1 - i/n)
13
Caution: this analysis is heuristic, not rigorous, but still useful.
hashing
Let D0 ⊆ D be a fixed set of m strings, R = {0,...,n-1}. A
hash function h:D→R is perfect for D0 if h:D0→R is
injective (no collisions). How hard is it to find a perfect
hash function?
2) Fix D0; pick h at random
E.g., if m = |D0| = 23 and n = 365, then there is ~50%
chance that h is perfect for this fixed D0. If it isn’t, pick
h’, h’’, etc. With high probability, you’ll quickly find a
perfect one!
“Picking a random function h” is easier said than done,
but, empirically, picking at random among a set of functions like
h(x) = (a•x + b) mod n
works well in many applications.
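A minimal Python sketch of scenario 2 (my illustration, not from the slides): repeatedly draw random a, b for the family h(x) = (a•x + b) mod n and stop once h is injective (perfect) on a fixed D0. The specific family and parameter choices here are assumptions made just for illustration.

import random

def random_hash(n):
    # one member of the family h(x) = (a*x + b) mod n, with random a, b
    a, b = random.randrange(1, n), random.randrange(n)
    return lambda x: (a * x + b) % n

def find_perfect_hash(D0, n, max_tries=1000):
    # keep drawing a random h until it is injective (perfect) on D0
    for attempt in range(1, max_tries + 1):
        h = random_hash(n)
        if len({h(x) for x in D0}) == len(D0):
            return h, attempt
    raise RuntimeError("no perfect hash found")

D0 = random.sample(range(10**6), 23)      # m = 23 fixed keys
h, tries = find_perfect_hash(D0, 365)     # n = 365 buckets
print(f"found a perfect hash after {tries} tries")   # typically a handful of tries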
14
network failure
Consider the following parallel network:
(diagram: n routers in parallel, with failure probabilities p1, p2, …, pn)
n routers, the ith has probability pi of failing, independently
P(there is a functional path) = 1 – P(all routers fail)
                              = 1 – p1 p2 ⋯ pn
15
network failure
Contrast: a series network
(diagram: n routers in series, with failure probabilities p1, p2, …, pn)
n routers, the ith has probability pi of failing, independently
P(there is a functional path) = P(no routers fail)
                              = (1 – p1)(1 – p2) ⋯ (1 – pn)
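Both reliability formulas are one-liners to evaluate; here is a small Python sketch (mine, not from the slides):

from math import prod

def parallel_ok(fail_probs):
    # functional iff at least one router works: 1 - P(all fail)
    return 1 - prod(fail_probs)

def series_ok(fail_probs):
    # functional iff every router works: product of (1 - p_i)
    return prod(1 - p for p in fail_probs)

ps = [0.1, 0.2, 0.05]
print(parallel_ok(ps))   # 0.999  = 1 - 0.1*0.2*0.05
print(series_ok(ps))     # 0.684  = 0.9*0.8*0.95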
16
deeper into independence
Recall: Two events E and F are independent if
P(EF) = P(E) P(F)
If E & F are independent, does that tell us anything
about
P(EF|G), P(E|G), P(F|G),
when G is an arbitrary event? In particular, is
P(EF|G) = P(E|G) P(F|G) ?
In general, no.
17
deeper into independence
Roll two 6-sided dice, yielding values D1 and D2
E = { D1 = 1 }
F = { D2 = 6 }
G = { D1 + D2 = 7 }
E and F are independent
P(E|G) = 1/6
P(F|G) = 1/6, but
P(EF|G) = 1/6, not 1/36
so E|G and F|G are not independent!
18
conditional independence
Definition:
Two events E and F are called conditionally
independent given G, if
P(EF|G) = P(E|G) P(F|G)
Or, equivalently (assuming P(F)>0, P(G)>0),
P(E|FG) = P(E|G)
19
do CSE majors get fewer A’s?
Say you are in a dorm with 100 students
10 are CS majors: P(C) = 0.1
30 get straight A’s: P(A) = 0.3
3 are CS majors who get straight A’s
P(CA) = 0.03
P(CA) = P(C) P(A), so C and A independent
At faculty night, only CS majors and A students show
up
So 37 students arrive
Of 37 students, 10 are CS ⇒ P(C | C or A) = 10/37 ≈ 0.27
but only 3 of the 30 straight-A students are CS ⇒ P(C | A) = 3/30 = 0.1 < 0.27
Seems straight A’s lower your chance of being a CS major, i.e.,
being a CS major lowers your chance of straight A’s
☹
Weren’t they supposed to be independent?
In fact, C and A are conditionally dependent, given presence at faculty night
20
conditioning can also break DEPENDENCE
Randomly choose a day of the week
A = { It is not a Monday }
B = { It is a Saturday }
C = { It is the weekend }
A and B are dependent events
P(A) = 6/7, P(B) = 1/7, P(AB) = 1/7.
Now condition both A and B on C:
P(A|C) = 1, P(B|C) = ½, P(AB|C) = ½
P(AB|C) = P(A|C) P(B|C) ⇒ A|C and B|C independent
Dependent events can become independent
by conditioning on additional information!
(Another reason why conditioning is so useful)
22
independence: summary
Events E & F are independent if
P(EF) = P(E) P(F), or, equivalently, P(E|F) = P(E) (if P(F)>0)
More than 2 events are independent if, for all subsets, the joint
probability = the product of the separate event probabilities
Independence can greatly simplify calculations
For fixed G, conditioning on G gives a probability
measure,
P(E|G)
But “conditioning” and “independence” are orthogonal:
Events E & F that are (unconditionally) independent
may become dependent when conditioned on G
Events that are (unconditionally) dependent may
become independent when conditioned on G
23
CSE 312, 2012 Autumn, W.L.Ruzzo
6. random variables
random variables
A random variable is some numeric function of the
outcome, not the outcome itself. (Technically, neither random nor
a variable, but...)
Ex.
Let H be the number of Heads when 20 coins are tossed
Let T be the total of 2 dice rolls
Let X be the number of coin tosses needed to see 1st head
Note: even if the underlying experiment has “equally
likely outcomes,” the associated random variable may not

Outcome   H   P(H)
TT        0   P(H=0) = 1/4
TH        1   } P(H=1) = 1/2
HT        1   }
HH        2   P(H=2) = 1/4
25
numbered balls
26
first head
Flip a (biased) coin repeatedly until 1st head observed
How many flips? Let X be that number.
P(X=1) = P(H) = p
P(X=2) = P(TH) = (1-p)p
P(X=3) = P(TTH) = (1-p)^2 p
...
P(X=k) = P(T⋯TH) = (1-p)^(k-1) p      ☜ memorize me!
Check that it is a valid probability distribution:
1) (1-p)^(k-1) p ≥ 0 for every k ≥ 1
2) Σ(k≥1) (1-p)^(k-1) p = p • 1/(1-(1-p)) = 1    (geometric series)
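A short Python sketch (mine, not from the slides) of the “first head” distribution: tabulate P(X=k) = (1-p)^(k-1) p, confirm the probabilities sum to (nearly) 1, and compare against a simulation.

import random

p = 0.3

def pmf(k):
    # first head on flip k: k-1 tails, then a head
    return (1 - p) ** (k - 1) * p

print(sum(pmf(k) for k in range(1, 200)))     # ≈ 1.0 (geometric series)

def flips_until_head():
    k = 1
    while random.random() >= p:               # tails with prob 1-p
        k += 1
    return k

trials = 100_000
samples = [flips_until_head() for _ in range(trials)]
print(samples.count(3) / trials, pmf(3))      # both ≈ 0.147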
27
probability mass functions
28
head count
(plots: pmf of the number of heads in n coin flips, for n=2 and n=8)
29
cumulative distribution function
(plots: a pmf and the corresponding cdf)
NB: for discrete random variables, be careful about “≤” vs “<”
30
why random variables
Why use random variables?
A. Often we just care about numbers
If I win $1 per head when 20 coins are tossed, what is my average
winnings? What is the most likely number? What is the probability
that I win < $5? ...
B. It cleanly abstracts away from unnecessary detail
about the experiment/sample space; PMF is all we
need.
Outcome   H   P(H)
TT        0   P(H=0) = 1/4
TH        1   } P(H=1) = 1/2
HT        1   }
HH        2   P(H=2) = 1/4
→ the same PMF could equally well have come from a messier experiment, e.g.:
Flip 7 coins, roll 2 dice, and throw a dart; if dart landed in sector =
dice roll mod #heads, then X = ...
31
expectation
32
expectation
E[X] = Σx x•p(x): the average of the random values,
weighted by their respective
probabilities
33
expectation
average of random values,
weighted by their respective
probabilities
34
expectation
average of random values,
weighted by their respective
probabilities
35
first head
E[X] = Σ(k≥1) k (1-p)^(k-1) p = p • Σ(k≥1) k y^(k-1), where y = 1-p
     = p • d/dy [ Σ(k≥0) y^k ] = p • d/dy [ 1/(1-y) ] = p/(1-y)^2 = 1/p
(including the k=0 term is harmless, since d y^0/dy = 0)
How much
would you
pay to play?
36
how many heads
How much would
you pay to play?
37
expectation of a function of a random variable
X = sum of two fair dice; Y = g(X), where g(i) = i mod 5

 i    p(i) = P[X=i]    i•p(i)
 2    1/36             2/36
 3    2/36             6/36
 4    3/36             12/36
 5    4/36             20/36
 6    5/36             30/36
 7    6/36             42/36
 8    5/36             40/36
 9    4/36             36/36
10    3/36             30/36
11    2/36             22/36
12    1/36             12/36

E[X] = Σi i•p(i) = 252/36 = 7

 j    q(j) = P[Y = j]            j•q(j)
 0    4/36+3/36      = 7/36      0/36
 1    5/36+2/36      = 7/36      7/36
 2    1/36+6/36+1/36 = 8/36      16/36
 3    2/36+5/36      = 7/36      21/36
 4    3/36+4/36      = 7/36      28/36

E[Y] = Σj j•q(j) = 72/36 = 2
38
expectation of a function of a random variable
X = sum of two fair dice; Y = g(X), where g(i) = i mod 5

 i    p(i) = P[X=i]    g(i)•p(i)
 2    1/36             2/36
 3    2/36             6/36
 4    3/36             12/36
 5    4/36             0/36
 6    5/36             5/36
 7    6/36             12/36
 8    5/36             15/36
 9    4/36             16/36
10    3/36             0/36
11    2/36             2/36
12    1/36             2/36

E[g(X)] = Σi g(i)•p(i) = 72/36 = 2

 j    q(j) = P[Y = j]            j•q(j)
 0    4/36+3/36      = 7/36      0/36
 1    5/36+2/36      = 7/36      7/36
 2    1/36+6/36+1/36 = 8/36      16/36
 3    2/36+5/36      = 7/36      21/36
 4    3/36+4/36      = 7/36      28/36

E[Y] = Σj j•q(j) = 72/36 = 2

Same answer either way: E[g(X)] = Σi g(i)p(i) = E[Y], without ever tabulating Y’s distribution.
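A quick Python check (my sketch, not from the slides) that the two computations above agree: E[Y] from Y’s own pmf versus E[g(X)] = Σ g(i) p(i) taken directly over X’s pmf, with X the sum of two dice and g(i) = i mod 5.

from fractions import Fraction
from itertools import product
from collections import Counter

# pmf of X = sum of two fair dice
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
p = {i: Fraction(c, 36) for i, c in counts.items()}

g = lambda i: i % 5

# E[g(X)] directly from X's pmf
E_gX = sum(g(i) * p_i for i, p_i in p.items())

# E[Y] from Y's own pmf
q = Counter()
for i, p_i in p.items():
    q[g(i)] += p_i
E_Y = sum(j * q_j for j, q_j in q.items())

print(E_gX, E_Y)   # both 2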
39
expectation of a function of a random variable
B&T pg. 84-85
(diagram: g maps domain points xi1, xi2, …, xi6 of X to range points yj1, yj2, yj3 of Y)
Note that Sj = { xi | g(xi)=yj } is a
partition of the domain of g.
40
properties of expectation
A & B each bet $1, then flip 2 coins:
  HH → A wins $2
  HT → each takes back $1
  TH → each takes back $1
  TT → B wins $2
Let X be A’s net gain: +1, 0, -1, resp.:
P(X = +1) = 1/4
P(X = 0) = 1/2
What is E[X]?
P(X = -1) = 1/4
E[X] = 1•1/4 + 0•1/2 + (-1)•1/4 = 0
What is E[X^2]?
E[X^2] = 1^2•1/4 + 0^2•1/2 + (-1)^2•1/4 = 1/2
Note: E[X^2] ≠ E[X]^2
41
properties of expectation
Linearity of expectation, I
For any constants a, b: E[aX + b] = aE[X] + b
Proof: E[aX + b] = Σx (ax+b) p(x) = a Σx x p(x) + b Σx p(x) = aE[X] + b
Example:
Q: In the 2-person coin game above, what is E[2X+1]?
A: E[2X+1] = 2E[X]+1 = 2•0 + 1 = 1
42
properties of expectation
Linearity, II
Let X and Y be two random variables derived from
outcomes of a single experiment. Then
E[X+Y] = E[X] + E[Y]
True even if X, Y
dependent
Proof: Assume the sample space S is countable. (The result is
true without this assumption, but I won’t prove it.) Let X(s), Y(s) be
the values of these r.v.’s for outcome s∈S.
Claim: E[X] = Σs∈S X(s) p(s)
Proof: similar to that for “expectation of a function of an r.v.,” i.e.,
the events “X=x” partition S, so the sum above can be rearranged to
match the definition of E[X] = Σx x P(X=x)
Then:
E[X+Y] = Σs∈S (X[s] + Y[s]) p(s)
       = Σs∈S X[s] p(s) + Σs∈S Y[s] p(s) = E[X] + E[Y]
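Linearity needs no independence; the tiny Python sketch below (mine, not from the slides) checks E[X+Y] = E[X] + E[Y] on an experiment where X and Y are clearly dependent (both are functions of the same dice).

from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # two dice, equally likely
p = Fraction(1, len(outcomes))

X = lambda s: s[0]              # first die
Y = lambda s: s[0] + s[1]       # sum of both dice: dependent on X

E = lambda f: sum(f(s) * p for s in outcomes)

print(E(lambda s: X(s) + Y(s)))     # 21/2
print(E(X) + E(Y))                  # 21/2, equal despite the dependence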
43
properties of expectation
Example
X = # of heads in one coin flip, where P(X=1) = p.
What is E(X)?
E[X] = 1•p + 0 •(1-p) = p
Let Xi, 1 ≤ i ≤ n, be the # of H in a flip of a coin with P(Xi=1) = pi
What is the expected number of heads when all are flipped?
E[Σi Xi] = Σi E[Xi] = Σi pi
☜ Compare to slide 35
Special case: p1 = p2 = ... = p :
E[# of heads in n flips] = pn
44
properties of expectation
Note:
Linearity is special!
It is not true in general that
  E[X•Y]      = E[X] • E[Y]
  E[X^2]      = E[X]^2              ← counterexample above
  E[X/Y]      = E[X] / E[Y]
  E[asinh(X)] = asinh(E[X])
  ...
45
variance
46
risk
Alice & Bob are gambling (again). X = Alice’s gain per
flip:
E[X] = 0
. . . Time passes . . .
Alice (yawning) says “let’s raise the stakes”
E[Y] = 0, as before.
47
E[X] measures the “average” or “central tendency” of X.
What about its variability?
If E[X] = μ, then E[|X-μ|] seems like a natural quantity to
look at: how much do we expect X to deviate from its
average? Unfortunately, it’s a bit inconvenient
mathematically; the following is easier/more common.
Definition
The variance of a random variable X with mean E[X] =
μ is
Var[X] = E[(X-μ)2], often denoted σ2.
The standard deviation of X is σ = √Var[X]
48
what does variance tell us?
The variance of a random variable X with mean E[X] =
μ is
Var[X] = E[(X-μ)2], often denoted σ2.
I: The square is always ≥ 0, and exaggerated as X moves away
from μ, so Var[X] emphasizes deviation from the mean.
II: Numbers vary a lot depending on exact distribution
of X, but typically X is
within μ ± σ ~66% of the time, and
within μ ± 2σ ~95% of the time.
(We’ll see the reasons for this soon.)
49
mean and variance
μ = E[X] is about location; σ = √Var(X) is about spread
(histograms: # heads in 20 flips, p=.5, μ marked, σ ≈ 2.2;
 # heads in 150 flips, p=.5, μ marked, σ ≈ 6.1)
(and note σ bigger in absolute terms in second ex., but smaller as a proportion of max.)
50
risk
Alice & Bob are gambling (again). X = Alice’s gain per
flip:
E[X] = 0
Var[X] = 1
. . . Time passes . . .
Alice (yawning) says “let’s raise the stakes”
E[Y] = 0, as before.
Var[Y] = 1,000,000
51
example
Two games:
a) flip 1 coin, win Y = $100 if heads, $-100 if tails
b) flip 100 coins, win Z = (#(heads) - #(tails)) dollars
Same expectation in both: E[Y] = E[Z] = 0
Same extremes in both: max gain = $100; max loss = $100
But the variability is very different:
  σY = 100
  σZ = 10
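Where σZ = 10 comes from: Z = #heads - #tails = 2•#heads - 100, so Var[Z] = 4•Var[#heads] = 4•100•(1/2)(1/2) = 100 (using Var[aX+b] = a^2 Var[X] and Var of a binomial = np(1-p), both covered later in these slides). The Python sketch below (mine, not from the slides) confirms both standard deviations by simulation.

import random, statistics

def game_a():
    # one coin: win $100 on heads, lose $100 on tails
    return 100 if random.random() < 0.5 else -100

def game_b():
    # 100 coins: win (#heads - #tails) dollars
    heads = sum(random.random() < 0.5 for _ in range(100))
    return heads - (100 - heads)

trials = 100_000
a = [game_a() for _ in range(trials)]
b = [game_b() for _ in range(trials)]
print(statistics.stdev(a), statistics.stdev(b))   # ≈ 100 and ≈ 10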
more variance examples
X1 = sum of 2 fair dice, minus 7                        σ^2 ≈ 5.83
X2 = fair 11-sided die labeled -5, ..., 5               σ^2 = 10
X3 = Y - 6•signum(Y), where Y is the difference
     of 2 fair dice, given no doubles                   σ^2 = 15
X4 = the common value when 3 pairs of dice
     all give the same X3                               σ^2 ≈ 19.7
(NB: Wow, kinda complex; see slide 29)
53
properties of variance
54
properties of variance
Example:
What is Var[X] when X is outcome of one fair die?
E[X] = 7/2 and E[X^2] = (1^2 + 2^2 + ⋯ + 6^2)/6 = 91/6, so
Var[X] = E[X^2] – (E[X])^2 = 91/6 – 49/4 = 35/12 ≈ 2.92
55
properties of variance
NOT linear;
insensitive to location (b), quadratic in scale (a): Var[aX+b] = a^2 Var[X]
Example: E[X] = 0, Var[X] = 1; let Y = 1000 X
E[Y] = E[1000 X] = 1000 E[X] = 0
Var[Y] = Var[1000 X] = 10^6 Var[X] = 10^6
56
properties of variance
NOT linear
57
58
r.v.s and independence
Defn: Random variable X and event E are independent
if the event E is independent of the event {X=x} (for any
fixed x), i.e.
∀x  P({X = x} & E) = P({X=x}) • P(E)
Defn: Two random variables X and Y are independent if
the events {X=x} and {Y=y} are independent (for any
fixed x, y), i.e.
∀x, y  P({X = x} & {Y=y}) = P({X=x}) • P({Y=y})
Intuition as before: knowing X doesn’t help you guess Y
or E and vice versa.
59
r.v.s and independence
Random variable X and event E are independent if
∀x  P({X = x} & E) = P({X=x}) • P(E)
Ex 1: Roll a fair die to obtain a random number 1 ≤ X ≤ 6, then flip
a fair coin X times. Let E be the event that the number of heads is
even.
P({X=x}) = 1/6 for any 1 ≤ x ≤ 6,
P(E) = 1/2
P( {X=x} & E ) = 1/6 • 1/2, so they are independent
Ex 2: as above, and let F be the event that the total number of
heads = 6.
P(F) = 2^(-6)/6 > 0, and considering, say, X=4, we have P(X=4) = 1/6
> 0 (as above), but P({X=4} & F) = 0, since you can’t see 6 heads
in 4 flips. So X & F are dependent. (Knowing that X is small
renders F impossible; knowing that F happened means X must be 6.)
61
r.v.s and independence
Two random variables X and Y are independent if the events
{X=x} and {Y=y} are independent (for any x, y), i.e.
∀x, y  P({X = x} & {Y=y}) = P({X=x}) • P({Y=y})
Ex: Let X be number of heads in first n of 2n coin flips, Y be
number in the last n flips, and let Z be the total. X and Y are
independent:
P({X=x} & {Y=y}) = P({X=x}) • P({Y=y}), since they are determined by
disjoint, independent sets of flips.
But X and Z are not independent, since, e.g., knowing that X = 0
precludes Z > n. E.g., P(X = 0) and P(Z = n+1) are both positive, but
P({X = 0} & {Z = n+1}) = 0.
62
joint distributions
Often, several random variables are simultaneously
observed
X = height and Y = weight
X = cholesterol and Y = blood pressure
X1, X2, X3 = work loads on servers A, B, C
Joint probability mass function:
fXY(x, y) = P({X = x} & {Y = y})
Joint cumulative distribution function:
FXY(x, y) = P({X ≤ x} & {Y ≤ y})
63
examples
Two joint PMFs

        Z=1   Z=2   Z=3                   Y=1   Y=2   Y=3
 W=1    2/24  2/24  2/24           X=1    4/24  1/24  1/24
 W=2    2/24  2/24  2/24           X=2     0    3/24  3/24
 W=3    2/24  2/24  2/24           X=3     0    4/24  2/24
 W=4    2/24  2/24  2/24           X=4    4/24   0    2/24

P(W = Z) = 3 * 2/24 = 6/24
P(X = Y) = (4 + 3 + 2)/24 = 9/24
Can look at arbitrary relationships among variables this
way
64
sampling from a joint distribution
(scatter plots: top row, independent variables; bottom row, dependent
variables, with a simple linear dependence)
65
another example
Flip n fair coins
X = #Heads seen in first n/2+k
Y = #Heads seen in last n/2+k
66
marginal distributions
Two joint PMFs
        Z=1   Z=2   Z=3   fW(w)            Y=1   Y=2   Y=3   fX(x)
 W=1    2/24  2/24  2/24  6/24      X=1    4/24  1/24  1/24  6/24
 W=2    2/24  2/24  2/24  6/24      X=2     0    3/24  3/24  6/24
 W=3    2/24  2/24  2/24  6/24      X=3     0    4/24  2/24  6/24
 W=4    2/24  2/24  2/24  6/24      X=4    4/24   0    2/24  6/24
 fZ(z)  8/24  8/24  8/24            fY(y)  8/24  8/24  8/24
Marginal PMF of one r.v.: sum over the other (Law of total probability):
fX(x) = Σy fXY(x,y)        fY(y) = Σx fXY(x,y)
Question: Are W & Z independent? Are X & Y
independent?
67
joint, marginals and independence
Repeating the Definition: Two random variables X and
Y are independent if the events {X=x} and {Y=y} are
independent (for any fixed x, y), i.e.
∀x, y P({X = x} & {Y=y}) = P({X=x}) • P({Y=y})
Equivalent Definition: Two random variables X and Y
are independent if their joint probability mass function is
the product of their marginal distributions, i.e.
∀x, y fXY(x,y) = fX(x) • fY(y)
Exercise: Show that this is also true of their cumulative
distribution functions
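A short Python sketch (mine, not from the slides) of exactly this check for the two tables above: compute each marginal by summing, then test whether fXY(x,y) = fX(x)•fY(y) at every cell.

from fractions import Fraction

F = Fraction
# joint PMFs from the tables above, as {(row value, column value): prob}
WZ = {(w, z): F(2, 24) for w in range(1, 5) for z in range(1, 4)}
XY = {(1,1): F(4,24), (1,2): F(1,24), (1,3): F(1,24),
      (2,1): F(0),    (2,2): F(3,24), (2,3): F(3,24),
      (3,1): F(0),    (3,2): F(4,24), (3,3): F(2,24),
      (4,1): F(4,24), (4,2): F(0),    (4,3): F(2,24)}

def independent(joint):
    rows = {x for x, _ in joint}
    cols = {y for _, y in joint}
    fx = {x: sum(joint[x, y] for y in cols) for x in rows}   # marginal of row variable
    fy = {y: sum(joint[x, y] for x in rows) for y in cols}   # marginal of column variable
    return all(joint[x, y] == fx[x] * fy[y] for x in rows for y in cols)

print(independent(WZ))   # True:  W and Z are independent
print(independent(XY))   # False: X and Y are not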
68
expectation of a function of 2 r.v.’s
A function g(X, Y) defines a new random variable.
Its expectation is:
E[g(X, Y)] = ΣxΣy g(x, y) fXY(x,y)
☜ like slide 38
Expectation is linear. E.g., if g is linear:
E[g(X, Y)] = E[a X + b Y + c] = a E[X] + b E[Y] + c
Example: g(X, Y) = 2X - Y, using the joint PMF fXY of the X, Y example above

        Y=1        Y=2        Y=3
 X=1   1 • 4/24   0 • 1/24   -1 • 1/24
 X=2   3 • 0/24   2 • 3/24    1 • 3/24
 X=3   5 • 0/24   4 • 4/24    3 • 2/24
 X=4   7 • 4/24   6 • 0/24    5 • 2/24
(each cell: g(x,y) • fXY(x,y))

E[g(X,Y)] = Σx Σy g(x,y) fXY(x,y) = 72/24 = 3
E[g(X,Y)] = 2•E[X] - E[Y] = 2•2.5 - 2 = 3
(recall both marginals are uniform)
69
products of independent r.v.s
Theorem: If X & Y are independent, then E[X•Y] =
E[X]•E[Y]
Proof:
E[X•Y] = Σx Σy x y fXY(x,y)
       = Σx Σy x y fX(x) fY(y)                     (independence)
       = (Σx x fX(x)) • (Σy y fY(y)) = E[X] • E[Y]
Note: NOT true in general; see earlier example E[X^2] ≠ E[X]^2
70
a zoo of (discrete)
random variables
71
bernoulli random variables
A single experiment with outcomes “Success” or “Failure”
X is a random indicator variable (1 = success, 0 = failure)
P(X=1) = p and P(X=0) = 1-p
X is called a Bernoulli random variable: X ~ Ber(p)
E[X] = E[X^2] = p
Var(X) = E[X^2] – (E[X])^2 = p – p^2 = p(1-p)
Examples:
coin flip
random binary digit
whether a disk drive crashed
Jacob (aka James,
Jacques) Bernoulli, 1654
– 1705
72
binomial random variables
Consider n independent random variables Yi ~ Ber(p)
X = Σi Yi is the number of successes in n trials
X is a Binomial random variable: X ~ Bin(n,p)
P(X=k) = C(n,k) p^k (1-p)^(n-k),   k = 0, 1, ..., n
By the Binomial theorem, these sum to (p + (1-p))^n = 1
Examples:
# of heads in n coin flips
# of 1’s in a randomly generated length n bit string
# of disk drive crashes in a 1000 computer cluster
E[X] = pn
Var(X) = p(1-p)n
proof coming
73
binomial pmfs
74
variance of independent r.v.s is additive
(Bienaymé, 1853)
Theorem: If X & Y are independent, then
Var[X+Y] = Var[X]+Var[Y]
Proof: Let μX = E[X], μY = E[Y]. Then
Var(X+Y) = E[(X+Y)^2] – (μX + μY)^2
         = E[X^2] + 2E[XY] + E[Y^2] – μX^2 – 2μXμY – μY^2
         = Var(X) + Var(Y) + 2(E[XY] – E[X]E[Y])
         = Var(X) + Var(Y)            (E[XY] = E[X]E[Y] by independence)
Contrast: Var(aX+b) = a^2 Var(X)
75
mean, variance of the binomial (II)
X ~ Bin(n,p): X = Σ(i=1..n) Yi, where the Yi ~ Ber(p) are independent
E[X]   = Σi E[Yi]   = np
Var[X] = Σi Var[Yi] = np(1-p)      (independence used for the variance)
76
disk failures
A RAID-like disk array consists of n drives,
each of which will fail independently with
probability p.
Suppose it can operate effectively if at least
one-half of its components function, e.g.,
by “majority vote.”
For what values of p is a 5-component system more
likely to operate effectively than a 3-component
system?
X5 = # failed in 5-component system ~ Bin(5, p)
X3 = # failed in 3-component system ~ Bin(3, p)
77
disk failures
X5 = # failed in 5-component system ~ Bin(5, p)
X3 = # failed in 3-component system ~ Bin(3, p)
P(5 component system effective) = P(X5 < 5/2)
  = C(5,0)(1-p)^5 + C(5,1)p(1-p)^4 + C(5,2)p^2(1-p)^3
P(3 component system effective) = P(X3 < 3/2)
  = C(3,0)(1-p)^3 + C(3,1)p(1-p)^2
Calculation: the 5-component system is better iff p < 1/2
(plot: P(system effective) vs p for n = 1, 3, 5)
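A small Python check of the claim (my sketch, not from the slides): evaluate both reliability sums over a few values of p and see that the 5-component system wins exactly when p < 1/2.

from math import comb

def p_effective(n, p):
    # system works if fewer than half of the n components fail
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, (n + 1) // 2))

for p in [0.1, 0.3, 0.49, 0.5, 0.51, 0.7]:
    p5, p3 = p_effective(5, p), p_effective(3, p)
    print(f"p={p:.2f}  P5={p5:.4f}  P3={p3:.4f}  5-better={p5 > p3}")
# the 5-component system is better for p < 1/2, tied at p = 1/2, worse for p > 1/2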
78
noisy channels
Goal: send a 4-bit message over a noisy communication channel.
Say, 1 bit in 10 is flipped in transit, independently.
What is the probability that the message arrives correctly?
Let X = # of errors; X ~ Bin(4, 0.1)
P(correct message received) = P(X=0) = (0.9)^4 ≈ 0.656
Can we do better? Yes: error correction via redundancy.
E.g., send every bit in triplicate; use majority vote.
Let Y = # of errors in one triple; Y ~ Bin(3, 0.1);
P(a triple is OK) = P(Y ≤ 1) = (0.9)^3 + 3(0.1)(0.9)^2 = 0.972
If X’ = # errors in the triplicated msg, X’ ~ Bin(4, 0.028), and
P(correct message received) = P(X’=0) = (0.972)^4 ≈ 0.893
Coding theory: good error correction with less wasted transmission
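The two success probabilities (≈0.656 uncoded, ≈0.893 with the 3x repetition code) are easy to confirm by simulation; the Python sketch below is mine, not from the slides.

import random

P_FLIP = 0.1

def send_bit(b):
    # each transmitted bit is flipped independently with prob 0.1
    return b ^ (random.random() < P_FLIP)

def send_plain(msg):
    return [send_bit(b) for b in msg]

def send_triplicated(msg):
    # send each bit 3 times, decode by majority vote
    return [int(sum(send_bit(b) for _ in range(3)) >= 2) for b in msg]

msg = [1, 0, 1, 1]
trials = 100_000
print(sum(send_plain(msg) == msg for _ in range(trials)) / trials)        # ≈ 0.656
print(sum(send_triplicated(msg) == msg for _ in range(trials)) / trials)  # ≈ 0.893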
79
models & reality
Sending a bit string over the network
In real networks, large bit strings (length n ≈ 10^4)
Corruption probability is very small: p ≈ 10^-6
X ~ Bin(10^4, 10^-6) is unwieldy to compute
Extreme n and p values arise in many cases
# bit errors in file written to disk
# of typos in a book
# of server crashes per day in giant data center
80
poisson random variables
Suppose “events” happen, independently, at
an average rate of λ per unit time. Let X be
the actual number of events happening in a
given time unit. Then X is a Poisson r.v. with
parameter λ (denoted X ~ Poi(λ)) and has
distribution (PMF):   P(X=i) = e^(-λ) λ^i / i!,   i = 0, 1, 2, …
Siméon Poisson, 1781-1840
Examples:
# of alpha particles emitted by a lump of radium in 1 sec.
# of traffic accidents in Seattle in one year
# of babies born in a day at UW Med center
# of visitors to my web page today
See B&T Section 6.2 for more on theoretical basis for Poisson.
81
poisson random variables
X is a Poisson r.v. with parameter λ if it has PMF:
P(X=i) = e^(-λ) λ^i / i!,   i = 0, 1, 2, …
Is it a valid distribution? Recall the Taylor series:
e^λ = Σ(i≥0) λ^i / i!
So: Σ(i≥0) P(X=i) = e^(-λ) Σ(i≥0) λ^i / i! = e^(-λ) e^λ = 1
82
expected value of poisson r.v.s
E[X] = Σ(i≥0) i • e^(-λ) λ^i / i!             (i = 0 term is zero)
     = λ e^(-λ) Σ(i≥1) λ^(i-1) / (i-1)!
     = λ e^(-λ) Σ(j≥0) λ^j / j!               (j = i-1)
     = λ e^(-λ) e^λ = λ
As expected, given the definition in terms of “average rate λ”
(Var[X] = λ, too; proof similar, see B&T example 6.20)
83
binomial random variable is poisson in the limit
Poisson approximates binomial when n is large, p is
small, and λ = np is “moderate”
Formally, Binomial is Poisson in the limit as
n → ∞ (equivalently, p → 0) while holding np = λ
84
sending data on a network, again
Recall example of sending bit string over a network
Send bit string of length n = 10^4
Probability of (independent) bit corruption is p = 10^-6
Number of errors: Y ~ Bin(10^4, 10^-6)
Number of errors is approximately X ~ Poi(λ = 10^4 • 10^-6 = 0.01)
What is probability that message arrives uncorrupted?
P(Y=0) = (1 - 10^-6)^(10^4) ≈ 0.990049829
P(X=0) = e^(-0.01)          ≈ 0.990049834
Poisson approximation (here) is accurate to ~5 parts per billion.
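A quick numerical look (my sketch, not from the slides) at how close Bin(n, λ/n) and Poi(λ) are, both for this example and for a more moderate λ:

from math import comb, exp, factorial

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

# the sending-data example: n = 10^4, p = 10^-6, lambda = 0.01
print(binom_pmf(0, 10**4, 1e-6))    # 0.990049829...
print(poisson_pmf(0, 0.01))         # 0.990049834...

# more generally, Bin(n, lambda/n) ≈ Poi(lambda) for large n, small p
lam, n = 2.0, 1000
for k in range(6):
    print(k, round(binom_pmf(k, n, lam / n), 6), round(poisson_pmf(k, lam), 6))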
85
binomial vs poisson
86
more on conditioning
Recall: conditional probability
P(X | A) = P(X & A) / P(A)
Notation: For a random variable X,
take this as shorthand for
“∀x P(X=x | A) ...”
Define: The conditional expectation of X
E[X | A] = ∑x x P(X | A)
I.e., the value of X averaged over outcomes where we
know A happened
87
total expectation
Recall: the law of total probability
P(X) = P(X | A) P(A) + P(X | ¬ A) P(¬ A)
I.e., unconditional probability is the weighted
average of conditional probabilities, weighted
by the probabilities of the conditioning events
The Law of Total Expectation
E[X] = E[X | A] P(A) + E[X | ¬ A] P(¬ A)
I.e., unconditional expectation is the weighted average
of conditional expectations, weighted by the
probabilities of the conditioning events
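As a concrete check (my example, not from the slides): let X be a fair die roll and A the event “the roll is even”; the weighted average of the two conditional expectations recovers E[X] = 7/2.

from fractions import Fraction

outcomes = range(1, 7)          # one fair die
p = Fraction(1, 6)

A = lambda x: x % 2 == 0        # event: the roll is even

def cond_E(pred):
    # E[X | event] = sum over outcomes in the event of x * P(X=x) / P(event)
    P_event = sum(p for x in outcomes if pred(x))
    return sum(x * p for x in outcomes if pred(x)) / P_event

P_A = sum(p for x in outcomes if A(x))                            # 1/2
total = cond_E(A) * P_A + cond_E(lambda x: not A(x)) * (1 - P_A)
print(cond_E(A), cond_E(lambda x: not A(x)), total)               # 4, 3, 7/2 = E[X]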
88
total expectation
The Law of Total Expectation
89
balls in urns – the hypergeometric distribution
B&T, exercise 1.61
Draw d balls (without replacement) from an urn
containing N balls, of which w are white, the rest black.
Let X = number of white balls drawn
P(X=k) = C(w,k) C(N-w, d-k) / C(N,d)
[note: (n choose k) = 0 if k < 0 or k > n]
E[X] = dp, where p = w/N (the fraction of white balls)
proof: Let Xj be 0/1 indicator for j-th ball is white, X = Σ Xj
The Xj are dependent, but E[X] = E[Σ Xj] = Σ E[Xj] = dp
Var[X] = dp(1-p)(1-(d-1)/(N-1))
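A brief Python check (mine, not from the slides) that this pmf sums to 1 and that Σ k•P(X=k) = d•w/N, for one arbitrary choice of N, w, d:

from fractions import Fraction
from math import comb

def hypergeom_pmf(k, N, w, d):
    # k white among d draws, without replacement, from N balls with w white
    return Fraction(comb(w, k) * comb(N - w, d - k), comb(N, d))

N, w, d = 20, 7, 5
pmf = [hypergeom_pmf(k, N, w, d) for k in range(d + 1)]
print(sum(pmf))                                     # 1
print(sum(k * pk for k, pk in enumerate(pmf)))      # 7/4 = d * w/N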
90
random variables – summary
RV: a numeric function of the outcome of an experiment
Probability Mass Function p(x): prob that RV = x; Σp(x)=1
Cumulative Distribution Function F(x): probability that RV ≤ x
Generalize to joint distributions; independence & marginals
Expectation:
mean, average, “center of mass,” fair price for a game of chance
(probability)-weighted average
of a random variable: E[X] = Σx x p(x)
of a function: if Y = g(X), then E[Y] = Σx g(x) p(x)
linearity:
E[aX + b] = aE[X] + b
E[X+Y] = E[X] + E[Y]; even if dependent
this interchange of “order of operations” is quite special to
linear combinations. E.g., E[XY] ≠ E[X]•E[Y], in general (but see below)
91
random variables – summary
Conditional Expectation:
E[X | A] = ∑x x•P(X | A)
Law of Total Expectation
E[X] = E[X | A]•P(A) + E[X | ¬ A]•P(¬ A)
Variance:
Var[X] = E[ (X-E[X])^2 ] = E[X^2] - (E[X])^2
Standard deviation: σ = √Var[X]
“Variance is insensitive to location, quadratic in scale”
Var[aX+b] = a^2 Var[X]
If X & Y are independent, then
E[X•Y] = E[X]•E[Y]
Var[X+Y] = Var[X]+Var[Y]
(These two equalities hold for indp rv’s; but not in general.)
92
random variables – summary
Important Examples:
Bernoulli:  P(X = 1) = p and P(X = 0) = 1-p            μ = p,    σ^2 = p(1-p)
Binomial:   P(X = k) = C(n,k) p^k (1-p)^(n-k)          μ = np,   σ^2 = np(1-p)
Poisson:    P(X = k) = e^(-λ) λ^k / k!                 μ = λ,    σ^2 = λ
  Bin(n,p) ≈ Poi(λ) where λ = np fixed, n →∞ (and so p = λ/n → 0)
Geometric:  P(X = k) = (1-p)^(k-1) p                   μ = 1/p,  σ^2 = (1-p)/p^2
Many others, e.g., hypergeometric
93