Expected value, variance, and the binomial distribution

Expected value and variance;
binomial distribution
June 24, 2004
Recall: expected value
Discrete case:
E(X) = \sum_{\text{all } x_i} x_i \, p(x_i)
Continuous case:
E(X) = \int_{\text{all } x} x \, p(x) \, dx
Expected Value

Expected value is an extremely useful
concept for good decision-making!
Example: the lottery

The Lottery (also known as a tax on people who
are bad at math…)
A certain lottery works by picking 6 numbers from
1 to 49. It costs $1.00 to play the lottery, and if
you win, you win $2 million after taxes.

If you play the lottery once, what are your
expected winnings or losses?
Lottery
Calculate the probability of winning in 1 try:
P(win) = \frac{1}{\binom{49}{6}} = \frac{1}{\frac{49!}{43! \, 6!}} = \frac{1}{13,983,816} \approx 7.2 \times 10^{-8}
\binom{49}{6} is read “49 choose 6”: out of 49 numbers, this is the number of
distinct combinations of 6.
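As a quick check of the counting step, here is a minimal Python sketch. It assumes Python 3.8 or later for `math.comb`; the variable names are illustrative and not from the slides.

```python
import math

# Number of distinct ways to choose 6 numbers out of 49 (order does not matter).
n_combinations = math.comb(49, 6)

# A single ticket matches exactly one of those combinations.
p_win = 1 / n_combinations

print(n_combinations)      # 13983816
print(f"{p_win:.1e}")      # 7.2e-08
```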
The probability function (note, sums to 1.0):
x ($)          p(x)
-1             .999999928
+2 million     7.2 x 10^{-8}
Expected Value
E(X) = P(win) x $2,000,000 + P(lose) x (-$1.00)
= (2.0 x 10^6)(7.2 x 10^{-8}) + (.999999928)(-1) = .144 - .999999928 ≈ -$.86
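The same calculation can be sketched in Python by applying the discrete-case definition E(X) = sum of x p(x) to the lottery's two outcomes. The helper name `expected_value` is an assumption made for illustration; the payoffs are the ones given on the slide.

```python
import math

def expected_value(outcomes):
    """Sum of x * p(x) over the (x, p) pairs of a discrete distribution."""
    return sum(x * p for x, p in outcomes)

p_win = 1 / math.comb(49, 6)

lottery = [
    (2_000_000, p_win),    # win: +$2 million, as in the slide's table
    (-1, 1 - p_win),       # lose: -$1, the price of the ticket
]

ev = expected_value(lottery)
print(f"{ev:.2f}")         # -0.86
```

The unrounded probability is used here, so intermediate digits differ slightly from the slide's hand calculation, but the result rounds to the same -$.86.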
Negative expected value is never good!
You shouldn’t play if you expect to lose money!
Expected Value
If you play the lottery every week for 10 years, what are your
expected winnings or losses?
520 plays (52 weeks x 10 years) x (-$.86 per play) = -$447.20
Empirical Mean
(each person, cell, etc. counts once)
True mean of a population: \mu = \frac{1}{N} \sum_{i=1}^{N} x_i
Sample mean, for a sample of n subjects: \bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i
Variance/standard deviation

Probability distributions not only have central
tendency (means), but also have ranges (described
by variance or standard deviation).
Var(x) = E[(x - \mu)^2]
“The expected (or average) squared distance (or
deviation) from the mean”
**We square because squaring has better mathematical properties than absolute value
(for example, the square is differentiable everywhere).
Take the square root to get back to the original units: a typical distance from the mean
(= “standard deviation”).
Empirical Variance
The variance of a population: \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2
The variance of a sample: s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{X})^2
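To make the difference between dividing by N and dividing by n - 1 concrete, here is a small sketch using Python's standard `statistics` module. The data values are hypothetical, made up purely for illustration.

```python
import statistics

# Hypothetical data, invented for this illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)

# Population variance: divide the sum of squared deviations by N.
pop_var = statistics.pvariance(data)

# Sample variance: divide the same sum by n - 1.
sample_var = statistics.variance(data)

# The same two quantities written out directly from the formulas.
squared_devs = sum((x - mean) ** 2 for x in data)
manual_pop_var = squared_devs / len(data)
manual_sample_var = squared_devs / (len(data) - 1)

print(mean, pop_var, round(sample_var, 3))
```

For the same data, the sample variance is always a little larger than the population variance, because the divisor is smaller.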
Binomial distribution
Introduction:
Take the example of 5 coin tosses. What’s the
probability that you flip exactly 3 heads in 5
coin tosses?
Binomial distribution
Solution:
One way to get exactly 3 heads: HHHTT
What’s the probability of this exact arrangement?
P(heads) x P(heads) x P(heads) x P(tails) x P(tails)
= (1/2)^3 x (1/2)^2
Another way to get exactly 3 heads: THHHT
Probability of this exact outcome = (1/2)^1 x (1/2)^3 x (1/2)^1 = (1/2)^3 x (1/2)^2
Binomial distribution
In fact, (1/2)^3 x (1/2)^2 is the probability of each
unique outcome that has exactly 3 heads and 2
tails.
So, the overall probability of 3 heads and 2 tails is:
(1/2)^3 x (1/2)^2 + (1/2)^3 x (1/2)^2 + (1/2)^3 x (1/2)^2 + …
for as many unique arrangements as there are. But how many are there?
\binom{5}{3} (also written 5C3) = ways to arrange 3 heads in 5 trials
= 5!/(3! 2!) = 10
Outcome    Probability
THHHT      (1/2)^3 x (1/2)^2
HHHTT      (1/2)^3 x (1/2)^2
TTHHH      (1/2)^3 x (1/2)^2
HTTHH      (1/2)^3 x (1/2)^2
HHTTH      (1/2)^3 x (1/2)^2
HTHHT      (1/2)^3 x (1/2)^2
THTHH      (1/2)^3 x (1/2)^2
HTHTH      (1/2)^3 x (1/2)^2
HHTHT      (1/2)^3 x (1/2)^2
THHTH      (1/2)^3 x (1/2)^2
The Probability column gives the probability of each unique outcome (note: they are all equal).
Total: 10 arrangements x (1/2)^3 x (1/2)^2
P(3 heads and 2 tails) = \binom{5}{3} x P(heads)^3 x P(tails)^2 = 10 x (1/2)^5 = 31.25%
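The count of 10 can be confirmed by brute force. The sketch below lists every arrangement of 3 heads in 5 tosses by choosing which positions are heads; the variable names are illustrative.

```python
from itertools import combinations

n_tosses, n_heads = 5, 3

# Each arrangement is determined by which toss positions come up heads.
arrangements = []
for head_positions in combinations(range(n_tosses), n_heads):
    sequence = "".join("H" if i in head_positions else "T" for i in range(n_tosses))
    arrangements.append(sequence)

# Every single arrangement has the same probability.
p_each = (1 / 2) ** 3 * (1 / 2) ** 2
p_total = len(arrangements) * p_each

print(arrangements)
print(len(arrangements))   # 10
print(p_total)             # 0.3125
```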
Binomial distribution function:
X= the number of heads tossed in 5 coin
tosses
[Figure: bar chart of p(x) against x, the number of heads (0 to 5).]
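The full distribution for 5 tosses of a fair coin can be tabulated with a few lines of Python. This is only a sketch: the text bar chart is an improvised stand-in for a plot, with one `#` per 1/32 of probability.

```python
import math

n = 5  # number of coin tosses

# p(x) = C(5, x) / 2^5 for a fair coin.
pmf = {x: math.comb(n, x) / 2**n for x in range(n + 1)}

for x, p in pmf.items():
    bar = "#" * math.comb(n, x)   # one '#' per 1/32 of probability
    print(f"{x} heads  p = {p:.5f}  {bar}")

print(sum(pmf.values()))          # 1.0
```

The distribution is symmetric around 2.5 heads, with 2 and 3 heads the most likely results.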
Binomial distribution, generally
Note the general pattern emerging: if you have only two possible
outcomes (call them 1/0 or yes/no or success/failure) in n independent
trials, then the probability of exactly r “successes” =
\binom{n}{r} p^r (1-p)^{n-r}
where:
n = number of trials
r = number of successes out of n trials
p = probability of success
1-p = probability of failure
Binomial distribution:
definitions
Binomial: Suppose that n independent experiments, or trials, are
performed, where n is a fixed number, and that each experiment
results in a “success” with probability p and a “failure” with
probability 1-p. The total number of successes, X, is a binomial
random variable with parameters n and p.
We write: X ~ Bin(n, p) {reads: “X is distributed binomially with
parameters n and p”}
And the probability that X=r (i.e., that there are exactly r successes)
is:
P(X=r) = \binom{n}{r} p^r (1-p)^{n-r}
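The formula translates almost directly into code. The function name `binomial_pmf` is an assumption for this sketch; `math.comb` supplies the "n choose r" term.

```python
import math

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ Bin(n, p): C(n, r) * p^r * (1 - p)^(n - r)."""
    if not 0 <= r <= n:
        return 0.0   # impossible number of successes
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)

# The coin example from earlier: exactly 3 heads in 5 tosses.
print(binomial_pmf(3, 5, 0.5))   # 0.3125
```

A useful sanity check on any such function is that the probabilities over r = 0, 1, ..., n add up to 1.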
Binomial distribution
RECALL: All probability distributions are characterized by an
expected value and a variance:
If X follows a binomial distribution with parameters n and p:
X ~ Bin (n, p)
Then:
The expected value of a binomial = np
The variance of a binomial = np(1-p)
The standard deviation of a binomial = \sqrt{np(1-p)}
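These shortcuts can be checked numerically against the definitions of expected value and variance. The sketch below assumes n = 20 and p = 0.5 as an illustrative case and uses a helper, `binomial_pmf`, defined only for this example.

```python
import math

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ Bin(n, p)."""
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)

n, p = 20, 0.5   # illustrative parameters

# Expected value from the definition: sum of r * P(X = r).
mean = sum(r * binomial_pmf(r, n, p) for r in range(n + 1))

# Variance from the definition: expected squared deviation from the mean.
variance = sum((r - mean) ** 2 * binomial_pmf(r, n, p) for r in range(n + 1))

print(mean, n * p)                    # both 10 (up to rounding)
print(variance, n * p * (1 - p))      # both 5 (up to rounding)
print(math.sqrt(variance))            # standard deviation, about 2.24
```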
Binomial distribution: example

If I toss a coin 20 times, what’s the
probability of getting exactly 10 heads?
\binom{20}{10} (.5)^{10} (.5)^{10} = .176
Binomial distribution: example

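If I toss a coin 20 times, what’s the
probability of getting 2 or fewer heads?
\binom{20}{0} (.5)^0 (.5)^{20} = \frac{20!}{20! \, 0!} (.5)^{20} = 9.5 \times 10^{-7}
+ \binom{20}{1} (.5)^1 (.5)^{19} = \frac{20!}{19! \, 1!} (.5)^{20} = 20 \times 9.5 \times 10^{-7} = 1.9 \times 10^{-5}
+ \binom{20}{2} (.5)^2 (.5)^{18} = \frac{20!}{18! \, 2!} (.5)^{20} = 190 \times 9.5 \times 10^{-7} = 1.8 \times 10^{-4}
Total = 9.5 \times 10^{-7} + 1.9 \times 10^{-5} + 1.8 \times 10^{-4} \approx 2.0 \times 10^{-4}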
In-Class Exercise
Suppose that exactly 55.1% of potential voters currently favor Kerry
(a priori knowledge that only we have!).
NBC News conducts a poll by randomly calling 1000
eligible voters and asking their voting preference.
a. If the NBC researcher samples 1000 random voters, what’s the
probability that exactly 551 of them say that they favor Kerry?
b. If the NBC researcher samples 1000 random voters, how many do
you expect to say they favor Kerry (if someone is going to pay you a
million dollars if you guess this right, what’s your best guess?)
c. Calculate the variance and standard deviation of the number of
sampled voters (out of 1000) who favor Kerry.
d. If the NBC researcher finds that 400 out of 1000 of his random
sample reported that they would vote “yes” for Kerry, what might
you think about his sampling methods? (defend your opinion with
numbers!)
In-Class Exercise
a. If the NBC researcher samples 1000 random voters, what’s the
probability that exactly 551 of them say that they favor Kerry?
P(X = r) = \binom{n}{r} p^r (1-p)^{n-r}
P(X = 551) = \binom{1000}{551} (.551)^{551} (.449)^{449}
A small number: about .025, even though 551 is the single most likely result!
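The factors in this expression are far too large and too small to work out by hand, so one common approach is to work with logarithms. The sketch below is one way to do that, using `math.lgamma` (the log of the gamma function, where lgamma(k + 1) = log k!); the function name `binomial_log_pmf` is an assumption for illustration.

```python
import math

def binomial_log_pmf(r, n, p):
    """log P(X = r) for X ~ Bin(n, p), computed in log space to avoid overflow."""
    log_choose = math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)
    return log_choose + r * math.log(p) + (n - r) * math.log(1 - p)

prob = math.exp(binomial_log_pmf(551, 1000, 0.551))
print(f"{prob:.4f}")
```

Working in log space turns the product of a huge coefficient and two tiny powers into a sum of moderate numbers, which is only exponentiated at the end.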
In-Class Exercise
b. If the NBC researcher samples 1000 random voters, how many do
you expect to say they favor Kerry (if someone is going to pay you
a million dollars if you guess this right, what’s your best guess?)
Your best guess is the expected value: np = 1000 x .551 = 551.
In-Class Exercise
c. Calculate the variance and standard deviation of the number of
sampled voters (out of 1000) who would vote “yes” for Kerry.
Variance=np(1-p)=1000(.551)(.449)=247.4
Standard deviation= square root (247.4)=15.7
In-Class Exercise
d. If the NBC researcher finds that 400 out of 1000 of his random
sample reported that they would vote “yes” for Kerry, what might
you think about his sampling methods? (defend your opinion with
numbers!)
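The standard deviation is 15.7, so a result of 400 lies 151 below
the expected value of 551, or about 9.6 standard deviations (151/15.7).
A deviation that large is extremely unlikely under truly random sampling,
which suggests a problem with the sampling method.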
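To "defend your opinion with numbers", the deviation can be expressed in units of standard deviations. This is a minimal sketch of that arithmetic; the variable names are illustrative.

```python
import math

n, p = 1000, 0.551      # poll size and true proportion favoring Kerry
observed = 400          # what the researcher reported

expected = n * p                    # np
variance = n * p * (1 - p)          # np(1 - p)
sd = math.sqrt(variance)

# How many standard deviations the observed count is from the expected count.
z = (observed - expected) / sd

print(round(expected), round(variance, 1), round(sd, 1))   # 551 247.4 15.7
print(round(z, 1))                                         # -9.6
```

A count more than 9 standard deviations from the expected value is far outside what random sampling variation would typically produce.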
Reading for this week

Walker: 1.1-1.2, pages 1-9
Reading for next week

Walker: 1.3-1.6 (p. 10-22), Chapters 2 and 3
(p. 23-54)