Expected value and variance; binomial distribution
June 24, 2004

Recall: expected value

Discrete case:
\[ E(X) = \sum_{\text{all } x} x_i \, p(x_i) \]

Continuous case:
\[ E(X) = \int_{\text{all } x} x \, p(x) \, dx \]

Expected Value

Expected value is an extremely useful concept for good decision-making!

Example: the lottery

The Lottery (also known as a tax on people who are bad at math…)

A certain lottery works by picking 6 numbers from 1 to 49. It costs $1.00 to play the lottery, and if you win, you win $2 million after taxes.

If you play the lottery once, what are your expected winnings or losses?

Lottery

Calculate the probability of winning in 1 try:
\[ \frac{1}{\binom{49}{6}} = \frac{1}{\dfrac{49!}{43!\,6!}} = \frac{1}{13{,}983{,}816} = 7.2 \times 10^{-8} \]

\( \binom{49}{6} \) is read “49 choose 6”: out of 49 numbers, this is the number of distinct combinations of 6.

The probability function (note, sums to 1.0):

x ($)          p(x)
-1             .999999928
+2 million     7.2 x 10^-8

Expected Value

\[ E(X) = P(\text{win}) \times 2{,}000{,}000 + P(\text{lose}) \times (-1.00) \]
\[ = (2.0 \times 10^{6})(7.2 \times 10^{-8}) + (.999999928)(-1) = .144 - .999999928 \approx -.86 \]

That is, an expected loss of about $.86 per play.

Negative expected value is never good! You shouldn’t play if you expect to lose money!

If you play the lottery every week for 10 years (520 plays), what are your expected winnings or losses?

520 x (-.86) = -$447.20

Empirical Mean (each person, cell, etc. counts once)

True mean of a population:
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \]

Sample mean, for a sample of n subjects:
\[ \bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} \]

Variance/standard deviation

Probability distributions not only have central tendency (means), but also have spread (described by variance or standard deviation).

\[ \mathrm{Var}(X) = E\left[(X - \mu)^2\right] \]

“The expected (or average) squared distance (or deviation) from the mean”

Note: We square because squaring has better properties than absolute value. Take the square root to get back to a linear average distance from the mean (= “standard deviation”).
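The lottery example can also be worked through numerically. The following is a minimal Python sketch, not part of the original slides, that applies the discrete expected-value formula and the variance definition to the lottery's probability function. The variable names (`p_win`, `outcomes`, and so on) are illustrative assumptions. Because it uses the unrounded win probability rather than 7.2 x 10^-8, its results differ slightly from the hand calculation.

```python
from math import comb, sqrt

# Probability of matching all 6 numbers drawn from 49: 1 / (49 choose 6)
p_win = 1 / comb(49, 6)

# Probability function from the slides: outcome in dollars -> probability
outcomes = {2_000_000: p_win, -1: 1 - p_win}

# E(X) = sum over all x of x * p(x)
mean = sum(x * p for x, p in outcomes.items())

# Var(X) = E[(X - mu)^2], the expected squared deviation from the mean
variance = sum((x - mean) ** 2 * p for x, p in outcomes.items())
std_dev = sqrt(variance)

# Expected result of playing once a week for 10 years (520 plays)
ten_year_total = 520 * mean

print(f"P(win)              = {p_win:.3e}")
print(f"E(X) per play       = {mean:.3f} dollars")
print(f"Standard deviation  = {std_dev:.1f} dollars")
print(f"Expected 10-yr total = {ten_year_total:.2f} dollars")
```

The standard deviation is far larger than the mean here, which reflects how the rare $2 million outcome dominates the spread even though it barely moves the expected value.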
Empirical Variance

The variance of a population:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]

The variance of a sample:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1} \]

Binomial distribution

Introduction: Take the example of 5 coin tosses. What’s the probability that you flip exactly 3 heads in 5 coin tosses?

Solution:

One way to get exactly 3 heads: HHHTT

What’s the probability of this exact arrangement?
P(heads) x P(heads) x P(heads) x P(tails) x P(tails) = \( (1/2)^3 \times (1/2)^2 \)

Another way to get exactly 3 heads: THHHT

Probability of this exact outcome = \( (1/2)^1 \times (1/2)^3 \times (1/2)^1 = (1/2)^3 \times (1/2)^2 \)

In fact, \( (1/2)^3 \times (1/2)^2 \) is the probability of each unique outcome that has exactly 3 heads and 2 tails.

So, the overall probability of 3 heads and 2 tails is:
\( (1/2)^3 \times (1/2)^2 + (1/2)^3 \times (1/2)^2 + (1/2)^3 \times (1/2)^2 + \ldots \) for as many unique arrangements as there are. But how many are there?

\[ \binom{5}{3} = \frac{5!}{3!\,2!} = 10 \]
ways to arrange 3 heads in 5 trials (also written 5C3).

Outcome    Probability
THHHT      (1/2)^3 x (1/2)^2
HHHTT      (1/2)^3 x (1/2)^2
TTHHH      (1/2)^3 x (1/2)^2
HTTHH      (1/2)^3 x (1/2)^2
HHTTH      (1/2)^3 x (1/2)^2
HTHHT      (1/2)^3 x (1/2)^2
THTHH      (1/2)^3 x (1/2)^2
HTHTH      (1/2)^3 x (1/2)^2
HHTHT      (1/2)^3 x (1/2)^2
THHTH      (1/2)^3 x (1/2)^2

Total: 10 arrangements x (1/2)^3 x (1/2)^2, where (1/2)^3 x (1/2)^2 is the probability of each unique outcome (note: they are all equal).

\[ P(\text{3 heads and 2 tails}) = \binom{5}{3} \times P(\text{heads})^3 \times P(\text{tails})^2 = 10 \times (1/2)^5 = 31.25\% \]

[Figure: binomial distribution function p(x), where X = the number of heads tossed in 5 coin tosses; horizontal axis: number of heads, 0 to 5]

Binomial distribution, generally

Note the general pattern emerging: if you have only two possible outcomes (call them 1/0 or yes/no or success/failure) in n independent trials, then the probability of exactly r “successes” is
\[ \binom{n}{r} p^r (1 - p)^{n - r} \]
where
n = number of trials
r = number of successes out of n trials
p = probability of success
1-p = probability of failure

Binomial distribution: definitions

Binomial: Suppose that n independent experiments, or trials, are performed, where n is a fixed number, and that each experiment
results in a “success” with probability p and a “failure” with probability 1-p. The total number of successes, X, is a binomial random variable with parameters n and p.

We write: X ~ Bin(n, p) (reads: “X is distributed binomially with parameters n and p”)

And the probability that X = r (i.e., that there are exactly r successes) is:
\[ P(X = r) = \binom{n}{r} p^r (1 - p)^{n - r} \]

Binomial distribution

RECALL: All probability distributions are characterized by an expected value and a variance.

If X follows a binomial distribution with parameters n and p, X ~ Bin(n, p), then:

The expected value of a binomial = \( np \)
The variance of a binomial = \( np(1 - p) \)
The standard deviation of a binomial = \( \sqrt{np(1 - p)} \)

Binomial distribution: example

If I toss a coin 20 times, what’s the probability of getting exactly 10 heads?
\[ \binom{20}{10} (.5)^{10} (.5)^{10} = .176 \]

If I toss a coin 20 times, what’s the probability of getting 2 or fewer heads?
\[ \binom{20}{0} (.5)^{0} (.5)^{20} = \frac{20!}{20!\,0!} (.5)^{20} = 9.5 \times 10^{-7} \]
\[ \binom{20}{1} (.5)^{1} (.5)^{19} = \frac{20!}{19!\,1!} (.5)^{20} = 20 \times 9.5 \times 10^{-7} = 1.9 \times 10^{-5} \]
\[ \binom{20}{2} (.5)^{2} (.5)^{18} = \frac{20!}{18!\,2!} (.5)^{20} = 190 \times 9.5 \times 10^{-7} = 1.8 \times 10^{-4} \]
Adding the three terms:
\[ 9.5 \times 10^{-7} + 1.9 \times 10^{-5} + 1.8 \times 10^{-4} \approx 2.0 \times 10^{-4} \]

In-Class Exercise

Suppose that exactly 55.1% of potential voters currently favor Kerry (a priori knowledge that only we have!). NBC news conducts a poll which consists of randomly calling 1000 eligible voters and asking their voting preference.

• If the NBC researcher samples 1000 random voters, what’s the probability that exactly 551 of them say that they favor Kerry?
• If the NBC researcher samples 1000 random voters, how many do you expect to say they favor Kerry (if someone is going to pay you a million dollars if you guess this right, what’s your best guess?)
• Calculate the variance and standard deviation of the number of sampled voters (out of 1000) who would vote “yes” for Kerry.
• If the NBC researcher finds that 400 out of 1000 of his random sample reported that they would vote “yes” for Kerry, what might you think about his sampling methods?
(defend your opinion with numbers!)

In-Class Exercise

a. If the NBC researcher samples 1000 random voters, what’s the probability that exactly 551 of them say that they favor Kerry?
\[ P(X = r) = \binom{n}{r} p^r (1 - p)^{n - r} \]
\[ P(X = 551) = \binom{1000}{551} (.551)^{551} (.449)^{449} \]
A small number (roughly .025)!

b. If the NBC researcher samples 1000 random voters, how many do you expect to say they favor Kerry (if someone is going to pay you a million dollars if you guess this right, what’s your best guess?)

Your best guess is 551 (1000 x .551).

c. Calculate the variance and standard deviation of the number of sampled voters (out of 1000) who would vote “yes” for Kerry.

Variance = np(1-p) = 1000(.551)(.449) = 247.4
Standard deviation = square root (247.4) = 15.7

d. If the NBC researcher finds that 400 out of 1000 of his random sample reported that they would vote “yes” for Kerry, what might you think about his sampling methods? (defend your opinion with numbers!)

Expected deviation (standard deviation) = 15.7. A result of 400 is 151 below the expected value of 551, which is about 9.6 standard deviations (151/15.7). A deviation that much greater than the expected deviation is very unlikely in a truly random sample, so the sampling methods are suspect.

Reading for this week
Walker: 1.1-1.2, pages 1-9

Reading for next week
Walker: 1.3-1.6 (p. 10-22), Chapters 2 and 3 (p. 23-54)
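As a supplement to the binomial examples and the in-class exercise in this lecture, here is a minimal Python sketch that evaluates the binomial formula directly. It is not part of the original slides; the helper name `binom_pmf` and the variable names are assumptions chosen for illustration, and it relies only on the standard library's `math.comb`.

```python
from math import comb, sqrt


def binom_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ Bin(n, p): (n choose r) * p^r * (1 - p)^(n - r)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


# Coin-toss examples from the lecture
p_3_of_5 = binom_pmf(3, 5, 0.5)        # exactly 3 heads in 5 tosses
p_10_of_20 = binom_pmf(10, 20, 0.5)    # exactly 10 heads in 20 tosses
# 2 or fewer heads in 20 tosses: add P(0), P(1), P(2)
p_at_most_2 = sum(binom_pmf(r, 20, 0.5) for r in range(3))

# In-class exercise: 1000 voters sampled, 55.1% favor Kerry
n, p = 1000, 0.551
p_551 = binom_pmf(551, n, p)           # part a
expected = n * p                       # part b: np
variance = n * p * (1 - p)             # part c: np(1 - p)
std_dev = sqrt(variance)
# part d: how many standard deviations is 400 from the expected value?
z_400 = (400 - expected) / std_dev

print(f"P(3 heads in 5)     = {p_3_of_5:.4f}")
print(f"P(10 heads in 20)   = {p_10_of_20:.3f}")
print(f"P(<= 2 heads in 20) = {p_at_most_2:.2e}")
print(f"P(X = 551)          = {p_551:.4f}")
print(f"mean = {expected:.1f}, variance = {variance:.1f}, sd = {std_dev:.1f}")
print(f"400 is {z_400:.1f} standard deviations from the mean")
```

Computing the probabilities from exact integer binomial coefficients avoids the rounding that accumulates when intermediate values are written to two significant figures by hand.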