Stat 220, Parts IV-V: Probability and Chance Variability
Lecture 17
The Standard Error (Ch. 17)
Normal Approximation for Averages and Sums (Ch. 18)
Example 3: Kerrich's Coin Tossing Experiment
John Kerrich tossed a coin 10,000 times and counted the number of heads (his story is told in Section 16.1 of the textbook). Before knowing the outcomes, what statements can we make based on our current knowledge?
A coin lands heads or tails with equal chances of 50%. In the
long run, should the number of heads equal the number of tails?
If we can describe toss outcomes with numbers (random variables), then we can invoke the Law of Averages, since 10,000 is a pretty large number.

A common trick with processes whose outcome can be seen as "yes" or "no" ("yes" = heads, "no" = tails) is to represent "yes" as 1 and "no" as 0. Then our box model for the fair-coin toss has one ticket marked 1 and one marked 0. The number of heads in 10,000 tosses is like the sum of 10,000 independent draws from the box.
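To make the box model concrete, here is a minimal simulation sketch in Python (standard library only; the variable names are ours, not the textbook's). The exact count will, of course, vary from run to run:

```python
import random

# Box model for a fair coin: one ticket marked 1 (heads), one marked 0 (tails).
box = [0, 1]

# The number of heads in 10,000 tosses is the sum of 10,000
# independent draws with replacement from the box.
draws = [random.choice(box) for _ in range(10_000)]
num_heads = sum(draws)

print("number of heads:", num_heads)
print("fraction of heads:", num_heads / 10_000)
```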
Recall: the expected value of a sum of draws is

(number of draws) × (average of box).

In our case, the average of the box is 1/2, so the expected value for Kerrich's experiment is 10,000 × 1/2 = 5,000 heads.

The Law of Averages tells us that the actual fraction of heads (which is the average of all the draws) should be pretty close to 1/2. But how close? It is not very likely that John Kerrich got exactly 5,000 heads; we expect that he got about 5,000 heads.
10,000 tosses: A Table

  nr of tosses   observed nr of heads   difference*   observed % of heads   difference in %**
        50                24                −1              48.00%              −2.00%
       100                47                −3              47.00%              −3.00%
       500               252                 2              50.40%               0.40%
      1000               508                 8              50.80%               0.80%
      5000              2514                14              50.28%               0.28%
     10000              4986               −14              49.86%              −0.14%

  * difference = nr of heads − half the number of tosses
  ** difference in % = observed % of heads − 50%

The difference between the observed and expected number of heads seems to increase, but the difference in percents seems to decrease.

10,000 tosses: A Graph

[Figure: two plots against the number of tosses (0 to 10,000): the number of heads minus half the number of tosses (ranging over roughly −20 to 20), and the percent of heads minus 50% (ranging over roughly −4% to 4%).]
The Standard Error for the Sum of the Draws
The actual sum will likely be different from the expected value.
It will be off by the chance error:
sum = expected value + chance error
The chance error is the amount above (+) or below (-) the
expected value.
Definition
The standard error (SE) for the sum tells us how big the chance
error is likely to be.
The SE has a lot in common with the r.m.s. error we learned
about in regression.
A sum is likely to be around its expected value, but to be off by an amount similar in size to the standard error (SE). To compute the SE for a sum, we use the following law:

Theorem (The square root law for sums)
When drawing at random with replacement from a box of numbered tickets, the standard error for the sum of the draws is

√(number of draws) × (SD of the box).
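As a sanity check on the square root law, this hedged sketch computes the SE from the formula and compares it with the spread of many simulated sums (the helper names `sd_of_box` and `se_for_sum` are ours):

```python
import math
import random
import statistics

def sd_of_box(box):
    # SD of the box: r.m.s. deviation of the tickets from the box average.
    avg = sum(box) / len(box)
    return math.sqrt(sum((t - avg) ** 2 for t in box) / len(box))

def se_for_sum(box, n_draws):
    # Square root law: SE(sum) = sqrt(number of draws) x (SD of the box).
    return math.sqrt(n_draws) * sd_of_box(box)

box = [0, 1]          # the fair-coin box
n = 10_000
print("SE from formula:", se_for_sum(box, n))   # sqrt(10000) * 0.5 = 50

# Check: the SD of many simulated sums should be close to the SE.
sums = [sum(random.choice(box) for _ in range(n)) for _ in range(200)]
print("SD of simulated sums:", statistics.pstdev(sums))
```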
The Standard Error for the Average of the Draws
For sums, the SE will increase as we keep drawing. But for the
average of all draws, the opposite happens! The formula for
the SE of the average is:
Theorem (The square root law for averages)
When drawing at random with replacement from a box of numbered tickets, the standard error for the average of the draws is

(SD of the box) / √(number of draws).

Note: the formulas may look similar, but the square root of the number of draws works in opposite directions: for sums it increases the SE, and for averages it reduces the SE.
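The corresponding sketch for averages shows the opposite behavior; the `se_for_average` helper is our name for the formula above:

```python
import math

def se_for_average(sd_box, n_draws):
    # Square root law for averages: SE(avg) = (SD of box) / sqrt(number of draws).
    return sd_box / math.sqrt(n_draws)

sd_box = 0.5  # the fair-coin box [0, 1]
for n in (100, 10_000, 1_000_000):
    print(n, "draws -> SE for average:", se_for_average(sd_box, n))
# The SE for the sum grows like sqrt(n); the SE for the average shrinks like 1/sqrt(n).
```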
Example 4

We make 100 draws at random with replacement from the box [ 1  1  1  9 ].

The average of the box is 3.
The expected value of the sum is 100 × 3 = 300.
The SD of the box is

√( ((1 − 3)² + (1 − 3)² + (1 − 3)² + (9 − 3)²) / 4 ) = √12 ≈ 3.5.

The SE for the sum is √100 × 3.5 = 35. Thus, the sum of the draws is likely to be ≈ 300, give or take 35 or so. The average will be ≈ 3, give or take 0.35 or so.
Short Cut for Calculating the SE

When the tickets in the box show only two different numbers ("big" and "small"), the SD of the box is

(big number − small number) × √( (fraction with big number) × (fraction with small number) ).

Example 4 (continued): the box [ 1  1  1  9 ] has only two distinct values, so the short cut gives

(9 − 1) × √(1/4 × 3/4) = √12 ≈ 3.5,

the same answer as the direct calculation.
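A small sketch that checks Example 4 both ways: the SD of the box from the definition and from the short cut (variable names are ours):

```python
import math

box = [1, 1, 1, 9]
avg = sum(box) / len(box)                                   # 3
sd_direct = math.sqrt(sum((t - avg) ** 2 for t in box) / len(box))

# Short cut: (big - small) * sqrt(fraction big * fraction small)
big, small = 9, 1
frac_big = box.count(big) / len(box)                        # 1/4
sd_shortcut = (big - small) * math.sqrt(frac_big * (1 - frac_big))

print(sd_direct, sd_shortcut)                     # both ~3.46 (= sqrt(12))
print("EV of sum:", 100 * avg)                    # 300
print("SE of sum:", math.sqrt(100) * sd_direct)   # ~34.6, i.e. about 35
```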
Example 5
We make 25 draws from the box [ 0  2  3  4  6 ].
Fill in the blanks:
The sum of the draws is around ..., give or take ... or so.
The average of the draws is around ..., give or take ... or so.
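If you want to check your fill-ins, the same formulas can be applied in a few lines; this sketch prints the quantities rather than the filled-in sentences:

```python
import math

box = [0, 2, 3, 4, 6]
n = 25

avg = sum(box) / len(box)
sd = math.sqrt(sum((t - avg) ** 2 for t in box) / len(box))

print("EV of sum:", n * avg)
print("SE of sum:", math.sqrt(n) * sd)
print("EV of average:", avg)
print("SE of average:", sd / math.sqrt(n))
```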
Can we say more?

So probability theory allows us to say that:
• When repeating the same draw independently, the average of the draws will tend to the expected value of a single draw.
• We can also calculate the SE for the average. The actual average will be within about 2-3 SEs (rarely more) of the expected value.
• We have similar results for sums (the book only talks about sums at this point). It's all the same, because the sum is just the average times the number of draws!

Is that all? No. It turns out that we can also estimate the probabilities for how far off the average (or sum) will be. These probabilities follow... you guessed it, the normal curve.
A Law of Nature?

The mathematical result proving that we can use the normal curve for averages, sums, and some other beasts is known as the Central Limit Theorem (CLT).

Some people (myself included) think that it is more like a law of nature (e.g., Newton's laws) than like a piece of math. Never mind.

The CLT is the main reason why we put so much effort into learning the normal curve. It is what makes the normal curve so commonly used. But before we present the CLT, we need one new concept.

Probability Histograms

The histograms that we have seen so far are data histograms (a.k.a. "empirical" histograms):
• They are based on data
• Area under the histogram represents the percent (or count) of cases

We will now look at a new type of histogram, the probability histogram:
• It is based on theory, not on data
• Area under the histogram represents chance

Definition
A probability histogram represents chance by area. The total area under the histogram is 100%. This type of histogram can also be a smooth curve.
We have actually seen at least two probability histograms:

[Figure: "Probabilities of 2-Dice Sums" — a discrete probability histogram with the sum (2 through 12) on the horizontal axis and probability (0.00 to 0.15) on the vertical axis.]

The normal curve is a continuous probability histogram; the graph of Monopoly-move probabilities is a discrete one.
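The two-dice probability histogram can be computed exactly by enumerating all 36 equally likely outcomes, as in this sketch:

```python
from fractions import Fraction
from itertools import product

# Exact probability histogram for the sum of two dice.
counts = {}
for a, b in product(range(1, 7), repeat=2):
    counts[a + b] = counts.get(a + b, 0) + 1

for s in range(2, 13):
    p = Fraction(counts[s], 36)
    print(f"P(sum = {s:2d}) = {counts[s]}/36 = {float(p):.3f}")
```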
The Law of Averages for Probability Histograms

As we draw many independent instances of the same random variable (r.v.), the data histogram will look more and more like the r.v.'s probability histogram.
This is how it looks for Monopoly moves.
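A sketch of that convergence for Monopoly moves (two-dice sums): compare the empirical frequencies after 100 and 100,000 rolls against the exact probabilities; with more rolls, the columns agree more closely.

```python
import random
from collections import Counter

def roll_two_dice():
    return random.randint(1, 6) + random.randint(1, 6)

# Exact probabilities for the sum of two dice, by enumeration.
exact = {s: c / 36 for s, c in
         Counter(a + b for a in range(1, 7) for b in range(1, 7)).items()}

for n in (100, 100_000):
    freq = Counter(roll_two_dice() for _ in range(n))
    print(f"\n{n} rolls:")
    for s in range(2, 13):
        print(f"  sum {s:2d}: empirical {freq.get(s, 0) / n:.3f}  exact {exact[s]:.3f}")
```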
Example 6: Probability Histogram of a Sum

Consider the sum of draws from the box [ 1  2  9 ].

Below we see histograms of the sums of 25, 50 and 100 draws, each repeated a large number of times. This is known as a statistical simulation.

Because of the Law of Averages for probability histograms, when we simulate the same random process many times and make the same calculation each time, the histogram of all the calculation results will be similar to the probability histogram of the corresponding r.v.

Statistical simulation can be used (as we do here) to demonstrate a theoretical result about an r.v., or (as statistical researchers do) to explore the probability properties of r.v.'s for which we don't have a theoretical result.
[Figure: simulated histograms of the sum of 25, 50, and 100 draws from the box [ 1  2  9 ], each repeated a large number of times.]
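Since the figure is only summarized above, this sketch runs the simulation itself and prints rough text histograms (a plotting library would show it more clearly; all names are ours):

```python
import random
from collections import Counter

box = [1, 2, 9]

def simulate_sums(n_draws, reps=10_000):
    # Each repetition: the sum of n_draws independent draws with replacement.
    return [sum(random.choice(box) for _ in range(n_draws)) for _ in range(reps)]

for n_draws in (25, 50, 100):
    sums = simulate_sums(n_draws)
    lo, hi = min(sums), max(sums)
    width = (hi - lo) // 15 or 1                 # ~15 text-histogram bins
    bins = Counter((s - lo) // width for s in sums)
    print(f"\nsum of {n_draws} draws from {box}:")
    for b in sorted(bins):
        left = lo + b * width
        print(f"  {left:4d}-{left + width - 1:4d} {'#' * (bins[b] // 100)}")
```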
Central Limit Theorem

Theorem (The Central Limit Theorem)
When drawing at random with replacement from a box, the probability histogram for the average (and the sum) will follow the normal curve, even if the contents of the box do not. The histogram must be put in standard units, and the number of draws must be reasonably large.

Notes:
1. The theorem applies to averages and sums (also to medians, btw), but not necessarily to every number calculated from the data.
2. There is no clear-cut answer to the question of what "reasonably large" is. Much depends on the contents of the box, but for the average of 100 draws the probability histogram will usually be very close to the normal curve.
How to Use the CLT

On a conceptual level, the CLT is a very fortunate result, because the normal curve has very thin tails compared to most distributions. So any number that obeys the CLT will rarely stray too far from its expected value.

On a practical level, we can now make probability statements about averages and sums using the normal curve. We are already familiar with normal-curve lookup techniques.

The average of this normal curve will be the expected value (EV) of the average (or sum). The SD of this normal curve will be the standard error (SE) of the average (or sum).
Example 7: Using the CLT for Dice Rolls

Roll a die 120 times.
1. Use the normal approximation to estimate the chance of getting between 15 and 25 sixes, inclusive.
2. Use the normal approximation to estimate the chance of getting exactly 20 sixes.

Example 7: Solution to Part 1

Set up the box model: 120 draws with replacement from a box with tickets 0, 0, 0, 0, 0, 1.
• EV of the sum = (number of draws) × (average of box) = 120 × 1/6 = 20
• SD of the box = (1 − 0) × √(1/6 × 5/6) ≈ 0.37
• SE for the sum of draws = √(number of draws) × (SD of the box) = √120 × 0.37 ≈ 4.1

Normal approximation: new average = 20, new SD = 4.1.

Oops... the probability histogram for the number of sixes looks like the Monopoly one: it is discrete. But the normal curve is continuous. If we look up the interval from 15 to 25, we are really cutting the rectangles representing 15 and 25 sixes right down the middle. This can distort the results; in fact, we are underestimating the probability.
Continuity Correction

The solution: the continuity correction. Use it for the normal approximation whenever the actual data come in whole numbers (integers). For example:
• Exactly 20: look up the interval 19.5 to 20.5
• Between 15 and 25, inclusive: 14.5 to 25.5
• Between 15 and 25, exclusive: 15.5 to 24.5
Example 7: Solution to Part 1 (continued)

Now we can continue our solution. "Between 15 and 25, inclusive" becomes 14.5 to 25.5 for the look-up.

z = (sum − EV)/SE = (25.5 − 20)/4.1 ≈ 1.35, and for the other endpoint z = (14.5 − 20)/4.1 ≈ −1.35.

From the normal table, the area between −1.35 and 1.35 is ≈ 0.82, so the probability is ≈ 82%.
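This sketch double-checks part 1 against the exact binomial probability (the normal CDF via `math.erf` is standard; everything else is our naming):

```python
import math

def normal_cdf(z):
    # Standard normal cumulative distribution function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 120, 1 / 6
ev = n * p                                     # 20
se = math.sqrt(n) * math.sqrt(p * (1 - p))     # ~4.08

# P(15 <= sixes <= 25) with continuity correction: 14.5 to 25.5.
approx = normal_cdf((25.5 - ev) / se) - normal_cdf((14.5 - ev) / se)

# Exact binomial probability for comparison.
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(15, 26))

print(f"normal approx: {approx:.4f}, exact: {exact:.4f}")   # both ~0.82
```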
Example 7: Solution to Part 2

We want the probability of getting exactly 20 sixes. The EV and SE are the same as before: EV = 20, SE = 4.1.

In the probability histogram, the bar for 20 goes from 19.5 to 20.5, so we should find this area under the normal curve:

z = (sum − EV)/SE = (20.5 − 20)/4.1 ≈ 0.12

From the normal table, the area between −0.12 and 0.12 is ≈ 9.6%, so the chance is ≈ 10%.
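And the same check for part 2 (exactly 20 sixes), confirming the continuity-corrected answer of roughly 10%:

```python
import math

def normal_cdf(z):
    # Standard normal cumulative distribution function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p, ev = 120, 1 / 6, 20
se = math.sqrt(120 * (1 / 6) * (5 / 6))

# The bar for exactly 20 sixes runs from 19.5 to 20.5.
approx = normal_cdf((20.5 - ev) / se) - normal_cdf((19.5 - ev) / se)
exact = math.comb(n, 20) * p**20 * (1 - p)**100

print(f"normal approx: {approx:.4f}, exact: {exact:.4f}")   # both ~0.10
```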
The CLT and Regression

Why did we use the normal curve in regression? Because the CLT affects regression in two ways:

1. The data points themselves might represent some sort of averaging process. For example, a person's height is affected by many little things combined, so looking at the distribution of heights in a large population (in a relatively homogeneous ethnic setting) is like looking at a probability histogram of averages. That explains why the height distribution is approximately normal.

2. The estimated regression line is itself a type of average, which means it is subject to the CLT as well. So even if x or y do not look normal, we can often still make probability statements about the line using the normal approximation.

Note: if x and y look very far from normal, we can use robust regression methods that do not assume normality.
Summary

• For averages and sums of draws with replacement, we can often assume that their probabilities follow the normal curve. This is known as the Central Limit Theorem (CLT).
• Note: the book focuses on sums (the authors think sums are easier); the CLT is really about averages, and most practical uses involve averages. The results hold for sums because the sum is just the average times the number of draws.
• We use the expected value and the SE for the average (or the sum) to convert to and from standard units and to make probability statements about the average (or sum).
• If we use the normal curve to approximate a discrete probability histogram, a continuity correction is recommended.