Sampling and Sample Size Part 1

Cally Ardington
Course Overview
1. What is Evaluation?
2. Outcomes, Impact, and Indicators
3. Why Randomise?
4. How to Randomise?
5. Sampling and Sample Size
6. Threats and Analysis
7. Project from Start to Finish
8. Cost Effectiveness and Scaling
Lecture Outline
• Precision and accuracy
• Statistical tools
Population and sampling distribution
Law of Large Numbers and Central Limit Theorem
Standard deviation and standard error
Which of these is more accurate?
[Figure: two panels, labelled I and II]
A. I.
B. II.
C. Don’t know
Accuracy versus Precision
[Figure: estimates shown relative to the truth; axes: Accuracy (Randomization) and Precision (Sample Size)]
This session’s question
• How large does the sample need to be for you
to be able to detect a given treatment effect?
• Randomization removes the bias (ensures
accuracy) but it does not remove noise
• We control precision with sample size
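The three points above can be illustrated with a small simulation. This is a minimal sketch under assumed values, not part of the original slides: the true effect of 5 points, the outcome standard deviation of 20, the control mean of 50, and the function name `simulate_estimates` are all hypothetical.

```python
import random
import statistics

def simulate_estimates(n_per_arm, true_effect=5.0, sd=20.0, reps=1000, seed=0):
    """Estimated treatment effects from `reps` simulated randomised trials.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # Random assignment: both arms come from the same population,
        # and the treated arm is shifted by the true effect.
        control = [rng.gauss(50.0, sd) for _ in range(n_per_arm)]
        treated = [rng.gauss(50.0 + true_effect, sd) for _ in range(n_per_arm)]
        estimates.append(statistics.fmean(treated) - statistics.fmean(control))
    return estimates

small = simulate_estimates(n_per_arm=25)
large = simulate_estimates(n_per_arm=400)

for label, est in (("25 per arm", small), ("400 per arm", large)):
    # Average estimate (accuracy) and spread of estimates (precision).
    print(label, round(statistics.fmean(est), 2), round(statistics.stdev(est), 2))
```

In both cases the estimates should centre on the true effect, which is what randomization buys. Only the larger sample should bring the individual estimates close to it.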
Population distribution
[Figure: population frequency of test scores (0 to 100), with the population mean (labelled 26 on the chart) and one standard deviation marked]
Take a random sample: Sampling distribution
[Figure: population distribution and sampling distribution (1) of test scores (0 to 100), with the population mean marked]
• We generally don’t have our population
distribution, but we have our sampling
distribution.
• What do we know about our sampling
distribution?
• Two statistical laws help us here
(1) Central Limit Theorem
(2) The Law of Large Numbers
(1) Central Limit Theorem
[Figure: histogram of test scores (0 to 100). Caption: This is the distribution of the population (Population Distribution).]
To here…
This is the distribution of means from all random samples (sampling distribution)
Central Limit Theorem
[Figures: Draws 1 to 10, each marking the mean test score of one random sample]
Draw 10 random students, take the
average, plot it: 10 times.
[Figure: Frequency of means with 10 draws. Annotations: inadequate sample size; no clear distribution around population mean.]
Draw 10 random students: 50 and 100 times
[Figures: Frequency of means with 50 draws; frequency of means with 100 draws. Annotations: more sample means around population mean; still spread a good deal.]
Draw 10 random students: 500 and 1000 times
[Figures: Frequency of means with 500 draws; frequency of means with 1000 draws. Annotations: distribution now significantly more normal; starting to see peaks.]
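The repeated-draw exercise above can be reproduced in a few lines. This is a sketch under assumptions: the population of 10,000 right-skewed scores is hypothetical (the slides do not give the underlying data), and `sample_means` is an illustrative helper name. Note that increasing the number of draws only reveals the shape of the sampling distribution more clearly; each draw still contains 10 students.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical, right-skewed population of 10,000 test scores capped at 100.
population = [min(100.0, rng.expovariate(1 / 26.0)) for _ in range(10_000)]

def sample_means(n_students, n_draws):
    """Draw `n_students` at random, take their mean score, repeat `n_draws` times."""
    return [
        statistics.fmean(rng.sample(population, n_students))
        for _ in range(n_draws)
    ]

pop_mean = statistics.fmean(population)
print("population mean:", round(pop_mean, 1))

for n_draws in (10, 50, 100, 500, 1000):
    means = sample_means(10, n_draws)
    # Centre and spread of the means collected so far.
    print(n_draws, round(statistics.fmean(means), 1), round(statistics.stdev(means), 1))
```

Plotting a histogram of `means` for each value of `n_draws` would give pictures comparable to the charts described above.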
• This is a theoretical exercise. In reality we do
not have multiple draws; we have only one draw.
• BUT, we can control the number of people in
that draw. This is what we refer to as SAMPLE
SIZE.
• The previous example was based on a sample
size of 10
• What happens if we take a sample size of 50?
What happens to the sampling distribution if we draw a sample size of 50
instead of 10, and take the mean (thousands of times)?
A. We will approach a bell curve faster (than with a sample size of 10)
B. The bell curve will be narrower
C. Both A & B
D. Neither. The underlying sampling distribution does not change.
(2) Law of Large Numbers
[Figures: frequency of means with 5, 10, 500, and 1000 samples, each shown for N = 10 and N = 50]
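The comparison between N = 10 and N = 50 can also be checked numerically. The following is a rough sketch, assuming a hypothetical population of uniformly distributed scores; the helper name `spread_of_means` is illustrative and not from the slides.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: 10,000 test scores spread evenly between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(10_000)]

def spread_of_means(sample_size, n_samples=1000):
    """Standard deviation of the sample means across `n_samples` random samples."""
    means = [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)

spread_10 = spread_of_means(10)
spread_50 = spread_of_means(50)

print("spread of means, N = 10:", round(spread_10, 2))
print("spread of means, N = 50:", round(spread_50, 2))
print("ratio:", round(spread_10 / spread_50, 2))
```

With five times as many people per sample, the means should cluster more tightly around the population mean, and the ratio of the two spreads should be close to the square root of 5.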
Standard deviation/error
• What’s the difference between the standard
deviation and the standard error?
• The standard error = the standard deviation of
the sampling distribution
Variance and Standard Deviation
• Variance = 400
$\sigma^2 = \frac{\sum (\text{Observation Value} - \text{Average})^2}{N}$
• Standard Deviation = 20
$\sigma = \sqrt{\text{Variance}}$
• Standard Error = $\frac{20}{\sqrt{N}}$
$SE = \frac{\sigma}{\sqrt{N}}$
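As a worked check of these formulas, the sketch below uses four hypothetical scores chosen so that the mean is 26, the variance is 400 and the standard deviation is 20. The scores themselves are an assumption for illustration; only the three summary values come from the slides.

```python
import math
import statistics

# Hypothetical scores chosen to give mean 26, variance 400, SD 20.
scores = [6.0, 46.0, 6.0, 46.0]

mean = statistics.fmean(scores)          # 26.0
variance = statistics.pvariance(scores)  # sum of squared deviations / N = 400.0
sd = math.sqrt(variance)                 # 20.0

def standard_error(sd, n):
    """Standard error of the mean for a sample of size n: SE = sd / sqrt(n)."""
    return sd / math.sqrt(n)

print(mean, variance, sd)
for n in (1, 4, 100, 400):
    print(n, standard_error(sd, n))  # 20.0, 10.0, 2.0, 1.0
```

The population variance (dividing by N) is used here to match the formula on the slide; `statistics.variance` would divide by N − 1 instead.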
Standard Deviation
[Figure: sample frequency of test scores (0 to 100), with the population mean and standard deviation marked]
Standard Error
[Figure: sample frequency of test scores (0 to 100), with the population mean, standard deviation and standard error marked]
Sample size ↑ x4, SE ↓ ½
[Figure: sample frequency and sample distribution of test scores (0 to 100), with the population mean, standard deviation and standard error marked]
Sample size ↑ x9, SE ↓ ?
[Figure: sample frequency and sample distribution of test scores (0 to 100), with the population mean, standard deviation and standard error marked]
Sample size ↑ x100, SE ↓ ?
[Figure: sample frequency and sample distribution of test scores (0 to 100), with the population mean, standard deviation and standard error marked]
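The questions on the last few slides follow directly from SE = σ/√N: multiplying the sample size by k multiplies the standard error by 1/√k. A minimal sketch, with hypothetical helper names `se_factor` and `size_multiplier_for`:

```python
import math

def se_factor(size_multiplier):
    """Factor applied to the standard error when the sample size is multiplied."""
    return 1.0 / math.sqrt(size_multiplier)

def size_multiplier_for(target_factor):
    """Sample size multiplier needed to scale the standard error by `target_factor`."""
    return 1.0 / target_factor ** 2

for m in (4, 9, 100):
    print(f"sample size x{m}: SE multiplied by {se_factor(m):.3f}")

# Halving the standard error requires four times the sample size.
print(size_multiplier_for(0.5))
```

The practical consequence is that precision gets expensive: each further halving of the standard error requires quadrupling the sample.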