Sampling and Sample Size, Part 1
Cally Ardington

Course Overview
1. What is Evaluation?
2. Outcomes, Impact, and Indicators
3. Why Randomise?
4. How to Randomise?
5. Sampling and Sample Size
6. Threats and Analysis
7. Project from Start to Finish
8. Cost Effectiveness and Scaling

Lecture Outline
• Precision and accuracy
• Statistical tools
  Population and sampling distribution
  Law of Large Numbers and Central Limit Theorem
  Standard deviation and standard error

Which of these is more accurate?
[Figure: panels I and II]
A. I.
B. II.
C. Don't know

Accuracy versus Precision
[Figure: estimates plotted against the truth; axes: Accuracy (Randomization) and Precision (Sample Size)]

This session's question
• How large does the sample need to be for you to be able to detect a given treatment effect?
• Randomization removes the bias (ensures accuracy), but it does not remove noise
• We control precision with sample size

Population distribution
[Figure: histogram of test scores (0 to 100) in the population; vertical axis: population frequency; population mean marked at 26, with 1 standard deviation indicated]

Take a random sample: Sampling distribution
[Figure: population distribution with sampling distribution (1) overlaid; horizontal axis: test scores (0 to 100); population mean marked at 26]

• We generally don't have our population distribution, but we have our sampling
distribution.
• What do we know about our sampling distribution?
• Two statistical laws help us here:
  (1) Central Limit Theorem
  (2) The Law of Large Numbers

(1) Central Limit Theorem
[Figure: histogram of test scores (0 to 100). This is the distribution of the population (Population Distribution).]
To here…
[Figure: This is the distribution of means from all random samples (Sampling distribution).]

Central Limit Theorem
[Figure: Draws 1 to 10, each marking the mean test score of one random sample]

Draw 10 random students, take the average, plot it: 10 times.
[Figure: Frequency of Means With 10 Draws. Inadequate sample size; no clear distribution around population mean.]

Draw 10 random students: 50 and 100 times
[Figure: Frequency of Means With 50 Draws. More sample means around population mean.]
[Figure: Frequency of Means With 100 Draws. Still spread a good deal.]

Draw 10 random students: 500 and 1000 times
[Figure: Frequency of Means With 500 Draws. Distribution now significantly more normal.]
[Figure: Frequency of Means With 1000 Draws. Starting to see peaks.]
• This is a theoretical exercise. In reality we do not have multiple draws; we have only one draw.
• BUT we can control the number of people in that draw. This is what we refer to as SAMPLE SIZE.
• The previous example was based on a sample size of 10.
• What happens if we take a sample size of 50?

What happens to the sampling distribution if we draw a sample size of 50 instead of 10, and take the mean (thousands of times)?
A. We will approach a bell curve faster (than with a sample size of 10)
B. The bell curve will be narrower
C. Both A & B
D. Neither. The underlying sampling distribution does not change.

(2) Law of Large Numbers
[Figure: Frequency of Means With 5 Samples, shown for N = 10 and N = 50]
[Figure: Frequency of Means With 10 Samples, shown for N = 10 and N = 50]
[Figure: Frequency of Means With 500 Samples, shown for N = 10 and N = 50]
[Figure: Frequency of Means With 1000 Samples, shown for N = 10 and N = 50]

Standard deviation/error
• What's the difference between the standard deviation and the standard error?
• The standard error = the standard deviation of the sampling distribution

Variance and Standard Deviation
• Variance = 400
  $\sigma^2 = \frac{\sum (\text{Observation Value} - \text{Average})^2}{N}$
• Standard Deviation = 20
  $\sigma = \sqrt{\text{Variance}}$
• Standard Error = $\frac{20}{\sqrt{N}}$
  $SE = \frac{\sigma}{\sqrt{N}}$

Standard Deviation
[Figure: sample frequency of test scores (0 to 100), with the population mean and standard deviation marked]

Standard Error
[Figure: same plot, with the population mean, standard deviation, and standard error marked]

Sample size ↑ x4, SE ↓ ½
[Figure: sample distribution of test scores, with the population mean, standard deviation, and standard error marked]

Sample size ↑ x9, SE ↓ ?
[Figure: sample distribution of test scores, with the population mean, standard deviation, and standard error marked]

Sample size ↑ x100, SE ↓ ?
[Figure: sample distribution of test scores, with the population mean, standard deviation, and standard error marked]
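
The relationship SE = σ/√N and the narrowing of the sampling distribution can be checked with a small simulation. The sketch below is illustrative only: the population is a hypothetical right-skewed set of test scores generated for this example (not the lecture's data), and the names `standard_error`, `simulate_sample_means`, and `results` are assumptions introduced here. It draws many samples of size 10 and 50, takes each sample's mean, and compares the spread of those means with σ/√N. It also evaluates how the SE scales when the sample size grows by a factor of 4, 9, or 100, using the slide's σ = 20.

```python
import math
import random
import statistics


def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def simulate_sample_means(population, n, draws, rng):
    """Draw `draws` random samples of size n; return the mean of each sample."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(draws)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 right-skewed test scores, capped at 100.
population = [min(100.0, rng.expovariate(1 / 26)) for _ in range(10_000)]
sigma = statistics.pstdev(population)  # population standard deviation

# Empirical spread of sample means versus the formula sigma / sqrt(n).
# Samples are drawn without replacement; with n much smaller than the
# population, the finite-population correction is negligible.
results = {}
for n in (10, 50):
    means = simulate_sample_means(population, n, draws=2_000, rng=rng)
    results[n] = (statistics.stdev(means), standard_error(sigma, n))
    print(f"n={n}: SD of sample means={results[n][0]:.2f}, "
          f"sigma/sqrt(n)={results[n][1]:.2f}")

# Scaling rule with the slide's sigma = 20: multiplying the sample size
# by k divides the standard error by sqrt(k).
base_n = 25  # arbitrary baseline sample size chosen for illustration
scaling = {k: standard_error(20, base_n * k) / standard_error(20, base_n)
           for k in (4, 9, 100)}
for k, ratio in scaling.items():
    print(f"sample size x{k}: SE multiplied by {ratio:.3f}")
```

Because the standard error shrinks with the square root of N, quadrupling the sample halves it, a ninefold increase cuts it to a third, and a hundredfold increase cuts it to a tenth.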