The Law of Averages and the Central Limit Theorem
Patrick Breheny
September 24
STA 580: Biostatistics I

Kerrich's experiment
A South African mathematician named John Kerrich was visiting Copenhagen in 1940 when Germany invaded Denmark. Kerrich spent the next five years in an internment camp. To pass the time, he carried out a series of experiments in probability theory. One of them involved flipping a coin 10,000 times.

The law of averages
What does the law of averages say will happen in Kerrich's coin-tossing experiment? We all know that a coin lands heads with probability 50%. After many tosses, the law of averages says that the number of heads should be about the same as the number of tails . . . or does it?
Kerrich's results

Number of tosses    Number of heads    Heads − 0.5·Tosses
10                  4                  −1
100                 44                 −6
500                 255                5
1,000               502                2
2,000               1,013              13
3,000               1,510              10
4,000               2,029              29
5,000               2,533              33
6,000               3,009              9
7,000               3,516              16
8,000               4,034              34
9,000               4,538              38
10,000              5,067              67

Kerrich's results plotted
[Figure: number of heads minus half the number of tosses, plotted against the number of tosses (10 to 10,000).]

Where's the law of averages?
Instead of the number of heads getting closer to the number of tails, they seem to be getting farther apart. This is not a fluke of this particular experiment: as the number of tosses goes up, the absolute size of the difference between the number of heads and the number of tails is also likely to go up. However, compared with the number of tosses, the difference is becoming quite small.

Chance error
Consider the following equation, where h is the number of heads and n is the number of tosses:
$$h = \frac{1}{2} \cdot n + \text{chance error}$$
As n goes up, the chance error will tend to go up as well. However, it does not go up as fast as n does; i.e., if we flipped our coin another 10,000 times, the chance error would likely be bigger, but not twice as big.
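Kerrich's table can be checked with a few lines of Python. This is a minimal sketch: the counts are copied from the table above, and the helper name `summarize` is just an illustrative choice. It recomputes the chance error h − n/2 and, next to it, the percentage of heads, so the two behaviors described above (a growing absolute error, a shrinking relative one) can be compared row by row.

```python
# Kerrich's counts from the table above: (number of tosses, number of heads).
kerrich = [
    (10, 4), (100, 44), (500, 255), (1_000, 502), (2_000, 1_013),
    (3_000, 1_510), (4_000, 2_029), (5_000, 2_533), (6_000, 3_009),
    (7_000, 3_516), (8_000, 4_034), (9_000, 4_538), (10_000, 5_067),
]

def summarize(counts):
    """Return (n, chance error h - n/2, percentage of heads) for each row."""
    rows = []
    for n, h in counts:
        chance_error = h - n / 2   # distance from "exactly half heads", in tosses
        pct_heads = 100 * h / n    # the same result expressed relative to n
        rows.append((n, chance_error, pct_heads))
    return rows

for n, err, pct in summarize(kerrich):
    print(f"{n:>6} tosses: heads - n/2 = {err:>4.0f}, heads = {pct:.2f}%")
```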
Chance error (cont'd)
Now consider dividing the previous equation by n, where $\bar{h} = h/n$ is the proportion of heads:
$$\bar{h} = \frac{1}{2} + \frac{\text{chance error}}{n}$$
The final term will go to zero as n gets larger and larger. This is what the law of averages says: as the number of tosses goes up, the difference between the number of heads and the number of tails gets bigger, but the difference between the percentage of heads and 50% gets smaller.

Kerrich's percentage of heads
[Figure: percentage of heads (40% to 60%) plotted against the number of tosses (10 to 10,000).]

Repeating the experiment 50 times
[Figure: number of heads minus half the number of tosses for 50 repetitions of the experiment, plotted against the number of tosses (10 to 10,000).]

Repeating the experiment 50 times (cont'd)
[Figure: percentage of heads for the same 50 repetitions, plotted against the number of tosses (10 to 10,000).]

Convergence
From the coin-tossing experiment, we saw that as the sample size increases, the average tends to settle in on an answer. Statisticians say that the average number of heads converges to one half. The connection between tossing coins and sampling from a population may not be obvious, but imagine drawing a large random sample from the population and counting the number of males vs. females. This is essentially the same random process.

Population parameters
We said that researchers are usually interested in a numerical quantity (a parameter) of a population to which they would like to generalize their findings, but that they can only sample a fraction of that population. As that fraction gets larger and larger, the value of that quantity computed from the sample will converge to the population parameter. There are two main points here:
- The population parameter is the value I would get if I could take an infinitely large sample from the population.
- The accuracy of my sample statistic will tend to increase (i.e., it will tend to get closer to the population parameter) as my sample size gets bigger.

A note about wording
It is important to keep in mind that the population mean is a different concept from a sample mean. The population mean is an unknown quantity that we would like to measure; the sample mean is a statistic that describes one specific list of numbers. So be careful to distinguish between the mean of a population/distribution and the mean of a sample. The same goes for other statistics, like the population standard deviation vs. the sample standard deviation.

The mean and standard deviation of the normal distribution
Recall that when you standardize a variable, the resulting list of numbers has mean 0 and standard deviation 1. Since histograms of these standardized variables seem to match up quite well with the normal distribution, one would think that the normal distribution also has mean 0 and standard deviation 1. Indeed, one can show this to be true.

The mean of the binomial distribution
When our random variable is an event that either occurs (1) or doesn't occur (0), the population mean is simply the probability of the event. This matches up with our earlier definition of probability as the long-run fraction of time that an event occurs. What about a binomial distribution?
If the average number of times an event occurs in each trial is p, then the average number of times the event occurs in n trials must be
$$p + p + \cdots + p = np.$$
Therefore, the mean of the binomial distribution is np.

The standard deviation of a binomial distribution
Because we know the probability of each value occurring, we can derive the standard deviation of the binomial distribution as well (although the actual calculations are a little too lengthy for this class). This standard deviation turns out to be
$$\sqrt{np(1-p)}.$$
This formula makes sense: as p gets close to 0 or 1, there is less unpredictability in the outcome of the random process, and the spread of the binomial distribution will be smaller. Conversely, when p ≈ 0.5, the unpredictability is at its maximum, and so is the spread of the binomial distribution.

Sampling distributions
We now turn our attention from distributions of data to the sampling distributions that we talked about earlier. Imagine we were to sample 10 people, measure their heights, and take the average. The distribution we are talking about now is the distribution of the average height of a sample of 10 people.

The expected value and the standard error
When trying to summarize a histogram, we said that the two main qualities we wanted to describe were the center (measured by the average) and the spread (measured by the standard deviation). The same is true for sampling distributions. We will be interested in characterizing their center and spread, only now we will call the center the expected value and the spread the standard error.

The expected value of the mean
To study expected values, let's continue with our hypothetical experiment in which we go out and measure the average height of 10 people. In an actual experiment, we would do this only once, and our population would be unknown. To illustrate the concepts of sampling distributions, let's instead repeat the experiment a million times, and sample from a known population: the NHANES sample of adult women.

Histogram of our simulation
[Figure: histogram of the 1 million sample means (frequency vs. sample mean, about 60 to 68 inches).]

Expected value of the mean
So where is the center of this histogram? The mean of the 1 million sample averages was 63.49 inches. Its center is exactly at the population mean: 63.49 inches. This is what statisticians mean when they say that the sample average is an unbiased estimator: its expected value is equal to the population value.

The standard error of the mean
How about the spread of the histogram? Would we expect its spread to be the same as the spread of the population?
No; as we saw with the coin flipping, the spread of the sample average definitely went down as our sample size increased. As Sherlock Holmes says in The Sign of the Four:
"While the individual man is an insoluble puzzle, in the aggregate he becomes a mathematical certainty. You can, for example, never foretell what any one man will do, but you can say with precision what an average number will be up to. Individuals vary, but percentages remain constant. So says the statistician."

The standard error of the mean (cont'd)
Indeed, the standard error of the mean (the standard deviation of the sample means) was 0.869 inches. This is much lower than the population standard deviation: 2.75 inches. Therefore, while an individual person would be expected to be 63.5 inches give or take 2.75 inches, the average of a group of ten people would be expected to be 63.5 inches give or take only 0.87 inches.

The square root law
The formula relating the standard error of a sample mean to the standard deviation of the population is called a square root law. It says that
$$SE = \frac{SD}{\sqrt{n}}.$$
For this example,
$$SE = \frac{2.749}{\sqrt{10}} = 0.869.$$
Note the connection with the law of averages: as n gets large, SE goes to 0 and the average converges to the expected value.

The expected value of the standard deviation
So we saw that the expected value of the mean was exactly equal to the population mean. Is the same true for the sample standard deviation? What about the root-mean-square?
We can repeat the experiment, recording the sample standard deviations and the root-mean-squares of our 10-person samples.

Neither estimator is unbiased
The results (recall that the population standard deviation is 2.749 inches):
- Expected value of the sample standard deviation: 2.67 inches
- Expected value of the root-mean-square: 2.53 inches
Neither one is unbiased! However, the standard deviation is less biased than the root-mean-square, and that is why people use it.

The central limit theorem

Suppose we have an infinitely large urn with two types of balls in it: half are numbered 0 and the other half are numbered 1. What does this distribution look like?
[Figure: probability (0.5 each) of drawing a ball numbered 0 or 1.]
What is this distribution?

What is its mean?
$$np = 1(0.5) = 0.5$$
What is its standard deviation?
$$\sqrt{np(1-p)} = \sqrt{1(0.5)(0.5)} = 0.5$$

Now consider the sum of five balls from the urn:
[Figure: probability distribution of the sum of five balls, taking values 0 through 5.]
What is this distribution?
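A single ball from this urn is a binomial random variable with n = 1 and p = 0.5, and the sum of five independent balls is binomial with n = 5 and p = 0.5. The minimal sketch below computes both distributions exactly with `math.comb` and checks that their mean and standard deviation agree with np and √(np(1 − p)). The helper names `binomial_pmf` and `mean_and_sd` are illustrative, not from any library.

```python
from math import comb, sqrt

def binomial_pmf(n, p):
    """Exact P(sum = k), k = 0..n, when each of n balls is 1 with probability p."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def mean_and_sd(pmf):
    """Mean and standard deviation of a distribution on the values 0, 1, ..., len(pmf) - 1."""
    mean = sum(k * prob for k, prob in enumerate(pmf))
    var = sum((k - mean) ** 2 * prob for k, prob in enumerate(pmf))
    return mean, sqrt(var)

p = 0.5
one_ball = binomial_pmf(1, p)    # the flat urn distribution: [0.5, 0.5]
five_balls = binomial_pmf(5, p)  # distribution of the sum of five balls

print(one_ball, mean_and_sd(one_ball))
print([round(x, 4) for x in five_balls])
# Compare the exact moments with the formulas np and sqrt(np(1 - p)).
print(mean_and_sd(five_balls), (5 * p, sqrt(5 * p * (1 - p))))
```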
Instead of the sum, let's consider the average of those five balls. We know the distribution of the sum, so we know the distribution of its average:
[Figure: probability distribution of the average of five balls, taking values 0, 0.2, 0.4, 0.6, 0.8, 1.]

What is the expected value of the average?
$$\frac{np}{n} = \frac{5(0.5)}{5} = 0.5$$
What is the standard error of the average?
$$\frac{\sqrt{np(1-p)}}{n} = \frac{\sqrt{5(0.5)(0.5)}}{5} = 0.22$$

The sampling distribution of the average of 10 balls:
[Figure: probability distribution of the average of 10 balls, taking values 0, 0.1, ..., 1.]
What is its expected value? What is its standard error?

The sampling distribution of the average of 100 balls:
[Figure: probability distribution of the average of 100 balls, over values from 0 to 1.]
What is its expected value? What is its standard error?
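The calculation above for five balls generalizes directly: dividing the binomial mean np and standard deviation √(np(1 − p)) by n gives the expected value and standard error of the average. This minimal sketch (the function name `average_of_balls` is a hypothetical helper) applies it to n = 5, 10 and 100, which answers the two questions posed for the larger samples.

```python
from math import sqrt

def average_of_balls(n, p=0.5):
    """Expected value and standard error of the average of n balls from the 0/1 urn.

    The sum is binomial(n, p), so divide its mean np and its SD sqrt(np(1-p)) by n.
    """
    expected = n * p / n
    se = sqrt(n * p * (1 - p)) / n
    return expected, se

for n in (5, 10, 100):
    e, se = average_of_balls(n)
    print(f"n = {n:>3}: expected value = {e:.2f}, standard error = {se:.2f}")
```

Note that the standard error simplifies to 0.5/√n, the square root law with SD = 0.5.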
To recap: for the population as a whole, the variable had mean 0.5, standard deviation 0.5, and a flat shape. The expected value, standard error, and shape of the sampling distribution of the average of the variable were:

n      Expected value    Standard error    Shape
5      0.5               0.22              Kind of normal
10     0.5               0.16              More normal
100    0.5               0.05              Pretty darn normal

The central limit theorem
There are three very important phenomena going on here:
#1 The expected value of the sampling distribution is always equal to the population average.
#2 The standard error of the sampling distribution is always equal to the population standard deviation divided by the square root of n.
#3 As n gets larger, the sampling distribution looks more and more like the normal distribution.
These three properties of the sampling distribution of the sample average hold for any distribution (that has a finite standard deviation).

The central limit theorem (cont'd)
This result is called the central limit theorem, and it is one of the most important, remarkable, and powerful results in all of statistics. In the real world, we rarely know the distribution of our data. But the central limit theorem says: we don't have to.
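Properties #1 and #2 can also be checked by brute force, in the same spirit as the million-repetition height experiment. The sketch below assumes the 0/1 urn from the recap, draws many samples of n balls, and compares the mean and standard deviation of the sample averages with 0.5 and 0.5/√n. The seed and the number of repetitions are arbitrary choices, and `simulate_averages` is a hypothetical helper name.

```python
import random
from statistics import mean, pstdev

def simulate_averages(n, reps, rng):
    """Draw `reps` samples of n balls (each 1 with probability 1/2, else 0); return their averages."""
    return [sum(rng.random() < 0.5 for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(580)  # fixed seed so the run is reproducible
for n in (5, 10, 100):
    avgs = simulate_averages(n, reps=20_000, rng=rng)
    print(f"n = {n:>3}: mean of averages = {mean(avgs):.3f} (theory 0.5), "
          f"SD of averages = {pstdev(avgs):.3f} (theory {0.5 / n ** 0.5:.3f})")
```

Because this is a simulation, the printed values will match the theory only up to Monte Carlo error, which shrinks as `reps` grows.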
The central limit theorem (cont'd)
Furthermore, as we have seen, knowing the mean and standard deviation of a distribution that is approximately normal allows us to calculate anything we wish to know with tremendous accuracy, and once n is large enough, the sampling distribution of the mean is always approximately normal. The only caveats:
- Observations must be independently drawn from the population.
- The central limit theorem applies to the sampling distribution of the mean, not necessarily to the sampling distribution of other statistics.

How large does n have to be?
How large does n have to be before the sampling distribution becomes close enough in shape to the normal distribution?

Example #1
[Figure: left panel, the population distribution (x from 0 to 1); right panel, the sampling distribution of the mean for n = 10.]

Example #2
[Figure: left panel, the population distribution (x from −6 to 6); right panel, the sampling distribution of the mean for n = 10.]
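One rough way to judge "close enough" is to check whether the sampling distribution puts about 2.5% of sample means in each tail beyond 1.96 standard errors, as a normal distribution would. The populations in Examples #1 and #2 are only shown as plots, so this sketch swaps in a hypothetical right-skewed population (exponential with mean 1 and SD 1). The function names and the seed are illustrative assumptions, not part of the original examples.

```python
import random
from statistics import NormalDist

def tail_fractions(draw, mu, sigma, n, reps, rng):
    """Fractions of simulated sample means below / above the band mu +/- 1.96*SE.

    If the sampling distribution of the mean were exactly normal, each fraction
    would be 0.025. `draw(rng)` must return one observation from the population.
    """
    se = sigma / n ** 0.5
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    below = above = 0
    for _ in range(reps):
        xbar = sum(draw(rng) for _ in range(n)) / n
        if xbar < mu - z * se:
            below += 1
        elif xbar > mu + z * se:
            above += 1
    return below / reps, above / reps

def draw_exponential(rng):
    """Hypothetical right-skewed population: exponential with mean 1 and SD 1."""
    return rng.expovariate(1.0)

rng = random.Random(24)
for n in (2, 10, 50):
    lo, hi = tail_fractions(draw_exponential, 1.0, 1.0, n, 20_000, rng)
    print(f"n = {n:>2}: below {lo:.3f}, above {hi:.3f} (normal: 0.025 each)")
```

For a skewed population like this one, the two tails are lopsided at small n and even out only gradually as n grows, which is the kind of dependence on the original distribution the rules of thumb try to capture.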
Rules of thumb
In the previous two examples, the sampling distribution was very close to the normal distribution with samples of size 10. A widely used rule of thumb is to require n to be about 20. However, this depends entirely on the original distribution:
- If the original distribution is close to normal, n = 2 might be enough.
- If the original distribution is highly skewed or strange in some other way, n = 50 might not be enough.

A troublesome distribution
For example, imagine an urn containing the numbers 1, 2, and 9:
[Figure: sampling distributions of the sample mean for n = 20 and n = 50.]

A troublesome distribution (cont'd)
[Figure: sampling distribution of the sample mean for n = 100.]

An example from the real world
Weight tends to be skewed to the right (far more people are overweight than underweight). As we did before with height, let's perform an experiment in which the NHANES sample of adult men is the population, and we draw samples from it (i.e., we are re-sampling the NHANES sample).
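For the troublesome urn containing 1, 2 and 9, the sampling distribution of the mean does not need to be simulated: it can be computed exactly by convolving the three equally likely values n times. This minimal sketch assumes draws are independent and equally likely to be each number (as for an urn sampled with replacement); the helper names are hypothetical. It confirms that properties #1 and #2 hold exactly even though the shape is troublesome, and the returned dictionary can be plotted as a bar chart to draw pictures like the ones above.

```python
from collections import defaultdict
from math import sqrt

URN = (1, 2, 9)  # assumed equally likely on every draw

def sum_distribution(values, n):
    """Exact distribution of the sum of n independent draws, built by repeated convolution."""
    dist = {0: 1.0}
    for _ in range(n):
        nxt = defaultdict(float)
        for total, prob in dist.items():
            for v in values:
                nxt[total + v] += prob / len(values)
        dist = dict(nxt)
    return dist

def mean_distribution(values, n):
    """Exact distribution of the sample mean: P(mean = s / n) = P(sum = s)."""
    return {s / n: prob for s, prob in sum_distribution(values, n).items()}

pop_mean = sum(URN) / len(URN)                                   # 4.0
pop_sd = sqrt(sum((v - pop_mean) ** 2 for v in URN) / len(URN))  # sqrt(38/3)
for n in (20, 50, 100):
    dist = mean_distribution(URN, n)
    ev = sum(x * p for x, p in dist.items())
    se = sqrt(sum((x - ev) ** 2 * p for x, p in dist.items()))
    print(f"n = {n:>3}: E = {ev:.3f}, SE = {se:.3f}, SD/sqrt(n) = {pop_sd / sqrt(n):.3f}")
```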
Results
[Figure: sampling distribution of the sample mean weight for samples of size n = 20.]

Sampling distribution of serum cholesterol
According to the National Center for Health Statistics, the distribution of serum cholesterol levels for 20- to 74-year-old males living in the United States has mean 211 mg/dl and standard deviation 46 mg/dl. We are planning to collect a sample of 25 individuals and measure their cholesterol levels. What is the probability that our sample average will be above 230?

Solution
The first thing we would do is determine the expected value and standard error of the sampling distribution. The expected value will be identical to the population mean, 211. The standard error will be smaller than the SD by a factor of $\sqrt{n}$:
$$SE = \frac{SD}{\sqrt{n}} = \frac{46}{\sqrt{25}} = 9.2$$

Solution (cont'd)
Next, we would determine how many standard errors away from the mean 230 is:
$$\frac{230 - 211}{9.2} = 2.07$$
What is the probability that a normally distributed random variable is more than 2.07 standard deviations above the mean? $1 - 0.981 = 1.9\%$

A different question
95% of our sample averages will fall between what two numbers? The first step: what two values of the normal distribution contain 95% of the data?
The 2.5th percentile of the normal distribution is −1.96 (and, by symmetry, the 97.5th percentile is 1.96). Thus, a normally distributed random variable will lie within 1.96 standard deviations of its mean 95% of the time.

Solution
Which numbers are 1.96 standard errors away from the expected value of the sampling distribution?
$$211 - 1.96(9.2) = 193.0$$
$$211 + 1.96(9.2) = 229.0$$
Therefore, 95% of our sample averages will fall between 193 mg/dl and 229 mg/dl.

A different sample size
What if we had only collected samples of size 10? Now the standard error is
$$SE = \frac{46}{\sqrt{10}} = 14.5$$
Now what is the probability that our sample average will be above 230?

Solution
Now 230 is only
$$\frac{230 - 211}{14.5} = 1.31$$
standard errors away from the expected value. The probability of being more than 1.31 standard deviations above the mean is 9.6%. This is about 5 times higher than the 1.9% we calculated earlier for the larger sample size.

Solution #2
What about the values that would contain 95% of our sample averages?
The values 1.96 standard errors away from the expected value are now
$$211 - 1.96(14.5) \approx 182.5$$
$$211 + 1.96(14.5) \approx 239.5$$
Note how much wider this interval is than the interval (193, 229) for the larger sample size.

Another example
What if we'd increased the sample size to 50? Now the standard error is 6.5, and the values
$$211 - 1.96(6.5) \approx 198.2$$
$$211 + 1.96(6.5) \approx 223.8$$
contain 95% of the sample averages.

Summary

n     SE      Interval          Width of interval
10    14.5    (182.5, 239.5)    57.0
25    9.2     (193.0, 229.0)    36.0
50    6.5     (198.2, 223.8)    25.6

The width of the interval is going down by what factor?

What sample size do we need?
Finally, we ask a slightly harder question: how large would the sample size need to be in order to ensure a 95% probability that the sample average will be within 5 mg/dl of the population mean? As we saw earlier, 95% of observations fall within 1.96 standard deviations of the mean. Thus, we need the standard error to satisfy
$$1.96(SE) = 5 \quad\Longrightarrow\quad SE = \frac{5}{1.96}$$
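The probabilities and 95% ranges worked out above for n = 10, 25 and 50 can be reproduced with Python's standard-library `statistics.NormalDist`. This is a minimal sketch that assumes, as the worked examples do, that the sample mean is approximately normal with mean 211 mg/dl and standard error 46/√n; `sampling_summary` is a hypothetical helper name. Because it keeps the standard error unrounded, its widths can differ from the summary table in the last digit.

```python
from math import sqrt
from statistics import NormalDist

MU, SD = 211, 46                   # serum cholesterol, mg/dl (figures quoted above)
Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def sampling_summary(n, cutoff=230):
    """SE of the mean, P(sample mean > cutoff), and the central 95% range of sample means."""
    se = SD / sqrt(n)
    sampling = NormalDist(MU, se)  # normal approximation to the sampling distribution
    p_above = 1 - sampling.cdf(cutoff)
    low, high = MU - Z95 * se, MU + Z95 * se
    return se, p_above, (low, high)

for n in (10, 25, 50):
    se, p, (lo, hi) = sampling_summary(n)
    print(f"n = {n:>2}: SE = {se:4.1f}, P(mean > 230) = {p:.3f}, "
          f"95% range = ({lo:.1f}, {hi:.1f}), width = {hi - lo:.1f}")
```

Comparing the printed widths for two sample sizes is a quick way to check an answer to the question about how fast the width shrinks.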
Solution
The standard error is equal to the standard deviation over the square root of n, so
$$\frac{5}{1.96} = \frac{SD}{\sqrt{n}}$$
$$\sqrt{n} = \frac{SD \cdot 1.96}{5}$$
$$n = 325.1$$
In the real world, we of course cannot sample 325.1 people, so we would sample 326 to be safe.

One last question
How large would the sample size need to be in order to ensure a 90% probability that the sample average will be within 10 mg/dl of the population mean? There is a 90% probability that a normally distributed random variable will fall within 1.645 standard deviations of its mean. Thus, we want 1.645(SE) = 10, so
$$\frac{10}{1.645} = \frac{46}{\sqrt{n}}$$
$$n = 57.3$$
Thus, we would sample 58 people.
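Both sample-size questions follow the same recipe: solve z·SD/√n = margin for n, giving n = (z·SD/margin)², then round up. A minimal sketch, assuming the normal approximation used throughout; the function name `sample_size` is illustrative, and z is taken from `statistics.NormalDist` rather than a table, so the exact n differs from the hand calculation only in later decimal places.

```python
from math import ceil
from statistics import NormalDist

def sample_size(sd, margin, prob):
    """Smallest n for which the sample mean lies within `margin` of the population mean
    with probability `prob`, using the normal approximation with SE = sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + prob / 2)  # about 1.96 for 95%, 1.645 for 90%
    n_exact = (z * sd / margin) ** 2
    return n_exact, ceil(n_exact)             # round up: we cannot sample a fraction of a person

print(sample_size(46, 5, 0.95))   # worked example above: n = 325.1, so sample 326
print(sample_size(46, 10, 0.90))  # worked example above: n = 57.3, so sample 58
```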