Al-Ahliyya Amman University
Chapter 5: Central Limit Theorem (CLT)
Dr Alia Al nuaimat

What is the point?
Research questions are about population parameters (these are unknown). This chapter covers estimation, one of the two types of statistical inference. As discussed in earlier chapters, statistics such as means and variances can be calculated from samples drawn from populations. These statistics serve as estimates of the corresponding population parameters. We expect these estimates to differ by some amount from the parameters they estimate. This chapter introduces estimation procedures that take these differences into account, thereby providing a foundation for the statistical inference procedures discussed in the remaining chapters of the book.

Unbiased estimator
• A statistic is an unbiased estimate of a given parameter when the mean of the sampling distribution of that statistic can be shown to equal the parameter being estimated. For example, the mean of a sample is an unbiased estimate of the mean of the population from which the sample was drawn. This requires that the sample fairly represents the population of interest.
• We WANT our statistic to be an unbiased estimator of a population parameter.
• Our research question is about an unknown population, and we use statistics (known quantities calculated from our data) to provide evidence that addresses the research question.
• An unbiased estimator does not systematically overestimate or underestimate the parameter, so it gives us good evidence.

Central Limit Theorem (CLT)
• The fundamental, or central, theorem of statistics says that under certain conditions (a random sample and a large enough sample size) the following is true: "The sampling distribution of a sample mean is approximately a normal curve."

Central limit theorem
• Suppose we have a population with mean μ and standard deviation σ.
• If we take simple random samples of size n and the sample size is sufficiently large (a common rule of thumb is n ≥ 30), then the sampling distribution of the sample means is approximately normal, with mean μ and standard error (that is, standard deviation) σ/√n.

ADDITIONAL MATERIAL TO HELP YOU UNDERSTAND SAMPLING DISTRIBUTIONS AND THE CENTRAL LIMIT THEOREM (CLT):
• The goal of most research is inference (taking information from a sample and generalizing it to a population).
• Valid inference depends on selecting a sample that fairly represents the population.
• To illustrate the concept of a sampling distribution, suppose that 25 researchers have the same question: What is the mean weight of cats in Gainesville, Florida? Suppose that each researcher finds a random sample of 30 cats, weighs each cat, and calculates the sample mean. Possible data collected from these experiments are given in the following table. Each sample yields a different value of the sample mean, so the sample mean can be thought of as a random variable.

Next Step!
• We could graph the values of the sample means. The histogram would give us an idea of their probability distribution.
• This probability distribution, which shows how to assign probabilities to the values of a statistic, is called a sampling distribution.
• The standard deviation of a sampling distribution is called a standard error.
• Mathematical theory tells us what the sampling distributions are for various statistics.

Definition
• Sampling distribution: the probability distribution of a statistic when the statistic is considered as a random variable (e.g., the distribution of the means of many samples).
• Standard error: the standard deviation of a sampling distribution. For the sample mean it reflects how close the sample mean is likely to be to the population mean, and it is estimated by SE = s/√n. As n increases, the SE decreases, so the sample mean becomes a more accurate estimate of the population mean.

General Rule in CLT
• As the sample size becomes larger, the sampling distribution of the mean becomes more and more normal.
• If the population data are not normally distributed, the CLT applies with sample sizes n ≥ 30.
• So you can start with almost any population distribution (one with a finite standard deviation), take many random samples (each of size at least 30), plot the means of those samples, and the plot will look approximately like a normal distribution.
• This is why the normal distribution is SO helpful and comes up so often.

Sampling distribution of the Sample Mean
• Derived from samples of the original distribution.
• Has the same mean as the original distribution.
• As the sample size gets larger, it fits more tightly around the mean.
• When n is small (e.g., n = 1), it will usually not be normal no matter how many samples you draw (unless the population itself is normal). As n → ∞, it approaches a normal distribution.
• The larger each sample, the more closely the sample means cluster around the population mean. (Drawing more samples of the same size does not make them cluster more tightly; it only gives a clearer picture of the sampling distribution.)

What will make the sample mean more accurate?
• We know that the larger the sample (n), the closer the sample means tend to be to the true mean.
• Also, the smaller the true σ, the smaller the spread of the sample means.
Two factors: n and σ

Standard error of the mean
• SE = σ/√n
This does not give the variability of the population; it gives the precision of the estimate of the mean, i.e., "How close is my sample mean to the TRUE MEAN?"

Example
• The weight of adult women in a population is normally distributed, with a mean of 75 kg. Approximately 95% of all women weigh between 55 kg and 95 kg.
• What would the standard error of the mean be for a sample of 49 women?
• For 64 women?
• For 625 women?
• About 95% of a normal population lies within about 2σ of the mean, so 75 ± 20 kg gives 2σ ≈ 20 kg and σ ≈ 10 kg.
• SE of the mean for n = 49: 10/√49 = 10/7 ≈ 1.43
• SE of the mean for n = 64: 10/√64 = 10/8 = 1.25
• SE of the mean for n = 625: 10/√625 = 10/25 = 0.4
• What does this mean? For larger samples the precision of the sample mean is better; that is, the sample mean tends to be closer to the true mean.
• Calculate 95% confidence intervals for each sample mean.

Confidence Interval for a Mean
x̄ ± z* σ/√n
With z* = 1.96 and E = 1.96 σ/√n, there is a 95% probability, before the sample is drawn, that the sample mean x̄ falls within E of the population mean μ; equivalently, that the interval x̄ ± E covers μ.
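The claims above (sample means center on μ, spread by σ/√n, and look roughly normal even for a skewed population once n ≥ 30), together with the arithmetic in the adult-women example, can be checked with a short Python sketch. It rests on stated assumptions: the skewed population is an exponential distribution with mean 75 (so σ = 75 as well), chosen only for illustration; σ ≈ 10 kg for the women's example comes from reading "95% between 55 and 95 kg" as μ ± 2σ; and the helper names `standard_error` and `simulate_sample_means` are hypothetical.

```python
import math
import random
import statistics


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def simulate_sample_means(n: int, num_samples: int, rng: random.Random,
                          mean: float = 75.0) -> list[float]:
    """Return the means of num_samples random samples of size n drawn from a
    right-skewed exponential population with the given mean.

    Illustrative assumption: for an exponential population the standard
    deviation equals the mean, so sigma == mean here.
    """
    rate = 1.0 / mean  # expovariate takes the rate, i.e. 1 / mean
    return [
        statistics.fmean(rng.expovariate(rate) for _ in range(n))
        for _ in range(num_samples)
    ]


if __name__ == "__main__":
    # Part 1: CLT check with a skewed population (mu = sigma = 75).
    rng = random.Random(2024)  # fixed seed so reruns give the same numbers
    n = 30
    means = simulate_sample_means(n, num_samples=5000, rng=rng)
    se30 = standard_error(75.0, n)
    print(f"mean of sample means: {statistics.fmean(means):.2f} (mu = 75)")
    print(f"SD of sample means:   {statistics.stdev(means):.2f} "
          f"(sigma/sqrt(n) = {se30:.2f})")
    # If the sample means are roughly normal, about 95% lie within 1.96 SE.
    within = sum(abs(m - 75.0) < 1.96 * se30 for m in means) / len(means)
    print(f"share within 1.96 SE of mu: {within:.3f} (normal curve: 0.95)")

    # Part 2: adult-women example. About 95% of a normal population lies
    # within about 2 sigma of the mean, so 75 +/- 20 kg gives sigma ~ 10 kg.
    sigma = (95.0 - 55.0) / 4
    for size in (49, 64, 625):
        se = standard_error(sigma, size)
        margin = 1.96 * se  # E = 1.96 * sigma / sqrt(n) for a 95% interval
        print(f"n = {size}: SE = {se:.2f} kg, "
              f"95% CI = sample mean +/- {margin:.2f} kg")
```

Running it should give a mean of sample means close to 75, a standard deviation of sample means close to 75/√30 ≈ 13.69, and a share near 0.95 within 1.96 SE. For the women's example it prints SE values of about 1.43, 1.25 and 0.40 kg and 95% margins of about 2.80, 2.45 and 0.78 kg, so each requested interval is x̄ ± E with that E.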
[Figure: Distribution of sample means. The central area 0.95 lies between μ − 1.96 σ/√n and μ + 1.96 σ/√n, with area 0.025 in each tail; z.025 = 1.96.]

Confidence Interval for a Mean
E = 1.96 σ/√n, where E = error margin
There is a 95% probability that X̄, the sample mean, is within E of the population mean μ.

Example: 95% Confidence Interval

Interpretation of 95% CI
• Correct: We have 95% confidence that the true population mean lies within this interval. A 95% confidence interval comes from a procedure that captures the true population mean 95% of the time: in repeated sampling, about 95% of the intervals calculated from samples of the same size will include the true mean.
• Incorrect: The probability that the mean lies between the lower and upper limits of this particular computed interval is 0.95.

WHAT "90% CONFIDENCE" MEANS
• 90% confidence interval: Lower Bound < μ < Upper Bound
• What "90% confidence" does not mean:
• We are 90% confident that the sample mean for the observed sample (the data used to obtain the bounds) lies between the bounds. ABSOLUTELY FALSE.
• You can be 100% confident that the sample mean for the given data is equal to itself, with no error margin at all (the interval is centered on it).

WHAT "90% CONFIDENCE" MEANS (when the conditions are satisfied)
90% of all samples produce an interval that covers the true mean μ. We have an interval from one sample, chosen randomly. Our interval either does or does not cover μ; in practice we just don't know. We do know that the procedure works 90% of the time.

A 99 percent C.I. for the mean age of Jordanians was computed to be (29.8, 38.5) years. What is the interpretation attached to this interval?
(a) We are 99 percent confident that the mean age of Jordanians is between 29.8 and 38.5.
(b) Ninety-nine percent of the residents in our sample had ages between 29.8 and 38.5.
(c) We are 99 percent confident that the mean age of Jordanians in our sample is between 29.8 and 38.5.
(d) All of the above are valid interpretations.
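The repeated-sampling meaning of "90% confidence" can be checked by simulation. This is a minimal sketch that assumes a normal population with known σ; the values μ = 75, σ = 10 and n = 49 are illustrative, not data from the slides. It builds the interval x̄ ± z.05 σ/√n with z.05 = 1.645 (the z value for a central area of 0.90) and counts how often the interval covers μ. The names `z_interval` and `coverage_rate` are hypothetical helpers.

```python
import math
import random
import statistics

Z_05 = 1.645  # z value leaving 0.05 in the upper tail (central area 0.90)


def z_interval(sample: list[float], sigma: float, z: float) -> tuple[float, float]:
    """Return (lower, upper) for x̄ ± z * sigma / sqrt(n), with sigma known."""
    xbar = statistics.fmean(sample)
    margin = z * sigma / math.sqrt(len(sample))
    return xbar - margin, xbar + margin


def coverage_rate(mu: float, sigma: float, n: int, trials: int,
                  z: float, rng: random.Random) -> float:
    """Fraction of simulated intervals that cover the true mean mu."""
    covered = 0
    for _ in range(trials):
        # One random sample of size n from a normal population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lower, upper = z_interval(sample, sigma, z)
        if lower < mu < upper:
            covered += 1
    return covered / trials


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed for a reproducible run
    # Hypothetical population: normal with mu = 75, sigma = 10.
    rate = coverage_rate(mu=75.0, sigma=10.0, n=49, trials=10_000,
                         z=Z_05, rng=rng)
    print(f"share of 90% intervals covering mu: {rate:.3f}")
```

Each individual interval either covers μ or it does not; only the long-run share of covering intervals comes out close to 0.90, which is the sense in which the procedure "works 90% of the time".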
Table 6.4
α = tail area   Central area = 1 − 2α   zα
0.10            0.80                    z.10 = 1.28
0.05            0.90                    z.05 = 1.645
0.025           0.95                    z.025 = 1.96
0.01            0.98                    z.01 = 2.33
0.005           0.99                    z.005 = 2.58

[Figure 7.5: Locating zα/2 on the standard normal curve]
[Figure 7.6: The z value (z.05) corresponding to an area equal to .05 in the upper tail of the z-distribution]
[Figure 7.7: MINITAB output for finding z.05]
[Table 7.2]
[Figure 7.9: Standard normal (z) distribution and t-distributions]
[Table 7.3]
[Figure 7.10: The t.025 value in a t-distribution with 4 df, and the corresponding z.025 value]
[Table 7.4]
[Figure 7.11: SPSS confidence interval for mean blood pressure increase]
[Figure 7.12: MINITAB printout with descriptive statistics and 99% confidence interval for Example 7.5]
[Figure 7.15: MINITAB printout with 90% confidence interval for p]
[Figure 7.16: Relationship between sample size and width of confidence interval: hospital-stay example]
[Figure 7.17: Specifying the sampling error SE as the half-width of a confidence interval]
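The z values in Table 6.4 can be reproduced with Python's standard-library `statistics.NormalDist`, whose `inv_cdf` method returns a quantile of a normal distribution. This is a minimal sketch; `z_upper` and `z_for_confidence` are hypothetical helper names. Because the table's α is the area in one tail and the central area is 1 − 2α, a confidence level C corresponds to α = (1 − C)/2, so a 95% interval uses z.025.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_upper(alpha: float) -> float:
    """z value with upper-tail area alpha under the standard normal curve."""
    return STANDARD_NORMAL.inv_cdf(1.0 - alpha)


def z_for_confidence(level: float) -> float:
    """z value for a two-sided interval with central area `level`."""
    return z_upper((1.0 - level) / 2.0)


if __name__ == "__main__":
    print("alpha  central area  z_alpha")
    for alpha in (0.10, 0.05, 0.025, 0.01, 0.005):
        central = 1.0 - 2.0 * alpha
        print(f"{alpha:<6} {central:<13.2f} {z_upper(alpha):.3f}")
```

At three decimals the script gives 1.282, 1.645, 1.960, 2.326 and 2.576; the table rounds these to 1.28, 1.645, 1.96, 2.33 and 2.58.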