Author(s): Brenda Gunderson, Ph.D., 2011

License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution–Non-commercial–Share Alike 3.0 License: http://creativecommons.org/licenses/by-nc-sa/3.0/

We have reviewed this material in accordance with U.S. Copyright Law and have tried to maximize your ability to use, share, and adapt it. The citation key on the following slide provides information about how you may share and adapt this material.

Copyright holders of content included in this material should contact open.michigan@umich.edu with any questions, corrections, or clarification regarding the use of content.

For more information about how to cite these materials visit http://open.umich.edu/education/about/terms-of-use.

Any medical information in this material is intended to inform and educate and is not a tool for self-diagnosis or a replacement for medical evaluation, advice, diagnosis or treatment by a healthcare professional. Please speak to your physician if you have questions about your medical condition. Viewer discretion is advised: Some medical content is graphic and may not be suitable for all viewers.

Some material sourced from: Mind on Statistics, Utts/Heckard, 3rd Edition, Duxbury, 2006. Text Only: ISBN 0495667161. Bundled version: ISBN 1111978301. Material from this publication used with permission.

Attribution Key (for more information see: http://open.umich.edu/wiki/AttributionPolicy)

Use + Share + Adapt { Content the copyright holder, author, or law permits you to use, share and adapt. }
Public Domain – Government: Works that are produced by the U.S. Government. (17 USC § 105)
Public Domain – Expired: Works that are no longer protected due to an expired copyright term.
Public Domain – Self Dedicated: Works that a copyright holder has dedicated to the public domain.
Creative Commons – Zero Waiver
Creative Commons – Attribution License
Creative Commons – Attribution Share Alike License
Creative Commons – Attribution Noncommercial License
Creative Commons – Attribution Noncommercial Share Alike License
GNU – Free Documentation License

Make Your Own Assessment { Content Open.Michigan believes can be used, shared, and adapted because it is ineligible for copyright. }
Public Domain – Ineligible: Works that are ineligible for copyright protection in the U.S. (17 USC § 102(b)) *laws in your jurisdiction may differ

{ Content Open.Michigan has used under a Fair Use determination. }
Fair Use: Use of works that is determined to be Fair consistent with the U.S. Copyright Act. (17 USC § 107) *laws in your jurisdiction may differ
Our determination DOES NOT mean that all uses of this 3rd-party content are Fair Uses and we DO NOT guarantee that your use of the content is Fair. To use this content you should do your own independent analysis to determine whether or not your use will be Fair.

Stat 250 Gunderson Lecture Notes
Learning about a Population Mean
Part 1: Distribution for a Sample Mean
Chapter 9: Section 5, SD Module 3 and Sections 9 and 10

Recall Parameters, Statistics and Sampling Distributions

We go back to the scenario where we have one population of interest, but now the response being measured is quantitative (not categorical). We want to learn about the value of the population mean $\mu$. We take a random sample and use the sample statistic, the sample mean $\bar{X}$, to estimate the parameter. When we do this, the sample mean may not be equal to the population mean; in fact, it could change every time we take a new random sample. So recall that a statistic is a random variable and it will have a probability distribution. This probability distribution is called the sampling distribution of the statistic.
We turn to understanding the sampling distribution of the sample mean, which will be used to construct a confidence interval estimate for the population mean and to test hypotheses about the value of a population mean.

9.5 SD Module 3: Sampling Distribution for One Sample Mean

Many responses of interest are measurements – height, weight, distance, reaction time, scores. We want to learn about a population mean and we will do so using the information provided from a sample from the population.

Example: How many hours per week do you work?
A poll was conducted by The Heldrich Center for Workforce Development (at Rutgers University). A probability sample of 1000 workers resulted in a mean number of hours worked per week of 43.

Population = ____________________
Parameter = ____________________
Sample = ____________________
Statistic = ____________________

Can anyone say how close this observed sample mean $\bar{x}$ of 43 is to the population mean $\mu$? If we were to take another random sample of the same size, would we get the same value for the sample mean?

So what are the possible values for the sample mean $\bar{x}$ if we took many random samples of the same size from this population? What would the distribution of the possible $\bar{x}$ values look like? What can we say about the distribution of the sample mean?

Distribution of the Sample Mean – Main Results

Let $\mu$ = mean for the population of interest.
Let $\sigma$ = standard deviation for the population of interest.
Let $\bar{x}$ = the sample mean for a random sample of size n.

If all possible random samples of the same size n are taken and $\bar{x}$ is computed for each, then …

• The average of all of the possible sample mean values (from all possible random samples of the same size n) is equal to ___________________________. Thus the sample mean is an __________________ estimator of the population mean.
• The standard deviation of all of the possible sample mean values is equal to the original population standard deviation divided by $\sqrt{n}$.
  Standard deviation of the sample mean is given by: $\text{s.d.}(\bar{x}) = \dfrac{\sigma}{\sqrt{n}}$

What about the shape of the sampling distribution? The first two bullets above provide what the mean and the standard deviation are for the possible sample mean values. The final two bullets tell us that the shape of the distribution will be (approximately) normal.

• If the parent (original) population has a normal distribution, then the distribution of the possible values of $\bar{x}$, the sample mean, is normal.
  [Figure: Original Population, then take a random sample of ANY size n and compute the sample mean, giving the Sample Mean Population]
• If the parent (original) population is not necessarily normally distributed but the sample size n is large, then the distribution of the possible values of $\bar{x}$, the sample mean, is approximately normal.
  [Figure: Original Population, then take a random sample of size n where n is LARGE and compute the sample mean, giving the Sample Mean Population]

This last result is called the Central Limit Theorem (CLT). The CLT is formally presented in Section 9.10 of your text (starting on page 345). The C in CLT is for CENTRAL. The CLT is an important or central result in statistics. As it turns out, all normal curve approximations for the big five statistics are really applications of the CLT and can be derived using the results from Section 8.8.

The Stat 250 formula card summarizes the distribution of a sample mean as follows:

Sample Means
Mean: $E(\bar{X}) = \mu_{\bar{X}} = \mu$
Standard Deviation: $\text{s.d.}(\bar{X}) = \sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
Sampling Distribution of $\bar{X}$:
If X has the $N(\mu, \sigma)$ distribution, then $\bar{X}$ is $N\left(\mu, \dfrac{\sigma}{\sqrt{n}}\right)$.
If X follows any distribution with mean $\mu$ and standard deviation $\sigma$ and n is large, then $\bar{X}$ is approximately $N\left(\mu, \dfrac{\sigma}{\sqrt{n}}\right)$.
This last result is the Central Limit Theorem.

From Utts, Jessica M. and Robert F. Heckard. Mind on Statistics, Fourth Edition. 2012. Used with permission.

Try It! ACT scores
Suppose that ACT scores for students follow a normal model with mean of 18.6 and standard deviation of 5.9.
a. What is the probability that a single student randomly selected will score 21 or higher?
b.
Suppose two students will be randomly selected. What is the probability they will both score 21 or higher?
c. A random sample of 50 students is obtained. What is the probability that the mean score for these 50 students will be 21 or higher?
d. What if the normal distribution assumption were not given? Would your answer to part c change? Explain.

Try It! Actual Flight Times
Suppose the random variable X, which represents the actual flight time (in minutes) for Delta Airlines flights from Cincinnati to Tampa, follows a uniform distribution over the range of 110 minutes to 130 minutes.
a. Sketch the distribution for X (include axes labels and some values on the axes).
b. Suppose we were to repeatedly take a random sample of size 100 from this distribution and compute the sample mean for each sample. What would the histogram of the sample mean values look like? Provide a smoothed out sketch of the distribution of the sample mean; include all details that you can.

Try It! True or False
Determine whether each of the following statements is True or False. A true statement is always true. Clearly circle your answer.
a. The central limit theorem is important in statistics because for a large random sample, it says the sampling distribution of the sample mean is approximately normal. True False
b. The sampling distribution of a parameter is the distribution of the parameter value if repeated random samples are obtained. True False

More on the Standard Deviation of $\bar{X}$

The standard deviation of the sample mean is given by: $\text{s.d.}(\bar{x}) = \dfrac{\sigma}{\sqrt{n}}$

This quantity would give us an idea about how far apart a sample mean $\bar{x}$ and the true population mean $\mu$ are expected to be on average.

We can interpret the standard deviation of the sample mean as approximately the average distance of the possible sample mean values (for repeated samples of the same size n) from the true population mean $\mu$.
From Utts, Jessica M. and Robert F. Heckard. Mind on Statistics, Fourth Edition. 2012. Used with permission.
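The interpretation above can be checked with a small simulation. The sketch below reuses the uniform flight-time population (110 to 130 minutes) and the sample size of 100 from the Try It! example; the number of repeated samples, the seed, and the variable names are arbitrary choices for illustration and are not part of the notes. It relies on the standard fact that a uniform distribution on an interval of width w has standard deviation w/sqrt(12).

```python
import math
import random
import statistics

random.seed(250)  # arbitrary seed so the run is repeatable

LOW, HIGH = 110.0, 130.0  # uniform flight-time range from the example (minutes)
N = 100                   # sample size from the example
REPS = 5000               # number of repeated samples (an arbitrary choice)

# Population standard deviation of a uniform(LOW, HIGH) distribution.
sigma = (HIGH - LOW) / math.sqrt(12)

# Theoretical standard deviation of the sample mean: sigma / sqrt(n).
theoretical_sd = sigma / math.sqrt(N)

# Take REPS random samples of size N and record each sample mean.
sample_means = [
    statistics.fmean([random.uniform(LOW, HIGH) for _ in range(N)])
    for _ in range(REPS)
]

simulated_mean = statistics.fmean(sample_means)
simulated_sd = statistics.pstdev(sample_means)

print(f"mean of the sample means:  {simulated_mean:.3f}")
print(f"s.d. of the sample means:  {simulated_sd:.3f}")
print(f"sigma / sqrt(n):           {theoretical_sd:.3f}")
```

If the main results hold, the mean of the simulated sample means should land near the population mean of 120 minutes, and their standard deviation should land near sigma/sqrt(n). Exact values depend on the seed.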
Note: If the sample size increases, the standard deviation decreases, which says the possible sample mean values will be closer to the true population mean (on average). The $\text{s.d.}(\bar{x})$ is a measure of the accuracy of the process of using a sample mean to estimate the population mean. This quantity does not tell us exactly how far away a particular observed $\bar{x}$ value is from $\mu$.

In practice, the population standard deviation $\sigma$ is rarely known, so the sample standard deviation s is used. As with proportions, when making this substitution we call the result the standard error of the mean, $\text{s.e.}(\bar{x}) = \dfrac{s}{\sqrt{n}}$. This terminology makes sense, because this is a measure of how much, on average, the sample mean is in error as an estimate of the population mean.

Standard error of the sample mean is given by: $\text{s.e.}(\bar{x}) = \dfrac{s}{\sqrt{n}}$

This quantity is an estimate of the standard deviation of $\bar{x}$. So we can interpret the standard error of the sample mean as estimating, approximately, the average distance of the possible $\bar{x}$ values (for repeated samples of the same size n) from the true population mean $\mu$.
From Utts, Jessica M. and Robert F. Heckard. Mind on Statistics, Fourth Edition. 2012. Used with permission.

Moreover, we can use this standard error to create a range of values that we are very confident will contain the true population mean $\mu$, namely $\bar{x} \pm (\text{few}) \times \text{s.e.}(\bar{x})$. This is the basis for the confidence interval for the population mean, discussed more in Chapter 11.

9.9 Preparing for Statistical Inference: Standardized Statistics

In our ACT Scores and Actual Flight Times examples earlier, we have already constructed and used a standardized z-statistic for a sample mean.

$z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ has (approximately) a standard normal distribution N(0,1).
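As a rough illustration of the standardized z-statistic, the sketch below applies it to the ACT scores example (mean 18.6, standard deviation 5.9, cutoff 21) for a single student and for the mean of 50 students. The function names are hypothetical, and the normal probability is computed from Python's math.erf instead of a z-table, so the results may differ slightly from a table lookup.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, P(Z <= z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_mean_at_least(cutoff, mu, sigma, n=1):
    """Return (z, P(sample mean >= cutoff)), assuming the sample mean
    is N(mu, sigma / sqrt(n)). With n = 1 this is a single observation."""
    sd_of_mean = sigma / math.sqrt(n)
    z = (cutoff - mu) / sd_of_mean
    return z, 1.0 - normal_cdf(z)


# Single randomly selected student (n = 1).
z_one, p_one = prob_mean_at_least(21, mu=18.6, sigma=5.9, n=1)

# Mean score for a random sample of 50 students.
z_fifty, p_fifty = prob_mean_at_least(21, mu=18.6, sigma=5.9, n=50)

print(f"n = 1:  z = {z_one:.2f}, probability = {p_one:.4f}")
print(f"n = 50: z = {z_fifty:.2f}, probability = {p_fifty:.4f}")
```

For a single student the z-value is about 0.41. For the mean of 50 students the standard deviation shrinks to 5.9/sqrt(50), the z-value grows to about 2.88, and the probability is much smaller.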
Dilemma = _____________________________________

If we replace the population standard deviation $\sigma$ with the sample standard deviation s, then $\dfrac{\bar{x} - \mu}{s / \sqrt{n}}$ won’t be approximately N(0,1); instead it has a ________________________________

Student’s t-Distribution

A little about the family of t-distributions ...
• They are symmetric, unimodal, centered at 0.
• They are flatter with heavier tails compared to the N(0,1) distribution.
• As the degrees of freedom (df) increase ... the t distribution approaches the N(0,1) distribution.
• We can still use the ideas about standard scores for a frame of reference.
• Tables A.2 and A.3 summarize percentiles for various t-distributions.
From Utts, Jessica M. and Robert F. Heckard. Mind on Statistics, Fourth Edition. 2012. Used with permission.

We will see more on t-distributions in Chapters 11 and 13 when we do inference about population mean(s).

9.10 Generalizations beyond the Big Five

The sampling distribution of a statistic is the distribution of possible values of the statistic for repeated samples of the same size from a population. So far we have discussed the sampling distribution of a sample proportion (SD Module 1), the sampling distribution of the difference between two sample proportions (SD Module 2), and the sampling distribution of the sample mean (SD Module 3). In all cases, under specified conditions the sampling distribution was approximately normal.

Every statistic has a sampling distribution, but the appropriate distribution may not always be normal, or even be approximately bell-shaped. You can construct an approximate sampling distribution for any statistic by actually taking repeated samples of the same size from a population and constructing a histogram for the values of the statistic over the many samples. The web applet examined in your computer labs could simulate the sampling distribution of the median, the range, and many other statistics. Example 9.19 on pages 347-348 of your text also shows a good example.
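The repeated-sampling recipe described above can be sketched in a few lines of code. This is a minimal illustration and not the lab applet: the population (an exponential distribution with mean 10, chosen only because it is right-skewed), the sample size, the number of repetitions, and all function names are assumptions made for the example.

```python
import random
import statistics

random.seed(9)  # arbitrary seed so the run is repeatable


def draw_skewed():
    """One draw from a hypothetical right-skewed population
    (exponential with mean 10); not a population from the notes."""
    return random.expovariate(1 / 10)


def sample_range(values):
    """Range of a sample: largest value minus smallest value."""
    return max(values) - min(values)


def simulate_sampling_distribution(statistic, draw_one, n, reps):
    """Compute `statistic` on `reps` fresh random samples of size n."""
    return [statistic([draw_one() for _ in range(n)]) for _ in range(reps)]


def text_histogram(values, bins=10, width=40):
    """Print a crude text histogram and return the bin counts."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / step), bins - 1)] += 1
    peak = max(counts)
    for i, c in enumerate(counts):
        print(f"{lo + i * step:8.2f} | {'#' * round(width * c / peak)}")
    return counts


medians = simulate_sampling_distribution(statistics.median, draw_skewed, n=25, reps=2000)
ranges = simulate_sampling_distribution(sample_range, draw_skewed, n=25, reps=2000)

print("Approximate sampling distribution of the sample median (n = 25):")
median_counts = text_histogram(medians)
print("Approximate sampling distribution of the sample range (n = 25):")
range_counts = text_histogram(ranges)
```

Passing the statistic in as a function means the same loop works for the median, the range, or any other statistic. The shape of each histogram can then be compared with a bell curve by eye.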
Additional Notes

A place to … jot down questions you may have and ask during office hours, take a few extra notes, write out an extra practice problem or summary completed in lecture, create your own short summary about this chapter.