Ch 5.5 + Ch 5.6 Sampling Distribution

Topics:
I. What is a Sampling Distribution?
II. Sampling Distribution of a Sample Mean $\bar{X}$
    (a) X ~ Normal Distribution
    (b) X ~ Non-normal Distribution
III. Central Limit Theorem
IV. Sampling Distribution of the Sample Proportion p
---------------------------------------------------------------------------

I. Sampling Distribution

Population vs. Sample:
- Population (or process) = the object of interest (about which we would like to make inference).
- Due to limited resources and time, it is usually impossible to know every aspect of the population.
- Instead, we obtain a (random) sample from the population.
- We use the information from the sample to make inference about the population.

For example, we naturally use the sample mean $\bar{X}$ to estimate the population mean $\mu$ (also written $\mu_X$).

Since $\bar{X}$ was obtained from a sample, we are not guaranteed to get the same value of $\bar{X}$ if we conduct the same experiment (to obtain the data) again. So $\bar{X}$ can be viewed as a random variable, and thus it has a distribution. This distribution is called the sampling distribution of $\bar{X}$.

- A Statistic is a numerical quantity that is calculated from the sample (for example, the sample mean $\bar{X}$ is a statistic).
- A Parameter is a (fixed) population characteristic, such as the success probability in a Binomial distribution.
- The observed value of a statistic depends on the particular sample; hence it varies from sample to sample. Such variability is called sampling variability.
- The probability distribution of a statistic is called its sampling distribution.

Why do we care about the sampling distribution?
The sampling distribution of a statistic tells us what values the statistic is likely to take; we can use the sampling distribution to make inference about the population parameter.

Ex1. A neighborhood has 5 houses, A, B, C, D, and E, with 3, 2, 5, 3, and 4 bedrooms, respectively.
We randomly draw 3 houses at a time and calculate two sample statistics, the median and the mean number of bedrooms. What is the sampling distribution of the sample median? What is the sampling distribution of the sample mean?

Each of the 10 possible samples is equally likely, so each has probability 0.1:

Houses in sample   # of bedrooms   Sample median   Sample mean   Probability
ABC                3, 2, 5         3               10/3 = 3.3    0.1
ABD                3, 2, 3         3               8/3 = 2.7     0.1
ABE                3, 2, 4         3               9/3 = 3       0.1
ACD                3, 5, 3         3               11/3 = 3.7    0.1
ACE                3, 5, 4         4               12/3 = 4      0.1
ADE                3, 3, 4         3               10/3 = 3.3    0.1
BCD                2, 5, 3         3               10/3 = 3.3    0.1
BCE                2, 5, 4         4               11/3 = 3.7    0.1
BDE                2, 3, 4         3               9/3 = 3       0.1
CDE                5, 3, 4         4               12/3 = 4      0.1

The sample median takes 2 values, 3 and 4, with probabilities 0.7 and 0.3. So the probability mass function of the sample median is

Sample median   3     4
Probability     0.7   0.3

Similarly, the probability mass function of the sample mean is

Sample mean   2.7   3     3.3   3.7   4
Probability   0.1   0.2   0.3   0.2   0.2

II. Sampling Distribution of a Sample Mean $\bar{X}$

Let $\bar{X}$ be the sample mean of a random sample $X_1, X_2, \ldots, X_n$ from a population with mean $\mu$ and SD $\sigma$. (That is, $\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$.) We want to know the sampling distribution of $\bar{X}$.

If X ~ Normal(mean = $\mu$, SD = $\sigma$), then $\bar{X}$, the mean of a random sample of n observations, follows a Normal distribution with mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.

$\sigma_{\bar{X}}$ is also called the standard error (SE) of $\bar{X}$, or the standard error of the mean.

Ex2. Thousands of boxes contain nuts. The weights are normally distributed with mean $\mu = 1$ lb and SD $\sigma = 0.01$ lb. We inspect 4 boxes and get their weights $X_1, X_2, X_3, X_4$. The sample mean is $\bar{X} = \frac{X_1 + X_2 + X_3 + X_4}{4}$.

(a) What is the sampling distribution of $\bar{X}$? What are the mean and SE of $\bar{X}$?
$\bar{X} \sim N(\mu_{\bar{X}} = 1,\ \sigma_{\bar{X}} = 0.01/\sqrt{4} = 0.005)$

(b) What is the probability that $\bar{X}$ lies between 0.99 and 1.01 lb?
$P[0.99 \le \bar{X} \le 1.01] = P[-2 \le Z \le 2] = 0.9545$

If X follows any non-normal distribution with mean $\mu$ and SD $\sigma$:
The sampling distribution of $\bar{X}$ based on a sample of size n is as follows.

(a) If n is small (i.e., n < 30), then
    Distribution: determined by the distribution of X
    Mean and SE of $\bar{X}$: $\mu_{\bar{X}} = \mu$, $\sigma_{\bar{X}} = \sigma/\sqrt{n}$

(b) If n is large (i.e., n $\ge$ 30), then
    Distribution: approximately normal
    Mean and SE of $\bar{X}$: $\mu_{\bar{X}} = \mu$, $\sigma_{\bar{X}} = \sigma/\sqrt{n}$

The approximate normality in case (b) follows from the Central Limit Theorem (CLT).

III. Central Limit Theorem

Assume X follows an arbitrary distribution with mean $\mu$ and SD $\sigma$. When the sample size is sufficiently large (rule of thumb: n $\ge$ 30), the sampling distribution of $\bar{X}$ is approximately normal with mean $\mu$ and SE $\sigma/\sqrt{n}$.

Usually, the less symmetric a distribution is, the larger the sample size needed to ensure approximate normality of $\bar{X}$.

Ex3. Let X be the number of major defects for each new automobile tested. Suppose the number of such defects for a certain model has some distribution with mean $\mu = 3.2$ and SD $\sigma = 2.4$. A sample of 100 new cars is collected.

(a) What is the sampling distribution of $\bar{X}$ based on samples of size 100? What is its center and what is the SE of $\bar{X}$?
Approximately $N(\mu_{\bar{X}} = 3.2,\ \sigma_{\bar{X}} = 2.4/\sqrt{100} = 0.24)$

(b) What is the probability that the sample average number of major defects exceeds 4?
$P[\bar{X} > 4] = P[Z > (4 - 3.2)/0.24] = P[Z > 3.33] \approx 0$

Comments:
- If $\bar{X}$ is the sample mean of a random sample $X_1, X_2, \ldots, X_n$ from a population with mean $\mu$ and SD $\sigma$, then regardless of the sample size n and the distribution of X, $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.
- For n > 1, the variation of sample means is (always) smaller than the variation of the original data.
- As the sample size n increases, $\sigma_{\bar{X}}$ (the SE of $\bar{X}$) decreases, the shape of the sampling distribution becomes more and more bell shaped, and the mass is more and more concentrated around the mean $\mu$. This implies higher probability around $\mu$.

Ex4. The heights of college age students (denoted by X) are known to have mean $\mu = 115$ and SD $\sigma = 30$.

(a) Assume that we were told that the heights of college age students are normally distributed. What is the sampling distribution of $\bar{X}$ based on samples of 9 college age students? What are the mean and SE?
$\bar{X} \sim N(\mu_{\bar{X}} = 115,\ \sigma_{\bar{X}} = 30/\sqrt{9} = 10)$

(b) What is the sampling distribution of $\bar{X}$ based on samples of 9 college age students without the assumption that the heights have a normal distribution? What are the mean and SE of the sampling distribution of $\bar{X}$?
The distribution is determined by the height distribution (n = 9 is too small to rely on the CLT). But $\mu_{\bar{X}} = 115$ and $\sigma_{\bar{X}} = 30/\sqrt{9} = 10$.

(c) What is the sampling distribution of $\bar{X}$, the average height of 36 college age students? What are the mean and SE of the sampling distribution of $\bar{X}$?
$N(\mu_{\bar{X}} = 115,\ \sigma_{\bar{X}} = 30/\sqrt{36} = 5)$. This is exact if heights are normal; otherwise it holds approximately by the CLT, since n = 36 $\ge$ 30.

IV. Sampling Distribution of a Sample Proportion p

Ex. Consider a basket containing 100 balls of 2 colors: Red and White. The proportion of Red balls is denoted by $\pi$ (and is not known). Assume 30 balls were randomly picked from the basket with replacement, and 14 of the 30 balls were red.

(1) In the sample, what is the proportion of red balls? $p = 14/30$
(2) We refer to this quantity, 14/30, as the sample proportion and denote it by p.

Our question of interest: what is the distribution of the sample proportion p?

Thoughts: define a r.v. X such that X = 1 if "red" and X = 0 if "not red". Then p can be viewed as the mean of the 30 X's.

In general, p from a sample of size n can be written as $p = \frac{X_1 + X_2 + \cdots + X_n}{n}$.

Thus, by the CLT, p is approximately Normal if n is large. (However, different criteria for "large n" are needed here.)

Sampling Distribution of p

(a) If n is large ($n \ge 30$, $n\pi \ge 5$, and $n(1-\pi) \ge 5$), then the sample proportion p has
    An approximately normal distribution (by the CLT)
    Mean (denoted by $\mu_p$) $= \pi$, and SE (denoted by $\sigma_p$) $= \sqrt{\pi(1-\pi)/n}$

(b) If n is small (at least one of $n \ge 30$, $n\pi \ge 5$, $n(1-\pi) \ge 5$ is not satisfied), then the sample proportion p has
    An unknown (discrete) distribution
    Mean (denoted by $\mu_p$) $= \pi$, and SE (denoted by $\sigma_p$) $= \sqrt{\pi(1-\pi)/n}$

Ex5. In the population, the proportion of defectives is $\pi = 12\%$.

(a) What is the sampling distribution of p based on 100 observations? What is the mean? What is the standard error?
Approximately normal with mean = 0.12 and SE = $\sqrt{0.12(1 - 0.12)/100} \approx 0.03$

(b) What is the probability that p < 0.10?
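$P[p < 0.1] = P\left[\frac{p - 0.12}{0.03} < \frac{0.1 - 0.12}{0.03}\right] = P[Z < -0.67] \approx 0.25$

(With the unrounded SE of about 0.0325, $z \approx -0.62$ and the probability is about 0.27.)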
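The normal-model answers in Ex2, Ex3, and Ex5 can be checked numerically. The sketch below is only an illustration, not part of the original notes: the helper names `sample_mean_dist` and `sample_prop_dist` are hypothetical, and it assumes Python 3.8+ for the standard library's `statistics.NormalDist`. It also assumes the large-n conditions for the proportion have already been verified by hand.

```python
from math import sqrt
from statistics import NormalDist


def sample_mean_dist(mu, sigma, n):
    """(Hypothetical helper) Normal model for the sample mean:
    mean mu, standard error sigma / sqrt(n)."""
    return NormalDist(mu, sigma / sqrt(n))


def sample_prop_dist(pi, n):
    """(Hypothetical helper) Normal approximation for the sample proportion:
    mean pi, standard error sqrt(pi * (1 - pi) / n).
    Assumes n >= 30, n*pi >= 5 and n*(1 - pi) >= 5 were checked."""
    return NormalDist(pi, sqrt(pi * (1 - pi) / n))


# Ex2: box weights, mu = 1, sigma = 0.01, n = 4
ex2 = sample_mean_dist(1.0, 0.01, 4)
p_ex2 = ex2.cdf(1.01) - ex2.cdf(0.99)

# Ex3: defects per car, mu = 3.2, sigma = 2.4, n = 100
ex3 = sample_mean_dist(3.2, 2.4, 100)
p_ex3 = 1 - ex3.cdf(4)

# Ex5: proportion of defectives, pi = 0.12, n = 100
ex5 = sample_prop_dist(0.12, 100)
p_ex5 = ex5.cdf(0.10)                              # unrounded SE
p_ex5_rounded = NormalDist(0.12, 0.03).cdf(0.10)   # SE rounded to 0.03, as in the notes

print(f"Ex2: SE = {ex2.stdev:.4f}, P[0.99 <= mean <= 1.01] = {p_ex2:.4f}")
print(f"Ex3: SE = {ex3.stdev:.4f}, P[mean > 4] = {p_ex3:.5f}")
print(f"Ex5: SE = {ex5.stdev:.4f}, P[p < 0.10] = {p_ex5:.3f} "
      f"(with SE rounded to 0.03: {p_ex5_rounded:.3f})")
```

In Ex5 the two probabilities differ only because of rounding the standard error before standardizing: roughly 0.27 with the unrounded SE versus 0.25 with SE = 0.03.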