Chapter 2
SAMPLING DISTRIBUTIONS

Two Definitions of Random Sample
- Random sample from a finite population (page 381)
- Random sample from an infinite population (page 382)

Chapter 2. Sampling Distribution

Random Sample from a Finite Population (page 381)

Definition 11.1. Suppose we select n distinct elements from a population consisting of N elements, using a particular probability sampling method. Let
X1 = measure taken from the 1st element in the sample
X2 = measure taken from the 2nd element in the sample
…
Xn = measure taken from the nth element in the sample
Then (X1, X2, …, Xn) is called a random sample of size n from a finite population.

Remarks About a Random Sample from a Finite Population (page 381)

Let (X1, X2, …, Xn) be a random sample from a finite population.
- The Xi's are random variables since probability sampling requires the use of a randomization mechanism in selecting the elements of the sample.
- The definition does not require the assignment of equal chances of inclusion in the sample for all the elements in the population.
- The definition requires that the selected elements in the sample must be distinct from each other.

More Remarks (page 381)

- If the elements in the sample were selected using SRSWOR and Xi is the measure taken from the ith selected element in the sample, then (X1, X2, …, Xn) is a random sample from a finite population.
- If the elements in the sample were selected using SRSWR and Xi is the measure taken from the ith selected element in the sample, then (X1, X2, …, Xn) is NOT a random sample from a finite population. However, it can be viewed as a random sample from an infinite population.

Random Sample from an Infinite Population (page 382)

Definition 11.2.
Let
X1 = measure taken from the 1st element in the sample
X2 = measure taken from the 2nd element in the sample
…
Xn = measure taken from the nth element in the sample
Then (X1, X2, …, Xn) is called a random sample of size n from an infinite population if the values of X1, X2, …, Xn are n independent observations generated from the same cumulative distribution function (CDF), F(.). This common CDF, or its corresponding probability mass/density function, f(.), is called the parent population or the distribution of the population.

This definition is equivalent to saying that a random sample from an infinite population is a sample generated by a series of n independent trials that are performed under identical conditions. The trials are independent because the observations are independent, and the conditions are identical because the CDFs of the Xi's must all be the same.

Remarks About a Random Sample from an Infinite Population (page 382)

- Many of the procedures in this course will require that (X1, X2, …, Xn) is a random sample from a normal population.
- Even if the collection of all elements under consideration is finite, sampling can still be viewed as sampling from an infinite population when each selected element is returned to the population before the next draw (sampling with replacement).
- In many cases, the distinction between sampling from a finite and an infinite population is irrelevant: when the sampling fraction, n/N, is close to zero, inferences on finite and infinite populations yield essentially the same results.

Example 10.34 (page 328)

Go back to our small barangay in Example 10.34. This barangay consists of 6 qualified voters: A1, A2, A3, A4, A5, and A6. Renzo and Sandro are 2 candidates vying for the same position. A1, A2, A3 and A4 have already decided to vote for Renzo while A5 and A6 will vote for Sandro. Suppose we select a sample of size 2 using SRSWR. We have a finite population where N=6.
For i=1,2,3,4,5,6, define:

$$X_i = \begin{cases} 1 & \text{if the } i\text{th voter in the population elects Renzo} \\ 0 & \text{if the } i\text{th voter in the population elects Sandro} \end{cases}$$

Population Data = {1,1,1,1,0,0}

This time, let us define for i=1,2:

$$X_i = \begin{cases} 1 & \text{if the } i\text{th voter in the sample elects Renzo} \\ 0 & \text{if the } i\text{th voter in the sample elects Sandro} \end{cases}$$

Note that (X1, X2) is a random sample from an infinite population where the common PMF of the discrete random variables X1 and X2 is as follows:

x       0     1
f(x)    2/6   4/6

New Definition of Statistic (page 183)

Definition 11.3. Suppose (X1, X2, …, Xn) is a random sample. A statistic is a random variable that is a function of X1, X2, …, Xn.

Example 11.1: Suppose (X1, X2, …, Xn) is a random sample.
a) $\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n}$ is a random variable that is a function of X1, X2, …, Xn. Thus, $\bar{X}$ is a statistic.
b) $S^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$ is a random variable that is a function of X1, X2, …, Xn. Thus, $S^2$ is a statistic.

Remarks About the Statistic (page 383)

- The given definition of a statistic does not contradict the definition given in Stat 114, which states that it is a summary measure computed from a sample. This time, though, we are requiring that the statistic is a random variable.
- As a random variable, the statistic is a function whose value depends on the outcome of a random experiment. In this case, the random experiment is the selection of a random sample of size n.
- It is impossible to predict with certainty what the realized value of the statistic will be. However, as a random variable, it has a probability distribution which can help us understand the behavior of this statistic in probabilistic terms.

Sampling Distribution of a Statistic (page 383)

Definition 11.4. The sampling distribution of a statistic is its probability distribution.
- If the statistic is a discrete random variable, then its sampling distribution is its probability mass function.
  On the other hand, if the statistic is a continuous random variable, then its sampling distribution is its probability density function.
- The sampling distribution of a statistic depends on various factors such as:
  - Sample size
  - Method of choosing the random sample
  - Population under study

Example 11.3 (page 383)

Go back to our small barangay in Example 10.34. This barangay consists of 6 qualified voters: A1, A2, A3, A4, A5, and A6. Renzo and Sandro are 2 candidates vying for the same position. A1, A2, A3 and A4 have already decided to vote for Renzo while A5 and A6 will vote for Sandro. Suppose we select a sample of size 2 using SRSWOR. Construct the sampling distribution of

$$\bar{X} = \frac{\sum_{i=1}^{2} X_i}{2},$$

where we define for i=1,2

$$X_i = \begin{cases} 1 & \text{if the } i\text{th voter in the sample elects Renzo} \\ 0 & \text{if the } i\text{th voter in the sample elects Sandro} \end{cases}$$

(Note that $\bar{X}$ can also be viewed in this example as a sample proportion because the numerator, $\sum_{i=1}^{2} X_i$, simply counts the total number of voters in a sample of size 2 who will elect Renzo; and, we divide this by the number of voters in the sample.)

Example 11.3 cont'd.

Here Xi = 1 if A1 or A2 or A3 or A4 is selected, and Xi = 0 if A5 or A6 is selected.

Physical Sample   {X1,X2}   $\bar{x}$
{A1,A2}           {1,1}     1
{A1,A3}           {1,1}     1
{A1,A4}           {1,1}     1
{A1,A5}           {1,0}     1/2
{A1,A6}           {1,0}     1/2
{A2,A3}           {1,1}     1
{A2,A4}           {1,1}     1
{A2,A5}           {1,0}     1/2
{A2,A6}           {1,0}     1/2
{A3,A4}           {1,1}     1
{A3,A5}           {1,0}     1/2
{A3,A6}           {1,0}     1/2
{A4,A5}           {1,0}     1/2
{A4,A6}           {1,0}     1/2
{A5,A6}           {0,0}     0

All of these 15 possible samples have the same chances of selection because we select the sample using SRSWOR. We then use the classical definition of probability to construct the sampling distribution of $\bar{X}$.

Sampling Distribution of $\bar{X}$

$\bar{x}$       0      1/2    1
$f(\bar{x})$    1/15   8/15   6/15

Remarks:
- In Example 11.2 (page 384), the sample size was 3 instead of 2.
  Note that the sampling distribution of $\bar{X}$ is different even if the population and sample selection procedure are both the same.
- In Example 11.4 (page 386), the sample size was also 2 but the sample was selected using systematic sampling. Once again, the sampling distribution of $\bar{X}$ is different.

Standard Error of a Statistic (page 387)

Definition 11.5. The standard deviation of a statistic is called its standard error.

- Recall: The sampling error is the error attributed to the variation present among the computed values of the statistic from the different possible samples consisting of n elements (page 77).
- The standard error gives us an idea of the expected size of the sampling error.
- A small standard error indicates that the values of the statistic computed from the different possible samples are close to one another. The value of a statistic still varies from one sample to another, but a small standard error assures us that this variation is not too large.

Remarks About Theorems 11.1 and 11.2 (pages 389-390)

Suppose a sample of size n is selected from a population with mean $\mu$ and variance $\sigma^2$. Let $\bar{X}$ be the sample mean (viewed as a random variable).

                                Theorem 11.1                                       Theorem 11.2
                                SRSWOR                                             SRSWR / Sampling from an infinite population
Mean of $\bar{X}$               $\mu$                                              $\mu$
Variance of $\bar{X}$           $\dfrac{\sigma^2}{n}\cdot\dfrac{N-n}{N-1}$         $\dfrac{\sigma^2}{n}$
Standard error of $\bar{X}$     $\dfrac{\sigma}{\sqrt{n}}\sqrt{\dfrac{N-n}{N-1}}$  $\dfrac{\sigma}{\sqrt{n}}$

- In both theorems, the mean of $\bar{X}$ is the mean of the population, $\mu$. That is, $E(\bar{X}) = \mu$.
- In both theorems, the standard error of $\bar{X}$ is smaller for populations where $\sigma^2$ is small, that is, where the elements are homogeneous with respect to the characteristic of interest.
- In both theorems, increasing the sample size n will decrease the standard error of $\bar{X}$.
- The term $\dfrac{N-n}{N-1}$ in Theorem 11.1 is called the finite population correction. This term is notably absent in $Var(\bar{X})$ when we sample from an infinite population. Because this factor is less than 1 whenever n > 1, it results in a smaller standard error under SRSWOR compared to SRSWR.
  This correction factor, though, approaches 1 when we allow N to approach infinity while n remains fixed. Thus, for a large population, the standard errors in both schemes will be approximately equal to each other.

Theorem 11.3. Central Limit Theorem (page 393)

If $\bar{X}$ is the mean of a random sample of size n from a large or infinite population with population mean $\mu$ and population variance $\sigma^2$, then the sampling distribution of $\bar{X}$ is approximately normal, the mean of $\bar{X}$ is the population mean, $\mu$, and the variance of $\bar{X}$ is $\sigma^2/n$, when n is sufficiently large.

- The Central Limit Theorem basically states that when the sample size is sufficiently large, we can use the normal distribution to approximate the sampling distribution of $\bar{X}$.
- The CLT does not state any requirement about the distribution of the population, aside from having mean $\mu$ and variance $\sigma^2$.
- The normal approximation will hold for population distributions that are either discrete or continuous.
- The normal approximation will hold for population distributions that are either symmetric or skewed.
- We can use the approximation even for random samples from finite populations so long as N is very large.
- In most situations, the normal approximation will be good if $n \ge 30$.
- If the distribution of the population is not very different from the normal distribution, then the approximation will be good even if n < 30. In fact, if the population is normally distributed, then $\bar{X}$ will be normally distributed even for a sample of size 1.

Examples

Example 11.8 (page 393)

Exercise 1 (page 395)
A random sample of size 400 is taken from a large population with mean $\mu = 50$ and variance $\sigma^2 = 25$. Approximate the probability of selecting a sample that satisfies:
a) $49.5 \le \bar{X} \le 50.75$
b) $|\bar{X} - \mu| \le 0.5$

By the CLT, since n = 400 is large, $\bar{X}$ is approximately normally distributed, where the mean of $\bar{X}$ is $\mu = 50$ and the variance of $\bar{X}$ is $\sigma^2/n = 25/400 = 0.0625$.

a) Find $P(49.5 \le \bar{X} \le 50.75)$.

$$P(49.5 \le \bar{X} \le 50.75) = P\left(\frac{49.5 - 50}{\sqrt{0.0625}} \le \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} \le \frac{50.75 - 50}{\sqrt{0.0625}}\right) = P(-2 \le Z \le 3)$$
$$= P(Z \le 3) - P(Z \le -2) = 0.9987 - 0.0228 = 0.9759.$$

b) Find $P(|\bar{X} - \mu| \le 0.5)$.

$$P(|\bar{X} - \mu| \le 0.5) = P(-0.5 \le \bar{X} - \mu \le 0.5) = P\left(\frac{-0.5}{\sqrt{0.0625}} \le \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} \le \frac{0.5}{\sqrt{0.0625}}\right) = P(-2 \le Z \le 2)$$
$$= P(Z \le 2) - P(Z \le -2) = 0.9772 - 0.0228 = 0.9544.$$

Exercise 3 (page 395)
Suppose the mean monthly income, $\mu$, of the households in the exclusive subdivisions in Metro Manila is P200,000 with a standard deviation $\sigma$ = P150,000. What is the probability of selecting a random sample of 100 families whose sample mean monthly income is larger than P250,000?

Let Xi = monthly income of the ith selected family in the sample, i=1,2,…,100.
(X1, X2, …, X100) is a random sample from a population with mean $\mu = 200{,}000$ and standard deviation $\sigma = 150{,}000$.

State the problem: Find $P(\bar{X} > 250{,}000)$.

What can be concluded using the CLT? The sampling distribution of $\bar{X}$ is approximately normal, the mean of $\bar{X}$ is $\mu = 200{,}000$, and the standard error of $\bar{X}$ is $\sigma/\sqrt{n} = 150{,}000/\sqrt{100} = 15{,}000$.

Solution:

$$P(\bar{X} > 250{,}000) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{250{,}000 - 200{,}000}{15{,}000}\right) = P(Z > 3.33) = 1 - F(3.33) = 1 - 0.9996 = 0.0004,$$

where F is the CDF of the standard normal distribution.

Assignment 5

1. A random sample of size 625 was selected from a large population with population mean $\mu = 15$ and population variance $\sigma^2 = 20.25$. Approximate the probability of selecting a sample that satisfies:
   a) sample mean is between 15.2736 and 15.4572
   b) $|\bar{X} - \mu|$ […] 0.45
2. An anthropologist claims that the population mean height of men of the race he is studying is 55 inches with standard deviation of 5 inches. Approximate the probability of selecting a random sample of 100 men of this race whose sample mean height is greater than 56.5 inches. Let Xi = height of the ith selected man in the sample.

The t-distribution (page 396)

If X is a random variable that follows a t-distribution with v degrees of freedom, then we write X~t(v).
- Just like the standard normal distribution, the t-distribution is also a bell-shaped distribution that is symmetric about 0. Its tails will also approach the x-axis without ever touching it.
- However, the t-distribution has a larger variance than the standard normal. But as the degrees of freedom increase, the variance of the t-distribution approaches 1 (the variance of the standard normal distribution).

[Figure: density curves of the t-distribution for v=2 and v=5; horizontal axis from -3 to 3]

t-Table, Table B.2 (page 605)

If X~t(v) then:
(i) $P(X < -t_{\alpha}(v)) = P(X > t_{\alpha}(v)) = \alpha$; and,
(ii) $t_{1-\alpha}(v) = -t_{\alpha}(v)$.

Example 11.10 (page 397)

Other Examples:
1. Suppose X~t(v=15)
   a. P(X > 2.947) =
   b. P(X < -2.947) =
   c. P(X < -1.341) =
2. $t_{0.01}(v=4)$
3. $t_{0.005}(v=8)$
4. $t_{0.95}(v=24)$

The Chi-square Distribution (page 397)

If X is a random variable that follows a chi-square distribution with v degrees of freedom, then we write $X \sim \chi^2(v)$.

- The PDF of the chi-square distribution is positive for positive real numbers only; elsewhere, its value is 0.
- Its mean is equal to its degrees of freedom. Its variance is twice its degrees of freedom. Thus, as the degrees of freedom increase, both the mean and variance will also increase.
- The PDF of the chi-square distribution is skewed to the right. Its skewness is more pronounced for smaller degrees of freedom. As the degrees of freedom increase, its distribution becomes more symmetric.

[Figure: density curves of the chi-square distribution for v=2, v=5, v=10, and v=15]

Chi-square Table, Table B.3 (page 606)

If $X \sim \chi^2(v)$ then:
(i) $P(X > \chi^2_{\alpha}(v)) = \alpha$; and,
(ii) $P(X < \chi^2_{\alpha}(v)) = 1 - \alpha$.

Example 11.11 (page 398)

Other Examples:
1. Suppose $X \sim \chi^2(v=18)$
   a. P(X > 6.265) =
   b. P(X < -6.265) =
   c. P(X < 28.869) =
2. $\chi^2_{0.025}(v=6)$
3. $\chi^2_{0.1}(v=25)$

The F-Distribution (page 398)

If X is a random variable that follows an F-distribution with v1 numerator degrees of freedom and v2 denominator degrees of freedom, we write X~F(v1,v2).
- The PDF of the F-distribution is positive for positive real numbers only; elsewhere, its value is 0.
- Its graph is skewed to the right. In general, distributions with higher degrees of freedom are less skewed.
- If X and Y are two independent random variables such that $X \sim \chi^2(v_1)$ and $Y \sim \chi^2(v_2)$, then the random variable $F = \dfrac{X/v_1}{Y/v_2}$ will follow an F-distribution with v1 numerator degrees of freedom and v2 denominator degrees of freedom.

F-Table, Table B.4 (pages 607-612)

If X~F(v1,v2) then:
(i) $P(X > F_{\alpha}(v_1, v_2)) = \alpha$;
(ii) $P(X < F_{\alpha}(v_1, v_2)) = 1 - \alpha$;
(iii) $F_{\alpha}(v_1, v_2) = \dfrac{1}{F_{1-\alpha}(v_2, v_1)}$

Example 11.12 (page 399)

Other Examples:
1. Suppose X~F(v1=8, v2=4)
   a. P(X > 5.1) =
   b. P(X < 9.6) =
2. $F_{0.1}(v_1=3, v_2=6)$
3. $F_{0.975}(v_1=3, v_2=6)$

Sampling from the Normal Distribution (page 400)

Suppose (X1, X2, …, Xn) is a random sample satisfying the condition that $X_i \sim \text{Normal}(\mu, \sigma^2)$ for i=1,2,…,n.

Define the statistics $\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n}$ as the sample mean and $S^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$ as the sample variance, where n is the sample size.

TABLE 11.1. Sampling Distributions of Statistics Based on a Random Sample from a Normal Distribution

STATISTIC                                          SAMPLING DISTRIBUTION           PARAMETER/S
$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$       standard normal distribution    mean=0, variance=1
$T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$            t-distribution                  degrees of freedom: v = n-1
$X^2 = \dfrac{(n-1)S^2}{\sigma^2}$                 Chi-square distribution         degrees of freedom: v = n-1

Examples

Example 11.13 (pages 400-401)

Exercise 5 (page 406)
IQ is normally distributed with mean, $\mu = 100$, and standard deviation, $\sigma = 20$. What is the probability of selecting a random sample of size 100 with mean IQ larger than 105?

Let Xi = IQ of the ith selected student in the sample.
Given: $X_i \sim \text{Normal}(\mu = 100, \sigma^2 = 20^2)$ and (X1, X2, …, X100) is a random sample.
Find $P(\bar{X} > 105)$.

According to Table 11.1, $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is a standard normal random variable.

$$P(\bar{X} > 105) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{105 - 100}{20/\sqrt{100}}\right) = P(Z > 2.5) = 1 - P(Z \le 2.5) = 1 - 0.9938 = 0.0062.$$

More Examples

Example 11.14 (page 401)

Exercise 6 (page 407)
The length of time it takes a student in a dormitory to take a bath follows a normal distribution with mean, $\mu = 22.5689$ minutes. Suppose a random sample of 16 students was selected and its standard deviation is S = 2.2. Find the probability of selecting a sample whose sample mean is more than 24 minutes.

Let Xi = length of time it takes the ith student in the sample to take a bath.
Given: $X_i \sim \text{Normal}(\mu = 22.5689, \sigma^2)$ and (X1, X2, …, X16) is a random sample with S = 2.2.
Find $P(\bar{X} > 24)$.

According to Table 11.1, $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ follows a t-distribution with v = n-1 degrees of freedom.

$$P(\bar{X} > 24) = P\left(\frac{\bar{X} - \mu}{S/\sqrt{n}} > \frac{24 - 22.5689}{2.2/\sqrt{16}}\right) = P(T > 2.602) = 0.01, \quad \text{where } T \sim t(v = 16 - 1 = 15).$$

More Examples

IQ is normally distributed with mean, $\mu = 100$, and standard deviation, $\sigma = 20$. What is the probability of selecting a random sample of size 11 with variance greater than 819.32?

Let Xi = IQ of the ith selected student in the sample.
Given: $X_i \sim \text{Normal}(\mu = 100, \sigma^2 = 20^2)$ and (X1, X2, …, X11) is a random sample.
Find $P(S^2 > 819.32)$.

According to Table 11.1, $X^2 = \dfrac{(n-1)S^2}{\sigma^2}$ is a $\chi^2$ random variable with (n-1) degrees of freedom.

$$P(S^2 > 819.32) = P\left(\frac{(n-1)S^2}{\sigma^2} > \frac{(11-1)(819.32)}{20^2}\right) = P(X^2 > 20.483) = 0.025,$$

since $X^2 = \dfrac{(n-1)S^2}{\sigma^2}$ is a $\chi^2$ random variable with n-1 = 10 degrees of freedom.

Assignment 6

The scores in the Stanford-Binet IQ test are known to be normally distributed with population mean 100 and population standard deviation 16. Suppose a random sample of size 9 will be selected.
1. What is the probability of selecting a sample whose mean is between 96 and 112?
2. What is the probability of selecting a sample whose variance is less than 642.88?
3. Suppose the population standard deviation is unknown. What is the probability of selecting a sample whose mean is greater than 111.16 if the standard deviation of the sample is 18?
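
Checking the finite-population results by enumeration

The sampling distribution in Example 11.3 and the mean and variance formulas of Theorems 11.1 and 11.2 can be checked by brute force on the barangay of Example 10.34. The sketch below is only an illustration, not part of the textbook: it assumes Python with its standard library, the helper names `sampling_distribution` and `mean_var` are made up for this example, and it takes the population variance with divisor N. It lists every equally likely sample of size 2, builds the distribution of the sample mean under SRSWOR and under SRSWR, and compares each variance with $\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$ and $\frac{\sigma^2}{n}$, respectively.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product

# Population data from Example 10.34: 1 = votes for Renzo, 0 = votes for Sandro.
population = [1, 1, 1, 1, 0, 0]
N, n = len(population), 2


def sampling_distribution(samples):
    """Map each value of the sample mean to its probability,
    assuming every sample in `samples` is equally likely."""
    samples = list(samples)
    counts = Counter(Fraction(sum(s), len(s)) for s in samples)
    return {xbar: Fraction(c, len(samples)) for xbar, c in sorted(counts.items())}


def mean_var(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    var = sum((x - mean) ** 2 * p for x, p in dist.items())
    return mean, var


# Population mean and variance (assumption: variance uses divisor N).
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N

# SRSWOR: the 15 unordered samples of distinct voters are equally likely.
srswor = sampling_distribution(combinations(population, n))
# SRSWR: the 36 ordered draws with replacement are equally likely.
srswr = sampling_distribution(product(population, repeat=n))

fpc = Fraction(N - n, N - 1)  # finite population correction
for label, dist, formula_var in [
    ("SRSWOR", srswor, sigma2 / n * fpc),
    ("SRSWR", srswr, sigma2 / n),
]:
    mean, var = mean_var(dist)
    print(label, {str(x): str(p) for x, p in dist.items()})
    print("  mean:", mean, "| variance:", var, "| formula variance:", formula_var)
    # Both schemes give E(X-bar) = mu; the variances match the two formulas.
    assert mean == mu and var == formula_var
```

Exact fractions are used instead of floating point so that the comparison with the formulas is an equality rather than an approximation. Under these assumptions the SRSWOR probabilities come out as 1/15, 8/15 and 6/15 for sample means 0, 1/2 and 1, as in the table of Example 11.3, and the variance of the sample mean is 4/45 under SRSWOR versus 1/9 under SRSWR.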