
CHAPTER 2
ESTIMATION AND INFERENCE
1. Sample Statistic
2. The Sampling Distribution of the Sample Mean
   2.1. The Expected Value of 𝑥̅, or the Mean of the Means
   2.2. Variance of the Mean
3. The Normal Sampling Distribution of 𝑥̅
   3.1. Central Limit Theorem
   3.2. The Margin of Sampling Error
4. Properties of Estimators
   4.1. Unbiased Estimators
        4.1.1. Proof that 𝑠² is an unbiased estimator of the population variance σ²
   4.2. Efficient Estimators
5. Confidence Interval (Interval Estimate) for the Population Mean
   5.1. The 𝑡 Distribution
6. Test of Hypothesis for µ
   6.1. The Probability Value
1. Sample Statistic
In the previous discussions of random variables, both discrete and continuous, we have assumed that we
have exact information about the probability distribution or probability density function of the random
variable. In particular, we have assumed that we have exact knowledge of the population parameters,
namely, the mean (expected value) µ and the variance σ².
In practice, other than for random variables whose values can be determined through random experiments
that can be repeated under identical conditions, we do not know the exact probability distribution or density
function of a random variable. Therefore, we do not have an exact knowledge of the population parameters.
The next best alternative to the full knowledge of the population parameters is to estimate their values based
on data obtained through a random sample. The estimators of the two population parameters µ and σ² are,
respectively, 𝑥̅ (the sample mean) and 𝑠² (the sample variance), where,
𝑥̅ = ∑𝑥 ⁄ 𝑛    and    𝑠² = ∑(𝑥 − 𝑥̅)² ⁄ (𝑛 − 1)
The estimators 𝑥̅ and 𝑠² are each a sample statistic. The specific values obtained from the sample data for 𝑥̅ and 𝑠² are called estimates.
2. The Sampling Distribution of the Sample Mean
To obtain an estimate of the population mean we take a single random sample of size 𝑛 from the population.
From the sample data we compute the sample mean as an estimate of µ. The value of the sample mean 𝑥̅
depends upon the random sample selected. Since this value is not known until we take the random sample,
then 𝑥̅ is a random variable. Since 𝑥̅ is a random variable, then it has a probability distribution. The
probability distribution of 𝑥̅ is called the sampling distribution of 𝑥̅ . To explain the sampling distribution,
consider the following simple example.
Suppose we have a population consisting of 𝑁 = 5 elements with the following associated values represented
by 𝑥:
Population Element     𝑥
A                      15
B                      12
C                      9
D                      6
E                      3
First compute the mean and variance of the population:
µ = ∑𝑥⁄𝑁 = 9    and    σ² = ∑(𝑥 − µ)²⁄𝑁 = 18
Next write each of the 𝑥 values in the population on a ball and put them in a bowl. Now select a sample of size
𝑛 = 3 without replacement and compute the sample mean. Even though we are selecting only a sample of
size 3, this sample is one of the 10 possible samples that can be selected. These possible samples are listed
below along with the mean corresponding to each sample.
Sample Elements     Sample Data 𝑥ᵢ     Sample Mean 𝑥̅
A B C               15, 12, 9          12
A B D               15, 12, 6          11
A B E               15, 12, 3          10
A C D               15, 9, 6           10
A C E               15, 9, 3           9
A D E               15, 6, 3           8
B C D               12, 9, 6           9
B C E               12, 9, 3           8
B D E               12, 6, 3           7
C D E               9, 6, 3            6
The following table shows the relative frequency distribution of the sample means. The relative frequencies show that there is a probability associated with each value of the sample mean. This probability distribution is called the sampling distribution of the random variable 𝑥̅.
𝑥̅      𝑓(𝑥̅)
6       0.1
7       0.1
8       0.2
9       0.2
10      0.2
11      0.1
12      0.1
The sampling distribution of 𝑥̅ above implies that when this experiment is conducted many times, 20% of the samples would yield a sample mean of, say, 9, and 10% of the samples would yield a sample mean of, say, 6. As the distribution shows, each sample mean value has its own probability of occurring.
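This sampling distribution is easy to verify by brute-force enumeration. The following Python sketch (an illustration added here, not part of the original text; it uses only the standard library) lists all 10 samples of size 3 and tallies the relative frequency of each sample mean.

from itertools import combinations
from collections import Counter

population = {"A": 15, "B": 12, "C": 9, "D": 6, "E": 3}

# Enumerate all samples of size n = 3 drawn without replacement.
samples = list(combinations(population.values(), 3))

# Tally the sample means and convert counts to relative frequencies.
counts = Counter(sum(s) / 3 for s in samples)
for xbar in sorted(counts):
    print(xbar, counts[xbar] / len(samples))
# Prints: 6.0 0.1, 7.0 0.1, 8.0 0.2, 9.0 0.2, 10.0 0.2, 11.0 0.1, 12.0 0.1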
2.1. The Expected Value of 𝒙̅, or the Mean of the Means
In the previous chapter, you learned that the expected value of a random variable is the mean value of that random variable, obtained as the weighted mean of the values of the random variable, where the weights are the probabilities associated with each value.
E(𝑥) = ∑𝑥𝑓(𝑥)
Now the random variable of interest is the sample mean 𝑥̅ . Thus, the expected value of 𝑥̅ is the “mean of 𝑥̅ ”.
E(𝑥̅ ) = ∑𝑥̅ 𝑓(𝑥̅ )
The following table shows the calculation of the mean or expected value of 𝑥̅ .
𝑥̅      𝑓(𝑥̅)     𝑥̅𝑓(𝑥̅)
6       0.1       0.6
7       0.1       0.7
8       0.2       1.6
9       0.2       1.8
10      0.2       2.0
11      0.1       1.1
12      0.1       1.2
                  9.0

E(𝑥̅) = µ𝑥̅ = ∑𝑥̅𝑓(𝑥̅) = 9
Note that µ𝑥̅ = µ. This is a very important result. This relationship between the mean of the sample means,
E(𝑥̅ ), and the population mean, µ, is the cornerstone of inferential statistics. The relationship can simply be
stated as: “The mean of the means equals the mean.” That is, the expected value of the sample means is equal
to the mean of the parent population:
E(𝑥̅) = E(∑𝑥 ⁄ 𝑛) = E((𝑥₁ + 𝑥₂ + ⋯ + 𝑥ₙ) ⁄ 𝑛)
E(𝑥̅) = (1⁄𝑛) E(𝑥₁ + 𝑥₂ + ⋯ + 𝑥ₙ)
E(𝑥̅) = (1⁄𝑛) [E(𝑥₁) + E(𝑥₂) + ⋯ + E(𝑥ₙ)]
E(𝑥̅) = (1⁄𝑛)(µ + µ + ⋯ + µ) = (1⁄𝑛)(𝑛µ) = µ
Note that since the 𝑥ᵢ are randomly selected from the same population, the expected value of each 𝑥ᵢ is the population mean µ.
2.2. Variance of the Mean
Using var(𝑥̅) to denote the variance of 𝑥̅, we now compute it. Remember that the variance of a random variable is the expected value (the mean) of the squared deviations. Thus,

var(𝑥̅) = E[(𝑥̅ − µ)²] = ∑(𝑥̅ − µ)² 𝑓(𝑥̅)
𝑥̅      𝑓(𝑥̅)     (𝑥̅ − µ)²     (𝑥̅ − µ)²𝑓(𝑥̅)
6       0.1       9             0.9
7       0.1       4             0.4
8       0.2       1             0.2
9       0.2       0             0.0
10      0.2       1             0.2
11      0.1       4             0.4
12      0.1       9             0.9
                                3.0

var(𝑥̅) = ∑(𝑥̅ − µ)² 𝑓(𝑥̅) = 3
The variance of 𝑥̅ is not equal to the variance of 𝑥 (the variance of the parent population); that is, var(𝑥̅) ≠ σ². However, there is a definite relationship between the two variances, as shown by the following formula:
var(𝑥̅) = (σ²⁄𝑛) ∙ (𝑁 − 𝑛)⁄(𝑁 − 1)

var(𝑥̅) = (18⁄3) ∙ (5 − 3)⁄(5 − 1) = 3
The term (𝑁 − 𝑛)⁄(𝑁 − 1) in the formula is called the finite population correction factor. This factor approaches 1, and therefore drops out, for infinite populations. Thus,

var(𝑥̅) = σ²⁄𝑛
The proof of this relationship follows:
var(𝑥̅) = var(∑𝑥 ⁄ 𝑛) = var((𝑥₁ + 𝑥₂ + ⋯ + 𝑥ₙ) ⁄ 𝑛)
var(𝑥̅) = (1⁄𝑛²) var(𝑥₁ + 𝑥₂ + ⋯ + 𝑥ₙ)
var(𝑥̅) = (1⁄𝑛²) [var(𝑥₁) + var(𝑥₂) + ⋯ + var(𝑥ₙ)]
var(𝑥̅) = (1⁄𝑛²)(σ² + σ² + ⋯ + σ²)
var(𝑥̅) = (1⁄𝑛²)(𝑛σ²) = σ²⁄𝑛
Note that since 𝑥₁, 𝑥₂, ⋯, 𝑥ₙ are independent random selections from the same population,

var(𝑥₁) = var(𝑥₂) = ⋯ = var(𝑥ₙ) = σ²
The square root of the variance of 𝑥̅ is the standard error of the mean, denoted by se(𝑥̅ ).
se(𝑥̅) = σ ⁄ √𝑛
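Continuing the five-element example, the two results E(𝑥̅) = µ and var(𝑥̅) = (σ²⁄𝑛)(𝑁 − 𝑛)⁄(𝑁 − 1) can be checked numerically. A short Python sketch (illustrative only; it reuses the population above):

from itertools import combinations

x = [15, 12, 9, 6, 3]
N, n = len(x), 3

mu = sum(x) / N                               # population mean: 9.0
sigma2 = sum((v - mu) ** 2 for v in x) / N    # population variance: 18.0

# All possible sample means for samples of size 3 without replacement.
means = [sum(s) / n for s in combinations(x, n)]

e_xbar = sum(means) / len(means)                               # E(x-bar) = 9.0 = mu
var_xbar = sum((m - e_xbar) ** 2 for m in means) / len(means)  # var(x-bar) = 3.0

# The finite-population formula gives the same 3.0.
print(e_xbar, var_xbar, sigma2 / n * (N - n) / (N - 1))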
3. The Normal Sampling Distribution of 𝒙̅
Note that the number of samples of size n quickly becomes astronomical. For example, the number of
possible samples of size 𝑛 = 40 selected from a population of size 𝑁 = 1,000 is:
555,974,423,571,664,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
Since each sample yields its own 𝑥̅ value, the random variable 𝑥̅ takes on an enormous number of values, making 𝑥̅ effectively a continuous random variable with a smooth probability density function. In fact, in most cases, the sampling
distribution of 𝑥̅ is normal or approximately normal. If the parent population from which the sample is taken
is normal, then the sampling distribution of 𝑥̅ is also normal.
[Figure: When the parent population distribution is normal with mean µ and standard deviation σ, the sampling distribution of 𝑥̅ is also normal, with mean µ and standard error σ⁄√𝑛.]
If the parent population is not normal, the sampling distribution of 𝑥̅ approaches the normal distribution, per the central limit theorem, as the sample size 𝑛 increases. A common rule of thumb is that a sample size of at least 𝑛 = 30 gives an approximately normal sampling distribution of 𝑥̅.
[Figure: When the parent population distribution is NOT normal, the sampling distribution of 𝑥̅ is approximately normal, with mean µ and standard error σ⁄√𝑛, provided 𝑛 ≥ 30.]
3.1. Central Limit Theorem
In applied statistical analysis many of the random variables used can be characterized as the sum of a large
number of independent random variables. For example, total daily sales in a store are the result of a number of
sales to individual customers—each of which can be modeled as a random variable. Total investment in the
United States in a month is the sum of individual investments by many independent firms. Thus, if 𝑥₁, 𝑥₂, …, 𝑥ₙ represent the results of individual random events, the observed random variable 𝑥 is the sum of these random variables:
𝑥 = ∑𝑥ᵢ = 𝑥₁ + 𝑥₂ + ⋯ + 𝑥ₙ

Using the properties of expected value shown in the previous chapter,

E(𝑥) = E(𝑥₁ + 𝑥₂ + ⋯ + 𝑥ₙ) = 𝑛µ
var(𝑥) = var(𝑥₁ + 𝑥₂ + ⋯ + 𝑥ₙ) = var(𝑥₁) + var(𝑥₂) + ⋯ + var(𝑥ₙ) = 𝑛σ²
The central limit theorem (CLT) states that, for large 𝑛, the resulting sum 𝑥 = ∑𝑥ᵢ is approximately normally distributed with mean 𝑛µ and standard deviation √𝑛σ:
𝑥~𝑁(𝑛µ, √𝑛𝜎)
Therefore,
𝑧 = (𝑥 − 𝑛µ) ⁄ (√𝑛σ) = (∑𝑥ᵢ − 𝑛µ) ⁄ (√𝑛σ)
is a standard normal random variable. If we divide the numerator and the denominator on the right hand side
by 𝑛, we have:
𝑧 = (𝑥̅ − µ) ⁄ (σ⁄√𝑛)
This implies that
𝑥̅ ~ N(µ, σ⁄√𝑛)
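The practical content of the theorem can be illustrated by simulation. The sketch below (an illustration added here, not from the chapter; the exponential population, sample size, and seed are arbitrary choices) draws repeated samples of size 𝑛 = 40 from a decidedly non-normal exponential population, whose mean and standard deviation are both 1, and standardizes each sample mean; the results behave like standard normal values.

import random
import statistics

random.seed(1)
mu = sigma = 1.0      # the exponential(1) population has mean 1 and std dev 1
n, reps = 40, 20_000

z_scores = []
for _ in range(reps):
    sample = [random.expovariate(1.0) for _ in range(n)]
    xbar = statistics.fmean(sample)
    z_scores.append((xbar - mu) / (sigma / n ** 0.5))

# Expect mean ~ 0, standard deviation ~ 1, and about 95% of the
# standardized means between -1.96 and 1.96.
print(statistics.fmean(z_scores))
print(statistics.stdev(z_scores))
print(sum(abs(z) < 1.96 for z in z_scores) / reps)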
Example 1
The speed on a certain stretch of an interstate highway is normally distributed with a mean of µ = 80 mph and a standard deviation of σ = 5 mph.
a) If a vehicle is randomly clocked, what is the probability that the speed is below 82 mph?
𝑃(𝑥 < 82) = ________
𝑧 = (𝑥 − µ) ⁄ σ = (82 − 80) ⁄ 5 = 0.40
P(𝑧 < 0.40) = 0.6554
Alternatively, using the Excel NORM.DIST(...) function:

=NORM.DIST(82,80,5,1) = 0.6554
[Figure: normal curve with mean µ = 80; the area 0.6554 lies to the left of x = 82.]
b) If a random sample of 𝑛 = 16 vehicles is clocked, what is the probability that the average sample speed is
below 82 mph?
P(𝑥̅ < 82) = ________
Now you have to use the sampling distribution of 𝑥̅ to solve this problem.
𝑥̅ ~ N(µ, 𝑠𝑒(𝑥̅ ) = σ⁄√𝑛)
se(𝑥̅ ) = 5⁄√16 = 1.25
2-Estimation and Inference
7 of 21
𝑧 = (𝑥̅ − µ) ⁄ se(𝑥̅) = (82 − 80) ⁄ 1.25 = 1.60
P(𝑧 < 1.60) = 0.9452

=NORM.DIST(82,80,1.25,1) = 0.9452
[Figure: sampling distribution of 𝑥̅ with mean µ = 80; the area 0.9452 lies to the left of 𝑥̅ = 82.]
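Both parts of Example 1 can also be computed outside Excel. A minimal Python sketch using scipy.stats (assuming that library is available; norm.cdf returns the cumulative left-tail probability):

from scipy.stats import norm

mu, sigma, n = 80, 5, 16

# (a) P(x < 82) for a single vehicle.
print(norm.cdf(82, loc=mu, scale=sigma))             # 0.6554

# (b) P(x-bar < 82) for a sample of n = 16 vehicles;
# the standard error is sigma / sqrt(n) = 1.25.
print(norm.cdf(82, loc=mu, scale=sigma / n ** 0.5))  # 0.9452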
3.2. The Margin of Sampling Error
Note that, like any random variable, the random variable 𝑥̅ consists of a fixed component and a random component. The fixed component is the mean or expected value of 𝑥̅, which is the population mean µ, and the random component is denoted by ϵ:

𝑥̅ = µ + ϵ

Using 𝑧 = (𝑥̅ − µ) ⁄ se(𝑥̅) and solving for 𝑥̅, we have

𝑥̅ = µ + 𝑧 ∙ se(𝑥̅)

The random component of 𝑥̅, then, can be expressed as

ϵ = 𝑧 ∙ se(𝑥̅)
The random component, called the margin of statistical (sampling) error, is a function of 𝑧. Using this relationship between ϵ and 𝑧, we can determine intervals within which the sample mean will fall with an associated probability. For example, suppose we want to find the lower and upper ends of a middle interval (symmetric about the mean) that contains 95% of all possible sample means. Of the remaining 5% of the sample means, 2.5% would exceed the upper boundary value and the other 2.5% would fall below the lower boundary. This 5% represents the probability that 𝑥̅ falls outside the 95% interval and is called the error probability, represented by the symbol α. Generally, given an α value, 1 − α represents the probability that the interval contains the sample mean. Thus, 𝑥̅𝐿 and 𝑥̅𝑈 are the boundaries of the interval that contains 1 − α proportion of all possible sample means. Of the remaining 𝑥̅ values, α⁄2 fall in the right tail (to the right of 𝑥̅𝑈) and α⁄2 in the left tail (to the left of 𝑥̅𝐿).
[Figure: sampling distribution of 𝑥̅ centered at µ, with the middle area 1 − α between 𝑥̅𝐿 = µ − 𝑧α⁄2 se(𝑥̅) and 𝑥̅𝑈 = µ + 𝑧α⁄2 se(𝑥̅), and area α⁄2 in each tail.]
The diagram is a graphic representation of the following probability statement:
P(𝑥̅𝐿 < 𝑥̅ < 𝑥̅𝑈 ) = 1 − 𝛼
P(µ − ϵ < 𝑥̅ < µ + ϵ) = P(µ − 𝑧α⁄2 se(𝑥̅ ) < 𝑥̅ < µ + 𝑧α⁄2 se(𝑥̅ )) = 1 − 𝛼
The 𝑧 score that bounds a tail area of α⁄2 under the standard normal curve is 𝑧α⁄2 . Thus, the margin of error
(MOE) formula is generally written as:
ϵ = 𝑧α⁄2 se(𝑥̅ )
and,
P(µ − 𝑧α⁄2 se(𝑥̅ ) < 𝑥̅ < µ + 𝑧α⁄2 se(𝑥̅ )) = 1 − 𝛼
Example 2
The speed on a certain stretch of an interstate highway is normally distributed with a mean of 80 mph and a standard deviation of 5 mph. A random sample of 𝑛 = 64 vehicles is clocked. Find the 95% margin of error
for the sample mean. In other words, find the middle interval of 𝑥̅ values which contains 95% of all possible
sample means for samples of size 𝑛 = 64.
1 − α = 0.95
α⁄2 = 0.025
𝑧α⁄2 = 𝑧0.025 = 1.96
se(𝑥̅ ) = σ⁄√𝑛 = 5⁄√64 = 0.625
ϵ = 𝑧α⁄2 se(𝑥̅ ) = 1.96(0.625) = 1.225
𝑥̅𝐿 = 80 − 1.225 = 78.775
𝑥̅𝑈 = 80 + 1.225 = 81.225
[Figure: sampling distribution of 𝑥̅ centered at µ = 80; the middle area 0.9500 lies between 78.775 and 81.225.]
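The same margin-of-error calculation can be scripted. A short sketch using scipy.stats (an assumption about the available toolkit; norm.ppf returns the 𝑧 score for a given left-tail area):

from scipy.stats import norm

mu, sigma, n, alpha = 80, 5, 64, 0.05

z = norm.ppf(1 - alpha / 2)   # z for alpha/2 = 0.025: 1.96
se = sigma / n ** 0.5         # standard error = 0.625
moe = z * se                  # margin of error = 1.225
print(mu - moe, mu + moe)     # interval: 78.775 to 81.225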
4. Properties of Estimators
4.1. Unbiased Estimators
Since estimators are random variables that can take on a practically infinite number of values, the probability that a single estimate will exactly equal the population parameter is practically zero. Thus there will always be a deviation between the estimate and the parameter. If the parameter of interest is the population mean µ, then the deviation between the sample mean 𝑥̅ and µ is:
𝑥̅ − µ = ϵ
Although this deviation will never be zero for any single estimate, in repeated sampling it is desirable that the mean or expected value of the deviation be zero; that is, the deviations above and below µ should cancel each other out: E(ϵ) = 0. If this equality holds in the long run, then
E(ϵ) = E(𝑥̅ − µ) = E(𝑥̅ ) − µ = 0
Thus,
E(𝑥̅ ) = µ
If deviations average to zero, then the expected value of 𝑥̅ is equal to the mean of the population. If this is
true, then 𝑥̅ is said to be an unbiased estimator of the population mean. The proof that E(𝑥̅ ) = µ was shown
above in the discussion of the sampling distribution of 𝑥̅ .
4.1.1. Proof that 𝒔² is an unbiased estimator of the population variance 𝛔²
We learned that to compute the variance of the sample you use the formula,
𝑠² = ∑(𝑥 − 𝑥̅)² ⁄ (𝑛 − 1)
The variance is the mean squared deviation of the data from the sample mean. In computing the mean squared deviation for the sample data, why do we divide the sum of squared deviations by 𝑛 − 1 and not by 𝑛?
This has to do with the fact that when computing 𝑠² we are finding the deviations of the random variable 𝑥
from another random variable, that is, 𝑥̅ . Thus, for a sample of size n, the number of random squared
deviations is reduced by 1. To explain, suppose you randomly select three items (𝑛 = 3) from a population
and obtain the following data points: 3, 9, 12. The mean of this sample, another random number, is 𝑥̅ = 8.
Given this mean, the first two squared deviations are (3 − 8)² = 25 and (9 − 8)² = 1. These are the only two random squared deviations. The third squared deviation, (12 − 8)² = 16, is no longer random, because when the mean is 8 the third number must be 12. Thus, you lose one “degree of freedom.”¹ To be unbiased, the mean of the squared deviations is then obtained by using 𝑛 − 1 = 2 degrees of freedom in the denominator.

¹ Note that 𝑥̅ = (1⁄𝑛)∑𝑥 = (1⁄𝑛)(𝑥₁ + 𝑥₂ + ⋯ + 𝑥ₙ). Thus, 𝑥ₙ = 𝑛𝑥̅ − (𝑥₁ + 𝑥₂ + ⋯ + 𝑥ₙ₋₁). This shows that any one of the 𝑛 observations in a sample can be written as a linear combination of 𝑥̅ and the remaining 𝑛 − 1 observations. Therefore, in computing the average of squared deviations as the sample variance, there are only 𝑛 − 1 independent squared deviations.
If we divide the sum of squared deviations by 𝑛, then the sample variance would be smaller, and it would thus underestimate the population variance. In other words, 𝑠² would be a biased estimator of the population variance. The following shows that using 𝑛 in the denominator of the sample variance would make it a biased estimator, and that when the sum is divided by 𝑛 − 1 the bias disappears.
For 𝑠² to be an unbiased estimator of σ², the following must hold:

E(𝑠²) = σ²
In the following proof, it will be shown that if in the sample variance formula the sum of squared deviations is
divided by n,
𝑠² = ∑(𝑥 − 𝑥̅)² ⁄ 𝑛
then
E(𝑠²) = ((𝑛 − 1)⁄𝑛) σ² < σ²
That is, the expected value of the sample variance would be less than the population variance, imparting a downward bias to the estimator. Therefore, dividing the sum of squared deviations of 𝑥, ∑(𝑥 − 𝑥̅)², by 𝑛 would make the resulting sample variance a biased estimator of the population variance. Now the proof:
E(𝑠²) = E[∑(𝑥 − 𝑥̅)² ⁄ 𝑛]
E(𝑠²) = (1⁄𝑛) E[∑(𝑥 − 𝑥̅)²]
Rewrite the sum of squared deviations within the brackets by adding and subtracting µ, as follows:

∑(𝑥 − 𝑥̅)² = ∑(𝑥 − 𝑥̅ + µ − µ)²
∑(𝑥 − 𝑥̅)² = ∑[(𝑥 − µ) − (𝑥̅ − µ)]²
∑(𝑥 − 𝑥̅)² = ∑[(𝑥 − µ)² − 2(𝑥 − µ)(𝑥̅ − µ) + (𝑥̅ − µ)²]
∑(𝑥 − 𝑥̅)² = ∑(𝑥 − µ)² − 2(𝑥̅ − µ)∑(𝑥 − µ) + 𝑛(𝑥̅ − µ)²
∑(𝑥 − 𝑥̅)² = ∑(𝑥 − µ)² − 2𝑛(𝑥̅ − µ)² + 𝑛(𝑥̅ − µ)²
[Note: ∑(𝑥 − µ) = ∑𝑥 − 𝑛µ = 𝑛𝑥̅ − 𝑛µ = 𝑛(𝑥̅ − µ)]
∑(𝑥 − 𝑥̅)² = ∑(𝑥 − µ)² − 𝑛(𝑥̅ − µ)²
Now we can write
E(𝑠²) = (1⁄𝑛) E[∑(𝑥 − µ)² − 𝑛(𝑥̅ − µ)²]
E(𝑠²) = (1⁄𝑛) E[∑(𝑥 − µ)²] − E[(𝑥̅ − µ)²]
E(𝑠²) = (1⁄𝑛) E[(𝑥₁ − µ)² + (𝑥₂ − µ)² + ⋯ + (𝑥ₙ − µ)²] − E[(𝑥̅ − µ)²]
E(𝑠²) = (1⁄𝑛) {E[(𝑥₁ − µ)²] + E[(𝑥₂ − µ)²] + ⋯ + E[(𝑥ₙ − µ)²]} − E[(𝑥̅ − µ)²]
E(𝑠²) = (1⁄𝑛)(σ₁² + σ₂² + ⋯ + σₙ²) − var(𝑥̅)
E(𝑠²) = (1⁄𝑛)(𝑛σ²) − σ²⁄𝑛
Since 𝑥₁, 𝑥₂, …, 𝑥ₙ are random selections from the same population, σ₁² = σ₂² = ⋯ = σₙ² = σ². Also, the variance of 𝑥̅ is var(𝑥̅) = σ²⁄𝑛. Thus,
E(𝑠²) = σ² − σ²⁄𝑛 = ((𝑛 − 1)⁄𝑛) σ²
which is what we set out to prove. For a sample statistic to be an unbiased estimator of a population parameter, the expected value of that sample statistic must equal the population parameter. Therefore, when the sample variance is calculated as
𝑠² = ∑(𝑥 − 𝑥̅)² ⁄ 𝑛
this variance would be a biased estimator of the population variance σ².
If, however, we use 𝑛 − 1 in the denominator of the sample variance formula,
𝑠² = ∑(𝑥 − 𝑥̅)² ⁄ (𝑛 − 1)
the end result of the same process would instead be

E(𝑠²) = (1⁄(𝑛 − 1))(𝑛σ²) − (𝑛⁄(𝑛 − 1))(σ²⁄𝑛) = σ²
Thus, when E(𝑠²) = σ², 𝑠² is said to be an unbiased estimator of σ².
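The algebraic result can also be seen empirically: dividing by 𝑛 systematically underestimates σ², while dividing by 𝑛 − 1 does not. A rough Python simulation sketch (the normal population, sample size, and seed are arbitrary illustrative choices):

import random

random.seed(1)
mu, sigma, n, reps = 100, 20, 5, 100_000

sum_biased = sum_unbiased = 0.0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)
    sum_biased += ss / n           # denominator n
    sum_unbiased += ss / (n - 1)   # denominator n - 1

# Expect roughly (n-1)/n * sigma^2 = 320 for the n version
# and sigma^2 = 400 for the n - 1 version.
print(sum_biased / reps, sum_unbiased / reps)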
4.2. Efficient Estimators
In many situations different unbiased estimators of the population parameter can be obtained. However, the estimator with the smallest variance is clearly preferred, since it provides the smallest possible margin of error in the estimation process. The smaller the variance, the more closely the values of the sample statistic (the estimator) cluster around the population parameter. The unbiased estimator with the smallest variance is called the most efficient estimator.
5. Confidence Interval (Interval Estimate) for the Population Mean
In the previous discussions, to establish the theory of sampling distribution, we assumed that the population
mean µ and standard deviation σ are known. It was just explained that for samples of size 𝑛, 1 − α proportion
of all sample means fall within the margin of error of 𝜖 = 𝑧α⁄2 σ⁄√𝑛 from the population mean.
In practice, the whole purpose of inferential statistics is to find an estimate of the unknown population parameter. Obtaining a single sample will provide a point estimate of the population parameter. But this
point estimate gives us very little information about the precision of our estimate. We know that the number
of samples that could be selected, and hence the number of sample means calculated, is infinite, and these 𝑥̅
values are normally distributed about the population mean. Therefore, a point estimate would not tell us how
close the mean computed from a single sample is to the population mean.
An interval estimate provides a range of values, which allows us to state with a known level of confidence that
the population mean falls within that interval. This interval estimate for the population mean is obtained as
follows. It was explained above that,
𝑃 (µ − 𝑧α⁄2 se(𝑥̅ ) < 𝑥̅ < µ + 𝑧α⁄2 se(𝑥̅ )) = 1 − 𝛼
Take the inequality statement (the interval) within the parentheses and rewrite it as:
𝑥̅ − 𝑧α⁄2 se(𝑥̅ ) < µ < 𝑥̅ + 𝑧α⁄2 se(𝑥̅ )
The above inequality shows that µ falls within 1 − α proportion of all possible intervals built around the means of all random samples: 𝑥̅ ± 𝑧α⁄2 se(𝑥̅). Therefore, if we select one sample of size 𝑛 and build a single interval 𝑥̅ ± 𝑧α⁄2 se(𝑥̅), we are 100(1 − α)% confident that this interval contains the population mean. Thus, the
confidence interval for the population mean, with the lower end 𝐿 and the upper end 𝑈 is:
𝐿, 𝑈 = 𝑥̅ ± 𝑧α⁄2 se(𝑥̅ )
5.1. The t Distribution
So far, the theory of confidence intervals has been explained using the population standard deviation in the
margin of error formula:
𝜖 = 𝑧α⁄2 se(𝑥̅) = 𝑧α⁄2 σ ⁄ √𝑛
In practice, obviously, σ is also an unknown population parameter and must be estimated using the sample
data. The estimator of the population parameter σ is the sample statistic 𝑠, the sample standard deviation,
𝑠 = √(∑(𝑥 − 𝑥̅)² ⁄ (𝑛 − 1))
Therefore, in the margin of error formula the standard error of 𝑥̅ becomes an estimated value obtained using:
se(𝑥̅) = 𝑠 ⁄ √𝑛
When 𝑠 is used in place of σ a peculiar thing happens to the shape of the sampling distribution of 𝑥̅ . The
sampling distribution is still bell shaped, but the area under the curve for a given interval of 𝑥̅ values is not
the same as when the known σ is used. To illustrate, consider the following example:
First, suppose the mean of a normally distributed population is µ = 100 and the standard deviation is σ = 20. The proportion of 𝑥̅ values for samples of size 𝑛 = 16 taken from this population that fall between, say, 90.2 and 109.8 is determined as follows:
P(90.2 < 𝑥̅ < 109.8)
se(𝑥̅ ) = σ⁄√𝑛 = 20⁄√16 = 5
𝑧 = (𝑥̅ − µ)⁄se(𝑥̅ ) = ±1.96
P(−1.96 < 𝑧 < 1.96) = 0.95
Now, instead of using σ, let the standard deviation 20 be as if determined from a sample. That is, let 𝑠 = 20.
Hence,
se(𝑥̅ ) = s⁄√𝑛 = 20⁄√16 = 5
Here, when we attempt to transform 𝑥̅ to 𝑧 using the formula (𝑥̅ − µ) ⁄ (𝑠⁄√𝑛), a problem arises.
The new random variable obtained through this transformation no longer has a 𝑧 distribution (with mean 0 and standard deviation 1). This problem was observed by William S. Gosset (1876–1937), a British chemist/statistician, in a paper published in 1908. Gosset showed that, when the sample size is small, the standard normal table does not provide the accurate area under the curve for the scores obtained from the conversion formula (𝑥̅ − µ)⁄se(𝑥̅). In the above example, if (𝑥̅ − µ)⁄se(𝑥̅) = ±1.96, the area under the curve bounded by the two scores ±1.96 is no longer 0.95. Gosset developed an alternative table to obtain the
more accurate areas or probability values for the scores thus calculated. The new table of probabilities he
provided is now called the t table. And the random variable obtained from this transformation is said to have
a t distribution, where,
𝑡 = (𝑥̅ − µ) ⁄ (𝑠⁄√𝑛)
The difference between the z and t distributions is shown in the following diagram.
[Figure: the 𝑧 curve and the 𝑡 curve (df = 4); to the right of 1.96, the tail area under 𝑧 is 0.025 and the tail area under 𝑡 is 0.061.]
Like the z distribution, the t distribution is symmetric about the mean of 0. However, unlike z, which has a
unique, unchanging shape due to its fixed standard deviation 1, the t distribution acquires different shapes
depending on a parameter called degrees of freedom. In estimations involving μ, the degrees of freedom is
𝒅𝒇 = 𝒏 − 𝟏, the denominator used in computing the sample standard deviation. The smaller the degrees of
freedom, the larger the tail areas. As the 𝑑𝑓 increases, the 𝑡 distribution approaches the 𝑧 distribution and the tail area under the 𝑡 curve becomes closer and closer to the tail area under the 𝑧.
As the degrees of freedom increases, the distinction between 𝑧 and 𝑡 practically disappears. For any 𝑑𝑓 > 2, the standard deviation of the 𝑡 distribution is 𝑠ₜ = √(𝑑𝑓 ⁄ (𝑑𝑓 − 2)).
For example, if 𝑑𝑓 = 4, then 𝑠ₜ = 1.414. As 𝑑𝑓 rises, the standard deviation approaches 1, the standard deviation of 𝑧. If, for example, 𝑑𝑓 = 1000, the standard deviation is practically 1 (√(1000⁄998) = 1.001). The fact that 𝑡 has a larger standard deviation than 𝑧 makes the tail area under the 𝑡 curve relatively larger, for a given value, than the area under the 𝑧 curve for the same value. Thus, using a computer, it can be shown that, while the tail area for the 𝑧 score 1.96 is 0.025, the tail area associated with a 𝑡 score of 1.96 (with 𝑑𝑓 = 4) is 0.061.
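These tail areas are easy to confirm with software. A brief scipy.stats sketch (illustrative, assuming scipy is available; sf is the survival function, i.e., the upper-tail area 1 − cdf):

from scipy.stats import norm, t

# Upper-tail areas to the right of 1.96 under z and under t with df = 4.
print(norm.sf(1.96))   # 0.025
print(t.sf(1.96, 4))   # about 0.061

# Standard deviation of t: sqrt(df / (df - 2)) for df > 2.
print(t.std(4))        # about 1.414
print(t.std(1000))     # about 1.001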
Having a larger standard deviation and tail area than 𝑧 is a reflection of the fact that the 𝑡 distribution applies to situations with a greater inherent uncertainty. The uncertainty arises from the fact that σ is unknown and is estimated by the random variable 𝑠. The 𝑡 distribution, 𝑡 = (𝑥̅ − µ) ⁄ (𝑠⁄√𝑛), thus reflects the uncertainty in two random variables, 𝑥̅ and 𝑠, while 𝑧 = (𝑥̅ − µ) ⁄ (σ⁄√𝑛) reflects only the uncertainty due to 𝑥̅. The greater uncertainty in 𝑡 (which makes confidence intervals based on 𝑡 wider than those based on 𝑧) is the price we pay for not knowing σ and having to estimate it from sample data.
In inferential statistics we are interested in the t score for a given tail area, or in the tail area associated with a
given 𝑡 score. A typical t table provides the 𝑡 scores for a given 𝑑𝑓 and various tail areas. But there are no
tables which provide the tail area for different 𝑡 scores. In either case, a computer can easily provide the
values we are looking for.
Back to the confidence interval for µ: When σ is unknown, the margin of error used in building the confidence
interval is,
𝑒 = 𝑡α⁄2,𝑑𝑓 se(𝑥̅)
where 𝑑𝑓 = 𝑛 − 1 and se(𝑥̅ ) = 𝑠⁄√𝑛.
[Note: The symbol 𝑒 is used for the margin of error in place of 𝜖, reflecting the fact that we are using an estimated value for the standard error.]
The confidence interval with 1 − α level of confidence for the population mean is then,
𝑥̅ − 𝑡α⁄2,𝑑𝑓 (𝑠⁄√𝑛) < µ < 𝑥̅ + 𝑡α⁄2,𝑑𝑓 (𝑠⁄√𝑛)

𝐿, 𝑈 = 𝑥̅ ± 𝑡α⁄2,𝑑𝑓 (𝑠⁄√𝑛)
Example 3
To build a confidence interval with a 0.95 level of confidence for the average life of a certain type of light bulb, a sample of 𝑛 = 25 bulbs was tested. The sample mean is 𝑥̅ = 920.5 and the sample standard deviation is 𝑠 = 43.5.
1 − α = 0.95
se(𝑥̅) = 𝑠⁄√𝑛 = 43.5⁄√25 = 8.7
𝑑𝑓 = 𝑛 − 1 = 24
𝑡α⁄2,𝑑𝑓 = 𝑡0.025,24 = 2.064
To find 𝑡α⁄2,𝑑𝑓 = 𝑡0.025,24 = 2.064, use the following Excel function:

=T.INV.2T(probability, deg_freedom)
=T.INV.2T(0.05,24) = 2.064

𝑒 = 𝑡α⁄2,𝑑𝑓 (𝑠⁄√𝑛) = (2.064)(8.7) = 17.96
920.5 − 17.96 < µ < 920.5 + 17.96
902.54 < µ < 938.46
𝐿, 𝑈 = (902.54,938.46)
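For comparison, Example 3 can be reproduced in Python with scipy.stats (a sketch under the assumption that scipy is available; t.ppf returns the 𝑡 score for a given left-tail area and degrees of freedom):

from scipy.stats import t

xbar, s, n, alpha = 920.5, 43.5, 25, 0.05
df = n - 1

se = s / n ** 0.5                  # 8.7
t_crit = t.ppf(1 - alpha / 2, df)  # 2.064
moe = t_crit * se                  # about 17.96
print(xbar - moe, xbar + moe)      # about (902.54, 938.46)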
6. Test of Hypothesis for µ
In the interval estimate process we started with no knowledge, assumption, or hypothesis, about the
population parameter. A sample is taken and an interval is built around that sample.
Unlike the interval estimate approach to inferential statistics, the test of hypothesis starts with a conjecture
or hypothesis about the population parameter. Denoting the hypothesized value by µ0 , in hypothesis testing
we ascribe this value to the center of gravity of the 𝑥̅ values in the sampling distribution of 𝑥̅ . Then the
argument goes as follows: if µ0 were the actual population mean, then 1 − 𝛼 proportion of sample means
would fall within the interval bounded by 𝑥̅𝐿 , 𝑥̅𝑈 = µ0 ± 𝑧𝛼⁄2 se(𝑥̅ ).
[Figure: sampling distribution of 𝑥̅ centered at the hypothesized mean µ₀, with the middle area 1 − α between 𝑥̅𝐿 and 𝑥̅𝑈.]
To test the statistical validity of this conjecture, that is, to determine whether µ₀ is a reasonable value to ascribe to the population mean, a sample of size 𝑛 is taken, from which the sample statistic is computed. If the sample mean falls within the interval shown in the above diagram (the “acceptance region”), then we decide this mean belongs to the family of 𝑥̅ values whose center of gravity is µ₀, and conclude the population mean is the value we ascribed to µ₀.
Note that the interval or acceptance region (𝑥̅𝐿, 𝑥̅𝑈) is obtained by adding to and subtracting from µ₀ the margin of error 𝜖 = 𝑧α⁄2 se(𝑥̅):

𝑥̅𝐿, 𝑥̅𝑈 = µ₀ ± 𝑧α⁄2 se(𝑥̅)
Therefore, the main task in performing a test of hypothesis is to find the statistical margin of error. This provides the critical value used to establish the decision rule to accept or reject the hypothesis. The decision rule sets up the acceptance region, which defines the range of acceptable values to which the 𝑥̅ value from the sample is compared.
To determine the acceptance region, first you must state the claim (the hypothesis) about the population
mean in a prescribed way. The claim contains a null hypothesis, denoted by 𝐻0 , and an alternative
hypothesis, 𝐻1 . Suppose we are testing the hypothesis that the population mean equals 100.
𝐻0 : µ = 100
𝐻1 : µ ≠ 100
The null hypothesis states that the population mean equals 100; the alternative hypothesis states that the
population mean is a value other than 100.
Once you have stated your hypotheses, you must deal with the following dilemma. Since the test of hypothesis involves the sampling distribution, in deriving a conclusion from the results of a test based on a random sampling process, there is always a chance that you may make a wrong decision and commit an error. There are two possible errors.
1) Type I Error—Reject a true null hypothesis.
2) Type II Error—Fail to reject a false null hypothesis.
[Figure: (a) Type I error: the sampling distribution under H₀, centered at µ₀; 𝑥̅ values in the tails lead to rejecting a true H₀. (b) Type II error: the sampling distribution under H₁, centered at µ₁; 𝑥̅ values inside the non-rejection interval lead to not rejecting a false H₀.]
There is always a chance that you may commit either one of the two errors. If the population mean is in fact µ = µ₀ = 100, but the 𝑥̅ value falls outside the non-rejection interval (𝑥̅𝐿, 𝑥̅𝑈) in panel (a) of the above diagram, then you would wrongly reject a true null hypothesis—you have committed a Type I error. The probability of
committing a Type I error is α (the combined two-tail areas in the above diagram). The Type II error would
occur if the population mean is not equal to 100 (µ = µ1 ≠ 100), but 𝑥̅ falls inside the non-rejection interval
in panel (b), leading you to not reject a false null hypothesis. The probability of committing a Type II error is
denoted by β, shown as the area to the left of 𝑥̅𝑈 under the distribution labeled 𝐻1 . Reducing α, expanding the
non-rejection interval (𝑥̅𝐿 , 𝑥̅𝑈 ) for a given sample size, comes only at the cost of increasing β.
Performing a test of hypothesis is like conducting a trial in a criminal court. The defendant or the accused is
charged with a crime. The purpose of the trial is to establish the defendant’s guilt or innocence. The null
hypothesis is that the defendant is innocent (the accused is presumed innocent) and the alternative is that he
is guilty (the guilt to be established beyond a reasonable doubt by the prosecutor). If the jury finds an
innocent person guilty, it has rejected a true null hypothesis; it has, therefore, committed a Type I error. On
the other hand, if the jury finds a guilty person not guilty, it has not rejected a false null hypothesis; it has,
therefore, committed a Type II error.
In the hypothesis test, the benefit of the doubt is given to 𝐻₀, and the burden of proof is upon 𝐻₁. That is, we want to make it unlikely to reject the null hypothesis unless the evidence is “very strong”. We want to make it
unlikely to find the defendant guilty unless guilt is established beyond a reasonable doubt. For this reason α, the probability of rejecting a true null hypothesis, is always assigned a small value—typically, 5 percent. The α value is also called the level of significance.
Note that in a confidence interval, α is the percentage of all possible intervals built around sample means that do not capture the population mean. That is because α% of sample means fall outside the margin of error 𝜖 = 𝑧α⁄2 se(𝑥̅). In a test of hypothesis, α plays a similar role. If the randomly selected 𝑥̅ falls outside the prescribed margin of error, we would wrongly reject the null hypothesis. And there is always an α% chance of doing that.
Since committing a Type I Error is the more serious of the two errors, the threshold probability (the level of
significance α) is set in advance. The probability of Type II Error (β), however, varies based on several
factors, one of them being α. The method to determine β will be explained later in this chapter.
Suppose that, to test the null hypothesis that the population mean is 100, a random sample of size 16 is selected, with the following results:
108   109   104   95    105   93    97    100
96    95    100   109   108   106   102   108
The mean of the sample is 𝑥̅ = 102.2. The question then is: is 102.2 significantly different from 100? How do we decide if the difference is significant? If we want to limit our probability of Type I error to 5 percent, we select α = 0.05. Given this probability, we can determine the 95% margin of error as follows:
𝑒 = ±𝑡α⁄2,(𝑛−1) se(𝑥̅ )
First we must compute the sample standard deviation (𝑠 = 5.671) to determine the standard error of 𝑥̅.
se(𝑥̅ ) = 𝑠⁄√𝑛 = 1.418
𝑡0.025,15 = 2.131
Thus,
𝑒 = 2.131 × 1.418 = 3.02.
This tells us 95% of all means of samples of size 𝑛 = 16 fall within ±3.02 units of the population mean. Since 𝑥̅ = 102.2 differs from the hypothesized mean µ = 100 by 2.2, this difference falls within the acceptable margin of error of 3.02. Alternatively stated, 𝑥̅ = 102.2 falls within the non-rejection interval of 𝑥̅𝐿 = µ₀ − 𝑒 = 100 − 3.02 = 96.98 and 𝑥̅𝑈 = µ₀ + 𝑒 = 100 + 3.02 = 103.02. Therefore, if the population mean is 100, then 102.2 is one of the likely sample means.
The decision rule for rejecting the null hypothesis, in short, can be written as:

Reject H₀ if |𝑥̅ − µ₀| > 𝑒
This decision rule can also be written in a more frequently applied way, derived as follows:
Start with the decision rule above and substitute 𝑒 = 𝑡α⁄2,(𝑛−1) se(𝑥̅) on the right-hand side of the inequality:

|𝑥̅ − µ₀| > 𝑡α⁄2,(𝑛−1) se(𝑥̅)
Divide both sides by se(𝑥̅):

|𝑥̅ − µ₀| ⁄ se(𝑥̅) > 𝑡α⁄2,(𝑛−1)
The left-hand side is the test statistic |𝑡| and the right-hand side is the critical value. Thus, the decision rule becomes:

Reject H₀ if the test statistic exceeds the critical value: |𝑡| > 𝑡α⁄2,(𝑛−1)
In the example,

|𝑡| = (102.2 − 100) ⁄ 1.418 = 1.552

which is less than 𝑡0.025,15 = 2.131. Therefore, do not reject the null hypothesis.
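The entire test can be run directly from the raw data. A Python sketch using scipy.stats (an assumed toolkit; ttest_1samp returns the test statistic together with the two-tail probability value discussed in the next section):

from scipy.stats import t, ttest_1samp

data = [108, 109, 104, 95, 105, 93, 97, 100,
        96, 95, 100, 109, 108, 106, 102, 108]
mu0, alpha = 100, 0.05

stat, pvalue = ttest_1samp(data, popmean=mu0)
t_crit = t.ppf(1 - alpha / 2, df=len(data) - 1)

print(stat)                # about 1.54 (the text's 1.552 uses the rounded mean 102.2)
print(t_crit)              # 2.131
print(abs(stat) > t_crit)  # False -> do not reject H0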
6.1. The Probability Value
The probability value approach to the test of hypothesis is based on the following question: if the population mean is in fact 100, what is the probability that a randomly selected sample from this population would yield a sample mean that deviates from µ = 100 by 2.2 units or more?
This probability is the area under the curve to the right of 102.2. To find it, we must transform the sample mean into a 𝑡 score, which was already done above: |𝑡| = 1.552. Now find P(𝑡 > |𝑡|). Using Excel, this probability can be computed with the following function:
=T.DIST(x, deg_freedom, cumulative)

Since we want the tail area associated with the 𝑡 score, enter the negative value of 𝑡 and 1 for “cumulative”:

=T.DIST(-1.552,15,1) = 0.0708
P(𝑡 > 1.552) = 0.0708
[Figure: 𝑡 distribution (df = 15); the test statistic 1.552 falls to the left of the critical value 2.131, with the tail area 0.0708 to its right.]
For two-tail tests this probability value must be doubled, 0.0708 × 2 = 0.1416. This means that the
probability that a sample mean would deviate (in either direction) from the population mean of 100 by 2.2 or
more is 0.1416. Compared to the level of significance of α = 0.05, 0.1416 is a very high probability. This
implies that if we reject the null hypothesis that the population mean is 100, the probability of committing a
Type I error, rejecting a true null hypothesis, is over 14%, which far exceeds the self-imposed limit of 5%.
Therefore, we do not reject the null hypothesis. In Excel you can obtain the p-value for a two-tail test by

=T.DIST.2T(x, deg_freedom)
=T.DIST.2T(1.552,15) = 0.1416
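The same two-tail p-value in Python (a sketch, assuming scipy.stats; t.sf gives the upper-tail area):

from scipy.stats import t

t_stat, df = 1.552, 15
p_one_tail = t.sf(t_stat, df)   # about 0.0708
p_two_tail = 2 * p_one_tail     # about 0.1416
print(p_one_tail, p_two_tail)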