Review #2: Chapters 9, 10, 11 and 12

Chapter 9: Sampling Distributions
• A statistic is a random variable describing a characteristic of a random sample, for example:
– the sample mean
– the sample variance
• We use statistics in inferential statistics, that is, to make inferences about population characteristics from sample characteristics.
• Statistics have distributions of their own.

Chapter 9: The Central Limit Theorem
• The distribution of the sample mean is normal if the parent distribution is normal.
• The distribution of the sample mean approaches the normal distribution for sufficiently large samples (\( n \ge 30 \)), even if the parent distribution is not normal.
• The parameters of the sampling distribution of the mean are:
– Mean: \( \mu_{\bar{x}} = \mu \)
– Standard deviation: \( \sigma_{\bar{x}} = \sigma/\sqrt{n} \)
(Assumption: the population is sufficiently large, so no correction is needed in the calculation of the variance.)
• Problem 1 (using Excel): Given a normal population whose mean is 50 and whose standard deviation is 5:
– Question 1: Find the probability that a random sample of 4 has a mean between 49 and 52.
– Answer: \( P(49 < \bar{x} < 52) = P\left(\frac{49-50}{5/\sqrt{4}} < Z < \frac{52-50}{5/\sqrt{4}}\right) = P(-.4 < Z < .8) \)
In an Excel worksheet type =NORMSDIST(.8) - NORMSDIST(-.4). The answer: .443566.
• Problem 1 (using the normal table):
– Question 1: \( P(-.4 < Z < .8) = .7881 - .3446 = .4435 \)
– Question 2: Find the probability that a random sample of 16 has a mean between 49 and 52.
– Answer: \( P(49 < \bar{x} < 52) = P\left(\frac{49-50}{5/\sqrt{16}} < Z < \frac{52-50}{5/\sqrt{16}}\right) = P(-.8 < Z < 1.6) = .9452 - .2119 = .7333 \)
• Problem 2: The amount of time per day adults spend watching TV is normally distributed with \( \mu = 6 \) and \( \sigma = 1.5 \) hours.
– Question 1: What is the probability that a randomly selected adult watches TV for more than 7 hours a day?
– Answer: \( P(X > 7) \). In Excel type =1 - NORMDIST(7,6,1.5,TRUE). The answer: .252492.
– Question 2: What is the probability that 5 adults watch TV 7 or more hours a day on average?
– Answer: \( P(\bar{X} \ge 7) = P\left(Z \ge \frac{7-6}{1.5/\sqrt{5}}\right) = P(Z \ge 1.49) = 1 - .9319 = .0681 \)
– Question 3: What is the probability that the total TV watching time of the five adults will not exceed 28 hours?
– Answer: \( P(\text{Total} \le 28) = P(\bar{X} \le 28/5) = P\left(Z \le \frac{5.6-6}{1.5/\sqrt{5}}\right) = P(Z \le -.60) = 1 - .7257 = .2743 \)
– Question 4: What total TV watching time is exceeded by only 3% of samples of 5 adults?
– Answer: \( P(\text{Total} > x_0) = P(\bar{X} > x_0/5) = .03 \). In Excel type =NORMINV(.97,6,.670822). The answer: \( x_0/5 = 7.2617 \), thus \( x_0 = 5(7.2617) = 36.31 \) hours.
Comments: (1) Excel's NORMINV returns the value of X for a given left-hand-tail probability. (2) \( .670822 = 1.5/\sqrt{5} \).
• Problem 3: Assume that the mean monthly rent paid by students in a particular town is $350, with a standard deviation of $40. A random sample of 100 students who rented apartments was taken.
– Question 1: What is the probability that the sample mean of the monthly rent exceeds $355?
– Answer: \( P(\bar{X} > 355) = P\left(Z > \frac{355-350}{40/\sqrt{100}}\right) = P(Z > 1.25) = 1 - .8944 = .1056 \)
– Question 2: What is the probability that the total revenue from renting 10 randomly selected apartments falls between $3,300 and $3,700?
– Answer: \( P(3300 < \text{Total} < 3700) = P(330 < \bar{X} < 370) \), where \( \sigma_{\bar{x}} = 40/\sqrt{10} = 12.64911 \). In Excel type =NORMDIST(370,350,12.64911,TRUE) - NORMDIST(330,350,12.64911,TRUE). The answer: 0.886154.
– Question 3: Assume the population mean is unknown, but the standard deviation is known to be $40.
A sample of 100 rentals was selected in order to estimate the mean monthly rent paid by the whole student population. What is the probability that the sample mean differs from the actual mean by more than $5? By more than $10?
– Answer (i): \( P(|\bar{X}-\mu| > 5) = P(\bar{X}-\mu > 5) + P(\bar{X}-\mu < -5) = P\left(Z > \frac{5}{40/\sqrt{100}}\right) + P\left(Z < \frac{-5}{40/\sqrt{100}}\right) = P(Z > 1.25) + P(Z < -1.25) = .1056 + .1056 = .2112 \)
– Answer (ii): \( P(|\bar{X}-\mu| > 10) = P\left(Z > \frac{10}{40/\sqrt{100}}\right) + P\left(Z < \frac{-10}{40/\sqrt{100}}\right) = P(Z > 2.5) + P(Z < -2.5) = 2(1 - .9938) = .0124 \)

Chapter 9: Sampling Distribution of the Sample Proportion
• In a sample of size n, if \( np > 5 \) and \( n(1-p) > 5 \), then the sample proportion \( \hat{p} = x/n \) is approximately normally distributed with parameters \( \mu_{\hat{p}} = p \) and \( \sigma_{\hat{p}} = \sqrt{p(1-p)/n} \); therefore \( Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \).
(Assumption: the population is sufficiently large, so no correction is needed in the calculation of the variance.)
• Problem 4:
– A commercial by a household appliance manufacturer claims that less than 5% of all its products require a service call in the first year.
– A survey of 400 households that recently purchased the manufacturer's products was conducted to check the claim.
– Assuming the manufacturer is right, what is the probability that more than 10% of the surveyed households require a service call within the first year?
– Answer: \( P(\hat{p} > .10) = P\left(Z > \frac{.10 - .05}{\sqrt{.05(1-.05)/400}}\right) = P(Z > 4.59) \approx 0 \)
– If 10% of the sampled households indeed reported a service call within the first year, what does that tell you about the manufacturer's claim?
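The normal-probability calculations in Problems 1 to 4 can also be checked in code. The sketch below is a minimal illustration, assuming Python 3.8+ so that the standard-library class statistics.NormalDist can stand in for Excel's NORMDIST/NORMINV and for the normal table; the helper name mean_between is hypothetical. Because it uses unrounded z values, its results can differ from the table answers in the fourth decimal.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # Z ~ N(0, 1)


def mean_between(mu, sigma, n, lo, hi):
    """P(lo < sample mean < hi), using the sampling distribution N(mu, sigma / sqrt(n))."""
    xbar = NormalDist(mu, sigma / sqrt(n))
    return xbar.cdf(hi) - xbar.cdf(lo)


# Problem 1: normal population with mu = 50, sigma = 5
p1_n4 = mean_between(50, 5, 4, 49, 52)
p1_n16 = mean_between(50, 5, 16, 49, 52)

# Problem 2: mu = 6, sigma = 1.5 hours, samples of 5 adults
xbar5 = NormalDist(6, 1.5 / sqrt(5))
p2_q2 = 1 - xbar5.cdf(7)               # P(sample mean >= 7)
p2_q3 = xbar5.cdf(28 / 5)              # P(total <= 28) = P(sample mean <= 5.6)
p2_q4_total = 5 * xbar5.inv_cdf(0.97)  # total exceeded by only 3% of samples

# Problem 3, Question 2: total rent of 10 apartments between 3300 and 3700
p3_q2 = mean_between(350, 40, 10, 330, 370)

# Problem 4: P(sample proportion > .10) if p = .05 and n = 400
p, n = 0.05, 400
z4 = (0.10 - p) / sqrt(p * (1 - p) / n)
p4 = 1 - STANDARD_NORMAL.cdf(z4)

for label, value in [
    ("Problem 1, n = 4", p1_n4),
    ("Problem 1, n = 16", p1_n16),
    ("Problem 2, Q2", p2_q2),
    ("Problem 2, Q3", p2_q3),
    ("Problem 2, Q4 total (hours)", p2_q4_total),
    ("Problem 3, Q2", p3_q2),
    ("Problem 4", p4),
]:
    print(f"{label}: {value:.4f}")
```

Working with the sampling distribution \( N(\mu, \sigma/\sqrt{n}) \) directly removes the manual standardization step; the z values are the same ones computed by hand above.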
Sampling Distribution of the Difference Between Two Means
• If two independent random variables are normally distributed with means and variances \( \mu_1, \sigma_1^2 \) and \( \mu_2, \sigma_2^2 \) respectively, then \( \bar{x}_1 - \bar{x}_2 \) is also normally distributed with \( \mu_{\bar{x}_1-\bar{x}_2} = \mu_1 - \mu_2 \) and \( \sigma^2_{\bar{x}_1-\bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \).
• When at least one of the populations is not normally distributed but the sample sizes are both at least 30, \( \bar{x}_1 - \bar{x}_2 \) is approximately normally distributed, with the mean and variance given above.
• Example: A national TV telethon committee wants to determine whether donations made by males are, on average, larger than those made by females by $4. Samples of 25 males and 25 females were selected and their donations recorded. If the standard deviations of the male and female populations are $2.4 and $1.8 respectively, what is the probability that the sample mean of the male donations exceeds the sample mean of the female donations by at least $5? Assume donations in both populations are normally distributed.
– Solution (population 1 = males, population 2 = females, \( \mu_1 - \mu_2 = 4 \)):
\( P(\bar{x}_1 - \bar{x}_2 \ge 5) = P\left(Z \ge \frac{5 - 4}{\sqrt{2.4^2/25 + 1.8^2/25}}\right) = P\left(Z \ge \frac{1}{.6}\right) = P(Z \ge 1.67) = 1 - .9525 = .0475 \)

Chapter 10: Introduction to Estimation
• A population parameter can be estimated by a point estimator or by an interval estimator.
• A confidence interval with confidence level \( 1-\alpha \) is an interval estimator that covers the estimated parameter \( 100(1-\alpha)\% \) of the time.
• Confidence intervals are constructed using sampling distributions.

Confidence Interval of the Mean – Known Variance
• We use the central limit theorem to build the confidence interval \( \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \), where \( -z_{\alpha/2} \) and \( z_{\alpha/2} \) cut off an area of \( \alpha/2 \) in each tail of the standard normal curve, leaving \( 1-\alpha \) between them.
• Problem 5: How many classes do university students miss each semester? A survey of 100 students was conducted.
(See the data.)
• Assuming the standard deviation of the number of classes missed is 2.2, estimate the mean number of classes missed per student. Use a 99% confidence level.
– Solution: \( 1-\alpha = .99 \), \( \alpha = .01 \), \( \alpha/2 = .005 \), \( z_{\alpha/2} = z_{.005} = 2.575 \).
\( \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 10.21 \pm 2.575\frac{2.2}{\sqrt{100}} = 10.21 \pm .57 \), so LCL = 9.64 and UCL = 10.78.
– Solution using Data Analysis Plus > Z-Estimate: Mean:
• Shade the data set (you may include the title label).
• Select Data Analysis Plus, then "Z-Estimate: Mean".
• Type in sigma (2.2), check Labels (if appropriate), type in alpha (.01), and click OK.
z-Estimate: Mean (Classes): Mean 10.21; Standard Deviation 2.1756; Observations 100; SIGMA 2.2; LCL 9.643316; UCL 10.77668

Selecting the Sample Size
• The shorter the confidence interval, the more accurate the estimate.
• We can therefore limit the width of the interval to 2W, where \( \bar{x} \pm W = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \), that is, \( W = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \).
• Solving for n gives \( n = \left(\frac{z_{\alpha/2}\,\sigma}{W}\right)^2 \).
W is called the "margin of error", or the "bound on the error of estimation".
• Problem 6: An operations manager wants to estimate the average amount of time a worker needs to assemble a new electronic component.
• Sigma is known to be 6 minutes.
• The required accuracy of the estimate is within 20 seconds.
• The confidence level is (a) 90%; (b) 95%.
• Find the sample size.
– Solution: \( \sigma = 6 \) min; W = 20 sec = 1/3 min.
• \( 1-\alpha = .90 \), \( z_{\alpha/2} = z_{.05} = 1.645 \): \( n = \left(\frac{1.645(6)}{1/3}\right)^2 = 876.75 \). Take n = 877.
• \( 1-\alpha = .95 \), \( z_{\alpha/2} = z_{.025} = 1.96 \): \( n = \left(\frac{1.96(6)}{1/3}\right)^2 = 1244.68 \). Take n = 1245.

Chapter 11: Hypothesis Tests
– In hypothesis tests we hypothesize a value of a population parameter and test whether there is sufficient evidence to support our belief.
– The structure of a hypothesis test:
• Formulate two hypotheses.
– H0: the null hypothesis, the one we try to reject in favor of H1.
– H1: the alternative hypothesis, the one we try to prove.
• Define a significance level \( \alpha \).
– The significance level is the probability of erroneously rejecting the null hypothesis: \( \alpha = P(\text{reject } H_0 \mid H_0 \text{ is true}) \).
– Sample from the population and calculate a statistic that indicates whether or not the parameter value under H1 is more likely to be true.
– We shall test the population mean assuming the standard deviation is known.

Hypothesis Tests of the Mean – Known Variance
• Problem 7: A machine is set so that the average diameter of the ball bearings it produces is .50 inch. In a sample of 100 ball bearings the mean diameter was .51 inch. Assuming the standard deviation is .05 inch, can we conclude at the 5% significance level that the mean diameter is not .50 inch?
• Solution: The population studied is the ball-bearing diameters.
– We hypothesize about the population mean.
– A good point estimator of the population mean is the sample mean.
– We use the distribution of the sample mean to build a sample statistic to test whether \( \mu = .50 \) inch.
Solution – a two-tail rejection region
– Define the hypotheses: H0: \( \mu = .50 \); H1: \( \mu \ne .50 \)
– The probability of committing a Type I error is \( P(\bar{X} < \bar{X}_{L1} \text{ or } \bar{X} > \bar{X}_{L2} \mid \mu = .50) = .05 \), or equivalently \( P(Z < Z_{L1} \text{ or } Z > Z_{L2} \mid \mu = .50) = .05 \).
– If \( \bar{X}_{L1} \) and \( \bar{X}_{L2} \) are symmetric around \( \mu \), then \( Z_{L1} \) and \( Z_{L2} \) are symmetric around zero; therefore \( Z_{L1} = -Z_{\alpha/2} \) and \( Z_{L2} = Z_{\alpha/2} \).
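The symmetric critical values can be obtained without the Z table. The sketch below assumes Python 3.8+ and uses statistics.NormalDist().inv_cdf(1 - alpha/2) for \( Z_{\alpha/2} \); the function name two_tail_z_test is hypothetical. Applied to the Problem 7 data, it returns the critical Z, the critical sample means \( \bar{X}_{L1} \) and \( \bar{X}_{L2} \), the sample Z statistic and the two-tail p-value.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def two_tail_z_test(xbar, mu0, sigma, n, alpha):
    """Two-tail z-test of H0: mu = mu0 against H1: mu != mu0, with sigma known."""
    se = sigma / sqrt(n)                             # standard deviation of the sample mean
    z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)  # Z_{alpha/2}; the lower limit is -Z_{alpha/2}
    xbar_l1 = mu0 - z_crit * se                      # lower critical sample mean
    xbar_l2 = mu0 + z_crit * se                      # upper critical sample mean
    z_sample = (xbar - mu0) / se
    p_value = 2 * (1 - STANDARD_NORMAL.cdf(abs(z_sample)))
    reject = abs(z_sample) > z_crit
    return z_crit, xbar_l1, xbar_l2, z_sample, p_value, reject


# Problem 7: sample mean .51, mu0 = .50, sigma = .05, n = 100, alpha = .05
z_crit, xbar_l1, xbar_l2, z_sample, p_value, reject = two_tail_z_test(0.51, 0.50, 0.05, 100, 0.05)
print(f"Z_crit = {z_crit:.2f}, critical means = ({xbar_l1:.4f}, {xbar_l2:.4f})")
print(f"Z_sample = {z_sample:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

The rejection decision and the p-value decision always agree here, which is the equivalence between the rejection-region method and the p-value method.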
– Critical Z: \( P(Z < -Z_{.025} \text{ or } Z > Z_{.025} \mid \mu = .50) = .05 \), with \( Z_{.025} = 1.96 \) (obtained from the Z table).
– Build the rejection region: \( Z_{sample} > Z_{\alpha/2} \) or \( Z_{sample} < -Z_{\alpha/2} \), that is, \( Z_{sample} > 1.96 \) or \( Z_{sample} < -1.96 \).
– Calculate the value of the sample Z statistic and compare it to the critical value: \( Z_{sample} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{.51 - .50}{.05/\sqrt{100}} = 2 \)
– Since 2 > 1.96, there is sufficient evidence to reject H0 in favor of H1 at the 5% significance level.
• We can also perform the test in terms of the mean value. The critical mean values for rejection are:
\( \bar{X}_{L2} = \mu_0 + Z_{.025}\frac{\sigma}{\sqrt{n}} = .50 + 1.96\frac{.05}{\sqrt{100}} = .5098 \)
\( \bar{X}_{L1} = \mu_0 - Z_{.025}\frac{\sigma}{\sqrt{n}} = .50 - 1.96\frac{.05}{\sqrt{100}} = .4902 \)
Since .51 > .5098, there is sufficient evidence to reject the null hypothesis at the 5% significance level.
• Calculate the p-value of this test.
– Solution: p-value \( = P(Z > Z_{sample}) + P(Z < -Z_{sample}) = P(Z > 2) + P(Z < -2) = 2P(Z > 2) = 2[1 - .9772] = .0456 \)
– Since .0456 < .05, H0 is rejected.
• Problem 8:
– The average annual return on investment for American banks was found to be 10.2%, with a standard deviation of 0.8%.
– It is believed that banks that exercise comprehensive planning do better.
– A sample of 26 banks that exercise comprehensive planning provided the following result: mean return = 10.5%.
– Can we infer at the 10% significance level that the belief about bank performance is supported by this sample result?
• Solution (a right-hand-tail rejection region): The population tested is the annual rate of return.
– H0: \( \mu = 10.2 \); H1: \( \mu > 10.2 \)
• Let us perform the test with the standardized rejection region approach: \( Z_{sample} > Z_{.10} \) (right-hand-tail rejection region), where \( Z_{.10} = 1.28 \).
Reject H0 if \( Z_{sample} > 1.28 \).
\( Z_{sample} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{10.5 - 10.2}{.8/\sqrt{26}} = 1.91 \)
• Conclusion: At the 10% significance level there is sufficient evidence in the data to reject H0 in favor of H1, since the sample statistic falls inside the rejection region.
• Interpretation: If we are willing to accept a 10% chance of making the wrong conclusion, we can conclude that banks conducting comprehensive planning perform better than banks that do not.
• Let us perform the test with the p-value method:
\( P(\bar{X} > 10.5 \mid \mu = 10.2) = P\left(Z > \frac{10.5 - 10.2}{.8/\sqrt{26}}\right) = P(Z > 1.91) = .5 - .4719 = .0281 \)
Since .0281 < .10, we reject the null hypothesis at the 10% significance level.
• Note the equivalence between the standardized (rejection region) method and the p-value method: \( P(Z > Z_{.10}) = .10 \) with \( Z_{.10} = 1.28 \), and the sample statistic 1.91 lies to the right of 1.28, where the tail area is only .0281. The statement "the p-value is smaller than alpha" is equivalent to the statement "the test statistic falls in the rejection region".
• Problem 9:
– In the midst of labor-management negotiations, the president of a company argues that the company's blue-collar workers, who are paid an average of $30K a year, are well paid because the mean annual pay for blue-collar workers in the country is less than $30K.
– This figure is disputed by the union. To test the president's belief, an arbitrator draws a random sample of 350 blue-collar workers from across the country and records their incomes (see file Salaries).
– If the arbitrator assumes that income is normally distributed with a standard deviation of $8,000, can it be inferred at the 5% significance level that the company's president is correct?
• Solution (a left-hand-tail rejection region): The population tested is the annual salary.
– H0: \( \mu = 30{,}000 \); H1: \( \mu < 30{,}000 \)
– Left-hand-tail rejection region: \( Z < -Z_{.05} \), or \( Z < -1.645 \)
– \( Z_{sample} = \frac{29{,}119.5 - 30{,}000}{8{,}000/\sqrt{350}} = -2.059 \)
– Since -2.059 < -1.645, there is sufficient evidence at the 5% significance level to infer that blue-collar workers' income is lower than $30K on average.
• Calculate the p-value of this test:
– Solution: p-value \( = P(Z < Z_{sample}) = P(Z < -2.059) = .0197 \)
Z-Test: Mean (Incomes): Mean 29119.52; Standard Deviation 8460.491; Observations 350; Hypothesized Mean 30000; SIGMA 8000; z Stat -2.059; P(Z<=z) one-tail 0.0197; z Critical one-tail 1.6449; P(Z<=z) two-tail 0.0394; z Critical two-tail 1.96

Type II Error
• Problem 7a: Calculate \( \beta \) for the two-tail hypothesis test performed in Problem 7 when the actual mean diameter is .515 inch.
• Solution: The rejection region in terms of the critical values of the sample mean was found before: \( \bar{X}_{L1} = .4902 \); \( \bar{X}_{L2} = .5098 \).
H0: \( \mu = .500 \); H1 (actual value): \( \mu = .515 \)
\( \beta = P(\text{do not reject } H_0 \mid H_1 \text{ is true}) = P(.4902 < \bar{x} < .5098 \mid \mu = .515) \)
\( = P\left(\frac{.4902 - .515}{.05/\sqrt{100}} < Z < \frac{.5098 - .515}{.05/\sqrt{100}}\right) = P(-4.96 < Z < -1.04) = P(1.04 < Z < 4.96) = P(Z < 4.96) - P(Z < 1.04) \approx 1 - .8508 = .1492 \)
– This large probability may be reduced by taking larger samples.

Chapter 12: Inference When the Variance Is Unknown
• Generally, the variance is unknown.
• In this case we change the test statistic from Z to t when testing the population mean.
• To test the population proportion we use the normal distribution (under certain conditions).

Testing the Mean – Unknown Variance
• Replace the Z statistic with \( t = \frac{\bar{X} - \mu}{s/\sqrt{n}} \), which has n - 1 degrees of freedom.
The original distribution must be normal (or at least mound-shaped).
• Problem 10:
– A federal agency inspects packages to determine whether the contents are at least as large as advertised.
– A random sample of (i) 5, (ii) 50 containers whose packaging states that the weight is 8.04 ounces was drawn.
(The data are provided later.)
– From the sample results:
• Can we conclude that the average weight does not meet the stated weight? (Use \( \alpha = .05 \).)
• Estimate the mean weight of all containers with 99% confidence.
• What assumption must be met?
• Solution: We hypothesize about the mean weight.
• H0: \( \mu = 8.04 \); H1: \( \mu < 8.04 \)
• (i) n = 5. For small samples let us solve manually. Assume the sample was 8.07, 8.03, 7.99, 7.95, 7.94.
– The rejection region: \( t < -t_{\alpha, n-1} = -t_{.05, 4} = -2.132 \)
– Mean = (8.07 + ... + 7.94)/5 = 7.996
– Std. dev. \( = \sqrt{[(8.07 - 7.996)^2 + \dots + (7.94 - 7.996)^2]/4} = 0.0546 \)
– The sample t statistic: \( t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{7.996 - 8.04}{0.0546/\sqrt{5}} = -1.80 \)
– Since -1.80 > -2.132, the sample statistic does not fall in the rejection region. There is insufficient evidence to conclude that the mean weight is smaller than 8.04 ounces at the 5% significance level.
• (ii) n = 50. To calculate the sample statistics we use Excel's "Descriptive Statistics" from the Tools > Data Analysis menu. From the sample we obtain: mean = 8.02; std. dev. = .04.
– The confidence interval is \( \bar{x} \pm t_{\alpha/2, n-1}\frac{s}{\sqrt{n}} \), with \( 1-\alpha = .99 \), \( \alpha = .01 \), \( \alpha/2 = .005 \), and \( t_{.005, 49} \approx 2.678 \) from the t table:
\( 8.02 \pm 2.678\frac{.04}{\sqrt{50}} = 8.02 \pm .015 \), so LCL = 8.005 and UCL = 8.035.
• Comments: Check whether the distribution appears to be normal.
[Figure: frequency histogram of the 50 container weights, bins from 7.93 to 8.09]
• Using Excel: to obtain an exact value of t, use the TINV function: =TINV(0.01,49). The exact value is 2.6799535. Here 49 is the degrees of freedom and .01 is the two-tail probability (.005 × 2).
• Problem 11:
– Engineers in charge of the production of car seats are concerned about the compliance of the springs used with design specifications.
– Springs are designed to be 500 mm long.
• Springs that are too long or too short must be reworked.
• A standard deviation of 2 mm in spring length will result in an acceptable number of reworked springs.
– A sample of 100 springs was taken and measured.
• Problem 11 (continued): Can we infer at the 10% significance level that the mean spring length is not 500 mm?
• Solution: H0: \( \mu = 500 \); H1: \( \mu \ne 500 \)
Since the population standard deviation is unknown, we need to run a t-test, assuming spring length is normally distributed.
Rejection region: \( t < -t_{\alpha/2, n-1} \) or \( t > t_{\alpha/2, n-1} \), that is, t < -1.6604 or t > +1.6604 (d.f. = 99).
t-Test of a Mean: Sample mean 499.9697; Sample standard deviation 2.55247; Sample size 100; Hypothesized mean 500; Alpha 0.1; t Stat -0.12; P(T<=t) one-tail 0.4529; t Critical one-tail 1.2902; P(T<=t) two-tail 0.9057; t Critical two-tail 1.6604
Since -0.12 lies between -1.6604 and 1.6604, there is insufficient evidence at the 10% significance level to infer that the mean spring length is not 500 mm.

Inference about a Population Proportion
• The test and the confidence interval are based on the approximately normal distribution of the sample proportion, which holds if np > 5 and n(1 - p) > 5.
• The confidence interval for p is \( \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \), where \( \hat{p} = x/n \).
• For the hypothesis test, we use a Z test.
• Problem 12 (Problem 11 continued): The engineers were interested in the percentage of springs that are the correct length. They marked each spring in the sample as: correct – 1; too long – 2; too short – 3. Can we infer at the 10% significance level that less than 90% of the springs are the correct length?
• Solution:
– H0: p = .9; H1: p < .9
– Rejection region: \( Z < -Z_{\alpha} \), or Z < -1.28
– \( Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{.86 - .90}{\sqrt{.90(1-.90)/100}} = -1.33 \)
z-Test of a Proportion: Sample proportion 0.86; Sample size 100; Hypothesized proportion 0.9; Alpha 0.1; z Stat -1.33; P(Z<=z) one-tail 0.0912; z Critical one-tail 1.2816; P(Z<=z) two-tail 0.1824; z Critical two-tail 1.6449
– Conclusion: Since -1.33 < -1.28, we can infer that less than 90% of the springs are the correct length (that is, do not need reworking).
• Problem 12 – solution continued: Let us estimate the proportion of good springs at the 99% confidence level.
\( \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .86 \pm 2.575\sqrt{\frac{.86(1-.86)}{100}} = .86 \pm .0894 \)
z-Estimate of a Proportion: Sample proportion 0.86; Sample size 100; Confidence level 0.99; Confidence interval estimate 0.86 ± 0.0894; Lower confidence limit 0.7706; Upper confidence limit 0.9494
• Problem 12 – solution continued: Find the sample size if the proportion of good springs is to be estimated to within .035. Consider the given sample an initial sample.
\( n = \left(\frac{z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})}}{W}\right)^2 = \left(\frac{2.575\sqrt{.86(1-.86)}}{.035}\right)^2 \approx 652 \)
• Problem 13:
– A consumer protection group runs a survey of 400 dentists to check a claim that more than 4 out of 5 dentists recommend ingredients included in a certain toothpaste.
– The survey results are as follows: No – 71; Yes – 329.
– At the 5% significance level, can the consumer group infer that the claim is true?
• Solution:
– The two hypotheses are H0: p = .8; H1: p > .8
– The rejection region: \( Z > Z_{\alpha} \), with \( Z_{.05} = 1.645 \)
– \( \hat{p} = 329/400 = .8225 \), so \( Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{.8225 - .8}{\sqrt{.8(1-.8)/400}} = 1.125 \)
– Conclusion: Since 1.125 < 1.645, the consumer group cannot confirm the claim at the 5% significance level.

Summary Example
• An automotive expert claims that the large number of self-serve gas stations has resulted in poor automobile maintenance, and that the average tire pressure is more than 4.5 psi below its manufacturer's specifications.
• A random sample of 50 tires revealed the results stored in the file TirePressure.
• Assume the tire pressure is normally distributed with \( \sigma = 1.5 \) psi, and answer the following questions:
• At the 10% significance level, can we infer that the expert is correct? What is the p-value?
• Solution:
– The hypotheses: H0: \( \mu = 4.5 \); H1: \( \mu > 4.5 \)
– The rejection region: \( Z > Z_{.10} \), or Z > 1.28.
– From the data we have mean = 5.04, so \( Z = \frac{5.04 - 4.5}{1.5/\sqrt{50}} = 2.545 \).
– Since 2.545 > 1.28, there is sufficient evidence to infer that the expert is correct.
– The p-value \( = P(\bar{X} > 5.04 \mid \mu = 4.5) = P(Z > 2.545) = 1 - .9945 = .0055 \)
• Find the probability of making a Type II error when the actual tire under-inflation is 5 psi on average.
• Solution: The rejection region in terms of the sample mean is found first:
\( Z_L = 1.28 = \frac{\bar{X}_L - 4.5}{1.5/\sqrt{50}} \), so \( \bar{X}_L = 4.5 + 1.28\frac{1.5}{\sqrt{50}} = 4.77 \).
So the rejection region is: sample mean > 4.77.
\( \beta = P(\text{do not reject } H_0 \mid H_1 \text{ is true}) = P(\text{sample mean does not fall in the rejection region} \mid \mu = 5) = P(\bar{x} < 4.77 \mid \mu = 5) = P\left(Z < \frac{4.77 - 5}{1.5/\sqrt{50}}\right) = P(Z < -1.08) \)
From Excel (using the unrounded \( \bar{X}_L = 4.7715 \)): =NORMSDIST(-1.077) gives .1407.

Inference about the Population Variance
• The following statistic is \( \chi^2 \) (chi-squared) distributed with n - 1 degrees of freedom: \( \chi^2 = \frac{(n-1)s^2}{\sigma^2} \)
• We use this relationship to test and estimate the variance.
• The hypotheses tested are: H0: \( \sigma^2 = \sigma_0^2 \); H1: \( \sigma^2 > \sigma_0^2 \), or \( \sigma^2 < \sigma_0^2 \), or \( \sigma^2 \ne \sigma_0^2 \).
• The rejection region is \( \frac{(n-1)s^2}{\sigma_0^2} > \chi^2_{\alpha, n-1} \) (right tail) or \( \frac{(n-1)s^2}{\sigma_0^2} < \chi^2_{1-\alpha, n-1} \) (left tail). For the two-tail test, replace \( \alpha \) with \( \alpha/2 \).

Testing the Variance
• Problem 15:
• Engineers in charge of the production of car seats are concerned about the compliance of the springs used with design specifications.
• Springs are designed to be 500 mm long.
– Springs that are too long or too short must be reworked.
– A standard deviation of 2 mm in spring length will result in an acceptable number of reworked springs.
• A sample of 100 springs was taken and measured.
• Problem 15 (continued): Can we infer at the 10% significance level that the number of springs requiring reworking is unacceptably large?
– The number of springs requiring reworking depends on the standard deviation, or equivalently the variance, so we test H0: \( \sigma^2 = 4 \); H1: \( \sigma^2 > 4 \).
– Rejection region: \( \chi^2_{sample} > \chi^2_{\alpha, n-1} \), that is, \( \chi^2_{sample} > 117.4069 \) (d.f. = 99).
Chi-squared Test of a Variance: Sample variance 6.515104; Sample size 100; Hypothesized variance 4; Alpha 0.1; Chi-squared Stat 161.25; P(CHI<=chi) one-tail 0.0001; chi-squared Critical one-tail 117.4069; P(CHI<=chi) two-tail 0.0002; chi-squared Critical two-tail 77.0463, 123.2252
• Conclusion: Since 161.25 > 117.4069, we can infer at the 10% significance level that the standard deviation is greater than 2 mm; thus the number of springs that require reworking is unacceptably large.
• Problem 16:
• A random sample of 100 observations was taken from a normal population. The sample variance was 29.76.
• Can we infer at the 2.5% significance level that the population variance is less than 30?
• Estimate the population variance with 90% confidence.
• Solution:
• H0: \( \sigma^2 = 30 \); H1: \( \sigma^2 < 30 \)
• \( \chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{(100-1)29.76}{30} = 98.21 \)
• Rejection region: \( \chi^2 < \chi^2_{1-\alpha, n-1} \), that is, \( \chi^2 < 73.36 \)
Chi-squared Test of a Variance (note: Alpha is entered as 1 - .025 = .975 so that the one-tail critical value returned is the left-tail value): Sample variance 29.76; Sample size 100; Hypothesized variance 30; Alpha 0.975; Chi-squared Stat 98.21; P(CHI<=chi) one-tail 0.4964; chi-squared Critical one-tail 73.3611; P(CHI<=chi) two-tail 0.9928; chi-squared Critical two-tail 97.8956, 98.7740
• Conclusion: Since 98.21 > 73.36, there is insufficient evidence at the 2.5% significance level to infer that the variance is smaller than 30.

Using Excel
– We can get an exact value of the probability \( P(\chi^2_{d.f.} > \chi^2) \) for a given \( \chi^2 \) and known d.f., and then determine the p-value.
– Use the CHIDIST function: =CHIDIST(χ², d.f.)
For example: =CHIDIST(98.208,99) = .50359. That is, \( P(\chi^2_{99} > 98.208) = .50359 \).
– In our example we had a left-hand-tail rejection region, and therefore the p-value is \( P(\chi^2_{99} < 98.208) = 1 - .50359 = .49641 > .025 \).
– We can also get the exact \( \chi^2 \) value for which \( P(\chi^2_{d.f.} > \chi^2) = \alpha \), for any given probability \( \alpha \) and known d.f., and then define the rejection region.
– Use the CHIINV function: =CHIINV(α, d.f.)
For example: =CHIINV(.975,99) = 73.36. That is, \( P(\chi^2_{99} > 73.36) = .975 \), so the rejection region is \( \chi^2 < 73.36 \).
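Excel's CHIDIST and CHIINV can be imitated without a statistics package. The sketch below is a minimal illustration, not Excel's own algorithm: it assumes the standard series expansion of the regularized lower incomplete gamma function (the chi-squared CDF with df degrees of freedom equals \( P(df/2, x/2) \)) and a simple bisection for the inverse. The names chi2_cdf, chidist and chiinv are hypothetical, chosen to mirror the Excel functions.

```python
import math


def chi2_cdf(x, df):
    """P(chi-squared with df degrees of freedom <= x).

    Sums the series for the regularized lower incomplete gamma
    function P(a, y) with a = df/2 and y = x/2.
    """
    if x <= 0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    term = 1.0 / a  # k = 0 term of the series
    total = term
    k = 0
    while term > total * 1e-15:
        k += 1
        term *= y / (a + k)
        total += term
    # Multiply by y^a * e^(-y) / Gamma(a), computed on the log scale.
    return total * math.exp(a * math.log(y) - y - math.lgamma(a))


def chidist(x, df):
    """Right-tail probability P(chi2_df > x), mirroring Excel's CHIDIST(x, df)."""
    return 1.0 - chi2_cdf(x, df)


def chiinv(p, df, lo=0.0, hi=1000.0):
    """Value c with P(chi2_df > c) = p, mirroring Excel's CHIINV(p, df).

    Bisection works because the right-tail probability decreases as c grows;
    the bracket [lo, hi] is assumed to contain the answer.
    """
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chidist(mid, df) > p:
            lo = mid  # tail area still too large: the answer lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0


# Problem 15: H1 is sigma^2 > 4, right-tail test with alpha = .10
stat15 = (100 - 1) * 6.515104 / 4
crit15 = chiinv(0.10, 99)
# Problem 16: H1 is sigma^2 < 30, left-tail test with alpha = .025
stat16 = (100 - 1) * 29.76 / 30
crit16 = chiinv(0.975, 99)       # same role as =CHIINV(.975,99)
p16 = 1.0 - chidist(stat16, 99)  # left-tail p-value

print(f"Problem 15: stat = {stat15:.2f}, critical = {crit15:.4f}, p-value = {chidist(stat15, 99):.4f}")
print(f"Problem 16: stat = {stat16:.2f}, critical = {crit16:.2f}, p-value = {p16:.4f}")
```

For a left-hand-tail test the p-value is one minus the CHIDIST value, exactly as in the Problem 16 calculation above; the bisection bracket of 0 to 1000 is an assumption that suits these degrees of freedom.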