Inference on a Single Mean
G. Baker, Department of Statistics, University of South Carolina

Use Calculation from Sample to Estimate Population Parameter
[Diagram: a sample is selected from the population; the statistic calculated from the sample describes the sample and estimates the population parameter.]
Parameter: $p = ?$    Statistic: $\hat{p} = 63\%$

Use Calculation from Sample to Estimate Population Parameter
[Same diagram.]
Parameter: $\mu = ?$    Statistic: $\bar{y} = 2{,}200$ hrs

Statistic
• Describes a sample.
• Always known.
• Changes upon repeated sampling.
• Examples: $\bar{y}$, $s^2$, $s$, $\hat{p}$

Parameter
• Describes a population.
• Usually unknown.
• Is fixed.
• Examples: $\mu$, $\sigma^2$, $\sigma$, $p$

A Statistic is a Random Variable
Upon repeated sampling of the same population, the value of a statistic changes (it is a variable). While we don’t know what the next value will be, we do know the overall pattern over many, many samplings (it is random).
The distribution of possible values of a statistic for repeated samples of the same size from a population is called the sampling distribution of the statistic.

Sampling Distribution of $\bar{y}$
• If a random sample of size n is taken from a normal population having mean $\mu_y$ and variance $\sigma_y^2$, then $\bar{y}$ is a random variable which is also normally distributed with mean $\mu_y$ and variance $\sigma_y^2/n$.
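The shrinking spread of $\bar{y}$ can be checked numerically. The following is a minimal sketch, assuming an illustrative normal population with mean 100 and standard deviation 5; the helper name `sampling_dist_of_mean` is hypothetical and not part of the original slides.

```python
from math import sqrt
from statistics import NormalDist


def sampling_dist_of_mean(mu, sigma, n):
    """Sampling distribution of the mean of n draws from a normal(mu, sigma) population."""
    # The mean is unchanged; the standard deviation shrinks by a factor of sqrt(n).
    return NormalDist(mu, sigma / sqrt(n))


# Assumed illustrative population: mean 100, standard deviation 5.
for n in (1, 2, 10, 25):
    dist = sampling_dist_of_mean(100, 5, n)
    print(f"n = {n:2d}: mean = {dist.mean:.0f}, std dev = {dist.stdev:.2f}")
```

Because the variance is $\sigma_y^2/n$, the standard deviation falls as $1/\sqrt{n}$: quadrupling the sample size only halves the spread.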
Sampling Distribution of $\bar{y}$
[Figure: four normal curves on a common axis from 80 to 120: the original population, n(100, 5); averages of samples of size 2, n(100, 3.54); averages of samples of size 10, n(100, 1.58); averages of samples of size 25, n(100, 1).]

Light Bulbs
The life of a light bulb is normally distributed with a mean of 2000 hours and a standard deviation of 300 hours.
What is the probability that a randomly chosen light bulb will have a life of less than 1700 hours?
What is the probability that the mean life of three randomly chosen light bulbs will be less than 1700 hours?

Why Averages Instead of Single Readings?
Suppose we are manufacturing light bulbs. The life of these bulbs has historically followed a normal distribution with a mean of 2000 hours and a standard deviation of 300 hours.
We change the filament material and, unknown to us, the average life of the bulbs decreases to 1500 hours. (We will assume that the distribution remains normal with a standard deviation of 300 hours.)
If we randomly sample 1 bulb, will we realize that the average life has decreased? What if we sample 3 bulbs? 9 bulbs?

Why Averages Instead of Single Readings?
[Figure: two normal curves, centered at μ = 1500 and μ = 2000, each with σ = 300, on an axis from 800 to 2800.]
Single readings: Y < 1400 would signal a shift.

Why Averages Instead of Single Readings?
[Figure: two normal curves, centered at μ = 1500 and μ = 2000, each with σ = 173, on an axis from 800 to 2800.]
Averages of n = 3: $\bar{Y}$ < 1654 would signal a shift.

Why Averages Instead of Single Readings?
[Figure: two normal curves, centered at μ = 1500 and μ = 2000, each with σ = 100, on an axis from 800 to 2800.]
Averages of n = 9: $\bar{Y}$ < 1800 would signal a shift.

What if the original distribution is not normal?
Consider the roll of a fair die.
[Figure: bar chart "Rolling a Fair Die", probability (0.00 to 0.20) versus number of dots (1 to 6).]

Suppose the single measurements are not normally distributed.
Let Y = time to failure of a light bulb in constant failure rate mode.
Y is exponentially distributed with λ = 0.0005 = 1/2000.
[Figure: exponential density starting at 0.0005, on an axis from 0 to 8000.]

[Figure: four panels showing the distribution of single measurements, averages of 2 measurements, averages of 4 measurements, and averages of 25 measurements.]
Source: Lawrence L. Lapin, Statistics in Modern Business Decisions, 6th ed., 1993, Dryden Press, Ft. Worth, Texas.

As n increases, what happens to the variance?
[Figure: sampling distributions for n = 1, 2, 4, and 25.]
A. Variance increases.
B. Variance decreases.
C. Variance remains the same.

[Figure: sampling distributions for n = 1, 2, 4, and 25.]

Central Limit Theorem
If n is sufficiently large, the sample means of random samples from a population with mean μ and standard deviation σ are approximately normally distributed with mean μ and standard deviation $\sigma/\sqrt{n}$.

Random Behavior of Means Summary
If Y is distributed n(μ, σ), then $\bar{y}_n$ is distributed n(μ, $\sigma/\sqrt{n}$).
If Y is distributed non-n(μ, σ), then $\bar{y}_{n \ge 30}$ is distributed approximately n(μ, $\sigma/\sqrt{n}$).
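The Central Limit Theorem can be explored by simulation. The sketch below assumes the exponential light bulb model with mean life 2000 hours (for which the population standard deviation is also 2000) and draws repeated samples of size 1, 2, 4, and 25. The function name `simulate_means`, the seed, and the number of repetitions are illustrative choices, not part of the original slides.

```python
import random
from math import sqrt
from statistics import fmean, pstdev


def simulate_means(n, reps, mean_life=2000.0, seed=0):
    """Return `reps` sample means, each from n exponential draws with the given mean."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    rate = 1.0 / mean_life             # lambda = 1/2000 = 0.0005
    return [fmean([rng.expovariate(rate) for _ in range(n)]) for _ in range(reps)]


for n in (1, 2, 4, 25):
    means = simulate_means(n, reps=5000)
    # Theory: mean stays at 2000, standard deviation is 2000 / sqrt(n).
    print(f"n = {n:2d}: simulated mean = {fmean(means):7.1f}, "
          f"simulated sd = {pstdev(means):7.1f}, theory sd = {2000 / sqrt(n):7.1f}")
```

The simulated standard deviations should track $\sigma/\sqrt{n}$ closely, while a histogram of the means (not drawn here) would look increasingly bell-shaped as n grows.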
If We Can Consider $\bar{y}$ to be Normal …
Recall: If Y is distributed normally with mean μ and standard deviation σ, then
$Z = \dfrac{Y - \mu}{\sigma}$
So if $\bar{y}$ is distributed normally with mean μ and standard deviation $\sigma/\sqrt{n}$, then
$Z = \dfrac{\bar{y} - \mu}{\sigma/\sqrt{n}}$

If the time between industrial accidents follows an exponential distribution with an average of 700 days, what is the probability that the average time between 49 pairs of accidents will be greater than 900 days?

XYZ Bottling Company claims that the distribution of fill on its 16 oz bottles averages 16.2 ounces with a standard deviation of 0.1 oz. We randomly sample 36 bottles and get $\bar{y}$ = 16.15. If we assume a standard deviation of 0.1 oz, do we believe XYZ’s claim of averaging 16.2 ounces?

Up Until Now We Have Been Assuming that We Knew the True Standard Deviation (σ), But Let’s Face Facts …
When we use s to estimate σ, then the calculated value
$\dfrac{\bar{y} - \mu}{s/\sqrt{n}}$
follows a t-distribution with n − 1 degrees of freedom.
Note: we must be able to assume that we are sampling from a normal population.

Let’s take another look at XYZ Bottling Company. If we assume that fill on the individual bottles follows a normal distribution, does the following data support the claim of an average fill of 16.2 oz?
16.1  16.0  16.3  16.2  16.1

In Summary
When we know σ: $Z = \dfrac{\bar{y} - \mu}{\sigma/\sqrt{n}}$
When we estimate σ with s: $t_{df = n-1} = \dfrac{\bar{y} - \mu}{s/\sqrt{n}}$
We assume we are sampling from a normal population.

Relationship Between Z and t Distributions
[Figure: density curves for Z, t with df = 3, and t with df = 1, on an axis from −4 to 4.]
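The Z and t statistics in the summary can be computed directly for the accident and bottling examples. This is a sketch under stated assumptions: the accident example uses the fact that an exponential distribution's standard deviation equals its mean (700 days) and relies on the Central Limit Theorem with n = 49; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_statistic(ybar, mu0, sigma, n):
    """Z = (ybar - mu0) / (sigma / sqrt(n)), for a known sigma."""
    return (ybar - mu0) / (sigma / sqrt(n))


def t_statistic(data, mu0):
    """t = (ybar - mu0) / (s / sqrt(n)) and its degrees of freedom n - 1."""
    n = len(data)
    return (mean(data) - mu0) / (stdev(data) / sqrt(n)), n - 1


# Industrial accidents: exponential, so sigma = mean = 700 (assumption noted above).
z_acc = z_statistic(900, 700, 700, 49)
p_acc = 1 - NormalDist().cdf(z_acc)          # P(mean time > 900 days)

# XYZ Bottling with sigma assumed known (0.1 oz).
z_fill = z_statistic(16.15, 16.2, 0.1, 36)
p_fill = NormalDist().cdf(z_fill)            # P(ybar <= 16.15) if the claim is true

# XYZ Bottling with sigma estimated from the five listed fills.
t_fill, df = t_statistic([16.1, 16.0, 16.3, 16.2, 16.1], 16.2)

print(f"accidents: z = {z_acc:.2f}, P(mean > 900) = {p_acc:.4f}")
print(f"bottling (sigma known): z = {z_fill:.2f}, lower-tail p = {p_fill:.5f}")
print(f"bottling (sigma estimated): t = {t_fill:.4f} with df = {df}")
```

The t statistic would then be compared against a t-distribution with 4 degrees of freedom rather than the standard normal.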
Internal Combustion Engine
The nominal power produced by a student-designed internal combustion engine should be 100 hp. The student team that designed the engine conducted 10 tests to determine the actual power. The data follow:
98, 101, 102, 97, 101, 98, 100, 92, 98, 100
Assume data came from a normal distribution.

Internal Combustion Engine
Summary Data:
Column   n    Mean   Std. Dev.
hp       10   98.7   2.9
What is the probability of getting a sample mean of 98.7 hp or less if the true mean is 100 hp?

Internal Combustion Engine
$P(\bar{y} \le 98.7 \mid \mu = 100) = P\left(t_{df=9} \le \dfrac{98.7 - 100}{2.9/\sqrt{10}}\right) = P(t_{df=9} \le -1.418) = 0.0949$
[Figure: t distribution with df = 9 on an axis from −4 to 4, with lower-tail area 0.0949 shaded.]
What did we assume when doing this analysis? Are you comfortable with the assumption?

Can We Assume Sampling from a Normal Population?
If data are from a normal population, there is a linear relationship between the data and their corresponding Z values.
$Z = \dfrac{Y - \mu}{\sigma} \quad\Rightarrow\quad Y = \mu + \sigma Z$
If we plot y on the vertical axis and z on the horizontal axis, the y intercept estimates μ and the slope estimates σ.

How to Calculate Corresponding Z-Values
• Order the data.
• Estimate the percent of the population below each data point:
  $P_i = \dfrac{i - 0.5}{n}$
  where i is a data point’s position in the ordered set and n is the number of data points in the set.
• Look up the Z-value that has $P_i$ proportion of the distribution below it.

Normal Probability (QQ) Plot
Data set: 2, 4, 7, 10
i    y_i   P_i     Z
1    2     0.125   -1.15
2    4     0.375   -0.32
3    7     0.625   +0.32
4    10    0.875   +1.15
[Figure: "Normal QQ Plot", data (0 to 12) on the vertical axis versus Z values (−1.5 to 1.5) on the horizontal axis.]
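The three steps for finding the Z-values of a QQ plot translate directly into code. The following is a minimal sketch using the four-point data set from the table; the function name `normal_scores` is an illustrative choice.

```python
from statistics import NormalDist


def normal_scores(data):
    """Return (y_i, P_i, z_i) for each ordered data point, with P_i = (i - 0.5) / n."""
    ordered = sorted(data)                      # step 1: order the data
    n = len(ordered)
    std_normal = NormalDist()                   # standard normal, mean 0 and sd 1
    rows = []
    for i, y in enumerate(ordered, start=1):
        p = (i - 0.5) / n                       # step 2: estimated proportion below y
        rows.append((y, p, std_normal.inv_cdf(p)))  # step 3: Z with area p below it
    return rows


for y, p, z in normal_scores([2, 4, 7, 10]):
    print(f"y = {y:2d}  P = {p:.3f}  Z = {z:+.2f}")
```

Plotting the y values against the z values gives the QQ plot; points falling near a straight line are consistent with sampling from a normal population.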
Normal Probability (QQ) Plot
[Figure: QQ plot with data (0 to 16) on the vertical axis versus Z values (−3 to 3) on the horizontal axis.]
This data is a random sample from a n(10, 2) population.

Normal Probability (QQ) Plot
[Figure: a second QQ plot with data (0 to 16) on the vertical axis versus Z values (−3 to 3) on the horizontal axis.]

Estimation of the Mean

Point Estimators
A point estimator is a single number calculated from sample data that is used to estimate the value of a parameter.
Recall that statistics change value upon repeated sampling of the same population while parameters are fixed, but unknown.
Examples:
• $\hat{p}$ estimates $p$
• $\hat{\mu} = \bar{y}$ estimates $\mu$
• $\hat{\sigma} = s$ estimates $\sigma$
• $\hat{\sigma}^2 = s^2$ estimates $\sigma^2$

In General: $\hat{\theta}$ is an estimator of the arbitrary parameter $\theta$.
What makes a “Good” estimator?
(1) Accuracy: An unbiased estimator of a parameter is one whose expected value is equal to the parameter of interest.
(2) Precision: An estimator is more precise if its sampling distribution has a smaller standard error*.
*Standard error is the standard deviation for the sampling distribution.

Unbiased Estimators
For normal populations, both the sample mean and sample median are unbiased estimators of μ.
[Figure: "Sampling Distributions for Mean and Median", both curves centered at μ on an axis from −8 to 8.]

Most Efficient Estimators
If you have multiple unbiased estimators, then you choose the estimator whose sampling distribution has the least variation. This is called the most efficient estimator.
[Figure: "Sampling Distributions for Mean and Median", both curves on an axis from −8 to 8.]
For normal populations, the sample mean is the most efficient estimator of μ.

Interval Estimate of the Mean
$Z = \dfrac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$ follows a standard normal distribution.
$P\left(-1.96 < \dfrac{\bar{Y} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$
(with a little algebra)
$P\left(\bar{Y} - 1.96\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{Y} + 1.96\dfrac{\sigma}{\sqrt{n}}\right) = 0.95$
So we say that we are 95% confident that μ is in the interval
$\bar{Y} \pm 1.96\dfrac{\sigma}{\sqrt{n}}$
What assumptions have we made?

Interval Estimate of the Mean
[Figure: standard normal curve on an axis from −4 to 4, with central area 0.95 between −1.96 and 1.96 and area 0.025 in each tail.]

Interval Estimate of the Mean
Let’s go from 95% confidence to the general case.
The symbol $z_\alpha$ is the z-value that has an area of α to the right of it.
$P\left(-z_{\alpha/2} < \dfrac{\bar{Y} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$
$P\left(\bar{Y} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{Y} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$

Interval Estimate of the Mean
[Figure: standard normal curve on an axis from −4 to 4, with central area 1 − α between $-z_{\alpha/2}$ and $+z_{\alpha/2}$ and area α/2 in each tail; the central region is labeled "(1 − α)100% Confidence Interval".]

What Does (1 − α)100% Confidence Mean?
[Figure: the sampling distribution of $\bar{y}$, n(μ, $\sigma/\sqrt{n}$), drawn above (1 − α)100% confidence intervals computed from eight repeated samples, plotted against the location of μ.]

If $Z_{0.05}$ = 1.645, we are _____% confident that the mean is between $\bar{y} \pm 1.645\dfrac{\sigma}{\sqrt{n}}$.
A. 99%
B. 95%
C. 90%
D. 85%

Which z-value would you use to calculate a 99% confidence interval on a mean?
A. $Z_{0.10}$ = 1.282
B. $Z_{0.01}$ = 2.326
C. $Z_{0.005}$ = 2.576
D. $Z_{0.0005}$ = 3.291
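The link between a confidence level and its z-value can be made concrete in a few lines. This is a minimal sketch; the function names are illustrative, and the sample values passed to `z_interval` ($\bar{y}$ = 50, σ = 10, n = 25) are hypothetical and not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_upper(alpha):
    """z-value with area alpha to its right under the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha)


def z_interval(ybar, sigma, n, confidence):
    """(1 - alpha)100% interval ybar +/- z_{alpha/2} * sigma / sqrt(n), sigma known."""
    alpha = 1 - confidence
    half_width = z_upper(alpha / 2) * sigma / sqrt(n)
    return ybar - half_width, ybar + half_width


for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence
    print(f"{confidence:.0%} confidence uses z_(alpha/2) = {z_upper(alpha / 2):.3f}")

# Hypothetical sample: ybar = 50, sigma = 10, n = 25.
low, high = z_interval(50, 10, 25, 0.95)
print(f"95% interval: ({low:.2f}, {high:.2f})")
```

Note that the tail area used is α/2, not α, because the interval leaves half of the excluded probability on each side.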
Plastic Injection Molding Process
A plastic injection molding process for a part that has a critical width dimension historically follows a normal distribution with a standard deviation of 8.
Periodically, clogs from one of the feeder lines cause the mean width to change. As a result, the operator periodically takes random samples of size 4.

Plastic Injection Molding
A recent sample of four yielded a sample mean of 101.4.
Construct a 95% confidence interval for the true mean width.
Construct a 99% confidence interval for the true mean width.

When going from a 95% confidence interval to a 99% confidence interval, the width of the interval will
A. Increase.
B. Decrease.
C. Remain the same.

Interval Width, Level of Confidence and Sample Size
At a given sample size, as level of confidence increases, interval width __________.
At a given level of confidence, as sample size increases, interval width __________.

Calculate Sample Size Before Sampling!
The width of the interval is determined by:
$z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
Suppose we wish to estimate the mean to a maximum error of e:
$e = z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \quad\Rightarrow\quad n = \left(\dfrac{z_{\alpha/2}\,\sigma}{e}\right)^2$

Plastic Injection Molding
A plastic injection molding process for a part that has a critical width dimension historically follows a normal distribution with a standard deviation of 8.
What sample size is required to estimate the true mean width to within ± 2 units at 95% confidence?
What sample size is required to estimate the true mean width to within ± 2 units at 99% confidence?
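The plastic injection molding questions can be worked with the interval and sample-size formulas above. The sketch below assumes σ = 8 is known and rounds the required sample size up to the next whole part; the function names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_half_alpha(confidence):
    """z_{alpha/2} for a two-sided interval at the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def z_interval(ybar, sigma, n, confidence):
    """ybar +/- z_{alpha/2} * sigma / sqrt(n), with sigma known."""
    half_width = z_half_alpha(confidence) * sigma / sqrt(n)
    return ybar - half_width, ybar + half_width


def sample_size(sigma, max_error, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= max_error."""
    return ceil((z_half_alpha(confidence) * sigma / max_error) ** 2)


for confidence in (0.95, 0.99):
    low, high = z_interval(101.4, 8, 4, confidence)
    n_needed = sample_size(8, 2, confidence)
    print(f"{confidence:.0%}: interval = ({low:.2f}, {high:.2f}), "
          f"n for +/- 2 units = {n_needed}")
```

Rounding up rather than to the nearest integer keeps the maximum error at or below the target e.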
If we don’t have prior knowledge of the standard deviation, but can assume we are sampling from a normal population…
Instead of using a z-value to calculate the confidence interval, we use a t-value:
$P\left(-t_{\alpha/2} < \dfrac{\bar{Y} - \mu}{s/\sqrt{n}} < t_{\alpha/2}\right) = 1 - \alpha$
$P\left(\bar{Y} - t_{\alpha/2}\dfrac{s}{\sqrt{n}} < \mu < \bar{Y} + t_{\alpha/2}\dfrac{s}{\sqrt{n}}\right) = 1 - \alpha$

Interval Estimate of the Mean
[Figure: t distribution with df = n − 1 on an axis from −4 to 4, with central area 1 − α between $-t_{\alpha/2}$ and $+t_{\alpha/2}$ and area α/2 in each tail; the central region is labeled "(1 − α)100% Confidence Interval".]

Plastic Injection Molding – Reworded
A plastic injection molding process for a part that has a critical width dimension historically follows a normal distribution.
A recent sample of four yielded a sample mean of 101.4 and sample standard deviation of 8.
Estimate the true mean width with a 95% confidence interval.

Hypothesis Testing

Combustion Engine
The nominal power produced by a student-designed combustion engine is assumed to be at least 100 hp. We wish to test the alternative that the power is less than 100 hp.
Let μ = nominal power of engine.
A QQ plot shows it is reasonable to assume the data came from a normal distribution.
Sample Data: $n = 10$, $\bar{y} = 98.7$, $s = 2.8694$

Combustion Engine
(1) State hypotheses, set alpha.
(2) Choose test statistic.
(3,4) Designate critical value for test and draw conclusion, or calculate p-value and draw conclusion.

(3) Designate Critical Region
Assumes H0: μ = 100 is true.
[Figure: t distribution with df = 9 drawn against both the $\bar{Y}$ = avg hp scale (centered at 100) and the t scale (−4 to +4); the lower-tail critical region of area 0.05 lies below t = −1.833.]

Draw conclusion:
$t_{df=9} = \dfrac{\bar{y} - \mu_0}{s/\sqrt{n}} = \dfrac{98.7 - 100}{2.8694/\sqrt{10}} = -1.4327$
[Figure: t distribution with df = 9 on an axis from −4 to 4, with the critical value −1.833 and the test statistic −1.4327 marked.]

p-value
The p-value is the probability of getting the sample result we got or something more extreme.
[Figure: t distribution with df = 9 on an axis from −4 to 4, with area 0.0928 shaded below −1.4327.]

p-value
$P(t_{df=9} < -1.4327) = 0.0928$
Note: If p-value < α, reject H0. If p-value > α, fail to reject H0.
[Figure: t distribution with df = 9, showing the p-value area 0.0928 below −1.4327 and the α area 0.05 below −1.833.]

Average Life of a Light Bulb
Historically, a particular light bulb has had a mean life of no more than 2000 hours. We have changed the production process and believe that the life of the bulb has increased.
Let μ = mean life.
(1) Set Up Hypotheses
α = 0.05
H0:
Ha:

Average Life of a Light Bulb
(2) Collect data and calculate test statistic:
$\bar{y} = 2141$, $n = 15$, $s = 216$
$t_{df=14} = \dfrac{\bar{y} - \mu_0}{s/\sqrt{n}} = \dfrac{2141 - 2000}{216/\sqrt{15}} = 2.5282$
[Figure: t distribution with df = 14 on an axis from −4 to 4, showing the α area 0.05 above 1.761 and the p-value area 0.0121 above 2.5282.]
p-value = $P(t_{df=14} > 2.5282) = 0.0121$

Average Life of a Light Bulb
State Conclusion:
A. At 0.05 level of significance there is insufficient evidence to conclude that μ > 2000 hours.
B. At 0.05 level of significance there is sufficient evidence to conclude that μ > 2000 hours.

Mean Width of a Manufactured Part
Test the theory that the mean width of a manufactured part differs from 100 cm.
Let μ = mean width.
(1) Set up Hypotheses
α = 0.05

Mean Width of a Manufactured Part
(2,3) Collect data and calculate test statistic.
$\bar{y} = 105$, $s = 6$, $n = 20$
$t_{df=19}$ =
p-value = $2 \cdot P(t_{df=19}$ ....
(4) State conclusion.
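The test statistics and p-values in these three examples can be reproduced numerically. Python's standard library has no t-distribution, so the sketch below integrates the t density with Simpson's rule; that numerical approach, the function names, and the "less"/"greater"/"two-sided" labels are illustrative assumptions rather than anything prescribed by the slides.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t): 0.5 plus the signed Simpson's-rule integral of the density on [0, t]."""
    if t == 0:
        return 0.5
    h = t / steps                      # negative h for negative t gives a negative integral
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + total * h / 3


def t_test(ybar, s, n, mu0, alternative):
    """One-sample t test from summary data; returns (t, df, p-value)."""
    t = (ybar - mu0) / (s / math.sqrt(n))
    df = n - 1
    lower = t_cdf(t, df)
    if alternative == "less":
        p = lower
    elif alternative == "greater":
        p = 1 - lower
    else:                              # two-sided
        p = 2 * min(lower, 1 - lower)
    return t, df, p


examples = [
    ("combustion engine", 98.7, 2.8694, 10, 100, "less"),
    ("light bulb", 2141, 216, 15, 2000, "greater"),
    ("part width", 105, 6, 20, 100, "two-sided"),
]
for name, ybar, s, n, mu0, alt in examples:
    t, df, p = t_test(ybar, s, n, mu0, alt)
    print(f"{name}: t(df={df}) = {t:.4f}, p-value ({alt}) = {p:.4f}")
```

Each p-value is then compared with α = 0.05 using the rule stated above: reject H0 when the p-value is below α.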
Given population parameter μ and value μ0:
For H0: μ = μ0
• Ha: μ ≠ μ0: rejection regions in both tails, each with area α/2.
• Ha: μ > μ0: rejection region in the upper tail, with area α.
• Ha: μ < μ0: rejection region in the lower tail, with area α.
[Figure: three sampling distributions with the H0 and Ha regions marked for each alternative.]

There Are Two Errors We Can Make in a Hypothesis Test
1) Reject H0 when H0 is true. This is called a type I error.
P(Rej H0 | H0 is true) = α
2) Fail to reject H0 when Ha is true at some value. This is called a type II error.
P(Fail to Rej H0 | Ha is true at some value) = β

Avg Life of Light Bulb - Type I Error
H0: μ ≤ 2000
Ha: μ > 2000
[Figure: sampling distribution on the Z scale, assuming H0 is true, split into a "Fail to reject H0" region and an upper-tail rejection region of area α.]
α = Probability that we will reject H0 when H0 is true.

Type I and Type II Errors
[Figure: two sampling distributions, one centered at H0: μ = 2000 and one at "What if μ = 2200", with the α and β areas marked.]
β = Probability we will fail to reject H0 when Ha is true at μ = 2200.
α = Probability that we will reject H0 when H0 is true.

How can we control the size of β?
• The value of α.
• Location of our point of interest.
• Sample size.

Calculating β
If μ = 2200, what is the probability of a type II error?
Given: α = 0.05 and we are assuming μ = 2000.
We will also assume we know σ = 216.
$P(Z > 1.645) = 0.05$
$1.645 = \dfrac{\bar{y} - 2000}{216/\sqrt{15}} \quad\Rightarrow\quad \bar{y} \approx 2091$

Calculating β
[Figure: sampling distributions centered at H0: μ = 2000 and at "What if μ = 2200", with the cutoff 2091 separating "Fail to Reject H0" (below) from "Reject H0" (above).]
$\beta = P(\bar{y} < 2091 \mid \mu = 2200)$

Calculating β
$\beta = P(\bar{y} < 2091 \mid \mu = 2200) = P\left(z < \dfrac{2091 - 2200}{216/\sqrt{15}}\right) = P(z < -1.9544) = 0.0254$
$P(\text{Fail to Reject } H_0 \mid \mu = 2200) = 0.0254$
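The β calculation can be sketched as a small function for an upper-tailed Z test with known σ. The function name is hypothetical. Note that the code keeps the unrounded cutoff (about 2091.7), so its β comes out near 0.026 rather than the 0.0254 obtained above from the cutoff 2091.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu0, mu_alt, sigma, n, alpha):
    """Cutoff and beta for the upper-tailed test H0: mu <= mu0 vs Ha: mu > mu0, sigma known."""
    se = sigma / sqrt(n)                                  # standard error of the mean
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se   # reject H0 when ybar exceeds this
    beta = NormalDist(mu_alt, se).cdf(cutoff)             # P(ybar < cutoff | mu = mu_alt)
    return cutoff, beta


cutoff, beta = type_ii_error(mu0=2000, mu_alt=2200, sigma=216, n=15, alpha=0.05)
print(f"cutoff = {cutoff:.1f} hours")
print(f"beta   = {beta:.4f}")
print(f"power  = {1 - beta:.4f}")
```

Re-running the function with a larger n, a larger α, or an alternative mean farther from 2000 shows each of the three levers for reducing β.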
α, β and Power
α = P(Reject H0 | μ = 2000) = 0.05
β = P(Fail to Rej H0 | μ = 2200) = 0.0254
We say that the power of this test at μ = 2200 is 1 − 0.0254 = 0.9746.
Power = 1 − β
Power = P(Rej H0 | μ is at some Ha level)

Plastic Injection Molding
A plastic injection molding process for a part that has a critical width dimension historically follows a normal distribution.
A recent sample of n = 4 yielded a sample mean of 101.4 and sample standard deviation of 8.
Does this data support the statement: “The true average width is greater than 95”?

Plastic Injection Molding
Confidence Interval Approach
95% confidence interval on μ:
$\bar{y} \pm t_{df=3,\,0.025}\dfrac{s}{\sqrt{n}} = 101.4 \pm 3.182 \cdot \dfrac{8}{\sqrt{4}} = 101.4 \pm 12.728 = (88.67,\ 114.13)$

Plastic Injection Molding
Hypothesis Test Approach
H0:
Ha:
α = 0.05
Test statistic is
p-value =
Conclusion:
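Both approaches to the plastic injection molding question can be sketched numerically. The code below assumes one reading of the exercise, H0: μ ≤ 95 versus Ha: μ > 95, and obtains t probabilities by Simpson's-rule integration with a bisection search for the critical value. These numerical methods and all function names are illustrative assumptions, not the method of the slides.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t) via Simpson's rule on [0, t], using symmetry about 0."""
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Value t with P(T <= t) = p, found by bisection on the CDF."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


ybar, s, n, mu0 = 101.4, 8, 4, 95
df = n - 1
se = s / math.sqrt(n)

# Confidence interval approach: ybar +/- t_{df, 0.025} * s / sqrt(n).
t_crit = t_quantile(0.975, df)
ci = (ybar - t_crit * se, ybar + t_crit * se)

# Hypothesis test approach (assumed upper-tailed: Ha is mu > 95).
t_stat = (ybar - mu0) / se
p_value = 1 - t_cdf(t_stat, df)

print(f"t critical (df={df}) = {t_crit:.3f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"t statistic = {t_stat:.2f}, p-value = {p_value:.3f}")
```

With only four observations the interval is wide, which is why a sample mean of 101.4 by itself says little about whether the true mean exceeds 95.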