8/14/2007
Chapter 4: Sampling and Estimation
© 2007 Pearson Education

Need for Sampling
- Very large populations
- Destructive testing
- Continuous production process
- The objective of sampling is to draw a valid inference about a population.

Sample Design
- Sampling Plan – a description of the approach that will be used to obtain samples from a population
  - Objectives
  - Target population
  - Population frame
  - Method of sampling
  - Operational procedures for data collection
  - Statistical tools for analysis

Sampling Methods
- Subjective
  - Judgment sampling
  - Convenience sampling
- Probabilistic
  - Simple random sampling – every subset of a given size has an equal chance of being selected

PHStat Tool: Random Sample Generator
- PHStat menu > Sampling > Random Sample Generator
- Enter sample size
- Select sampling method

Excel Data Analysis Tool: Sampling
- Excel menu > Tools > Data Analysis > Sampling
- Specify input range of data
- Choose sampling method
- Select output option

Other Sampling Methods
- Systematic sampling
- Stratified sampling
- Cluster sampling
- Sampling from a continuous process

Errors in Sampling
- Nonsampling error
  - Poor sample design
- Sampling (statistical) error
  - Depends on sample size
  - Tradeoff between cost of sampling and accuracy of estimates obtained by sampling

Estimation
- Estimation – assessing the value of a population parameter using sample data
- Point estimate – a single number used to estimate a population parameter
- Confidence interval – a range of values within which a population parameter is believed to lie, along with the probability that the interval correctly estimates the true population parameter

Common Point Estimates

Theoretical Issues
- Unbiased estimator – one for which the expected value equals the population parameter it is intended to estimate
- The sample variance is an unbiased estimator for the population variance

  $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

  $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

Interval Estimates
- Range within which we believe the true population parameter falls
- Example: Gallup poll – percentage of voters favoring a candidate is 56% with a 3% margin of error.
- Interval estimate is [53%, 59%]

Confidence Intervals
- Confidence interval (CI) – an interval estimate that specifies the likelihood that the interval contains the true population parameter
- Level of confidence ($1 - \alpha$) – the probability that the CI contains the true population parameter, usually expressed as a percentage (90%, 95%, 99% are most common).

Sampling Distribution of the Mean

Interval Estimate Containing the True Population Mean

Interval Estimate Not Containing the True Population Mean

Confidence Interval for the Mean – $\sigma$ Known
- A $100(1 - \alpha)\%$ CI is: $\bar{x} \pm z_{\alpha/2}(\sigma/\sqrt{n})$
- $z_{\alpha/2}$ may be found from Table A.1 or using the Excel function NORMSINV($1 - \alpha/2$)

Example
- Compute a 95 percent confidence interval for the mean number of TV hours/week for the 18-24 age group in the file TV Viewing.xls.
- Assume that the population standard deviation is known to be 10.0.
- The sample mean for the n = 45 observations is computed to be 60.16.
- For a 95 percent CI, $z_{\alpha/2}$ = 1.96.
- Therefore, the CI is $60.16 \pm 1.96(10/\sqrt{45}) = 60.16 \pm 2.92$, or [57.24, 63.08]

Confidence Interval for the Mean, $\sigma$ Unknown
- A $100(1 - \alpha)\%$ CI is: $\bar{x} \pm t_{\alpha/2,\,n-1}(s/\sqrt{n})$
- $t_{\alpha/2,\,n-1}$ is the value from a t-distribution with n - 1 degrees of freedom, from Table A.2 or the Excel function TINV($\alpha$, n - 1)

Relationship Between Normal Distribution and t-distribution
- The t-distribution yields wider confidence intervals than the normal distribution, most noticeably for small sample sizes.

Example
- Compute a 95 percent confidence interval for the mean number of TV hours/week for the 18-24 age group in the file TV Viewing.xls.
- Assume that the population standard deviation is not known but is estimated from the sample as 10.095.
- A 95 percent CI corresponds to $\alpha/2$ = 0.025. With 45 observations, the t-distribution has 45 - 1 = 44 df.
- Using Table A.2, we find that $t_{0.025,\,44}$ = 2.0154, yielding a 95 percent CI for the mean of $60.16 \pm 2.0154(10.095/\sqrt{45}) = 60.16 \pm 3.03$, or [57.13, 63.19]

PHStat Tool: Confidence Intervals for the Mean
- PHStat menu > Confidence Intervals > Estimate for the mean, sigma known…, or Estimate for the mean, sigma unknown…

PHStat Tool: Confidence Intervals for the Mean - Dialog
- Enter the confidence level
- Choose specification of sample statistics
- Check Finite Population Correction box if appropriate

Sampling From Finite Populations
- When n > 0.05N, use a correction factor in computing the standard error:

  $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$

PHStat Tool: Confidence Intervals for the Mean - Results

Confidence Intervals for Proportions
- Sample proportion: p = x/n
  - x = number in sample having desired characteristic
  - n = sample size
- The sampling distribution of p has mean $\pi$ and variance $\pi(1 - \pi)/n$
- When $n\pi$ and $n(1 - \pi)$ are at least 5, the sampling distribution of p approaches a normal distribution

Confidence Intervals for Proportions
- A $100(1 - \alpha)\%$ CI is: $p \pm z_{\alpha/2} \sqrt{\frac{p(1 - p)}{n}}$
- PHStat tool is available under Confidence Intervals option

Confidence Intervals and Sample Size
- CI for the mean, $\sigma$ known
  - Sample size needed for half-width of at most E is
    $n \ge (z_{\alpha/2})^2 \sigma^2 / E^2$
- CI for a proportion
  - Sample size needed for half-width of at most E is

    $n \ge \frac{(z_{\alpha/2})^2 \, \pi(1 - \pi)}{E^2}$
  - Use p as an estimate of $\pi$, or 0.5 for the most conservative estimate

PHStat Tool: Sample Size Determination
- PHStat menu > Sample Size > Determination for the Mean or Determination for the Proportion
- Enter s, E, and confidence level
- Check Finite Population Correction box if appropriate

Confidence Intervals for Population Total
- A $100(1 - \alpha)\%$ CI is:

  $N\bar{x} \pm t_{n-1,\,\alpha/2} \, N \frac{s}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$
- PHStat tool is available under Confidence Intervals option

Confidence Intervals for Differences Between Means

                      Population 1    Population 2
  Mean                $\mu_1$         $\mu_2$
  Standard deviation  $\sigma_1$      $\sigma_2$
  Point estimate      $\bar{x}_1$     $\bar{x}_2$
  Sample size         $n_1$           $n_2$

- Point estimate for the difference in means, $\mu_1 - \mu_2$, is given by $\bar{x}_1 - \bar{x}_2$

Independent Samples With Unequal Variances
- A $100(1 - \alpha)\%$ CI is:

  $\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\,df^*} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

  $df^* = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$
- Fractional values rounded down

Example
- In the Accounting Professionals.xls worksheet, find a 95 percent confidence interval for the difference in years of service between males and females.

Calculations
- $s_1$ = 4.39 and $n_1$ = 14 (females), $s_2$ = 8.39 and $n_2$ = 13 (males)
- $df^*$ = 17.81, so use 17 as the degrees of freedom

Independent Samples With Equal Variances
- A $100(1 - \alpha)\%$ CI is:

  $\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\,n_1 + n_2 - 2} \; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

  $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$

  where $s_p$ is a common "pooled" standard deviation.
- Must assume the variances of the two populations are equal.
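
The degrees-of-freedom formula for unequal variances and the pooled standard deviation for equal variances can be checked numerically. The sketch below is only an illustration in Python (the slides themselves use PHStat and Excel). It uses the sample standard deviations and sample sizes from the Calculations slide. The function names are hypothetical, and the critical value 2.1098 for $t_{0.025,\,17}$ is assumed to be read from a t-table rather than computed. The sample means are not given in the slides, so only the half-width of the interval is computed.

```python
import math


def welch_df(s1, n1, s2, n2):
    """Approximate degrees of freedom df* for the unequal-variance CI."""
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


def unequal_var_se(s1, n1, s2, n2):
    """Standard error of x1_bar - x2_bar when variances are not assumed equal."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation s_p used when variances are assumed equal."""
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))


# Values from the Calculations slide (females = group 1, males = group 2).
s1, n1 = 4.39, 14
s2, n2 = 8.39, 13

df_star = welch_df(s1, n1, s2, n2)   # about 17.81, as on the slide
df_used = math.floor(df_star)        # fractional values are rounded down

# Assumption: t_{0.025, 17} taken from a t-table, not computed here.
t_crit = 2.1098
half_width = t_crit * unequal_var_se(s1, n1, s2, n2)

print(f"df* = {df_star:.2f}, use {df_used}")
print(f"half-width (unequal variances) = {half_width:.3f}")
print(f"pooled s_p (equal variances)   = {pooled_sd(s1, n1, s2, n2):.3f}")
```

The interval itself would be the difference in sample means plus or minus the half-width. Under the equal-variance assumption, the same pattern applies with $s_p \sqrt{1/n_1 + 1/n_2}$ as the standard error and $n_1 + n_2 - 2$ degrees of freedom.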
Example: Accounting Professionals

Paired Samples
- A $100(1 - \alpha)\%$ CI is: $\bar{D} \pm t_{n-1,\,\alpha/2} \; s_D/\sqrt{n}$
  - $D_i$ = difference for each pair of observations
  - $\bar{D}$ = average of differences
  - $s_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$
- PHStat tool available in the Confidence Intervals menu

Example
- Pile Foundation.xls
- A 95% CI for the average difference between the actual and estimated pile lengths is [result shown on the original slide]

Differences Between Proportions
- A $100(1 - \alpha)\%$ CI is:

  $p_1 - p_2 \pm z_{\alpha/2} \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$
- Applies when $n_i p_i$ and $n_i(1 - p_i)$ are greater than 5

Example
- In the Accounting Professionals.xls worksheet, the proportion of females having a CPA is 8/14 = 0.57, while the proportion of males having a CPA is 6/13 = 0.46.
- A 95 percent confidence interval for the difference in proportions between females and males is [result shown on the original slide]

Sampling Distribution of s
- The sample standard deviation, s, is a point estimate for the population standard deviation, $\sigma$
- The sampling distribution of s is based on the chi-square ($\chi^2$) distribution with n - 1 df: for a normal population, $(n - 1)s^2/\sigma^2$ is chi-square distributed
- See Table A.3
- CHIDIST(x, deg_freedom) returns the probability to the right of x
- CHIINV(probability, deg_freedom) returns the value of x for a specified right-tail probability

Confidence Intervals for the Variance
- A $100(1 - \alpha)\%$ CI is:

  $\left[ \frac{(n - 1)s^2}{\chi^2_{n-1,\,\alpha/2}}, \; \frac{(n - 1)s^2}{\chi^2_{n-1,\,1-\alpha/2}} \right]$
- Note the difference in the denominators!

PHStat Tool: Confidence Intervals for Variance - Dialog
- PHStat menu > Confidence Intervals > Estimate for the Population Variance
- Enter sample size, standard deviation, and confidence level

PHStat Tool: Confidence Intervals for Variance - Results

Time Series Data
- Confidence intervals only make sense for stationary time series data

Summary and Conclusions
- As the confidence level ($1 - \alpha$) increases, the width of the confidence interval also increases.
- As the sample size increases, the width of the confidence interval decreases.
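
Both conclusions can be seen directly in the half-width $z_{\alpha/2}(\sigma/\sqrt{n})$ of the known-sigma interval. The following is a minimal Python sketch, not part of the original slides. It reuses sigma = 10 and n = 45 from the TV viewing example. The helper name `z_half_width` is hypothetical, and `statistics.NormalDist().inv_cdf` stands in for Table A.1 or NORMSINV.

```python
import math
from statistics import NormalDist


def z_half_width(sigma, n, confidence):
    """Half-width of the CI for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # same role as NORMSINV(1 - alpha/2)
    return z * sigma / math.sqrt(n)


sigma = 10.0  # population standard deviation from the TV viewing example

# Higher confidence level -> wider interval (n fixed at 45).
for conf in (0.90, 0.95, 0.99):
    print(f"confidence {conf:.2f}, n = 45: +/- {z_half_width(sigma, 45, conf):.2f}")

# Larger sample -> narrower interval (confidence fixed at 95%).
for n in (45, 90, 180):
    print(f"confidence 0.95, n = {n}: +/- {z_half_width(sigma, n, 0.95):.2f}")
```

Because the half-width shrinks with $\sqrt{n}$, quadrupling the sample size (45 to 180) halves the width of the interval.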
Probability Intervals
- A $100(1 - \alpha)\%$ probability interval for a random variable X is any interval [a, b] such that $P(a \le X \le b) = 1 - \alpha$
- Do not confuse a confidence interval with a probability interval; confidence intervals are probability intervals for sampling distributions, not for the distribution of the random variable.
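
The distinction can be illustrated with a small simulation. The sketch below is only an illustration in Python and assumes a hypothetical normal population with mean 60 and standard deviation 10 (values chosen for the example, not taken from any data file). It computes a 95% probability interval for X itself, then checks how often a 95% confidence interval built from a sample of n = 45 contains the true mean.

```python
import math
import random
from statistics import NormalDist, mean

# Hypothetical population parameters, chosen only for illustration.
MU, SIGMA, N, CONF = 60.0, 10.0, 45, 0.95
z = NormalDist().inv_cdf(1 - (1 - CONF) / 2)

# Probability interval for the random variable X: P(a <= X <= b) = 0.95.
a, b = MU - z * SIGMA, MU + z * SIGMA
population = NormalDist(MU, SIGMA)
prob_x = population.cdf(b) - population.cdf(a)

# Confidence interval: a probability interval for the sampling distribution
# of the mean, so its half-width uses sigma / sqrt(n) rather than sigma.
half_width = z * SIGMA / math.sqrt(N)

rng = random.Random(2007)  # fixed seed so the run is repeatable
trials = 2000
hits = 0
for _ in range(trials):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = mean(sample)
    if x_bar - half_width <= MU <= x_bar + half_width:
        hits += 1
coverage = hits / trials

print(f"probability interval for X: [{a:.2f}, {b:.2f}], P = {prob_x:.3f}")
print(f"CI half-width for the mean: {half_width:.2f}")
print(f"share of simulated CIs containing the true mean: {coverage:.3f}")
```

The probability interval describes where individual observations fall and is much wider than the confidence interval, which describes where the sample mean falls around the population mean.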