Chapter 8: Sampling Distributions and Estimation

• Sampling Variation
• Estimators and Sampling Distributions
• Sample Mean and the Central Limit Theorem
• Confidence Interval for a Mean (µ) with Known σ
• Confidence Interval for a Mean (µ) with Unknown σ
• Confidence Interval for a Proportion (π)
• Sample Size Determination for a Mean
• Sample Size Determination for a Proportion
• C.I. for the Difference of Two Means, µ1 − µ2 (Optional)
• C.I. for the Difference of Two Proportions, π1 − π2 (Optional)
• Confidence Interval for a Population Variance, σ² (Optional)

McGraw-Hill/Irwin © 2007 The McGraw-Hill Companies, Inc. All rights reserved.

Sampling Variation
• Sample statistic: a random variable whose value depends on which population items happen to be included in the random sample.
• Depending on the sample size, the sample statistic could either represent the population well or differ greatly from the population.
• This sampling variation can easily be illustrated.
• Consider eight random samples of size n = 5 from a large population of GMAT scores for MBA applicants.
• [Figure: dot plot of eight samples of size n = 5]
• [Figure: dot plot of eight sample means]
• The sample means ($\bar{x}_i$) tend to be close to the population mean (µ = 520.78).

Estimators and Sampling Distributions: Some Terminology
• Estimator: a statistic derived from a sample to infer the value of a population parameter.
• Estimate: the value of the estimator in a particular sample.
• Population parameters are represented by Greek letters and the corresponding statistics by Roman letters.

Estimators and Sampling Distributions: Examples of Estimators
• [Table: examples of estimators]
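The difference between an estimator and an estimate, and the sampling variation described above, can be sketched with a small simulation. This is only an illustration, not the slide's data: it assumes a normal population with µ = 520.78 and σ = 86.8 (the σ used for GMAT scores later in this chapter). The function name and seed are hypothetical.

```python
import random
import statistics

MU = 520.78      # population mean quoted on the slide
SIGMA = 86.8     # population standard deviation used later in the chapter
N = 5            # sample size
NUM_SAMPLES = 8  # number of random samples


def draw_sample_means(num_samples, n, mu, sigma, seed=1):
    """Draw `num_samples` random samples of size `n` and return their means.

    Assumption: the population is normal; the real GMAT population need not be.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        # statistics.mean is the estimator; its value on this sample is the estimate.
        means.append(statistics.mean(sample))
    return means


sample_means = draw_sample_means(NUM_SAMPLES, N, MU, SIGMA)
for i, xbar in enumerate(sample_means, start=1):
    print(f"sample {i}: mean = {xbar:.2f}")
```

Each run with a different seed gives eight different estimates from the same estimator, which is the sampling variation the dot plots depict.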
Estimators and Sampling Distributions: Sampling Distributions
• The sampling distribution of an estimator is the probability distribution of all possible values the statistic may assume when a random sample of size n is taken.
• An estimator is a random variable since samples vary.
• Sampling error = $\hat{\theta} - \theta$

Estimators and Sampling Distributions: Bias
• Bias is the difference between the expected value of the estimator and the true parameter.
• Bias = $E(\hat{\theta}) - \theta$
• An estimator is unbiased if $E(\hat{\theta}) = \theta$.
• On average, an unbiased estimator neither overstates nor understates the true parameter.
• Sampling error is random whereas bias is systematic.
• An unbiased estimator avoids systematic error.

Estimators and Sampling Distributions: Efficiency
• Efficiency refers to the variance of the estimator's sampling distribution.
• A more efficient estimator has smaller variance.

Estimators and Sampling Distributions: Consistency
• A consistent estimator converges toward the parameter being estimated as the sample size increases.

Sample Mean and the Central Limit Theorem
• The sample mean is an unbiased estimator of µ; therefore, $E(\bar{X}) = \mu$.
• If the population is exactly normal, then the sample mean follows a normal distribution.
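The three properties above (bias, efficiency, consistency) can be checked by simulation. The sketch below is illustrative only: the population (normal, µ = 100, σ = 15), the function name, and the choice of comparison estimators (the median, and the variance with divisor n) are assumptions made for the example, not part of the slides.

```python
import random
import statistics


def simulate_estimator(estimator, true_value, n, reps=4000,
                       mu=100.0, sigma=15.0, seed=7):
    """Return (estimated bias, variance of the estimator) over `reps` samples.

    Assumption: a normal population with hypothetical mu and sigma.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        estimates.append(estimator(sample))
    bias = statistics.mean(estimates) - true_value
    return bias, statistics.pvariance(estimates)


# Bias: the sample mean is unbiased for mu, so its estimated bias is near 0.
bias_mean_10, var_mean_10 = simulate_estimator(statistics.mean, 100.0, n=10)

# Bias: the variance with divisor n systematically understates sigma^2 = 225.
bias_pvar_10, _ = simulate_estimator(statistics.pvariance, 225.0, n=10)

# Efficiency: for a normal population the mean varies less than the median.
_, var_median_10 = simulate_estimator(statistics.median, 100.0, n=10)

# Consistency: the variance of the sample mean shrinks as n grows.
_, var_mean_40 = simulate_estimator(statistics.mean, 100.0, n=40)

print(f"bias of mean (n=10):            {bias_mean_10:.3f}")
print(f"bias of divisor-n variance:     {bias_pvar_10:.3f}")
print(f"variance of mean vs median:     {var_mean_10:.2f} vs {var_median_10:.2f}")
print(f"variance of mean, n=10 vs n=40: {var_mean_10:.2f} vs {var_mean_40:.2f}")
```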
Sample Mean and the Central Limit Theorem: Standard Error of the Mean
• The standard error of the mean is the standard deviation of the sampling error of $\bar{x}$: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
• For example, the average price, µ, of a 5 GB MP3 player is $80.00 with a standard deviation, σ, equal to $10.00. What will be the mean and standard error from a sample of 20 players?
  $E(\bar{X}) = \mu = \$80.00$
  $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 10/\sqrt{20} = \$2.236$
• If the distribution of prices for these players is a normal distribution, then the sampling distribution of $\bar{x}$ is N(80.00, 2.236).

Sample Mean and the Central Limit Theorem: Central Limit Theorem (CLT) for a Mean
• If a random sample of size n is drawn from a population with mean µ and standard deviation σ, the distribution of the sample mean $\bar{x}$ approaches a normal distribution with mean µ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ as the sample size increases.
• If the population is normal, the distribution of the sample mean is normal regardless of sample size.

Sample Mean and the Central Limit Theorem: Symmetric Population: Uniform Distribution
• Rule of thumb: to obtain an approximately normal distribution for the sample mean, n > 30.
• A much smaller n will suffice if the population is symmetric.
• For example, consider a uniform population U(500, 1500).
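The claim that a small n suffices for a symmetric population can be explored with a simulation sketch. The bounds a = 500 and b = 1500 are an assumption for this example, as are the repetition count, seed, and function name. The code compares the standard deviation of simulated sample means with σ/√n.

```python
import math
import random
import statistics

A, B = 500.0, 1500.0             # assumed bounds of the uniform population
SIGMA = (B - A) / math.sqrt(12)  # standard deviation of U(a, b)


def simulate_sample_means(n, reps=20000, seed=42):
    """Return `reps` sample means, each from a sample of size n drawn from U(A, B)."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.uniform(A, B) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


results = {}
for n in (1, 2, 4, 16):
    means = simulate_sample_means(n)
    predicted = SIGMA / math.sqrt(n)     # CLT prediction for the standard error
    observed = statistics.pstdev(means)  # spread of the simulated sample means
    results[n] = (predicted, observed)
    print(f"n = {n:2d}: predicted S.E. = {predicted:6.1f}, simulated = {observed:6.1f}")
```

A histogram of `means` for each n would show the shape moving from flat (n = 1) toward a bell curve (n = 16).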
Sample Mean and the Central Limit Theorem: Symmetric Population: Uniform Distribution
• The central limit theorem predicts that samples drawn from this population will have a mean of 1000 and a standard error of the mean of $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, with σ = 288.7:
  Predicted S.E. for n = 1:  $288.7/\sqrt{1} = 288.7$
  Predicted S.E. for n = 2:  $288.7/\sqrt{2} = 204.1$
  Predicted S.E. for n = 4:  $288.7/\sqrt{4} = 144.3$
  Predicted S.E. for n = 16: $288.7/\sqrt{16} = 72.2$
• [Figure: histograms of sample means from the uniform population]

Sample Mean and the Central Limit Theorem: Skewed Population: Waiting Time
• Consider a strongly skewed population for waiting times at airport security screening with µ = 2.983 and σ = 2.451.
• The CLT predicts that samples drawn from this population will have a mean of 2.983 minutes and a standard error of the mean of $\sigma_{\bar{x}} = \sigma/\sqrt{n}$:
  Predicted S.E. for n = 1:  $2.451/\sqrt{1} = 2.451$
  Predicted S.E. for n = 2:  $2.451/\sqrt{2} = 1.733$
  Predicted S.E. for n = 4:  $2.451/\sqrt{4} = 1.226$
  Predicted S.E. for n = 16: $2.451/\sqrt{16} = 0.613$
• [Figure: histograms of sample means from the skewed population]

Sample Mean and the Central Limit Theorem: Range of Sample Means
• The CLT permits us to define a range or interval within which the sample means are expected to fall: $\mu \pm z\,\sigma/\sqrt{n}$, where z is from the standard normal table.
• If we know µ and σ, the range of sample means for samples of size n is predicted to be:
  90% interval: $\mu \pm 1.645\,\sigma/\sqrt{n}$
  95% interval: $\mu \pm 1.960\,\sigma/\sqrt{n}$
  99% interval: $\mu \pm 2.576\,\sigma/\sqrt{n}$

Sample Mean and the Central Limit Theorem: Illustration: GMAT Scores
• For samples of size n = 5 applicants, within what range would GMAT means be expected to fall?
• The parameters are µ = 520.78 and σ = 86.8. The predicted range for 95% of the sample means is:
  $\mu \pm 1.960\,\sigma/\sqrt{n} = 520.78 \pm 1.960\,(86.8/\sqrt{5}) = 520.78 \pm 76.08$
  that is, from 444.70 to 596.86.

Sample Mean and the Central Limit Theorem: Sample Size and Standard Error
• The standard error declines as n increases, but at a decreasing rate.
• The interval $\mu \pm z\,\sigma/\sqrt{n}$ can be made small by increasing n.
• The distribution of sample means collapses at the true population mean µ as n increases.

Confidence Interval for a Mean (µ) with Known σ: What Is a Confidence Interval?
• A sample mean $\bar{x}$ is a point estimate of the population mean µ.
• A confidence interval for the mean is a range $\mu_{lower} < \mu < \mu_{upper}$.
• The confidence level is the probability that an interval constructed this way contains the true population mean.
• The confidence level (usually expressed as a %) is the area under the curve of the sampling distribution.
• The confidence interval for µ with known σ is: $\bar{x} - z\,\sigma/\sqrt{n} < \mu < \bar{x} + z\,\sigma/\sqrt{n}$
• A higher confidence level leads to a wider confidence interval.
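The predicted standard errors and ranges above all come from σ/√n and a z-value. The sketch below reproduces them, using Python's `statistics.NormalDist` in place of the standard normal table. The function names are hypothetical, and the numeric inputs are the ones quoted in the slides.

```python
import math
from statistics import NormalDist


def standard_error(sigma, n):
    """Standard error of the mean, sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def sample_mean_range(mu, sigma, n, level):
    """Central range expected to contain a fraction `level` of the sample means."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-tailed z-value
    half_width = z * standard_error(sigma, n)
    return mu - half_width, mu + half_width


# Skewed waiting-time population (sigma = 2.451): S.E. shrinks at a decreasing rate.
for n in (1, 2, 4, 16):
    print(f"n = {n:2d}: S.E. = {standard_error(2.451, n):.4f}")

# GMAT illustration (mu = 520.78, sigma = 86.8, n = 5) at three levels.
widths = []
for level in (0.90, 0.95, 0.99):
    low, high = sample_mean_range(520.78, 86.8, 5, level)
    widths.append(high - low)
    print(f"{level:.0%} range: {low:.2f} to {high:.2f}")
```

The three printed ranges widen as the level rises, which is the trade-off between confidence and precision.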
Confidence Interval for a Mean (µ) with Known σ: Choosing a Confidence Level
• Greater confidence implies loss of precision.
• 95% confidence is most often used.

Confidence Interval for a Mean (µ) with Known σ: Interpretation
• A confidence interval either does or does not contain µ.
• The confidence level quantifies the risk.
• Out of 100 confidence intervals constructed at the 95% level, approximately 95 would contain µ, while approximately 5 would not.

Confidence Interval for a Mean (µ) with Known σ: Is σ Ever Known?
• Yes, but not very often.
• In quality control applications with ongoing manufacturing processes, it is reasonable to assume that σ stays the same over time.
• In this case, confidence intervals are used to construct control charts to track the mean of a process over time.

Confidence Interval for a Mean (µ) with Unknown σ: Student's t Distribution
• Use the Student's t distribution instead of the normal distribution when the population is normal but the standard deviation σ is unknown and the sample size is small.
• The confidence interval for µ (unknown σ) is $\bar{x} \pm t\,s/\sqrt{n}$, that is, $\bar{x} - t\,s/\sqrt{n} < \mu < \bar{x} + t\,s/\sqrt{n}$
• t distributions are symmetric and shaped like the standard normal distribution.
• The t distribution is dependent on the size of the sample.

Confidence Interval for a Mean (µ) with Unknown σ: Degrees of Freedom
• Degrees of freedom (d.f.) is a parameter based on the sample size that is used to determine the value of the t statistic.
• Degrees of freedom tell how many observations are used to estimate σ, less the number of intermediate estimates used in the calculation: ν = n − 1
• As n increases, the t distribution approaches the shape of the normal distribution.
• For a given confidence level, t is always larger than z, so a confidence interval based on t is always wider than if z were used.

Confidence Interval for a Mean (µ) with Unknown σ: Comparison of z and t
• For very small samples, t-values differ substantially from the normal.
• As degrees of freedom increase, the t-values approach the normal z-values.
• For example, for n = 31, the degrees of freedom are ν = 31 − 1 = 30.
• What would the t-value be for a 90% confidence interval? For ν = 30, t = 1.697; the corresponding z-value is 1.645.

Confidence Interval for a Mean (µ) with Unknown σ: Example: GMAT Scores Again
• Here are the GMAT scores from 20 applicants to an MBA program: [Table: GMAT scores of 20 applicants]
• Construct a 90% confidence interval for the mean GMAT score of all MBA applicants.
• $\bar{x} = 510$, $s = 73.77$
• Since σ is unknown, use the Student's t for the confidence interval with ν = 20 − 1 = 19 d.f.
• First find the t-value for 90% confidence from Appendix D.
Confidence Interval for a Mean (µ) with Unknown σ: Example: GMAT Scores Again
• The 90% confidence interval is:
  $\bar{x} - t\,s/\sqrt{n} < \mu < \bar{x} + t\,s/\sqrt{n}$
  $510 - 1.729\,(73.77/\sqrt{20}) < \mu < 510 + 1.729\,(73.77/\sqrt{20})$
  $510 - 28.52 < \mu < 510 + 28.52$
• We are 90% confident that the true mean GMAT score is within the interval 481.48 < µ < 538.52.

Confidence Interval for a Mean (µ) with Unknown σ: Confidence Interval Width
• Confidence interval width reflects
  - the sample size,
  - the confidence level and
  - the standard deviation.
• To obtain a narrower interval and more precision
  - increase the sample size or
  - lower the confidence level (e.g., from 90% to 80% confidence)

Confidence Interval for a Mean (µ) with Unknown σ: A "Good" Sample
• Here are five different samples of 25 births from a population of N = 4,409 births and their 95% CIs. [Figure: five samples with their 95% confidence intervals]
• An examination of the samples shows that sample 5 has an outlier.
• The outlier is a warning that the resulting confidence interval may not be trustworthy.
• In this case, a larger sample size is needed.

Confidence Interval for a Mean (µ) with Unknown σ: Using Appendix D
• Beyond ν = 50, Appendix D shows ν in steps of 5 or 10.
• If the table does not give the exact degrees of freedom, use the t-value for the next lower ν.
• This is a conservative procedure since it causes the interval to be slightly wider.
• For d.f. above 150, use the z-value.
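The GMAT interval can be reproduced in a few lines. This sketch takes the t-value 1.729 (90% confidence, ν = 19) as a table lookup, as the slides do with Appendix D, since the Python standard library has no t quantile function. The function name is hypothetical. The z-based interval is computed only to show that the t-based interval is wider.

```python
import math
from statistics import NormalDist


def mean_interval(xbar, s, n, critical_value):
    """Return (lower, upper) for xbar +/- critical_value * s / sqrt(n)."""
    half_width = critical_value * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


XBAR, S, N = 510, 73.77, 20  # sample statistics from the GMAT example
T_90_DF19 = 1.729            # t-value for 90% confidence, 19 d.f. (table lookup)

t_low, t_high = mean_interval(XBAR, S, N, T_90_DF19)
print(f"t interval: {t_low:.2f} < mu < {t_high:.2f}")

# For comparison only: the interval if z were (incorrectly) used instead of t.
z_90 = NormalDist().inv_cdf(0.95)
z_low, z_high = mean_interval(XBAR, S, N, z_90)
print(f"z interval: {z_low:.2f} < mu < {z_high:.2f}")
```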
Confidence Interval for a Mean (µ) with Unknown σ: Using Excel
• Use Excel's function =TINV(probability, d.f.) to obtain a two-tailed value of t. Here, "probability" is 1 minus the confidence level.

Confidence Interval for a Mean (µ) with Unknown σ: Using MegaStat
• MegaStat gives you a choice of z or t and does all calculations for you.

Confidence Interval for a Proportion (π)
• A proportion is a mean of data whose only values are 0 or 1.
• The Central Limit Theorem (CLT) states that the distribution of a sample proportion p = x/n approaches a normal distribution with mean π and standard deviation $\sigma_p = \sqrt{\pi(1-\pi)/n}$
• p = x/n is a consistent estimator of π.

Confidence Interval for a Proportion (π): Illustration: Internet Hotel Reservations
• Management of the Pan-Asian Hotel System tracks the percent of hotel reservations made over the Internet.
• The binary data are:
  1 = Reservation is made over the Internet
  0 = Reservation is not made over the Internet
• After data were collected, it was determined that the proportion of Internet reservations is π = .20.
• Here are five random samples of n = 20. Each p is a point estimate of π. [Table: five random samples of n = 20]
• Notice the sampling variation in the value of p.

Confidence Interval for a Proportion (π): Applying the CLT
• The distribution of a sample proportion p = x/n is symmetric if π = .50 and, regardless of π, approaches symmetry as n increases.
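The hotel illustration can be imitated with simulated binary data. The sketch below draws five random samples of n = 20 from a population with π = .20 and prints each sample proportion next to the standard deviation the CLT predicts. The seed and function name are arbitrary assumptions, and the simulated samples are not the ones shown on the slide.

```python
import math
import random

PI = 0.20  # population proportion of Internet reservations
N = 20     # sample size


def sample_proportion(n, pi, rng):
    """Simulate n binary observations (1 with probability pi) and return p = x/n."""
    x = sum(1 for _ in range(n) if rng.random() < pi)
    return x / n


rng = random.Random(2007)  # arbitrary seed so the run is repeatable
proportions = [sample_proportion(N, PI, rng) for _ in range(5)]
sigma_p = math.sqrt(PI * (1 - PI) / N)  # standard deviation predicted by the CLT

for i, p in enumerate(proportions, start=1):
    print(f"sample {i}: p = {p:.2f}")
print(f"predicted sigma_p = {sigma_p:.4f}")
```

With n = 20, p can only take values in steps of .05, which is why the normal approximation is rough for small samples.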
Confidence Interval for a Proportion (π): Applying the CLT
• As n increases, the statistic p = x/n more closely resembles a continuous random variable.
• As n increases, the distribution becomes more symmetric and bell shaped.
• As n increases, the range of the sample proportion p = x/n narrows.
• The sampling variation can be reduced by increasing the sample size n.

Confidence Interval for a Proportion (π): When Is It Safe to Assume Normality?
• Rule of thumb: The sample proportion p = x/n may be assumed to be normal if both nπ > 10 and n(1 − π) > 10.
• [Table: sample size needed to assume normality]

Confidence Interval for a Proportion (π): Standard Error of the Proportion
• The standard error of the proportion $\sigma_p$ depends on π, as well as n.
• It is largest when π is near .50 and smaller when π is near 0 or 1.
• The formula for the standard error is symmetric about π = .50.
• Enlarging n reduces the standard error $\sigma_p$ but at a diminishing rate.

Confidence Interval for a Proportion (π): Confidence Interval for π
• The confidence interval for π is $\pi \pm z\sqrt{\pi(1-\pi)/n}$, where z is based on the desired confidence.
• Since π is unknown, the confidence interval uses p = x/n (assuming a large sample): $p \pm z\sqrt{p(1-p)/n}$
• z can be chosen for any confidence level (for example, 1.645 for 90%, 1.960 for 95%, and 2.576 for 99%).
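The behavior of the standard error and the normality rule of thumb above can be tabulated directly. This is a minimal sketch with hypothetical function names. The threshold of 10 follows the rule of thumb as stated, and the values of π and n are chosen only for illustration.

```python
import math


def se_proportion(pi, n):
    """Standard error of the sample proportion, sqrt(pi * (1 - pi) / n)."""
    return math.sqrt(pi * (1 - pi) / n)


def normality_ok(pi, n, threshold=10):
    """Rule of thumb: both n*pi and n*(1 - pi) must exceed the threshold."""
    return n * pi > threshold and n * (1 - pi) > threshold


def min_n_for_normality(pi, threshold=10):
    """Smallest n that satisfies the rule of thumb for a given pi."""
    n = 1
    while not normality_ok(pi, n, threshold):
        n += 1
    return n


# The standard error is largest near pi = .50 and symmetric about it.
for pi in (0.10, 0.30, 0.50, 0.70, 0.90):
    print(f"pi = {pi:.2f}: S.E. at n = 100 is {se_proportion(pi, 100):.4f}, "
          f"normality needs n >= {min_n_for_normality(pi)}")

# Enlarging n reduces the standard error at a diminishing rate.
for n in (25, 100, 400, 1600):
    print(f"n = {n:4d}: S.E. at pi = .50 is {se_proportion(0.5, n):.4f}")
```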
Confidence Interval for a Proportion (π): Example: Auditing
• A sample of 75 retail in-store purchases showed that 24 were paid in cash. What is p?
  p = x/n = 24/75 = .32
• Is p normally distributed?
  np = (75)(.32) = 24
  n(1 − p) = (75)(.68) = 51
  Both are > 10, so we may conclude normality.
• The 95% confidence interval for the proportion of retail in-store purchases that are paid in cash is:
  $p \pm z\sqrt{p(1-p)/n} = .32 \pm 1.96\sqrt{.32(1-.32)/75} = .32 \pm .106$
  .214 < π < .426
• We are 95% confident that this interval contains the true population proportion.

Confidence Interval for a Proportion (π): Narrowing the Interval
• The width of the confidence interval for π depends on
  - the sample size
  - the confidence level
  - the sample proportion p
• To obtain a narrower interval (i.e., more precision) either
  - increase the sample size
  - reduce the confidence level

Confidence Interval for a Proportion (π): Using Excel and MegaStat
• To find a confidence interval for a proportion in Excel, use (for example)
  =0.15-NORMSINV(.95)*SQRT(0.15*(1-0.15)/200)
  =0.15+NORMSINV(.95)*SQRT(0.15*(1-0.15)/200)
  Here NORMSINV(.95) returns z = 1.645, so these are the 90% limits for p = .15 and n = 200.
• In MegaStat, enter p and n to obtain the confidence interval for a proportion.
• MegaStat always assumes normality.
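The auditing example and the Excel formulas above can be mirrored in Python. In this sketch `statistics.NormalDist().inv_cdf` plays the role of NORMSINV, and the function name is hypothetical. For the Excel example, x = 30 is inferred from p = .15 and n = 200.

```python
import math
from statistics import NormalDist


def proportion_ci(x, n, confidence=0.95):
    """Large-sample interval p +/- z * sqrt(p * (1 - p) / n) for a proportion."""
    p = x / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-tailed z-value
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Auditing example: 24 of 75 purchases paid in cash, 95% confidence.
low, high = proportion_ci(24, 75, confidence=0.95)
print(f"{low:.3f} < pi < {high:.3f}")

# Excel example: p = .15, n = 200 (so x = 30), z from NORMSINV(.95) -> 90% limits.
excel_low, excel_high = proportion_ci(30, 200, confidence=0.90)
print(f"{excel_low:.4f} < pi < {excel_high:.4f}")
```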
Confidence Interval for a Proportion (π): Using Excel and MegaStat
• If the sample is small, the distribution of p may not be well approximated by the normal.
• Confidence limits around p can be constructed by using the binomial distribution.

Confidence Interval for a Proportion (π): Polls and Margin of Error
• In polls and surveys, the half-width of the confidence interval when π = .50 is called the margin of error.
• [Table: margins of error for a 95% confidence interval assuming π = .50]
• Each reduction in the margin of error requires a disproportionately larger sample size.

Sample Size Determination for a Mean: Sample Size to Estimate µ
• To estimate a population mean with a precision of ± E (allowable error), you would need a sample of size $n = \left(\frac{z\sigma}{E}\right)^2$

Sample Size Determination for a Mean: How to Estimate σ?
• Method 1: Take a Preliminary Sample. Take a small preliminary sample and use the sample s in place of σ in the sample size formula.
• Method 2: Assume Uniform Population. Estimate rough upper and lower limits a and b and set $\sigma = [(b-a)^2/12]^{1/2}$.
• Method 3: Assume Normal Population. Estimate rough upper and lower limits a and b and set σ = (b − a)/4. This assumes normality with most of the data within µ ± 2σ, so the range is 4σ.
• Method 4: Poisson Arrivals. In the special case when µ is a Poisson arrival rate, then $\sigma = \sqrt{\mu}$.

Sample Size Determination for a Mean: Using MegaStat
• There is a sample size calculator in MegaStat. The Preview button lets you change the setup and see results immediately.
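The sample size formula, the rough methods for estimating σ, and the poll margin of error can be sketched together. All function names are hypothetical, and the limits a = 200 and b = 800 and the allowable error E = 20 are made-up inputs. Rounding n up to the next whole observation is an assumption of this sketch.

```python
import math
from statistics import NormalDist


def z_for(confidence):
    """Two-tailed z-value for a given confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def sample_size_mean(sigma, error, confidence=0.95):
    """n = (z * sigma / E)^2, rounded up to a whole observation."""
    return math.ceil((z_for(confidence) * sigma / error) ** 2)


def sigma_uniform(a, b):
    """Method 2: standard deviation of a uniform population on [a, b]."""
    return (b - a) / math.sqrt(12)


def sigma_normal(a, b):
    """Method 3: treat the range b - a as about 4 sigma."""
    return (b - a) / 4


def sigma_poisson(mu):
    """Method 4: for a Poisson arrival rate mu, sigma = sqrt(mu)."""
    return math.sqrt(mu)


def poll_margin_of_error(n, confidence=0.95):
    """Margin of error z * sqrt(.25 / n), i.e. the half-width when pi = .50."""
    return z_for(confidence) * math.sqrt(0.25 / n)


# Hypothetical planning problem: limits 200 and 800, allowable error E = 20.
n_uniform = sample_size_mean(sigma_uniform(200, 800), error=20)
n_normal = sample_size_mean(sigma_normal(200, 800), error=20)
print(f"n (uniform assumption) = {n_uniform}, n (normal assumption) = {n_normal}")

# Halving the margin of error requires four times the sample size.
for n in (100, 400, 1600):
    print(f"n = {n:4d}: margin of error = {poll_margin_of_error(n):.3f}")
```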
Sample Size Determination for a Mean: Cautions
• Caution 1: Units of Measure. When estimating a mean, the allowable error E is expressed in the same units as X and σ.
• Caution 2: Using z. Using z in the sample size formula for a mean is not conservative.
• Caution 3: Larger n is Better. The sample size formulas for a mean tend to underestimate the required sample size. These formulas are only minimum guidelines.

Sample Size Determination for a Proportion
• To estimate a population proportion with a precision of ± E (allowable error), you would need a sample of size $n = \left(\frac{z}{E}\right)^2 \pi(1-\pi)$
• Since π is a number between 0 and 1, the allowable error E is also between 0 and 1.

Sample Size Determination for a Proportion: How to Estimate π?
• Method 1: Take a Preliminary Sample. Take a small preliminary sample and use the sample p in place of π in the sample size formula.
• Method 2: Use a Prior Sample or Historical Data. How often are such samples available? π might be different enough to make it a questionable assumption.
• Method 3: Assume that π = .50. This conservative method ensures the desired precision. However, the sample may end up being larger than necessary.

Sample Size Determination for a Proportion: Cautions
• Caution 1: Units of Measure. For a proportion, E is always between 0 and 1. For example, a 2% error is E = 0.02.
• Caution 2: Finite Population. For a finite population, to ensure that the sample size never exceeds the population size, use the following adjustment: $n' = \frac{nN}{n + (N-1)}$

Confidence Interval for the Difference of Two Means, µ1 − µ2
• If the confidence interval for the difference of two means includes zero, we could conclude that there is no significant difference in means.
• The procedure for constructing a confidence interval for µ1 − µ2 depends on our assumption about the unknown variances.
• Assuming equal variances:
  $(\bar{x}_1 - \bar{x}_2) \pm t\,\sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}\;\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$
  with $\nu = (n_1 - 1) + (n_2 - 1)$ degrees of freedom.
• Assuming unequal variances:
  $(\bar{x}_1 - \bar{x}_2) \pm t\,\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}$
  with $\nu' = \frac{[s_1^2/n_1 + s_2^2/n_2]^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$ (Welch's formula for degrees of freedom).
• Or you can use a conservative quick rule for the degrees of freedom: $\nu^* = \min(n_1 - 1,\; n_2 - 1)$.

Confidence Interval for the Difference of Two Proportions, π1 − π2
• If both samples are large (i.e., np > 5 and n(1 − p) > 5 for each sample), then a confidence interval for the difference of two proportions is given by
  $(p_1 - p_2) \pm z\,\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
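The two-sample formulas above translate directly into code. This is a sketch with hypothetical function names and made-up sample statistics. The t-value is passed in as a table lookup (2.086 is the two-tailed 95% value for 20 degrees of freedom), because the Python standard library has no t quantile function.

```python
import math
from statistics import NormalDist


def pooled_ci(x1, s1, n1, x2, s2, n2, t_crit):
    """Interval for mu1 - mu2 assuming equal variances (pooled variance)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    half = t_crit * math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    return (x1 - x2) - half, (x1 - x2) + half


def unequal_var_ci(x1, s1, n1, x2, s2, n2, t_crit):
    """Interval for mu1 - mu2 without assuming equal variances."""
    half = t_crit * math.sqrt(s1**2 / n1 + s2**2 / n2)
    return (x1 - x2) - half, (x1 - x2) + half


def welch_df(s1, n1, s2, n2):
    """Welch's degrees of freedom for the unequal-variance case."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


def quick_df(n1, n2):
    """Conservative quick rule: min(n1 - 1, n2 - 1)."""
    return min(n1 - 1, n2 - 1)


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample interval for pi1 - pi2."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - half, (p1 - p2) + half


# Hypothetical samples: equal sizes and equal standard deviations.
T_95_DF20 = 2.086  # table lookup, two-tailed 95%, 20 d.f.
pooled = pooled_ci(105, 10, 11, 100, 10, 11, T_95_DF20)
unequal = unequal_var_ci(105, 10, 11, 100, 10, 11, T_95_DF20)
print("pooled interval:   ", pooled)
print("unequal interval:  ", unequal)
print("Welch d.f.:        ", welch_df(10, 11, 10, 11))
print("quick-rule d.f.:   ", quick_df(11, 11))
print("two proportions CI:", two_proportion_ci(50, 100, 50, 100))
```

If an interval printed here includes zero, the data do not show a significant difference at that confidence level.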