Chapter 6
Introduction to Formal Statistical Inference

Inferential Statistics
Two areas of statistics:
- Descriptive Statistics
- Inferential Statistics

Some Terminology
- Quantities of a population are called parameters and are typically denoted by Greek letters
- Quantities obtained from a sample are called statistics and are typically denoted by Roman letters
- µ is a parameter, $\bar{x}$ is a statistic

Example
As a means of trying to estimate the mean GPA of Bucknell students, a sample of 100 students yielded an average of 3.12.
- The parameter of interest is the population mean GPA (µ)
- The statistic is the sample mean GPA of 3.12 ($\bar{x}$)

Parameters
- For every parameter of interest, there are typically a number of statistics that can be used for estimation purposes
  - If one is interested in the population mean, the sample mean or sample median can be used
  - If one is interested in the population variance, the sample variance, sample range, or sample IQR can be used

Sampling Distributions
- The sampling distribution for a sample statistic is the probability distribution of the statistic
- Sampling distributions are just like the probability distributions discussed earlier (i.e., sampling distributions have a mean and variance, usually dependent upon the sample size)

Central Limit Theorem
If $X_1, X_2, \ldots, X_n$ are iid random variables (with mean µ and variance $\sigma^2$), then for large n, the sample mean $\bar{X}$ is approximately normally distributed. That is, approximate probabilities can be calculated using the normal distribution with mean µ and variance $\sigma^2/n$.
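The Central Limit Theorem statement above can be checked with a small simulation. This is only a sketch under stated assumptions: the Exponential(1) population, the sample size of 40, the cutoff of 1.2, and the helper names `simulate_sample_means` and `normal_tail_of_mean` are illustrative choices, not part of the slides. It compares the mean and variance of many simulated sample means with µ and $\sigma^2/n$, and compares an empirical tail probability with the normal approximation.

```python
import math
import random
import statistics


def simulate_sample_means(n, reps, rate=1.0, seed=0):
    """Return `reps` sample means, each from n iid Exponential(rate) draws."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.expovariate(rate) for _ in range(n))
        for _ in range(reps)
    ]


def normal_tail_of_mean(c, mu, sigma, n):
    """CLT approximation to P(sample mean > c), using Normal(mu, sigma / sqrt(n))."""
    return 1.0 - statistics.NormalDist(mu, sigma / math.sqrt(n)).cdf(c)


# Assumed population: Exponential(1), so mu = 1 and sigma = 1, but right-skewed.
mu, sigma, n, reps = 1.0, 1.0, 40, 5000
means = simulate_sample_means(n, reps)

# The sampling distribution should have mean near mu and variance near sigma**2 / n.
print("mean of sample means:    ", statistics.fmean(means))
print("variance of sample means:", statistics.variance(means))
print("sigma**2 / n:            ", sigma**2 / n)

# Compare a simulated tail probability with the normal approximation.
empirical = sum(m > 1.2 for m in means) / reps
print("empirical P(mean > 1.2): ", empirical)
print("CLT approximation:       ", normal_tail_of_mean(1.2, mu, sigma, n))
```

The population here is deliberately skewed, so the two tail probabilities should agree closely but not exactly. Rerunning with a smaller n (say 5) would likely show a larger gap.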
Z value for sample mean
$z = \dfrac{\bar{x} - E(X)}{\sqrt{\mathrm{Var}(X)/n}}$

Properties of Sampling Distributions
- A sample statistic used to estimate a population parameter is called a point estimate (or point estimator)
- There are 2 properties that are desired for point estimators:
  - The mean of the sampling distribution of the point estimator is equal to the population parameter that it is intended to estimate (i.e., the point estimator is an unbiased estimator)
  - The point estimator has minimum variance among all unbiased point estimators

Sampling Distribution of Mean
- $\bar{x}$ is always an unbiased point estimator of µ
- There are 2 things that are always true about the sampling distribution of $\bar{x}$:
  - $\mu_{\bar{x}} = \mu$
  - $\sigma_{\bar{x}} = \sigma / \sqrt{n}$

Applications
If the population is Normally distributed, then $\bar{X}_n \sim N(\mu_{\bar{x}}, \sigma_{\bar{x}})$, that is, Normal with mean µ and standard deviation $\sigma/\sqrt{n}$.

Example
The weights of the jars of baby food are Normally distributed with a mean of 137.2 g and a standard deviation of 1.6 g. What is the probability that if one jar was selected at random, its weight would be more than 140 grams?

Example
What is the probability that if nine jars were selected at random, their average weight would be more than 140 grams?

What if it’s not Normal?
- If we don’t know the shape of the distribution, or if we know that it is not Normal, we can apply the Central Limit Theorem to find out something about the distribution.
- For sufficiently large samples, the sampling distribution of $\bar{x}$ will be approximately Normal.
  - Typically, a sample size of 25 or 30 is “sufficiently large”
  - The necessary sample size depends on the skewness of the distribution of the population
  - The larger the sample size, the better the Normal approximation

Example
A soft-drink bottler purchases glass bottles from a vendor. The bottles are required to have an internal pressure strength of at least 150 pounds per square inch (psi). A prospective bottle vendor claims that its production process yields bottles with a mean internal strength of 157 psi and a standard deviation of 3 psi.
The bottler strikes an agreement with the vendor that permits the bottler to sample from the vendor’s production process to verify the vendor’s claim. The bottler randomly selects 40 bottles from the last 10,000 produced, measures the internal pressure of each, and finds the mean pressure for the sample to be 1.3 psi below the process mean cited by the vendor. Assuming the vendor’s claim to be true, what is the probability of obtaining a sample mean this far or farther below the process mean? What does your answer suggest about the validity of the vendor’s claim?

Estimation Continued

Goals of Confidence Interval Estimation
- Identify an interval of values likely to contain an unknown parameter
- Quantify how likely the interval is to contain the correct value

Confidence Interval
A confidence interval for a parameter is a data-based interval of numbers thought likely to contain the parameter, possessing a stated probability-based confidence or reliability

A Large-n Confidence Interval for µ Involving σ
point estimate ± margin of error

Gallup Poll
www.gallup.com
Results are based on telephone interviews with 825 likely voters, aged 18 and older, conducted Oct. 10-12, 2008. For results based on the total sample of likely voters, one can say with 95% confidence that the maximum margin of sampling error is ±4 percentage points. Interviews are conducted with respondents on land-line telephones (for respondents with a land-line telephone) and cellular phones (for respondents who are cell-phone only). In addition to sampling error, question wording and practical difficulties in conducting surveys can introduce error or bias into the findings of public opinion polls.

Back to Baby Food Jars
- Suppose we want to estimate the actual mean weight of all baby food jars produced at the plant
- How can we do this?
- What do we know?
  - Given that σ = 1.6 grams
  - Suppose we take a sample of 50 jars and find that their average weight is 142.7 grams.
Formula
Point estimate ± margin of error
$\bar{x} \pm z \dfrac{\sigma}{\sqrt{n}}$

Z’s for Confidence Intervals
Desired Confidence    z
80%                   1.28
90%                   1.645
95%                   1.96
98%                   2.33
99%                   2.58
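The formula and the table of z values can be tied together in a short sketch. The helper names `z_critical` and `z_interval` are hypothetical, and the 95% level used for the baby food jars is an assumption, since the slides do not say which confidence level to use. The multipliers come from the standard normal quantile function, so they match the table only up to rounding.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided z multiplier: leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)


def z_interval(xbar, sigma, n, confidence=0.95):
    """Large-n interval for mu with known sigma: xbar +/- z * sigma / sqrt(n)."""
    margin = z_critical(confidence) * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin


# Recompute the table of multipliers (the slide rounds these values).
for level in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")

# Baby food jars: sigma = 1.6 g, n = 50, sample mean 142.7 g.
# The 95% level is an assumed choice for illustration.
low, high = z_interval(142.7, 1.6, 50, confidence=0.95)
print(f"95% interval for mu: ({low:.2f}, {high:.2f}) g")
```

With these inputs the margin of error is about 0.44 g, giving an interval of roughly 142.26 to 143.14 g. A higher confidence level widens the interval, and a larger n narrows it.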