Section 10.1 Estimating with Confidence:

Suppose I want to know how often teenagers go to the movies. Specifically, I want to know how many times per month a typical teenager (ages 13 through 17) goes to the movies. Suppose I take an SRS of 100 teenagers and calculate the __________ __________ to be x̄ = 2.1.

The sample mean is an __________ __________ of the unknown population mean _____, so I would estimate the population mean to be approximately 2.1. However, a different sample would have given a different sample mean, so I must consider the amount of __________ in the sampling model for _____.

▪ The sampling model for x̄ is __________ __________.
▪ The mean of the sampling model is _____.
▪ The __________ __________ of the sampling model is ______ assuming the population size is at least _____.

Suppose we know that the population standard deviation is σ = 0.5. Then the __________ __________ for the sampling model is σ/√n = 0.5/√100 = 0.05. Then 95% of our samples will produce a __________ __________ x̄ that is between __________ and __________. Therefore in 95% of our samples, the interval between __________ and __________ will contain the parameter μ. The __________ __________ is 0.10.

For our sample of 100 teenagers, x̄ = 2.1. Because the margin of error is 0.10, we are 95% confident that the true population mean lies somewhere in the interval __________, or [__________]. The interval [2.0, 2.2] is a 95% __________ __________ because we are 95% confident that the unknown μ lies between 2.0 and 2.2.

Start with sample data. Compute an interval that has probability C of containing the true value of the parameter. This is called a ______________________________.

How do we construct confidence intervals? Since the sampling model of the sample mean x̄ is approximately __________, we can use normal calculations to construct confidence intervals. For a 95% confidence interval, we want the interval corresponding to the middle 95% of the __________ __________.
For a 90% confidence interval, we want the interval corresponding to the middle 90% of the __________ __________. And so on…

If we are using the standard normal curve, we want to find the interval using __________.

Suppose we want to find a 90% confidence interval for a standard normal curve. If the middle 90% lies within our interval, then the remaining 10% lies outside our interval. Because the curve is symmetric, there is 5% below the interval and 5% above the interval. Find the __________ with area 5% below and 5% above. These z-values are denoted __________. Because they come from the standard normal curve, they are centered at mean _____.

______ is called the ______________________________, with probability p lying to its right under the standard normal curve. To find p, we find the complement of C and divide it in half, or find (1 − C)/2.

For a 95% confidence interval, we want the z-values with upper p critical value ________.
For a 99% confidence interval, we want the z-values with upper p critical value ________.

Remember that z-values tell us how many __________ __________ we are above or below the mean. To construct a 95% confidence interval, we want to find the values __________ standard deviations below the mean and __________ standard deviations above the mean, or __________. Using our sample data, this is __________, assuming the population is at least _____.

In general, to construct a level C confidence interval using our sample data, we want to find x̄ ± z* σ/√n. The estimate for _________ is _________. The margin of error is __________. Note that the margin of error is a positive number. It is not an interval.

We would like high confidence and a small margin of error. A higher confidence level means a higher percentage of all samples produce a statistic close to the true value of the parameter. Therefore we want a _______ level of confidence.
A smaller margin of error allows us to get closer to the true value of the parameter, so we want a __________ margin of error.

So how do we reduce the margin of error?

▪ __________ the confidence level (by decreasing the value of z*)
▪ __________ the standard deviation
▪ __________ the sample size. To cut the margin of error in half, increase the sample size by __________ times the previous size.

You can have __________ confidence and a __________ margin of error if you choose the right sample size. To determine the sample size n that will yield a confidence interval for a population mean with a specified margin of error m, set the expression for the margin of error to be less than or equal to m and solve for n.

CAUTION!! These methods only apply to certain situations. In order to construct a level C confidence interval using the formula x̄ ± z* σ/√n, for example, the data must be an __________ and we must know the __________ standard deviation. Also, we want to eliminate (if possible) any __________. The margin of error only covers random sampling errors. Things like undercoverage, nonresponse, and poor sampling designs can cause additional errors.

Section 10.2 Tests of Significance:

Example 10.7

Diet colas use artificial sweeteners to avoid sugar. Colas with artificial sweeteners gradually lose sweetness over time. Manufacturers therefore test new colas for loss of sweetness before marketing them. Trained tasters sip the cola along with drinks of standard sweetness and score the cola on a “sweetness score” of 1 to 10. The cola is then stored for a period of time, then each taster scores the stored cola. This is a matched pairs experiment. The reported data is the __________ in tasters’ scores. The bigger the difference, the bigger the loss in sweetness.

2.0   2.2   0.4   -1.3   0.7   1.2   2.0   1.1   -0.4   2.3

The sample mean __________ indicates a small loss of sweetness.
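
As a quick check on the arithmetic, the sample mean of the ten differences listed above can be computed directly. This is only a minimal Python sketch; the variable names are illustrative and are not part of the original notes.

```python
from statistics import mean

# Differences in sweetness scores for the ten tasters, copied from the
# data above. Positive values indicate a loss of sweetness.
differences = [2.0, 2.2, 0.4, -1.3, 0.7, 1.2, 2.0, 1.1, -0.4, 2.3]

n = len(differences)
x_bar = mean(differences)  # sample mean of the differences

print(f"n = {n}, sample mean = {x_bar:.2f}")
```

The result should agree with the sample mean of 1.02 quoted later in this example.
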
Consider that a different sample of tasters would have resulted in different scores, and that some variation in scores is expected due to chance. Does the data provide good evidence that the cola lost sweetness in storage? To answer that question, we will perform a ____________________.

1. Identify the __________. The parameter of interest is μ, the mean loss in sweetness.
2. State the ____________________. There is no effect or change in the population. This is the statement we are trying to find evidence against. The cola does not lose sweetness. __________
   State the ____________________. There is an effect or change in the population. This is the statement we are trying to find evidence for. The cola does lose sweetness. __________
3. Calculate a __________ to estimate the __________. Is the value of the statistic far from the value of the parameter? If so, __________ the null hypothesis. If not, ________________ the null hypothesis.
4. Calculate the __________. If it is small, your result is ____________________.

Suppose the individual tasters’ scores vary according to a normal distribution with mean μ and σ = 1. We want to test the null hypothesis, so we assume __________. So the sampling model for x̄ is approximately normal with mean __________ and standard deviation σ/√n = 1/√10 = 0.316.

[Figure: sampling model N(0, 0.316) for x̄; horizontal axis marked from -0.9 to 0.9 in steps of 0.3]

Our sample mean, x̄, was 1.02. Assuming that the null hypothesis is true, what is the probability of getting a result at least that large?

normalcdf(1.02, 1E99, 0, 0.316) = 0.0006

The probability to the right of x̄ is called the __________. The __________ is 0.0006, meaning that if the null hypothesis were true, we would expect a result at least this large in only about 6 out of 10,000 samples. This is very unlikely, so we will __________ the ______ hypothesis in favor of the __________ hypothesis and conclude that the cola actually did lose sweetness.

If the P-value is small we say that our result is ____________________. The smaller the P-value, the stronger the evidence provided by the data.
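
The calculator step above can also be reproduced without a calculator. The following is a minimal Python sketch of the same upper-tail calculation, assuming (as the normalcdf call does) that the sampling model is centered at 0 when the null hypothesis is true. The helper name upper_tail_normal is hypothetical and chosen only for illustration.

```python
from math import erfc, sqrt

def upper_tail_normal(x, mu, sigma):
    """Return P(X >= x) for X ~ N(mu, sigma), using the complementary error function."""
    z = (x - mu) / sigma
    return 0.5 * erfc(z / sqrt(2))

sigma = 1.0    # population standard deviation given in the notes
n = 10         # number of tasters
x_bar = 1.02   # observed sample mean
mu_0 = 0.0     # center of the sampling model, as in the normalcdf call above

se = sigma / sqrt(n)                           # standard deviation of x-bar
p_value = upper_tail_normal(x_bar, mu_0, se)   # area to the right of x-bar

print(f"standard deviation of x-bar = {se:.3f}")
print(f"P-value = {p_value:.4f}")
```

The printed P-value should land close to the 0.0006 reported by the calculator. The very large upper bound 1E99 in the calculator call plays the role of "infinity"; here the upper tail is computed directly instead.
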
How small is small enough? Compare the P-value to the value of the __________ __________ ______. This value is usually predetermined. If the __________ is as small or smaller than _____, we say that the data are __________ __________ at level _____.

Hypotheses can be ____________________ or ____________________.

▪ Ha: μ > 0 is a ______-sided hypothesis because we are only looking at one direction, greater than.
▪ Ha: μ ≠ 0 is a ______-sided hypothesis because we are looking at two directions, greater than and less than.

Section 10.4 Inference as Design:

If we use the results of a significance test to make a decision, then we either reject the null hypothesis in favor of the alternative hypothesis, or we accept the null hypothesis. This is called ____________________.

We hope that our decision will be correct, but it is possible that we make the wrong decision. There are two ways to make a wrong decision:

We can reject the null hypothesis when in fact it is true. This is called a ____________________.
We can accept (fail to reject) the null hypothesis when in fact it is false. This is called a ____________________.

                                  Truth about the population
Decision based on sample     H0 is __________        H0 is __________
_______ H0                   Type I Error            Correct Decision
                             p = _____               p = 1 – _____
_______ H0                   Correct Decision        Type II Error
                                                     p = _____

We are interested in knowing the probability of making a Type I Error and the probability of making a Type II Error.

A Type I Error occurs if we ________ the null hypothesis when it is in fact _______. When do we reject the null hypothesis? When we assume that it is true and find that the statistic of interest falls ______ the ____________________. The probability that the statistic falls in the rejection region is the area of the shaded region, or _____.

[Figure: One-Sided Test]

Therefore the probability of a Type I Error is equal to the significance level ______ of a fixed level test. The probability that the test will reject the null hypothesis H0 when in fact H0 is true is _____.
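
The claim that a fixed level test rejects a true null hypothesis at a rate equal to its significance level can be checked by simulation. The sketch below is illustrative only. It reuses σ = 1 and n = 10 from the cola example, and it assumes a hypothetical significance level of 0.05 and a one-sided test, neither of which is specified in the notes.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)          # fixed seed so the run is repeatable

alpha = 0.05            # hypothetical significance level (an assumption)
sigma, n = 1.0, 10      # values from the cola example
mu_0 = 0.0              # the null hypothesis is TRUE in this simulation

# One-sided test: reject H0 when the sample mean exceeds this cutoff.
cutoff = mu_0 + NormalDist().inv_cdf(1 - alpha) * sigma / sqrt(n)

trials = 20_000
rejections = 0
for _ in range(trials):
    # Draw one sample of n scores from the null distribution.
    sample = [random.gauss(mu_0, sigma) for _ in range(n)]
    if sum(sample) / n > cutoff:
        rejections += 1     # a Type I Error: H0 is true but we rejected it

type_1_rate = rejections / trials
print(f"cutoff = {cutoff:.3f}, observed Type I Error rate = {type_1_rate:.3f}")
```

Because every simulated sample comes from a population where the null hypothesis holds, every rejection counted here is a Type I Error, and the observed rate should sit near the chosen significance level.
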
A Type II Error occurs if we ________ (or fail to reject) the null hypothesis when it is in fact _______. When do we accept (or fail to reject) the null hypothesis? When we assume that it is true and find that the statistic of interest falls __________ the ____________________. However, the probability that the statistic falls outside the rejection region is NOT the area of the unshaded region. Think about it… If the null hypothesis is in fact false, then the picture is NOT CORRECT… it is off center.

[Figure: Two-Sided Test, with lower critical value and upper critical value marked]

To calculate the probability of a Type II Error, we must find the probability that the statistic falls outside the rejection region (the unshaded area) given that the mean is some other specified value.

[Figure: Two-Sided Test, with lower critical value and upper critical value marked]

The probability of a Type II Error tells us the probability of __________ the null hypothesis when it is actually __________. The complement of this would be the probability of not accepting (in other words rejecting) the null hypothesis when it is actually false. To calculate the probability of rejecting the null hypothesis when it is actually false, compute 1 – P(Type II Error). This is called the __________ of a significance test.
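
The two-step calculation described above (find the rejection region under the null hypothesis, then find the chance of missing it under some other mean) can be sketched in Python. Every specific number here is an assumption made for illustration: the notes do not give a significance level or an alternative mean, so this sketch uses a hypothetical level of 0.05 and a hypothetical true mean of 1.0, with σ = 1 and n = 10 borrowed from the cola example. It also uses a one-sided test for simplicity, whereas the figures above show a two-sided test.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05         # hypothetical significance level (an assumption)
sigma, n = 1.0, 10   # values borrowed from the cola example
mu_0 = 0.0           # mean under the null hypothesis
mu_alt = 1.0         # hypothetical "other specified value" of the mean

se = sigma / sqrt(n)  # standard deviation of the sample mean

# Step 1: find the rejection region, assuming H0 is true (one-sided test).
cutoff = mu_0 + NormalDist().inv_cdf(1 - alpha) * se

# Step 2: probability that x-bar falls OUTSIDE the rejection region
# when the mean is really mu_alt. This is P(Type II Error).
p_type_2 = NormalDist(mu_alt, se).cdf(cutoff)

# Step 3: the complement, 1 - P(Type II Error).
complement = 1 - p_type_2

print(f"cutoff = {cutoff:.3f}")
print(f"P(Type II Error) = {p_type_2:.3f}")
print(f"1 - P(Type II Error) = {complement:.3f}")
```

Changing mu_alt shows why the picture is "off center": the closer the true mean is to the null value, the more of its sampling distribution falls outside the rejection region, and the larger the probability of a Type II Error becomes.
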