Course schedule - Week 12, 4/14-4/17

Pre-class assignment for 4/14

(1) Confidence intervals handout from SOCS (from J. Utts: Seeing Through Statistics)

Read the handout on confidence intervals, pages 336-341, and answer the following questions.

QUESTIONS:
(1) On pg. 336 (first page), in line 2, we read: “statisticians have been able to quantify the amount by which those sample values are likely to differ from….the population” Explain what this quote is referring to! (Hint: How much sample statistics differ from the population parameter was discussed in class for means and proportions, and in both cases we were able to quantify how much the sample statistic varies from sample to sample. This variation was given in terms of the standard deviation of the sampling distribution.)
(2) The first key to understanding confidence intervals is at the bottom of pg. 340: “In 95% of all samples the sample proportion will fall within 2 standard deviations from the population proportion.” Explain why this is not exactly true and why 1.96 standard deviations should be used instead of 2! (Hint: Use the fact that the sampling distribution model for proportions is normal!)
(3) The other key is at the top of pg. 341. Make sure you understand that this statement is a direct consequence of the previous one: “In 95% of all samples the true population proportion will fall within 2 standard deviations of the sample proportion.” This means that if we took 100 samples, say from a population with p=.7, and created the 95% confidence interval for each of the 100 sample proportions, we would expect about 95 of the 100 intervals to contain the population proportion. Let us go to DATA DESK and do exactly that.

(2) Confidence intervals simulation I. (Start working on it now; it will not be collected until Thursday 4/17.)

Take 100 samples of size 100 from a population where the population proportion is p=.7.
(1) Calculate the sample proportions.
(2) Plot a data histogram for the 100 sample proportions.
(3) What kind of model could be used in this case for the distribution of sample proportions? What would be the mean and standard deviation of the sampling distribution of the proportion?
(4) Questions of interest: What is the chance that a sample proportion p̂ is closer to p=.7 than 2 standard deviations (we say 2 for simplicity, but we really mean 1.96)? What is the chance that p̂ is further away from p=.7 than 2 standard deviations? To investigate these questions:
(a) Select all 100 samples and then go to: Calculate/Estimate. Specify z-intervals and type in the standard deviation you calculated in part (3) for sigma. Keep Confidence at 95%. Click: Show results. (In this step you created intervals extending 2 standard deviations on either side of each p̂ value.)
(b) How could you estimate the chance that p̂ is further away from p=.7 than 2 standard deviations using the intervals you created in the previous step? (Hint: You could find out how many of these intervals contain p=.7. Whenever the interval contains p=.7, the corresponding p̂, which lies at the center of this interval, must have been closer to p than 2 standard deviations.)
(c) List the p̂ values for which the 95% confidence interval did not contain the actual value p=.7. Look at your data histogram to see where these values lie in your distribution of sample proportions.
(d) Estimate the chance that p̂ is further away from p=.7 than 2 standard deviations using part (c)!

THINK: Every time p̂ is closer to p than 2 standard deviations, the interval extending 2 standard deviations on either side of p̂ in fact contains the p value. We can now use the mathematical model for the distribution of all sample proportions to estimate the chance that p̂ is closer to p than 2 standard deviations. According to our sampling distribution model, since it is normal and centered at p, this happens about 95% of the time.

Note: This simulation has no practical value other than to help you see how confidence intervals work.
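
For anyone who wants to check the Data Desk results another way, the simulation steps above can be sketched in Python. This is only an illustration under stated assumptions: the function name `simulate`, the variable names, and the fixed random seed are made up for this sketch and are not part of the assignment or of Data Desk. It also shows where the 1.96 comes from.

```python
import math
import random
from statistics import NormalDist

P_TRUE = 0.7        # population proportion from the assignment
N = 100             # size of each sample
NUM_SAMPLES = 100   # number of samples to draw
CONF = 0.95         # confidence level

def simulate(seed=0):
    """Draw the samples, build a z-interval around each p-hat, and list the misses."""
    rng = random.Random(seed)  # the seed is an arbitrary choice for repeatability
    # z* is the normal quantile that leaves 2.5% in each tail (about 1.96, not 2).
    z_star = NormalDist().inv_cdf(0.5 + CONF / 2)
    # Standard deviation of the sampling distribution of p-hat: sqrt(p(1-p)/n).
    sigma = math.sqrt(P_TRUE * (1 - P_TRUE) / N)
    margin = z_star * sigma

    p_hats = []
    for _ in range(NUM_SAMPLES):
        successes = sum(rng.random() < P_TRUE for _ in range(N))
        p_hats.append(successes / N)

    # One interval per sample, centered at that sample's p-hat.
    intervals = [(ph - margin, ph + margin) for ph in p_hats]
    # Sample proportions whose interval fails to contain the true p.
    misses = [ph for ph, (lo, hi) in zip(p_hats, intervals)
              if not (lo <= P_TRUE <= hi)]
    return z_star, sigma, p_hats, intervals, misses

z_star, sigma, p_hats, intervals, misses = simulate()
print(f"z* = {z_star:.3f}, sigma = {sigma:.4f}")
print(f"{NUM_SAMPLES - len(misses)} of {NUM_SAMPLES} intervals contain p = {P_TRUE}")
print("p-hat values whose interval missed p:", sorted(misses))
```

The count of intervals that contain p will change with the seed, just as it changes from one student's Data Desk run to the next; it should usually land somewhere near 95.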
The main purpose of confidence intervals is that they predict with some certainty where the actual population proportion is. If we did not know that p=.7, the 100 confidence intervals would still behave the same way and each would have a 95% chance of containing the unknown p value, meaning that in estimating p with a confidence interval we would be right about 95 out of 100 times. We would not know which 5 times we were wrong, though, so unfortunately we could never say anything for certain. Being wrong with a 5% chance, however, is an acceptable risk for most practical problems.

GROUP HOMEWORK due 4/24:
From the Jessica Utts text on SOCS: 5, 6, 7, 8, 9, 10, 11, 13, 14
ActivStats:
Lesson 17 - MCS 1, 2, 5, 6; MRA-4, MRB-3, TR7-2, 3, YMM-1
Lesson 18 - BBT-3, MBS-2, MCS-5, MRA-4, TR7-3, TR7-4, WEN-2, 3, YMM-1
EXTRA CREDIT PROBLEM: SOCS text Problem 15, with special attention given to part (c)

4/14 Confidence intervals

Pre-class assignment for 4/17:
(0) The simulation assigned previously will be collected!
(1) 17.1.1 What is a confidence interval? Fill in the blank! An interval that has a high …………. of containing the population ………………………… .
(2) 17.1.2 Fill in the blank! Confidence intervals are based on the ……………………. distribution, thus on ………………………………………. of the data.
(3) 17.1.3 What are the two goals we have to balance when we create a confidence interval? What is the most common level of confidence used? Write down the 95% confidence interval for the average height of British women.
(4) 17.2.2 Record the four steps in creating a confidence interval!
(5) 17.2.3 The lesson to be learned from this activity is that we cannot do inferential statistics without first analyzing the data!! If the conditions are not met in your sample, the technique will not give a realistic, useful answer. What was the problem with creating a single confidence interval for the mean of the entire sample in the example? What did we have to do before creating the confidence interval?
How did we end up solving the problem?
(6) 17.3.1 Answer the questions below in the window!
17.3.2 Record the formula for the confidence interval! What is z*? What are the other components?
17.3.3 Do the quiz!
17.3.4 How can we increase precision without sacrificing certainty?

Confidence intervals II. Terminology

I. Example: Assume that the scores of students on the ACT are normally distributed with unknown mean, but the standard deviation is known to be 5.9. Suppose a group of 50 students take the ACT and score 20.2 on average with a standard deviation of 3.7. Define the following terms and then calculate them for the given example.
(1) sampling error
(2) standard deviation of the sample mean distribution (n=50)
(3a) margin of error at the 95% confidence level when a confidence interval for the population mean is created using the known population standard deviation
(3b) Create a 95% confidence interval for the population mean using part (3a)!
(4) standard error
(5a) margin of error at the 95% confidence level when a confidence interval for the population mean is created using the statistics in the sample
(5b) Create a 95% confidence interval for the population mean using part (5a)!

II. Fill in the missing words!
If the government required confidence intervals to carry warning labels like those on drugs, the labels would be very long indeed. Let me show you a small part of the list!
1. The sample must be a ………………………… There is no method of inference for haphazardly collected data that might be biased. Remember this fact when reading the results of polls or other sample surveys. Watch out for bias!
2. Because the mean is not a …………………………. measure of center, ……………….. can have a large effect on the sample mean and consequently on the confidence interval. Outliers should be …………… if their removal is justified. If you cannot remove them, use a different method of inference.
3.
The method is based on the assumption that the sample mean distribution is ……… or that it has the t-distribution. But neither of these is true for small samples drawn from a non-normal population. If the sample size is ….. and the population is not ………………., DO NOT USE this method.
4. You must know σ in order to be able to use the z values for small samples. This unrealistic requirement renders z-intervals of little use in practice, and we use t-intervals instead. Simple rule if you have a computer: USE t-intervals whenever the population ………………………….. is estimated by the ……………………………. .
5. You should be all right with the t-test as long as the population has a distribution that is unimodal and …………… .
6. Confidence intervals at any level have a ……. chance to be ……….. even when all the conditions are fully met. The conditions, however, are never fully met in practice. Deciding when a statistical procedure should be used requires judgment assisted by exploratory data analysis. The difference between mathematics and statistics (which was highlighted in my lecture on the first day) can be stated thus: mathematical theorems are true; statistical methods are effective when used with skill.

4/17 Confidence intervals II.

Pre-class assignment for 4/21: Lesson 18 from ActivStats
(1) 18.1.1 What do we call the standard error? Record the summary at the end!
(2) Answer the study question in 18.1.2.
(3) Find out who William Gosset was (see the t-distribution section at the top of pg. 18.2, under the green star).
(4) 18.2.4 What are the assumptions for Student's t?
(5) 18.3.1 Learn to calculate a t-interval for MPG. What is the calculated confidence interval?
(6) 18.3.3 Answer these questions: Which t-distribution is equal to the normal distribution? What are the two reasons for confidence intervals getting wider as the sample size decreases? Which reason is more important?

ATTENTION! 4/17 is the last chance to ask me about the group homework problems!!
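
To check your arithmetic on the ACT example from Terminology part I (σ = 5.9 known, n = 50, sample mean 20.2, sample standard deviation 3.7), here is a minimal Python sketch of the quantities in parts (2) through (5b). The variable names are made up for illustration. The critical value `T_STAR_49` is an assumption: it is a typical table value for the t-distribution with 49 degrees of freedom, hard-coded because Python's standard library has no t quantile function. If part (5a) is meant to use z* with the sample standard deviation instead, replace it with `z_star`.

```python
import math
from statistics import NormalDist

n = 50         # number of students in the sample
sigma = 5.9    # known population standard deviation
x_bar = 20.2   # sample mean
s = 3.7        # sample standard deviation

# (1) The sampling error, x_bar minus the population mean, cannot be
#     computed here because the population mean is unknown.

# (2) Standard deviation of the sample mean distribution: sigma / sqrt(n).
sd_mean = sigma / math.sqrt(n)

# (3a) Margin of error with sigma known: z* times sigma / sqrt(n).
z_star = NormalDist().inv_cdf(0.975)   # about 1.96
moe_z = z_star * sd_mean
# (3b) z-interval for the population mean.
ci_z = (x_bar - moe_z, x_bar + moe_z)

# (4) Standard error: the sample-based estimate s / sqrt(n).
std_error = s / math.sqrt(n)

# (5a) Margin of error from the sample statistics: t* times s / sqrt(n).
T_STAR_49 = 2.0096   # ASSUMPTION: table value of t* for 95%, df = 49
moe_t = T_STAR_49 * std_error
# (5b) t-interval for the population mean.
ci_t = (x_bar - moe_t, x_bar + moe_t)

print(f"(2)  SD of the sample mean: {sd_mean:.4f}")
print(f"(3a) margin of error (sigma known): {moe_z:.4f}")
print(f"(3b) 95% z-interval: ({ci_z[0]:.3f}, {ci_z[1]:.3f})")
print(f"(4)  standard error: {std_error:.4f}")
print(f"(5a) margin of error (from sample): {moe_t:.4f}")
print(f"(5b) 95% t-interval: ({ci_t[0]:.3f}, {ci_t[1]:.3f})")
```

Notice that the two intervals differ mostly because the sample standard deviation (3.7) is much smaller than the known population value (5.9), not because t* is slightly larger than z*.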