Estimating with Confidence
10-1 Estimating an Unknown Parameter

Ex: The admissions director at a university proposes using the IQ scores of current students as a marketing tool. The university provides him with enough money to administer IQ tests to 50 students, so he gives the IQ test to an SRS of 50 of the university's 5000 freshmen. The mean IQ score for the sample is x̄ = 112. What can the director say about the mean score μ of the population of all 5000 freshmen?

Is the mean IQ score μ of all the freshmen exactly 112? Probably not, but…
The law of large numbers tells us that the sample mean from a large SRS will be close to the unknown population mean μ. Because x̄ = 112, we guess that μ is "somewhere around 112." How close to 112 is μ likely to be? To answer this question, we ask another: how would the sample mean vary if we took many samples of 50 freshmen from this same population?

Recall…
From last chapter, the means of all samples of size 50 would be distributed approximately normally around the true population mean μ, with a standard deviation of σ/√50. Remember from our 68-95-99.7 rule that 95% of all samples of size 50 will have a mean that falls within 2 standard deviations of μ.

Suppose we know σ
Suppose we know σ is 15 (this is unrealistic, but just go with it). Then the standard deviation of x̄ is σ/√n = 15/√50 ≈ 2.1. So in 95% of all samples of size 50, the sample mean IQ score x̄ will fall within 4.2 points of the true μ (up or down; that's 2 standard deviations above or below).

Here are all our samples… (figure in the original slides: the sampling distribution of x̄ for repeated samples of size 50)

How confident are we?
Statistical inference uses this fact about what would happen in MANY samples to express our confidence in being able to capture the true μ with our own ONE sample.
Conclusion: our sample of 50 freshmen gave x̄ = 112. The resulting interval is 112 +/- 4.2, or (107.8, 116.2). We say that we are 95% confident that the unknown mean IQ μ for all Big City University freshmen is between 107.8 and 116.2.

Confidence Interval for a Population Mean μ when σ is known
That example was our first scenario for calculating a CI. The calculation depends on 3 important conditions:
1. SRS: the data come from a properly conducted simple random sample.
2. Normality: the construction of the interval depends on the sampling distribution of sample means being approximately normal, which it will be, according to the CLT, as long as our sample size is sufficiently large (30 is the usual cutoff).
3. Independence: to keep calculations reasonably accurate when we sample from a finite population without replacement, we should sample no more than 10% of the population (our rule of thumb).

Different Confidence Levels and Critical Values
We call our confidence level a C level. While a 95% confidence interval (or confidence level) is most typical, sometimes you are asked for a 99% or 90% interval. Note that we constructed the 95% CI in the example by going about 2 standard deviations above and below the mean; the exact critical value is z* = 1.96. For 90%, the z score marking off our 'cutoff' regions is +/- 1.645. For 99% it's +/- 2.576.

Formula
Our confidence interval for the estimate of the unknown μ is x̄ +/- z*(σ/√n), and our margin of error is z*(σ/√n).

On the calc: if you have raw data, enter your sample data into L1. Press STAT, choose TESTS, and choose ZInterval. The input method is Data. If you have x̄ and σ, select 'Stats' as your input method and type those in, along with n (the sample size) and the C-level you want (such as .95 or .99). Then choose Calculate and hit Enter. You get the interval (lower and upper bound) and the sample mean.
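To make the arithmetic above concrete, here is a minimal Python sketch (my addition, not part of the notes) that reproduces the 95% z-interval for the IQ example; the variable names are my own, and z* = 1.96 is used rather than the rounded value of 2.

```python
from math import sqrt

# Values from the IQ example (sigma is assumed known, which is unrealistic)
x_bar = 112      # sample mean IQ from the SRS of 50 freshmen
sigma = 15       # assumed population standard deviation
n = 50           # sample size
z_star = 1.96    # critical value for 95% confidence

# Standard deviation of the sampling distribution of x-bar
sd_xbar = sigma / sqrt(n)             # 15/sqrt(50) is about 2.1

# Margin of error and the resulting confidence interval
moe = z_star * sd_xbar                # about 4.2
lower, upper = x_bar - moe, x_bar + moe
print(f"95% CI for mu: ({lower:.1f}, {upper:.1f})")   # about (107.8, 116.2)
```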
Margin of Error
There is a tradeoff between margin of error and level of confidence.
The margin of error gets smaller as z* gets smaller (but this also lowers our confidence).
The MOE also gets smaller as σ gets smaller (this is hard to control in reality, but important conceptually). Think of σ and variability as 'noise': it's easier to pin down the true μ when σ is small.
The MOE gets smaller as n gets larger. Because we take the square root of n, we must take four times as many observations to cut the margin of error in half.

Example and Steps to solving
Suppose the manufacturer of video terminals wants to test screen tension. We know that when the process is operating properly, σ = 43. Here are the tension readings from an SRS of 20 screens:
269.5  297    269.6  283.3  304.8  280.4  233.5  257.4  317.5  327.4
264.7  310    307.7  343.3  328.1  342.6  338.8  340.1  374.6  336.1

Steps
1. Parameter: identify the population of interest and the parameter you want to draw conclusions about. The population here is all video terminals; we want to estimate μ, the mean tension for all these screens.
2. Conditions: choose the appropriate inference procedure and verify the conditions for using it. Since we know σ, we should use a one-sample z interval. Now check the requirements:
   1. SRS: yes.
   2. Normality: is the sampling distribution of x̄ approximately normal? Yes. The sample size is too small (n = 20) to rely on the central limit theorem (n > 30 is our cutoff), so we look at a boxplot of the sample tension readings (calc): no outliers or strong skewness. A normal probability plot tells us that the sample data are approximately normally distributed, so the data give us no reason to doubt the Normality of the population from which they came.
   3. Independence: since we are sampling without replacement, we must assume that at least (10)(20) = 200 video terminals were produced that day.
   See p. 631 for a summary!
3. Calculations: if the conditions are met, carry out the CI inference procedure, here for a 90% CI. Enter the data in the calc: x̄ = 306.3.
   306.3 + 1.645(43/√20) = 322.1
   306.3 - 1.645(43/√20) = 290.5
4. Interpretation: we are 90% confident that the true mean tension μ lies between 290.5 and 322.1. Always state this part IN CONTEXT! If you wanted to change the confidence level (say to 99%), you would change your z* (to 2.576) and the interval would widen.

Sample size for a desired margin of error
Note: it's the size of the sample that determines the margin of error; the size of the population does not influence the sample size we need (this is true as long as the population is much larger than the sample). To find the required n, set the margin of error z*(σ/√n) less than or equal to the margin of error you want, and solve for n.

What if we don't know σ?
We previously made the unrealistic assumption that we knew the value of σ. In practice, σ is usually unknown, so the one-sample z interval is rarely used in real life. Instead, we use our sample standard deviation Sx as an estimate for σ. But we must be penalized for this! The estimated standard deviation of x̄ is now Sx/√n, and it changes from sample to sample depending on the data. We call this 'estimated' standard deviation the 'standard error'. Because of this extra uncertainty, we can't use the normal "z" distribution for our critical values; instead we use "t".

Critical t's
As our n gets bigger, the t distribution gets closer and closer to the normal z distribution. The t distribution is indexed by its degrees of freedom, df = n - 1. As the sample size gets bigger, the difference between n - 1 and n matters less and less. Table C gives us critical values for t based on the degrees of freedom (n - 1); so does the calc (the calc is preferable).
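Since the notes lean on the calculator and Table C here, a short Python sketch (my addition, not part of the notes) can make two things concrete: the 90% z-interval for the screen-tension example above, and how the critical value t* approaches z* = 1.96 as the degrees of freedom grow. The scipy dependency is an assumption; any t table gives the same values.

```python
from math import sqrt
from scipy.stats import t  # assumed available for t critical values

# 90% z-interval for the screen-tension example (sigma = 43 assumed known)
readings = [269.5, 297, 269.6, 283.3, 304.8, 280.4, 233.5, 257.4, 317.5, 327.4,
            264.7, 310, 307.7, 343.3, 328.1, 342.6, 338.8, 340.1, 374.6, 336.1]
n = len(readings)                      # 20
x_bar = sum(readings) / n              # about 306.3
moe = 1.645 * 43 / sqrt(n)             # about 15.8
print(f"90% CI: ({x_bar - moe:.1f}, {x_bar + moe:.1f})")   # about (290.5, 322.1)

# Critical t* values for 95% confidence approach z* = 1.96 as df = n - 1 grows
for df in (2, 9, 29, 99, 999):
    t_star = t.ppf(0.975, df)          # upper 2.5% point of the t(df) distribution
    print(f"df = {df:4d}: t* = {t_star:.3f}")   # 4.303, 2.262, 2.045, 1.984, 1.962
```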
Formula
So the only things that change when we don't know the population standard deviation are: the critical value is now a critical t (we can use the table or the calc; calc recommended), and the standard deviation we use is the standard error, Sx/√n. The interval is x̄ +/- t*(Sx/√n).
On the calc, it's the same as the Z interval; just choose TInterval.

Paired t procedures
Comparative studies are more convincing than single-sample investigations (matched pairs design). We use these to compare two treatments on pairs of similar subjects, or before-and-after observations on the same subject.

Important distinction
There are 2 types of studies we learned about earlier: a matched-pairs design (which includes before/after studies on each individual in our sample, and comparisons within each pair of similar individuals that we split and assigned to 2 treatments), and comparative studies of 2 INDEPENDENT groups.
When calculating the t interval on a matched-pairs design, we are interested in the DIFFERENCE between the 2 conditions (whether this is a before/after on one individual, or 2 similar individuals being compared). You will always have an equal number in both groups if you are doing matched pairs. For this you define L3 as L1 - L2 and do a 1-sample t interval on L3.
In a comparative design, the 2 samples are INDEPENDENT groups (and therefore may even have different numbers in each). They are not matched up in any way; this is what calls for a 2-sample t interval based on L1 and L2.
*For this chapter, 99% of examples will be of the first variety, where you take the differences and do a one-sample interval on L3. In later chapters we deal with situation 2 more, but it still helps to recognize the difference now.

Example: caffeine dependence/depression
The population is all people dependent on caffeine. We want to estimate the mean difference μdiff = placebo - caffeine in depression scores. 11 people were tested and their scores on a depression test measured under both conditions (placebo vs. caffeine) (p. 652).
Calc: since each subject is measured under both conditions, this is matched pairs, so define L3 as L1 - L2 and do a 1-sample t interval on L3.

Depression data
Subject  Caffeine  Placebo
1        5         16
2        5         23
3        4         5
4        3         7
5        8         14
6        5         24
7        0         6
8        0         3
9        2         15
10       11        12

Robustness
If outliers are present in the sample data, then the population may not be Normal. The t procedures are NOT robust against outliers, because x̄ and s are not resistant to outliers.

CIs for proportions
As always, inference is based on the sampling distribution of a statistic.
Center: the mean of the sampling distribution of p̂ is p; the sample proportion p̂ is an unbiased estimator of the population proportion p.
Spread: the standard deviation of p̂ is √[p(1-p)/n], provided that the population is at least 10 times as large as the sample.
Shape: if the sample size is large enough that both np and n(1-p) are at least 10, the distribution of p̂ is approximately normal.
In reality, we don't know the value of p (if we did, we wouldn't need to construct a CI for it!), so we cannot check whether np and n(1-p) are at least 10. In large samples, p̂ will be close to p, so we replace p by p̂ when checking np and n(1-p), and our standard error (estimated standard deviation of p̂) is √[p̂(1-p̂)/n].
Remember: p̂ (the sample proportion) is the number of successes in your sample divided by the total number of individuals in your sample.

Calculator: CI for a proportion
Press STAT, choose TESTS and 1-PropZInt. Enter x (let's say 246), n (let's say 439), and the C-level (.95). Calculate.
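As a check on the two procedures just described, here is a hedged Python sketch (my addition, not from the notes): a one-sample t interval on the placebo - caffeine differences, using only the ten subject rows reproduced above and a 95% confidence level (the notes do not fix a level, and the eleventh subject's scores are not shown), followed by the 1-PropZInt calculation for x = 246 successes out of n = 439. The scipy import is an assumption.

```python
from math import sqrt
from statistics import mean, stdev
from scipy.stats import t  # assumed available for the t critical value

# Paired data: the ten subject rows reproduced in the notes (the 11th is not shown)
caffeine = [5, 5, 4, 3, 8, 5, 0, 0, 2, 11]
placebo  = [16, 23, 5, 7, 14, 24, 6, 3, 15, 12]
diffs = [p - c for p, c in zip(placebo, caffeine)]    # placebo - caffeine

n = len(diffs)
d_bar, s_d = mean(diffs), stdev(diffs)                # mean and sample SD of differences
t_star = t.ppf(0.975, n - 1)                          # 95% critical value, df = 9
moe = t_star * s_d / sqrt(n)
print(f"paired t interval: ({d_bar - moe:.1f}, {d_bar + moe:.1f})")   # about (3.4, 13.0)

# 1-proportion z-interval for x = 246 successes in n = 439 trials, 95% confidence
x, m = 246, 439
p_hat = x / m                                         # about 0.560
se = sqrt(p_hat * (1 - p_hat) / m)                    # standard error of p-hat
print(f"1-PropZInt: ({p_hat - 1.96*se:.3f}, {p_hat + 1.96*se:.3f})")  # about (0.514, 0.607)
```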
Choosing a sample size
When planning a study, we may want to actively choose a sample size that will allow us to estimate the parameter within a given margin of error.

p*
When calculating the sample size for a specific margin of error, we often don't know p̂ or p (we are running the study in the first place to find this out!). When you don't have a 'best guess' estimate for the proportion of successes in the population, use p* = .5, because that's the most conservative estimate: a 50/50 split makes p*(1 - p*) as large as possible, and therefore makes the required sample size as large as possible. Obviously, if you are given a value for p, or you know p̂ from a previous sample, use that; it's our best estimate.

Example: p* unknown
A company wants to do a customer service survey where customers rate the service on a scale of 1 to 5, with 4 being satisfied and 5 being very satisfied. The president is interested in the percent of customers who rate them a 4 or a 5. She wants the estimate to be within 3% at a 95% confidence level. It's too expensive and unreliable to try to question every customer, so how many people should they survey?

Example continued
Use p* = .5 since we don't know the true population proportion p.
1.96 √[(.5)(.5)/n] ≤ .03
Do some algebra:
√n ≥ (1.96)(.5)/.03
n ≥ 1067.11
So we round up to 1068 participants.
*News reports frequently describe the results of surveys with sample sizes between 1000 and 1500 and a margin of error of about 3%.

Summary
See p. 679 for a good summary.
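The algebra above amounts to n ≥ (z*/m)² · p*(1 - p*), where m is the desired margin of error. Here is a minimal Python sketch (my addition, not from the notes) of that calculation; remember to always round up.

```python
from math import ceil

z_star = 1.96   # critical value for 95% confidence
m = 0.03        # desired margin of error (3 percentage points)
p_star = 0.5    # conservative guess when no estimate of p is available

# Solve z* * sqrt(p*(1 - p*)/n) <= m for n
n_exact = (z_star / m) ** 2 * p_star * (1 - p_star)   # about 1067.1
print(ceil(n_exact))                                  # 1068 -- always round up
```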