Objectives

Objectives 6.1, 7.1: Estimating with confidence (CIS: Chapter 10)

- Statistical confidence (CIS gives a good explanation of a 95% CI)
- Confidence intervals
- Choosing the sample size
- t distributions
- One-sample t confidence interval for a population mean
- How confidence intervals behave

Adapted from authors' slides © 2012 W.H. Freeman and Company

Overview of inference

- Sample ≠ population, and sample mean x̄ ≠ population mean µ. We do not know the value of µ, so any conclusion about µ has to be drawn from x̄.
- Methods for drawing conclusions about a population from sample data are called statistical inference.
- There are two main types of inference:
  - Confidence intervals: estimating the value of a population parameter.
  - Tests of significance: assessing evidence for a claim (hypothesis) about a population.
- Inference is appropriate when data are produced by either
  - a random sample or
  - a randomized experiment.

Introducing confidence intervals

- It is very unlikely that the sample mean will exactly equal the true mean. Our aim is to construct an interval around the sample mean which is `likely' to contain the true mean. This is called a confidence interval.
- In the first lecture we considered a Gallup poll for the proportion of the electorate that would vote for Obama. Gallup predicted that the Obama vote would be in the interval [45%, 51%] with 95% confidence.
- The Obama vote turned out to be 50.5%, so the interval did capture the true proportion.
- You may be asking yourself how to understand the 95%: since 50.5% lies in this interval, there does not appear to be any uncertainty in it.
- In the next few slides, our objective is to understand how a confidence interval is constructed and how to interpret it.

Review: properties of the sample mean

The sample mean x̄ is a unique number for any particular sample.
If you had obtained a different sample (by chance), you almost certainly would have had a different value for your sample mean. In fact, you could get many different values for the sample mean, and virtually none of them would actually equal the true population mean µ.

The sampling distribution of x̄ is narrower than the population distribution by a factor of √n, so the estimates x̄ tend to be closer to the population parameter µ than individual observations are.

[Figure: the population distribution of individual subjects (center µ, spread σ) and the narrower distribution of sample means of n subjects (center µ, spread σ/√n).]

If the population is normally distributed N(µ, σ), the sampling distribution of x̄ is N(µ, σ/√n).

- Using the empirical rule: since the sample mean is close to normal, 95% of the time it will be within 2 standard errors of the mean. That is, if I had a hundred sample means, about 95 of them would lie in the interval [µ − 2×σ/√n, µ + 2×σ/√n].
- Now we make a small correction to the empirical rule. It is not 2 standard errors from the mean, but 1.96 standard errors. To see why, look up 1.96 in the z-tables: the area to the left of 1.96 is 0.975, which leaves 2.5% in each tail.
- But the mean is unknown, so our objective is to locate the true mean based on the sample mean.
- To do this we turn the story around. Saying that the sample mean lies in the interval [µ − 1.96×σ/√n, µ + 1.96×σ/√n] is the same as saying that the mean µ lies in the interval [sample mean − 1.96×σ/√n, sample mean + 1.96×σ/√n].
- Thus 95% of the time, the true mean (that we want to estimate) will be in the interval [sample mean − 1.96×σ/√n, sample mean + 1.96×σ/√n]. This is an interval centered on the sample mean. In the next slide we illustrate what we mean by 95%.

If multiple samples were possible, 95% of all sample means would be within 1.96 (roughly 2) standard errors (1.96×σ/√n) of the population parameter µ. This implies that the population parameter µ will be within 1.96 standard errors of the sample average x̄ in 95% of all samples.
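The 95% claim above can be checked by simulation. The following is only a sketch: the function name `coverage`, the seed, and the values µ = 50, σ = 10, n = 25 are illustrative assumptions, not part of the lecture. It draws many samples from a normal population with known σ and counts how often the interval x̄ ± 1.96×σ/√n contains µ.

```python
import random
import statistics


def coverage(mu, sigma, n, reps, z_star=1.96, seed=0):
    """Fraction of simulated samples whose interval
    xbar +/- z_star * sigma / sqrt(n) contains the true mean mu."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    se = sigma / n ** 0.5              # standard error of the sample mean
    hits = 0
    for _ in range(reps):
        # One sample of size n from the N(mu, sigma) population.
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - z_star * se <= mu <= xbar + z_star * se:
            hits += 1
    return hits / reps


# Illustrative (hypothetical) population values.
print(coverage(mu=50.0, sigma=10.0, n=25, reps=10_000))
```

The printed proportion should come out close to 0.95, and it stays close to 0.95 whatever n is chosen, because the interval width shrinks with n at the same rate as the spread of x̄.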
This reasoning is the essence of statistical inference.

[Figure: repeated samples; each red dot marks the mean of an individual sample.]

Mean height: sample size one

- Human heights are approximately normally distributed. The standard deviation of human height is 3.8 inches.
- Our objective is to construct a confidence interval for the mean height.
- We start with a very crude estimator and use just one height to estimate the mean. This is the same as using a sample of size one. In this case the standard error is 3.8/√1 = 3.8.
- Each of you construct a 95% confidence interval for the mean height using your own height as the sample: [your height − 1.96×3.8, your height + 1.96×3.8] = [your height − 7.44, your height + 7.44]. For example, in my case the interval is [63 − 7.44, 63 + 7.44] = [55.56, 70.44].
- It is known that the mean height of a person is 67 inches. Does your interval contain the mean? The proportion of intervals in the class that contain the mean should be approximately 95%.

Mean height: sample size two

- In the previous experiment we used just one individual to estimate the mean height. The `cost' of using one individual was that the confidence interval was very wide.
- We repeat the experiment, but this time each of you buddy up with your neighbour and calculate the average of your two heights, i.e. (your height + neighbour's height)/2. You and your buddy form a sample of size two.
- The sample mean based on a sample of size n = 2 has standard error 3.8/√2 = 2.68.
- Each group construct the interval [sample mean − 1.96×2.68, sample mean + 1.96×2.68] = [sample mean ± 5.26].
- The mean height is 67 inches. Does your interval contain the mean?
- What proportion of the intervals in the class contain the mean?

Observations

- The length of the confidence interval when using just one person in the sample is 2×7.44 = 14.88. This is quite long, and does not really allow us to pinpoint the mean.
- By contrast, the length of the interval using two people to calculate the sample mean is 10.52. This is quite a big reduction in length!
- If ten people were used to calculate the sample mean, the corresponding interval length would be 14.88/√10 = 4.7.
- For any given interval, either the mean is in the interval or it is not. The 95% comes into play when we look at the proportion of intervals that contain the mean.
- In reality:
  - We do not know the true mean µ, so we will never know whether the interval contained the mean or not.
  - We only observe one sample of size n, and thus have one CI.
  - That one confidence interval still contains information about the mean. This is why we say that with 95% confidence the mean lies in it.

Implications

We do not need to (and cannot, anyway) take a lot of random samples to "rebuild" the sampling distribution and find µ at its center. All we need is one SRS of size n, and we can rely on the properties of the sampling distribution to infer reasonable values for the population mean µ.

[Figure: a population with mean µ and a single sample of size n drawn from it.]

Multiple samples revisited

- With 95% confidence, we can say that µ should be within 1.96 standard errors (1.96×σ/√n) of our sample mean x̄.
- In 95% of all possible samples of this size n, µ will indeed fall in our confidence interval.
- In only 5% of samples will x̄ be farther from µ.
- "Confidence" = the proportion of possible samples that give us a correct conclusion.

Calculation practice

- You want to rent an unfurnished one-bedroom apartment in Dallas. The mean monthly rent for 10 randomly sampled apartments is 980 dollars. Assume that monthly rents follow a normal distribution with standard deviation 280 dollars. Construct a 95% confidence interval for the mean monthly rent of a one-bedroom apartment.
- The standard error for the sample mean is 280/√10 = 88.54. Thus the 95% CI is [980 ± 1.96×88.54] ≈ [806, 1154].
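The rent calculation above can be reproduced in a few lines. This is a minimal sketch for the known-σ case only; the helper name `z_interval` is an assumption introduced for illustration, and the multiplier 1.96 is the 95% value used in the slides.

```python
import math


def z_interval(sample_mean, sigma, n, z_star=1.96):
    """Confidence interval for a population mean when sigma is known.

    Returns (lower, upper) = sample_mean -/+ z_star * sigma / sqrt(n).
    """
    standard_error = sigma / math.sqrt(n)
    margin = z_star * standard_error
    return sample_mean - margin, sample_mean + margin


# Dallas rents: mean 980 dollars from n = 10 apartments, sigma = 280 dollars.
lower, upper = z_interval(sample_mean=980, sigma=280, n=10)
print(f"standard error = {280 / math.sqrt(10):.2f}")
print(f"95% CI = [{lower:.1f}, {upper:.1f}]")
```

Only the sample mean, σ, n and the multiplier enter the calculation, so the same helper covers the other known-σ exercises by changing its arguments.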
- With 95% confidence we believe the mean price of one-bedroom apartments in Dallas lies in this interval.
- Does the above confidence interval mean that 95% of all rents should lie in this interval? No, it is an interval for the mean. If we want the interval where 95% of all rents should lie, it is [980 ± 1.96×√(88.54² + 280²)] ≈ [404, 1556]. You do not have to understand the calculation, but you will notice this interval is much wider. The reason is that it must capture 95% of all rents, which are extremely varied. The previous CI was just capturing the mean rent, based on the sample mean, which is much less varied.

Calculation practice

- Hypokalemia is diagnosed when the blood potassium level is below 3.5mEq/dl. The potassium in a blood sample varies from sample to sample and follows a normal distribution with standard deviation 0.2.
- A patient's potassium is measured over 4 days. The sample mean level over these 4 days is 3.7. Construct a 95% confidence interval for the mean potassium and discuss whether the patient is likely to be diagnosed with hypokalemia.
- The standard error for the sample mean is 0.2/√4 = 0.1. Thus the 95% confidence interval for the mean potassium level is [3.7 ± 1.96×0.1] = [3.504, 3.896]. This means that with 95% confidence we believe the mean lies in this interval.
- Since no value of 3.5 or less lies in this interval, with 95% confidence I can say that the patient does not have this condition.

Confidence interval misunderstandings

- Suppose 400 alumni were asked to rate the University of Okoboji's counseling services on a scale of 1 to 10. The sample mean was found to be 8.6, and it is known that the standard deviation is σ = 2. Ima Bitlost has done the analysis, but has made some mistakes.
- Ima computes the 95% CI for the mean satisfaction score as [8.6 ± 1.96×2]. What is her mistake?
- Ima has not taken into account that the sample mean has a much smaller standard deviation (standard error) than the population.
The standard error is 2/√400 = 0.1. Thus the correct CI is [8.6 ± 1.96×0.1] = [8.404, 8.796].
- After correcting her mistake, she states: "I am 95% confident that the sample mean lies in the interval [8.404, 8.796]." What is wrong with her statement?
- This is a meaningless statement: the sample mean certainly lies in this interval, because the interval is centered on it! It is the population mean that we are 95% confident lies there.
- She quickly realizes her mistake and instead states "the probability that the mean lies in the interval [8.404, 8.796] is 95%". What misinterpretation is she making now?
- By 95%, we mean that if we repeated the experiment many times over, about 95% of the intervals would contain the mean. For any given interval the mean is either in there or not. There is no probability attached to it. To avoid this issue, we say that we have 95% confidence that the mean lies in this interval.
- Finally, in her defense for using the normal distribution to determine the confidence coefficient (1.96) she says "Because the sample size is quite large, the population of alumni ratings will be close to normal". Explain to Ima her misunderstanding.
- The distribution of the population always stays the same, regardless of the sample size (in this case, it is clear that a variable that takes integer values between 1 and 10 cannot be normal). However, the distribution of the sample mean does get closer to normal as the sample size grows. With a sample size of 400, the distribution of the sample mean will be very close to normal.

Different levels of confidence

- There is no need to restrict ourselves to 95% confidence intervals.
- The level of confidence we use really depends on how much confidence we want. For example, you would expect a 99% confidence interval to be more likely to contain the mean than a 95% confidence interval.
- To construct a 99% confidence interval we use exactly the same prescription as for a 95% confidence interval. The only thing that changes is that 1.96 becomes 2.576 (in the z-tables, about 0.5% of the area lies below −2.576, so 99% of the time the sample mean will lie within 2.576 standard errors of the mean).
- A 99% CI for the mean one-bedroom apartment price is [980 ± 2.576×88.54]. The length of the interval is 2×2.576×88.54.
- A 90% CI for the mean one-bedroom apartment price is [980 ± 1.645×88.54]. The length of the interval is 2×1.645×88.54.
- What does a 100% confidence interval look like? In a 100% CI we are sure to find the mean, but this interval is so wide that it is not informative.

Sample size and length of the CI

- Let us return to the apartment example. Recall that the 95% confidence interval for the mean price is [980 ± 1.96×88.54] ≈ [806, 1154]. The length of this interval is 2×1.96×88.54 = 347.
- What happens to the length of the interval if I increase the sample size?
- Suppose I take an SRS of 100 apartments in Dallas and the sample mean based on this sample is 1000. What will the CI be?
- The standard error is 280/√100 = 28 (much smaller than when the sample size is 10), and the CI is [1000 ± 1.96×28] ≈ [945, 1055]. The length of this interval is 2×1.96×28 ≈ 110.
- What we observe is:
  - The length of the interval does not depend on the sample mean, which only determines where the interval is centered. It depends only on 1.96, the standard deviation and the sample size.
  - The length of the interval gets smaller as the sample size grows.
  - This suggests that if we want the interval to have a certain level of precision, we can choose the sample size accordingly.

Margin of Error

- Margin of error is the lingo used for the plus-and-minus part of the confidence interval.
- That is, if the confidence interval is [sample mean ± 1.96×σ/√n], the margin of error is 1.96×σ/√n.
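How the margin of error responds to the confidence level and the sample size can be explored with a short sketch. The function names `z_star`, `margin_of_error` and `sample_size_for_margin` are assumptions made for illustration, and the 50 dollar target margin is a hypothetical value; the multiplier is computed from the standard normal distribution in Python's `statistics` module rather than read from a printed z-table.

```python
import math
from statistics import NormalDist


def z_star(confidence):
    """Multiplier leaving (1 - confidence)/2 in each tail of N(0, 1)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def margin_of_error(sigma, n, confidence=0.95):
    """Half-width of the known-sigma interval: z* * sigma / sqrt(n)."""
    return z_star(confidence) * sigma / math.sqrt(n)


def sample_size_for_margin(sigma, margin, confidence=0.95):
    """Smallest n whose margin of error is at most `margin`."""
    return math.ceil((z_star(confidence) * sigma / margin) ** 2)


# Multipliers for the confidence levels discussed above.
for level in (0.90, 0.95, 0.99):
    print(level, round(z_star(level), 3))

# Dallas rents with sigma = 280 dollars: a larger sample gives a smaller margin.
print(round(margin_of_error(280, 10), 2), round(margin_of_error(280, 100), 2))

# Hypothetical planning question: n needed for a margin of at most 50 dollars.
print(sample_size_for_margin(sigma=280, margin=50))
```

Rounding up in `sample_size_for_margin` is a deliberate choice: rounding down would give a margin slightly larger than the target.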
- For example, in the apartment example the margin of error for the CI based on 10 apartments is 1.96×88.54. The margin of error for the CI based on 100 apartments is 1.96×28.
- The margin of error is, in some sense, a measure of precision. The smaller the margin of error, the more precisely we can pinpoint the true mean.
- Suppose we want the margin of error to be equal to some value. Then we can find the sample size that gives that margin of error: solve the equation MoE = 1.96×σ/√n for n (the margin of error and the standard deviation σ are given). See the next slide for an example.

Calculation practice: What sample size for a given margin of error?

Annual coffee sales: A marketing firm plans to study the annual sales in coffee shops. They want to estimate the mean annual sales to within $0.2 million, this time with 98% confidence. How many coffee shops should they sample to obtain a margin of error of at most $0.2 million with a confidence level of 98%? From a previous study they guess σ ≈ $1.03 million.

To use the formula we need the z-score that gives a 98% CI. Looking up the tables we see that z* = 2.326. Thus we solve the equation:

n ≈ (z*σ/m)² ≈ (2.326×1.03/0.2)² ≈ (11.98)² ≈ 143.5, which we round up to 144.

From the calculation, we see they need 144 observations for the margin of error to be at most $0.2 million.

Calculation practice

- In a study of bone turnover in young women with a medical condition, serum TRAP was measured in 31 subjects. The sample mean was 13.2 units per liter. Assume the standard deviation is known to be 6.5 U/l. Find the 80% CI for the mean serum level.
- Look up 10% in the z-tables; this gives z* = 1.28. The standard error for the sample mean is 6.5/√31 = 1.17. Altogether this gives the CI [13.2 ± 1.28×1.17] = [11.7, 14.7]. This means that with 80% confidence we believe the mean serum level for women with this medical condition lies in this interval.
By choosing such a low level of confidence our interval is quite narrow, but our confidence in this interval is relatively low.
- How large a sample should we choose so that the 80% CI for the mean has margin of error 1 U/l?
- This means solving 1.28×6.5/√n = 1, which gives n = (1.28×6.5/1)² ≈ 69.2, so we round up to n = 70.

A confidence interval for µ can be expressed two ways.

- x̄ ± m, where m is called the margin of error. Egg carton example: 64.17g ± 2.83g. We say "We conclude that µ is within 2.83g of 64.17g, with 95% confidence."
- Two endpoints of an interval: (x̄ − m) to (x̄ + m). Egg carton example: 61.34g to 67.00g. We say "We conclude that µ is between 61.34g and 67.00g, with 95% confidence."

Again, the confidence level C is the proportion of possible samples for which the conclusion is correct. That is, it is the proportion of possible samples for which the interval contains µ. (C usually is given in %.)

But there is an important issue to deal with.

- We do not know the value of σ any more than we know the value of µ.

When σ is unknown

In this case we can estimate the standard deviation from the data. The sample standard deviation s provides an estimate of the population standard deviation σ.

- When the sample size is large, the sample is likely to contain elements representative of the whole population. Then s is a good estimate of σ.
- But when the sample size is small, the sample contains only a few individuals. The data are unlikely to contain values in the tails, so s is likely to underestimate σ. Then s is a mediocre estimate of σ.

[Figure: the population distribution, a large sample, and a small sample.]

The z-transform with estimated standard deviation

- Simply replacing the true standard deviation with the estimated standard deviation can have severe consequences for the confidence interval if we do not correct for it.
- To see why, consider the z-transforms of the sample mean with known and estimated standard deviations:
  - (sample mean − µ)/(σ/√n)
  - (sample mean − µ)/(s/√n)
- In the first case, the z-transform is a standard normal. In the second case the estimated standard deviation adds extra variability into the `system'. In particular, because s can be smaller than σ, the transform can take larger values than we would expect for a standard normal.
- In the next few slides we show that when we estimate the standard deviation the z-transform is no longer a standard normal, but follows the so-called t-distribution.

How brewers saved statistics

- Just over 100 years ago, W.S. Gosset was a biometrician who worked for Guinness Brewery in Dublin, Ireland.
- Gosset realized that his inferences with small sample data seemed to be incorrect too often: his true confidence level was less than it was stated to be!
- He worked out the proper method that took into account substituting s for σ. But he had to publish under a pseudonym: Student.
- Gosset's theory is based on the distribution of the quantity t = (x̄ − µ)/(s/√n). This looks like the z-score for x̄, except that s replaces σ in the denominator.

Student's t distributions

Suppose that an SRS of size n is drawn from a Normal(µ, σ) population.

- When σ is known, the sampling distribution of z = (x̄ − µ)/(σ/√n) is Normal(0, 1).
- When σ is estimated by the sample standard deviation s, the sampling distribution of t = (x̄ − µ)/(s/√n) will be very close to normal if the sample size n is large. This is because for large n, s will be a very reliable estimator of σ.
- However, when n is not so large, the variability in s will have an impact on the distribution.
- It is clear that this impact depends on the sample size.
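The effect Gosset noticed can be seen in a small simulation. The sketch below is illustrative only: the function name `coverage_with_s`, the seed, and the choice of a standard normal population are assumptions. It keeps the normal multiplier 1.96 but plugs in the sample standard deviation s for σ, then records how often the resulting interval contains µ.

```python
import random
import statistics


def coverage_with_s(n, reps=5000, mu=0.0, sigma=1.0, z_star=1.96, seed=1):
    """Fraction of intervals xbar +/- z_star * s / sqrt(n) that contain mu,
    where s is the standard deviation estimated from each sample."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5   # estimated standard error
        if abs(xbar - mu) <= z_star * se:
            hits += 1
    return hits / reps


# Small samples fall well short of the nominal 95%; large samples do not.
for n in (3, 10, 100):
    print(n, coverage_with_s(n))
```

For small n the observed proportion falls noticeably below 0.95, which is the shortfall in "true confidence level" described above; for n = 100 it is close to 0.95 again.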
Student's t distributions

- When σ is estimated by the sample standard deviation s, the sampling distribution of t = (x̄ − µ)/(s/√n) depends on the sample size: it is a t distribution with n − 1 degrees of freedom.
- The degrees of freedom (df) is a measure of how well s estimates σ. The larger the degrees of freedom, the better σ is estimated.
- This means we need a new set of tables!

When n is very large, s is a very good estimate of σ, and the corresponding t distributions are very close to the normal distribution.

The t distributions become wider (thicker tailed) for smaller sample sizes, reflecting that s can be smaller than σ, so the corresponding t-transform is more likely to take extreme values than the z-transform.

Impact on confidence intervals

Suppose we want to construct the C% confidence interval for the mean. The standard deviation is unknown, so as well as estimating the mean we also estimate the standard deviation from the sample.

Practical use of t: t*

- t* is related to the chosen confidence level C.
- C is the area under Student's t curve between −t* and t*.
- The confidence interval is thus: x̄ ± t* × s/√n.

Example: For an 80% confidence level C, 80% of the area under Student's t curve is contained in the interval from −t* to t*.

Confidence level and the margin of error

The confidence level C determines the value of t* (in Table D). The margin of error also depends on t*: m = t* × s/√n.

- Higher confidence C implies a larger margin of error m (thus less precision in our estimates).
- A lower confidence level C produces a smaller margin of error m (thus better precision in our estimates).
- We find t* in the line of Table D for df = n − 1 and confidence level C.

Table D

When σ is unknown, we use a t distribution with n − 1 degrees of freedom (df). Table D shows the z-values and t-values corresponding to landmark P-values/confidence levels.
When the sample is very large, we use the normal distribution and the standardized z-value in place of t = (x̄ − µ)/(s/√n).

- Focus first on 2.5%. For each row of the table, the 2.5% corresponds to the area in each of the left and right tails of the t-distribution with that many degrees of freedom. Remember that a distribution gives the chance/likelihood of certain outcomes.
- Recall that for a normal distribution, the point that leaves 2.5% in each of the left and right tails is 1.96.
- If we go down the table, we see that as the degrees of freedom increase, the value corresponding to 2.5% goes from 12.71 (for df = 1) to a number that is very close to 1.96 for extremely large df.
- This means that for small n the variability in the standard deviation s makes the chance of the t-transform being extreme relatively large.
- However, as n grows, the estimator of the standard deviation improves, and the t-transform gets closer to a normal distribution.
- You will observe the same is true for other percentages. Take a look at 5% and 0.5% and look down the table.

Calculation practice (red wine 1)

It has been suggested that drinking red wine in moderation may protect against heart attacks. This is because red wine contains polyphenols, which act on blood cholesterol.

To see if moderate red wine consumption increases the average blood level of polyphenols, a group of nine randomly selected healthy men were assigned to drink half a bottle of red wine daily for two weeks. The percent changes in their blood polyphenol levels are presented here:

0.7  3.5  4.0  4.9  5.5  7.0  7.4  8.1  8.4

Sample average x̄ = 5.50
Sample standard deviation s = 2.517
Degrees of freedom df = n − 1 = 8

We will encounter two problems when doing the analysis. The first is that the sample size is not huge, so we have to hope that the distribution of the sample mean is close to normal. The second is that the standard deviation is unknown and has to be estimated from the data.

- What is the 95% confidence interval for the average percent change?
- First, we determine what t* is. The degrees of freedom are df = n − 1 = 8 and C = 95%. From Table D we get t* = 2.306.
- The margin of error m is: m = t* × s/√n = 2.306 × 2.517/√9 ≈ 1.93. So the 95% confidence interval is 5.50 ± 1.93, or 3.57 to 7.43.
- We can say "With 95% confidence, the mean percent increase is between 3.57% and 7.43%."
- What if we want a 99% confidence interval instead?
- For C = 99% and df = 8, we find t* = 3.355. Thus m = 3.355 × 2.517/√9 ≈ 2.81.
- Now, with 99% confidence, we can only conclude that the mean is between 2.69 and 8.31. (A big price to pay for the extra confidence.)

Calculation practice (red wine 2)

Let us return to the same study, but this time we increase the sample size to 15 men. The data are now:

0.7, 3.5, 4, 4.9, 5.5, 7, 7.4, 8.1, 8.4, 3.2, 0.8, 4.3, −0.2, −0.6, 7.5

The sample mean in this case is 4.3 and the sample standard deviation is 3.06. Since the sample size has increased, it is likely that the sample standard deviation is a more reliable estimator of the true standard deviation. The number of degrees of freedom is 14. Just as in the previous example we can construct a 95% confidence interval, but now we use 14 df instead of 8 df. The t-tables give t* = 2.145 for 14 df, so the interval is 4.3 ± 2.145 × 3.06/√15 ≈ 4.3 ± 1.69.

More calculation practice

- Let us return to the example of prices of apartments in Dallas. 10 apartments are randomly sampled. The sample mean and the sample standard deviation based on this sample are 980 dollars and 250 dollars (both are estimates based on a sample of size ten). Construct a 95% confidence interval for the mean.
- The standard error is 250/√10 = 79.
- Looking up the t-tables at 2.5% and 9 degrees of freedom gives 2.262.
- The 95% confidence interval for the mean is [980 ± 2.262×79] = [801, 1159].
- Suppose we want to know whether the price of apartments has increased since last year, when the mean price was 850 dollars.
- We see that 850 dollars, as well as larger values, is contained in this interval.
This means the mean could be 850 dollars or higher. Therefore, given the sample, it is unclear whether the mean price of apartments has increased since last year or not.

Example: comparing z and t-values

- We want to calculate a 99% CI for the mean weight of a newborn calf. To do this, upload the calf data into Statcrunch.
- Go to Stat. From here you have two options. If we treat the standard deviation of calf weights as known (not random), then we can use the z-statistic; otherwise we need to use the t-statistic.
- Suppose we choose the t-statistic option -> one-sample -> with data -> then choose the variable of weights at birth (wt 0). To get the 99% CI we need to select 99% on the second page of the options. We get the 99% interval (using a t-distribution with 43 degrees of freedom) [90.05, 96.37] pounds. This means that with 99% confidence we believe the mean weight of newborn calves lies in this interval.
- To see how well the normal distribution works, we do the same, but choose the z-statistic option. This gives the 99% confidence interval [90.19, 96.23]. Notice that it is slightly narrower, because it does not take into account the underestimation of the standard deviation.

More calculation practice

- Let us return to the M&M data. Suppose we want to calculate a 99% confidence interval for the mean number of M&Ms in plain, peanut butter and peanut M&Ms. These can be calculated using the summary statistics output:

Summary statistics for Total, grouped by Type:

Type  n   Mean       Variance  Std. Dev.  Std. Err.   Median  Range  Min  Max  Q1  Q3
M     84  17.297619  8.259753  2.8739786  0.3135768   18      14     7    21   17  19
P     40  8.675      9.814744  3.1328492  0.49534693  8       15     6    21   7   8
PB    46  10.913043  3.325604  1.8236238  0.26887867  11      10     8    18   10  11

Using this output we can calculate the confidence intervals for the mean number of M&Ms in each type. Do this.
Statcrunch will also give the CIs

- Go to Stats -> t-statistics -> one-sample -> with data -> select the column you want to analyse (choose the Group by option if you want it grouped). On the next page select confidence interval and the level you want.

Type  Sample mean  Std. err.  DF  L. limit  U. limit
M     17.2         0.31       83  16.4      18.12
P     8.6          0.49       39  7.33      10.01
PB    10.9         0.268      45  10.18     11.63

Looking at the intervals, do you think that the mean number of M&Ms in a plain bag and in a peanut bag could be the same? What about the mean number in peanut and peanut butter bags? Later on we shall make a formal test of these questions.

Calculation practice: coffee shop sales

A marketing firm randomly samples 45 coffee shops and determines their annual sales. The sample has an average of $2.67 million and a standard deviation of $1.03 million. What can we say with 90% confidence about the mean annual sales for the population of all coffee shops?

- The degrees of freedom is 45 − 1 = 44.
- For 90% confidence, we find t* = 1.680.
- The margin of error is t* × s/√n = 1.680×1.03/√45 = 0.258.
- So the interval for the true mean is 2.67 ± 0.26.
- "We conclude that the mean annual sales of all coffee shops is between $2.41 million and $2.93 million, with 90% confidence."

Summary of confidence interval for µ

- The confidence interval for a population mean µ is x̄ ± t* × s/√n.
- t* is obtained from Student's t distribution using n − 1 degrees of freedom (Table D in the textbook). t* is the value such that the confidence level C is the area between −t* and t*.
- Confidence is the proportion of samples that lead to a correct conclusion (for a specific method of inference).
- The investigator chooses the confidence level C.
- Tradeoff: more confidence means a bigger margin of error and wider intervals.
- The degrees of freedom is associated with s, the estimate for σ.
- The margin of error t* × s/√n also depends on the sample size: larger samples are better.

Sample size and experimental design

An investigator may need a certain margin of error m (e.g., in a marketing survey, in a drug trial, etc.). So plan ahead what sample size to use to achieve that margin of error. You will have to guess the value of σ, perhaps from historical data, and you will not know the degrees of freedom at first. But you can do a rough calculation:

m ≈ z* × σ/√n  ⇔  n ≈ (z*σ/m)².

This is done in the planning stages of the study. It is not an inference or conclusion and there are no data yet.

Remember, too, that there typically are costs and constraints associated with large samples. Economy and feasibility are factors that will tend to keep sample sizes smaller.

Interpretation of confidence, again

- The confidence level C is the proportion of all possible random samples (of size n) that will give results leading to a correct conclusion, for a specific method.
- In other words, if many random samples were obtained and confidence intervals were constructed from their data with C = 95%, then 95% of the intervals would contain the true parameter value.
- In the same way, if an investigator always uses C = 95%, then 95% of the confidence intervals he constructs will contain the parameter value being estimated.
- But he never knows which ones do!
- Changing the method (such as changing the value of t*) will change the confidence level.
- Once computed, any individual confidence interval either will or will not contain the true population parameter value. It is not random.
- It is not correct to say C is the probability that the true value falls in the particular interval you have computed.

Cautions about using x̄ ± t* × s/√n

- This formula is only for inference about µ, the population mean. Different formulas are used for inference about other parameters.
- The data must be a simple random sample from the population.
- The formula is not quite correct for other sampling designs.
  (But see a statistician to get the right inference method.)
- Confidence intervals based on t* are not resistant to outliers.
- If n is small and the population is not normal, the true confidence level could be smaller than C. (Usually n ≥ 30 suffices unless the data are highly skewed.)
- This inference cannot rescue sampling bias, badly produced data or computational errors.
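The one-sample t interval x̄ ± t* × s/√n used throughout these slides can be sketched in Python as follows. The function names `t_interval` and `t_interval_from_data` are assumptions made for illustration. Python's standard library has no t-table, so t* is passed in by the user after looking it up (for example in Table D) for df = n − 1 and the chosen confidence level.

```python
import math
import statistics


def t_interval(sample_mean, s, n, t_star):
    """One-sample t interval: sample_mean -/+ t_star * s / sqrt(n).

    t_star must be looked up for df = n - 1 and the chosen confidence level.
    """
    margin = t_star * s / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


def t_interval_from_data(data, t_star):
    """Same interval, estimating the mean and s from raw observations."""
    return t_interval(statistics.fmean(data), statistics.stdev(data),
                      len(data), t_star)


# Red wine data from the slides: n = 9, so df = 8 and t* = 2.306 for 95%.
red_wine = [0.7, 3.5, 4.0, 4.9, 5.5, 7.0, 7.4, 8.1, 8.4]
low, high = t_interval_from_data(red_wine, t_star=2.306)
print(f"95% CI: {low:.2f} to {high:.2f}")

# Coffee shops: mean 2.67, s = 1.03, n = 45, t* = 1.680 for 90% confidence.
low, high = t_interval(2.67, 1.03, 45, t_star=1.680)
print(f"90% CI: {low:.2f} to {high:.2f}")
```

`statistics.stdev` divides by n − 1, which matches the sample standard deviation s used in the slides. The cautions above still apply: the code computes the interval, but it cannot check that the data are a simple random sample.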
