CALCULATION OF CONFIDENCE INTERVALS FOR POINT ESTIMATES AND CHANGE

1 Introduction and Assumptions

We do sample surveys to make estimates about a population of interest. We can say how good our estimates are by applying the principles of the central limit theorem, which allow us to treat our estimate as one observation from the approximately normal distribution of estimates that would arise from many repeated surveys. A confidence interval is a measure of uncertainty around the estimate from our sample survey, telling us the range of values within which the true (population) value lies with a given degree of confidence. In other words, the confidence interval shows us the range within which 95% (or 99%, or 99.9%) of sample estimates could be expected to lie if the survey were repeated. This helps us to decide whether a sample estimate is reliable enough for our purposes. A 95% confidence interval is the most commonly used.

In these examples, we will be focussing on calculating confidence intervals for percentages or proportions. Initially, we will assume that:

- The confidence intervals calculated are 95%.
- The data is collected through a simple random sample (SRS) and there are no design effects.
- The sample size is large (n > 30), so the approximation to a normal distribution can be used.
- The populations we are dealing with are large enough to be treated as infinite.

Don't worry if you don't know what all these assumptions mean at the moment, as most will be explored later in this paper.

2 Calculating a confidence interval for a single percentage

In order to calculate a confidence interval, we need to know the following information:

- The point estimate, which is the percentage or proportion estimated from our sample (the sample mean);
- The standard error, which measures how far our estimate is likely to be from the mean estimate that would be obtained from many (theoretical) repeats of the survey. This is calculated as:

s.e. = √(p(1-p)/n)

where:
p = the point estimate
n = the sample size

The confidence interval (CI) is calculated as:

CI = p ± (1.96 * s.e.)

The value 1.96 specifies that this is a 95% confidence interval. To calculate other levels of confidence, please see section 6 below.

EXAMPLE 1

A survey of 205 adults estimates that 43% have caring responsibilities. So,

s.e. = √((0.43*0.57)/205) = 0.0346

CI = 0.43 ± (1.96*s.e.) = 0.43 ± 0.07

Therefore the 95% confidence interval for this estimate is (36%, 50%), i.e. we are 95% confident that the true percentage of adults with caring responsibilities is between 36% and 50%.

3 Effect of changes in sample size

Increasing the sample size will narrow confidence intervals and give more precise estimates, but the returns diminish: beyond a certain sample size the gain in precision is no longer worth the extra cost of collection. This is illustrated in the figure below.

[Figure: 95% confidence intervals (±, in percentage points) on an estimate of 50%, plotted for sample sizes from 100 to 2,000. The curve falls steeply at first and then flattens.]

So, increasing the sample size from 100 to 500 reduces the CIs from ±9.8 to ±4.4 percentage points, whereas increasing the sample size further to 1,000 only reduces the CIs to ±3.1. Even when we repeatedly double the sample size, there is a clear tail-off in the benefits gained, as shown in the table below.

Sample size             100  200  400  800  1600  3200  6400  12800  25600  51200
95% CI on 50% estimate  9.8  6.9  4.9  3.5  2.5   1.7   1.2   0.9    0.6    0.4
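As a quick illustration of the formulas above, here is a minimal Python sketch (the function name is purely illustrative) that reproduces Example 1 and the tail-off in precision as the sample size grows:

    import math

    def proportion_ci(p, n, z=1.96):
        # standard error of a proportion from a simple random sample
        se = math.sqrt(p * (1 - p) / n)
        return p - z * se, p + z * se

    # Example 1: 43% of 205 adults have caring responsibilities
    low, high = proportion_ci(0.43, 205)
    print(f"95% CI: ({low:.0%}, {high:.0%})")      # (36%, 50%)

    # Section 3: the half-width shrinks with the square root of n,
    # so quadrupling the sample size only halves the interval
    for n in (100, 400, 1600, 6400):
        half_width = 1.96 * math.sqrt(0.25 / n)
        print(n, f"+/- {half_width:.1%}")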
Sample sizes for many Government surveys need to be large not because the Scotland-level estimates require it, but to provide a minimum sample size for the smaller local authorities and other sub-groups of interest. The same issue arises when a sub-group of interest cannot be identified on the sampling frame: it is then not possible to stratify by it, and it cannot be over-sampled at the time of drawing the sample. This may mean a large overall sample is required to get a sufficient number in a small sub-group.

4 Using confidence intervals to assess statistical significance and levels of change required

An important use of confidence intervals in Government is to make a statement about whether there is a significant difference between two numbers. The question can be posed in two slightly different ways:

- Looking backward: Has there been a significant increase between the last two data points / is there a significant difference between two sub-groups?
- Looking forward: What level of change would be required between two data points / sub-groups to say there had been a significant change?

Looking Backward

Quite often in Government, the information we receive is simply the estimate and the relevant CIs. We need to be able to use this information as a rule of thumb to assess whether we think there has been a significant change. If the intervals do not overlap then there is a significant difference between the two points, but if they do overlap it does not necessarily mean there is no significant difference. This is best illustrated with an example.

EXAMPLE 2

Imagine this was the information we had been given:

Year            2006      2007
Point estimate  a%        b%
95% CI          ±3 p.p.   ±4 p.p.

(p.p. = percentage points)

Policy colleagues wish to know whether there has been a real change between 2006 and 2007. Firstly, we can see if the following statement is true:

1. The difference between a and b is greater than or equal to the sum of the magnitudes of the two intervals (in this case 7). If this is the case, there is a significant difference between the two points.

So, for example, if a = 34% and b = 42%, then b - a = 8 p.p. This is greater than 7, so we conclude that there is a significant difference between these points.

If the first statement is not true, we can then look to see if the second statement is true:

2. The difference between a and b is smaller than the larger of the two intervals (in this case 4). If this is the case, there is not a significant difference between the two points.

So, for example, if a = 34% and b = 37%, then b - a = 3 p.p. This is less than 7, so the first statement is not true. It is also less than 4, which means the second statement is true, and so we conclude that there is not a significant difference between these points.

If neither statement 1 nor statement 2 is true, we can see if the third statement is true:

3. The difference between a and b is greater than or equal to the square root of the sum of the squares of the two magnitudes (in this case √(3² + 4²) = √(9 + 16) = 5). If this is the case, there is a significant difference between the two points.

So, for example, if a = 34% and b = 40%, then b - a = 6 p.p. This is less than 7, so the first statement is not true. It is more than 4, so the second statement is not true. It is greater than 5, so we conclude that there is a significant difference between these points.

If none of these statements is true, then there is not a significant difference between the points. The three rules are pulled together in the short sketch below.
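A minimal Python sketch of this rule of thumb (the function name is illustrative; all values are in percentage points):

    def significant_difference(a, b, ci_a, ci_b):
        diff = abs(b - a)
        if diff >= ci_a + ci_b:              # statement 1: intervals do not overlap
            return True
        if diff < max(ci_a, ci_b):           # statement 2: difference inside the larger interval
            return False
        # statement 3: compare against the approximate CI of the difference
        return diff >= (ci_a ** 2 + ci_b ** 2) ** 0.5

    print(significant_difference(34, 42, 3, 4))   # True  (statement 1)
    print(significant_difference(34, 37, 3, 4))   # False (statement 2)
    print(significant_difference(34, 40, 3, 4))   # True  (statement 3)

Note that, for independent estimates, statement 3 on its own is the full test; statements 1 and 2 are quick screens that settle the question without the square root when the answer is already clear.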
EXAMPLE 3

The Scottish Household Survey collects information about smoking levels in adults. Suppose we wish to know whether there has been a change in smoking levels between 1999 and 2007.

Year           1999        2007
Smoking level  30.4%       24.7%
95% CI         ±0.9 p.p.   ±1.0 p.p.

Is the difference between 1999 and 2007 greater than the sum of the confidence intervals? In this case, the difference is 5.7 percentage points. The sum of the confidence intervals is 1.9 percentage points. The answer to our question is Yes, so we conclude that there is a significant difference between 1999 and 2007.

Suppose instead we want to know whether smoking levels in women and men are different.

        Smoking level  95% CI
Male    26.0%          ±1.4 p.p.
Female  24.0%          ±1.3 p.p.

Is the difference between men and women bigger than the sum of the confidence intervals? The difference is 2.0 percentage points. The sum of the confidence intervals is 2.7 percentage points. The answer to our question is No.

Is the difference between men and women smaller than the largest confidence interval? The difference is 2.0 percentage points. The largest confidence interval is 1.4 percentage points. The answer to our question is No.

So, we ask our final question: is the difference between men and women greater than or equal to the square root of the sum of the squares of the two confidence intervals? The difference is 2.0 percentage points. We need to calculate:

√(1.4² + 1.3²) = 1.9

As 2.0 is larger than 1.9, the answer to our question is Yes. We therefore conclude that there is a significant difference between the smoking rates of men and women.

Word of caution - multiple tests

When carrying out any kind of hypothesis testing, as we are doing with the confidence intervals in these examples, we need to be aware of the probabilities of false positives and false negatives. These are known as Type I errors (concluding that there is a difference when in fact there is none, or a false positive) and Type II errors (concluding there is no difference when in fact there is, or a false negative).

The chance of finding a false positive is inflated greatly the more tests you do. Intuitively, this makes sense: if you do 20 different comparisons at the 5% level, you would expect one of them to be a Type I error purely by chance. So, be wary of (e.g.) comparing several different years in the same time series. Strictly, the test that should be applied here is an ANOVA (as you will usually be testing whether any one year is different from the others); if you do want to compare all the years with each other, then a Bonferroni adjustment must be applied. A Bonferroni adjustment is a simple method of ensuring that the tests are still significant to the level we originally intended, i.e. a way of keeping the chance of a Type I error across the comparisons as a whole at the 5% level.

EXAMPLE 4

Suppose 4 samples are to be compared, and the maximum overall probability of a Type I error (or α) we would like is 0.05. Our 4 samples are in different years: 2003, 2004, 2005 and 2006. The total number of pairwise comparisons here is 6 (every pair of years compared once). Given our overall α, we need to calculate the α level for each of the 6 comparisons to ensure that the overall α is 0.05. The Bonferroni adjustment divides the overall α by the number of comparisons, so in this case:

α for each comparison = 0.05/6 = 0.0083

Therefore the CIs we would use to make these 6 comparisons, to ensure an overall Type I error rate of 0.05, are 99.17% confidence intervals. Or, roughly, to make 6 comparisons while keeping the overall error rate at 0.05, 99% CIs have to be used for each of the 6 comparisons.
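The corresponding z-score can be computed directly, as in this minimal sketch (the function name is illustrative; it uses only the Python standard library):

    from math import comb
    from statistics import NormalDist

    def bonferroni_z(n_groups, overall_alpha=0.05):
        # every pair of groups is compared once: 4 years -> comb(4, 2) = 6 tests
        n_comparisons = comb(n_groups, 2)
        alpha_per_test = overall_alpha / n_comparisons
        # two-sided z-score for the adjusted significance level
        return NormalDist().inv_cdf(1 - alpha_per_test / 2)

    print(bonferroni_z(4))   # about 2.64, compared with 1.96 for a single test

So the Bonferroni-adjusted intervals in Example 4 are roughly a third wider than ordinary 95% intervals.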
Looking Forward

If the data are to be collected in the same way as before (i.e. with the same design and sample size) and under the same assumptions, this is a reasonably simple figure to work out. If this is the case, and we have a point estimate plus the CI, we can answer the question: how much change in the next period would be considered a statistically significant change? This will not give an absolutely precise result, but it is usually good enough for our purposes (i.e. the result will be the same to one decimal place). This is best illustrated with an example.

EXAMPLE 5

Basically, we use the rules given in the previous section. Say we have an estimate of 50% from a sample size of 600. The CI for this estimate is therefore ±4 p.p. To work out the change required, assuming that the next time point will have a similar design, we simply take the square root of the sum of the squares:

Change required = √(4² + 4²) = 4√2 ≈ 5.7 p.p.

In other words, we simply have to multiply the CI by √2 to get the change required.

If we know that the design or sample size is likely to change in the next year, the question is slightly more difficult to answer. However, we can recalculate what the CI would have been for our current data point under these new assumptions, and use this, along with our real CI, to calculate the level of change required. Clearly this requires us to have information about the sample sizes and the design of the data.

5 Finite Population Correction

The size of the population we are interested in does not normally affect confidence intervals. This means that a sample of 10,000 households in Scotland (representing a population of about 2.5 million) will have the same confidence intervals as a sample of 10,000 households in England. For many calculations, especially if we are using a dataset which represents households or adults in Scotland, the sample is such a small proportion of the population that the population can be treated as if it were infinite. It is the absolute size of the sample that is important.

Only when the sample represents over 5% of the population does the assumption of an infinite population need to be dropped. In this case, a finite population correction needs to be applied. This is included in the calculation of the standard error, and will reduce the s.e. by an amount that depends on how large your sample is in relation to the population. The correction is:

F = √((N-n)/(N-1))

where:
F = the finite population correction
N = the population size
n = the sample size

This is then incorporated into the calculation of the standard error:

s.e. = F * √(p(1-p)/n)

You will see from the equation that F tends towards 1 as N becomes large relative to n, which is why we do not have to worry about this correction when populations are large but samples are small in relation to N.

EXAMPLE 6

Suppose there is a need to survey a sample of Scottish Government employees about their method of travel to work. There are 5,000 SG employees, and suppose a simple random sample of 2,000 is drawn. We discover that 40% of respondents travel to work by bus. If we were assuming an infinite population, the 95% confidence interval around the estimate would be ±2.1 percentage points. However, the finite population correction is:

F = √((5000-2000)/4999) = √0.6 = 0.77

Therefore, the 95% confidence interval is ±(0.77*2.1) = ±1.7 p.p.
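A minimal sketch of Example 6 in Python (the function name is illustrative):

    import math

    def proportion_ci_fpc(p, n, N, z=1.96):
        # finite population correction: shrinks the s.e. when n is a large share of N
        F = math.sqrt((N - n) / (N - 1))
        se = F * math.sqrt(p * (1 - p) / n)
        return p - z * se, p + z * se

    # Example 6: 40% of a sample of 2,000 from a population of 5,000 employees
    low, high = proportion_ci_fpc(0.40, 2000, 5000)
    print(f"({low:.1%}, {high:.1%})")   # about (38.3%, 41.7%)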
6 Calculating confidence intervals which are not 95%

We have been concentrating on calculating 95% confidence intervals, as this is the convention most commonly used. The z-score 1.96 is what specifies that an interval is a 95% confidence interval; another z-score can be used to specify another level of confidence. Some z-scores are shown below.

Confidence level  80%   90%   95%   99%   99.9%
z-score           1.28  1.65  1.96  2.58  3.29

7 Calculating confidence intervals when the data are not proportions

This paper has concentrated on calculating confidence intervals for percentages/proportions, but clearly this can be generalised to incorporate the variance measured from any data collection. Using s² to represent the estimated variance, the equation for the s.e. becomes:

s.e. = √(s²/n)

In the case of a proportion, s² = p(1-p).

8 In the real world - how to incorporate survey design into confidence intervals

We have assumed in this paper that the data come from a simple random sample (SRS). In practice, however, sampling techniques such as stratification, systematic random sampling and clustering are commonly used, either to improve the spread of the sample or to save money. These are explained in more detail in the Sampling section of the Methodology Glossary: http://www.scotland.gov.uk/Topics/Statistics/About/Methodology/sampling.

Features of the design such as stratification and clustering can affect the standard errors that are used to calculate confidence intervals. These are known as complex standard errors, and are calculated through a number of iterative techniques. They can then be expressed as a design factor: a multiplier which states by how much the error is increased or decreased, given the design you have used, compared to an SRS design.

Please note that there is a different design factor for every level of every variable. What is usually supplied with a survey is an average design factor to use in the calculation of confidence intervals, but remember that by definition it will either over- or underestimate the effect that the design has on your particular estimate. So be careful to consider what you are estimating and how the design could affect it. For example, the Scottish Household Survey had an average design factor of 1.2, but the design factor for accommodation type (which is more likely to be clustered in geographical areas, and therefore more affected by the clustering in the sample) will be larger.

The design factor is the more useful quantity for adjusting standard errors. The design effect, which is the square of the design factor, tells you how much information you have gained or lost by using a complex survey design rather than a simple random sample. A design effect of 2 means that you would need a complex survey twice the size of a simple random sample to get the same amount of information, whereas a design effect of 0.5 means a complex survey of only half the size would give the precision of a simple random sample. Design effects of 2 are quite common; those of 0.5 are rare.

The design factor is incorporated into the CI equation as shown below:

CI = p ± D*(1.96 * s.e.)

where D = the design factor.
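A minimal sketch with the design factor included (the average design factor of 1.2 is the SHS figure quoted above; the estimate and sample size are purely illustrative):

    import math

    def proportion_ci_design(p, n, D=1.0, z=1.96):
        # D = 1 reproduces the simple random sample case
        se = math.sqrt(p * (1 - p) / n)
        half_width = D * z * se
        return p - half_width, p + half_width

    # e.g. a 25% estimate from 1,000 respondents, design factor 1.2
    print(proportion_ci_design(0.25, 1000, D=1.2))

Note how a design factor of 1.2 simply widens the interval by 20% relative to an SRS of the same size.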
If you would like to know more about design effects and other issues related to analysing complex survey data, visit the Practical Exemplars of Analysis of Surveys (PEAS) website, a useful resource created by Napier University: http://www2.napier.ac.uk/depts/fhls/peas/index.htm

9 Summary Equation

The general equation for a confidence interval is:

P ± Z*F*D*√(s²/n)

where:
P = the sample estimate. In many cases, from national government sources, this will be the proportion of people with a certain characteristic.
Z = the z-score that we use, assuming normality, to specify the level of confidence. For 95%, this will be 1.96.
F = the finite population correction. We do not have to worry about this in general until our sample is about 5% of the population or more.
D = a design factor, which takes account of the clustering, stratification etc. in the survey design.
s² = the estimated variance from the sample.
n = the sample size.
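Finally, a minimal sketch pulling the whole summary equation together (the function name is illustrative; omit N to assume an infinite population):

    import math

    def general_ci(P, n, s2, z=1.96, N=None, D=1.0):
        # general CI: P +/- z * F * D * sqrt(s2 / n)
        F = math.sqrt((N - n) / (N - 1)) if N is not None else 1.0
        half_width = z * F * D * math.sqrt(s2 / n)
        return P - half_width, P + half_width

    # Example 1 again: a proportion of 0.43 from 205 SRS respondents, s2 = p(1-p)
    print(general_ci(0.43, 205, 0.43 * 0.57))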