Confidence Interval Coverage Probabilities

INTRODUCTION

In statistics, sampling plays a very important role because it allows one to make inferences about population parameters from the data collected. When the sample is drawn at random and without bias, its properties correspond closely to those of the population. As a result, there is no need to collect data for the whole population, which is time-consuming, costly and difficult to manage. Instead, the sample is used to estimate the population parameter of interest. One way of estimating a population parameter is with a confidence interval obtained from a random sample. However, several factors affect the precision of a confidence interval: the sample size, the confidence level and the underlying population distribution. Therefore, this paper investigates the effect of sample size and underlying population distribution on the coverage probabilities of confidence intervals for the population mean, based on the normal distribution and the chi-squared distribution.

Sample size affects the width of a confidence interval: when the sample size increases and the confidence level is held constant, the width of the confidence interval decreases, which indicates that its precision increases. Put simply, a larger sample gives more information about the population, so the estimate is tighter in range. Similarly, a graph of the underlying population distribution gives an indication of properties such as the standard deviation of the data. The more spread out the population distribution, the greater the standard deviation. Consequently, the confidence interval will be wider, as there is less consistency in the data.
METHODOLOGY

In carrying out this activity, Microsoft Excel is used to generate 500 observations, and only one example is shown for parts with many repetitions in order to keep the paper neat and easy to read. For a normally distributed population, the 500 observations are generated using the function =NORMINV(RAND(), mean, standard deviation). To produce a line graph of the population distribution, all observations are arranged in one column, labelled x-axis, while another column holds the probability of each value on the x-axis. The probability is calculated using the function =NORMDIST(A, mean, standard deviation, FALSE), where A is the cell reference of an observation and "FALSE" is used because the value required is the probability density rather than the cumulative probability. Then the Insert tab is selected, followed by XY Scatter, then Scatter, and Scatter with Smooth Lines is chosen to produce the line graph of the population distribution. This is repeated for two normal distributions, one with mean=10, variance=2 and another with mean=10, variance=4.

For the Chi-squared distribution, a different function is used to generate the observations: =CHIINV(RAND(), v), where v is the degree of freedom. As with the normal distribution, an x-axis column and a y-axis column are generated. In the y-axis column, the probability of each value is found using =CHIDIST(A, v). The line graph of the Chi-squared distribution is then plotted in the same way as for the normal distribution. This is repeated for 1, 10 and 30 degrees of freedom.

Next, 10 values are selected at random from the 500 observations generated. This can be done using the function =INDEX(B3:B10, RANDBETWEEN(1, ROWS(B3:B10)), 1), where B3:B10 is the range of cells containing the generated observations.
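The generation and selection steps above can be sketched outside Excel. The following is a minimal Python illustration, not the procedure used in this paper: the function names are hypothetical, the standard library stands in for NORMINV, CHIINV and INDEX/RANDBETWEEN, and the sample is assumed to be drawn with replacement because each RANDBETWEEN call is independent.

```python
import random

def normal_observations(mean, variance, count=500, rng=random):
    """Draw `count` values from a normal population (hypothetical helper)."""
    std_dev = variance ** 0.5  # gauss() expects a standard deviation, not a variance
    return [rng.gauss(mean, std_dev) for _ in range(count)]

def chi_squared_observations(df, count=500, rng=random):
    """Draw `count` values from a chi-squared population with `df` degrees of freedom."""
    # A chi-squared variable with df degrees of freedom is Gamma(shape=df/2, scale=2).
    return [rng.gammavariate(df / 2, 2) for _ in range(count)]

def draw_sample(observations, n, rng=random):
    """Pick n observations at random, with replacement (assumption stated above)."""
    return rng.choices(observations, k=n)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
normal_pop = normal_observations(10, 2, rng=rng)
chi_pop = chi_squared_observations(10, rng=rng)
sample = draw_sample(normal_pop, 10, rng=rng)
print(len(normal_pop), len(chi_pop), len(sample))
```

Passing the variance and converting it inside the helper keeps the call sites in the same terms as the paper (mean=10, variance=2 or 4).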
After the 10 random values, x, are selected, they are used to calculate the sample mean using =AVERAGE(B3:B12), where B3:B12 is the range of cells containing the sample data. The standard deviation is calculated using =STDEV.P(B3:B12). The alpha value is 0.05 and the sample size is n=10, so the sampling error can be calculated using =CONFIDENCE.NORM(alpha value, standard deviation, sample size). The lower boundary is the sample mean minus the sampling error, while the upper boundary is the sample mean plus the sampling error. Hence, the confidence interval at the 95% confidence level is obtained. This is repeated 200 times, so that 200 confidence intervals at the 95% level are generated from 200 samples of size n=10 for each of the 5 distributions. The procedure is then repeated using n=30 and n=50.

The proportion of confidence intervals that contain the population mean is then determined by dividing the number of confidence intervals containing the population mean by the number of intervals, which is 200. Whether an interval contains the population mean is determined using the function =COUNTIFS(G5, "<=population mean", H5, ">=population mean"), where G5 and H5 are the lower boundary and upper boundary respectively. An outcome of 1 means the population mean lies within the interval, while an outcome of 0 means it does not. The coverage probability of the confidence interval for the population mean is then calculated using the formula:

$$\text{coverage probability} = \frac{\text{number of confidence intervals containing the population mean}}{200}$$

The coverage probability is calculated for each of the distributions with n=10, n=30 and n=50. Lastly, the coverage probabilities for each population and sample size are observed and discussed.
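The interval construction and counting described above can be sketched as a short simulation. This is an illustrative Python version under stated assumptions, not the spreadsheet used here: the function names are hypothetical, samples are drawn directly from the distribution rather than from a fixed list of 500 observations, and `pstdev` with a normal quantile is assumed to mirror STDEV.P with CONFIDENCE.NORM.

```python
import random
from statistics import NormalDist, mean, pstdev

def confidence_interval(sample, alpha=0.05):
    """Normal-based interval: mean +/- z * sd / sqrt(n), as CONFIDENCE.NORM computes."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    centre = mean(sample)
    margin = z * pstdev(sample) / len(sample) ** 0.5  # population sd, like STDEV.P
    return centre - margin, centre + margin

def coverage_probability(draw, true_mean, n, repetitions=200, alpha=0.05):
    """Share of `repetitions` intervals that contain the population mean."""
    hits = 0
    for _ in range(repetitions):
        lower, upper = confidence_interval(draw(n), alpha)
        if lower <= true_mean <= upper:  # the COUNTIFS outcome of 1
            hits += 1
    return hits / repetitions

rng = random.Random(1)  # fixed seed so the sketch is repeatable

def draw_normal(n):
    # Normal population with mean=10, variance=2
    return [rng.gauss(10, 2 ** 0.5) for _ in range(n)]

def draw_chi1(n):
    # Chi-squared with 1 degree of freedom (population mean = 1)
    return [rng.gammavariate(0.5, 2) for _ in range(n)]

for n in (10, 30, 50):
    print(n, coverage_probability(draw_normal, 10, n),
          coverage_probability(draw_chi1, 1, n))
```

The values printed depend on the seed and will not reproduce the table in this paper; the sketch only shows the mechanics of counting covered intervals.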
RESULTS AND DISCUSSION

The coverage probability of the confidence interval for the population mean is shown in the table below:

Population distribution                  n    Standard error   Sampling error   Confidence interval
Normal, mean=10, variance=2              10   0.0197           0.0386           (0.8764, 0.9537)
                                         30   0.0192           0.0376           (0.8824, 0.9576)
                                         50   0.0180           0.0353           (0.8946, 0.9654)
Normal, mean=10, variance=4              10   0.0180           0.0353           (0.8946, 0.9654)
                                         30   0.0139           0.0272           (0.9328, 0.9872)
                                         50   0.0202           0.0396           (0.8703, 0.9497)
Chi-squared, 1 degree of freedom         10   0.0226           0.0443           (0.8408, 0.9292)
                                         30   0.0174           0.0341           (0.9008, 0.9692)
                                         50   0.0256           0.0502           (0.7948, 0.8952)
Chi-squared, 10 degrees of freedom       10   0.0249           0.0482           (0.8062, 0.9038)
                                         30   0.0180           0.0353           (0.8946, 0.9654)
                                         50   0.0192           0.0376           (0.8824, 0.9576)
Chi-squared, 30 degrees of freedom       10   0.0234           0.0459           (0.8292, 0.9208)
                                         30   0.0230           0.0451           (0.8350, 0.9250)
                                         50   0.0226           0.0443           (0.8408, 0.9292)

Formula for standard error, where $\hat{p}$ is the coverage probability and 200 is the number of confidence intervals:

$$SE = \sqrt{\frac{\hat{p}(1 - \hat{p})}{200}}$$

Formula for sampling error/margin of error at the 95% confidence level:

$$E = 1.96 \times SE$$

From the table, it is shown that in general an increase in sample size causes the confidence interval of the coverage probability to be narrower, as the standard error and sampling error become smaller. Occasionally, when the sample size increases, the confidence interval becomes wider. This might be due to the higher variability of the population distribution, which means the population distribution is more spread out. For example, for the normal distribution with mean=10 and variance=4, there is less consistency in the data generated. As a result, there are more extreme values, which can affect the width of the confidence interval, causing it to be wider even when the sample size increases. To put it simply, less consistent data makes it harder to make predictions, so one will be less confident in the estimate made.
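The standard error and sampling error columns of the table can be checked with a few lines of Python. This is a sketch under assumptions: the function name is hypothetical, the standard error is taken to be the usual one for a proportion over 200 intervals, and 0.915 is the midpoint of the first interval in the table rather than a value reported separately.

```python
from statistics import NormalDist

def coverage_interval(p_hat, repetitions=200, confidence=0.95):
    """Standard error, sampling error and interval for an estimated coverage probability."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    standard_error = (p_hat * (1 - p_hat) / repetitions) ** 0.5
    sampling_error = z * standard_error
    interval = (p_hat - sampling_error, p_hat + sampling_error)
    return standard_error, sampling_error, interval

# Midpoint of (0.8764, 0.9537), i.e. normal population with variance=2 and n=10
se, me, (lo, hi) = coverage_interval(0.915)
print(round(se, 4), round(me, 4), round(lo, 4), round(hi, 4))
```

Because the standard error depends only on the coverage probability and the 200 repetitions, two rows with the same coverage probability have the same interval, which is why some intervals repeat across distributions in the table. Small differences in the last decimal place against the table are to be expected from rounding of intermediate values.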
Similarly, for the Chi-squared distribution with 1 degree of freedom, we can see from Diagram 1 that the graph is positively skewed. The range of the data is very large, from very small values to very large values, with most of the data clustered at one extreme while the other extreme has little data. The presence of extreme values makes the differences between data values large. As a result, the confidence interval of the coverage probability is wider, as the range of the data is large. Apart from that, based on the table, eight of the fifteen confidence intervals of the coverage probability contain the 95% confidence level used for estimation of the population mean. This is expected, because a confidence level of 95% means that, out of 100 samples, about 95 are expected to have a confidence interval containing the population mean.

CONCLUSION

In conclusion, an increase in sample size generally causes the confidence interval of the coverage probability to be narrower due to the smaller standard error, while an underlying distribution that is wider, with higher variability, causes the confidence interval of the coverage probability to be wider as the standard error increases. In addition, it is shown that at large sample size the coverage probability of the confidence interval for the population mean is at least 0.845, as most of the confidence intervals formed contain the population mean.
