Districts Training Programme
Module 4 Practical 7

Practical 7: Estimation and Confidence Intervals

Objectives: By the end of this practical you should be able to:
- describe what is meant by the standard error of a sample estimate and calculate the standard error for a sample mean
- explain the meaning and interpretation of a confidence interval
- calculate a 95% confidence interval for a population mean
- explain the effect that sample size and variability in the raw data have on the width of the confidence interval

1. Open the Stata file named unhs_hh&poverty.dta. This file has household information from UNHS2 on several variables, some relating to education and others to poverty. One question of interest was “What is the average size of a household in rural areas in the four regions of Uganda?”

(a) Use the command

    summ hhsize if region==1 & rurban==0

in Stata to determine, for rural households in the Central region, an estimate of the average number of members in a household and the corresponding standard deviation. Note them down in the first row of the table below, and use this information to determine the standard error of your estimate of the mean.

Region                 n (rural)   Mean   Std. deviation   Std. error = s/√n
Central (region=1)
Eastern (region=2)
Northern (region=3)
Western (region=4)

Repeat the procedure above to obtain similar results for the Eastern, Northern and Western regions of Uganda.

(b) Use your results above to compute (by “hand”) a 95% confidence interval for the mean number of persons per household in the Northern region. Obtain your t-value using Statistical Tables or the Stata command

    display invttail(k,0.025)

which gives the upper-tail value from a t-distribution with k degrees of freedom (k = n - 1 for a sample of size n).

(c) Now write down the answer to the question first posed, with respect to the Northern region, i.e.
“What is the average size of rural households in the Northern region of Uganda?”, attaching to your answer the confidence interval calculated above and interpreting carefully what it means.

(d) Verify your answers above, and obtain 95% confidence intervals for the remaining 3 regions, by using the Stata command

    ci hhsize if region==1 & rurban==0

(changing the region code as needed). Enter your results for the 95% confidence interval in the relevant column of the table below.

Region                 n (rural)   90% conf. int.   95% conf. int.   99% conf. int.
Central (region=1)
Eastern (region=2)
Northern (region=3)
Western (region=4)

Note that a 99% confidence interval (or other levels of confidence) can be obtained using:

    ci hhsize if region==1 & rurban==0 , level(99)

Use the above command to obtain 90% and 99% confidence intervals and enter them in the table above. What can you say about the width of the confidence interval as the level of confidence increases from 90% to 95% to 99%?

2. In this exercise, you will use samples of different sizes to explore the effect that sample size has on the width (upper limit minus lower limit) of the confidence interval. Open the Stata file called hhsize_samples.dta. This file has information on household size for samples of different sizes drawn from rural households in the Central region of Uganda. The column names, i.e. sample10, sample20, sample50, sample100 and sample200, indicate the number of observations in each sample.

(a) For each sample, obtain the mean, standard deviation, standard error of the mean, and a 95% confidence interval for the mean. Note down the results below, then calculate (by “hand”) results for the final column.

Sample size   Mean   Standard deviation   Std. error   95% C.I. for true mean   Width of 95% conf. interval
10
20
50
100
200

(b) Do you observe any trends in the standard error with increasing sample size? If so, can you suggest reasons for them?
(c) What can you say about the change in the width of the confidence interval as the sample size increases? What reasons can you give for any changes you observe?

(d) With respect to one particular sample size, explain how you think your standard errors and confidence intervals would change if the standard deviation were to increase. If it did, would your estimate of the mean be less or more precise?

3. Consider again data from the file unhs_hh&poverty.dta. After restricting the data to rural households in the Central region, the results below were obtained (using the Stata command proportion hsex) for the proportions of male-headed and female-headed households. [Note: there is no need to reproduce these results.]

Proportion estimation               Number of obs   =       1520

--------------------------------------------------------------
             | Proportion   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
hsex         |
      Female |   .2473684   .0110709      .2256525    .2690844
        Male |   .7526316   .0110709      .7309156    .7743475

Interpret these results. In particular, how would you report results in answer to the question “What is the proportion of female-headed households in the rural Central region of Uganda?”

4. This final exercise is aimed at giving you further practice in deriving estimates, forming confidence intervals, and interpreting these quantities.

EITHER work on a data set of your own to find estimates of key responses of interest, reporting these estimates together with their standard errors and confidence intervals for the true population parameters,

OR use the file unhs_hh&poverty.dta to answer the questions posed below for your own district. Stata commands and steps needed for selecting data from Mukono district in the Central region are given below to help you in selecting data from your own district.
Step 1: Use the command preserve (so you can return to the original data set by using the command restore).

Step 2: Use the command keep if region==1 (if your district is in the Central region; otherwise change to 2, 3 or 4 as appropriate).

Step 3: Use the command label list distlab to determine the code used in the variable dist for your own district. (Note: distlab is the value label that holds the labels for the codes of the variable dist.) For example, Mukono in the Central region has code 109.

Step 4: Use the command keep if dist==109 (to restrict the data to Mukono district). Replace the code 109 by the code for the district of your own choice.

Questions to answer with respect to data from your selected district:

(a) The variable log_welf refers to the logarithm of the household’s monthly consumption expenditure per adult equivalent, used as a proxy for the household’s income. What is the mean monthly consumption expenditure per adult equivalent for rural households in your own district? (Your answer should also include measures of precision, e.g. standard error and confidence interval.)

(b) The variable hlitrate refers to whether or not the household head is literate.

What proportion of urban households have literate household heads? (Again, the appropriate answer should include not only the estimate but also measures of precision.)

What proportion of rural households have literate household heads?

What proportion of rural households have literate female household heads?

What proportion of rural households have literate male household heads?

Write a short paragraph that summarizes the results above for rural households.
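Steps 1 to 4, together with the first parts of questions (a) and (b), can be collected into a short do-file. The sketch below is one possible arrangement rather than a prescribed solution. It assumes that the data file is in the current working directory, that rurban==0 identifies rural households (as in Exercise 1), and it uses the Mukono code 109 only as a placeholder for your own district code.

```stata
* Sketch only: run as a do-file. Assumes unhs_hh&poverty.dta is in the
* working directory and that rurban==0 marks rural households.
use "unhs_hh&poverty.dta", clear

preserve                  // Step 1: the full data set can be recovered with restore
keep if region==1         // Step 2: Central region (use 2, 3 or 4 for other regions)
label list distlab        // Step 3: look up the code for your own district
keep if dist==109         // Step 4: placeholder code for Mukono; replace with yours

* (a) Mean of log_welf for rural households, with standard error and 95% CI.
*     Recent Stata releases write this command as: ci means log_welf if rurban==0
*     Note that log_welf is on the log scale, so this interval is for the
*     mean of log expenditure.
ci log_welf if rurban==0

* (b) Proportion of rural households whose head is literate, with
*     standard error and 95% CI.
proportion hlitrate if rurban==0

restore                   // return to the original data set
```

The remaining parts of (b) follow the same pattern, changing only the if condition (for example, to select urban households, or rural households headed by women).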