Module 4 Session 4 plus

Module 4: Session 4 onwards
Project Work using UNHS2 data

There are four tasks listed on the following pages. You select one of these to form the work of your project. The aim of the project work is for you to acquire skills in data analysis with clear objectives in mind, while at the same time consolidating your knowledge of Stata.

You carry out work on this project in several of the sessions that follow, but should aim to complete your data analysis by the end of day 5 of this module and complete a report on your data analysis by the end of Session 11 on day 6. Resource persons will assist you if you have difficulties in producing specific parts of the analysis using Stata.

The overall objectives of the four tasks (from which you choose one) are as follows:

Task 1: Investigate health conditions of children (defined as those less than 18 years of age) in sampled households.
Task 2: Investigate health conditions of adults (defined as those who are 18 years of age or above) in sampled households.
Task 3: Investigate the educational level of children (defined as those less than 18 years and more than 4 years of age) in sampled households.
Task 4: Investigate the educational level of adults (defined as those aged 18 years and more) in sampled households.

The data comes from sections 2, 3A and 4 of the UNHS 2002/2003 Socio-Economic questionnaire. This data set includes the personal characteristics of household members, their health and education, and is available in the Stata file called SocioSects2to4.dta. Each task involves using a subset of this data. Copies of the relevant pages of the questionnaire are on the final pages of this handout and/or in an Excel file associated with Session 4.

If time permits, you can also look at some poverty-related variables. This data is in the file pov0203.dta and is available at the household level.

The steps to follow are given below.
Aim to complete up to Step (vii) in this session. Do NOT attempt to start the major analyses related to the study objectives in this session. You do this in Session 6, after a resource person has checked your analysis plan in Step (viii). Until that time, we recommend that you browse your data and carry out exploratory data analysis procedures that help you to understand the variables and the data structure.

Districts Training Programme Module 4 Session 4 plus

(i) First decide which of the objectives interests you most, and select the corresponding task. If you are working alongside a friend or colleague, it is highly desirable that you both agree on whether to work on Tasks 1 and 2 (health) or Tasks 3 and 4 (education). This will enable you to discuss and compare your results as you go along. However, your final report should be your own work, not a joint one.

(ii) Next, read the specific questions to be answered in relation to your task, as given on one of the following pages. If you are not clear about any of the questions, ask a resource person.

(iii) Read the steps from (iv) onwards below, so that you get a quick idea of the good practice procedures to follow when undertaking a data analysis.

(iv) Now open your data file SocioSects2to4.dta in Stata. If Stata tells you that you don’t have enough memory, type the command set memory 32m and try again. Browse your data and check that you can identify the variables related to your task. Note these down. Remember that for any analysis the background variables, i.e. the variables from hh to rurban, are likely to be needed at some stage, so note these too as part of the set of variables you will need for analysis. Ensure you understand the meaning of the variables you need for your task.

(v) You will notice that your task involves using a subset of the data. Don’t create this subset until step (vi) below.
For now, produce some summary statistics, using the Stata facilities you have learnt, that will enable you to identify the number of records that your data subset will have. This is standard good practice. Note down this number. Also produce some simple data summaries that tell you how many records your subset will have (a) in each region; (b) in urban and rural areas; (c) for each sex of household member. Note down these numbers (this is to help with the checking process in step (vi) below).

(vi) Now it is time to select the data subset you need for your task. BUT before you do so, save the data set SocioSects2to4.dta under a different name and check that you know the directory on your computer from which you can recover this data set again. Now select the data subset you need, possibly using either the keep or drop command together with an if qualifier. Using a few Stata commands, e.g. by repeating the analysis undertaken in step (v) above, verify that you have succeeded in selecting the data subset you need. If you are happy with the result, then save your data.

(vii) Conduct some simple exploratory data analysis to understand your data subset better. You may have already done this in step (vi) above. Note down your results, since this analysis can also form part of your project report.

(viii) Now consider how you might address the first three specific objectives related to your task. However, before starting the analysis using Stata, consider the type of tables and graphs that will help in achieving the objectives. Draw up, on sheets of paper, “dummy” tables and graphs to show the type of tables and graphs that would be appropriate.
For example, for a two-way table, you will want to specify which variable makes up the rows of your table, which variable makes up the columns, what the table cells will contain, and a suitable title for the table. For a graph, include a title, and consider whether you want a pie chart, bar chart, multiple bar chart, box plot, or another type of graph. Give as much information as possible about the graph so that you are clear about its form at the time of analysis. Once you have completed this step, show your results to a resource person to check that the tables and graphs you have suggested are appropriate for the objectives at hand.

(ix) Once the resource person is satisfied with your suggestions, you may start the data analysis using Stata. Most of this work will be done in Session 6. Note down the procedures you follow while doing your analysis or, better, set up an appropriate Do file so that you can return to the same analysis at a later stage.

(x) After each piece of analysis, corresponding to each question of your task taken in turn, write up the component of your report related to that analysis before you proceed to the next question in your task. This way, you will not run out of time at the end, and will have the opportunity to revise your report to a suitable length at a later stage. Aim to complete questions 1, 2 and 3 of your task by the end of Session 6.

(xi) After a short presentation in Session 10, which gives you an appreciation of sampling weights, you will have time to learn about merging data files. You will then have time to continue with your project work and begin addressing objectives 4 and 5 of your task.

(xii) You should aim to complete your project report by the end of Session 11 and hand your report to a resource person.
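The checking routine in steps (iv) to (vi) might be laid out in a Do file along the following lines. This is only a minimal sketch for Task 1 (children under 18), not a prescribed solution. The variable names age, region and sex, and the two output file names, are hypothetical and chosen for illustration; only rurban and SocioSects2to4.dta are named in this handout, so check the actual variable names with describe before running anything.

```stata
* Sketch of steps (iv) to (vi) for Task 1 (children under 18).
* ASSUMPTION: age, region and sex are hypothetical variable names;
* replace them with the names shown by -describe-.

* Step (iv): load the data and list the variables
* (if Stata reports insufficient memory, see step (iv): set memory 32m)
use "SocioSects2to4.dta", clear
describe

* Step (v): before subsetting, note how many records the subset should have
count if age < 18
tabulate region if age < 18
tabulate rurban if age < 18
tabulate sex if age < 18

* Step (vi): first save a copy of the full data under a different name
* (hypothetical file name)
save "SocioSects2to4_copy.dta", replace

* Select the subset. The condition age < 18 already excludes missing ages,
* because Stata treats numeric missing values as larger than any number.
* For the adult tasks, write: keep if age >= 18 & !missing(age)
keep if age < 18

* Verify: these counts should match the ones noted in step (v)
count
tabulate region
tabulate rurban
tabulate sex

* Save the subset under its own (hypothetical) name
save "task1_children.dta", replace
```

Keeping these commands in a Do file, as step (ix) suggests, means the whole selection can be rerun from the original data if a mistake is found later.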
Task 1

Overall objective
To investigate health conditions of children (defined as those less than 18 years of age) in sampled households.

Specific objectives (questions):

1. What percentage of children suffered a sickness or injury during the 30 days prior to survey work? Provide an appropriate estimate at this stage, but after Session 7, you should aim to return to this question and provide a standard error and a confidence interval for your estimate.

2. (i) What specific sicknesses/injuries were experienced by children in the household in the period of interest? Use a suitable graph to summarise these results.
(ii) There is specific interest in investigating the occurrence of malaria and respiratory diseases. Produce a table to explore how the proportion of those with each of these diseases varies across the 4 regions.

3. What percentage took treatment for malaria and respiratory diseases? Of those who suffered malaria, what percentage did not go outside the home for treatment of their sickness or injury? What were the reasons for not consulting outside?

The remainder of this task will be restricted to household level variables and information concerning one child in the household. For convenience, the oldest child will be selected for this purpose. Create this data subset, save the data under a new name, and then proceed.

4. Is there evidence of an association between the use of a mosquito net and the occurrence of malaria? Explore whether this association exists for every region, or for every district within one region of your choice.

5. Merge your data file with the household level file called pov0203.dta and investigate whether the mean consumption expenditure per adult equivalent (variable welfare), converted to logs if appropriate, differs significantly between those who (a) suffered and (b) did not suffer an illness or injury in the 30 days prior to survey work.
Do this for the whole of your data subset and then repeat the analysis for a district of your choice.

Task 2

Overall objective
To investigate health conditions of adults (defined as those who are 18 years of age or above) in sampled households.

Specific objectives (questions):

1. What percentage of adults suffered a sickness or injury during the 30 days prior to survey work? Provide an appropriate estimate at this stage, but after Session 7, you should aim to return to this question and provide a standard error and a confidence interval for your estimate.

2. (i) What specific sicknesses/injuries were experienced by adults in the household in the period of interest? Use a suitable graph to summarise these results.
(ii) There is specific interest in investigating the occurrence of malaria and respiratory diseases. Produce a table to explore how the proportion of those with each of these diseases varies across the 4 regions.

3. What percentage took treatment for malaria and respiratory diseases? Of those who suffered malaria, what percentage did not go outside the home for treatment of their sickness or injury? What were the reasons for not consulting outside?

The remainder of this task will be restricted to household level variables and information concerning the head of the household. Create this data subset, save the data under a new name, and then proceed.

4. For heads of household, is there evidence of an association between the use of a mosquito net and the occurrence of malaria? Explore whether this association exists for every region, or for every district within one region of your choice.

5.
Merge your data file with the household level file called pov0203.dta and investigate whether the mean consumption expenditure per adult equivalent (variable welfare), converted to logs if appropriate, differs significantly between household heads who (a) suffered and (b) did not suffer an illness or injury in the 30 days prior to survey work. Do this for the whole of your data subset and then repeat the analysis for a district of your choice.

Task 3

Overall objective
To investigate the educational level of children (defined as those less than 18 years and more than 4 years of age) in sampled households.

Specific objectives (questions):

1. What percentage of children have never attended school? Provide an appropriate estimate at this stage, but after Session 7, you should aim to return to this question and provide a standard error and a confidence interval for your estimate.

2. How does the percentage in question 1 above vary (a) between rural and urban areas, (b) across the different regions, and (c) across the age groups under 10 years, 10 to under 14 years, and 14 years and over? Consider showing your results with suitable graphical presentations.

3. Of those children who have never attended school, what are the key reasons for not attending? Consider a suitable table that may be produced to summarise your answer.

The remainder of this task will be restricted to household level variables and information concerning one child in the household. For convenience, the oldest child will be selected for this purpose. Create this data subset, save the data under a new name, and then proceed.

4. Classify the distance that children currently attending school have to travel to get to school as “Near”, i.e. less than 1 km away, “Some distance”, i.e. 1 to 2 km, and “Far”, i.e. more than 2 km.
Is there any evidence that the distance to the school is associated with the type of school (boarding, day or day/boarding) they are attending? Explore whether this association exists in every district within a region of your choice.

5. Merge your data file with the household level file called pov0203.dta and investigate whether the mean consumption expenditure per adult equivalent (variable welfare), converted to logs if appropriate, differs significantly between (a) those who have never had schooling and (b) those who are currently attending school. Do this for the whole of your data subset and then repeat the analysis for a district of your choice.

Task 4

Overall objective
To investigate the educational level of adults (defined as those aged 18 years and more) in sampled households.

Specific objectives (questions):

1. What percentage of adults currently attend school? Provide an appropriate estimate at this stage, but after Session 7, you should aim to return to this question and provide a standard error and a confidence interval for your estimate.

2. How does the percentage in question 1 above vary (a) between rural and urban areas, (b) across the different regions, and (c) across the age groups under 25 years, 25 to under 40 years, and 40 years and over? Consider showing your results with suitable graphical presentations.

3. Of those adults who have never attended school, what are the key reasons for not attending? Consider a suitable table that may be produced to summarise your answer.

The remainder of this task will be restricted to household level variables and information concerning the head of the household. Create this data subset, save the data under a new name, and then proceed.

4. Is there any evidence that the literacy level of household heads is associated with whether or not they have ever attended a literacy program?
Explore whether this association exists in every district within a region of your choice.

5. Merge your data file with the household level file called pov0203.dta and investigate whether the mean consumption expenditure per adult equivalent (variable welfare), converted to logs if appropriate, differs significantly between (a) those who have never had schooling or have only completed primary level schooling and (b) those whose educational level is higher than primary level. Do this for the whole of your data subset and then repeat the analysis for a district of your choice.

UGANDA BUREAU OF STATISTICS
THE REPUBLIC OF UGANDA
UGANDA NATIONAL HOUSEHOLD SURVEY 2002/2003
SOCIO-ECONOMIC SURVEY QUESTIONNAIRE

SECTION 1A: IDENTIFICATION PARTICULARS
1. STRATUM:
2. COUNTY:
3. SUB-COUNTY:
4. PARISH:
5. EA/LC1:
6. HOUSEHOLD SR. NO.:
7. SAMPLE NO.:
8. HOUSEHOLD CODE:
9. NAME OF HEAD:

THIS SURVEY IS BEING CONDUCTED BY THE UGANDA BUREAU OF STATISTICS OF THE MINISTRY OF FINANCE, PLANNING AND ECONOMIC DEVELOPMENT UNDER THE AUTHORITY OF THE UGANDA BUREAU OF STATISTICS ACT, 1998.

THE UGANDA BUREAU OF STATISTICS
P.O. BOX 13, ENTEBBE
TEL: 041 – 320741, 322099, 322100, 322101, 075 -720745, 077 - 705127
Fax: 320147
E-mail: unhs@infocom.co.ug, ubos@infocom.co.ug
Website: www.ubos.org