Income and poverty status predictors for Tanzania

National Bureau of Statistics, Tanzania and Oxford Policy Management Ltd, UK
With support from the UK Department for International Development
June 2001

Contents
Section 1 Introduction
Section 2 Using population census data in poverty monitoring
Section 3 Modelling household expenditure
Section 4 Refining the specification for modelling household expenditure
Section 5 Comparison of a set of alternative models
Section 6 Conclusions
Appendix 1 Modelling expenditure by stratum
Appendix 2 Modelling the likelihood that households are poor

Tables
Table 1 Determinants of expenditure per adult equivalent and per household: initial model
Table 2 Determinants of expenditure using additional variables
Table 3 Determinants of expenditure using expanded explanatory variables, including assets
Table 4 Modelling determinants of expenditure using information already in the census form
Table 5 Proportions of variance explained by alternative models of expenditure
Table 6 Determinants of expenditure using variables selected in model 4
Table A1 Determinants of expenditure per adult equivalent for (1) Dar es Salaam, (2) other urban areas, and (3) rural areas
Table A2 Logistic regression of poverty using 1991/92 specification
Table A3 Logistic regression of poverty using the revised specification

Section 1 Introduction

There is growing interest in predicting household expenditure and poverty measures for small geographical areas by combining detailed information from a household budget survey with the more limited data, but more extensive coverage, of a Population Census.[1] Tanzania is planning to undertake a Population Census in 2002; this data may be used for this type of poverty mapping. The National Bureau of Statistics decided to use the available data from the 2000/01 Household Budget Survey to identify the most important variables for predicting expenditure and poverty status. Information on these predictors would then be incorporated into the census questionnaire, to maximise its usefulness for poverty mapping.[2]

The work was carried out in two stages. A first stage of modelling was undertaken using the available HBS data. This sample contained around 1,300 households. The results were then presented to the Research and Analysis Working Group of the Poverty Monitoring Steering Committee. The group gave its opinion on a number of issues; on this basis, the final modelling was undertaken.

This report is split into 6 sections. Section 2 outlines the analytical approach taken and its limitations. It also places the work in the context of more general poverty monitoring issues surrounding the use of the Population Census data. In Section 3, regression models run on the 1991/92 HBS data are replicated as closely as possible for the 2000 data.
[3] Section 4 refines the specification by disaggregating some of the variables used in the earlier model. Section 5 compares a number of models and evaluates the improvement in explanatory power relative to the data that is already collected in the census. Section 6 concludes.

Section 2 Using population census data in poverty monitoring

The census provides a unique data source for poverty monitoring because it covers every household in the country. For this reason, it can provide highly disaggregated welfare information, down to the district level and below. This is particularly important in the context of decentralisation, where district administrations will increasingly be held accountable for the provision of services. Census data may be used to inform the allocation of resources, to monitor performance and to cross check administrative data. The 2002 Census may, in particular, provide a baseline for district administrations as decentralisation begins.

While the census provides this opportunity, the amount of data that can be collected must be limited because of the scale of the operation. Many of the non-income poverty measures defined by the PRSP and the Vice-President’s Office are included in the census form. These include literacy and the level of education achieved; access to an adequate water supply and sanitation; mortality (infant, under five, adult) and life expectancy. However, a case can be made for the addition of a number of other non-income measures. These include measures of health sector outputs and further information on enrolment.[4]

[1] See, for example: Alderman, H. et al. (2000) ‘Is Census Income an Adequate Measure of Household Welfare? Combining Census and Survey Data to Construct a Poverty Map of South Africa’, Statistics South Africa and World Bank; and Hentschel, J., Lanjouw, J.O., Lanjouw, P. and Poggi, J. (2000) ‘Combining Census and Survey Data to Study Spatial Dimensions of Poverty: A Case Study of Ecuador’, World Bank Economic Review, Vol. 14, No. 1: 147-165.
[2] The work was carried out by Trudy Owens and Patrick Ward of OPM.
[3] The earlier models are reported in ‘Developing a Poverty Baseline in Tanzania’, May 2000, NBS and OPM.
[4] See ‘Tanzania Census Form 2001: Comments on its usefulness for poverty monitoring’, P Ward, OPM, May 2001 (mimeo) for further discussion of these issues.

High quality data on income and expenditure cannot be reliably collected in a census. It is possible, however, to use modelling to predict household expenditure and poverty status from other variables which can be collected in a census. Expenditure is considered a more reliable measure of household consumption, and of welfare, than reported income. The modelling is carried out using HBS data, which includes information on both household expenditure and on the predictor variables. Once the census has collected data on the predictor variables, the model parameters may be used to estimate expenditure, poverty and distributional measures for small geographic areas, including districts and below.

The modelling was carried out on a data set of around 1,200 households, which represent the data collected in the ‘national sample’ during the first three months of the HBS. Models are developed for two main outcome variables: total household consumption expenditure and expenditure per adult equivalent. Expenditure measures include the value of home produced items consumed.[5] In addition, one model is developed for predicting whether households are poor or not. However, the main focus is on the expenditure measures, which provide more powerful and flexible measures of welfare.[6]

The models are developed for the population as a whole and separately for the rural population, since the latter represents around 80 percent of the total population. Models are also presented in the Appendix for the other two HBS strata, i.e. Dar es Salaam and other urban areas.
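The step described above, in which parameters estimated on the HBS are applied to census records, can be sketched as follows. This is a minimal illustration only: the coefficient values, the poverty line, the predictor names and the district labels are all hypothetical and are not taken from the tables in this report. Published poverty-mapping applications also account for the model's error term when estimating poverty rates, which this sketch omits.

```python
from collections import defaultdict

# Hypothetical coefficients (NOT the report's estimates): a constant plus
# predictors that a census could collect.
COEFFS = {
    "constant": 25000.0,
    "household_size": -1000.0,
    "modern_roof": 6000.0,
    "head_employed": 4000.0,
}
POVERTY_LINE = 22000.0  # hypothetical line, same units as expenditure


def predict_expenditure(household):
    """Linear prediction: constant + sum of coefficient * predictor value."""
    return COEFFS["constant"] + sum(
        coef * household[name]
        for name, coef in COEFFS.items()
        if name != "constant"
    )


def district_summary(census_households):
    """Mean predicted expenditure and predicted-poor share per district."""
    by_district = defaultdict(list)
    for hh in census_households:
        by_district[hh["district"]].append(predict_expenditure(hh))
    return {
        district: {
            "mean_expenditure": sum(values) / len(values),
            "share_predicted_poor":
                sum(v < POVERTY_LINE for v in values) / len(values),
        }
        for district, values in by_district.items()
    }


# Three made-up census records in two made-up districts.
census = [
    {"district": "A", "household_size": 4, "modern_roof": 1, "head_employed": 1},
    {"district": "A", "household_size": 8, "modern_roof": 0, "head_employed": 0},
    {"district": "B", "household_size": 3, "modern_roof": 1, "head_employed": 0},
]
summary = district_summary(census)
print(summary)
```

Because the prediction uses only variables present in both sources, the census variables would need to be defined in the same way as the survey variables for a calculation of this kind to be meaningful.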
The objective of this work was to inform the development of the 2002 census questionnaire. The models were used to assess the predictive value of data that was already included in the Census and then to examine which additional information should be included in order to increase the models’ predictive power. The procedure used was to apply forward stepwise selection of variables to each set of variables defined for the alternative models. Variables were included in the model if their associated p-value was less than 0.2.

There are a number of limitations to this work. The modelling process identifies variables that correlate with household expenditure. However, the relationship is not necessarily causal: changes in the predictor variables will not necessarily produce a change in household expenditure. Ownership of consumer goods, for example, is a consequence of household income rather than a determinant of it.

The models are intended to provide geographically disaggregated welfare measures, although these methods must be considered experimental, since they are still not well tested. The predictors may also provide useful proxies for inclusion in other surveys where it would be impossible to collect full information on household consumption but where distributional issues may be of interest. This is done in the CWIQ survey, for example, and could equally be used to look at health indicators by (estimated) household expenditure category in a DHS. However, because the models are not causal, we do not believe that they can be relied upon for tracking income poverty over time. This should be left to periodic household budget surveys.

It should be remembered that the data set is small and covers only three months of the year. It is hoped that the modelling will be repeated using the full data set once it is available. This should provide more robust estimates of the model parameters and will allow the model to be disaggregated more reliably by stratum and, possibly, by region.

[5] This work builds on an interim analysis of a part of the 2000/01 HBS data set. For further details of the data and the expenditure and poverty measures see ‘Trends in Poverty and Social Indicators: Tanzania 1991/92 – 2000, A Preliminary Analysis’ (Draft), National Bureau of Statistics and Oxford Policy Management, April 2001 (mimeo).
[6] Models of poverty status also provided a much less sensitive means for selecting variables for inclusion in the census form and, after initial exploration, were not used extensively in this work.

In addition to these limitations, we have not found any guidelines for defining an ‘adequate’ model, which can be counted upon to produce disaggregated estimates of a given accuracy; there is no particular R-squared value that is defined as adequate. The R-squared measures can also be increased by increasing the p-value at which a variable is allowed to remain in the model and by using the log of expenditure as the dependent variable. We have also used more than one dependent variable and the models cover more than one stratum. These different models frequently select different variables.

For these reasons, the selection of additional variables to be collected in the census cannot be made mechanically. There is a strong case for collecting information that is of intrinsic interest, even at the expense of somewhat reducing the explanatory power of the models, although the selection should still be informed by the modelling exercise. We return to this issue in Section 5.

Section 3 Modelling household expenditure

Using the 1991/92 HBS data, earlier work estimated models for two variables: expenditure per adult equivalent and the likelihood of being poor.[7] This was carried out for Mainland Tanzania and each of the three strata individually.
For the expenditure variable, the parameters were estimated using linear multiple regression with stepwise selection of variables. For poverty status, a binary dependent variable, parameters were estimated using a logistic procedure, once again using stepwise selection of variables.

In the earlier analysis, nineteen variables were included in the first estimation. These were:
- Household size, measured as the number of household members
- Household dependency ratio
- The number of rooms in the dwelling unit occupied by the household
- The distance to drinking water
- The distance to the nearest market
- The distance to the nearest clinic or dispensary
- A dummy variable if the household head was educated beyond standard 4
- A dummy variable for households in rural areas
- A dummy variable for households in other urban areas
- The number of household members employed or in self-employment (outside the farm)
- A dummy variable for whether the household head was employed or self-employed
- A dummy variable for whether the household head was inactive
- A dummy variable if the household head was over 60
- A number of dummy variables to denote the use of modern building materials for the foundations, floor, walls and roof
- A weighted index of ownership of consumer goods.

Using the stepwise procedure, 11 of these were excluded as not significant in the 1991/92 estimation for Mainland Tanzania.

The same analysis was repeated using the 2000 data. Table 1, column 1, reports the results of estimating this same specification for 2000. Explanatory variables that did not have a significant coefficient for any model are omitted entirely from the table. Variables which were significant for at least one model are included in the table; a coefficient is entered against them for each model in which they are significant (using p<=0.2).

The results are similar to those found in the 1991/92 analysis. Household size, whether the head has a job, whether the household lives in rural or other urban areas and the dependency ratio continue to be significant variables in explaining expenditure per adult equivalent. The picture is more mixed for modern building materials. Using concrete for foundations and floors no longer appears to be a significant explanatory variable. However, the use of a modern roof or wall material does appear to be significant in explaining expenditure, although the sign of the latter is not in the direction that would be expected. In addition, the distance to water in the dry season and the distance to a clinic or dispensary also appear as significant explanatory variables, although once again with signs that would not have been expected.

Together, these variables explain 35 per cent of the variation in expenditure per adult equivalent. This is a much higher proportion explained than in the 1991/92 data set, where it was only 5 percent.

[7] ‘Developing a Poverty Baseline in Tanzania’, cited in footnote 3.

Concern that we may be introducing a spurious correlation by including household size as an explanatory variable when it is also the denominator of the dependent variable led us to estimate the same specification without household size.[8] The results are reported in column 2 of Table 1. This made little difference to the model, although three additional variables became significant in explaining expenditure. These are: whether the household head was over 60, the number of employed adults in the household, and the number of rooms in the dwelling. The negative coefficients on the number of employed individuals and the number of rooms are not what would be expected and may arise because these variables are acting as proxies for household size. For this reason, the models constructed in later sections retained household size as an explanatory variable.
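The forward stepwise selection used for these models (a candidate variable enters if its p-value is below 0.2) can be sketched as follows. This is a simplified illustration and not the software procedure used for the report: the data are synthetic, the variable names are hypothetical, and the p-values use a normal approximation to the t distribution, which is only reasonable for samples of a few hundred households or more.

```python
import random
from statistics import NormalDist


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_p_values(columns, y):
    """Fit OLS with an intercept; return a two-sided p-value per column."""
    n = len(y)
    X = [[1.0] + [c[i] for c in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    p_values = []
    for j in range(1, k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        var_j = sigma2 * solve(xtx, unit)[j]  # j-th diagonal of (X'X)^-1
        z = abs(beta[j]) / var_j ** 0.5
        p_values.append(2 * (1 - NormalDist().cdf(z)))  # normal approximation
    return p_values


def forward_stepwise(candidates, y, p_enter=0.2):
    """Add, one at a time, the candidate with the smallest p-value below p_enter."""
    selected = []
    while True:
        best = None
        for name in candidates:
            if name in selected:
                continue
            cols = [candidates[s] for s in selected] + [candidates[name]]
            p = ols_p_values(cols, y)[-1]
            if p < p_enter and (best is None or p < best[1]):
                best = (name, p)
        if best is None:
            return selected
        selected.append(best[0])


# Synthetic data: two real predictors and one unrelated variable.
random.seed(0)
n = 300
size = [float(random.randint(1, 10)) for _ in range(n)]
roof = [float(random.random() < 0.4) for _ in range(n)]
unrelated = [random.gauss(0, 1) for _ in range(n)]
y = [30000 - 2000 * s + 6000 * r + random.gauss(0, 3000)
     for s, r in zip(size, roof)]
candidates = {"household_size": size, "modern_roof": roof,
              "unrelated": unrelated}
selected = forward_stepwise(candidates, y)
print(selected)
```

With an entry threshold as generous as 0.2, an unrelated variable will be admitted in a fair share of samples, which is one reason a selected variable set should not be read as a list of causes.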
Table 1 Determinants of expenditure per adult equivalent and per household: initial model

Expenditure per adult equivalent (1) Coeff. Sig. 57 ** -1,029 ** -9,216 ** 241 * -7,635 ** -1,621 6,041 ** 2,457 -6,814 ** Expenditure per adult equivalent (2) Coeff. Sig. 50 ** Expenditure per household Coeff. Sig. Weighted Index of Assets 303 ** Household Size 4,817 ** Dependency Ratio -14,943 ** -11,734 * Distance to Water 251 * 529 ** Lives in Rural Area -6,856 * -21,351 ** Modern Wall Material Modern Roof 6,488 ** 16,193 * Head Employed or Self-employed 4,440 ** Lives in Urban Area outside Dar es Salaam -6,172 * -22,361 ** Household Head Over 60 2,823 No. of rooms -697 ** Number of Employed Adults in the Household -1,242 * 3,402 Head Inactive -18,425 ** Male Household Head 5,275 Distance to Health Clinic 77 * 100 ** 404 ** Constant 26,547 ** 24,064 ** 19,590 * R-squared 0.35 0.31 0.38 Number of observations 1223 1223 1223

Notes: (1) Including household size as an explanatory variable. (2) Excluding household size as an explanatory variable. Where a coefficient is entered against a variable, p<0.2; * significant at 5% level; ** significant at 1% level.

[8] In an additional specification we examined the effect of excluding the dependency ratio as an explanatory variable. It made no appreciable changes to the results.

We also estimated the regression using total household expenditure as the dependent variable and including household size as an explanatory variable. The results are reported in column 3 of Table 1. Again the results are broadly similar, although household size now has a positive coefficient since larger households will tend to have a larger total expenditure (while they tend to have a lower expenditure per adult equivalent). The R-squared of the model increases to 0.38. Again, a number of additional variables become significant in addition to those in the first model.
These include the number of employed household members, whether the head is male or inactive and the number of rooms in the dwelling.

These exercises established that the HBS 2000 data produces broadly similar models to the 1991 data. They also suggested that household size should be retained as an explanatory variable. In the following sections, we go on to extend these results using both expenditure per adult equivalent and expenditure per household as the dependent variables.

Section 4 Refining the specification for modelling household expenditure

This section refines the models estimated in Section 3. A number of the variables that were used in those models were disaggregated in order to use more of the information available. This was done with information on the education, age and occupation of the head. Information on household sources of income and housing materials was also explored. The index of assets owned was also disaggregated. The use of a non-linear term for age was also explored. Stepwise regression was used to select variables.

Table 2 presents the results for those variables that were added to the model. Additional dummies for the level of education of the household head were significant. These used heads with no education as the comparison group and had dummies for standard 1-4, standard 5-8, form 1-4 and form 5 plus. Age of the head, age squared and household size squared were also included.

Table 2 Determinants of expenditure using additional variables

Expenditure per adult equivalent Coeff. Sig. -2,060 ** 63 ** 49 ** -7,902 ** 2,557 2,086 5,840 ** -1,703 4,280 -412 * 4 ** -7,310 ** -6,807 ** 1,764 4,188 Expenditure per household Coeff. Sig. 4,808 ** Household Size Household Size – squared Weighted Index of Assets 303 Dependency Ratio -14,960 Head Employed 24,202 Head Self-employed 20,546 Modern Roof 15,734 Modern Wall Material Level of Education: Form 5 plus Age of Household Head 177 Age of Household Head – squared Lives in Rural Area -21,815 Lives in Urban Area outside Dar -22,254 Modern Toilet Level of Education: Form 1-4 Distance to Health Clinic 373 Head Works on Own Farm 21,500 Male Household Head 5,412 Number of Employed Adults in the Household 3,081 Distance to Water 255 ** 569 Constant 36,736 ** -7,636 R-squared 0.39 0.38 Observations 1223 1223 Where a coefficient is entered against a variable, p<0.2; * significant at 5% level; ** significant at 1% level ** * * ** * ** ** ** **

The disaggregation or addition of some other variables did not add to the specification. Producing additional dummy variables by further disaggregation of the main activity of the household head did not add to the specification. Information on the source of household income did not add significantly to the information provided by the activity of the household head. Splitting the dummies on materials used in the construction of the dwelling did not improve the specification. For this reason we decided to continue to use the previous dummies of modern materials in foundation, floor, wall and roof construction. Similarly, including ten dummies for types of water supply was found to be no more effective than including a dummy for protected water supply, using those who had an unprotected water supply as the comparison group.

Overall, the explanatory power of the model improves for expenditure per adult equivalent: 39 per cent of the variation in expenditure is explained by this new set of variables, compared to 35 per cent previously. The key variables improving the explanatory power appear to be the dummies on levels of education and using the actual age of the head rather than the dummy for whether the head is over 60.
There is very little change to the model using expenditure per household as the dependent variable.

In the previous estimations, assets were included as a weighted index, replicating their inclusion in the Poverty Baseline report. The next step was to include individual assets separately, since it would not be practical for the census to collect information on all of the assets that go to make up the index. Instead, the objective was to identify which particular assets should be recorded in the census.

Table 3 includes dummy variables for whether a household owned each individual asset. This improves the explanatory power of the models.[9] For the model of expenditure per adult equivalent, twelve individual assets are included in the specification, increasing the R-squared from 0.39 to 0.45. For total household expenditure, 8 assets are added, increasing the R-squared from 0.38 to 0.43. Unfortunately, the assets that are selected in the two regressions are usually different ones.

A model of expenditure per adult equivalent was also developed separately for each stratum, that is, for Dar es Salaam, other urban areas and rural areas (see Appendix 1). This shows that the nine assets selected for the whole mainland population are not always selected for each stratum. To collect information on all assets that appear as significant in each of those models would involve collecting information on 38 items. The logistic regression model which predicts household poverty status also selects yet other assets (Appendix 2).

The modelling undertaken so far established that the addition of information on the ownership of individual assets would appreciably increase the amount of variance explained by the models. There was a good case for collecting some additional information on the ownership of assets in the census. However, the cost of the data collection must also be considered and only a limited number of variables could be added to the census questionnaire.
As discussed in Section 2, there are limitations to the modelling, including the small sample size and the difficulty in defining a cutoff proportion of variance explained to define an ‘adequate’ model. Since the different models also identify different assets, it was concluded that the modelling alone did not provide a sufficient basis for choosing the assets for inclusion in the census form.

At this point, the results produced were presented to the Research and Analysis working group. The group took a number of decisions which formed the basis for the final round of the modelling.

[9] We also included the number of agricultural plots, number of livestock, number of donkeys and number of poultry owned by the household rather than a dummy for whether they owned any or not. The R-squared remained virtually unchanged. We therefore left the variables as dummies, which would be easier to collect in a census if they were selected.

Table 3 Determinants of expenditure using expanded explanatory variables, including assets

Expenditure per adult equivalent Coeff. Sig. -2,089 ** 59 ** -8,127 ** 1,928 4,054 * 5,420 ** -1,129 1,927 -5,174 ** 4,263 2,020 * -1,922 -4,479 ** 12,113 -423 ** 4 ** -7,508 ** -6,677 ** 2,312 ** 218 * 88 * -4,848 1,884 * 4,500 2,885 2,669 * 5,564 2,593 * 2,206 * 1,530 5,298 * -1,910 * Expenditure per household Coeff. Sig. 4,909 ** Household Size Household Size – squared Dependency Ratio -19,090 ** Livestock Complete music system 39,621 * Modern Roof 16,224 ** Modern Wall Material Modern Toilet Hand milling machine -18,351 ** Donkeys Watches 5,844 Fields/Land -8,806 Wheel-barrow -11,524 Fishing net and other equipment 20,847 Age of Household Head 151 Age of Household Head - squared Lives in Rural Area -16,726 * Lives in Urban Area -16,850 ** Table Distance to Water 369 Distance to Health Clinic 447 ** Cart Ownership of more than one house 5,071 Level of Education: Form 1-4 Level of Education: Form 5 plus Hoes 8,098 Telephone 39,919 * Electric/charcoal iron 9,922 Cupboards,wardrobes,drawers,etc 8,943 * Bicycle 5,042 Video 30,729 * Books (not school books) -5,821 Head Employed 18,816 ** Head Self-employed 19,776 ** Head Works on Own Farm 18,633 ** Lanterns, lamps, etc 9,193 * Motor cycle 19,503 ** Present business working capital 6,988 Refrigerator or freezer 31,683 * Constant 35,098 ** 600 R-squared 0.45 0.43 Observations 1223 1223

Where a coefficient is entered against a variable, p<0.2; * significant at 5% level; ** significant at 1% level.[10]

[10] Assets that were owned by less than 1 per cent of the sample were excluded. In most cases the stepwise regression dropped them automatically. These were: feeding machine, incubator, harrow, boat, boat engine, coffee pulping machine and reaper.

Section 5 Comparison of a set of alternative models

The Research and Analysis group reviewed the work undertaken to this point. It accepted that there was a case for selecting assets that were of intrinsic interest. It was decided that a total of six assets would be added to the census questionnaire and that their selection should take into account the list of non-income poverty measures defined by the Vice President’s Office and the PRSP. Four of the assets should be selected from these indicators and should reflect some key aspects of poverty.
These aspects included access to transport and communication and the ownership of productive assets. They should also be common, rather than rare, assets. The choice should be informed by the previous modelling. The remaining two assets would be selected based largely on their capacity to increase the variance explained by the model, once the first four had been included.

It was also decided to return to the HBS data and to extract some additional information on variables that were already included in the census form but that it had not been possible to include in the earlier models (on cooking and lighting fuels). The variable on the number of rooms in the house was also replaced with the number of bedrooms, since this is what is collected in the Census.

A number of models were developed on this basis. The first model used only those variables that were already available in the Census. These were: household size, age, education and employment of the head, type of materials used in house construction, type of toilet, and whether drinking water is protected. The results of these estimations are reported in Table 4. Note that the sample size drops to 1213 due to the inclusion of variables on cooking fuel and lighting. This model acts as the baseline, being the model that can be derived without any additions to the census form. It explains 37 per cent of the variance in expenditure per adult equivalent, and 34 per cent of the variance in expenditure per household. In rural areas the basic model explains 32 and 35 per cent of the variance respectively.

Table 4 Modelling determinants of expenditure using information already in the census form

Whole Population Expenditure Expenditure per adult per household equivalent Coeff. Sig. Coeff. Sig. -1,884 ** 6,599 ** 61 ** -99 -8,370 ** -19,196 ** 4 * 5,561 * 17,798 * 7,019 ** 20,930 ** -1,376 2,084 * 5,661 * 18,907 * -2,145 * -7,082 * -8,470 ** -28,563 ** 3,924 * 30,672 ** 2,757 * 24,019 ** -7,319 ** -24,738 ** 23,147 ** 6,082 4,578 2,482 4,400 Rural Population Expenditure Expenditure per adult per household equivalent Coeff. Sig. Coeff. Sig. -1,956 ** 4,216 ** 60 ** -5,774 * -15,036 * 3 3,861 6,981 ** 20,666 ** -2,301 -5,344 2,121 * 6,726 22,846 -2,344 * -7,575 * Household Size Household Size - squared Dependency Ratio Age of Household Head - squared Level of Education: Form 5 plus Modern Roof Modern Wall Material Modern Toilet Level of Education: Form 1-4 Modern Lighting Lives in Rural Area Head Employed 7,098 * 55,267 Head Self-employed 23,977 Lives in Urban Area outside Dar Head Works on Own Farm 21,186 Level of Education: Standard 5-8 5,308 Male Household Head 5,328 No. of bedrooms 2,573 Number of Employed Adults in 1,171 the Household Age of Household Head -394 * 194 * -282 198 Constant 39,522 ** 2,447 28,108 ** -14,823 R-squared 0.37 0.34 0.32 0.35 Observations 1213 1213 561 561 Where a coefficient is entered against a variable, p<0.2; * significant at 5% level; ** significant at 1% level ** ** ** *

Subsequent models are then compared to this baseline. Table 5 compares the proportion of the variance explained for these alternative models, for both the whole population and the rural population. The second model extends the first by allowing all assets to be included, resulting in substantial improvements in the proportion of the variance explained. The third model uses a more limited set of nineteen assets that were found to be consistently good predictors, based on the overall and stratum specific models produced earlier.[11] The model that uses this set of nineteen assets also explains a substantial fraction of the variance.
However, nineteen was still too many to be included in the census form; subsequent models assess the explanatory power of using only six additional variables, selected in line with the principles set out above. The first of these, model 4, includes a telephone, a radio (including one contained in a complete music system; these two variables were combined), a bicycle, a hoe, a wheelbarrow and an iron (electric or charcoal). As would be expected, its predictive power is lower than that of the model using all nineteen assets. In the population as a whole, some 41 and 38 percent of the variance is explained for expenditure per adult equivalent and per household respectively; the corresponding figures are 36 and 37 percent for rural areas. However, these proportions are still appreciably higher than for the baseline model without any additional information on assets.

Subsequent models examine the consequence of replacing some of the items used in model 4. The effects of substituting particular items are often small, which gives some scope for using other assets if required. However, among the six-asset alternatives, model 4 usually explains the highest proportion of variance and also conforms most closely to the principles agreed with the working group. The details of model 4 are given in Table 6.

Table 5 Proportions of variance explained by alternative models of expenditure

Model                                                             Expenditure per    Household
                                                                  adult equivalent   expenditure
Mainland Tanzania – whole sample
1. Basic – existing census data                                   0.372              0.341
2. Basic plus all assets                                          0.455              0.442
3. Basic plus 19 most often significant assets                    0.444              0.437
4. Basic plus telephone, bike, hoe, wheelbarrow, iron, radio      0.408              0.383
5. Basic plus telephone, car, hoe, wheelbarrow, iron, radio       0.406              0.390
6. Basic plus telephone, car, lantern, wheelbarrow, iron, radio   0.397              0.399
7. Basic plus telephone, car, table, wheelbarrow, iron, radio     0.399              0.390
8. Model 4 plus distances                                         0.416              0.389
Rural areas
1. Basic – existing census data                                   0.319              0.346
2. Basic plus all assets                                          0.449              0.442
3. Basic plus 19 most often significant assets                    0.429              0.423
4. Basic plus telephone, bike, hoe, wheelbarrow, iron, radio      0.359              0.372
5. Basic plus telephone, car, hoe, wheelbarrow, iron, radio       0.353              0.372
6. Basic plus telephone, car, lantern, wheelbarrow, iron, radio   0.356              0.385
7. Basic plus telephone, car, table, wheelbarrow, iron, radio     0.358              0.380
8. Model 4 plus distances                                         0.373              0.385

For this model, or a similar model derived using the full data set, to be applied to the census data, it is essential that the information in the census be collected so that variables identical to the HBS variables can be constructed in the census data. It should also be remembered that the application of these models is computationally intensive.

11 These were: a telephone, video, table, cupboards, electric iron, watches, wheelbarrow, books, fishing net, harrow, hoe, livestock, bicycle, complete music system, radio, fridge, lantern and motor-cycle.

The addition to model 4 of information on distance to clinics and water in the dry season improved the explanatory power of the model marginally for the whole sample and by around one percentage point in rural areas. It was felt that this information should be added only if it provided a useful indicator of access to services in its own right.

Table 6 Determinants of expenditure using variables selected in model 4

Columns: Whole Population (Expenditure per adult equivalent; Expenditure per household) and Rural Areas (Expenditure per adult equivalent; Expenditure per household), each with Coeff. and Sig.

Whole Population, Coeff. Sig.: -2,087 ** 6,127 ** 62 ** -94 -7,161 ** -16,476 ** 6,364 ** 18,727 ** -1,567 1,550 -2,015 * -6,529 * 2,578 26,828 ** 2,381 23,565 ** -416 ** 204 * 4 ** -8,128 ** -23,377 ** -7,199 ** -20,936 ** 5,524 * 15,952 3,985 8,896 336 324 8,960 * 79,500 ** 1,658 4,906 2,987 ** 4,815 -5,376 * -12,388 22,388 ** 4,667 2,884 1,924 3,051

Rural Areas, Coeff. Sig.: -2,016 ** 3,886 ** 55 ** -5,273 * -13,619 * 6,327 ** 17,290 ** -2,319 * -5,598 1,847 -2,254 * -6,719 * 4,837 47,938 ** 24,057 ** -287 180 3

Household Size
Household Size - squared
Dependency Ratio
Modern Roof
Modern Wall Material
Modern Toilet
Modern Lighting
Head Employed
Head Self-employed
Age of Household Head
Age of Household Head - squared
Lives in Rural Area
Lives in Urban Area outside Dar
Level of Education: Form 1-4   6,762   19,473
Level of Education: Form 5 plus   3,246
Any Form of Radio   -325   2,600
Telephone   1,454   76,264
Bicycle   2,005   4,102
Hoes   1,730   659
Wheel-barrow   -6,553 **   -13,242
Head Works on Own Farm   21,239
Level of Education: Standard 5-8   2,844
Male Household Head   3,780
No. of bedrooms   2,170
Number of Employed Adults in the Household   1,203
Electric/charcoal iron   3,330 *   11,295   3,780 *   12,431
Constant   38,134 **   -2,687   26,923 **   -13,022
R-squared   0.41   0.38   0.36   0.37
Observations   1213   1213   561   561
Sig.   * **

Where a coefficient is entered against a variable, p<0.2; * significant at 5% level; ** significant at 1% level

Section 6 Conclusions

The objective of this work was to inform the development of the census form so that it collects additional information useful for poverty monitoring, particularly for estimating income poverty levels for small geographic areas. To do this, models of household expenditure were developed using a part of the HBS 2000/01 data. These models predict household expenditure using data that is available in both the census and the HBS.

The work was carried out in two stages. The first stage used HBS data that was already available. It replicated the work carried out on the 1991/92 HBS data. It also extended it in a number of ways, establishing that the model of the 2000/01 data could be improved by disaggregating a number of variables. Amongst these, it was shown that the addition of information on the ownership of a range of assets appreciably improved the explanatory power of the models.
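The size of the improvement from adding assets can be read directly off Table 5. The short Python sketch below is illustrative only: it recomputes, from the R-squared values in Table 5, the percentage-point gain of each alternative model over the baseline model 1. The dictionary layout and the function name gains_over_baseline are assumptions made for this example and are not part of the original analysis.

```python
# R-squared values copied from Table 5; list position i holds model i + 1.
TABLE_5 = {
    "whole sample, per adult equivalent": [0.372, 0.455, 0.444, 0.408, 0.406, 0.397, 0.399, 0.416],
    "whole sample, per household": [0.341, 0.442, 0.437, 0.383, 0.390, 0.399, 0.390, 0.389],
    "rural, per adult equivalent": [0.319, 0.449, 0.429, 0.359, 0.353, 0.356, 0.358, 0.373],
    "rural, per household": [0.346, 0.442, 0.423, 0.372, 0.372, 0.385, 0.380, 0.385],
}


def gains_over_baseline(r_squared):
    """Return {model number: gain over model 1, in percentage points}."""
    baseline = r_squared[0]
    return {
        model: round((value - baseline) * 100, 1)
        for model, value in enumerate(r_squared[1:], start=2)
    }


if __name__ == "__main__":
    for label, values in TABLE_5.items():
        gains = gains_over_baseline(values)
        print(f"{label}: model 4 adds {gains[4]} points, "
              f"all assets (model 2) add {gains[2]} points")
```

Comparing the gain for model 4 with the gain for model 2 shows how much of the improvement available from the full asset list is retained when only six assets are collected.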
Although many of the variables that can be used to predict household expenditure were already in the census, the modelling showed that the addition of information on assets would improve its utility for poverty mapping. However, the modelling exercises did not provide an automatic means of deciding which or how many assets should be added to the census form. There is no cut-off proportion of the variance that ensures that a model is adequate. Furthermore, the assets selected by the regression models vary substantially according to the dependent variable used and the stratum. It would be prohibitively expensive to collect information on all of these. It was argued that the selection of the assets, while being informed by the modelling exercise, should be based on wider considerations than the modelling results alone.

At this point, the Research and Analysis Working Group reviewed the work. The group provided some criteria by which the assets could be selected. It was also decided to extract some additional variables from the HBS data, so that the models could use more of the information that would already be available in the census. Further models were then developed using this data. A set of alternatives, using a maximum of six assets, was developed. One particular set of six was recommended for addition to the census; this set usually maximised the proportion of the variance explained and conformed with the principles agreed with the working group.

Appendix 1 Modelling expenditure by stratum

Table A1 estimates expenditure per adult equivalent across the three strata. Caution must be exercised when interpreting these results due to the small sample size. In particular, the estimation for Dar es Salaam, which reports an R-squared of 0.67, fails the regression F-test due to the inclusion of too many variables.
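One way to see why a high R-squared from a small stratum with many regressors deserves caution is to compare it with the adjusted R-squared, which penalises each additional variable. The Python sketch below is illustrative only. It uses the R-squared values and numbers of observations reported in Table A1, but the number of regressors (30 per stratum) is a hypothetical figure chosen for the example, since the count of variables retained in each stratum model is not stated here. The adjustment formula is the standard one for ordinary least squares and takes no account of survey weights or sample design.

```python
def adjusted_r_squared(r_squared, n_obs, n_regressors):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    residual_df = n_obs - n_regressors - 1
    if residual_df <= 0:
        raise ValueError("need more observations than regressors plus one")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / residual_df


# (R-squared, observations) as reported in Table A1.
STRATA = {
    "Dar es Salaam": (0.67, 280),
    "Other urban areas": (0.47, 379),
    "Rural areas": (0.45, 564),
}

# Assumption for illustration only: not taken from the report.
HYPOTHETICAL_REGRESSORS = 30

if __name__ == "__main__":
    for name, (r2, n_obs) in STRATA.items():
        adj = adjusted_r_squared(r2, n_obs, HYPOTHETICAL_REGRESSORS)
        print(f"{name}: R-squared {r2:.2f}, adjusted {adj:.3f}")
```

The smaller the stratum sample relative to the number of regressors, the larger the gap between the two measures.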
Table A1 Determinants of expenditure per adult equivalent for (1) Dar es Salaam, (2) other urban areas, and (3) rural areas

Variables: Household Size; Household Size – squared; Dependency Ratio; Modern Floor; Fishing net and other equip.; Modern Roof; Modern Wall Material; Motor vehicles; Hoes; Tractor; Head Self-employed; Fields/Land; Record/cassette player, tape recorder; Male Household Head; Age of Household Head; Age of Household Head - squared; No. of rooms; Distance to Water; Spraying machine; Plough, etc; Dish antenna / decoder; Level of Education: Form 1-4; Level of Education: Form 5 plus; Radio and radio cassette; Telephone; Cart; Wells; Motor cycle; Beds; Livestock; Cupboards, wardrobes, drawers, etc; Books (not school books); Chairs; Computer; Cooking pots, cups, other kitchen utensils; Hand milling machine; Lanterns, lamps, etc; Level of Education: Standard 5-8; Modern Foundation; Mosquito net; Other Poultry; Refrigerator or freezer; Watches; Water heater; Bicycle; Distance to Health Clinic

Dar es Salaam, Coeff. Sig.: -5,731 ** 321 ** -13,227 4,749 15,868 ** 13,078 -6,160 * 11,349 * 5,226 34,411 * 6,494 * 3,805 42,147 ** -3,564 -1,502 * 16 * 1,078 3,447 ** -12,654 * 14,225 ** 4,956 5,711 6,061 * 6,777 ** 6,831 * 16,696 -12,159 * -22,173 ** -9,952 9,381 ** 3,742

Other urban areas, Coeff. Sig.: -3,947 ** 130 ** 4,972 * 6,946 * -7,358 * 17,018 5,634 * ** 2,527 * -3,374 ** -243 3 660 * 233 10,598 -4,121 ** * * 7,237 9,063 * -9,332 ** -7,256 * -7,264 ** 3,181 3,071 -3,767 21,309 3,636 -7,691 3,225 -2,013 -3,017 3,146 4,680 5,366 6,349 4,924 2,091

Rural areas, Coeff. Sig.: -1,825 ** 47 ** -7,359 ** 2,925 -2,910 * ** ** -4,592 ** * ** * ** ** -6,168 2,073 * 1,837 128 ** * **

Donkeys   6,079
Electric/charcoal iron   3,846
Ownership of more than one house   2,917
Modern Toilet   1,629
Sewing machine   -5,424
Table   2,498
Video   14,372
Water pumping set   -3,058
Wheel-barrow   -4,328
Constant   58,803 **   27,917 **   21,869
R-squared   0.67   0.47   0.45
Observations   280   379   564
Sig.   * * ** * ** ** ** **

Where a coefficient is entered against a variable, p<0.2; * significant at 5% level; ** significant at 1% level

Appendix 2 Modelling the likelihood that households are poor

This model uses a binary dependent variable that takes the value 1 if the household is below the poverty line and 0 if its expenditure lies above the line. The coefficients here indicate the effect of the independent variables on the likelihood that households are poor. A positive coefficient indicates that increasing values of the variable increase the likelihood of poverty. A negative coefficient indicates that increasing values reduce the likelihood that households are poor.

Table A2 reports the results of estimating the same specification as in the 1991/92 analysis. The results are similar, as is the predictive power of the model.

Table A2 Logistic regression of Poverty using 1991/92 specification

Variable                      Coeff.   Sig.
Weighted Index of Assets      -0.02    **
Household Size                 0.29    **
Dependency Ratio               1.07    *
Modern Floor                  -0.73
Distance to Health Clinic     -0.02
Modern Wall Material           0.46
Modern Roof                   -1.11    **
Distance to Water             -0.10
Male Household Head           -0.48
Constant                      -0.92    *
Pseudo r-squared               0.25
Predictive power               75%
Observations                   1223

Where a coefficient is entered against a variable, p<0.2; * significant at 5% level; ** significant at 1% level

Table A3 estimates the likelihood of being poor using the expanded model specification. With the addition of 11 assets the pseudo r-squared and the predictive power of the model are improved.

Table A3 Logistic Regression of Poverty using the revised specification

Variable                             Coeff.
Household Size                        0.34
Tractor                               1.89
Dependency Ratio                      1.36
Electric/charcoal iron               -0.77
Bicycle                              -0.78
Modern Roof                          -1.51
Spraying machine                     -0.99
Motor cycle                          -5.00
Sofa set                             -1.76
Head Employed                        -1.68
Head Self-employed                   -0.99
Head Works on Own Farm               -1.05
Watches                              -0.74
Books (not school books)              0.57
Age of Household Head                 0.07
Age of Household Head – squared      -0.00
Lives in Rural Area                  -1.44
Lives in Urban Area                  -1.09
House(s)                             -0.74
Distance to Water                    -0.05
Distance to Health Clinic            -0.03
Wheel-barrow                          1.74
Bee hives                             1.76
Livestock                             0.94
Lanterns, lamps, etc                 -0.77
Dish antenna / decoder               -1.11
Mosquito net                         -0.67
Refrigerator or freezer              -2.95
Sewing machine                        1.93
Constant                             -0.61
Pseudo r-squared                      0.37
Predictive power                      82%
Observations                          1209
Sig.   ** * * * * ** ** ** * * ** ** ** ** * ** **

Where a coefficient is entered against a variable, p<0.2; * significant at 5% level; ** significant at 1% level
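To illustrate how the logistic coefficients in Appendix 2 translate into a predicted likelihood of poverty, the following Python sketch applies the Table A2 coefficients to two invented households. It is a minimal illustration, not the procedure used in the report. The household values, the 0.5 classification cut-off, and the reading of "predictive power" as the share of households classified correctly are all assumptions made for this example, and the units of the distance and asset-index variables are not stated in the report.

```python
import math

# Coefficients copied from Table A2 (1991/92 specification).
TABLE_A2_COEFFS = {
    "Weighted Index of Assets": -0.02,
    "Household Size": 0.29,
    "Dependency Ratio": 1.07,
    "Modern Floor": -0.73,
    "Distance to Health Clinic": -0.02,
    "Modern Wall Material": 0.46,
    "Modern Roof": -1.11,
    "Distance to Water": -0.10,
    "Male Household Head": -0.48,
}
CONSTANT = -0.92


def poverty_probability(household):
    """Logistic prediction: p = 1 / (1 + exp(-z)), z = constant + sum(coef * x)."""
    z = CONSTANT + sum(
        coef * household.get(name, 0.0) for name, coef in TABLE_A2_COEFFS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def share_correctly_classified(households, observed_poor, cutoff=0.5):
    """Share of households whose predicted status matches the observed one."""
    hits = sum(
        (poverty_probability(h) >= cutoff) == bool(poor)
        for h, poor in zip(households, observed_poor)
    )
    return hits / len(households)


# Two hypothetical households (all values invented for illustration).
LARGE_HOUSEHOLD = {
    "Household Size": 8, "Dependency Ratio": 0.6, "Distance to Health Clinic": 10,
    "Distance to Water": 2, "Male Household Head": 1, "Weighted Index of Assets": 5,
}
SMALL_HOUSEHOLD = {
    "Household Size": 2, "Modern Floor": 1, "Modern Wall Material": 1,
    "Modern Roof": 1, "Male Household Head": 1, "Weighted Index of Assets": 40,
}

if __name__ == "__main__":
    for label, hh in [("large", LARGE_HOUSEHOLD), ("small", SMALL_HOUSEHOLD)]:
        print(f"{label} household: predicted probability of poverty "
              f"{poverty_probability(hh):.2f}")
```

Variables omitted from a household dictionary are treated as zero, which suits the dummy variables but would need care for continuous ones in any real application.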