Income and poverty status predictors for Tanzania

National Bureau of Statistics, Tanzania
and
Oxford Policy Management Ltd, UK
With support from the UK Department for International
Development
June 2001
Contents
Section 1 Introduction
Section 2 Using population census data in poverty monitoring
Section 3 Modelling household expenditure
Section 4 Refining the specification for modelling household expenditure
Section 5 Comparison of a set of alternative models
Section 6 Conclusions
Appendix 1 Modelling expenditure by stratum
Appendix 2 Modelling the likelihood that households are poor
Tables
Table 1 Determinants of expenditure per adult equivalent and per household: initial model
Table 2 Determinants of expenditure using additional variables
Table 3 Determinants of expenditure using expanded explanatory variables, including assets
Table 4 Modelling determinants of expenditure using information already in the census form
Table 5 Proportions of variance explained by alternative models of expenditure
Table 6 Determinants of expenditure using variables selected in model 4
Table A1 Determinants of expenditure per adult equivalent for (1) Dar es Salaam, (2) other urban areas, and (3) rural areas
Table A2 Logistic regression of poverty using the 1991/92 specification
Table A3 Logistic regression of poverty using the revised specification
Section 1 Introduction
There is growing interest in predicting household expenditure and poverty measures for small
geographical areas by combining detailed information from a household budget survey with
the more limited data, but more extensive coverage, of a Population Census.1 Tanzania is
planning to undertake a Population Census in 2002; this data may be used for this type of
poverty mapping. The National Bureau of Statistics decided to use the available data from
the 2000/01 Household Budget Survey to identify the most important variables for predicting
expenditure and poverty status. Information on these predictors would then be incorporated
into the census questionnaire, to maximise its usefulness for poverty mapping.2
The work was carried out in two stages. A first stage of modelling was undertaken using the
available HBS data. This sample contained around 1,200 households. The results were then
presented to the Research and Analysis Working Group of the Poverty Monitoring Steering
Committee. The group gave its opinion on a number of issues; on this basis, the final
modelling was undertaken.
This report is split into 6 sections. Section 2 outlines the analytical approach taken and its
limitations. It also places the work in the context of more general poverty monitoring issues
surrounding the use of the Population Census data. In Section 3, regression models run on
the 1991/92 HBS data are replicated as closely as possible for the 2000 data.3 Section 4
refines the specification by disaggregating some of the variables used in the earlier model.
Section 5 compares a number of models and evaluates the improvement in explanatory power
relative to the data that is already collected in the census. Section 6 concludes.
Section 2 Using population census data in poverty monitoring
The census provides a unique data source for poverty monitoring because it covers every
household in the country. For this reason, it can provide highly disaggregated welfare
information, down to the district level and below. This is particularly important in the
context of decentralisation, where district administrations will increasingly be held
accountable for the provision of services. Census data may be used to inform the allocation
of resources, to monitor performance and to cross check administrative data. The 2002
Census may, in particular, provide a baseline for district administrations as decentralisation
begins. While the census provides this opportunity, the amount of data that can be collected
must be limited because of the scale of the operation.
Many of the non-income poverty measures defined by the PRSP and the Vice-President’s
Office are included in the census form. These include literacy and the level of education
achieved; access to an adequate water supply and sanitation; mortality (infant, under five,
adult) and life expectancy. However, a case can be made for the addition of a number of
other non-income measures. These include measures of health sector outputs and further
information on enrolment.4
1 See, for example: Alderman, H. et al. (2000) ‘Is Census Income an Adequate Measure of Household Welfare?
Combining Census and Survey Data to Construct a Poverty Map of South Africa’, Statistics South Africa and
World Bank; and Hentschel, J., Lanjouw, J.O., Lanjouw, P. and Poggi, J. (2000) ‘Combining Census and
Survey Data to Study Spatial Dimensions of Poverty: A Case Study of Ecuador’, World Bank Economic
Review, Vol. 14, No. 1:147-165.
2 The work was carried out by Trudy Owens and Patrick Ward of OPM.
3 The earlier models are reported in ‘Developing a Poverty Baseline in Tanzania’, May 2000, NBS and OPM.
4 See ‘Tanzania Census Form 2001: Comments on its usefulness for poverty monitoring’, P Ward, OPM, May
2001 (mimeo) for further discussion of these issues.
High quality data on income and expenditure cannot be reliably collected in a census. It is
possible, however, to use modelling to predict household expenditure and poverty status from
other variables which can be collected in a census. Expenditure is considered a more reliable
measure of household consumption, and of welfare, than reported income. The modelling is
carried out using HBS data, which includes information on both household expenditure and
on the predictor variables. Once the census has collected data on the predictor variables, the
model parameters may be used to estimate expenditure, poverty and distributional measures
for small geographic areas, including districts and below.
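The prediction step described above can be sketched as follows. This is a minimal illustration only: the coefficient values, variable names and district labels are hypothetical and are not the estimates reported in this document. It produces point predictions and ignores the residual variation that a full application would need to account for when estimating poverty and distributional measures.

```python
from collections import defaultdict

# Hypothetical coefficients for illustration; NOT the estimates in this report.
COEFFICIENTS = {
    "constant": 30000.0,
    "household_size": -2000.0,
    "modern_roof": 6000.0,
    "lives_in_rural_area": -7000.0,
}


def predict_expenditure(household, coefficients=COEFFICIENTS):
    """Linear prediction: the constant plus coefficient times value per predictor."""
    total = coefficients["constant"]
    for name, beta in coefficients.items():
        if name != "constant":
            total += beta * household[name]
    return total


def mean_prediction_by_district(census_records, coefficients=COEFFICIENTS):
    """Average the household-level predictions within each district."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for record in census_records:
        sums[record["district"]] += predict_expenditure(record, coefficients)
        counts[record["district"]] += 1
    return {district: sums[district] / counts[district] for district in sums}


# Made-up census records carrying only the predictor variables.
census_records = [
    {"district": "A", "household_size": 4, "modern_roof": 1, "lives_in_rural_area": 0},
    {"district": "A", "household_size": 6, "modern_roof": 0, "lives_in_rural_area": 0},
    {"district": "B", "household_size": 5, "modern_roof": 0, "lives_in_rural_area": 1},
]
district_means = mean_prediction_by_district(census_records)
```

The sketch relies on the census records carrying exactly the same predictor names and definitions as the survey model.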
The modelling was carried out on a data set of around 1,200 households, which represent the
data collected in the ‘national sample’ during the first three months of the HBS. Models are
developed for two main outcome variables: total household consumption expenditure and
expenditure per adult equivalent. Expenditure measures include the value of home produced
items consumed.5 In addition, one model is developed for predicting whether households are
poor or not. However, the main focus is on the expenditure measures, which provide more
powerful and flexible measures of welfare.6 The models are developed for the population as
a whole and separately for the rural population, since the latter represents around 80 percent
of the total population. Models are also presented in the Appendix for the other two HBS
strata, i.e. Dar es Salaam and other urban areas.
The objective of this work was to inform the development of the 2002 census questionnaire.
The models were used to assess the predictive value of data that was already included in the
Census and then to examine which additional information should be included in order to
increase the models' predictive power. The procedure used was to apply forward stepwise
selection to each set of variables defined for the alternative models. Variables
were included in the model if their associated p-value was less than 0.2.
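A forward stepwise procedure of this kind can be sketched as below. The report does not say which software or exact algorithm was used, so this is an assumed variant: at each round it adds the candidate with the smallest p-value, provided that value is below 0.2. The p-values use a normal approximation to the t distribution, which is a simplification that is reasonable only for large samples. The data are synthetic and all names are hypothetical.

```python
import math

import numpy as np


def last_coefficient_p_value(X, y):
    """Two-sided p-value for the last column of X in an OLS fit (normal approximation)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    sigma2 = float(residuals @ residuals) / (n - k)
    covariance = sigma2 * np.linalg.inv(X.T @ X)
    t_stat = beta[-1] / math.sqrt(covariance[-1, -1])
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


def forward_stepwise(candidates, y, threshold=0.2):
    """candidates maps a variable name to a 1-D array; returns names in entry order."""
    n = len(y)
    selected = []
    remaining = list(candidates)
    while remaining:
        best_name, best_p = None, threshold
        for name in remaining:
            columns = [np.ones(n)] + [candidates[s] for s in selected] + [candidates[name]]
            p = last_coefficient_p_value(np.column_stack(columns), y)
            if p < best_p:
                best_name, best_p = name, p
        if best_name is None:
            break  # no remaining candidate enters at the chosen threshold
        selected.append(best_name)
        remaining.remove(best_name)
    return selected


# Synthetic data: two real predictors and one unrelated variable.
rng = np.random.default_rng(0)
n = 500
household_size = rng.integers(1, 10, size=n).astype(float)
modern_roof = rng.integers(0, 2, size=n).astype(float)
unrelated = rng.normal(size=n)
expenditure = (30000 - 2000 * household_size + 6000 * modern_roof
               + rng.normal(scale=3000, size=n))

chosen = forward_stepwise(
    {"household_size": household_size, "modern_roof": modern_roof, "unrelated": unrelated},
    expenditure,
)
```

With a threshold as generous as 0.2, an unrelated variable will sometimes enter the model by chance, which is worth bearing in mind when reading the tables.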
There are a number of limitations to this work. The modelling process identifies variables
that correlate with household expenditure. However, the relationship is not necessarily causal
– changes in the predictor variables will not necessarily produce a change in household
expenditure. Ownership of consumer goods, for example, is a consequence of household
income rather than a determinant of it. The models are intended to provide geographically
disaggregated welfare measures, although these methods must be considered experimental,
since they are still not well tested. The predictors may also provide useful proxies for
inclusion in other surveys where it would be impossible to collect full information on
household consumption but where distributional issues may be of interest. This is done in the
CWIQ survey, for example, and the same approach could be used in a DHS to look at health
indicators by (estimated) household expenditure category.
However, because the models are not causal, we do not believe that they can be relied upon
for tracking income poverty over time. This should be left to periodic household budget
surveys.
It should be remembered that the data set is small and covers only three months of the year.
It is hoped that the modelling will be repeated using the full data set once it is available. This
should provide more robust estimates of the model parameters and will allow the model to be
disaggregated more reliably by stratum and, possibly, by region.
5 This work builds on an interim analysis of a part of the 2000/01 HBS data set. For further details of the data
and the expenditure and poverty measures see ‘Trends in Poverty and Social Indicators: Tanzania 1991/92 –
2000, A Preliminary Analysis’ (Draft) National Bureau of Statistics and Oxford Policy Management, April 2001
(mimeo).
6 Models of poverty status also provided a much less sensitive means for selecting variables for inclusion in the
census form and, after initial exploration, were not used extensively in this work.
In addition to these limitations, we have not found any guidelines for defining an ‘adequate’
model, which can be counted upon to produce disaggregated estimates of a given accuracy –
there is no particular ‘R-squared’ value that is defined as adequate. The R-squared measures
can also be increased by increasing the p-value at which a variable is allowed to remain in the
model and by using the log of expenditure as the dependent variable. We have also used
more than one dependent variable and the models cover more than one stratum. These
different models frequently select different variables.
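For reference, the R-squared discussed here is the usual share of variance explained in a least-squares regression. The notation below is assumed for illustration and is not used elsewhere in this report: y_i is observed expenditure for household i, \hat{y}_i is the fitted value and \bar{y} is the sample mean.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

Adding a regressor to a least-squares model with a constant cannot increase the residual sum of squares in the numerator, so loosening the p-value threshold can only leave R-squared unchanged or raise it.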
For these reasons, the selection of additional variables to be collected in the census cannot be
made mechanically. There is a strong case for collecting information that is of intrinsic
interest, even at the expense of somewhat reducing the explanatory power of the models,
although the choice should still be informed by the modelling exercise. We return to this
issue in Section 5.
Section 3 Modelling household expenditure
Using the 1991/92 HBS data, earlier work estimated models for two variables: expenditure
per adult equivalent and the likelihood of being poor.7 This was carried out for Mainland
Tanzania and each of the three strata individually. For the expenditure variable, the
parameters were estimated using linear multiple regression with stepwise selection of
variables. For poverty status, a binary dependent variable, parameters were estimated using a
logistic procedure, once again using stepwise selection of variables.
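For the binary poverty outcome, a fitted logistic model turns a linear index of the predictors into a probability of being poor. The sketch below shows only that prediction step; the coefficients and variable names are hypothetical and are not taken from the models in this report.

```python
import math

# Hypothetical logit coefficients for illustration; NOT estimates from this report.
LOGIT_COEFFICIENTS = {"constant": -1.0, "household_size": 0.25, "modern_roof": -0.8}


def probability_poor(household, coefficients=LOGIT_COEFFICIENTS):
    """Logistic prediction: p = 1 / (1 + exp(-(constant + sum of beta * x)))."""
    linear_index = coefficients["constant"] + sum(
        beta * household[name]
        for name, beta in coefficients.items()
        if name != "constant"
    )
    return 1.0 / (1.0 + math.exp(-linear_index))


# With these made-up coefficients a modern roof lowers the predicted probability.
p_without_roof = probability_poor({"household_size": 4, "modern_roof": 0})
p_with_roof = probability_poor({"household_size": 4, "modern_roof": 1})
```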
In the earlier analysis, nineteen variables were included in the first estimation. These were:
• Household size, measured as the number of household members
• Household dependency ratio
• The number of rooms in the dwelling unit occupied by the household
• The distance to drinking water
• The distance to the nearest market
• The distance to the nearest clinic or dispensary
• A dummy variable if the household head was educated beyond standard 4
• A dummy variable for households in rural areas
• A dummy variable for households in other urban areas
• The number of household members employed or in self-employment (outside the farm)
• A dummy variable for whether the household head was employed or self-employed
• A dummy variable for whether the household head was inactive
• A dummy variable if the household head was over 60
• A number of dummy variables to denote the use of modern building materials for the
foundations, floor, walls and roof
• A weighted index of ownership of consumer goods.
Using the stepwise procedure, 11 of these were excluded as not significant in the 1991/92
estimation for Mainland Tanzania.
The same analysis was repeated using the 2000 data. Table 1, column 1, reports the results of
estimating this same specification for 2000. Explanatory variables that did not have a
significant coefficient in any model are omitted from the table. Variables that were
significant in at least one model are included; a coefficient is entered against them for
each model in which they are significant (using p<0.2). The results are similar to those
found in the 1991/92 analysis. Household size, whether the head has a job, whether the
household lives in rural or other urban areas and the dependency ratio continue to be
significant variables in explaining expenditure per adult equivalent. The picture is more
mixed for modern building materials. The use of concrete for foundations and floors no longer
appears to be a significant explanatory variable. However, the use of a modern roof or wall
material does appear to be significant in explaining expenditure, although the sign of the
latter is not in the direction that would be expected. In addition, the distance to water in
the dry season and the distance to a clinic or dispensary also appear as significant
explanatory variables, although once again with signs that would not have been expected.
Together, these variables explain 35 per cent of the variation in expenditure per adult
equivalent. This is a much higher proportion than in the 1991/92 data set, where it was only
5 percent.
7 ‘Developing a Poverty Baseline in Tanzania’ cited in fn 3.
Concern that we may be introducing a spurious correlation by including household size as an
explanatory variable when it is also the denominator of the dependent variable led us to
estimate the same specification without household size.8 The results are reported in column 2
of Table 1. This made little difference to the model, although three additional variables
became significant in explaining expenditure. These are: if the household head was over 60,
the number of employed adults in the household, and the number of rooms in the dwelling.
The negative coefficients of the number of employed individuals and the number of rooms
are not what would be expected and may be because these variables are acting as proxies for
household size. For this reason, the models constructed in later sections retained household
size as an explanatory variable.
Table 1 Determinants of expenditure per adult equivalent and per household: initial model
Expenditure per
adult equivalent
(1)
Coeff.
Sig.
57
**
-1,029
**
-9,216
**
241
*
-7,635
**
-1,621
6,041
**
2,457
-6,814
**
Expenditure per
adult equivalent
(2)
Coeff.
Sig.
50
**
Expenditure per
household
Coeff.
Sig.
Weighted Index of Assets
303
**
Household Size
4,817
**
Dependency Ratio
-14,943
** -11,734
*
Distance to Water
251
*
529
**
Lives in Rural Area
-6,856
* -21,351
**
Modern Wall Material
Modern Roof
6,488
**
16,193
*
Head Employed or Self-employed
4,440
**
Lives in Urban Area outside Dar es Salaam
-6,172
* -22,361
**
Household Head Over 60
2,823
No. of rooms
-697
**
Number of Employed Adults in the Household
-1,242
*
3,402
Head Inactive
-18,425
**
Male Household Head
5,275
Distance to Health Clinic
77
*
100
**
404
**
Constant
26,547
** 24,064
**
19,590
*
R-squared
0.35
0.31
0.38
Number of observations
1223
1223
1223
Notes: (1) Including household size as an explanatory variable.
(2) Excluding household size as an explanatory variable.
Where a coefficient is entered against a variable, p<0.2; * significant at 5% level; ** significant at 1% level
We also estimated the regression using total household expenditure as the dependent variable
and including household size as an explanatory variable. The results are reported in column 3
of Table 1. Again the results are broadly similar, although household size now has a positive
coefficient since larger households will tend to have a larger total expenditure (while they
tend to have a lower expenditure per adult equivalent). The R-squared of the model increases
to 0.38. Again, a number of variables become significant in addition to those in the first
model. These include the number of employed household members, whether the head is male or
inactive and the number of rooms in the dwelling.
8 In an additional specification we examined the effect of excluding the dependency ratio as an explanatory
variable. This made no appreciable difference to the results.
These exercises established that the HBS 2000 data produces broadly similar models to the
1991 data. They also suggested that household size should be retained as an explanatory
variable. In the following sections, we go on to extend these results using both expenditure
per adult equivalent and expenditure per household as the dependent variables.
Section 4 Refining the specification for modelling household expenditure
This section refines the models estimated in Section 3. A number of the variables that were
used in those models were disaggregated in order to use more of the information available.
This was done with information on the education, age and occupation of the head.
Information on household sources of income and housing materials was also explored. The
index of assets owned was also disaggregated. The use of a non-linear term for age was also
explored. Stepwise regression was used to select variables.
Table 2 presents the results for those variables that were added to the model. Additional
dummies for the level of education of the household head were significant. These used heads
with no education as the comparison group and had dummies for standard 1-4, standard 5-8,
form 1-4 and form 5 plus. The age of the head and its square were also included, as was the
square of household size.
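The disaggregated variables described above could be constructed along the following lines. This is a sketch with hypothetical field names; the education categories are those listed in the text, with heads who have no education as the omitted comparison group.

```python
# Education categories from the text; "no education" is the omitted comparison group.
EDUCATION_LEVELS = ["standard 1-4", "standard 5-8", "form 1-4", "form 5 plus"]


def build_features(head_education, head_age, household_size):
    """Return dummy variables for education plus linear and squared terms."""
    features = {
        f"educ_{level}": int(head_education == level) for level in EDUCATION_LEVELS
    }
    features["head_age"] = head_age
    features["head_age_squared"] = head_age ** 2
    features["household_size"] = household_size
    features["household_size_squared"] = household_size ** 2
    return features


# A head with no education gets zeros on every education dummy.
no_education = build_features("no education", head_age=40, household_size=5)
secondary = build_features("form 1-4", head_age=40, household_size=5)
```

Leaving out one category avoids perfect collinearity with the constant, and each education coefficient is then read relative to heads with no education.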
Table 2 Determinants of expenditure using additional variables
Expenditure per adult
equivalent
Coeff.
Sig.
-2,060
**
63
**
49
**
-7,902
**
2,557
2,086
5,840
**
-1,703
4,280
-412
*
4
**
-7,310
**
-6,807
**
1,764
4,188
Expenditure per
household
Coeff.
Sig.
4,808
**
Household Size
Household Size – squared
Weighted Index of Assets
303
Dependency Ratio
-14,960
Head Employed
24,202
Head Self-employed
20,546
Modern Roof
15,734
Modern Wall Material
Level of Education: Form 5 plus
Age of Household Head
177
Age of Household Head – squared
Lives in Rural Area
-21,815
Lives in Urban Area outside Dar
-22,254
Modern Toilet
Level of Education: Form 1-4
Distance to Health Clinic
373
Head Works on Own Farm
21,500
Male Household Head
5,412
Number of Employed Adults in the Household
3,081
Distance to Water
255
**
569
Constant
36,736
**
-7,636
R-squared
0.39
0.38
Observations
1223
1223
Where a coefficient is entered against a variable, p<0.2; * significant at 5% level; ** significant at 1% level
**
*
*
**
*
**
**
**
**
The disaggregation or addition of some other variables did not improve the specification.
Producing additional dummy variables by further disaggregating the main activity of the
household head did not improve it. Information on the source of household income did not add
significantly to the information provided by the activity of the household head. Splitting
the dummies on materials used in the construction of the dwelling did not improve the
specification either. For this reason we decided to continue to use the previous dummies for
modern materials in foundation, floor, wall and roof construction. Similarly, including ten
dummies for types of water supply was found to be no more effective than including a single
dummy for a protected water supply, using those who had an unprotected water supply as the
comparison group.
Overall, the explanatory power of the model improves for expenditure per adult equivalent –
39 per cent of the variation in expenditure is explained by this new set of variables, compared
to 35 percent previously. The key variables improving the explanatory power appear to be
the dummies on levels of education and using the actual age of the head rather than the
dummy for whether the head is over 60. There is very little change to the model using
expenditure per household as the dependent variable.
In the previous estimations, assets were included as a weighted index, replicating their
inclusion in the Poverty Baseline report. The next step was to include individual assets
separately, since it would not be practical for the census to collect information on all of the
assets that go to make up the index. Instead, the objective was to identify which particular
assets should be recorded in the census. Table 3 includes dummy variables for whether a
household owned each individual asset. This improves the explanatory power of the models.9
For the model of expenditure per adult equivalent, twelve individual assets are included in the
specification, increasing the R-squared from 0.39 to 0.45. For total household expenditure, 8
assets are added, increasing the R-squared from 0.38 to 0.43. Unfortunately, the assets
selected in the two regressions are usually different.
A model of expenditure per adult equivalent was also developed separately for each stratum –
that is, for Dar es Salaam, other urban areas and rural areas (see Appendix 1). This shows
that the nine assets selected for the whole mainland population are not always selected for
each of the strata. Collecting information on all assets that appear as significant in each of
those models would involve collecting information on 38 items. The logistic regression
model which predicts household poverty status selects yet other assets (Appendix 2).
The modelling undertaken so far established that the addition of information on the
ownership of individual assets would appreciably increase the amount of variance explained
by the models. There was a good case for collecting some additional information on the
ownership of assets in the census. However, the cost of the data collection must also be
considered and only a limited number of variables could be added to the census
questionnaire. As discussed in section 2, there are limitations to the modelling, including the
small sample size and the difficulty in defining a cutoff proportion of variance explained to
define an ‘adequate’ model. Since the different models also identify different assets, it was
concluded that the modelling alone did not provide a sufficient basis for choosing the assets
for inclusion in the census form.
At this point, the results produced were presented to the Research and Analysis working
group. The group took a number of decisions which formed the basis for the final round of
the modelling.
9 We also included the number of agricultural plots, number of livestock, number of donkeys and number of
poultry owned by the household rather than a dummy for whether they owned any or not. The R-squared
remained virtually unchanged. We therefore left the variables as dummies, which would be easier to collect in a
census if they were selected.
Table 3 Determinants of expenditure using expanded explanatory variables, including assets
Expenditure per adult
equivalent
Coeff.
Sig.
-2,089
**
59
**
-8,127
**
1,928
4,054
*
5,420
**
-1,129
1,927
-5,174
**
4,263
2,020
*
-1,922
-4,479
**
12,113
-423
**
4
**
-7,508
**
-6,677
**
2,312
**
218
*
88
*
-4,848
1,884
*
4,500
2,885
2,669
*
5,564
2,593
*
2,206
*
1,530
5,298
*
-1,910
*
Expenditure per
household
Coeff.
Sig.
4,909
**
Household Size
Household Size – squared
Dependency Ratio
-19,090
**
Livestock
Complete music system
39,621
*
Modern Roof
16,224
**
Modern Wall Material
Modern Toilet
Hand milling machine
-18,351
**
Donkeys
Watches
5,844
Fields/Land
-8,806
Wheel-barrow
-11,524
Fishing net and other equipment
20,847
Age of Household Head
151
Age of Household Head - squared
Lives in Rural Area
-16,726
*
Lives in Urban Area
-16,850
**
Table
Distance to Water
369
Distance to Health Clinic
447
**
Cart
Ownership of more than one house
5,071
Level of Education: Form 1-4
Level of Education: Form 5 plus
Hoes
8,098
Telephone
39,919
*
Electric/charcoal iron
9,922
Cupboards,wardrobes,drawers,etc
8,943
*
Bicycle
5,042
Video
30,729
*
Books (not school books)
-5,821
Head Employed
18,816
**
Head Self-employed
19,776
**
Head Works on Own Farm
18,633
**
Lanterns, lamps, etc
9,193
*
Motor cycle
19,503
**
Present business working capital
6,988
Refrigerator or freezer
31,683
*
Constant
35,098
**
600
R-squared
0.45
0.43
Observations
1223
1223
Where a coefficient is entered against a variable, p<0.2; * significant at 5% level; ** significant at 1% level 10
Section 5 Comparison of a set of alternative models
The Research and Analysis group reviewed the work undertaken to this point. It accepted
that there was a case for selecting assets that were of intrinsic interest. It was decided that a
total of six assets would be added to the census questionnaire and that their selection should
take into account the list of non-income poverty measures defined by the Vice President’s
Office and the PRSP. Four of the assets should be selected from these indicators and should
reflect some key aspects of poverty. These aspects included access to transport and
communication and the ownership of productive assets. They should also be common, rather
than rare, assets. The choice should be informed by the previous modelling. The remaining
two assets would be selected based largely on their capacity to increase the variance
explained by the model, once the first four had been included.
10 Assets that were owned by less than 1 per cent of the sample were excluded. In most cases the stepwise
regression dropped them automatically. These were: feeding machine, incubator, harrow, boat, boat engine,
coffee pulping machine and reaper.
It was also decided to return to the HBS data and to extract some additional information on
variables that were already included in the census form but that it had not been possible to
include in the earlier models (on cooking and lighting fuels). The variable on the number of
rooms in the house was also replaced with the number of bedrooms, since this is what is
collected in the Census.
A number of models were developed on this basis. The first model used only those variables
that were already available in the Census. These were: household size, age, education and
employment of the head, type of materials used in house construction, type of toilet, and
whether drinking water is protected. The results of these estimations are reported in Table 4.
Note that the sample size drops to 1213 because of the inclusion of variables on cooking fuel
and lighting. This model acts as the baseline, being the model that can be derived without any
additions to the census form. It explains 37 per cent of the variance in expenditure per adult
equivalent, and 34 per cent for expenditure per household. In rural areas the basic model
explains 32 and 35 per cent of the variance respectively.
Table 4 Modelling determinants of expenditure using information already in the census form
Whole Population
Expenditure
Expenditure
per adult
per household
equivalent
Coeff.
Sig.
Coeff.
Sig.
-1,884
**
6,599
**
61
**
-99
-8,370
** -19,196
**
4
*
5,561
*
17,798
*
7,019
**
20,930
**
-1,376
2,084
*
5,661
*
18,907
*
-2,145
*
-7,082
*
-8,470
** -28,563
**
3,924
*
30,672
**
2,757
*
24,019
**
-7,319
** -24,738
**
23,147
**
6,082
4,578
2,482
4,400
Rural Population
Expenditure
Expenditure
per adult
per household
equivalent
Coeff.
Sig.
Coeff.
Sig.
-1,956
**
4,216
**
60
**
-5,774
*
-15,036
*
3
3,861
6,981
**
20,666
**
-2,301
-5,344
2,121
*
6,726
22,846
-2,344
*
-7,575
*
Household Size
Household Size - squared
Dependency Ratio
Age of Household Head - squared
Level of Education: Form 5 plus
Modern Roof
Modern Wall Material
Modern Toilet
Level of Education: Form 1-4
Modern Lighting
Lives in Rural Area
Head Employed
7,098
*
55,267
Head Self-employed
23,977
Lives in Urban Area outside Dar
Head Works on Own Farm
21,186
Level of Education: Standard 5-8
5,308
Male Household Head
5,328
No. of bedrooms
2,573
Number of Employed Adults in
1,171
the Household
Age of Household Head
-394
*
194
*
-282
198
Constant
39,522
**
2,447
28,108
**
-14,823
R-squared
0.37
0.34
0.32
0.35
Observations
1213
1213
561
561
Where a coefficient is entered against a variable, p<0.2; * significant at 5% level; ** significant at 1% level
**
**
**
*
Subsequent models are then compared to this baseline. Table 5 compares the proportion of
the variance explained for these alternative models, for both the whole population and the
rural population. The second model extends the first by allowing all assets to be included,
resulting in substantial improvements in the proportion of the variance explained. The third
model uses a more limited set of nineteen assets that were found to be consistently good
predictors, based on the overall and stratum specific models produced earlier.11 The model
that uses this set of nineteen assets also explains a substantial fraction of the variance.
However, nineteen was still too many to be included in the census form; subsequent models
assess the explanatory power of using only six additional variables, selected in line with the
principles set out above.
The first of these, model 4, includes a telephone, a radio (including one contained in a
complete music system; these two variables were combined), a bicycle, a hoe, a wheelbarrow
and an iron (electric or charcoal). As would be expected, the predictive power is lower than
that of the model using all nineteen assets. In the population as a whole, some 41 and 38
per cent of the variance is explained for expenditure per adult equivalent and per household
respectively; the corresponding figures are 36 and 37 per cent for rural areas. However, these
proportions are still appreciably higher than for the baseline model without any additional
information on assets.
Subsequent models examine the consequence of replacing some of the items used in model 4.
The effects of substituting particular items are often small, which gives some scope for using
other assets if required. However, model 4 usually explains the highest proportion of
variance and also conforms most closely to the principles agreed with the working group.
The details of model 4 are given in Table 6.
Table 5 Proportions of variance explained by alternative models of expenditure
Model                                                             Expenditure per     Household
                                                                  adult equivalent    expenditure
Mainland Tanzania – whole sample
1. Basic – existing census data                                   0.372               0.341
2. Basic plus all assets                                          0.455               0.442
3. Basic plus 19 most often significant assets                    0.444               0.437
4. Basic plus telephone, bike, hoe, wheelbarrow, iron, radio      0.408               0.383
5. Basic plus telephone, car, hoe, wheelbarrow, iron, radio       0.406               0.390
6. Basic plus telephone, car, lantern, wheelbarrow, iron, radio   0.397               0.399
7. Basic plus telephone, car, table, wheelbarrow, iron, radio     0.399               0.390
8. Model 4 plus distances                                         0.416               0.389
Rural areas
1. Basic – existing census data                                   0.319               0.346
2. Basic plus all assets                                          0.449               0.442
3. Basic plus 19 most often significant assets                    0.429               0.423
4. Basic plus telephone, bike, hoe, wheelbarrow, iron, radio      0.359               0.372
5. Basic plus telephone, car, hoe, wheelbarrow, iron, radio       0.353               0.372
6. Basic plus telephone, car, lantern, wheelbarrow, iron, radio   0.356               0.385
7. Basic plus telephone, car, table, wheelbarrow, iron, radio     0.358               0.380
8. Model 4 plus distances                                         0.373               0.385
11 These were: a telephone, video, table, cupboards, electric iron, watches, wheelbarrow, books, fishing net,
harrow, hoe, livestock, bicycle, complete music system, radio, fridge, lantern and motor-cycle.

For this model, or a similar model derived using the full data set, to be applied to the census
data, it is essential that the information in the census be collected so that variables identical to
the HBS variables can be constructed in the census data. It should also be remembered that
the application of these models is computationally intensive.
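The requirement that census variables match the HBS variables can be sketched as follows. The coefficient values and variable names below are hypothetical and are not the estimates reported in this paper; the sketch only shows that a fitted model can be applied to a census record when every model variable can be constructed from that record.

```python
# Hypothetical coefficients for illustration only; NOT the estimates in Table 6.
COEFFS = {
    "constant": 30000.0,
    "household_size": -2000.0,
    "modern_roof": 6000.0,
    "owns_radio": 3000.0,
}

def predict_expenditure(record, coeffs=COEFFS):
    """Apply a linear expenditure model to one census record (a dict)."""
    needed = [name for name in coeffs if name != "constant"]
    missing = [name for name in needed if name not in record]
    if missing:
        # The model cannot be applied unless the census yields the same variables.
        raise KeyError(f"census record lacks model variables: {missing}")
    return coeffs["constant"] + sum(coeffs[name] * record[name] for name in needed)

# Two hypothetical census records using the same variable definitions as the model.
census_records = [
    {"household_size": 5, "modern_roof": 1, "owns_radio": 1},
    {"household_size": 8, "modern_roof": 0, "owns_radio": 0},
]
predictions = [predict_expenditure(r) for r in census_records]
print(predictions)
```

A record that lacks one of the model variables raises an error rather than silently producing a prediction, which mirrors the point that the census questions need to be defined in the same way as in the HBS.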
The addition to model 4 of information on distance to clinics and water in the dry season
improved the explanatory power of the model marginally for the whole sample and by around
one percentage point in rural areas. It was felt that this information should be added only if it
provided a useful indicator of access to services in its own right.
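The size of this improvement can be checked directly from the figures in Table 5. The short calculation below uses the reported proportions of variance explained for model 4 and for model 8 (model 4 plus distances); only the dictionary layout and key names are introduced here.

```python
# Proportions of variance explained, as reported in Table 5.
r2 = {
    ("whole sample", "per adult equivalent"): {"model4": 0.408, "model8": 0.416},
    ("whole sample", "per household"):        {"model4": 0.383, "model8": 0.389},
    ("rural areas", "per adult equivalent"):  {"model4": 0.359, "model8": 0.373},
    ("rural areas", "per household"):         {"model4": 0.372, "model8": 0.385},
}

# Gain from adding the distance variables, in percentage points.
gains = {
    key: round((vals["model8"] - vals["model4"]) * 100, 1)
    for key, vals in r2.items()
}

for (sample, measure), gain in gains.items():
    print(f"{sample}, {measure}: +{gain} percentage points")
```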
Table 6 Determinants of expenditure using variables selected in model 4
Whole Population
Expenditure
Expenditure
per adult
per household
equivalent
Coeff.
Sig.
-2,087
**
6,127 **
62
**
-94
-7,161
**
-16,476 **
6,364
**
18,727 **
-1,567
1,550
-2,015
*
-6,529
*
2,578
26,828 **
2,381
23,565 **
-416
**
204
*
4
**
-8,128
**
-23,377 **
-7,199
**
-20,936 **
5,524
*
15,952
3,985
8,896
336
324
8,960
*
79,500 **
1,658
4,906
2,987
**
4,815
-5,376
*
-12,388
22,388 **
4,667
2,884
1,924
3,051
Rural Areas
Expenditure
Expenditure
per adult
per household
equivalent
Coeff.
Sig.
-2,016 **
3,886 **
55 **
-5,273
*
-13,619
*
6,327 **
17,290 **
-2,319
*
-5,598
1,847
-2,254
*
-6,719
*
4,837
47,938 **
24,057 **
-287
180
3
Household Size
Household Size - squared
Dependency Ratio
Modern Roof
Modern Wall Material
Modern Toilet
Modern Lighting
Head Employed
Head Self-employed
Age of Household Head
Age of Household Head - squared
Lives in Rural Area
Lives in Urban Area outside Dar
Level of Education: Form 1-4
6,762
19,473
Level of Education: Form 5 plus
3,246
Any Form of Radio
-325
2,600
Telephone
1,454
76,264
Bicycle
2,005
4,102
Hoes
1,730
659
Wheel-barrow
-6,553 **
-13,242
Head Works on Own Farm
21,239
Level of Education: Standard 5-8
2,844
Male Household Head
3,780
No. of bedrooms
2,170
Number of Employed Adults in the
1,203
Household
Electric/charcoal iron
3,330
*
11,295
3,780
*
12,431
Constant
38,134
**
-2,687
26,923 **
-13,022
R-squared
0.41
0.38
0.36
0.37
Observations
1213
1213
561
561
Where a coefficient is entered against a variable, p<0.2; * significant at 5% level; ** significant at 1% level
*
**
Section 6 Conclusions
The objective of this work was to inform the development of the census form to collect
additional information that would be useful for poverty monitoring – particularly for
estimating income poverty levels for small geographic areas. To do this, models of
household expenditure were developed using a part of the HBS 2000/01 data. These models
predict household expenditure using data that is available in both the census and the HBS.
The work was carried out in two stages. The first stage used HBS data that was already
available. It replicated the work carried out on the 1991/92 HBS data. It also extended it in a
number of ways, establishing that the model of the 2000/01 data could be improved by
disaggregating a number of variables. Amongst these, it was shown that the addition of
information on the ownership of a range of assets appreciably improved the explanatory
power of the models. Although many of the variables that can be used to predict household
expenditure were already in the census, the modelling showed that the addition of
information on assets would improve its utility for poverty mapping.
However, the modelling exercises did not provide an automatic means of deciding which or
how many assets should be added to the census form. There is no cut-off proportion of the
variance that ensures that a model is adequate. Furthermore, the assets selected by the
regression models vary substantially according to which dependent variable is used and the
stratum. It would be prohibitively expensive to collect information on all of these. It was
argued that the selection of the assets, while being informed by the modelling exercise,
should be based on wider considerations than simply the modelling results.
At this point, the Research and Analysis working group reviewed the work. The group
provided some criteria by which the assets could be selected. It was also decided to extract
some additional variables from the HBS data, so that the models could use more of the
information that would already be available in the census. Further models were then
developed using this data. A set of alternatives, using a maximum of six assets, was
developed. One particular set of six was recommended for addition to the census; this set
usually maximised the proportion of the variance explained and conformed with the
principles agreed with the working group.
Appendix 1 Modelling expenditure by stratum
Table A1 estimates expenditure per adult equivalent across the three strata. Caution must be
exercised when interpreting these results due to the small sample size. In particular the
estimation for Dar es Salaam, which reports an r-squared of 0.67, fails the regression F-test
due to the inclusion of too many variables.
Table A1 Determinants of expenditure per adult equivalent for (1) Dar es
Salaam, (2) other urban areas, and (3) rural areas
Household Size
Household Size – squared
Dependency Ratio
Modern Floor
Fishing net and other equip.
Modern Roof
Modern Wall Material
Motor vehicles
Hoes
Tractor
Head Self-employed
Fields/Land
Record/cassette player, tape recorder
Male Household Head
Age of Household Head
Age of Household Head - squared
No. of rooms
Distance to Water
Spraying machine
Plough, etc
Dish antenna / decoder
Level of Education: Form 1-4
Level of Education: Form 5 plus
Radio and radio cassette
Telephone
Cart
Wells
Motor cycle
Beds
Livestock
Cupboards, wardrobes, drawers, etc
Books (not school books)
Chairs
Computer
Cooking pots, cups, other kitchen
utensils
Hand milling machine
Lanterns, lamps, etc
Level of Education: Standard 5-8
Modern Foundation
Mosquito net
Other
Poultry
Refrigerator or freezer
Watches
Water heater
Bicycle
Distance to Health Clinic
Dar es Salaam
Coeff.
Sig.
-5,731
**
321
**
-13,227
4,749
15,868
**
13,078
-6,160
*
11,349
*
5,226
34,411
*
6,494
*
3,805
42,147
**
-3,564
-1,502
*
16
*
1,078
3,447
**
-12,654
*
14,225
**
4,956
5,711
6,061
*
6,777
**
6,831
*
16,696
-12,159
*
-22,173
**
-9,952
9,381
**
3,742
Other urban areas
Coeff.
Sig.
-3,947
**
130
**
4,972
*
6,946
*
-7,358
*
17,018
5,634
*
**
2,527
*
-3,374
**
-243
3
660
*
233
10,598
-4,121
**
*
*
7,237
9,063
*
-9,332
**
-7,256
*
-7,264
**
3,181
3,071
-3,767
21,309
3,636
-7,691
3,225
-2,013
-3,017
3,146
4,680
5,366
6,349
4,924
2,091
Rural areas
Coeff.
Sig.
-1,825
**
47
**
-7,359
**
2,925
-2,910
*
**
**
-4,592
**
*
**
*
**
**
-6,168
2,073
*
1,837
128
**
*
**
Donkeys
6,079
Electric/charcoal iron
3,846
Ownership of more than one house
2,917
Modern Toilet
1,629
Sewing machine
-5,424
Table
2,498
Video
14,372
Water pumping set
-3,058
Wheel-barrow
-4,328
Constant
58,803
**
27,917
**
21,869
R-squared
0.67
0.47
0.45
Observations
280
379
564
Where a coefficient is entered against a variable, p<0.2; * significant at 5% level; ** significant at 1% level
*
*
**
*
**
**
**
**
Appendix 2 Modelling the likelihood that households are poor
This model uses a binary dependent variable, which takes the value 1 if the household's
expenditure is below the poverty line and 0 if it lies above the line. The coefficients here
indicate the effect of the independent variables on the likelihood that households are poor. A
positive coefficient indicates that increasing values of the variable increase the likelihood of
poverty. A negative coefficient indicates that increasing values reduce the likelihood that
households are poor.
Table A2 reports the results of estimating the same specification as in the 1991/92 analysis.
The results are similar, as is the predictive power of the model.
Table A2 Logistic regression of Poverty using 1991/92 specification

                                  Coeff.    Sig.
Weighted Index of Assets          -0.02     **
Household Size                     0.29     **
Dependency Ratio                   1.07     *
Modern Floor                      -0.73
Distance to Health Clinic         -0.02
Modern Wall Material               0.46
Modern Roof                       -1.11     **
Distance to Water                 -0.10
Male Household Head               -0.48
Constant                          -0.92     *

Pseudo r-squared                   0.25
Predictive power                   75%
Observations                       1223
Where a coefficient is entered against a variable, p<0.2; * significant at 5% level; ** significant at 1% level
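To illustrate how the signs of the coefficients translate into a likelihood of poverty, the sketch below applies the standard logistic formula to the coefficients reported in Table A2. The household characteristics are hypothetical, and the measurement units of the explanatory variables (for example the scale of the asset index and the units of distance) are assumptions, so the resulting probabilities are illustrative only.

```python
import math

# Coefficients as reported in Table A2.
COEFFS = {
    "constant": -0.92,
    "asset_index": -0.02,
    "household_size": 0.29,
    "dependency_ratio": 1.07,
    "modern_floor": -0.73,
    "distance_to_clinic": -0.02,
    "modern_wall": 0.46,
    "modern_roof": -1.11,
    "distance_to_water": -0.10,
    "male_head": -0.48,
}

def poverty_probability(household):
    """Logistic transform of the linear index: p = 1 / (1 + exp(-z))."""
    z = COEFFS["constant"] + sum(
        COEFFS[name] * value for name, value in household.items()
    )
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical household; values and units are assumptions for illustration.
household = {
    "asset_index": 10, "household_size": 6, "dependency_ratio": 0.5,
    "modern_floor": 0, "distance_to_clinic": 5, "modern_wall": 0,
    "modern_roof": 0, "distance_to_water": 1, "male_head": 1,
}
p_base = poverty_probability(household)

# The same household with a modern roof: the negative coefficient lowers the probability.
p_roof = poverty_probability({**household, "modern_roof": 1})
print(f"without modern roof: {p_base:.2f}, with modern roof: {p_roof:.2f}")
```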
Table A3 estimates the likelihood of being poor using the expanded model specification.
With the addition of 11 assets the pseudo r-squared and the predictive power of the model are
improved.
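The paper does not state how the pseudo r-squared and the predictive power are calculated. The sketch below assumes, purely for illustration, that the former is McFadden's measure (one minus the ratio of the fitted to the intercept-only log-likelihood) and that the latter is the share of households classified correctly at a probability threshold of 0.5. The data are synthetic and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logit(X, y, iters=25):
    """Newton-Raphson for logistic regression; X must include a constant column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ beta)
        grad = X.T @ (y - p)                         # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def log_likelihood(X, y, beta):
    p = sigmoid(X @ beta)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# Synthetic data: a constant plus two made-up explanatory variables.
rng = np.random.default_rng(1)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([-0.5, 1.0, -1.0])              # hypothetical values
y = (rng.random(n) < sigmoid(X @ true_beta)).astype(float)

beta_full = fit_logit(X, y)
beta_null = fit_logit(X[:, :1], y)                   # intercept-only model

# Assumed definitions (not confirmed by the paper).
pseudo_r2 = 1.0 - log_likelihood(X, y, beta_full) / log_likelihood(X[:, :1], y, beta_null)
share_correct = float(np.mean((sigmoid(X @ beta_full) >= 0.5) == (y == 1.0)))
print(f"pseudo r-squared: {pseudo_r2:.2f}, correctly classified: {share_correct:.0%}")
```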
Table A3 Logistic Regression of Poverty using the revised specification

                                  Coeff.    Sig.
Household Size                     0.34
Tractor                            1.89
Dependency Ratio                   1.36
Electric/charcoal iron            -0.77
Bicycle                           -0.78
Modern Roof                       -1.51
Spraying machine                  -0.99
Motor cycle                       -5.00
Sofa set                          -1.76
Head Employed                     -1.68
Head Self-employed                -0.99
Head Works on Own Farm            -1.05
Watches                           -0.74
Books (not school books)           0.57
Age of Household Head              0.07
Age of Household Head – squared   -0.00
Lives in Rural Area               -1.44
Lives in Urban Area               -1.09
House(s)                          -0.74
Distance to Water                 -0.05
Distance to Health Clinic         -0.03
Wheel-barrow                       1.74
Bee hives                          1.76
Livestock                          0.94
Lanterns, lamps, etc              -0.77
Dish antenna / decoder            -1.11
Mosquito net                      -0.67
Refrigerator or freezer           -2.95
Sewing machine                     1.93
Constant                          -0.61

Pseudo r-squared                   0.37
Predictive power                   82%
Observations                       1209
Where a coefficient is entered against a variable, p<0.2; * significant at 5% level; ** significant at 1% level
**
*
*
*
*
**
**
**
*
*
**
**
**
**
*
**
**