Multivariate Analysis of Health Survey Data

Analyzing Health Equity Using
Household Survey Data
Lecture 10
Multivariate Analysis of Health
Survey Data
“Analyzing Health Equity Using Household Survey Data” Owen O’Donnell, Eddy van Doorslaer, Adam Wagstaff and
Magnus Lindelow, The World Bank, Washington DC, 2008, www.worldbank.org/analyzinghealthequity
Why multivariate analysis?
• Health sector inequalities are measured through the bivariate
relationship between a health variable and socioeconomic status (SES)
• To go beyond measurement of inequalities, multivariate
analysis is needed, e.g.
– Finer description of inequality through standardisation for
age, gender, etc.
– Explanation of inequality through decomposition of
covariance
– Identification of a causal relationship between the health variable and SES
Descriptive analysis
• Aim is to describe SES-related inequality in health
• How does health vary with SES, conditional on
other factors?
• OLS describes how the mean of health varies with
SES, conditional on controls
• Modelling issues (omitted variable bias, endogeneity) are
irrelevant for a purely descriptive purpose
• But a causal interpretation cannot be placed on the
estimates
Causal analysis
• Causal inference requires a modelling approach
• Appropriate model and estimator depend on the
degree of detail required
• To identify the total causal effect and not its
mechanisms, a reduced form is adequate, e.g. for
decomposition
• To separately identify direct and indirect effects,
a structural model is needed
Household production model
• Health is “produced” from inputs
• Inputs are selected conditional on (unobservable)
health endowments
• So inputs are endogenous
• Reduced-form (RF) demand relations capture the combined technological
impact and behavioural response
• To isolate the technological impact, must confront
endogeneity of inputs:
– Instrumental variables
– Panel data
Sample design and area effects
• Health data come from complex surveys
• Stratified sampling – separate sampling from
population sub-groups (strata)
• Cluster sampling – clusters of observations not
sampled independently
• Oversampling – e.g. of the poor or the insured
• Area effects – feature of population but
importance depends on sample design
Standard stratified sampling
• Population categorised by relatively few strata, e.g.
urban/rural, regions
• Separate random sample of pre-defined size
selected from each stratum
• Sample strata proportions need not correspond to
population proportions, hence sample weights
(separate issue)
• If population means differ by strata, standard errors of
means and other descriptive statistics should be
adjusted down
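The downward adjustment can be illustrated with the variance of a stratified sample mean. This is a sketch under assumptions that are not stated on the slide: simple random sampling within each stratum, proportional allocation ($n_s = n W_s$), and no finite population correction. The symbols $W_s$ (population share of stratum $s$), $\mu_s$ and $\sigma_s^2$ (stratum mean and variance), and $n_s$ (stratum sample size) are introduced here for illustration only.

```latex
\begin{align*}
\sigma^2 &= \sum_{s=1}^{S} W_s \sigma_s^2 + \sum_{s=1}^{S} W_s (\mu_s - \mu)^2
  && \text{(within-strata plus between-strata variance)} \\
\operatorname{Var}(\bar{y}_{\text{strat}}) &= \sum_{s=1}^{S} W_s^2 \frac{\sigma_s^2}{n_s}
  = \frac{1}{n} \sum_{s=1}^{S} W_s \sigma_s^2
  && \text{(using } n_s = n W_s \text{)} \\
\operatorname{Var}(\bar{y}_{\text{srs}}) - \operatorname{Var}(\bar{y}_{\text{strat}})
  &= \frac{\sigma^2}{n} - \frac{1}{n} \sum_{s=1}^{S} W_s \sigma_s^2
  = \frac{1}{n} \sum_{s=1}^{S} W_s (\mu_s - \mu)^2 \;\ge\; 0
\end{align*}
```

Under these assumptions the simple random sampling formula overstates the variance by exactly the between-strata component, which is zero only when all stratum means coincide.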
Stratification and modelling
• Exogenous stratification – OLS is consistent,
efficient and SEs valid
• Endogenous stratification – adjust SEs
• Relative to simple SEs, adjustment can be
important
• Relative to corrections for heteroskedasticity and clustering,
adjustment is usually modest
• May want intercept/slope differences by strata
Example of adjustment to OLS standard errors
Table 1: OLS regression of height-for-age z-scores (*-100), Vietnam 1998 (children < 10 years)

                                                     Standard errors
Variable                                Coefficient  Unadjusted  Stratification  Hetero.  Cluster   Strat. &
                                                                 adjusted        robust   adjusted  cluster adj.
Child's age (months)                    3.70***      0.1986      0.2466          0.2470   0.2885    0.2872
Child's age squared (/100)              -2.38***     0.1554      0.1755          0.1758   0.1966    0.1957
Child is male                           12.31***     3.2927      3.2708          3.2792   3.3649    3.2844
(log) Hhold. consumption per capita     -37.85***    3.9843      4.1046          4.1116   5.4035    5.4582
Safe drinking water                     -7.43        4.9533      4.8300          4.8441   9.1538    9.2098
Satisfactory sanitation                 -15.53***    5.1009      4.8199          4.8326   6.1202    6.0937
Years of schooling of household head    -0.87*       0.4804      0.4770          0.4786   0.7302    0.7188
Mother has primary school diploma       -2.33        4.0598      4.1309          4.1397   6.1913    6.2438
Sample size                             5218

Notes: Dependent variable is negative of z-score, multiplied by 100.
***, ** & * indicate 1%, 5% & 10% significance according to unadjusted standard errors.
Bold indicates a change in significance level relative to that using unadjusted standard errors.
Regression also contains region dummies at the level of stratification.
Cluster sampling
• 2-stage (or more) sampling process
1. Clusters sampled from population/strata
2. Households sampled from clusters
• Observations are not independent within clusters
and likely correlated through unobservables
• Consequences and remedies depend on the
nature of the within-cluster correlation
Exogenous cluster effects
$y_{ic} = X_{ic}\beta + \lambda_c + \varepsilon_{ic}, \quad E[\varepsilon_{ic} \mid X_{ic}, \lambda_c] = E[\varepsilon_{ic}] = 0$   (1)
If $E[\lambda_c \mid X_{ic}] = E[\lambda_c]$, we have the random effects model.
Conventional estimators, e.g. OLS, probit, etc., are consistent
but inefficient, and SEs need adjustment.
Can accept inefficiency and adjust SEs. In Stata, use option
cluster(varname)
For efficiency, must estimate and take account of within-cluster correlation, e.g. GLS, random effects probit.
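Why the SEs need adjustment can be seen from the covariance structure of the composite error. The following sketch writes the cluster effect as $\lambda_c$ and adds assumptions that the slide does not state: $\lambda_c$ and $\varepsilon_{ic}$ are mutually uncorrelated, $\lambda_c$ is uncorrelated across clusters, and $\varepsilon_{ic}$ is uncorrelated across observations, with constant variances $\sigma_\lambda^2$ and $\sigma_\varepsilon^2$ (symbols introduced here for illustration).

```latex
\begin{align*}
u_{ic} &= \lambda_c + \varepsilon_{ic}
  && \text{(composite error)} \\
\operatorname{Var}(u_{ic}) &= \sigma_\lambda^2 + \sigma_\varepsilon^2 \\
\operatorname{Cov}(u_{ic}, u_{jc}) &= \sigma_\lambda^2
  && (i \neq j,\ \text{same cluster}) \\
\operatorname{Cov}(u_{ic}, u_{jd}) &= 0
  && (c \neq d) \\
\rho &= \frac{\sigma_\lambda^2}{\sigma_\lambda^2 + \sigma_\varepsilon^2}
  && \text{(within-cluster correlation)}
\end{align*}
```

Conventional SE formulas assume $\rho = 0$. With $\rho > 0$ the error covariance matrix is block diagonal by cluster, which is the structure that cluster-adjusted SEs allow for and that GLS exploits for efficiency.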
Endogenous cluster effects
(1) with $E[\lambda_c \mid X_{ic}] \neq E[\lambda_c]$
is the fixed effects model
Regressors are correlated with the composite error, so conventional
estimators are inconsistent.
Need to purge cluster effects from composite error.
In linear model – cluster dummies, differences from cluster
means or first differences.
Binary choice – fixed effects logit.
Having purged cluster effects, there is no need to correct SEs
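The "differences from cluster means" option can be written out as follows. This is a sketch for the linear model only, writing the cluster effect as $\lambda_c$ and using $\bar{y}_c$, $\bar{X}_c$ and $\bar{\varepsilon}_c$ for within-cluster sample means (notation introduced here for illustration).

```latex
\begin{align*}
y_{ic} &= X_{ic}\beta + \lambda_c + \varepsilon_{ic} \\
\bar{y}_c &= \bar{X}_c\beta + \lambda_c + \bar{\varepsilon}_c
  && \text{(average over observations in cluster } c\text{)} \\
y_{ic} - \bar{y}_c &= (X_{ic} - \bar{X}_c)\beta + (\varepsilon_{ic} - \bar{\varepsilon}_c)
  && \text{(cluster effect differenced out)}
\end{align*}
```

OLS on the demeaned data no longer involves $\lambda_c$, so correlation between the cluster effect and the regressors does not cause inconsistency. The same subtraction removes any regressor that is constant within a cluster, so its coefficient cannot be estimated this way.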
Comparison of estimators for a cluster sample
Table 2: Regressions of height-for-age z-scores (*-100), Vietnam 1998 (children < 10 years)

                                        OLS                       Random Effects            Fixed Effects
Variable                                Coeff.      Cluster       Coeff.      Robust SE     Coeff.      Robust SE
                                                    adjusted SE
Child's age (months)                    3.72***     0.2917        3.74***     0.2451        3.78***     0.2430
Child's age squared (/100)              -2.40***    0.1987        -2.40***    0.1742        -2.44***    0.1732
Child is male                           12.26***    3.4527        12.19***    3.2394        12.97***    3.2443
(log) Hhold. consumption per capita     -50.93***   5.1149        -43.17***   4.0778        -30.37***   4.6090
Safe drinking water                     -12.55      8.6438        -7.93       4.8984        -2.75       5.4247
Satisfactory sanitation                 -22.90***   5.6974        -19.39***   4.8446        -9.77**     4.9364
Years of schooling of household head    -0.39       0.6628        -0.33       0.4828        -0.55       0.5081
Mother has primary school diploma       2.67        5.3187        1.71        4.1140        1.74        4.3186
Intercept                               445.00***   44.5600       377.01***   32.1941       276.19***   35.0991

Sample size: 5218
R-squared: 0.1527
B-P LM: 485.84 (0.000)
Hausman: 50.54 (0.0000)

Notes: Dependent variable is negative of z-score, multiplied by 100.
***, ** & * indicate significance at 1%, 5% & 10% respectively.
SE - standard error, Robust SE - robust to general heteroskedasticity.
B-P LM - Breusch-Pagan Lagrange Multiplier test of significance of commune effects (p-value).
Hausman - Hausman test of random versus fixed effects (p-value).
Stata computation
OLS with cluster corrected SEs
regr depvar varlist, cluster(commune)
OLS with cluster and stratification corrected SEs
svyset commune, strata(region)
svy: reg depvar varlist
Random effects (FGLS)
xtreg depvar varlist, re i(commune)
Fixed effects
xtreg depvar varlist, fe i(commune)
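The Breusch-Pagan LM and Hausman tests reported with the comparison of estimators can be obtained along the following lines. This is a minimal sketch: the slides do not show the commands used for the tests, and the variable names (haz_neg, age, agesq, male, lncons, commune) are hypothetical placeholders. Both models are fitted without robust SEs here because the standard Hausman test relies on the conventional variance estimates.

```stata
* Hypothetical variable names; replace with those in your data
* Declare commune as the grouping (cluster) variable
xtset commune

* Random effects, then Breusch-Pagan LM test of no commune effects
xtreg haz_neg age agesq male lncons, re
xttest0
estimates store re

* Fixed effects
xtreg haz_neg age agesq male lncons, fe
estimates store fe

* Hausman test: consistent estimator (fe) first, efficient estimator (re) second
hausman fe re
```

A small p-value from xttest0 points to significant commune effects, and a small p-value from hausman favours fixed effects over random effects.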
But community effects can be interesting
Exogenous community effects
Define $\lambda_c = Z_c\gamma + \lambda_c^*$; the model becomes
$y_{ic} = X_{ic}\beta + Z_c\gamma + \lambda_c^* + \varepsilon_{ic}, \quad E[\varepsilon_{ic} \mid X_{ic}, Z_c, \lambda_c^*] = E[\varepsilon_{ic}] = 0$   (2)
Condition for consistency: $E[\lambda_c^* \mid X_{ic}, Z_c] = E[\lambda_c^*]$
SEs need to be adjusted for within-cluster correlation.
Efficiency loss from OLS may not be large.
This random effects model (REM) is also known as the hierarchical model.
Endogenous community effects
• With a single cross-section, it is not possible to
include community level regressors in the fixed effects model
• With panel data, this can be done
• In cross-section:
– Run fixed effects and obtain estimates of the
community level effects
– Regress these effects on community level
regressors
Example explanation of community effects
Table 3: Analysis of commune level variation in height-for-age z-scores (*-100),
Rural Vietnam 1998 (children < 10 years)

                                     OLS                       Random Effects            2nd-stage Fixed Effects
Commune health centre variables      Coeff.      Cluster       Coeff.      Robust SE     Coeff.      SE
                                                 adj. SE
Vitamin A available >= 1/2 time      -10.11      6.6530        -6.86143    6.5927        -8.27114    6.7506
Has electricity                      -38.79***   11.4558       -50.56***   12.1861       -45.34***   10.7991
Has clean water source               9.57        7.6534        7.2341      8.4061        7.0070      8.7610
Has sanitary toilet                  -27.53***   7.0928        -24.50***   7.6694        -24.30***   7.8715
Has child growth chart               -13.85*     7.2046        -10.2623    7.5879        -11.732     7.6292
Number of inpatient beds             1.52*       0.8298        2.12**      0.9242        2.09**      0.9744
Has a doctor                         11.39       6.9765        9.6255      7.1834        10.1856     7.5207
Intercept                            371.89***   48.8784       344.71***   41.5639       279.13***   41.6264

Sample size: 4099
R-squared: 0.1313
B-P LM: 248.42 (0.0000)

Notes: Dependent variable is negative of z-score, multiplied by 100.
OLS & Random Effects - Coefficients on commune level regressors only are presented.
2nd-stage Fixed Effects - Estimated commune effects from fixed effects regressed on commune variables.
***, ** & * indicate significance at 1%, 5% & 10% respectively.
SE - standard error, Robust SE - robust to general heteroskedasticity.
B-P LM - Breusch-Pagan Lagrange Multiplier test of significance of community effects (p-value).
Stata computation for 2-step
procedure
Run fixed effects and save predictions of the fixed effects
xtreg depvar varlist, fe i(commune)
predict ce, u
Use the between-groups panel estimator to regress these
predicted effects on community level regressors
xtreg ce varlist2, be i(commune)
Sample weights
• Stratification, over-sampling and non-response can
all lead to a sample that is not representative of the
population
• Sample weights are the inverse of the probability
that an observation is a sample member
• Sample weights must be applied to get unbiased
estimates of population means, etc. and correct
SEs
• Should also be applied in “descriptive regressions”
Should weights be applied to
estimate a model?
• If selection is on exogenous factors, unweighted
estimates are consistent and more efficient than
weighted
– Simple (robust) SEs are OK
• Otherwise, weighting required for consistency
– If stratification and weights, take account of both in
computation of SEs
– If no stratification, apply conventional SE formula to
weighted data.
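The two weighted cases above might be set up in Stata roughly as follows. This is a sketch under assumptions: wt is a hypothetical sampling weight variable (inverse of the selection probability), and the other variable names are placeholders. Note that Stata reports robust SEs whenever pweights are used, which is not the same as applying the conventional SE formula to weighted data.

```stata
* Hypothetical variable names; wt is assumed to hold the sampling weight

* Stratification and weights: declare the full design once,
* then let svy account for weights, strata and clustering in the SEs
svyset commune [pweight = wt], strata(region)
svy: regress haz_neg age agesq male lncons

* No stratification: weighted OLS
* (with pweights Stata uses robust SEs automatically)
regress haz_neg age agesq male lncons [pweight = wt]
```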
What if there is parameter
heterogeneity in population?
$y_{is} = X_{is}\beta_s + \varepsilon_{is}$
Say we are interested in an average, such as
$\bar{\beta} = \frac{1}{N}\sum_{s=1}^{S} N_s \beta_s$
Consistent estimate is the population weighted average of
the sector specific OLS estimates $\hat{\beta}_s$
Unweighted OLS on the whole sample is not consistent for
the average parameter.
But neither is weighted OLS on the whole sample.
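The reason neither pooled estimator recovers the average parameter can be sketched as follows. The notation is introduced here for illustration and is not from the slides: $\pi_s$ is the limiting sample share of sector $s$, $M_s = E[X_{is}' X_{is}]$ is the second-moment matrix of the regressors in sector $s$, weights are assumed constant within a sector, and $E[X_{is}'\varepsilon_{is}] = 0$ is assumed in every sector.

```latex
\begin{align*}
\hat{\beta}_{\text{OLS}} &\xrightarrow{p}
  \Big( \sum_{s=1}^{S} \pi_s M_s \Big)^{-1} \sum_{s=1}^{S} \pi_s M_s \beta_s
  && \text{(unweighted, sample shares)} \\
\hat{\beta}_{\text{WLS}} &\xrightarrow{p}
  \Big( \sum_{s=1}^{S} \tfrac{N_s}{N} M_s \Big)^{-1} \sum_{s=1}^{S} \tfrac{N_s}{N} M_s \beta_s
  && \text{(weighted, population shares)} \\
\bar{\beta} &= \sum_{s=1}^{S} \tfrac{N_s}{N} \beta_s
  && \text{(target average)}
\end{align*}
```

Both pooled limits are matrix-weighted averages of the $\beta_s$ in which each sector is weighted by its regressor moments $M_s$ as well as its share. They coincide with the population-weighted average only in special cases, such as identical $M_s$ across sectors or no parameter heterogeneity.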
Example application of sample weights
Table 4: Weighted regressions of height-for-age z-scores (*-100), Vietnam 1998 (children < 10 years)

                                        OLS                       Random Effects            Fixed Effects
Variable                                Coeff.      Adjusted SE   Coeff.      Robust SE     Coeff.      Robust SE
Child's age (months)                    3.90***     0.3218        3.90***     0.2652        3.91***     0.2642
Child's age squared (/100)              -2.51***    0.2206        -2.50***    0.1875        -2.51***    0.1875
Child is male                           14.86***    3.5718        14.56***    3.3595        14.89***    3.3731
(log) Hhold. consumption per capita     -50.14***   5.5131        -40.67***   4.3511        -26.05***   5.0196
Safe drinking water                     -12.16      10.2770       -6.92       5.1624        -2.07       5.6079
Satisfactory sanitation                 -22.01***   5.9503        -19.81***   5.3653        -10.48*     5.4439
Years of schooling of household head    -0.21       0.7355        -0.15       0.5122        -0.42       0.5363
Mother has primary school diploma       3.62        5.6510        3.04        4.2925        2.19        4.4958
Intercept                               428.15***   48.9827       347.47***   34.9686       236.12***   38.5646
R-squared                               0.1496                    0.4320                    0.2457

Sample size: 5218

Notes: Dependent variable is negative of z-score, multiplied by 100.
***, ** & * indicate significance at 1%, 5% & 10% respectively.
Adjusted SE - standard error adjusted for clustering and stratification and robust to heteroskedasticity.
Robust SE - standard error robust to general heteroskedasticity.