International Biometric Society
FLEXIBLE BINARY RESPONSE MODELS TO CONTROL FOR NON-RANDOM SAMPLE
SELECTION
Rosalba Radice
Birkbeck, University of London, Department of Economics, Mathematics and Statistics
HIV prevalence estimates from population-based surveys are vulnerable to selection bias if
HIV status is missing for a proportion of the eligible population. Standard approaches, such
as imputation, to correct prevalence estimates for selective nonparticipation assume that
data are missing at random, that is, there are no unobserved confounders associated with
both HIV status and consent to test. This assumption is unlikely to hold since a person’s
belief about his or her HIV status may be related to the actual status and hence influence
the likelihood of consenting to a test through unobservables. For instance, individuals who
already know or suspect that they are HIV positive may be less likely to consent (Floyd et
al., 2013). If this is the case then there will be a selection bias in population prevalence
estimates based on an incorrect assumption of missing at random.
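The mechanism described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the analysis in this work: the latent-index thresholds (-1.0 and 0.5), the error correlation rho and the function name simulate_prevalence are all hypothetical choices made for illustration. HIV status and consent are each driven by a normal error term, and a negative correlation between the two errors stands in for "people who suspect they are positive are less likely to consent".

```python
import math
import random


def simulate_prevalence(rho, n=50_000, seed=0):
    """Return (true prevalence, prevalence among consenters).

    rho is the assumed correlation between the unobserved errors driving
    HIV status and consent to test (hypothetical illustration only).
    """
    rng = random.Random(seed)
    positives = consenters = positive_consenters = 0
    for _ in range(n):
        e_status = rng.gauss(0.0, 1.0)
        # The consent error shares a component with the status error.
        e_consent = rho * e_status + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        hiv_positive = (-1.0 + e_status) > 0   # latent index for HIV status
        consents = (0.5 + e_consent) > 0       # latent index for consent
        positives += hiv_positive
        if consents:
            consenters += 1
            positive_consenters += hiv_positive
    return positives / n, positive_consenters / consenters


true_prev, observed_prev = simulate_prevalence(rho=-0.6)
print(f"true prevalence: {true_prev:.3f}, among consenters: {observed_prev:.3f}")
```

With rho set to 0 the two quantities agree up to sampling noise, which corresponds to the missing-at-random case; with a negative rho the prevalence among consenters understates the true prevalence.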
One potential solution to this problem is the adoption of sample selection models which can
provide consistent estimates of the parameter of interest, even when missing data are
systematically related to some unobserved characteristic of the individual (Heckman, 1979).
A drawback of these models is that they typically rely on strong parametric assumptions,
such as bivariate normality of the error terms, symmetric link functions and a pre-specified
form of covariate effects.
We introduce a novel approach for relaxing joint normality, symmetric links and parametric
covariate effects in selection models using copulas, skew distributions and regression
splines. Copulas provide a convenient framework for relaxing the bivariate normality assumption
by combining marginal distributions into a joint distribution. A substantial advantage of the copula approach is that the
marginals may come from different families. This construction allows researchers to consider
marginal distributions and the dependence between them as two separate but related
issues. To account for asymmetric links, skew distributions, such as power-normal and
reciprocal power-normal, are considered. Finally, regression splines allow for flexibility in
modelling covariate effects.
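The following minimal sketch shows how a copula and skewed links might be combined to obtain the three cell probabilities of a binary selection model (not tested, tested negative, tested positive). It is an illustration under assumptions, not the implementation used in this work: the Clayton copula is just one possible family (it captures positive dependence only), the power-normal and reciprocal power-normal links are written in one common parameterisation, and the function names, the linear predictors eta_sel and eta_out, and the parameters theta, lam_sel and lam_out are hypothetical. The regression-spline part of the model is omitted; the linear predictors are taken as given numbers.

```python
import math


def norm_cdf(x):
    """Standard normal CDF, i.e. the probit link."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def power_normal_cdf(x, lam):
    """Assumed power-normal link: Phi(x) ** lam (lam = 1 gives probit)."""
    return norm_cdf(x) ** lam


def reciprocal_power_normal_cdf(x, lam):
    """Assumed reciprocal power-normal link: 1 - Phi(-x) ** lam."""
    return 1.0 - norm_cdf(-x) ** lam


def clayton_copula(u, v, theta):
    """Clayton copula C(u, v; theta) for theta > 0 and u, v in (0, 1]."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def joint_probabilities(eta_sel, eta_out, theta, lam_sel=1.0, lam_out=1.0):
    """Cell probabilities for (selection, outcome) built from two margins.

    The margins are specified separately from the dependence parameter
    theta, and each margin may use a different link.
    """
    p_sel = power_normal_cdf(eta_sel, lam_sel)             # P(tested)
    p_out = reciprocal_power_normal_cdf(eta_out, lam_out)  # P(HIV positive)
    p11 = clayton_copula(p_sel, p_out, theta)              # P(tested, positive)
    return {
        "not_tested": 1.0 - p_sel,
        "tested_negative": p_sel - p11,
        "tested_positive": p11,
    }


cells = joint_probabilities(eta_sel=0.5, eta_out=-1.0, theta=2.0, lam_sel=1.5, lam_out=0.8)
print(cells)
```

Changing the copula function or either link leaves the rest of the construction untouched, which is the sense in which the margins and the dependence between them can be treated as separate but related modelling choices.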
We apply this method to estimating HIV prevalence in the 2007 Zambian Demographic and
Health Survey. Existing results indicating the presence of selection bias in the estimation of
HIV prevalence for men and women in Zambia are not robust to relaxing the assumption of
joint normality, symmetric links and a pre-specified form of covariate effects. As
misspecification results in inconsistent estimates, future research involving selection models
to account for missing data should routinely conduct sensitivity analyses for alternative
functional forms using this approach.
References
Floyd, S., Molesworth, A., Dube, A., Crampin, A. C., Houben, R., Chihana, M., Price, A.,
Kayuni, N., Saul, J., French, N., Glynn, J. (2013). Underestimation of HIV prevalence in
surveys when some people already know their status, and ways to reduce the bias. AIDS,
27, 233–242.
Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica, 47,
153–162.
International Biometric Conference, Florence, ITALY, 6 – 11 July 2014