International Biometric Society

FLEXIBLE BINARY RESPONSE MODELS TO CONTROL FOR NON-RANDOM SAMPLE SELECTION

Rosalba Radice
Birkbeck, University of London, Department of Economics, Mathematics and Statistics

HIV prevalence estimates from population-based surveys are vulnerable to selection bias if HIV status is missing for a proportion of the eligible population. Standard approaches for correcting prevalence estimates for selective nonparticipation, such as imputation, assume that data are missing at random, that is, that there are no unobserved confounders associated with both HIV status and consent to test. This assumption is unlikely to hold, since a person's belief about his or her HIV status may be related to the actual status and hence influence the likelihood of consenting to a test through unobservables. For instance, individuals who already know or suspect that they are HIV positive may be less likely to consent (Floyd et al., 2013). If this is the case, population prevalence estimates that rely on an incorrect missing-at-random assumption will suffer from selection bias.

One potential solution to this problem is the adoption of sample selection models, which can provide consistent estimates of the parameter of interest even when missing data are systematically related to some unobserved characteristic of the individual (Heckman, 1979). A drawback of these models is that they typically rely on strong parametric assumptions, such as bivariate normality of the error terms, symmetric link functions and pre-specified forms of covariate effects.

We introduce a novel approach for relaxing joint normality, symmetric links and parametric covariate effects in selection models using copulas, skew distributions and regression splines. Copulas provide a convenient framework for relaxing the bivariate normality assumption: they build a joint distribution by binding marginal distributions together through a dependence function. A substantial advantage of the copula approach is that the marginals may come from different families.
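As a rough illustration of how a copula can link the selection and outcome equations, the sketch below uses one common formulation in which the joint probability of consenting and testing positive is the copula evaluated at the two marginal probabilities. The abstract does not state the likelihood, so this form is an assumption; the function names, the probit margins, the Clayton copula and the linear predictor values are likewise illustrative choices, and the skew links and regression splines of the proposed approach are not implemented here.

```python
import math


def norm_cdf(x):
    """Standard normal CDF, used here as a (symmetric) probit link."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def independence_copula(u, v, theta=None):
    """C(u, v) = u * v: no dependence between selection and outcome."""
    return u * v


def clayton_copula(u, v, theta):
    """Clayton copula, valid for theta > 0 (positive dependence)."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def cell_probabilities(eta_sel, eta_out, copula, theta):
    """Probabilities of the three observable cells for one individual.

    eta_sel, eta_out: hypothetical linear predictors of the selection
    (consent) and outcome (HIV status) equations.
    """
    p_sel = norm_cdf(eta_sel)          # P(selected)
    p_out = norm_cdf(eta_out)          # P(outcome = 1), marginally
    p11 = copula(p_sel, p_out, theta)  # P(selected, outcome = 1), assumed form
    return {
        "not_selected": 1.0 - p_sel,
        "selected_positive": p11,
        "selected_negative": p_sel - p11,
    }


def log_likelihood(records, copula, theta):
    """Sum of log cell probabilities.

    Each record is (eta_sel, eta_out, selected, outcome); the outcome is
    ignored when selected == 0 because it is unobserved.
    """
    total = 0.0
    for eta_sel, eta_out, selected, outcome in records:
        p = cell_probabilities(eta_sel, eta_out, copula, theta)
        if not selected:
            total += math.log(p["not_selected"])
        elif outcome == 1:
            total += math.log(p["selected_positive"])
        else:
            total += math.log(p["selected_negative"])
    return total
```

Swapping `clayton_copula` for another copula function, or `norm_cdf` for another link, changes the dependence structure or a margin without touching the rest of the likelihood.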
This construction allows researchers to consider marginal distributions and the dependence between them as two separate but related issues. To account for asymmetric links, skew distributions, such as the power-normal and reciprocal power-normal, are considered. Finally, regression splines allow for flexibility in modelling covariate effects.

We apply this method to estimate HIV prevalence using the 2007 Zambian Demographic and Health Survey. Existing results indicating the presence of selection bias in the estimation of HIV prevalence for men and women in Zambia are not robust to relaxing the assumptions of joint normality, symmetric links and pre-specified forms of covariate effects. As misspecification results in inconsistent estimates, future research involving selection models to account for missing data should routinely conduct sensitivity analyses for alternative functional forms using this approach.

References

Floyd, S., Molesworth, A., Dube, A., Crampin, A. C., Houben, R., Chihana, M., Price, A., Kayuni, N., Saul, J., French, N., Glynn, J. (2013). Underestimation of HIV prevalence in surveys when some people already know their status, and ways to reduce the bias. AIDS, 27, 233–242.

Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica, 47, 153–162.

International Biometric Conference, Florence, ITALY, 6 – 11 July 2014