ESTIMATING THE DOSE-RESPONSE FUNCTION THROUGH THE GLM APPROACH
Barbara Guardabascio, Marco Ventura
Italian National Institute of Statistics
7th June 2013, Potsdam

Outline of the talk
Motivations; literature references; our contribution to the topic; the econometrics of the dose-response; how to implement the dose-response; our programs; applications.

Motivations
Main question: how effective are public policy programs with continuous treatment exposure?
Fundamental problem: treated individuals are self-selected, not randomly chosen. Treatment is not randomly assigned.
(Possible) solution: estimating a dose-response function.

What is a dose-response function?
It is a relationship between a treatment and an outcome variable, e.g. birth weight, employment, bank debt, etc.
[Figure: two panels plotted against the treatment level (0 to 10). Left: "Dose Response Function", y-axis E[year6(t)]. Right: "Treatment Effect Function", y-axis E[year6(t+1)]-E[year6(t)]. Each panel shows the estimate with lower and upper 95% confidence bounds. Dose response function = linear prediction.]

How can we estimate a dose-response function?
It can be estimated by using the Generalized Propensity Score (GPS).

Literature references
1. Propensity score for binary treatments: Rosenbaum and Rubin (1983), (1984)
2. Propensity score for categorical treatment variables: Imbens (2000), Lechner (2001)
3. Generalized Propensity Score for continuous treatments: Hirano and Imbens (2004); Imai and Van Dyk (2004)

Our contribution
Ad hoc programs have been provided to Stata users (Bia and Mattei, 2008), but these programs (gpscore.ado and doseresponse.ado) contemplate only a Normal distribution of the treatment variable.
We provide new programs to accommodate other, non-Normal distributions
(gpscore2.ado and doseresponse2.ado)

The econometrics of the dose-response
$\{Y_i(t)\}_{t \in \mathcal{T}}$: set of potential outcomes, for $t \in \mathcal{T}$, where $\mathcal{T} = [t_0, t_1]$ is the set of potential treatments.

Let us suppose we have N individuals, i = 1, …, N:
$X_i$: vector of pre-treatment covariates;
$T_i$: level of treatment delivered;
$Y_i(T_i)$: outcome corresponding to the treatment $T_i$.

We want the average dose-response function
$\mu(t) = E[Y_i(t)]$
Hirano and Imbens define the GPS as the conditional density of the actual treatment given the covariates:
$r(t, x) = f_{T|X}(t \mid x)$, $R = r(T, X)$

Balancing property:
$X \perp 1\{T = t\} \mid r(t, x)$
Within strata with the same r(t, x), the probability that T = t does not depend on X.

If weak unconfoundedness holds, we have
$Y(t) \perp T \mid X \quad \forall t \in \mathcal{T}$
This means that the GPS can be used to eliminate any bias associated with differences in the covariates and …

The dose-response function can be computed as:
$\beta(t, r) = E[Y(t) \mid r(t, X) = r] = E[Y \mid T = t, R = r]$
$\mu(t) = E[\beta(t, r(t, X))]$

How to implement the GPS
The dose-response can be implemented in 3 steps.

FIRST STEP:
1. Regress $T_i$ on $X_i$ and take the conditional distribution of the treatment given the covariates, $T_i \mid X_i$:
$f(T_i) \mid X_i \sim D(\beta' X_i, \sigma^2)$
where
f(.) is a suitable transformation of T (link);
D is a distribution of the exponential family;
β are the parameters to be estimated;
σ is the conditional SE of T|X.
The estimated GPS is
$\hat{R}_i = D(T_i, \hat{\beta}' X_i, \hat{\sigma}^2)$
1a. Test the balancing property.

SECOND STEP:
Model the conditional expectation $E[Y_i \mid T_i, R_i]$ as a function of $T_i$ and $R_i$:
$\beta(t, r) = E[Y_i \mid T_i, R_i] = \alpha_0 + \alpha_1 T_i + \alpha_2 T_i^2 + \alpha_3 R_i + \alpha_4 R_i^2 + \alpha_5 T_i R_i$

THIRD STEP:
Estimate the dose-response function by averaging the estimated conditional expectation over the GPS at each level of the treatment we are interested in:
$\hat{\mu}(t) = \frac{1}{N} \sum_{i=1}^{N} \hat{\beta}(t, \hat{r}(t, X_i))$

Where is the novelty? In the FIRST STEP.
Instead of ML we use a GLM: an exponential-family distribution (family) combined with a link function.

our programs
Link \ Distr   Normal   Inv. Normal   Binomial   Poisson   Neg. Binomial   Gamma
Identity       X        X             X          X         X               X
Log            X        X             X          X         X               X
Logit                                 X
Probit                                X
Cloglog                               X
Power          X        X             X          X         X               X
Opower                                X
Nbin                                                       X
Loglog                                X
Logc                                  X

We have written two programs:
doseresponse2.ado: estimates the dose-response function and graphs the result. It carries out steps 1, 2 and 3 of the previous slides by running two other programs:
gpscore2.ado: evaluates the GPS under 6 different distributional assumptions (step 1 of the previous slides);
doseresponse_model.ado: carries out step 2 of the previous slides.

doseresponse2 varlist, outcome(varname) t(varname) family(string) link(string) gpscore(newvarname) predict(newvarname) sigma(newvarname) cutpoints(varname) nq_gps(#) index(string) dose_response(newvarlist)
Options: t_transf(transformation) normal_test(test) normal_level(#) test_varlist(varlist) test(type) flag(#) cmd(regression_cmd) reg_type_t(string) reg_type_gps(string) interaction(#) t_points(vector) npoints(#) delta(#) bootstrap(string) filename(filename) boot_reps(#) analysis(string) analysis_leve(#) graph(filename) flag_b(#) opt_nb(string) opt_b(varname) detail

gpscore2 varlist, t(varname) family(string) link(string) gpscore(newvarname) predict(newvarname) sigma(newvarname) cutpoints(varname) index(string) nq_gps(#)
Options: t_transf(transformation) normal_test(test) normal_level(#) test_varlist(varlist) test(type) flag_b(#) opt_nb(string) opt_b(varname) detail

Application
Data set by Imbens, Rubin and Sacerdote (2001): the winners of a lottery in Massachusetts.
$T_i$: amount of the prize (treatment)
$Y_i$: earnings 6 years after winning (outcome)
$X_i$: age, gender, education, # of tickets bought, working status, earnings in up to 6 years before winning

Application: flogit
Fractional data: flogit model.
Treatment: prize/max(prize)
Outcome: earnings after 6 years
family(binomial) link(logit)
[Figure: two panels plotted against the treatment level (0 to .8). Left: "Dose Response Function", y-axis E[year6(t)]. Right: "Treatment Effect Function", y-axis E[year6(t+.1)]-E[year6(t)]. Each panel shows the estimate with lower and upper 95% confidence bounds. Dose response function = linear prediction.]

Application: count data
Count data: Poisson model.
Treatment: years of college + high school
Outcome: earnings after 6 years
family(poisson) link(log)
[Figure: two panels plotted against the treatment level (0 to 10). Left: "Dose Response Function", y-axis E[year6(t)]. Right: "Treatment Effect Function", y-axis E[year6(t+1)]-E[year6(t)]. Each panel shows the estimate with lower and upper 95% confidence bounds. Dose response function = linear prediction.]

Application: gamma distribution
Gamma distribution:
Treatment: age
Outcome: earnings after 6 years
family(gamma) link(log)
[Figure: two panels plotted against the treatment level (0 to 80). Left: "Dose Response Function", y-axis E[year6(t)]. Right: "Treatment Effect Function", y-axis E[year6(t+1)]-E[year6(t)]. Each panel shows the estimate with lower and upper 95% confidence bounds. Dose response function = linear prediction.]
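
Appendix sketch: the three steps in code
The three implementation steps of the talk can be illustrated outside Stata with a minimal Python sketch for the count-data case (Poisson family, log link). It is not the code of gpscore2.ado or doseresponse2.ado. The data are simulated, there is a single covariate, and every function name (solve, wls, fit_poisson_log, poisson_pmf, draw_poisson, design) is a hypothetical helper introduced here. The GLM is fitted by a hand-written iteratively reweighted least squares loop. The balancing test (step 1a) and the bootstrap confidence bounds are left out.

```python
import math
import random

# Hypothetical helper names throughout; this is not the authors' Stata code.

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def wls(rows, y, w):
    """Weighted least squares through the normal equations (X'WX) b = X'Wy."""
    k = len(rows[0])
    xtx = [[sum(wi * r[p] * r[q] for r, wi in zip(rows, w)) for q in range(k)]
           for p in range(k)]
    xty = [sum(wi * r[p] * yi for r, yi, wi in zip(rows, y, w)) for p in range(k)]
    return solve(xtx, xty)

def fit_poisson_log(rows, t, iters=25):
    """Poisson GLM with log link, fitted by iteratively reweighted least squares."""
    beta = [math.log(sum(t) / len(t))] + [0.0] * (len(rows[0]) - 1)
    for _ in range(iters):
        mu = [math.exp(sum(b * v for b, v in zip(beta, r))) for r in rows]
        z = [math.log(m) + (ti - m) / m for ti, m in zip(t, mu)]  # working response
        beta = wls(rows, z, mu)  # working weights equal mu for Poisson with log link
    return beta

def poisson_pmf(t, mu):
    """Poisson probability of count t given mean mu."""
    return math.exp(-mu + t * math.log(mu) - math.lgamma(t + 1))

def draw_poisson(rng, mu):
    """Knuth's multiplication method for a Poisson draw (adequate for small mu)."""
    limit, k, p = math.exp(-mu), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def design(t, r):
    """Regressors of the second step: quadratic in T and R plus the interaction."""
    return [1.0, t, t * t, r, r * r, t * r]

# Simulated data (an assumption, not the lottery data): one covariate X,
# a count treatment T and a continuous outcome Y.
rng = random.Random(0)
n = 500
x = [rng.random() for _ in range(n)]
t = [draw_poisson(rng, math.exp(0.5 + xi)) for xi in x]
y = [2.0 * ti + xi + rng.gauss(0.0, 1.0) for ti, xi in zip(t, x)]

# First step: GLM of T on X, then the GPS as the fitted density evaluated at T_i.
rows = [[1.0, xi] for xi in x]
beta = fit_poisson_log(rows, t)
mu = [math.exp(beta[0] + beta[1] * xi) for xi in x]
gps = [poisson_pmf(ti, mi) for ti, mi in zip(t, mu)]

# Second step: least squares of Y on the quadratic terms in T and the GPS.
alpha = wls([design(ti, ri) for ti, ri in zip(t, gps)], y, [1.0] * n)

# Third step: at each treatment level, average the prediction over all units,
# with the GPS re-evaluated at that level.
t_grid = [1, 2, 3, 4, 5]
curve = []
for tv in t_grid:
    preds = [sum(a * d for a, d in zip(alpha, design(tv, poisson_pmf(tv, mi))))
             for mi in mu]
    curve.append(sum(preds) / n)

for tv, val in zip(t_grid, curve):
    print(tv, round(val, 2))
```

Moving to another family changes only the first step: the fitting routine and the density used to evaluate the GPS are swapped, while the second and third steps stay the same. This mirrors the point of the talk that the novelty lies in the first step.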