Stat 921 Notes 14

I. Propensity Score Caliper Matching

Matching on the propensity score focuses entirely on balance and not on obtaining close matches. A compromise between obtaining close matches and good balance is propensity score caliper matching.

Reference: Rosenbaum, P.R. and Rubin, D.B. (1985), “Constructing a control group using multivariate matched sampling methods that incorporate the propensity score,” The American Statistician.

With a caliper of width w, if two individuals, say k and l, have propensity scores that differ by more than w, then the distance between these individuals is set to ∞; whereas if the propensity scores differ by w or less, the distance is a measure of the proximity of $x_k$ and $x_l$. A caliper of 20% of the standard deviation of the propensity score is a common choice. A reasonable strategy is to start with a width of 20% of the standard deviation of the propensity score, and to narrow the caliper if needed to obtain balance on the propensity score.

Within the caliper, a good measure of distance between $x_k$ and $x_l$ is the Mahalanobis distance. If $\hat{\Sigma}$ is the sample covariance matrix of x, then the estimated Mahalanobis distance between $x_k$ and $x_l$ is

$$(x_k - x_l)^T \hat{\Sigma}^{-1} (x_k - x_l).$$

Speaking very informally, in the Mahalanobis distance, a difference of one standard deviation counts the same for each covariate in x. Even as an informal description, this is not quite correct. The Mahalanobis distance takes account of the correlations among variables. If one covariate in x were weight in pounds rounded to the nearest pound and another were weight in kilograms rounded to the nearest kilogram, then the Mahalanobis distance would come very close to counting those two covariates as a single covariate because of their high correlation.

The Mahalanobis distance was originally developed for use with multivariate normal data, and for data of this type it works fine. When the data are not normal, the Mahalanobis distance can exhibit some odd behavior.
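The caliper rule described above can be sketched on a small toy example. Everything in this block is made up for illustration and is not the welder data: the covariates (age, bmi), the propensity scores e.hat (in practice these would come from a fitted model), and all object names are assumptions. The sketch sets the distance to infinity outside the caliper and uses the Mahalanobis distance in the quadratic form given above inside it.

```r
# Toy data, made up for illustration: 4 treated units followed by 8 controls.
z   <- rep(c(1, 0), times = c(4, 8))
age <- c(35, 42, 50, 38, 33, 45, 52, 40, 36, 48, 41, 55)
bmi <- c(24.1, 27.3, 22.8, 30.2, 25.5, 26.0, 23.4, 28.7, 21.9, 29.5, 24.8, 26.6)
x   <- cbind(age, bmi)

# Hypothetical estimated propensity scores (assumed, not fitted here).
e.hat <- c(0.62, 0.48, 0.71, 0.55, 0.30, 0.50, 0.75, 0.45, 0.58, 0.66, 0.52, 0.20)

# Caliper width: 20% of the standard deviation of the propensity score.
w <- 0.2 * sd(e.hat)

# Mahalanobis distance (x_k - x_l)^T Sigma^{-1} (x_k - x_l) between each
# treated unit (rows) and each control unit (columns).
Sigma.hat <- cov(x)
xt <- x[z == 1, , drop = FALSE]
xc <- x[z == 0, , drop = FALSE]
d <- matrix(NA, nrow(xt), nrow(xc))
for (i in seq_len(nrow(xt))) d[i, ] <- mahalanobis(xc, xt[i, ], Sigma.hat)

# Pairs whose propensity scores differ by more than w get infinite distance.
outside <- abs(outer(e.hat[z == 1], e.hat[z == 0], "-")) > w
d[outside] <- Inf
```

In this toy example the first treated unit has no control within the caliper, so every entry in its row of d is infinite. A strict caliper can therefore leave some treated units with no admissible match.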
If one covariate contains extreme outliers or has a long-tailed distribution, its standard deviation will be inflated, and the Mahalanobis distance will tend to ignore that covariate in matching. With binary indicators, the variance is largest for events that occur about half the time, and it is smallest for events with probability near zero or one. In consequence, the Mahalanobis distance gives greater weight to binary variables with probabilities near zero or one than to binary variables with probabilities closer to one half. If there were binary indicators for the states of the US, then the Mahalanobis distance would regard matching for Wyoming as vastly more important than matching for California, simply because fewer people live in Wyoming. In many contexts, rare binary covariates are not of overriding importance, and outliers do not make a covariate unimportant, so the Mahalanobis distance may not be appropriate with covariates of this kind.

A simple alternative to the Mahalanobis distance
(i) replaces each of the covariates, one at a time, by its ranks, with average ranks for ties;
(ii) pre-multiplies and post-multiplies the covariance matrix of the ranks by a diagonal matrix whose diagonal elements are the ratios of the standard deviation of untied ranks 1, …, L to the standard deviations of the tied ranks of the covariates; and
(iii) computes the Mahalanobis distance using the ranks and this adjusted covariance matrix.
This is called the rank-based Mahalanobis distance. Step (i) limits the influence of outliers. After step (ii) is complete, the adjusted covariance matrix has a constant diagonal. Step (ii) prevents heavily tied covariates, such as rare binary variables, from having increased influence due to reduced variance.

Penalty functions: There may be no pair matching in which the caliper on the propensity score is respected for all 21 matched pairs.
For this reason, instead of using infinite distance when the propensity scores are further apart than the caliper, we use a “penalty function” which exacts a large but finite penalty for violations of the constraint, e.g., $1000 \times \max(0, |\hat{e}(x_k) - \hat{e}(x_l)| - w)$, so if the propensity scores for units k and l are within w, no penalty is exacted, but if the propensity scores are further apart than w, then the penalty is $1000 \times (|\hat{e}(x_k) - \hat{e}(x_l)| - w)$. This penalty is added to the rank-based Mahalanobis distance for the corresponding pair. Optimal matching will try to avoid the penalties by respecting the caliper, but when that is not possible, it will prefer to match so the caliper is only slightly violated for a few matched pairs.

Example: Welder data from Notes 13.

# Data
treatment=c(rep(1,21),rep(0,26));
age=c(38,44,39,33,35,39,27,43,39,43,41,36,35,37,39,34,35,53,38,37,38,48,63,44,
      40,50,52,56,47,38,34,42,36,41,41,31,56,51,36,44,35,34,39,45,42,30,35);
african.american=c(0,0,0,1,rep(0,5),1,rep(0,11),1,rep(0,12),rep(1,4),rep(0,9));
smoker=c(rep(0,2),rep(1,4),0,1,1,0,1,0,0,0,1,0,1,0,1,0,1,0,0,1,rep(0,8),1,0,1,1,1,0,1,
         0,0,1,1,0,0,0,1);
Xmat=cbind(age,african.american,smoker);
# Outcome: dpc = DNA-protein cross-links in percent in white blood cells
dpc=c(1.77,1.02,1.44,.65,2.08,.61,2.86,4.19,4.88,1.08,2.03,2.81,.94,1.43,1.25,2.97,
      1.01,2.07,1.15,1.07,1.63,1.08,1.09,1.1,1.1,.93,1.11,.98,2.2,.88,1.55,.55,1.04,1.66,
      1.49,1.36,1.02,.99,.65,.42,2.33,.97,.62,1.02,1.78,.95,1.59);
# The propensity score model building and balance checking process leads us to a
# propensity score model that includes all variables, interactions and squares.
agesq=age^2;
age.race=age*african.american;
age.smoker=age*smoker;
race.smoker=african.american*smoker;
# Propensity score estimate
model3=glm(treatment~age+agesq+african.american+smoker+age.race+age.smoker+race.smoker,
           family=binomial);
propscore.model3=predict(model3,type="response")

# Function for computing rank based Mahalanobis distance.
# Prevents an outlier from inflating the variance for a variable,
# thereby decreasing its importance.
# Also, the variances are not permitted to decrease as ties
# become more common, so that, for example, it is not more important
# to match on a rare binary variable than on a common binary variable
# z is a vector, length(z)=n, with z=1 for treated, z=0 for control
# X is a matrix with n rows containing variables in the distance
smahal=function(z,X){
  X<-as.matrix(X)
  n<-dim(X)[1]
  rownames(X)<-1:n
  k<-dim(X)[2]
  m<-sum(z)
  # Step (i): replace each covariate by its ranks (average ranks for ties)
  for (j in 1:k) X[,j]<-rank(X[,j])
  # Step (ii): rescale the covariance matrix of the ranks so that every
  # diagonal element equals the variance of the untied ranks 1,...,n
  cv<-cov(X)
  vuntied<-var(1:n)
  rat<-sqrt(vuntied/diag(cv))
  cv<-diag(rat)%*%cv%*%diag(rat)
  # Step (iii): Mahalanobis distance on the ranks, treated (rows) by control (columns)
  out<-matrix(NA,m,n-m)
  Xc<-X[z==0,]
  Xt<-X[z==1,]
  rownames(out)<-rownames(X)[z==1]
  colnames(out)<-rownames(X)[z==0]
  library(MASS)
  icov<-ginv(cv)
  for (i in 1:m) out[i,]<-mahalanobis(Xc,Xt[i,],icov,inverted=TRUE)
  out
}

# Rank based Mahalanobis distance
distmat1=smahal(treatment,Xmat);

# Function for adding propensity score caliper
# caliper*standard deviation of the propensity score p is the width of the caliper
addcaliper=function(dmat,z,p,caliper=0.2,penalty=1000){
  # add a penalty function to dmat for violations of caliper on p
  sdp<-sd(p)
  adif<-abs(outer(p[z==1],p[z==0],"-"))
  adif<-(adif-(caliper*sdp))*(adif>(caliper*sdp))
  dmat<-dmat+adif*penalty
  dmat
}

# Add propensity score caliper
distmat2=addcaliper(distmat1,treatment,propscore.model3);

# Optimal pair match
library(optmatch)
pairmatchvec=pairmatch(distmat2);
# Create a vector saying which control unit each treated unit is matched to
pairs.short=substr(pairmatchvec,start=3,stop=10);
pairsnumeric=as.numeric(pairs.short);
notreated=sum(treatment)
pairsvec=rep(0,notreated);
# For matched set i, sum the row indices of its two members and subtract i;
# this relies on treated unit i being the treated member of matched set i
for(i in 1:notreated){
  temp=(pairsnumeric==i)*seq(1,length(pairsnumeric),1);
  pairsvec[i]=sum(temp,na.rm=TRUE)-i;
}

# Assessment of balance
# Calculate standardized differences
notreated=sum(treatment);
Xmat=cbind(age,african.american,smoker,age.race,age.smoker,race.smoker);
treatedmat=Xmat[1:notreated,];
# Standardized differences before matching
controlmat.before=Xmat[(notreated+1):nrow(Xmat),];
controlmean.before=apply(controlmat.before,2,mean);
treatmean=apply(treatedmat,2,mean);
treatvar=apply(treatedmat,2,var);
controlvar=apply(controlmat.before,2,var);
stand.diff.before=(treatmean-controlmean.before)/sqrt((treatvar+controlvar)/2);
# Standardized differences after matching
controlmat.after=Xmat[pairsvec,];
controlmean.after=apply(controlmat.after,2,mean);
stand.diff.after=(treatmean-controlmean.after)/sqrt((treatvar+controlvar)/2);

stand.diff.before
             age african.american           smoker         age.race
      -0.6449557       -0.2734547        0.3562784       -0.3295512
      age.smoker      race.smoker
       0.3286498       -0.2443904

stand.diff.after
             age african.american           smoker         age.race
    -0.275803443      0.000000000      0.381989217     -0.009209958
      age.smoker      race.smoker
     0.400160703      0.000000000

Balance is not great on some variables; we will examine full matching later. For illustrative purposes, let’s consider making inferences assuming balance is fine.

welder.dpc=dpc[1:notreated];
control.dpc=dpc[pairsvec];
boxplot(welder.dpc,control.dpc,names=c("Welder","Control"))

# Inference Under Additive Treatment Effect Model
wilcox.test(welder.dpc,control.dpc,paired=TRUE,conf.int=TRUE);

        Wilcoxon signed rank test

data:  welder.dpc and control.dpc
V = 180, p-value = 0.02385
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 0.095 1.195
sample estimates:
(pseudo)median
         0.595

If there is no hidden bias, there is strong evidence that being a welder increases inappropriate DNA-protein cross-links.
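As a small aside on reading the output above: the “(pseudo)median” reported by wilcox.test with conf.int=TRUE is the Hodges-Lehmann point estimate of the additive effect. The sketch below uses made-up paired differences (not the welder data; the vector d and the other names are assumptions for illustration) to show that, when R uses its exact computation (small sample, no ties, no zero differences), this estimate is the median of the Walsh averages of the paired differences. With ties, R computes the estimate numerically, so the two values may differ slightly.

```r
# Toy treated-minus-control differences for 7 matched pairs (made up;
# chosen so that there are no zeros and no tied absolute values).
d <- c(0.5, -0.2, 1.1, 0.8, 0.3, -0.1, 1.6)

# Walsh averages: (d_i + d_j)/2 for all pairs with i <= j.
walsh <- outer(d, d, "+") / 2
walsh <- walsh[upper.tri(walsh, diag = TRUE)]

# Hodges-Lehmann estimate computed by hand.
hl <- median(walsh)

# Same quantity as reported by the signed rank procedure.
fit <- wilcox.test(d, conf.int = TRUE)
hl.from.test <- unname(fit$estimate)
```

Here hl and hl.from.test agree, which gives a concrete reading of the estimate: it is the shift that would center the paired differences under the additive treatment effect model.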