SSW 2010
Introduction to theory and application of propensity scores
Felix Thoemmes, Texas A&M University

Obtaining and installing R

Download the latest installation binary file from a CRAN server, e.g.: http://cran.stat.ucla.edu/bin/windows/base/R-2.11.0-win32.exe

R is updated frequently, so you may need to return to the CRAN website once in a while to install the newest version. After downloading the R program, install it at a convenient location. Menu shortcuts in Windows will be generated automatically.

Several downloadable GUIs (graphical user interfaces) are available that run outside R and access it in the background. We will not use them in this workshop, but in case you are interested, two popular ones are Tinn-R (http://www.sciviews.org/Tinn-R/) and JGR (http://jgr.markushelbig.org/JGR.html).

Obtaining and installing R packages

R has many additional features, so-called packages, that can be installed from the CRAN server. For this workshop we will need several packages to perform the propensity score analysis. The two main packages are “MatchIt” and “PSAgraphics”. Also useful are “foreign”, “twang”, “Rcmdr”, and “Matching”.

Packages can be installed using the graphical interface in R:

Navigate to the menu Packages -> Install Package(s).
Choose a CRAN mirror site, e.g. TX, for Texas.
Navigate to the name of the R package that you want to install, e.g. MatchIt. Several packages can be selected at once using the Ctrl key.

Once the packages are installed, they need to be loaded into R using the library command, e.g. library(MatchIt)

Commands, variable names, object names, packages, etc. are CASE-SENSITIVE!

Install and load MatchIt, PSAgraphics, foreign, twang, Rcmdr, and Matching.

R commander

The R commander is a GUI type interface that is run within R.
It performs many statistical procedures (see attached PDF), but we will mainly use it as an interface in which we can edit code and easily import data from other formats such as SPSS or SAS. Once the package has been installed, the R commander can be opened by typing library(Rcmdr) into the R console.

Importing Data into R

Most of us will have data stored in various formats, e.g. SPSS data files (.sav), SAS data files (.sas7bdat), or ASCII files (.txt, .dat). A variety of these files can be imported using the “foreign” package. The Rcmdr can import the files for us using the foreign package. The menu can be accessed under “Data -> Import data”. After the data are successfully loaded, we will use the Rcmdr script window to program our analysis and will obtain results in the Rcmdr output window.

Propensity Score Analysis in R

Selection -> Estimation -> Conditioning -> Model Checks -> Effect Estimation

Selection: By the time we have imported the data into R, we have already made choices as to which variables to assess and which ones to ignore. Our selection is confined to the variables contained in the dataset. Unless we have theoretical reasons to exclude a variable a priori (e.g. the variable is a mediator), we will use all variables contained in the dataset as covariates.

Estimation / Conditioning / Model Checks: These three steps are combined in the analysis in R. We will use the packages MatchIt and PSAgraphics as our main instruments in the analysis.

Effect estimation: The estimation of the treatment effect can be performed within R, or, if we prefer, we can extract the matched or otherwise conditioned sample and perform the remaining analyses in any program of our choice, e.g. SPSS or SAS.
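As a minimal sketch of the last option, the lines below write a conditioned sample to a CSV file that SPSS or SAS can read. The data frame matched is a hypothetical toy stand-in for the output of match.data(), and the file name and location are assumptions chosen for illustration.

```r
# Hypothetical stand-in for a matched sample (in the workshop this would
# be the data frame returned by match.data())
matched <- data.frame(
  y  = c(2.1, 3.4, 1.8, 2.9),   # outcome
  z  = c(1, 0, 1, 0),           # treatment indicator
  X1 = c(0.5, 0.4, 1.2, 1.1)    # one covariate
)

# Write the sample to a CSV file in a temporary directory (assumed location)
out_file <- file.path(tempdir(), "matched_sample.csv")
write.csv(matched, out_file, row.names = FALSE)

# Read the file back to confirm that rows and columns survived the round trip
check <- read.csv(out_file)
print(dim(check))
```

A CSV file is only one possibility; the foreign package also provides write.foreign() for writing data together with SPSS or SAS import syntax.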
R code (also on PowerPoint slides):

library(foreign)
library(MatchIt)
library(PSAgraphics)

#read in dataset using Rcmdr#
psa <- read.spss("C:/Users/fthoemmes/Desktop/ps workshop/testdatapsa.sav", use.value.labels=FALSE, max.value.labels=Inf, to.data.frame=TRUE)

#prima facie effect#
pf <- lm(y~z, data=psa)
summary(pf)

##matching: annotated template of the matchit call#
##model to be used to predict z#
#matchit(z ~ x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12+x13+x14+x15+x16,
##type of matching#
#method= "nearest",
##discard option#
#discard="both",
##caliper option#
#caliper = .2,
##dataset#
#data = psa)

#the call written on one single line, with the values used in this example#
match1 <- matchit(z ~ X1+X2+X3+X4+X5+X6+X7+X8+X9+X10, method="nearest", discard="none", caliper=.1, data=psa)

#summary of the matched sample#
summary(match1, interactions=FALSE, addlvariables=NULL, standardize=FALSE)

#diagnostic plots#
plot(match1, type="QQ")
plot(match1, type="hist")
plot(match1, type="jitter")

#additional plot of standardized differences#
smatch1 <- summary(match1, standardize=TRUE)
plot(smatch1)

#write out data#
dmatch1 <- match.data(match1)

#additional graphics#
#put variables in objects#
continuous <- dmatch1$X1
treatment <- dmatch1$z

#create strata from estimated PS#
#bin.var() is not part of the three packages loaded above; it is available once Rcmdr is loaded#
dmatch1$strata <- bin.var(dmatch1$distance, bins=5, method='proportions', labels=FALSE)
strata <- dmatch1$strata

#box plot comparing balance of variables across strata#
box.psa(continuous, treatment, strata)

#treatment effect of matched sample#
m1 <- lm(y~z, data=dmatch1)
summary(m1)
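To make the standardized differences in the plot of smatch1 more concrete, the sketch below computes one by hand in base R. It uses hypothetical toy data and a helper function std_diff that is not part of any package. The denominator is the standard deviation in the treated group, which is one common convention and is an assumption here; the definition used by MatchIt may differ between versions.

```r
# Hypothetical toy data: z is the treatment indicator, X1 a covariate
set.seed(1)
toy <- data.frame(z = rep(c(1, 0), each = 50))
toy$X1 <- rnorm(100, mean = ifelse(toy$z == 1, 0.5, 0))

# Illustrative helper (not a package function): difference in group means
# divided by the standard deviation of the treated group
std_diff <- function(x, treat) {
  mean_treated <- mean(x[treat == 1])
  mean_control <- mean(x[treat == 0])
  (mean_treated - mean_control) / sd(x[treat == 1])
}

# Standardized difference of X1 between treated and control units
d <- std_diff(toy$X1, toy$z)
print(d)
```

Applied to the same covariate before and after matching, a value that moves closer to zero suggests that balance on that covariate has improved.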