Obtaining and installing R packages

SSW 2010
Introduction to theory and application of propensity scores
Felix Thoemmes, Texas A & M University
Obtaining and installing R
Download the latest installation binary file from a CRAN server, e.g.:
http://cran.stat.ucla.edu/bin/windows/base/R-2.11.0-win32.exe
R gets updated frequently, so you might need to come back to the CRAN website to install the
newest version of R once in a while.
After downloading the R program, install it at a convenient location. Menu shortcuts in Windows
will be generated automatically.
There are several downloadable GUIs (Graphical User Interfaces) available that run outside R
and access it in the background. We will not use them in this workshop, but in case you are interested,
two popular ones are Tinn-R (http://www.sciviews.org/Tinn-R/) and JGR
(http://jgr.markushelbig.org/JGR.html).
Obtaining and installing R packages
R has many additional features, so-called packages, that can be installed from the CRAN server.
For this workshop we will need several packages to perform the propensity score analysis.
The names of the two main packages are “MatchIt” and “PSAgraphics”. Also useful are
“foreign”, “twang”, “Rcmdr”, and “Matching”.
Packages can be installed using the graphical interface in R.
• Navigate to the menu Packages -> Install Package(s).
• Choose a CRAN mirror site, e.g. TX for Texas.
• Navigate to the name of the R package that you want to install, e.g. MatchIt.
• Several packages can be selected at once by holding the Ctrl key.
• Once installed, the packages need to be loaded into R using the library command, e.g.
library(MatchIt)
• Commands, variable names, object names, packages, etc. are CASE-SENSITIVE!
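A quick way to see the case sensitivity for yourself is the small sketch below. The object name myscore is made up for illustration and is not part of the workshop data.

```r
# R treats upper and lower case as different characters
myscore <- 10

exists("myscore")                 # TRUE: the object was created above
exists("MyScore")                 # FALSE: different capitalization, different name
identical("MatchIt", "matchit")   # FALSE: this is why library(matchit) fails
```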
Install and load MatchIt, PSAgraphics, foreign, twang, Rcmdr, and Matching.
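The same installation can also be done from the R console instead of the menu. The following is a minimal sketch using the base functions install.packages() and library(); the object names pkgs and to_install are chosen here for illustration, and the interactive() check is an added precaution so that nothing is downloaded when the lines are run as a batch script.

```r
# Packages named in this handout
pkgs <- c("MatchIt", "PSAgraphics", "foreign", "twang", "Rcmdr", "Matching")

# Which of them are not yet installed on this machine?
to_install <- setdiff(pkgs, rownames(installed.packages()))

# Download and load only when typed at the console, not in a batch run
if (interactive()) {
  if (length(to_install) > 0) install.packages(to_install)
  # Note: loading Rcmdr opens the R commander window
  for (p in pkgs) library(p, character.only = TRUE)
}
```

R will ask for a CRAN mirror the first time install.packages() is called in a session, just as the menu does.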
R commander
The R commander is a GUI-type interface that runs within R. It performs many statistical
procedures (see attached PDF), but we will mainly use it as an interface in which we can
edit code and easily import data from other formats such as SPSS or SAS. After successful
installation of the package, the R commander can be opened by typing
library(Rcmdr)
into the R console.
Importing Data into R
Most of us will have data stored in various formats, e.g. SPSS data files (.sav), SAS data files
(.sas7bdat), or ASCII files (.txt, .dat). A variety of these files can be imported using the “foreign”
package. The Rcmdr can import the files for us using the foreign package. The menu can be
accessed under “Data -> Import data”.
After the data are successfully loaded, we will use the Rcmdr script window to program our
analysis and will obtain results in the Rcmdr Output window.
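Data can also be read in by typing commands rather than using the menu. As a small sketch for the ASCII case, the lines below write a tiny made-up file to a temporary location and read it back with the base function read.table(). The file contents and the object names tmp and psa_txt are invented for illustration only.

```r
# Create a small whitespace-delimited text file (made-up values)
tmp <- tempfile(fileext = ".dat")
writeLines(c("y z X1",
             "3.1 1 0.5",
             "2.4 0 1.2"), tmp)

# Read it into a data frame; header = TRUE takes variable names from line 1
psa_txt <- read.table(tmp, header = TRUE)
str(psa_txt)
```

For your own data, replace tmp with the path to your file, using forward slashes in the path.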
Propensity Score Analysis in R
[Figure: flow chart of the analysis steps: Selection, Estimation, Conditioning, Model Checks, Effect Estimation]
Selection: By the time we have imported the data into R, we have already made choices as to
which variables to assess and which ones to ignore. Our selection is confined to the variables
contained in the dataset. Unless we have theoretical reasons to exclude a variable a priori (e.g.
that variable is a mediator), we will use all variables contained in the dataset as covariates.
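When a dataset has many covariates, the model formula does not have to be typed by hand. The sketch below builds it from the variable names with the base function reformulate(). The data frame psa_toy is a made-up stand-in, and it assumes that the outcome is called y and the treatment z.

```r
# Toy data frame standing in for the imported dataset (made-up values)
psa_toy <- data.frame(y  = c(1, 2),
                      z  = c(0, 1),
                      X1 = c(0.3, 0.1),
                      X2 = c(5, 7),
                      X3 = c(1, 0))

# Everything except outcome and treatment is treated as a covariate;
# add further names to the exclusion vector to drop e.g. a mediator
covariates <- setdiff(names(psa_toy), c("y", "z"))

# Build the formula  z ~ X1 + X2 + X3
ps_formula <- reformulate(covariates, response = "z")
ps_formula
```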
Estimation / Conditioning / Model Checks: These three steps are combined in the
analysis in R. We will use the packages MatchIt and PSAgraphics as our main instruments in the
analysis.
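To make the estimation and model check steps more concrete, here is a minimal base-R sketch of what happens conceptually: a logistic regression predicts treatment from the covariates, and its fitted probabilities are the propensity scores. The data are simulated, the names sim, pscore and smd are invented for illustration, and the standardized difference shown is just one common definition, not necessarily the exact formula used by the packages.

```r
set.seed(1)

# Simulated data: two covariates that influence treatment assignment
n  <- 200
X1 <- rnorm(n)
X2 <- rnorm(n)
z  <- rbinom(n, 1, plogis(0.5 * X1 - 0.5 * X2))
sim <- data.frame(z, X1, X2)

# Estimation: logistic regression of treatment on the covariates
ps_model <- glm(z ~ X1 + X2, family = binomial, data = sim)
sim$pscore <- fitted(ps_model)   # estimated propensity scores

# Model check: standardized mean difference of a covariate between groups
smd <- function(x, treat) {
  (mean(x[treat == 1]) - mean(x[treat == 0])) / sd(x[treat == 1])
}
smd(sim$X1, sim$z)
```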
Effect estimation: The estimation of the treatment effect can be performed within R or, if we
prefer, we can extract the matched or otherwise conditioned sample and perform the remaining
analyses in any program of our choice, e.g. SPSS or SAS.
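One simple way to hand a conditioned sample to another program is a CSV file, which both SPSS and SAS can read. The sketch below uses a made-up data frame, matched_toy, as a stand-in for the matched sample and writes it to a temporary folder. The file name is an assumption for illustration.

```r
# Made-up stand-in for a matched sample
matched_toy <- data.frame(y  = c(2.3, 1.8, 3.0, 2.1),
                          z  = c(1, 0, 1, 0),
                          X1 = c(0.4, 0.5, -1.2, -1.0))

# Write to a CSV file; row.names = FALSE avoids an extra unnamed column
out_file <- file.path(tempdir(), "matched_sample.csv")
write.csv(matched_toy, out_file, row.names = FALSE)

# Read the file back to confirm it was written as expected
check <- read.csv(out_file)
str(check)
```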
R code (also on PowerPoint slides):
library(foreign)
library(MatchIt)
library(PSAgraphics)
#read in dataset using Rcmdr#
psa <- read.spss("C:/Users/fthoemmes/Desktop/ps workshop/testdatapsa.sav",
use.value.labels=FALSE, max.value.labels=Inf, to.data.frame=TRUE)
#prima facie effect#
pf <- lm(y~z, data=psa)
summary(pf)
##matching: annotated breakdown of the matchit() arguments#
##model to be used to predict z#
#matchit(z ~ x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12+x13+x14+x15+x16,
##type of matching#
#method= "nearest",
##discard option#
#discard="both",
##caliper option#
#caliper = .2,
##dataset#
#data = psa)
#the call written on a single line (here with covariates X1 to X10, discard="none", caliper = .1)#
match1 <- matchit(z ~ X1+X2+X3+X4+X5+X6+X7+X8+X9+X10, method="nearest",
discard="none", caliper = .1, data = psa)
#summary of the matched sample#
summary(match1, interactions=FALSE, addlvariables=NULL,
standardize=FALSE)
#diagnostic plots#
plot(match1,type="QQ")
plot(match1,type="hist")
plot(match1,type="jitter")
#additional plot of standardized differences
smatch1<-summary(match1,standardize=TRUE)
plot(smatch1)
#extract the matched dataset
dmatch1 <-match.data(match1)
#additional graphics#
#put variables in objects
continuous<-dmatch1$X1
treatment<-dmatch1$z
#create strata from estimated PS
#(bin.var() comes from Rcmdr, RcmdrMisc in later releases; load it first with library(Rcmdr))
dmatch1$strata <- bin.var(dmatch1$distance, bins=5, method='proportions', labels=FALSE)
strata<-dmatch1$strata
#box plot comparing balance of variables across strata
box.psa(continuous, treatment, strata)
#treatment effect of matched sample
m1 <- lm(y~z, data=dmatch1)
summary(m1)
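The stratification step in the code above relies on bin.var(). If that function is not available, a rough base-R equivalent of cutting the propensity score into five equal-count strata can be sketched with quantile() and cut(). The scores below are simulated and the object names are invented for illustration; this is a sketch of the idea, not the exact algorithm of bin.var().

```r
set.seed(2)

# Simulated propensity scores standing in for dmatch1$distance
distance <- runif(100)

# Quintile cut points (0%, 20%, ..., 100%)
breaks <- quantile(distance, probs = seq(0, 1, by = 0.2))

# Assign each case to stratum 1 to 5; include.lowest keeps the minimum value
strata <- cut(distance, breaks = breaks, include.lowest = TRUE, labels = FALSE)
table(strata)
```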