RCR.R User Guide
Version 0.9

Table of Contents

1 Introduction
  1.1 Obtaining the program
  1.2 Installation
  1.3 Running the example file
  1.4 Updates and technical support
2 Estimating the RCR model
  2.1 The basic estimation command
  2.2 Additional estimation options
      Choosing a custom range for λ
      Cluster-robust standard errors
      Confidence interval options
3 Creating plots
      Changing the range of values to plot
      Plotting [λL,λH] and [θL,θH]
      Including a legend
      Choosing plot elements
      Confidence intervals
      Advanced options
4 Notes for advanced users
  4.1 Class descriptions
      rcr
      parameter
      Theta
  4.2 A note on standard error estimates
5 Version History
References

1 Introduction

This file provides documentation for the downloadable computer program (written in R) implementing the estimation method described in my paper “Bounding a linear causal effect using relative correlation restrictions” (Krauth 2008). The program is designed to estimate the linear model:

$$y = \theta z + X\beta + v \qquad (1)$$

where y is a scalar outcome of interest, z is a scalar variable whose effect on y we are interested in estimating, and X is a vector of control variables such that (without loss of generality) $\mathrm{corr}(X, v) = 0$. The parameter of interest here is θ, which is interpreted as the effect of z on y. That effect is identified and can be estimated by the OLS regression of y on (z,X) if z is exogenous given the control variables, i.e., if $\mathrm{corr}(z, v) = 0$. However, there are many cases where it is unreasonable to believe that z is absolutely exogenous, but reasonable to believe that it is “mostly” exogenous (i.e., that $\mathrm{corr}(z, v)$ is small). One convenient way of modeling “mostly” exogenous is by replacing the absolute restriction that $\mathrm{corr}(z, v) = 0$ with the weaker relative correlation restriction:

$$\mathrm{corr}(z, v) = \lambda\,\mathrm{corr}(z, X\beta), \quad \lambda \in \Lambda \qquad (2)$$

where Λ is some interval specified by the econometrician. This program estimates the econometric model defined by equations (1) and (2).

1.1 Obtaining the program

The program distribution is available online at http://www.sfu.ca/~bkrauth/code/code.html and contains four files:

  rcr_r_userguide.pdf
      This document.
  rcr.r
      A text file containing the R functions needed to implement RCR estimation.
  example.r
      An example R program that uses the functions in rcr.r to estimate a particular model on actual data. This particular program estimates class size effects in kindergarten for Project STAR students, i.e., the first column in Table 3 of Krauth (2008).
  example_data.dta
      The data set (in Stata format) used by example.r.

The code uses R, an open-source statistical package based on the S language. The code will also run on commercial packages based on S (e.g., on S-Plus). R can be obtained free of charge from http://www.r-project.org/.
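
As a concrete illustration of the model in equations (1) and (2), the following sketch simulates data in which z is only “mostly” exogenous and shows that OLS then overstates θ. It assumes that equation (2) has the form corr(z,v) = λ corr(z,Xβ); all variable names and numeric values are hypothetical and are not taken from rcr.r or from the example data.

```r
# Illustrative simulation of the model in equations (1) and (2).
# Every name and number below is a hypothetical choice, not part of rcr.r.
set.seed(1)
n  <- 5000
x1 <- rnorm(n)                       # one observed control variable
u  <- rnorm(n)                       # unobserved factor driving the error term
z  <- 0.5 * x1 + 0.2 * u + rnorm(n)  # treatment, correlated with x1 and with u
v  <- u                              # error term of equation (1)

beta  <- c(1, 2)                     # intercept and coefficient on x1
theta <- 5                           # true effect of z on y
X     <- cbind(1, x1)                # first column is a column of ones
xb    <- drop(X %*% beta)            # the index X beta
y     <- theta * z + xb + v

# OLS of y on (z, X) is biased upward here because corr(z, v) > 0
ols_theta <- unname(coef(lm(y ~ z + x1))["z"])

# Lambda implied by this design, assuming corr(z, v) = lambda * corr(z, X beta)
lambda_true <- cor(z, v) / cor(z, xb)

print(c(ols_theta = ols_theta, lambda_true = lambda_true))
```

In this design the population value of λ is 0.4, so a researcher who is willing to assume λ in [0,1] would be imposing a restriction that holds, while the OLS assumption λ = 0 does not.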

1.2 Installation

To “install” the code, just unzip the distribution files into a directory of your choice. The text file rcr.r contains the source code for all of the necessary functions. To make those functions available during a particular R session, execute the command:

    source("pathname/rcr.r")

where pathname is the full path of the directory into which you have placed the files. For example, if you have placed the files in C:\work, then execute the command source("c:/work/rcr.r"). Note that R path names use forward slashes, even on Windows.

1.3 Running the example file

The distribution also includes an example file named example.r. It provides several examples using the rcr function on the Project STAR data described in the paper. To run it, open R and execute the command:

    source("pathname/example.r", chdir=TRUE)

where pathname is the full path of the directory into which you have placed the files. For example, if you have placed the files in C:\work, then execute the command source("c:/work/example.r", chdir=TRUE). The chdir=TRUE option tells R to temporarily switch its working directory to the directory containing example.r while the file is being run.

1.4 Updates and technical support

I am happy to provide technical support by email at bkrauth@sfu.ca. Updates to the code and documentation will be available at http://www.sfu.ca/~bkrauth/code/code.html. My intention is to make this program easy to use, so I greatly appreciate any comments or suggestions.

2 Estimating the RCR model

2.1 The basic estimation command

The RCR model can be estimated by simply executing the function:

    rcr(x,y,z)

where

• x is an n-by-k matrix of control variables. The first column of x should be a column of ones.
• y is an n-by-1 matrix (or n-vector) of outcome variables.
• z is an n-by-1 matrix (or n-vector) of treatment variables.

This function returns an object of class rcr. Objects of class rcr have methods for several generic functions, including print, plot, and summary. Output from rcr will look something like this:

    Global RCR Parameter Estimates:
               Estimate Std. Error    2.5% 97.5%
    lambdaStar    12.31       2.10    8.19  16.4
    thetaStar      8.17      30.61  -51.83  68.2
    lambda0       28.94     108.56 -183.83 241.7

    The lambda(theta) function has critical points at:
         limit  localmax thetastar- thetastar+     limit
    -1.00e+100 -1.48e+01   8.17e+00   8.17e+00 1.00e+100

    Estimated bounds on theta for given bounds on lambda:
    lambda_L lambda_H theta_L theta_H 2.5% 97.5%
           0        0    5.20     5.2 3.91  6.49
           0        1    5.14     5.2 3.26  6.49

    Confidence intervals calculated using conservative method.

The printout above has 3 sections.

1. The section titled “Global RCR Parameter Estimates” provides estimates of λ*, θ* and λ(0) as defined in the paper, along with standard errors and confidence intervals.
2. The section titled “The lambda(theta) function has critical points at:” identifies (to an approximation) all points at which the estimated λ(θ) function has either a discontinuity or a change in direction.
3. The section titled “Estimated bounds on theta for given bounds on lambda” provides estimates of [θL,θH] for a few common assumptions about λ in [λL,λH].

2.2 Additional estimation options

The rcr function has several optional inputs that may be necessary for particular applications. The example.r file includes an example that uses all of these options to produce the results for the first column in Table 3 of the paper.

Choosing a custom range for λ

The optional argument Lambda is a J-by-2 matrix in which each row represents a value of [λL,λH] for which to estimate [θL,θH]. For example, if you want to estimate [θL,θH] for λ in [0,0.2] and for λ in (-∞,0], execute the command:

    rcr(x,y,z,Lambda=matrix(c(0,-Inf,0.2,0),ncol=2))

Because matrix fills its values column by column, the first row of this matrix is (0, 0.2) and the second row is (-Inf, 0). By default, Lambda is set to include a few convenient values.

Cluster-robust standard errors

By default, standard errors are calculated under the assumption that the data are independent and identically distributed. To use cluster-robust standard errors, use the optional argument cluster.
For example, you might execute:

    rcr(x,y,z,cluster=clustervar)

where clustervar is the name of an n-vector of cluster identifiers.

Confidence interval options

The optional argument level can be used to select an alternative asymptotic level for the calculation of confidence intervals. For example, if you want 99% confidence intervals (the default is 95%), execute the command:

    rcr(x,y,z,level=0.99)

The level argument operates the same way as in the generic R function confint.

The optional argument ciType is a (case-insensitive) text string indicating the method to use when calculating confidence intervals for θ. Several alternatives are currently supported:

• Conservative: Constructs a confidence interval for θ from the lower bound of the confidence interval for θL and the upper bound of the confidence interval for θH. Imbens and Manski (2004) show that this approach can be too conservative when θH - θL is relatively large.
• Imbens-Manski: Uses the method proposed by Imbens and Manski (2004).
• Stoye: Uses the method proposed by Stoye (2008).

For example, if you want to use the Imbens-Manski method, execute the command:

    rcr(x,y,z,ciType="Imbens-Manski")

The default uses the conservative method.

3 Creating plots

Plots of the estimated λ(θ) function can be created by calling the generic function plot on the results of the rcr function. For example, executing the command:

    plot(rcr(x,y,z))

will generate a plot. The plot produced with the default options will probably not look good, but more informative plots can be made using the optional arguments below.

Changing the range of values to plot

• xlim is a 2-vector giving the range of values to plot on the horizontal (x) axis.
• ylim is a 2-vector giving the range of values to plot on the vertical (y) axis.

Plotting [λL,λH] and [θL,θH]

The optional argument Lambda is a 2-vector giving the range [λL,λH] for which to plot the estimated [θL,θH]. The default (Lambda=NULL) is to not plot such an estimate.
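
The range and Lambda arguments described above might be combined as in the following sketch. The argument names (xlim, ylim, Lambda) are the ones documented in this guide; the simulated data and the particular axis ranges are hypothetical, and the call to rcr is guarded so that the script also runs in a session where rcr.r has not been sourced.

```r
# Sketch only: simulated inputs and axis ranges are hypothetical choices.
set.seed(2)
n <- 200
x <- cbind(1, rnorm(n))          # controls; the first column is a column of ones
z <- rnorm(n) + 0.5 * x[, 2]     # treatment variable
y <- 5 * z + x[, 2] + rnorm(n)   # outcome variable

# Plot settings collected in a list so they can be reused across plots
plot_args <- list(
  xlim   = c(-20, 20),  # range shown on the horizontal axis
  ylim   = c(-5, 15),   # range shown on the vertical axis
  Lambda = c(0, 1)      # mark the estimated [theta_L, theta_H] for lambda in [0, 1]
)

# Only attempt estimation and plotting if rcr.r has been sourced
if (exists("rcr", mode = "function")) {
  fit <- rcr(x, y, z)
  do.call(plot, c(list(fit), plot_args))
}
```

Keeping the settings in a list makes it easy to redraw the same figure after changing a single element, for example widening ylim when the default view is dominated by a discontinuity in λ(θ).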

Including a legend

The optional argument placeLegend tells plot where (if anywhere) to place a legend for the plot. This option is passed directly to R's legend command. Valid options include "topleft", "topright", etc. See the R documentation for the legend command for more details. The default (placeLegend=FALSE) is to have no legend at all.

Choosing plot elements

These optional arguments take logical values, and are used to determine whether particular statistics are plotted.

• plotLambdaFunction indicates whether to plot the estimated λ(θ) function (default TRUE).
• plotThetaStar indicates whether to plot the estimated value of θ* (default TRUE).
• plotLambdaStar indicates whether to plot the estimated value of λ* (default TRUE).

Confidence intervals

• The optional argument plotCI indicates whether to plot confidence intervals for the estimated Θ(Λ) (default is FALSE).
• The optional argument plotLambdaCI indicates whether to plot confidence intervals for the estimated λ(θ) function (default is FALSE).
• The optional argument level indicates the confidence level to use.
• The optional argument ciType indicates the type of confidence interval to estimate for θ (see the explanation of ciType in Section 2.2 for details).

Advanced options

• The adjustYlim argument is a logical argument that tells plot whether to adjust the range of the vertical axis to look nice. It doesn't work very well yet. The default is adjustYlim=FALSE.
• The gridsize argument indicates the number of points at which to estimate the λ(θ) function. The default is gridsize=100.
• The colorScheme argument is a vector of colors to be used in the plot. See the built-in R function colors() for a list of available colors. The default is colorScheme=c("black","blue","gray","green","lightgreen").

In addition to these options, many of the standard optional arguments to R's plot function are accepted.
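
The options in this section can be combined in a single call. The sketch below collects one hypothetical combination of the documented arguments, checks that the color names are known to R, and then plots only if an rcr object named fit (an assumed name for the result of an earlier rcr(x,y,z) call) is present in the session.

```r
# One hypothetical combination of the plot options described in Section 3.
plot_opts <- list(
  Lambda         = c(0, 1),          # range of lambda whose theta bounds are drawn
  placeLegend    = "topright",       # legend position; FALSE would mean no legend
  plotThetaStar  = TRUE,             # show the estimate of theta*
  plotLambdaStar = FALSE,            # hide the estimate of lambda*
  plotCI         = TRUE,             # confidence interval for the estimated bounds
  level          = 0.99,             # confidence level
  ciType         = "Imbens-Manski",  # see Section 2.2
  gridsize       = 200,              # finer grid than the default of 100
  colorScheme    = c("black", "blue", "gray", "green", "lightgreen")
)

# Make sure every color name is one that R recognizes before plotting
unknown <- setdiff(plot_opts$colorScheme, colors())
if (length(unknown) > 0) {
  stop("unknown color names: ", paste(unknown, collapse = ", "))
}

# 'fit' is assumed to hold the result of rcr(x, y, z); skip plotting otherwise
if (exists("fit") && inherits(fit, "rcr")) {
  do.call(plot, c(list(fit), plot_opts))
}
```

Checking colorScheme against colors() up front gives a clearer error message than a failure inside the plotting code would.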

4 Notes for advanced users

4.1 Class descriptions

Like many estimation methods programmed in R, this program has an object-oriented design. That is, data tends to be packaged into (potentially elaborate) structures called objects. Each object is a member of one or more classes. An object's class dictates its structure. There are numerous “generic” functions (e.g., print) that can be called on all sorts of different objects but whose actual behavior (“method”) depends on the class of the object that has been passed to it. In order to understand how this program works it is important to know the main classes, the structure of objects in each class, and the methods available.

rcr

How they are created: Objects of class rcr are created with the rcr function.

What they are: An rcr object is a list with a particular set of elements that describe the results from RCR estimation:

• Several parameter objects (see below for explanation of parameter objects) named
  o moments (this is the vector of sample moments from which all other parameters are calculated)
  o thetaStar (the parameter θ* described in the paper)
  o lambdaStar (the parameter λ* described in the paper)
  o lambda0 (the parameter λ(0) described in the paper)
  o thetaSegments (the set of θ values at which λ(θ) is discontinuous or switches direction).
• A function named lambda (the function λ(θ) described in the paper).
• etc.

What can be done with them: There are rcr-specific methods for the generic functions print, summary, and plot. There is also a function is.rcr that tests whether an object is a (valid) rcr object.

parameter

How they are created: Parameter objects are created with the function parameter. This function is used mostly internally.

What they are: A parameter object is a list that describes a vector of parameter estimates, including its covariance matrix. Specifically it is a list with elements:

• A k-vector named E that gives the actual estimate.
• A j-by-j matrix named V that gives the covariance matrix of moments.
• A k-by-j matrix named gradient that gives the gradient of the estimate with respect to moments.

What can be done with them: It has methods for the generic functions print, confint, and vcov. There is also a function parameter for the creation of parameter objects, and a function is.parameter() to test whether an object is a parameter object. Finally, two or more parameter objects can be concatenated into a single object using a special method for the generic function c().

Theta

How they are created: Theta objects are created by the function estimateTheta. This function is mostly used internally.

What they are: A Theta object is a special kind of parameter object that specifically describes the pair of parameter estimates [θL,θH]. It inherits from the parameter class.

What can be done with them: Anything that can be done for a parameter object can be done with a Theta object. In addition, Theta objects have methods for the generic functions confint and summary.

4.2 A note on standard error estimates

Standard error estimates for most parameters of interest are based on application of the delta method. That is, the parameter of interest is a (usually) differentiable function h(M), where M is some easily calculated vector of asymptotically normal summary statistics with easily-estimated covariance matrix $\hat{V}$. Standard errors are estimated by application of the formula

$$\widehat{\mathrm{var}}\big(h(M)\big) = \frac{\partial h(M)}{\partial M}\,\hat{V}\,\left(\frac{\partial h(M)}{\partial M}\right)^{\prime}$$

(the standard errors being the square roots of the diagonal elements), where the derivative in the above expression is approximated numerically using the finite difference method. For example, if M were a scalar, the finite difference method would approximate the derivative by:

$$\frac{dh(M)}{dM} \approx \frac{h(M+\varepsilon) - h(M)}{\varepsilon}$$

where ε is some small positive number. Unfortunately, it is not easy to determine in advance how big ε should be to get a good approximation. In principle, the approximation above gets better with smaller ε, but rounding error in the calculation of h(M) becomes a problem if ε is too small.
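
The trade-off in choosing ε can be seen in a small stand-alone experiment. The sketch below applies the delta method with a forward-difference gradient to a hypothetical function h (a ratio of two moments) with made-up values of M and V, and compares the result across several values of ε against the exact answer. None of these functions are part of rcr.r.

```r
# Delta-method standard error with a forward-difference gradient.
# h, M, V and the grid of eps values are hypothetical illustrations.
h <- function(M) M[1] / M[2]               # example parameter: ratio of two moments
M <- c(2, 4)                               # estimated moments
V <- matrix(c(0.04, 0.01, 0.01, 0.09), 2)  # estimated covariance matrix of M

# Forward-difference approximation to the gradient of h at M
numgrad <- function(h, M, eps) {
  h0 <- h(M)
  sapply(seq_along(M), function(j) {
    Mj <- M
    Mj[j] <- Mj[j] + eps   # perturb one element at a time
    (h(Mj) - h0) / eps
  })
}

# Delta-method standard error: sqrt(gradient' V gradient)
delta_se <- function(h, M, V, eps) {
  g <- numgrad(h, M, eps)
  sqrt(drop(t(g) %*% V %*% g))
}

# Exact answer from the analytical gradient of M1 / M2
exact_grad <- c(1 / M[2], -M[1] / M[2]^2)
exact_se   <- sqrt(drop(t(exact_grad) %*% V %*% exact_grad))

# Compare the numerical standard error across several step sizes
eps_grid  <- c(1e-4, 1e-6, 1e-8, 1e-10, 1e-12)
se_by_eps <- sapply(eps_grid, function(e) delta_se(h, M, V, e))
print(cbind(eps = eps_grid, se = se_by_eps, abs_error = abs(se_by_eps - exact_se)))
```

The abs_error column shows how far each numerical standard error is from the exact one; for a real application the exact answer is unknown, so the practical check is whether the estimates are stable across neighboring values of ε.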
The current value the program uses for ε is stored in bkoptions$eps, and its default value is $10^{-10}$. It is a good idea to try out somewhat smaller and somewhat larger values to see if the standard error estimates are sensitive to the choice of value.

5 Version History

Version 0.9 (6/29/2008)

References

Imbens, Guido and Charles F. Manski, 2004. Confidence intervals for partially identified parameters. Econometrica 72: 1845-1857.

Krauth, Brian, 2008. Bounding a linear causal effect using relative correlation restrictions. Working paper, Simon Fraser University. Available online at http://www.sfu.ca/~bkrauth/papers/rcr.pdf.

Stoye, Jörg, 2008. More on confidence intervals for partially identified parameters. CEMMAP Working Paper. Available online at http://cemmap.ifs.org.uk/wps/cwp1108.pdf.