Causal Inference - Michigan State University

Introduction to Causal Inference
Kenneth A. Frank
CSTAT 2-4-2011
Overview
• Alternative Causal Mechanisms and the Counterfactual
• Approximations to the Counterfactual
• How Regression works: Explained Variance in Regression
• Concern over Missing Confound (Internal Validity)
• Consider Alternate Sample (External Validity)
• Defining Absorption
• Analyzing Pre/post-test designs ANCOVA: Analysis of Cova...
• Schools as Fixed or Random
• Statistical power in multilevels
• Differential Treatment Effects and Heckman’s Rationality
• References on Causal Inference
My Take
• Sociological
• Motivated by studies of social context
– People select themselves into contexts
– Cannot randomize
– Each context is different (effects across contexts?)
• Regression based
– Control for confounds
– Explore interactions
• Sensitivity/robustness
– What would it take to invalidate an inference?
Methods Covered
• Counterfactual (2 potential outcomes)
• Statistical control via regression/general linear model
– Random and fixed effects
• Robustness of inference
– for impact of a confounding variable (internal validity)
– for representativeness of sample (external validity)
– Robustness indices a form of sensitivity analysis
• Absorption
– Randomization
– Instrumental variables
– Pre-test
• Differential treatment effects
– Treatment effect for treated/for control
• Propensity scores
– Attention to assignment mechanism
  • Logistic regression
– Using propensity scores in analysis
  • Weighting
  • Control
  • Strata
  • Matching
Example: The effect of National Board Certification on the
help a teacher provides others (Frank et al)
What is National Board Certification?
The National Board (a private organization) offers a certification process for primary and
secondary teachers. The process takes approximately 1 year and involves considerable
reflection and documentation of practice. Emphasis on progressive approach to teaching
and engagement in professional leadership.
The fifth core proposition of the NBPTS states that accomplished teaching reaches outside of
the individual classroom and involves collaboration with other teachers, parents,
administrators, and others (National Board for Professional Teaching Standards, 1989)
Descriptive
Q: Do National Board certified teachers (NBCTs) provide more help to others in their schools than
non-NBCTs?
A: Yes, the average NBCT is nominated by about 1.6 others as providing help with instruction, in
contrast to about .95 for a non-NBCT.
Causal Inference
Q: Does National Board certification affect the amount of help a teacher provides?
Frank, K.A., Gary Sykes, Dorothea Anagnostopoulos, Marisa Cannata, Linda Chard, Ann Krause, Raven McCrory. Extended Influence: National Board Certified Teachers as
Help Providers. Submitted to Education, Evaluation, and Policy Analysis
Policy Implications
• Board has emphasized helpfulness as one of its goals
• Other practices of BCTs may disseminate throughout
the school
• Key goal of organizational literature has been to cultivate
more “social capital” and sense of community, where
teachers help each other more → better student
outcomes.
• Amount of help teachers receive affects implementation
of innovations (Frank, Zhao and Borman 2004; Zhao and
Frank 2003)
http://www.msu.edu/~kenfrank/research.htm#social
→ Incentives for more teachers within existing BCT-oriented
schools to become BCTs
→ Incentives for schools and districts with few or no BCTs
to engage BCTs
Correlation Does Not Equal
Causation
• Estimated effect could be attributed to
unmeasured covariate → alternative
causal mechanism
• Example
Y = amount of help a teacher provides to others
s = whether or not a teacher became National
Board Certified
cv = confounding variable (e.g., inclination to be
helpful) representing alternative causal
mechanism
The Impact of a Confounding Variable on a
Regression Coefficient
[Figure: path diagram. Board Certified (s) → Number others helped (y), with the path labeled t(β1). Inclination to be Helpful (confounding variable, cv) is linked to s by r_s·cv and to y by r_y·cv; its impact is r_s·cv × r_y·cv.]
Alternative Causal Mechanisms
and the Counterfactual
1) I have a headache
2) I take an aspirin (treatment)
3) My headache goes away (outcome)
Q) Is it because I took the aspirin?
A) We’ll never know – it is counterfactual – for the
individual
This is the Fundamental Problem of Causal
Inference
Treatment Effect and Missing data
for the Counterfactual
[Figure: Assignment by Potential Outcome]
Counterfactual and Philosophers:
Hume
• spatial/temporal contiguity:
– Cause and measurement of effect apply to
single unit
• Temporal succession
– Effect assessed after treatment is applied
• Constant conjunction
– If effect is constant
• Missing: effect of one cause is relative to
effects of others
Mill
• Liked the experimental paradigm
• Concomitant variation:
– Correlational smoke → causational fire (I agree,
more later)
• Method of Difference: Yit – Yic
• Method of Residues Yab – Ya
• Method of Agreement Yit – Yic=0 implies null
effect,
– compare observed effect against null effect
• Limitation: anything can be a cause
Suppes
• Prima facie cause
– Correlation
• Genuine cause
– No confounding variables
• Liked the experimental paradigm
• Limitation: must explain full cause of
effect, rather than small effect of particular
cause
Lewis
• Named the counterfactual
• “If A were the case, C would be the case” is
true in the actual world if and only if (i)
there are no possible A-worlds; or (ii)
some A-world where C holds is closer to
the actual world than is any A-world where
C does not hold.
http://plato.stanford.edu/entries/causation-counterfactual/
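Basic Model for the Counterfactual
Outcome under treatment: $Y_i^t = 9 = 2 + 4 + 3$
Outcome under control: $Y_i^c = 5 = 2 + 3$
Effect: $Y_i^t - Y_i^c = [2 + 4 + 3] - [2 + 3] = (2 - 2) + (4 - 0) + (3 - 3) = 4$
General form: $Y_i = 2 + s \times 4 + 3$, with $s = 1$ or $0$
$Y_i^t = 9 = 2 + (1) \times 4 + 3$
$Y_i^c = 5 = 2 + (0) \times 4 + 3$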
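The arithmetic of the basic model can be written as a tiny function. This is only an illustrative sketch: the names `BASELINE`, `EFFECT`, `OTHER`, and `outcome` are made up here, and the numbers are the toy values from the slide. It shows that the same unit has two potential outcomes, of which only one can ever be observed.

```python
# Toy potential-outcomes model: outcome = baseline + s * effect + other.
# All names are hypothetical; the numbers are the slide's toy values.
BASELINE = 2   # contribution shared by both conditions
EFFECT = 4     # contribution of the treatment
OTHER = 3      # remaining contribution, also shared by both conditions


def outcome(s: int) -> int:
    """Outcome when the treatment indicator s is 1 (treated) or 0 (control)."""
    return BASELINE + s * EFFECT + OTHER


y_treated = outcome(1)            # potential outcome under treatment
y_control = outcome(0)            # potential outcome under control
effect = y_treated - y_control    # individual-level treatment effect
print(y_treated, y_control, effect)
```

For any real unit only one of `y_treated` and `y_control` is observed, which is why the difference has to be approximated rather than computed directly.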
Reflection
• What part is most confusing to you?
– Why?
– More than one interpretation?
• Talk with one other, share
• Find a new partner and share problems and
solutions
Approximations to the
Counterfactual
• Compare repetitions within person (observe teachers
before and after certification)
• Randomly assign people to become certified or not
(Fisher/Rosenbaum)
– Randomization (with large enough n) ensures that there will be
no baseline differences between those assigned to treatment
and those assigned to control
• Regression (assuming all relevant confounds have been
measured)
• Each attempts to approximate the counterfactual by
ensuring no relationship between confound and
assignment to treatment condition ($r_{x \cdot cv} = 0 \rightarrow r_{x \cdot cv} \times r_{y \cdot cv} = 0$)
Randomization often not possible,
especially for social contexts
• Logistics
– Getting people to agree
• Independence
– People within social contexts (e.g., schools) are
dependent → randomize at level of context (the school)
→ $$$$$$$
• Ethics
– Assigning adolescents to friendship groups?!
• Timing: the longer the treatment intervention, the more
likely to violate assumption that control group represents
forecast for treatment group
• Exposure to confounding with small n
Rubin’s (1974) response
• Was causal inference impossible prior to
randomized experiments (circa 1930)?
• Make maximum use of data
• Approximate counterfactual
– Statistical control
– propensity score matching – match those who
received treatment with similar others but who
received control (like “twins”).
Employ Statistical Control for
Confound
SPSS Syntax for reading in toy
counterfactual data
DATA LIST FREE / y confound s .
Begin DATA .
9 6 1
10 7 1
11 8 1
5 3 0
6 4 0
7 5 0
End DATA .
Counterfactual Predicted Values from
Regression: Effect isn’t 4, it’s 1!
Regression Without Control: wrong
answer: Estimate of 4
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT y
/METHOD=ENTER s .
Regression with Control: Right
answer, Estimate of 1
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT y
/METHOD=ENTER s confound .
$y = \beta_0 + \beta_1 s + \beta_2\,\text{confound}$
Counterfactual prediction for the first treated case (confound = 6) had it been in the control condition (s = 0):
$\hat{y}_1^c = 2 + 1 \times 0 + 1 \times 6 = 8$
The observed outcome for that case is 9, so the estimated effect is 9 - 8 = 1.
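The two regressions on the toy data can be checked without SPSS. The sketch below is an assumption-laden stand-in, not the deck's procedure: it uses plain Python and obtains the controlled coefficient by residualizing both `y` and `s` on `confound` and regressing residual on residual, which gives the same coefficient on `s` as the multiple regression. The helper names `mean`, `slope`, and `residuals` are made up for this example.

```python
def mean(v):
    return sum(v) / len(v)


def slope(x, y):
    """OLS slope of y on x (model includes an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing it on x (with an intercept)."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Toy counterfactual data from the DATA LIST block: y, confound, s.
y = [9, 10, 11, 5, 6, 7]
confound = [6, 7, 8, 3, 4, 5]
s = [1, 1, 1, 0, 0, 0]

# Without control: slope of y on s alone.
naive = slope(s, y)

# With control: partial both y and s for the confound, then regress
# the y-residuals on the s-residuals.
controlled = slope(residuals(confound, s), residuals(confound, y))

print(round(naive, 6), round(controlled, 6))
```

The uncontrolled slope is the raw difference in group means; the controlled slope is what remains once the part of `s` and `y` that moves with `confound` is removed.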
Keys to Statistical Control
• Need to know and measure relevant covariates
(identically independently distributed errors)
– Omitted confound → dependencies among units that
have similar values on the confound (e.g., teachers
who are similarly inclined to help)
• Assumes optimal control for covariate is linear
function of X’s
• Assumes constant treatment effect
How Regression works:
Explained Variance in Regression
[Figure: Venn diagram; circles represent the variances of Y, X1, and X2.]
X1 and X2 explain different parts of Y
X1 and X2 are independent (uncorrelated)
But usually there is multicollinearity
(or the need for statistical control)
‘competition’ between the variables (in explaining Y)!
[Figure: Venn diagram in which the X1 and X2 circles overlap each other as well as Y.]
The degree of competition depends on the amount of
correlation (overlap) between the ‘independent’ (!) variables
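[Figure: Venn diagram of Y, X1, and X2. Within Y: a = part shared only with X1, b = part shared only with X2, c = part shared with both X1 and X2, e = part shared with neither.]
$r^2_{YX_1} = a + c$
$r^2_{YX_2} = b + c$
$R^2_{Y.X_1X_2} = a + b + c$
$pr^2_{YX_1} = \frac{a}{a + e} = \frac{R^2_{Y.X_1X_2} - r^2_{YX_2}}{1 - r^2_{YX_2}}$
$pr^2_{YX_2} = \frac{b}{b + e} = \frac{R^2_{Y.X_1X_2} - r^2_{YX_1}}{1 - r^2_{YX_1}}$
$sr^2_{YX_1} = a = R^2_{Y.X_1X_2} - r^2_{YX_2}$
$sr^2_{YX_2} = b = R^2_{Y.X_1X_2} - r^2_{YX_1}$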
Focus on Overlap and alternative
explanations
Data
• 47 schools (in 2 states)
• 1583 teachers
• Case studies in 4 schools
• Surveys:
– background
– attitudes towards leadership and bct
– sociometric: teachers were asked to list others who helped with
instruction
Syntax for Descriptives
GET
FILE='C:\Documents and Settings\kenfrank\My Documents\MyFiles\sykes\workshop.sav'.
DESCRIPTIVES
VARIABLES=bct leave female glevel owned yrstch nograde attracth expanseh bcttreat
leader leadna white
/STATISTICS=MEAN STDDEV .
Table 1: Measures and Descriptive statistics (n=1363)
Variable                                                  Mean   Std Dev
Number other teachers helped by respondent (attracth)      .96     1.08
number other teachers who helped respondent (expanseh)     .91      .77
Board certified teacher, 1=Yes, 0 = No (BCT)               .13      .34
White (white)                                              .84      .37
Female (female)                                            .93      .25
highest grade level taught (glevel)                       8.32     4.13
no grade level indicated (nograde)                         .04      .19
level of own education (owned)                            3.01     1.02
years teaching (yrstch)                                  16.12     8.64
Intention to leave (leave)                                1.72      .72
perceived advantage of certification (bcttreat)           1.95      .55
enhancement through leadership (leader)                   2.35     1.20
missing on enhancement of teaching (leadna)                .17      .37
number certified others in school (nbct)                  2.31     2.44
number certified others in school squared (nbctsq)        6.42    11.69
Note: n is approximately 1208 for one of the measures; the extracted table does not show which.
Descriptives Separately for BCT
and non-BCT
GET
FILE='C:\Documents and Settings\kenfrank\My Documents\MyFiles\sykes\workshop.sav'.
SORT CASES BY bct .
SPLIT FILE
LAYERED BY bct .
DESCRIPTIVES
VARIABLES=leave female glevel owned yrstch nograde attracth expanseh white bcttreat
leader leadna nbct
/STATISTICS=MEAN STDDEV .
Try it, what do you get?
Recall regression model with
statistical control for a confound
$y = \beta_0 + \beta_1 s + \beta_2\,\text{confound}$
$\text{Help Provided} = \beta_0 + \beta_1\,\text{Board Certification} + \beta_2\,\text{Leadership}$
Partialled and unpartialled (zero order) correlations
Unpartialled (zero-order, or total) shared variance between help provided (y)
and board certification (x) is .176² = .031
Shared variance between help provided (y) and board certification (x),
partialled for enhancement of teaching through leadership, is
.167² = .028
The difference between unpartialled and partialled is the variance shared by board
certification (x) and help provided (y) that is also accounted for by
enhancement of teaching through leadership (confound):
.031 - .028 = .003
How Regression Works: Overlapping Variances
[Figure, first panel: overlapping circles for Help provided and Board Certification. Variance shared between help provided and board certification = .176² = .031]
[Figure, second panel: overlapping circles for Help provided, Board Certification, and Enhancement Through leadership. Variance shared between help provided and board certification, partialling for enhancement through leadership, = .167² = .028]
How Regression Works:
Partial and Semi-Partial correlation
Partial Correlation: correlation between s and y,
where s and y have been controlled for the confounding variable
$$r_{sy|cv} = \frac{r_{sy} - r_{s \cdot cv}\, r_{y \cdot cv}}{\sqrt{1 - r^2_{y \cdot cv}}\,\sqrt{1 - r^2_{s \cdot cv}}} = \frac{.176 - .072 \times .170}{\sqrt{1 - .170^2}\,\sqrt{1 - .072^2}} = .167$$
Semi-Partial Correlation: correlation between s and y,
where s has been controlled for the confounding variable
$$sr_s = \frac{r_{sy} - r_{s \cdot cv}\, r_{y \cdot cv}}{\sqrt{1 - r^2_{s \cdot cv}}} = \frac{.176 - .072 \times .170}{\sqrt{1 - .072^2}} = .164$$
Regression and Correlation Coefficient
$$\hat{\beta}_1 = r_{sy}\,\frac{sd(y)}{sd(s)} = .176 \times \frac{1.077}{.341} = .557$$
The t-ratios for the regression coefficient and for the correlation are identical
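The partial correlation, the semi-partial correlation, and the conversion from a correlation to a regression slope can be reproduced from the reported correlations. The sketch below uses hypothetical function names (`partial_corr`, `semipartial_corr`, `slope_from_corr`) and plugs in the rounded values shown on the slides, so small differences from the slide values can arise from rounding of the inputs.

```python
from math import sqrt


def partial_corr(r_sy, r_scv, r_ycv):
    """Correlation of s and y with cv partialled from both."""
    return (r_sy - r_scv * r_ycv) / (sqrt(1 - r_ycv ** 2) * sqrt(1 - r_scv ** 2))


def semipartial_corr(r_sy, r_scv, r_ycv):
    """Correlation of s and y with cv partialled from s only."""
    return (r_sy - r_scv * r_ycv) / sqrt(1 - r_scv ** 2)


def slope_from_corr(r, sd_y, sd_s):
    """Regression slope of y on s implied by their correlation."""
    return r * sd_y / sd_s


# Rounded values as reported on the slides.
r_sy, r_scv, r_ycv = 0.176, 0.072, 0.170

partial = partial_corr(r_sy, r_scv, r_ycv)
semipartial = semipartial_corr(r_sy, r_scv, r_ycv)
beta_1 = slope_from_corr(r_sy, sd_y=1.077, sd_s=0.341)

print(round(partial, 3), round(semipartial, 3), round(beta_1, 3))
```

The only difference between the two correlations is the denominator: the partial correlation rescales by the unexplained variance in both variables, the semi-partial by the unexplained variance in s alone.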
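Regression of Help Provided on Board Certification
Controlling for Enhancement of Teaching through
Leadership
$$\hat{\beta}_{1|c} = r_{sy|c}\,\frac{sd(y|c)}{sd(s|c)} = .167 \times \frac{1.075}{.336} = .534$$
where sd(y|c) is the standard deviation of the residuals from the model $y = \beta_0 + \beta_1 c$, sd(s|c) is the standard deviation of the residuals from the model $s = \beta_0 + \beta_1 c$, and c is enhancement of teaching through leadership.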
How Regression Works:
Impact of Enhancement of Teaching Through Leadership on
Correlation Between Board Certification and Help Provided
[Figure: path diagram. S (Board Certification) → Y (Help Provided), with zero-order r_sy = .176 (≈ .18) and r_sy|cv = .167. CV (Enhancement of teaching through leadership) is linked to S and to Y; the figure labels these paths r_s·cv = .17 and r_y·cv = .07, with impact r_s·cv × r_y·cv.]
Calculating Impacts:
Correlations Between BCT, Amount of Help Provided, and Covariates
Impacts of Covariates on
Correlation between BCT and Help
Provided
Component Correlations
How Regression Works:
Exercise
• Calculate the correlation between board
certification and help provided
– Unpartialed
– Partialed (for something other than
leadership)
• (see basic calculations, sheet 1).
https://www.msu.edu/~kenfrank/research.htm#causal
• Do the same for an example in a data set you
have
Exercise: Find Impacts of measured Covariates on
Correlation between BCT and Help Provided
Use data file “Board Certified Teachers”
GET
FILE='C:\Documents and Settings\kenfrank\My Documents\MyFiles\COURSES\causal '+
'inference\groningen\data\spass_data\workshop.sav'.
DATASET NAME DataSet6 WINDOW=FRONT.
CORRELATIONS
/VARIABLES=attracth bct expanseh white female leave glevel nograde owned yrstch leader nbct
nbctsq bcttreat leadna
/PRINT=TWOTAIL NOSIG
/STATISTICS DESCRIPTIVES
/matrix=out(forimp)
/MISSING=PAIRWISE .
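GET
FILE='forimp'.
* Number the row types (t) and variable names (n) so rows can be selected by number.
AUTORECODE
VARIABLES=ROWTYPE_ varname_ /INTO t n
/PRINT.
FILTER OFF.
USE ALL.
* Intended to keep only the correlation rows for the covariates.
SELECT IF(t = 1 and n>=4).
EXECUTE .
* Impact of each covariate = r(covariate, attracth) * r(covariate, bct).
COMPUTE impact = attracth * bct .
EXECUTE .
* Largest impacts first.
SORT CASES BY impact (D) .
SAVE
OUTFILE='impact'
/keep rowtype_ varname_ attracth bct impact
/COMPRESSED.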
Reminder: Motivation: If you don’t argue
scientifically, those you disagree with
will, and your views will not be heard
Concern over Missing Confound
(Internal Validity)
• Causal Inference concern: How much of the
estimate of the Board Certification effect would
have to be attributed to other factors to
invalidate the causal inference?
– Maybe NBCTs help more because they had a
previous inclination to help?
• We may never know, but we can quantify the
concern
– What would the impact of a confound (e.g., inclination
to help) have to be to alter our inference? (Frank,
2000)
Full Regression of Help Provided Others on Board
Certification and Covariates
UNIANOVA
attracth BY school WITH bct leave female glevel owned yrstch nograde expanseh
leader white nbct nbctsq bcttreat leadna
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/PRINT = PARAMETER
/CRITERIA = ALPHA(.05)
/DESIGN = bct leave female glevel owned yrstch nograde expanseh leader white nbct
nbctsq bcttreat leadna school .
Impact of an Unmeasured Confounding
Variable on Inference of Effect of Board
Certification on Help Provided
[Figure: path diagram. Board Certified (s) → Number others helped (y), with the path labeled t(β1). Inclination to be Helpful (confounding variable, cv) is linked to s by r_s·cv and to y by r_y·cv; its impact is r_s·cv × r_y·cv.]
What must be the Impact of an
Unmeasured Confounding Variable to
Invalidate the Inference?
Step 1: Establish Correlation Between BCT
and Help Provided, partialling for all
covariates
Step 2: Define a Threshold for Inference
Step 3: Calculate the Threshold for the
Impact Necessary to Invalidate the
Inference
Step 4: Multivariate Extension, with other
Covariates
Step 1: Establish Correlation
Between BCT and Help Provided,
partialling for all covariates
$$r = \frac{t}{\sqrt{(n - q - 1) + t^2}} = \frac{6.79}{\sqrt{1156 + 6.79^2}} = .196$$
t taken from regression, = 6.79
n is the sample size
q is the number of parameters estimated
n - q - 1 = 1156
Step 2: Define a Threshold for
Inference
• Define r# as the value of r that is just
statistically significant:
$$r^{\#} = \frac{t_{critical}}{\sqrt{(n - q - 1) + t^2_{critical}}} = \frac{1.96}{\sqrt{1156 + 1.96^2}} = .058$$
n is the sample size
q is the number of parameters estimated
t_critical is the critical value of the t-distribution for making an inference
r# can also be defined in terms of effect sizes
Step 3: Calculate the Threshold for the
Impact Necessary to Invalidate the Inference
Define the impact: $k = r_{x \cdot cv} \times r_{y \cdot cv}$ and assume $r_{x \cdot cv} = r_{y \cdot cv}$ (which
maximizes the impact of the confounding variable).
$$r_{xy|cv} = \frac{r_{xy} - r_{x \cdot cv}\, r_{y \cdot cv}}{\sqrt{1 - r^2_{y \cdot cv}}\,\sqrt{1 - r^2_{x \cdot cv}}} = \frac{r_{xy} - k}{1 - k}$$
Set $r_{xy|cv} = r^{\#}$ and solve for k to find the threshold for the impact
of a confounding variable (TICV).
$$TICV = \frac{r_{xy} - r^{\#}}{1 - |r^{\#}|} = \frac{.196 - .058}{1 - .058} = .147$$
impact of an unmeasured confound > .147 → inference invalid
impact of an unmeasured confound < .147 → inference valid.
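Steps 1 to 3 can be chained into a few lines of code. The sketch below assumes a positive observed effect and a two-sided critical value of 1.96; the function names `r_from_t` and `impact_threshold` are invented for this illustration and are not taken from the spreadsheet referenced in the deck.

```python
from math import sqrt


def r_from_t(t, df):
    """Correlation implied by a t-ratio, with df = n - q - 1."""
    return t / sqrt(df + t ** 2)


def impact_threshold(t_obs, df, t_crit=1.96):
    """Threshold for the impact of a confounding variable (TICV).

    Assumes a positive observed effect and equal component correlations.
    """
    r_xy = r_from_t(t_obs, df)        # Step 1: observed (partial) correlation
    r_sharp = r_from_t(t_crit, df)    # Step 2: just-significant correlation r#
    return (r_xy - r_sharp) / (1 - abs(r_sharp))   # Step 3


# Values from the slides: t = 6.79, n - q - 1 = 1156.
r_xy = r_from_t(6.79, 1156)
r_sharp = r_from_t(1.96, 1156)
ticv = impact_threshold(6.79, 1156)
component = sqrt(ticv)   # each component correlation if r_x.cv = r_y.cv

print(round(r_xy, 3), round(r_sharp, 3), round(ticv, 3), round(component, 2))
```

Working from the unrounded intermediate values, as here, avoids the small discrepancies that appear when the rounded .196 and .058 are reused by hand.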
Calculations made easy!
• http://www.msu.edu/~kenfrank/papers/calculating%20indices%203.xls
Live Example
N-q=1131-18=1113.
T=.603/.092=6.56
Impact Threshold=.142
Component correlations = .38
Frank, K.A., Gary Sykes, Dorothea Anagnostopoulos, Marisa Cannata, Linda Chard, Ann Krause, Raven McCrory. 2008. Extended
Influence: National Board Certified Teachers as Help Providers. Education, Evaluation, and Policy Analysis. Vol 30(1): 3-30.
Exercise 3: Impact Threshold Exercise
1)Identify a statistical inference from an article you
are interested in.
2) Describe possible confounds/alternative
explanations that could bias the estimate
3) Note the t-ratio and sample size
4) Calculate robustness of inference using
http://www.msu.edu/~kenfrank/papers/calculating%20indices%203.xls
5) Explain your inference and how robust you think
it is. Why could your inference be challenged?
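Step 4: Multivariate Extension, with
Covariates
$k = r_{x \cdot cv|z} \times r_{y \cdot cv|z}$
Maximizing the impact with covariates z in the model implies
$$TICV = \sqrt{(1 - r^2_{x \cdot z})(1 - r^2_{y \cdot z})}\;\frac{r_{xy|z} - r^{\#}}{1 - |r^{\#}|} = .125$$
and
$$r^2_{y \cdot cv} = TICV\,\sqrt{\frac{1 - r^2_{y \cdot z}}{1 - r^2_{x \cdot z}}}, \qquad r^2_{x \cdot cv} = TICV\,\sqrt{\frac{1 - r^2_{x \cdot z}}{1 - r^2_{y \cdot z}}}$$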
SPSS Syntax for Obtaining Multivariate
Impact Threshold
GET
FILE='C:\Documents and Settings\kenfrank\My Documents\MyFiles\sykes\workshop.sav'.
UNIANOVA
attracth BY school WITH leave female glevel owned yrstch nograde expanseh
bcttreat leader leadna white nbct nbctsq
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/PRINT =ETASQ PARAMETER
/CRITERIA = ALPHA(.05)
/DESIGN = leave female glevel owned yrstch nograde expanseh bcttreat leader leadna
white nbct nbctsq school .
UNIANOVA
bct BY school WITH leave female glevel owned yrstch nograde expanseh bcttreat
leader leadna white nbct nbctsq
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/PRINT = ETASQ PARAMETER
/CRITERIA = ALPHA(.05)
/DESIGN = leave female glevel owned yrstch nograde expanseh bcttreat leader leadna
white nbct nbctsq school .
Obtaining R2
Multivariate Calculations
• http://www.msu.edu/~kenfrank/papers/calculating%20indices%203.xls
What must be the Impact of an Unmeasured
Confound to Invalidate the Inference?
If k > .125 (or .147 without covariates) then the inference is
invalid
If r_x·cv = r_y·cv, then each would have to be greater than k^(1/2)
= .38 to alter the inference.
(multivariate correction, r_y·cv > .38 and r_x·cv > .34)
Furthermore, correlations must be partialled for covariates
z.
Impact of strongest measured covariate (perception
leadership will enhance teaching) is .012;
Impact of unmeasured confound would have to be ten
times greater than the impact of the strongest observed
covariate to invalidate the inference. Hmmm….
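The multivariate extension in Step 4 can be sketched the same way. This example reads the Step 4 formulas with square roots, that is, TICV = sqrt((1 - R²x·z)(1 - R²y·z)) × (r_xy|z - r#)/(1 - |r#|), which is an assumption about radicals that do not survive in the extracted slide. The R² values passed in below are hypothetical placeholders, not the values from the study; in practice they would come from the two UNIANOVA runs (attracth on z, and bct on z). The function name `multivariate_threshold` is also made up here.

```python
from math import sqrt


def multivariate_threshold(r_xy_z, r_sharp, r2_xz, r2_yz):
    """Impact threshold with covariates z, plus implied component correlations.

    r2_xz, r2_yz: R-squared from regressing x and y on the covariates z.
    Returns (ticv, r_x_cv, r_y_cv).
    """
    k_conditional = (r_xy_z - r_sharp) / (1 - abs(r_sharp))
    ticv = sqrt((1 - r2_xz) * (1 - r2_yz)) * k_conditional
    r_y_cv = sqrt(ticv * sqrt((1 - r2_yz) / (1 - r2_xz)))
    r_x_cv = sqrt(ticv * sqrt((1 - r2_xz) / (1 - r2_yz)))
    return ticv, r_x_cv, r_y_cv


# HYPOTHETICAL R-squared values, for illustration only.
ticv, r_x_cv, r_y_cv = multivariate_threshold(
    r_xy_z=0.196, r_sharp=0.058, r2_xz=0.20, r2_yz=0.10
)
print(round(ticv, 3), round(r_x_cv, 2), round(r_y_cv, 2))

# With no covariates (both R-squared equal to 0) the result reduces to
# the univariate threshold from Step 3.
univariate, _, _ = multivariate_threshold(0.196, 0.058, 0.0, 0.0)
print(round(univariate, 3))
```

Under this reading the two component correlations multiply back to the threshold, which is a quick consistency check on any set of inputs.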
Applications of Impact Threshold
• Frank, K.A., Gary Sykes, Dorothea Anagnostopoulos, Marisa Cannata, Linda Chard, Ann Krause,
Raven McCrory. 2008. Extended Influence: National Board Certified Teachers as Help
Providers. Education, Evaluation, and Policy Analysis. Vol 30(1): 3-30.
• Frisco, Michelle, Muller, C. and Frank, K.A. 2007. Using propensity scores to study changing
family structure and academic achievement. Journal of Marriage and Family. Vol 69(3): 721–741
• *Frank, K. A. and Min, K. 2007. Indices of Robustness for Sample Representation. Sociological
Methodology. Vol 37, 349-392. * co first authors.
• Frank, K. 2000. "Impact of a Confounding Variable on the Inference of a Regression Coefficient."
Sociological Methods and Research, 29(2), 147-194
• Crosnoe, Robert and Carey E. Cooper. 2010. “Economically Disadvantaged Children’s Transitions
into Elementary School: Linking Family Processes, School Contexts, and Educational Policy.”
American Educational Research Journal 47: 258-291.
• Crosnoe, Robert. 2009. “Low-Income Students and the Socioeconomic Composition of Public
High Schools.” American Sociological Review 74: 709-730.
• Maroulis, S. & Gomez, L. (2008). “Does ‘Connectedness’ Matter? Evidence from a Social
Network Analysis within a Small School Reform.” Teachers College Record, Vol. 110, Issue
9.
• Cheng, Simon, Regina E. Werum, and Leslie Martin. 2007. “Adult Social Capital: How Family
and Community Ties Shape Track Placement of Ethnic Groups in Germany.” American
Journal of Education 114: 41-74.
• Carbonaro, William and Elizabeth Covay. School Sector and Student Achievement in the Era of
Standards Based Reforms. Sociology of Education vol. 83 no. 2: 160-182.
see also
• Pan, W., and Frank, K.A. (2004). "An Approximation to the Distribution of the Product of Two
Dependent Correlation Coefficients." Journal of Statistical Computation and Simulation, 74, 419-443
• Pan, W., and Frank, K.A., 2004. "A probability index of the robustness of a causal inference,"
Journal of Educational and Behavioral Statistics, 28, 315-337.
Consider Alternate Sample
(External Validity)
Causal Inference concern:
We cannot assert cause if the effect of Board
Certification is not constant across contexts.
Statistical Translation:
Would the inference be valid if the sample
included more of some population (e.g. teachers
in other states) for which the effect was not as
strong?
Rephrased for robustness: what must be the
conditions in the alternative sample to invalidate
the inference?
Define π as the proportion of the sample that
is replaced with an alternate sample.
$r_{unobserved}$ is correlation in unobserved data
$R_{xy}$ is combined correlation for observed and unobserved data:
$$R_{xy} = (1 - \pi)\, r_{xy} + \pi\, r_{unobserved}$$
Thresholds for Sample
Replacement
Set $R_{xy} = r^{\#}$ and solve for $r_{unobserved}$:
If half the sample is replaced (π = .5), original
inference is invalid if $r_{unobserved} < 2r^{\#} - r_{xy}$
Therefore, $2r^{\#} - r_{xy}$ defines the threshold for replacement:
TR(π = .5)
If $r_{unobserved} = 0$, inference is altered if $\pi > 1 - r^{\#}/r_{xy}$.
Therefore $1 - r^{\#}/r_{xy}$ defines the threshold for
replacement: TR($r_{unobserved} = 0$)
Assumes means and variances are constant across samples, alternative calculations available.
Example of Thresholds for
Replacement
TR(π = .5) = 2r# - r_xy|z = 2(.058) - .196 = -.081.
Correlation between Board Certification and number
of others helped would have to be less than -.081
to alter inference if half the teachers in our sample
were replaced (e.g., with teachers from another
state).
TR(r_unobserved = 0) = 1 - r#/r_xy|z = 1 - (.058/.196) = .71
More than 70% of teachers would have to be
replaced with others for whom Board Certification
has no effect (r_unobserved = 0) to invalidate the inference
in a combined sample.
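Both replacement thresholds follow from the mixture formula R_xy = (1 - π) r_xy + π r_unobserved. The sketch below, with hypothetical function names, computes them from the t-ratios on the earlier slides (t = 6.79, n - q - 1 = 1156) rather than from the rounded correlations, and then checks that plugging each threshold back into the mixture lands exactly on r#.

```python
from math import sqrt


def r_from_t(t, df):
    """Correlation implied by a t-ratio, with df = n - q - 1."""
    return t / sqrt(df + t ** 2)


def combined_r(r_xy, r_unobserved, pi):
    """Correlation in a sample where a share pi has been replaced."""
    return (1 - pi) * r_xy + pi * r_unobserved


def tr_half_replaced(r_xy, r_sharp):
    """Largest r in the replacement half that still invalidates (pi = .5)."""
    return 2 * r_sharp - r_xy


def tr_null_replacement(r_xy, r_sharp):
    """Share of the sample to replace with r = 0 cases to invalidate."""
    return 1 - r_sharp / r_xy


r_xy = r_from_t(6.79, 1156)
r_sharp = r_from_t(1.96, 1156)

tr_half = tr_half_replaced(r_xy, r_sharp)
tr_null = tr_null_replacement(r_xy, r_sharp)
print(round(tr_half, 3), round(tr_null, 2))

# At either threshold the combined correlation equals r# exactly.
print(round(combined_r(r_xy, tr_half, 0.5), 6), round(r_sharp, 6))
print(round(combined_r(r_xy, 0.0, tr_null), 6), round(r_sharp, 6))
```

As on the slide, this assumes means and variances are constant across the observed and replacement samples.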
Calculations for Robustness of
Inference for External Validity
Basis of Comparison: Separate Effects for
observed subgroups
• White (n=981): .71; Non-white (n=176): .27
• Female (n=1080): .63; Male (n=77): -.50 !
• Compare -.504 with TR(π=.5) = -.081.
• The inference would be invalidated for populations consisting of more
male (elementary) teachers.
• Only 5 males who were bct:
GET
FILE='C:\Documents and Settings\kenfrank\My Documents\MyFiles\sykes\workshop.sav'.
CROSSTABS
/TABLES=bct BY female
/FORMAT= AVALUE TABLES
/CELLS= COUNT ROW COLUMN TOTAL
/COUNT ROUND CELL .
Generally, How Much Bias Must there be to
Invalidate the Inference?
Estimate = unbiased estimate + bias:
$r_{observed} = r_{unbiased} + M$
where r_unbiased is defined by E(r_unbiased) = relationship in population, or ρ
Inference invalid if r_unbiased < r#. So…
1) Set r_unbiased < r# and solve for M. Inference invalid if
M > (r_observed - r#).
2) As a proportion of initial estimate, inference invalid if
M / r_observed > 1 - r#/r_observed = TR(r_unobserved = 0) = .71
Interpretation: 71% of estimate must be attributable to bias to alter the inference
(same as % replacement if r_unobserved = 0)
3) Rule of thumb (for large n)
% bias needed to invalidate inference = 1 - t_critical/t_observed
Sykes et al, % bias needed to invalidate inference = 1 - 1.96/6.79 = .71
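The rule of thumb and the correlation-based version can be compared directly. This is a minimal sketch with made-up function names; the inputs are the t-ratio and degrees of freedom reported earlier in the deck (6.79 and 1156).

```python
from math import sqrt


def pct_bias_rule_of_thumb(t_obs, t_crit=1.96):
    """Large-n approximation: share of the estimate that must be bias."""
    return 1 - t_crit / t_obs


def pct_bias_from_r(t_obs, df, t_crit=1.96):
    """Same quantity computed as 1 - r#/r_observed."""
    r_obs = t_obs / sqrt(df + t_obs ** 2)
    r_sharp = t_crit / sqrt(df + t_crit ** 2)
    return 1 - r_sharp / r_obs


approx = pct_bias_rule_of_thumb(6.79)
exact = pct_bias_from_r(6.79, 1156)
print(round(approx, 2), round(exact, 2))
```

With df this large the two versions agree to two decimals, which is the sense in which the t-ratio shortcut is a rule of thumb for large n.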
Exercise: Robustness for Sample
Representativeness (external Validity)
1)Identify a statistical inference in your own work
or in the literature for which there is concern
about the external validity
2) Identify possible populations for which the effect
may not apply
3) Note the t-ratio and sample size
4) Calculate robustness of inference using
http://www.msu.edu/~kenfrank/papers/calculating%20indices%203.xls
5) Discuss with a new partner your inference and
how robust you think it is. Partner can
challenge. Then change roles.
Assumptions are the bridge between statistical and
causal inference
[Figure: Statistical Inference → Assumptions → Causal Inference]
Cornfield, J., & Tukey, J. W. (1956, Dec.), Average Values of Mean Squares in Factorials.
Annals of Mathematical Statistics, 27(No. 4), 907-949.
In Donald Rubin’s words
“Nothing is wrong with making
assumptions; on the contrary, such
assumptions are the strands that join the
field of statistics to scientific disciplines.
The quality of these assumptions and their
precise explication, not their existence, is
the issue”(Rubin, 2004, page 345).
Conclusions for Robustness
Indices
• Objections to moving from statistical to causal inference in terms of
violations of assumptions
– No unobserved confounding variables
– Treatment has same effect for all
• Robustness indices quantify how much assumptions must be
violated to invalidate inference.
• No new causal inferences!
– robustness indices merely quantify terms of debate regarding
causal inferences.
• Can be used with any threshold.
• Can be used (theoretically) for any t-ratio
– Discuss: Statistical inference as threshold?
• Extension of sensitivity – indices are a property of original estimate
Limitations
• Would like to do experiment
• Would like longitudinal data to control for
previous inclination to help
– (perhaps leverage this study to get a second
wave of data?)
• Don’t know if BCTs are more helpful or
merely perceived as such because of
symbolic status
• Nationally representative data?
Defining Absorption
• The impact of any given covariate can be
absorbed by controlling for other
covariates → the impact of covariate c on
the association between treatment x and
outcome y is reduced once controlling for
covariate a
$$\text{Absorb}(a, c, x, y) = 1 - \frac{\text{impact of } c \text{ on } x \text{ given } a}{\text{impact of } c \text{ on } x} = 1 - \frac{r_{cx|a}\, r_{cy|a}}{r_{cx}\, r_{cy}}$$
[Figure: path diagram. x → y (r_sy); the confound is linked to x (r_s·cv) and to y (r_y·cv), with impact r_s·cv × r_y·cv; covariate a overlaps both links. Green indicates absorbed impact.]
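Absorption can be computed from five zero-order correlations by partialling covariate a out of both component correlations. The sketch below uses hypothetical function names and hypothetical correlation values chosen only to illustrate the arithmetic; it reads the definition as 1 minus (impact of c given a) over (impact of c), which matches the `absorb=1-impact_post/impact` line in the deck's SPSS syntax.

```python
from math import sqrt


def partial_corr(r_12, r_13, r_23):
    """Correlation of variables 1 and 2 with variable 3 partialled out."""
    return (r_12 - r_13 * r_23) / sqrt((1 - r_13 ** 2) * (1 - r_23 ** 2))


def absorption(r_cx, r_cy, r_ax, r_ay, r_ac):
    """Share of covariate c's impact that is absorbed by controlling for a."""
    impact = r_cx * r_cy
    impact_given_a = (partial_corr(r_cx, r_ac, r_ax)
                      * partial_corr(r_cy, r_ac, r_ay))
    return 1 - impact_given_a / impact


# HYPOTHETICAL correlations, for illustration only.
absorbed = absorption(r_cx=0.3, r_cy=0.3, r_ax=0.4, r_ay=0.4, r_ac=0.5)
print(round(absorbed, 3))

# If a is unrelated to c, x, and y, nothing is absorbed.
nothing = absorption(r_cx=0.3, r_cy=0.3, r_ax=0.0, r_ay=0.0, r_ac=0.0)
print(round(nothing, 3))
```

A value near 1 means that, once a is in the model, there is little left for c to confound; a value near 0 means c still needs its own control.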
Syntax for calculating absorption
SUBTITLE "Impact Partialing Leader".
GET
FILE='workshop.sav'.
PARTIAL CORR
/VARIABLES= attracth bct expanseh white female leave glevel
nograde owned yrstch nbct nbctsq bcttreat leadna BY leader
/SIGNIFICANCE=TWOTAIL
/matrix=out(forimpa.sav)
/MISSING=LISTWISE .
GET
FILE='forimpa.sav'.
AUTORECODE
VARIABLES=ROWTYPE_ /INTO t
/PRINT.
FILTER OFF.
USE ALL.
SELECT IF(t=3).
EXECUTE .
COMPUTE attracth_post=attracth.
COMPUTE bct_post=bct.
COMPUTE impact_post=attracth_post * bct_post.
EXECUTE .
SAVE OUTFILE='impactaa.sav'
/keep ROWTYPE_ VARNAME_ attracth_post bct_post impact_post
/COMPRESSED.
GET
FILE='impactaa.sav'.
AUTORECODE
VARIABLES=VARNAME_ /INTO n
/PRINT.
FILTER OFF.
USE ALL.
SELECT IF(n>2).
EXECUTE .
SAVE OUTFILE='impacta.sav'
/keep ROWTYPE_ VARNAME_ attracth_post bct_post impact_post
/COMPRESSED.
GET
FILE='impact.sav'.
SORT CASES BY
VARNAME_ (A) .
SAVE OUTFILE='byn.sav'
/COMPRESSED.
GET
FILE='impacta.sav'.
SORT CASES BY
VARNAME_ (A) .
SAVE OUTFILE='byna.sav'
/COMPRESSED.
GET
FILE='byn.sav'.
MATCH FILES /FILE=*
/FILE='byna.sav'
/RENAME (ROWTYPE_ = d0)
/BY VARNAME_
/DROP= d0.
EXECUTE.
COMPUTE absorb=1-impact_post/impact .
EXECUTE .
SAVE OUTFILE='absorb.sav'
/keep ROWTYPE_ VARNAME_ absorb impact attracth bct
impact_post attracth_post bct_post
/COMPRESSED.
Extent to which Leader absorbs the impact
of other covariates on inference regarding
effect of BCT on help provided
Once leader is controlled, there is less need to control for intention to
leave or years teaching
Absorption Exercise
• Looking at the absorption and impact matrices, can you
guess what will happen when you add female to the
model? How about when you add number of others in
the school who are board certified (nbct)?
• Check using syntax:
GET
FILE='C:\Documents and Settings\kenfrank\My Documents\MyFiles\sykes\workshop.sav'.
UNIANOVA
bct BY school WITH leave female glevel owned yrstch nograde
expanseh bcttreat leader leadna white nbct nbctsq
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/PRINT = ETASQ PARAMETER
/CRITERIA = ALPHA(.05)
/DESIGN = leave female glevel owned yrstch nograde expanseh bcttreat leader leadna
white
nbct nbctsq school .
How a pre-test absorbs impact
Analyzing Pre/post-test designs
ANCOVA: Analysis of Covariance
• Research questions:
• pre- versus post interacting with treatment (Not recommended): Is
there a difference between pre and post scores, and does that
difference depend on whether or not the subject participated in the
treatment?
• ANCOVA: Controlling for the pre-test, did subjects who participated
in the treatment score higher on the post test than those in the
control?
– Did the effect of the treatment depend on the level of the pre-test -- did
the treatment work better for some than others
• Difference scores: Did the subject who participated in the treatment
learn more (or grow more) from pre-test to post-test than those in
the control?
Home
Models:
pre- versus post interacting
with treatment (Not recommended):
yi  ˆ0  ˆ1dposti  ˆ2dtreatment i  ˆ3dpost x dtreatment i  eˆi
Problem: observations are not independent –
each person is measured twice, pre and post. The effects of each
person who mutually effect error terms for the same person, and
thus be correlated:
ybob pre  ˆ0  ˆ1dpostbob pre  ˆ2dtreatment bob pre  ˆ3dpost x dtreatment bob pre  eˆbob pre
ybob post  ˆ0  ˆ1dpostbob post  ˆ2dtreatment bob post  ˆ3dpost x dtreatment bob post  eˆbob post
The errors for the two models in (2) will be dependent due to the
common effect of “bobness” on each error that has not been
accounted for.
ANCOVA:
• Controlling for the pre-test, did subjects who participated in the
treatment score higher on the post test than those in the control? Did
the effect of the treatment depend on the level of the pre-test, i.e., did
the treatment work better for some than others?
$\text{post achievement}_i = \hat{\beta}_0 + \hat{\beta}_1\,pre_i + \hat{\beta}_2\,dtreatment_i + \hat{\beta}_3\,(pre \times treatment)_i + \hat{e}_i$
Alternate Expression of model with factors (categorical variables),
covariates (continuous variables) and interactions
$\text{post achievement}_{ij} = \mu + \alpha_j + \hat{\beta}\,pre_{ij} + \gamma_j\,pre_{ij} + \hat{e}_{ij}$
Difference Scores
• Construct: $\Delta y_i = ypost_i - ypre_i$. This
measures the change from y-pre to y-post for
person i.
• Model
$\Delta y_i = \hat{\beta}_0 + \hat{\beta}_1\,dtreatment_i + \hat{e}_i$
Advantages: only one observation/person. Essentially modeling “growth.”
Disadvantage: cannot test for interaction effect.
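As a rough illustration of how the difference-score model and the ANCOVA model (without the interaction term) can give different treatment estimates, the sketch below fits both by least squares. The pre/post scores are made up for illustration and the helper `ols` is hypothetical; neither comes from the workshop data. It is written in Python only so that it is self-contained.

```python
def ols(y, X):
    """Least-squares coefficients from the normal equations (Gauss-Jordan).

    Each row of X must start with 1.0 for the intercept.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# Hypothetical data: 4 control cases followed by 4 treated cases
treat = [0, 0, 0, 0, 1, 1, 1, 1]
pre = [10, 12, 14, 16, 11, 13, 15, 17]
post = [12, 13, 16, 17, 15, 17, 19, 23]

# Difference score model: (post - pre) regressed on treatment
delta = [b - a for a, b in zip(pre, post)]
diff_effect = ols(delta, [[1.0, t] for t in treat])[1]

# ANCOVA: post regressed on pre and treatment
ancova_effect = ols(post, [[1.0, p, t] for p, t in zip(pre, treat)])[2]

print(round(diff_effect, 3), round(ancova_effect, 3))
```

The two estimates coincide only when the pooled within-group slope of post on pre equals 1; in this made-up data the slope is 1.1, so the estimates differ slightly.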
When to use Difference scores
versus ANCOVA
Allison argues for using difference scores when the pre-test is not considered a causal
predictor of either the treatment or the outcome.
A. Pre-test “causing” outcome: Stocks versus flows
The pre-test can “cause” the post-test when the outcome is like a “stock”: the
outcome has an inherent persistence over time, such as height, which typically
cannot decrease (Allison, page 107). In this case, use ANCOVA.
The pre-test is not considered “causal” for most measures of behavior and
attitude, which must be regenerated each time, like something that “flows” and can
therefore be cut off.
B. Pre-test “causing” treatment:
Examples (Allison, page 109):
(use Δ): All seniors in high school A are enrolled in the treatment, and the SAT is
administered before and after the treatment or control period. All students in High
school B serve as controls.
(use ANCOVA): The SAT is administered as a pretest to a group of high school
seniors. Those who score below 400 are enrolled in the treatment, and those who
score above 400 are in the control.
(use Δ): Seniors self-select into treatment & control before seeing the results of
a pre-test administration of the SAT.
(use ANCOVA): Seniors self-select into the program after seeing the results of
a pre-test administration of the SAT.
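Flow Chart for use of Difference Scores versus ANCOVA
[Flow chart. Decision node: “Pre-test cause treatment conditions?” with branches “yes” and “No”; a further node “If test has high reliability”; end points “Use ANCOVA” and “Difference score”.]
Difference Score: $\Delta y_i = \hat{\beta}_0 + \hat{\beta}_1\,dtreatment_i + \hat{e}_i$
ANCOVA: $\text{post achievement}_i = \hat{\beta}_0 + \hat{\beta}_1\,pre_i + \hat{\beta}_2\,dtreatment_i + \hat{\beta}_3\,(pre \times treatment)_i + \hat{e}_i$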
Absorption of Impact Via Randomly
Assigned Treatment
Green area goes to zero
How Random Assignment Absorbs Impact
[Path diagram. Nodes: Random assignment (s); Treatment (x); Number others helped (y); Inclination to be Helpful (confounding variable, cv). Path labels: r_xs, r_ys|x = 0, t(β̂1), r_xcv, r_ycv, and r_xcv × r_ycv.]
How Does Regression
Discontinuity Absorb Impact?
• Criteria for Assignment to treatment
conditions known with certainty
• Comparison of those who just exceeded
criteria with those who just missed criteria.
How Regression Discontinuity
absorbs impact
How Instrumental Variable Absorbs Impact
[Path diagram. Labels in the diagram: Instrumental Variable; Fidelity; Assumed; Treatment (x); Number others helped (y); Inclination to be Helpful (confounding variable, cv). Path labels: r_xs, r_ys|x = 0, t(β̂1), r_xcv, r_ycv, and r_xcv × r_ycv.]
Impact Thresholds and
Instrumental variables
• Can still do impact threshold.
• Define iv as the instrumental variable, cv as the
confound.
• Exclusion restriction: For any confounding
variable for which $r_{cv \cdot y} > 0$, $r_{iv \cdot cv}$ must equal 0.
In other words, $r_{cv \cdot y} \times r_{iv \cdot cv} = 0$.
• But what if this doesn’t hold? Inference
invalidated by $r_{cv \cdot y} \times r_{iv \cdot cv}$. This is the impact of
a confound.
• Can compare with existing relationships
between IV and other covariates.
Comment on Instrumental
Variables
• Exclusion restriction: an instrument that is related to treatment assignment but related to
the outcome only through treatment is difficult to satisfy
– Attempts: draft # (Angrist et al)
– Whether you’re Catholic or not for attending Catholic school
• A recent meta-analysis [Glazerman, Stephen, Levy, Dan and Myers, David (2003).
“Nonexperimental versus Experimental Estimates of Earnings Impacts.” Annals,
AAPS (589): 63-85] found that statistical control for a prior measure most closely
approximated randomized experiments in a meta-analysis of effects of welfare, job
training and employment service programs on earnings.
• Steiner, Peter M., Thomas D. Cook & William R. Shadish (in press). On the
importance of reliable covariate measurement in selection bias adjustments using
propensity scores. Journal of Educational and Behavioral Statistics.
• Steiner, Peter M., Thomas D. Cook, William R. Shadish & M.H. Clark (in press). The
importance of covariate selection in controlling for selection bias in observational
studies. Psychological Methods.
• Cook, T. D., Shadish, W. R., & Wong, V. C. (2008). Three conditions under which
experiments and observational studies produce comparable causal estimates: New
findings from within-study comparisons. Journal of Policy Analysis and Management, 27(4),
724–750.
Parents of Friends as Instrument
for Friends?
Reflection
1) Identify the aspects that are unclear to
you or that concern you
2) Find a partner or two and discuss your
concerns
3) Be prepared to teach others or share
concerns
Schools as Fixed or Random
• Problem: students and teachers are nested within schools (data
are multilevel)
• Common problem in social science research: people nested
within organization
– If no control for organizations, members of a given organization are
commonly affected by that organization
– Example: All students are commonly affected by their principal
– Implication: error terms are not independent, standard errors are
biased, p values are wrong!
• Response: control for schools
• Fixed effects: enter a dummy variable for each school (except
one) to control for school effects.
– Same way one controls for gender or race
• Random effects (multilevels):
– Assume there is a distribution (e.g., normal) of effects across schools,
only estimate parameters of distribution
Schools as Fixed or Random
• Fixed: essentially using dummy variables to control each school
– Spends degrees of freedom: 1 for each school
– Focus on individual effects within contexts
– Schools in the sample are the population of interest
– Controls for all unobservable factors associated with school
• Random: assume residual school effects are normally distributed
– Only estimate mean and variance, not each one
– Can estimate effects at individual or school level, as well as cross-level
interactions (slopes as outcomes)
– Schools are considered a sample from a larger population
– Controls for all unobservable factors associated with school?
• Pretty much, with careful centering (see next results)
• Biggest difference is whether all predictors are adjusted for group
characteristics
– Fixed effects: yes
– Random effects: No
• (unless you group mean center all variables)
• Subtract the group (school) mean from each predictor
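Group mean centering can be sketched in a few lines. The example below is a minimal Python illustration, assuming a made-up set of five teachers in two schools; the function name `group_mean_center` and the values are hypothetical (only the variable name `yrstch` is borrowed from the workshop data).

```python
from collections import defaultdict

def group_mean_center(values, groups):
    """Subtract each group's mean from the values belonging to that group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

# Hypothetical data: years teaching for five teachers in two schools
school = ["A", "A", "A", "B", "B"]
yrstch = [2.0, 4.0, 6.0, 10.0, 20.0]

centered = group_mean_center(yrstch, school)
print(centered)
```

After centering, every school has a mean of zero on the predictor, so the predictor carries only within-school variation.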
Syntax for Schools as Random versus Fixed Effects
SORT CASES BY
school (A) .
AGGREGATE
/OUTFILE=*
MODE=ADDVARIABLES
/BREAK=school
/bct_mean = MEAN(bct) /expanseh_mean = MEAN(expanseh) /white_mean = MEAN(white) /female_mean =
MEAN(female) /leave_mean = MEAN(leave) /glevel_mean = MEAN(glevel) /nograde_mean = MEAN(nograde)
/owned_mean = MEAN(owned) /yrstch_mean = MEAN(yrstch) /leader_mean = MEAN(leader) /nbct_mean = MEAN(nbct)
/nbctsq_mean = MEAN(nbctsq) /bcttreat_mean = MEAN(bcttreat) /leadna_mean = MEAN(leadna).
COMPUTE bct= bct -bct_mean.
COMPUTE expanseh = expanseh -expanseh_mean.
COMPUTE white = white - white_mean .
…………………………………
COMPUTE leadna = leadna - leadna_mean .
EXECUTE .
SAVE OUTFILE=*+ ' inference workshop\spss dataset\cen.sav'
/keep school attracth q71 bct expanseh white female leave glevel nograde owned yrstch leader nbct nbctsq bcttreat
leadna.
UNIANOVA
attracth BY school WITH bct expanseh white female
leave glevel nograde owned yrstch leader nbct
nbctsq bcttreat leadna
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/PRINT = PARAMETER
/CRITERIA = ALPHA(.05)
/DESIGN = bct expanseh white female leave glevel
nograde owned yrstch leader nbct nbctsq bcttreat
leadna school .
UNIANOVA
attracth BY school WITH bct expanseh white female
leave glevel nograde owned yrstch leader nbct
nbctsq bcttreat leadna
/RANDOM=school
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/PRINT = PARAMETER
/CRITERIA = ALPHA(.05)
/DESIGN = bct expanseh white female leave glevel
nograde owned yrstch leader nbct nbctsq bcttreat
leadna.
Output Controlling for Schools as Random Effects
Compare with estimate of .622681 (se=.091653) from model
controlling for schools as fixed effects
Statistical power in multilevels
• How to choose
– number of cases per unit
– Number of units
• Where to allocate resources:
• Rules of thumb
– the larger the intraclass correlation (e.g., variation between
schools) the more df are based on number of units, and you
should sample more units and fewer per unit
– the smaller the intraclass correlation the more df are based on
number of observations within units, and you should sample
more observations per unit.
– 80 is good to detect moderate effect.
– You need fewer if you have a pretest, which increases precision
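One way to see why the intraclass correlation drives the allocation is the common design-effect approximation, 1 + (m − 1) × ICC, where m is the number of cases per unit. This formula is a standard approximation brought in for illustration, not something stated on the slide, and the function names and sample sizes below are hypothetical.

```python
def design_effect(cases_per_unit, icc):
    """Approximate variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1.0 + (cases_per_unit - 1) * icc

def effective_n(units, cases_per_unit, icc):
    """Total sample size deflated by the design effect."""
    return units * cases_per_unit / design_effect(cases_per_unit, icc)

# Same total of 1200 cases, allocated two ways (hypothetical designs)
few_large = effective_n(units=20, cases_per_unit=60, icc=0.2)
many_small = effective_n(units=60, cases_per_unit=20, icc=0.2)
print(few_large, many_small)
```

With a large intraclass correlation, the design with more units and fewer cases per unit retains far more effective information, which is the direction of the rule of thumb above.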
Optimal design software
– http://sitemaker.umich.edu/group-based/home
– Developed by Raudenbush, S
– http://sitemaker.umich.edu/group-based/optimal_design_software
References
http://sitemaker.umich.edu/group-based/references
Bloom, H. S., Richburg-Hayes, L., & Black, A. R. (2007). Using Covariates to Improve Precision for Studies That
Randomize Schools to Evaluate Educational Interventions. Educational Evaluation and Policy Analysis, 29(1), 30-59.
(http://epa.sagepub.com/cgi/content/abstract/29/1/30, 10-03-2007)
This article examines how controlling statistically for baseline covariates, especially pretests, improves the precision of
studies that randomize schools to measure the impacts of educational interventions on student achievement.
Empirical findings from five urban school districts indicate that (1) pretests can reduce the number of randomized
schools needed for a given level of precision to about half of what would be needed otherwise for elementary
schools, one fifth for middle schools, and one tenth for high schools, and (2) school-level pretests are as effective in
this regard as student-level pretests. Furthermore, the precision-enhancing power of pretests (3) declines only
slightly as the number of years between the pretest and posttests increases; (4) improves only slightly with pretests
for more than 1 baseline year; and (5) is substantial, even when the pretest differs from the posttest. The article
compares these findings with past research and presents an approach for quantifying their uncertainty.
Hedges, L. V., & Hedberg, E. C. (2007). Intraclass Correlation Values for Planning Group-Randomized Trials in
Education. Educational Evaluation and Policy Analysis, 29(1), 60-87.
(http://epa.sagepub.com/cgi/content/abstract/29/1/60, 10-03-2007)
Experiments that assign intact groups to treatment conditions are increasingly common in social research. In educational
research, the groups assigned are often schools. The design of group-randomized experiments requires knowledge
of the intraclass correlation structure to compute statistical power and sample sizes required to achieve adequate
power. This article provides a compilation of intraclass correlation values of academic achievement and related
covariate effects that could be used for planning group-randomized experiments in education. It also provides
variance component information that is useful in planning experiments involving covariates. The use of these values
to compute the statistical power of group-randomized experiments is illustrated.
Raudenbush, S. W. (1997). Statistical Analysis and Optimal Design for Cluster Randomized Trials. Psychological
Methods, 2(2), 173-185. (raudenbush.1997.pdf, 1854.0 kb, 10-03-2007)
Raudenbush, S. W., & Liu, X. (2001). Effects of Study Duration, Frequency of Observation, and Sample Size on
Power in Studies of Group Differences in Polynomial Change. Psychological Methods, 6(4), 387-401.
(raudenbush.liu.2001.pdf, 1551.0 kb, 10-03-2007)
Raudenbush, S. W., Martinez, A., & Spybrook, J. (2007). Strategies for Improving Precision in Group-Randomized
Experiments. Educational Evaluation and Policy Analysis, 29(1), 5-29.
(http://epa.sagepub.com/cgi/content/abstract/29/1/5, 10-03-2007)
Interest has rapidly increased in studies that randomly assign classrooms or schools to interventions. When well
implemented, such studies eliminate selection bias, providing strong evidence about the impact of the interventions.
However, unless expected impacts are large, the number of units to be randomized needs to be quite large to
achieve adequate statistical power, making these studies potentially quite expensive. This article considers when
and to what extent matching or covariance adjustment can reduce the number of groups needed to achieve
adequate power and when these approaches actually reduce power. The presentation is nontechnical.
Differential Treatment Effects and
Heckman’s Rationality
• Individuals choose treatments they expect will
be most beneficial to them – they can anticipate
outcome of treatment.
– Treatment effect for treated > treatment effect for
control
– Attend to assignment mechanism – factors that affect
choice of treatment
• OLS estimates average treatment effect
– Invalidates paradigm of randomized experiment
because people choose treatments.
Differential Treatment Effects
Policy Implications
• Treatment effect for treated evaluates
effect of existing program for those who
received it.
• Treatment effect for control evaluates
effect of program if it is expanded to those
now receiving the control.
Propensity scores
– Estimate differential treatment effects
– Improve covariance adjustment
– Non-monotonic relationship between propensity
and discriminant function of covariates
– Unequal variances in treatment and control group
» Dilation effect of treatment (Rosenbaum 2000)
– Motivate evaluation of assignment mechanism
• Cf. Heckman’s 2005 critique of Rubin/Holland model
– Align with counterfactual
• matched comparisons
– No need to match on all covariates
• comparisons within propensity strata
• Presentation loosely based on “Introduction to Propensity Score Matching” (Guo et al.)
– http://ssw.unc.edu/jif/sacws/docs/Day1a.ppt
Definition of Propensity
Propensity of receiving treatment (i.e., s=1)
given covariates x = e(x) = Pr{s = 1|x},
Note: e(x) is not a probability, since all subjects
have already received the treatment (1) or not
(0).
Can be obtained as predicted value from
logistic regression
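As a small illustration of what "predicted value from logistic regression" means, the sketch below turns a set of logistic coefficients and covariate values into a propensity. The coefficients and covariate values are invented for the example, and the function name `propensity` is hypothetical.

```python
import math

def propensity(intercept, coefs, x):
    """e(x): inverse-logit of the linear predictor intercept + sum(b * x)."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for two covariates, and one subject's values
e_x = propensity(intercept=-2.0, coefs=[0.4, 0.7], x=[3.0, 2.0])
print(round(e_x, 3))
```

In the SPSS workflow of this workshop the same quantity is obtained by saving the predicted values from LOGISTIC REGRESSION.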
Impact of an Unmeasured Confounding
Variable on Inference of Effect of Board
Certification on Help Provided
[Path diagram. Nodes: Board Certified (s); Number others helped (y); Inclination to be Helpful (confounding variable, cv). Path labels: t(β̂1), r_scv, r_ycv, and r_scv × r_ycv.]
Frank, K.A., Gary Sykes, Dorothea Anagnostopoulos, Marisa Cannata, Linda
Chard, Ann Krause, Raven McCrory. 2008. Extended Influence: National Board
Certified Teachers as Help Providers. Educational Evaluation and Policy
Analysis. Vol 30(1): 3-30.
Use Propensity to Weight Analysis:
For Treatment Effect for Treated:
t (1  t )e( x)
 (t , x)  
1 1  e( x )
Where ω is the weight,
t=treatment (1or 0),
e(x) is propensity to have received the treatment
(predicted value from logistic regression)
Propensity and the Relevant
Comparison: Estimate of Treatment for
those who received the treatment
Treatment Effect
for Treated
For Treatment Effect for Untreated
t (1  e( x)) 1  t
 (t , x) 

e( x )
1
Where ω is the weight,
t=treatment (1or 0),
e(x) is propensity to have received the treatment
(predicted value from logistic regression)
Propensity and the Relevant
Comparison: Estimate of Treatment for
those who received the control
Treatment Effect
for control
Use Propensity to Weight Analysis:
Estimate of Treatment for People at the Margin of
Indifference (EOTM)
$\omega(t, x) = \frac{t}{e(x)} + \frac{1 - t}{1 - e(x)}$
Where ω is the weight,
t=treatment (1 or 0),
e(x) is propensity to have received the treatment
(predicted value from logistic regression)
(Hirano and Imbens 2001; Robins Rotnitzky and Zhao1995)
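The three weighting schemes can be compared side by side. The sketch below is a minimal Python rendering of the weight formulas on the preceding slides; the function names are hypothetical and the propensity value 0.2 is made up for illustration.

```python
def eotm_weight(t, e):
    """Weight for the margin-of-indifference (EOTM) estimate."""
    return t / e + (1 - t) / (1 - e)

def treated_weight(t, e):
    """Weight for the treatment effect for the treated."""
    return t + (1 - t) * e / (1 - e)

def control_weight(t, e):
    """Weight for the treatment effect for the control (untreated)."""
    return t * (1 - e) / e + (1 - t)

e = 0.2  # hypothetical propensity
for t in (1, 0):
    print(t, eotm_weight(t, e), treated_weight(t, e), control_weight(t, e))
```

Under these formulas the EOTM weight for any case equals the sum of its treated weight and its control weight, which is one way to check an implementation.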
Propensity and the Relevant
Comparison: Estimate of Treatment for
People at the Margin of Indifference (EOTM)
Estimated Effect for
People at the Margin
of Indifference (EOTM)
General procedure for propensity
score analysis
• Step 1) Estimate propensity of receiving the treatment (versus
control)
– using logistic regression of factors predicting treatment versus control
– Interpret logistic regression
– Save predicted values – these are the propensities
• Step 2) Balance
– Compare distribution of propensity by treatment and control groups
– Compare treatment and control by covariates (balance) accounting for
propensity
• Either by strata or using weights
• Step 3) Estimate effect of treatment on outcome by
– propensity strata
– matching treatment and control groups on propensity
• Includes composite matches (Heckman’s Kernel functions)
– Weighting analyses by propensity (ken’s preferred)
– Controlling for propensity (Heckman’s control functions)
Step 1) SPSS Syntax for propensity model and saving
predicteds
GET
FILE='F:\RA work\for Ken\causal inference\SPSS\workshop.sav'.
LOGISTIC REGRESSION bct
/METHOD = ENTER expanseh white female leave glevel nograde owned yrstch leader nbct nbctsq bcttreat
leadna
/SAVE = PRED
/CRITERIA = PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .
COMPUTE pbct=pre_1.
EXECUTE .
SAVE OUTFILE='F:\RA work\for Ken\causal inference\SPSS\withp.sav'
/COMPRESSED.
GET
FILE='F:\RA work\for Ken\causal inference\SPSS\withp.sav'.
COMPUTE pbct=pre_1.
IF (pbct > 0) pweight=bct/pbct + (1-bct)/(1-pbct).
* Treated and control weights follow the weight formulas stated for each effect.
IF (pbct > 0) pweightt=bct + (1-bct)*pbct/(1-pbct).
IF (pbct > 0) pweightc=bct*(1-pbct)/pbct + (1-bct).
EXECUTE .
VARIABLE LABELS pbct 'baseline propensity'.
VARIABLE LABELS pweight 'weight EOTM: those on the margin weight'.
VARIABLE LABELS pweightt 'weight for treatment effect for treated'.
VARIABLE LABELS pweightc 'weight for treatment effect for control'.
EXECUTE.
SAVE OUTFILE='F:\RA work\for Ken\causal inference\SPSS\pmp.sav'
/COMPRESSED.
Table 2: Logistic Regression for Being Board Certified
Independent Variable                           Estimate   Standard Error   Wald Chi-Square   Pr>ChiSq
Intercept                                      -6.8725    1.0566           35.1514           <.0001
White                                          -.078      .246             .101              .751
Female                                         1.447      .605             5.722             .017
highest grade level taught                     -.0001     .023             .0000             .996
no grade level indicated                       -1.176     .776             2.297             .130
level of own education                         .403       .100             16.348            <.0001
Years teaching                                 .003       .011             .055              .814
Intention to Leave                             -.097      .131             .549              .459
perceived advantage of certification           .136       .160             .731              .393
Enhancement of teaching through leadership     .695       .157             19.482            <.0001
missing on enhancement of teaching             .962       .582             2.735             .098
number other teachers who helped respondent    0.185      .1100            2.818             0.1230
number certified others in school              .1234      .068             3.306             .069
number certified others in school squared      -.013      .014             .913              .339
Interpreting logistic regression
• Key predictors
– Level of own education
– Enhancement of teaching through leadership
• Adjusting for context through number of
others in school who were certified
• Keep in even marginal variables
• Logistic function correctly classifies 62% of
cases when classified as BCT if probability
>.13 (13% of teachers are Board certified)
Step 2) Syntax for checking balance of
propensity
GET
FILE='C:\Documents and Settings\kenfrank\My Documents\MyFiles\sykes\pmp.sav'.
CROSSTABS
/TABLES=bct BY female
/FORMAT= AVALUE TABLES
/CELLS= COUNT ROW COLUMN TOTAL
/COUNT ROUND CELL .
SORT CASES BY
bct (A) .
EXAMINE
VARIABLES=pbct pweight pweightt pweightc BY bct
/PLOT BOXPLOT HISTOGRAM
/COMPARE GROUP
/STATISTICS NONE
/CINTERVAL 95
/MISSING LISTWISE
/NOTOTAL.
Boxplot Comparison of Distributions of Propensity between
NBCTs and non-NBCTs: Common support
[Boxplot. Vertical axis: Propensity Score; groups: Other, NBCT.]
EOTM Weights before Trimming
Code for Trimming weight and recheck
balance of propensity
RECODE
pweight (20 thru Highest=20) .
EXECUTE .
RECODE
pweight pweightc (20 thru Highest=20) .
EXECUTE .
SAVE OUTFILE='C:\Documents and Settings\kenfrank\My Documents\MyFiles\sykes\mp.sav'
/COMPRESSED.
subtitle "visual of balance of weights and propensity".
EXAMINE
VARIABLES=pbct pweight pweightt pweightc BY bct
/PLOT BOXPLOT HISTOGRAM
/COMPARE GROUP
/STATISTICS DESCRIPTIVES
/CINTERVAL 95
/MISSING LISTWISE
/NOTOTAL.
Weights after trimming
Syntax for checking balance of
covariates
DESCRIPTIVES
VARIABLES=pweight
/STATISTICS=MEAN .
COMPUTE npweight = pweight /1.943653666769.
EXECUTE .
WEIGHT
BY npweight .
T-TEST
GROUPS = bct(0 1)
/MISSING = ANALYSIS
/VARIABLES = attracth expanseh white female leave glevel nograde owned yrstch leader nbct nbctsq
bcttreat leadna
/CRITERIA = CI(.95) .
WEIGHT
BY npweight .
SORT CASES BY bct .
SPLIT FILE
LAYERED BY bct .
DESCRIPTIVES
VARIABLES=attracth expanseh white female leave glevel nograde owned yrstch leader nbct nbctsq bcttreat
leadna
/STATISTICS=MEAN STDDEV MIN MAX.
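The trimming, normalizing, and weighted comparison steps can be sketched outside SPSS as follows. This is a minimal Python illustration with made-up weights and a made-up covariate; the function names are hypothetical, and dividing by the mean weight is an assumption about what the constant in the COMPUTE npweight line represents.

```python
def trim(weights, cap=20.0):
    """Cap each weight at `cap`, mirroring RECODE (20 thru Highest=20)."""
    return [min(w, cap) for w in weights]

def normalize(weights):
    """Divide by the mean weight so the normalized weights average 1."""
    m = sum(weights) / len(weights)
    return [w / m for w in weights]

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical cases: treatment indicator, one covariate, raw EOTM weights
bct = [1, 1, 0, 0, 0, 0]
female = [1, 0, 1, 1, 0, 1]
pweight = [1.2, 35.0, 2.0, 1.8, 1.1, 1.4]

npweight = normalize(trim(pweight))

# Weighted covariate mean within each group, as a simple balance check
balance = {}
for g in (0, 1):
    idx = [i for i, b in enumerate(bct) if b == g]
    balance[g] = weighted_mean([female[i] for i in idx],
                               [npweight[i] for i in idx])
print(balance)
```

Balance is judged by how close the weighted group means are to each other; trimming keeps a single extreme weight from dominating that comparison.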
Testing for Balance, weighted by Propensity (EOTM)
Variable                                       BCT (n=162)      Non-BCT (n=1038)
Number other teachers helped by respondent     1.38 (3.59)      .89 (1.06)
number other teachers who helped respondent    .90 (2.09)       .91 (.81)
White                                          .82 (1.00)       .84 (.39)
Female                                         .95 (.59)        .93 (.27)
highest grade level taught                     8.42 (10.21)     8.3 (4.45)
no grade level indicated*                      .01 (.31)        .04 (.21)
level of own education                         3.08 (2.57)      3.01 (1.10)
years teaching                                 15.92 (18.22)    16.1 (9.56)
Intention to leave                             1.68 (1.96)      1.70 (.80)
perceived advantage of certification           1.95 (1.22)      1.94 (.61)
enhancement through leadership                 2.43 (3.13)      2.35 (1.29)
number certified others in school              2.43             2.31
Exercise: What is Balance without
weights?
GET
FILE='C:\Documents and Settings\kenfrank\My Documents\MyFiles\sykes\mp.sav'.
subtitle "checking for balance among covariates".
T-TEST
GROUPS = bct(0 1)
/MISSING = ANALYSIS
/VARIABLES = attracth expanseh white female leave glevel nograde owned yrstch leader
nbct nbctsq bcttreat leadna
/CRITERIA = CI(.95) .
subtitle "checking for balance among covariates".
SORT CASES BY bct .
SPLIT FILE
LAYERED BY bct .
DESCRIPTIVES
VARIABLES=attracth expanseh white female leave glevel nograde owned yrstch leader
nbct nbctsq bcttreat leadna
/STATISTICS=MEAN STDDEV MIN MAX.
SPLIT FILE
OFF.
Step 3) syntax for estimating effects with weights
subtitle "weighted by pweight, EOTM".
UNIANOVA
attracth BY school WITH bct
/REGWGT = npweight
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/PRINT = ETASQ PARAMETER
/CRITERIA = ALPHA(.05)
/DESIGN = bct school .
subtitle "weighted by pweightt, for treated".
UNIANOVA
attracth BY school WITH bct
/REGWGT = npweightt
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/PRINT = ETASQ PARAMETER
/CRITERIA = ALPHA(.05)
/DESIGN = bct school .
subtitle "weighted by pweightc, for control".
UNIANOVA
attracth BY school WITH bct
/REGWGT = npweightc
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/PRINT = ETASQ PARAMETER
/CRITERIA = ALPHA(.05)
/DESIGN = bct school .
*notes for syntax to get npweight, npweightt, npweightc
Syntax and Output for Treatment
Effect for Treated
UNIANOVA
attracth BY school WITH bct
/REGWGT = npweightt
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/PRINT = PARAMETER
/CRITERIA = ALPHA(.05)
/DESIGN = bct school .
Syntax and Output for Treatment
Effect for Control
UNIANOVA
attracth BY school WITH bct
/REGWGT = npweightc
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/PRINT = PARAMETER
/CRITERIA = ALPHA(.05)
/DESIGN = bct school .
Syntax and Output for EOTM
UNIANOVA
attracth BY school WITH bct
/REGWGT = npweight
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/PRINT = PARAMETER
/CRITERIA = ALPHA(.05)
/DESIGN = bct school .
Table 3: Estimated Effect of Board Certification on Amount of Help Provided
Non bootstrap standard errors in ()
Model*                                                                   Coefficient   Std error   t-ratio   P Value
Weighted by propensity (EOTM)                                            .569          .138        4.12      <.001
Weighted by propensity (treatment effect for the treated)                .598          .130        4.60      <.001
Weighted by propensity (treatment effect for the control)                .562          .138        4.07      <.001
Unweighted, with covariates a                                            .603          .092        6.56      <.001
Unweighted with covariates, using multiple imputation                    .621          .092        6.75      <.001
Unweighted, no covariates                                                .583          .092        6.35      <.001
Unweighted, no control for school                                        .540          .092        5.88      <.001
NBPTS certified teacher versus other teachers who applied, EOTM
(n=280, bct=160, non-bct=120)                                            .577          .167        3.46      <.001
NBPTS certified teacher versus other teachers who did not apply, EOTM
(n=1017, bct=160, non-bct=857)                                           .562          .139        4.04      <.001
*Schools controlled for with fixed effects in all models unless otherwise stated. n=1131 unless otherwise stated. Standard errors based on 500 bootstrap replications.
a R2=.21 for standard model with covariates.
Interpretation
• Propensity weighting did not make much of a
difference!
• Allowed for focus on different treatment effects
• In paper, applied robustness indices to
estimates based on propensities
• Schools controlled for with fixed effects
– Accounts for any factor that can be attributed to
schools
• Principal, student composition, unmeasured factors
Criticisms of propensity scores
• No better than the covariates that go into it
• no control for unobservables
• Ambivalent about quality of propensity model
• Group overlap must be substantial
• Propensity model should not fit too well!
• fitting too well implies confounding of covariates and treatment
• not fitting well enough implies a poorly understood treatment
mechanism, and so poor control
• Short-term biases (2 years) are substantially less
than medium term (3 to 5 year) biases—the value of
comparison groups may deteriorate
Reflection
1) Identify the aspects that are unclear to
you or that concern you
2) Find a partner or two and discuss your
concerns
3) Be prepared to teach others or share
concerns
Alternative to Weighting by
Propensity
• Matching (Rosenbaum and Rubin 1983;
Morgan 2001)
• Analyses by Strata (Morgan 2001)
• Kernel Matching (Heckman et al.)
• Control for propensity (Heckman and
Robb’s control function – see Winship and
Morgan 677).
Matching, Propensity strata and
Regression Adjustment
• Heckman refers to regression adjustment as same as
matching and propensity strata. Here’s why:
• infinite number of strata → matching:
– One pair of observations, in treatment and control, within
each stratum
• Implies that strata level is not related to treatment –
there’s a treatment and control in each stratum.
• Estimate from matching would be mean difference
between treatment and control groups
Matching, Propensity strata and
Regression Adjustment
If there is one case in each stratum, estimate
from regression would be mean difference
between treatment and control because:
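$r_{x \cdot y|cv} = \frac{r_{x \cdot y} - r_{x \cdot cv}\, r_{y \cdot cv}}{\sqrt{1 - r_{y \cdot cv}^2}\,\sqrt{1 - r_{x \cdot cv}^2}}$
But $r_{x \cdot cv} = 0$ (because there is
one case within each stratum),
therefore the numerator reduces to $r_{x \cdot y}$: adjusting for cv leaves the
treatment-outcome relationship unchanged, so regression will
generate the same estimate as matching.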
Syntax for Propensity by Strata
GET
FILE='C:\Documents and Settings\kenfrank\My Documents\MyFiles\sykes\forstrata.sav'.
RANK
VARIABLES=rpbct (A) /RANK /NTILES (5)
/PRINT=YES
/TIES=MEAN .
RECODE
Nrpbct
(1=0) (2=1) (3=2) (4=3) (5=4)
(SYSMIS=SYSMIS) INTO rpbct .
EXECUTE .
SAVE OUTFILE='C:\Documents and Settings\kenfrank\My Documents\MyFiles\sykes\strata.sav'
/COMPRESSED.
SORT CASES BY
rpbct (A) .
SAVE OUTFILE='C:\Documents and Settings\kenfrank\My Documents\MyFiles\sykes\strata_s.sav'
/COMPRESSED.
GET
FILE='C:\Documents and Settings\kenfrank\My Documents\MyFiles\sykes\strata_s.sav'.
subtitle "checking for balance".
SORT CASES BY rpbct .
SPLIT FILE
LAYERED BY rpbct .
T-TEST
GROUPS = bct(0 1)
/MISSING = ANALYSIS
/VARIABLES = attracth expanseh white female leave glevel nograde owned yrstch
leader nbct nbctsq bcttreat leadna
/CRITERIA = CI(.95) .
SPLIT FILE
OFF.
SORT CASES BY rpbct .
SPLIT FILE
LAYERED BY rpbct .
DESCRIPTIVES
VARIABLES=attracth expanseh white female leave glevel nograde
owned yrstch leader nbct nbctsq bcttreat leadna
/STATISTICS=MEAN STDDEV MIN MAX.
SPLIT FILE
OFF.
subtitle "estimate by strata".
SORT CASES BY rpbct .
SPLIT FILE
LAYERED BY rpbct .
UNIANOVA
attracth BY school WITH bct
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/PRINT = PARAMETER
/CRITERIA = ALPHA(.05)
/DESIGN = bct school.
SPLIT FILE
OFF.
Estimates by Strata (including
controls for school)
Strata    Est   se
1         .22   .33
2         .74   .21
3         .40   .21
4         .62   .19
Average   .50
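The reported average can be checked directly from the stratum estimates. The sketch below computes the simple mean of the four listed estimates and, purely as an alternative that is not used in the slides, a precision-weighted mean using the listed standard errors; the variable names are hypothetical.

```python
# Stratum estimates and standard errors as listed in the table above
est = [0.22, 0.74, 0.40, 0.62]
se = [0.33, 0.21, 0.21, 0.19]

# Simple (unweighted) average across strata
simple_average = sum(est) / len(est)

# Alternative (an assumption, not the slide's method): weight each
# stratum by the inverse of its squared standard error
precision = [1.0 / s ** 2 for s in se]
weighted_average = sum(e * p for e, p in zip(est, precision)) / sum(precision)

print(round(simple_average, 3), round(weighted_average, 3))
```

The simple mean reproduces the reported average after rounding; the precision-weighted version gives less influence to the noisier first stratum.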
Exercise
Identify an inference regarding an effect in your
own work that might benefit from using
propensity scores:
1) Is the “treatment” dichotomous?
2) Are you interested in differential treatment
effects (e.g., for the control and for the
treated)?
3) Do you know what factors affect treatment
choice?
4) Which propensity approach appeals to you?
Substantive Conclusion
• We infer that National Board certification
has an effect on the amount of help
teachers provide to others
– Effect is at least .5 of a standard deviation
– Largest effect more than 1-to-1 diffusion: for
every BCT, 1.5 others receive help (e.g., 4 BCTs →
help to 6 others, a total of 10 in school
affected by the process).
– Let’s debate in quantitative terms of robustness
indices.
Policy Implications
• Extra Benefit of Board Certification
– Contribute to social capital
– Spread ideas of board certification
– Help other teachers innovate
• Offer incentives for Board Certification
– Can advocate policy because inferences
robust
Methodological Conclusion
• Propensity scores narrow the estimate
• Robustness Indices quantify threats to
validity
• Robustness Indices more informative than
propensity scores?
Methods Reviewed
• Counterfactual (2 possible outcomes)
• Statistical control
– Random and fixed effects
• Robustness of inference
– for impact of a confounding variable (internal validity)
– for representativeness of sample (external validity)
– Robustness indices a form of sensitivity analysis
• Absorption
– Randomization
– Instrumental variables
– Pre-test
• Differential treatment effects
– Treatment effect for treated/for control
• Propensity scores
– Attention to assignment mechanism
• Logistic regression
– Using propensity scores in analysis
• Weighting
• Control
• Strata
• matching
References on Causal Inference
• Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical
Association, 81, 945-970.
• Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and non-randomized
studies. Journal of Educational Psychology, 66, 688-701.
• Rubin, D. B. (2004). “Teaching Statistical Inference for Causal Effects in Experiments and
Observational Studies.” Journal of Educational and Behavioral Statistics, Vol 29(3): 343-368.
• Winship, C., & Morgan, S. (1999). The Estimation of Causal Effects from Observational Data.
Annual Review of Sociology, 25, 659-707.
• Winship, C. and Sobel, M. (2004). “Causal Inference in Sociological Studies.” Chapter 21 in
Handbook of Data Analysis (Hardy, Melissa, and Bryman, Alan, eds.). London: Sage Publications.
• Heckman, James. (2005). “The Scientific Model of Causality.” Sociological Methodology.
• Manski, Charles F. 1995. Identification Problems in the Social Sciences. Cambridge, MA:
Harvard University Press.
• Rosenbaum, Paul R. (2002). Observational Studies. New York: Springer.
On the Web
• http://www.wjh.harvard.edu/soc/faculty/winship/CFA_site.html (Winship’s portal)
• http://www.ets.org/research/dload/AERA_2004-Holland.pdf (recent Paul Holland)
• http://bayes.cs.ucla.edu/jp_home.html (Judea Pearl)
• http://plato.stanford.edu/entries/causation-counterfactual/ (philosophy of counterfactual)
• http://sekhon.berkeley.edu/causalinf/causalinf.pdf (syllabus on causal inference)
Technical Appendix B for calculating Impact Thresholds
t
critical
n
r#
observed t
r (x,y)
ITCV
r(x,cv)
r(y,cv)
1.96
12
95
=+A2/SQRT(A2*A2+B2-3)
7.34
=+D2/SQRT(B22+D2*D2)
=+(E2-C2)/(1-C2)
=+SQRT(F2)
=+SQRT(F2)
Multivariate (with other covariates, z, in
model)
t
critical
nu
m
z
r#
R2 (x,z)
R2 (y,z)
ITCV
r(x,cv)
r(y,cv)
1.96
45
=+A7/(SQRT(A7*A7+B2-B7-3))
0.15
0.13
=+F2*SQRT((1-D7)*(1-E7))
=SQRT(+F7*SQRT((1-D7)/(1-E7)))
=SQRT(+F7*SQRT((1-E7)/(1-D7)))
User enters values in yellow boxes
Indices calculated in pink
User can replace threshold value, r#, in green. Default is defined by
statistical significance
Note that R2 (x,z) and R2 (y,z) only need to be entered to correct ITCV calculations in F7-H7.
Can be downloaded from http://www.msu.edu/~kenfrank/
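The bivariate spreadsheet logic can be sketched in Python as follows. The formulas for the threshold r#, the ITCV, and the component correlations are transcribed from the spreadsheet cells above; the function names, the sample size of 500, and the observed correlation of 0.30 are hypothetical, and the observed correlation is passed in directly rather than derived from the observed t.

```python
import math

def threshold_r(t_critical, n):
    """r#: the correlation that is just statistically significant."""
    return t_critical / math.sqrt(t_critical ** 2 + n - 3)

def itcv(r_xy, r_threshold):
    """Impact threshold for a confounding variable (bivariate case)."""
    return (r_xy - r_threshold) / (1 - r_threshold)

def component_correlations(itcv_value):
    """r(x,cv) and r(y,cv) when the two are set equal, so their product is the ITCV."""
    r = math.sqrt(itcv_value)
    return r, r

# Hypothetical inputs
r_sharp = threshold_r(t_critical=1.96, n=500)
impact = itcv(r_xy=0.30, r_threshold=r_sharp)
r_x_cv, r_y_cv = component_correlations(impact)
print(round(r_sharp, 4), round(impact, 4), round(r_x_cv, 4))
```

The component correlations answer the question of how strongly an unmeasured confound would have to be related to both treatment and outcome to invalidate the inference.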
Questions for Scott E. Page
• When a group makes a decision from statistical
evidence
– are they making a causal inference?
– How much of the discourse is: “you didn’t control for xxx”?
• Ken says: How strong would an unmeasured factor have to be
to invalidate the inference?
– Do you believe in statistical controls?
• Class project: Evaluating effect of NCLB
sanctions on Michigan schools
– Compare schools just above cutoff for sanctions with
those just below cutoff for sanctions