Uploaded by rui

Econometrics Exam: Causal Inference & Regression Analysis

(60 points) 1. State and discuss, respectively, the identification strategy
(assumptions) used to identify the causal effect in the following 5 approaches:
a) randomized controlled trials
In a randomized controlled trial (RCT), the causal effect is identified by
comparing outcomes between units that receive the intervention and units that
do not. Units are randomly assigned to groups: one group receives no
intervention (the control group) and the others are intervention groups. The
identifying assumption is that random assignment makes the groups comparable on
average, so a systematic difference in outcomes can be attributed to the
intervention. As an example, one might use an RCT to identify the causal effect
of wearing sunscreen on skin hydration. In a simple RCT, participants would be
divided into two groups: an intervention group, whose members are required to
wear sunscreen every day, and a control group, whose members do not wear
sunscreen. After the testing period ends, data are gathered on how hydrated the
skin is, on average, in each group. If the control group and the intervention
group show similar results, wearing sunscreen has little to no causal effect on
skin hydration. If the intervention group shows better skin hydration on
average, wearing sunscreen has a positive causal effect.
Randomized controlled trials are effective in identifying causal effects because
they provide unbiased estimates of the mean effect of a program. They work best
with a large sample size. However, this approach is costly and time-consuming.
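The sunscreen example can be sketched as a small simulation. This is only an illustration under stated assumptions: the hydration scores, the true effect of 2.0, and the function name `simulate_rct` are all hypothetical and do not come from any real study.

```python
import random
import statistics

def simulate_rct(n=20_000, true_effect=2.0, seed=0):
    """Return the difference in mean outcomes between treated and control units."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        # Hypothetical baseline hydration score, unrelated to group assignment.
        baseline = rng.gauss(50.0, 5.0)
        # Random assignment: a fair coin flip decides who wears sunscreen.
        if rng.random() < 0.5:
            treated.append(baseline + true_effect)
        else:
            control.append(baseline)
    # With random assignment, the difference in means estimates the mean effect.
    return statistics.mean(treated) - statistics.mean(control)

estimate = simulate_rct()
print(f"estimated effect: {estimate:.2f} (true effect: 2.0)")
```

Because assignment is random, the two groups have the same baseline on average, so the difference in means should land close to the assumed true effect.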
b) regression
With regression, the causal effect is identified by analyzing the relationship
between a dependent variable (the outcome, y) and an independent variable (x).
There are multiple types of regression, but the most common is linear regression.
Simple linear regression is used to estimate the effect of one independent
variable (x) on the predicted outcome of the dependent variable (y). The slope
can be read as a causal effect only if x is unrelated to the other factors that
determine y.
The first step in simple linear regression is to plot the data. A scatter plot
gives a good visualisation of the relationship between the dependent and
independent variables. Then you can fit the regression line and find the slope
that describes the effect of x on y. Regression is very handy because it gives
a concrete formula for the relationship between the variables, which lets you
easily predict how a one-unit change in one variable would affect the other
variable. This method is best suited for forecasting and
c) instrumental variable
The instrumental variable method tries to replicate a randomized trial: we look
for something that does the randomization naturally, which gives us some random
variation in the treatment, which we can then use to estimate the causal effect
on the outcome variable.
The general structure to have in mind involves an instrument Z, a treatment
(exposure) D, an outcome Y, and unobserved confounders U. The intuition is that
Z causes some changes in D, which then translate into changes in Y. Using only
the variation in D that is specific to Z, and not dependent on U, we can
identify the causal effect. In other words, Z is the instrument, which has an
effect on the exposure, which in turn affects the outcome.
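The Z, D, Y, U structure can be illustrated with a simulation. This is a minimal sketch under assumptions that are not in the original answer: all coefficients (0.8, 2.0, and a true effect of 1.5) are hypothetical, and the instrument is used through the simple ratio cov(Z, Y) / cov(Z, D).

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

rng = random.Random(0)
true_effect = 1.5  # hypothetical causal effect of D on Y
Z, D, Y = [], [], []
for _ in range(50_000):
    z = rng.gauss(0, 1)  # instrument: as good as randomly assigned
    u = rng.gauss(0, 1)  # unobserved confounder, affects both D and Y
    d = 0.8 * z + u + rng.gauss(0, 1)
    y = true_effect * d + 2.0 * u + rng.gauss(0, 1)
    Z.append(z)
    D.append(d)
    Y.append(y)

# Naive regression slope of Y on D: biased because U moves both D and Y.
ols_slope = cov(D, Y) / cov(D, D)
# IV estimate: uses only the variation in D that comes from Z.
iv_estimate = cov(Z, Y) / cov(Z, D)

print(f"OLS slope: {ols_slope:.2f}, IV estimate: {iv_estimate:.2f}")
```

In this setup the naive slope overstates the effect because U pushes D and Y in the same direction, while the IV ratio stays close to the assumed true effect.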
d) difference-in-differences method
The difference-in-differences method identifies causal effects by comparing two
groups, a control group and a treatment group. The treatment group receives an
intervention (usually the passage of a law or the implementation of a new
program), and the control group acts as an adjustment so that external factors
can be taken into account. The causal effect is identified by comparing the
changes in outcomes over time between the control and treatment groups. One of
the most important assumptions in the difference-in-differences method is the
parallel trends assumption. This assumption requires the control and treatment
groups to follow similar trends before the treatment happens, so that, without
the treatment, they would have continued to move in parallel.
This is crucial since the changes after the treatment are what is being
e) regression discontinuity design
Regression discontinuity design (RDD) identifies causal effects by comparing two
groups that lie just below and just above the cutoff that determines treatment.
This keeps bias low because, when only units slightly above and slightly below
the cutoff are used, the treatment is the only thing setting them apart and
other external variables are relatively similar. For example, to identify the
causal effect of a policy that aids the job search of people aged 25 and younger
on job finding, RDD looks at people just slightly above 25 years old and people
just slightly below 25 years old and compares them. People aged 24 and 26 would
have relatively similar circumstances, such as income, education, and
relationship/family status. The only thing setting them apart would be their
eligibility for the program; any other external factor would most likely be
random. The difference between the outcomes of the two groups (in this case the
employment rates) is called the RDD estimate. It can be attributed to the
treatment variable and hence be used to identify the causal effect of the
treatment variable on the outcome.
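The age-25 example can be sketched as a simulated sharp discontinuity. Everything numeric here is a hypothetical assumption (a true jump of 0.10 in the outcome, a bandwidth of 2 years, a mild linear age trend). The sketch fits a separate line on each side of the cutoff and compares the two fitted values at the cutoff, which is one common way to implement the comparison and not necessarily the one intended in the answer.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

rng = random.Random(1)
cutoff, bandwidth, true_jump = 25.0, 2.0, 0.10  # hypothetical values
below, above = ([], []), ([], [])  # (centered ages, outcomes) on each side
for _ in range(40_000):
    age = rng.uniform(20, 30)
    eligible = age <= cutoff  # program covers people aged 25 and younger
    outcome = (0.40 + 0.02 * (age - cutoff)
               + (true_jump if eligible else 0.0)
               + rng.gauss(0, 0.05))
    if abs(age - cutoff) <= bandwidth:  # keep only units near the cutoff
        side = below if eligible else above
        side[0].append(age - cutoff)
        side[1].append(outcome)

# Age is centered at the cutoff, so each intercept is the fitted value at 25.
intercept_below, _ = fit_line(*below)
intercept_above, _ = fit_line(*above)
rdd_estimate = intercept_below - intercept_above
print(f"RDD estimate of the jump at the cutoff: {rdd_estimate:.3f}")
```

Fitting a line on each side, instead of comparing raw means, keeps the smooth age trend from being mistaken for a treatment effect.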
(42 points) 2. The data file TeachingRatings contains data on course evaluations,
course characteristics, and professor characteristics for 463 courses at the
University of Texas at Austin. A detailed description is given in
TeachingRatings_Description. One of the characteristics is an index of the
professor’s “beauty” as rated by a panel of six judges. In this exercise you will
investigate how course evaluations are related to the professor’s beauty.
a) Construct a scatterplot of average course evaluations Course_Eval on the
professor’s beauty Beauty. Does there appear to be a relationship between
There seems to be a weak positive relationship between course_eval and beauty.
The fitted line slopes slightly upwards, and the sample correlation between the
two variables is 0.189.
b) Run a regression of average course evaluations Course_Eval on the
professor’s beauty Beauty. What is the estimated intercept? What is the
estimated slope? Explain why the estimated intercept is equal to the sample
mean of Course_Eval.
Course_eval = 3.998 + 0.133 × beauty. The estimated intercept is 3.998 and the
estimated slope is 0.133. The estimated intercept equals the mean of the
dependent variable y (course_eval) minus the estimated slope (0.133) times the
mean of the regressor x (beauty). In this case the variable beauty has a mean
of 0, making the estimated intercept equal to the sample mean of course_eval.
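The intercept argument can be checked numerically. The sketch below uses a tiny made-up data set (not the TeachingRatings data) and a hypothetical helper `ols_simple`. Once the regressor is centered so that its mean is 0, the fitted intercept coincides with the mean of y.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    # Intercept formula: mean(y) - slope * mean(x).
    return my - slope * mx, slope

# Made-up illustrative data, not taken from TeachingRatings.
x_raw = [2.0, 4.0, 5.0, 7.0, 12.0]
y = [3.1, 3.9, 4.2, 4.4, 4.9]

# Center the regressor so its sample mean is exactly 0, like beauty.
x_mean = sum(x_raw) / len(x_raw)
x_centered = [xi - x_mean for xi in x_raw]

intercept, slope = ols_simple(x_centered, y)
y_mean = sum(y) / len(y)
print(f"intercept = {intercept:.4f}, mean of y = {y_mean:.4f}")
```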
c) Professor Chen has an average value of Beauty, while Professor Mori’s value
of Beauty is one standard deviation above the average. Predict Professor Mori’s
and Professor Chen’s course evaluations.
Standard deviation of beauty = 0.789
Professor Chen -> 3.998 + 0.133*0 = 3.998
Professor Mori -> 3.998 + 0.133 * (0+0.789) = 4.102937
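The two predictions can be reproduced with a few lines of code. The coefficients and the standard deviation are the ones reported above; the function name `predict_course_eval` is only illustrative.

```python
INTERCEPT = 3.998   # estimated intercept from part b)
SLOPE = 0.133       # estimated slope on beauty from part b)
SD_BEAUTY = 0.789   # sample standard deviation of beauty

def predict_course_eval(beauty):
    """Predicted course evaluation from the simple regression."""
    return INTERCEPT + SLOPE * beauty

chen = predict_course_eval(0.0)        # average beauty
mori = predict_course_eval(SD_BEAUTY)  # one standard deviation above average
print(f"Chen: {chen:.3f}, Mori: {mori:.3f}, difference: {mori - chen:.3f}")
```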
d) Comment on the size of the regression’s slope. Is the estimated effect of
Beauty and Course_Eval large or small? Explain what you mean by “large” and
“small”. Is the estimated effect a causal effect? Explain.
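The estimated effect of beauty on course_eval is small. A one standard deviation
increase in beauty is associated with an increase in course evaluation of
0.133*0.789 = 0.105 points, which is only about 0.19 standard deviations of
course_eval (in a simple regression the standardized slope equals the
correlation, 0.189). Here "small" means small relative to the typical variation
in course evaluations. The estimated effect is not necessarily a causal effect:
the data are observational, so omitted factors related to both beauty and
evaluations could bias the slope. A small correlation by itself neither proves
nor rules out causality.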
e) Does Beauty explain a large fraction of the variance in evaluations across
courses? Explain.
The R^2 is 0.036, meaning beauty explains 3.6% of the variance in course
evaluations. Beauty therefore explains only a small fraction of the variance in
evaluations across courses.
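As a quick consistency check on the reported numbers: in a regression with a single regressor, R^2 equals the squared sample correlation, and the slope equals the correlation times the ratio of the standard deviations. The sketch below only uses values reported in the answers above. The implied standard deviation of course_eval is derived from them, not read from the data file.

```python
r = 0.189          # reported correlation between course_eval and beauty
slope = 0.133      # reported slope of the simple regression
sd_beauty = 0.789  # reported standard deviation of beauty

# With one regressor, R^2 is the squared correlation.
r_squared = r ** 2

# slope = r * sd(course_eval) / sd(beauty), so sd(course_eval) is implied.
implied_sd_eval = slope * sd_beauty / r

print(f"R^2 from correlation: {r_squared:.3f}")
print(f"implied sd of course_eval: {implied_sd_eval:.3f}")
```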
f) In order to control for course characteristics and professor characteristics, we
regress Course_Eval on Beauty with several control variables in the regression
specification including Intro, OneCredit, Female, Minority, and NNEnglish.
Based on this multiple regression, what is the effect of Beauty on Course_Eval?
Does the simple regression in part (2) include the omitted variable bias?
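Based on this multiple regression, the estimated effect of beauty on course_eval
is 0.166: holding the control variables constant, a one-unit increase in beauty
is associated with a 0.166 point increase in course_eval. The regression as a
whole explains around 15.5% of the variance in course_eval; this figure
describes all regressors together, not beauty alone. The standard error is
higher because it is corrected for heteroskedasticity. The coefficient of
multiple correlation (R) equals 0.393153, which means that there is a weak
positive relationship between the predicted data and the observed data.
The simple regression in part 2 suffers from omitted variable bias because it
omitted several significant variables (female and onecredit): the coefficient on
beauty changes from 0.133 to 0.166 once the controls are included.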
g) Professor Smith is an average looking black male whose native language is
English. He teaches 3 credits of advanced courses. Predict Professor Smith’s
class evaluations.
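Course_eval = 4.068 + 0.166 beauty - 0.173 female - 0.166 minority
+ 0.635 onecredit - 0.244 nnenglish + 0.011 intro
For Professor Smith: beauty = 0 (average), female = 0, minority = 1,
onecredit = 0, nnenglish = 0, intro = 0.
Professor Smith’s class eval = 4.068 - 0.166 = 3.902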
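The prediction for Professor Smith can be reproduced from the reported coefficients. The dictionary layout and the function name `predict` are illustrative choices, and the profile encodes the description in the question (average beauty, male, minority, native English speaker, advanced course, more than one credit).

```python
INTERCEPT = 4.068
COEFFICIENTS = {
    "beauty": 0.166,
    "female": -0.173,
    "minority": -0.166,
    "onecredit": 0.635,
    "nnenglish": -0.244,
    "intro": 0.011,
}

def predict(profile):
    """Predicted course_eval for a dict of regressor values."""
    return INTERCEPT + sum(COEFFICIENTS[name] * value
                           for name, value in profile.items())

# Professor Smith: only the minority indicator is switched on.
smith = {"beauty": 0.0, "female": 0, "minority": 1,
         "onecredit": 0, "nnenglish": 0, "intro": 0}
smith_eval = predict(smith)
print(f"Predicted course_eval for Professor Smith: {smith_eval:.3f}")
```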