Econometrics (60 points)

1. State and discuss the identification strategy (assumptions) used to identify the causal effect in each of the following 5 approaches:

a) randomized controlled trials

In a randomized controlled trial (RCT), the causal effect is identified by comparing outcomes between subjects who receive the intervention and subjects who do not. Subjects are randomly assigned to groups: one group receives no intervention (the control group) and the others receive the intervention (treatment groups). Randomization is the key identifying assumption: because assignment is independent of subjects' characteristics, the groups are comparable in expectation, so any systematic difference in outcomes can be attributed to the treatment.

As an example, an RCT could identify the causal effect of wearing sunscreen on skin hydration. Subjects would be randomly divided into two groups: a treatment group required to wear sunscreen every day, and a control group that does not wear sunscreen. After the testing period ends, data are gathered on how much more hydrated the skin is on average. If the control and treatment groups show similar results, wearing sunscreen has little to no causal effect on skin hydration. If the treatment group shows better skin hydration, the causal effect is positive.

Randomized controlled trials are effective in identifying causal effects because randomization provides unbiased estimates of the mean effect of a program, and they work best with a large sample size. However, this approach is costly and time-consuming.

b) regression

With regression, the causal effect is estimated by analyzing the relationship between a dependent variable (the outcome, y) and one or more independent variables (the regressors, x). There are multiple types of regression, but the most common is linear regression. Simple linear regression is used to estimate the effect of one independent variable x on the predicted outcome of the dependent variable y.
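As a minimal sketch of simple linear regression, the slope and intercept can be computed from the standard OLS formulas; the data below are synthetic and the coefficient values are illustrative, not taken from any real dataset.

```python
import numpy as np

# Synthetic illustration: y depends linearly on x plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)                              # independent variable (regressor)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=200)   # dependent variable (outcome)

# OLS formulas for simple linear regression:
#   slope     = cov(x, y) / var(x)
#   intercept = mean(y) - slope * mean(x)
slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

With 200 observations the estimates land close to the true values 2.0 and 0.5 used to generate the data.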
The first step in examining the relationship with simple linear regression is to plot the data. A scatter plot gives a good visualization of the relationship between the dependent and independent variables. You can then fit the regression line and read off the slope, which summarizes the estimated effect of the two variables on each other. Regression is handy because it gives a concrete formula for the relationship between variables, letting you predict how a one-unit change in one variable is associated with a change in the other. Note, however, that the slope identifies a causal effect only under the assumption that the regressor is uncorrelated with the error term (no omitted variables or reverse causality); otherwise it measures association rather than causation. This method is best suited for forecasting and prediction.

c) instrumental variable

The instrumental variable (IV) method tries to replicate the randomization of a controlled trial by using a variable, the instrument Z, that generates natural, as-good-as-random variation in the treatment D. The intuition is that Z causes some changes in D, which then translate into changes in the outcome Y; using only the variation in D that is specific to Z, and not driven by the unobserved confounder U, identifies the causal effect. Two assumptions are required: relevance (Z actually affects the treatment D) and the exclusion restriction (Z affects the outcome Y only through D, and is independent of U). In other words, the instrument Z has an effect on the exposure, which then affects the outcome.

d) difference-in-differences method

The difference-in-differences (DiD) method identifies causal effects by comparing two groups, a control group and a treatment group. The treatment group experiences an intervention (usually passage of a law or implementation of a new program), while the control group acts as a benchmark so that common external shocks can be taken into account. The causal effect is identified by comparing the changes in outcomes over time between the control and treatment groups.
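The difference-in-differences comparison reduces to a simple two-by-two calculation; the group means below are made-up numbers used purely for illustration.

```python
# Hypothetical mean outcomes (made-up numbers for illustration).
treated_before, treated_after = 10.0, 14.0   # treatment group
control_before, control_after = 10.5, 12.0   # control group

# Change over time within each group.
treated_change = treated_after - treated_before   # 4.0
control_change = control_after - control_before   # 1.5 (common time trend)

# DiD estimate: treatment-group change minus control-group change.
did_estimate = treated_change - control_change
print(did_estimate)  # 2.5
```

The control group's change (1.5) absorbs the common time trend, leaving 2.5 as the estimated treatment effect.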
One of the most important assumptions in the difference-in-differences method is the parallel trends assumption. It requires the control and treatment groups to follow similar trends before the treatment happens, so that they plausibly would have continued on parallel paths absent the treatment. This is crucial, since the post-treatment divergence between the groups is what is being measured as the treatment effect.

e) regression discontinuity design

Regression discontinuity design (RDD) identifies causal effects by comparing units just below and just above a cutoff that determines treatment. This makes RDD a relatively unbiased approach: by taking only units slightly above and below the cutoff, the treatment is essentially the only thing setting them apart, and other external variables are assumed to be relatively similar. For example, to identify the causal effect of a policy that aids the job search of people aged 25 and younger on job finding, RDD compares people just above and just below 25. People aged 24 and 26 have relatively similar circumstances (income, education, relationship and family status, etc.); the only systematic difference is their eligibility for the program, and any other external factor is most likely random. The difference in outcomes (in this case, employment rates) between the two groups is the RDD estimate; it can be attributed to the treatment variable and hence used to identify the causal effect of the treatment on the outcome.

(42 points)

2. The data file TeachingRatings contains data on course evaluations, course characteristics, and professor characteristics for 463 courses at the University of Texas at Austin. A detailed description is given in TeachingRatings_Description. One of the characteristics is an index of the professor's "beauty" as rated by a panel of six judges. In this exercise you will investigate how course evaluations are related to the professor's beauty.
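A first look at that relationship can be sketched as follows; since the TeachingRatings file itself is not reproduced here, the arrays below are made-up stand-ins for the real Course_Eval and Beauty columns.

```python
import numpy as np

# Made-up stand-ins for the Beauty and Course_Eval columns (the real
# data come from the TeachingRatings file, not reproduced here).
rng = np.random.default_rng(1)
beauty = rng.normal(size=463)
beauty -= beauty.mean()        # Beauty is normalized to have sample mean 0
course_eval = 4.0 + 0.13 * beauty + rng.normal(scale=0.55, size=463)

# Sample correlation between the two variables.
r = np.corrcoef(beauty, course_eval)[0, 1]
print(f"correlation = {r:.3f}")

# A scatterplot would be drawn with matplotlib, e.g.:
# import matplotlib.pyplot as plt
# plt.scatter(beauty, course_eval)
# plt.xlabel("Beauty"); plt.ylabel("Course_Eval"); plt.show()
```

With the weak positive slope built into the synthetic data, the correlation comes out small but positive, mirroring the pattern described below.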
a) Construct a scatterplot of average course evaluations Course_Eval on the professor's beauty Beauty. Does there appear to be a relationship between the variables?

There appears to be a weak positive relationship between Course_Eval and Beauty: the fitted line slopes slightly upward, and the correlation is 0.189.

b) Run a regression of average course evaluations Course_Eval on the professor's beauty Beauty. What is the estimated intercept? What is the estimated slope? Explain why the estimated intercept is equal to the sample mean of Course_Eval.

Course_Eval = 3.998 + 0.133 x Beauty. The estimated intercept equals the sample mean of the dependent variable (Course_Eval) minus the estimated slope (here 0.133) times the sample mean of the regressor (Beauty). Because Beauty has a sample mean of 0, the estimated intercept equals the sample mean of Course_Eval.

c) Professor Chen has an average value of Beauty, while Professor Mori's value of Beauty is one standard deviation above the average. Predict Professor Mori's and Professor Chen's course evaluations.

Standard deviation of Beauty = 0.789
Professor Chen -> 3.998 + 0.133 x 0 = 3.998
Professor Mori -> 3.998 + 0.133 x (0 + 0.789) = 4.103

d) Comment on the size of the regression's slope. Is the estimated effect of Beauty on Course_Eval large or small? Explain what you mean by "large" and "small". Is the estimated effect a causal effect? Explain.

The estimated effect of Beauty on Course_Eval is small: a one-standard-deviation increase in Beauty raises the predicted course evaluation by only 0.133 x 0.789 = 0.105, a small change relative to the spread of evaluations. The estimated effect is unlikely to be causal, because Beauty is not randomly assigned and may be correlated with omitted determinants of evaluations, so the slope measures association rather than a causal effect.

e) Does Beauty explain a large fraction of the variance in evaluations across courses? Explain.

The R^2 is 0.036, meaning Beauty explains only 3.6% of the variance in course evaluations. Beauty therefore explains only a small fraction of the variance in evaluations across courses.
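The claim in part (b), that the intercept equals the sample mean of the outcome when the regressor has mean zero, and the predictions in part (c) can both be checked numerically; synthetic data stand in for the real sample in the first half, while the second half uses the reported coefficients.

```python
import numpy as np

# Part (b): with a mean-zero regressor, the OLS intercept equals mean(y).
rng = np.random.default_rng(2)
beauty = rng.normal(size=463)
beauty -= beauty.mean()                          # force sample mean to 0
y = 4.0 + 0.13 * beauty + rng.normal(scale=0.5, size=463)
slope = np.cov(beauty, y, ddof=1)[0, 1] / np.var(beauty, ddof=1)
intercept = y.mean() - slope * beauty.mean()     # second term vanishes
assert abs(intercept - y.mean()) < 1e-10

# Part (c): predictions from the reported fit
# Course_Eval = 3.998 + 0.133 * Beauty, with sd(Beauty) = 0.789.
sd_beauty = 0.789
chen = 3.998 + 0.133 * 0.0          # average Beauty
mori = 3.998 + 0.133 * sd_beauty    # one standard deviation above average
print(f"Chen: {chen:.3f}, Mori: {mori:.3f}")  # Chen: 3.998, Mori: 4.103
```

The assertion in the middle is the algebraic point of part (b): intercept = mean(y) - slope * mean(x), and the second term is zero when mean(x) = 0.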
f) In order to control for course characteristics and professor characteristics, we regress Course_Eval on Beauty with several control variables in the regression specification, including Intro, OneCredit, Female, Minority, and NNEnglish. Based on this multiple regression, what is the effect of Beauty on Course_Eval? Does the simple regression in part (b) include omitted variable bias?

Based on the multiple regression, the estimated effect of Beauty on Course_Eval is 0.166: holding the controls fixed, a one-unit increase in Beauty is associated with a 0.166-point increase in the evaluation. The regression explains around 15.5% of the variance in Course_Eval, and the standard errors are higher because they are corrected for heteroskedasticity. The coefficient of multiple correlation is R = 0.393, indicating a weak positive relationship between the predicted and observed data. The simple regression in part (b) does include omitted variable bias, because it omits several significant determinants of Course_Eval (such as Female and OneCredit).

g) Professor Smith is an average-looking black male whose native language is English. He teaches 3 credits of advanced courses. Predict Professor Smith's class evaluations.

Course_Eval = 4.068 + 0.166 Beauty - 0.173 Female - 0.166 Minority + 0.635 OneCredit - 0.244 NNEnglish + 0.011 Intro

For Professor Smith, Beauty = 0 (average looking), Female = 0, Minority = 1, OneCredit = 0, NNEnglish = 0, and Intro = 0, so his predicted class evaluation is 4.068 - 0.166 = 3.902.
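The part (g) prediction can be written out explicitly by plugging Professor Smith's characteristics into the multiple-regression equation, using the coefficients reported above.

```python
# Coefficients reported for the multiple regression in parts (f)-(g).
coef = {"beauty": 0.166, "female": -0.173, "minority": -0.166,
        "onecredit": 0.635, "nnenglish": -0.244, "intro": 0.011}
intercept = 4.068

# Professor Smith: average-looking (Beauty = 0) black male (Minority = 1),
# native English speaker, 3-credit advanced course (OneCredit = Intro = 0).
smith = {"beauty": 0.0, "female": 0, "minority": 1,
         "onecredit": 0, "nnenglish": 0, "intro": 0}

pred = intercept + sum(coef[k] * v for k, v in smith.items())
print(f"predicted evaluation: {pred:.3f}")  # predicted evaluation: 3.902
```

Only the intercept and the Minority coefficient contribute, since every other regressor is zero for Professor Smith.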