EC338: Assignment 1
Adrian Pulchny

rm(list=ls())

Section A

Question 1: $\hat{V}_{const} = \hat{V}_{homosk}$

We start by showing that $\sum_{i=1}^{N}(W_i-\bar W)^2 = N\bar W(1-\bar W)$. Since $W_i$ is binary, $W_i^2 = W_i$ and $\sum_i W_i = N\bar W$, so
$$\sum_{i=1}^{N}(W_i-\bar W)^2 = \sum_{i=1}^{N}(W_i^2 - 2W_i\bar W + \bar W^2) = N\bar W - 2\bar W\, N\bar W + N\bar W^2 = N\bar W(1-\bar W).$$

We now turn to $\hat{V}_{const} = s^2\left(\frac{1}{N_c}+\frac{1}{N_t}\right)$. Plugging in
$$s^2 = \frac{1}{N-2}\Big(s_c^2(N_c-1) + s_t^2(N_t-1)\Big) = \frac{1}{N-2}\Big(\sum_{i:W_i=0}(Y_i^{obs}-\bar Y_c^{obs})^2 + \sum_{i:W_i=1}(Y_i^{obs}-\bar Y_t^{obs})^2\Big)$$
and using $\frac{1}{N_c}+\frac{1}{N_t} = \frac{N}{N_c N_t}$, putting it all together gives
$$\hat{V}_{const} = \frac{\frac{1}{N-2}\Big(\sum_{i:W_i=0}(Y_i^{obs}-\bar Y_c^{obs})^2 + \sum_{i:W_i=1}(Y_i^{obs}-\bar Y_t^{obs})^2\Big)}{\frac{N_c N_t}{N}}.$$
Now we can manipulate the denominator. Since $\bar W = N_t/N$ and $1-\bar W = N_c/N$,
$$\frac{N_c N_t}{N} = N\,\frac{N_c}{N}\,\frac{N_t}{N} = N\bar W(1-\bar W) = \sum_{i=1}^{N}(W_i-\bar W)^2,$$
so in the end
$$\hat{V}_{const} = \frac{\frac{1}{N-2}\Big(\sum_{i:W_i=0}(Y_i^{obs}-\bar Y_c^{obs})^2 + \sum_{i:W_i=1}(Y_i^{obs}-\bar Y_t^{obs})^2\Big)}{\sum_{i=1}^{N}(W_i-\bar W)^2}.$$

Now we switch to $\hat{V}_{homosk}$. Because its denominator is the same, I focus on the numerator and plug in the OLS residuals:
$$\frac{1}{N-2}\sum_{i=1}^{N}\hat\epsilon_i^2 = \frac{1}{N-2}\sum_{i=1}^{N}\Big(Y_i^{obs} - \bar Y^{obs} - (\bar Y_t^{obs}-\bar Y_c^{obs})(W_i-\bar W)\Big)^2.$$
Using $\bar Y^{obs} = (1-\bar W)\bar Y_c^{obs} + \bar W\bar Y_t^{obs}$, splitting the sum by treatment status and multiplying out, the terms in $\bar W$ cancel: for control observations the residual reduces to $Y_i^{obs}-\bar Y_c^{obs}$ and for treated observations to $Y_i^{obs}-\bar Y_t^{obs}$. Including the denominator again, we are left with
$$\hat{V}_{homosk} = \frac{\frac{1}{N-2}\Big(\sum_{i:W_i=0}(Y_i^{obs}-\bar Y_c^{obs})^2 + \sum_{i:W_i=1}(Y_i^{obs}-\bar Y_t^{obs})^2\Big)}{\sum_{i=1}^{N}(W_i-\bar W)^2},$$
which means that $\hat{V}_{homosk} = \hat{V}_{const}$.

Question 2: $\hat{V}_{neyman} = \hat{V}_{hetero}$

Here $\hat{V}_{neyman} = \frac{\tilde s_c^2}{N_c} + \frac{\tilde s_t^2}{N_t}$ and
$$\hat{V}_{hetero} = \frac{\sum_{i=1}^{N}\hat\epsilon_i^2 (W_i-\bar W)^2}{\Big(\sum_{i=1}^{N}(W_i-\bar W)^2\Big)^2} = \frac{\sum_{i=1}^{N}(Y_i^{obs} - \hat\alpha^{OLS} - \hat\beta^{OLS}W_i)^2 (W_i-\bar W)^2}{\Big(\sum_{i=1}^{N}(W_i-\bar W)^2\Big)^2}.$$
The denominator can be written as
$$\Big(\sum_{i=1}^{N}(W_i-\bar W)^2\Big)^2 = \big(N\bar W(1-\bar W)\big)^2 = \Big(N\,\frac{N_t}{N}\,\frac{N_c}{N}\Big)^2 = \frac{N_t^2 N_c^2}{N^2}.$$
For the numerator we use the same residual expressions as in Question 1: for control observations $\hat\epsilon_i = Y_i^{obs}-\bar Y_c^{obs}$ and $(W_i-\bar W)^2 = \bar W^2 = N_t^2/N^2$; for treated observations $\hat\epsilon_i = Y_i^{obs}-\bar Y_t^{obs}$ and $(W_i-\bar W)^2 = (1-\bar W)^2 = N_c^2/N^2$. Plugging in numerator and denominator, the common factor cancels completely and the expression shortens to
$$\hat{V}_{hetero} = \frac{1}{N_c^2}\sum_{i:W_i=0}(Y_i^{obs}-\bar Y_c^{obs})^2 + \frac{1}{N_t^2}\sum_{i:W_i=1}(Y_i^{obs}-\bar Y_t^{obs})^2 = \frac{\tilde s_c^2}{N_c} + \frac{\tilde s_t^2}{N_t}.$$
We have now shown that $\hat{V}_{neyman} = \hat{V}_{hetero}$.

Question 3

Under homoskedasticity the variances in the control and treated group are the same, so we can use a more precise estimator by pooling the variance and weighting it with $N_c$ and $N_t$. This is not the case when treatment effects are heterogeneous: the variances of the two groups differ, so we cannot build a pooled variance as in the homogeneous case and need to use the Neyman variance estimator instead. How does this relate to the treatment effect? In the homoskedastic case there is no difference between the variances of the control and treated group, which is consistent with every unit receiving the same treatment effect. Under heterogeneity the control and treatment group have different variances, which reflects treatment effects that differ across units.
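As a quick numerical cross-check of both identities, one can compare the two-sample formulas with the variance estimates that lm() reports. This is only an illustrative sketch on simulated data; it assumes the sandwich package is available for the HC0 (heteroskedasticity-robust) variance.

set.seed(1)
N <- 200
W <- rbinom(N, 1, 0.4)
Y <- 1 + 0.5 * W + rnorm(N, sd = 1 + W)   # heteroskedastic on purpose
fit <- lm(Y ~ W)
Nc <- sum(W == 0); Nt <- sum(W == 1)
Yc <- Y[W == 0];   Yt <- Y[W == 1]
# Question 1: pooled-variance formula vs. homoskedastic OLS variance of the W coefficient
s2 <- (sum((Yc - mean(Yc))^2) + sum((Yt - mean(Yt))^2)) / (N - 2)
all.equal(s2 * (1 / Nc + 1 / Nt), vcov(fit)["W", "W"])                # TRUE
# Question 2: Neyman formula (with tilde-s dividing by group size) vs. HC0 robust variance
V_neyman <- mean((Yc - mean(Yc))^2) / Nc + mean((Yt - mean(Yt))^2) / Nt
all.equal(V_neyman, sandwich::vcovHC(fit, type = "HC0")["W", "W"])    # TRUE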
Section B

Our effect size is $\tau/s = 0.207$, i.e. $\tau = 0.207\,s$. Using the formula we were given:
$$P\Big(Z < z_{\alpha/2} - \tfrac{\tau_0}{se(\hat\tau)} \,\Big|\, \tau=\tau_0\Big) + P\Big(Z > z_{1-\alpha/2} - \tfrac{\tau_0}{se(\hat\tau)} \,\Big|\, \tau=\tau_0\Big) = 1 - P\Big(Z < z_{1-\alpha/2} - \tfrac{\tau_0}{se(\hat\tau)} \,\Big|\, \tau=\tau_0\Big) + P\Big(Z < z_{\alpha/2} - \tfrac{\tau_0}{se(\hat\tau)} \,\Big|\, \tau=\tau_0\Big).$$

If we assume that attrition is independent of treatment status, we get $N_t = 320$ and $N_c = 480$. We can now plug in all the values and compute the power. Here I use the normal distribution; $\Phi$ denotes its CDF:
$$1 - \Phi\Bigg(1.96 - \frac{0.207\,s}{\sqrt{s^2\big(\tfrac{1}{N_c}+\tfrac{1}{N_t}\big)}}\Bigg) + \Phi\Bigg(-1.96 - \frac{0.207\,s}{\sqrt{s^2\big(\tfrac{1}{N_c}+\tfrac{1}{N_t}\big)}}\Bigg) = 1 - \Phi\Bigg(1.96 - \frac{0.207}{\sqrt{\tfrac{1}{320}+\tfrac{1}{480}}}\Bigg) + \Phi\Bigg(-1.96 - \frac{0.207}{\sqrt{\tfrac{1}{320}+\tfrac{1}{480}}}\Bigg)$$
$$= 1 - \Phi(-0.91) + \Phi(-4.83) \approx 1 - 0.1814 + 0 = 0.8186 \approx 0.82.$$

We can also show this result using an R package. We have a two-sided test with different sample sizes:

pwr.2p2n.test(h = 0.207, n1 = 320, n2 = 480, sig.level = 0.05, power = NULL,
              alternative = "two.sided")

##      difference of proportion power calculation for binomial distribution (arcsine transformation)
##
##               h = 0.207
##              n1 = 320
##              n2 = 480
##       sig.level = 0.05
##           power = 0.818144
##     alternative = two.sided
##
## NOTE: different sample sizes
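The same number can also be reproduced directly from the normal approximation used in the hand calculation, as a small sanity check:

se_ratio <- sqrt(1/320 + 1/480)                                        # se(tau-hat)/s with Nt = 320, Nc = 480
1 - pnorm(1.96 - 0.207 / se_ratio) + pnorm(-1.96 - 0.207 / se_ratio)   # approx. 0.82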
Section C

Set observations:

data_generation <- data.frame(matrix(ncol = 0, nrow = 1000))

Set parameters:

y0 <- 1.2
y1 <- 0.015
y2 <- -0.02
y3 <- -0.01

Set seed:

set.seed(333)

Generate covariates and error term:

data_generation$error  <- rnorm(1000, 0, 0.55)
data_generation$age    <- floor(46 * runif(1000) + 20)
data_generation$female <- rbinom(1000, 1, 0.5 - (0.25 * log(data_generation$age - 19)) / log(46))

Generate Y0:

data_generation$yi0 <- y0 + y1 * data_generation$age + y2 * data_generation$female +
  y3 * data_generation$age * data_generation$female + data_generation$error

Create the heterogeneous treatment effect:

data_generation$tau <- rnorm(1000, mean = 0.02 + 0.06 * ifelse(data_generation$age > 43, 1, 0), sd = 0.01)

Treatment status:

data_generation$treat <- rbinom(1000, 1, 0.25 + (0.5 * log(data_generation$age - 19)) / log(46))

Generate the observed outcome:

data_generation$yobs <- data_generation$yi0 + data_generation$treat * data_generation$tau

Estimating the model 1000 times:

set.seed(100)
mat_1 <- replicate(1000, {
  df <- data.frame(matrix(ncol = 0, nrow = 1000))
  df$error  <- rnorm(1000, 0, 0.55)
  df$age    <- floor(46 * runif(1000) + 20)
  df$female <- rbinom(1000, 1, 0.5 - (0.25 * log(df$age - 19)) / log(46))
  df$yi0    <- y0 + y1 * df$age + y2 * df$female + y3 * df$age * df$female + df$error
  df$tau    <- rnorm(1000, mean = 0.02 + 0.06 * ifelse(df$age > 43, 1, 0), sd = 0.01)
  df$treat  <- rbinom(1000, 1, 0.25 + (0.5 * log(df$age - 19)) / log(46))
  df$yobs   <- df$yi0 + df$treat * df$tau
  lmodel <- glm(treat ~ age, family = binomial(logit), data = df)
  prop   <- predict.glm(lmodel, newdata = df, type = "response")
  lambda <- 1 / (prop^df$treat * (1 - prop)^(1 - df$treat))
  df$agesat <- as.factor(df$age)
  reg1 <- lm(yobs ~ treat, data = df)
  reg2 <- lm(yobs ~ treat + age + female + age * female, data = df)
  reg3 <- lm(yobs ~ treat + agesat + 0, data = df)
  reg4 <- lm(yobs ~ treat, weights = lambda, data = df)
  reg5 <- lm(yobs ~ treat + agesat, weights = lambda, data = df)
  coef <- c(reg1$coefficients[2], reg2$coefficients[2], reg3$coefficients[1],
            reg4$coefficients[2], reg5$coefficients[2])
}, simplify = "array")
mat_2 <- t(mat_1)
colnames(mat_2) <- c("beta1", "beta2", "beta3", "beta4", "beta5")

summary(mat_2)

##      beta1             beta2              beta3              beta4              beta5
##  Min.   :0.01521   Min.   :-0.05270   Min.   :-0.07374   Min.   :-0.06858   Min.   :-0.07243
##  1st Qu.:0.10559   1st Qu.: 0.02056   1st Qu.: 0.01911   1st Qu.: 0.01934   1st Qu.: 0.02099
##  Median :0.13182   Median : 0.04734   Median : 0.04643   Median : 0.04695   Median : 0.04933
##  Mean   :0.13259   Mean   : 0.04645   Mean   : 0.04655   Mean   : 0.04644   Mean   : 0.04903
##  3rd Qu.:0.15817   3rd Qu.: 0.06977   3rd Qu.: 0.07328   3rd Qu.: 0.07178   3rd Qu.: 0.07591
##  Max.   :0.26449   Max.   : 0.16964   Max.   : 0.16815   Max.   : 0.16847   Max.   : 0.17550

describe(mat_2)

##       vars    n mean   sd median trimmed  mad   min  max range skew kurtosis se
## beta1    1 1000 0.13 0.04   0.13    0.13 0.04  0.02 0.26  0.25 0.14    -0.01  0
## beta2    2 1000 0.05 0.04   0.05    0.05 0.04 -0.05 0.17  0.22 0.12    -0.05  0
## beta3    3 1000 0.05 0.04   0.05    0.05 0.04 -0.07 0.17  0.24 0.07    -0.17  0
## beta4    4 1000 0.05 0.04   0.05    0.05 0.04 -0.07 0.17  0.24 0.11    -0.11  0
## beta5    5 1000 0.05 0.04   0.05    0.05 0.04 -0.07 0.18  0.25 0.05    -0.16  0

ggplot(as.data.frame(mat_2), aes(x = beta1)) +
  geom_density(aes(x = beta1, color = "Beta1"), size = 1) +
  geom_density(aes(x = beta2, color = "Beta2"), size = 1) +
  geom_density(aes(x = beta3, color = "Beta3"), size = 1) +
  geom_density(aes(x = beta4, color = "Beta4"), size = 1) +
  geom_density(aes(x = beta5, color = "Beta5"), size = 1) +
  geom_vline(xintercept = 0.05) +
  labs(title = "kernel density plot")

[Figure: kernel density plot of the beta1-beta5 estimates with a vertical line at the true effect of 0.05.]

It can clearly be seen that something is wrong with Model 1: it overestimates the treatment effect, which was specified to be 0.05 but is on average 0.13 in this model. We have an omitted variable bias (OVB) in Model 1. In this case we can determine where the OVB comes from, because the omitted variables (age and female) are observed; we only need to figure out which variable drives it. An OVB exists when the following two conditions are fulfilled: first, the omitted variable must be a determinant of $Y^{obs}$, meaning its coefficient should not be 0; second, the omitted variable must be correlated with the included regressor. If both conditions are fulfilled, we have an OVB.
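The size of this bias can also be read off from the standard omitted-variable-bias decomposition, which holds exactly in-sample for OLS: the coefficient on treat in the short regression equals the coefficient in the long regression plus the coefficient of the omitted variable times the slope from regressing the omitted variable on treat. A minimal sketch using the data_generation data frame built above, with age as the candidate omitted variable:

short <- lm(yobs ~ treat, data = data_generation)                # omits age
long  <- lm(yobs ~ treat + age, data = data_generation)          # includes age
aux   <- lm(age ~ treat, data = data_generation)                 # auxiliary regression
coef(short)["treat"]                                             # biased short coefficient
coef(long)["treat"] + coef(long)["age"] * coef(aux)["treat"]     # identical by construction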
We will now establish where the OVB comes from:

set.seed(1000)
mat_11 <- replicate(1000, {
  dff <- data.frame(matrix(ncol = 0, nrow = 1000))
  dff$error  <- rnorm(1000, 0, 0.55)
  dff$age    <- floor(46 * runif(1000) + 20)
  dff$female <- rbinom(1000, 1, 0.5 - (0.25 * log(dff$age - 19)) / log(46))
  dff$yi0    <- y0 + y1 * dff$age + y2 * dff$female + y3 * dff$age * dff$female + dff$error
  dff$tau    <- rnorm(1000, mean = 0.02 + 0.06 * ifelse(dff$age > 43, 1, 0), sd = 0.01)
  dff$treat  <- rbinom(1000, 1, 0.25 + (0.5 * log(dff$age - 19)) / log(46))
  dff$yobs   <- dff$yi0 + dff$treat * dff$tau
  reg6 <- lm(yobs ~ treat, data = dff)
  reg7 <- lm(yobs ~ treat + age, data = dff)
  reg8 <- lm(yobs ~ treat + female, data = dff)
  coef <- c(reg6$coefficients[2], reg7$coefficients[2], reg8$coefficients[2])
}, simplify = "array")
mat_22 <- t(mat_11)
colnames(mat_22) <- c("beta1", "beta2", "beta3")

summary(mat_22)

##      beta1             beta2              beta3
##  Min.   :0.02103   Min.   :-0.07710   Min.   :0.02077
##  1st Qu.:0.10646   1st Qu.: 0.01969   1st Qu.:0.09595
##  Median :0.13395   Median : 0.04689   Median :0.12071
##  Mean   :0.13359   Mean   : 0.04657   Mean   :0.12130
##  3rd Qu.:0.16023   3rd Qu.: 0.07391   3rd Qu.:0.14738
##  Max.   :0.25837   Max.   : 0.15132   Max.   :0.22218

describe(mat_22)

##       vars    n mean   sd median trimmed  mad   min  max range  skew kurtosis se
## beta1    1 1000 0.13 0.04   0.13    0.13 0.04  0.02 0.26  0.24 -0.01    -0.27  0
## beta2    2 1000 0.05 0.04   0.05    0.05 0.04 -0.08 0.15  0.23 -0.02    -0.19  0
## beta3    3 1000 0.12 0.04   0.12    0.12 0.04  0.02 0.22  0.20 -0.05    -0.26  0

ggplot(as.data.frame(mat_22), aes(x = beta1)) +
  geom_density(aes(x = beta1, color = "Beta1"), size = 1) +
  geom_density(aes(x = beta2, color = "Beta2"), size = 1) +
  geom_density(aes(x = beta3, color = "Beta3"), size = 1) +
  geom_vline(xintercept = 0.05) +
  labs(title = "kernel density plot")

[Figure: kernel density plot of the beta1-beta3 estimates with a vertical line at the true effect of 0.05.]

We can see that including age as a covariate reduces the OVB substantially: the average treatment effect from beta2 is 0.04657, very close to the true treatment effect of 5%. The third regression is also interesting, because including female shifts the distribution slightly towards the true parameter of 0.05. The reason the second specification (with age) and the third (with female) behave so differently lies in their different correlations with the treat variable. We can compute those:

cor(data_generation$treat, data_generation$age)
## [1] 0.1616059
cor(data_generation$treat, data_generation$female)
## [1] 0.007368619
cor(data_generation$yobs, data_generation$female)
## [1] -0.359865
cor(data_generation$yobs, data_generation$age)
## [1] 0.3136586

Age is strongly correlated with the treatment variable, which is logical because we defined the treatment probability as a function of age. Female, by contrast, is essentially uncorrelated with treat, although it is a determinant of $Y^{obs}$. So age is what causes the OVB and we should include it in the model as a covariate. There is one case where we would not include the age variable: if age were itself an outcome of the model, i.e. if the treatment were causal for age.

I will now go back and interpret the other four models (reg2 to reg5). The second regression matches the CEF of $Y_i^{obs}$; it is the correctly specified model, and the plot shows that it yields an ATE of 5%. Model 3 is a saturated model in age; it is very similar to the second model but lacks the female variable. A lot of the variation in female is, however, picked up by the age dummies, because the two variables are correlated with each other. Model 4 is interesting because it uses inverse probability weights: in Model 1 we saw that not including age leads to an OVB, while here we also do not include age yet obtain no OVB. The reason is that inverse probability weighting can remove the confounding by age through the propensity score, and therefore removes the bias of the unweighted estimator. The last model is the saturated model with inverse probability weighting. As seen before, Model 4 already delivers an ATE of 5%, and adding the additional age dummies in Model 5 does not improve the estimate.
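A minimal sketch of the weighting idea behind Model 4, using the data_generation data frame and the same logit propensity model: the inverse-probability-weighted difference in means recovers the ATE without including age as a regressor. The weighted regression in reg4 corresponds to the normalized (Hajek) version of this estimator, so the two numbers are close but not identical in finite samples.

pmod <- glm(treat ~ age, family = binomial(logit), data = data_generation)
e <- predict(pmod, type = "response")           # estimated propensity score
w <- data_generation$treat
y <- data_generation$yobs
mean(w * y / e) - mean((1 - w) * y / (1 - e))   # Horvitz-Thompson style IPW estimate of the ATE
lambda <- 1 / (e^w * (1 - e)^(1 - w))           # the weights used in reg4
coef(lm(y ~ w, weights = lambda))["w"]          # normalized (Hajek) counterpart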
Section D

a)

The minimum wage raise in Ontario yields a natural treated and control group. The treated group is the province of Ontario, which experiences a large increase in its minimum wage. The control group consists of neighbouring provinces, or potentially other Canadian provinces. It is important to find a good control group that exhibits the same characteristics as Ontario. The problem is that assignment to treatment is not determined randomly but targets low-skilled and seasonal workers who earn the minimum wage. This can lead to bias, because the treated group may not be directly comparable to the control group.

Is SUTVA fulfilled? We need to ask whether there are spillover effects between the treated and control group. An increase in the minimum wage could lead to spillovers if low-skilled workers from neighbouring provinces moved to Ontario to profit from the increase. We also need to ask whether there are any hidden variations in the treatment level. There is variation, because the wages of liquor servers and workers under 18 got a different raise, but this is documented, so we do not have to worry about it. There is one further problem: the new minimum wage rates increase every year on October 1 and are usually announced in April, but in this case the increase was announced in June 2017. This means there could be pre-emptive behaviour; for example, business owners could react to the upcoming October change by firing some of their workers in advance. This would happen in the pre-treatment period and could therefore bias our treatment effect. Furthermore, we would ideally want a control group that is not treated at all, but every province received some increase in its minimum wage.

b)

Reading, filtering and grouping the data:

data <- read_dta("lfs_2010_2019_ages1564_20per.dta")
df <- data %>%
  filter(agegrp >= 1, agegrp < 3, year >= 2010, year <= 2019,
         province >= 5, province <= 7, empstat >= 1, empstat <= 4)
dff <- df %>% group_by(province, empstat, year) %>% summarise(n = n())
dfg <- df %>% group_by(province, empstat, year, month) %>% summarise(n = n())

Let us compute the yearly employment rates and plot them as a time series:

employment_rates1 <- numeric(10)
for (i in 1:10) {
  employment_rates1[i] <- (dff[i, 4] + dff[10 + i, 4]) /
    (dff[i, 4] + dff[10 + i, 4] + dff[20 + i, 4] + dff[30 + i, 4])
}
employment_rates2 <- numeric(10)
for (i in 1:10) {
  employment_rates2[i] <- (dff[40 + i, 4] + dff[50 + i, 4]) /
    (dff[40 + i, 4] + dff[50 + i, 4] + dff[60 + i, 4] + dff[70 + i, 4])
}
employment_rates3 <- numeric(10)
for (i in 1:10) {
  employment_rates3[i] <- (dff[80 + i, 4] + dff[90 + i, 4]) /
    (dff[80 + i, 4] + dff[90 + i, 4] + dff[100 + i, 4] + dff[110 + i, 4])
}
employment_ratesc <- data.frame(matrix(ncol = 0, nrow = 10))
employment_ratesc$QUE <- employment_rates1
employment_ratesc$ONT <- employment_rates2
employment_ratesc$MAN <- employment_rates3
empy <- ts(employment_ratesc, frequency = 1, start = 2010)
dygraph(empy) %>% dyOptions(axisLineWidth = 2.0)

[Figure: yearly employment rates for QUE, ONT and MAN, 2010-2019, roughly between 0.5 and 0.65.]

Now we compute the monthly employment rates and plot them:

employment_ratesque <- numeric(120)
for (i in 1:120) {
  employment_ratesque[i] <- (dfg[i, 5] + dfg[120 + i, 5]) /
    (dfg[i, 5] + dfg[120 + i, 5] + dfg[240 + i, 5] + dfg[360 + i, 5])
}
employment_ratesont <- numeric(120)
for (i in 1:120) {
  employment_ratesont[i] <- (dfg[480 + i, 5] + dfg[600 + i, 5]) /
    (dfg[480 + i, 5] + dfg[600 + i, 5] + dfg[720 + i, 5] + dfg[840 + i, 5])
}
employment_ratesman <- numeric(120)
for (i in 1:120) {
  employment_ratesman[i] <- (dfg[960 + i, 5] + dfg[1080 + i, 5]) /
    (dfg[960 + i, 5] + dfg[1080 + i, 5] + dfg[1200 + i, 5] + dfg[1320 + i, 5])
}
monthly_emp_rates <- data.frame(matrix(ncol = 0, nrow = 120))
monthly_emp_rates$QUE <- employment_ratesque
monthly_emp_rates$ONT <- employment_ratesont
monthly_emp_rates$MAN <- employment_ratesman
empmonth <- ts(monthly_emp_rates, frequency = 12, start = c(2010, 1))
dygraph(empmonth) %>% dyOptions(axisLineWidth = 2.0)

[Figure: monthly employment rates for QUE, ONT and MAN, January 2010 to December 2019, roughly between 0.5 and 0.7.]
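For reference, the same rates can be computed without positional indexing into dff and dfg; a compact dplyr sketch, assuming (as in the code above) that empstat values 1 and 2 count as employed:

emp_yearly <- df %>%
  group_by(province, year) %>%
  summarise(emp_rate = mean(empstat <= 2), .groups = "drop")
emp_monthly <- df %>%
  group_by(province, year, month) %>%
  summarise(emp_rate = mean(empstat <= 2), .groups = "drop")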
Looking at the first graph, we have to say that the parallel trends assumption does not really hold. From 2010 to 2013 it does not look bad, but from 2013 to 2014, and already from 2012 to 2013, Manitoba follows a different trend. For Quebec, parallel trends is also questionable: especially in 2017, the year before the treatment, we see a large difference in trends. If we switch to the more detailed second graph with monthly employment rates, however, both provinces experience trends similar to Ontario's. All three provinces show large spikes in employment rates, the first in May, the second in June and the third in July, and downward spikes in August and September. The remaining months look less clean, with occasional deviations, but they are not severe, so one can argue that parallel trends approximately holds; perfectly parallel trends are basically impossible.

Youth employment rises significantly in May, June and July, drops a little in August, and returns in September to the same level as in April. The spike in the employment rate is caused by the summer break in Canada, which starts in June. There is also a mid-winter break in Canada, but its timing varies (sometimes March, sometimes April) and there are no significant changes in employment due to it. The overall trend is clearly positive: the employment rate grows slightly over the years.

c)

Here I decided to use Quebec as the control group. I think Quebec can be a good control group, because we have seen in the monthly employment graph that Quebec and Ontario follow similar trends in employment. I also tried to match on specific characteristics such as population: Quebec is the second largest province, while Ontario is the largest. Furthermore, Ontario has 917,741 square kilometres of land and Quebec has 1,356,128 square kilometres, and most of its population lives in urban areas. I decided against other control groups because, for example, Manitoba has a population of only about 1.3 million while covering a large land area; it consists largely of landscapes and forests with less economic activity and relies heavily on agriculture. It is also important to note that Quebec itself got a minimum wage increase in 2018, which can make it a poor control group, because we would ideally want a never-treated control group; but none of the provinces fulfils this requirement. Manitoba's minimum wage increase is minimal, but in my opinion its characteristics are simply not suitable, since, as already mentioned, it relies mostly on agriculture and tourism.

As covariates I included edugrp, because it is correlated with the outcome and the regressor and is not an outcome of the model, and efamtype (family type), because it is also correlated with our outcome and the regressor.
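One way to sharpen this visual check is to plot the gap between Ontario and Quebec directly; under parallel trends the pre-2018 gap should be roughly flat. A small sketch, using the emp_monthly helper from the dplyr chunk above (an assumption; province 5 is Quebec, 6 is Ontario):

que <- emp_monthly %>% filter(province == 5) %>% arrange(year, month)
ont <- emp_monthly %>% filter(province == 6) %>% arrange(year, month)
gap <- ts(ont$emp_rate - que$emp_rate, frequency = 12, start = c(2010, 1))
plot(gap, ylab = "ONT - QUE employment rate")
abline(v = 2018, lty = 2)   # treatment year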
dv <- data %>%
  filter(year >= 2017, year <= 2018, province == 6 | province == 5, agegrp >= 1, agegrp < 3)
dv$yobs <- ifelse(dv$empstat <= 2, 1, 0)
dv$ontariodummy <- ifelse(dv$province == 6, 1, 0)
dv$timedummy <- ifelse(dv$year == 2018, 1, 0)
dv$month <- as.factor(dv$month)
reg11 <- lm(yobs ~ ontariodummy * timedummy, data = dv)
reg22 <- lm(yobs ~ ontariodummy * timedummy + month, data = dv)
reg33 <- lm(yobs ~ ontariodummy * timedummy + month + edugrp + efamtype, data = dv)

cov(dv$efamtype, dv$ontariodummy * dv$timedummy)
## [1] 0.08276152

coef <- c(reg11$coefficients[4], reg22$coefficients[15], reg33$coefficients[17])
mat <- matrix(coef)
tmat <- t(mat)
colnames(tmat) <- c("beta1", "beta2", "beta3")
table(tmat)
## tmat
## -0.0456354130768801 -0.0456127081540193 -0.0420036810225033
##                   1                   1                   1

We can see that there is a noticeable difference when including the covariates, especially family type, because its covariance with the interaction term is relatively high. After including family type, the estimated beta declines considerably in absolute value, which may indicate a failure of parallel trends.
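Since reg11 contains only the two dummies and their interaction, its interaction coefficient is numerically identical to the simple 2x2 difference-in-differences of group means, which can be verified directly:

m <- with(dv, tapply(yobs, list(ontariodummy, timedummy), mean))   # 2 x 2 table of group means
(m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])          # DiD of means
coef(reg11)["ontariodummy:timedummy"]                              # same number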
d)

db <- data %>%
  filter(year >= 2014, year <= 2019, province == 6 | province == 5, agegrp == 1 | agegrp == 2)
db$yobs <- ifelse(db$empstat <= 2, 1, 0)
db$ontariodummy <- ifelse(db$province == 6, 1, 0)
db$dynamic1 <- ifelse(db$year == 2014, 1, 0)
db$dynamic2 <- ifelse(db$year == 2015, 1, 0)
db$dynamic3 <- ifelse(db$year == 2016, 1, 0)
db$dynamic4 <- ifelse(db$year == 2018, 1, 0)
db$dynamic5 <- ifelse(db$year == 2019, 1, 0)
db$month <- as.factor(db$month)
reg100 <- lm(yobs ~ ontariodummy * (dynamic1 + dynamic2 + dynamic3 + dynamic4 + dynamic5) + month,
             data = db)
matdyn <- data.frame(matrix(ncol = 0, nrow = 6))
matdyn$year <- 2014:2019

reg100$coefficients

##           (Intercept)          ontariodummy              dynamic1              dynamic2
##           0.541158416          -0.038948549          -0.004995016           0.012667454
##              dynamic3              dynamic4              dynamic5                month2
##           0.010348675           0.046227427           0.054705267           0.001172655
##                month3                month4                month5                month6
##           0.009290986           0.004596648           0.058852933           0.104667802
##                month7                month8                month9               month10
##           0.140261242           0.118475538           0.029928940           0.032917646
##               month11               month12 ontariodummy:dynamic1 ontariodummy:dynamic2
##           0.022443061           0.028307612          -0.009550772          -0.035321280
## ontariodummy:dynamic3 ontariodummy:dynamic4 ontariodummy:dynamic5
##          -0.021708430          -0.045609421          -0.061449180

matdyn$beta <- c(coef(reg100)[19], coef(reg100)[20], coef(reg100)[21], 0,
                 coef(reg100)[22], coef(reg100)[23])
matdyn$se <- c(coef(summary(reg100))[19, 2], coef(summary(reg100))[20, 2], coef(summary(reg100))[21, 2], 0,
               coef(summary(reg100))[22, 2], coef(summary(reg100))[23, 2])
matdyn$conf <- qnorm(0.975) * matdyn$se
matdyn

##   year         beta         se       conf
## 1 2014 -0.009550772 0.01155037 0.02263830
## 2 2015 -0.035321280 0.01177449 0.02307757
## 3 2016 -0.021708430 0.01186172 0.02324855
## 4 2017  0.000000000 0.00000000 0.00000000
## 5 2018 -0.045609421 0.01209383 0.02370346
## 6 2019 -0.061449180 0.01200575 0.02353083

summary(reg100)

## Call:
## lm(formula = yobs ~ ontariodummy * (dynamic1 + dynamic2 + dynamic3 +
##     dynamic4 + dynamic5) + month, data = db)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -0.7361 -0.5305  0.3542  0.4503  0.5204
##
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)
## (Intercept)            0.541158   0.008591  62.990  < 2e-16 ***
## ontariodummy          -0.038949   0.008439  -4.615 3.93e-06 ***
## dynamic1              -0.004995   0.009235  -0.541 0.588576
## dynamic2               0.012667   0.009385   1.350 0.177090
## dynamic3               0.010349   0.009451   1.095 0.273523
## dynamic4               0.046227   0.009635   4.798 1.61e-06 ***
## dynamic5               0.054705   0.009549   5.729 1.01e-08 ***
## month2                 0.001173   0.007955   0.147 0.882804
## month3                 0.009291   0.007973   1.165 0.243928
## month4                 0.004597   0.007947   0.578 0.562990
## month5                 0.058853   0.007972   7.383 1.56e-13 ***
## month6                 0.104668   0.007968  13.136  < 2e-16 ***
## month7                 0.140261   0.007997  17.539  < 2e-16 ***
## month8                 0.118476   0.008012  14.787  < 2e-16 ***
## month9                 0.029929   0.007995   3.743 0.000182 ***
## month10                0.032918   0.007998   4.116 3.86e-05 ***
## month11                0.022443   0.007973   2.815 0.004878 **
## month12                0.028308   0.008013   3.533 0.000411 ***
## ontariodummy:dynamic1 -0.009551   0.011550  -0.827 0.408307
## ontariodummy:dynamic2 -0.035321   0.011774  -3.000 0.002702 **
## ontariodummy:dynamic3 -0.021708   0.011862  -1.830 0.067235 .
## ontariodummy:dynamic4 -0.045609   0.012094  -3.771 0.000163 ***
## ontariodummy:dynamic5 -0.061449   0.012006  -5.118 3.09e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4927 on 91018 degrees of freedom
## Multiple R-squared:  0.01399, Adjusted R-squared:  0.01376
## F-statistic: 58.72 on 22 and 91018 DF,  p-value: < 2.2e-16

ggplot(matdyn, aes(x = year, y = beta)) +
  geom_errorbar(aes(ymin = beta - conf, ymax = beta + conf), width = 0.2) +
  geom_line(col = "red") +
  geom_point(col = "blue") +
  geom_vline(xintercept = 2017.5, linetype = "dotted")

[Figure: event-study plot of the yearly interaction coefficients, 2014-2019, with 2017 normalised to 0, 95% confidence intervals, and a dotted line at 2017.5.]

The policy impact seems reasonable. After such a large minimum wage increase we expect the employment rate to decline. In fact, when Ontario raised its minimum wage many disabled workers lost their jobs and many daycare centres closed, because the cost of childcare rose by 10.6%, which lowered the demand for childcare; businesses that mostly employ low-skilled labour also had to shut down.[1]

The parallel trends assumption does not hold here. If we assumed parallel trends, the pre-treatment coefficients should be flat around zero, and this is clearly not the case: although the confidence intervals in 2014 and 2016 include zero, the 2015 coefficient clearly does not. In the second graph in b) it looked like parallel trends could hold, because the differences between the treatment and control group looked constant; both had spikes and dips at the same time and of similar magnitude. But is it really a failure of parallel trends, or is it pre-emptive behaviour? When the policy was announced in June 2017 the employment rate decreased noticeably, so there is probably some pre-emptive behaviour involved between 2017 and 2018, which contaminates our estimate.

[1] Matthew Lau, "Ontario's Minimum Wage Hike Has Been Disastrous, Especially for Disabled Workers", 2018.

e)

With monthly data we need a new parallel trends assumption. We could use a conditional parallel trends assumption, where we assume parallel trends conditional on the month; in practice we would include month fixed effects to account for the seasonality.

f)

We can use the no-pre-emptive-behaviour (no-anticipation) condition from the lecture slides: for pre-treatment periods, i.e. $j < 0$,
$$E[Y_{it} \mid D_i = 1, t = t_0 + j] = E[Y_{it}(0) \mid D_i = 1, t = t_0 + j].$$

g)

We could split the sample and use the older population, aged roughly 24-60, as a control group. The minimum wage increase mostly affects the employment of young, unskilled and seasonal workers, so the increase should not affect employment rates in those higher age groups. The remaining sample can then be used to trace out the counterfactual; at the end we would run a DiD estimation and could estimate the average treatment effect.
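A sketch of what this could look like in the data used above, with young workers in Ontario as the treated group and older workers in Ontario as the comparison group. The agegrp codes for the older (roughly 25-60) groups are an assumption here and would have to be checked against the LFS codebook; agegrp <= 2 is simply reused as the youth indicator from the earlier chunks, and dg and reg_age are hypothetical names.

dg <- data %>%
  filter(year >= 2017, year <= 2018, province == 6) %>%      # Ontario only
  mutate(yobs  = ifelse(empstat <= 2, 1, 0),
         young = ifelse(agegrp <= 2, 1, 0),                   # young = treated age groups used above
         post  = ifelse(year == 2018, 1, 0),
         month = as.factor(month))
reg_age <- lm(yobs ~ young * post + month, data = dg)
coef(reg_age)["young:post"]                                   # within-Ontario young-vs-old DiD estimate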
h)

I am not really convinced by the estimates in this exercise, because we have seen that parallel trends is questionable here, which leads to a biased estimator. Still, I think the outcome is plausible, and I provide some arguments for that. The minimum wage raise was very large, and the article I cited states that employment of unskilled workers suffered heavily and that businesses which had mostly employed low-skilled workers had to either close or raise their prices. Even if minimum wage increases had no impact on employment, they would still change the economy, as seen in the article, where prices for childcare and housekeeping rose by 10.6%. Furthermore, we have to question whether the findings of the existing minimum wage studies are really comparable with the outcomes in Ontario. Card and Krueger (1994) analyse New Jersey's minimum wage increase from 4.25 to 5.05 dollars, an increase of about 18.8%, and find no significant employment effects in the fast food industry.[2] The Ontario increase was even larger, at about 20%. So do the findings of Card and Krueger have external validity here? Probably not: Canadian provinces differ significantly from New Jersey, so we cannot generalise the findings of that study to the situation in Ontario. Even if my estimate is somewhat off because of the questionable parallel trends assumption, the true coefficient can still be negative.

[2] Card, D. and Krueger, A. B. (1994), "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania", American Economic Review 84(4), 772-793.

Section E

In this section I did not code the exercise in R, because of the incorrect output R generated; I therefore only provide arguments here and executed the code in Stata. I hope this is okay.

In Model 1, NFL (Newfoundland and Labrador) receives a very high weight, which seems odd, because NFL is one of the smallest provinces in Canada, its important industries being financial services, oil and manufacturing. We are essentially constructing a control group that mostly consists of NFL, which is not appropriate from an economic point of view. Inspecting the graph, we can also see that it barely matches our treated unit.

In the second model we get a weight of 1 on NB (New Brunswick), which means the whole control group is just NB. This also seems unreasonable, since NB is similar to NFL in being one of the smallest provinces, with a dominant primary sector. So the second specification is also bad: we would not like to choose NB as the control group.

In the third model we get a very interesting graph, because the trend of our synthetic control group seems to match the trend of Ontario. The three weights fall on NFL, PEI and BC, provinces I would not have chosen as a control group myself, but they seem to match well on the covariates. I still decided to stick with Quebec as my control group; the graphs did not really match Ontario's trend well enough to convince me to switch. In general, however, it can be a good idea to compute a synthetic control group and then use the weights to determine which province best matches Ontario's outcome or covariates; one can then choose that control group in the DiD setting.
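For completeness, a generic sketch of the weight problem a synthetic control solves: choose non-negative weights summing to one so that the weighted donor provinces match Ontario's pre-treatment series. This is only an illustration with made-up data (the actual weights discussed above came from Stata); synth_weights is a hypothetical helper, and the softmax reparametrisation is just one simple way to impose the simplex constraint.

synth_weights <- function(X1, X0) {
  # X1: pre-treatment series for the treated unit; X0: one column per donor unit
  obj <- function(theta) {
    w <- exp(theta) / sum(exp(theta))        # weights are non-negative and sum to one
    sum((X1 - X0 %*% w)^2)
  }
  theta <- optim(rep(0, ncol(X0)), obj, method = "BFGS")$par
  setNames(exp(theta) / sum(exp(theta)), colnames(X0))
}

# Hypothetical example (not the real LFS numbers): the "treated" series is built
# from QUE and MAN, so those two donors should receive most of the weight.
set.seed(1)
X0 <- matrix(rnorm(96 * 3, mean = 0.55, sd = 0.02), ncol = 3,
             dimnames = list(NULL, c("QUE", "MAN", "NB")))
X1 <- 0.5 * X0[, "QUE"] + 0.5 * X0[, "MAN"] + rnorm(96, sd = 0.005)
round(synth_weights(X1, X0), 2)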
The synthetic control method should definitely be considered when there are problems with the parallel trends assumption: by weighting multiple units it generally produces a better-matching pre-trend, whereas the DiD estimate would be biased in this scenario because the parallel trends assumption does not hold.