Marinare Nasser & Aida Tafesse Belachew
Assignment 1 - Econometrics

Our results from Stata:

Questions on Summary statistics:

1. Create a Summary statistic table including variables “classize”, “avgmath”, “avgverb”, and “disadvantaged”. Describe the feature of the variables, e.g., the center and the variability.

From the summary statistic table we calculated the following.

Center: the means of classize, avgmath, avgverb and disadvantaged are 29,92868, 67,29599, 74,37997 and 14,11639.
Mean of the four means: (29,92868 + 67,29599 + 74,37997 + 14,11639) / 4 = 185,72103 / 4 = 46,4302575
Median of the four means: sorted, the two middle values are 29,92868 and 67,29599, so the median is (29,92868 + 67,29599) / 2 = 48,612335
Because the four variables are measured on different scales, these pooled figures are only a rough summary; the center of each variable is described by its own mean.

Variability (range = maximum - minimum):
classize 44 - 5 = 39
avgmath 93,93 - 27,69 = 66,24
avgverb 93,86 - 34,8 = 59,06
disadvantaged 76 - 0 = 76

The variability describes how the values are distributed. Our variability is not extremely high, which suggests that the data points are fairly similar, and with less spread it is easier to make predictions and assumptions about the data. Measured by the range, classize has the least spread and disadvantaged the most, although ranges of variables on different scales are not directly comparable.

The standard deviation describes how dispersed the data are around the mean: the higher the standard deviation, the more spread out the observations; the lower it is, the closer they lie to the mean. Comparing the standard deviations of classize, avgmath and avgverb with their respective means, we see that the standard deviations are smaller than the means. This indicates relatively low dispersion around the mean (a small coefficient of variation, standard deviation / mean), which is consistent with the moderate variability described above.

2. Using the same four variables in question 1, find the correlation for all pairs of variables. Give a short analysis of the correlation (e.g., choose three or four pairs you think are interesting).

disadvantaged is negatively correlated with all the other variables, so a scatter plot of disadvantaged against any of them would show a downward slope. The most notable negative correlation is with avgverb, at -0,6052.
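The arithmetic for question 1 and the correlation coefficient behind question 2 can be re-checked with a short Python sketch. It assumes, as above, that the four numbers are the means of classize, avgmath, avgverb and disadvantaged in the order listed; `pearson` is our own hypothetical helper (Python 3.10+ also offers `statistics.correlation`), and the data in the last lines are hypothetical toy values, not the assignment data.

```python
import statistics

# Means and (min, max) as reported in question 1, read in the order
# classize, avgmath, avgverb, disadvantaged.
means = {
    "classize": 29.92868,
    "avgmath": 67.29599,
    "avgverb": 74.37997,
    "disadvantaged": 14.11639,
}
min_max = {
    "classize": (5, 44),
    "avgmath": (27.69, 93.93),
    "avgverb": (34.8, 93.86),
    "disadvantaged": (0, 76),
}


def value_range(lo, hi):
    """Range = maximum - minimum (depends only on the two most extreme values)."""
    return hi - lo


def pearson(x, y):
    """Pearson correlation: sample covariance / (sd of x * sd of y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (statistics.stdev(x) * statistics.stdev(y))


ranges = {name: round(value_range(*bounds), 2) for name, bounds in min_max.items()}
pooled_mean = statistics.fmean(means.values())      # mean of the four means
pooled_median = statistics.median(means.values())   # sorts, averages the middle two

print(ranges)
print(pooled_mean, pooled_median)
# Hypothetical toy data, only to show the sign and size of r:
print(pearson([1, 2, 3, 4], [2, 4, 5, 8]))   # positive, close to 1
print(pearson([1, 2, 3, 4], [8, 5, 4, 2]))   # negative
```

Sorting before taking the middle two values is what makes the median 48,612335 rather than the average of the two largest means.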
Observations with a higher value of disadvantaged, that is, more students from a disadvantaged background, tend to have lower scores in both avgmath and avgverb. Although the correlation between disadvantaged and avgmath is only low to moderate, it is still negative. Being a correlation, this shows an association, not that a disadvantaged background causes lower scores.

avgmath and avgverb have a strong positive correlation of 0,7807. A scatter plot would show an upward slope: the two variables tend to move in the same direction. This suggests that where math scores are high, verbal (reading) scores also tend to be high, i.e., good performance in one subject tends to go together with good performance in the other.

Questions on the Linear regression model:

1. Run a linear regression model with OLS estimation. Test heteroskedasticity for this model. Explain whether there is a heteroskedasticity problem or not.

When we plot the residuals of the model, they show an unequal spread: the plot has a cone shape, with little spread at first and more spread further out. This pattern is a sign of heteroskedasticity, i.e., the variance of the error term is not constant across observations.

To test this formally we use the LM (Breusch-Pagan) test, regressing the squared OLS residuals on the model's regressors:
H0: α1 = α2 = α3 = 0 (homoskedasticity)
NR² = number of observations × R² of this auxiliary regression, with N = 2019
Degrees of freedom (df) = number of regressors = 4 - 1 = 3 → Table B.7 Critical values of Chi-square: at the 5% level of significance with 3 degrees of freedom the critical value is 7,815.
If NR² > 7,815 we reject H0 and conclude that there is heteroskedasticity; if NR² < 7,815 we fail to reject H0. Equivalently, with N = 2019 we reject H0 whenever the auxiliary R² exceeds 7,815 / 2019 ≈ 0,00387. Note that 0,1628 is the R² of the main regression (see question 2), not of the auxiliary regression, so it is not the right input for this test. The cone-shaped residual plot indicates a heteroskedasticity problem, which is why we re-estimate the model with robust standard errors in question 2.

2. Re-estimate the previous model with robust standard errors. Interpret the estimated coefficients and the statistical significance for “classize”, “disadvantaged”, and “religious”, respectively. In addition, interpret R².

With robust standard errors the estimated coefficients (βs) are unchanged, since the robust option only changes how the standard errors are computed, not the OLS estimates. The standard errors are slightly higher, but only by a small fraction, so the t-values are slightly lower but almost the same.
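The heteroskedasticity test and the robust standard errors discussed above can be sketched in Python for the simplest case of one regressor. The assignment's model has three regressors, so there df = 3 and the 5% critical value is 7,815, while in this sketch df = 1 and the critical value is 3,841. The function names and the toy data are hypothetical. The robust standard error uses the HC1 scaling n / (n - k), which to our knowledge is the scaling Stata's `robust` option applies after `regress`.

```python
import math
import statistics


def ols_simple(x, y):
    """OLS for y = b0 + b1*x + u with one regressor; returns (b0, b1, residuals)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    return b0, b1, resid


def slope_standard_errors(x, resid):
    """Return (conventional SE, HC1 robust SE) of the slope b1."""
    n = len(x)
    mx = statistics.fmean(x)
    dev = [a - mx for a in x]
    sxx = sum(d * d for d in dev)
    # Conventional SE assumes one common error variance (homoskedasticity).
    sigma2 = sum(e * e for e in resid) / (n - 2)
    se_conv = math.sqrt(sigma2 / sxx)
    # White (HC0) variance lets each observation have its own error variance;
    # HC1 multiplies it by n / (n - k), here k = 2 estimated parameters.
    var_hc0 = sum(d * d * e * e for d, e in zip(dev, resid)) / sxx ** 2
    se_hc1 = math.sqrt(var_hc0 * n / (n - 2))
    return se_conv, se_hc1


def breusch_pagan_lm(x, resid):
    """LM = n * R^2 of the auxiliary regression of squared residuals on x (df = 1 here)."""
    u2 = [e * e for e in resid]
    _, _, aux_resid = ols_simple(x, u2)
    mean_u2 = statistics.fmean(u2)
    sst = sum((v - mean_u2) ** 2 for v in u2)
    if sst == 0:  # all squared residuals equal: no evidence of heteroskedasticity
        return 0.0
    r2_aux = 1 - sum(r * r for r in aux_resid) / sst
    return len(x) * r2_aux


def reject_homoskedasticity(lm_stat, critical_value):
    """Reject H0 (constant error variance) only if LM exceeds the chi-square critical value."""
    return lm_stat > critical_value


# Hypothetical toy data whose errors grow with x (a "cone" pattern).
x = list(range(1, 11))
noise = [0.1 * a * (-1) ** a for a in x]
y = [2 + 0.5 * a + e for a, e in zip(x, noise)]

b0, b1, resid = ols_simple(x, y)
se_conv, se_hc1 = slope_standard_errors(x, resid)
lm = breusch_pagan_lm(x, resid)
print(b1, se_conv, se_hc1, lm, reject_homoskedasticity(lm, 3.841))
```

The slope b1 is the same whichever standard error is reported; only the standard error, and hence the t-value, changes when switching to the robust version.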
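Holding the other variables constant, the coefficient on disadvantaged is -0,1149929: a one-unit increase in disadvantaged is associated with a decrease of about 0,115 points in the dependent variable. The coefficient on classize is interpreted the same way: it is the change in the dependent variable associated with one more student in the class, holding disadvantaged and religious constant. The coefficient on religious is negative and significant, at -3,484334: holding classize and disadvantaged constant (and treating religious as a 0/1 indicator), religious schools are predicted to score about 3,48 points lower than non-religious ones.

R² lies between 0 and 1, and the closer it is to 1, the better the estimated regression equation fits the data. Our R² of 0,1628 means that the model explains about 16% of the sample variation in the dependent variable, which is on the lower end. The data points therefore tend to fall far from the regression line, and most of the variation (about 84%) is due to factors not included in the model. A low R² does not by itself mean that the coefficient estimates are biased; it means that the regressors leave much of the variation unexplained.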
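The R² interpretation and the ceteris paribus reading of the coefficients can be illustrated with a small Python sketch. `r_squared` and `predicted_change` are hypothetical helper names; the coefficient values are the ones reported above, and the 10-unit change in disadvantaged is only an example.

```python
import statistics


def r_squared(y, fitted):
    """R^2 = 1 - SSR/SST: share of the sample variation in y explained by the fitted values."""
    my = statistics.fmean(y)
    sst = sum((v - my) ** 2 for v in y)
    ssr = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return 1 - ssr / sst


def predicted_change(beta, delta_x):
    """Ceteris paribus change in the predicted outcome when one regressor changes by delta_x."""
    return beta * delta_x


# Values reported in the answers above.
reported_r2 = 0.1628
unexplained_share = 1 - reported_r2  # share of variation left in the residuals
beta_disadvantaged = -0.1149929
beta_religious = -3.484334

print(f"explained: {reported_r2:.1%}, unexplained: {unexplained_share:.1%}")
print(predicted_change(beta_disadvantaged, 10))  # 10-unit increase in disadvantaged
print(predicted_change(beta_religious, 1))       # religious from 0 to 1 (if it is a dummy)
```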