Lecture 12: The classical normal linear model
BUEC 333
Professor David Jacks

Recap on the classical model
Previously, we explored the six assumptions of linear regression which define the classical linear regression model (CLRM). We also learned that when those six assumptions are satisfied, the least squares estimator is BLUE (Best Linear Unbiased Estimator). This explains why OLS serves as the baseline estimator in so many applications in economics.

Now, we will add one more assumption to the CLRM, namely that the error term is normally distributed. This gives us the classical normal linear regression model (CNLRM). It also gives us an exact sampling distribution for the least squares estimator (reflecting the fact that a different sample would yield different estimates). Once we have a sampling distribution for our estimator, we can develop testable hypotheses.

Assumption 7: Normal errors
Seventh assumption: the error terms are independent and identically distributed as normal RVs with mean 0 and variance σ², or εi ~ N(0, σ²). This is actually a pretty strong assumption. Previously, we assumed only that the errors have mean zero (A2), are uncorrelated across observations (A5), and have a constant variance (A6).

But A7 gives us more than that; it also specifies the complete probability distribution of the errors, not just their mean and variance (due to normality). NB: we do not need normality to use OLS; by the Gauss-Markov Theorem (GMT), OLS is BLUE even without it. But when the errors are normally distributed, we get an even stronger result: OLS is BUE (Best Unbiased Estimator).

The sampling distribution of the OLS estimator
The sampling distribution of the OLS estimator reflects the distribution of the values of β̂j that we would observe if we drew many different samples from the same population and estimated the regression model on each sample. OLS estimators are unbiased, so we already know the mean of their sampling distribution: the (true) population values of the coefficients.

Suppose we have the following simple linear regression model, Yi = β0 + β1Xi + εi where εi ~ N(0, σ²). We also know the least squares estimators are given as follows:

\hat{\beta}_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2} \quad \text{and} \quad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}

Now, we need to think about what the normality of the errors implies for these estimators.

Because εi ~ N(0, σ²) and Yi is a linear function of εi, A7 implies that Yi is normally distributed too. From Lecture 5: a linear function of a normally distributed random variable is itself normally distributed. Likewise, Y-bar is also normally distributed.

Notice what this implies about the OLS estimates: these too will be normally distributed! That is,

\hat{\beta}_0 \sim N\!\left(\beta_0, \mathrm{Var}(\hat{\beta}_0)\right) \quad \text{and} \quad \hat{\beta}_1 \sim N\!\left(\beta_1, \mathrm{Var}(\hat{\beta}_1)\right)

Why? Refer to slide 41 in lecture 11: the slope estimator can be written as β1 plus a linear combination of the errors, and a linear combination of normal RVs is itself normal. As it turns out, this is a pretty powerful result: the errors and, thus, the OLS estimators are normally distributed.

What normal errors buys us
So, knowing the sampling distribution of the OLS estimators means we can test hypotheses about the population betas themselves without ever knowing their "true" values. NB: this result generalizes to any number of independent variables.

The sampling variance of the OLS estimator
It can be shown that the sampling variance of the slope coefficient in the regression model with one independent variable is

\mathrm{Var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_i (X_i - \bar{X})^2}

Intuition: when we have more variation in ε, the OLS estimate is less precise. What about the opposite? More variation in X increases the denominator Σi(Xi − X̄)², making the OLS estimate more precise.
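To see the sampling distribution concretely, here is a minimal simulation sketch (not from the lecture slides; all parameter values such as b0, b1, sigma, n, and the number of replications are illustrative choices). It draws many samples under A1–A7 and checks that the simulated mean and variance of the slope estimator match β1 and σ²/Σi(Xi − X̄)²:

```python
# A minimal sketch: draw many samples from Yi = b0 + b1*Xi + ei with
# ei ~ N(0, sigma^2), estimate the OLS slope on each, and compare the
# simulated mean/variance to b1 and sigma^2 / sum((Xi - Xbar)^2).
import numpy as np

rng = np.random.default_rng(333)

b0, b1, sigma = 2.0, 0.5, 1.0      # "true" population parameters (assumed)
n, n_samples = 50, 100_000         # sample size and number of replications

X = rng.uniform(0, 10, size=n)     # hold the regressors fixed across samples
Xbar = X.mean()
Sxx = np.sum((X - Xbar) ** 2)      # sum of squared deviations of X

slopes = np.empty(n_samples)
for s in range(n_samples):
    eps = rng.normal(0, sigma, size=n)      # normal errors (A7)
    Y = b0 + b1 * X + eps
    # OLS slope: sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2)
    slopes[s] = np.sum((X - Xbar) * (Y - Y.mean())) / Sxx

print(f"mean of slope estimates:     {slopes.mean():.4f}  (true b1 = {b1})")
print(f"variance of slope estimates: {slopes.var():.6f}")
print(f"theoretical Var(b1-hat):     {sigma**2 / Sxx:.6f}")
```

The simulated mean should sit on top of β1 (unbiasedness), and the simulated variance should match the formula above.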
The sampling variance of the OLS estimator
Upon closer inspection, the sampling variance of the OLS estimator contains an unknown: the variance of the unobserved error term, σ². So, how do we get around this? As it turns out, an unbiased estimator of σ² is

s^2 = \frac{\sum_i e_i^2}{n - k - 1}

With substitution, we get an unbiased estimator of the variance of the OLS estimator:

\widehat{\mathrm{Var}}(\hat{\beta}_1) = \frac{s^2}{\sum_i (X_i - \bar{X})^2}

The square root of this estimator is called the standard error of beta-hat (standard because of the square root; error because…).

Bringing it all together:

\widehat{\mathrm{Var}}(\hat{\beta}_1) = \frac{\sum_i e_i^2 / (n - k - 1)}{\sum_i (X_i - \bar{X})^2}

Naturally, we prefer smaller over larger values. Intuitively, this could entail:
1.) a smaller numerator (a higher value of n or lower values of k or ei), or
2.) a larger denominator (more variation in the Xi).

Implications from the CLT
Previously, we stated that when the errors are normally distributed, the sampling distribution of the OLS estimator is also normal. But what if the errors are not normally distributed? As long as the errors are "well behaved", we can rely on the Central Limit Theorem (CLT).

The main implication of the CLT: as the sample size gets larger (as n → ∞), the sampling distribution of the least squares estimator is well approximated by a normal distribution. So even if the errors are not normal, the sampling distribution of the beta-hats is approximately normal in large samples.

Demonstrations of the CLT
CLT: the sum (and hence, the mean) of a number of independent, identically distributed random variables will tend to be normally distributed, regardless of their underlying distribution, if the number of RVs is large enough. Consider (yet again) the case of a six-sided die: there are 6 possible outcomes {1, 2, 3, 4, 5, 6}, each with an associated probability of 1/6. The pdf looks like the following…

[Figure: the discrete uniform pdf of a single die roll]

Simulation: using a computer to "throw the die" many times (N). We can then look at the sampling distribution of the average and consider what happens as N increases. Run it one time; run it another time; OK, one more time: each run yields a different average. Give me a billion! Now let's plot the histogram…

[Figure: histogram of the simulated averages]

We know the population mean is equal to 3.5… so pretty close, but how can we get closer? For N = 100… for N = 1000…

[Figures: histograms of the average for N = 100 and N = 1000]

In fact, for a large enough (infinite) sample, the histogram is the sampling distribution. So what does this have to do with OLS? Well, it would be nice if our beta-hats were normally distributed, as is the case when:
1.) the error term is normally distributed (A7), or
2.) the sample size is large enough for the CLT to apply.

Just as in the case of the die, we can do a simulation for OLS estimates. That is, draw a random sample of 100 observations on Y, X, and ε (which are not normally distributed). The underlying data generating process for the two variables is linear, where β1 equals 1.02. In the first sample, … in the second sample, … in the third sample, …

[Figures: slope estimates from the first three samples, followed by the histogram of slope estimates across many samples]

Both demonstrations can be reproduced with a few lines of code, as sketched below.
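First, the die example. A minimal sketch (the replication counts and values of N are illustrative, not taken from the slides):

```python
# Average N fair die rolls, repeat many times, and watch the distribution
# of the average tighten around 3.5 and look increasingly normal as N grows.
import numpy as np

rng = np.random.default_rng(333)

for N in (1, 10, 100, 1000):                      # rolls averaged per replication
    rolls = rng.integers(1, 7, size=(10_000, N))  # 10,000 replications of N rolls
    means = rolls.mean(axis=1)
    print(f"N = {N:>4}: mean of averages = {means.mean():.3f}, "
          f"std of averages = {means.std():.3f}")
# Plotting a histogram of `means` for each N reproduces the slides'
# demonstration: the spread shrinks (roughly like 1/sqrt(N)) and the
# shape approaches a normal curve.
```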
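Second, the OLS example. The slides do not spell out their exact data generating process, so this sketch assumes a linear DGP with β1 = 1.02 (as on the slides) and deliberately non-normal, mean-zero errors:

```python
# With skewed (exponential, recentered to mean zero) errors, the histogram
# of slope estimates across samples is still approximately normal, as the
# CLT implies for large enough samples.
import numpy as np

rng = np.random.default_rng(333)

b0, b1 = 0.0, 1.02                 # b1 = 1.02 as on the slides; b0 assumed
n, n_samples = 100, 50_000         # 100 observations per sample, as on the slides

slopes = np.empty(n_samples)
for s in range(n_samples):
    X = rng.uniform(0, 10, size=n)
    eps = rng.exponential(1.0, size=n) - 1.0   # skewed, non-normal, mean zero
    Y = b0 + b1 * X + eps
    slopes[s] = (np.sum((X - X.mean()) * (Y - Y.mean()))
                 / np.sum((X - X.mean()) ** 2))

print(f"mean of slope estimates: {slopes.mean():.4f} (true b1 = {b1})")
# Skewness near 0 is one quick check of approximate normality:
dev = slopes - slopes.mean()
print(f"skewness of slope estimates: {(dev**3).mean() / dev.std()**3:.4f}")
```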
Conclusion
For the purposes of hypothesis testing, it turns out that there are precisely two ways to "skin a cat":
1.) assume the population errors are normally distributed, or
2.) invoke the Central Limit Theorem.
Both result in (approximately) normally distributed beta-hats.
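As a compact restatement of the two routes (a sketch; the exact statement of the large-sample case requires regularity conditions on the errors not spelled out in the slides):

```latex
% Route 1: normal errors (A7) give exact normality in any sample size n
\text{Route 1 (A7, exact, any } n\text{):} \quad
\hat{\beta}_1 \sim N\!\left(\beta_1,\ \frac{\sigma^2}{\sum_i (X_i - \bar{X})^2}\right)

% Route 2: the CLT gives approximate normality for "well behaved"
% (iid, finite-variance) but non-normal errors as n grows
\text{Route 2 (CLT, approximate, large } n\text{):} \quad
\hat{\beta}_1 \overset{a}{\sim} N\!\left(\beta_1,\ \frac{\sigma^2}{\sum_i (X_i - \bar{X})^2}\right)
```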