CHAPTER 3  THE SIMPLE REGRESSION MODEL

1. The Relationship Between Two Variables—Deterministic versus Stochastic Relationship
   1.1. Stochastic Relationship Between Dependent and Independent Variables
   1.2. Regression Line as a Locus of Mean Values of y for Different Values of x
   1.3. The Relationship Between the Variance of y and the Variance of the Disturbance Term u
   1.4. Summary of Assumptions About the Regression Model
2. The Estimated Regression Equation and Regression Line
   2.1. The Least Squares Method of Obtaining the Coefficients of the Estimated Regression Equation
        2.1.1. Alternative Expression for the b2 Formula
3. Statistical Properties of Least Squares Estimators b1 and b2—The Gauss-Markov Theorem
   3.1. Sampling Distributions of Coefficients of the Regression Equation
   3.2. Coefficients b1 and b2 Are the Best Linear Unbiased Estimators
        3.2.1. b1 and b2 are linear functions of y
        3.2.2. b1 and b2 are unbiased estimators of the population parameters β1 and β2
               3.2.2.1. E(b2) = β1... E(b2) = β2
               3.2.2.2. E(b1) = β1
        3.2.3. As Estimators of the Parameters β1 and β2, b1 and b2 Have the Minimum Variance
               3.2.3.1. Variance of b2
               3.2.3.2. Covariance of b1 and b2
        3.2.4. The Covariance Matrix
        3.2.5. b2 Is the Best Linear Unbiased Estimator of β2
4. The Estimator of the Variance of the Prediction Error
   4.1. var(e) is an unbiased estimator of σ²u
   4.2. The Standard Error of Estimate
5. Nonlinear Relationships
   5.1. Quadratic Model
   5.2. Log-Linear Model
   5.3. Regression with Indicator (Dummy) Variables

1. The Relationship Between Two Variables—Deterministic versus Stochastic Relationship

In discussing the relationship between two variables x and y we observed that the two variables can be independent; that is, changes in x have no influence on the variations in y. Although theoretically significant, in practice two independent variables are of little interest to us. In most economic studies (and studies in other disciplines) we are interested in the relationship between two or more variables. For example, how does the quantity demanded of gasoline respond to changes in its price? How is consumption expenditure affected by variations in household income? For a more down-to-earth example, consider the variation in students' scores on a statistics final exam. What do you think is the most important factor affecting that variation? How about the number of hours studied for the final? We can express the relationship between test scores and study hours as a simple linear equation. Let x be the hours studied and y the test score:

y = β1 + β2 x

The variable x is the explanatory (independent, or control) variable and y is the explained (dependent, or response) variable. β1 and β2 are the parameters of the equation. The coefficient of x, β2, is the slope parameter. It indicates the change in score per unit change in study time (the increase in the test score for each additional hour studied). The intercept β1 shows the score for zero hours of study, when the student purely guesses the answers to the multiple-choice questions. Statistically, the probability that the student guesses all questions incorrectly is very low.¹ Therefore, a zero vertical intercept in this model is very unlikely. To illustrate the relationship further, let β1 = 20 and β2 = 8. Then,

y = 20 + 8x

According to this equation, if a student studies 5 hours, the test score will be 60. Graphically, this relationship is shown as a straight line. The graph also shows the y values when x = 3, 5, and 7.
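As a quick numerical check (a minimal sketch in Python, using the illustrative parameters β1 = 20 and β2 = 8 from the text), the deterministic equation assigns exactly one score to each number of study hours:

# Deterministic score model: y = beta1 + beta2 * x
beta1, beta2 = 20, 8            # illustrative parameters from the text

def predicted_score(hours):
    """Return the test score implied by the deterministic equation."""
    return beta1 + beta2 * hours

for hours in (3, 5, 7):
    print(f"x = {hours} hours  ->  y = {predicted_score(hours)}")
# Prints 44, 60, and 76 -- the three points marked on the line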
y y = 20 + 8x 76 60 44 20 0 3 5 7 x 1.1. Stochastic Relationship Between Dependent and Independent Variables. Although intuitively the above depiction of the relationship between scores and study hours makes sense— that is, the more one studies, the better the score—the model is oversimplified and unrealistic. The relationship shown here is purely deterministic. It implies that all students who study, say, 5 hours will score 60. We all know that rarely students who study the same amount and with the same intensity have identical scores. In reality one may observe different scores for the same number of study hours. There are other unobserved or unobservable factors, such as unmeasurable individual attributes, that may affect the individual test scores. We can summarize these unobserved factors by the variable π’ and incorporate it in the model. π¦ = β1 + β2 π₯ + π’ The variable π’ is called the disturbance term. When it is incorporated in the model, the relationship between π₯ and π¦ changes from a deterministic to a statistical (or stochastic) relationship. The change to a stochastic relationship between π₯ and π¦ implies that the variations in the dependent variable π¦ is not totally explained by the independent variable π₯. The disturbance term, the random variable π’, affects the value of π¦ also. Thus, the values of π¦ are also randomly determined. If π’ takes on randomly determined values, so does π¦. The following table shows how the value of the dependent variable π¦ (test scores) varies with u for a given level of π₯ (study hours). If there are 25 questions each with 5 choices, the expected number of correct guesses is 25 × 0.2 = 5. From the scale of 100, the score would be 5 × 4 = 20. 1 3-Simple Regression Model 2 of 31 Different Values of y for a Given Value of π₯ When the Relationship is π¦ = 20 + 8π₯ + π’ Value of π₯ 5 5 5 5 5 5 5 5 5 5 Non Random Component 20 + 8π₯ 60 60 60 60 60 60 60 60 60 60 Random Component π’ -12 4 -20 12 16 -12 8 4 20 16 Value of π¦ 48 64 40 72 76 48 68 64 80 76 We can repeat these calculations assigning other values for π₯. The resulting π¦ values can all be split between the non-random component β1 + β2 π₯ and the random component π’. For each value of π₯ there are many different values of π¦. The following diagram shows the various values of π¦, test scores, for the three different values of π₯, the hours of study. y y = 20 + 8x 76 60 44 0 3 5 7 x Each set of y values for a given π₯, expressed as π¦|π₯π , has its own independent probability distribution with the density function π(π¦|π₯π ). Three of such distributions are shown below, for π¦|π₯ = 3, π¦|π₯ = 5, and π¦|π₯ = 7. 3-Simple Regression Model 3 of 31 This is the two-dimensional depiction of the three density functions. A three-dimensional depiction is presented in the diagram below. y 60 44 76 3 5 7 x 1.2. Regression Line as a Locus of Mean Values of y for Different Values of x This diagram shows clearly that for each value of π₯ (hours of study) there are many different π¦ values (test scores). The test scores for each value of π₯ are normally distributed. The mean, the expected value, of each distribution is the nonrandom component of π¦: E(π¦|π₯ = 3) = 20 + 8(3) = 44 E(π¦|π₯ = 5) = 20 + 8(5) = 60 E(π¦|π₯ = 6) = 20 + 8(7) = 76 Thus, even though there are different test scores for each individual value of π₯, the expected or average score is uniquely determined by the slope and intercept parameters alone. For this to hold, the expected value of the disturbance terms π’π must be zero. 
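Before the formal derivation, the short simulation below illustrates this numerically. It is a sketch that assumes normally distributed disturbances with mean zero and an arbitrary standard deviation of 12 (the spread is not specified in the text); it draws many scores at x = 5 hours and shows that their average settles near the nonrandom component 20 + 8(5) = 60. The algebra that follows shows why E(ui) = 0 is exactly what is needed.

import numpy as np

rng = np.random.default_rng(seed=42)

beta1, beta2 = 20, 8      # illustrative population parameters from the text
sigma_u = 12              # assumed spread of the disturbance term (not from the text)
x = 5                     # hours studied

# Draw many disturbances with E(u) = 0 and form y = beta1 + beta2*x + u
u = rng.normal(loc=0, scale=sigma_u, size=100_000)
y = beta1 + beta2 * x + u

print(f"E(y | x = {x}) implied by the model : {beta1 + beta2 * x}")
print(f"average of the simulated y values  : {y.mean():.2f}")    # close to 60
print(f"variance of the simulated y values : {y.var():.1f}")     # close to sigma_u**2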
E(π¦|π₯π ) = E(π½1 + π½2 π₯π + π’π ) 3-Simple Regression Model 4 of 31 E(π¦|π₯π ) = E(π½1 + π½2 π₯π ) + E(π’π ) E(π¦|π₯π ) = π½1 + π½2 π₯π + E(π’π ). 2 Thus, E(π¦|π₯π ) = π½1 + π½2 π₯π only if E(π’π ) = 0. The regression line π¦ = π½1 + π½2 π₯ is, therefore, the locus of all the mean values of y for different values of x. 1.3. The Relationship Between the Variance of y and Variance of the Disturbance Term u Note that for each value of π₯ the values of the dependent variable π¦ are dispersed around the center of gravity, which is the mean value of π¦|π₯. Thus, each π¦ value consists of a fixed component, µπ¦|π₯ , and a random component π’: π¦|π₯π = µπ¦|π₯π + π’ Taking the variance of both sides, we have var(π¦|π₯π ) = var(µπ¦|π₯π + π’π ) = var(π’π ) ≡ σ2π’ Since µπ¦|π₯π is non-random, then var(µπ¦|π₯π ) = 0. The diagram also shows that the density functions are similarly shaped. This implies that regardless of the value of π₯, the π¦ values for each π₯ are similarly dispersed about the mean. This is the “equal-variance” or homoscedasticity condition: var(π¦|π₯1 ) = var(π¦|π₯2 ) = β― = var(π¦|π₯π ) = var(π’1 ) = var(π’2 ) = β― = var(π’π ) = σ2π’ Another feature of the model is that the disturbance term π’ for each π₯ are independent. In the test-scorehours-of-study model, this implies that the variations in test scores when, say, π₯ = 5, are not affected by the score variations when π₯ = 4 or any other π₯ value. Thus, the covariance of any two disturbance terms π’π and π’π for each given π₯ is zero. cov(π’π , π’π ) = E[(π’π − 0)(π’π − 0)] = E(π’π π’π ) = 0 1.4. Summary of Assumptions About the Regression Model The various assumptions regarding the regression model are summarized as follows: 1. The regression line is the locus of the mean values of π¦ for each given value of π₯. The random component of y is the disturbance term π’. The expected value of π’π is zero. π¦ = β1 + β2 π₯π + π’π Note that here the π₯π are not treated as random variables. The π₯π are given, and the objective is to see how the π¦ values respond to the given values of π₯π . This is why E(π½1 + π½2 π₯π ) = π½1 + π½2 π₯π . 2 3-Simple Regression Model 5 of 31 E(π¦|π₯π ) = β1 + β2 π₯π πΈ(π’π ) = 0 2. Since π’ is the random component of π¦, then the variance of π’ and π¦ are the same. Furthermore, per homoscedasticity assumption, the variance of π’ remains the same for all values of π₯. var(π¦|π₯π ) = var(µπ¦|π₯π + π’π ) = σ2π’ var(π¦|π₯1 ) = var(π¦|π₯2 ) = β― = var(π¦|π₯π ) = var(π’1 ) = var(π’2 ) = β― = var(π’π ) = σ2π’ var(π¦) = var(π’) or σ2π¦ = σ2π’ 2. The variations of π’ for a given value of π₯ do not affect the variations of π’ for any other value of π₯. That is, all π’π are independent random variables, making the covariance zero. cov(π’π , π’π ) = E(π’π − 0)(π’π − 0) = E(π’π π’π ) = 0 3. The Estimated Regression Equation and Regression Line The above explanation of the relationship between π₯ and π¦ is based on the assumption that we have access to all population data. It is the theoretical framework for regression. In practice, the population data is rarely available. We must therefore resort to sampling. Using the sample data we can then determine an estimate of the population regression equation. The estimated regression equation is: π¦ = π1 + π2 π₯ + π Here π¦ represents the estimator of a π πππππ value of π¦ for a given π₯ value. The coefficients π1 and π2 are the estimators of the parameters β1 and β2 , and π is the estimator of π’. This equation is not the equation for the estimated regression line. 
The regression line is represented by: π¦Μ = π1 + π2 π₯ where π¦Μ (π¦-hat) is the estimator of the mean value of π¦ for each π₯ in the population. In a sample regression, π¦ is the observed value and π¦Μ is the predicted value for a given π₯. The difference between the observed and predicted value is called the prediction error or the residual: π¦ − π¦Μ = π. To explain the determination of the sample regression line, suppose a random sample of 10 students is selected and the following data regarding hours studied and the test score are obtained. 3-Simple Regression Model 6 of 31 Score π¦ 52 56 56 72 72 80 88 92 96 100 Hours Studied π₯ 2.5 1.0 3.5 3.0 4.5 6.0 5.0 4.0 5.5 7.0 To provide a preliminary indication of the relationship between π₯ and π¦, we plot the data as a scatter diagram: 100 90 Test score (y) 80 70 60 50 40 30 20 10 0 0 1 2 3 4 5 6 7 8 Hours studied (x) Now we need a method to fit the estimated regression line to the scatter diagram. Using the visual method one may draw any number of lines through the scatter diagram. They would all represent reasonably good fits. The question is which one is the best fit? We are therefore interested in the best fitting estimated regression line. 3.1. The Least Squares Method of Obtaining the Coefficients of Estimated Regression Equation The mathematical approach to find the best fitting line to the scatter diagram is the Least Squares method. The estimated regression line determined through the least squares method is the best fitting line because it minimizes the sum of squared deviations of each observed (scattered) π¦ from the corresponding point on the fitted line for each π₯. In the diagram below three such deviations are shown. Since each point on the regression line corresponding to a given π₯ is denoted by π¦Μ, then the deviation (residual) between the observed π¦ and π¦Μ is: π = π¦ − π¦Μ 3-Simple Regression Model 7 of 31 Test score (y) Hours studied (x) The general form of the equation for the estimated regression line is π¦Μ = π1 + π2 π₯ We need to find the values for the coefficients π1 and π2 , the intercept and slope coefficients, in order to draw a line such that the sum of squared residuals, ∑ππ2 = ∑(π¦π − π¦Μπ )2 is minimized. The least squares method involves minimizing this sum of squared residuals. The following process involves a mathematical operation called partial differentiation. First rewrite the sum of squared deviations by substituting for π¦Μ so that π1 and π2 are explicitly stated. ∑ππ2 = ∑(π¦ − π1 − π2 π₯)2 Find the partial derivative once with respect to π1 and then with respect to π2 and set the results equal to zero. (In calculus this is how the minimum or maximum value of a function is obtained—by setting the first derivative equal to zero.) The two resulting equations are called the normal equations. (Not to be confused with normal distribution.) π∑π 2 ⁄ππ1 = −2∑(π¦ − π1 − π2 π₯) = 0 3 π∑π 2 ⁄ππ2 = −2∑π₯(π¦ − π1 − π2 π₯) = 0 Using the two normal equations we solve for the two unknowns π1 and π2 . Using the properties of summation, we can write the normal equations as: ∑π¦ − ππ1 − π2 ∑π₯ = 0 When taking the partial derivative with respect to π1 , we are treating π1 as the variable and the remaining terms as constants. 
Let π¦ − π2 π₯ ≡ π, then ∑(π¦ − π1 − π2 π₯)2 = ∑(π − π1 )2 = ∑(π 2 − 2ππ1 + π12 ) Now take the derivative with respect to b1: ∑(−2π + 2π1 ) = −∑2(π − π1 ) = −∑2(π¦ − π1 − π2 π₯) 3 3-Simple Regression Model 8 of 31 ∑π₯π¦ − π1 ∑π₯ − π2 ∑π₯ 2 = 0 Since π1 and π2 are the “unknowns” or the variables, we can represent the equation system as: (∑)ππ1 + π₯(∑π₯)π2 = ∑π¦ (∑π₯)π1 + (∑π₯ 2 )π2 = ∑π₯π¦ We can solve for π1 and π2 two ways. Using the matrix notation, the equation system is written as πΏπ = π, where πΏ=[ π ∑π₯ ∑π₯ ] ∑π₯ 2 “Coefficient” matrix π π = [ 1] π2 “Variable” matrix ∑π¦ π=[ ] ∑π₯π¦ “Constant” matrix Thus, [ π ∑π₯ ∑π₯ π1 ∑π¦ ][ ] = [ ] ∑π₯π¦ ∑π₯ 2 π2 The solutions for π1 and π2 can be found using Cramer’s rule. First we find the solution for π2 . π ∑π¦ | ∑π₯ ∑π₯π¦ π2 = π ∑π₯ | | ∑π₯ ∑π₯ 2 | π2 = π ∑ π₯π¦ − ∑ π₯ ∑ π¦ π ∑ π₯ 2 − (∑ π₯)2 π2 = π ∑ π₯π¦ − (ππ₯Μ )(ππ¦Μ ) π ∑ π₯ 2 − (ππ₯Μ )2 Dividing the numerator and the denominator by π, we have, π2 = ∑ π₯π¦ − ππ₯Μ π¦Μ ∑ π₯ 2 − ππ₯Μ 2 Now for π1 , ∑π¦ ∑ π₯ | ∑π₯π¦ ∑π₯ 2 π1 = π ∑π₯ | | ∑π₯ ∑π₯ 2 | π2 = ∑ π¦ ∑ π₯ 2 − ∑ π₯ ∑ π₯π¦ π ∑ π₯ 2 − (∑ π₯)2 3-Simple Regression Model 9 of 31 Dividing the numerator and the denominator by π, we have, π1 = π¦Μ ∑ π₯ 2 − π₯Μ ∑ π₯π¦ ∑ π₯ 2 − ππ₯Μ 2 Now add ±ππ₯Μ 2 π¦Μ to the numerator. π1 = π¦Μ ∑ π₯ 2 − ππ₯Μ 2 π¦Μ − π₯Μ ∑ π₯π¦ + ππ₯Μ 2 π¦Μ ∑ π₯ 2 − ππ₯Μ 2 π1 = π¦Μ (∑ π₯ 2 − ππ₯Μ 2 ) (∑ π₯π¦ − ππ₯Μ π¦Μ )π₯Μ − ∑ π₯ 2 − ππ₯Μ 2 ∑ π₯ 2 − ππ₯Μ 2 π1 = π¦Μ − π2 π₯Μ The solutions for π1 and π2 can also found by finding the inverse of the coefficient matrix and the postmultiplying the inverse by the constant matrix. π = π −1 π To find πΏ−π, first find the determinant of πΏ. |π| = π∑π₯ 2 − (∑π₯)2 = π∑π₯ 2 − (ππ₯Μ )2 |π| = π(∑π₯ 2 − ππ₯Μ 2 ) = π∑(π₯ − π₯Μ )2 Next find the Cofactor matrix, [πΆ] = [ ∑π₯ 2 −∑π₯ −∑π₯ ] π Since the square matrix is symmetric about the principal diagonal, the Adjoint matrix, which is the transpose of the cofactor matrix, is the same as [πΆ]. The inverse matrix πΏ−π is then, π −1 = π −1 1 ∑π₯ 2 [ |π| −∑π₯ −∑π₯ ] π ∑π₯ 2 π∑(π₯ − π₯Μ )2 = π₯Μ − [ ∑(π₯ − π₯Μ )2 π₯Μ ∑(π₯ − π₯Μ )2 1 ∑(π₯ − π₯Μ )2 ] − It appears that finding the inverse matrix to solve for π1 and π2 is too complicated. However, this approach is far more practical when applied in multiple regression, where the same pattern is used to solve for the coefficients of the regression function. 1.1.1. Alternative Expression for the ππ formula In chapter 1 it was shown that the numerator of the sample variance formula, the sum of square deviations, can be written as: ∑(π₯ − π₯Μ )2 = ∑π₯ 2 − ππ₯Μ 2 3-Simple Regression Model 10 of 31 And the numerator of the covariance of π₯ and π¦ as: ∑(π₯ − π₯Μ )(π¦ − π¦Μ ) = ∑π₯π¦ − ππ₯Μ π¦Μ Thus the formula to compute π2 can be written either as π2 = ∑π₯π¦ − ππ₯Μ π¦Μ ∑π₯ 2 − ππ₯Μ 2 or, π2 = ∑(π₯ − π₯Μ )(π¦ − π¦Μ ) ∑(π₯ − π₯Μ )2 The latter expression will be used in some subsequent proofs. Now we can compute the estimated regression line for test scores and hours studied: π2 = π¦ 52 56 56 72 72 80 88 92 96 100 π₯ 2.5 1.0 3.5 3.0 4.5 6.0 5.0 4.0 5.5 7.0 π₯π¦ 130 56 196 216 324 480 440 368 528 700 π₯2 6.25 1.00 12.25 9.00 20.25 36.00 25.00 16.00 30.25 49.00 π¦Μ = 76.4 π₯Μ = 4.2 ∑π₯π¦ = 3438 ∑π₯ 2 = 205.00 3438 − (10)(4.2)(76.4) = 8.01399 205 − (10)(4.2)2 π1 = 76.4 − (8.01399)(4.2) = 42.74126 The estimated regression equation is then, π¦Μ = 42.74126 − 8.01399π₯ Using this estimated regression line now we can predict the mean score for a given number of hours of study. Let π₯ = 3. 
π¦Μ = 42.74126 − 8.01399(3) = 66.78 The following table shows all the predicted values and the deviations. It also shows the computation of the sum of squared residuals: 3-Simple Regression Model 11 of 31 π₯ 2.5 1.0 3.5 3.0 4.5 6.0 5.0 4.0 5.5 7.0 π¦ 52 56 56 72 72 80 88 92 96 100 π¦Μ 62.78 50.76 70.79 66.78 78.80 90.83 82.81 74.80 86.82 98.84 π = π¦ − π¦Μ -10.78 5.24 -14.79 5.22 -6.80 -10.83 5.19 17.20 9.18 1.16 ∑π = ∑(π¦ − π¦) = 0.00 π 2 = (π¦ − π¦Μ)2 116.13 27.51 218.75 27.21 46.30 117.18 26.92 295.94 84.31 1.35 ∑π 2 = ∑(π¦ − π¦)2 = 961.59 The table also shows that ∑π = ∑(π¦ − π¦Μ) = 0. That is, the sum of residuals is zero. The mathematical proof follows: ∑(π¦ − π¦Μ) = ∑(π¦ − π1 − π2 π₯ ) ∑(π¦ − π¦Μ) = ∑π¦ − ππ1 − π2 ∑π₯ ∑(π¦ − π¦Μ) = π(π¦Μ − π1 − π2 π₯Μ ) Substituting for π1 we have, ∑(π¦ − π¦Μ) = π(π¦Μ − π¦Μ + π2 π₯Μ − π2 π₯Μ ) = 0 The value ∑π 2 = ∑(π¦ − π¦)2 = 961.59 indicates that any other line fitted to the scatter diagram would yield a sum of squared residuals that would be greater than 961.59. This is our least squares value. 4. Statistical Properties of Least Squares Estimators ππ and ππ —The Gauss-Markov Theorem 4.1. Sampling Distributions of Coefficients of Regression Equation To determine the regression equation we compute the coefficient of the regression π1 and π2 from a randomly selected sample. Therefore, π1 and π2 are summary characteristics obtained from sample data, and, as such, each is sample statistic, functioning as estimators of the population parameters π½1 and π½2 , the population intercept and slope coefficients. Being sample statistics, π1 and π2 have the same features as the sample statistic π₯Μ , the sample mean. Take π2 , for example. Since there are infinite number of possible samples of size π out there, then the number of π2 estimate is also infinite. Given certain requirements, explained below, the sampling distribution of π2 is normal with a center of gravity of E(π2 ) = π½2 and a measure of dispersion of se(π2 ). The following diagram shows the comparison, the similarities, between the sampling distribution of π₯Μ and the sampling distribution of π2 . 3-Simple Regression Model 12 of 31 Sampling Distribution of π₯Μ Sampling Distribution of π2 4.2. Coefficients ππ and ππ are the Best Linear Unbiased Estimators A well-known term in regression analysis is that π1 and π2 are BLUE. They are the Best Linear Unbiased Estimators. This is why these estimators are preferred to estimators that may be obtained through other methods. The mathematical proof of each of these statistical properties is shown below. 4.2.1. ππ and ππ are linear functions of y. The significance of the linear relationship between the coefficients and the dependent variable y will become clear when we conduct statistical inference with respect to the population parameters β1 and β2 , using the sample statistics π1 and π2 , respectively. As sample statistics used as estimators of population parameters, the distributions of π1 and π2 must be normal for inferential statistics purposes. If we show that π1 and π2 are linear functions of π¦, given that π¦ is normally distributed, then π1 and π2 are also normally distributed. But first, what do we mean when we say the relationship between any two variables is linear? Generally, consider any two variables π₯ and π¦. A linear relationship between π¦ and π₯ is expressed as: π¦ = π + ππ₯ The linearity of the relationship is established by the fact that the coefficients π and π are constants and the exponent of π₯ is 1. 
For a given values of π (the intercept) and π (the slope), the relationship between π₯ and π¦ is reflected in a straight (non-curved) curve. Thus when we say that the functional relationship in the regression model between π1 and π¦, and between π2 and π¦ is linear, we need to show that this relationship can be expressed as π1 = π1 π¦ and π2 = π2 π¦ where π1 and π2 are two constants relating π¦ to π1 and π2 , respectively. First, let’s show that π2 is a linear function of π¦. Using the alternative expression of the formula to compute the π2 , π2 = ∑(π₯ − π₯Μ )(π¦ − π¦Μ ) ∑(π₯ − π₯Μ )2 π2 = ∑(π₯ − π₯Μ )π¦ − π¦Μ ∑(π₯ − π₯Μ ) ∑(π₯ − π₯Μ )2 Since ∑(π₯ − π₯Μ ) = 0, then the right-hand-side is simplified into 3-Simple Regression Model 13 of 31 π2 = ∑(π₯ − π₯Μ )π¦ ∑(π₯ − π₯Μ )2 To make presentation of the proof simpler, define π€= π₯ − π₯Μ ∑(π₯ − π₯Μ )2 Then, π2 = ∑(π₯ − π₯Μ )π¦ = ∑π€π¦ ∑(π₯ − π₯Μ )2 π2 = π€1 π¦2 + π€2 π¦2 + β― π€π π¦π Thus π2 is a linear function (linear combination) of π¦π because π€π are fixed constants in repeated sampling. 4 The proof that π1 is also a linear function π¦ is simple. Note that π1 = π¦Μ − π2 π₯Μ Since π¦Μ and π₯Μ are fixed for each sample, then π1 is a linear function of π2 , which makes it, in turn, a linear function of π¦. 4.2.2. ππ and ππ are unbiased estimators of population parameters ππ and ππ Any sample statistic is an unbiased estimator of a population parameter if the expected value of the sample statistic is equal to the population parameter. Thus, we need to show that: E(π2 ) = β2 and E(π1 ) = β1 . 4.2.2.1. π(ππ ) = ππ We want to show that E(π2 ) = β2 . In all the mathematical proofs regarding expected values in regression you should keep in mind that all the terms involving π₯ are treated as non-random, since π₯ is a non-stochastic (control) variable. Thus the expected value of π₯ or any term involving π₯ is the π₯ or the term itself. That is, E(π₯) = π₯ or E[∑(π₯ − π₯Μ )2 ] = ∑(π₯ − π₯Μ )2 . We just showed that, π2 = ∑(π₯ − π₯Μ )π¦ = ∑π€π¦ ∑(π₯ − π₯Μ )2 Therefore, E(π2 ) = E(∑π€π¦) 4 Each π€π is a function of π₯π and π₯ is a control variable, whose values are assigned rather than randomly obtained. 3-Simple Regression Model 14 of 31 E(π2 ) = E(π€1 π¦2 + π€2 π¦2 + β― π€π π¦π ) E(π2 ) = E(π€1 π¦2 ) + E(π€2 π¦2 ) + β― E(π€π π¦π ) E(π2 ) = π€1 E(π¦2 ) + π€2 E(π¦2 ) + β― π€π E(π¦π ) = ∑π€E(π¦) Substituting for π¦ in ∑π€E(π¦), E(π2 ) = ∑[π€E(β1 + β2 π₯ + π’)] E(π2 ) = ∑[π€(β1 + β2 π₯) + π€E(π’)] E(π2 ) = β1 ∑π€ + β2 ∑π€π₯ + ∑π€E(π’) E(π2 ) = β2 ∑π€π₯ Note that ∑π€ = ∑(π₯−π₯Μ ) ∑(π₯−π₯Μ )2 = 0 [the numerator, ∑(π₯ − π₯Μ ) = 0] and E(π’) = 0. Thus, E(π2 ) = β2 ∑π€π₯ = β2 ∑(π₯ − π₯Μ )π₯ ∑(π₯ − π₯Μ )2 Now note that in the denominator of the right-hand-side expression can be stated as ∑(π₯ − π₯Μ )2 = ∑(π₯ − π₯Μ )(π₯ − π₯Μ ) = ∑(π₯ − π₯Μ )π₯ − π₯Μ ∑(π₯ − π₯Μ ) = ∑(π₯ − π₯Μ )π₯ then ∑π€π₯ = ∑(π₯ − π₯Μ )π₯ =1 ∑(π₯ − π₯Μ )π₯ Thus, E(π2 ) = β2 . 4.2.2.2. E(b1) = β1. Prove that the expected value of the intercept coefficient is equal to the population intercept parameter. π1 = π¦Μ − π2 π₯Μ = π1 = ∑π¦ − π2 π₯Μ π 1 ∑(π½1 + π½2 π₯ + π’) − π2 π₯Μ π 1 π1 = π½1 + π½2 π₯Μ + ∑π’ − π2 π₯Μ π 1 E(π1 ) = E[π½1 + π½2 π₯Μ + ∑π’ − π2 π₯Μ ] π 3-Simple Regression Model 15 of 31 1 E(π1 ) = π½1 + π½2 π₯Μ + E ( ∑π’) − E(π2 π₯Μ ) π 1 Note that since ∑π’ = 0, then E ( ∑π’) = 0, and E(π2 π₯Μ ) = π₯Μ E(π2 ) = π½2 π₯Μ . π Thus, E(π1 ) = π½1 + π½2 π₯Μ − π½2 π₯Μ = π½1 4.2.3. As Estimators of Parameters π·π and π·π , ππ and ππ Have the Minimum Variance 4.2.3.1. 
Variance of ππ It is important to understand what the variance of the regression slope coefficient π2 represents. Note that π2 , as the estimator of π½2 , is a sample statistic whose value is obtained through a random sampling process, thus making π2 a random variable. The sample statistic π2 has a sampling distribution whose expected value is the population parameter π½2 and its (squared) measure of dispersion is the variance of π2 , denoted by var(π2 ). The same argument goes for π1 . To show that π1 and π2 are the best linear unbiased estimators, we need to determine the formulas for the variance of the two estimators. First, let’s attend to the variance of the slope coefficient π2 . The variance of the random variable π2 is defined as the expected value of the squared deviation of the random variable from its expected value. The squared deviation of interest here is (π2 − π½2 )2 . Therefore, var(π2 ) = E[(π2 − π½2 )2 ] The following steps show how the formula for var(π2 ) is obtained. π2 = ∑(π₯ − π₯Μ )π¦ = ∑π€π¦ ∑(π₯ − π₯Μ )2 π2 = ∑π€(π½1 + π½2 π₯ + π’) π2 = π½1 ∑π€ + π½2 ∑π€π₯ + ∑π€π’) π2 = π½2 + ∑π€π’ Note: ∑π€ = 0 and ∑π€π₯ = 1 Thus, π2 − π½2 = ∑π€π’ The variance of π2 is then, var(π2 ) = E[(π2 − π½2 )2 ] = E[(∑π€π’)2 ] E[(∑π€π’)2 ] = E(π€12 π’12 + π€22 π’22 + β― + π€π2 π’π2 + 2π€1 π€2 π’1 π’2 + β― + 2π€π−1 π€π π’π−1 π’π ) E[(∑π€π’)2 ] = π€12 E(π’12 ) + π€22 E(π’22 ) + β― + π€π2 E(π’π2 ) + 2π€1 π€2 E(π’1 π’2 ) + β― + 2π€π−1 π€π E(π’π−1 π’π ) Since the expected value of the disturbance term u is zero, that is, E(π’π ) = 0, then var(π’π ) = E{[π’π − E(π’π )]2 } = E(π’π2 ) 3-Simple Regression Model 16 of 31 By the homoscedasticity assumption, E(π’π2 ) = ππ’2 , and by the independence-of-disturbance-terms assumption, we have cov(π’π π’π ) = E(π’π π’π ) = 0. Therefore, var(π2 ) = π€12 ππ’2 + π€22 ππ’2 + β― + π€π2 ππ’2 var(π2 ) = ππ’2 ∑π€ 2 Note that, π€= π₯ − π₯Μ ∑(π₯ − π₯Μ )2 π€2 = (π₯ − π₯Μ )2 [∑(π₯ − π₯Μ )2 ]2 Summing over all π₯π , we have ∑π€ 2 = ∑(π₯ − π₯Μ )2 1 = 2 2 [∑(π₯ − π₯Μ ) ] ∑(π₯ − π₯Μ )2 Thus, var(π2 ) = ππ’2 ∑(π₯ − π₯Μ )2 4.2.3.2. Variance of b1 The variance for the estimated regression intercept coefficient is defined as, var(π1 ) = E[(π1 − π½1 )2 ] The relationship, between var(π1 ) and the variance of u is as follows: var(π1 ) = ∑π₯ 2 π∑(π₯ − π₯Μ )2 ππ’2 For the derivation of the formula see the Appendix at the end of this chapter. 4.2.3.3. Covariance of b1 and b2 The covariance of the regression coefficients is defined as: cov(π1 , π2 ) = E[(π1 − π½1 )(π2 − π½2 )] Again, the relationship between the covariance of the regression coefficients and the variance of π’ is as follows: 3-Simple Regression Model 17 of 31 cov(π1 , π2 ) = − π₯Μ π2 ∑(π₯ − π₯Μ )2 π’ Note that cov(π1 , π2 ) is obtained by simply multiplying var(π2 ) by −π₯Μ . The derivation of this relationship is explained in the Appendix. 4.2.4. The Covariance Matrix We can use the inverse matrix X −1 to obtain the variances and the covariance for the regression coefficients. π −1 ∑π₯ 2 π∑(π₯ − π₯Μ )2 = π₯Μ − [ ∑(π₯ − π₯Μ )2 π₯Μ ∑(π₯ − π₯Μ )2 1 ∑(π₯ − π₯Μ )2 ] − The covariance matrix is obtained by the scalar multiplication of the inverse matrix with ππ’2 [ var(π1 ) cov(π1 , π2 ) ∑π₯ 2 cov(π1 , π2 ) π∑(π₯ − π₯Μ )2 ] = ππ’2 var(π2 ) π₯Μ − [ ∑(π₯ − π₯Μ )2 π₯Μ ∑π₯ 2 π2 ∑(π₯ − π₯Μ )2 π∑(π₯ − π₯Μ )2 π’ = 1 π₯Μ − π2 2 ) ∑(π₯ − π₯Μ ] [ ∑(π₯ − π₯Μ )2 π’ − π₯Μ π2 ∑(π₯ − π₯Μ )2 π’ 1 π2 ∑(π₯ − π₯Μ )2 π’ ] − This procedure will become very handy in multiple regression with several regression coefficients. 4.2.5. 
ππ is the Best linear unbiased estimator of π·π The term “best” means that no other estimator of π½2 has a smaller variance than π2 . This is shown below: As derived above, π2 = ∑(π₯ − π₯Μ )π¦ = ∑π€π¦ ∑(π₯ − π₯Μ )2 Also, π2 = π½2 + ∑(π₯ − π₯Μ )π’ = π½2 + ∑π€π’ ∑(π₯ − π₯Μ )2 Now let πΜ2 be an alternative estimator of π½2 such that πΜ2 = ∑ππ¦ Where π = π€ + π. The term π€ is defined as before, and π is an arbitrary constant. Substituting for π¦ = π½1 + π½2 π₯ + π’ in the above relationship, we have πΜ2 = ∑π(π½1 + π½2 π₯ + π’) πΜ2 = π½1 ∑π + π½2 ∑ππ₯ + ∑ππ’ The expected value of πΜ2 is then, 3-Simple Regression Model 18 of 31 E(πΜ2 ) = π½1 ∑π + π½2 ∑ππ₯ + E(∑ππ’) E(πΜ2 ) = π½1 ∑π + π½2 ∑ππ₯ since E(∑ππ’) = ∑πE(π’) = 0 πΜ2 would be an unbiased estimator if and only if ∑π = 0, and ∑ππ₯ = 1 ∑π = ∑π€ + ∑π Since ∑π = ∑π€ = 0, then ∑π = 0 ∑ππ₯ = ∑π€π₯ + ∑ππ₯ Since ∑ππ₯ = ∑π€π₯ = 1, then ∑ππ₯ = 0 Having established that for πΜ2 to be unbiased, ∑π = ∑ππ₯ = 0, now we can attend to the variance of πΜ2 , var(πΜ2 ). πΜ2 = π½2 + ∑ππ’ πΜ2 − π½2 = ∑ππ’ 2 var(πΜ2 ) = E [(πΜ2 − π½2 ) ] = E[(∑ππ’)2 ] var(πΜ2 ) = σ2π’ ∑π 2 The term ∑π2 can be written as, ∑π 2 = ∑(π€ + π)2 = ∑π€ 2 + ∑π 2 + 2∑π€π ∑π€π = π∑π€ = 0 ∑π 2 = ∑π€ 2 + ∑π 2 Thus, var(πΜ2 ) = σ2π’ ∑π 2 = σ2π’ (∑π€ 2 + ∑π 2 ) var(πΜ2 ) = σ2π’ ∑π€ 2 + σ2π’ ∑π 2 var(πΜ2 ) = var(π2 ) + σ2π’ ∑π 2 Since the second term on the right-hand-side, σ2π’ ∑π 2 > 0, then var(πΜ2 ) > var(π2 ) 5. The Estimator of the Variance of the Prediction Error As explained above, in the simple linear regression model, the disturbance (error) term π’ represents the random component of the dependent variable π¦ for a given π₯: π¦ = π½1 + π½2 π₯ + π’. One of the major assumptions of the model is that the error terms are normally distributed about the mean π¦ for a given π₯, µπ¦|π₯ , and under the homoscedasticity assumption, the variance of π’, ππ’2 , remains unchanged for values of π₯. When sample data is used to estimate the population regression equation, among the important summary characteristics computed from the sample data is the standard error of estimate, se(π). This sample statistic is the square root of the sample variance of the error term, var(π), which is the estimator of the population ππ’2 . 3-Simple Regression Model 19 of 31 The comparison of the population regression equation with the sample regression equation shows the relationship between π’ and π. Population: Sample: π¦ = π½1 + π½2 π₯ + π’ π¦ = π1 + π2 π₯ + π The equation for the sample regression line, which is fitted to the scatter diagram through the least squares method, is π¦Μ = π1 + π2 π₯ Substituting for π¦Μ in the sample regression function π¦ = π1 + π2 π₯ + π, we have π¦ = π¦Μ + π Thus, the error term or the residual is the difference between the observed π¦ and the predicted π¦, π¦Μ, in the sample: π = π¦ − π¦Μ The variance of π is, like all variances, is defined as the mean squared deviation from the mean. Here the squared deviation of π is: (π − πΜ )2 , where πΜ = ∑π π Since, as shown previously, ∑π = 0, then πΜ = 0. Therefore, the squared deviation of π is simply π 2 . To find the mean squared deviation we divided the sum of squared deviation by the sample size π. But if sum of squared deviation of π is divided by π alone the result is a sample variance which is a biased estimator of the population variance. To obtain an unbiased estimator of σ2π’ , the formula for the sample variance of π is: var(π) = ∑π 2 π−2 5.1. 
π―ππ«(π) is an unbiased estimator of πππ For var(π) to be an unbiased estimator of σ2π’ , we must prove that, E[var(π)] = E ( ∑π 2 π−2 ) = σ2π’ The proof is shown in the Appendix. 5.2. The Standard Error of Estimate The square root of the var(π) is called the standard error of estimate: se(π). se(π) = √ ∑π 2 ∑(π¦ − π¦Μ)2 =√ π−2 π−2 3-Simple Regression Model 20 of 31 Using ∑π 2 = ∑(π¦ − π¦Μ)2 = 961.59 as computed above for the test score-hours of study example, se(π) = √ 961.59 = 10.9635 8 The standard error of the estimate is a measure of the dispersion of the observed y about the regression line. The more scattered the points in a scatter diagram, the bigger the standard error of estimate. Thus, se(π) provides an estimate of the dispersion of the disturbance term in the population, σπ’ . The larger the random component of the dependent variable in the population, the more scattered the sample data will be about the regression line. The standard error of estimate, therefore, provides a measure of the strength of the relationship by the dependent and the independent variables. However, since value of se(π) is affected by the scale of the data, the absolute size of the standard error of estimate does not necessarily reflect how closely π¦ and π₯ are related. Example: Household food expenditure and weekly income The data and other calculations are in the Excel file CH3 DATA.xlsx (“food” tab). The data (π = 40) shows the weekly food expenditure of 40 households in dollars and weekly income in hundreds of dollars ($100). Let π₯ = π€πππππ¦ ππππππ and π¦ = π€πππππ¦ ππππ ππ₯ππππππ‘π’ππ. The coefficients of the sample regression equation π¦Μ = π1 + π2 π₯ Let’s use the matrix method. First determine the elements of the matrix π and π π=[ π=[ π ∑π₯ ∑π₯ ] ∑π₯ 2 40 784.19 ∑π¦ π=[ ] ∑π₯π¦ 784.19 ] 17202.64 11342.94 π=[ ] 241046.8 Using Excel, you can easily find the π −1, the inverse matrix. 0.23516 π −1 = [ −0.01072 −0.01072 ] 0.00055 Then, π 0.23516 [ 1 ] = π −1 β π = [ π2 −0.01072 −0.01072 11342.94 ][ ] 0.00055 241046.8 π 83.4160 [ 1] = [ ] π2 10.2096 Thus, π¦Μ = 83.416 + 10.2096π₯ The value π2 = 10.21 implies that for each additional $100 income, we estimate that weekly expenditure is expected to rise by $10.21. The following diagram shows the estimated regression line fitted to the scatter diagram. 3-Simple Regression Model 21 of 31 The Fitted Regression y = weekly food expenditure ($) 700 600 500 Point of the means (π₯Μ , π¦Μ ) yΜ 400 300 200 100 0 0 10 20 30 40 x = weekly income ($100) Point of the means Note that the regression line goes through the point of the means. This means that when π₯ = π₯Μ , then π¦Μ = π¦Μ . π¦Μ = π1 + π2 π₯ substitute π₯Μ for π₯ and for π1 = π¦Μ − π2 π₯Μ in the regression equation. π¦Μ = π¦Μ − π2 π₯Μ + π2 π₯Μ π¦Μ = π¦Μ When π₯ = π₯Μ = 19.6, then π¦Μ = 83.416 + 10.2096(19.6) = 283.57 = π¦Μ . Income elasticity of food expenditure Income elasticity measures how sensitive food expenditure is to changes in income. Elasticity shows the proportionate (percentage) change in food expenditure relative to a proportionate or percentage change in income π= ππ¦Μ⁄π¦Μ ππ¦Μ π₯ π₯ = = π2 ππ₯ ⁄π₯ ππ₯ π¦Μ π¦Μ Find the elasticity at the point of the means. π = 10.2096 × 19.6 = 0.71 283.57 This shows that at the point of the means, it is estimated that food expenditure rises by 0.71% for each 1% rise in income. 
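As a small worked check (a sketch using only the coefficient estimates and the sums reported above for the food expenditure example), the calculation below reproduces the point-of-means prediction and the income elasticity of about 0.71:

# Reported quantities from the food expenditure example (n = 40)
n = 40
sum_x = 784.19                  # sum of weekly incomes (in $100), from the X matrix
b1, b2 = 83.416, 10.2096        # estimated intercept and slope

x_bar = sum_x / n               # mean weekly income, about 19.6

# The fitted line passes through the point of the means:
y_hat_at_mean = b1 + b2 * x_bar
print(f"y-hat at x-bar = {y_hat_at_mean:.2f}")                 # about 283.57 (= y-bar)

# Income elasticity at the point of the means: eta = b2 * x-bar / y-hat
eta = b2 * x_bar / y_hat_at_mean
print(f"elasticity at the point of the means = {eta:.2f}")     # about 0.71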
3-Simple Regression Model 22 of 31 Variance of “e” var(π) = ∑π 2 ∑(π¦ − π¦Μ)2 = π−2 π−2 var(π) = 304505.18 = 8013.294 38 The covariance matrix The covariance matrix shows the variance of regression coefficients and their covariance. [ var(π1 ) cov(π1 , π2 ) cov(π1 , π2 ) ] var(π2 ) The individual elements of the matrix can be computed using the respective formulas: var(π1 ) = var(π) × var(π2 ) = ∑π₯ 2 17202.64 = 8013.294 × = 1884.442 2 π∑(π₯ − π₯Μ ) 40 × 1828.788 var(π) 8013.294 = = 4.382 ∑(π₯ − π₯Μ )2 41828.788 cov(π1 , π2 ) = var(π) × [ var(π1 ) cov(π1 , π2 ) −π₯Μ −19.605 = 8013.294 × = −85.903 ∑(π₯ − π₯Μ )2 1828.788 cov(π1 , π2 ) 1884.442 ]=[ var(π2 ) −85.903 −85.903 ] 4.382 We can obtain this matrix through the scalar multiplication of the inverse matrix π −1 by var(π). var(π)π −1 = 8013.294 × [ 0.23516 −0.01072 −0.01072 1884.442 ]=[ 0.00055 −85.903 −85.903 ] 4.382 2. Nonlinear Relationships In explaining the simple linear regression model we have assumed that the population parameters π½1 and π½2 are linear—that is, they are not expressed as, say, π½22 , 1⁄π½2 , or any form other than π½2 —and also the impact of the changes in the independent variable on y works directly through x rather than through expressions such as, say, π₯ 2 or ln(π₯). In this section, we will continue assuming that the regression is linear in parameters, but relax the assumption of linearity of the variables. In many economic models the relationship between the dependent and independent variables is not a straight line relationship. That is the change in π¦ does not follow the same pattern for all values of π₯. Consider for example an economic model explaining the relationship between expenditure on food (or housing) and income. As income rises, we do expect expenditure on food to rise, but not at a constant rate. In fact, we should expect the rate of increase in expenditure on food to decrease as income rises. Therefore the relationship between income and food expenditure is not a straight-line relationship. The following is an outline of various functional forms encountered in regression analysis. 3-Simple Regression Model 23 of 31 2.1. Quadratic Model In a quadratic model the explanatory variable appears as a squared quantity. π¦Μ = π1 + π2 π₯ 2 Example: House Price and Size The data and other calculations are in the Excel file CH3 DATA.xlsx (“ππ” tab). The data show the prices of 1080 houses sold in Baton Rouge, Louisiana, in the 2005. The explanatory variable is the size of the house (in square feet). Estimate the following model Μ = π1 + π2 πππΉπ 2 ππ πΌπΆπΈ The process of finding the values of the coefficients is exactly the same as before. Once the explanatory variables are squared, then they are treated as just another set of π₯ data. But when the coefficients are estimated, beware of the significant difference in the interpretation of their values compared to the linear regression coefficients. π¦Μ = 55776.566 + 0.0154π₯ 2 Now note that the regression function is not linear. Therefore, unlike a linear function, the slope is not a constant. The slope here is defined for each point on the graph and it is the first derivative of the function. ππ¦Μ = 2π2 π₯ ππ₯ The table below and the following diagram show that when size is 2000 square feet the price rises by $61.69 per square foot, for a house with 4,000 square feet, price rises by $123.37, and by $185.06 for each additional square foot for a size of 6,000 square feet. 
π₯ = πππΉπ 2,000 4,000 6,000 3-Simple Regression Model π2 = 0.0154 ππ¦Μ⁄ππ₯ = 2π2 π₯ $61.69 123.37 185.06 24 of 31 1000000 yΜ 900000 800000 Sales price ($) 700000 185.06 600000 500000 400000 123.37 300000 200000 61.69 100000 0 0 1000 2000 3000 4000 5000 6000 7000 8000 Total square feel Elasticity π= ππ¦Μ⁄π¦Μ ππ¦Μ π₯ π₯ = = (2π2 π₯) ππ₯ ⁄π₯ ππ₯ π¦Μ π¦Μ π₯ 2,000 4,000 6,000 π¦ 117,462 302,517 610,943 π2 = 0.0154 π = 2π2 π₯ 2 ⁄π¦Μ 1.05 1.63 1.82 For example, for a house with 4,000 feet, a 1% increase in the size adds 1.63% to the price. 2.2. Log-Linear Model The log-linear model takes the form of Μ = π1 + π2 π₯ ln(π¦) To run the model you must first change the π¦ values to ln(π¦). The procedure is then as usual. Again, beware of the change in the interpretation of the π2 coefficient and the required adjustment to find the predicted value. Use the same data from the previous example in the ππ tab. The result of the regression is: Μ ln(ππ πΌπΆπΈ) = 10.8386 + 0.00041πππΉπ Prediction When you plug in values for πππΉπ in the regression equation, the predicted value will be in terms of ln(π¦). You must then take the exponent to find π¦Μ. 3-Simple Regression Model 25 of 31 Μ Μ ln (π¦) 11.66 12.48 13.31 π₯ 2,000 4,000 6,000 π¦Μ = π ln(π¦) 115,975.47 263,991.38 600,915.40 Interpretation of the ππ coefficient In the regression equation, π2 = 0.00041. What does this figure indicate? Consider the calculations in the following table. The table shows that for 1 square-feet increase in size, from π₯0 = 2,000 to π₯1 = 2,001, the house price is predicted to rise by 0.0004 or 0.04%. π1 = 10.8386 π2 = 0.000411 Μ ln (π¦) = π¦Μ = βπ₯ = π₯1 − π₯0 = βπ¦Μ = π¦1 − π¦0 = βπ¦Μ⁄π¦Μ0 = βπ¦Μ% = π₯0 π₯1 1 2,000 11.6611 115,975.5 1 2,001 11.6615 116,023.2 1 47.7 0.0004 0.04% Slope of the regression function To explain the slope of a log-linear function, consider the following diagram. Note that like a quadratic function, the slope of the log-linear function is not constant. The slope is defined for each point on the continuous graph as the first derivative evaluated for each π₯. 1200000 yΜ 1000000 Sales price ($) 800000 600000 247.14 400000 108.57 200000 47.70 0 0 1000 2000 3000 4000 5000 6000 7000 8000 Total square feel Μ Here is how you find the first derivative of log-linear function ln (π¦) = π1 + π2 π₯. First find the exponent of both sides of the equation. π¦Μ = π π1+π2 π₯ 3-Simple Regression Model 26 of 31 Then take the derivative of π¦Μ with respect to π₯. ππ¦Μ = π2 π π1+π2 π₯ = π2 π¦Μ ππ₯ For example, when π₯ = 2,000 πππΉπ, the slope is ππ¦Μ⁄ππ₯ = 0.00041 × 115,975.5 = 47.70. This slope figure implies that when the house size is 2,000 πππΉπ, for each additional square-feet the house price rises by $47.70. Or, we can also say that for a house with a predicted price of, say, $100,000, the estimated increase in price for an additional square foot is π2 π¦Μ = 0.00041 × 100,000 = $41.00. Elasticity Let us start with the general definition of elasticity: π= ππ¦Μ⁄π¦Μ ππ¦Μ π₯ = ππ₯ ⁄π₯ ππ₯ π¦Μ Substituting for π = π2 π¦Μ ππ¦Μ ππ₯ = π2 π¦Μ in the elasticity formula, we have: π₯ = π2 π₯ π¦Μ π₯ 2,000 4,000 6,000 Μ ln (π¦) 11.66 12.48 13.31 π¦Μ 115,975.47 263,991.38 600,915.40 ππ¦Μ⁄ππ₯ = π2 π¦Μ 47.70 108.57 247.14 π = π2 π₯ 0.823 1.645 2.468 For example, when π₯ = 2,000 square feet, π = 0.823 implies that the price is predicted to rise by 0.823% for a 1% increase in size. 2.3. Regression with Indicator (Dummy) Variables An indicator variable is a binary variable which take on values of 0 or 1 only. 
This variable is used when it is of qualitative characteristic: gender, race, or location. For example, the price of a house is significantly related to the qualitative characteristic location. Two houses with the same size will have different prices according to where each is located. In regression if the variable has the characteristic of interest, then it is assigned a “1”, otherwise, a “0”. If the house is located in a specific location which we anticipate would impact its price, then it is assigned 1, otherwise 0. Example: Location Impact on the Price of Houses Located Near a University The Excel file CH3 DATA.xlsx contains the UTOWN data in the tab “π’π‘ππ€π”. The data provides prices of 1,000 houses (π¦ = ππ πΌπΆπΈ) according to their location, where π₯ = πππππ = 1 for house located near a major university campus, and π₯ = πππππ = 0 for houses in other neighborhoods. The regression equation is, Μ = π1 + π2 πππππ ππ πΌπΆπΈ Using the regular regression formulas to estimate the coefficients, we have Μ = 215.732 + 61.5091πππππ ππ πΌπΆπΈ 3-Simple Regression Model 27 of 31 The following table shows the impact of location on the price of a house. The estimated price of a house near the university campus is $277,241, and that of house in other locations is $215,372. Note that the intercept represents the estimated mean price of a house away from the campus. The slope π2 = 61.509 is the amount added to the price of a house located near the campus: $215.372 + $61.509 = $277,241. bβ = bβ = 215.732 61.509 ππ πΌπΆπΈ = π₯ππ πΌπΆπΈ = π₯=0 1 0 215.732 π₯=1 1 1 277.241 61.509 Also note that we can find the intercept, the sample mean price of non-university houses, and the sample mean price of near-university houses, directly from the data. π¦Μ π₯=0 = 215.732 π¦Μ π₯=1 = 277.241 Appendix Variance of ππ To determine the variance b1 we start with basic formula to compute the intercept coefficient of regression. π1 = π¦Μ − π2 π₯Μ Substituting for π¦Μ and π2 , we have 1 ∑(π₯ − π₯)π¦ π1 = ∑ π¦ − π₯Μ π ∑(π₯ − π₯)2 1 π1 = ∑ π¦ − ∑π€π¦π₯Μ π 1 π1 = ∑ ( − π₯Μ π€) π¦ π 1 π1 = ∑ ( − π₯Μ π€) (π½1 + π½2 π₯ + π’) π 1 1 1 π1 = ∑ ( π½1 + π½2 π₯ + π’ − π₯Μ π€ π½1 − π₯Μ π€ π½2 π₯ − π₯Μ π€π’) π π π 1 π1 = π½1 + π½2 π₯Μ + ∑π’ − π½1 π₯Μ ∑π€ − π½2 π₯Μ ∑π€π₯ − π₯Μ ∑π€π’ π 1 1 π1 = π½1 + ∑π’ − π₯Μ ∑π€π’ = π½1 + ∑ ( − π₯Μ π€) π’ π π 1 π1 − π½1 = ∑ ( − π₯Μ π€) π’ π 3-Simple Regression Model 28 of 31 2 1 var(π1 ) = E[(π1 − π½1 )2 ] = E {[∑ ( − π₯Μ π€) π’] } π 2 2 2 1 1 1 var(π1 ) = E [( − π₯Μ π€1 ) π’12 + ( − π₯Μ π€2 ) π’22 + β― + ( − π₯Μ π€π ) π’π2 + β― ] π π π 2 1 var(π1 ) = ∑ ( − π₯Μ π€1 ) ππ’2 π Simplify the first term on the right hand side: 1 2 ∑ ( − π₯Μ π€1 ) = ∑ ( π2 π 1 2 ∑ ( − π₯Μ π€1 ) = π 1 2 ∑ ( − π₯Μ π€1 ) = π var(π1 ) = 1 1 π + +π₯ Μ 2 π€2 − 2 π π₯ Μ π€) = 1 π +π₯ Μ 2 ∑π€2 − 2 π π₯ Μ ∑π€ π₯ Μ 2 ∑(π₯ − π₯Μ )2 ∑(π₯ − π₯Μ )2 + ππ₯Μ 2 ∑π₯2 = π∑(π₯ − π₯ Μ )2 π∑(π₯ − π₯ Μ )2 ∑π₯ 2 π∑(π₯ − π₯Μ )2 ππ’2 Covariance of ππ and ππ cov(π1 , π2 ) = −π₯Μ ∑(π₯ − π₯Μ )2 ππ’2 It is a straightforward exercise to arrive at the above covariance formula from this definition by substituting the terms used in deriving the var(π1 ) and var(π2 ). 1 π1 − π½1 = ∑ ( − π₯Μ π€) π’ π and π2 − π½2 = ∑π€π’ in the above expectation operation 1 cov(π1 , π2 ) = E {[∑ ( − π₯Μ π€) π’] (∑π€π’)} π The covariance is: cov(π1 , π2 ) = −π₯Μ ∑(π₯ − π₯Μ )2 ππ’2 Proof of E[var(π)] = E ( ∑π 2 ) = σ2π’ π−2 Starting with the definition of the error term, 3-Simple Regression Model 29 of 31 π = π¦ − π¦Μ = π¦ − π1 − π2 π₯ substitute for π1 = π¦Μ − π2 π₯Μ . 
π = π¦ − π¦Μ + π2 π₯Μ − π2 π₯ π = (π¦ − π¦Μ ) − π2 (π₯ − π₯Μ ) (1) π¦ = π½1 + π½2 π₯ + π’ (2) Summing for all π, we have ∑π¦ = ππ½1 + π½2 ∑π₯ + ∑π’ Divide both sides of the equation by π, π¦Μ = π½1 + π½2 π₯Μ + π’Μ Subtracting from (2), (π¦ − π¦Μ ) = π½2 (π₯ − π₯Μ ) + (π’ − π’Μ ) and substituting for (π¦ − π¦Μ ) in (1): π = π½2 (π₯ − π₯Μ ) + (π’ − π’Μ ) − π2 (π₯ − π₯Μ ) π = −(π₯ − π₯Μ )(π2 − π½2 ) + (π’ − π’Μ ) Square both sides, π 2 = (π₯ − π₯Μ )2 (π2 − π½2 )2 + (π’ − π’Μ )2 − 2(π₯ − π₯Μ )(π2 − π½2 )(π’ − π’Μ ) and sum for all π: ∑π2 = (π2 − π½2 )2 ∑(π₯ − π₯Μ )2 + ∑(π’ − π’Μ )2 − 2(π2 − π½2 )∑(π₯ − π₯Μ )(π’ − π’Μ ) Find the expected value from both sides E(∑π2 ) = E[(π2 − π½2 )2 ∑(π₯ − π₯Μ )2 ] + E[∑(π’ − π’Μ )2 ] − 2E[(π2 − π½2 )∑(π₯ − π₯Μ )(π’ − π’Μ )] Consider the three components of the RHS of the equation separately, ο· E[(π2 − π½2 )2 ∑(π₯ − π₯Μ )2 ] = ∑(π₯ − π₯Μ )2 E[(π2 − π½2 )2 ] E[(π2 − π½2 )2 ∑(π₯ − π₯Μ )2 ] = ∑(π₯ − π₯Μ )2 var(π2 ) E[(π2 − π½2 )2 ∑(π₯ − π₯Μ )2 ] = σ2π’ ο· E[∑(π’ − π’Μ )2 ] = E[∑(π’2 − π’Μ 2 − 2π’π’Μ )] E[∑(π’ − π’Μ )2 ] = E[∑π’2 − ππ’Μ 2 ] 1 E[∑(π’ − π’Μ )2 ] = E [∑π’2 − (∑π’)2 ] π 3-Simple Regression Model 30 of 31 1 E[∑(π’ − π’Μ )2 ] = ∑E(π’2 ) − ∑E(π’2 ) π E[∑(π’ − π’Μ )2 ] = πσ2π’ − σ2π’ = (π − 1)σ2π’ ο· E[(π2 − π½2 )∑(π₯ − π₯Μ )(π’ − π’Μ )] = E[(π2 − π½2 )∑(π₯ − π₯Μ )π’] Note: ∑(π₯ − π₯ Μ )(π’ − π’Μ ) = ∑π’(π₯ − π₯Μ ) − π’Μ ∑(π₯ − π₯Μ ) = ∑(π₯ − π₯Μ )π’ Substituting for π2 − π½2 = ∑(π₯ − π₯Μ )π’ ∑(π₯ − π₯Μ )2 on the right-hand-side, E[(π2 − π½2 )∑(π₯ − π₯Μ )π’] = E [ ∑(π₯ − π₯Μ )π’ ∑(π₯ − π₯Μ )π’] ∑(π₯ − π₯Μ )2 E[(π2 − π½2 )∑(π₯ − π₯Μ )π’] = 1 E[(∑(π₯ − π₯Μ )π’)2 ] ∑(π₯ − π₯Μ )2 E[(π2 − π½2 )∑(π₯ − π₯Μ )π’] = 1 ∑(π₯ − π₯Μ )2 E(π’2 ) = σ2π’ ∑(π₯ − π₯Μ )2 Finally, E(∑π 2 ) = σ2π’ + (π − 1)σ2π’ − 2σ2π’ = (π − 2)σ2π’ E( ∑π 2 π−2 ) = σ2π’ which proves that E[var(π)] = σ2π’ . That is, var(π) is an unbiased estimator of σ2π’ . 3-Simple Regression Model 31 of 31
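As a numerical companion to this proof, the following Monte Carlo sketch (assuming β1 = 20, β2 = 8 and an arbitrary σu = 12, with the ten x values from the test-score example) repeatedly draws samples, fits the least squares line, and averages the sum of squared residuals. Dividing by n − 2 gives an average close to σ²u, while dividing by n alone understates it, consistent with E(∑e²) = (n − 2)σ²u.

import numpy as np

rng = np.random.default_rng(seed=123)

# Assumed population values (illustrative; sigma_u is not given in the chapter)
beta1, beta2, sigma_u = 20.0, 8.0, 12.0
x = np.array([2.5, 1.0, 3.5, 3.0, 4.5, 6.0, 5.0, 4.0, 5.5, 7.0])   # ten-student x data
n = len(x)

def sse_from_one_sample():
    """Draw one sample, fit least squares, and return the sum of squared residuals."""
    u = rng.normal(0.0, sigma_u, size=n)
    y = beta1 + beta2 * x + u
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b1 = y.mean() - b2 * x.mean()
    e = y - (b1 + b2 * x)
    return np.sum(e ** 2)

reps = 20_000
sse = np.array([sse_from_one_sample() for _ in range(reps)])

print(f"true sigma_u^2           : {sigma_u**2:.1f}")               # 144.0
print(f"average of SSE / (n - 2) : {(sse / (n - 2)).mean():.1f}")    # close to 144
print(f"average of SSE / n       : {(sse / n).mean():.1f}")          # noticeably smaller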