Chapter 3: Multiple Regression

Outline
3.1. Definitions
  3.1.1 Multiple regression model
  3.1.2 Population regression function
  3.1.3 Sample regression function
3.2. OLS estimator in the multiple regression model
  3.2.1 Ordinary least squares estimators
  3.2.2 Assumptions of the multiple regression model
  3.2.3 Unbiased and efficient properties
3.3. Measure of fit

3.1 Definitions
3.1.1 Multiple regression model
Revision of Chapter 2:
- The error u arises because of factors, or variables, that influence Y but are not included in the regression function.
- The key Assumption 3 – that all other factors affecting y are uncorrelated with x – is often unrealistic, so it is difficult to draw ceteris paribus conclusions about how x affects y.

Example: compare two models
  $consum_i = \beta_1 + \beta_2\,income_i + \beta_3\,asset_i + u_i$   (1)
  $consum_i = \beta_1 + \beta_2\,income_i + u_i$   (2)
We know that the simple regression coefficient $\hat\beta_2$ from (2) does not usually equal the multiple regression coefficient $\hat\beta_2$ from (1). There are two distinct cases in which $\hat\beta_2$ from (1) and $\hat\beta_2$ from (2) are identical:
1. The partial effect of asset on consum is zero in the sample, that is, $\hat\beta_3 = 0$.
2. income and asset are uncorrelated in the sample.
(A numerical sketch of this point follows the SRF example below.)

3.1.1 Multiple regression model
• The MRM is more amenable to ceteris paribus analysis.
• Adding more factors to the model means more of the variation in y can be explained, giving a better model for predicting the dependent variable.
• The MRM can incorporate fairly general functional-form relationships, for example
  $consum_i = \beta_1 + \beta_2\,income_i + \beta_3\,income_i^2 + u_i$
• The MRM is the most widely used vehicle for empirical analysis in economics and other social sciences, e.g.
  wage = f(educ, exper)
  Q = f(K, L)

3.1.2 Population regression function
  $Y_i = \beta_1 + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + u_i$
• Y = one dependent variable (criterion).
• X = two or more independent variables (predictor variables).
• $u_i$ = the stochastic disturbance term.
• Sample size: at least 50 (at least 10 times as many cases as independent variables).
• $\beta_1$ is the intercept.
• $\beta_k$ measures the change in Y with respect to $X_k$, holding other factors fixed.

3.1.3 The sample regression function (SRF)
• Population regression function: $E(Y \mid X_i) = f(X_i) = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i}$
• Sample regression function: $\hat Y_i = \hat\beta_1 + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i}$
• $\hat Y_i$ = estimator of $E(Y \mid X_i)$; $\hat\beta_1$, $\hat\beta_2$, $\hat\beta_3$ = estimators of $\beta_1$, $\beta_2$, $\beta_3$.
• An estimator, also known as a (sample) statistic, is simply a rule, formula, or method that tells how to estimate the population parameter from the information provided by the sample.
• A particular numerical value obtained by the estimator in an application is known as an estimate.

Example – Multiple regression function
• Problem 3.2 (suppose there are only two independent variables in the MRM): A labor economist would like to examine the effects of job training on worker productivity. In this case there is little need for formal economic theory; basic economic understanding is enough to realize that factors such as education and experience affect worker productivity. Also, economists are well aware that workers are paid commensurate with their productivity.
• Model: wage = f(educ, exper), where
  wage = hourly wage
  educ = years of formal education
  exper = years of workforce experience
• PRF: $wage_i = \beta_1 + \beta_2\,educ_i + \beta_3\,exper_i + u_i$

3.1.3 The sample regression function (SRF)
• We observe a sample of Y values corresponding to some fixed X's. Can we estimate the PRF from the sample data?
• If we have data for the wage example above, we can write the SRF as
  $wage_i = \hat\beta_1 + \hat\beta_2\,educ_i + \hat\beta_3\,exper_i + \hat u_i$
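The link between the two consumption models (1) and (2) can be checked numerically. The sketch below (Python with numpy; the data are simulated and every variable name is hypothetical, not from the slides) regresses a made-up consum series on income with and without asset, and verifies the algebraic identity: the simple-regression slope equals the multiple-regression slope on income plus $\hat\beta_3$ times the slope from regressing asset on income.

```python
# Minimal sketch (simulated data, numpy assumed) comparing the simple and multiple
# regression slopes on income when asset is omitted, as in models (1) and (2) above.
import numpy as np

rng = np.random.default_rng(0)
n = 200
income = rng.normal(50, 10, n)
asset = 2 * income + rng.normal(0, 5, n)          # asset is correlated with income
consum = 10 + 0.6 * income + 0.1 * asset + rng.normal(0, 3, n)

# Model (1): consum on income and asset
X1 = np.column_stack([np.ones(n), income, asset])
b_multi = np.linalg.lstsq(X1, consum, rcond=None)[0]

# Model (2): consum on income only
X2 = np.column_stack([np.ones(n), income])
b_simple = np.linalg.lstsq(X2, consum, rcond=None)[0]

# delta_hat: slope from regressing the omitted variable (asset) on income
delta_hat = np.linalg.lstsq(X2, asset, rcond=None)[0][1]

print("slope on income, model (1):", b_multi[1])
print("slope on income, model (2):", b_simple[1])
# The two slopes coincide only if beta3_hat = 0 or income and asset are uncorrelated;
# in general, slope(2) = slope(1) + beta3_hat * delta_hat:
print("check:", b_multi[1] + b_multi[2] * delta_hat)
```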
3.2. The OLS Estimator in the Multiple Regression Model
• 3.2.1 Ordinary least squares estimators
• 3.2.2 Assumptions of the MRM
• 3.2.3 Unbiased and efficient properties

3.2.1 OLS estimators
Consider the three-variable model.
• To find the OLS estimators, first write the sample regression function (SRF) as
  $Y_i = \hat\beta_1 + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i} + \hat u_i$   (7.4.1)
• The OLS estimators are chosen so that the residual sum of squares $\sum \hat u_i^2$ is as small as possible:
  $\sum \hat u_i^2 = \sum (Y_i - \hat Y_i)^2 = \sum (Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i})^2 \to \min$

3.2.1 OLS estimators
• Setting the partial derivatives with respect to $\hat\beta_1$, $\hat\beta_2$, $\hat\beta_3$ equal to zero gives the normal equations:
  $\sum Y_i = n\hat\beta_1 + \hat\beta_2 \sum X_{2i} + \hat\beta_3 \sum X_{3i}$
  $\sum X_{2i} Y_i = \hat\beta_1 \sum X_{2i} + \hat\beta_2 \sum X_{2i}^2 + \hat\beta_3 \sum X_{2i} X_{3i}$
  $\sum X_{3i} Y_i = \hat\beta_1 \sum X_{3i} + \hat\beta_2 \sum X_{2i} X_{3i} + \hat\beta_3 \sum X_{3i}^2$

3.2.1 OLS estimators
• If we denote deviations from the sample means by $y_i = Y_i - \bar Y$, $x_{2i} = X_{2i} - \bar X_2$, $x_{3i} = X_{3i} - \bar X_3$, then
  $\sum x_{2i}^2 = \sum X_{2i}^2 - n\bar X_2^2$,  $\sum x_{3i}^2 = \sum X_{3i}^2 - n\bar X_3^2$,  $\sum y_i^2 = \sum Y_i^2 - n\bar Y^2$
  $\sum x_{2i} x_{3i} = \sum X_{2i} X_{3i} - n\bar X_2 \bar X_3$
  $\sum y_i x_{2i} = \sum Y_i X_{2i} - n\bar Y \bar X_2$,  $\sum y_i x_{3i} = \sum Y_i X_{3i} - n\bar Y \bar X_3$

3.2.1 OLS estimators
• We obtain:
  $\hat\beta_1 = \bar Y - \hat\beta_2 \bar X_2 - \hat\beta_3 \bar X_3$
  $\hat\beta_2 = \dfrac{(\sum y_i x_{2i})(\sum x_{3i}^2) - (\sum y_i x_{3i})(\sum x_{2i} x_{3i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2}$
  $\hat\beta_3 = \dfrac{(\sum y_i x_{3i})(\sum x_{2i}^2) - (\sum y_i x_{2i})(\sum x_{2i} x_{3i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2}$

3.2.1 OLS estimators – Example
We have the following data:
  Y : 20 19 18 17 16 15 15 14 14 12
  X2:  8  7  6  5  5  5  4  4  3  3
  X3:  3  4  5  5  6  6  7  7  8  9
We obtain:
  $\sum Y_i = 160$, $\sum X_{2i} = 50$, $\sum X_{3i} = 60$, so $\bar Y = 16$, $\bar X_2 = 5$, $\bar X_3 = 6$
  $\sum Y_i^2 = 2616$, $\sum X_{2i}^2 = 274$, $\sum X_{3i}^2 = 390$, $\sum Y_i X_{2i} = 835$, $\sum Y_i X_{3i} = 920$, $\sum X_{2i} X_{3i} = 274$
and, in deviation form,
  $\sum y_i^2 = 2616 - 10(16)^2 = 56$, $\sum x_{2i}^2 = 274 - 10(5)^2 = 24$, $\sum x_{3i}^2 = 390 - 10(6)^2 = 30$
  $\sum y_i x_{2i} = 835 - 10(16)(5) = 35$, $\sum y_i x_{3i} = 920 - 10(16)(6) = -40$, $\sum x_{2i} x_{3i} = 274 - 10(5)(6) = -26$
Hence
  $\hat\beta_2 = \dfrac{35(30) - (-40)(-26)}{24(30) - (-26)^2} = \dfrac{10}{44} = 0.2273$
  $\hat\beta_3 = \dfrac{(-40)(24) - 35(-26)}{44} = \dfrac{-50}{44} = -1.1364$
  $\hat\beta_1 = 16 - 0.2273(5) + 1.1364(6) = 21.6818$
  $\hat Y_i = 21.6818 + 0.2273\,X_{2i} - 1.1364\,X_{3i}$
(A short numpy sketch verifying these figures appears after the list of assumptions below.)

3.2.1 OLS estimators – Variances and standard errors of the OLS estimators
  $\operatorname{Var}(\hat\beta_1) = \left[\dfrac{1}{n} + \dfrac{\bar X_2^2 \sum x_{3i}^2 + \bar X_3^2 \sum x_{2i}^2 - 2\bar X_2 \bar X_3 \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - (\sum x_{2i} x_{3i})^2}\right]\sigma^2$
  $\operatorname{Var}(\hat\beta_2) = \dfrac{\sum x_{3i}^2}{\sum x_{2i}^2 \sum x_{3i}^2 - (\sum x_{2i} x_{3i})^2}\,\sigma^2$
  $\operatorname{Var}(\hat\beta_3) = \dfrac{\sum x_{2i}^2}{\sum x_{2i}^2 \sum x_{3i}^2 - (\sum x_{2i} x_{3i})^2}\,\sigma^2$
• or, equivalently,
  $\operatorname{Var}(\hat\beta_2) = \dfrac{\sigma^2}{\sum x_{2i}^2 (1 - r_{23}^2)}$,  $se(\hat\beta_2) = \sqrt{\operatorname{Var}(\hat\beta_2)}$
  $\operatorname{Var}(\hat\beta_3) = \dfrac{\sigma^2}{\sum x_{3i}^2 (1 - r_{23}^2)}$,  $se(\hat\beta_3) = \sqrt{\operatorname{Var}(\hat\beta_3)}$
  where $r_{23}$ is the sample coefficient of correlation between $X_2$ and $X_3$.
• In all these formulas $\sigma^2$ is the variance of the population disturbances $u_i$, estimated by
  $\hat\sigma^2 = \dfrac{\sum \hat u_i^2}{n - 3}$   (7.4.19)

Example – Stata output
• Model: wage = f(educ, exper). (Regression output not reproduced here.)

3.2.2 The three-variable model: notation and assumptions
  $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$
Assumptions:
1. Linear regression model, i.e., linear in the parameters.
2. X values are fixed in repeated sampling; X is assumed to be nonstochastic.
3. Zero mean value of the disturbance $u_i$: $E(u_i \mid X_{2i}, X_{3i}) = 0$. This implies zero covariance between $u_i$ and each X variable: $\operatorname{cov}(u_i, X_{2i}) = \operatorname{cov}(u_i, X_{3i}) = 0$.
4. Homoscedasticity, or constant variance of $u_i$: $\operatorname{Var}(u_i) = \sigma^2$.
5. No serial correlation between the disturbances: $\operatorname{Cov}(u_i, u_j) = 0$, $i \neq j$.
6. The number of observations n must be greater than the number of parameters to be estimated.
7. Variability in X values: the X values in a given sample must not all be the same.
8. No specification bias, i.e., the model is correctly specified.
9. No exact collinearity between the X variables.

Assumption 3: Zero mean value of the disturbance, $E(u_i \mid X_{2i}, X_{3i}) = 0$.
This assumption can fail if:
- the functional relationship between the explained and explanatory variables is misspecified in the equation;
- an important factor that is correlated with any of $x_1, x_2, \dots, x_k$ is omitted.
When $x_j$ is correlated with u for any reason, $x_j$ is said to be an endogenous explanatory variable.
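As a check on the worked numerical example in Section 3.2.1, the following sketch (Python with numpy, assumed here purely for illustration) applies the deviation-form formulas to the ten observations above and also computes $\hat\sigma^2$ and the standard errors of $\hat\beta_2$ and $\hat\beta_3$.

```python
# Sketch reproducing the three-variable worked example above with the deviation-form
# OLS formulas; it should print beta1 ≈ 21.68, beta2 ≈ 0.227, beta3 ≈ -1.136.
import numpy as np

Y  = np.array([20, 19, 18, 17, 16, 15, 15, 14, 14, 12], dtype=float)
X2 = np.array([ 8,  7,  6,  5,  5,  5,  4,  4,  3,  3], dtype=float)
X3 = np.array([ 3,  4,  5,  5,  6,  6,  7,  7,  8,  9], dtype=float)
n = len(Y)

# Deviations from the sample means
y, x2, x3 = Y - Y.mean(), X2 - X2.mean(), X3 - X3.mean()

den = (x2 @ x2) * (x3 @ x3) - (x2 @ x3) ** 2            # common denominator
b2 = ((y @ x2) * (x3 @ x3) - (y @ x3) * (x2 @ x3)) / den
b3 = ((y @ x3) * (x2 @ x2) - (y @ x2) * (x2 @ x3)) / den
b1 = Y.mean() - b2 * X2.mean() - b3 * X3.mean()

# Residual variance sigma_hat^2 = RSS / (n - 3) and the standard errors (7.4.19)
resid = Y - (b1 + b2 * X2 + b3 * X3)
sigma2 = resid @ resid / (n - 3)
r23 = (x2 @ x3) / np.sqrt((x2 @ x2) * (x3 @ x3))
se_b2 = np.sqrt(sigma2 / ((x2 @ x2) * (1 - r23 ** 2)))
se_b3 = np.sqrt(sigma2 / ((x3 @ x3) * (1 - r23 ** 2)))

print(b1, b2, b3)          # 21.6818..., 0.2273..., -1.1364...
print(se_b2, se_b3)
```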
Assumption 9: No exact collinearity between the X variables.
• If an independent variable in the regression is an exact linear combination of the other independent variables, the model suffers from perfect collinearity and it cannot be estimated (Chapter 5).
• Note that Assumption 9 allows the independent variables to be correlated; they just cannot be perfectly correlated. If we did not allow for any correlation, multiple regression would be of very limited use for econometric analysis.

3.2.3 Unbiased and efficient properties
Gauss–Markov theorem: $\hat\beta_1, \hat\beta_2, \dots, \hat\beta_k$ are the best linear unbiased estimators (BLUEs) of $\beta_1, \beta_2, \dots, \beta_k$.
• An estimator $\hat\beta_j$ is an unbiased estimator of $\beta_j$ if $E(\hat\beta_j) = \beta_j$.
• An estimator of $\beta_j$ is linear if and only if it can be expressed as a linear function of the data on the dependent variable: $\tilde\beta_j = \sum_{i=1}^{n} w_{ij}\, y_i$.
• "Best" is defined as smallest variance.

3.2.3 Unbiased and efficient properties
• The sample regression line (surface) passes through the means $(\bar Y, \bar X_2, \dots, \bar X_k)$.
• The mean value of the estimated $Y_i$ equals the mean value of the actual $Y_i$: $\bar{\hat Y} = \bar Y$.
• The sum of the residuals is zero: $\sum_{i=1}^{n} \hat u_i = 0$.
• The residuals are uncorrelated with each $X_{ki}$: $\sum_{i=1}^{n} X_{ki}\hat u_i = 0$.
• The residuals are uncorrelated with $\hat Y_i$: $\sum_{i=1}^{n} \hat Y_i \hat u_i = 0$.

3.2.3 Unbiased and efficient properties – Standard errors of the OLS estimators
• A natural candidate estimator of $\sigma^2 = E(u_i^2)$ would be $\sum_{i=1}^{n} u_i^2 / n$, but this is not a feasible estimator because we cannot observe the $u_i$.
• The unbiased estimator of $\sigma^2$ is $\hat\sigma^2 = \dfrac{\sum \hat u_i^2}{n - k} = \dfrac{RSS}{n - k}$.
• $RSS/\sigma^2$ follows a $\chi^2$ distribution with df = number of observations − number of estimated parameters = n − k.
• The positive square root $\hat\sigma$ is called the standard error of the regression (SER) (or Root MSE). The SER is an estimator of the standard deviation of the error term.

3.2.3 Unbiased and efficient properties
  $\operatorname{Var}(\hat\beta_j) = \dfrac{\sigma^2}{TSS_j (1 - R_j^2)}$
• where $TSS_j = \sum_{i=1}^{n} (x_{ij} - \bar x_j)^2$ is the total sample variation in $x_j$ and $R_j^2$ is the R-squared from regressing $x_j$ on all other independent variables (including an intercept).
• Since $\sigma^2$ is unknown, we replace it with its estimator $\hat\sigma^2$. Standard error: $se(\hat\beta_j) = \hat\sigma / [TSS_j (1 - R_j^2)]^{1/2}$.

3.3 Measure of fit, or coefficient of determination R²
• The total sum of squares (TSS): $TSS = \sum y_i^2 = \sum (Y_i - \bar Y)^2 = \sum Y_i^2 - n\bar Y^2$
• The explained sum of squares (ESS): $ESS = \sum \hat y_i^2 = \sum (\hat Y_i - \bar Y)^2 = \hat\beta_2 \sum y_i x_{2i} + \hat\beta_3 \sum y_i x_{3i}$
• The residual sum of squares (RSS): $RSS = \sum (Y_i - \hat Y_i)^2 = \sum \hat u_i^2 = TSS - ESS$
• Goodness of fit – coefficient of determination: $R^2 = \dfrac{ESS}{TSS} = 1 - \dfrac{RSS}{TSS}$, the fraction of the sample variation in Y that is explained by $X_2$ and $X_3$. (A short computational sketch of these quantities appears at the end of this subsection.)

Example – Goodness of fit
• Determinants of college GPA: the variables in GPA1.dta include the college grade point average (colGPA), high school GPA (hsGPA), achievement test score (ACT), AGE, and Campus for a sample of 141 students from a large university. (Regression output not reproduced here.)

Output interpretation
• hsGPA and ACT together explain about ?% of the variation in college GPA for this sample of students.
• There are many other factors, including family background, personality, quality of high school education, and affinity for college, that contribute to a student's college performance.

3.3 Measure of fit
• Note that R² lies between 0 and 1.
  o If it is 1, the fitted regression line explains 100 percent of the variation in Y.
  o If it is 0, the model does not explain any of the variation in Y.
• The fit of the model is said to be "better" the closer R² is to 1.
• As the number of regressors increases, R² almost invariably increases and never decreases.
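Continuing the same ten-observation example from Section 3.2.1, this short sketch (numpy again, assumed only for illustration) computes TSS, ESS, and RSS and confirms that the two expressions for R² agree.

```python
# Sketch computing the sums of squares and R^2 for the three-variable worked example.
import numpy as np

Y  = np.array([20, 19, 18, 17, 16, 15, 15, 14, 14, 12], dtype=float)
X2 = np.array([ 8,  7,  6,  5,  5,  5,  4,  4,  3,  3], dtype=float)
X3 = np.array([ 3,  4,  5,  5,  6,  6,  7,  7,  8,  9], dtype=float)

# Fit by OLS (same coefficients as the worked example: 21.68, 0.227, -1.136)
X = np.column_stack([np.ones(len(Y)), X2, X3])
b = np.linalg.lstsq(X, Y, rcond=None)[0]
Y_hat = X @ b

TSS = np.sum((Y - Y.mean()) ** 2)        # total sum of squares
ESS = np.sum((Y_hat - Y.mean()) ** 2)    # explained sum of squares
RSS = np.sum((Y - Y_hat) ** 2)           # residual sum of squares

print("R^2 via ESS/TSS    :", ESS / TSS)
print("R^2 via 1 - RSS/TSS:", 1 - RSS / TSS)
```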
R² and the adjusted R²
• An alternative coefficient of determination, the adjusted R²:
  $\bar R^2 = 1 - \dfrac{RSS/(n-k)}{TSS/(n-1)} = 1 - (1 - R^2)\dfrac{n-1}{n-k}$
  where k = the number of parameters in the model, including the intercept term.
• It is good practice to use the adjusted R² rather than R², because R² tends to give an overly optimistic picture of the fit of the regression, particularly when the number of explanatory variables is not very small compared with the number of observations.

The game of maximizing adjusted R²
• Sometimes researchers play the game of maximizing the adjusted R², that is, choosing the model that gives the highest adjusted R². This may be dangerous.
• In regression analysis, our objective is not to obtain a high adjusted R² per se but rather to obtain dependable estimates of the true population regression coefficients and to draw statistical inferences about them.
• Researchers should be more concerned about the logical or theoretical relevance of the explanatory variables to the dependent variable and their statistical significance.

Comparing coefficients of determination R²
• It is crucial to note that in comparing two models on the basis of the coefficient of determination, whether adjusted or not:
  - the sample size n must be the same;
  - the dependent variable must be the same;
  - the explanatory variables may take any form.
• Thus for the models
  $\ln Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$   (7.8.6)
  $Y_i = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + u_i$   (7.8.7)
  the computed R² terms cannot be compared.

Review: Partial correlation coefficients
• Example: we have a regression model with three variables: Y, $X_2$ and $X_3$.
• The coefficient of correlation r measures the degree of linear association between two variables: $r_{12}$ (between Y and $X_2$), $r_{13}$ (between Y and $X_3$) and $r_{23}$ (between $X_2$ and $X_3$). These are called gross or simple correlation coefficients, or correlation coefficients of zero order.
• Does $r_{12}$ in fact measure the "true" degree of (linear) association between Y and $X_2$ when $X_3$ may be associated with both of them? We need a correlation coefficient that is independent of the influence of $X_3$ on $X_2$ and Y: the partial correlation coefficient.

Review: Partial correlation coefficients
• $r_{12,3}$ = partial correlation coefficient between Y and $X_2$, holding $X_3$ constant.
• $r_{13,2}$ = partial correlation coefficient between Y and $X_3$, holding $X_2$ constant.
• $r_{23,1}$ = partial correlation coefficient between $X_2$ and $X_3$, holding Y constant.
These are called first-order correlation coefficients (the order is the number of secondary subscripts).
  $r_{12,3} = \dfrac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$
  $r_{13,2} = \dfrac{r_{13} - r_{12} r_{23}}{\sqrt{(1 - r_{12}^2)(1 - r_{23}^2)}}$
  $r_{23,1} = \dfrac{r_{23} - r_{12} r_{13}}{\sqrt{(1 - r_{12}^2)(1 - r_{13}^2)}}$

Example – Partial correlation coefficients
• Y = crop yield, $X_2$ = rainfall, $X_3$ = temperature. Assume $r_{12} = 0$: there is no simple association between crop yield and rainfall. Assume $r_{13}$ is positive and $r_{23}$ is negative. Then $r_{12,3}$ will be positive: holding temperature constant, there is a positive association between yield and rainfall. Since temperature $X_3$ affects both yield Y and rainfall $X_2$, we need to remove the influence of the nuisance variable temperature.
• In EViews: Quick → Group Statistics → Correlations. (A small numerical sketch of the first-order partial correlation formula follows.)
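The first-order partial correlation formulas above are easy to compute directly. The sketch below (numpy, with simulated yield/rainfall/temperature data chosen only to exercise the formula, not taken from any real dataset) contrasts the zero-order correlation between yield and rainfall with the partial correlation holding temperature constant.

```python
# Sketch of the first-order partial correlation r_{12,3} between Y and X2 holding X3
# constant, computed from the zero-order correlations as in the formulas above.
import numpy as np

def partial_corr(y, x2, x3):
    r12 = np.corrcoef(y, x2)[0, 1]
    r13 = np.corrcoef(y, x3)[0, 1]
    r23 = np.corrcoef(x2, x3)[0, 1]
    return (r12 - r13 * r23) / np.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Made-up yield/rainfall/temperature data: rainfall falls with temperature, while
# both rainfall and temperature raise yield.
rng = np.random.default_rng(1)
temp = rng.normal(25, 3, 100)
rain = 100 - 2 * temp + rng.normal(0, 2, 100)
yld  = 5 + 0.3 * rain + 0.8 * temp + rng.normal(0, 1, 100)

# With these numbers the gross correlation understates (here even reverses) the
# rain-yield association, while the partial correlation isolates the positive effect.
print("zero-order r(yield, rain)    :", np.corrcoef(yld, rain)[0, 1])
print("partial r(yield, rain | temp):", partial_corr(yld, rain, temp))
```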
More on Functional Form
The Cobb–Douglas production function
• The Cobb–Douglas production function, in its stochastic form, may be expressed as
  $Y_i = \beta_1 X_{2i}^{\beta_2} X_{3i}^{\beta_3} e^{u_i}$   (7.9.1)
  where Y = output, $X_2$ = labor input, $X_3$ = capital input, u = stochastic disturbance term, and e = base of the natural logarithm.
• If we log-transform this model, we obtain
  $\ln Y_i = \ln\beta_1 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + u_i = \beta_0 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + u_i$   (7.9.2)
  where $\beta_0 = \ln\beta_1$. (An estimation sketch with simulated data appears at the end of this section.)

EXAMPLE 7.3 Value Added, Labor Hours, and Capital Input in the Manufacturing Sector
• (Data and regression output not reproduced here.)

More on Functional Form – The Cobb–Douglas production function (7.9.4)
• The output elasticities of labor and capital were 0.4683 and 0.5213, respectively.
• Holding the capital input constant, a 1 percent increase in the labor input led on average to about a 0.47 percent increase in output.
• Similarly, holding the labor input constant, a 1 percent increase in the capital input led on average to about a 0.52 percent increase in output.

More on Functional Form – Polynomial regression models
• Figure 7.1: the U-shaped marginal cost curve shows that the relationship between MC and output is nonlinear.
• Geometrically, the MC curve depicted in Figure 7.1 represents a parabola. Mathematically, the parabola is represented by the equation
  $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2$   (7.10.1)
  which is called a quadratic function.
• The general kth-degree polynomial regression may be written as
  $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \dots + \beta_k X_i^k + u_i$   (7.10.3)

More on Functional Form – Polynomial regression models
EXAMPLE 7.4 Estimating the Total Cost Function
• (Regression output not reproduced here.)
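A minimal sketch of estimating the log-transformed Cobb–Douglas model (7.9.2) by OLS, using simulated output, labor, and capital series (numpy assumed; the "true" elasticities 0.47 and 0.52 are chosen only to echo the magnitudes in Example 7.3, not taken from its dataset).

```python
# Sketch: estimate Cobb-Douglas elasticities by regressing ln Y on ln X2 and ln X3,
# as in equation (7.9.2).
import numpy as np

rng = np.random.default_rng(2)
n = 100
labor   = rng.uniform(50, 500, n)
capital = rng.uniform(100, 1000, n)
u = rng.normal(0, 0.05, n)
output = 2.0 * labor ** 0.47 * capital ** 0.52 * np.exp(u)   # simulated "true" technology

X = np.column_stack([np.ones(n), np.log(labor), np.log(capital)])
b = np.linalg.lstsq(X, np.log(output), rcond=None)[0]

# b[1] and b[2] estimate the output elasticities of labor and capital: a 1% rise in
# labor raises output by about b[1] percent, holding capital fixed, and vice versa.
print("elasticity of labor  :", b[1])
print("elasticity of capital:", b[2])
```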