AAEC 4302
ADVANCED STATISTICAL METHODS IN AGRICULTURAL RESEARCH

Part II: Theory and Estimation of Regression Models
Chapter 5: Simple Regression Theory

[Figure: Planted Acres (vertical axis) against Previous Year Price (horizontal axis), showing the population line $E[Y] = B_0 + B_1 X$. At a given $X_i$, the observation $Y_i = E[Y_i] + u_i$ lies a vertical distance $u_i$ from the point $E[Y_i] = B_0 + B_1 X_i$ on the line.]

[Figure: same axes, with the estimated line $\hat{Y} = \hat{B}_0 + \hat{B}_1 X$ drawn alongside the population line. At $X_i$, $Y_i = \hat{Y}_i + e_i$ with $\hat{Y}_i = \hat{B}_0 + \hat{B}_1 X_i$; the residual $e_i$ is measured from the estimated line and the error $u_i$ from the population line.]

[Figure: same axes, showing the estimated line $\hat{Y} = \hat{B}_0 + \hat{B}_1 X$ and the residual $e_i$ of each observation.]

The Ordinary Least Squares (OLS) Method

• In the Ordinary Least Squares (OLS) method, the criterion for estimating $B_0$ and $B_1$ is to make the sum of the squared residuals (SSR) of the fitted regression line as small as possible, i.e.:

$$\text{minimize } SSR = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 = \sum_{i=1}^{n} \left(Y_i - \hat{B}_0 - \hat{B}_1 X_i\right)^2$$

• The OLS estimators (formulas) are:

$$\hat{B}_1 = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left(\sum X_i\right)^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum (X_i - \bar{X})\, Y_i}{\sum (X_i - \bar{X})^2} \qquad (5.12)$$

$$\hat{B}_0 = \bar{Y} - \hat{B}_1 \bar{X} \qquad (5.13)$$

• The regression line estimated using the OLS method has the following key properties:

1. $\sum e_i = \sum \left(Y_i - \hat{B}_0 - \hat{B}_1 X_i\right) = 0$ (i.e. the sum of its residuals is zero)
2. It always passes through the point $(\bar{X}, \bar{Y})$
3. The residual values ($e_i$'s) are not correlated with the values of the independent variable ($X_i$'s)

Interpretation of the Regression Model

• Assume, for example, that the estimated or fitted regression equation is:

$$\hat{Y}_i = 3.7 + 0.15 X_i \quad \text{or} \quad Y_i = 3.7 + 0.15 X_i + e_i$$

• The value of $\hat{B}_1 = 0.15$ indicates that if the cotton price received by farmers this year increases by 1 cent/pound (i.e. $\Delta X = 1$), then this year's cotton acreage is predicted to increase by 0.15 million acres (150,000 acres).
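The OLS formulas (5.12) and (5.13), and the three properties of the fitted line, can be checked numerically. The sketch below is only an illustration: the six data points are made up for the example (they are not from the course data), and the function names `ols_fit` and `ols_slope_raw_sums` are hypothetical helpers, not part of any library.

```python
# Illustrative data only (NOT from the source): previous-year price X in
# cents/pound and planted acres Y in millions of acres.
X = [50.0, 55.0, 60.0, 65.0, 70.0, 75.0]
Y = [10.2, 11.9, 12.1, 13.8, 14.0, 15.1]


def ols_fit(x, y):
    """Return (b0, b1) using the deviation form of (5.12) and (5.13)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx              # slope, equation (5.12)
    b0 = y_bar - b1 * x_bar     # intercept, equation (5.13)
    return b0, b1


def ols_slope_raw_sums(x, y):
    """Return the slope using the raw-sums form of (5.12)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)


b0, b1 = ols_fit(X, Y)
b1_raw = ols_slope_raw_sums(X, Y)

# Residuals e_i = Y_i - (b0 + b1 * X_i)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(X, Y)]

x_bar = sum(X) / len(X)
y_bar = sum(Y) / len(Y)

# Property 1: the residuals sum to zero (up to rounding error).
sum_residuals = sum(residuals)
# Property 2: the fitted line passes through (x_bar, y_bar).
gap_at_means = (b0 + b1 * x_bar) - y_bar
# Property 3: residuals are uncorrelated with X, so sum((X_i - x_bar) * e_i) is zero.
cross_product = sum((xi - x_bar) * ei for xi, ei in zip(X, residuals))

print("b0 =", round(b0, 4), " b1 =", round(b1, 4))
print("sum of residuals  =", sum_residuals)
print("gap at the means  =", gap_at_means)
print("sum (X - Xbar) e  =", cross_product)
```

The two forms of the slope formula agree up to floating-point rounding; the deviation form is usually the safer one to compute with, because the raw-sums form subtracts two large numbers.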
Interpretation of the Regression Model

$$Y_i = 3.7 + 0.15 X_i + e_i$$

• The value of $\hat{B}_0 = 3.7$ indicates that if the average cotton price received by farmers were zero (i.e. $X = 0$), the cotton acreage planted this year would be predicted to be 3.7 million (3,700,000) acres. Sometimes the intercept makes no practical sense.

Measures of Goodness of Fit

• There are two statistics (formulas) that quantify how well the estimated regression line fits the data:

1. The standard error of the regression (SER), sometimes called the standard error of the estimate
2. $R^2$, the coefficient of determination

• The SER differs from the sample standard deviation of the $e_i$'s only in the degrees of freedom used in the denominator ($n - 2$ instead of $n - 1$):

$$S_e = \sqrt{\frac{\sum_{i=1}^{n} (e_i - \bar{e})^2}{n - 1}}$$

$$SER = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n - 2}} \qquad (5.20)$$

Measures of Goodness of Fit: The $R^2$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} e_i^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$

• The ratio $\sum_{i=1}^{n} e_i^2 \big/ \sum_{i=1}^{n} (Y_i - \bar{Y})^2$ measures the proportion of the total variation in Y that is not explained by the model (i.e. by X).

• Thus, the $R^2$ measures the proportion of the total variation in Y that is explained by the model (i.e. by X).

Properties of the OLS Estimators

• The Gauss-Markov Theorem states the properties of the OLS estimators, i.e. of $\hat{B}_0$ and $\hat{B}_1$. They are unbiased:

$$E[\hat{B}_0] = B_0 \quad \text{and} \quad E[\hat{B}_1] = B_1$$

• And if the dependent variable Y (and thus the error term of the population regression model, $u_i$) has a normal distribution, the OLS estimators have the minimum variance among all unbiased estimators, not only the linear ones.

• BLUE: Best Linear Unbiased Estimator
• Unbiased => bias of $\hat{B}_j$ = $E(\hat{B}_j) - B_j = 0$
• Best Unbiased => minimum variance and unbiased
• Linear => the estimator is a linear function of the $Y_i$'s
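The goodness-of-fit statistics and the unbiasedness property can be illustrated with a short sketch. Everything specific here is an assumption made for the example: the six data points are made up, the function names are hypothetical, and the simulation treats 3.7 and 0.15 as if they were the true population parameters with normally distributed errors of standard deviation 1.

```python
import math
import random

# Illustrative data only (NOT from the source): previous-year price X in
# cents/pound and planted acres Y in millions of acres.
X = [50.0, 55.0, 60.0, 65.0, 70.0, 75.0]
Y = [10.2, 11.9, 12.1, 13.8, 14.0, 15.1]


def ols_fit(x, y):
    """Return the OLS intercept and slope (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def goodness_of_fit(x, y):
    """Return (S_e, SER, R^2) for the OLS line fitted to x and y."""
    n = len(x)
    b0, b1 = ols_fit(x, y)
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    e_bar = sum(e) / n
    y_bar = sum(y) / n
    ssr = sum(ei ** 2 for ei in e)              # sum of squared residuals
    tss = sum((yi - y_bar) ** 2 for yi in y)    # total variation in Y
    s_e = math.sqrt(sum((ei - e_bar) ** 2 for ei in e) / (n - 1))
    ser = math.sqrt(ssr / (n - 2))              # n - 2 degrees of freedom
    r2 = 1.0 - ssr / tss
    return s_e, ser, r2


def simulate_mean_estimates(true_b0, true_b1, sigma, x, reps, seed):
    """Average the OLS estimates over repeated samples drawn from
    Y_i = true_b0 + true_b1 * X_i + u_i, with u_i ~ Normal(0, sigma)."""
    rng = random.Random(seed)
    total_b0 = 0.0
    total_b1 = 0.0
    for _ in range(reps):
        y = [true_b0 + true_b1 * xi + rng.gauss(0.0, sigma) for xi in x]
        b0, b1 = ols_fit(x, y)
        total_b0 += b0
        total_b1 += b1
    return total_b0 / reps, total_b1 / reps


s_e, ser, r2 = goodness_of_fit(X, Y)
print("S_e =", round(s_e, 4), " SER =", round(ser, 4), " R^2 =", round(r2, 4))

# Assumed population parameters (hypothetical, chosen to match the
# fitted equation used in the interpretation example).
mean_b0, mean_b1 = simulate_mean_estimates(3.7, 0.15, 1.0, X, 2000, 0)
print("average b0 =", round(mean_b0, 3), " average b1 =", round(mean_b1, 4))
```

Because the residuals sum to zero, $S_e$ and the SER use the same numerator, so the SER is always slightly larger. In the simulation, any single sample gives estimates that miss the assumed parameters, but their averages over many samples should settle close to them, which is what unbiasedness means.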