Regression Analysis--Prediction/Forecasting

UNC-Wilmington Department of Economics and Finance ECN 377 Dr. Chris Dumas Regression Analysis-- Prediction/Forecasting Forecasting Y Using the Sample Regression Equation After we conduct a regression analysis, we often want to use our regression equation to predict/forecast the value of the dependent variable (Y) for various values of the independent variables (the X’s). For example, suppose we want to investigate the relationship between dependent variable Y and a single independent variable X1. The true relationship between Y and X1 for the individuals in the population under study is: 𝑌 = 𝛽0 + 𝛽1 ∙ 𝑋1 + 𝑒 where e is a normally-distributed, random error term. Suppose further that we don’t have data on all individuals in the population; instead, we just have data on a sample of individuals. Using regression analysis, we find the following estimate of the relationship between Y and X1, based on our sample data: 𝑌̂ = 𝛽̂0 + 𝛽̂1 ∙ 𝑋1 ̂ . After we have The “hat” symbol is placed above variable Y to indicate that it is a forecast/predicted value, 𝒀 used the OLS Estimator Equations to estimate 𝛽̂0 and 𝛽̂1 , we can insert them into the sample regression equation and forecast/predict 𝑌̂ for a given value of X1. For example, suppose that 𝛽̂0 =10, 𝛽̂1 = 2, and suppose that we want to forecast/predict Y when X1 = 40. Then: 𝑌̂ = 𝛽̂0 + 𝛽̂1 ∙ 𝑋1 𝑌̂ = 10 + (2 ∙ 40) 𝑌̂ = 90 (𝑌̂ = 90 is our forecast/prediction of Y when X1 = 40) ̂ , there will be However, because there is error in our estimates of 𝛽̂0 and 𝛽̂1 , and 𝛽̂0 and 𝛽̂1 are used to forecast Y ̂. We measure the potential error in our forecast of Y ̂ by estimating the error in our forecast/prediction of Y ̂ variance, standard error (s.e.), and confidence interval for the forecast 𝑌, as shown below: ̂ (for a given value 𝑿𝟏 ) Variance of the Forecast 𝒀 𝑣𝑎𝑟(𝑌̂𝑖 ) = 𝜎𝑒2 ∙ [1 + (𝑋1 − 𝑋̅)2 1 + ] 𝑛 ∑𝑖(𝑋1𝑖 − 𝑋̅)2 Note: Because e is unknown, 𝜎𝑒2 is also unknown. So, we estimate 𝜎𝑒2 based on our sample data. First, we calculate the estimated errors, or “residuals,” ̂𝑖 – Yi for each individual in the sample, then we calculate the variance of 𝑒̂𝑖 = 𝑌 2 2 ̂2 the 𝑒̂𝑖 ’s, denoted 𝜎̂ 𝑒 , and then we substitute 𝜎𝑒 for 𝜎𝑒 in the equation above to find: (𝑋1 − 𝑋̅)2 1 2 ∙ [1 + + 𝑣𝑎𝑟(𝑌̂𝑖 ) = 𝜎̂ ] 𝑒 𝑛 ∑𝑖(𝑋1𝑖 − 𝑋̅)2 1 UNC-Wilmington Department of Economics and Finance ECN 377 Dr. Chris Dumas ̂ Standard Error of the Forecast 𝒀 Note: Don’t confuse SER with s.e.( 𝑌̂ ). SER is the variation in Y around the regression s.e.( 𝑌̂ ) = √𝑣𝑎𝑟(𝑌̂ ) line on average for all the X’s, whereas s.e.( 𝑌̂ ) is the variation in Y around the regression line for the particular X for which we are forecasting. ̂ Confidence Interval for the Forecast 𝒀 Confidence Interval for 𝑌̂ = 𝑌̂ +/- (tcritical,α/2·s.e.( 𝑌̂)) The values for tcritical are found from the t-table using α/2 (because a Confidence Interval is two-sided test) and d.f. = n – k, where n = sample size, and k = the number of β’s in the regression equation. The graph below shows the Confidence Intervals for 𝑌̂ for all values of X1. Notice that the Confidence Interval is narrowest (that is, the forecasts have less error) for the mean value of X1 (that is, near 𝑋̅1 ). For values of X1 larger or smaller than 𝑋̅1, the Confidence Intervals grow much wider, indicating more error in the forecasts of Y. ̂ Confidence Interval for the Forecast 𝒀 Upper Confidence Interval 𝑌̂ = 𝛽̂0 + 𝛽̂1 ∙ 𝑋1 Y 𝑌̂ + (tcritical,α/2·s.e.( 𝑌̂)) 𝑌̂ Lower Confidence Interval 𝑌̂ - (tcritical,α/2·s.e.( 𝑌̂)) 𝑋̅1 X1 2 UNC-Wilmington Department of Economics and Finance ECN 377 Dr. Chris Dumas Prediction and Confidence Intervals in SAS In SAS, PROC REG can be used to calculate the predicted values, 𝑌̂, and the confidence intervals for the regression line/curve. Within PROC REG, the “output” statement is used to create a new dataset, and the predicted values and confidence interval values are placed in the new dataset. In the PROC REG statement below, the model command produces a regression of variable y on variable x1, and then and “output” command is used to create a new dataset03. When dataset03 is created, the dataset on which the regression was based (dataset02 in the example below) is automatically copied into dataset03. In addition, “p=yhat” creates a new variable called “yhat” and sets it equal to the predicted values, the 𝑌̂’s, from the regression. This new yhat variable is added to dataset03. Commands “lclm=lower_ci” and “uclm=upper_ci” create new variables “lower_ci” and “upper_ci” and add them to dataset03. These variables contain the upper and lower confidence interval numbers for each point along the regression line. proc reg data=dataset02; model Y =X1; output out=dataset03 p=yhat lclm=lower_ci uclm=upper_ci ; run; Now we can use the newly-created dataset03 together with PROC GPLOT to make a graph of our data points, the regression line/curve, and the 95% Confidence Intervals. The SAS commands are show below, along with an illustration of the resulting graph. proc gplot data=dataset03; plot Y*X1 yhat*X1 upper_ci*X1 lower_ci*X1 / overlay; run; Data Points, Regression Line, and Confidence Intervals Y Upper Confidence Interval Regression Line Lower Confidence Interval X1 3

Regression Analysis--Prediction/Forecasting

Related documents

Products

Support

Regression Analysis--Prediction/Forecasting

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib