Summary: Basic Concepts of Linear Regression Analysis (one independent variable) Regression analysis is a statistical technique for modeling and investigating the relationship between 2 or more variables. For an established relationship, it is used for prediction of the dependent variable for a given independent variable. Model for two variables: (y/x) = 0 +1X + This notation means is a random variable - called random error. (It is the vertical deviation of a response from the fitted line.) For any value of Xi, is normally distributed. For any value of Xi, the mean and variance are =E()=0 and Var()=2. Random errors are mutually independent. E(Y/X) = E(0 +1X +) Y = 0 +1X+ 0, 1, and unknown model parameters ^ Y =b0 + b1X 1. 2. where b0 and b1 estimate 0 and1 in the model Construct a scatter plot of the sample data and consider the relationship. Is it linear? Quadratic? Other? Check the assumptions of the model. 1) There is no error in the X-values. They are set prior to the experiment. The X variable is not random. This is established by the experimental design. 2) At each X, there is a normal distribution of Y-values that are independent of each other. The variance 2 of the normal distribution at each X-value is the same. (sLF estimates the standard deviation.) This assumption is referred to as homoscedasticity. To evaluate check for normal (random) scatter – plot the residuals vs. X (or plot residuals vs. the fits). The plot should reflect a random scatter of points about 0 on the vertical axis 3) The random error terms e are independent and, for any value of X, have a normal distribution with mean 0 and variance 2. To check, examine a normal probability plot. ^ 3. Write the equation for Y using the least square method. 4. Examine R2 and sLF. What do they tell you about the relationship? R2 is the coefficient of determination. It is the percent of raw variation in Y accounted for by using the fitted equation. 6360reg 1 sLF estimates the common standard deviation in Y for a fixed X. sLF 2 = 1 ^ n2 5. (Y Y ) 2 = 1 e2 n2 Test the slope of the line to see if there is a significant relationship between the two variables. Test the following hypothesis. Ho: 1 = 0 Ha: 1 0 b1 has a normal distribution with b1 = 1 and 2 Var (b1) = (df = n-2) ( X X ) Use t = 2 b1 1 s LF ( X X ) 2 Failure to reject H0 means no linear relationship between X and Y. 6. Establish a confidence interval estimate of . s LF b1 + t * ( X X ) 2 7. Establish a confidence interval estimate of a predicted Y value. A confidence interval estimate is an estimate of the mean Y for a fixed value of X. (df = n-2) ^ y ts LF 8. 1 (X X )2 n ( X X ) 2 Construct a prediction interval for Y. A prediction interval predicts an additional single observation of Y for a particular (fixed) value of X. (df = n-2) ^ y ts LF 1 1 (X X )2 n ( X X ) 2 Predict values of Y for the given X. Be careful not to extrapolate too much from the given data. 6360reg 2 ANOVA Approach The ANOVA approach is an additional method that is used to examine the model. SST (Y Y ) SSE (Y Y ) SSR (Y Y ) ^ ^ 2 Sum Squares Total - Variation of the values around their mean 2 Sum Squares Error – Residuals – Unexplained Variation – Variation of the value from the predicted value (for a fixed X) – Random variation, variation that can be attributed to factors other than the relationship 2 Sum Squares Regression – Explained Variation – Variation of the predicted values from the mean – Variation than can be attributed to the relationship between X and Y SST = SSR + SSE R2 = F= SSR SST Mean Square Re gression Explained Variation MSR Mean Square Error Un exp lained Variation MSE F is the ratio of explained variation to unexplained variation. If more variation is explained, F>1. Use the F table to check significance. 6360reg 3 The F test is used to test the following hypothesis. Ho: 1 = 0 Ha: 1 0 F= SSR / 1 s 2 LFs = MSR SSR / 1 = MSE SSE /( n 2) The information is generally organized in a table as follows. Souace Regression Error Total SS SSR SSE SST df 1 n-2 n-1 MS SSR/1 SSE/(n-2) F MSR/MSE See an Excel Example: http://www.uh.edu/~tech132/6303lst.xls 6360reg 4 (X,Y) ^ ^ Y Y =SSE (X, Y ) Y Y Y SST ^ Y Y = SSR Y Xi 6360reg 5