Regression Concept & Examples, Latent Variables, & Partial Least Squares (PLS)

Simple Regression Model
• Make a prediction about the starting salary of a current college graduate
• Data set of starting salaries of recent college graduates
• Compute the average salary and use it as the prediction
• How certain are we of this prediction? There is variability in the data.

Simple Regression Model
• Use the total variation as an index of uncertainty about our prediction
• Compute the total variation
• The smaller the total variation, the more accurate (certain) our prediction will be.

Simple Regression Model
• How can we "explain" the variability? Perhaps it depends on the student's GPA.
[Figure: scatter plot of Salary vs GPA]

Simple Regression Model
• Find a linear relationship between GPA and starting salary
• As GPA increases (decreases), starting salary increases (decreases)

Simple Regression Model
• Use the least squares method to find the regression model
– Choose a and b in the regression model (equation) so that they minimize the sum of the squared deviations
– Each deviation is the actual Y value minus the predicted Y value (Y-hat)

Simple Regression Model
• How good is the model?
• A computer program computed a = 4,779 and b = 5,370
• u-hat is a "residual" value: the actual Y value minus the predicted Y value
• The sum of all u-hats is zero
• The sum of all squared u-hats is the total variation not explained by the model
• This "unexplained variation" is 7,425,926

Simple Regression Model
Total Variation = 23,000,000

Simple Regression Model
Total Unexplained Variation = 7,425,726

Simple Regression Model
• Relative goodness of fit
– Summarize the improvement in prediction from using the regression model
• Compute R², the coefficient of determination
• The regression model (equation) is a better predictor than guessing the average salary
• GPA is a more accurate predictor of starting salary than guessing the average
• R² is the "performance measure" for the model.
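Written out in conventional notation, the quantities above are the standard least-squares formulas. This is a sketch, assuming X is GPA and Y is starting salary; a, b, Y-hat, and u-hat are named as on the slides, and the R² value is only computed from the two totals reported above.

```latex
\begin{aligned}
\bar{Y} &= \frac{1}{n}\sum_{i=1}^{n} Y_i, \qquad
\text{Total Variation} = \sum_{i=1}^{n} \bigl(Y_i - \bar{Y}\bigr)^2 \\
\hat{Y}_i &= a + b X_i, \qquad
(a, b) = \arg\min_{a,\,b} \sum_{i=1}^{n} \bigl(Y_i - \hat{Y}_i\bigr)^2 \\
b &= \frac{\sum_{i}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i}(X_i - \bar{X})^2}, \qquad
a = \bar{Y} - b\,\bar{X} \\
\hat{u}_i &= Y_i - \hat{Y}_i, \qquad \sum_{i} \hat{u}_i = 0, \qquad
\text{Unexplained Variation} = \sum_{i} \hat{u}_i^{2} \\
R^2 &= 1 - \frac{\text{Unexplained Variation}}{\text{Total Variation}}
\approx 1 - \frac{7{,}425{,}926}{23{,}000{,}000} \approx 0.677
\end{aligned}
```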
Predicted Starting Salary = 4,779 + 5,370 × GPA

Detailed Regression Example

Data Set
Obs #   Salary   GPA   Months Work
1       20000    2.8   48
2       24500    3.4   24
3       23000    3.2   24
4       25000    3.8   24
5       20000    3.2   48
6       22500    3.4   36
7       27500    4.0   20
8       19000    2.6   48
9       24000    3.2   36
10      28500    3.8   12

Scatter Plot: GPA vs Salary
[Figure: GPA (vertical axis, 0 to 4.5) against Salary (horizontal axis, 0 to 30,000)]

Scatter Plot: Work vs Salary
[Figure: Months Work (vertical axis, 0 to 60) against Salary (horizontal axis, 0 to 30,000)]

Pearson Correlation Coefficients (−1 ≤ r ≤ 1)
               Salary      GPA         Months Work
Salary         1
GPA            0.898007    1
Months Work    -0.93927    -0.82993    1

Three Regressions
• Salary = f(GPA)
• Salary = f(Work)
• Salary = f(GPA, Work)
• Interpret the Excel output

Interpreting Results
• Regression Statistics
– Multiple R
– R²
– R²adj
– Standard Error Sy
• Statistical Significance
– t-test
– p-value
– F test

Regression Statistics Table
• Multiple R
– R = square root of R²
• R²
– Coefficient of determination
• R²adj
– Adjusted R², which corrects R² for the number of x variables; use it when there is more than one x variable
• Standard Error Sy
– The sample estimate of the standard deviation of the error (actual minus predicted)

ANOVA Table
• Table 1 gives the F statistic
• The F test checks the claim that there is no significant relationship between your independent variables (taken together) and your dependent variable
• The Significance F value is a p-value
• Reject the claim of NO significant relationship between your independent and dependent variables if p < α
– Generally α = 0.05

Regression Coefficients Table
• The Coefficients column gives the values b0, b1, b2, …, bn for the regression equation
– b0 is the intercept
– b1 is next to your independent variable x1
– b2 is next to your independent variable x2
– b3 is next to your independent variable x3

Regression Coefficients Table
• p-values for the individual t tests of each independent variable
• t test: tests the claim that there is no relationship between the independent variable (in the corresponding row) and your dependent variable.
• Reject the claim of NO significant relationship between the independent variable (in the corresponding row) and the dependent variable if p < α.

Salary = f(GPA)

Regression Statistics
Multiple R           0.898006642
R Square             0.806415929
Adjusted R Square    0.78221792
Standard Error       1479.019946
Observations         10

ANOVA
             df   SS          MS          F          Significance F
Regression   1    72900000    72900000    33.32571   0.00041792
Residual     8    17500000    2187500
Total        9    90400000

             Coefficients   Standard Error   t Stat     P-value    Lower 95%      Upper 95%
Intercept    1928.571429    3748.677         0.514467   0.620833   -6715.89326    10573.04
GPA          6428.571429    1113.589         5.772843   0.000418   3860.63173     8996.511

Salary = f(Work)

Regression Statistics
Multiple R           0.939265177
R Square             0.882219073
Adjusted R Square    0.867496457
Standard Error       1153.657002
Observations         10

ANOVA
             df   SS            MS         F          Significance F
Regression   1    79752604.17   79752604   59.92271   5.52993E-05
Residual     8    10647395.83   1330924
Total        9    90400000

              Coefficients   Standard Error   t Stat     P-value    Lower 95%      Upper 95%
Intercept     30691.66667    1010.136344      30.38369   1.49E-09   28362.28808    33021.0453
Months Work   -227.864583    29.43615619      -7.74098   5.53E-05   -295.7444812   -159.98469

Salary = f(GPA, Work)

Regression Statistics
Multiple R           0.962978985
R Square             0.927328525
Adjusted R Square    0.906565246
Standard Error       968.7621974
Observations         10

ANOVA
             df   SS         MS         F          Significance F
Regression   2    83830499   41915249   44.66195   0.00010346
Residual     7    6569501    938500.2
Total        9    90400000

              Coefficients   Standard Error   t Stat     P-value    Lower 95%      Upper 95%
Intercept     19135.92896    5608.184         3.412144   0.011255   5874.682112    32397.176
GPA           2725.409836    1307.468         2.084495   0.075582   -366.2602983   5817.08
Months Work   -151.2124317   44.30826         -3.41274   0.011246   -255.9848174   -46.440046

Compare Three "Models"
Regression Statistics   f(GPA)        f(Work)       f(GPA,Work)
Multiple R              0.898006642   0.939265177   0.962978985
R Square                0.806415929   0.882219073   0.927328525
Adjusted R Square       0.78221792    0.867496457   0.906565246
Standard Error          1479.019946   1153.657002   968.7621974
Observations            10            10            10

Latent Variables (Theoretical Entities)

Latent Variables
• Latent variables
– Explanatory variables that are not directly measured
– Identified by "Exploratory Factor Analysis"
– Confirmed by "Confirmatory Factor Analysis"
• Statistical methods for latent variables
– Principal Components Analysis
– PLS (Partial Least Squares)
– SEM (Structural Equation Modeling)

Example: Confirmatory Factor Analysis
Intention to Use Travelocity Website

Research Instrument
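As a cross-check on the Excel output for the detailed example, the sketch below refits the correlation matrix and the three regressions (Salary = f(GPA), f(Work), f(GPA, Work)) from the ten-observation data set in plain Python. It is a minimal sketch, assuming ordinary least squares with an intercept solved through the normal equations; the helper names `pearson_r`, `invert`, and `ols` are hypothetical, not part of Excel or any library. P-values are omitted because they need a t or F distribution function that the standard library does not provide.

```python
from math import sqrt

# Ten-observation data set from the detailed regression example.
salary = [20000, 24500, 23000, 25000, 20000, 22500, 27500, 19000, 24000, 28500]
gpa = [2.8, 3.4, 3.2, 3.8, 3.2, 3.4, 4.0, 2.6, 3.2, 3.8]
work = [48, 24, 24, 24, 48, 36, 20, 48, 36, 12]  # months of work experience


def pearson_r(x, y):
    """Pearson correlation coefficient, always between -1 and 1."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(y, *xs):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk with Excel-style summary statistics."""
    n, k = len(y), len(xs)
    X = [[1.0] + [x[i] for x in xs] for i in range(n)]  # design matrix, intercept column first
    XtX = [[sum(row[p] * row[q] for row in X) for q in range(k + 1)] for p in range(k + 1)]
    Xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k + 1)]
    XtX_inv = invert(XtX)
    b = [sum(XtX_inv[p][q] * Xty[q] for q in range(k + 1)) for p in range(k + 1)]  # normal equations
    y_hat = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained (residual) variation
    ssr = sst - sse                                        # variation explained by the model
    mse = sse / (n - k - 1)                                # residual mean square
    r2 = ssr / sst
    se = [sqrt(mse * XtX_inv[p][p]) for p in range(k + 1)]  # standard errors of the coefficients
    return {
        "b": b,
        "r2": r2,
        "multiple_r": sqrt(r2),
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - k - 1),
        "std_error": sqrt(mse),
        "F": (ssr / k) / mse,
        "t": [bj / sj for bj, sj in zip(b, se)],
    }


print("r(Salary, GPA), r(Salary, Work), r(GPA, Work):",
      [round(pearson_r(u, v), 5) for u, v in [(salary, gpa), (salary, work), (gpa, work)]])
for name, fit in [("f(GPA)", ols(salary, gpa)),
                  ("f(Work)", ols(salary, work)),
                  ("f(GPA,Work)", ols(salary, gpa, work))]:
    print(name, "b =", [round(v, 4) for v in fit["b"]],
          "R2 =", round(fit["r2"], 6), "adj R2 =", round(fit["adj_r2"], 6),
          "Sy =", round(fit["std_error"], 2), "F =", round(fit["F"], 3))
```

The fitted coefficients should agree with the Excel tables, for example b0 ≈ 1,928.57 and b1 ≈ 6,428.57 for f(GPA). Solving the normal equations with an explicit inverse is adequate for a 3 × 3 system like this one; for larger or ill-conditioned problems a QR-based solver is the usual choice.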