Types of regression models Regression Models Simple 1° order Multiple 1° order 2° order Higher order Interaction 2° order Higher order A quadratic second order model E(Y)=β0+ β1x+ β2 x2 • Interpretation of model parameters: • β0: y-intercept. The value of E(Y) when x1 = x2 = 0 • β1 : is the shift parameter; • β2 : is the rate of curvature; Example with quadratic terms The true model, supposedly unknown, is Yi 100 = .00 2 + xi2 + εi, with εi~N(0,2) 75. 00 y 50. 00 25. 00 0. 00 2. 00 Data: (x,y). See SQM.sav 4. 00 6. 00 x 8. 00 10. 00 Model 1: E(Y) = β0 + β1x Model Summary Model 1 R R Square a ,973 Adjusted R Square ,947 ,947 a. Predictors: (Constant), x Model 1 df Mean Square 80624,915 1 80624,915 4500,202 103 43,691 85125,117 104 Residual Total a. Predictors: (Constant), x Coefficients b. Dependent Variable: y Unstandardized Coefficients Model 1 B (Constant) x a. Dependent Variable: y 6,60994 ANOVA b Sum of Squares Regression Std. Error of the Estimate F Sig. ,000a 1845,332 a Standardized Coefficients Std. Error -19,959 1,483 10,744 ,250 Beta t ,973 Sig. -13,454 ,000 42,957 ,000 Linear Regression 100 .00 y = -19.96 + 10.74 * x R-Square = 0.95 75. 00 y 50. 00 25. 00 0. 00 2. 00 4. 00 6. 00 x 8. 00 10. 00 Model 2: E(Y) = β0 + β1x2 Model Summary Model 1 R Adjusted R Square R Square ,996a ,991 ,991 a. Predictors: (Constant), XSquare Model 1 Residual Total Smaller variance and SE 2,68707 ANOVA b Sum of Squares Regression Std. Error of the Estimate df Mean Square 84381,422 1 84381,422 743,695 103 7,220 85125,117 104 F 11686,632 Sig. ,000a a. Predictors: (Constant), XSquare b. Dependent Variable: y Coefficients Unstandardized Coefficients Model 1 B (Constant) XSquare a. Dependent Variable: y a Standardized Coefficients Std. Error 2,340 ,417 ,997 ,009 Beta t ,996 Sig. 5,608 ,000 108,105 ,000 Linear Regression 100 .00 y = 2.34 + 1.00 * XSquar e R-Square = 0.99 75. 00 y 50. 00 25. 00 0. 00 0. 00 25. 00 50. 00 XSquare 75. 00 100 .00 Model 3: E(Y) = β0 + β1x + β2x2 Model Summary Model 1 R R Square a .996 .991 a. Predictors: (Constant), XSquare, x Model 1 Sum of Squares Regression Std. Error of the Estimate .991 2.66608 ANOVA b df Mean Square F 84400.103 2 42200.052 725.014 102 7.108 85125.117 104 Residual Total Adjusted R Square Sig. .000a 5936.999 a. Predictors: (Constant), XSquare, x b. Dependent Variable: y Coefficients Unstandardized Coefficients Model 1 Standardized Coefficients B 4.177 Std. Error 1.206 x -.830 .512 XSquare 1.071 .046 (Constant) a. Dependent Variable: y a Beta t 3.463 Sig. .001 -.075 -1.621 .108 1.069 23.046 .000 Types of regression models Regression Models Simple 1° order Multiple 1° order 2° order Higher order Interaction 2° order Higher order A third order model with 1 IV E(Y)=β0+ β1x+ β2 x2+ β3 x3 Use with caution given numerical problems that could arise Y >0 3 Y X1 <0 3 X1 Types of regression models Regression Models Simple 1° order Multiple 1° order 2° order Higher order Interaction 2° order Higher order First-Order model in k Quantitative variables E(Y)=β0+β1x1+β2 x2 + ... + βk xk Interpretation of model parameters: β0: y-intercept. The value of E(Y) when x1 = x2 =...= xk= 0 β1: change in E(Y) for a 1-unit increase in x1 when x2,.., xk are held fixed; β2: change in E(Y) for a 1-unit increase in x2 when x1, x3,..., xk are held fixed; ... A bivariate model E(Y)=β0+β1x1+β2 x2 Changing x2 changes only the y-intercept. In the first order model a 1-unit change in one independent variable will have the same effect on the mean value of y regardless of the other independent variables. A bivariate model Y Response Plane X1 Yi = 0 + 1X1i + 2X2i + i (Observed Y) 0 i X2 (X1i,X2i) E(Y) = 0 + 1X1i + 2X2i Example: executive salaries • • • • • • Y = Annual salary (in dollars) x1 = Years of experience x2 = Years of education x3 = Gender : 1 if male; 0 if female x4 = Number of employees supervised x5 = Corporate assets (in millions of dollars) E(Y)=β0+ β1x1+ β2 x2 + β4 x4 + β5 x5 Data: ExecSal.sav Do not consider x3 (Gender) for the moment Exsecutive salaries: Computer Output Riepilogo del modello Modello R-quadrato R R-quadrato corretto ,870a ,757 ,747 Deviazione standard Errore della stima 12685,309 a. Predittori: (Costante), Corporate assets (in million $), Years of Experience, Years of Education, Number of Employees supervised Simple regression Multiple regression Riepilogo del modello Modello R R-quadrato R-quadrato corretto Deviazione standard Errore della stima 1 dimension0 ,783a ,613 . Predittori: (Costante), Years of Experience a ,609 15760,006 Coefficient of determination The coefficient R2 is computed exactly as in the simple regression case. R 2 Explained SST (Total) SSR SST n n ( yi y ) i 1 Total variation n variation 2 2 ( yˆ i y ) i 1 SSR (Regression) 1 SSE SST ( y i yˆ i ) 2 i 1 SSE (Error) A drawback of R2: it increases with the number of added variables, even if these are NOT relevant to the problem. Adjusted R2 and estimate of the variance σ2 A solution: Adjusted R2 – Each additional variable reduces adjusted R2, unless SSE varies enough to compensate 2 Ra n 1 SSE SSE 2 1 1 R n k 1 SST SST An unbiased estimator of the variance σ 2 is computed as s 2 SSE n k 1 2 ˆi n k 1 Exsecutive salaries: Computer Output (2) Coefficientia Model Coefficienti non standardizzati Variables 1 B (Costante) Years of Experience Years of Education Number of Employees supervised Corporate assets (in million $) Deviazione standard Errore Coefficienti standardizz ati Beta T-tests t Sig. -37082,148 17052,089 -2,175 ,032 2696,360 173,647 ,785 15,528 ,000 2656,017 563,476 ,243 4,714 ,000 41,092 7,807 ,272 5,264 ,000 244,569 83,420 ,149 2,932 ,004 Variabile dipendente: Annual salary in $ Testing overall significance: the F-test • 1. Shows If There Is a Linear Relationship Between All X Variables Together & Y • 2. Uses F Test Statistic • 3. Hypotheses – H0: 1 = 2 = ... = k = 0 • No Linear Relationship – Ha: At Least One Coefficient Is Not 0 • At Least One X Variable Affects Y The F-test for 1 single coefficient is equivalent to the t-test Anova table F-statistic Anovab Modello 1 Somma dei quadrati Media dei quadrati df Regressione 4,766E10 Residuo 1,529E10 95 Totale 6,295E10 99 F 4 1,192E10 74,045 Sig. ,000a 1,609E8 . Predittori: (Costante), Corporate assets (in million $), Years of Experience, Years of Education, Number of Employees supervised a df = k: number of b. Variabile dipendente: Annual salary in $ regression slopes p-vale of F-test df = n-1: n= number of observations MSE (mean square error), the estimate of variance Decision: reject H0, i.e. accept this model Interaction (second order) model E(Y)=β0+ β1x1+ β2 x2 + β3 x1x2 • Interpretation of model parameters: • β0: y-intercept. The value of E(Y) when x1 = x2 = 0 • β1+ β3 x2 : change in E(Y) for a 1-unit increase in x1 when x2 is held fixed; • β2 + β3 x1 : change in E(Y) for a 1-unit increase in x2 when x1 is held fixed; • β3: controls the rate of change of the surface. Interaction (second order) model E(Y)=β0+ β1x1+ β2 x2 + β3 x1x2 Contour lines are not parallel The effect of one variable depends on the level of the other Example: Antique grandfather clocks auction Clocks are sold at an auction on competitive offers. Data are: – Y : auction price in dollars – X1: age of clocks – X2: number of bidders Model 1: E(Y) = β0 + β1x1 + β2x2 Model 2: E(Y) = β0 + β1x1 + β2x2 + β3x1x2 Data: GFCLOCKS.sav Data summaries Descriptive Statistics Age 32 108 194 144.94 Std. Deviatio Statistic n 27.395 Bidders 32 5 15 9.53 2.840 .420 .414 -.788 .809 Price 32 729 2131 1326.88 393.487 .396 .414 -.727 .809 Valid N (lis twise) 32 N Statistic Minimu m Statistic Maximu m Statistic Mean Statistic Skewnes s Statistic If data are Normal Skewness is 0 If data are Normal (eccess) Kurtosis is 0 Note: Skewness and Kurtosis are not enough to establish Normality Kurtosis Std. Error Statistic Std. Error .216 .414 -1.323 .809 P-P plot for Normality If data are Normal. Points should be along the straight line. In this example the situation is fairly good Bivariate scatter-plots 2000 2000 P r ice 1200 1200 800 1600 P rice 1600 800 120 140 160 Age 180 6 8 10 Bidders 12 14 Model 1: E(Y) = β0 + β1x1 + β2x2 Model Summary Model 1 R Adjusted R Square R Square a .945 .892 .885 a. Predictors: (Constant), Bidders, Age Model 1 Std. Error of the Estimate ANOVA b Sum of Squares Regression df Mean Square 4283062.960 2 2141531.480 516726.540 29 17818.157 4799789.500 31 Residual Total 133.485 a. Predictors: (Constant), Bidders, Age Coefficients b. Dependent Variable: Price Unstandardized Coefficients Model 1 Std. Error 173.809 Age 12.741 .905 Bidders 85.953 8.729 a. Dependent Variable: Price 120.188 Sig. .000a a Standardized Coefficients B -1338.951 (Constant) F Beta t -7.704 Sig. .000 .887 14.082 .000 .620 9.847 .000 Model 2: E(Y) = β0 + β1x1 + β2x2 + β3x1x2 Model Summary Model 1 R R Square a .977 Adjusted R Square .954 Std. Error of the Estimate .949 88.915 a. Predictors: (Constant), AgeBid, Age, Bidders ANOVA b Model 1 Sum of Squares Regression Residual Total df Mean Square 4578427.367 3 1526142.456 221362.133 28 7905.790 4799789.500 31 F Sig. 193.041 .000a a. Predictors: (Constant), AgeBid, Age, Bidders Coefficients a b. Dependent Variable: Price Unstandardized Coefficients Model 1 B (Constant) Standardized Coefficients Std. Error 320.458 295.141 .878 2.032 Bidders -93.265 AgeBid 1.298 Age a. Dependent Variable: Price Beta t Sig. 1.086 .287 .061 .432 .669 29.892 -.673 -3.120 .004 .212 1.369 6.112 .000 Interpreting interaction models The coefficient for the interaction term is significant. If an interaction term is present then also the corresponding first order terms need to be included to correctly interpret the model. In the example an uncareful analyst could estimate the effect of Bidders as negative, since b2=-93.26 Since an interaction term is present, the slope estimate for Bidders (x2) is b2 + b3x1 Note: b = ^β For x1= 150 (age) the estimated slope for Bidders is -93.26 + 1.3 (150) = 101.74 Models with qualitative X’s Regression models can also include qualitative (or categorical) independent variables (QIV). The categories of a QIV are called levels Since the levels of a QIV are not measured on a natural numerical scale in order to avoid introducing fictitious linear relations in the model we need to use a specific type of coding. Coding is done by using IV which assume only two values: 0 or 1. These coded IV are called dummy variables Models with QIV • Suppose we want to model Income (Y) as a function of Sex (x) -> use coded, or dummy, variables x = 1 if Male, x = 0 if Female E(Y) = β0+ β1x E(Y) = β0+ β1 if x =1, i.e. Male E(Y) = β0 if x =0, i.e. Female β0 is the base level, i.e Female is the reference category β1 is the additional effect if Male In this simple model, only the means for the two groups are modeled QIV with q levels As a general rule, if a QIV has q levels we need q-1 dummies for coding. The uncoded level is the reference one. Example: a QIV has three levels, A, B and C Define x1 = 1 level A, x1 = 0 if not x2 = 1 level B, x2 = 0 if not Model: E(Y) = β0+ β1x1 + β2x2 C is the reference level Interpreting β’s β0 = μC (mean for base level C) β1 = μA - μC (additional effect wrt C if level A) β2 = μB - μC (additional effect wrt C if level B) Models with dummies Even if models which consider only dummy variables do in practice estimate the means of various groups, the testing machinery of the regression setup can be useful for group comparisons. Dummies can be used in combination with any other dummies and quantitative X’s to construct models with first order effects (or main effects) and interactions to test hypotheses of interest. In order to define dummies in SPSS see “Computing dummy vars in SPSS.ppt” Example: executive salaries A managing consulting firms has developed a regression model in order to analyze executive’s salary structure • • • • • • Y x1 x2 x3 x4 x5 = Annual salary (in dollars) = Years of experience = Years of education = Gender : 1 if male; 0 if female = Number of employees supervised = Corporate assets (in millions of dollars) Data: ExecSal.sav A simple model: E(Y) = β0 + β3x3 Male group Female group This model estimates the means of the two groups (M,F) We wanto to test if the difference in means is significant, i.e. not due to chance Regression Output Model Summary Model 1 R Adjusted R Square R Square .392a .153 .145 a. Predictors: (Constant), Gender Model 1 23320.282 ANOVA b Sum of Squares Regression Salary difference between groups is significant Std. Error of the Estimate df Mean Square 9651865066.845 1 9651865066.845 Residual 53295882433.156 98 543835535.032 Total 62947747500.001 99 a. Predictors: (Constant), Gender b. Dependent Variable: Annual s alary in $ Unstandardized Coefficients Model 1 B Std. Error (Constant) 83847.059 3999.395 Gender 20739.305 4922.915 Coefficients F 17.748 Sig. .000a a Standardized Coefficients 95% Confidence Interval for B Beta t .392 Sig. Lower Bound Upper Bound 20.965 .000 75910.389 91783.729 4.213 .000 10969.940 30508.670 a. Dependent Variable: Annual s alary in $ Mean increment for Male C.I. for mean increment Model 2: E(Y) = β0 + β1x1 + β3x3 It seems that the two groups are separated Model 2 considers same slope but different intercepts If x3 = 0 (female) then E(Y) = β0 + β1x1 If x3 = 1 (male) then E(Y) = β0 + β3 + β1x1 Computer output for model 2 R square improved greatly Model Summary Model 1 R Adjusted R Square R Square a .860 .740 Std. Error of the Estimate .735 12981.615 a. Predictors: (Constant), Years of Experience, Gender ANOVA b Model 1 Sum of Squares df Mean Square Regression 46601081714.527 2 23300540857.264 Residual 16346665785.474 97 168522327.685 Total 62947747500.001 99 a. Predictors: (Constant), Years of Experience, Gender Coefficients F Sig. 138.264 .000a a b. Dependent Variable: Annual s alary in $ Unstandardized Coefficients Model 1 (Constant) B 50614.312 Std. Error 3161.279 Gender 18894.215 2743.253 2633.831 177.875 Years of Experience Standardized Coefficients Beta 95% Confidence Interval for B t 16.011 Sig. .000 Lower Bound 44340.048 Upper Bound 56888.576 .357 6.888 .000 13449.618 24338.812 .767 14.807 .000 2280.799 2986.863 a. Dependent Variable: Annual s alary in $ New intercept for Male is significant In this model effect of experience is assumed equal for the two groups Model 3: E(Y) = β0 + β1x1 + β3x3 + β4x1x3 With this model we want to test whether gender and experience interacts, i.e. if male salary tend to grow at a faster (slower) rate with experience. If x3 = 0 (female) then E(Y) = β0 + β1x1 If x3 = 1 (male) then E(Y) = (β0 + β3) + (β1 + β4)x1 New intercept for male New slope for male Remark: running regression for the two groups together allows to have higher degrees of freedom (n) for estimating parameters and model variance. Model 3: E(Y) = β0 + β1x1 + β3x3 + β4x1x3 Model 3 considers different slope and different intercepts Computer output for model 3 There is evidence that salaries for the two groups grow at different rate with experience Model Summary Model 1 R Adjusted R Square R Square .868a .754 .746 a. Predictors: (Constant), ExpGender, Years of Experience, Gender Unstandardized Coefficients Model 1 B (Constant) Std. Error 58049.768 4461.179 Gender 7798.504 5497.470 Years of Experience 2044.541 864.122 ExpGender Std. Error of the Estimate 12700.080 Coefficients a Standardized Coefficients Beta 95% Confidence Interval for B t Sig. Lower Bound Upper Bound 13.012 .000 49194.397 66905.139 .147 1.419 .159 -3113.888 18710.896 308.565 .595 6.626 .000 1432.045 2657.036 373.653 .301 2.313 .023 122.426 1605.818 a. Dependent Variable: Annual s alary in $ Estimated lines: ^ = 58049.8 + 2044.5*(Years of Experience) for female Y ^ = 65848.3 + 2908.7*(Years of Experience) for male Y A complete second order model E(Y)=β0+ β1x1+ β2 x2 + β3 x1x2+ β4x12+ β5 x22 • Interpretation of model parameters: • • • • β0: y-intercept. The value of E(Y) when x1 = x2 = 0 β1 and β2 : shifts along the x1 and x2 axes; β3 : rotation of the surface; β4 and β5 : controls the rate of curvature. Back to Executive salaries What about if suspect that rate of growth changes and has opposite signs for M and F? x1 = Years of experience x3 = Gender (1 if Male) Note: x32 = x3 since it is a dummy E(Y)=β0+ β1x1+ β2 x3 + β3 x1x3+ β4x12 E(Y)=β0+ β1x1+ β2 x3 + β3 x1x3+ β4x12+ β5 x3x12 Model 4 Model 5 Comparing Model 4 and 5 Model 4 If x3 = 0 (female) then E(Y) = β0 + β1x1 + β4x12 If x3 = 1 (male) then E(Y) = (β0 + β2) + (β1 + β3)x1 + β4x12 Model 5 Different intercept and slope for M and F but same curvature If x3 = 0 (female) then E(Y) = β0 + β1x1 + β4x12 If x3 = 1 (male) then E(Y) = (β0 + β2) + (β1 + β3)x1 + (β4+β5)x12 Different intercept, slope and curvature for M and F Model 5: computer output Riepilogo del modello Modello R dimension0 ,875a 1 R-quadrato corretto R-quadrato ,766 Deviazione standard Errore della stima ,754 12507,735 a. Predittori: (Costante), Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen Anovab Modello 1 Somma dei quadrati Media dei quadrati df Regressione 4,824E10 5 Residuo 1,471E10 94 Totale 6,295E10 99 a. Predittori: (Costante), Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen b. Variabile dipendente: Annual salary in $ F 9,648E9 61,673 1,564E8 Sig. ,000a Model 5: computer output Coefficientia Modello 1 Coefficienti non standardizzati B Deviazion e standard Errore Beta t Sig. (Costante) 52391,973 6497,971 8,063 ,000 Years of Experience Gender ExpGen ExpSqu Exp2Gen 3373,970 1165,248 ,982 2,895 ,005 21122,152 8285,802 -2081,897 1459,842 -53,181 45,001 112,836 54,950 ,399 -,724 -,422 ,904 2,549 -1,426 -1,182 2,053 a. Variabile dipendente: Annual salary in $ Which model is preferable? Model 3 or model 5? ,012 ,157 ,240 ,043 A test for comparing nested models Two models are nested if one model contains all the terms of the other model and at least one additional term. The more complex of the two models is called the complete (or full) model. The other is called the reduced (or restricted) model. Example: model 1 is nested in model 2 Model 1: E(Y)=β0+ β1x1+ β2 x2 + β3 x1x2 Model 2: E(Y)=β0+ β1x1+ β2 x2 + β3 x1x2+ β4x12+ β5 x22 To compare the two models we are interested in testing H0: β4 = β5 = 0, vs. H1: at least one, β4 or β5, differs from 0 F-test for comparing nested models Reduced model: E(Y) = β0+ β1x1+ … + β2 xg Complete Model: E(Y) = β0+ β1x1+ … + β2 xg + βg+1 xg+1 + … + βkxk To test H0: βg+1 = … = βk = 0 H1: at least one of the parameters being tested is not 0 Compute F ( SSE R SSE C ) /( k g ) MSE C Reject H0 when F > Fα, where Fα is the level α critical point of an F distribution with (k-g, n-(k+1)) d.f. F-test for nested models Where: SSER = Sum of squared errors for the reduced model; SSEC = Sum of squared errors for the complete model; MSEC = Mean square error for the complete model; Remark: k – g = number of parameters tested k +1 = number of parameters in the complete model n = total sample size Compute partial F-tests with SPSS 1. Enter your complete model in the Regression dialog box – choose the Method “Enter” 2. Click on “Next” 3. In the new box for Independent variables, enter those you want to remove (i.e. those you’d like to test) – choose the Method “Remove” 4. In the “Statistics” option select “R squared change” 5. Ok. Applying the F-test Let us use the F-test to compare Model 3 and Model 5 in the executive salaries example. Note that Model 3 is nested in Model 5 Model 3: E(Y) = β0 + β1x1 + β2x3 + β3x1x3 Model 5: E(Y) = β0 + β1x1 + β2x3 + β3x1x3 + β4x12 + β5x3x12 Apply the F-test for H0: β4 = β5 = 0 Computer output Variabili inserite/rimossec Modello Variabili Variabili inserite rimosse Metodo 1 . Exp2Gen, Per blocchi Gender, Years of Experience, ExpSqu, ExpGena 2 .a Exp2Gen, Rimuovi ExpSqub a. Tutte le variabili richieste sono state immesse. Do NOT reject H0: β4 = β5 = 0, i.e. Model 3 is better F-statistic F p-value b. Tutte le variabili richieste sono state rimosse. c. Variabile dipendente: Annual salary in $ Riepilogo del modello Variazione dell'adattamento Model R RDeviazione R- quadrat standard Variazione quadr di RVariazio o Errore della stima ato corretto quadrato ne di F df1 1 ,875° ,766 ,754 12507,735 2 ,868b ,754 ,746 12700,080 a. Predittori: (Costante), Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen b. Predittori: (Costante), Gender, Years of Experience, ExpGen ,766 61,673 -,012 2,488 df2 Sig. Variazio ne di F 5 94 ,000 2 94 ,089 A quadratic model example: Shipping costs Although a regional delivery service bases the charge for shipping a package on the package weight and distance shipped, its profit per package depends on the package size (volume of space it occupies) and the size and nature of the delivery truck. The company conducted a study to investigate the relationship between the cost of shipment and the variables that control the shipping charge: weight and distance. – Y : cost of shipment in dollars – X1: package weight in pounds – X2: distance shipped in miles It is suspected that non linear effect may be present Model: E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x12 + β5x22 Data: Express.sav Scatter plots 16. 0 16. 0 12. 0 12. 0 C o st o f sh ip m en t C o st o f sh ip m en t 8. 0 0. 00 2. 00 8. 0 4. 0 4. 0 4. 00 6. 00 Weight of parcel in lbs. 8. 00 50 100 150 200 250 Distance shipped Scatter plots in multiple regression often do not show too much information Model: E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x12 + β5x22 Model Summary Model 1 R R Square a .997 Adjusted R Square .994 Std. Error of the Estimate .992 .4428 a. Predictors: (Constant), Weight*Dis tance, Distance b squared, Weight s quared, Weight of parcel in lbs.,ANOVA Distance shipped Model 1 Sum of Squares Regression Mean Square 449.341 5 89.868 2.745 14 .196 452.086 19 Residual Total df F Sig. .000a 458.388 a. Predictors: (Constant), Weight*Dis tance, Distance squared, Weight squared, Coefficients a Weight of parcel in lbs., Distance shipped b. Dependent Variable: Cost of s hipment Model 1 Unstandardized Coefficients B Standardized Coefficients Std. Error (Constant) .827 .702 Weight of parcel in lbs. -.609 .180 Dis tance s hipped .004 Weight squared Dis tance s quared Beta t Sig. 1.178 .259 -.316 -3.386 .004 .008 .062 .503 .623 .090 .020 .382 4.442 .001 1.51E-005 .000 .075 .672 .513 .007 .001 .850 11.495 .000 Weight*Distance a. Dependent Variable: Cost of shipment Not significant, try to eliminate Distance squared Model: E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x12 Model Summary Model 1 R R Square .997 a Adjusted R Square .994 Std. Error of the Estimate .992 .4346 a. Predictors: (Constant), Weight*Dis tance, Distance b shipped, Weight squared, Weight of parcel in ANOVA lbs. Model 1 Sum of Squares Regression Residual Total df Mean Square 449.252 4 112.313 2.833 15 .189 452.086 19 F Sig. .000a 594.623 a. Predictors: (Constant), Weight*Dis tance, Distance shipped, Weight squared, Coefficients a Weight of parcel in lbs. b. Dependent Variable: Cost of s hipment Unstandardized Standardized Coefficients Coefficients Model 1 B Std. Error (Constant) .475 .458 Weight of parcel in lbs. -.578 .171 Dis tance s hipped .009 Weight squared Weight*Distance a. Dependent Variable: Cost of shipment Beta t Sig. 1.035 .317 -.300 -3.387 .004 .003 .141 3.421 .004 .087 .019 .369 4.485 .000 .007 .001 .842 11.753 .000 Applying the F-test: Shipping costs A company conducted a study to investigate the relationship between the cost of shipment and the variables that control the shipping charge: weight and distance. – Y : cost of shipment in dollars – X1: package weight in pounds – X2: distance shipped in miles It is suspected that non linear effect may be present, use the F-test for nested models to decide between Model 1: E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x12 + β5x22 Model 2: E(Y) = β0 + β1x1 + β2x2 + β3x1x2 Data: Express.sav ANOVA Tables Full model ANOVA b Model 1 Sum of Squares Regression Residual Total df Mean Square 449.341 5 89.868 2.745 14 .196 452.086 19 F Sig. .000a 458.388 a. Predictors: (Constant), Weight*Dis tance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped b. Dependent Variable: Cost of s hipment Reduced model ANOVA b Model 1 Sum of Squares Regression Residual Total df Mean Square 445.452 3 148.484 6.633 16 .415 452.086 19 F 358.154 a. Predictors: (Constant), Distance shipped, Weight of parcel in lbs., Weight*Dis tance b. Dependent Variable: Cost of s hipment Sig. .000a F-statistic To test H0: β4 = β5 = 0, from the ANOVA tables we have F ( SSE R SSE C ) / 2 MSE C ( 6 . 633 2 . 745 ) / 2 9 . 92 0 . 196 The critical value Fα (at 5% level) for and F-distribution with 2 and 14 d.f. is 3.74 Since F (9.92) > Fα (3.74) the null hypothesis is rejected at the 5% significance level. I.e. the model with quadratic terms is preferred over the reduced one. Computer output Variables Entered/Removed Model 1 Variables Entered Variables Removed Weight* Dis tance, Dis tance squared, Weight squared, Weight of parcel in lbs., Dis tance a shipped Method . 2 a Enter F-statistic Dis tance squared, Weight squared . c Remove b F p-value a. All requested variables entered. b. All requested variables removed. Model Summary c. Dependent Variable: Cost of shipment Change Statistics Model 1 2 R R Square Adjusted R Square Std. Error of the Es timate R Square Change F Change df1 df2 Sig. F Change a .994 .992 .4428 .994 458.388 5 14 .000 b .985 .983 .6439 -.009 9.917 2 14 .002 .997 .993 a. Predictors: (Constant), Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped b. Predictors: (Constant), Weight*Distance, Weight of parcel in lbs., Distance shipped Reject H0: β4 = β5 = 0 Executive salaries: a final model (?) • • • • • • Y x1 x2 x3 x4 x5 = Annual salary (in dollars) = Years of experience = Years of education = Gender : 1 if male; 0 if female = Number of employees supervised = Corporate assets (in millions of dollars) Try adding other variables to model 3 E(Y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x3 + β5x4 + β6x5 Model 6 Computer Output: Model 6 Riepilogo del modello Modello R 1 R-quadrato ,963a R-quadrato corretto ,927 ,922 Errore della stima 7020,089 a. Predittori: (Costante), Corporate assets (in million $), Years of Experience, Years of Education, Gender, Number of Employees supervised, ExpGender Anovab Model 1 Somma dei quadrati Regressione Residuo Totale Media dei quadrati df 5,836E10 6 4,583E9 93 6,295E10 99 F Sig. 9,727E9 197,384 ,000a 4,928E7 a. Predittori: (Costante), Corporate assets (in million $), Years of Experience, Years of Education, Gender, Number of Employees supervised, ExpGender Computer Output: Model 6 Coefficients Model Coefficienti non standardizzati 1 B (Costante) Years of Experience Gender ExpGender Years of Education Number of Employees supervised Corporate assets (in million $) a. Variabile dipendente: Annual salary in $ Deviazion e standard Errore Coefficient i standardiz zati Beta -38331,331 9533,238 2178,964 171,979 13203,101 3137,775 669,546 2689,594 53,239 180,310 209,042 311,914 4,470 46,600 ,634 ,249 ,233 ,246 ,353 ,110 t Sig. -4,021 ,000 12,670 ,000 4,208 ,000 3,203 ,002 8,623 ,000 11,910 ,000 3,869 ,000 Executive salaries: comparison of models Mod. Predictors Adj. R2 1 x1, x2, x4, x5 Standard error 0.747 12685.31 2 x1, x3 0.735 12981.62 138.26 3 x1, x3, x1∙x3 0.746 12700.08 98.09 6 x1, x3, x1∙x3, x4, x5 0.922 7020.09 F-stat 74.05 197.38