Simple linear regression

Yi = β0 + β1 xi + εi,  i = 1, ..., n        E(Y|X) = β0 + β1 x

• Systematic component: β0 + β1 xi
• Stochastic component: error term εi

The central parameter is the slope parameter β1:
• If X rises by one unit, then Y rises by β1 units on average.

Interpretation of the parameter σ: standard deviation of the error term; a measure of the average deviation of the Y-values from the regression line.

R²: Coefficient of determination

R² = SSM/SST = 1 − SSE/SST

Correlation coefficient

Covariance: Cov(x, y) = Sxy = (1/n) Σ_{i=1}^{n} (xi − x̄)(yi − ȳ)

Correlation = standardized covariance:

ρ_{x,y} = Cov(x, y) / √(Var(x) · Var(y))

• ρ = 1: exact linear relationship with positive slope
• ρ = 0: no linear relationship
• ρ = −1: exact linear relationship with negative slope

Correlation is a symmetric measure of linear relationship.

Least-squares estimation, example: Deep Drill Data

[Figure: scatter plot of logarithmic cataclasite concentration against carbon concentration, with fitted regression line]

Residual standard error: 0.6917 on 70 degrees of freedom
Multiple R-squared: 0.6562,  Adjusted R-squared: 0.6513
F-statistic: 133.6 on 1 and 70 DF
Confidence interval for βC: [9.828611, 13.9272200]

The multiple linear regression model

Yi = β0 + β1 xi1 + β2 xi2 + ... + βp xip + εi = xi'β + εi,  i = 1, ..., n,  with xi' = (1, xi1, ..., xip)

In matrix form: Y = Xβ + ε with Y = (Y1, ..., Yn)', β = (β0, ..., βp)', ε = (ε1, ..., εn)' and the n × (p+1) design matrix X whose i-th row is (1, xi1, ..., xip).

Model assumptions
• E(εi) = 0 for all i, i.e. E(ε) = 0
• V(εi) = σ² and ε1, ..., εn independent, i.e. V(ε) = σ²I
• εi ~ N(0, σ²), i.e. ε ~ N(0, σ²I)

X: constant design matrix (matrix of the regressors)
β: vector of regression parameters
Y: random vector of the dependent variable
ε: vector of error terms

Least-squares estimator

Data: Y = (Y1, ..., Yn)', design matrix X

β̂ = argmin_β (Y − Xβ)'(Y − Xβ) = argmin_β Σ_{i=1}^{n} (Yi − xi'β)²,  residuals: ε̂i = Yi − xi'β̂

For X'X invertible, β̂ exists, is unique and β̂ = (X'X)⁻¹ X'Y (a small R sketch of this formula follows below).

Example: Deep Drill Data, model with the regressors C and S:

            Estimate  Std. Error  t value  Pr(>|t|)
Intercept    -1.0463      0.1710   -6.119  5.04e-08 ***
C             6.3290      1.0104    6.264  2.79e-08 ***
S             2.2250      0.2752    8.086  1.39e-11 ***

Interpretation: Multiple regression

E(Y|X) = β0 + β1 x1 + ... + βp xp

Increase x1 by one unit (x1 → x1 + 1) while x2, ..., xp are held fixed:

E(Y | x1 + 1, x2, ..., xp) − E(Y | x1, x2, ..., xp) = β1

→ effect of x1 when the other regressors are held fixed
→ the adjusted effect of x1 on Y

Testing in the multiple linear regression

Y = β0 + β1 x1 + ... + βp xp + ε,  H0: βp = 0

Basic idea: compare the model fit under H0 with the unrestricted model.
Here: fit the restricted model Y = β0 + ... + βp−1 xp−1 + ε → SSE1 (residual sum of squares); fit the unrestricted model → SSE. Then look at

F = (SSE1 − SSE) / (SSE / (n − (p+1))) ~ F_{1, n−(p+1)} under H0

(F-distribution; an R sketch of this model comparison with anova() follows below).

Example: Deep Drill Data
Residual standard error: 0.4992 on 69 degrees of freedom
Multiple R-squared: 0.8235

General linear hypothesis

H0: Aβ = c with rank(A) = a

F = ((SSE_{H0} − SSE)/a) / (SSE / (n − (p+1))) ~ F_{a, n−(p+1)} under H0

SSE_{H0}: residual sum of squares in the restricted model
SSE: residual sum of squares in the complete model

Example: Deep Drill Data
Restricted model: LNCATR = β0 + β1 C
Complete model: LNCATR = β0 + β1 C + β2 S

F = ((SSE_{H0} − SSE)/1) / (SSE/69) = 65.37686 ≥ 4 ≈ critical value of the F_{1,69} distribution → reject H0

Dummy-Variables (1)

Y = β0 + β1 X1 + ε,  X1 ∈ {0, 1} an indicator for a group, e.g. rain / no rain

E(Y | X1 = 0) = β0
E(Y | X1 = 1) = β0 + β1

β1: mean difference between the two groups ⇒ regression = t-test (checked in the last R sketch below).
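The closed-form estimator can be checked numerically. A minimal R sketch (referenced in the least-squares section above), using simulated data rather than the Deep Drill data; all variable names and coefficients are illustrative:

```r
## Least-squares estimator by hand vs. lm() -- simulated data, not the Deep Drill data
set.seed(1)
n  <- 50
x1 <- runif(n)
x2 <- runif(n)
y  <- -1 + 6 * x1 + 2 * x2 + rnorm(n, sd = 0.5)

X        <- cbind(1, x1, x2)                # design matrix with intercept column
beta_hat <- solve(t(X) %*% X, t(X) %*% y)   # beta_hat = (X'X)^{-1} X'Y
beta_hat
coef(lm(y ~ x1 + x2))                       # lm() returns the same estimates
```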
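The model-comparison F-test from the testing section above can be computed with anova(); a sketch that continues with the simulated y, x1, x2 from the previous block:

```r
## F-test: restricted model (without x2) vs. complete model
fit_restricted <- lm(y ~ x1)        # -> SSE_H0
fit_complete   <- lm(y ~ x1 + x2)   # -> SSE
anova(fit_restricted, fit_complete) # F = ((SSE_H0 - SSE)/1) / (SSE/(n - (p+1)))
```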
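And a check of the "regression = t-test" claim for a single 0/1 dummy; the rain indicator and all numbers are invented for illustration:

```r
## A dummy-variable regression reproduces the equal-variance two-sample t-test
set.seed(2)
rain <- rep(c(0, 1), each = 25)         # hypothetical indicator: no rain / rain
z    <- 3 + 1.5 * rain + rnorm(50)
summary(lm(z ~ rain))$coefficients      # slope = mean difference, with its t value
t.test(z ~ rain, var.equal = TRUE)      # same t statistic (up to sign) and p-value
```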
Dummy-Variables (2)

X is discrete and has more than 2 levels, e.g. 4 groups: single (1), married (2), widowed (3), divorced (4).

Define an indicator for each level:

X1 = 1 if single, 0 if not single
X2 = 1 if married, 0 if not married
(and X3, X4 analogously)

Y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4 + ε

Since X1 + X2 + X3 + X4 = 1 ⇒ multicollinearity → leave out X4.

Dummy-Variables (3)

Y = β0 + β1 X1 + β2 X2 + β3 X3 + ε

Then:
E(Y | X = 1) = β0 + β1 = μ1
E(Y | X = 2) = β0 + β2 = μ2
E(Y | X = 3) = β0 + β3 = μ3
E(Y | X = 4) = β0 = μ4

Category 4 is called the reference category.
β1: difference between category 1 and category 4
β2: difference between category 2 and category 4

Test: μ1 = μ2 = μ3 = μ4 ⇔ β1 = β2 = β3 = 0

Combine Dummy Variables and continuous X (1)

Y = β0 + β1 X1 + β2 X2 + ε,  X1 ∈ {0, 1} a dummy variable, X2 continuous

E(Y | X1 = 0, X2) = β0 + β2 X2
E(Y | X1 = 1, X2) = β0 + β1 + β2 X2

⇒ Two parallel regression lines (analysis of covariance)

[Figure: scatter plot of Y against X2 with two parallel fitted lines, one per group]

Combine Dummy Variables and continuous X (2)

Y = β0 + β1 X1 + β2 X2 + β3 X1 X2 + ε,  X1 X2: interaction term

E(Y | X1 = 0, X2) = β0 + β2 X2
E(Y | X1 = 1, X2) = β0 + β1 + (β2 + β3) X2

⇒ Two different regression lines (model with interaction; an R sketch of both models follows at the end of these notes)

[Figure: scatter plot of Y against X2 with two fitted lines of different slope, one per group]

Combine Dummy Variables and continuous X (3)

Tests:
• β3 = 0 ↔ Is there a different slope?
• β2 = β3 = 0 ↔ Any effect of X2 on Y?
• β1 = β3 = 0 ↔ Any effect of X1 on Y?
• β1 = β2 = β3 = 0 ↔ Any effect of X1 and X2 on Y?
→ all of these are general linear hypotheses

Continuous Variables (1)

Use transformations of X and Y, e.g.

ln Y = β0 + β1 X + ε  or  ∛Y = β0 + β1 X + ε

Example: volume of eggs (length L, diameter d)

V = β0 + β1 L + β2 d + ε ???

Better: ln V = β0 + β1 ln(L) + β2 ln(d) + ε ⇔ V = exp(β0) · L^{β1} · d^{β2} · exp(ε),
with β1 ≈ 1 and β2 ≈ 2.

Transformation

Transformation of X → different interpretation of that effect; comparison (R²) is possible.
Transformation of Y → different interpretation of all effects → the transformation also applies to the error term → a different model, e.g.

ln Y = β0 + β1 X + ε ⇔ Y = exp(β0 + β1 X) · exp(ε)

⇒ multiplicative error structure

Continuous Variables (2)

Polynomial regression: Y = β0 + β1 X + β2 X² + β3 X³ + ... + ε

Note: the relationship between X and Y is non-linear, but the model is linear in β.

In general: Y = β0 + β1 f1(X) + β2 f2(X) + β3 f3(X) + ... + ε, where f1, ..., fp are fixed, known functions.

Example: regression spline (sketched in R at the end of these notes) with knots X1, X2 and (·)₊ the positive part:

f1(X) = X,  f2(X) = X²,  f3(X) = (X − X1)²₊,  f4(X) = (X − X2)²₊

Multiple Regression

• Rich tool with many possible models
• Comparison of models by SSE
• F-tests of linear hypotheses
• Variable selection strategies
• Model check by residuals
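A sketch, referenced in the dummy-plus-continuous sections above, of the parallel-lines (ANCOVA) model versus the model with interaction; the data are simulated and all coefficients arbitrary:

```r
## Parallel lines (ANCOVA) vs. interaction model -- simulated data
set.seed(3)
n  <- 80
x1 <- rep(c(0, 1), each = n / 2)   # dummy variable
x2 <- runif(n)                     # continuous regressor
y  <- 1 + 0.8 * x1 + 2 * x2 + 1.2 * x1 * x2 + rnorm(n, sd = 0.3)

fit_parallel <- lm(y ~ x1 + x2)    # two parallel regression lines
fit_interact <- lm(y ~ x1 * x2)    # adds the interaction term x1:x2
anova(fit_parallel, fit_interact)  # F-test of beta_3 = 0: different slopes?
```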
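Finally, the regression spline from "Continuous Variables (2)": truncated power functions entered as ordinary regressors, so the model stays linear in β. The knot positions and the simulated data are illustrative:

```r
## Quadratic regression spline via the truncated power basis (x - k)^2_+
set.seed(4)
x  <- sort(runif(100))
y  <- sin(2 * pi * x) + rnorm(100, sd = 0.2)
k1 <- 0.33                                  # two illustrative knots
k2 <- 0.66
pos2 <- function(x, k) pmax(x - k, 0)^2     # f(x) = (x - k)^2_+
fit <- lm(y ~ x + I(x^2) + pos2(x, k1) + pos2(x, k2))
summary(fit)                                # non-linear in x, linear in the betas
```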