Inferences about slope

732G21/732A35/732G28

Formal statement

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

• $Y_i$ is the i-th response value
• $\beta_0$, $\beta_1$ are the model parameters, or regression parameters (intercept, slope)
• $X_i$ is the i-th predictor value
• $\varepsilon_i$ are i.i.d. normally distributed random variables with expectation zero and variance $\sigma^2$

Inference about regression coefficients and response:
• Interval estimates and tests concerning coefficients
• Confidence interval for Y
• Prediction interval for Y
• ANOVA table

After fitting the data, we may obtain a regression line such as $y = 1.5 + 0.00005x$.
• Is 0.00005 significant, or is it just due to random variation (hence, no linear dependence between Y and X)?
How to do?
◦ Use hypothesis testing (later)
◦ Derive a confidence interval for $\beta_1$. If 0 does not fall within this interval, there is dependence.

Estimated slope $b_1$ is a random variable (look at the formula):

$$b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$

Properties of $b_1$
• Normally distributed (show)
• $E(b_1) = \beta_1$
• Variance: $\sigma^2\{b_1\} = \dfrac{\sigma^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$

Further: the test statistic $\dfrac{b_1 - \beta_1}{s\{b_1\}}$ is distributed as $t(n-2)$.

• See table B.2 (p. 1317)
Example: one-sided interval at 95%, 15 observations: $t(0.95; 13) = 1.771$

Confidence interval for $\beta_1$ (show...):

$$b_1 \pm t(1 - \alpha/2;\, n-2)\, s\{b_1\}$$

• If the variance in the data is unknown, estimate it: $s^2\{b_1\} = \dfrac{s^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$, where $s^2 = MSE = SSE/(n-2)$.

Example: Compute the confidence interval for the slope, Salary dataset.

[Figure: scatter plot of Salary (y) against Age (x) with fitted line y = 0.5471x + 8.4545]

Often, we have a sample and we test at some significance level $\alpha$:

$H_0: \lambda = \lambda_0$, $H_a: \lambda \neq \lambda_0$, or
$H_0: \lambda \le \lambda_0$, $H_a: \lambda > \lambda_0$, or
$H_0: \lambda \ge \lambda_0$, $H_a: \lambda < \lambda_0$

How to do?
Step 1: Find and compute an appropriate test function $T = T(\text{sample}, \lambda_0)$.
Step 2: Plot the test function's distribution and mark a critical area that depends on $\alpha$.
Step 3: If T is in the critical area, reject $H_0$ (accept $H_a$); otherwise do not reject $H_0$.

Test $H_0: \beta_1 = 0$ against $H_a: \beta_1 \neq 0$
Step 1: Compute $t^* = \dfrac{b_1}{s\{b_1\}}$.
Step 2: Plot the distribution, mark the points $\pm t(1 - \alpha/2;\, n-2)$ and the critical area.
Step 3: Locate $t^*$ and reject $H_0$ if it is in the critical area.

Example: Test the hypothesis for the Salary dataset:
• Manually; compute P-values as well
• By Minitab

Sometimes we need to know whether $\beta_0 = 0$. Do confidence intervals and hypothesis testing in the same way, using the formulas below.

$$b_0 = \bar{Y} - b_1 \bar{X}$$

Properties of $b_0$
• Normally distributed (show)
• $E(b_0) = \beta_0$
• Variance (show): $\sigma^2\{b_0\} = \sigma^2 \left[ \dfrac{1}{n} + \dfrac{\bar{X}^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right]$

Further: the test statistic $\dfrac{b_0 - \beta_0}{s\{b_0\}}$ is distributed as $t(n-2)$.

• If the error distribution is not normal: slight departures are OK; otherwise the results hold only asymptotically.
• Spacing of the X values affects the variance (larger spacing gives smaller variance).
Example: Test $\beta_0 = 0$ for the Salary data.

Estimate at $X = X_h$ ($X_h$ can be any value):

$$\hat{Y}_h = b_0 + b_1 X_h$$

Properties of $\hat{Y}_h$
• Normally distributed (show)
• $E(\hat{Y}_h) = E(Y_h)$
• Variance: $\sigma^2\{\hat{Y}_h\} = \sigma^2 \left[ \dfrac{1}{n} + \dfrac{(X_h - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right]$

Further: the test statistic $\dfrac{\hat{Y}_h - E(Y_h)}{s\{\hat{Y}_h\}}$ is distributed as $t(n-2)$.

Confidence interval:

$$\hat{Y}_h \pm t(1 - \alpha/2;\, n-2)\, s\{\hat{Y}_h\}$$

• Make a plot...

CONFIDENCE INTERVAL: we estimate the position of the mean in the population with $X = X_h$.
POINT ESTIMATE
PREDICTION INTERVAL: we estimate the position of an individual observation in the population with $X = X_h$.

• When the parameters are unknown, the mean $E(Y_h)$ may have more than one possible location.
• New observation = mean + random error -> the prediction interval should be wider.

Further: the test statistic $\dfrac{Y_{h(new)} - \hat{Y}_h}{s\{pred\}}$ is distributed as $t(n-2)$.

Prediction interval:

$$\hat{Y}_h \pm t(1 - \alpha/2;\, n-2)\, s\{pred\}$$

How to estimate $s\{pred\}$? A new observation is any value of $b_0 + b_1 X_h + \varepsilon$. Hence

$$\sigma^2\{pred\} = \sigma^2\{b_0 + b_1 X_h + \varepsilon\} = \sigma^2\{b_0 + b_1 X_h\} + \sigma^2\{\varepsilon\} = \sigma^2\{\hat{Y}_h\} + \sigma^2$$

Standard error (show):

$$s^2\{pred\} = MSE \left[ 1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right]$$

Example
• Calculate confidence and prediction intervals for a 35-year-old person
• Compare with the output in Minitab

• Total sum of squares: $SSTO = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$
• Error sum of squares: $SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
• Regression sum of squares: $SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$

$$SSTO = SSR + SSE$$

Degrees of freedom
• SSTO has n - 1 (the deviations sum up to zero)
• SSE has n - 2 (2 model parameters)
• SSR has 1 (fitted values lie on the regression line: 2 degrees; the deviations sum up to zero: minus 1 degree)

n - 1 = (n - 2) + 1
SSTO = SSE + SSR

Important: MSxx = SSxx / degrees_of_freedom

ANOVA table

Source of variation   SS                                      df      MS
Regression            $SSR = \sum (\hat{Y}_i - \bar{Y})^2$    1       $MSR = SSR/1$
Error                 $SSE = \sum (Y_i - \hat{Y}_i)^2$        n - 2   $MSE = SSE/(n-2)$
Total                 $SSTO = \sum (Y_i - \bar{Y})^2$         n - 1

Expected mean squares

$$E\{MSE\} = \sigma^2$$
$$E\{MSR\} = \sigma^2 + \beta_1^2 \sum_{i=1}^{n} (X_i - \bar{X})^2$$

E(MSE) does not depend on the slope, even when the slope is zero.
E(MSR) = E(MSE) when the slope is zero.
-> If MSR is much larger than MSE, the slope is not zero; if they are approximately the same, the slope can be zero.

$H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$
• Test statistic: $F^* = MSR/MSE$, use $F(1, n-2)$ (see p. 1320)
Decision rules:
• If $F^* > F(1 - \alpha;\, 1, n-2)$, conclude $H_a$
• If $F^* \le F(1 - \alpha;\, 1, n-2)$, conclude $H_0$
Note: the F test and the t test about $\beta_1$ are equivalent.

General approach

$H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$

• Full model (linear):

$$SSE(F) = \sum_{i=1}^{n} \left( Y_i - (b_0 + b_1 X_i) \right)^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = SSE$$

• Reduced model (constant):

$$SSE(R) = \sum_{i=1}^{n} (Y_i - b_0)^2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = SSTO$$

It is known (why?) that SSE(F) ≤ SSE(R). A large difference suggests different models; a small difference means the models can be the same.

• Test statistic:

$$F^* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \div \frac{SSE(F)}{df_F}$$

• For the univariate linear model, this is equivalent to $F^* = MSR/MSE$.
• $F^*$ follows the $F(df_R - df_F,\, df_F)$ distribution (plot the critical area...).
• Test rule: if $F^* > F(1 - \alpha;\, df_R - df_F,\, df_F)$, reject $H_0$.

Example: for the Salary dataset
• Compose the ANOVA table and compare with MINITAB
• Perform the F test and compare with MINITAB

• Coefficient of determination: $R^2 = \dfrac{SSR}{SSTO}$
• Coefficient of correlation: $r = \pm\sqrt{R^2}$ (the sign is that of the slope $b_1$)

Limitations:
• A high $R^2$ does not mean a good fit
• A low $R^2$ does not mean that X and Y are not related

Example: For the Salary dataset, compute $R^2$ and compare with MINITAB.

Reading: Chapter 2 up to page 78
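The slope test, the confidence interval for the slope, and the confidence and prediction intervals at X = Xh can be tried out numerically. What follows is a minimal sketch in plain Python, not the course's Minitab procedure. The data are hypothetical (they are NOT the Salary dataset), the function names `fit_line`, `slope_test_and_ci` and `intervals_at` are made up for illustration, and the critical value t(0.975; 3) = 3.182 is taken from a t table instead of being computed.

```python
import math

def fit_line(x, y):
    """Least-squares estimates b0, b1 plus the quantities needed for inference."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)  # estimates sigma^2 with n - 2 degrees of freedom
    return {"n": n, "x_bar": x_bar, "sxx": sxx, "b0": b0, "b1": b1, "mse": mse}

def slope_test_and_ci(fit, t_crit):
    """Return s{b1}, t* = b1 / s{b1}, and the interval b1 +/- t_crit * s{b1}."""
    s_b1 = math.sqrt(fit["mse"] / fit["sxx"])
    t_star = fit["b1"] / s_b1
    ci = (fit["b1"] - t_crit * s_b1, fit["b1"] + t_crit * s_b1)
    return s_b1, t_star, ci

def intervals_at(fit, x_h, t_crit):
    """Confidence interval for E(Y_h) and prediction interval for a new Y at x_h."""
    y_hat = fit["b0"] + fit["b1"] * x_h
    core = 1 / fit["n"] + (x_h - fit["x_bar"]) ** 2 / fit["sxx"]
    s_mean = math.sqrt(fit["mse"] * core)        # s{Y_hat_h}
    s_pred = math.sqrt(fit["mse"] * (1 + core))  # s{pred}
    ci = (y_hat - t_crit * s_mean, y_hat + t_crit * s_mean)
    pi = (y_hat - t_crit * s_pred, y_hat + t_crit * s_pred)
    return y_hat, ci, pi

# Hypothetical data, chosen only for illustration (not the Salary dataset).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
T_CRIT = 3.182  # t(0.975; 3) from a t table: alpha = 0.05, n - 2 = 3

fit = fit_line(x, y)
s_b1, t_star, ci_b1 = slope_test_and_ci(fit, T_CRIT)
y_hat, ci_mean, pi_new = intervals_at(fit, 3.5, T_CRIT)

print("b1 =", round(fit["b1"], 4), " t* =", round(t_star, 2))
print("reject H0: beta1 = 0?", abs(t_star) > T_CRIT)
print("CI for beta1:", ci_b1)
print("CI for E(Y_h):", ci_mean, " PI for new Y:", pi_new)
```

The critical value is passed in as an argument so that the sketch needs only the standard library. The prediction interval always comes out wider than the confidence interval because of the extra "1 +" term in s{pred}.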

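The ANOVA decomposition, the F test and R² can also be checked numerically. The sketch below is illustrative only: the data are hypothetical (NOT the Salary dataset), the function name `anova` is made up, and the critical value F(0.95; 1, 3) = 10.13 is read from an F table. It also computes t* so that the equivalence of the F test and the t test about the slope, F* = (t*)², can be seen directly.

```python
import math

def anova(x, y):
    """Sums of squares, mean squares, F*, R^2 and t* for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]

    ssto = sum((yi - y_bar) ** 2 for yi in y)               # total, df = n - 1
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error, df = n - 2
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # regression, df = 1

    msr = ssr / 1
    mse = sse / (n - 2)
    return {
        "ssr": ssr, "sse": sse, "ssto": ssto,
        "msr": msr, "mse": mse,
        "f_star": msr / mse,
        "r2": ssr / ssto,
        "t_star": b1 / math.sqrt(mse / sxx),
    }

# Hypothetical data, chosen only for illustration (not the Salary dataset).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
F_CRIT = 10.13  # F(0.95; 1, 3) from an F table: alpha = 0.05, df = (1, n - 2)

table = anova(x, y)
print("SSR, SSE, SSTO:", table["ssr"], table["sse"], table["ssto"])
print("F* =", table["f_star"], " conclude Ha?", table["f_star"] > F_CRIT)
print("R^2 =", table["r2"])
print("F* equals (t*)^2?", math.isclose(table["f_star"], table["t_star"] ** 2))
```

For this univariate case the general full-versus-reduced statistic reduces to the same number, since SSE(R) = SSTO, SSE(F) = SSE, and df_R - df_F = 1.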