Estimation of regression function

732G21/732A35/732G28

• 732G21 Sambandsmodeller, http://www.ida.liu.se/~732G21. One semester = regression analysis + analysis of variance (teacher: Lotta Hallberg)
• 732G28 Regression methods, http://www.ida.liu.se/~732G28. Half a semester = regression analysis
• 732A35 Linear statistical models, http://www.ida.liu.se/~732A22. Almost one semester = regression analysis + analysis of variance (teacher: Lotta Hallberg)

• Course language: English, but you may use Swedish
• We use It's learning (accessed via Student portal) (show…)
• 9 lectures
• 8 labs (computer). Deadlines are around 5 days after each lab ends
• 8 lessons = I solve problems on the whiteboard + lab discussion
• One written final exam
• Course book: Kutner, M.H., Nachtsheim, C.J., Neter, J. and Li, W. Applied Linear Statistical Models with Student Data CD, 5th Edition, ISBN 0073108742.

Linear statistical models are widely used in
◦ Business
◦ Economics
◦ Engineering
◦ Social and biological sciences
◦ Etc.

Example: A database contains the prices of houses sold in Linköping in 2009, together with their age, size and other parameters.
◦ Given the parameters of a new house, determine its approximate market price
◦ Determine reasonable price bounds

Analysis of databases

No   Area (X1)   Age (X2)   Price (Y)
1    320         14         2,530,000
2    210         1          1,800,000
…    …           …          …

• Observations (records, cases) in rows
• Variables in columns
◦ Explanatory variables (predictors, inputs) $X_i$
◦ Response $Y$; we assume $Y = f(X_1, \ldots, X_n)$

In this lecture: models with only one explanatory variable.

Real data can seldom be described exactly by $Y = \beta X$ (observation errors, missing inputs, etc.).

Example: Age and salary for a sample of eight persons from a company.
Age   Salary
21    17
32    30
40    27
56    35
61    44
55    38
39    36
33    25

[Figure: scatterplot of Salary (y) against Age (x)]

• The relation shown is almost linear

Linear regression analysis: find a linear function that is as close as possible to the data.

[Figure: scatterplot of Salary (y) against Age (x) with fitted line y = 0.5471x + 8.4545]

• For each X, there is a probability distribution $P(Y = y \mid X = x)$ of Y
• The aim is to find the regression function $E(Y \mid X = x)$

Construction of regression models
• Selection of prediction variables (variance reduction)
• Functional form (from theory, approximation)
• Domain of the model

Software
• MINITAB
• SAS
• SPSS
• Matlab
• Excel

Formal statement

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

• $Y_i$ is the i-th response value
• $\beta_0$, $\beta_1$ are the model parameters, or regression parameters (intercept, slope)
• $X_i$ is the i-th predictor value
• $\varepsilon_i$ are i.i.d. random variables with expectation zero and variance $\sigma^2$

Features (show…)
• $E(Y_i) = \beta_0 + \beta_1 X_i$
• $\sigma^2(Y_i) = \sigma^2$
• All $Y_i$ and $Y_j$, $i \ne j$, are uncorrelated

Meaning of regression parameters
• $\beta_0$: mean response at X = 0
• $\beta_1$: change in $E(Y)$ per unit increase in X

Given a data set $S = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$

Method of least squares:
• Observed response: $Y_i$
• Estimated response: $\beta_0 + \beta_1 X_i$
• Deviation: $Y_i - \beta_0 - \beta_1 X_i$
• The regression fit is good when all deviations are small (see picture) -> minimize the sum of squares

$$Q = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$$

• How to find the minimum of Q?

$$\frac{\partial Q}{\partial \beta_0} = 0, \qquad \frac{\partial Q}{\partial \beta_1} = 0$$

Estimators of $\beta_0$ and $\beta_1$:

$$b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad b_0 = \bar{Y} - b_1 \bar{X}$$

Exercise (for the salary data, in MINITAB):
1. Make a scatterplot (Scatterplot…, with and without the regression line)
2. Perform regression using "Regression…"
3. Perform regression using "Fitted line plot.."
4. Calculate the coefficients by hand

[Figure: fitted line plot of Salary (y) against Age (x), y = 0.5471x + 8.4545]

Gauss-Markov theorem
• The estimators $b_0$ and $b_1$ are unbiased and have minimum variance among all unbiased linear estimators
• Unbiased: bias $= E(b_0) - \beta_0 = 0$, so $E(b_0) = \beta_0$
• Analogously, $E(b_1) = \beta_1$

Show illustration…

• Mean (expected) response: $E(Y) = \beta_0 + \beta_1 X$
• Point estimator of the mean response (fitted value): $\hat{Y} = b_0 + b_1 X$
• Residuals: $e_i = Y_i - \hat{Y}_i$

Plot of residuals (obtain it with MINITAB)

[Figure: residuals plotted against Age]

Properties of residuals
1. $\sum_{i=1}^{n} e_i = 0$ (because $\partial Q / \partial \beta_0 = 0$)
2. $\sum_{i=1}^{n} e_i^2$ is the minimum possible
3. $\sum_{i=1}^{n} Y_i = \sum_{i=1}^{n} \hat{Y}_i$ (because of 1)
4. $\sum_{i=1}^{n} X_i e_i = 0$
5. $\sum_{i=1}^{n} \hat{Y}_i e_i = 0$ (can be shown)

The regression line always goes through $(\bar{X}, \bar{Y})$.

• Estimate of the variance of a single population (sample variance):

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$

• In regression, we compute $s^2$ using the residuals (look at the residual plot):

$$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2, \qquad s^2 = MSE = \frac{SSE}{n-2}$$

• Why divide by n-2? Because then $E(MSE) = \sigma^2$
• Important: in general, the unbiased estimator is

$$s^2 = MSE = \frac{SSE}{n-d}$$

where d is the number of model parameters (the degrees of freedom used by the model).

Example: Compute the residuals, SSE and MSE, and find them in the MINITAB output.

• Minitab
◦ Graph → Scatterplot
◦ Stat → Regression
◦ Stat → Fitted Line Plot

• Course book, Ch. 1 up to page 27.
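The hand calculation in the exercise, together with the residual properties and the MSE formula, can be checked with a short script. The course itself uses MINITAB, so the Python sketch below is only an illustration. The names `ages`, `salaries` and `fit_simple_regression` are made up for this example, and the data pairing (ages 21, 32, ... with salaries 17, 30, ... in the order listed) is an assumption about how the table on the slide should be read. That reading does reproduce the fitted line y = 0.5471x + 8.4545 shown on the slides.

```python
# Salary example from the slides (eight persons).
# Assumption: the i-th age belongs to the i-th salary, in the order listed.
ages = [21, 32, 40, 56, 61, 55, 39, 33]        # X: age
salaries = [17, 30, 27, 35, 44, 38, 36, 25]    # Y: salary


def fit_simple_regression(x, y):
    """Return the least-squares estimates (b0, b1) for Y = b0 + b1 * X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b1 = sum((Xi - Xbar) * (Yi - Ybar)) / sum((Xi - Xbar)^2)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    # b0 = Ybar - b1 * Xbar, so the line passes through (Xbar, Ybar)
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_simple_regression(ages, salaries)

# Fitted values and residuals e_i = Y_i - Yhat_i
fitted = [b0 + b1 * xi for xi in ages]
residuals = [yi - fi for yi, fi in zip(salaries, fitted)]

# SSE and MSE; two parameters (b0, b1) were estimated, so divide by n - 2
sse = sum(e ** 2 for e in residuals)
mse = sse / (len(ages) - 2)

# Residual properties 1 and 4: both sums should be zero up to rounding error
sum_e = sum(residuals)
sum_xe = sum(xi * e for xi, e in zip(ages, residuals))

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")   # b0 = 8.4545, b1 = 0.5471
print(f"sum(e_i) = {sum_e:.2e}, sum(X_i * e_i) = {sum_xe:.2e}")
print(f"SSE = {sse:.3f}, MSE = {mse:.3f}")
```

The script uses only the formulas for b1, b0, the residuals, SSE and MSE given on the slides, so its output can be compared line by line with the MINITAB regression output.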
