Prediction variance in Linear Regression

• Assumptions on the noise in linear regression allow us to estimate the prediction variance due to the noise at any point.
• Prediction variance is usually large when you are far from a data point.
• We distinguish between interpolation, when we are inside the convex hull of the data points, and extrapolation, when we are outside it.
• Extrapolation is associated with larger errors, and in high dimensions it usually cannot be avoided.

Linear Regression

• The surrogate is a linear combination of $n_b$ given shape functions: $\hat{y} = \sum_{i=1}^{n_b} b_i \xi_i(x)$
• For a linear approximation, $\xi_1 = 1$ and $\xi_2 = x$.
• The difference (error) between the $n_y$ data points and the surrogate is $e_i = y_i - \sum_{j=1}^{n_b} b_j \xi_j(x_i)$, or in matrix form $e = y - Xb$.
• Minimize the square error $e^T e = (y - Xb)^T (y - Xb)$.
• Differentiate to obtain the normal equations $X^T X b = X^T y$.

Model based error for linear regression

• The common assumptions for linear regression:
– The true function is described by the functional form of the surrogate.
– The data is contaminated with normally distributed error with the same standard deviation at every point.
– The errors at different points are not correlated.
• Under these assumptions, the noise standard deviation (called the standard error) is estimated as $\hat{\sigma} = \sqrt{e^T e / (n_y - n_b)}$.
• $\hat{\sigma}$ is used as an estimate of the prediction error.

Prediction variance

• Linear regression model: $\hat{y} = \sum_{i=1}^{n_b} b_i \xi_i(x)$.
• Define $x_i^{(m)} = \xi_i(x)$; then $\hat{y} = x^{(m)T} b$.
• With some algebra, $\mathrm{Var}[\hat{y}(x)] = \sigma^2\, x^{(m)T} (X^T X)^{-1} x^{(m)}$.
• Standard error: $s_{\hat{y}} = \hat{\sigma} \sqrt{x^{(m)T} (X^T X)^{-1} x^{(m)}}$.

Interpolation, extrapolation and regression

• Interpolation is often contrasted with regression or least-squares fitting.
• Just as important is the contrast between interpolation and extrapolation.
• Extrapolation occurs when we are outside the convex hull of the data points, that is, when $x$ cannot be written as $x = \sum_{i=1}^{n} \alpha_i x_i$ with $\sum_{i=1}^{n} \alpha_i = 1$ and $\alpha_i \ge 0$.
• For high dimensional spaces we must have extrapolation!

2D example of convex hull

• By generating 20 points at random in the unit square we end up with a substantial region near the origin where we will need to use extrapolation.
• Exercise: using the data in the notes, give a couple of alternative sets of $\alpha_i$, approximately, for the point (0.4, 0.4).

[Figure: 20 random points in the unit square and their convex hull.]

Example of prediction variance

• For a linear polynomial response surface $y = b_1 + b_2 x_1 + b_3 x_2$, find the prediction variance in the region $-1 \le x_1 \le 1$, $-1 \le x_2 \le 1$.
• (a) For data at three vertices (omitting (1,1)): $x_1^T = (-1,-1)$, $x_2^T = (-1,1)$, $x_3^T = (1,-1)$, so that

$X = \begin{pmatrix} 1 & -1 & -1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \end{pmatrix}, \qquad X^T X = \begin{pmatrix} 3 & -1 & -1 \\ -1 & 3 & -1 \\ -1 & -1 & 3 \end{pmatrix}$

• With $x^{(m)T} = (1, x_1, x_2)$ and $(X^T X)^{-1} = 0.25 \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}$,

$s_{\hat{y}} = \hat{\sigma} \sqrt{x^{(m)T} (X^T X)^{-1} x^{(m)}} = \hat{\sigma} \sqrt{0.5\,(1 + x_1 + x_2 + x_1^2 + x_2^2 + x_1 x_2)}$

[Figure: the square domain with the three data points at (-1,-1), (-1,1), and (1,-1).]

Interpolation vs. Extrapolation

• At the origin $s_{\hat{y}} = \hat{\sigma}/\sqrt{2}$; at the 3 data vertices $s_{\hat{y}} = \hat{\sigma}$; at (1,1), outside the convex hull, $s_{\hat{y}} = \sqrt{3}\,\hat{\sigma}$, using

$s_{\hat{y}} = \hat{\sigma} \sqrt{0.5\,(1 + x_1 + x_2 + x_1^2 + x_2^2 + x_1 x_2)}$

Standard error contours

• The minimum error $s_{\hat{y}}$ is obtained by setting to zero the derivatives of the prediction variance with respect to $x_1$ and $x_2$: the minimum is at $x_1 = x_2 = -1/3$, where $s_{\hat{y}} = \hat{\sigma}/\sqrt{3}$.
• What is special about this point?
• Contours of the prediction variance provide more detail.

[Figure: contours of $s_{\hat{y}}/\hat{\sigma}$ for the three-point design, rising from about 0.6 near (-1/3, -1/3) to 1.6 toward the (1,1) corner.]
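These values are easy to check numerically. Below is a minimal sketch (assuming NumPy is available; the function name std_err_ratio is ours, not from the notes) that evaluates $s_{\hat{y}}/\hat{\sigma} = \sqrt{x^{(m)T}(X^T X)^{-1} x^{(m)}}$ for the three-vertex design at the points discussed above.

```python
import numpy as np

# Design matrix for the linear fit y = b1 + b2*x1 + b3*x2
# with data at three vertices of the square (omitting (1,1)).
X = np.array([[1.0, -1.0, -1.0],
              [1.0, -1.0,  1.0],
              [1.0,  1.0, -1.0]])
XtX_inv = np.linalg.inv(X.T @ X)

def std_err_ratio(x1, x2):
    """Return s_yhat / sigma_hat at the point (x1, x2)."""
    xm = np.array([1.0, x1, x2])   # shape-function vector x^(m)
    return np.sqrt(xm @ XtX_inv @ xm)

print(std_err_ratio(0.0, 0.0))     # origin: 1/sqrt(2) ~ 0.707
print(std_err_ratio(-1.0, -1.0))   # a data vertex: 1.0
print(std_err_ratio(1.0, 1.0))     # extrapolation corner: sqrt(3) ~ 1.732
print(std_err_ratio(-1/3, -1/3))   # minimum: 1/sqrt(3) ~ 0.577
```

Evaluating std_err_ratio on a grid and contouring it reproduces the plot sketched above.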
Data at four vertices

• Now $x_1^T = (-1,-1)$, $x_2^T = (-1,1)$, $x_3^T = (1,-1)$, $x_4^T = (1,1)$, so that

$X = \begin{pmatrix} 1 & -1 & -1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \\ 1 & 1 & 1 \end{pmatrix}, \qquad X^T X = 4 \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$

• And $x^{(m)T} (X^T X)^{-1} x^{(m)} = 0.25\,(1 + x_1^2 + x_2^2)$.
• Error at the vertices: $s_{\hat{y}} = \frac{\sqrt{3}}{2}\hat{\sigma}$.
• At the origin the minimum is $s_{\hat{y}} = \frac{1}{2}\hat{\sigma}$.
• How can we reduce the error without adding points?

Graphical Comparison of Standard Errors

[Figure: contours of $s_{\hat{y}}/\hat{\sigma}$ for the three-point design (left, contour levels 0.6 to 1.6) and the four-point design (right, contour levels 0.55 to 0.8).]

Homework

• Redo the four-point example when the data points are not at the corners but inside the domain, at ±0.8. What does the difference in the results tell you?
• For a grid of 3×3 data points, compare the standard errors for linear and quadratic fits.
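As a starting point for the homework (a sketch under our own naming, not part of the assignment), the same computation generalizes to any set of data points and shape functions. The version below reproduces the four-vertex values, $\sqrt{3}/2 \approx 0.866$ at the corners and $1/2$ at the origin.

```python
import numpy as np

def design_matrix(points, shapes):
    """Rows are the shape functions evaluated at each data point."""
    return np.array([[f(p) for f in shapes] for p in points])

def std_err_ratio(x, points, shapes):
    """s_yhat / sigma_hat at x for a least-squares fit to the given points."""
    X = design_matrix(points, shapes)
    xm = np.array([f(x) for f in shapes])
    return np.sqrt(xm @ np.linalg.solve(X.T @ X, xm))

# Linear shape functions: 1, x1, x2.
linear = [lambda p: 1.0, lambda p: p[0], lambda p: p[1]]
corners = [(-1, -1), (-1, 1), (1, -1), (1, 1)]

print(std_err_ratio((1, 1), corners, linear))   # sqrt(3)/2 ~ 0.866
print(std_err_ratio((0, 0), corners, linear))   # 1/2

# For the homework, replace `corners` with points at +-0.8, or with a
# 3x3 grid, and extend `linear` with the quadratic terms x1**2, x1*x2, x2**2.
```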