The LOWESS Smoother

"lowess" stands for LOcally WEighted polynomial regreSSion.

1/19/2011
Copyright © 2011 Dan Nettleton

The original reference for lowess is
Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. JASA 74, 829-836.

How is the lowess curve determined?

Suppose we have data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. Let $0 < f \le 1$ denote a fraction that will determine the smoothness of the curve. Let $r = nf$ rounded to the nearest integer.

Consider the tricube weight function defined as

$$T(t) = \begin{cases} (1 - |t|^3)^3 & \text{for } |t| < 1, \\ 0 & \text{for } |t| \ge 1. \end{cases}$$

[Figure: Tricube Weight Function, T(t) versus t]

For $i = 1, \ldots, n$, let $h_i$ be the $r$th smallest number among $|x_i - x_1|, |x_i - x_2|, \ldots, |x_i - x_n|$. For $k = 1, 2, \ldots, n$, let $w_k(x_i) = T\big((x_k - x_i)/h_i\big)$.

An Example

Suppose a lowess curve will be fit to the following data with $f = 0.4$.

 i   xi   yi
 1    1    1
 2    2    8
 3    5    4
 4    7    5
 5   12    3
 6   13    9
 7   15   16
 8   25   15
 9   27   23
10   30   29

[Figure: scatterplot of y versus x for the example data]

Calculation of $h_i$ from the $|x_i - x_j|$ values

Here $n = 10$ and $f = 0.4$, so $r = 4$. Each $h_i$ is the 4th smallest entry in row $i$ of the table below (the diagonal 0 counts as the smallest).

Table containing the $|x_i - x_j|$ values, with $h_i$ in the last column:

      x1  x2  x3  x4  x5  x6  x7  x8  x9 x10   h_i
x1     0   1   4   6  11  12  14  24  26  29     6
x2     1   0   3   5  10  11  13  23  25  28     5
x3     4   3   0   2   7   8  10  20  22  25     4
x4     6   5   2   0   5   6   8  18  20  23     5
x5    11  10   7   5   0   1   3  13  15  18     5
x6    12  11   8   6   1   0   2  12  14  17     6
x7    14  13  10   8   3   2   0  10  12  15     8
x8    24  23  20  18  13  12  10   0   2   5    10
x9    26  25  22  20  15  14  12   2   0   3    12
x10   29  28  25  23  18  17  15   5   3   0    15

Weights $w_k(x_i)$, rounded to the nearest 0.001 (rows indexed by $k$, columns by $i$; column $i$ holds the weights used for the local regression at $x_i$):

k\i    1      2      3      4      5      6      7      8      9      10
1    1.000  0.976  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
2    0.986  1.000  0.193  0.000  0.000  0.000  0.000  0.000  0.000  0.000
3    0.348  0.482  1.000  0.820  0.000  0.000  0.000  0.000  0.000  0.000
4    0.000  0.000  0.670  1.000  0.000  0.000  0.000  0.000  0.000  0.000
5    0.000  0.000  0.000  0.000  1.000  0.986  0.850  0.000  0.000  0.000
6    0.000  0.000  0.000  0.000  0.976  1.000  0.954  0.000  0.000  0.000
7    0.000  0.000  0.000  0.000  0.482  0.893  1.000  0.000  0.000  0.000
8    0.000  0.000  0.000  0.000  0.000  0.000  0.000  1.000  0.986  0.893
9    0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.976  1.000  0.976
10   0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.670  0.954  1.000

For example,

$$w_6(x_5) = \left(1 - \left(\frac{|x_6 - x_5|}{h_5}\right)^3\right)^3 = \left(1 - \left(\frac{13 - 12}{5}\right)^3\right)^3 = \left(1 - \frac{1}{125}\right)^3 \approx 0.976.$$

Next consider local weighted regressions.

For each $i = 1, 2, \ldots, n$, let $\hat\beta_0^*(x_i)$ and $\hat\beta_1^*(x_i)$ denote the values of $\beta_0$ and $\beta_1$ that minimize

$$\sum_{k=1}^{n} w_k(x_i)\,(y_k - \beta_0 - \beta_1 x_k)^2.$$

For $i = 1, 2, \ldots, n$, let $\hat y_i^* = \hat\beta_0^*(x_i) + \hat\beta_1^*(x_i)\,x_i$ and $e_i = y_i - \hat y_i^*$.

Next measure the degree to which an observation is outlying. Consider the bisquare weight function defined as

$$B(t) = \begin{cases} (1 - t^2)^2 & \text{for } |t| < 1, \\ 0 & \text{for } |t| \ge 1. \end{cases}$$

[Figure: Bisquare Weight Function, B(t) versus t]

For $k = 1, 2, \ldots, n$, let $\delta_k = B\big(e_k/(6s)\big)$, where $s$ is the median of $|e_1|, |e_2|, \ldots, |e_n|$.

Now down-weight outlying observations and repeat the local weighted regressions. For each $i = 1, 2, \ldots, n$, let $\hat\beta_0(x_i)$ and $\hat\beta_1(x_i)$ denote the values of $\beta_0$ and $\beta_1$ that minimize

$$\sum_{k=1}^{n} \delta_k\, w_k(x_i)\,(y_k - \beta_0 - \beta_1 x_k)^2.$$

For $i = 1, 2, \ldots, n$, let $\hat y_i = \hat\beta_0(x_i) + \hat\beta_1(x_i)\,x_i$.

Iterate one more time. Now use the new fitted values $\hat y_i$ to compute new residuals $e_i = y_i - \hat y_i$ and new $\delta_k$ as described previously. Substitute the new $\delta_k$ for the old $\delta_k$ and repeat the local weighted regressions one last time to obtain the final $\hat y_i$ values. These resulting $\hat y_i$ values are the lowess fitted values. Plot these values versus $x_1, x_2, \ldots, x_n$ and connect them with straight lines to obtain the lowess curve.
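The steps above can be traced in R, the language used later in these notes. This is a minimal sketch of the procedure as described, not Cleveland's or R's own implementation. The function names `tricube`, `bisquare`, `local_fit` and `lowess_by_hand` and the argument `robust_iter` are hypothetical. The two guards in `local_fit` (return $y_i$ if every weight is zero; return the weighted mean if the weighted x values have no spread) are assumptions, since the procedure does not cover those degenerate cases. The sketch also assumes every $h_i > 0$ and $s > 0$.

```r
# Minimal sketch of the lowess steps (all helper names are hypothetical).
tricube  <- function(t) ifelse(abs(t) < 1, (1 - abs(t)^3)^3, 0)
bisquare <- function(t) ifelse(abs(t) < 1, (1 - t^2)^2, 0)

# Weighted least-squares line with weights wt, evaluated at x0.
# Guards (an assumption, not part of the notes): return y0 if every weight
# is 0, and the weighted mean of y if the weighted x values have no spread.
local_fit <- function(x0, x, y, wt, y0) {
  if (sum(wt) <= 0) return(y0)
  wt <- wt / sum(wt)
  xbar <- sum(wt * x)
  ybar <- sum(wt * y)
  sxx <- sum(wt * (x - xbar)^2)
  if (sxx < 1e-12) return(ybar)
  slope <- sum(wt * (x - xbar) * (y - ybar)) / sxx
  ybar + slope * (x0 - xbar)
}

# Assumes every h_i > 0 and a nonzero median absolute residual.
lowess_by_hand <- function(x, y, f = 2/3, robust_iter = 2) {
  n <- length(x)
  r <- round(n * f)                          # r = n*f rounded to nearest integer
  D <- abs(outer(x, x, "-"))                 # D[i, k] = |x_i - x_k|
  h <- apply(D, 1, function(d) sort(d)[r])   # h_i = r-th smallest |x_i - x_k|
  W <- tricube(D / h)                        # W[i, k] = w_k(x_i); row i divided by h_i
  fit_all <- function(delta) {               # one local regression per x_i
    sapply(seq_len(n), function(i) local_fit(x[i], x, y, delta * W[i, ], y[i]))
  }
  yhat <- fit_all(rep(1, n))                 # initial fits y*_i (all delta_k = 1)
  for (step in seq_len(robust_iter)) {
    e <- y - yhat                            # residuals from the latest fit
    s <- median(abs(e))
    delta <- bisquare(e / (6 * s))           # robustness weights delta_k
    yhat <- fit_all(delta)                   # refit with delta_k * w_k(x_i)
  }
  list(h = h, W = W, fitted = yhat)
}

x <- c(1, 2, 5, 7, 12, 13, 15, 25, 27, 30)
y <- c(1, 8, 4, 5, 3, 9, 16, 15, 23, 29)
res <- lowess_by_hand(x, y, f = 0.4)
res$h                  # compare with the h_i column of the distance table
round(res$W[5, 6], 3)  # w_6(x_5), worked out by hand above
round(res$fitted, 3)   # lowess fitted values
```

Note that `W[i, k]` holds $w_k(x_i)$, so `W` is the transpose of the weights table, whose rows are indexed by $k$. The default `robust_iter = 2` mirrors "compute $\delta_k$, refit, then iterate one more time". If this reading of the procedure is right, `res$fitted` should agree closely with R's `lowess(x, y, f = 0.4, iter = 2)`.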
[Figure: Plot Showing All 10 Lines and Predicted Values after One More Iteration]

[Figure: The Lowess Curve]

lowess in R

    o <- lowess(x, y, f = 0.4)
    plot(x, y)
    lines(o$x, o$y, col = 2, lwd = 2)

o$x will be a vector containing the x values, sorted into increasing order. o$y will contain the lowess fitted values for the values in o$x. f controls the fraction of the data used to obtain each fitted value.

    o <- lowess(y ~ x, f = 0.2)
    plot(fossil)
    lines(o, lwd = 2)
    # See also the function 'loess', which has more
    # capabilities than 'lowess'.
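R's `lowess` also takes an `iter` argument, the number of robustifying iterations, whose default is `iter = 3`. The procedure described earlier performs two robustness passes after the initial fit, so `iter = 2` is the closest match to the hand calculation, while the default performs one additional pass. The sketch below fits the example data both ways; the object names `o2` and `o3` are only illustrative.

```r
# Example data from the notes
x <- c(1, 2, 5, 7, 12, 13, 15, 25, 27, 30)
y <- c(1, 8, 4, 5, 3, 9, 16, 15, 23, 29)

o2 <- lowess(x, y, f = 0.4, iter = 2)  # initial fit plus two robustness passes
o3 <- lowess(x, y, f = 0.4)            # default iter = 3: one extra pass

plot(x, y)
lines(o2$x, o2$y, col = 2, lwd = 2)            # iter = 2
lines(o3$x, o3$y, col = 4, lwd = 2, lty = 2)   # default iter = 3

# Side-by-side fitted values for the two settings
round(cbind(x = o2$x, iter2 = o2$y, iter3 = o3$y), 3)
```

With only 10 points the two curves may differ little; setting `iter = 0` instead would skip the robustness step entirely and return the $\hat y_i^*$ values.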