The LOWESS Smoother

“lowess” stands for LOcally WEighted polynomial regreSSion.

1/19/2011
Copyright © 2011 Dan Nettleton

The original reference for lowess is

Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. JASA, 74, 829-836.
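How is the lowess curve determined?

Suppose we have data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. Let $0 < f \le 1$ denote a fraction that will determine the smoothness of the curve, and let $r = nf$ rounded to the nearest integer.

Tricube Weight Function

Consider the tricube weight function defined as

$$T(t) = \begin{cases} (1 - |t|^3)^3 & \text{for } |t| < 1, \\ 0 & \text{for } |t| \ge 1. \end{cases}$$

[Figure: the tricube weight function T(t) plotted against t]

For $i = 1, \ldots, n$, let $h_i$ be the $r$th smallest number among $|x_i - x_1|, |x_i - x_2|, \ldots, |x_i - x_n|$. For $k = 1, 2, \ldots, n$, let

$$w_k(x_i) = T\left(\frac{x_k - x_i}{h_i}\right).$$

Thus $w_i(x_i) = 1$, and the weight decreases to 0 as $|x_k - x_i|$ grows to $h_i$; points at distance $h_i$ or more from $x_i$ get weight 0.

An Example

Suppose a lowess curve will be fit to the following data with $f = 0.4$.

 i    x_i   y_i
 1     1     1
 2     2     8
 3     5     4
 4     7     5
 5    12     3
 6    13     9
 7    15    16
 8    25    15
 9    27    23
10    30    29

[Figure: scatterplot of y versus x for these data]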
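The weighting step above can be sketched in base R for the example data ($n = 10$, $f = 0.4$). This is a minimal illustration under our own naming, not the source of R's `lowess()`: the objects `tricube`, `D`, `h` and `W` are hypothetical, and `round()` is assumed to be an acceptable reading of "rounded to the nearest integer" (R rounds exact halves to even).

```r
# Minimal sketch (base R) of the tricube weighting step for the example data.
# Object names (tricube, D, h, W) are illustrative, not taken from lowess().
x <- c(1, 2, 5, 7, 12, 13, 15, 25, 27, 30)
f <- 0.4
n <- length(x)
r <- round(n * f)   # r = n*f rounded to the nearest integer (here 4)

# Tricube weight function: (1 - |t|^3)^3 for |t| < 1, and 0 for |t| >= 1
tricube <- function(t) ifelse(abs(t) < 1, (1 - abs(t)^3)^3, 0)

# D[i, j] = |x_i - x_j|; h_i is the r-th smallest value in row i
D <- abs(outer(x, x, "-"))
h <- apply(D, 1, function(d) sort(d)[r])

# W[k, i] = w_k(x_i) = T((x_k - x_i) / h_i): divide column i by h_i
W <- tricube(sweep(outer(x, x, "-"), 2, h, "/"))

print(h)
print(round(W, 3))
```

Storing the weights with rows indexed by k and columns by i means column i holds all the weights used for the local fit at $x_i$.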
Table Containing |x_i - x_j| Values

n = 10, f = 0.4 ⇒ r = 4

        x1   x2   x3   x4   x5   x6   x7   x8   x9   x10
  x1     0    1    4    6   11   12   14   24   26   29
  x2     1    0    3    5   10   11   13   23   25   28
  x3     4    3    0    2    7    8   10   20   22   25
  x4     6    5    2    0    5    6    8   18   20   23
  x5    11   10    7    5    0    1    3   13   15   18
  x6    12   11    8    6    1    0    2   12   14   17
  x7    14   13   10    8    3    2    0   10   12   15
  x8    24   23   20   18   13   12   10    0    2    5
  x9    26   25   22   20   15   14   12    2    0    3
  x10   29   28   25   23   18   17   15    5    3    0

Calculation of h_i from |x_i - x_j| Values

Because r = 4, h_i is the 4th smallest entry in row i of the table (the 0 on the diagonal counts as the smallest):

h1 = 6, h2 = 5, h3 = 4, h4 = 5, h5 = 5, h6 = 6, h7 = 8, h8 = 10, h9 = 12, h10 = 15
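As a worked check of the rule for $h_i$, take row 5 of the table, the distances from $x_5 = 12$ to every $x_j$:

```latex
\begin{aligned}
\{|x_5 - x_j|\}_{j=1}^{10} &= \{11, 10, 7, 5, 0, 1, 3, 13, 15, 18\} \\
\text{sorted:}\quad & 0,\ 1,\ 3,\ \mathbf{5},\ 7,\ 10,\ 11,\ 13,\ 15,\ 18 \\
h_5 &= 5 \quad (\text{the } r = 4\text{th smallest value})
\end{aligned}
```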
Weights $w_k(x_i)$ Rounded to Nearest 0.001

The entry in row k and column i is $w_k(x_i)$; it is zero whenever $|x_k - x_i| \ge h_i$.

 k \ i    1      2      3      4      5      6      7      8      9     10
   1    1.000  0.976  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
   2    0.986  1.000  0.193  0.000  0.000  0.000  0.000  0.000  0.000  0.000
   3    0.348  0.482  1.000  0.820  0.000  0.000  0.000  0.000  0.000  0.000
   4    0.000  0.000  0.670  1.000  0.000  0.000  0.000  0.000  0.000  0.000
   5    0.000  0.000  0.000  0.000  1.000  0.986  0.850  0.000  0.000  0.000
   6    0.000  0.000  0.000  0.000  0.976  1.000  0.954  0.000  0.000  0.000
   7    0.000  0.000  0.000  0.000  0.482  0.893  1.000  0.000  0.000  0.000
   8    0.000  0.000  0.000  0.000  0.000  0.000  0.000  1.000  0.986  0.893
   9    0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.976  1.000  0.976
  10    0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.670  0.954  1.000

For example,

$$w_6(x_5) = \left(1 - \left|\frac{x_6 - x_5}{h_5}\right|^3\right)^3 = \left(1 - \left(\frac{13 - 12}{5}\right)^3\right)^3 = \left(1 - \frac{1}{125}\right)^3 \approx 0.976.$$

Next consider local weighted regressions.

For each $i = 1, 2, \ldots, n$, let $\hat\beta_0^*(x_i)$ and $\hat\beta_1^*(x_i)$ denote the values of $\beta_0$ and $\beta_1$ that minimize

$$\sum_{k=1}^{n} w_k(x_i)\,(y_k - \beta_0 - \beta_1 x_k)^2.$$

For $i = 1, 2, \ldots, n$, let

$$\hat y_i^* = \hat\beta_0^*(x_i) + \hat\beta_1^*(x_i)\,x_i \quad\text{and}\quad e_i = y_i - \hat y_i^*.$$
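The first round of local weighted regressions can be sketched in base R as follows. Each local fit uses the closed-form weighted least-squares line (weighted means $\bar x$, $\bar y$, then slope $\sum_k w_k (x_k - \bar x)(y_k - \bar y) / \sum_k w_k (x_k - \bar x)^2$). The helper `wls_fit_at` and the other object names are hypothetical, chosen for this illustration; column i of `W` holds the weights $w_k(x_i)$, $k = 1, \ldots, n$.

```r
# Sketch (base R) of the first round of local weighted linear regressions.
# Helper and object names are hypothetical, chosen for this illustration.
x <- c(1, 2, 5, 7, 12, 13, 15, 25, 27, 30)
y <- c(1, 8, 4, 5, 3, 9, 16, 15, 23, 29)
f <- 0.4
n <- length(x)
r <- round(n * f)
tricube <- function(t) ifelse(abs(t) < 1, (1 - abs(t)^3)^3, 0)
h <- apply(abs(outer(x, x, "-")), 1, function(d) sort(d)[r])
W <- tricube(sweep(outer(x, x, "-"), 2, h, "/"))   # W[k, i] = w_k(x_i)

# Weighted least-squares line through (x, y) with weights w, evaluated at x0.
# Assumes at least two distinct x values receive positive weight.
wls_fit_at <- function(x, y, w, x0) {
  xbar <- sum(w * x) / sum(w)
  ybar <- sum(w * y) / sum(w)
  b1 <- sum(w * (x - xbar) * (y - ybar)) / sum(w * (x - xbar)^2)
  b0 <- ybar - b1 * xbar
  b0 + b1 * x0
}

# yhat_star[i] = beta0*(x_i) + beta1*(x_i) x_i, using column i of W as weights
yhat_star <- sapply(seq_len(n), function(i) wls_fit_at(x, y, W[, i], x[i]))
e <- y - yhat_star   # residuals e_i = y_i - yhat*_i
print(round(cbind(x, y, yhat_star, e), 3))
```

For i = 4 only $x_3$ and $x_4$ receive positive weight, so the local line passes exactly through $(7, 5)$ and $e_4$ is zero up to floating-point error.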
Next measure the degree to which an observation is outlying.

Consider the bisquare weight function defined as

$$B(t) = \begin{cases} (1 - t^2)^2 & \text{for } |t| < 1, \\ 0 & \text{for } |t| \ge 1. \end{cases}$$

[Figure: the bisquare weight function B(t) plotted against t]

For $k = 1, 2, \ldots, n$, let

$$\delta_k = B\left(\frac{e_k}{6s}\right),$$

where $s$ is the median of $|e_1|, |e_2|, \ldots, |e_n|$.

Now down-weight outlying observations and repeat the local weighted regressions.

For each $i = 1, 2, \ldots, n$, let $\hat\beta_0(x_i)$ and $\hat\beta_1(x_i)$ denote the values of $\beta_0$ and $\beta_1$ that minimize

$$\sum_{k=1}^{n} \delta_k\, w_k(x_i)\,(y_k - \beta_0 - \beta_1 x_k)^2.$$

For $i = 1, 2, \ldots, n$, let $\hat y_i = \hat\beta_0(x_i) + \hat\beta_1(x_i)\,x_i$.
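To make the scale $6s$ concrete, here are three values of the robustness weight, using only the definitions of $B$ and $\delta_k$; a residual at least six times the median absolute residual removes observation k from the next round of fits entirely.

```latex
\begin{aligned}
e_k = 0 &\;\Rightarrow\; \delta_k = B(0) = 1,\\
|e_k| = 3s &\;\Rightarrow\; \delta_k = B\!\left(\pm\tfrac{1}{2}\right) = \left(1 - \tfrac{1}{4}\right)^2 = \tfrac{9}{16} \approx 0.56,\\
|e_k| \ge 6s &\;\Rightarrow\; \delta_k = 0.
\end{aligned}
```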
Iterate one more time.

Now use the new fitted values $\hat y_i$ to compute new residuals $e_i = y_i - \hat y_i$ and, from them, new $\delta_k$ as described previously. Substitute the new $\delta_k$ for the old $\delta_k$ and repeat the local weighted regressions one last time to obtain the final $\hat y_i$ values. These resulting $\hat y_i$ values are the lowess fitted values.

Plot these values versus $x_1, x_2, \ldots, x_n$ and connect them with straight lines to obtain the lowess curve.
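Putting the steps together, the following base R sketch runs one tricube-weighted pass followed by two bisquare-reweighted passes, then draws the curve by connecting the fitted values. It is an illustrative transcription of the slides under stated assumptions (x already sorted, $s > 0$, and at least two distinct x values with positive weight in every local fit), not the algorithm inside R's `lowess()`; `lowess_sketch` and its argument `robust_iters` are hypothetical names.

```r
# Sketch of the procedure above: one tricube-weighted pass, then
# robust_iters bisquare-reweighted passes. Names are hypothetical.
# Assumes x is sorted, s > 0, and each local fit has >= 2 distinct
# x values with positive weight.
lowess_sketch <- function(x, y, f, robust_iters = 2) {
  n <- length(x)
  r <- round(n * f)
  tricube  <- function(t) ifelse(abs(t) < 1, (1 - abs(t)^3)^3, 0)
  bisquare <- function(t) ifelse(abs(t) < 1, (1 - t^2)^2, 0)
  h <- apply(abs(outer(x, x, "-")), 1, function(d) sort(d)[r])
  W <- tricube(sweep(outer(x, x, "-"), 2, h, "/"))   # W[k, i] = w_k(x_i)

  # Weighted least-squares line evaluated at x0
  wls_fit_at <- function(w, x0) {
    xbar <- sum(w * x) / sum(w)
    ybar <- sum(w * y) / sum(w)
    b1 <- sum(w * (x - xbar) * (y - ybar)) / sum(w * (x - xbar)^2)
    ybar + b1 * (x0 - xbar)
  }

  delta <- rep(1, n)   # first pass: no robustness down-weighting
  for (pass in 0:robust_iters) {
    yhat <- sapply(seq_len(n), function(i) wls_fit_at(delta * W[, i], x[i]))
    e <- y - yhat                    # residuals from the current fit
    s <- median(abs(e))              # median absolute residual
    delta <- bisquare(e / (6 * s))   # robustness weights for the next pass
  }
  yhat
}

x <- c(1, 2, 5, 7, 12, 13, 15, 25, 27, 30)
y <- c(1, 8, 4, 5, 3, 9, 16, 15, 23, 29)
yhat <- lowess_sketch(x, y, f = 0.4)
plot(x, y)
lines(x, yhat, col = 2, lwd = 2)   # connect fitted values with straight lines
```

Setting `robust_iters = 0` returns only the first-pass values $\hat y_i^*$, which makes it easy to compare the unweighted and robust fits.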
[Figure: Plot Showing All 10 Lines and Predicted Values after One More Iteration]

[Figure: The Lowess Curve]
lowess in R

o <- lowess(x, y, f = 0.4)          # fit the lowess smoother
plot(x, y)                          # scatterplot of the data
lines(o$x, o$y, col = 2, lwd = 2)   # overlay the lowess curve

o$x will be a vector containing the x values, sorted in increasing order.
o$y will contain the lowess fitted values for the values in o$x.
f controls the fraction of the data used to obtain each fitted value.
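The same example can be run through R's built-in `lowess()`. Its `iter` argument sets the number of robustifying iterations (default 3), so `iter = 2` corresponds to the two bisquare passes described above. R's implementation is based on Cleveland's original code and may differ from a direct transcription of the slides in small details, so its fitted values need not agree exactly with a hand computation.

```r
# Example data from the slides
x <- c(1, 2, 5, 7, 12, 13, 15, 25, 27, 30)
y <- c(1, 8, 4, 5, 3, 9, 16, 15, 23, 29)

# iter = 2: two robustifying iterations, matching the two bisquare passes
# described above (R's default is iter = 3)
o <- lowess(x, y, f = 0.4, iter = 2)
plot(x, y)
lines(o$x, o$y, col = 2, lwd = 2)
print(round(cbind(x = o$x, lowess = o$y), 3))
```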
o <- lowess(y ~ x, f = 0.2)   # formula interface; x and y must already exist
plot(fossil)
lines(o, lwd = 2)             # a smaller f gives a less smooth curve
# See also the function 'loess', which has more
# capabilities than 'lowess'.