The LOWESS Smoother
1/19/2011
Copyright © 2011 Dan Nettleton

LOWESS
“lowess” stands for
LOcally WEighted polynomial regreSSion.
The original reference for lowess is
Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. JASA 74, 829-836.
How is the lowess curve determined?
Suppose we have data points (x1,y1), (x2,y2), ..., (xn,yn).
Let 0 < f ≤ 1 denote a fraction that will determine the smoothness of the curve.
Let r = n*f rounded to the nearest integer.
Consider the tricube weight function defined as

$$T(t) = \begin{cases} (1 - |t|^3)^3 & \text{for } |t| < 1 \\ 0 & \text{for } |t| \ge 1. \end{cases}$$

[Figure: Tricube Weight Function, T(t) plotted against t]

For i=1, ..., n; let hi be the rth smallest number among |xi-x1|, |xi-x2|, ..., |xi-xn|.
For k=1, 2, ..., n; let wk(xi)=T( ( xk – xi ) / hi ).
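The tricube function defined above can be sketched in R as follows. The function name `tricube` and the evaluation points are choices made for this illustration and do not come from the slides.

```r
# Tricube weight function: (1 - |t|^3)^3 for |t| < 1, and 0 otherwise.
tricube <- function(t) {
  ifelse(abs(t) < 1, (1 - abs(t)^3)^3, 0)
}

# Evaluate at a few illustrative points (chosen here, not from the slides).
t_vals <- c(-1, -0.5, 0, 0.2, 0.5, 1)
print(round(tricube(t_vals), 3))
```

The weight is 1 at t = 0, decreases smoothly as |t| grows, and is exactly 0 once |t| reaches 1, so points at or beyond distance hi from xi get no weight.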
An Example
 i   xi   yi
 1    1    1
 2    2    8
 3    5    4
 4    7    5
 5   12    3
 6   13    9
 7   15   16
 8   25   15
 9   27   23
10   30   29

Suppose a lowess curve will be fit to this data with f=0.4.

[Figure: plot of y versus x for the example data]
Table Containing |xi-xj| Values

      x1   x2   x3   x4   x5   x6   x7   x8   x9  x10
x1     0    1    4    6   11   12   14   24   26   29
x2     1    0    3    5   10   11   13   23   25   28
x3     4    3    0    2    7    8   10   20   22   25
x4     6    5    2    0    5    6    8   18   20   23
x5    11   10    7    5    0    1    3   13   15   18
x6    12   11    8    6    1    0    2   12   14   17
x7    14   13   10    8    3    2    0   10   12   15
x8    24   23   20   18   13   12   10    0    2    5
x9    26   25   22   20   15   14   12    2    0    3
x10   29   28   25   23   18   17   15    5    3    0
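A table of absolute differences like this one can be produced in R with `outer`. This is a minimal sketch using the x values from the example; the object name `D` is an assumption made for illustration.

```r
# x values from the example data.
x <- c(1, 2, 5, 7, 12, 13, 15, 25, 27, 30)

# D[i, j] = |x_i - x_j| for every pair of points.
D <- abs(outer(x, x, "-"))
dimnames(D) <- list(paste0("x", 1:10), paste0("x", 1:10))
print(D)
```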
Calculation of hi from |xi-xj| Values
n=10, f=0.4 ⇒ r=4

      x1   x2   x3   x4   x5   x6   x7   x8   x9  x10    hi
x1     0    1    4    6   11   12   14   24   26   29    h1 = 6
x2     1    0    3    5   10   11   13   23   25   28    h2 = 5
x3     4    3    0    2    7    8   10   20   22   25    h3 = 4
x4     6    5    2    0    5    6    8   18   20   23    h4 = 5
x5    11   10    7    5    0    1    3   13   15   18    h5 = 5
x6    12   11    8    6    1    0    2   12   14   17    h6 = 6
x7    14   13   10    8    3    2    0   10   12   15    h7 = 8
x8    24   23   20   18   13   12   10    0    2    5    h8 = 10
x9    26   25   22   20   15   14   12    2    0    3    h9 = 12
x10   29   28   25   23   18   17   15    5    3    0    h10 = 15
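The hi values can be checked with a short R sketch: sort each row of the distance table and take the rth entry. The variable names are assumptions made for this illustration. R's `round` is used for "rounded to the nearest integer"; its handling of exact halves may differ from other conventions, which does not matter here because n*f = 4.

```r
x <- c(1, 2, 5, 7, 12, 13, 15, 25, 27, 30)
n <- length(x)
f <- 0.4
r <- round(n * f)                        # number of points in each window

D <- abs(outer(x, x, "-"))               # |x_i - x_j|
# h_i is the r-th smallest distance in row i (the 0 for x_i itself counts).
h <- apply(D, 1, function(d) sort(d)[r])
print(h)
```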
Weights wk(xi) Rounded to Nearest 0.001
(rows indexed by k, columns indexed by i)

k \ i     1      2      3      4      5      6      7      8      9     10
 1    1.000  0.976  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
 2    0.986  1.000  0.193  0.000  0.000  0.000  0.000  0.000  0.000  0.000
 3    0.348  0.482  1.000  0.820  0.000  0.000  0.000  0.000  0.000  0.000
 4    0.000  0.000  0.670  1.000  0.000  0.000  0.000  0.000  0.000  0.000
 5    0.000  0.000  0.000  0.000  1.000  0.986  0.850  0.000  0.000  0.000
 6    0.000  0.000  0.000  0.000  0.976  1.000  0.954  0.000  0.000  0.000
 7    0.000  0.000  0.000  0.000  0.482  0.893  1.000  0.000  0.000  0.000
 8    0.000  0.000  0.000  0.000  0.000  0.000  0.000  1.000  0.986  0.893
 9    0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.976  1.000  0.976
10    0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.670  0.954  1.000

For example,

$$w_6(x_5) = \left(1 - \left(\frac{|x_6 - x_5|}{h_5}\right)^3\right)^3 = \left(1 - \left(\frac{|13 - 12|}{5}\right)^3\right)^3 = \left(1 - \frac{1}{125}\right)^3 \approx 0.976$$
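The weight table can be reproduced with the sketch below, which applies the tricube function to the scaled distances. The names `tricube` and `W` are assumptions made for this illustration; the hi values are copied from the previous table.

```r
x <- c(1, 2, 5, 7, 12, 13, 15, 25, 27, 30)
h <- c(6, 5, 4, 5, 5, 6, 8, 10, 12, 15)   # h_i values from the slides

tricube <- function(t) ifelse(abs(t) < 1, (1 - abs(t)^3)^3, 0)

# Column i holds the weights for the fit at x_i, so W[k, i] = w_k(x_i).
W <- sapply(seq_along(x), function(i) tricube((x - x[i]) / h[i]))
dimnames(W) <- list(k = 1:10, i = 1:10)
print(round(W, 3))
```

Note that the matrix is not symmetric: w6(x5) uses h5 while w5(x6) uses h6, so the two entries differ.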
Next consider local weighted regressions
For each i=1, 2, ..., n; let $\hat\beta_0^*(x_i)$ and $\hat\beta_1^*(x_i)$ denote the values of β0 and β1 that minimize

$$\sum_{k=1}^{n} w_k(x_i)\,(y_k - \beta_0 - \beta_1 x_k)^2.$$

For i=1, 2, ..., n; let

$$\hat{y}_i^* = \hat\beta_0^*(x_i) + \hat\beta_1^*(x_i)\,x_i \quad\text{and}\quad e_i = y_i - \hat{y}_i^*.$$
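As a sketch of this first pass, the code below fits one weighted least-squares line per xi using the closed-form solution, then computes the fitted values and residuals. The helper `wls_line` and the other names are hypothetical, introduced only for this illustration; `lm(y ~ x, weights = w)` should give the same coefficients.

```r
x <- c(1, 2, 5, 7, 12, 13, 15, 25, 27, 30)
y <- c(1, 8, 4, 5, 3, 9, 16, 15, 23, 29)
h <- c(6, 5, 4, 5, 5, 6, 8, 10, 12, 15)   # h_i values from the slides

tricube <- function(t) ifelse(abs(t) < 1, (1 - abs(t)^3)^3, 0)

# Closed-form weighted least squares for a straight line.
wls_line <- function(x, y, w) {
  xbar <- sum(w * x) / sum(w)
  ybar <- sum(w * y) / sum(w)
  b1 <- sum(w * (x - xbar) * (y - ybar)) / sum(w * (x - xbar)^2)
  c(b0 = ybar - b1 * xbar, b1 = b1)
}

# One local fit per x_i, evaluated at x_i itself.
yhat_star <- sapply(seq_along(x), function(i) {
  w <- tricube((x - x[i]) / h[i])
  b <- wls_line(x, y, w)
  b[["b0"]] + b[["b1"]] * x[i]
})
e <- y - yhat_star
print(round(cbind(x, y, yhat_star, e), 3))
```

For i = 4 only the points at x = 5 and x = 7 receive nonzero weight, so the local line passes through both of them and the residual e4 is zero up to rounding error.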
Next measure the degree to which
an observation is outlying.
Consider the bisquare weight function defined as

$$B(t) = \begin{cases} (1 - t^2)^2 & \text{for } |t| < 1 \\ 0 & \text{for } |t| \ge 1. \end{cases}$$

[Figure: Bisquare Weight Function, B(t) plotted against t]

For k=1,2,...,n; let

$$\delta_k = B\!\left(\frac{e_k}{6s}\right)$$

where s is the median of |e1|, |e2|, ..., |en|.
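The robustness weights can be sketched in R as below. The function names and the residual vector `e_demo` are made up for this illustration and are not the residuals from the example. The sketch does not handle the degenerate case s = 0.

```r
# Bisquare weight function: (1 - t^2)^2 for |t| < 1, and 0 otherwise.
bisquare <- function(t) ifelse(abs(t) < 1, (1 - t^2)^2, 0)

# delta_k = B(e_k / (6 s)), with s the median absolute residual.
robustness_weights <- function(e) {
  s <- median(abs(e))
  bisquare(e / (6 * s))
}

e_demo <- c(-1, 0, 2, 12)        # made-up residuals, for illustration only
delta <- robustness_weights(e_demo)
print(round(delta, 3))
```

Here s = 1.5, so any residual with absolute value of at least 6s = 9 gets weight 0; the residual of 12 is treated as an outlier and is ignored in the next round of local regressions.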
Now down-weight outlying observations
and repeat the local weighted regressions.
For each i=1, 2, ..., n; let $\hat\beta_0(x_i)$ and $\hat\beta_1(x_i)$ denote the values of β0 and β1 that minimize

$$\sum_{k=1}^{n} \delta_k\, w_k(x_i)\,(y_k - \beta_0 - \beta_1 x_k)^2.$$

For i=1, 2, ..., n; let $\hat{y}_i = \hat\beta_0(x_i) + \hat\beta_1(x_i)\,x_i$.
Iterate one more time.
Now use the new fitted ŷi values to compute new δk as described previously. Substitute the new δk for the old δk and repeat the local weighted regressions one last time to obtain the final ŷi values.
These resulting ŷi values are the lowess fitted values.
Plot these values versus x1, x2, ..., xn and connect with straight lines to obtain the lowess curve.
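Putting the steps together, the whole procedure can be sketched as a single R function. This is an illustration of the steps described in these slides, not the code behind R's built-in `lowess`, whose output need not match it exactly. The name `lowess_sketch`, its arguments, and the fallback for windows with no spread in x are assumptions added here; cases where s = 0 or all weights vanish are not handled.

```r
tricube  <- function(t) ifelse(abs(t) < 1, (1 - abs(t)^3)^3, 0)
bisquare <- function(t) ifelse(abs(t) < 1, (1 - t^2)^2, 0)

# iter = number of robustness iterations after the initial fit.
lowess_sketch <- function(x, y, f, iter = 2) {
  n <- length(x)
  r <- round(n * f)
  D <- abs(outer(x, x, "-"))
  h <- apply(D, 1, function(d) sort(d)[r])   # r-th smallest distance
  delta <- rep(1, n)                         # no down-weighting at first
  for (pass in 0:iter) {
    fit <- sapply(seq_len(n), function(i) {
      w <- delta * tricube((x - x[i]) / h[i])
      xbar <- sum(w * x) / sum(w)
      ybar <- sum(w * y) / sum(w)
      sxx <- sum(w * (x - xbar)^2)
      # Safeguard added for this sketch (not in the slides): use the
      # weighted mean if the weighted x values have no spread.
      if (sxx < 1e-12) return(ybar)
      b1 <- sum(w * (x - xbar) * (y - ybar)) / sxx
      ybar + b1 * (x[i] - xbar)
    })
    if (pass < iter) {
      e <- y - fit
      delta <- bisquare(e / (6 * median(abs(e))))
    }
  }
  fit
}

x <- c(1, 2, 5, 7, 12, 13, 15, 25, 27, 30)
y <- c(1, 8, 4, 5, 3, 9, 16, 15, 23, 29)
print(round(lowess_sketch(x, y, f = 0.4), 3))
```

With `iter = 2` the function performs three rounds of local regressions in total: the initial fit and two refits with updated δk, matching the sequence described above.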
Plot Showing All 10 Lines and Predicted Values
after One More Iteration
The Lowess Curve
lowess in R
o=lowess(x,y,f=0.4)
plot(x,y)
lines(o$x,o$y,col=2,lwd=2)
o$x will be a vector containing the x values.
o$y will contain the lowess fitted values for the values in o$x.
f controls the fraction of the data used to obtain each fitted value.
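The calls above assume that x and y already exist in the workspace. A self-contained version using the example data from these slides might look like the sketch below. Note that `lowess` uses `iter = 3` robustness iterations by default, so its fitted values need not agree exactly with a hand calculation that follows the two-iteration description given earlier.

```r
# Example data from the slides.
x <- c(1, 2, 5, 7, 12, 13, 15, 25, 27, 30)
y <- c(1, 8, 4, 5, 3, 9, 16, 15, 23, 29)

# lowess() returns a list with components x (sorted) and y (fitted values).
o <- lowess(x, y, f = 0.4)
print(cbind(x = o$x, fitted = round(o$y, 3)))
```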
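# 'fossil' is assumed to be a two-column data frame already loaded in the
# workspace, with x and y taken from its columns (neither is defined here).
o=lowess(y~x,f=.2)
plot(fossil)
lines(o,lwd=2)
#See also the function 'loess', which has more
#capabilities than 'lowess'.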