Linear Least Squares Approximation
By Kristen Bauer, Renee Metzger, Holly Soper, and Amanda Unklesbay
Linear Least Squares
Linear least squares finds the line of best fit for a group of points. It seeks to minimize the sum, over all data points, of the squared differences between the function value and the data value. It is the earliest form of linear regression.
Gauss and Legendre
The method of least squares was first published by Legendre in 1805 and by Gauss in 1809. Although Legendre's work was published earlier, Gauss claimed he had been using the method since 1795. Both mathematicians applied the method to determine the orbits of bodies about the Sun. Gauss went on to publish further developments of the method in 1821.
Example
Consider the points (1, 2.1), (2, 2.9), (5, 6.1), and (7, 8.3) and the trial line f(x) = 0.9x + 1.4.
The squared errors are:
x1 = 1:  f(1) = 2.3,  y1 = 2.1,  e1 = (2.3 – 2.1)² = 0.04
x2 = 2:  f(2) = 3.2,  y2 = 2.9,  e2 = (3.2 – 2.9)² = 0.09
x3 = 5:  f(5) = 5.9,  y3 = 6.1,  e3 = (5.9 – 6.1)² = 0.04
x4 = 7:  f(7) = 7.7,  y4 = 8.3,  e4 = (7.7 – 8.3)² = 0.36
So the total squared error is 0.04 + 0.09 + 0.04 + 0.36 = 0.53.
By finding better coefficients for the line, we can make this error smaller…
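As a quick check (a minimal sketch, assuming the points and trial line above), Mathematica reproduces the total squared error:

  (* Data from the example and the trial line f(x) = 0.9 x + 1.4 *)
  data = {{1, 2.1}, {2, 2.9}, {5, 6.1}, {7, 8.3}};
  f[x_] := 0.9 x + 1.4;

  (* Squared error at each point, then the total *)
  errors = (f[#[[1]]] - #[[2]])^2 & /@ data;
  Total[errors]   (* ≈ 0.53 *)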
We want to minimize the vertical distance between the points and the line. For n data points:
• E = (d1)² + (d2)² + (d3)² + … + (dn)²
• E = [f(x1) – y1]² + [f(x2) – y2]² + … + [f(xn) – yn]²
• E = [mx1 + b – y1]² + [mx2 + b – y2]² + … + [mxn + b – yn]²
• E = ∑(mxi + b – yi)²
E must be MINIMIZED!
How do we do this?
E = ∑(mxi + b – yi)²
Treat x and y as constants, since we are trying to find m and b.
So… PARTIALS!
∂E/∂m = 0 and ∂E/∂b = 0
But how do we know whether this will yield maximums, minimums, or saddle points?
[Figures: surfaces illustrating a minimum point, a maximum point, and a saddle point]
Minimum!
Since the expression E is a sum of squares and therefore nonnegative (as a function of m and b it looks like an upward-opening paraboloid), we know the solution must be a minimum. We can prove this by using the 2nd Partials Test.
2nd Partials Test
Suppose the gradient of f at (x0, y0) is zero. (An instance of this is ∂E/∂m = ∂E/∂b = 0.)
We set
A = ∂²f/∂x²,  B = ∂²f/∂y∂x,  C = ∂²f/∂y²
and form the discriminant D = AC – B².
1) If D < 0, then (x0, y0) is a saddle point.
2) If D > 0, then f takes on
   a local minimum at (x0, y0) if A > 0
   a local maximum at (x0, y0) if A < 0
Calculating the Discriminant
A = ∂²f/∂x² = ∂²E/∂m² = ∂²/∂m² ∑(mx + b – y)² = ∂/∂m ∑(2x)(mx + b – y) = ∑(2x²), so A = 2∑x²

B = ∂²f/∂y∂x = ∂²E/∂b∂m = ∂²/∂b∂m ∑(mx + b – y)² = ∂/∂b ∑(2x)(mx + b – y) = ∑(2x), so B = 2∑x

C = ∂²f/∂y² = ∂²E/∂b² = ∂²/∂b² ∑(mx + b – y)² = ∂/∂b ∑(2)(mx + b – y) = ∑2, so C = 2∑1 = 2n

D = AC – B² = 4∑x² · ∑1 – 4(∑x)² = 4[n∑x² – (∑x)²]
Recall from the 2nd Partials Test: if D > 0 and A > 0, then (x0, y0) is a local minimum.
Now D > 0 by an inductive proof showing that n ∑ xi² > (∑ xi)², where both sums run from i = 1 to n and the xi do not all have the same value. Those details are not covered in this presentation.
We know A > 0 since A = 2∑x² is always positive (as long as the xi are not all zero).
Therefore…
Setting ∂E/∂m and ∂E/∂b equal to zero yields two equations whose solution minimizes E, the sum of the squares of the errors. Thus the linear least squares algorithm (as presented) is valid, and we can continue.
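As an illustrative check (a sketch of my own, not from the original slides), the second partials A, B, C and the discriminant D can be computed for the example data with Mathematica's symbolic D:

  (* Squared-error function for the example points *)
  xs = {1, 2, 5, 7}; ys = {2.1, 2.9, 6.1, 8.3};
  e[m_, b_] := Total[(m xs + b - ys)^2];

  a  = D[e[m, b], m, m];    (* 2 Σx² = 158 here *)
  bb = D[e[m, b], b, m];    (* 2 Σx  = 30 here  *)
  c  = D[e[m, b], b, b];    (* 2 n   = 8 here   *)
  a c - bb^2                (* 158·8 – 30² = 364 > 0, and a > 0, so E has a minimum *)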
E = ∑(mxi + b – yi)² is minimized (as just shown) when the partial derivatives with respect to each of the variables are zero, i.e. ∂E/∂m = 0 and ∂E/∂b = 0.

∂E/∂b = ∑2(mxi + b – yi) = 0, so m∑xi + ∑b = ∑yi, i.e. mSx + bn = Sy

∂E/∂m = ∑2xi(mxi + b – yi) = 2∑(mxi² + bxi – xiyi) = 0, so m∑xi² + b∑xi = ∑xiyi, i.e. mSxx + bSx = Sxy

NOTE:
∑xi = Sx
∑yi = Sy
∑xi² = Sxx
∑xiyi = Sxy
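For concreteness, a short sketch (my own, using the earlier example points) that computes these four sums in Mathematica:

  xs = {1, 2, 5, 7}; ys = {2.1, 2.9, 6.1, 8.3};
  n   = Length[xs];
  sx  = Total[xs];       (* Σ xi    *)
  sy  = Total[ys];       (* Σ yi    *)
  sxx = Total[xs^2];     (* Σ xi²   *)
  sxy = Total[xs ys];    (* Σ xi yi *)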
Next we will solve the system of equations for the unknowns m and b:
mSxx + bSx = Sxy
mSx + bn = Sy
Solving for m…
nmSxx + bnSx = nSxy            (multiply the first equation by n)
mSxSx + bnSx = SySx            (multiply the second equation by Sx)
nmSxx – mSxSx = nSxy – SySx    (subtract)
m(nSxx – SxSx) = nSxy – SySx   (factor out m)
m = (nSxy – SySx) / (nSxx – SxSx)
Next we solve the same system for b:
mSxx + bSx = Sxy
mSx + bn = Sy
Solving for b…
mSxSxx + bSxSx = SxSxy          (multiply the first equation by Sx)
mSxSxx + bnSxx = SySxx          (multiply the second equation by Sxx)
bSxSx – bnSxx = SxySx – SySxx   (subtract)
b(SxSx – nSxx) = SxySx – SySxx
b = (SxxSy – SxySx) / (nSxx – SxSx)   (divide through, then multiply numerator and denominator by –1)
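Putting the two closed-form expressions into code, a minimal sketch (the helper name leastSquaresLine is my own, not from the slides):

  (* Closed-form least squares coefficients {m, b} for a list of {x, y} points *)
  leastSquaresLine[pts_] := Module[{xs, ys, n, sx, sy, sxx, sxy},
    xs = pts[[All, 1]]; ys = pts[[All, 2]];
    n = Length[pts];
    sx = Total[xs]; sy = Total[ys];
    sxx = Total[xs^2]; sxy = Total[xs ys];
    {(n sxy - sy sx)/(n sxx - sx sx), (sxx sy - sxy sx)/(n sxx - sx sx)}
  ]

For the four points used earlier, leastSquaresLine[{{1, 2.1}, {2, 2.9}, {5, 6.1}, {7, 8.3}}] returns the slope and intercept as a pair {m, b}.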
Example: Find the linear least squares approximation to the data: (1,1), (2,4), (3,8)
Use these formulas:
m = (nSxy – SySx) / (nSxx – SxSx)
b = (SxxSy – SxySx) / (nSxx – SxSx)
Sx = 1 + 2 + 3 = 6
Sxx = 1² + 2² + 3² = 14
Sy = 1 + 4 + 8 = 13
Sxy = 1(1) + 2(4) + 3(8) = 33
n = number of points = 3
m = [3(33) – 6(13)] / [3(14) – 6(6)] = 21/6 = 3.5
b = [14(13) – 33(6)] / [3(14) – 6(6)] = –16/6 ≈ –2.667
The line of best fit is y = 3.5x – 2.667
[Plot: the data points (1,1), (2,4), (3,8) together with the line of best fit y = 3.5x – 2.667]
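The plot can be reproduced with a short sketch (plot range and styling are assumptions of mine, not the original figure's settings):

  (* Data points together with the fitted line y = 3.5 x - 2.667 *)
  data = {{1, 1}, {2, 4}, {3, 8}};
  Show[ListPlot[data], Plot[3.5 x - 2.667, {x, -1, 5}], PlotRange -> {-5, 15}]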
The Algorithm in Mathematica
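A minimal sketch of how the algorithm might look in Mathematica (not the original code from this slide), following the formulas derived above, with the built-in Fit as a cross-check:

  (* Least squares line for the worked example, via the derived formulas *)
  data = {{1, 1}, {2, 4}, {3, 8}};
  n = Length[data];
  sx = Total[data[[All, 1]]]; sy = Total[data[[All, 2]]];
  sxx = Total[data[[All, 1]]^2]; sxy = Total[Times @@@ data];
  m = (n sxy - sy sx)/(n sxx - sx sx)    (* 7/2 *)
  b = (sxx sy - sxy sx)/(n sxx - sx sx)  (* -8/3 *)

  (* Built-in cross-check: returns -2.66667 + 3.5 x *)
  Fit[N[data], {1, x}, x]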
Activity
For this activity we are going to use the linear least squares approximation in a real-life situation. You will be given a box score from either a baseball or softball game. From the box score, write out the points, with the x coordinate being the number of at-bats each player had in the game and the y coordinate being the number of hits that player had. Then use the linear least squares approximation to find the best fitting line. The slope of the best fitting line serves as an estimate of the team's batting average (hits per at-bat) for that game.
In Conclusion…
E = ∑(mxi + b – yi)² is the sum of the squared errors between the set of data points {(x1,y1), …, (xi,yi), …, (xn,yn)} and the line approximating the data, f(x) = mx + b. By minimizing this error with calculus methods, we get the equations for m and b that yield the least squared error:
m = (nSxy – SySx) / (nSxx – SxSx)
b = (SxxSy – SxySx) / (nSxx – SxSx)
Advantages
Many common methods of approximating data seek to minimize some measure of the difference between the approximating function and the given data points. Advantages of using the squares of the differences at each point, rather than just the differences, the absolute values of the differences, or other measures of error, include:
– Positive differences do not cancel negative differences
– Differentiation is not difficult
– Small differences become smaller and large differences become larger
Disadvantages
The algorithm will fail if the data points fall on a vertical line. Linear least squares will also not fit the data well when the underlying relationship is not linear.
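To see the first failure mode concretely, a tiny sketch (points chosen by me) where all x values are equal makes the denominator nSxx – SxSx vanish:

  (* Three points on the vertical line x = 2: the slope formula divides by zero *)
  xs = {2, 2, 2}; ys = {1, 3, 5};
  n = Length[xs];
  n Total[xs^2] - Total[xs]^2   (* 3·12 - 6² = 0, so m and b are undefined *)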
The End