PowerPoint_13_Least-Squares

GG 313 Geological Data Analysis
Lecture 13
Solution of Simultaneous
Equations
October 4, 2005
Homework discussion.
People are having problems with the null hypothesis and
what the solution to a hypothesis test means.
Rejecting the null hypothesis is the positive result of our
tests. We need to understand what this means, and the
easiest way is by example.
Consider the Mann-Whitney Test and example 2-7 (page
45). Comparing grain sizes from two different locations on
the moon, we want to see if the mean grain sizes differ in
the two samples. Of course they won’t be the same, but is
the difference statistically significant?
Our hypothesis is that the mean grain size is different,
implying the two samples come from different populations.
Our null hypothesis is thus that the mean grain sizes are
the same with some statistical confidence (95%).
In this test we combine the two samples and rank each
element of the sample. If the two samples were identical,
then they would have the same W1 and W2, which would
be (eqn. 2.33) and U1 and U2 would equal zero.
U=0 is certainly less than the critical value of U
obtained from the table, and as U increases, the two
means are getting farther apart. We cannot reject our
null hypothesis if U is smaller than the critical value.
In the notes for this example, U=24 and the critical value is 20. So we
can conclude that the two means cannot be the same
with 95% certainty. IN THE NOTES, Paul says “This
(U=24) is larger than the critical value of 20, suggesting
we cannot reject the null hypothesis.” He’s wrong. Don’t
believe everything you read.
Another example: Homework 5, problem 1.
It states: “… do these data support the claim that on
average higher concentrations were obtained before
cleaning versus after cleaning?” What is the hypothesis?
What is the null hypothesis?
These data are in PAIRS. We are trying to figure out if the
“before” number is statistically larger than the “after”
number.
Using the “sign” test, what do you do first?
-subtract the 2nd number of each pair from the first. If the
first is larger, the result will be “+”, if the first is smaller, the
result will be “-”. How many “+” are there?
If there were as many + as -, what would that say about
the null hypothesis?
If the number of + is much larger than the number of -,
what would that say?
You can calculate the probability of having n + using the
binomial coefficients, but that’s a lot of work. Since, for
this problem, np>5 and n(1-p)>5, you can use z-statistics
(eqn. 2.32). If your z-value from the data is >2, what does
that mean?
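The sign-test steps above can be sketched in Matlab. This is only an illustration: the paired values are made up (they are NOT the homework data), and it assumes eqn. 2.32 is the usual normal approximation to the binomial, z = (n+ - np)/sqrt(np(1-p)) with p = 0.5 under the null hypothesis.

```matlab
% Hypothetical paired measurements (illustration only, not the homework data)
before = [12.1 11.4 13.0 10.8 12.6 11.9 13.3 12.0 11.1 12.8 13.5 11.7];
after  = [11.0 11.6 12.1 10.2 12.9 11.0 12.5 11.2 10.9 12.0 12.7 11.9];

d = before - after;      % subtract the 2nd number of each pair from the first
d = d(d ~= 0);           % ties carry no sign information, so drop them
n = numel(d);            % number of usable pairs
nplus = sum(d > 0);      % number of "+" results

p = 0.5;                 % under the null hypothesis + and - are equally likely
% Normal approximation to the binomial (reasonable when n*p > 5 and n*(1-p) > 5)
z = (nplus - n*p) / sqrt(n*p*(1-p));

fprintf('n = %d, plus = %d, z = %.4f\n', n, nplus, z);
% prints: n = 12, plus = 9, z = 1.7321
```

The z-value is then compared with the critical value for your chosen confidence level, keeping in mind whether the question calls for a one-sided or a two-sided test.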
These tests are not difficult, but you do need to think the
logic through. You cannot expect to blindly use the notes
and formulas and come up with the correct answer. Since
there’s a 50% probability that you’ll get the answer right,
the method is everything.
Linear algebra provides us with an easy method for
solving systems of simultaneous equations.
Consider the following set of four equations with four
unknowns x1,x2,x3,and x4:
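$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + a_{14}x_4 &= b_1 \\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + a_{24}x_4 &= b_2 \\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + a_{34}x_4 &= b_3 \\
a_{41}x_1 + a_{42}x_2 + a_{43}x_3 + a_{44}x_4 &= b_4
\end{aligned}
\tag{3.77}
$$
In matrix form, the above is just:
$$A x = b \tag{3.78}$$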
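Where:
$$
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{bmatrix},
\quad
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix},
\quad \text{and} \quad
b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix}
\tag{3.79--3.81}
$$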
The solution to the equations is obtained by multiplying
both sides of Eqn. 3.78 by $A^{-1}$, to obtain:
$$A^{-1} A x = A^{-1} b \tag{3.82}$$
$$I x = x = A^{-1} b \tag{3.83}$$
We have thus solved for x and obtained the solution.
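EXAMPLE: Consider a simple example. We have three
planes that are defined below. Three planes cross at a
single point as long as no two are parallel and their lines
of intersection are not parallel (that is, A has an inverse).
At what point do they cross each other?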
x-y-2z=2
x+y+2z=10
-2x-2y-z=3
Setting up the matrix:
$$
A = \begin{bmatrix} 1 & -1 & -2 \\ 1 & 1 & 2 \\ -2 & -2 & -1 \end{bmatrix},
\quad
x = \begin{bmatrix} x \\ y \\ z \end{bmatrix},
\quad
b = \begin{bmatrix} 2 \\ 10 \\ 3 \end{bmatrix}
$$
In Matlab:
>> A=[1 -1 -2; 1 1 2; -2 -2 -1];
>> B=[2; 10; 3];
>> Ainv=inv(A)
Ainv =
    0.5000    0.5000         0
   -0.5000   -0.8333   -0.6667
         0    0.6667    0.3333
Multiplying by B,
>> X=Ainv*B
X =
    6.0000 =x
  -11.3333 =y
    7.6667 =z
Thus the planes cross at the point above.
I think this is an easy way to solve such sets of
equations, particularly as the number of equations and
unknowns increase. Other methods may be
computationally more efficient, but the above method
is easy to set up and solve.
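As one example of the "other methods" mentioned above, Matlab's backslash operator solves A x = b directly without forming the inverse. The sketch below uses the same three planes; the variable names x_inv and x_bs are just illustrative.

```matlab
% Same three planes as in the example above
A = [ 1 -1 -2;
      1  1  2;
     -2 -2 -1];
b = [2; 10; 3];

x_inv = inv(A)*b;   % the method shown in the lecture: explicit inverse
x_bs  = A\b;        % backslash: solves A*x = b without forming inv(A)

disp([x_inv x_bs])            % the two columns should agree
residual = norm(A*x_bs - b)   % should be zero to within round-off
```

For a 3x3 system the difference is negligible; for large systems backslash is generally faster and less sensitive to round-off than computing the inverse.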
_______________
Try an example where you know the answer: Try the
x-z plane at y=2 (0x+1y+0z=2), the x-y plane at z=0
(0x+0y+1z=0), and the y-z plane at x=-2 (1x+0y+0z=-2). Where do these 3 planes cross? Solve graphically
and using the matrix solution.
Solutions of this sort are relatively simple. We have no
options - there is only one correct answer and no freedom
to choose between possible answers.
A more interesting case is where we have more data than
we need to fit a model. For example, as we’ve said before,
two points define a line. But what if we want to define a
line with three points?
A and B define a line, but added data (point C) casts
doubt on our original interpretation.
We would like to define a new line that is somehow the “best”
line given the data we have. In addition, we would like to
know just how likely it is that the new line, which is an
estimate based on a sample of all points in the population,
reflects the population. We must be careful! We are
assuming that our MODEL (a straight line) reflects the shape
of reality. A “best fit” does not validate the model.
A good bet for finding the “best” fit to a model curve is to
minimize the sum of the squares of the errors of each point
with respect to the model curve. Thus, we want to find the
curve that minimizes these errors:
This figure works well,
but what if the line is
nearly vertical?
The figure above shows the errors in the y-value
(regression of y on x). Similarly, we could find the errors in
x:
Do these two methods yield the same answers? Consider
the case below:
In this case the errors in y are far smaller than the errors in
x, and utilizing errors in y will likely yield a better result. If the
curve had a steep slope, the opposite would be the case.
In any case, we can use a method that does not vary with
slope of the curve by measuring perpendiculars to the curve
at each point:
This method, called orthogonal regression, is most useful
when the slope of the line is unknown and it can be in any
direction.
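The three choices of error can be compared numerically. The sketch below is only an illustration with made-up points; it assumes the orthogonal-regression line passes through the centroid of the data and points along the direction of greatest spread, which is found here with the singular value decomposition.

```matlab
% Hypothetical scattered points (illustration only)
x = [1 2 3 4 5 6]';
y = [1.2 1.9 3.4 3.8 5.3 5.9]';
xm = mean(x);  ym = mean(y);

% Regression of y on x: minimizes the vertical (y) errors
p_yx = polyfit(x, y, 1);
slope_yx = p_yx(1);

% Regression of x on y: minimizes the horizontal (x) errors.
% polyfit returns dx/dy, so invert it to compare slopes as dy/dx.
p_xy = polyfit(y, x, 1);
slope_xy = 1/p_xy(1);

% Orthogonal regression: minimizes perpendicular distances to the line.
% The first right singular vector of the centered data gives the line direction.
[~, ~, V] = svd([x - xm, y - ym], 0);
slope_orth = V(2,1)/V(1,1);   % breaks down only for an exactly vertical line

fprintf('y on x: %.4f   orthogonal: %.4f   x on y: %.4f\n', ...
        slope_yx, slope_orth, slope_xy);
```

For data with a positive trend, the orthogonal slope falls between the other two, and all three agree only when the points lie exactly on a line.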
We wish to find a line of the form:
y(x)=a1+a2(x-x0).
(3.89)
Why does Paul use x0? Note that y(x)=a1+a2x-a2x0, so
y(x)=(a1-a2x0)+a2x=a3+a2x. So why bother with x0???
Let’s ignore it…
We have two unknowns, a1 and a2, and we want to find
values of these unknowns that minimize the sum of the
squares of the errors:
$$\text{minimize: } \sum_{i=1}^{n} \left(y_{\text{observed}} - y_{\text{theoretical}}\right)^2 \tag{3.90}$$
We have one equation for each data point:
$$
\begin{aligned}
a_1 + a_2 x_1 &= y_1 \\
a_1 + a_2 x_2 &= y_2 \\
&\;\;\vdots \\
a_1 + a_2 x_n &= y_n
\end{aligned}
\tag{3.91}
$$
We only need two equations to solve for a1 and a2, but we
have n equations. This situation is called over-determined.
The only way these equations can have an exact solution is
for n to equal 2 or for all the points to lie exactly on a
straight line.
We can write the equations in matrix notation, A x = b:
$$
\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \\ 1 & x_4 \end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \end{bmatrix}
=
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix}
\tag{3.92}
$$
Why can’t we just invert as we did before, and solve for
$x = A^{-1} b$? Unfortunately, this isn’t possible since A isn’t
square, and it thus has no inverse.
But all is not lost… Consider the equations:
$$
\begin{aligned}
a_1 + a_2 x_1 - y_1 &= e_1 \\
a_1 + a_2 x_2 - y_2 &= e_2 \\
&\;\;\vdots \\
a_1 + a_2 x_n - y_n &= e_n
\end{aligned}
\tag{3.93}
$$
Each ei is the error at point i, the difference between the
theoretical value a1 + a2 xi and the observed yi. We want to
obtain the values of a1 and a2 that minimize the sum of the
squares of the ei:
$$E(a_1, a_2) = \sum_{i=1}^{n} e_i^2 = e^T e \tag{3.94}$$
Recall the definition of variance. Minimization of E will
minimize the variance of the errors. Recall that if E is
minimum then the slope of E must be equal to zero.

E is a function of two variables, and the slope must be zero
for each, which says:
$$\frac{\partial E(a_1, a_2)}{\partial a_1} = \frac{\partial E(a_1, a_2)}{\partial a_2} = 0 \tag{3.95}$$
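Evaluating:
$$
\frac{\partial E}{\partial a_1}
= \frac{\partial}{\partial a_1} \sum_{i=1}^{n} e_i^2
= \frac{\partial}{\partial a_1} \sum_{i=1}^{n} \left(a_1 + a_2 x_i - y_i\right)^2
= 2 \sum_{i=1}^{n} \left(a_1 + a_2 x_i - y_i\right) = 0, \text{ and}
\tag{3.96}
$$
$$
\frac{\partial E}{\partial a_2}
= \frac{\partial}{\partial a_2} \sum_{i=1}^{n} e_i^2
= \frac{\partial}{\partial a_2} \sum_{i=1}^{n} \left(a_1 + a_2 x_i - y_i\right)^2
= 2 \sum_{i=1}^{n} \left(a_1 + a_2 x_i - y_i\right) x_i = 0
\tag{3.97}
$$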
With the results from the two equations (3.96 and 3.97)
we have two equations and two unknowns (a1 and a2),
and we can solve. Re-arranging:
$$n a_1 + a_2 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i \tag{3.98}$$
$$a_1 \sum_{i=1}^{n} x_i + a_2 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i x_i \tag{3.99}$$
Note that the unknowns are a1 and a2, and that the x and
y values are known, so the summations in the equations
above are all of known constants. We form the sums as
follows (notation is poor here; S stands for sum, not
covariance):
$$
S_x = \sum_{i=1}^{n} x_i, \quad
S_y = \sum_{i=1}^{n} y_i, \quad
S_{xy} = \sum_{i=1}^{n} y_i x_i, \quad
S_{xx} = \sum_{i=1}^{n} x_i^2
\tag{3.100}
$$
Substituting:
$$n a_1 + a_2 S_x = S_y \tag{3.101}$$
$$a_1 S_x + a_2 S_{xx} = S_{xy} \tag{3.102}$$
Solving for the y-intercept, a1, in the first equation,
$$a_1 = \frac{1}{n} S_y - \frac{a_2}{n} S_x \tag{3.103}$$
Substituting this expression for a1 into the second equation (3.102) yields:
$$a_2 = \frac{S_{xy} - \frac{1}{n} S_x S_y}{S_{xx} - \frac{1}{n} S_x^2} \tag{3.107}$$
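The algebra between 3.103 and 3.107 can be written out as follows. This is a sketch of the intermediate steps, using only the sums defined in 3.100: put the expression for a1 into the second of the two equations, collect the a2 terms, and divide.

```latex
\begin{align*}
\left(\frac{1}{n} S_y - \frac{a_2}{n} S_x\right) S_x + a_2 S_{xx} &= S_{xy} \\
a_2 \left(S_{xx} - \frac{1}{n} S_x^2\right) &= S_{xy} - \frac{1}{n} S_x S_y \\
a_2 &= \frac{S_{xy} - \frac{1}{n} S_x S_y}{S_{xx} - \frac{1}{n} S_x^2}
\end{align*}
```

Note that the denominator is zero only when all the x values are identical, in which case no unique line of this form exists.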
In matrix notation:
$$
\begin{bmatrix} n & S_x \\ S_x & S_{xx} \end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \end{bmatrix}
=
\begin{bmatrix} S_y \\ S_{xy} \end{bmatrix}
\tag{3.109}
$$
This equation is of the form $N x = B$, which can be
solved for x by $x = N^{-1} B$.
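A minimal Matlab sketch of this solution follows. The data are made up for illustration (they are not the in-class problem), and the variable names are assumptions. It also checks that N and B are the same as A'A and A'y built from the matrix A of eqn. 3.92.

```matlab
% Hypothetical data for illustration (NOT the in-class problem)
x = [1 2 3 4 5]';
y = [2.1 3.9 6.2 7.8 10.1]';
n = numel(x);

% Sums as defined in eqn. 3.100
Sx  = sum(x);
Sy  = sum(y);
Sxy = sum(x.*y);
Sxx = sum(x.^2);

% Eqn. 3.109: N*a = B, with a = [a1; a2]
N = [n  Sx;
     Sx Sxx];
B = [Sy; Sxy];
a = N\B;            % same result as inv(N)*B; a(1) = intercept, a(2) = slope

% Cross-check: N equals A'*A and B equals A'*y for the matrix A of eqn. 3.92
A = [ones(n,1) x];
a_check = (A'*A)\(A'*y);

fprintf('a1 = %.4f, a2 = %.4f\n', a(1), a(2));
% prints: a1 = 0.0500, a2 = 1.9900
```

The same script works for any number of points; only x and y change.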

In-class problem:
Use the following data and calculate the least-squares fit
to a line using eqn. 3.109.
Data points:
x y
33 -33
10 -5
4 -2
50 -44