The method of least squares for linear regression,

advertisement
The method of least squares for linear regression,
with application to power law fits
Access 2009
We’ll let Maple check the examples we worked by hand:
> restart: #clear old commands
> with(plots): #load plotting package of commands
> with(stats):
#load statistics package of commands
Notice that the following commands use a lot of parenthesis, so be careful when you try your own
examples.
Example 1
> XY1:=[[0,1],[2,2],[4,1]]:
> PlotXY1:=pointplot(XY1):
display(PlotXY1);
> fit[leastsquare[[X,Y]]]([[0,2,4],[1,2,1]]);
#syntax: [X,Y] says we are calling the coords "X" and "Y".
#what follws is the list of "X" values, and then the
#corresponding "Y" values....
#
so the points are [0,1],[2,2],[4,1]
> Line1:=plot(4/3, X=-1..5,Y=-1..3, color=black):
#I picked a slightly larger X-Y box than the points were in
display({Line1,PlotXY1}, title=‘Example 1 least squares fit‘);
Example 2
> XY2:=[[0,3],[1,5],[2,5]]:
> PlotXY2:=pointplot(XY2):
display(PlotXY2);
> fit[leastsquare[[X,Y]]]([[0,1,2],[3,5,5]]);
> Line2:=plot(10/3 + X, X=-1..3,Y=-1..6, color=black):
display({Line2,PlotXY2}, title=‘Example 2 least squares fit‘);
Example 3
> XY3:=[[0,3],[1,5],[2,6],[3,7]]:
> PlotXY3:=pointplot(XY3):
display(PlotXY3);
>
> fit[leastsquare[[X,Y]]]([[0,1,2,3],[3,5,6,7]]);
> Line3:=plot(33/10 + (13/10) *X, X=-1..4,Y=-1..8, color=black):
display({Line3,PlotXY3}, title=‘Example 3 least squares fit‘);
>
How do you test for power laws?
What if our points don’t lie in a straight line? What if it looks like it should be a quadratic function or
even a square root function? In other words, we want to see if there is a power m and a proportionality
constant b so that the formula
y = b xm
effectively mirrors the real data. Taking (e.g. natural) logarithms of the proposed power law yields
ln(y ) = ln(b ) + m ln(x ).
So, if we write Y = ln(y ) and X = ln(x ), B = ln(b ), this becomes the equation of a line in the new
variables X and Y:
Y = mX + B
Thus, in order for there to be a power law for the original data, the ln-ln data should (approximately)
satisfy the equation of a line, and vise verse. If we get a good line fit to the ln-ln data, then the slope m
of this line is the power relating the original data, and the exponential e B of the Y-intercept is the
proportionality constant b in the original relation y = b x m. With real data it is not too hard to see if the
ln-ln data is well approximated by a line, in which case the original data is well-approximated by a
power law.
Astronomical example: As you may know, Isaac Newton was motivated by Kepler’s (observed) Laws
of planetary motion to discover the notions of velocity and acceleration, i.e. differential calculus, along
with the inverse square law of planetary acceleration around the sun.....from which he deduced the
concepts of mass and force, and that the universal inverse square law for gravitatonal attraction was the
ONLY force law depending only on distance between objects, which was consistent with Kepler’s
observations! Kepler’s three observations were that
(1) Planets orbit the sun in ellipses, with the sun at one of the ellipse foci.
(2) A planet sweeps out equal angles from the sun, in equal time intervals, independently of where it is
in its orbit.
(3) The square of the period of a planetary orbit is directly proportional to the cube of the orbit’s
semi-major axis.
So, for roughly circular orbits, Keplers third law translates to the statement that the period is
proportional to R 1.5, where R is the approximate radius. Let’s see if that’s consistent with the following
data:
Planet
Mercury
Earth
Jupiter
Uranus
Pluto
mean distance R from sun
(in astronomical units where 1=dist to earth)
0.387
1.
5.20
19.18
39.53
Orbital period T
(in earth years)
0.241
1.
11.86
84.0
248.5
> restart:with(linalg):with(plots):with(stats):
> Rs:=[0.387,1.,5.20,19.18,39.53];
Ts:=[.241,1.,11.86,84.0,248.5];
pts:=[seq([Rs[i],Ts[i]],i=1..5)];
> data:=pointplot(pts):
display(data);
> lnRs:=map(ln,Rs); #get ln-ln data
lnTs:=map(ln,Ts);
lnpts:=[seq([lnRs[i],lnTs[i]],i=1..5)];
> lnlndata:=pointplot(lnpts):
display(lnlndata);
Notice the ln-ln plot really does seem close to a line! So maybe there is a power law for the original
data:
> fit[leastsquare[[lnR,lnT]]]([ lnRs,lnTs]);#this is the bizarre
syntax
We can paste in the equation of the line and see how well we did.
> line:=plot(.4868929851e-3+1.499816412*lnR, lnR=-2..6,lnT=-2..6,
color=black):
display({lnlndata,line}, title=‘least squares fit‘);
Finally, we can go back from the least squares line fit to a power law
> m:=1.499816412; #power
b:=exp(0.0004868929851); #proportionality constant
Notice in the command lines below how to
(i) make a title
(ii) get the axes labeled "g" and "c" for number of genes and number of cells
(iii) get the display to include appropriate ranges - to contain all our data
(iv) first create a picture of the power function, and then display its graph along with the original
gene/cell data points.
> powerplot:=plot(b*R^m,R=0..50,T=0..300,color=black):
> display({powerplot,data},title=
‘power law for planetary period as a function of radius‘);
#by calling the variables R and T, and giving their ranges, I
#get Maple to label the axes as I want
>
>
Download