The method of least squares for linear regression, with application to power law fits Access 2009 We’ll let Maple check the examples we worked by hand: > restart: #clear old commands > with(plots): #load plotting package of commands > with(stats): #load statistics package of commands Notice that the following commands use a lot of parenthesis, so be careful when you try your own examples. Example 1 > XY1:=[[0,1],[2,2],[4,1]]: > PlotXY1:=pointplot(XY1): display(PlotXY1); > fit[leastsquare[[X,Y]]]([[0,2,4],[1,2,1]]); #syntax: [X,Y] says we are calling the coords "X" and "Y". #what follws is the list of "X" values, and then the #corresponding "Y" values.... # so the points are [0,1],[2,2],[4,1] > Line1:=plot(4/3, X=-1..5,Y=-1..3, color=black): #I picked a slightly larger X-Y box than the points were in display({Line1,PlotXY1}, title=‘Example 1 least squares fit‘); Example 2 > XY2:=[[0,3],[1,5],[2,5]]: > PlotXY2:=pointplot(XY2): display(PlotXY2); > fit[leastsquare[[X,Y]]]([[0,1,2],[3,5,5]]); > Line2:=plot(10/3 + X, X=-1..3,Y=-1..6, color=black): display({Line2,PlotXY2}, title=‘Example 2 least squares fit‘); Example 3 > XY3:=[[0,3],[1,5],[2,6],[3,7]]: > PlotXY3:=pointplot(XY3): display(PlotXY3); > > fit[leastsquare[[X,Y]]]([[0,1,2,3],[3,5,6,7]]); > Line3:=plot(33/10 + (13/10) *X, X=-1..4,Y=-1..8, color=black): display({Line3,PlotXY3}, title=‘Example 3 least squares fit‘); > How do you test for power laws? What if our points don’t lie in a straight line? What if it looks like it should be a quadratic function or even a square root function? In other words, we want to see if there is a power m and a proportionality constant b so that the formula y = b xm effectively mirrors the real data. Taking (e.g. natural) logarithms of the proposed power law yields ln(y ) = ln(b ) + m ln(x ). So, if we write Y = ln(y ) and X = ln(x ), B = ln(b ), this becomes the equation of a line in the new variables X and Y: Y = mX + B Thus, in order for there to be a power law for the original data, the ln-ln data should (approximately) satisfy the equation of a line, and vise verse. If we get a good line fit to the ln-ln data, then the slope m of this line is the power relating the original data, and the exponential e B of the Y-intercept is the proportionality constant b in the original relation y = b x m. With real data it is not too hard to see if the ln-ln data is well approximated by a line, in which case the original data is well-approximated by a power law. Astronomical example: As you may know, Isaac Newton was motivated by Kepler’s (observed) Laws of planetary motion to discover the notions of velocity and acceleration, i.e. differential calculus, along with the inverse square law of planetary acceleration around the sun.....from which he deduced the concepts of mass and force, and that the universal inverse square law for gravitatonal attraction was the ONLY force law depending only on distance between objects, which was consistent with Kepler’s observations! Kepler’s three observations were that (1) Planets orbit the sun in ellipses, with the sun at one of the ellipse foci. (2) A planet sweeps out equal angles from the sun, in equal time intervals, independently of where it is in its orbit. (3) The square of the period of a planetary orbit is directly proportional to the cube of the orbit’s semi-major axis. So, for roughly circular orbits, Keplers third law translates to the statement that the period is proportional to R 1.5, where R is the approximate radius. Let’s see if that’s consistent with the following data: Planet Mercury Earth Jupiter Uranus Pluto mean distance R from sun (in astronomical units where 1=dist to earth) 0.387 1. 5.20 19.18 39.53 Orbital period T (in earth years) 0.241 1. 11.86 84.0 248.5 > restart:with(linalg):with(plots):with(stats): > Rs:=[0.387,1.,5.20,19.18,39.53]; Ts:=[.241,1.,11.86,84.0,248.5]; pts:=[seq([Rs[i],Ts[i]],i=1..5)]; > data:=pointplot(pts): display(data); > lnRs:=map(ln,Rs); #get ln-ln data lnTs:=map(ln,Ts); lnpts:=[seq([lnRs[i],lnTs[i]],i=1..5)]; > lnlndata:=pointplot(lnpts): display(lnlndata); Notice the ln-ln plot really does seem close to a line! So maybe there is a power law for the original data: > fit[leastsquare[[lnR,lnT]]]([ lnRs,lnTs]);#this is the bizarre syntax We can paste in the equation of the line and see how well we did. > line:=plot(.4868929851e-3+1.499816412*lnR, lnR=-2..6,lnT=-2..6, color=black): display({lnlndata,line}, title=‘least squares fit‘); Finally, we can go back from the least squares line fit to a power law > m:=1.499816412; #power b:=exp(0.0004868929851); #proportionality constant Notice in the command lines below how to (i) make a title (ii) get the axes labeled "g" and "c" for number of genes and number of cells (iii) get the display to include appropriate ranges - to contain all our data (iv) first create a picture of the power function, and then display its graph along with the original gene/cell data points. > powerplot:=plot(b*R^m,R=0..50,T=0..300,color=black): > display({powerplot,data},title= ‘power law for planetary period as a function of radius‘); #by calling the variables R and T, and giving their ranges, I #get Maple to label the axes as I want > >