Math 1106 – Elementary Applied Calculus
Textbook: Bittinger’s Calculus and Its Applications
Post-Lecture Notes for Chapter 7, “Functions of Several Variables”
After each lecture, some summary and supplemental thoughts about the topics discussed in the lecture will be added to this document. However, not everything said in the lecture will be listed here, and students are expected to attend each lecture to obtain all pertinent information.
In Math Modeling, we usually start with a set of points that represent a function of one variable. Perhaps a good one to use is the first one encountered in the KSU Math 1101 Mathematical Modeling course, which tracks the increase in concentration of CO₂ in Earth's atmosphere over a period of years:
Year                                       1965    1970    1980    1990    1995
CO₂ concentration (in parts per million)   319.9   325.5   338.5   354.0   360.7
The student is taught to enter that data into the TI-83 calculator's two lists named L₁ and L₂, remembering to enter not the actual year but rather the number of years it's been since 1965. After doing so, the calculator's list editor displays the data like this:
Then a scatter plot of the data is displayed …
… and the student is asked to come up with a linear equation that closely matches the trend displayed by the data. Pressing the right buttons, the student obtains a linear regression equation from the mysterious calculator. In this case, the mathematical model for CO₂ concentration over time is

$$C(t) = 1.38\,t + 319.02$$

parts per million, t years since 1965.
And superimposing the graph of this function on top of the scatter-plot gives this picture of a line that pretty well matches the trend of the data:
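For anyone who'd like to reproduce the calculator's result away from the TI-83, here is a minimal sketch in Python (assuming the NumPy library is available; nothing in the lecture itself requires it):

```python
# Reproducing the TI-83's linear regression off-calculator.
# The data mirrors the L1 and L2 lists described above.
import numpy as np

x = np.array([0, 5, 15, 25, 30])                    # years since 1965 (L1)
y = np.array([319.9, 325.5, 338.5, 354.0, 360.7])   # CO2 in ppm (L2)

m, b = np.polyfit(x, y, deg=1)          # degree-1 (linear) least-squares fit
print(f"C(t) = {m:.2f} t + {b:.2f}")    # prints: C(t) = 1.38 t + 319.02
```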
But before the students were taught how to push the buttons to obtain that linear regression equation, they were led to investigate various straight lines that might be used as a model for the data. Perhaps they'd choose two of the points at random and work out the exact equation for the straight line that runs through those two points (using the point-slope formula). Those lines would have slightly different slopes and/or y-intercepts from each other and from the linear regression line. And for each such model, the students would be asked to obtain a “score” for the equation, one that could be compared against the scores for lines worked out by other student groups, with a view towards choosing the one with the lowest score (as in golf, the lower the score, the better the fit).
Invariably, the linear regression line would earn the lowest score.
The “sum of the squares of the distances” measure is the one generally agreed upon as a fair way of judging a line's fit against the given data. As it turns out, the linear regression line is tailored to be the one with the lowest such score!
In other words, if we have a set of n points (i.e., a set of ordered x-y pairs), as we do in this case, then represent each one as (xᵢ, yᵢ), 1 ≤ i ≤ n.
Represent a candidate straight line as f(x) = mx + b. Then the score for any such line, as to how well it represents the given points, is a function of the line's two characteristic numbers: its slope (m) and its y-intercept (b). Use that function, with the x-values of the data, to determine the corresponding y-values. Thus, the score of any such line's “fitness” for the n points of data becomes the output of the scoring function that has two input variables:
$$\mathrm{SSD}(m, b) = \sum_{i=1}^{n} \left( f(x_i) - y_i \right)^2.$$
This two-input-variable function (the name SSD conjures up the words “Sum of the Squares of the Differences”) is the sum of the squares of the n differences between the height of a data point and the height of the line at that point's x-value. Another way to write this function's equation is by substituting mxᵢ + b where f(xᵢ) appears:
$$\mathrm{SSD}(m, b) = \sum_{i=1}^{n} \left( m x_i + b - y_i \right)^2.$$
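To make the scoring idea concrete, here is a sketch of SSD as an actual function (Python with NumPy again assumed, with the same data as before):

```python
# The scoring function SSD(m, b): the sum of the squared differences
# between the candidate line m*x + b and the data points.
import numpy as np

x = np.array([0, 5, 15, 25, 30])
y = np.array([319.9, 325.5, 338.5, 354.0, 360.7])

def ssd(m, b):
    return float(np.sum((m * x + b - y) ** 2))

# The regression line scores better (lower) than, say, the line
# through the first and last data points:
print(ssd(1.38, 319.02))              # about 2.75
m2 = (360.7 - 319.9) / (30 - 0)       # slope through first & last points
print(ssd(m2, 319.9))                 # about 4.69, a worse score
```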
This function is the scoring function. We want to know when (that is, for what value of m and what value of b ) it gives the lowest score. And that’s
what derivatives can help us to find, if we are able to discover where the two derivatives (remember, there are two partial derivatives since this is a function of two input variables) are simultaneously equal to zero.
These are the two partial derivatives for the function SSD (one of them with respect to the variable m and the other with respect to the variable b):

$$\mathrm{SSD}_m = \sum_{i=1}^{n} 2 \left( m x_i + b - y_i \right) \cdot x_i$$

$$\mathrm{SSD}_b = \sum_{i=1}^{n} 2 \left( m x_i + b - y_i \right) \cdot 1.$$
Don't forget: in the first result, m is the variable and everything else is a constant. So the derivative of mxᵢ is simply xᵢ; and don't forget that the chain rule mandates that after using the power rule on the parenthesized squared expression, there's a final factor (multiplier) that is the derivative (with respect to m) of that inside expression. And in the second result, with b being the variable and everything else regarded as a constant, the chain rule's final factor is the derivative of b, which is simply 1.
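One way to convince yourself that the chain-rule bookkeeping is right is to compare the two formulas against a numerical estimate of each partial derivative (a Python sketch under the same assumptions as before; the test point is arbitrary):

```python
# Sanity-checking SSD_m and SSD_b against centered finite differences.
import numpy as np

x = np.array([0, 5, 15, 25, 30])
y = np.array([319.9, 325.5, 338.5, 354.0, 360.7])

def ssd(m, b):
    return np.sum((m * x + b - y) ** 2)

def ssd_m(m, b):                  # partial derivative with respect to m
    return np.sum(2 * (m * x + b - y) * x)

def ssd_b(m, b):                  # partial derivative with respect to b
    return np.sum(2 * (m * x + b - y) * 1)

h = 1e-6                          # small step for the numerical estimate
m0, b0 = 1.5, 300.0               # an arbitrary test point
print(ssd_m(m0, b0), (ssd(m0 + h, b0) - ssd(m0 - h, b0)) / (2 * h))
print(ssd_b(m0, b0), (ssd(m0, b0 + h) - ssd(m0, b0 - h)) / (2 * h))
```

The two numbers on each printed line should agree to several decimal places.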
So now we've gotten the two partial derivatives. The graph of the original function of two variables will reach its lowest point when both partial derivatives are equal to zero simultaneously. That means setting each derivative equal to 0 and solving for their common answer. A reminder: these will be two linear equations in two variables, m and b. In effect, each one can be plotted as a straight line (on a coordinate system whose axes are m and b), and we're going to look for the place where they intersect. There is a technique involving matrices that makes easy work of finding the common solution.
Here is the system of two linear equations (to simplify matters, the 2 gets dropped by dividing both sides by 2):

$$\sum_{i=1}^{n} \left( m x_i^2 + b x_i - x_i y_i \right) = 0 \quad \text{or equivalently} \quad m \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i$$

$$\sum_{i=1}^{n} \left( m x_i + b - y_i \right) = 0 \quad \text{or equivalently} \quad m \sum_{i=1}^{n} x_i + n b = \sum_{i=1}^{n} y_i$$

(In the second equation, adding b to itself n times is what produces the nb term.)
The latter pair can be written in matrix notation as AX = B, with the three matrices as spelled out below:

$$\begin{pmatrix} \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & n \end{pmatrix} \begin{pmatrix} m \\ b \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n} x_i y_i \\ \sum_{i=1}^{n} y_i \end{pmatrix}.$$
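As a sketch of the matrix technique in action (NumPy assumed; np.linalg.solve finds X without explicitly forming the inverse):

```python
# Building A and B from the CO2 data and solving A X = B for X = (m, b).
import numpy as np

x = np.array([0, 5, 15, 25, 30])
y = np.array([319.9, 325.5, 338.5, 354.0, 360.7])
n = len(x)

A = np.array([[np.sum(x**2), np.sum(x)],
              [np.sum(x),    n        ]])
B = np.array([np.sum(x * y), np.sum(y)])

m, b = np.linalg.solve(A, B)     # solves the 2-by-2 linear system
print(m, b)                      # prints roughly 1.38 and 319.02
```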
Right now would be a good time, for a student who is turned on by this investigation, to ask: “I wonder what the matrices A and B would look like if the data were in the shape of a quadratic curve (a parabola) instead?”
Well, first off, the equation for a second-degree polynomial (a quadratic function) involves three coefficients rather than just 2, so the SSD function turns out to have three input variables, and so there will be three partial derivatives. And so the matrix A will be a 3-by-3 matrix and the matrix B will be 3-by-1. Once you correctly work out what they look like, it may be obvious, from comparing the patterns, what the matrices will be for a cubic regression, or a quartic regression, etc.
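For the curious, here is what the same reasoning works out to for a quadratic model f(x) = ax² + bx + c; it's worth deriving it yourself from the three partial derivatives before peeking:

$$\begin{pmatrix} \sum_{i=1}^{n} x_i^4 & \sum_{i=1}^{n} x_i^3 & \sum_{i=1}^{n} x_i^2 \\ \sum_{i=1}^{n} x_i^3 & \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i & n \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n} x_i^2 y_i \\ \sum_{i=1}^{n} x_i y_i \\ \sum_{i=1}^{n} y_i \end{pmatrix}$$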
Back to the 2-by-2 situation. The goal is to prove that there is one and only one pair of numbers for m and b that nails down the best-fit linear regression line for the set of data. If you recall how matrix multiplication and matrix inverses work, that means:

$$\begin{pmatrix} m \\ b \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & n \end{pmatrix}^{-1} \begin{pmatrix} \sum_{i=1}^{n} x_i y_i \\ \sum_{i=1}^{n} y_i \end{pmatrix}.$$
From that, it’s just messy arithmetic to develop what the precise values of m and b are.
If you have a really good memory for how the inverse of a 2-by-2 matrix is developed, this becomes:

$$\begin{pmatrix} m \\ b \end{pmatrix} = \frac{1}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^{2}} \begin{pmatrix} n & -\sum_{i=1}^{n} x_i \\ -\sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{pmatrix} \begin{pmatrix} \sum_{i=1}^{n} x_i y_i \\ \sum_{i=1}^{n} y_i \end{pmatrix} = \frac{1}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^{2}} \begin{pmatrix} n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i \end{pmatrix}$$
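Dividing each component by the determinant out front then yields the precise values:

$$m = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^{2}} \qquad \text{and} \qquad b = \frac{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^{2}}.$$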
This all works very nicely in theory, and I have actually written a small program for my TI-83 that reads the data from L₁ and L₂ and prompts me for what type of regression to do (linear, quadratic, cubic, etc.), and it uses the matrix formula A⁻¹B to tell me the values for the coefficients of the regression equation. But theory runs aground on the processing limits of the TI-83 calculator, and the program goes belly-up for higher-degree polynomials. For that reason, it's best to work out the equation for each coefficient (as I've done above), because the TI-83 is a lot happier working with those than with trying to get the inverse of the matrix A. So I expect that's what the TI-83 actually does when it creates the various types of polynomial regression equations (linear, quadratic, cubic, quartic, and quintic).
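This is not my TI-83 program itself, but here is a Python sketch of the same idea for any degree d, building the normal-equation matrices directly from the sums (NumPy assumed; the function name is my own invention):

```python
# Least-squares polynomial regression of degree d via the normal equations.
import numpy as np

def poly_regression(x, y, d):
    """Return the fitted coefficients, highest power first."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    # A[j][k] = sum of x_i^(2d - j - k);  B[j] = sum of x_i^(d - j) * y_i
    A = np.array([[np.sum(x ** (2 * d - j - k)) for k in range(d + 1)]
                  for j in range(d + 1)])
    B = np.array([np.sum(x ** (d - j) * y) for j in range(d + 1)])
    return np.linalg.solve(A, B)   # more stable than forming A's inverse

print(poly_regression([0, 5, 15, 25, 30],
                      [319.9, 325.5, 338.5, 354.0, 360.7], 1))
# prints roughly [1.38, 319.02] -- the linear case once more
```

(For high degrees the normal equations themselves become numerically touchy, which is consistent with the calculator's struggles.)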
Before showing the standard formulas for m and b in the linear regression case, it helps to remember that some of those summation expressions seen above (combined with a bit of algebra) turn into nothing more than “the average of all the x-values” and “the average of all the y-values” for the data points. So in the formulas below, you should know that $\bar{X}$ and $\bar{Y}$ stand for those two averages, respectively, and that $\overline{XY}$ represents the average of all the individual xᵢyᵢ products.

$$m = \frac{\overline{XY} - \bar{X} \cdot \bar{Y}}{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \left( \bar{X} \right)^{2}} \qquad \text{and} \qquad b = \bar{Y} - m\,\bar{X}$$
Let's see. For the CO₂ data listed above, $\bar{X} = 15$, $\bar{Y} = 339.72$, $\overline{XY} = 5275.2$, and $\sum_{i=1}^{5} x_i^2 = 1775$ (whose average is 355). And so plugging all of that into the formulas (and multiplying the numerator and denominator by n = 5):

$$m = \frac{5\,(5275.2 - 15 \cdot 339.72)}{1775 - 5\,(15)^2} = \frac{5\,(5275.2 - 5095.8)}{1775 - 1125} = \frac{5 \cdot 179.4}{650} = \frac{897}{650} = 1.38$$

and

$$b = 339.72 - 1.38 \cdot 15 = 339.72 - 20.7 = 319.02,$$

which is exactly what the calculator came up with as the two numbers in the equation of its linear regression line. Hurray!
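And, checking that arithmetic one more time (plain Python this time, no libraries assumed):

```python
# Verifying the averages and the plug-in arithmetic above.
x = [0, 5, 15, 25, 30]
y = [319.9, 325.5, 338.5, 354.0, 360.7]
n = len(x)

X_bar = sum(x) / n                                   # 15.0
Y_bar = sum(y) / n                                   # 339.72
XY_bar = sum(xi * yi for xi, yi in zip(x, y)) / n    # 5275.2

m = (XY_bar - X_bar * Y_bar) / (sum(xi**2 for xi in x) / n - X_bar**2)
b = Y_bar - m * X_bar
print(m, b)   # prints roughly 1.38 and 319.02
```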
No, you don't need to know all this for the final examination! Just, as I said before, you should be able to get the two partial derivatives as shown in the formulas above.
But I still enjoyed writing it!