Types of Linear Algebraic Equations

Linear Algebraic Equations
Systems of linear algebraic equations arise in all walks of life. They are the most basic
type of system of equations, and they are taught to everyone as far back as the 8th grade.
Yet the complete story about linear algebraic equations is usually not taught at all.
What happens when there are more equations than unknowns, or fewer equations than
unknowns? These are precisely the questions that are answered below.
Before we proceed, there is some background material that we'll need
to learn. We'll first need to discuss ways to minimize a function of several variables.
Then we'll need to understand how to do this using matrix-vector notation. After this is
done, we'll be able to look at linear algebraic equations.
How to Minimize a Function of Several Variables
The best way to introduce this topic is with an example. Let's minimize the
function

(1)    $J = x^2 + y^2$

where x and y are both constrained to lie on the line

(2)    $y = ax + b$.
This problem amounts to finding the point on the line that is closer to the origin
than any other point on the line. There are several ways to solve this problem. Let’s look
at the following three ways.
The first way to solve this problem is to use geometry. Draw the perpendicular to
the line that passes through the origin. The equation of this perpendicular line is $y = -x/a$.
Substituting the equation for the perpendicular line into the equation for the line yields
the intercepts

(3)    $x_0 = -\dfrac{ab}{1 + a^2}, \qquad y_0 = \dfrac{b}{1 + a^2}.$
The second way to solve this problem is to recognize that it is a
constrained optimization problem: a problem in which a function is minimized while
being subjected to a constraint. The constrained minimization problem is converted into
an unconstrained minimization problem. This is done using a substitution step. The
constraint, Eq. (2), is substituted into the minimizing function, Eq. (1), which yields

(4)    $J = x^2 + (ax + b)^2.$
The minimum of J is now found by taking the derivative of J with respect to x and setting
it to zero. This yields
(5)    $0 = 2x + 2(ax + b)a,$
which again leads to the answer given in Eq. (3).
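As a quick numerical check of the substitution method, the minimal Python sketch below (assuming NumPy, with the illustrative values a = 2 and b = 1 chosen arbitrarily) evaluates Eq. (4) on a fine grid of x values and confirms that the grid minimizer agrees with Eq. (3).

```python
import numpy as np

# Illustrative values for the line y = a*x + b (chosen arbitrarily).
a, b = 2.0, 1.0

# Analytic answer from Eq. (3): the point on the line closest to the origin.
x0 = -a * b / (1 + a**2)
y0 = b / (1 + a**2)

# Substitution method, Eq. (4): J(x) = x^2 + (a*x + b)^2.
# Brute-force check: evaluate J on a fine grid and locate its minimizer.
x = np.linspace(-5.0, 5.0, 200001)
J = x**2 + (a * x + b)**2
x_min = x[np.argmin(J)]

print(x0, y0)    # -0.4 0.2
print(x_min)     # approximately -0.4, matching Eq. (3)
```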
The third way of solving this problem is also done by converting the constrained
minimization problem into an unconstrained minimization problem. However, this time,
no substitution step will be needed in creating the unconstrained minimization problem.
This third method, which is the method that we’ll later employ when we look at solving
linear algebraic equations, proceeds by first writing the constraint as
(6)    $f = y - ax - b = 0.$
We then define the augmented minimizing function
(7)    $J_a = J + \lambda f = x^2 + y^2 + \lambda (y - ax - b),$

where $\lambda$ is a new variable. The minimum of $J_a$ subject to no constraints is the same as the
minimum of J subject to the constraint, Eq. (2). (This will be shown to be true in a
moment.) In other words, we have replaced the constrained minimization problem (Eqs.
(1) and (2)) with an unconstrained minimization problem (Eq. (7)). Although Ja and J are
different functions, their values are the same at the minimum, implying from Eq. (7) that
the constraint is satisfied at the minimum (f = 0). Let’s now show that indeed J = Ja at the
minimum. To do this, first notice that $J_a$ is a function of three variables: x, y, and $\lambda$. Thus,
the minimum of $J_a$ satisfies the three conditions
(8a-c)
$$0 = \frac{\partial J_a}{\partial x} = \frac{\partial J}{\partial x} + \lambda \frac{\partial f}{\partial x}, \qquad
0 = \frac{\partial J_a}{\partial y} = \frac{\partial J}{\partial y} + \lambda \frac{\partial f}{\partial y}, \qquad
0 = \frac{\partial J_a}{\partial \lambda} = f.$$
We see from Eq. (8c) that the constraint is satisfied at the minimum. In order to conclude
that minimizing Ja also minimizes J, we’ll need to show that Eqs. (8a,b) imply that
$dJ/dx = 0$. Along the constraint, y is a function of x with $dy/dx = -\dfrac{\partial f/\partial x}{\partial f/\partial y}$, so from Eqs. (8a,b) we get

$$\frac{dJ}{dx} = \left( \frac{\partial J}{\partial x}\,dx + \frac{\partial J}{\partial y}\,dy \right)\frac{1}{dx}
= \frac{\partial J}{\partial x} + \frac{\partial J}{\partial y}\frac{dy}{dx}
= \frac{\partial J}{\partial x} - \frac{\partial J}{\partial y}\frac{\partial f/\partial x}{\partial f/\partial y}
= -\lambda\frac{\partial f}{\partial x} + \lambda\frac{\partial f}{\partial y}\frac{\partial f/\partial x}{\partial f/\partial y} = 0.$$
Hence, the minima of Ja and J are the same. From Eqs. (7) and (8), the specific
minimizing conditions are
(9a-c)
$$0 = 2x - \lambda a, \qquad 0 = 2y + \lambda, \qquad 0 = y - ax - b.$$
The solution yields Eq. (3), as expected.
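Because Eqs. (9a-c) are linear in x, y, and $\lambda$, they can also be solved numerically as a 3 x 3 linear system. The sketch below (again assuming NumPy and the arbitrary illustrative values a = 2, b = 1) does this and recovers the answer in Eq. (3).

```python
import numpy as np

# Illustrative values for the line y = a*x + b (chosen arbitrarily).
a, b = 2.0, 1.0

# Eqs. (9a-c) written as a linear system in the unknowns (x, y, lambda):
#    2x       - a*lam = 0
#         2y  +   lam = 0
#   -a*x + y          = b
M = np.array([[2.0, 0.0, -a],
              [0.0, 2.0, 1.0],
              [-a,  1.0, 0.0]])
rhs = np.array([0.0, 0.0, b])

x, y, lam = np.linalg.solve(M, rhs)
print(x, y)                                  # approximately -0.4 and 0.2
print(-a * b / (1 + a**2), b / (1 + a**2))   # Eq. (3): the same values
```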
Matrix-Vector Notation
The functions that we’ll be minimizing shortly will be expressed using a compact
matrix-vector notation. We’ll see functions like
(10a-c)    $J_1 = x^T x, \quad J_2 = x^T y, \quad J_3 = x^T A x,$

where

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{n-1} \\ x_n \end{bmatrix}, \qquad
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{n-1} \\ y_n \end{bmatrix}, \qquad
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1(n-1)} & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2(n-1)} & a_{2n} \\
\vdots & \vdots & & \vdots & \vdots \\
a_{(n-1)1} & a_{(n-1)2} & \cdots & a_{(n-1)(n-1)} & a_{(n-1)n} \\
a_{n1} & a_{n2} & \cdots & a_{n(n-1)} & a_{nn}
\end{bmatrix}.$$
The superscript T denotes the transpose, which interchanges rows and columns. Notice that transposing a vector or a matrix
twice gives back the original vector or matrix. Also, notice that the transpose of a
1 x 1 matrix (i.e., a scalar) is the original scalar. The derivatives of the
functions J with respect to the coordinates $x_1, x_2, \ldots, x_{n-1}, x_n$ will be placed in a vector
as follows:
(11)
$$\frac{\partial J}{\partial x} = \begin{bmatrix} \dfrac{\partial J}{\partial x_1} \\ \dfrac{\partial J}{\partial x_2} \\ \vdots \\ \dfrac{\partial J}{\partial x_{n-1}} \\ \dfrac{\partial J}{\partial x_n} \end{bmatrix}.$$
Thus, you can verify quite easily that the derivatives of the functions J given in Eqs. (10)
with respect to x are
(11a-c)
$$\frac{\partial J_1}{\partial x} = 2x, \qquad \frac{\partial J_2}{\partial x} = y, \qquad \frac{\partial J_3}{\partial x} = (A + A^T)x.$$
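A minimal way to verify the gradient formulas in Eqs. (11a-c) is to compare them against central finite differences for random data. The sketch below assumes NumPy; the dimension n = 4, the random seed, and the step size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4                                  # arbitrary dimension
x = rng.standard_normal(n)
y = rng.standard_normal(n)
A = rng.standard_normal((n, n))
eps = 1e-6                             # finite-difference step

def grad_fd(J, x):
    """Central-difference approximation of the gradient vector of Eq. (11)."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (J(x + e) - J(x - e)) / (2 * eps)
    return g

print(np.allclose(grad_fd(lambda x: x @ x, x), 2 * x))              # Eq. (11a): True
print(np.allclose(grad_fd(lambda x: x @ y, x), y))                  # Eq. (11b): True
print(np.allclose(grad_fd(lambda x: x @ A @ x, x), (A + A.T) @ x))  # Eq. (11c): True
```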
Finally, it will be useful to recognize the following two properties of the inverse of a
square matrix and the transpose of a matrix: First, for any product of matrices, we have
(12)    $(A_1 A_2 \cdots A_m)^T = A_m^T A_{m-1}^T \cdots A_1^T.$
Notice that the order of the multiplication of the matrices reverses itself. Secondly, we
have
(13)    $(A_1 A_2 \cdots A_m)^{-1} = A_m^{-1} A_{m-1}^{-1} \cdots A_1^{-1}.$
Notice again that the order of the multiplication of the matrices reverses itself.
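Both reversal properties, Eqs. (12) and (13), are easy to spot-check numerically with random square matrices, as in the short sketch below (random 3 x 3 matrices are invertible with probability one; this is an illustration, not a proof).

```python
import numpy as np

rng = np.random.default_rng(1)
A1, A2, A3 = (rng.standard_normal((3, 3)) for _ in range(3))

# Eq. (12): the transpose of a product reverses the order of the factors.
print(np.allclose((A1 @ A2 @ A3).T, A3.T @ A2.T @ A1.T))             # True

# Eq. (13): the inverse of a product of square matrices also reverses the order.
inv = np.linalg.inv
print(np.allclose(inv(A1 @ A2 @ A3), inv(A3) @ inv(A2) @ inv(A1)))   # True
```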
Types of Linear Algebraic Equations
We are now ready to look at systems of linear algebraic equations. A system of
linear algebraic equations can be written as
(14)
$$\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1(n-1)} x_{n-1} + a_{1n} x_n &= b_1, \\
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2(n-1)} x_{n-1} + a_{2n} x_n &= b_2, \\
&\;\;\vdots \\
a_{(m-1)1} x_1 + a_{(m-1)2} x_2 + \cdots + a_{(m-1)(n-1)} x_{n-1} + a_{(m-1)n} x_n &= b_{m-1}, \\
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{m(n-1)} x_{n-1} + a_{mn} x_n &= b_m.
\end{aligned}$$
Notice that there are m equations and n unknowns. The number of equations and the
number of unknowns can be different from one another. When there are fewer equations
than unknowns, the system of equations is referred to as under-determined or under-
constrained. In this case there are infinitely many possible solutions to Eq. (14). When
the number of equations is equal to the number of unknowns, the system is referred to as
uniquely determined or uniquely constrained, referring to the fact that the solution is
unique. When there are more equations than unknowns, the system is referred to as
over-determined or over-constrained. In this case, there is no exact solution to the problem,
although approximate solutions are possible.
Using a matrix-vector notation, Eq. (14) is written as
(15)    $Ax = b$
where A is an m x n matrix, x is an n x 1 vector, and b is an m x 1 vector. Throughout the
remainder of this write-up, we shall assume that the equations are not linear combinations
of each other. Mathematically, this implies that the rank of the matrix is the smaller of n
and m, written rank(A) = min(n, m). This is called the full rank condition. By assuming
that the rank of the matrix is full, it follows that the inverse of the n x n matrix $A^T A$
exists when m ≥ n, and that the inverse of the m x m matrix $A A^T$ exists when n ≥ m. These
are results that we'll need later.
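As a small NumPy illustration of the full rank condition (the shapes below are arbitrary examples): for a full rank matrix with m > n, the n x n matrix $A^T A$ is invertible, and for a full rank matrix with m < n, the m x m matrix $A A^T$ is invertible.

```python
import numpy as np

rng = np.random.default_rng(2)

# Over-determined shape (m > n): rank(A) = n, so the n x n matrix A^T A is invertible.
A = rng.standard_normal((5, 3))          # m = 5, n = 3
print(np.linalg.matrix_rank(A))          # 3 = min(m, n): the full rank condition
print(np.linalg.matrix_rank(A.T @ A))    # 3: the 3 x 3 matrix A^T A is invertible

# Under-determined shape (m < n): rank(B) = m, so the m x m matrix B B^T is invertible.
B = rng.standard_normal((2, 6))          # m = 2, n = 6
print(np.linalg.matrix_rank(B))          # 2 = min(m, n)
print(np.linalg.matrix_rank(B @ B.T))    # 2: the 2 x 2 matrix B B^T is invertible
```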
The three cases and their solutions are summarized below.

m < n: under-determined (under-constrained), minimum norm solution
m = n: uniquely determined (uniquely constrained), exact solution
m > n: over-determined (over-constrained), least squares solution
Under-determined Systems: Minimum Norm Solutions
When there are fewer equations than unknowns, as stated above, there are
infinitely many solutions. One of the solutions that is frequently desirable is the one that
has the smallest norm (size). The squared norm is defined as
(16)    $J = x_1^2 + x_2^2 + \cdots + x_n^2 = x^T x.$
The minimum norm problem is to minimize Eq. (16) subject to the linear algebraic
equation constraints
(17)    $f = Ax - b = 0.$
Let’s now convert this constrained minimization problem into an unconstrained
minimization problem. We replace J with the augmented function
(18)    $J_a = J + (\lambda_1 f_1 + \lambda_2 f_2 + \cdots + \lambda_{m-1} f_{m-1} + \lambda_m f_m) = J + \lambda^T f.$
The augmented function is a function of x and $\lambda$, which together contain n + m unknowns. The
minimum of $J_a$ satisfies the n + m equations
(19a,b)
$$0 = \frac{\partial J_a}{\partial x} = 2x + A^T \lambda, \qquad 0 = \frac{\partial J_a}{\partial \lambda} = Ax - b.$$
Solving Eq. (19a) for x, substituting the result into Eq. (19b), and pre-multiplying by
$(AA^T)^{-1}$ yields $\lambda = -2(AA^T)^{-1} b$. Substituting this back into
Eq. (19a) yields the minimum norm solution to under-determined linear algebraic
equations
(20a,b)    $x = A^{**} b, \qquad A^{**} = A^T (AA^T)^{-1},$

in which $A^{**}$ is called the pseudo-inverse of A for under-determined systems.
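A brief sketch of Eq. (20) for a random under-determined system is given below (the sizes m = 3, n = 5 and the use of NumPy are illustrative assumptions). It checks that the minimum norm solution satisfies Ax = b exactly and that it matches the answer returned by numpy.linalg.lstsq, which also gives the minimum norm solution when m < n.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 5                               # fewer equations than unknowns
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Eq. (20): minimum norm solution x = A^T (A A^T)^{-1} b.
x_mn = A.T @ np.linalg.solve(A @ A.T, b)

print(np.allclose(A @ x_mn, b))           # True: the equations are satisfied exactly
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x_mn, x_ref))           # True: same as NumPy's minimum norm answer
```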
Uniquely determined Systems: Exact Solutions
When the number of equations is equal to the number of unknowns, there is a single,
unique solution to the problem. Simply pre-multiply Eq. (15) by $A^{-1}$ to get the exact
solution to uniquely determined linear algebraic equations

(21)    $x = A^{-1} b.$
Over-determined Systems: Least Squares Solutions
When the number of equations is greater than the number of unknowns, there is
no exact solution. Any and all solutions will be approximate. The least squares solution is
the best approximate solution in the sense that it minimizes the squared norm (size) of the
error
(22)    $J = e^T e = (Ax - b)^T (Ax - b).$
This is an unconstrained minimization problem. We simply need to minimize J with
respect to the n unknown values of x. Thus,
(23)
$$0 = \frac{\partial J}{\partial x} = \frac{\partial}{\partial x}\left( x^T A^T A x - 2 b^T A x + b^T b \right) = 2(A^T A) x - 2 A^T b.$$
Pre-multiplying this by $\tfrac{1}{2}(A^T A)^{-1}$ yields the least squares solution to over-determined
linear algebraic equations

(24a,b)    $x = A^{**} b, \qquad A^{**} = (A^T A)^{-1} A^T,$

in which $A^{**}$ is called the pseudo-inverse of A for over-determined systems.
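Finally, a short sketch of the least squares solution, Eq. (24), for a random over-determined system (the sizes m = 6, n = 3 are arbitrary, and NumPy is assumed). It checks the formula against numpy.linalg.lstsq and confirms that perturbing the solution only increases the residual.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 6, 3                               # more equations than unknowns
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Eq. (24): least squares solution x = (A^T A)^{-1} A^T b.
x_ls = np.linalg.solve(A.T @ A, A.T @ b)

x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x_ls, x_ref))                    # True

# The least squares solution has the smallest possible residual norm.
r_ls = np.linalg.norm(A @ x_ls - b)
r_other = np.linalg.norm(A @ (x_ls + 0.1) - b)
print(r_ls < r_other)                              # True
```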