Linear Algebraic Equations

Systems of linear algebraic equations arise in all walks of life. They are the most basic type of system of equations, and they are taught to everyone as early as the eighth grade. Yet the complete story about linear algebraic equations is usually not taught at all. What happens when there are more equations than unknowns, or fewer equations than unknowns? These are precisely the questions answered below. Before we proceed, there is some background material that we'll need to learn. We'll first need to discuss ways to minimize a function of several variables. Then we'll need to understand how to do this using matrix-vector notation. After this is done, we'll be able to look at linear algebraic equations.

How to Minimize a Function of Several Variables

The best way to introduce this topic is with an example. Let's minimize the function

(1)  $J = x^2 + y^2$

where x and y are constrained to lie on the line

(2)  $y = ax + b$.

This problem amounts to finding the point on the line that is closest to the origin. There are several ways to solve this problem. Let's look at the following three.

The first way is to use geometry. Draw the perpendicular to the line that passes through the origin. The equation of this perpendicular line is $y = -x/a$. Substituting the equation of the perpendicular line into the equation of the line yields the intersection point

(3)  $x_0 = -\frac{ab}{1+a^2}, \qquad y_0 = \frac{b}{1+a^2}$.

The second way is to recognize that this is a constrained optimization problem: a problem in which a function is minimized while being subjected to a constraint. The constrained minimization problem is converted into an unconstrained minimization problem by a substitution step. The constraint, Eq. (2), is substituted into the minimizing function, Eq. (1), which yields

(4)  $J = x^2 + (ax + b)^2$.

The minimum of J is now found by taking the derivative of J with respect to x and setting it to zero. This yields

(5)  $0 = 2x + 2(ax + b)a$,

which again leads to the answer given in Eq. (3).

The third way also converts the constrained minimization problem into an unconstrained minimization problem. However, this time, no substitution step is needed. This third method, which is the method we'll later employ when solving linear algebraic equations, proceeds by first writing the constraint as

(6)  $f = y - ax - b = 0$.

We then define the augmented minimizing function

(7)  $J_a = J + \lambda f = x^2 + y^2 + \lambda (y - ax - b)$,

where $\lambda$ is a new variable (called a Lagrange multiplier). The minimum of $J_a$ subject to no constraints is the same as the minimum of J subject to the constraint, Eq. (2). (This will be shown to be true in a moment.) In other words, we have replaced the constrained minimization problem (Eqs. (1) and (2)) with an unconstrained minimization problem (Eq. (7)). Although $J_a$ and J are different functions, their values are the same at the minimum, because the constraint is satisfied there ($f = 0$) and the term $\lambda f$ in Eq. (7) vanishes.

Let's now show that indeed $J = J_a$ at the minimum. To do this, first notice that $J_a$ is a function of three variables: x, y, and $\lambda$. Thus, the minimum of $J_a$ satisfies the three conditions

(8a-c)
$0 = \partial J_a/\partial x = \partial J/\partial x + \lambda\,\partial f/\partial x$,
$0 = \partial J_a/\partial y = \partial J/\partial y + \lambda\,\partial f/\partial y$,
$0 = \partial J_a/\partial \lambda = f$.

We see from Eq. (8c) that the constraint is satisfied at the minimum. In order to conclude that minimizing $J_a$ also minimizes J, we'll need to show that Eqs. (8a,b) imply that $dJ/dx = 0$ along the constraint. Along the constraint, y is a function of x, so

$\frac{dJ}{dx} = \frac{\partial J}{\partial x} + \frac{\partial J}{\partial y}\frac{dy}{dx}$.

Differentiating the constraint $f(x, y) = 0$ gives $dy/dx = -(\partial f/\partial x)/(\partial f/\partial y)$. Substituting this, together with Eqs. (8a,b), yields

$\frac{dJ}{dx} = -\lambda\frac{\partial f}{\partial x} + \lambda\frac{\partial f}{\partial y}\cdot\frac{\partial f/\partial x}{\partial f/\partial y} = 0.$

Hence, the minima of $J_a$ and J are the same. From Eqs. (7) and (8), the specific minimizing conditions are

(9a-c)
$0 = 2x - \lambda a$,
$0 = 2y + \lambda$,
$0 = y - ax - b$.

The solution yields Eq. (3), as expected.
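To make the third method concrete, here is a minimal numerical sketch in Python (not part of the original derivation; the values a = 1 and b = 2 are arbitrary illustrative choices). Because Eqs. (9a-c) are linear in x, y, and $\lambda$, they can be solved as a single 3 x 3 linear system and checked against the geometric answer, Eq. (3).

```python
import numpy as np

# Arbitrary illustrative line y = a*x + b (values are assumptions, not from the text)
a, b = 1.0, 2.0

# Eqs. (9a-c) written as a linear system in (x, y, lambda):
#    2x        - a*lam = 0
#          2y  +   lam = 0
#   -a*x + y           = b
M = np.array([[2.0, 0.0,  -a],
              [0.0, 2.0, 1.0],
              [ -a, 1.0, 0.0]])
rhs = np.array([0.0, 0.0, b])
x, y, lam = np.linalg.solve(M, rhs)

# Eq. (3): the geometric answer from the perpendicular construction
x0 = -a * b / (1 + a**2)
y0 = b / (1 + a**2)

print(x, y)    # -1.0 1.0  (Lagrange multiplier method)
print(x0, y0)  # -1.0 1.0  (geometry); the two agree
```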
Matrix-Vector Notation

The functions we'll be minimizing shortly will be expressed using a compact matrix-vector notation. We'll see functions like

(10a-c)  $J_1 = x^T x$, $J_2 = x^T y$, and $J_3 = x^T A x$,

where

$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{n-1} \\ x_n \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{n-1} \\ y_n \end{bmatrix}, \qquad A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1(n-1)} & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2(n-1)} & a_{2n} \\ \vdots & \vdots & & \vdots & \vdots \\ a_{(n-1)1} & a_{(n-1)2} & \cdots & a_{(n-1)(n-1)} & a_{(n-1)n} \\ a_{n1} & a_{n2} & \cdots & a_{n(n-1)} & a_{nn} \end{bmatrix}.$

The superscript T means transpose: the rows and columns are interchanged. Notice that transposing a vector or a matrix twice gives back the original vector or matrix. Also notice that the transpose of a 1 x 1 matrix (which is a scalar) yields the original scalar.

The derivatives of a function J with respect to the coordinates $x_1, x_2, \ldots, x_{n-1}, x_n$ will be placed in a vector as follows:

(11)  $\frac{\partial J}{\partial x} = \begin{bmatrix} \partial J/\partial x_1 \\ \partial J/\partial x_2 \\ \vdots \\ \partial J/\partial x_{n-1} \\ \partial J/\partial x_n \end{bmatrix}.$

Thus, you can verify quite easily that the derivatives of the functions given in Eqs. (10) with respect to x are

(11a-c)  $\frac{\partial J_1}{\partial x} = 2x$, $\frac{\partial J_2}{\partial x} = y$, and $\frac{\partial J_3}{\partial x} = (A + A^T)x$.

Finally, it will be useful to recognize the following two properties of the transpose of a matrix product and the inverse of a square matrix product. First, for any product of matrices, we have

(12)  $(A_1 A_2 \cdots A_m)^T = A_m^T A_{m-1}^T \cdots A_1^T.$

Notice that the order of the multiplication of the matrices reverses itself. Secondly, for any product of invertible square matrices, we have

(13)  $(A_1 A_2 \cdots A_m)^{-1} = A_m^{-1} A_{m-1}^{-1} \cdots A_1^{-1}.$

Notice again that the order of the multiplication of the matrices reverses itself.

Types of Linear Algebraic Equations

We are now ready to look at systems of linear algebraic equations. A system of linear algebraic equations can be written as

(14)
$a_{11} x_1 + a_{12} x_2 + \cdots + a_{1(n-1)} x_{n-1} + a_{1n} x_n = b_1,$
$a_{21} x_1 + a_{22} x_2 + \cdots + a_{2(n-1)} x_{n-1} + a_{2n} x_n = b_2,$
$\vdots$
$a_{(m-1)1} x_1 + a_{(m-1)2} x_2 + \cdots + a_{(m-1)(n-1)} x_{n-1} + a_{(m-1)n} x_n = b_{m-1},$
$a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{m(n-1)} x_{n-1} + a_{mn} x_n = b_m.$

Notice that there are m equations and n unknowns, and that the number of equations and the number of unknowns can differ. When there are fewer equations than unknowns (m < n), the system is referred to as under-determined or under-constrained; in this case there are infinitely many solutions to Eq. (14). When the number of equations equals the number of unknowns (m = n), the system is referred to as uniquely determined or uniquely constrained, referring to the fact that the solution is unique. When there are more equations than unknowns (m > n), the system is referred to as over-determined or over-constrained; in this case there is, in general, no exact solution, although approximate solutions are possible.

Using matrix-vector notation, Eq. (14) is written as

(15)  $Ax = b$,

where A is an m x n matrix, x is an n x 1 vector, and b is an m x 1 vector.

Throughout the remainder of this write-up, we shall assume that the equations are not linear combinations of each other. Mathematically, this implies that the rank of the matrix is the smaller of m and n, written rank(A) = min(m, n). This is called the full-rank condition. By assuming that the rank of the matrix is full, it follows that the inverse of the n x n matrix $A^T A$ exists when m ≥ n, and that the inverse of the m x m matrix $A A^T$ exists when n ≥ m. These are results that we'll need later.
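Before moving on to the three cases, here is a quick numeric sanity check of this section's background results: the gradient formula of Eq. (11c) and the reversal rules of Eqs. (12) and (13). It is a minimal numpy sketch; the size n = 4 and the random data are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, illustrative data only
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

# Eq. (11c): the gradient of J3 = x^T A x is (A + A^T) x.
def J3(x):
    return x @ A @ x

grad_analytic = (A + A.T) @ x

# Central finite differences approximate the same gradient.
eps = 1e-6
grad_fd = np.array([(J3(x + eps * e) - J3(x - eps * e)) / (2 * eps)
                    for e in np.eye(n)])
print(np.allclose(grad_analytic, grad_fd, atol=1e-6))  # True

# Eq. (12): the transpose of a product reverses the order.
B = rng.standard_normal((n, n))
print(np.allclose((A @ B).T, B.T @ A.T))  # True

# Eq. (13): the inverse of a product reverses the order.
print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))  # True
```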
The three cases are summarized in the following table.

                 m < n                     m = n                       m > n
  Name           Under-determined          Uniquely determined         Over-determined
                 (under-constrained)       (uniquely constrained)      (over-constrained)
  Solution       Minimum norm solution     Exact solution              Least squares solution

Under-determined Systems: Minimum Norm Solutions

When there are fewer equations than unknowns, as stated above, there are infinitely many solutions. One solution that is frequently desirable is the one that has the smallest norm (size). The squared norm is defined as

(16)  $J = x_1^2 + x_2^2 + \cdots + x_n^2 = x^T x.$

The minimum norm problem is to minimize Eq. (16) subject to the linear algebraic equation constraints

(17)  $f = Ax - b = 0.$

Let's now convert this constrained minimization problem into an unconstrained minimization problem. We replace J with the augmented function

(18)  $J_a = J + (\lambda_1 f_1 + \lambda_2 f_2 + \cdots + \lambda_{m-1} f_{m-1} + \lambda_m f_m) = J + \lambda^T f.$

The augmented function is a function of x and $\lambda$, which adds up to n + m unknowns. The minimum of $J_a$ satisfies the m + n equations

(19a,b)
$0 = \partial J_a/\partial x = 2x + A^T \lambda,$
$0 = \partial J_a/\partial \lambda = Ax - b.$

Solving Eq. (19a) for x, substituting the result into Eq. (19b), and pre-multiplying by $-2(AA^T)^{-1}$ yields $\lambda = -2(AA^T)^{-1} b$. Substituting this back into Eq. (19a) yields the minimum norm solution to under-determined linear algebraic equations

(20a,b)  $x = A^{**} b, \qquad A^{**} = A^T (AA^T)^{-1},$

in which $A^{**}$ is called the pseudo-inverse of A for under-determined systems.

Uniquely Determined Systems: Exact Solutions

When the number of equations is equal to the number of unknowns, there is exactly one solution. Simply pre-multiply Eq. (15) by $A^{-1}$ to get the exact solution to uniquely determined linear algebraic equations

(21)  $x = A^{-1} b.$

Over-determined Systems: Least Squares Solutions

When the number of equations is greater than the number of unknowns, there is, in general, no exact solution; any solution will be approximate. The least squares solution is the best approximate solution in the sense that it minimizes the squared norm (size) of the error $e = Ax - b$:

(22)  $J = e^T e = (Ax - b)^T (Ax - b).$

This is an unconstrained minimization problem: we simply minimize J with respect to the n unknown components of x. Expanding Eq. (22) using Eq. (12) and differentiating using Eqs. (11a-c) yields

(23)  $0 = \frac{\partial J}{\partial x} = \frac{\partial}{\partial x}\left( x^T A^T A x - 2 b^T A x + b^T b \right) = 2 (A^T A) x - 2 A^T b.$

Pre-multiplying this by $-\tfrac{1}{2}(A^T A)^{-1}$ and rearranging yields the least squares solution to over-determined linear algebraic equations

(24a,b)  $x = A^{**} b, \qquad A^{**} = (A^T A)^{-1} A^T,$

in which $A^{**}$ is called the pseudo-inverse of A for over-determined systems.
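All three cases can be checked numerically. Below is a minimal numpy sketch (the matrix sizes and random data are arbitrary illustrative choices) that applies the pseudo-inverse formulas of Eqs. (20) and (24) and the exact solution of Eq. (21), and compares them against numpy's built-in solvers.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, illustrative data only

# Under-determined: m = 2 equations, n = 4 unknowns; Eq. (20).
A = rng.standard_normal((2, 4))
b = rng.standard_normal(2)
x_mn = A.T @ np.linalg.inv(A @ A.T) @ b          # minimum norm solution
print(np.allclose(A @ x_mn, b))                  # True: constraint satisfied
print(np.allclose(x_mn, np.linalg.pinv(A) @ b))  # True: matches numpy's pseudo-inverse

# Uniquely determined: m = n = 3; Eq. (21).
A = rng.standard_normal((3, 3))
b = rng.standard_normal(3)
x_exact = np.linalg.solve(A, b)                  # solves A x = b without forming inv(A)
print(np.allclose(A @ x_exact, b))               # True: exact solution

# Over-determined: m = 4 equations, n = 2 unknowns; Eq. (24).
A = rng.standard_normal((4, 2))
b = rng.standard_normal(4)
x_ls = np.linalg.inv(A.T @ A) @ A.T @ b          # least squares solution
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)    # numpy's least squares solver
print(np.allclose(x_ls, x_ref))                  # True
```

Note that the explicit inverses $(AA^T)^{-1}$ and $(A^T A)^{-1}$ are formed here only to mirror Eqs. (20) and (24); in numerical practice, np.linalg.lstsq and np.linalg.pinv are the better-conditioned choices.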