OVERVIEW OF REAL LINEAR ALGEBRA
LARRY SUSANKA
Date: April 13, 2014.
Contents
1. ✸ Opening Remarks
2. Sets and Set Operations
3. ✸ First Steps
4. Rn and Euclidean Space
5. ✸ The Association Between “Arrow-Vectors” and Rn
6. ✸ Position Vectors
7. Parametric and Point-Normal Equations for Lines and Planes
8. Systems of Linear Equations
9. Einstein Summation Convention
10. Matrices
11. The Identity and Elementary Matrices
12. Special Matrices
13. Row Reduction
14. Matrix Form For Systems of Linear Equations
15. Example Solutions
16. Determinants Part One: The Laplace Expansion
17. Determinants Part Two: Everything Else You Need to Know
18. Linear Transformations from Rn to Rm
19. Eigenvalues
20. Real Vector Spaces and Subspaces
21. Basis for a Vector Space
22. The Span of Column Vectors
23. A Basis for the Intersection of Two Vector Subspaces of Rn
24. ✸ Solving Problems in More Advanced Math Classes
25. Dimension is Well Defined
26. Coordinates
27. ✸ Position Vectors and Coordinates
28. Linear Functions Between Vector Spaces
29. Change of Basis
30. Effect of Change of Basis on the Matrix for a Linear Function
31. ✸ Effect of Change of Basis on a Position Vector
32. ✸ Effect of Change of Basis on a Linear Functional
33. Effect of Change of Basis on the Trace and Determinant
34. Example: How To Use Convenient Bases Efficiently
35. Bases Containing Eigenvectors
36. Several Applications
37. Approximate Eigenvalues and Eigenvectors
38. Nullspace, Image, Columnspace and Solutions
39. More on Nullspace and Columnspace
40. Rank Plus Nullity is the Domain Dimension
41. Sum, Intersection and Dimension
42. Direct Sum
43. A Function with Specified Kernel and Image
44. Inner Products
45. The Matrix for an Inner Product
46. Orthogonal Complements
47. Orthogonal and Orthonormal Bases
48. Projection onto Subspaces in an Inner Product Space
49. A Type of “Approximate” Solution
50. Embedding an Inner Product Space in Euclidean Space
51. Effect of Change of Basis on the Matrix for an Inner Product
52. A Few Facts About Complex Matrices and Vectors
53. Real Matrices with Complex Eigenvalues
54. Real Symmetric Matrices
55. Real Skew Symmetric Matrices
56. Orthonormal Eigenbases and the Spectral Theorem
57. The Schur Decomposition
58. Normal Matrices
59. Real Normal Matrices
60. The LU Decomposition
61. The Singular Value Decomposition
62. ✸ Loose Ends and Next Steps
Index
1. ✸ Opening Remarks.
In these notes1 I will give a kind of narrative of those ideas encountered in
our one-quarter introductory Linear Algebra class, with additional material
that could be used to fill out a semester course. They constitute a synopsis
of the pertinent points in a natural order in a conversational but condensed
style. This is what I would have liked to have said, and what I wish to have
understood by successful students in the class.
To the student:
You might use these notes as review for the final, or tick off the sections
as we come to them to keep track of our progress. I also give advice here
and there about how to think of these things, and how to study them, and
tools and techniques you can use to solve the assigned problems.
You are well advised to read over these notes very carefully, at least the
sections we actually cover. They tell you what I think is important and will,
if nothing else, tip you off as to potential test questions.
Mathematics is a foreign language, and you cannot understand that language without a dictionary. This is particularly true for those studying
Linear Algebra for the first time. Step one must be to learn vocabulary:
I recommend “flash cards” with the technical words you encounter on one
side and the definitions (perhaps with one or two key examples) on the
other. The notes here can be a guide for the creation of these flash cards.
Without memorizing the vocabulary we use you cannot even participate in
the discussion. You can’t succeed. If you do memorize the vocabulary, the
difficulties become manageable. Really.
In this class you are being asked to do things that are different in character
from the math you have—most likely—seen before. Calculation is part of the
job, but here only part. The real question is to determine which calculation
needs to be done, and for what purpose. You will be asked to (gasp) prove2
things. The class is not about calculations alone, though we will learn to do
some rather intricate ones, but the ideas which inspire them as well.
Linear Algebra is a case study in the incredible power of generalization
employed in service of certain types of practical problems. These are situations which are unmanageably complex in their natural context, but when
stripped of unnecessary extra features are seen to be special cases of vector
spaces, linear functions, inner products and so on. All of our theorems and
compact notation then apply.
1Sections marked with the diamond symbol ✸, such as this one, consist of philosophical
remarks, advice and points of view. They can be skipped on first reading, or altogether.
2You will not, of course, be asked to prove theorems. You will be asked to understand
what a few theorems state. You will need to verify that the conditions in the statement
of the theorem hold, so that the theorem actually applies to what you are doing. You will
then draw a conclusion.
Linear Algebra is not about Kirchhoff’s Laws, analysis of rotating bodies,
a model of an economy with n products, sequences defined recursively, representation of forces in the structural members of a building, maximum or
minimum of functions with many independent variables, Quantum Mechanics, signal analysis and error correction, the study of curved surfaces, Markov
probability models, analysis of n different food items for m different nutrient
values, stresses inside a crystal, models of ecosystems or approximations in
the study of air-flow over a wing.
Instead, it is about techniques absolutely necessary to study all those
things efficiently, and many many more, all at once, with unified vocabulary
and technique. Your job here is to learn that vocabulary and technique.
Of necessity, most of the detail work must be done with hardware assistance, a computer program such as Maple, Mathematica, MATLAB or
Mathcad or, preferably (for us), a calculator. Embrace this necessity.
You should be able in principle to do everything by hand but the goal is
to find ways to do nothing whatsoever by hand, beyond checking here and
there to make sure your calculator has not meandered off into the ozone
without you. The calculations tend to become far too complex for humans
as the size “n” of the problems rises. Even if it is possible to do a particular
problem by hand (because an idea is being illustrated with a simple or
carefully “rigged” example) you should try in every case to find a way to do
it also with hardware.
Let me be blunt: The correct amount of arithmetic to perform in a calculation in this class is little (scales with n) or none (read off or enter coefficients). Most other calculations require a number of steps that scales
with n² or n³ (hard, but machinery can do it) or n! (essentially impossible
except in trivial cases). If you find yourself wandering into a mess, think
again and try another way.
2. Sets and Set Operations.
In all human languages, the language of mathematics included, the words
are defined in terms of each other. Ultimate meaning is derived from introspection or pointing out the window.
Unlike other human languages, modern mathematics tries to keep strict
control over this process and reduce the number of undefined “ultimate root”
words to a minimum. The “grammar” of mathematics is designed to help
us catch inconsistencies and logical error.
Two undefined items get mathematics started: the first is the word “set”
and the second is the “element of ” binary relation. One is supposed to
simply know what they mean, without definition.
Virtually all of mathematics is expressed in terms of the theory that comes
from these two undefined objects: Set Theory.
I think of a set as a bag that can contain things. The things in the bag
are called the elements of the bag. A set is completely defined by the things
it contains. There are not “red bags” and “yellow bags” that contain the
same things but are different. If they have the same elements they are the
same set.
The statement “a is an element of the set B,” written in math notation
“a ∈ B,” will be true if, when you peek into B, you see a in there.
The relation A ⊂ B between sets is defined in terms of the “element of”
relation. A ⊂ B if every element of A is an element of B. So B contains, at
least, all the elements of A and possibly others too. It is read aloud “A is a
subset of B.”
One can write this in shorthand as
A ⊂ B   if and only if   a ∈ A ⇒ a ∈ B.
The right arrow stands for the word “implies.” The “if and only if” phrase
has a couple of uses. The first time you see it in use with some new mathematical object it says “we are about to define something.” After that, it
indicates a theorem to be proved.
There are two ways we will define sets in these notes. First, we can list
their elements as
{ 1, 2, 3, 4 }.
The curly brackets can be thought of as the “paper” of the bag. This set
has four elements, the numbers 1, 2, 3 and 4.
The next way is to describe the elements in some unambiguous fashion.
The notation for that is something like this:
{ x | x is a whole number between one and four, including one and four. }
The x to the left of the vertical bar is a variable letter that represents any
potential member of the set. It could be any unused symbol. The vertical
bar is read out loud as the words “such that.” After that is the condition
you will use to test x for membership in the set. If the statement is true for
a given x, that x is “in.” Otherwise it is “not in.”
A slight variant of this approach is to give some information about the
type of mathematical object you expect x to be to the left of the vertical
bar. For instance we let R denote the set of all real numbers and N denote
the natural numbers: the positive counting numbers and zero.
Then { x ∈ R | 1 ≤ x < 5 } is the interval of real numbers [1, 5), closed on
the left and open on the right. On the other hand { x ∈ N | 1 ≤ x < 5 } is
the set { 1, 2, 3, 4 }.
This can be kind of tedious, but it has virtues. For instance sometimes
you see sets given by a notation that sort of “splits the difference” between
the two approaches. The notation
{ 1, 2, . . . }
means . . . what exactly? Most people would say it stands for the counting
numbers. Others might think it stands for integer powers of two, starting
with 2⁰. Others might even be more creative.
Use this last notation at your own risk. It is of necessity somewhat ambiguous.
If A and B are two sets we can create two new sets from them, A ∩ B
and A ∪ B. They are called the intersection of A and B, and the union
of A and B, respectively.
They are defined as
A ∩ B = { x | x ∈ A and x ∈ B } and
A ∪ B = { x | x ∈ A or x ∈ B }.
The word “and” in the definition of intersection has its usual English
meaning. x is in the intersection when it is in both A and B.
But the word “or” in the definition of union means something a bit different from its English usage. In math, “or” is not exclusive. So x is in the
union if it is in A or it is in B, or both if that is possible.
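If you want to experiment, many programming languages have a built-in set type that mirrors these definitions. Here is a small illustration in Python, one convenient tool among many (not one these notes require):

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}

print(A & B)       # intersection: {3, 4}
print(A | B)       # union: {1, 2, 3, 4, 5} -- "or" is not exclusive
print(A <= B)      # subset test: False, since 1 is in A but not in B
print(set() <= B)  # the empty set is a subset of every set: True

# a raw set has no order and ignores repeated elements
print({1, 2, 3, 4} == {2, 2, 2, 1, 3, 4})  # True
```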
One interesting set is the “empty set.” It is the empty bag, the bag with
nothing in it. It is denoted ∅. The statement “a ∈ ∅” is always false. The
statement “∅ ⊂ B” is always true.
That is just about all you need to know about sets for now, except for one
more thing. We will be talking about “ordered sets” and these have some
additional structure. A raw set has no order. { 1, 2, 3, 4 } = { 2, 1, 3, 4 } =
{ 2, 2, 2, 1, 3, 4 }.
So in this class when we talk about an ordered set we will mean a set
together with something extra: an ordered labeling of the elements, a unique
integer label for each element, which must be specified along with the set.
2.1. Exercise.
Find the following sets:
(i) N ∩ { x ∈ R | .5 < x < 2.5 }.   (ii) { x ∈ R | x < 2 } ∩ { x ∈ R | x > 3 }.
(iii) { x ∈ N | x/6 ∈ N } ∩ { x ∈ R | x < 25 }   (iv) { 1, 3, 5 } ∪ { 2, 7, 9 }
3. ✸ First Steps.
Most people taking this class3 have been introduced to vectors at some
point—they are essential in the physical sciences and it is customary to discuss them in various earlier math classes too. This class is entirely concerned
with them, start to finish.
A vector is an object completely characterized by two quantities, which
we call magnitude and direction. The physical meaning of these quantities in an application of vectors comes from experience and varies from
application to application.
There are a number of things in the world with which you are no doubt
familiar that are commonly represented as vectors, and you should think a
bit about the meaning of “magnitude” and “direction” in each case.
• Displacement—a representation of a movement from a starting
place to an ending place, with emphasis on the “distance and direction from start-to-end” of the completed movement rather than how
or where it occurred.
• Velocity—a description of motion, whose magnitude is the speed
and whose direction “points the way.” The velocity would be the
displacement vector over one time unit, if the motion continued unchanged for the whole time unit.
• Forces—these describe “pushes” by one thing against another. A
force is the cause of acceleration. If you see changes in the motion of
something, it is because there is a force acting on that thing. If you see no
such changes, the resultant of all forces acting on the thing must have zero magnitude.
• A representation of a uniform Wind or Current—in the air or
water. This example is tied to velocity. It can be interpreted as
the velocity of a dust particle swept up and carried along by an
unvarying wind or current.
Why should any mathematical construction (in our case, vectors) describe
faithfully a category of real-world experiences? I don’t know, really. It is an
interesting philosophical puzzle. But many mathematical abstractions do, at
least reasonably well. It is only through experience, conjecture bolstered by
many experiments, that we (i.e. physicists, engineers, you, me) can decide
that a mathematical entity is a reasonable tool to try to describe something
we see and want to understand better. We are going to study in this class
how vectors behave. Only you can decide if vectors mimic well aspects of
the world, such as those on the list above.
Mostly we will be working with pretty abstract mathematical objects,
ordered triples of real numbers, matrices and so on, and referring to them
as vectors and studying their properties. Precise definitions are necessary in
3Sections marked with the diamond symbol ✸, such as this one, consist of philosophical
remarks, advice and points of view. They can be skipped on first reading, or altogether.
mathematics, but we also need at least one way (preferably more than one
way) of thinking about these abstract definitions, to guide our calculations
and tell us which ideas are important and which are not. Luckily, there
are several excellent visualization guides of this kind for vectors and their
collective, vector spaces.
In this particular section we represent vectors as “arrows,” one of the
intuitive notions students usually encounter in earlier presentations, but
bear in mind these are not the more rigorous, more abstract, definitions
that will come later. They are (and throughout the text continue to be)
visual or intuitive aids to help us think about the ideas we encounter. In a
sense they are more basic, more primitive, more real than these later ideas.
You can draw or even buy an arrow, but an ordered triple of real numbers
is hard to find in a store.
Your instructor will, no doubt, illuminate the later more abstract discussion with numerous references to the pictures we see in this section. During
a discussion like that, the word “displacement” might be used synonymously
with “arrow-vector.”
An arrow has direction given by its shaft with
specified “tail-to-tip” orientation, and magnitude
given by its length.
We take the point of view that two arrows located
anywhere are merely instances of the “same”
arrow-vector, so long as each has the same length
and direction. So the arrow to the left represents
the “same” arrow-vector (or simply “vector,” for
short) as the one on the right above, even though
it is located at a different place.
You have plenty of experience with “fuzzing out” the distinctions among
things which are manifestly different but which exhibit similarities upon
which we wish to focus. For example, the fractions 3/4 and 6/8 represent
different ideas. In the first, you break the “whole” into 4 equal pieces, and
you have 3 of them. In the second, you break the “whole” into 8 equal
pieces, and you have 6 of them. With these differences, there is something
important that is similar about these two fractions: namely, I am just as full
if I eat 3/4 of a pizza or if I eat 6/8 of a pizza. We choose to focus on that
and we say 3/4 = 6/8. We gather together all the fractions “equal” to 3/4
and refer to the entire pile of them by picking any convenient representative,
such as that in lowest terms or with some specified denominator.
Similarly, when we speak of vectors represented using arrows, the arrows
above are different on the page, but by picking one of them we refer to both,
and any other arrow with the same magnitude and direction as well!
There is a phrase used for this type of thing in mathematics. It is called
an equivalence relation. All the fractions equivalent to 3/4 are said to be
in the same equivalence class of fractions. The arrows equivalent to a given
specific arrow form, all together, an arrow-vector.
Two vectors A and B, each drawn as an arrow,
are added by finding the copy of B which has its tail on the nose of a copy
of A. The sum vector A + B consists of the arrow that starts with its tail
at the tail of this copy of A and ends with its tip at the tip of this copy of
B, together with all other arrows with the same magnitude and direction.
It is important to note the direction of A + B, in this case from left to right.
−B is the vector that looks just like B but with tip and tail switched.
A positive constant k times a vector A is a new vector pointing in the
same direction as A but with length stretched (if k > 1) or shrunk (if
k < 1) by the factor k. Negative multiples of A are said to have direction
opposite to A.
A − B is defined to be A + (−B). So A − B is the vector on the left of
the picture:
The process of adding two vectors is called vector addition. Multiplying
a vector by a constant is called scalar multiplication.
The vector with zero magnitude is hard to represent as an arrow: it is
called the zero vector, denoted 0. It doesn’t really have a direction—or
perhaps it has any direction. You pick. Context helps distinguish it from
the number 0.
3.1. Exercise. You should satisfy yourself that:
A + B = B + A, A − A = 0 and that 2A = A + A.
On the far left is a picture
of the vector sum 2A − 3B. A
vector such as this, formed as a
sum of multiples of vectors, is
often referred to as a resultant
vector. It is also called a linear
combination of the vectors
involved, in this case A and B.
3.2. Exercise. (i) Draw a picture of 3C − 2D and C + (1/2)D where C and D
are given by:
(ii) Find the resultant of the linear combination 2V − 3W when V and W
are shown below:
Now that we know how to add vectors, combining them into a single resultant vector, a natural next step might be to see how we can break them
into pieces in various ways. We will think about how to decompose a vector
into the sum of two others. One of these will be a multiple of a specified
vector and the other perpendicular to that specified vector. This is a very
important process in applications. The process of finding the “parallel part”
is called projection.
In order to create the picture of such a decomposition you need to know
only one thing not present in the first part of this section. You must have
a concept of perpendicularity, and be able to tell when two arrows are
perpendicular to each other by some method. In this section, the old standby
“eyeball” method will suffice.
A common and very important usage of vector decomposition occurs when
we consider force vectors. A classic example would be that of a box sliding
down a slanted board. The most obvious force here is the weight of the
box. But that force is directed straight down and the surface of the board
prevents movement in that direction. The right way to handle this is to
decompose the force caused by gravity into two perpendicular pieces: the
part that is straight into the surface of the board (the source of friction)
and the part that points along the line of the board. It is only this last part
which makes the box slide.
Whatever the source of the vectors involved may be, we will draw some
pictures here to see “how to do it.” We want to learn how to decompose a
vector V into the sum of a vector P which is a multiple of some vector W
and another vector V − P which is perpendicular to W . We call the second
vector V − P because whenever V = P + A it must be that A = V − P,
so there is no point in introducing an independent name at this point for
the perpendicular part of the decomposition. Find below three different
decompositions of this type in picture form.
Notice two things about the pictures above: First, in each case P and
V − P are perpendicular to each other. Second, P is a multiple of W .
In constructing the decomposition, the length of W is irrelevant. The
only thing that is important about W is the “slope” of its shaft.
To create the decomposition, draw a fresh picture of a V and W pair from
one of the pictures above on a sheet of paper.
V should be somewhere in the middle and W off to the side for reference.
Draw a dotted line through the tail of the copy of V . The dotted line must
go along the same direction as W. Extend this dotted line a good bit on
either side of V , across the whole paper.
Next lay your pencil down on the paper. Put the eraser on this dotted
line with the point on the same side as V is on. Make the shaft of the pencil
perpendicular to this dotted line. This is where you need to know about
perpendicularity.
Slide the pencil up or down the dotted line, keeping it perpendicular to
the dotted line, till the pencil tip points at the tip of V or the shaft crosses
the tip of V . Stop. This gives the decomposition.
3.3. Exercise.
(i) Draw a picture showing the decomposition as described above for the indicated
V and W on the left.
(ii) Draw a picture showing the decomposition as described above for the indicated
V and W on the left.
(iii) Decompose V into the
sum of two vectors, one
“along the line” of W , and the
other perpendicular to W .
There are a couple of points I would like to emphasize (or re-emphasize)
before getting down to business.
First, each instance of a vector in the world actually occurs at some specific spot, and whenever an arrow is drawn it is drawn somewhere specifically.
When we think of something in the world as a vector, we take the point of
view that any representative refers not only to itself but to all others with
the same magnitude and direction too. When you refer to 7/3 you are often
making a statement about 21/9 at the same time even without mentioning
that second fraction.
Second, a given push (a force) is a real thing that exists however we decide
to describe it. The wind is just whatever it is and doesn’t need us to tell
it that it is 30 miles per hour from the North. A displacement across a
room is a real thing, in itself. But in physics and other classes we try to
describe things, often using mathematics and numbers. This association
always involves a huge pile of assumptions including, for example, a choice
of a distance unit, a time unit, an “origin,” directions for coordinate axes
and methods for measuring lengths and angles and the passage of time and
some way of gauging the magnitude of a “push” and on and on. There is also
a conceptual framework, frequently generated by the esthetic sensibilities of
the creators of the model, which helps us think about the measurements. It
is not always clear which among the conceptual underpinnings are necessary,
or even if they are consistent. Our description depends not only on the real
thing, but on all these choices involved in a representation too.
When we go through this process of assembling a model of something in
the world we must never forget that the map is not the territory. A nickname
for a thing is not the thing itself. The universe names itself, and whatever
shorthand we use to describe part of it leaves out almost everything. In
applications we must always be looking “out the window” to make sure
the world is still answering to our nickname for it. It is astounding how
often, over the last four centuries, it comes when we call. We must be doing
something right.
4. Rn and Euclidean Space.
Now that we have in mind the typical visual representation of vectors,
and some important subjects usually studied using vectors, it is time to get
more precise.
Rn is defined to be the set of columns or “stacks” of real numbers which
are n real numbers high. A column is a type of matrix. (We will talk about
matrices of various shapes a lot more later.) The real numbers in a column
like this are called entries or coordinates.
You add columns, if they are the same height, by adding corresponding
entries. You multiply columns by real numbers (called scalars) by multiplying all the entries by that number. So
 1  1

 1  1  1
cx
x
x + y1
y
x
 x2   cx2 
 x2   y 2   x2 + y 2 

  

    
 and c  ..  =  ..  .
 ..  +  ..  = 
..




 .  . 
. 
.
.
xn
yn
xn + y n
xn
cxn
Adding columns of different heights is not defined. As is customary in
algebra, subtraction of columns is defined by adding “minus one times” the
subtracted column to the first column.
We will refer to members of Rn , in a context where these operations are
important, as vectors. The addition operation will then be called vector
addition. The multiplication is called scalar multiplication.
A sum or difference of scalar multiples of vectors is called a linear combination of the vectors involved. This linear combination, written as a
single vector, is sometimes called the resultant of the linear combination.
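For example, here is how a linear combination in R3 might be computed with a hardware assistant. The sketch below uses Python with the NumPy library, an arbitrary choice; a calculator or any of the programs mentioned earlier would serve just as well:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, -1.0])

# vector addition and scalar multiplication work entry by entry
print(x + y)          # [1. 3. 2.]
print(3 * x)          # [3. 6. 9.]

# the resultant of the linear combination 2x - 3y
print(2 * x - 3 * y)  # [2. 1. 9.]
```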
For purely typographical reasons we often denote members of Rn (columns)
by rows
 1
x
 x2 
 
(x1 , x2 , . . . , xn ) ⇐⇒  .. 
 . 
xn
with intent taken from context. These are not to be mistaken for row matrices, which will come up later, and which have no commas separating
entries4.
We define e_i by
$$
e_i = \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} \longleftarrow i\text{th row}
$$
where the 1 is located in the ith spot, counting down from the top, and all
other entries are 0.
The set of these ei will be denoted En = { e1 , . . . , en }. This set is called
the standard basis of Rn . Note the ambiguity in the notation here: e1 as
an element of En is not the same vector as e1 as an element of Ek unless
n = k.
The zero vector, that is, the vector whose entries are all 0, suffers from
the same ambiguity. The zero vector of any size is denoted 0 and usually
distinguished from the number 0 by context or the use of bold type.
Any vector x can be written as the resultant of one and only one (except
for order of summands) linear combination of these standard basis vectors
e_1, . . . , e_n:
$$
x = x^1 e_1 + x^2 e_2 + \cdots + x^n e_n = \sum_{i=1}^{n} x^i e_i.
$$
We will usually use bold lower case letters for members of Rn , while the
entries will usually be the same letter in normal type, with subscripts or
superscripts to indicate position in the column.
Sometimes in R2 or R3 the vectors e1 , e2 or e3 are denoted ~i, ~j or ~k. In
that case, usually the entries of a vector x are denoted x, y or z rather than
x1 , x2 or x3 . We will mostly use subscripts or superscripts. That allows us
4This convention is far from universal, even among works by the same author—though
one might hope for consistency within a given text. In my notes on tensors, for instance,
we take a different point of view.
to write down, so far as possible, expressions whose form does not depend
on dimension.
We define dot product between members v and w of Rn by
$$
v \cdot w = v^1 w^1 + \cdots + v^n w^n = \sum_{i=1}^{n} v^i w^i.
$$
The dot product has many useful properties. For instance, the dot product ei · ej equals 1 if i = j and 0 otherwise.
Dot product is commutative and distributes over vector addition:
v·w =w·v
and
u · (v + w) = u · v + u · w.
Also if c is a scalar, c(v · w) = (cv) · w = v · (cw).
We define magnitude or norm (synonymous) of a vector u by
‖u‖ = √(u · u).
The distance between vectors u and v is defined to be ‖u − v‖.
If c is a scalar, ‖cu‖ = |c| ‖u‖.
The norm satisfies the very important Cauchy-Schwarz and triangle
inequalities: for all pairs of vectors v and w
|v · w| ≤ ‖v‖ ‖w‖   and   | ‖v‖ − ‖w‖ | ≤ ‖v + w‖ ≤ ‖v‖ + ‖w‖
The proof of the second of these follows easily from the first. The proof
of the Cauchy-Schwarz inequality is given in the next calculation.
The inequality |v · w| ≤ ‖v‖ ‖w‖ is obviously true when v · w = 0. We
assume, then, that this dot product is nonzero so, in particular, neither v nor w
is the zero vector.
$$
\begin{aligned}
0 \;\le\; \Bigl( v - \frac{v \cdot w}{w \cdot w}\, w \Bigr) \cdot \Bigl( v - \frac{v \cdot w}{w \cdot w}\, w \Bigr)
&= v \cdot v - 2\,\frac{v \cdot w}{w \cdot w}\, w \cdot v + \Bigl( \frac{v \cdot w}{w \cdot w} \Bigr)^{2} w \cdot w \\
&= v \cdot v - \frac{(v \cdot w)^{2}}{w \cdot w}.
\end{aligned}
$$
So
$$
\frac{(v \cdot w)^{2}}{w \cdot w} \;\le\; v \cdot v
$$
and the Cauchy-Schwarz inequality follows.
The angle θ between members v and w of Rn is defined by the equation
v · w = ‖v‖ ‖w‖ cos(θ).
In the plane applied to unit vectors (corresponding to points on the unit
circle) the proof of this is nothing more than the difference formula for
cosines:
( cos(β), sin(β) )·( cos(α), sin(α) ) = cos(β)cos(α)+sin(β) sin(α) = cos(β−α).
That the angle formula “works” for any pair of nonzero vectors in the plane,
not just unit vectors, is an exercise. To justify the formula in Rn when n > 2
requires some thought. An argument can be made using the lengths of the
edges of a triangle formed by the two vectors involved. For now you can
just regard this as the definition of angle in these cases.
So two vectors are perpendicular exactly when their dot product is zero.
It sometimes is useful to be able to produce vectors perpendicular to a
given vector, and this dot product idea makes it easy. For instance to find
a vector perpendicular to (3, −4) you switch entries and change one sign.
So (4, 3) works. In higher dimensions you kill (i.e. replace by 0) all but
two entries and apply this rule to these two. So (4, 3, 0) and (7, 0, −3) and
(0, −7, −4) are all perpendicular to (3, −4, 7).
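As an illustration, the angle formula and the perpendicularity test are easy to evaluate numerically. A Python/NumPy sketch with arbitrarily chosen vectors (again, any hardware assistant would do):

```python
import numpy as np

v = np.array([1.0, 2.0, 2.0])
w = np.array([3.0, 0.0, 4.0])

dot = np.dot(v, w)
norm_v = np.linalg.norm(v)      # ||v|| = sqrt(v . v)
norm_w = np.linalg.norm(w)

# Cauchy-Schwarz guarantees the quotient below lies between -1 and 1,
# so arccos is defined; this is the angle theta with v.w = ||v|| ||w|| cos(theta)
theta = np.arccos(dot / (norm_v * norm_w))
print(np.degrees(theta))        # about 42.8 degrees

# perpendicularity test: the dot product is zero
print(np.dot(np.array([4.0, 3.0, 0.0]), np.array([3.0, -4.0, 7.0])))   # 0.0
```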
Euclidean Space of dimension n is Rn with the operations of vector
addition and scalar multiplication together with this particular notion of
distance and angle, given by dot product.
A unit vector (that is, a vector one unit long) in Euclidean Space has
coordinates
( cos(θ1 ), cos(θ2 ), . . . , cos(θn ) )
where θi is the angle between the vector and the “ei axis.” Use dot product
to justify this. The numbers cos(θi ) are called direction cosines.
We define projection onto a line through the origin, containing the
nonzero vector w, to be
$$
\mathrm{Proj}_{w}(v) = \frac{v \cdot w}{w \cdot w}\, w.
$$
Note that Proj_w(Proj_w(v)) = Proj_w(v) for all v. If v is perpendicular
to w then Proj_w(v) = 0. If v is a multiple of w then Proj_w(v) = v.
We use the decomposition
v = Proj_w(v) + (v − Proj_w(v))
for several things. It gives the vector v as the sum of two vectors. One is
“along the line of” the vector w. The other is perpendicular to that line.
The projection onto the plane5 through the origin perpendicular
to the vector w is the function
CoProj_w(v) = v − Proj_w(v).
5If n > 3 we might call this projection onto a “hyperplane.” In dimension 2, this is
projection onto the line perpendicular to w.
Note that CoProj_w(CoProj_w(v)) = CoProj_w(v) for all v and if v is
perpendicular to w then CoProj_w(v) = v. If v is a multiple of w then
CoProj_w(v) = 0.
We define reflection in a plane perpendicular to vector w to be the
function
Refl_w(v) = −Proj_w(v) + (v − Proj_w(v)) = v − 2 Proj_w(v).
Note that Refl_w(Refl_w(v)) = v for all v and if Refl_w(v) = v then v is
perpendicular to w. Also Refl_w(w) = −w.
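These three functions are straightforward to program. Below is a minimal Python/NumPy sketch (the function names simply mirror the notation above) together with spot checks of the properties just listed:

```python
import numpy as np

def proj(w, v):
    """Projection of v onto the line through the origin containing w (w nonzero)."""
    return (np.dot(v, w) / np.dot(w, w)) * w

def coproj(w, v):
    """Projection of v onto the plane through the origin perpendicular to w."""
    return v - proj(w, v)

def refl(w, v):
    """Reflection of v in the plane perpendicular to w."""
    return v - 2 * proj(w, v)

w = np.array([1.0, -1.0, 3.0])
v = np.array([1.0, 3.0, 5.0])

p = proj(w, v)
print(np.dot(v - p, w))                     # ~0: the leftover piece is perpendicular to w
print(np.allclose(proj(w, p), p))           # True: projecting twice changes nothing
print(np.allclose(refl(w, refl(w, v)), v))  # True: reflecting twice gives v back
```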
We define cross product between members v and w of R3 by
v × w = ( v²w³ − v³w², v³w¹ − v¹w³, v¹w² − w¹v² ).
An easy calculation using dot product shows that v × w is perpendicular
to both v and w.
Also c(v × w) = (cv) × w = v × (cw) for any scalar c.
It is a fact that v × w = −w × v. So cross product is not commutative.
Except for special examples (u × v) × w ≠ u × (v × w). Cross product
is not associative.
Cross product does obey both left and right distributive laws with respect
to vector addition:
u × (v + w) = u × v + u × w
and
(v + w) × u = v × u + w × u.
In general, if the angle between v and w is θ,
‖v × w‖ = ‖v‖ ‖w‖ sin(θ)
so the magnitude of v × w is the area of the parallelogram formed using v
and w. This last formula is proved via a messy collection of terms to verify
the left equation in
‖v‖² ‖w‖² − ‖v × w‖² = (v · w)² = ‖v‖² ‖w‖² cos²(θ).
It is also a fact, useful in Calculus, that the number |u · (v × w)| gives
the value of the volume of the “bent box” (i.e. parallelepiped) determined
by the three edges u, v and w.
The number u · (v × w) is called the triple product of u, v and w.
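The cross product, the area formula and the triple product are also easy to spot-check with a hardware assistant. A Python/NumPy sketch with arbitrarily chosen vectors:

```python
import numpy as np

u = np.array([1.0, 0.0, 2.0])
v = np.array([2.0, 1.0, 0.0])
w = np.array([0.0, 3.0, 1.0])

c = np.cross(v, w)                        # v x w
print(np.dot(c, v), np.dot(c, w))         # both 0: v x w is perpendicular to v and to w
print(np.allclose(np.cross(w, v), -c))    # True: v x w = -(w x v)

# area of the parallelogram with edges v and w
print(np.linalg.norm(c))

# |u . (v x w)| is the volume of the parallelepiped with edges u, v, w
print(abs(np.dot(u, c)))
```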
4.1. Exercise. Find the resultant of the linear combination
2(1, 3) − 3(−2, 1).
4.2. Exercise. (i) When will v · w = ‖v‖ ‖w‖?
(ii) Use the Cauchy-Schwarz inequality to prove the triangle inequality.
4.3. Exercise. (i) Find the angle between (7, 1, 0, 5) and (1, 1, 1, 0).
(ii) Find four vectors perpendicular to (1, 3, −8, −2).
(iii) Create a unit vector pointing in the same direction as (1, 3, −8, −2).
(iv) What is the cosine of the angle between (1, 3, −8, −2) and the e4 axis?
4.4. Exercise. (i) Show that c(v · w) = (cv) · w = v · (cw) and that v · w =
w · v for scalar c and vectors v and w.
(ii) Show that (v + u) · w = v · w + u · w for vectors v, w and u.
4.5. Exercise. Decompose v = (1, 3, 5) into the sum of two vectors: one
“along the line of ” w = (−1, 1, 2) and the other perpendicular to that line.
4.6. Exercise. (i) Calculate Proj_w(e1) for w = (1, −1, 3).
(ii) Calculate CoProj_w(e2) for w = (1, −1, 3).
(iii) Calculate Refl_w(e3) for w = (1, −1, 3).
4.7. Exercise. (i) Show that Proj_w(v + cu) = Proj_w(v) + c Proj_w(u) for
any nonzero vector w and scalar c and any vectors v and u.
(ii) Show Proj_w(Proj_w(v)) = Proj_w(v).
4.8. Exercise. (i) Show that CoProj_w(v + ku) = CoProj_w(v) + k CoProj_w(u)
for any nonzero vector w and scalar k and any vectors v and u.
(ii) Show CoProj_w(CoProj_w(v)) = CoProj_w(v).
4.9. Exercise. (i) Show that Refl_w(v + cu) = Refl_w(v) + c Refl_w(u) for
any nonzero vector w and scalar c and any vectors v and u.
(ii) Show Refl_w(Refl_w(v)) = v.
4.10. Exercise. In R3 , calculate ~i · ~j × ~k and ~k · ~i × ~j and ~j · ~k × ~i .
4.11. Exercise. (i) Find a vector perpendicular to both (3, 5, 1) and (1, 2, 3).
(ii) Find an example to show that (u × v) × w need not equal u × (v × w).
(iii) Show that cross product does obey distributive laws with respect to
vector addition:
u × (v + w) = u × v + u × w.
(iv) Show that
u · (v × w) = w · (u × v) = v · (w × u).
4.12. Exercise. (i) Find the area of the parallelogram determined by edges
(1, 1, 1) and (1, 3, 2).
(ii) Find the area of the parallelogram determined by edges (1, 1, 0) and
(1, 3, 0). Why did I ask this question, since it is so similar to the last?
(iii) Find the volume of the parallelepiped determined by edges (1, 1, 1),
(1, 0, 2) and (3, 6, 7).
5. ✸ The Association Between “Arrow-Vectors” and Rn .
It is time to make a connection6 between vectors represented in Rn as in
the last section, and our earlier ideas typified by the quasi-physical “arrow-vectors” from Section 3.
We identify a part of our universe as worthy of study, and call that, below,
“our world.”
In order to make this work, we need to have quite a bit of certain types
of information about our world. We have to be able, for instance, to understand how to move around in our world, and measure properties of specific
displacements there. We have to be able to “parallel-transport” one of our
specific displacements to an “equivalent” specific displacement that starts
where any other particular displacement ends. Thought of as arrows, we
have to be able to move one arrow so that its tail is at the tip of any other.
Looking at the collections of equivalent displacements, which we will call
arrow-vectors, we have to be able to consistently extend an arrow-vector
(multiply by a scalar) or to combine several of them into a single arrow-vector. And if A = B + C represents arrow-vector A as a sum of two other
arrow-vectors then it must be true, for instance, that 2A = 2B + 2C. It is
entirely possible that we could be deluding ourselves about our ability to do
or know these things. Nevertheless, we suppose that we can do and know
them. If we are wrong, we will find out sooner or later, because our model
will make predictions that are contrary to what we see7.
If we study our world and realize that we cannot make any displacement
at all then we are confined to a point. Our world is “zero dimensional” and
we could identify our only displacement vector with the set {0}: not a very
interesting situation, and it is hard to imagine anyone finding that worthy
of study, but there you are.
We might find, though, that we can move in the world. We pick a displacement W that is to be our measuring stick. W is not very abstract at
all. A representative is sitting in front of our nose: our standard movement
6Sections marked with the diamond symbol ✸, such as this one, consist of philosophical
remarks, advice and points of view. They can be skipped on first reading, or altogether.
7That could be really important information, and discoveries like that have led to Nobel
Prizes.
in one possible direction. As we examine other displacements we might find
that every single one is a multiple of W . There is only one independent direction in this world. We say our world is one dimensional, our movements
are “confined to a line.” So every displacement A is of the form A = a1 W
for some real number a1 and we associate the displacement A with the real
number a1 in R = R1 .
It is particularly important to note that this association depends explicitly
on the yardstick W , so when you use the “map” that associates each A with
its own a1 you must explicitly include the “legend.” Without knowing W
you cannot know what a1 means. But if you know W and how it fits inside
our world (and of course you do know this: that was our starting point)
then you know exactly what this number refers to.
After some thought, one might conceive of a one dimensional world as a
line cutting across the room and extending out into space, infinitely far in
both direction, whatever that means. However it might be more like a big
ball of string with our yardstick W trapped inside the string. Remember,
though, we would have to know (or think we know) how to slide along one
arrow onto and past another around the bends of the string, and how to
know when one displacement is equivalent to another at a different location
along the string.
This world is worth thinking about, but there are more interesting cases.
Let’s consider the situation that we would be in if we had found there to
be more than one direction in our world. If there are displacements that are
not multiples of W , pick one. Call it V .
We can experiment with displacement after displacement trapped on our
world. We might find that every one of them is a combination of these two
chosen displacements. After a while we would become convinced that any
displacement A in this world can be realized as
A = a1 W + a2 V
for certain real numbers a1 and a2 .
Because one arrow-vector is not enough to describe all displacements as
we explore our world and two arrow-vectors are, we say our world has two
dimensions. Our world seems like a tabletop, or a piece of paper.
In this case we associate the arrow-vector A = a1 W + a2 V with the point
a1 e1 + a2 e2 = (a1 , a2 ) in R2 .
Again, to understand what an ordered pair means physically on the tabletop or on the paper we must have W and V in hand, identified. And we
have to know enough about our world so we think we understand when
two displacements at different locations are equivalent, how to combine displacements, and we must satisfy ourselves that these displacements scale and
combine the way they should. Displacements trapped on an “unbounded”
sheet of paper, even if it is bent, could be made to work. Displacements
sliding around trapped on the surface of a sphere could not. But you would
find that out, in the end. Experiments would produce anomalies8 and you
would realize that your assumptions lead to contradiction. The way these
contradictions reveal themselves could provide very interesting information
about the way the world really is.
Now let’s suppose we find that there are at least three directions we can
go, by discovering a displacement which cannot be written in terms of W and
V . If U is an example of a displacement like this, and if every displacement
can be written as a sum
a1 W + a2 V + a3 U
then we call our world three dimensional, and associate the displacement
A = a1 W + a2 V + a3 U with
a1 e1 + a2 e2 + a3 e3 = (a1 , a2 , a3 ) in R3 .
It is often far easier to do calculations in Rn than to work with arrow-vectors directly. To make sense of answers computed in the model, R3, we
must keep in mind the ordered list of guide arrow-vectors in our world that
give the ordered triples meaning.
6. ✸ Position Vectors.
Vector operations are so efficient9 that sometimes we want to use them to
help us define specific places. This is awkward, because vectors don’t have
specific locations. To do this an agreement is required. We simply have
to agree on a “base point” for our vectors. With this agreement we can
8For instance when confined to the surface of a sphere a certain multiple of each displacement is the zero displacement. This presents problems.
9Sections marked with the diamond symbol ✸, such as this one, consist of philosophical
remarks, advice and points of view. They can be skipped on first reading, or altogether.
create new things called position vectors. Using a vector this way means
you must always use the version of the vector, thought of as an arrow, with
its tail at this specified base point. Its nose then rests on the location you
intend to identify. This agreement, a known choice of base point, is part of
the definition of “position vector” and must be specified explicitly or you
will not know which point is intended.
In the case of our visual assistants, the “arrow-vectors,” using an arrow
that had its tail at some particular place in our world might have been
convenient to help us calculate if we decide on coordinates. With position
vectors, using this copy is required, whether we represent our arrow-vectors
in Rn or not.
A position vector is not a vector because the location of its tail really does
matter. If we decide on a different base point, the position vector to the
same point will differ from the earlier position vector for two reasons: first,
the base point is different so they are different arrows. But even if we forget
that, they will not have the same magnitude and direction.
Position vectors (with the same base point) cannot be added to each other
to yield a position vector or anything else: the tail of one of these position
vectors cannot be at the nose of the other, to use the customary construction
of an arrow-vector sum. If you “formally add arrows” by moving one position
vector so that its tail is at the nose of the other to attempt this addition
you will see that you end up at points corresponding to different places
in the world depending on your base point. The same is true with scalar
multiplication: if you stretch a position vector by a factor of 2 you end up
at different places depending on the base point used.
6.1. Exercise. Draw pictures in a plane to convince yourself of the two
assertions in the previous paragraph.
If we do decide on a base point then the difference between two position
vectors with this base point is a vector. That is because, had we picked
a different base point, the difference between the two new position vectors
would be the same. The difference between starting and ending position
defines a displacement. That displacement is a vector even if the start and
end position vectors are not.
And the sum of any vector added to a position vector can be sensibly
interpreted as a position vector with the same base point.
6.2. Exercise. Draw pictures in a plane to convince yourself of the statements in the last two paragraphs.
There are two reasons, I think, that people have trouble grasping the
difference between vectors and position vectors. First, the situations where
you must be aware of the distinction or you will inadvertently do the wrong
calculation don’t come up that often in the early examples. One can do the
correct calculation by accident. Second, it is customary to start working
with entries rather than pictures of arrows from the very start, so a base
point has already been selected by someone, even if the student is not made
aware of the choice, and the presumption is that it will not change.
Let’s examine a situation where you might be tempted to perform undefined operations and see how to keep it all straight.
Let QA denote the position vector of point Q using base point A. It is
a specific arrow, with tail at A and nose at Q. Let QB denote the position
vector using base point B. We will call AA the base position vector at
A and BB the base position vector at B. Both are, of course, the zero
position vectors. However they denote different locations.
Given some position P, let’s say we want to move twice as far from A
along the line from A to P and then move from that spot by displacement
vector D. We will call the final position Q and we want a position vector
for Q.
Using base point A we have
PA = AA + (PA − AA),
the sum of a position vector and a displacement vector. That displacement
does not depend on the base point: PA − AA = PB − AB.
The first part of the instruction to locate Q does not imply that we stretch
PA by factor 2. Instead, the intent is to multiply the displacement vector
PA−AA by 2. It is only vectors that can be multiplied by scalars to produce
a result that is independent of choice of base point. Then we add D.
With base point A, the position vector we intended to describe is then
QA = AA + 2(PA − AA) + D.
Since AA is the zero position vector, if we just examine coordinates it
seems like we are doing nothing. But by writing the position vector for P in
this way, we can translate the intended operations to a position vector with
a new base point effortlessly:
QB = AB + 2(PB − AB) + D
= BB + (AB − BB) + 2(PB − AB) + D.
AB is not, of course, the zero position vector in this second description.
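To see this in coordinates, assign (arbitrary) coordinates to A, B, P and D relative to some fixed frame for the world. The Python/NumPy sketch below computes QA and QB exactly as above and confirms that both descriptions point at the same location:

```python
import numpy as np

# coordinates of the points and the displacement, in some fixed frame
A = np.array([1.0, 2.0])
B = np.array([4.0, -1.0])
P = np.array([3.0, 5.0])
D = np.array([0.5, 0.5])    # a displacement vector

# position vectors of P relative to the two base points, and of A using base point B
PA = P - A
PB = P - B
AB = A - B

# Q described from base point A:  QA = AA + 2(PA - AA) + D, with AA = 0
QA = 2 * PA + D
# Q described from base point B:  QB = AB + 2(PB - AB) + D
QB = AB + 2 * (PB - AB) + D

# both descriptions pick out the same place in the world
print(A + QA)   # [5.5 8.5]
print(B + QB)   # [5.5 8.5]
```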
6.3. Exercise. Make selections of locations A, B, P and D in a plane and
see what I am describing above in that case.
7. Parametric and Point-Normal Equations for Lines and Planes.
We want to use vector operations to describe lines and planes in space.
Since these lines and planes are composed of points that have specific locations, we use position vectors to describe them: we use the copy of a vector
with its tail at a base point to “point” to the location we intend.
Implicitly, when we use members of R2 or R3 for position vectors we must
have agreed upon, in advance, a choice of base point (identified with the
origin in R2 or R3 ) and standard direction vectors (in a particular order) of
unit length to which the coordinates of these ordered pairs or triples refer.
In this section we assume that has been done, and identify vectors and also
position vectors with their coordinate representations.
Since we will not, in this section, make a notational distinction between
vectors and position vectors we must take extra care to make completely
clear which is which during calculations.
When there is mention of perpendicularity, we also assume that we have
chosen standard direction vectors to be perpendicular to each other and to
have the same length: i.e. we use the same scales on each coordinate axis10
so that perpendicularity in the world matches perpendicularity in Rn .
We say a vector lies in or lies along a line or plane if, whenever the tail
touches the line or plane the entire “shaft” of the vector, head to tail, is in
the line or plane.
We say a position vector p points to a line or plane if the nose rests on
a point of the line or plane when the tail of p is at the origin.
We say a nonzero vector n is normal to a line or plane if n · v = 0
whenever v lies in the plane or line.
Suppose a given position vector p points to a line in R2 and vector n is
normal to that line.
The point-normal form for the position vector x of points in the line in
R2 is
n · (x − p) = 0.
In other words, x is a position vector of a point on the line exactly when it
satisfies that equation.
Expanding, we have the usual standard equation you have all seen for a
line in the plane
n1 x1 + n2 x2 = n · p.
10Imagine what would happen to the coordinates of arrow-vectors that are perpendic-
ular in the world if we measured in centimeters on one axis and inches along another.
Suppose a given position vector p points to a plane in R3 and vector n is
normal to that plane.
The point-normal form for the position vector x of points in the plane in
R3 is
n · (x − p) = 0.
Again, this produces the usual standard equation for a plane in space
n1 x1 + n2 x2 + n3 x3 = n · p.
Parametric formulas for constant velocity motion and parametric formulas
for planes or “hyperplanes” in higher dimensions are given by:
Q(t) = p + tv
Q(s, t) = p + sw + tv
Q(r, s, t) = p + ru + sw + tv.
Here r, s and t are free parameters. They can be any real numbers. Q is
the position vector of a point in the object. u, w and v are vectors that lie
in the object.
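As an illustration of how these forms fit together, the sketch below (Python/NumPy, with three made-up points rather than those in the exercises) produces a normal vector with the cross product, writes down the point-normal data, and spot-checks that parametric points satisfy n · (x − p) = 0:

```python
import numpy as np

# three points assumed to determine a plane (they are not collinear)
p = np.array([1.0, 0.0, 0.0])
q = np.array([0.0, 2.0, 0.0])
r = np.array([0.0, 0.0, 3.0])

# two vectors lying in the plane
u = q - p
v = r - p

# a normal vector, via the cross product
n = np.cross(u, v)

# point-normal form: n . (x - p) = 0, that is, n . x = n . p
print("normal n =", n, "  n . p =", np.dot(n, p))

# parametric form: Q(s, t) = p + s*u + t*v; such points satisfy the point-normal equation
s, t = 2.0, -1.5
x = p + s * u + t * v
print(np.dot(n, x - p))   # 0 (up to rounding)
```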
7.1. Exercise. (i) Find a parametric equation for a line through (1, 6) and
(8, 9).
(ii) Find a point-normal form for this line.
(iii) Find a standard equation for this line.
7.2. Exercise. Find a parametric equation for a line through the points
(7, 8, 9) and (−2, −2, 4).
7.3. Exercise. (i) Find a point-normal equation for the plane through the
points (7, 8, 9) and (−2, −2, 4) and (−1, 1, 7).
(ii) Find a parametric equation for this plane.
(iii) Find a standard form for this plane.
7.4. Exercise. (i) Find a point-normal form for the plane 2x1 −3x2 +3x3 =
7.
(ii) Find a parametric equation for this plane.
8. Systems of Linear Equations.
We now set aside (for quite some time) our geometrical considerations
and descend into a quagmire of messy algebraic calculations. Once you
learn what is going on it is absolutely vital that you acquire methods of
dodging these complex calculations, offloading any that remain to a hardware assistant to the extent possible. You then must interpret the answer,
no small task in itself, but a proper job for a human. It is essentially impossible for us to perform the raw calculations we will be dealing with using
pencil and paper unless the problem has been carefully rigged to allow for
this. Artificial problems of this kind virtually never come up in practice.
Nature is just not that accommodating. So we have to use hardware.
We will learn in this class to express one underlying technique in
various ways and for various purposes. That single technique is, simply, to
find the solution(s) of a system of k linear equations in n unknowns
x1 , x2 , . . . , xn or know when that system has no solution.
$$
\begin{aligned}
m^1_{\ 1} x^1 + m^1_{\ 2} x^2 + \cdots + m^1_{\ n} x^n &= b^1 \\
m^2_{\ 1} x^1 + m^2_{\ 2} x^2 + \cdots + m^2_{\ n} x^n &= b^2 \\
\vdots \qquad\qquad &\ \ \vdots \\
m^k_{\ 1} x^1 + m^k_{\ 2} x^2 + \cdots + m^k_{\ n} x^n &= b^k
\end{aligned}
$$
If the system of equations is homogeneous—that is, all the bi are zero—
the system has one obvious solution: namely choose all xi to be zero too.
But there might be other solutions, and if some of the bi are nonzero there
might be no solution at all.
From basic algebra we know how to solve such a system by laborious
application of elementary operations and how to express a solution set
in parametric form when there are many solutions.
These operations come in three types: First, you can multiply one equation by any constant and add it to a different equation. Second, you can
multiply an equation by any nonzero constant. Third, you can switch the
order of two equations. We will call these operations of type one, two
and three11, respectively. The solution to a system of equations is not
changed if you apply any operation of these three types to the system. And
any system can be solved (in principle) by applying these three operations
by hand to the system of equations, eliminating variables followed by back-substitution.
If, in the course of solving a system, you produce an equation of the form
“0 = c” for some nonzero c then the system has no solution and is called
inconsistent.
11An operation of type three can be produced by a combination of three operations of
type one plus one operation (multiply by −1) of type two.
If, in the course of solving the system some of the equations turn into
“0 = 0” and there are n remaining equations of the form
$$
\begin{aligned}
x^1 + w^1_{\ 2} x^2 + \cdots + w^1_{\ n} x^n &= c^1 \\
x^2 + \cdots + w^2_{\ n} x^n &= c^2 \\
&\ \ \vdots \\
x^n &= c^n
\end{aligned}
$$
where each equation has one fewer variable than the one above then there
is a unique solution, choosing xn = cn and employing back-substitution to
obtain the rest of the xi .
If, finally, the system is consistent (not inconsistent) and you end up
with fewer nontrivial equations than unknowns the system can be reduced
to one in which neighboring descending pairs of equations look like
$$
\begin{aligned}
x^t + w^s_{\ t+1} x^{t+1} + \cdots + w^s_{\ j} x^j + \cdots + w^s_{\ n} x^n &= c^s \\
x^j + \cdots + w^{s+1}_{\ \ n} x^n &= c^{s+1}.
\end{aligned}
$$
In this case some of the variables are the first nonzero term in their
row and have a 1 coefficient. These are called pivot variables. When you
solve for the pivot variables, starting from the bottom and employing back-substitution as you move up, the other variables become free parameters
and there are an infinite number of solutions obtained by arbitrary choices
of these free parameter values. We note in particular that a homogeneous
system with fewer equations than variables always has (infinitely many)
nonzero solutions.
Just to have it on record, we note that if there are k pivots in this last
case then there will be n − k free parameters.
These are the only three possibilities.
This is all very awkward to do, or even to think about, and the effort to
carry this out scales badly as the size of the system grows. But the work is
very mechanical: multiplying and adding. Just the thing machinery is good
at!
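For instance, when the system happens to be square with exactly one solution, a hardware assistant dispatches it immediately. The Python/NumPy sketch below is one way to do this; systems with no solution or with infinitely many need the row-reduction machinery developed in the following sections:

```python
import numpy as np

# the system   x1 + 2*x2 = 5
#            3*x1 -   x2 = 1
M = np.array([[1.0, 2.0],
              [3.0, -1.0]])
b = np.array([5.0, 1.0])

x = np.linalg.solve(M, b)   # works when M is square and invertible
print(x)                    # [1. 2.]

# always check: the residual should be (essentially) zero
print(M @ x - b)
```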
8.1. Exercise. For each of the three systems below, solve the system and
present your answer in parametric form, or as a single solution, or assert
that no solution exists, whichever is appropriate.
First System:
3x1 + 2x2 + 5x3 = 1
7x1 − 6x2 + x3 = 9
10x1 − 4x2 + 6x3 = 10
Second System:
3x1 + 2x2 + 5x3 = 1
7x1 − 6x2 + x3 = 9
10x1 − 4x2 + 6x3 = 8
Third System:
3x1 + 2x2 + 5x3 = 1
7x1 − 6x2 + x3 = 9
3x1 − 9x2 + 2x3 = 5
9. Einstein Summation Convention.
In this note we often use the Einstein Summation Convention: a (possibly
long) sum a1 s1 +· · ·+an sn can be written simply as ai si , with the summation
over all possible values of i understood. The convention can be used to
compress a sum of indexed products where within each product the indices
are repeated exactly once.
The symbol i could be replaced by any (unused) symbol and the same
summation would be meant. The index of summation is sometimes called a
“dummy index.”
When subscripts or superscripts over which summation is not being taken
appear, we assume one instance of the sum is indicated for each possible
value of that index.
We list below some examples.
$$a^1 b_1 + a^2 b_2 \;\Leftrightarrow\; \text{either } a^i b_i \text{ or } a^k b_k$$
$$a^1 b_1 + a^2 b_2 + c^1 d_1 + c^2 d_2 \;\Leftrightarrow\; \text{either } a^i b_i + c^i d_i \text{ or } a^i b_i + c^k d_k$$
$$a^1 b^1_{\,1} + a^2 b^1_{\,2} \ \text{ and } \ a^1 b^2_{\,1} + a^2 b^2_{\,2} \;\Leftrightarrow\; a^i b^k_{\,i} \quad (k = 1 \text{ or } 2 \text{ implied})$$
$$a^1 b^1_{\,1} = 3 \ \text{ and } \ a^2 b^2_{\,2} = 3 \;\Leftrightarrow\; a^i b^i_{\,i} = 3 \quad (\text{Not a sum: } i = 1 \text{ or } 2 \text{ implied})$$
Einstein invented this notation so that messy sums (of exactly the kind
we will be working with throughout this course) will be easier to look at,
and it can be a bit shocking to see how much this convention cleans up a
discussion.
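For readers who also compute: NumPy's einsum function mirrors the convention directly, summing over any index that is repeated in its subscript string and leaving the other indices free. A small sketch with arbitrary numbers:

```python
import numpy as np

a = np.array([2.0, 3.0])          # a_i
c = np.array([7.0, 8.0])          # c_i
b = np.array([[1.0, 4.0],         # b[k, i], one row for each value of the free index k
              [5.0, 6.0]])

print(np.einsum('i,i->', a, c))      # a_i c_i = 2*7 + 3*8 = 38.0
print(np.einsum('i,ki->k', a, b))    # a_i b_{k i}: i summed, k free -> [14., 28.]
```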
10. Matrices.
A matrix is a rectangular arrangement of numbers, organized into horizontal rows and vertical columns. The size of a matrix is given by a pair
m × n where m represents the number of rows and n is the number of
columns. The number of rows is always given first. So the members of Rn
we were working with before are n × 1 matrices.
The numbers inside a matrix are called entries and the location of an
entry is given by specifying its row and column, counting from the upper
left corner. Usually, the entries of a generic matrix will be given by lower
case letters corresponding to the name of the matrix, with subscripts or
superscripts indicating row or column. Thus:
$$M = (m^i_{\,j}) = \begin{pmatrix} m^1_{\,1} & m^1_{\,2} & \dots & m^1_{\,n} \\ m^2_{\,1} & m^2_{\,2} & \dots & m^2_{\,n} \\ \vdots & \vdots & & \vdots \\ m^k_{\,1} & m^k_{\,2} & \dots & m^k_{\,n} \end{pmatrix} \qquad\text{and}\qquad x = (x^i) = \begin{pmatrix} x^1 \\ x^2 \\ \vdots \\ x^k \end{pmatrix}.$$
In notation for entries of a matrix, if there are two superscripts or two
subscripts, the first number usually refers to the row number of the entry.
If there is one index as a superscript and one subscript, the superscript is
(almost always) the row number of the entry.
Two matrices are said to be equal if they are the same size and all entries
are equal.
Matrices of the same size can be added by adding corresponding entries.
This operation is called, oddly, matrix addition. There is a matrix of each
size filled with zeroes that acts as an additive identity matrix. It is called
the zero matrix. It is always denoted by “0” even though matrices of
different sizes should probably not be denoted by the same symbol. As with
zero vectors, this ambiguity doesn’t seem to cause much problem. Context
determines the shape.
Any matrix can be multiplied by a number by multiplying every entry in
the matrix by that number. This operation is called scalar multiplication.
Matrices of certain sizes can be multiplied by each other in a certain
order in an operation called matrix multiplication. Specifically, an n × m
matrix on the left can be multiplied by an m × k matrix on the right. The
number of columns of the left matrix must equal the number of rows of the
right one. The product matrix will be n × k.
If $A = (a^i_{\,j})$ is $m \times n$ and $B = (b^i_{\,j})$ is $n \times k$ then the product matrix $C = (c^i_{\,j}) = AB$ is defined to be the $m \times k$ matrix whose entries are
$$c^i_{\,j} = a^i_{\,t}\, b^t_{\,j}.$$
Note the Einstein summation convention in action. This is mk different
equations, one for each row-column combination of the entries of C. On the
right side of the equation the index of summation is t. So you are multiplying
entries of A against entries of B and adding. You move across the ith row
of A, and down the jth column of B, adding these numerical products as
you move along.
Matrix multiplication is not commutative. For one thing, if AB is defined
there is no reason for BA to be defined. This will only happen if A and B
are m × n and n × m, respectively. But even then, and even if m = n, it is
not typical for these matrices to commute with matrix multiplication.
It is quite easy to show for matrices A, B, M and N and number c that
A + B = B + A        A(M + N) = AM + AN        c(MN) = (cM)N = M(cN)
(M + N)A = MA + NA        (A + B) + M = A + (B + M)
whenever these products and sums are defined.
It is a bit messier to show that matrix multiplication is associative: $(AB)C = A(BC)$. To see this let $AB = D$, so $d^i_{\,j} = a^i_{\,t} b^t_{\,j}$, and let $BC = M$, so that $m^u_{\,v} = b^u_{\,s} c^s_{\,v}$. So the $i$th row, $j$th column entry of $(AB)C$ is
$$d^i_{\,w} c^w_{\,j} = a^i_{\,t} b^t_{\,w} c^w_{\,j}$$
while the $i$th row, $j$th column entry of $A(BC)$ is
$$a^i_{\,w} m^w_{\,j} = a^i_{\,w} b^w_{\,s} c^s_{\,j}.$$
Since these entries are all the same (except for an exchange of dummy indices) the two matrices are equal.
The number of steps (individual multiplications and additions) needed to multiply an m × k matrix by a k × n matrix is exactly mn(2k − 1). We will describe this situation by saying the work required to do this task is "of the order mnk," neglecting constant factors and additive terms of lower-than-maximum degree.
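As a sketch only (real libraries use far more careful code), the entry formula translates directly into a triple loop, and the mn(2k − 1) count is visible in it: each of the mn entries costs k multiplications and k − 1 additions. The helper name below is just for illustration.

```python
import numpy as np

def matmul(A, B):
    """Multiply an m x k matrix A by a k x n matrix B, entry by entry."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "columns of A must match rows of B"
    C = np.zeros((m, n))
    for i in range(m):           # row of C
        for j in range(n):       # column of C
            # move across row i of A and down column j of B, adding the products
            C[i, j] = sum(A[i, t] * B[t, j] for t in range(k))
    return C

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
B = np.array([[7.0, 8.0],
              [9.0, 1.0],
              [2.0, 3.0]])
print(matmul(A, B))
print(np.allclose(matmul(A, B), A @ B))   # agrees with NumPy's built-in product
```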
10.1. Exercise. $A = \begin{pmatrix} x & 3 & 1 \\ 2 & 7 & T \end{pmatrix}$ and $B = \begin{pmatrix} 6 & 3 \\ 2 & 7 \\ 8 & 1 \end{pmatrix}$. Find $AB$.

10.2. Exercise. $A = \begin{pmatrix} 7 & 3 & 1 \end{pmatrix}$ and $B = \begin{pmatrix} 6 \\ 2 \\ 8 \end{pmatrix}$. Find $AB$ and $BA$.
11. The Identity and Elementary Matrices.
There are several important square matrices which we now discuss, along
with a small shopping cart of operations on matrices.
First is the n × n multiplicative identity matrix In . It is the one and
only matrix that satisfies
In M = M
and
AIn = A
whenever A is any m × n matrix and M is any n × k matrix. (Can you prove
there is at most one matrix like this for each n?)
It is the matrix
$$I_n = \begin{pmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{pmatrix}.$$
Any matrix has what is called a main diagonal: the ordered list of
entries with equal row and column number. They are called the diagonal
entries of the matrix.
The identity matrix is a square matrix with ones along the main diagonal
and zeroes off the main diagonal.
The entries in the identity matrix are denoted δi j . So if i = j the value
of δi j = 1. Otherwise δi j = 0. This function (of its subscripts) has several
uses, and is called the Kronecker delta function.
If A is any n × n matrix and if there is another n × n matrix B with
AB = BA = In then B is called the inverse of A, denoted A−1 . It is easy
to show that if A has an inverse at all then it has just one inverse: inverses
are unique.
A matrix with an inverse is called invertible or nonsingular. Matrices
without inverses are called singular.
One of the important early tasks will be to find inverses, or know when
they do not exist.
We note the following useful fact: If A1 , . . . , Aj is a list of two or more
n × n matrices which all have inverses then the product A1 · · · Aj has an
inverse and
$$(A_1 \cdots A_j)^{-1} = A_j^{-1} \cdots A_1^{-1}.$$
In other words, under these conditions the inverse of the product is the
product of the inverses of the factors in reverse order.
Finally, we come to two types of simple square matrices that do have
inverses, the elementary row matrices of the first, second and third kinds.
An elementary row matrix of the first kind is obtained when you
take the identity matrix and add a multiple of one of its rows to another.
An elementary row matrix of the second kind is obtained when you
take the identity matrix and multiply one of its rows by a nonzero constant.
An elementary row matrix of the third kind is obtained when you
take the identity matrix and switch two rows.
Elementary row matrices of the third kind, and all products of such matrices, have only one nonzero entry in each row, and only one nonzero entry
in each column, and those nonzero entries are all 1. Matrices with this
property are called permutation matrices.
11.1. Exercise. Prove that the product of two permutation matrices is also
a permutation matrix. Any permutation matrix is the product of elementary
row matrices of the third kind.
If you multiply any elementary row matrix S by any compatible matrix M ,
placing S on the left, the result will be as if you had performed the operation
that created S on M . For instance, left multiplication by a permutation
matrix re-orders the rows of M .
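A quick illustration of this in NumPy (sizes and numbers are arbitrary): build one matrix of each kind by performing the operation on the identity, then check that left multiplication does that same operation to any compatible M.

```python
import numpy as np

I = np.eye(3)

E1 = I.copy(); E1[1, :] += 2 * E1[2, :]   # first kind: add 2 times row 3 to row 2
E2 = I.copy(); E2[0, 0] = 5.0             # second kind: multiply row 1 by 5
E3 = I[[2, 1, 0], :]                      # third kind: switch rows 1 and 3

M = np.arange(12.0).reshape(3, 4)         # any 3 x 4 matrix

print(E1 @ M)   # row 2 of M now has 2 times row 3 of M added to it
print(E2 @ M)   # row 1 of M has been multiplied by 5
print(E3 @ M)   # rows 1 and 3 of M have been switched
```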
11.2. Exercise. (i) Find a 3 × 3 matrix T that, when multiplied on the left
of matrix M , as T M , will add 2 times the third row of M to the second row
of M , and leave the first and third rows of M alone.
(ii) Find a 3 × 3 matrix S that, when multiplied on the left of matrix M ,
as SM , will multiply the first row by 2 and leave the second and third rows
of M alone.
(iii) Find a 3 × 3 matrix U that, when multiplied on the left of matrix
M , as U M , re-orders the rows of M so that the first row is sent to the third
row, the third row is sent to the second row and the second row is sent to
the first row.
(iv) What are the inverses of S, T and U ?
(v) Describe in words the inverse matrices for elementary row matrices
of the three kinds.
11.3. Exercise. $A^{-1} = \begin{pmatrix} 5 & T \\ W & 7 \end{pmatrix}$ and $B^{-1} = \begin{pmatrix} 6 & 3 \\ 2 & 7 \end{pmatrix}$. Find $(AB)^{-1}$.
12. Special Matrices.
Matrices can sometimes profitably be thought of as being composed of "blocks" or "submatrices." Below we have
$$M = (m^i_{\,j}) = \begin{pmatrix} m^1_{\,1} & m^1_{\,2} & \dots & m^1_{\,n} \\ m^2_{\,1} & m^2_{\,2} & \dots & m^2_{\,n} \\ \vdots & \vdots & & \vdots \\ m^k_{\,1} & m^k_{\,2} & \dots & m^k_{\,n} \end{pmatrix} = \begin{pmatrix} R^1 \\ R^2 \\ \vdots \\ R^k \end{pmatrix} = \begin{pmatrix} C_1 & C_2 & \dots & C_n \end{pmatrix}$$
where the Ri are the rows of M and the Ci are the columns of M .
For instance for numbers $x_i$ and $y^i$ you could write the matrix products
$$\begin{pmatrix} x_1 & x_2 & \dots & x_k \end{pmatrix} \begin{pmatrix} R^1 \\ R^2 \\ \vdots \\ R^k \end{pmatrix} = x_i R^i
\qquad\text{or}\qquad
\begin{pmatrix} C_1 & C_2 & \dots & C_n \end{pmatrix} \begin{pmatrix} y^1 \\ y^2 \\ \vdots \\ y^n \end{pmatrix} = y^i C_i.$$
So the sum on the left could be called a “weighted” sum of the rows Ri and
the sum on the right a “weighted sum” of the columns Ci . The “weights”
xi and y i are a measure of how significant each row or column is in the final
sum.
There is a matrix operation called “transpose” that can be applied to
any matrix. It takes a matrix and switches the row and column designations
of the entries. The rows become columns, and the columns become rows.
If M is an m × k matrix then its transpose, denoted MT , is a k × m
matrix. If $B = M^T$ then $b^i_{\,j} = m^j_{\,i}$.
$$\begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 1 & 5 & 7 \end{pmatrix}^T = \begin{pmatrix} 1 & 4 \\ 2 & 1 \\ 3 & 5 \\ 4 & 7 \end{pmatrix}.$$
If v and w are vectors in Rn then the dot product v · w can be represented
as the matrix product v T w.
Transpose is one of those things that is easier to explain by example than
by formula.
It is obvious that $(M^T)^T = M$. It is also a pretty easy calculation (a test to see if you can use the Einstein summation notation) to show that
$$(AB)^T = B^T A^T.$$
So similarly to the situation with inverses, the transpose of a product is the
product of transposes in reverse order.

This implies that if $A$ has an inverse then so does $A^T$ and
$$(A^T)^{-1} = (A^{-1})^T.$$
An elementary column matrix is the transpose of an elementary row
matrix. They act on the right rather than the left to add a multiple of a
column of a compatible matrix to a different column, or to multiply a column
by a nonzero constant, or to switch two columns. If you wish, you can justify this by examining $\left(R\,A^T\right)^T$ where $R$ is an elementary row operation.
A matrix is called symmetric if it is its own transpose: M T = M . Such
a matrix must be square. Diagonal matrices are matrices having zeroes
off the main diagonal. A square diagonal matrix is symmetric.
A matrix is called skew symmetric if it is the negative of its own transpose: M T = −M . A matrix like this must be square and have only zeroes
on the main diagonal.
A matrix that has zeroes below the main diagonal is called upper triangular, and a matrix that has zeroes above the main diagonal is called
lower triangular. A matrix that is both upper and lower triangular is, of
course, diagonal.
12.1. Exercise. The product of two (compatible) upper triangular square
matrices is upper triangular. The product of two (compatible) lower triangular square matrices is lower triangular.
There is an important group of invertible matrices called orthogonal
matrices. They have the property that their inverse is their transpose:
that is, P −1 = P T . If you think of the columns of such a matrix as vectors,
the equation P T P = In implies that these vectors are perpendicular to each
other, and each has length 1. The same fact is true of the n rows of an
orthogonal matrix, stood up as columns.
12.2. Exercise. (i) The product of two orthogonal matrices (of the same
size) is itself orthogonal.
(ii) Any permutation matrix is orthogonal.
We will find occasion later in these notes to consider linear combinations
of powers of a square matrix. For an n × n matrix A we define for positive
exponent k the matrix Ak to be the obvious product of A by itself k times.
We define A0 to be the n × n identity matrix.
Given polynomial f (t) = ck tk + ck−1 tk−1 + · · · + c1 t + c0 we then define
f (A) by
f (A) = ck Ak + ck−1 Ak−1 + · · · + c1 A + c0 In .
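A minimal sketch of evaluating such a matrix polynomial with NumPy; the coefficients, the matrix and the helper name are arbitrary illustrations.

```python
import numpy as np

def poly_of_matrix(coeffs, A):
    """Evaluate c_0 I + c_1 A + ... + c_k A^k, with coeffs listed low degree first."""
    n = A.shape[0]
    result = np.zeros((n, n))
    power = np.eye(n)            # A^0 is the identity
    for c in coeffs:
        result += c * power
        power = power @ A        # next power of A
    return result

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
print(poly_of_matrix([3.0, -4.0, 1.0], A))   # f(t) = t^2 - 4t + 3 evaluated at A
```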
Last on the list of things we need to define here is trace. That is a
function that takes a square matrix and returns the sum of the diagonal
entries of that matrix.


For instance $\operatorname{trace}\begin{pmatrix} 7 & 6 & 2 \\ 9 & 3 & 0 \\ 5 & 3 & 0 \end{pmatrix} = 10.$
12.3. Exercise. (i) $A = \begin{pmatrix} 5 & T \\ W & 7 \\ 5 & 6 \end{pmatrix}$. Find $A^T$.
(ii) Two of the following matrices must be equal, assuming the matrices
are compatible sizes. Circle these two.
(AB)T
AT B T
B T AT .


12.4. Exercise. $A = \begin{pmatrix} x & y & z \\ 2/3 & 2/3 & 1/3 \\ 2/3 & -1/3 & -2/3 \end{pmatrix}$. Find $x$, $y$ and $z$ so that $A$ is an orthogonal matrix. Then, without doing any further calculation, produce $A^{-1}$.
12.5. Exercise. Find $f(A)$ where $f$ is the polynomial $f(t) = 3t^3 - 2t + 7$ and $A$ is the square matrix
$$A = \begin{pmatrix} 5 & 7 & 2 \\ 7 & 7 & 1 \\ 1 & 5 & 6 \end{pmatrix}.$$
12.6. Exercise. Find all possible orthogonal matrices of the form
$$A = \begin{pmatrix} 3/5 & a \\ b & c \end{pmatrix}.$$
12.7. Exercise. Find $\operatorname{trace}(C^T C)$ where $C = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}$.
13. Row Reduction.
Given any matrix
$$M = (m^i_{\,j}) = \begin{pmatrix} m^1_{\,1} & m^1_{\,2} & \dots & m^1_{\,n} \\ m^2_{\,1} & m^2_{\,2} & \dots & m^2_{\,n} \\ \vdots & \vdots & & \vdots \\ m^k_{\,1} & m^k_{\,2} & \dots & m^k_{\,n} \end{pmatrix}$$
we want to perform elementary row operations on this matrix by multiplying
it on the left by elementary row matrices, converting it into simpler form.
We will be interested in two special forms here.
The first of these is called row echelon form.
In any matrix, we will call the first nonzero entry (counted from the left)
the “leading coefficient” of the row it is in, and the zero entries to the left
of a leading coefficient will be called “leading zeroes” of that row.
A matrix is in row echelon form if any rows without leading coefficient
(that is, a row of zeroes) are on the bottom and each row after the first has
more "leading zeroes" than the row above it.
Any matrix can be “reduced” to this form by consecutive left-multiplications
by elementary row matrices of type one only.
For many purposes, row reduced echelon form (shorthand rref ) is
more useful. A matrix is in that form if it is in row echelon form and each
leading coefficient is 1 and every leading coefficient is the only nonzero entry
in it’s column. These are called pivot columns.
The transformation to row reduced echelon form can be carried out by
consecutive left-multiplications by elementary row matrices of type one to
create a row reduced matrix with one nonzero entry in each pivot column,
followed by elementary row matrices of type two to clean up the pivots.
The row reduced echelon form of a matrix A is denoted rref (A).
The total number of operations (additions or multiplications) to carry
this out is on the order of $nk^2$ or $n^2k$, whichever is bigger.
A proof of these facts using induction on the number of columns in the
original matrix is not too hard to produce.
By hand, this procedure is arduous for 4 × 4 matrices and essentially
impossible for a human if the matrix is much larger. But these calculations
are quite manageable for any calculator.
There is a way to keep track of the steps taken by your calculator as it
produces the rref of a k × n matrix M .
We create a k×(n+k) block matrix ( M Ik ) and reduce this to rref. The
result will be a k × (n + k) block matrix ( S P ) where S is the rref for M .
The product of the elementary matrices used to accomplish the reduction
will accumulate in the right half of this block matrix, as the invertible k × k
matrix P .
P is the matrix for which P M = S, the rref for M .
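Here is a small sketch of the same bookkeeping using SymPy's exact rref; the matrix entries are arbitrary. The right block of the reduced block matrix is the accumulated product P, and P M is the rref of M.

```python
import sympy as sp

M = sp.Matrix([[1, 2, 3],
               [2, 4, 7],
               [1, 1, 1]])

block = M.row_join(sp.eye(3))   # the block matrix ( M  I )
R, pivots = block.rref()

S = R[:, :3]    # rref of M
P = R[:, 3:]    # product of the elementary matrices used in the reduction
print(S)
print(P)
print(P * M == S)   # True
```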
Let’s look at a special case. If k = n and if the rref of ( M In ) has In
as the left block then the right block is M −1 . To reiterate: If the reduction
process takes j steps using elementary row matrices $L_1, \dots, L_j$ and if
$$L_j L_{j-1} \cdots L_2 L_1 \,(\, M \;\; I_n \,) = (\, L_j L_{j-1} \cdots L_2 L_1 M \;\;\; L_j L_{j-1} \cdots L_2 L_1 I_n \,) = (\, I_n \;\;\; L_j L_{j-1} \cdots L_2 L_1 \,)$$
then $L_j L_{j-1} \cdots L_2 L_1$ is $M^{-1}$.
We will argue now that if M has an inverse then the rref for M must be
the identity. If not, then the rref would have a row of zeroes on the bottom
in the first block. But then
(0 0 . . . 0 0) = (0 0 . . . 0 1)Lj Lj−1 · · · L2 L1 M.
The matrix A = Lj Lj−1 · · · L2 L1 M is the product of invertible matrices so
is itself invertible. This leads to the contradiction
(0 0 . . . 0 0) = (0 0 . . . 0 0) A−1 = (0 0 . . . 0 1) AA−1
= (0 0 . . . 0 1) In = (0 0 . . . 0 1).
So we have the important conclusion that M is invertible if and only if
its rref is the identity matrix and in that case both M and M −1 are the
product of elementary matrices.
The reduction process indicated above for an n × n matrix takes on the
order of n3 steps, the same order (surprisingly) as the number of steps needed
to multiply two n × n matrices.
13.1. Exercise.
$$A = \begin{pmatrix} 1 & 4 & 7 \\ 6 & 1 & 9 \\ 0 & 4 & -3 \end{pmatrix}$$
Find the inverse of A using row reduction on the block matrix $(\, A \;\; I_3 \,)$. (You should be aware that what you are doing is left-multiplying by elementary row matrices, which accumulate in the right block.)
13.2. Exercise. The inverse of an invertible lower triangular matrix is lower
triangular. The inverse of an invertible upper triangular matrix is upper
triangular.
14. Matrix Form For Systems of Linear Equations.
The system of equations
$$\begin{aligned}
m^1_{\,1}x^1 + m^1_{\,2}x^2 + \cdots + m^1_{\,n}x^n &= b^1 \\
m^2_{\,1}x^1 + m^2_{\,2}x^2 + \cdots + m^2_{\,n}x^n &= b^2 \\
&\;\;\vdots \\
m^k_{\,1}x^1 + m^k_{\,2}x^2 + \cdots + m^k_{\,n}x^n &= b^k
\end{aligned}$$
can be converted to matrix form
$$Mx = b \iff \begin{pmatrix} m^1_{\,1} & m^1_{\,2} & \dots & m^1_{\,n} \\ m^2_{\,1} & m^2_{\,2} & \dots & m^2_{\,n} \\ \vdots & \vdots & & \vdots \\ m^k_{\,1} & m^k_{\,2} & \dots & m^k_{\,n} \end{pmatrix} \begin{pmatrix} x^1 \\ x^2 \\ \vdots \\ x^n \end{pmatrix} = \begin{pmatrix} b^1 \\ b^2 \\ \vdots \\ b^k \end{pmatrix}.$$
Finding the column matrix x that makes the matrix equation true is
exactly the same as solving the system above it for x1 , . . . , xn .
We also come to an interesting fact, thinking of M as a block matrix
with columns C1 , . . . , Cn . There is a solution to the matrix equation above
exactly when b is a weighted sum of the columns of M .
Mx = (C1 C2 · · · Cn ) x = x1 C1 + x2 C2 + · · · + xn Cn = b.
The numbers xi we are looking for are the weights on the columns.
If k = n so the matrix M is square and if M has an inverse we can find
the unique solution as x = M−1 b. In applications it often happens that
the entries in M are determined by fixed elements of the problem, but the
“target” b varies. Using this method, you can recycle the work needed to
create M−1 : find it once and then apply it to various columns b as required.
If M−1 = (H1 H2 · · · Hn ) then
x = (H1 H2 · · · Hn ) b = b1 H1 + b2 H2 + · · · + bn Hn .
So the solution for any b is given as a weighted sum of the columns of M−1 ,
with the entries of b as weights.
But if M has no inverse we can’t go this route.
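A short sketch of this recycling idea with NumPy (the matrix and the targets are arbitrary). In serious numerical work one usually stores a factorization rather than an explicit inverse, but the point, reusing the expensive part of the work, is the same.

```python
import numpy as np

M = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 1.0]])

Minv = np.linalg.inv(M)          # the reusable work, computed once

for b in (np.array([1.0, 2.0, 3.0]), np.array([0.0, 1.0, 0.0])):
    x = Minv @ b                 # a weighted sum of the columns of Minv, weights from b
    print(x, np.allclose(M @ x, b))
```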
The augmented matrix for the system is:
$$\begin{pmatrix} m^1_{\,1} & m^1_{\,2} & \dots & m^1_{\,n} & b^1 \\ m^2_{\,1} & m^2_{\,2} & \dots & m^2_{\,n} & b^2 \\ \vdots & \vdots & & \vdots & \vdots \\ m^k_{\,1} & m^k_{\,2} & \dots & m^k_{\,n} & b^k \end{pmatrix}.$$
The steps needed to solve the system correspond, in matrix language, to
left multiplying the augmented matrix by various elementary matrices until
the product matrix is in rref.
If any row in this rref matrix is of the form 0 0 . . . 0 k where k is nonzero
then that row corresponds to the equation 0 = k so there is no solution: the
original system was inconsistent.
Any rows without pivots (all-zero rows) correspond to the comforting but
uninformative equation 0 = 0 and we can ignore them.
This leaves us to consider rref matrices of the form
$$\begin{pmatrix}
0 & \cdots & 1 & w^1_{\,i_1} & \cdots & 0 & w^1_{\,i_2} & \cdots & 0 & w^1_{\,i_k} & \cdots & c^1 \\
0 & \cdots & 0 & 0 & \cdots & 1 & w^2_{\,i_2} & \cdots & 0 & w^2_{\,i_k} & \cdots & c^2 \\
\vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & 1 & w^k_{\,i_k} & \cdots & c^k
\end{pmatrix}.$$
In the matrix above, the pivot columns have only one nonzero entry. The
1 indicated in a pivot column is the first nonzero entry in its row, and every
other entry in its column is zero, and every entry which is to the left and
below this 1 is zero.
We can now, if we wish, write down the equations to which this matrix corresponds.
$$\begin{aligned}
x^{i_1-1} + w^1_{\,i_1}x^{i_1} + \cdots &= c^1 \\
x^{i_2-1} + w^2_{\,i_2}x^{i_2} + \cdots &= c^2 \\
&\;\;\vdots \\
x^{i_k-1} + w^k_{\,i_k}x^{i_k} + \cdots &= c^k
\end{aligned}$$
The pivot variables xi1 −1 , xi2 −1 , . . . xik −1 occur only once each in the equations above and are the variables corresponding to the pivot columns. The
other variables are free parameters, and you can solve for the pivot variables
in terms of these.
14.1. Exercise. Linear systems can be readied for solution by matrix methods in two ways.
First, as a matrix equation:
Ax = b
where A is the coefficient matrix, x is the “variable” column, and b is the
“constant” column.
Second, by creating the augmented matrix
(A b).
Ready the three systems from Exercise 8.1 for solution by matrix methods
in these two ways.
15. Example Solutions.
This is awfully messy to look at, but there are ways of keeping things
organized. You should play with the two examples found below until you
understand everything about them: you must become expert at creating
and interpreting solutions like this. We will, repeatedly and for different
purposes, use both of the solution methods described.
For our first example, we will let A be the matrix
$$A = \begin{pmatrix} 1 & 3 & 0 & 5 & 7 & 0 & 5 \\ 1 & 3 & 1 & 11 & 9 & 0 & 12 \\ 0 & 0 & 1 & 6 & 2 & 1 & 16 \end{pmatrix}$$
which corresponds to the augmented matrix for the system of equations
$$\begin{aligned}
x^1 + 3x^2 + 5x^4 + 7x^5 &= 5 \\
x^1 + 3x^2 + x^3 + 11x^4 + 9x^5 &= 12 \\
x^3 + 6x^4 + 2x^5 + x^6 &= 16
\end{aligned}$$
Entering matrix A into your best-Linear-Algebra-friend (that is your calculator) and hitting it with the rref stick produces
$$\operatorname{rref}(A) = \begin{pmatrix} 1 & 3 & 0 & 5 & 7 & 0 & 5 \\ 0 & 0 & 1 & 6 & 2 & 0 & 7 \\ 0 & 0 & 0 & 0 & 0 & 1 & 9 \end{pmatrix}.$$
Define the $3 \times 6$ matrix $D$ and column matrices $x$, $p$, $C_2$, $C_4$, $C_5$ and $C_7$:
$$D = \begin{pmatrix} 1 & 3 & 0 & 5 & 7 & 0 \\ 0 & 0 & 1 & 6 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix} \qquad x = \begin{pmatrix} x^1 \\ x^2 \\ x^3 \\ x^4 \\ x^5 \\ x^6 \end{pmatrix} \qquad p = \begin{pmatrix} x^1 \\ x^3 \\ x^6 \end{pmatrix}$$
$$C_2 = \begin{pmatrix} 3 \\ 0 \\ 0 \end{pmatrix} \quad C_4 = \begin{pmatrix} 5 \\ 6 \\ 0 \end{pmatrix} \quad C_5 = \begin{pmatrix} 7 \\ 2 \\ 0 \end{pmatrix} \quad C_7 = \begin{pmatrix} 5 \\ 7 \\ 9 \end{pmatrix}.$$

The matrix equation
$$Dx = C_7$$
is equivalent to the original system, and the solution, an equation for the pivot variables in terms of the free parameters $x^2$, $x^4$ and $x^5$, can be given by

Method One
$$p = -x^2 C_2 - x^4 C_4 - x^5 C_5 + C_7 \qquad (x^2,\, x^4,\, x^5 \text{ are free parameters})$$
or, equivalently,
$$\begin{pmatrix} x^1 \\ x^3 \\ x^6 \end{pmatrix} = -\begin{pmatrix} C_2 & C_4 & C_5 \end{pmatrix} \begin{pmatrix} x^2 \\ x^4 \\ x^5 \end{pmatrix} + C_7.$$
A different way to represent the solution is as follows. Let
$$K_2 = \begin{pmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \quad K_4 = \begin{pmatrix} -5 \\ 0 \\ -6 \\ 1 \\ 0 \\ 0 \end{pmatrix} \quad K_5 = \begin{pmatrix} -7 \\ 0 \\ -2 \\ 0 \\ 1 \\ 0 \end{pmatrix} \quad K_7 = \begin{pmatrix} 5 \\ 0 \\ 7 \\ 0 \\ 0 \\ 9 \end{pmatrix}.$$

There will be one of these columns for each non-pivot variable (the free
parameters) plus one more (though if C7 had been the zero column, so
the original system was homogeneous, K7 would be the zero column.) The
general solution is given by

Method Two
$$x = x^2 K_2 + x^4 K_4 + x^5 K_5 + K_7 \qquad (x^2,\, x^4,\, x^5 \text{ are free parameters})$$
or, equivalently,
$$x = \begin{pmatrix} K_2 & K_4 & K_5 \end{pmatrix} \begin{pmatrix} x^2 \\ x^4 \\ x^5 \end{pmatrix} + K_7.$$
You will note that each non-pivot variable is a free parameter, and is associated with a column $K_i$ that has 1 in a row where all of the different $K_j$ columns, including $K_7$, have 0. This will be important for us later: we will say that $K_2$, $K_4$ and $K_5$ are linearly independent as a consequence of this.
Let's look at another example. Here A is the augmented matrix
$$A = \begin{pmatrix}
0 & 1 & 2 & 9 & 5 & 1 & 3 & 5 \\
0 & 8 & 0 & 6 & 9 & 0 & 4 & 5 \\
0 & 8 & 1 & 3 & 6 & 5 & 1 & 7 \\
0 & 0 & 8 & 3 & 9 & 9 & 1 & 4
\end{pmatrix}$$
for the system of equations
$$\begin{aligned}
x^2 + 2x^3 + 9x^4 + 5x^5 + x^6 + 3x^7 &= 5 \\
8x^2 + 6x^4 + 9x^5 + 4x^7 &= 5 \\
8x^2 + x^3 + 3x^4 + 6x^5 + 5x^6 + x^7 &= 7 \\
8x^3 + 3x^4 + 9x^5 + 9x^6 + x^7 &= 4
\end{aligned}$$
Delivering the rref encouragement produces
$$\operatorname{rref}(A) = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 578/543 & -176/543 & 207/181 \\
0 & 0 & 1 & 0 & 0 & 1198/543 & -346/543 & 228/181 \\
0 & 0 & 0 & 1 & 0 & 73/1629 & 269/1629 & 349/543 \\
0 & 0 & 0 & 0 & 1 & -530/543 & 338/543 & -161/181
\end{pmatrix}.$$
Define the $4 \times 7$ matrix $D$ and column matrices $x$, $p$, $C_1$, $C_6$, $C_7$ and $C_8$:
$$D = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 578/543 & -176/543 \\
0 & 0 & 1 & 0 & 0 & 1198/543 & -346/543 \\
0 & 0 & 0 & 1 & 0 & 73/1629 & 269/1629 \\
0 & 0 & 0 & 0 & 1 & -530/543 & 338/543
\end{pmatrix}$$
$$x = \begin{pmatrix} x^1 \\ x^2 \\ x^3 \\ x^4 \\ x^5 \\ x^6 \\ x^7 \end{pmatrix} \qquad p = \begin{pmatrix} x^2 \\ x^3 \\ x^4 \\ x^5 \end{pmatrix} \qquad C_1 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}$$
$$C_6 = \begin{pmatrix} 578/543 \\ 1198/543 \\ 73/1629 \\ -530/543 \end{pmatrix} \quad C_7 = \begin{pmatrix} -176/543 \\ -346/543 \\ 269/1629 \\ 338/543 \end{pmatrix} \quad C_8 = \begin{pmatrix} 207/181 \\ 228/181 \\ 349/543 \\ -161/181 \end{pmatrix}.$$
The matrix equation
$$Dx = C_8$$
is equivalent to the original system, and the solution, an equation for the pivot variables in terms of the free parameters $x^1$, $x^6$ and $x^7$, can be given by

Method One
$$p = -x^1 C_1 - x^6 C_6 - x^7 C_7 + C_8 \qquad (x^1,\, x^6,\, x^7 \text{ are free parameters})$$
or, equivalently,
$$\begin{pmatrix} x^2 \\ x^3 \\ x^4 \\ x^5 \end{pmatrix} = -\begin{pmatrix} C_1 & C_6 & C_7 \end{pmatrix} \begin{pmatrix} x^1 \\ x^6 \\ x^7 \end{pmatrix} + C_8.$$
A different way to represent the solution is as follows. Let
$$K_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \quad\text{and}\quad K_6 = \begin{pmatrix} 0 \\ -578/543 \\ -1198/543 \\ -73/1629 \\ 530/543 \\ 1 \\ 0 \end{pmatrix}$$
$$K_7 = \begin{pmatrix} 0 \\ 176/543 \\ 346/543 \\ -269/1629 \\ -338/543 \\ 0 \\ 1 \end{pmatrix} \quad\text{and}\quad K_8 = \begin{pmatrix} 0 \\ 207/181 \\ 228/181 \\ 349/543 \\ -161/181 \\ 0 \\ 0 \end{pmatrix}.$$
There will be one of these columns for each non-pivot variable (the free
parameters) plus one more (though if C8 had been the zero column, so
the original system was homogeneous, K8 would be the zero column.) The
general solution is given by
Method Two
$$x = x^1 K_1 + x^6 K_6 + x^7 K_7 + K_8 \qquad (x^1,\, x^6,\, x^7 \text{ are free parameters})$$
or, equivalently,
$$x = \begin{pmatrix} K_1 & K_6 & K_7 \end{pmatrix} \begin{pmatrix} x^1 \\ x^6 \\ x^7 \end{pmatrix} + K_8.$$
Each non-pivot variable is a free parameter, and is associated with a column Ki that has 1 in a row where all of the different Kj columns, including
K8 , have 0. None of these Ki can, therefore, be written as a linear combination of the others.
Now I will grant you that the work above is fairly ugly. But no calculations
were done by hand. The work involved nothing more than writing down
entries and keeping track of what it meant.
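If you want software to do the same bookkeeping, here is a sketch using SymPy on the first example above. It reproduces rref(A) and checks that the Method Two expression solves Dx = C7 no matter what values the free parameters take.

```python
import sympy as sp

A = sp.Matrix([[1, 3, 0, 5, 7, 0, 5],
               [1, 3, 1, 11, 9, 0, 12],
               [0, 0, 1, 6, 2, 1, 16]])
R = A.rref()[0]                 # matches the rref(A) displayed earlier
D, C7 = R[:, :6], R[:, 6]

K2 = sp.Matrix([-3, 1, 0, 0, 0, 0])
K4 = sp.Matrix([-5, 0, -6, 1, 0, 0])
K5 = sp.Matrix([-7, 0, -2, 0, 1, 0])
K7 = sp.Matrix([5, 0, 7, 0, 0, 9])

x2, x4, x5 = sp.symbols('x2 x4 x5')    # the free parameters
x = x2 * K2 + x4 * K4 + x5 * K5 + K7   # the Method Two form of the solution
print((D * x - C7).expand())           # the zero column, whatever the parameters are
```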
15.1. Exercise. Solve the systems mentioned in Exercise 14.1 and give solutions, for those with free parameters, in the two forms shown in the example
solution.
15.2. Exercise. Solve the following systems in the two forms shown in the
example solution.
(i) First System:
3x1 + 2x2 + 5x3 + 7x4 − 8x5 = 1
7x1 − 6x2 + x3 + 7x4 − 8x5 = 9
9x1 − 4x2 + 6x3 + x4 − x5 = 10
10x1 − 4x2 + 6x3 + 14x4 − 16x5 = 10
16x1 − 10x2 + 7x3 + 8x4 − 9x5 = 19
x1 − x2 + 7x3 + 8x4 − x5 = 1
11x1 − 5x2 + 13x3 + 22x4 − 17x5 = 13
(ii) Second System:
3x1 + 2x2 + 5x3 + 7x4 − 8x5 = 1
7x1 − 6x2 + x3 + 7x4 − 8x5 = 9
9x1 − 4x2 + 6x3 + x4 − x5 = 10
10x1 − 4x2 + 6x3 + 14x4 − 16x5 = 10
16x1 − 10x2 + 7x3 + 8x4 − 9x5 = 19
x1 − x2 + 7x3 + 8x4 − x5 = 1
(iii) Third System:
3x1 + 2x2 + 5x3 + 7x4 − 8x5 = 1
7x1 − 6x2 + x3 + 7x4 − 8x5 = 9
9x1 − 4x2 + 6x3 + x4 − x5 = 10
10x1 − 4x2 + 6x3 + 14x4 − 16x5 = 10
16x1 − 10x2 + 7x3 + 8x4 − 9x5 = 19
(iv) Fourth System:
3x1 + 2x2 + 5x3 + 7x4 − 8x5 = 1
7x1 − 6x2 + x3 + 7x4 − 8x5 = 9
9x1 − 4x2 + 6x3 + x4 − x5 = 10
10x1 − 4x2 + 6x3 + 14x4 − 16x5 = 10
(v) Fifth System:
3x1 + 2x2 + 5x3 + 7x4 − 8x5 = 1
7x1 − 6x2 + x3 + 7x4 − 8x5 = 9
9x1 − 4x2 + 6x3 + x4 − x5 = 10
(vi) Sixth System:
3x1 + 2x2 + 5x3 + 7x4 − 8x5 = 1
7x1 − 6x2 + x3 + 7x4 − 8x5 = 9
(vii) Seventh System:
3x1 + 2x2 + 5x3 + 7x4 − 8x5 = 1
16. Determinants Part One: The Laplace Expansion.
We are going to discuss a way of creating a number for each square matrix
called the determinant of the matrix.
First, the determinant of a $2 \times 2$ matrix $M = (m^i_{\,j})$ is $m^1_{\,1}m^2_{\,2} - m^1_{\,2}m^2_{\,1}$. This number is denoted det(M). So, for instance
$$\det\begin{pmatrix} 4 & 5 \\ 3 & 2 \end{pmatrix} = 4 \cdot 2 - 5 \cdot 3 = -7.$$
We reduce the task of calculating one n×n determinant to the calculation
of n different (n − 1) × (n − 1) determinants. These smaller determinants
are themselves broken up and the process continues until we arrive at 2 × 2
determinants at which point the final answer is calculated as a gigantic
weighted sum of many 2 × 2 determinants.
The procedure, called Laplace expansion, requires the selection of one
row or one column in the n × n matrix. It is by no means obvious that the
answer will not depend on which row or column you pick for this step. That
the answer will not depend on this choice requires a proof and that proof
is pretty involved and would take too much time for us here. So we punt:
you may learn the proof, if you wish, in your next Linear Algebra class. I
would be happy to direct any student who can't rest without tying down
this loose end to a readable source.
As important as determinants may be in the great scheme of things, they
are somewhat of a side issue for us in this particular class. I will spend
exactly one class day on this and the next section, outlining the facts about
determinants you need to know. I do advise you leave it at that for now.
Examine the pattern in the matrix below:
$$\begin{pmatrix} + & - & + & - & \cdots \\ - & + & - & + & \cdots \\ + & - & + & - & \cdots \\ \vdots & \vdots & \vdots & \vdots & \end{pmatrix}$$
When the sum of row and column number is even you have “+” and when
that sum is odd you have “−.”
Now suppose we wish to find the determinant of the n × n matrix M given by
$$M = \begin{pmatrix} m^1_{\,1} & m^1_{\,2} & \dots & m^1_{\,n} \\ m^2_{\,1} & m^2_{\,2} & \dots & m^2_{\,n} \\ \vdots & \vdots & & \vdots \\ m^n_{\,1} & m^n_{\,2} & \dots & m^n_{\,n} \end{pmatrix}.$$
We pick any row or column. In practice you will look for a row or column with lots of zeroes in it, if there is one. Since we have to pick one, let's pick row 2.
We look at the first entry in that row. It is in a spot corresponding to
a minus sign in the “sign matrix.” We affix the minus sign to m2 1 and
multiply that entry by the determinant of the matrix obtained by deleting
from M the row and column of m2 1 .
We proceed in this way across the row, affixing either a “+” or “−” to
the entry there and multiplying by the determinant obtained by deleting
from M the row and column of that entry. We then add up these n different
(smaller) weighted determinants.
Here is an example.
$$\det\begin{pmatrix} 9 & 2 & 7 \\ 3 & -6 & 1 \\ -8 & 2 & 3 \end{pmatrix} = (-1)(3)\det\begin{pmatrix} 2 & 7 \\ 2 & 3 \end{pmatrix} + (-6)\det\begin{pmatrix} 9 & 7 \\ -8 & 3 \end{pmatrix} + (-1)(1)\det\begin{pmatrix} 9 & 2 \\ -8 & 2 \end{pmatrix}$$
$$= (-3)(6 - 14) + (-6)(27 + 56) - (18 + 16) = -508.$$
You just have to find a few 3 × 3 or 4 × 4 determinants to get the idea.
We reiterate: doing the analogous calculation, expanding around any
other row or any column, will produce the same number for the matrix M .
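As a sketch (not an efficient algorithm), the Laplace expansion can be written as a short recursive routine. This version expands along the first row, which by the fact just stated gives the same answer as any other choice, and it reproduces the −508 of the worked example. The helper name is just for illustration.

```python
import numpy as np

def laplace_det(M):
    """Determinant by Laplace expansion along the first row (roughly n! steps)."""
    n = M.shape[0]
    if n == 1:
        return M[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(M, 0, axis=0), j, axis=1)  # drop row 1 and column j+1
        total += (-1) ** j * M[0, j] * laplace_det(minor)       # the +/- checkerboard sign
    return total

M = np.array([[9.0, 2.0, 7.0],
              [3.0, -6.0, 1.0],
              [-8.0, 2.0, 3.0]])
print(laplace_det(M))        # -508.0, as in the worked example
print(np.linalg.det(M))      # same value, up to rounding
```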
17. Determinants Part Two: Everything Else You Need to Know.
A permutation of the set {1, . . . , n} is a one-to-one and onto function
σ : {1, . . . , n} → {1, . . . , n}.
You can conceive of them as a “switching around” of the order of these integers. There are a lot of permutations. Actually, there are n! of them. That
means, for instance, there are more than $10^{25}$ ways of "switching around"
the first 25 integers.
Each permutation σ can be built in stages by switching a pair of numbers
at a time until they are all sent to the right place. It is a fact (fairly
hard to show) that if you can build a permutation by an even number of
pair-switches then every way it can be built will require an even number
of switches. This implies that if you can build a permutation by an odd
number of pair-switches then every way it can be built will require an odd
number of switches. A permutation is called “even” or “odd” depending on
this.
We assign the number 1 or −1 to a permutation depending on if the
permutation is even or odd. The notation sgn(σ) is used for this assignment,
and it is called the “signum” of the permutation σ.
The determinant of an n × n matrix A is defined to be
$$\det(A) = \sum_{\text{all permutations } \sigma} \operatorname{sgn}(\sigma)\, a^1_{\,\sigma(1)}\, a^2_{\,\sigma(2)} \cdots a^n_{\,\sigma(n)}.$$
This sum is over all possible ways of picking one entry from each row and
each column in the matrix A, with a minus sign attached to half of these
possible choices.
Using the definition directly is not really a feasible means of calculating
a determinant, though you might give it a try when n = 3. This definition
is good for proving things about determinants.
Some of the things you prove are better methods of calculating determinants, or avoiding a calculation entirely.
We learned in the last section that an n × n determinant can be “expanded” around any row or any column, given as the “signed sum” of n
smaller determinants. The proof that you can do this comes from careful
examination of the definition above. It may be proved, for instance, by
induction on the size of the determinant.
The method, Laplace expansion, lets us avoid thinking about permutations. But it is no better than the original definition as far as how many
steps are required to calculate a determinant is concerned. It takes on the
order of n! steps too.
So we need a better way, and finding that way has the added benefit
of giving us new facts about determinants. Once again, these facts are all
proved by careful examination of the original definition of determinant.
Fact 1: If A is square, det(AT ) = det(A).
Fact 2: If you multiply any row or any column of a square matrix by
the number k, the determinant changes by factor k.
Fact 3: If you switch two rows or two columns of a square matrix you
change the determinant by a factor of −1. So if two rows (or two columns)
are multiples of each other, the determinant must be 0.
Fact 4: If every entry in the first column of a square matrix contains a sum, a determinant of that matrix can be found as a sum of two determinants:
$$\det\begin{pmatrix} s^1_{\,1}+t^1_{\,1} & m^1_{\,2} & \dots & m^1_{\,n} \\ s^2_{\,1}+t^2_{\,1} & m^2_{\,2} & \dots & m^2_{\,n} \\ \vdots & \vdots & & \vdots \\ s^n_{\,1}+t^n_{\,1} & m^n_{\,2} & \dots & m^n_{\,n} \end{pmatrix}
= \det\begin{pmatrix} s^1_{\,1} & m^1_{\,2} & \dots & m^1_{\,n} \\ s^2_{\,1} & m^2_{\,2} & \dots & m^2_{\,n} \\ \vdots & \vdots & & \vdots \\ s^n_{\,1} & m^n_{\,2} & \dots & m^n_{\,n} \end{pmatrix}
+ \det\begin{pmatrix} t^1_{\,1} & m^1_{\,2} & \dots & m^1_{\,n} \\ t^2_{\,1} & m^2_{\,2} & \dots & m^2_{\,n} \\ \vdots & \vdots & & \vdots \\ t^n_{\,1} & m^n_{\,2} & \dots & m^n_{\,n} \end{pmatrix}.$$
By looking at transposes and switching another column with the first, we
see that the corresponding fact is true if we have any row or column whose
entries are written as a sum.
Fact 5: If you recall our discussion of row echelon form, we can create
that form by elementary row operations solely of type one: “add a multiple of
one row to another.” Doing one of these operations on a square matrix does
not change its determinant. You can do the same thing (examine transposes)
by using elementary column operations of the type “add a multiple of one
column to another.” If the matrix is n × n the final result in either case is
a triangular matrix that has the same determinant as the original matrix.
Fact 6: The determinant of an n × n triangular matrix A is the product
of the diagonal entries: only one term in the determinant definition sum is
nonzero. That term is a1 1 a2 2 · · · an n .
Fact 7: If a square matrix M is broken into blocks and among these
blocks are square blocks D1 , . . . , Dk arranged along and exactly covering
the main diagonal of M , and if all entries beneath each of these blocks down
to the nth row are zero then
det(M ) = det(D1 ) · · · det(Dk ).
This is, of course, a generalization of Fact 6 and is proved using Facts 5 and
6.
Fact 8: Any n × n matrix A can be reduced to an upper triangular
matrix R by H1 · · · Hk A = R where each Hi is an elementary row operation
matrix of type one. Similarly (look at transposes) any n × n matrix B can
be reduced to a lower triangular matrix S by BC1 · · · Cm = S where each
Ci is an elementary column operation matrix of this simple type.
So
$$\det(A) = \det(H_1 \cdots H_k A) = \det(R) = r^1_{\,1}\, r^2_{\,2} \cdots r^n_{\,n}$$
and
$$\det(B) = \det(B C_1 \cdots C_m) = \det(S) = s^1_{\,1}\, s^2_{\,2} \cdots s^n_{\,n}.$$
This reduction process takes on the order of $n^3$ steps, much better than $n!$.
Fact 9: The matrix RS from above is diagonal with nonzero entries $r^1_{\,1}s^1_{\,1},\ r^2_{\,2}s^2_{\,2},\ \dots,\ r^n_{\,n}s^n_{\,n}$ running down along its main diagonal. This means
$$\det(AB) = \det(H_1 \cdots H_k\, A\, B\, C_1 \cdots C_m) = \det(RS) = r^1_{\,1}s^1_{\,1}\, r^2_{\,2}s^2_{\,2} \cdots r^n_{\,n}s^n_{\,n}.$$
So we conclude that the following very important equation holds for any
square (compatible) matrices A and B:
det(AB) = det(A) det(B).
Fact 10: If A has an inverse then
$$1 = \det(I_n) = \det(AA^{-1}) = \det(A)\det(A^{-1}).$$
So if A has an inverse then $\det(A) \neq 0$ and $\det(A^{-1}) = (\det(A))^{-1}$. On the other hand, if $\det(A) = 0$ then A cannot have an inverse.
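These facts are easy to spot-check numerically. A quick sanity check with random matrices, which is evidence rather than proof:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))                   # Fact 1
print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))                    # Fact 9
print(np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / np.linalg.det(A)))  # Fact 10
```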
17.1. Exercise. Calculate
$$\det\begin{pmatrix} -8 & 2 & 0 & 7 & 9 \\ 0 & 6 & 7 & 4 & 1 \\ -8w & 2w & 0 & 7w & 9w \\ 0 & 0 & 0 & 2 & 5 \\ 8 & 0 & 0 & 1 & 5 \end{pmatrix}.$$
17.2. Exercise. C is the elementary 4 × 4 matrix that adds twice the second
row to the third. D is the elementary 4 × 4 matrix that multiplies the second
row by 9. Also, for the following 4 × 4 matrices A and B we know that
det(A) = 6
and
det(B) = 2.
For each of the following, calculate the determinant or indicate that there
is not enough information to determine the answer.
(i) det(AB) =        (ii) det(5B) =        (iii) det(A + 7B) =
(iv) det(A−1 ) =     (v) det(B T ) =       (vi) det(ABA−1 ) =
(vii) det(CA) =      (viii) det(DB) =
18. Linear Transformations from Rn to Rm .
A linear transformation from Rn to Rk is a function f with domain Rn and with range in Rk , indicated by notation f : Rn → Rk , which satisfies:

f (u + cv) = f (u) + c f (v) for all members u and v in Rn and any constant c.

"Left multiplication by a k × n matrix" is the prototypical linear transformation from Rn to Rk , and (this is really important) any linear transformation f from Rn to Rk , however it might have been presented to you, actually is left multiplication by the k by n matrix
$$M = \begin{pmatrix} f(e_1) & \cdots & f(e_n) \end{pmatrix}.$$
This formula f (x) = M x is easily seen to be true when x is one of the ei .
The linearity of f then shows that the equation is true for any x in Rn .
f (x) = f (xi ei ) = xi f (ei ) = M x.
The matrix M is called the matrix of f .
A linear transformation is completely determined by what it does to the
basis vectors so if two linear transformations agree on the basis vectors they
agree on any vector: they are the same linear transformation.
That provides a rather straightforward way to see if a function given by
some formula is linear or not. Evaluate f (ei ) for i = 1, . . . , n and create
matrix M . See if the formula f (x) for generic x is the same as M x. If they
are equal, f is linear. If they aren’t, it’s not.
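Here is a sketch of that test in code for a hypothetical candidate f from R3 to R2 (the formula is made up): build M from the values f(e1), ..., f(en) and compare Mx with f(x).

```python
import numpy as np

def f(x):
    # a made-up candidate function from R^3 to R^2
    return np.array([2 * x[0] - x[2], x[0] + 3 * x[1]])

n = 3
M = np.column_stack([f(e) for e in np.eye(n)])   # columns are f(e_1), ..., f(e_n)

x = np.array([1.7, -2.0, 0.5])                   # a generic test vector
print(M)
print(np.allclose(f(x), M @ x))                  # agreement here is the text's test for linearity
```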
A particularly simple (and therefore important) example of a linear transformation is a linear function f : Rn → R. The matrix of a function like that
is 1×n, a row matrix. Obviously these look an awful lot like members of Rn ,
which are column matrices. The set of these row matrices is denoted Rn∗
and called the dual of Rn . Members of Rn∗ are called linear functionals
when they are thought of as matrices for a linear transformation on Rn .
More generally, any function f : Rn → Rm has m different coordinate functions f 1 , . . . , f m given by
$$f(x) = \begin{pmatrix} f^1(x) \\ \vdots \\ f^m(x) \end{pmatrix}.$$
The function f : Rn → Rm is linear exactly when all m coordinate functions are linear functionals.
A linear transformation is nothing more than a higher-dimensional version of direct variation. You may recall that a real variable y is said to vary directly with a real variable x if there is a constant k for which y = kx.
If you know a single nonzero point on the graph of this relationship you
can pin down k and then you know everything about the variation. It is a
straight line through the origin with slope k.
For a linear transformation from Rn to Rm each of the m different range
variables y 1 , . . . , y m is directly proportional to each of the n different domain
variables x1 , . . . , xn . There are mn different variation constants, one for
each pair of variables, and these are the entries of the matrix of the linear
transformation.
18.1. Exercise. If f : Rk → Rm and g : Rm → Rn are both linear then so is g ◦ f : Rk → Rn . If M f is the matrix of f and M g is the matrix of g then M g M f is the matrix of g ◦ f .
Here are some very important linear transformations:
(1) One empty column (or row) in a determinant. For instance, given the "mostly filled" matrix $\begin{pmatrix} \cdot & 2 & 6 \\ \cdot & 8 & 3 \\ \cdot & 8 & 1 \end{pmatrix}$ we can define g : R3 → R by
$$g(v) = \det\begin{pmatrix} v^1 & 2 & 6 \\ v^2 & 8 & 3 \\ v^3 & 8 & 1 \end{pmatrix}.$$
(2) Dot product against a fixed vector w. This is the function
f : Rn → R given by f (v) = w · v.
(3) Projection onto a line through the origin containing a vector w. This is given by $Proj_w(v) = \frac{v \cdot w}{w \cdot w}\, w$.
(4) Projection onto a plane (or hyperplane) perpendicular to a vector w. The formula for this is $CoProj_w(v) = v - Proj_w(v)$.
(5) Reflection in a plane (or hyperplane) perpendicular to a vector w. We calculate this by $Refl_w(v) = v - 2\,Proj_w(v)$.
(6) Inversion given by Inv(v) = −v.
(7) Rotation in R2 counterclockwise by angle θ given by
Rotθ (v) = ( v 1 cos(θ) − v 2 sin(θ), v 1 sin(θ) + v 2 cos(θ) ).
(8) Rotation in R3 by angle θ around the e3 axis given by
Rot θ, e3 (v) = ( v 1 cos(θ) − v 2 sin(θ), v 1 sin(θ) + v 2 cos(θ), v 3 ).
18.2. Exercise. Create matrices for the eight functions above, using w = (1, 2, 3) for (2)-(5), and verify that all eight are linear. Note that if w is a unit vector the matrix of $Proj_w$ is $w\,w^T$. More generally, if w is nonzero the matrix of $Proj_w$ is $\dfrac{w\,w^T}{w^T w}$.
19. Eigenvalues.
We define eigenvalues and eigenvectors of a linear transformation f
and associated matrix M when range and domain are both Rn .
A real number λ is called an eigenvalue for M (or for f if M is the matrix
of a linear transformation) if there is a nonzero vector x for which M x = λx.
M x = λx is equivalent to
λx − M x = (λIn − M ) x = 0.
There will be a nonzero solution to this equation exactly when
det(λIn − M ) = 0.
This determinant is an nth degree polynomial in λ, called the characteristic polynomial, which could have real roots: the real eigenvalues. If n is
odd, it will always have at least one. So the first thing to do is find those
roots. Since finding the roots of an nth degree polynomial algebraically is
an arduous, perhaps impossible, task you will often in practice be forced
to estimate these eigenvalues using hardware. Your graphing calculator can
get numerical estimates for the roots of this polynomial. If the polynomial
has nice rational roots, as many rigged example problems do, graphing this
polynomial obtained from the determinant function built into your calculator and seeing where the graph crosses the x axis will enable you to find all
of the eigenvalues exactly.
For a particular eigenvalue λ, the set of solutions for M x = λx is called
the eigenspace for M and the eigenvalue λ. Every vector (except the zero
vector) in an eigenspace for eigenvalue λ is an eigenvector for λ.
The eigenspace for eigenvalue 0 is a set we have run into before. It is the
solution space to the homogeneous system determined by matrix
M. More generally, the eigenspace for eigenvalue λ is the solution space to
the homogeneous system determined by matrix λIn − M .
On an eigenspace, the linear transformation has essentially trivial effect.
It acts on any vector there by simply multiplying by a constant. That is a
very easy process to understand.
To actually find eigenvectors for a known eigenvalue λ you solve (λIn −
M )x = 0. The columns associated with free parameters in our “Method
Two” solution are eigenvectors, and any eigenvector for this eigenvalue is a
linear combination of these columns.
If we are forced to use approximate eigenvalues, these columns will of
course only be “approximate” eigenvectors. Thinking about what “approximate” means can be a bit tricky. Practical folk such as Engineers spend a
lot of time thinking about this in applied mathematics classes.
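In practice the whole pipeline, characteristic polynomial roots and all, is handled by library routines. A minimal sketch with NumPy on an arbitrary 2 × 2 matrix:

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])

evals, evecs = np.linalg.eig(M)    # eigenvalues, and one eigenvector per eigenvalue (as columns)
print(evals)                       # numerical roots of det(lambda I - M) = 0; here 3 and 1
for lam, v in zip(evals, evecs.T):
    print(np.allclose(M @ v, lam * v))   # each pair satisfies M v = lambda v
```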
In the last six examples of the last section, the domain and range of the
transformation are the same, so they might have eigenvalues and eigenvectors12.
19.1. Exercise. Find eigenvectors and eigenvalues for the linear functions
(3)-(8), whose matrices you created in Exercise 18.2.
19.2. Exercise. Find eigenvectors and eigenvalues for F (x) = Ax and G(x) = Bx and H(x) = Cx where
$$A = \begin{pmatrix} 4 & -2 \\ 1 & 1 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \quad\text{and}\quad C = \begin{pmatrix} 6 & 7 \\ -2 & -3 \end{pmatrix}.$$
19.3. Exercise. Find eigenvectors and eigenvalues for L(x) = Ax and N (x) = Bx and K(x) = Cx where
$$A = \begin{pmatrix} 1 & 2 & 8 \\ 0 & 12 & 0 \\ 0 & 0 & 0 \end{pmatrix} \qquad B = \begin{pmatrix} -12 & 12 & 14 \\ -6 & 6 & 7 \\ 2 & 1 & 1 \end{pmatrix}, \qquad C = \begin{pmatrix} -8 & 2 & 0 & 7 & 9 \\ 0 & 6 & 7 & 4 & 1 \\ 0 & 0 & -2 & 6 & 9 \\ 0 & 0 & 1 & 2 & 5 \\ 0 & 0 & 0 & 2 & 5 \end{pmatrix}.$$
20. Real Vector Spaces and Subspaces.
After becoming familiar with Rm and related linear functions we now
make definitions of objects that have features similar to these. Many of
the problems you will be working on for a while are designed to see if the
definitions we make apply in a given situation or not. Mathematicians often
argue about definitions. Good definitions make hard things easy to think
about.
We make these abstract definitions for the following simple and practical
reason. It was observed how useful the theorems about Calculus and Geometry and angles and solving linear equations and so on could be in Rn .
It was observed that only a few properties of Rn and dot product were ever
used in proving these theorems. It was observed that many of the objects
12If you think carefully about the meaning of eigenvalues and eigenvectors you can
determine them without a calculation in each of these cases. Be sure, however, to actually
do the calculation for inversion and a variety of reflections, projections and rotations.
that kept popping up in disparate applications had all these properties. So
all those theorems would be valid without change for any of these objects,
and not just Rn !
You pay a price “up front” to learn these definitions, which might themselves have subtle aspects, but once these are learned the problems you
usually encounter are simpler to think about and solve.
We now define a vector space and subspaces of these vector spaces. These
are objects that “act like” Rn and lines and planes through the origin.
A real vector space is a nonempty set V together with two operations
called vector addition and scalar multiplication that satisfy a collection
of properties. Vector addition “acts on” pairs of members of V . Scalar
multiplication “acts on” a pair one of which is a real number and the other
of which is a member of V .
We insist that both vector addition and scalar multiplication must be
closed in V, and by that we mean that the result of applying these operations on members of V or real numbers always produces a member of V .
You cannot leave V by doing these operations to members of V and real
numbers.
There must be a distinguished member of V , always denoted 0, for which
v + 0 = v for all members v of V . You distinguish this member of V from
the real number 0 by context. They are different, unless V = R.
For each v in V there must be a member u of V for which v + u = 0. u
can be denoted −v, and is called the negative or opposite of v.
Vector addition must be commutative and associative: that is, v + w =
w + v and (v + w) + u = v + (w + u) for any members v, u and w of V .
The two distributive laws must hold: (r + s)v = rv + sv and r(v + w) =
rv + rw for all real numbers r and s and any members v and w of V .
Finally, (rs)v = r(sv) and 1v = v for all real numbers r and s and any
v in V .
To show that some situation in the world (such as the collection of arrow-vectors we examined) "looks like" or "is" a vector space, you must define
two operations and show (or just assume and see how that works out) that
the ten properties are true.
In a more abstract setting, to show a set with two operations is a vector
space (or not) requires one to check that all ten requirements are actually
true (or not) for these operations. Usually it will be pretty obvious if a
condition is true, and counterexamples easy to come by if false.
A subspace of a vector space V is a subset W of V which is, itself, a
vector space with the operations it inherits from V .
To show a subset W of a vector space V is a subspace you need only
show that scalar multiplication and vector addition are closed in W .
The other eight conditions, required of the operations on W , are automatically true if you (somehow) already know that V is a vector space.
The set containing just the zero vector is a subspace of every vector space.
Any one-element vector space is called a trivial vector space.
Also (obvious but worth mentioning) every vector space is a subspace of
itself.
R itself is a very simple real vector space.
Suppose D is any nonempty set and let V be any vector space. Define $V^D$ to be the set of all functions with domain D and range contained in V. Define pointwise addition and pointwise scalar multiplication on $V^D$ in the obvious way:

If $f, g \in V^D$ and $c \in \mathbb{R}$, define $f + g$ to be the function $(f + g)(d) = f(d) + g(d)$ for all $d \in D$, and define $(cf)(d) = c f(d)$ for all $d \in D$.

$V^D$ is a vector space with pointwise addition and scalar multiplication.
All of the “abstract” vector spaces you will actually use for something,
both in this text and almost anywhere else13, are subspaces of vector spaces
of this type. Once you identify a nonempty subset of such a space, all you
need to do is verify closure of the two operations and you can conclude your
subset is a vector space in its own right.
The set of matrices of a specific shape with usual operations forms a
vector space. We will often denote the m × n matrices with real entries by
Mm×n .
Rn = Mn×1 , of course, provides the prototypical example of this type.
The diagonal, upper triangular and lower triangular matrices form subspaces of Mm×n .
Remember that a matrix is defined by its entries. We choose to visualize those entries in a rectangular array. That helps us organize complex
operations such as matrix multiplication or row reduction in an efficient
way. But it is the subscripted real entries that define the matrix. So if
D = { (i, j) | 1 ≤ i ≤ m and 1 ≤ j ≤ n and i, j ∈ N } a matrix is nothing more than a function m : D → R. The big rectangular symbol used
to denote this function is not truly relevant. Matrix addition and scalar
multiplication are the pointwise operations defined above on RD .
13Traditionally a student is exposed to one or two others as odd examples or counterexamples with “surprise” value. It will be very rare to encounter any of them in any
applications.
The set of square n × n matrices is an important special case. The sets of
symmetric, skew symmetric and traceless (i.e. trace equals zero) matrices
are each subspaces of this vector space.
The set of real valued functions whose domain is a specified interval of
the real line is an important example. The continuous functions with this
domain, the set of differentiable functions, the set of polynomials and the
set of polynomials of third degree or less (again, with this domain) are four
subspaces of this vector space.
We will denote the set of real polynomials in variable t defined on R by
P(t). The polynomials of degree at most n will be denoted Pn (t).
We use this particular kind of function space a lot here: they are fairly
familiar to most people taking this class and provide some examples that
are clearly unlike Rn in some ways.
A real sequence is a function f : N → R. The set of convergent real
sequences is a subspace of RN with pointwise operations.
A formal power series in one real variable is a formal sum (we do not presume convergence of the sequence of partial sums) of the type $\sum_{n=0}^{\infty} a_n t^n$.
We add two of these power series and multiply by scalars in the obvious way,
by adding like powers and distributing a scalar factor across all terms in a
formal series. This example is nothing more than an interpretation of the
last example. The power series is determined by the sequence a : N → R
used to create it, and addition and scalar multiplication match pointwise
operations on the sequences. The set of power series that converge on a
fixed symmetric interval around 0 constitute a vector subspace of the formal
power series.
There are many other examples, libraries full of them, each one requiring
its own setup. Just to give you a taste, we examine a rather odd one.
Let V be the set of positive real numbers. Define operation ⊕ on V by
v ⊕ w = vw. Define a scalar multiplication ⊛ by r ⊛ v = v r for any real
number r and member v of V . Then V is a vector space with ⊕ for vector
addition and ⊛ as scalar multiplication.
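A quick numerical spot check of a couple of the ten properties for this example, with ⊕ as ordinary multiplication and ⊛ as exponentiation. This is evidence, not a proof.

```python
import math

def vadd(v, w):      # the "vector addition":  v ⊕ w = v w
    return v * w

def smul(r, v):      # the "scalar multiplication":  r ⊛ v = v ** r
    return v ** r

v, w, r, s = 2.0, 5.0, 3.0, -1.5

# r ⊛ (v ⊕ w) = (r ⊛ v) ⊕ (r ⊛ w)
print(math.isclose(smul(r, vadd(v, w)), vadd(smul(r, v), smul(r, w))))
# (r + s) ⊛ v = (r ⊛ v) ⊕ (s ⊛ v)
print(math.isclose(smul(r + s, v), vadd(smul(r, v), smul(s, v))))
# the zero vector of this space is the number 1:  v ⊕ 1 = v
print(vadd(v, 1.0) == v)
```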
20.1. Exercise. Show, by verifying all ten properties explicitly, that the
function space V D , where D is a nonempty set and V is a real vector space,
is itself a real vector space as claimed above with pointwise addition and
scalar multiplication.
20.2. Exercise. Suppose given a continuous function g : [0, ∞) → R. Let S denote the set of all those real-valued functions f : [0, ∞) → R for which $\int_0^\infty f(t)g(t)\,dt$ exists.
20.3. Exercise. Show that the intersection of two subspaces (of the same
vector space) is a subspace.
20.4. Exercise. Decide the exact conditions under which the union of two
subspaces (of the same vector space) is a subspace.
20.5. Exercise. We define the sum of two subspaces U and W of the
vector space V to be the set { u + w | u ∈ U and w ∈ W }. We denote this
set U + W. Show that U + W is a subspace of V .
20.6. Exercise. Let A = { (x, y) ∈ R2 | y = ±x }. Is A a subspace of R2 ?
20.7. Exercise. Let B = { (x, y, z) ∈ R3 | 2x − 3y + 4z = 1 }. Is B a
subspace of R3 ?
20.8. Exercise. Let C = { (0, 0) } ∪ { (x, y) ∈ R2 | x2 + y 2 > 1 }. Is C a
subspace of R2 ?
20.9. Exercise. Let D = { (x, y) ∈ R2 | x2 + y 2 < 1 }. Is D a subspace of
R2 ?
20.10. Exercise. Let F = { (x, y) ∈ R2 | (x, y) · (1, 7) = 0 }. Is F a subspace
of R2 ?
20.11. Exercise. Suppose p and v are fixed members of Rn . Let
G = { x ∈ Rn | x = p + tv for some t ∈ R }.
When is G a subspace of Rn ?
20.12. Exercise. Suppose A is a fixed 3 × 3 matrix. Let
H = { x ∈ R3 | Ax = 7x }.
Is H a subspace of R3 ?
20.13. Exercise. Suppose A is a fixed 3 × 3 matrix. Let
K = { x ∈ R3 | Ax = 0 }.
Is K a subspace of R3 ? What does this have to do with a solution set for a
homogeneous system?
20.14. Exercise. Let G be the set of polynomials in variable t of even degree.
(Constant functions are of degree zero.) Is G a vector space?
20.15. Exercise. Let
W = { (x, y) ∈ R2 | y ≥ 0 }.
Is W a subspace of R2 ?
20.16. Exercise. Show that the example operations defined at the end of the text of Section 20 above make (0, ∞) into a real vector space.
20.17. Exercise. (i) Consider the set R2 together with “scalar multiplication” given by c ⊙ x = 0 for all real c and all vectors x, together with the
usual vector addition. Which of the ten properties that define a vector space
hold for these two operations?
(ii) Now consider the same question when also ordinary vector addition
is replaced by the operation v ⊕ w = 0 for all vectors v and w.
20.18. Exercise. Consider the set R2 together with ordinary scalar multiplication and “vector addition” given by the operation v⊕w = (v 2 +w1 , w2 +v 1 )
for all vectors v and w. Which of the ten properties that define a vector space
hold for these two operations?
21. Basis for a Vector Space.
A linear combination of vectors is a finite sum of numerical multiples
of vectors, all from the same vector space.
The span of a set S of vectors in a vector space V is the set of all
vectors that can be written as a finite sum
$$a^1 v_1 + \cdots + a^k v_k \qquad (a^i \text{ are scalars, } v_i \text{ are in } S).$$
The span of a set of vectors S, denoted Span(S), is always a subspace of
V . We say that S spans V if Span(S) = V .
We define linear dependence and its negation, linear independence.
A set of vectors is linearly dependent if at least one of these vectors can
be written as a linear combination of the others. Linear dependence can
be phrased in terms of the existence of a nontrivial solution to a certain
homogeneous system of equations.
A set S of vectors is linearly independent if whenever v1 , . . . , vk
are different members of S the equation
x1 v 1 + · · · + xk v k = 0
has only the zero solution
x1 = x2 = · · · = xk = 0.
S will be linearly dependent if there are distinct (that is, different)
members v1 , . . . , vk of S for which the equation above has a nontrivial (that
is, not all xi are zero) solution.
Of course any set containing the zero vector is linearly dependent. A
set containing just a single nonzero vector is linearly independent. Any
set containing two vectors where one is a multiple of the other is linearly
dependent. But any set containing just two vectors for which neither one is
a multiple of the other is linearly independent.
In context, the repetitive word “linearly” is often dropped, and we refer,
simply, to dependent or independent sets of vectors.
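For vectors in Rn this test comes down to a homogeneous system: place the vectors as the columns of a matrix and ask whether every column of its rref is a pivot column. A sketch with SymPy on arbitrary vectors:

```python
import sympy as sp

v1 = sp.Matrix([1, 2, 0])
v2 = sp.Matrix([0, 1, 1])
v3 = sp.Matrix([1, 4, 2])       # equals v1 + 2*v2, so this set is dependent

M = sp.Matrix.hstack(v1, v2, v3)
_, pivots = M.rref()
independent = len(pivots) == M.shape[1]   # independent exactly when every column is a pivot column
print(pivots, independent)                # (0, 1) False
```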
We now define a basis of a vector space and the closely related and
computationally convenient concept of ordered basis.
A basis for a vector space V is a linearly independent set of vectors that
spans V.
An ordered basis is a listing of the different members of a basis in a
particular order. We will almost always use ordered bases. Sometimes we
will refer to an ordered set of vectors as a list.
We have a theorem (proved below) that states that if a vector space has a
basis, and there is a finite number of vectors in a basis of that vector space,
then that number cannot vary. This number is called the dimension of the
vector space. It is denoted dim(V).
If V is the trivial vector space it is said to be zero dimensional. If V is
nontrivial but has no finite basis, V is said to be infinite dimensional.
It is convenient to observe that any set of nonzero vectors S (which may
be a dependent set) can be trimmed to contain a basis for Span(S)—select
independent vectors from S, one at a time, till no independent directions
remain.
So every nontrivial vector space has a basis, which may be selected from
any spanning set.
On the other hand, if T is any linearly independent set of vectors in a
vector space V then T is contained in a basis for V . You “grow” the basis
from T by adding independent vectors from V one at a time until there are
no independent directions left in V .
So a vector space cannot have dimension exceeding that of any containing
space.
Any linearly independent set in a vector space can be extended to a
basis of the space. Any spanning set contains a basis.
These two important theorems remain true for infinite dimensional spaces,
even though the somewhat easier proofs we could create in a finite dimensional setting fail. The mathematics required to handle these cases efficiently
is beyond the scope of our goals here, and we leave these proofs for later
classes.
This lacuna is less of a problem than you might suppose: most of the
vector spaces you will see can be (or are) given explicitly in terms of a basis
from the outset and the benefits of (typically nonconstructive) existence
proofs in practice are (usually) minimal.
It is very important to note that if S is an independent set of vectors
then each member of Span(S) can be written in only one way as a linear
combination of members of S, except for order of terms in the sum. If S
is a basis of V , so Span(S) = V , then every member of V can be written
in one and only one way (except for order) as a linear combination of
vectors from S.
21.1. Exercise. Prove the last boxed statement above.
21.2. Exercise. Which of the following are linearly independent sets?
{ (1, 2) }.
{ (1, 2), (2, 4) }.
{ (3, 6, 9), (11, 14, 26), (1, 2, 8) }.
21.3. Exercise. Is the span of the following set all of R3 ?
{ (4, 5, 2), (1, 6, 2), (10, 22, 8) }.
21.4. Exercise. Suppose S is a set of vectors and a1 s1 + · · · + ak sk is
a nontrivial linear combination involving members si ∈ S. Let us suppose
further that all ai ≠ 0. Let T be the set consisting of all members of S except
for s1 , which has been replaced by a1 s1 + · · · + ak sk . Show that Span(S) =
Span(T ).
Note that rref (A) is obtained from matrix A by replacing rows of A, one
after another, by combinations of rows of just this type. Conclude that the
span of the rows of A is the same as the span of the rows of rref (A).
21.5. Exercise. Let S be the set of vectors
S = { (1, 3, 6, 2), (4, 3, 6, 2), (5, 6, 12, 4), (6, 9, 18, 6), (8, 1, 7, 1) }.
Find a very nice basis for Span(S), defined (somewhat subjectively, but you
know one when you see one) to be a basis whose vectors have lots of zero
entries and few fractional entries and which are obviously independent.
21.6. Exercise. Let T be the set of polynomials
T = { t2 − 3t, t2 + 3t, 7t }.
Is T a basis for Span(T )? If yes, prove it. If not, find a basis for Span(T ).
21.7. Exercise. Find a basis for the vector space
{ at^2 + b(t^2 + t) + ct ∈ P (t) | a, b, c ∈ R }.
21.8. Exercise. We know that the diagonal matrices, the upper triangular matrices, the symmetric matrices, the skew symmetric matrices and the
traceless matrices all form subspaces of Mn×n . What is the dimension of
each of these subspaces?
21.9. Exercise. Does { Sin(t), e^t } span
{ a Sin(t) + b e^t + c Sin(t)e^t | a, b, c ∈ R }?
21.10. Exercise. Let P (t) denote the set of all polynomials in variable t.
Let W be the set of polynomials
W = { f ∈ P (t) | f (1) = 0. }.
Is W a subspace of P (t)? If no, prove it. If yes, find a basis for W .
21.11. Exercise. Prove that the polynomial t^n is not in the span of
{ 1, t, . . . , t^{n−1} }.
21.12. Exercise. Suppose B1 , . . . , Bi is a linearly independent list of vectors
in Rn and suppose v1 , . . . , vt is a linearly independent list of vectors in Ri .
So it must be that t ≤ i ≤ n.
Let M be the n × i block matrix given by
M = (B1 . . . Bi )
and define vectors S1 , . . . , St by Sj = Mvj for j = 1, . . . , t.
Show that S1 , . . . , St is a linearly independent list of vectors.
hint: We note two facts. First, the only solution to My = 0 is y = 0.
Second, (using the Einstein summation convention here) if xj vj = 0 then
xj = 0 for all j.
With these two facts in hand, we suppose xj Sj = 0. But then

    0 = xj Sj = xj Mvj = M (xj vj)

so xj vj = 0, so xj = 0 for all j. This leads to the desired conclusion.
22. The Span of Column Vectors.
In this section we are going to find out how to see if a finite list of column
vectors is independent, and how to select a basis from among them if they
are dependent.
We have now seen other types of vector spaces than Rn . After we discuss
coordinates you will see that the approach of this section (and the next) will
apply to these other spaces too.
Suppose given column vectors C1 , . . . , Cn . Our first goal is to determine
dependency among these vectors. Dependency is found through a nontrivial
solution to the equation
x1 C1 + · · · + xn Cn = 0.
This corresponds to the matrix equation
( C1 · · · Cn )x = 0.
If we solve the equation by doing row reduction to rref there are two
possibilities.
First, all columns could be pivot columns, in which case the only solution
is the “xi = 0 for all i ” solution, and the columns C1 , . . . , Cn form an
independent set and so are a basis for Span({ C1 , . . . , Cn }).
The other case is where some of the columns are pivot columns but others are not. This divides the variables into two groups: pivot variables
xi1 , . . . , xik and free variables xj1 , . . . , xjn−k . The free variables can be
chosen arbitrarily, and then the pivot variable values are determined by formulas involving these free variables as we have seen. But the point here
is that there are nontrivial solutions to the matrix equation above, so the
original list of columns is a dependent list.
But we can actually get more out of this.
If we have any choice of free variables and determine the pivot variables
from them we find, moving free variable terms to the right, that
xi1 Ci1 + · · · + xik Cik = − xj1 Cj1 − · · · − xjn−k Cjn−k .
By choosing free variable xj1 = −1 and all other free variables to be 0
we find that we can write the “free column” Cj1 in terms of the “pivot
columns.” That means it can be deleted from the list of columns without
affecting the span of the columns.
We can say the same thing about all the free columns: they can all be
deleted without affecting the span of the columns.
Having done the deletion of all columns associated with free variables and
recreating the matrix equation above, we find that all columns in the row
reduced matrix are pivot columns, and therefore the reduced list is a basis
of the span of all the columns.
It is worth noting that the dimension of this vector space is the number
of nonzero rows in the rref of the original (as well as the shortened) matrix
of column vectors: there is one nonzero row for each pivot. The rref of the
shortened matrix is a block matrix with identity matrix Ik on top of a zero
block.
Let’s apply this setup to a specific situation.
Let S be the set of vectors
S = { (1, 3, 6, 2), (4, 3, 6, 2), (5, 6, 12, 4), (6, 9, 18, 6), (8, 1, 7, 1) }.
This generates matrix equation

    [ 1  4   5   6   8 ] [ x1 ]   [ 0 ]
    [ 3  3   6   9   1 ] [ x2 ]   [ 0 ]
    [ 6  6  12  18   7 ] [ x3 ] = [ 0 ] .
    [ 2  2   4   6   1 ] [ x4 ]   [ 0 ]
                         [ x5 ]

Hitting this with the rref stick yields

    [ 1  0  1  2  0 ] [ x1 ]   [ 0 ]
    [ 0  1  1  1  0 ] [ x2 ]   [ 0 ]
    [ 0  0  0  0  1 ] [ x3 ] = [ 0 ] .
    [ 0  0  0  0  0 ] [ x4 ]   [ 0 ]
                      [ x5 ]
After deleting the superfluous free columns
{ (5, 6, 12, 4), (6, 9, 18, 6) }
we have a basis for Span(S), given by the pivot columns
{ (1, 3, 6, 2), (4, 3, 6, 2), (8, 1, 7, 1) }.
In case you need an explicit expression of the dependency, we have

    [ x1 ]        [ −1 ]        [ −2 ]
    [ x2 ]  =  x3 [ −1 ]  +  x4 [ −1 ] .
    [ x5 ]        [  0 ]        [  0 ]
Choosing x3 = −1 and x4 = 0 we have C3 = C1 + C2 .
Choosing x3 = 0 and x4 = −1 we have C4 = 2C1 + C2 .
It is worth noting the relation between the entries in the vector to the right
of the free variable and the specific combination of pivot vectors generating
the dependency of a “free” vector in terms of the “pivot” vectors.
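All of the work in this section can be handed to software. Here is a short sketch of mine (not from the text) of the same example in Python with sympy: rref() returns both the reduced matrix and the list of pivot columns, and the pivot columns of the original matrix are a basis for the span.

    # The example above by machine: keep the pivot columns of the original matrix.
    from sympy import Matrix

    S = [(1, 3, 6, 2), (4, 3, 6, 2), (5, 6, 12, 4), (6, 9, 18, 6), (8, 1, 7, 1)]
    A = Matrix.hstack(*[Matrix(list(v)) for v in S])   # the vectors of S as columns

    R, pivots = A.rref()
    print(pivots)                     # (0, 1, 4): the first, second and fifth columns
    print([S[j] for j in pivots])     # a basis for Span(S), as in the text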
23. A Basis for the Intersection of Two Vector Subspaces of Rn .
Suppose { C1 , . . . , Ck } is an ordered basis for a subspace V of Rn and
{ B1 , . . . , Bi } is an ordered basis for a subspace W of Rn . We know that
V ∩ W is a vector subspace of Rn , but how can we determine what it is?
How do we represent it?
A vector v is in V ∩ W exactly when it can be represented in terms of both
bases listed above. Specifically, there are numbers x1 , . . . , xk and y 1 , . . . , y i
for which
v = x1 C1 + · · · + xk Ck = y1 B1 + · · · + yi Bi .
So we are seeking all solutions to x1 C1 + · · · + xk Ck − y 1 B1 − · · · − y i Bi = 0.
This corresponds to the block matrix equation
Mz = (C1 · · · Ck − B1 · · · − Bi )z = 0
where z is the column (x1 , . . . , xk , y 1 , . . . , y i ).
In rref (M), all of the variables x1 , . . . , xk must be pivot variables because
{ C1 , . . . , Ck } is a linearly independent set. The row reduction on M is
performed one column at a time, moving from left to right. The exact same
elementary row operations (left multiplication by elementary matrices) that
clean up the first k columns of M will do the same for the block matrix
(C1 · · · Ck ). Since all k columns in this last matrix are pivot columns, these
identical first k columns must be pivot columns in rref (M) as well.
If all of the y 1 , . . . , y i are pivot variables too then the only solution is the
all-zero solution and the intersection V ∩ W is {0}.
On the other hand if any of the y 1 , . . . , y i are free then V ∩ W will not
be trivial.
Suppose y j1 , . . . , y jt are the free variables among the y 1 , . . . , y i . The solution coefficient values x1 , . . . , xk and also y 1 , . . . , y i can be written in terms
of them.
A solution expression y 1 B1 + · · · + y i Bi is a generic member of the intersection.
For each q between 1 and t let vq be the vector in Ri whose entries are
the solution coefficients y 1 , . . . , y i obtained by choosing y jq = 1 and all the
other free variables equal to 0. The pivot variables among the y 1 , . . . , y i are
determined by these free variable choices.
v1 , . . . , vt is a linearly independent list of vectors: each of these columns
has entry value of 1 in a row where all the rest have a 0.
23.1. Exercise. Show that y = a1 v1 + · · · + at vt is the vector containing the
list of solution coefficients corresponding to the choices y j1 = a1 , . . . , y jt =
at .
Appealing to Exercise 21.12 we find that Sq = (B1 · · · Bi ) vq for q =
1, . . . , t is a linearly independent list of t members of the intersection. And
any member of the intersection can be written as
v = y1 B1 + · · · + yi Bi = yj1 S1 + · · · + yjt St
for choices of the free variables y j1 , . . . , y jt .
So Span({S1 , . . . , St }) = V ∩ W , which therefore has dimension t. We
have, along the way, proved the following important result for any subspaces
V and W of Rn :
dim(V + W ) = dim(V ) + dim(W ) − dim(V ∩ W ).
Here is an example of the things we built above.
We are given two subspaces of R4 as the span of ordered bases:
V = Span({ (1, 3, 6, 2), (4, 3, 6, 2), (8, 1, 7, 1) })
and
W = Span({ (9, 3, 0, 2), (0, 3, 1, 2), (5, 1, 7, 9) }).
We want to determine and, if nontrivial, find a basis for V ∩ W .
Create the matrix

    M = [ 1  4  8  −9   0  −5 ]
        [ 3  3  1  −3  −3  −1 ]
        [ 6  6  7   0  −1  −7 ] .
        [ 2  2  1  −2  −2  −9 ]
This matrix has six columns and four rows, so we know right now, without
further work, that there will be at least two free variables and that V ∩ W
will be nontrivial.

    rref (M) = [ 1  0  0  0  −49/18  −263/3 ]
               [ 0  1  0  0    23/9   347/3 ]
               [ 0  0  1  0      0     −25  ]
               [ 0  0  0  1     5/6     20  ]
This means:

    y1 = −(5/6) y2 − 20 y3 ,     y2 = y2 ,     y3 = y3 .
So a generic member of the intersection can be written in parametric form
as
    v = y1 B1 + y2 B2 + y3 B3

      = ( −(5/6) y2 − 20 y3 ) (9, 3, 0, 2) + y2 (0, 3, 1, 2) + y3 (5, 1, 7, 9)

      = y2 [ −(5/6) (9, 3, 0, 2) + (0, 3, 1, 2) ] + y3 [ −20 (9, 3, 0, 2) + (5, 1, 7, 9) ]

      = (y2/6) (−45, 3, 6, 2) − y3 (175, 59, −7, 31) .      (y2 and y3 are free parameters.)
So the two vectors in the last line form a basis for V ∩ W .
Using these two vectors as rows in a 2 × 4 matrix and reducing to rref
produces the (very) marginally better basis

    { (636, 0, −75, −5),  (0, 636, 147, 349) }.
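The whole computation of this section can also be delegated to software. Below is a hedged sketch of mine (not from the text) in Python with sympy that reproduces the example: each nullspace vector of M supplies y-coordinates, and y1 B1 + y2 B2 + y3 B3 is then a member of V ∩ W.

    # Basis for the intersection V ∩ W, following the recipe in this section.
    from sympy import Matrix

    C = [Matrix([1, 3, 6, 2]), Matrix([4, 3, 6, 2]), Matrix([8, 1, 7, 1])]   # basis of V
    B = [Matrix([9, 3, 0, 2]), Matrix([0, 3, 1, 2]), Matrix([5, 1, 7, 9])]   # basis of W

    M = Matrix.hstack(*C, *[-b for b in B])          # (C1 C2 C3 -B1 -B2 -B3)
    for z in M.nullspace():                          # one vector per free variable
        y = z[3:, 0]                                 # the y-part of the solution
        member = y[0] * B[0] + y[1] * B[1] + y[2] * B[2]
        print(member.T)                              # a basis vector of V ∩ W
                                                     # (each is a scalar multiple of one found above)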
24. ✸ Solving Problems in More Advanced Math Classes.
Students in a Linear Algebra class14 typically have a bit of an emotional
roller-coaster ride as the course progresses. You are all able calculators, and
have been rewarded for that in your past math classes. That is why you are
here. And the first few weeks play to that strength. You are learning how
to do a bunch of calculations, mostly related to things you have seen before
such as vectors and angles and solving systems of linear equations.
But now the game is changing. We are introducing more abstract ideas.
In many of the problems you are asked if one or more of the definitions apply
to an object given in the problem. You might be asked if a set with certain
operations is a vector space, a subspace, or if a function is linear. You might
be asked if a set of vectors spans a given vector space, or if it is a dependent
or independent set of vectors. You might be asked to identify a nullspace,
or a columnspace or an eigenspace, or to change bases from one basis to
another. All of these require a calculation step . . . but which calculation are
you to do? And why? And what if (horrors) there are two different
kinds of steps you must take, each with their own calculation, before you
can draw a conclusion? Typically, for the middle weeks in a class like this,
there is a lot of angst and frustration. Scores on quizzes and tests might
drop. Then (for almost everyone, I hope) there is a sequence of those “AHA”
moments we all like so much and students become much more successful and
efficient, just in time for the last test and the final.
In all math classes, from this point onward, there is a sequence of steps
you take in analyzing a problem, assuming that the techniques of the subject
apply.
In Linear Algebra, for instance, you must identify the vector space of
interest and specify unambiguously the linear functions of interest. If there
is additional structure, such as a means of determining angles in the space,
that needs to be identified or specified. Application subject areas each have
their own vocabulary and issues they care about but these refer to common
facts about matrices and the like, in disguise. The question you would like
to answer is rephrased in terms used in this class.
You must clarify: 1) What is the question? If it takes several steps to
answer, separate the steps so you don’t get lost. Don’t try to do them
all at once. For instance to show a set is a basis, you must show it is
linearly independent and spans. These are very different properties. 2)
What technique will be used? Often it involves solving some linear system
of equations. Identify this technique. Remember that sometimes there are
several techniques you could use to show the same thing. Sometimes one
way is much easier than another! 3) Carry out the calculation. 4) Draw
an explicit conclusion. Remind yourself in writing of what you have just
accomplished by the calculation. Sometimes the calculation takes a long
time and it is easy to lose track of why you did it. 5) If it is a multi-step
question, sum up after you have verified all the parts of the argument and
make the concluding assertion. It is all about communication—with others
and with yourself.
[Footnote 14: Sections marked with the diamond symbol ✸, such as this one, consist of philosophical
remarks, advice and points of view. They can be skipped on first reading, or altogether.]
Whew. Lots to do, but believe it or not people can learn to do this. It’s
also kind of fun after you get the hang of it. And learning to isolate assumptions, identify the question, select a technique, carry out the calculation and,
finally, draw a valid conclusion in this rather pristine environment will make
you a more powerful problem solver in other areas of study as well.
25. Dimension is Well Defined.
The proof of this result, found below, is one of two proofs in these notes
that you should think about enough so you thoroughly understand it and
could recreate it.
Suppose V is a vector space and v1 , . . . , vn is a list of vectors that span
V : that is, a list of n different vectors from V so that every vector in V can
be written as a linear combination of the vectors on this list.
We will show that any list z1 , . . . , zn+1 of n + 1 vectors from V must be
dependent. In particular, we will show that there are constants
a1 , . . . , an+1 which are not all 0 and for which

    Σ_{j=1}^{n+1} aj zj = 0.
This will imply that no basis for V can have more than n members, and
one can deduce (can you?) that if v1 , . . . , vn is a basis (that is, it not only
spans V but is an independent list of vectors too) no basis can have fewer
than n members either.
On with the proof. We don’t use the Einstein summation convention here:
keeping track of the length of each sum is part of the discussion.
Since v1 , . . . , vn spans V , for j = 1, . . . , n + 1 there are constants c^i_j for
which zj = Σ_{i=1}^{n} c^i_j vi .
The matrix C formed from these constants

    C = ( c^i_j ) = [ c^1_1   c^1_2   . . .   c^1_{n+1} ]
                    [ c^2_1   c^2_2   . . .   c^2_{n+1} ]
                    [   .       .                 .     ]
                    [ c^n_1   c^n_2   . . .   c^n_{n+1} ]
has more columns than rows: it is an n by n + 1 matrix. This means that
the left-multiplication linear transformation
f : Rn+1 → Rn
defined by
f (w) = Cw
is not one-to-one. In particular there is a nonzero solution a ∈ Rn+1 to the
equation Ca = 0.
Upon examining the n entries of Ca, which are all 0, we see that

    Σ_{j=1}^{n+1} aj zj = Σ_{j=1}^{n+1} aj ( Σ_{i=1}^{n} c^i_j vi )
                        = Σ_{i=1}^{n} ( Σ_{j=1}^{n+1} aj c^i_j ) vi
                        = Σ_{i=1}^{n} 0 vi = 0.
This was the result we were looking for.
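A tiny numerical instance of this argument, offered only as an illustration (the specific vectors are made up): two vectors spanning R2, three vectors z1, z2, z3, and a nonzero null vector of the 2 × 3 coefficient matrix C producing the promised dependency.

    # Toy version of the proof: more vectors than a spanning list forces dependence.
    from sympy import Matrix

    v = [Matrix([1, 0]), Matrix([1, 1])]                  # spans R^2
    z = [Matrix([2, 1]), Matrix([0, 3]), Matrix([1, -1])] # three vectors, must be dependent

    V = Matrix.hstack(*v)
    C = Matrix.hstack(*[V.solve(zj) for zj in z])         # column j holds z_j in terms of v1, v2

    a = C.nullspace()[0]                                  # nonzero because C is 2 x 3
    print(a.T, (a[0]*z[0] + a[1]*z[1] + a[2]*z[2]).T)     # the combination is the zero vector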
26. Coordinates.
If S = { v1 , · · · , vn } is an ordered basis for V and p = a1 v1 + · · · + an vn
we define the S-coordinates of p to be

    [p]S = [ a1 v1 + · · · + an vn ]S = (a1 , a2 , . . . , an ) = a1 e1 + · · · + an en .
Each vector in V has unique S-coordinates.
The function [·]S : V → Rn is called the S-coordinate map.
No matter what the vector space was at the outset all questions bearing
on the vector structure on the space can be translated to a question involving
these S-coordinates and the standard ordered basis En = { e1 , . . . , en } for
Rn , our paradigmatic vector space.
After we answer the corresponding question in Rn we transfer the answer
back to V using the S-coordinate map.
There is a word for this situation in mathematics: isomorphic, meaning
“same form.” Any finite dimensional real vector space is isomorphic to Rn
for some unique integer n.
We also note an obvious fact: if x is in Rn the coordinates for x with
basis En , which we could denote [x]En , is just x itself.
Up to this point we had only one interpretation of coordinates: those in
terms of the only basis we knew about, En . There was no need to specify
the basis, the “language we were speaking.” That is no longer the case.
When there is more than one basis in sight, you must take explicit
note of the basis of V to which the coordinates refer. Otherwise there
is no way to interpret the meaning of the coordinates, and that kind of
sloppiness is a huge source of confusion.
We bring up now a point that bears remembering and which we will use.
Suppose v1 , . . . , vk are members of V and m1 , . . . , mk are members of Rn
and these vectors are related by [vi ]S = mi for a basis S for V and all i.
Then each member of the span of { v1 , . . . , vk } is associated with exactly one member of the span of { m1 , . . . , mk }, and conversely, by the
S-coordinate map.
Entire subspaces of V can be moved to Rn and back, and subspaces of
Rn are associated with subspaces of V . The set { v1 , . . . , vk } is linearly
independent exactly when { m1 , . . . , mk } is linearly independent.
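Concretely, when V = Rn and S is given by listing its vectors, computing [p]S is nothing more than solving a square linear system whose coefficient columns are the basis vectors. A small sketch of mine (made-up basis and vector, not an example from the text):

    # S-coordinates in R^3: solve (v1 v2 v3) a = p for a = [p]_S.
    import numpy as np

    S = np.array([[1.0, 1.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])        # columns are the basis vectors
    p = np.array([3.0, 5.0, 2.0])

    coords = np.linalg.solve(S, p)         # [p]_S
    print(coords)
    print(S @ coords)                      # reconstructs p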
26.1. Exercise. (i) Find coordinates [w]T where

    w = 3e^t − 5e^{−t} + 1   and   T = { (e^t + e^{−t})/2 ,  1 ,  (e^t − e^{−t})/2 }.

(ii) Find coordinates [v]S where

    v = [  7  2 ]   and   S = { [ 0  0 ] ,  [ 0  1 ] ,  [ 0  1 ] ,  [ 1  0 ] }.
        [ −1  3 ]               [ 0  1 ]    [ 1  0 ]    [ 0  1 ]    [ 0  0 ]
27. ✸ Position Vectors and Coordinates.
We now consider an example of particular importance to applications,
the relationship between coordinates and physical space15. We suppose the
arrow-vectors in space form a three dimensional vector space, as they certainly seem to do by our common experience. If someone disagrees it is up
to this critic to point out where that assumption leads us to error. If the
critique is validated, and only at that point, the rules of our science-game require us to revise our opinion. But until then, and as long as the assumption
continues to lead us to verifiable—and verified—conclusions, we will carry
on. This assumption has been found to be a (very) useful approximation.
When we want to identify the arrow-vectors in space with R3 we commonly select an explicit ordered basis S = { v1 , v2 , v3 } of three arrow-vectors which incorporate our sense of three independent directions. These
arrow-vectors might be perpendicular to each other in space if that is convenient, though we do not require it. These vectors are to be our “measuring
sticks” in three different directions and any arrow-vector can be written as
p = p1 v1 + p2 v2 + p3 v3 which we associate to [p]S = (p1 , p2 , p3 ) in R3 .
Recall from Section 6 that if we want to use a position vector to describe
a specific location A in space we first must identify a base point E in space,
the corner of your laboratory or some other convenient place.
Let’s suppose the displacement vector from E to A is the vector p from
above.
15Sections marked with the diamond symbol ✸, such as this one, consist of philosophical
remarks, advice and points of view. They can be skipped on first reading, or altogether.
Then the position vector to A with base point E is a specific arrow that
starts at E and ends at A denoted
AE = EE + (AE − EE) = EE + p.
If someone else has a different idea of where the base point should be
located in space, as is bound to occur from time to time, we will need
different position vectors. Let’s say their base point is a location B in
space, and the displacement vector from B to E is given by
EE − BE = EB − BB = c = c1 v1 + c2 v2 + c3 v3 .
According to our earlier discussion on page 28,
AB = EB + (AB − EB) = BB + (EB − BB) + (AB − EB)
= BB + c + p.
If we are to associate position vectors in the world with position vectors
in Rn we must get the base points involved in an explicit way.
We intend to associate base point E in the world with the standard origin
in R3 . To make this association explicit in the notation, we use 0E to denote
the position vector of (0, 0, 0) with base point (0, 0, 0) in R3 .
For base point E in the world, we define for position vector AE
[ AE ]S, E = [ EE + (AE − EE) ]S, E
= 0E + [ AE − EE ]S = 0E + [p]S .
We associate the position vector EE in the world with position vector 0E
in R3 . For any other position vector AE, we add to that the S-coordinate
map applied to the displacement of A from the base point E.
It is a straightforward matter to compute the position vector in R3 associated with the position vector of A using a new base point B.
[ AB ]S, B = [ BB + (AB − BB) ]S, B
= [ BB + (EB − BB) + (AB − EB) ]S, B
= 0B + [ c + p ]S = 0B + [c]S + [p]S .
We reiterate: it makes no sense to talk about a position vector to a point
in the world represented as a position vector in R3 , unless one specifies both
the ordered basis S in the world and the base point E in the world. You can
do it by repeatedly reminding people of basis and base point in words, or
by use of an explicit notation embedded in the calculations where the issue
will arise.
The function [·]S, E that implements the association using notation is
called a position map.
28. Linear Functions Between Vector Spaces.
Suppose f is a function with domain V and range in W , denoted f : V →
W . If V and W are vector spaces we say f is linear when:
f (u+cv) = f (u)+c f (v) for all members u and v in V and any constant c.
As an example, the S-coordinates functions from the last section are linear.
If A is an ordered basis of V and x = xi vi is a representation of x as a
linear combination of members of A and if f is linear then f (x) = f (xi vi ) =
xi f (vi ). We conclude:
A linear function is completely determined by what it does to any
spanning subset of its domain space. In particular, any two linear functions that agree on a basis are actually the same function.
The kernel of f is the set of those v in V for which f (v) = 0. We will
denote this set ker(f ).
The image of f is the set of those w in W which are f (v) for some v in
V . We will denote this set image(f ).
The kernel is the set of vectors “killed” by f , while the image is the
collection of all “outputs” of f . Both kernel and image are vector subspaces,
of V and W respectively.
If W = V , so f : V → V , we define eigenvectors and eigenvalues for f .
The number λ is called an eigenvalue for f if there is a nonzero vector
x for which f (x) = λx. The nonzero vector x is called an eigenvector for
λ and f . The set of all x for which f (x) = λx (the set of eigenvectors for
λ plus the zero vector) is a subspace of V , called the eigenspace for f and
eigenvalue λ.
If A is a basis for V and C is a basis for W there is a matrix that
corresponds to the effect f has on the A-coordinates of vectors in V in
terms of C-coordinates of vectors in W .
Specifically if A = { a1 , . . . , an } and W has dimension m we define:
MC←A = ( [f (a1 )]C · · · [f (an )]C )      (each [f (ai )]C is an m by 1 column).
MC←A is called the matrix of f with respect to bases A and C.
For any x = xi ai in V ,
MC←A [x]A = [f (x)]C .
The proof is, again, nothing deeper than a calculation:

    MC←A [x]A = ( [f (a1 )]C · · · [f (an )]C ) (x1 , x2 , . . . , xn )
              = xi [f (ai )]C = [f (xi ai )]C = [f (x)]C .
A picture illustrating this situation is found below:

    Commuting Square

                     f
        V --------------------> W
        |                       |
      [·]A                    [·]C
    (reversible)           (reversible)
        |                       |
        v                       v
       Rn -------------------> Rm
                  MC←A
A diagram as above is called a commutative diagram by mathematicians. It means that if you start with a member of a set at the “corner” V
and apply the functions on the edges as indicated by the arrows you will end
up with the same member of Rm , whether you follow the path across the
top and down by [·]C , or if you go down first by [·]A and across by using the
matrix MC←A . Diagrams of this kind help us visualize (and so remember)
the relationships among functions. If an arrow is indicated to be reversible,
we mean that the function it connotes has an inverse function and the modified diagram with the function replaced by its inverse (and arrow reversed
at that place) is commutative too.
We remark that if any m by n matrix M is thought of as a linear function
from Rn to Rm by left multiplication, and domain and range have the standard bases En , then MEm←En = M . There was no need for this notational
specificity when the only bases we had were the standard bases.
But now we must be explicit about the basis in both domain and range:
otherwise there is no way to know what the entries in the matrix mean.
Solutions to f (v) = b can be found by solving MS←S x = [b]S and transferring the answer back to V . The image and kernel of f can be found the
same way.
The characteristic polynomial is defined for f : V → V by using any
basis S for both domain and range to create a matrix MS←S for f . The characteristic polynomial for f is then defined to be the characteristic polynomial
for MS←S .
The eigenvalues for MS←S , and the vectors in V associated with the eigenvectors for this matrix are the eigenvalues and eigenvectors for f .
Though the matrix MS←S depends on the choice of basis S, the characteristic polynomial, and the eigenvalues and eigenvectors for f in V found
using this matrix, will not depend on this choice. Some of that is obvious
now, but it will be completely clear after we discuss matrices of transition
from one basis to another.
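A small worked sketch (mine, not from the text) of building MC←A column by column: take f = d/dt on P2(t) with A = C = { 1, t, t^2 }, compute f on each basis vector, and record the C-coordinates of the results as columns.

    # Matrix of a linear map with respect to chosen bases, built column by column.
    from sympy import symbols, diff, sympify, Matrix

    t = symbols('t')
    basis = [sympify(1), t, t**2]          # ordered basis A = C of P2(t)

    def coords(p):
        # coordinates in { 1, t, t^2 }: the Taylor coefficients of p at 0
        p = sympify(p)
        return Matrix([p.subs(t, 0), diff(p, t).subs(t, 0), diff(p, t, 2).subs(t, 0) / 2])

    M = Matrix.hstack(*[coords(diff(a, t)) for a in basis])   # columns are [f(a_j)]_C
    print(M)

    p = 4 + 3*t + 5*t**2
    print(M * coords(p), coords(diff(p, t)))   # both sides of M_{C<-A}[x]_A = [f(x)]_C agree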
28.1. Exercise. Let S be the ordered basis { t, e^t , e^{2t} } for function space
V and let T be ordered basis { e^t , e^{2t} } for function space W . Consider the
linear function d^2/dt^2 : V → W . Find MT←S for this function. What is the
kernel of this function? What is the image of this function?
28.2. Exercise. Let S be the ordered basis { (1, 2, 3), (1, 0, 1), (0, 1, 5) } for
R3 and define the linear function F : R3 → R3 by

    F (x) = Ax   where   A = [  33  −35   7 ]
                             [ −10  −26  10 ] .
                             [  −1  −85  41 ]
First give ME3←E3 for this function. Then find MS←S for this function.
28.3. Exercise. (i) Is the function W : P (t) → R given by

    W (g) = ∫_0^1 (3 + t) g(t) dt

linear? Prove it is or show why not.
(ii) Is the function K : P (t) → R given by

    K(g) = ∫_0^1 ( 2 + g(t) ) dt

linear? Prove it is or show why not.
28.4. Exercise. Is the function H : R2 → R2 given by

    H(x) = (x1 x2 , x1 + x2 )

linear? Prove it is or show why not.
28.5. Exercise. If f : V → V is linear and λ is an eigenvalue for f show
that the eigenspace for this eigenvalue is a vector subspace of V .
28.6. Exercise. Find the matrix MT←S of d/dt : V → W where V has ordered
basis S = { te^t − e^t , Cos(t) } and W has ordered basis T = { Sin(t), te^t }.
29. Change of Basis.
Suppose A = { a1 , . . . , an } and B = { b1 , . . . , bn } are two ordered bases
for a vector space V . We do not rule out the possibility that V = Rn and
A or B is the standard basis. We remind the reader that [x]En = x, where
En is the standard basis of Rn . This is a pretty common situation.
In any case, we will let PB←A be the n by n matrix
PB←A = ( [a1 ]B . . . [an ]B ) .
PB←A is called the matrix of transition from A to B. Its columns
are the coordinates of the “old” A vectors in terms of the “new” basis B.
You can think of it as an “automatic translator” from the A-language
to the B-language.
It is a fact that
[x]B = PB←A [x]A .
The proof: Suppose x = xi ai . Then

    PB←A [x]A = PB←A [xi ai ]A = ( [a1 ]B . . . [an ]B ) (x1 , x2 , . . . , xn )
              = xi [ai ]B = [xi ai ]B = [x]B .
A picture indicating what is happening here is found below:

    Commuting Triangle (all arrows reversible)

                      V
                    /   \
                [·]A     [·]B
                  /       \
                 v         v
                Rn -------> Rn
                    PB←A
Since PA←B PB←A [x]A = [x]A , the product PA←B PB←A is the identity matrix.
The inverse PB←A^{−1} is the matrix of transition PA←B from B to A,
so its columns are the A-coordinates of the vectors in B.
One more little time-saver: Sometimes there will be a nice basis A, but
two more useful bases B and C. The bases B and C have to be given to
you somehow. Usually both are given in terms of some easy basis such as
A. If you want to find the matrix of transition PC←B you can proceed via
the basis A by
PC←B = PC←A PA←B = PA←C^{−1} PA←B .
Generally in applications of this material to specific examples you should
be very cautious before launching into some horrifying messy calculation.
You want to off-load this drudgery to hardware, where it belongs.
Almost always there is a basis in which a linear transformation has
an easy-to-compute matrix. Almost always there are easy-to-use coordinates close at hand. Use them!
Enter the transition matrices into an electronic assistant and maneuver
around using the shaded formulae above and below.
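A minimal sketch of that workflow (mine, with made-up bases of R2 given in standard coordinates): since PE←A has the A vectors as its columns, and likewise for B, the machine computes PB←A = PB←E PE←A and the shaded formula can be checked numerically.

    # Transition matrix by machine, using the standard basis as the easy go-between.
    import numpy as np

    A = np.array([[1.0, 1.0],
                  [2.0, -1.0]])            # columns a1, a2: this is P_{E<-A}
    B = np.array([[1.0, 0.0],
                  [1.0, 3.0]])             # columns b1, b2: this is P_{E<-B}

    P_B_A = np.linalg.inv(B) @ A           # P_{B<-A} = P_{B<-E} P_{E<-A}
    x = np.array([3.0, 1.0])               # a vector, written in standard coordinates
    x_A = np.linalg.solve(A, x)            # [x]_A
    x_B = np.linalg.solve(B, x)            # [x]_B
    print(np.allclose(P_B_A @ x_A, x_B))   # True: [x]_B = P_{B<-A} [x]_A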
29.1. Exercise. Consider the polynomial space P2 (t) with ordered bases
A = { 3t + 1, 5t, t + t2 } and B = { 1, t, t2 } and C = { 1 − t2 , 1 − t, 1 + t2 }.
Find the matrix PA←C that converts C-coordinates to A-coordinates.
30. Effect of Change of Basis on the Matrix for a Linear Function.
Finally, we consider changes in the matrix of a linear transformation
f : V → W under change of basis.
Suppose we wish to change from basis A to basis B in V and from basis
C to basis D in W . Let PB←A be the matrix of transition from A to B in V
and let PD←C be the matrix of transition from C to D in W .
Suppose MC←A and MD←B are the matrices of f with respect to the implied
coordinates. These matrices involve the coordinates of members of V and
W with respect to certain bases in V and W . Specifically, if x is a generic
member of V , we find MC←A [x]A = [f (x)]C and MD←B [x]B = [f (x)]D . So:
PD←C MC←A [x]A = PD←C [f (x)]C = [f (x)]D = MD←B [x]B = MD←B PB←A [x]A .
This implies PD←C MC←A = MD←B PB←A and so
MD←B = PD←C MC←A PB←A^{−1} = PD←C MC←A PA←B .

    MD←B = PD←C MC←A PA←B .
This shaded equation, applied to coordinates [x]B is
MD←B [x]B = PD←C MC←A PA←B [x]B .
The internal dialog you use to describe this equation (from far right to
left) is:
“Change from B-coordinates of x to A-coordinates. Do the f -thing to
these coordinates, producing C-coordinates for f (x). Then switch to D-coordinates. The result is just as if you had used MD←B on [x]B directly.”
A picture that could be useful here is the following, a commuting prism in
which the horizontal transition arrows are reversible:

              MC←A
    Rn (A) ----------> Rm (C)
       ^                  ^
     [·]A               [·]C
       |        f         |
       V ---------------> W
       |                  |
     [·]B               [·]D
       v                  v
    Rn (B) ----------> Rm (D)
              MD←B

Here PA←B joins the two copies of Rn (from B-coordinates to A-coordinates),
PD←C joins the two copies of Rm , and MD←B = PD←C MC←A PA←B .
From a different standpoint, any n × n invertible matrix P whatsoever
could be construed as a matrix of transition, so if V is an n-dimensional
vector space and MA←A is the matrix of a linear function f : V → V in a
basis A then the matrix P −1 MA←A P will be the matrix of f in a different
basis.
Matrices that are related this way are called similar. Similar matrices
share many properties. For instance, the characteristic polynomials of similar matrices are identical, so they share eigenvalues. See Exercise 39.2 for
more.
30.1. Exercise. Define F : M2×2 → M2×2 by F (X) = AXB where

    A = [  7  2 ]   and   B = [ 1  0 ] .
        [ −1  3 ]             [ 3  2 ]

Find the matrix MS←S of F with respect to ordered basis

    S = { [ 0  0 ] ,  [ 0  1 ] ,  [ 0  1 ] ,  [ 1  0 ] }.
          [ 0  1 ]    [ 1  0 ]    [ 0  1 ]    [ 0  0 ]
Does F have eigenvalues? If so, find bases for each eigenspace.
30.2. Exercise. Define K : P2 (t) → P2 (t) by K(f ) = 3f + d/dt f . Create
matrices MS←S and MT←T where

    S = { 1, t, t^2 }   and   T = { 1 − t − t^2 , t + 1, 3 − 2t^2 }.
Does K have eigenvalues? If so, find bases for each eigenspace.
31. ✸ Effect of Change of Basis on a Position Vector.
We saw in Section 27 what is involved when we create coordinates for
position vectors, and how those coordinates change when we adjust the
base location. In this section16 we examine how coordinate changes affect
position vectors.
We also generalize slightly: we suppose the underlying space has displacements which form an n dimensional vector space, rather than the 3
dimensional setting of that earlier section.
Let’s get specific.
Suppose the displacements in our world have two different ordered bases
S = { v1 , v2 , . . . , vn } and
T = { w1 , w2 , . . . , wn }
We are going to be creating position vectors in our world. We have two
different possible base points in mind for our position vectors: E and B.
16Sections marked with the diamond symbol ✸, such as this one, consist of philosophical
remarks, advice and points of view. They can be skipped on first reading, or altogether.
Let p denote the displacement vector from E to location A and let c
denote the displacement vector from base point B to base point E.
So the position vector to location A relative to base point E is
AE = EE + (AE − EE) = EE + p.
On the other hand, the position vector to location A relative to base point
B is
AB = BB + (EB − BB) + (AB − EB) = BB + c + p.
As we did in R3 we let 0B denote the position vector of (0, . . . , 0) using
(0, . . . , 0) as base point when we intend to represent position vectors in our
world with base point B using position vectors in Rn .
So comparing two different position maps, one with base point E and
basis S, the other with base point B and basis T we have
[ AE ]S, E = 0E + [p]S
and
[ AB ]T, B = 0B + [c]T + [p]T .
These two position vectors in Rn represent the same point in our world but
they will, of course, generally have different coordinates.
Now let’s suppose an astronaut at location E using basis S reports observations of something happening at location A to home base at location B.
Home base uses basis T .
The astronaut reports the incident by radio, sending, to home base, the
displacement [p]S from E to A. The folks there want to interpret what that
means.
Home base knows where the astronaut is: at position vector 0B + [c]T ,
using B as base point and T as basis.
Changing the reported coordinates from basis S to basis T using matrix
PT←S gives
[p]T = PT←S [p]S .
So home base knows the position coordinates of the incident are
[ AB ]T, B = 0B + [c]T + [p]T = 0B + [c]T + PT←S [p]S .
As a minor embellishment, it is entirely possible that it might be easier
for the astronaut to know where she is in relation to base than for base to
know where the astronaut is. In that case the astronaut would report the
base displacement [−c]S along with [p]S . And then
[ AB ]T, B = 0B + PT←S [c]S + PT←S [p]S .
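A numerical toy version of the last formula, purely for illustration (all numbers are made up): home base stores its own record [c]T of the astronaut's position and converts the reported displacement with PT←S.

    # Home base combines its record of the astronaut's position with the
    # translated report:  [A_B]_{T,B} = 0_B + [c]_T + P_{T<-S} [p]_S.
    import numpy as np

    S = np.array([[1.0, 0.0], [0.0, 2.0]])   # astronaut's basis vectors as columns (common frame)
    T = np.array([[0.0, 1.0], [1.0, 1.0]])   # home base's basis vectors as columns (same frame)

    P_T_S = np.linalg.inv(T) @ S             # matrix of transition from S to T
    p_S = np.array([3.0, 1.0])               # reported displacement, in S-coordinates
    c_T = np.array([5.0, -2.0])              # astronaut's position, in T-coordinates

    print(c_T + P_T_S @ p_S)                 # T-coordinates of the incident, base point B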
32. ✸ Effect of Change of Basis on a Linear Functional.
Let’s consider for a moment17 a linear function f : V → R and basis A
for V . We noted before that the matrix for f is a row matrix, a member of
Rn∗ .
ME1←A = ( f (a1 ) f (a2 ) . . . f (an ) )
f (x) = ME1←A [x]A .
Changing bases to new basis B for V is done using the change of basis matrix
PB←A :
f (x) = ME1←A [x]A = ME1←A PA←B (PB←A [x]A ) = ME1←B [x]B .
The point is, to change to coordinates in terms of B you left multiply the
coordinates of vectors by PB←A but you right multiply a functional, a row
matrix, by PA←B .
If you take the point of view that functionals are more important than
vectors you would call PA←B the matrix of transition from basis A to B, not
the matrix of transition from basis B to A as we would do. That vocabulary
is indeed used in some texts and it can be confusing, but the usage is not
totally illogical.
The confusion is exacerbated when we (incorrectly but conveniently) think
of functionals as dot product against vectors. As long as we move from one
orthonormal basis to another all is well, because in that case PA←B = PB←A^T
and the erroneous idea is not revealed. But it is still wrong in principle,
even if it gives you the answer you want in carefully chosen cases.
The problem comes up often in physics and engineering. For instance an
electric field is usually given as a vector, but in fact it is a functional. Let’s
think of an example that helps illustrate the distinction.
The physical scenario we have in mind consists of two types of items in
the air in front of your eyes with you seated, perhaps, at a desk.
First, we have actual physical displacements, say of various dust particles
that you witness. These displacements are what we usually think of as
vectors.
Second, we have stacks of flat paper, numbered like pages in a book, each
stack having its own characteristic uniform “air gap” between pages, which
are parallel throughout its depth. We make no restriction about the angle
any particular stack must have relative to the desk. We consider these pages
as having indeterminate extent, perhaps very large pages, and the stacks to
be as deep as required, though of uniform density.
The magnitude of a displacement will be indicated by the length of a line
segment connecting start to finish, which we can give numerically should we
decide on a standard of length. Direction of the displacement is indicated
17Sections marked with the diamond symbol ✸, such as this one, consist of philosophical
remarks, advice and points of view. They can be skipped on first reading, or altogether.
by the direction of the segment together with an “arrow head” at the finish
point.
The magnitude of a stack will be indicated by the density of pages in the
stack which we can denote numerically by reference to a “standard stack” if
we decide on one. The direction of the stack is in the direction of increasing
page number. This is the example that matches the electric field. The
“pages” are equipotential surfaces, more dense where the field is big.
We now define a coordinate system on the space in front of you, measuring distances in centimeters, choosing an origin, axes and so on in some
reasonable way with z axis pointing “up” from your desk.
Consider the displacement of a dust particle which moves straight up 100
centimeters from your desk, and a stack of pages laying on your desk with
density 100 pages per centimeter “up” from your desk.
If you decide to measure distances in meters rather than centimeters, the
vertical coordinate of displacement drops to 1, decreasing by a factor of
100. The numerical value of the density of the stack, however, increases to
10, 000.
When the “measuring stick” in the vertical direction increases in length
by a factor of 100, coordinates of displacement drop by that factor and
displacement is called contravariant because of this.
On the other hand, the stack density coordinate changes in the same
way as the basis vector length, so we would describe the stack objects as
covariant.
Though we have discussed the geometrical procedure for defining scalar
multiplication and vector addition of displacements, we haven’t really shown
that these stack descriptions can be regarded as vector spaces. There are
purely geometrical ways of combining stacks to produce a vector space structure on stacks too: two intersecting stacks create parallelogram “columns”
and the sum stack has sheets that extend the diagonals of these columns.
But the important point is that if stacks and displacements are to be
thought of as occurring in the same physical universe, and if a displacement
is to be represented as a member x of R3 , then a stack must be represented
as a linear functional, a member M of R3∗ . You cannot represent both (at
the same time) as members of R3 . Otherwise they would have to change the
same way when you change yardsticks. And yet they don’t.
There is a physical meaning associated with the number M x.
It is the number of pages of the stack corresponding to M which cross the
shaft of the displacement corresponding to x, where this number is positive
when the motion was in the direction of increasing “page number.”
It is obvious on physical grounds that this number must be invariant: it
cannot depend on the vagaries of the coordinate system used to calculate
it. That is the meaning of
M x = M P^{−1} P x = ( M P^{−1} ) ( P x )
and why coordinates of functionals and vectors must change in complementary ways.
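The invariance can be spot-checked numerically. In the sketch below (made-up numbers, my own illustration) the row matrix representing the stack is multiplied by P^{−1} on the right while the column representing the displacement is multiplied by P on the left, and the page count M x is unchanged.

    # Complementary transformation rules leave the pairing M x alone.
    import numpy as np

    M = np.array([[2.0, -1.0, 3.0]])          # a functional: a 1 x 3 row matrix
    x = np.array([[1.0], [4.0], [2.0]])       # a vector: a 3 x 1 column
    P = np.array([[1.0, 1.0, 0.0],
                  [0.0, 2.0, 0.0],
                  [1.0, 0.0, 3.0]])           # an invertible change of coordinates

    new_M = M @ np.linalg.inv(P)              # functional coordinates: right-multiply by P^{-1}
    new_x = P @ x                             # vector coordinates: left-multiply by P
    print((M @ x).item(), (new_M @ new_x).item())   # the same number twice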
33. Effect of Change of Basis on the Trace and Determinant.
Any linear transformation f : V → V has a square matrix MS←S with
respect to ordered basis S for V . If you use another basis T then there is a
square matrix P with MT←T = P MS←S P −1 . It follows
det(MT←T ) = det(P MS←S P −1 ) = det(P ) det(MS←S ) det(P −1 ) = det(MS←S ).
That means you can define the determinant for f , denoted det(f ), to be
the determinant of the matrix of f with respect to any convenient basis.
An identical argument shows that the characteristic polynomial for any
matrix for f does not depend on S. It is called the characteristic polynomial for f .
Finally, let MS←S = (a^i_j) and MT←T = (b^i_j) and P = (p^i_j) and P^{−1} = (q^i_j).
Recall the Kronecker delta function δ^i_j and that In = (δ^i_j). Since P^{−1} P = In
we have q^i_t p^t_j = δ^i_j .

    MT←T = P MS←S P^{−1}    so    b^i_j = p^i_t a^t_s q^s_j .

So trace(MT←T ) = b^i_i = p^i_t a^t_s q^s_i = q^s_i p^i_t a^t_s = δ^s_t a^t_s = a^t_t = trace(MS←S ).
We conclude that trace is also invariant under choice of basis, and define
trace for f to be the trace of any matrix for f .
Trace, determinant and characteristic polynomial for a linear transformation can be calculated using the matrix for the linear transformation
in any basis.
Though the matrix used to calculate them will differ from one basis to
another, trace and determinant and characteristic polynomial will not.
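A quick machine check of these invariances (an illustration of mine, with an arbitrary matrix M and an arbitrary invertible P playing the role of a transition matrix):

    # Similar matrices share trace, determinant and characteristic polynomial.
    from sympy import Matrix, symbols

    lam = symbols('lambda')
    M = Matrix([[2, 1], [0, 3]])
    P = Matrix([[1, 2], [1, 3]])                  # any invertible "matrix of transition"
    N = P * M * P.inv()                           # the same map written in another basis

    print(M.trace(), N.trace())                   # equal
    print(M.det(), N.det())                       # equal
    print(M.charpoly(lam).as_expr(), N.charpoly(lam).as_expr())   # the same polynomial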
34. Example: How To Use Convenient Bases Efficiently.
The example below is 2-dimensional, so the calculations done by hand
are not too bad. Really, a person would not do this in 2-dimensions, but it
illustrates what is going on quite well. And even in this case it is somewhat
easier to do everything possible with calculator and matrix methods.
As has been emphasized, you should try to organize calculations so you
have to do practically nothing by hand. If you think first before calculating
that can usually be arranged. Remember even the simplest calculation in 2
dimensions is likely to “scale up” very badly to higher dimensions. In three
dimensions and beyond it is so much easier (for us) if we use hardware and
matrices that doing it any other way is actually wrong, unless you have a
very pertinent reason.
Humans are just not good at doing hundreds of arithmetic operations
without error. People also forget what they are doing in the middle of a
mess like that. That is why we have hardware to do this, and keeping it all
straight is exactly why we humans invented and use Linear Algebra.
Let's start with a function space V = Span{ t^2 e^{2t} , t e^{2t} } and two
ordered bases

    A = { t^2 e^{2t} + t e^{2t} ,  t^2 e^{2t} − t e^{2t} }   and   B = { 3t^2 e^{2t} + 4t e^{2t} ,  t^2 e^{2t} − 6t e^{2t} }.

Define W to be the function space Span{ t e^{2t} , e^{2t} } with two ordered
bases

    C = { 3t e^{2t} − e^{2t} ,  t e^{2t} + e^{2t} }   and   D = { t e^{2t} − e^{2t} ,  4t e^{2t} + 3e^{2t} }.
Finally, we consider the linear transformation F : V → W defined by

    F (x) = 2x/t − d/dt ( x/t ).

For instance, choosing (at random) x = 3t^2 e^{2t} + 5t e^{2t} we have

    F ( 3t^2 e^{2t} + 5t e^{2t} ) = 2( 3t e^{2t} + 5e^{2t} ) − ( 3e^{2t} + 6t e^{2t} + 10e^{2t} ) = −3e^{2t} .
One needs to show F is linear, but also it is not totally obvious that F
is into W : that is, F (x) is always a combination of e2t and te2t and never
yields an output involving t2 e2t . You should be able to show that too.
Anyway, assuming those two tasks are accomplished, let’s determine the
matrices of transition PA←B and PC←D and the matrix MD←B for F . The
problem is all four bases are messy. But all four are given in terms of nice
bases. Remember that V and W are given to you as the spans of ordered lists
of independent simple vectors. This will virtually always be the case: there
will be some nice easy basis around, even if it is not just given explicitly as
it is here. Look for these bases!
We let S be the ordered basis { t^2 e^{2t} , t e^{2t} } for V and T be the ordered
basis { t e^{2t} , e^{2t} } for W . These are the obvious “nice” bases. They will
make our work very easy.

    PS←A = [ 1   1 ] ,   PS←B = [ 3   1 ] ,   PT←C = [  3  1 ] ,   PT←D = [  1  4 ] .
           [ 1  −1 ]            [ 4  −6 ]            [ −1  1 ]            [ −1  3 ]

Also

    F (s1 ) = F ( t^2 e^{2t} ) = 2t e^{2t} − d/dt ( t e^{2t} ) = 2t e^{2t} − ( e^{2t} + 2t e^{2t} ) = −e^{2t}

and

    F (s2 ) = F ( t e^{2t} ) = 2e^{2t} − d/dt ( e^{2t} ) = 2e^{2t} − 2e^{2t} = 0.

So in case we care, F has nullity 1, and rank 1. (Why?)
The T -coordinates of F (s1 ) are (0, −1) and the T -coordinates of F (s2 ) are (0, 0). So

    MT←S = [  0  0 ]
           [ −1  0 ]
and we have everything we need to answer any question involving F and
the four bases. Notice I had to do two easy differentiations and a couple
of subtractions to create the matrix for F . And as for these four matrices
of transition, I had to do nothing except read off the coefficients! This is
typical.
It’s true, I don’t have the answers I wanted quite yet. But all subsequent
work will be done by matrix multiplication using hardware if I want! Here
it is:
    PA←B = PA←S PS←B = [ 1   1 ]^{−1} [ 3   1 ] = [  7/2  −5/2 ] .
                       [ 1  −1 ]      [ 4  −6 ]   [ −1/2   7/2 ]

    PC←D = PC←T PT←D = [  3  1 ]^{−1} [  1  4 ] = [  1/2   1/4  ] .
                       [ −1  1 ]      [ −1  3 ]   [ −1/2  13/4  ]
    MD←B = PD←T MT←S PS←B = [  1  4 ]^{−1} [  0  0 ] [ 3   1 ] = [ 12/7   4/7 ] .
                            [ −1  3 ]      [ −1  0 ] [ 4  −6 ]   [ −3/7  −1/7 ]
So I guess we are done. But what do these matrices mean, what do they
have to do with the functions in V and W , or the calculation given by F ?
Let’s track through with the typical function x = 3t2 e2t + 5te2t that we
looked at above. We calculated that F (x) = −3e2t .
    [x]S = [ 3 ]   and   [x]B = PB←S [ 3 ] = [ 23/22 ] .
           [ 5 ]                     [ 5 ]   [ −3/22 ]

The coordinates on the right mean that x should be (23/22) b1 − (3/22) b2 . (Check
this!)

    [F (x)]D = MD←B [x]B = [ 12/7   4/7 ] [ 23/22 ] = [ 12/7 ] .
                           [ −3/7  −1/7 ] [ −3/22 ]   [ −3/7 ]

Apparently F (x), which we know to be −3e^{2t} , is (12/7) d1 − (3/7) d2 . (Once again,
you might want to verify this.) We could also check by calculating

    [F (x)]T = PT←D [F (x)]D = [  1  4 ] [ 12/7 ] = [  0 ] .
                               [ −1  3 ] [ −3/7 ]   [ −3 ]
We can actually go farther with this and since we have already created
all these matrices we might as well see what else they can do for us.
    MC←A = PC←D MD←B PB←A = [  1/2   1/4  ] [ 12/7   4/7 ] [  7/2  −5/2 ]^{−1} = [  1/4   1/4 ] .
                            [ −1/2  13/4  ] [ −3/7  −1/7 ] [ −1/2   7/2 ]        [ −3/4  −3/4 ]

Also, the A-coordinates of x and the C-coordinates of F (x) are (calculating each
two ways, just to check that it “works” and everything consistent)

    [x]A = PA←S [x]S = PA←B [x]B = [ 1   1 ]^{−1} [ 3 ] = [  7/2  −5/2 ] [ 23/22 ] = [  4 ] .
                                   [ 1  −1 ]      [ 5 ]   [ −1/2   7/2 ] [ −3/22 ]   [ −1 ]

    [F (x)]C = PC←T [F (x)]T = PC←D [F (x)]D = [  3  1 ]^{−1} [  0 ] = [  1/2   1/4  ] [ 12/7 ] = [  3/4 ] .
                                               [ −1  1 ]      [ −3 ]   [ −1/2  13/4  ] [ −3/7 ]   [ −9/4 ]

Let's check to see if the results match:

    MC←A [x]A = [  1/4   1/4 ] [  4 ] = [  3/4 ] .
                [ −3/4  −3/4 ] [ −1 ]   [ −9/4 ]
So they agree! Well of course they do. It looks like a lot of calculation up
above. But virtually all of it was just checking to make sure everything was
consistent and writing down the entries of matrices. But I already knew
how that would all work out: I proved the theorems that said it would
be consistent in class. And there was no need to write down the entries
beyond those four easy-basis matrices and two F evaluations on page 87.
Subsequent multiplication was all done by hardware.
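For the record, here is what "off-loading to hardware" might look like in Python with sympy (a sketch of mine; the matrix names follow the text, and sympy keeps the fractions exact). Only the four easy-basis transition matrices and MT←S are typed in.

    # Section 34's example: type in the easy-basis data, let the machine do the rest.
    from sympy import Matrix

    P_S_A = Matrix([[1, 1], [1, -1]])
    P_S_B = Matrix([[3, 1], [4, -6]])
    P_T_C = Matrix([[3, 1], [-1, 1]])
    P_T_D = Matrix([[1, 4], [-1, 3]])
    M_T_S = Matrix([[0, 0], [-1, 0]])

    P_A_B = P_S_A.inv() * P_S_B              # = P_{A<-S} P_{S<-B} = [[7/2, -5/2], [-1/2, 7/2]]
    P_C_D = P_T_C.inv() * P_T_D              # = [[1/2, 1/4], [-1/2, 13/4]]
    M_D_B = P_T_D.inv() * M_T_S * P_S_B      # = P_{D<-T} M_{T<-S} P_{S<-B} = [[12/7, 4/7], [-3/7, -1/7]]

    x_S = Matrix([3, 5])                     # x = 3 t^2 e^{2t} + 5 t e^{2t} in S-coordinates
    x_B = P_S_B.inv() * x_S                  # [x]_B = (23/22, -3/22)
    print(M_D_B * x_B)                       # [F(x)]_D = (12/7, -3/7), as computed above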
34.1. Exercise. Consider the ordered bases
A = { 3t+1, 5t, t+t2 }, B = { t2 , t, 1 } and C = { 3t2 −t+1, t−2, t2 +t−1 }
of P2 (t). Find PB←A and PA←B and PA←C .
34.2. Exercise. Consider the function G : R2 → R2 defined by
    G(x) = Ax   where   A = [ 1  2 ] .
                            [ 5  7 ]
Let S be the ordered basis S = { (1, 2), (1, −1) } and T = { (−1, 7), (1, 5) }.
(i) What are the S-coordinates of (6, 1)?
(ii) What is the matrix MS←S of G in basis S?
(iii) What is PS←T ?
The matrix of linear function H : R2 → R2 in basis T (in both domain and
range) is

    [ 9  −3 ] .
    [ 0   1 ]
(iv) What is the matrix of H in basis S?
34.3. Exercise. Use a nicer intermediary basis to simplify the calculations
to find the matrix MS←S of Exercise 30.1.
35. Bases Containing Eigenvectors.
Suppose f : V → V is linear and B = { b1 , . . . , bn } is a basis for V . If the
matrix MB←B for f is diagonal with diagonal entries λ1 , . . . , λn then each bi
is an eigenvector for eigenvalue λi . The converse is also true.
This is a very convenient type of basis to use if we want to understand
f . Such a basis is said to diagonalize the matrix for f and the process of
finding such a basis is referred to as diagonalization.
If x = a1 b1 + · · · + an bn is any member of V then
f (x) = λ1 a1 b1 + · · · + λn an bn .
It turns out that if the characteristic polynomial for f factors into linear
factors and produces n different eigenvalues you are guaranteed to be able
to find a basis of eigenvectors. That is because a set of eigenvectors for
different eigenvalues cannot be a dependent set, and there must be at least
one eigenvector for each eigenvalue.
A basis of eigenvectors can be referred to as an eigenbasis.
To prove independence of a set of eigenvectors for different eigenvalues we
show that a dependency leads to a contradiction. Therefore no dependency
is possible.
If there is a dependency, then there would be such a dependency among
a minimum number of eigenvectors for different eigenvalues for f . Let
a1 y 1 + · · · + ak y k = 0
be such a relation involving the fewest possible eigenvectors yi for different
eigenvalues λi .
This obviously cannot involve only one eigenvector: eigenvectors are not
the zero vector. Nor can it involve just two eigenvectors: the same vector
cannot be an eigenvector for two different eigenvalues. So
0 = f (a1 y1 + · · · + ak yk ) = a1 f (y1 ) + · · · + ak f (yk ) = a1 λ1 y1 + · · · + ak λk yk .
But dividing this last equation by a nonzero λi and subtracting it from
the first equation produces a nontrivial relation among fewer eigenvectors.
This is the contradiction we were looking for, and we conclude that any set
containing one eigenvector each for different eigenvalues is independent.
If there are fewer than n different roots of the characteristic polynomial,
then there might or might not be a basis of eigenvectors, but at least we can
include in a preferred basis any eigenvectors that we can find, adding other
vectors to fill out the rest of the basis.
The matrix for f will be relatively simple, even if it is not diagonal, if we
do this. Later in these notes we will discuss several types of “nice” bases for
a given matrix, and in other classes the issue will be revisited for a variety
of purposes.
If the characteristic polynomial has a complex root there will definitely
not be a basis of real eigenvectors. We will consider what can be done to
handle complex eigenvalues later.
When the characteristic polynomial has real factors (all real eigenvalues)
but some of the factors are repeated there might be a basis of eigenvectors
. . . or maybe not. You simply have to find bases for the eigenspaces, put
them together into an independent set, and count. If the set has n vectors
in it, you have a basis.
A linear function that has a basis of eigenvectors is called diagonalizable.
That vocabulary refers to the fact that the matrix of such a function using
a basis of eigenvectors will be diagonal.
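By machine the whole process takes a couple of lines. The sketch below (mine, with a made-up matrix whose eigenvalues are distinct) uses numpy's eig, whose output P has eigenvectors as columns, and verifies that the matrix of the map in the eigenbasis is diagonal.

    # Diagonalization: in an eigenbasis the matrix of the map is diagonal.
    import numpy as np

    A = np.array([[4.0, 1.0],
                  [2.0, 3.0]])
    eigenvalues, P = np.linalg.eig(A)         # columns of P are eigenvectors
    D = np.linalg.inv(P) @ A @ P              # the matrix of the same map in the eigenbasis
    print(eigenvalues)                        # 5 and 2 (in some order)
    print(np.round(D, 10))                    # diagonal, with the eigenvalues on the diagonal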
35.1. Exercise. The matrices of Exercise 19.3 are diagonalizable. Find
matrices of transition from the standard basis to a new basis which can be
used to convert these matrices to diagonal form.
35.2. Exercise. V = { a + bt + ce^t + de^{2t} | a, b, c, d ∈ R }.
Define D : V → V by D(f ) = d^2 f /dt^2 − 2 df /dt.
Find a basis of eigenfunctions, if possible. (Note: an eigenfunction is
just an eigenvector that happens to be a function.) Find a basis for the
kernel and the image of D.
35.3. Exercise. Define F : P2 (t) → P2 (t) by F (g) = t dg/dt.
Find a basis of eigenfunctions, if possible. Find a basis for the kernel and
the image of F .
36. Several Applications.
If A is a square matrix we define A^0 = In . If n is large A^n might be
difficult to calculate. But powers D^n , where D is a square diagonal matrix,
are easy to calculate. These powers are also diagonal matrices, with powers
of the eigenvalues λ1 , . . . , λn arrayed along its main diagonal.
If matrix A is diagonalizable this allows us to make sense of f (A) for many
real-variable functions f . For instance if P AP^{−1} = D we can define D^{1/3} to
be the matrix with λ1^{1/3} , . . . , λn^{1/3} arrayed along the main diagonal. But then

    ( P^{−1} D^{1/3} P )^3 = P^{−1} D P = A

so we have found a cube root of the matrix A as well.
More generally, if all of the eigenvalues of diagonalizable A are inside
the interval of convergence of power series f (t) = Σ_{i=0}^{∞} a_i t^i then f (D) is
diagonal with f (λ1 ), . . . , f (λn ) arrayed along the main diagonal. Then we
define f (A) = P −1 f (D)P . The entries of the partial sum matrices converge
to this matrix.
This can be extended to matrices that are not diagonalizable, but the
calculations are harder and rely on “canonical forms” for the matrices involved. These will be studied in more detail in your next Linear Algebra class,
or in a Differential Equations class where they use such matrices to solve
systems of differential equations.
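As a hedged sketch (mine; the matrix is made up and has positive, distinct eigenvalues so everything below is defined), here is f (A) computed exactly as described: diagonalize, apply f to the diagonal, and undo the change of basis. Note that numpy's eig returns V with A = V D V^{−1}, so this V plays the role of the P^{−1} of the text.

    # Functions of a diagonalizable matrix via its eigenvalues.
    import numpy as np

    A = np.array([[4.0, 1.0],
                  [2.0, 3.0]])
    w, V = np.linalg.eig(A)                              # A = V diag(w) V^{-1}

    cube_root = V @ np.diag(w ** (1.0 / 3.0)) @ np.linalg.inv(V)
    exp_A     = V @ np.diag(np.exp(w)) @ np.linalg.inv(V)

    print(np.round(cube_root @ cube_root @ cube_root - A, 8))   # essentially the zero matrix
    print(np.round(exp_A, 4))                                   # e^A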
36.1. Exercise. Find A^{1/3} , e^A and Sin(A) where

    A = [ 1  2 ] .
        [ 7  6 ]
36.2. Exercise. Suppose B = [ λ  1 ] .
                            [ 0  λ ]

It is easy to show that for variable t,

    (Bt)^n = [ λ^n t^n    n λ^{n−1} t^n ] .
             [    0          λ^n t^n   ]

That means

    e^{Bt} = [ e^{λt}   t e^{λt} ] .
             [   0       e^{λt}  ]
36.3. Exercise. Consider the system of differential equations

    dx/dt = 3x + y        x(0) = 5
    dy/dt = 3y            y(0) = 4

This can be converted to the matrix equation

    d/dt x = Ax ,   A = [ 3  1 ]   and   x(0) = [ 5 ] .
                        [ 0  3 ]                [ 4 ]

The solution is then x(t) = e^{At} x(0). Calculate this solution.
We now use matrices to study some sequences with recursive definition,
such as the famous Fibonacci sequence.
This sequence starts out with two “seed” values f0 and f1 and defines
other members of the sequence by fn+1 = fn + fn−1 for n ≥ 1.
The behavior of the sequence is somewhat mysterious: as defined here,
you must know all of the previous values before you can calculate fn+1.
Our goal here is to find a formula for fn, valid for any seed values, that does not
require this.
The sequence can be computed as
( f_{n+1} ; f_n ) = ( 1  1 ; 1  0 )^n ( f1 ; f0 )   for n ≥ 0.
The matrix A = ( 1  1 ; 1  0 ) has characteristic polynomial
p(λ) = det( λI − A ) = det( λ − 1   −1 ; −1   λ ) = (λ − 1)λ − 1 = λ² − λ − 1.
This can be factored using the quadratic formula, producing two eigenvalues. One is positive and bigger than 1, while the other is negative and
less than 1 in magnitude.
λ1 = (1 + √5)/2   and   λ2 = (1 − √5)/2.
These two eigenvalues have some interesting and peculiar properties. For
instance λ1 λ2 = −1 and also, since both are roots of the characteristic
polynomial, λ1² = λ1 + 1 and λ2² = λ2 + 1, facts which might help simplify
calculations we have to do.
The Cayley-Hamilton Theorem says that every square matrix “satisfies”
its characteristic polynomial, and in this context that means
0 = p(A) = A² − A − I = (λ1 I − A)(λ2 I − A) = (λ2 I − A)(λ1 I − A)
where 0 here refers to the zero matrix and I is the 2 by 2 identity matrix.
We don’t need to prove the general theorem here: an easy calculation shows
that it is, indeed, true that 0 = A² − A − I. We are going to use this fact
to help us produce an eigenvector for each eigenvalue without doing messy
arithmetic, a technique you might want to remember in other contexts.
Since (λ1 I−A)(λ2 I−A) = 0 any vector that is in the range of λ2 I−A must
be killed—sent to the zero vector—by λ1 I − A: that is, it is an eigenvector
for λ1 . The “output” of λ2 I − A as a linear transformation contains both
columns of λ2 I−A. So any nonzero column of λ2 I−A must be an eigenvector
for eigenvalue λ1 .
By an identical argument, any nonzero column of λ1 I − A must be an
eigenvector for eigenvalue λ2 .
We calculate these matrices:
λ1 I − A = ( λ1 − 1   −1 ; −1   λ1 ) = ( (√5 − 1)/2   −1 ; −1   (1 + √5)/2 )
λ2 I − A = ( λ2 − 1   −1 ; −1   λ2 ) = ( (−1 − √5)/2   −1 ; −1   (1 − √5)/2 )
We note here, by way of making mathematical conversation, that we are
in a two dimensional space, and eigenvectors for different eigenvalues must
be linearly independent, and there must be at least one for each eigenvalue.
So for each matrix above the second column has to be a numerical multiple
of the first column, even though it doesn’t really look like it. There can be
no more than two linearly independent vectors in a two dimensional space
such as R2 .
I want to pick an eigenvector for each eigenvalue, and fractions are a bit
of a nuisance, so I will choose twice the second column as an eigenvector in
each case. This gives eigenvectors
v1 = ( −2 ; 1 − √5 )  for λ1   and   v2 = ( −2 ; 1 + √5 )  for λ2.
It is interesting to note that v1 and v2 are orthogonal: their dot product
is 0.
Let V be the ordered basis { v1 , v2 } which we will call an eigenbasis
because it consists of eigenvectors for A. E2 = { e1 , e2 } is the standard
basis of R2 , as usual.
The matrices of transition between these two bases are PE2←V and PV←E2,
PE2←V = ( −2   −2 ; 1 − √5   1 + √5 )   and   PV←E2 = (−1/(4√5)) ( 1 + √5   2 ; −1 + √5   −2 ).
The first one of these is obvious, but the second is the inverse matrix of
the first and can be calculated using one of the methods we have for that
task. Simply checking that it is the inverse matrix is easy enough.
The linear transformation given by matrix multiplication by A in basis
E2 is given by a different matrix B = PV ←E2 APE2←V when using coordinates
corresponding to basis V . And
B = PV←E2 A PE2←V = ( λ1  0 ; 0  λ2 )
which can be shown either by calculating the product of those three matrices
if you don’t trust what you have done so far, or simply thinking about the
meaning of the eigenbasis V .
It then follows (combine the “inside” pairs PE2←V PV←E2 = I) that
( λ1^n  0 ; 0  λ2^n ) = B^n = ( PV←E2 A PE2←V )^n
  = PV←E2 A PE2←V PV←E2 A PE2←V · · · PV←E2 A PE2←V
  = PV←E2 A^n PE2←V.
From this we get
A^n = PE2←V ( λ1^n  0 ; 0  λ2^n ) PV←E2.
Now suppose we start out with seed vector ( f1 ; f0 ) = ( f1 ; f0 )_{E2}.
Let ( a ; b ) be the coordinates of the seed vector in the eigenbasis:
( a ; b ) = ( f1 ; f0 )_V = PV←E2 ( f1 ; f0 ) = (−1/(4√5)) ( (1 + √5) f1 + 2 f0 ; (−1 + √5) f1 − 2 f0 ).
Now comes the payoff for this work.
It is worth pointing out that, all the checking and comments aside, up to
this point there were just three calculations involving more than addition:
we had to factorize the characteristic polynomial, we had to find the inverse
matrix PV ←E2 and we just found the eigencoordinates of the seed vector.
It now follows that
( f_{n+1} ; f_n ) = A^n ( f1 ; f0 ) = PE2←V PV←E2 A^n PE2←V PV←E2 ( f1 ; f0 )
  = PE2←V ( λ1^n  0 ; 0  λ2^n ) PV←E2 ( f1 ; f0 ) = PE2←V ( λ1^n  0 ; 0  λ2^n ) ( a ; b )
  = PE2←V ( λ1^n a ; λ2^n b ) = λ1^n a v1 + λ2^n b v2.
There are some interesting things that can be drawn from this.
First, the magnitude of λ2 is about 0.6, so for even modestly sized n the
last term in the last line above is tiny. For instance |λ2|^{20} < 10^{-4}. So unless
a = 0 the λ1 term will utterly swamp the λ2 term very quickly.
So λ1^n a v1 is a good approximation to ( f_{n+1} ; f_n ) for even modestly big n.
36.4. Exercise. Verify the statements about the Fibonacci sequence from
above and conclude that
f_n = λ1^n a (1 − √5) + λ2^n b (1 + √5) = (f1/√5) ( λ1^n − λ2^n ) + (f0/√5) ( λ1^{n−1} − λ2^{n−1} ).
Fibonacci enthusiasts call the number λ1 the “golden ratio” and use the
symbol φ for this eigenvalue. Note that λ2 = −φ^{-1}, and we can use that to
simplify the formula in various ways. For instance if f1 = f0 = 1 we have
Binet’s formula,
f_n = (1/√5) ( φ^{n+1} − (−1/φ)^{n+1} ).
Any recursive real sequence Sn where later terms Sn are linear functions
of Sn−1 , . . . , Sn−k (k fixed) can be handled similarly, using k × k matrices
rather than the 2 × 2 matrices we used for the Fibonacci sequence.
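A quick numerical sanity check of the Fibonacci formulas above, written as a
Python sketch (the library and the loop bounds are my own choices):

import numpy as np

sqrt5 = np.sqrt(5.0)
lam1 = (1 + sqrt5) / 2          # the golden ratio phi
lam2 = (1 - sqrt5) / 2          # equals -1/phi

def fib_recursive(f0, f1, n):
    """Compute f_n directly from the recursion f_{k+1} = f_k + f_{k-1}."""
    a, b = f0, f1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_formula(f0, f1, n):
    """The closed form derived above:
    f_n = (f1/sqrt5)(lam1^n - lam2^n) + (f0/sqrt5)(lam1^{n-1} - lam2^{n-1})."""
    return (f1 / sqrt5) * (lam1**n - lam2**n) + (f0 / sqrt5) * (lam1**(n - 1) - lam2**(n - 1))

for n in range(1, 15):
    assert abs(fib_recursive(1, 1, n) - fib_formula(1, 1, n)) < 1e-9
print("closed form matches the recursion for f0 = f1 = 1")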
37. Approximate Eigenvalues and Eigenvectors.
You can use your calculator to find approximate eigenvalues for F(x) = Ax where
A = [ −8   2   3   7   9 ]
    [  9   6   7   4   1 ]
    [  1   6  −2   6   9 ]
    [  9   1  −3   2   5 ]
    [  3   2  −7   2   5 ]
Using hardware aid to graph the polynomial y = det( λ I5 − A ) we see that the
graph crosses the horizontal axis five times. So there are five real eigenvalues.
Each will have an eigenvector, so there is a basis of eigenvectors. 10.8399 is an
approximation to the biggest eigenvalue λ1 (to the nearest 10^{-4}) and the rest
(to the same accuracy) are λ2 ≈ −9.8298, λ3 ≈ 8.2963, λ4 ≈ −4.9077 and λ5 ≈ −1.3986.
Finding approximate eigenvectors for your approximate eigenvalues
is trickier. We can’t just solve Ax − 10.8399x = 0 because the matrix
A − 10.8399I5
is nonsingular: 10.8399 is not exactly an eigenvalue.
So let’s define our task. We want to find a vector x so that
‖F(x) − λx‖ = ‖Ax − λx‖ < 10^{-3} ‖x‖
for the approximate eigenvalue λ = 10.8399.
We saw before that if vi is an eigenvector for the ith eigenvalue λi, for
i = 1, . . . , 5, and if x = a1 v1 + · · · + a5 v5 then
Ax = Σ_{i=1}^{5} ai λi vi   and more generally   A^n x = Σ_{i=1}^{5} ai λi^n vi.
So unless a1 = 0, as you multiply by larger powers of A the part of A^n x
corresponding to multiples of v1 will form a larger and increasing proportion
of A^n x.
A^n x / λ1^n = a1 v1 + a2 (λ2/λ1)^n v2 + · · · + a5 (λ5/λ1)^n v5.
If we pick a “randomly chosen” initial x then the expression on the left will
converge to a1 v1 , an eigenvector for λ1 . The problem is that we don’t know
λ1 exactly. This might not seem like such a big problem: after all, in a
practical problem we don’t know the entries of A exactly either. They are
also measured or approximated.
Still, a nicer formula which also produces a multiple of v1 (that is, unless
by colossal misfortune your randomly chosen initial vector has no v1 component),
and which does not involve λ1 explicitly, is given by the recursion relation
y1 = x,   yn = A y_{n−1} / ‖y_{n−1}‖.
The part of yn which is a multiple of v1 becomes proportionately overwhelming, and the magnitude of yn will converge to λ1 .
This process is called power iteration.
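Here is the recursion rendered as a short Python sketch (numpy assumed; the
matrix, the seed e1 and the 102 steps follow the example in the text, though
floating point results may differ in the last digits):

import numpy as np

def power_iteration(A, seed, steps):
    """y_1 = seed, y_n = A y_{n-1} / ||y_{n-1}||.  After many steps y_n points
    (approximately) along an eigenvector for the dominant eigenvalue and
    ||y_n|| approximates that eigenvalue's magnitude."""
    y = np.asarray(seed, dtype=float)
    for _ in range(steps):
        y = A @ (y / np.linalg.norm(y))
    return y / np.linalg.norm(y), np.linalg.norm(y)

A = np.array([[-8, 2, 3, 7, 9],
              [ 9, 6, 7, 4, 1],
              [ 1, 6,-2, 6, 9],
              [ 9, 1,-3, 2, 5],
              [ 3, 2,-7, 2, 5]], dtype=float)

w, approx = power_iteration(A, np.eye(5)[0], 102)   # seed e_1, as in the text
print(approx)                                       # should be near 10.8399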
Since ( 9.8298 / 10.8399 )^{100} ≈ 6 × 10^{-5}, we should expect y_{100} would be good enough
as an approximate eigenvector for λ1, or close to it.
In fact after 102 iterations (using e1 as the “seed”) we find that the unit vector
w = ( 0.146067089516298237, 0.896414826917890095, 0.405101325701891668, 0.0671042007313784328, −0.0805903594151132635 )^T
satisfies
‖Aw − 10.8399w‖ < 10^{-3} ‖w‖.
Though conceptually interesting, the method is grossly inefficient, requiring a very large number of calculation steps unless the ratio λ2/λ1 is small.
One way to take advantage of estimates of the eigenvalues to improve
this situation is called inverse iteration. This relies on the fact that the
eigenvectors for
Bµ = ( A − µ In )^{-1}
are the same as the eigenvectors for A when µ is not an eigenvalue for A. In
fact, in this case if λ is an eigenvalue for A then (λ − µ)^{-1} is an eigenvalue
for Bµ, and any eigenvector for A for eigenvalue λ is an eigenvector for Bµ
for eigenvalue (λ − µ)^{-1}. The point here is that if λ is close to µ, the largest
eigenvalue of Bµ will be huge in comparison to the second largest, so power
iteration of Bµ should converge comparatively quickly to an eigenvector for
A for eigenvalue λ.
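And a matching sketch of inverse iteration (again Python/numpy; solving a linear
system at each step rather than forming the inverse is a standard implementation
choice, and the seed vector below is my own guess):

import numpy as np

def inverse_iteration(A, mu, seed, steps):
    """Power iteration applied to B_mu = (A - mu I)^{-1}.  Each step solves
    (A - mu I) y_new = y_old / ||y_old|| instead of inverting the matrix."""
    n = A.shape[0]
    shifted = A - mu * np.eye(n)
    y = np.asarray(seed, dtype=float)
    for _ in range(steps):
        y = np.linalg.solve(shifted, y / np.linalg.norm(y))
    return y / np.linalg.norm(y)

A = np.array([[-8, 2, 3, 7, 9],
              [ 9, 6, 7, 4, 1],
              [ 1, 6,-2, 6, 9],
              [ 9, 1,-3, 2, 5],
              [ 3, 2,-7, 2, 5]], dtype=float)

w1 = inverse_iteration(A, 10.8, np.eye(5)[0], 2)  # two iterations, as in the text
print(np.linalg.norm(A @ w1))                     # should be near 10.84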
Using B_{10.8} and iterating twice provides an approximate eigenvector about
as close to the true eigenvector as 102 iterations produced in the direct
calculation, yielding unit vector
w1 = ( 0.146128109664179606, 0.896389101389033605, 0.405139215156500421, 0.0671255800941089886, −0.0805576015434486675 )^T.
The norm of Aw1 is about 10.8404. Again, we emphasize this is achieved
after just two iterations.
This method also allows for approximation to an eigenvector for each
distinct (real) eigenvalue. Choosing µ to be (respectively, one after another)
−9.8, 8.3, −4.9 and −1.4 we obtain, after two iterations, unit approximate
eigenvectors
w2 = ( 0.654318825949867344, −0.447396299161207878, 0.393306187768773075, −0.432930248564862230, 0.172003104722630806 )^T
w3 = ( −0.0376569981063663017, −0.921072301620784795, −0.333298382765519541, 0.0960115255988746169, 0.172921197953743749 )^T
w4 = ( −0.345907090362161995, 0.463579909121897504, −0.492261570378854152, 0.484190962465213981, 0.434372664356520444 )^T
w5 = ( 0.0860602631013127478, −0.0693437559456339253, 0.314753866089923562, −0.756740328294183806, 0.562191385317338810 )^T
and where
‖Aw2‖ ≈ 9.8300,   ‖Aw3‖ ≈ 8.2963,   ‖Aw4‖ ≈ 4.9077,   ‖Aw5‖ ≈ 1.3986.
If we let W be the ordered basis of eigenvectors W = { w1 , w2 , w3 , w4 , w5 }
we can create the matrix of transition PE5←W = ( w1 w2 w3 w4 w5 ). We find
that we can approximately diagonalize matrix A as
PW←E5 A PE5←W =
[ 10.839933    0.000046    0.000005    0.000003    0.000000 ]
[ −0.000073   −9.829828   −0.000001   −0.000010   −0.000010 ]
[ −0.000679    0.000055    8.296279    0.000004    0.000000 ]
[ −0.000114    0.000217   −0.000001   −4.907735   −0.000001 ]
[ −0.000052    0.000045   −0.000001    0.000006   −1.398648 ]
38. Nullspace, Image, Columnspace and Solutions.
The nullspace of an m × n matrix M is the set of vectors “killed” by
left-multiplication by that matrix: it is the kernel of the linear transformation
given by left multiplication by M. It is a vector subspace of R^n,
denoted nullspace(M). If M is square and 0 is an eigenvalue of M , the
nullspace is the eigenspace of M for eigenvalue 0. You can also think of it
as the solution set of the homogeneous system determined by the coefficient
matrix M .
A nonzero vector is in the nullspace of M exactly when it is perpendicular
to all the vectors obtained as the transpose of the rows of M .
The columnspace of M is the image of the matrix as a function from
Rn to Rm .
Writing M as a block matrix with columns C1 , . . . , Cn we have
M x = (C1 C2 · · · Cn ) x = xi Ci
so the result of multiplying M by x is explicitly a linear combination of the
columns of M . The columnspace is the span of the columns of M , denoted
colspace(M).
If the nullspace of an m by n matrix M is trivial, the linear function
formed using matrix M is called “one-to-one.” Solutions to the equation
M x = b, if they exist, are unique when the linear function is one-to-one.
The columns of a matrix M form a linearly independent set
exactly when nullspace(M ) = { 0 } .
If the columns of M span Rm , the function obtained by left multiplication
by M is called “onto.” When this function is onto, there is always a solution
to M x = b. In any case, onto or not, there is a solution to M x = b exactly
when b is in the columnspace of M .
Suppose we want the set of solutions to M x = b where M is an m by n
matrix. If b is in the columnspace and p is any particular solution and N
is the nullspace of M then the solution set is p + N , which we define to be
the set of all vectors of the form p + n for n ∈ N . The solution set will not
be a subspace unless p is in N , in which case p + N = N.
The kernel and image corresponding to any linear function f between
finite dimensional spaces V and W can be found by applying the above
remarks to the matrix of the function with respect to coordinates.
If you find bases for the nullspace and the columnspace of the matrix,
the coordinate maps can be used to find vectors in V and W respectively
which comprise bases for ker(f) and image(f).
The vocabulary “one-to-one” and “onto” is applied to f when the property
holds for any (and hence every) matrix for f .
Rephrasing the comments from above regarding solution sets to apply to
a more general vector space, if p is an element of the vector space V and N
is a subspace of V define the set p + N to be the set of all vectors of the
form p + w where w is from N . This is the equivalent of a plane or line
in space, possibly not through the origin. It will go through the origin only
when p is in N , in which case p + N = N .
38.1. Exercise. A function f : V → W is one-to-one and onto exactly when
an inverse function f −1 : W → V exists. Show that in this case if f is linear
so is f −1 . Show that if f is one-to-one and onto then dim(V ) = dim(W ).
38.2. Exercise. (i) Find the columnspace and the nullspace of the matrix
A = [ 1  3  4 ]
    [ 2  0  2 ]
    [ 1  1  2 ]
(ii) Find the image and kernel of the function given by G(x) = Ax.
(iii) Can the equation G(x) = (1, 1, −3), with G as above, be solved?
(iv) Find the image and kernel of the function F given by the formula
F(x) = Proj_w(x) for w = (1, 2, 3).
39. More on Nullspace and Columnspace.
Identifying a vector subspace explicitly is often accomplished by locating
a basis for that subspace. Finding a basis for a nullspace or a columnspace,
for instance, can be tedious. Here are some timesaving tips to accomplish
these tasks.
Convert the m by n matrix A = (ai,j ) to row reduced echelon form R =
(ri,j ) by elementary row operations.
R = M A where M is the invertible m × m matrix that is the product
of the elementary row matrices used to produce R. So A = BR where the
m × m matrix B is the product, in reverse order, of the inverses of those
elementary row matrices used to form R.
If the rows of a matrix are thought of as 1 × n matrices, their span is
called the rowspace of the matrix. Left multiplication by an elementary row
matrix does not change the rowspace, so A and R have the same rowspace.
Let’s suppose that there are exactly k nonzero rows in the echelon matrix
R, so the common rowspace of A and R has dimension k.
Looking at A = BR as the product of column block matrices we have
(A1 . . . Aj . . . An ) = (B1 . . . Bm ) (R1 . . . Rj . . . Rn ) .
Focusing attention on the jth column we get Aj = BRj or, expanding,
Aj = r1,j B1 + · · · + rm,j Bm = r1,j B1 + · · · + rk,j Bk
because ri,j = 0 whenever m ≥ i > k.
So we have, explicitly, all n columns of A as linear combination of these
first k columns of B, so the dimension of the columnspace is some number
t which cannot exceed k, the dimension of the rowspace of A.
The same fact is true of AT , the transpose of A, whose rows are the
columns of A. We know therefore that k cannot exceed t, and conclude that
the dimension of the columnspace of A is exactly k.
For any matrix, the dimension of the columnspace
is the same as the dimension of the rowspace.
We now make three important points.
First, a more careful examination of the linear combination of columns
above shows that the columns in the original matrix A corresponding to the
pivot columns in the echelon matrix form a linearly independent set of k
columns, which therefore comprise a basis of the columnspace of A.
To see this, note that B is invertible so its columns are linearly independent. If Rj is a pivot column of R with single nonzero entry (with value 1
there, of course) located at row s then the above block equation
Aj = r1,j B1 + · · · + rs,j Bs + · · · + rm,j Bm
identifies Aj as Bs .
Second, as noted above, reduction to rref is obtained by elementary operations on the rows of a matrix, which does not alter the rowspace. These
rows are in a relatively simple form in comparison to the starting rows.
Therefore, if you want a nice basis (defined as having as few nonzero entries
as possible) for the span of a set of vectors, take the transpose of the matrix
having these vectors as columns and hit it with the rref stick. The rowspace
is unchanged by this. The transpose of the nonzero rows will be a basis for
the original columnspace.
Third, because it involves left-multiplication by invertible matrices, it
is easy to show that the process of reduction to rref does not change the
nullspace of a matrix. If there are k nonzero rows in the echelon matrix,
they can be used to form a homogeneous system of k independent equations
in n unknowns satisfied by (and only by) the vectors in the nullspace, which
therefore has dimension n − k.
This observation shows that the dimension of the nullspace (the nullity
of A, also denoted nullity(A)) plus the dimension of the columnspace (the
rank of A, also denoted rank(A)) must add to n, the dimension of the
domain.
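If you want the machine to do the row reduction, here is a sketch using the sympy
library (not something these notes rely on; the matrix below is an illustrative
example of mine, not one from the exercises):

from sympy import Matrix

A = Matrix([[1, 2, 3],
            [2, 4, 6],
            [1, 1, 1]])

R, pivot_columns = A.rref()      # row reduced echelon form and pivot positions
print(R)
print(pivot_columns)             # the matching columns of A form a columnspace basis
print(A.nullspace())             # basis for the nullspace, dimension n - rank
print(A.columnspace())           # the pivot columns of A themselves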
39.1. Exercise. Find a basis for the nullspace and a basis for the columnspace
of A, where
A = [  4  2  3   9 ]
    [  3  6  9   1 ]
    [ −1  4  6  −8 ]
39.2. Exercise. Show that similar matrices have the same rank and nullity.
40. Rank Plus Nullity is the Domain Dimension.
By analogy with the discussion for matrix transformations, the rank of
any linear transformation f (not just one given by a matrix) is the dimension
of image(f ). This dimension is denoted rank(f ). The nullity of a linear
transformation of f , is the dimension of ker(f ). This dimension is denoted
nullity(f ).
We will now give another proof of a very important fact, proved for matrices in the last section. This proof here has the advantage of being direct,
but also uses many important ideas.
For any linear function, the dimension of the kernel plus the dimension
of the image is the dimension of the domain.
If f : V → W is linear
rank(f ) + nullity(f ) = dim(V ).
In the course of understanding this proof you must clarify your understanding of the central ideas used in it. I advise you to think about it until
you understand it completely and can reproduce it. This is the second (and
last) proof which I identify as special in this way in these notes.
Suppose f : V → W is linear, and V is an n dimensional space. We know
that both image(f ) and ker(f ) are subspaces of W and V , respectively.
Suppose k = rank(f ) = dim(image(f )) and m = nullity(f ) = dim(ker(f )).
We will show that m + k = n.
The proof goes as follows.
image(f ) and ker(f ) have bases { g1 , . . . , gk } and { w1 , . . . , wm }, respectively. For each i, the vector gi is in image(f ) so there is a vector hi ∈ V
with f (hi ) = gi . We will show that S = { h1 , . . . , hk , w1 , . . . , wm } is a
basis for V , and so k + m = n.
To do that we must show that S is linearly independent and that S spans
V.
Suppose ai hi + bk wk = 0.
Because the wk are in ker(f )
0 = f (0) = f (ai hi + bk wk ) = ai f (hi ) + bk f (wk ) = ai f (hi ) = ai gi .
Since the gi form a basis for image(f ) they are independent. That means
ai = 0 for all i. So the equation given above was really just bk wk = 0, and
since the wk are independent we are forced to conclude that bk = 0 for all
k too. So S is an independent set of vectors.
Now suppose v is a generic member of V . So f (v) ∈ image(f ) and
therefore can be written as f (v) = ai gi . So f (v) = ai f (hi ) which implies
f (v − ai hi ) = 0. So v − ai hi ∈ ker(f ). So there are constants bk with
v − ai hi = bk wk . We conclude that v = ai hi + bk wk and S spans V . The
proof is complete.
40.1. Exercise. (i) Suppose H : R4 → R4 is linear and
Image(H) = Span{ (1, 3, 6, 0), (1, 2, 1, 2) }.
Are there two different vectors v and w with H(v) = H(w) = (1, 3, 6, 0)?
(ii) Suppose F : R4 → R2 is linear and
Ker(F ) = Span{ (5, 3, 6, 0), (8, 2, 9, 2) }.
Must there be a solution to F (v) = (1, 3)?
(iii) G : R5 → R2 is linear. What are possible values for nullity(G)?
41. Sum, Intersection and Dimension.
If U and W are subspaces of V we defined U + W to be the set of all
vectors of the form u + w where u is from U and w is from W. This set is
called the sum of U and W. The sum is itself a vector subspace of V.
The overlap of the two subspaces, U ∩ W , is also a vector subspace of V .
In fact, U ∩ W is a subspace of U and of W and of U + W .
We saw that if all spaces are subspaces of Rn that there is an interesting
relationship between the dimensions of these various spaces. We assume
here only that U + W is finite dimensional.
Let S be a basis { s1 , . . . , si } for U ∩ W . Using S as a starter set, we add
members { r1 , . . . , rj } of U to create a basis R = { s1 , . . . , si , r1 , . . . , rj } of
U . Again using S as a starter we add members { t1 , . . . , tk } of W to create
basis T = { s1 , . . . , si , t1 , . . . , tk } of W .
So dim(U ∩ W ) = i and dim(U ) = i + j and dim(W ) = i + k.
Note (prove, if you wish) the following: if any linear combination b^x r_x + c^y s_y
is in U ∩ W then all the b^x must be 0. Similarly, if c^y s_y + d^z t_z is in
U ∩ W then all the scalars d^z must be 0.
We will now show that A = { r1 , . . . , rj , s1 , . . . , si , t1 , . . . , tk } is a basis
for U + W so that dim(U + W ) = i + j + k.
It is easy to see that A spans U + W . That is because every member of
U + W is the sum of a member of U and a member of W . Since each of
these members are linear combinations of members of A, so is their sum.
It remains only to demonstrate that A is linearly independent.
Suppose
b^x r_x + c^y s_y + d^z t_z = 0
for certain scalars b^x, c^y and d^z.
That means b^x r_x + c^y s_y = −d^z t_z. The left side is in U and the right side
is in W. So both sides are actually in W and U: that is, b^x r_x + c^y s_y ∈ U ∩ W.
But by our earlier remark this means all the b^x are 0. By identical argument,
all the d^z are zero.
So the original equation was c^y s_y = 0 which, since S is a basis, implies
all the c^y are 0 too.
So A is a basis of U + W . We conclude that:
dim(U ) + dim(W ) = dim(U ∩ W ) + dim(U + W ).
41.1. Exercise. Show that the boxed statement above remains true in case
any one of the four spaces involved is infinite dimensional, in the sense that
both sides of the equation must then involve at least one infinite dimensional
space.
41.2. Exercise. (i) Suppose G : R^5 → R^4 is linear and nullity(G) = 2.
Suppose vectors v = (1, 2, 7, 0) and w = (1, 0, 0, 1) are not in image(G).
What is dim( Span({ v, w }) ∩ image(G) )? Why?
(ii) K : R^5 → R^2 is linear and rank(K) = 2. Also, { v1, v2, v3 } is an
independent set of three vectors in R^5, none of which is in ker(K).
What values of dim( Ker(K) ∩ Span({ v1, v2, v3 }) ) are possible?
42. Direct Sum.
Suppose given two nontrivial subspaces U and W of vector space V .
The overlap of the two subspaces, U ∩ W , is also a vector subspace of V .
If this intersection is just {0} we write U ⊕ W for the sum U + W of the
two subspaces U and W . The sum is called a direct sum in this case.
The important thing about a direct sum is that every vector in it can be
written in exactly one way as the sum of a vector from W and a vector from
U.
To see this, we suppose a vector v in U ⊕ W can be written as v =
w1 + u1 = w2 + u2 where both w1 and w2 are in W and u1 and u2 are in
U.
But then w1 − w2 = u2 − u1 and so these differences must be in both W
and U. Since U ∩ W = {0} we have w1 − w2 = u2 − u1 = 0. This means
that w1 = w2 and also u2 = u1.
This uniqueness is important and useful. So we are left with the problem
of finding U ∩ W , which might be useful for something even if it is not {0}.
We learned how to find a basis for U ∩ W in section 22.
42.1. Exercise. V = Span{ t² + 1, t² − 1 } and W = Span{ t², t³ } and U = Span{ t + 1, t² − 1 }.
Is V + W = V ⊕ W ? Is V + U = V ⊕ U ? Is U + W = U ⊕ W ?
More generally, if W1 , W2 , . . . , Wk are nontrivial subspaces of V we can
form the subspace W of V consisting of the span of all the vectors in any of
the Wi .
Every vector in W can be written as a sum
w1 + w2 + · · · + wk
where wi ∈ Wi for all i.
We define the sum W1 + W2 + · · · + Wk to be W .
Of particular interest are those sums where the representation of each
vector in W is unique. Specifically, we write W = W1 ⊕ W2 ⊕ · · · ⊕ Wk
and call the sum a direct sum, when
0 = w1 + w2 + · · · + wk , where wi ∈ Wi for all i, implies wi = 0 for all i.
42.2. Exercise. Show that if W = W1 ⊕ W2 ⊕ · · · ⊕ Wk and if
v 1 + v 2 + · · · + v k = w1 + w2 + · · · + wk ,
where vi and wi are in Wi for all i then vi = wi for all i.
If v is any nonzero vector define the one dimensional subspace
Rv = { rv | r ∈ R }.
42.3. Exercise. If { v1 , v2 , . . . , vk } is a basis of W then
W = Rv1 ⊕ Rv2 ⊕ · · · ⊕ Rvk .
43. A Function with Specified Kernel and Image.
We now create a linear function f : V → W with specified kernel and
image in vector spaces V and W .
Throughout this section, we suppose given a basis S = { s1 , . . . , sk , sk+1 , . . . , sn }
of V and independent vectors T = { t1 , . . . , tk } in W .
Let Y = span({ sk+1 , . . . , sn }) and U = span({ t1 , . . . , tk }).
Any x in V can be written in a unique way as
x = c1 s1 + · · · + ck sk + ck+1 sk+1 + · · · + cn sn .
We define f by f (x) = c1 t1 + · · · + ck tk .
So f sends the first k terms in the sum to the same combination involving
the independent members of T , and sends the last n − k terms to 0. It is
easy to show that f is linear.
The kernel of f is Y: if f(x) = 0 then independence of the members of
T implies c_i = 0 for 1 ≤ i ≤ k, which implies x ∈ Y. Clearly any member of
Y is in the kernel of f.
The image of f is, explicitly, U .
There are many different functions of this kind whose image is U . They
correspond to all the different ways of selecting a basis t1 , . . . , tk for U .
There are also many different functions of this kind whose kernel is Y .
They correspond to different ways of adding (or prepending, I guess) vectors
to sk+1 , . . . , sn to complete a basis for V .
43.1. Exercise. Extend t1 , . . . , tk to a basis T = { t1 , . . . , tk , tk+1 , . . . , tm }
for W . Find the matrix MT←S for the function f : V → W described above.
43.2. Exercise. Suppose F : V → W is linear and V = A ⊕ B and W =
C ⊕ D. We will also suppose that F (a) ∈ C and F (b) ∈ D for all a ∈ A
and b ∈ B. Finally, find bases S = { s1 , . . . , sk , sk+1 , . . . , sn } for V and
T = { t1 , . . . , tj , tj+1 , . . . , tm } for W , selected so that { s1 , . . . , sk } is a basis
for A and { t1 , . . . , tj } is a basis for C.
What does the matrix of F with respect to these bases look like?
Suppose V = R^n and W = R^m. Consider the matrices
PEn←S = ( s1 · · · sn )   and   K = ( t1 · · · tk ).
PS←En x gives coordinates of x in basis S so if J is the matrix formed from
the top k rows of PS←En then Jx ∈ Rk will be the vector consisting of the
first k S-coordinates of x.
The k rows of the k×n matrix J are linearly independent so the columnspace
will have dimension k. Since the range is Rk that means the function obtained by left multiplication by J will be onto Rk . The kernel of this function
is obviously span({ sk+1 , . . . , sn }).
The columnspace of the m × k matrix K is the span of the independent
vectors t1 , . . . , tk of Rm and so has dimension k.
Let f : Rn → Rm be given by f (x) = KJx.
ker(f ) = span({ sk+1 , . . . , sn }) and image(f ) = span({ t1 , . . . , tk }).
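A small Python sketch of this construction (the basis S and the target vectors
t1, t2 below are my own illustrative choices):

import numpy as np

# Illustrative data: a basis S of R^4 whose last two vectors will span the
# kernel, and two independent vectors t_1, t_2 in R^3 to span the image.
S = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1]], dtype=float)      # columns are s_1, s_2, s_3, s_4
T = np.array([[1, 0],
              [2, 1],
              [0, 3]], dtype=float)            # columns are t_1, t_2
k = 2

P_E_to_S = np.linalg.inv(S)                    # P_{S <- E_4}
J = P_E_to_S[:k, :]                            # first k S-coordinates of x
K = T                                          # sends those coordinates onto span{t_1, t_2}

def f(x):
    return K @ (J @ x)                         # f(x) = K J x

# The kernel contains s_3 and s_4, and f(s_1) = t_1, f(s_2) = t_2:
print(f(S[:, 2]), f(S[:, 3]))                  # both (approximately) zero vectors
print(f(S[:, 0]), f(S[:, 1]))                  # t_1 and t_2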
43.3. Exercise. Create the matrix with respect to standard bases in domain
and range for a linear transformation H : R^5 → R^3 for which
ker(H) = span{ (1, 2, 6, 5, 3), (0, 1, 6, 0, 3), (1, 2, 6, 5, 1) }   and
image(H) = span{ (1, 0, 1), (1, 1, 4) }.
This involves a judicious choice of intermediary bases in domain and range.
Done properly, there will be no calculation done by hand. There are many
correct answers.
44. Inner Products.
An inner product on a real vector space V is a real valued function,
which we will denote ⟨· , ·⟩, defined on ordered pairs of vectors and for which,
for all constants c and vectors v, w and z:
1. ⟨v, w⟩ = ⟨w, v⟩
2. ⟨cv, w⟩ = c⟨v, w⟩ and ⟨v + w, z⟩ = ⟨v, z⟩ + ⟨w, z⟩
3. ⟨y, y⟩ > 0 for any nonzero vector y
These properties have names. The first property is called symmetry.
The function ⟨·, ·⟩ is called linear in its first slot by virtue of the second
property. It is called positive definite if the third property holds.
An inner product allows you to import the concepts of magnitude, distance
and angle into your vector space.
The norm of a vector v is denoted ‖v‖ and defined to be √⟨v, v⟩. The
distance between vectors v and w is denoted d(v, w) and defined to be ‖v − w‖.
The angle θ between two vectors is defined by ⟨v, w⟩ = ‖v‖ ‖w‖ cos(θ).
v and w are called orthogonal (to each other) if ⟨v, w⟩ = 0.
With an inner product we have the extremely important concept of projection.
For vector v and nonzero vector w define Proj_w(v) = ( ⟨v, w⟩ / ⟨w, w⟩ ) w.
Note that for any vector v, the vector v − Proj_w(v) is orthogonal to w:
when you take away the part of v which lies in the direction of w, the part
remaining is orthogonal to w.
Among other properties, the norm satisfies the triangle and Cauchy-Schwarz
inequalities: for all pairs of vectors v and w
| ‖v‖ − ‖w‖ | ≤ ‖v + w‖ ≤ ‖v‖ + ‖w‖   and   |⟨v, w⟩| ≤ ‖v‖ ‖w‖.
A vector space endowed with an inner product is called an inner product
space.
45. The Matrix for an Inner Product.
Every inner product ⟨·, ·⟩ on R^n can be given as a formula involving a
square matrix G. In fact,
⟨x, y⟩ = x^T G y   where G has entries g_{ij} = ⟨ei, ej⟩
and the ei are the standard basis vectors of R^n.
This is easy to prove if x and y themselves are standard basis vectors,
and the general fact follows by linearity of matrix multiplication.
Here are the most important facts about the matrix of an inner product,
proved in Sections 52 and 56.
The matrix of any inner product must be symmetric with only
positive eigenvalues.
On the other hand any symmetric matrix is diagonalizable with an
orthogonal matrix of transition. So there is a basis of orthonormal
eigenvectors for the matrix of any inner product.
And if all the eigenvalues of a symmetric matrix are positive it can
be used to create an inner product.
If the matrix G of an inner product on Rn is diagonal, the inner product
is called a weighted Euclidean inner product. The dot product itself
corresponds to the identity matrix.
If V is any n-dimensional vector space with inner product ⟨·, ·⟩ and S an
ordered basis for V then the matrix G_S defined by g_{ij} = ⟨si, sj⟩ can be used
to calculate inner products in V. Specifically,
⟨x, y⟩ = [x]_S^T G_S [y]_S
and the matrix G_S itself corresponds to an inner product on R^n.
45.1. Exercise. Prove the statement in the box above.
45.2. Exercise. None of the matrices listed below could be the matrix of an
inner product. Explain why: which of the defining properties of an inner
product would be violated?
[ 5  1   0 ]     [ 5  0   0 ]     [ −2  0   0 ]
[ 1  8   0 ]     [ 0  0   0 ]     [  0  8   0 ]
[ 1  0  10 ]     [ 0  0  10 ]     [  0  0  10 ]
45.3. Exercise. The graphs of the characteristic polynomials of three 4 × 4
symmetric matrices are shown below. (Recall that you can create such graphs
directly using the determinant function in your calculator.) Decide in each
case if the matrix will be or cannot be the matrix of an inner product on R4 .
45.4. Exercise. Decide which of the following functions could be inner products. Find the matrices of those that are.
(i) F : R3 × R3 → R , F (x, y) = 7x1 y 1 + x2 y 1 + x1 y 2 + 3x2 y 2 + 4x3 y 3 .
(ii) G : R3 × R3 → R , G(x, y) = 7x1 y 1 + 2x2 y 1 + x1 y 2 + 3x2 y 2 + 4x3 y 3 .
(iii) K : R3 × R3 → R ,K(x, y) = 2x1 y 1 + 3x2 y 1 + 3x1 y 2 + 2x2 y 2 + 4x3 y 3 .
(iv) Find the angle between (1, 3, 2) and (1, 1, 1) with respect to any inner
products you found above.
45.5. Exercise. At least one of the functions defined below on P2(t) × P2(t)
is an inner product. Identify which are and calculate the matrix for each of
these. Then find the angle between t² + 5 and t − 1 with respect to each inner
product.
(i) F(x, y) = ∫_0^1 x(t) y(t) dt
(ii) G(x, y) = ∫_0^1 (x(t) + 1) y(t) dt
(iii) H(x, y) = ∫_0^1 t x(t) y(t) dt
(iv) K(x, y) = ∫_0^t x(u) y(u) du
46. Orthogonal Complements.
If v is a vector in inner product space V we define v⊥ to be the set of
all vectors orthogonal to v. This set is called the orthogonal complement of
the vector v.
More generally, if W is a nonempty subset of an inner product space V
we define W ⊥ to be the set of all vectors orthogonal to every member of
W . It is called the orthogonal complement of W , and read aloud as “W
perp.”
46.1. Exercise. (i) Show that W ⊥ is a subspace of V .
(ii) If H ⊂ K ⊂ V and H is nonempty then K ⊥ ⊂ H ⊥ .
(iii) If H is a basis of subspace W of V then H ⊥ = W ⊥ .
(iv) Show that W⊥ = Span(W)⊥ and Span(W) = (W⊥)⊥.
(v) If v is a nonzero vector, show that V = Rv ⊕ v⊥ .
If W and U are two subspaces of V , we define W + U to be the set of
all vectors of the form w + u where w is a generic member of W and u an
arbitrary member of U . W + U is a subspace of V too.
It is obvious that W ∩ W ⊥ = {0} and we will see in section 48 (extend
an orthonormal basis of W to an orthonormal basis of all of V ) that V =
W + W ⊥ . So the sum is direct.
V = W ⊕ W ⊥ when W is a subspace of inner product space V.
Now consider a function f : Rn → Rm given by left matrix multiplication
f (x) = Ax
A is an m × n matrix.
The matrix AT produces another function f˜: Rm → Rn given by left
matrix multiplication
f˜(x) = AT x
AT is an n × m matrix.
There is an interesting relationship between the kernels and images of
these two functions, important enough that some authors refer to it as the
Fundamental Theorem of Linear Algebra.
Since the rows of A^T are the columns of A, any member of ker(f˜) must
be perpendicular to every column of A, and so must be perpendicular to
any linear combination of such columns. It follows that v ∈ ker(f˜) exactly
when v ∈ image(f)⊥.
Swapping A with A^T we find, analogously, that image(f˜)⊥ = ker(f).
Remember that image(f˜) is the span of the transposed rows of A.
R^n = image(f˜) ⊕ ker(f)   and   R^m = image(f) ⊕ ker(f˜)
and the direct summands are orthogonal complements.
rank(f) = rank(f˜) = r,   nullity(f) = n − r,   nullity(f˜) = m − r.
With this result in hand, let S = { s1, . . . , sr, sr+1, . . . , sn } where { s1, . . . , sr }
is a basis for image(f˜) and { sr+1, . . . , sn } is a basis for ker(f).
Let T = { t1, . . . , tr, tr+1, . . . , tm } where { t1, . . . , tr } is a basis for image(f)
and { tr+1, . . . , tm } is a basis for ker(f˜).
Then the m × n matrix
MT←S = PT←Em A PEn←S = ( B  O ; O  O )
where B is an r × r invertible submatrix and O represents zero blocks of the
appropriate sizes at each location (if r = n or m, two or all three of them
will be missing).
46.2. Exercise. With the situation as above, create an n × m matrix C that
acts as a “partial inverse” to A in the sense that
AC = PEm←T ( Ir  O ; O  O ) PT←Em   and   CA = PEn←S ( Ir  O ; O  O ) PS←En
where the zero blocks are the appropriate sizes to form an m × m matrix on
the left and an n × n matrix on the right. Show how to use this to find a
solution for
Ax = b
when b is in image(f).
47. Orthogonal and Orthonormal Bases.
Suppose V is any vector space with inner product ⟨·, ·⟩ and B = { b1, . . . , bn }
is an ordered basis. We are going to transform the basis B into a new basis
using a procedure with n steps, and will use the index found below to
indicate where we are in this process.
Define
v1 = b1   and   v2 = b2 − Proj_{v1}(b2).
Generally, for i = 3, . . . , n define vectors
vi = bi − Proj_{v1}(bi) − Proj_{v2}(bi) − · · · − Proj_{v_{i−1}}(bi).
This produces an ordered basis of vectors v1, . . . , vn of V which are orthogonal
to each other. This is called an orthogonal basis.
Sometimes we wish to go further. Letting ui = vi / ‖vi‖ for each i produces
an ordered orthonormal basis: S = { u1, . . . , un }. Each vector in S has
length one and is orthogonal to every other vector in S. The procedure
which produces this basis is called the Gram-Schmidt process.
Orthonormal bases are very useful. For instance, if S = { u1, . . . , un } is
orthonormal then
b = b1 u1 + · · · + bn un = ⟨b, u1⟩ u1 + · · · + ⟨b, un⟩ un   for any b ∈ V.
The coefficients bi = ⟨b, ui⟩ on the basis vectors ui are called the Fourier
coefficients of the vector b with respect to orthonormal basis S.
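A compact Python sketch of the Gram-Schmidt process and the Fourier coefficients,
written for the dot product on R^n (swapping in a different inner product only
changes the function `inner`; the starting basis and the vector b are arbitrary
choices of mine):

import numpy as np

def inner(v, w):
    return float(np.dot(v, w))               # swap in any other inner product here

def gram_schmidt(basis):
    """Orthonormalize an ordered basis: subtract projections onto the vectors
    already produced, then normalize."""
    ortho = []
    for b in basis:
        v = b.astype(float)
        for u in ortho:
            v = v - inner(v, u) * u          # remove the component along u
        ortho.append(v / np.sqrt(inner(v, v)))
    return ortho

B = [np.array([1.0, 1.0, 0.0]),
     np.array([1.0, 0.0, 1.0]),
     np.array([0.0, 1.0, 1.0])]
U = gram_schmidt(B)

b = np.array([3.0, -1.0, 2.0])
fourier = [inner(b, u) for u in U]           # Fourier coefficients of b
print(np.allclose(sum(c * u for c, u in zip(fourier, U)), b))   # expect True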
47.1. Exercise. (i) Inner product ⟨·, ·⟩ on R² is given by ⟨x, y⟩ = 7x^1y^1 +
4x^2y^1 + 4x^1y^2 + 3x^2y^2. Find an orthonormal basis for R² with respect to this
inner product.
(ii) Inner product ⟨·, ·⟩ on P2(t) is given by ⟨x, y⟩ = ∫_{−1}^{1} 60 x(t) y(t) dt.
Find an orthonormal basis for P2(t) with respect to this inner product.
47.2. Exercise. Show that ⟨·, ·⟩ defined on M_{n×m} by ⟨A, B⟩ = trace(A^T B)
is an inner product.
If n = m the set of symmetric matrices, which we denote Sym_n, and the
set of skew symmetric matrices, which we denote Skew_n, are subspaces of
M_{n×n}. Note that any matrix A in M_{n×n} can be written as
A = (A + A^T)/2 + (A − A^T)/2.
Describe Sym_n^⊥ and Skew_n^⊥. Find an orthonormal basis for M_{3×3} containing
an orthonormal basis for Sym_3.
48. Projection onto Subspaces in an Inner Product Space.
It is a fact that if you divide the members of an orthonormal basis S
into two subsets A and B then Span(A)⊥ = Span(B) and V = Span(A) ⊕
Span(B).
If p = v + w where v is in Span(A) and w is in Span(B) then:
‖p‖² = ‖v‖² + ‖w‖².   (The Pythagorean Theorem)
Suppose W is any nontrivial subspace of inner product space V. We can
find an orthonormal basis { u1, . . . , um } for W and then extend that to an
orthonormal basis { u1, . . . , um, um+1, . . . , un } for V. Any vector b in V can
be written in a unique way as
b = ( Σ_{i=1}^{m} bi ui ) + ( Σ_{i=m+1}^{n} bi ui ) = p + q
where p is in W, and q is in W⊥. The vector p is denoted Proj_W(b), and
called the projection of b onto the subspace W. Proj_W is a linear function
from V to V. The image of Proj_W is W. The kernel of this map is W⊥.
The vector Proj_W(b) is the unique member of W which is nearest to
b. Note Proj_W(Proj_W(b)) = Proj_W(b) and Proj_W(b) = 0 exactly when
b ∈ W⊥ and Proj_W(b) = b exactly when b ∈ W.
With b = p + q as above, we can define the reflection of b in the subspace
W to be Refl_W(b) = p − q. Reflection is also a linear function from V to
V. Its image is all of V and Refl_W(Refl_W(b)) = b.
Don’t forget that any linear transformation, including these two, has a
matrix. Creating the matrix only involves calculating the function on a
basis.
48.1. Exercise. Consider R^4 with the usual inner product. Find a basis
B = { b1, b2 } for W⊥ where W = Span({ (0, −3, 2, 0), (−1, 1, 0, 1) }).
Then convert { (0, −3, 2, 0), (−1, 1, 0, 1), b1, b2 } (in this order) into an
orthonormal basis using the Gram-Schmidt process.
48.2. Exercise. Using the matrix for item (8) in Exercise 18.1 and a change
of basis as intermediary, create a matrix that will rotate points in R3 around
axis vector (1, 2, 3) (tail at the origin, as usual) by angle π/3 counterclockwise
as seen by an observer looking from the point (1, 2, 3) down onto the plane
of rotation perpendicular to (1, 2, 3) through the origin.
49. A Type of “Approximate” Solution.
Now suppose we are attempting to solve the matrix equation M x = b,
where M is an m by n matrix, x is in Rn and b is a fixed member of Rm .
We know that this equation has a solution only if b is in the columnspace
of the matrix M . But what if it isn’t? It may be you would be willing to
settle for a solution that puts you as close as possible to b.
In that case let W be the columnspace of M, and let p = Proj_W(b). You
can solve M x = p for this “second best” or “approximate” solution.
This approximate solution does depend explicitly on the projection, and
therefore the inner product, in use. The solution corresponding to the dot
product is called the least squares solution.
If we want an efficient way to calculate this solution we can proceed as
follows. First review the discussion of section 46 regarding the Fundamental
Theorem of Linear Algebra.
Since R^m = colspace(M) ⊕ Ker(M^T) there is a unique representation
of b as a sum p + c where p ∈ colspace(M) and c ∈ Ker(M^T). Since
colspace(M)⊥ = Ker(M^T) we have p · c = 0. The vector p is the unique
member of colspace(M) closest to b with respect to the Euclidean norm.
If M x = p then M^T M x = M^T p = M^T p + M^T c = M^T b. On the other
hand, if M^T M x = M^T b then M x − b ∈ Ker(M^T), so b = M x + k where
k ∈ Ker(M^T) and M x ∈ colspace(M). Since the representation of b is
unique, we find p = M x and c = k.
The least squares solution to matrix equation M x = b, where x ∈ Rn
and b is a fixed member of Rm , is the solution to M T M x = M T b.
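A numerical sketch of the normal-equations recipe (Python/numpy; the data is
invented for illustration), checked against numpy's built-in least squares routine:

import numpy as np

M = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])          # b below is not in the columnspace of M
b = np.array([0.0, 1.0, 1.0])

# Solve the normal equations  M^T M x = M^T b.
x_normal = np.linalg.solve(M.T @ M, M.T @ b)

# numpy's lstsq minimizes ||Mx - b|| directly; the answers agree.
x_lstsq, *_ = np.linalg.lstsq(M, b, rcond=None)
print(x_normal, x_lstsq)

# M @ x_normal is the projection of b onto the columnspace: the residual is
# orthogonal to the columns of M.
print(M.T @ (M @ x_normal - b))     # (approximately) the zero vector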
50. Embedding an Inner Product Space in Euclidean Space.
Suppose S = { u1, . . . , un } is an orthonormal basis for an inner product
space V, and q = Σ_{i=1}^{n} qi ui is a generic member of V.
The individual coordinates qi can be easily calculated (assuming you have
done the tedious up-front work of creating orthonormal S—you don’t get
something for nothing) as qi = ⟨ui, q⟩.
Also if p = Σ_{i=1}^{n} pi ui then
⟨q, p⟩ = q1 p1 + · · · + qn pn = (q1, q2, . . . , qn)_S · (p1, p2, . . . , pn)_S = [q]_S · [p]_S.
That means that angles, lengths, and any other geometrical fact you might
want to know about vectors in V can be calculated using dot product on
the S-coordinates, which are ordinary members of Rn .
To reiterate: we are associating these orthonormal basis vectors ui with
the corresponding standard basis vectors ei in Rn . With this association
the inner product on V corresponds to dot product in Rn . The matrix GS
is the identity matrix.
51. Effect of Change of Basis on the Matrix for an Inner Product.
Suppose given an inner product ⟨·, ·⟩ on a vector space V. Given an
ordered basis A = { a1, . . . , an }, this inner product has a matrix G_A for
which
⟨x, y⟩ = [x]_A^T G_A [y]_A.
The matrix G_A has ijth entry ⟨ai, aj⟩. This inner product will also have a
matrix G_B for use on B-coordinates. The question arises as to the relationship
between the two matrices. For every pair of vectors x and y we must have
⟨x, y⟩ = [x]_A^T G_A [y]_A = [x]_B^T G_B [y]_B = ( PB←A [x]_A )^T G_B ( PB←A [y]_A ) = [x]_A^T PB←A^T G_B PB←A [y]_A.
It follows that these two matrices are related by G_A = PB←A^T G_B PB←A or,
if you prefer,
G_B = ( PB←A^{-1} )^T G_A PB←A^{-1} = PA←B^T G_A PA←B.
So matrices used for inner products change a little differently, under change
of coordinates, from matrices used to represent linear transformations.
There is an interesting observation that can be made here. If both bases
are orthonormal, both G_B and G_A must be the identity matrix. The equality
above then states In = PB←A^T PB←A.
So the matrix of transition between two orthonormal bases satisfies PB←A^{-1} = PB←A^T:
the rows of the matrix of transition from orthonormal A to orthonormal B are the
columns of the matrix of transition from B to A. If you recall, we referred to
matrices like this as orthogonal.
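A quick numerical illustration of the transformation rule G_B = PA←B^T G_A PA←B
(Python/numpy; the matrices below are arbitrary choices of mine, with G_A
symmetric positive definite and the transition matrix invertible):

import numpy as np

G_A = np.array([[2.0, 1.0],
                [1.0, 3.0]])             # symmetric with positive eigenvalues
P_A_from_B = np.array([[1.0, 2.0],
                       [0.0, 1.0]])      # matrix of transition P_{A<-B}

G_B = P_A_from_B.T @ G_A @ P_A_from_B    # the rule derived above

# Check on a pair of vectors: the inner product is the same number whichever
# coordinates we use.  Here x_B, y_B are B-coordinates and P converts to A-coordinates.
x_B = np.array([1.0, -1.0])
y_B = np.array([2.0, 5.0])
x_A = P_A_from_B @ x_B
y_A = P_A_from_B @ y_B
print(np.isclose(x_B @ G_B @ y_B, x_A @ G_A @ y_A))   # expect True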
51.1. Exercise. A vector space V has inner product ⟨·, ·⟩ with matrix
G_A = [ 2  0  0 ]
      [ 0  1  0 ]
      [ 0  0  5 ]
with respect to basis A.
The matrix of transition PA←B is
[ 2   0  0 ]
[ 0  −1  3 ]
[ 0   0  5 ]
Calculate G_B.
52. A Few Facts About Complex Matrices and Vectors.
In this section we are going to review and prove some facts about the
behavior of vectors and matrices with complex numbers as entries. We do
this in order to prove the Spectral Theorem in the next section. This section
is about existence, but the proof does not provide an efficient way to find
the things whose existence is being asserted. Those calculations are done
by means you already know. The theorem merely states that the things
(eigenvectors with certain properties) exist, so it is worth your time to look
for them.
Recall that a complex number is an expression “of the form” a + bi
where i² = −1 and the numbers a and b are real. This form is called the
standard form of the complex number.
Obviously whatever i might be it is not a real number. We perform
arithmetic involving this new number exactly as we would if it were any
other square root: whenever we see i² we replace it by −1.
If z = a + bi we define the conjugate of z, denoted z̄, to be a − bi. The
magnitude of the complex number z is denoted ‖z‖ and defined by
‖z‖ = √(z z̄) = √(a² + b²).
It follows that the multiplicative inverse of z is z^{-1} = z̄ / (z z̄) = z̄ / ‖z‖².
It is easy to show the following facts for two complex numbers z = a + bi
and w = c + di: the conjugate of z + w is z̄ + w̄ and the conjugate of zw is z̄ w̄.
Also z + z̄ = 2a is real and i( z̄ − z ) = 2b is real.
In this section we will fatten up our matrices to (potentially) contain
complex number entries, and our vectors will be drawn from Cn , the space
of n × 1 column matrices with complex entries.
Matrix addition and matrix multiplication and scalar multiplication are
defined just as before, with complex numbers in place of real numbers.
The concepts of spanning set, linear independence, basis and also the
process of finding complex solutions to a system of equations with complex
coefficients are unchanged when applied to Cn , where scalars and entries
and the coefficients of linear combinations are all drawn from C rather than
R.
There is one additional operation when entries and scalars are allowed to
be complex. If A = (a_{ij}) is any m × n matrix with complex entries we define
Ā to be the matrix (ā_{ij}) of entry-by-entry conjugates.
It is an easy exercise to show that if A, B and C are of compatible shapes
(so the following operations make sense) the conjugate of AB + C is Ā B̄ + C̄,
while A + Ā is real and i( Ā − A ) is real.
The norm of the complex vector x ∈ C^n is denoted ‖x‖ and defined to be
‖x‖ = √( x · x̄ ) = √( Σ_j x_j x̄_j ).
For any complex number c we find ‖cx‖ = ‖c‖ ‖x‖. Also, ‖x‖ = 0 exactly
when x = 0.
Note that any x ∈ C^n can be written in a unique way as a + bi where
a, b ∈ R^n. Then ‖x‖² = ‖a‖² + ‖b‖².
If B is any m × n matrix, the n × m matrix B* is defined to be B̄^T, the
conjugate transpose of B.
So, for example, the square norm of vector x is x*x.
52.1. Exercise. Suppose x, y ∈ C^n.
(i) Show that x*y = 0 exactly when y*x = 0. If the condition holds, x
and y are said to be orthogonal (to each other.)
(ii) Show that x*y = x̄^T y = y^T x̄.
(iii) If x ≠ 0 show that x/‖x‖ has norm 1.
If x ≠ 0 define Proj_x(y) = ( x*y / x*x ) x and CoProj_x(y) = y − Proj_x(y).
(iv) Show that x* CoProj_x(y) = 0.
(v) Show how to adapt the Gram-Schmidt process so that it can be applied
to produce, from any (complex) ordered basis { v1, . . . , vn } of C^n, a
basis of vectors which are all orthogonal (pairwise) and of norm 1.
A square matrix M is called Hermitian or self-adjoint exactly when
M∗ = M.
A square matrix Q is called unitary exactly when Q∗ = Q−1 , so for these
matrices QQ∗ = Q∗ Q = In .
The columns and rows of a unitary matrix are of (complex) norm 1, and
if Ci is the ith column of unitary Q then C∗i Cj = δi j . The columns of an
n × n unitary matrix form a (complex) basis of Cn , and these basis vectors
are orthogonal.
Unitary matrices are the complex equivalent of orthogonal matrices. Of
course, orthogonal matrices are unitary.
It is easy to see that the product of (compatible) unitary matrices is
unitary.
53. Real Matrices with Complex Eigenvalues.
Now suppose that A is a real square matrix, and the characteristic polynomial
p(x) is defined for A, just as before.
By the Fundamental Theorem of Algebra, p(x) can be factored in a
unique way (except for order) as
p(x) = (x − λ1) · · · (x − λn),
the product of n linear factors involving the (possibly) complex eigenvalues
λ1, . . . , λn.
For each distinct eigenvalue there is at least one eigenvector, this time
with potentially complex entries.
Since p(x) is a real polynomial, conjugating the factorization gives
p(x) = (x − λ1) · · · (x − λn) = (x − λ̄1) · · · (x − λ̄n).
So the complex roots come in conjugate pairs.
Also, if Ax = λx then conjugating both sides (and using the fact that A is real) gives
A x̄ = λ̄ x̄,
so the conjugate of an eigenvector is also an eigenvector, but for the conjugate
eigenvalue.
Eigenvalues for real square matrices come in conjugate pairs.
If x is an eigenvector for complex eigenvalue λ
then x̄ is an eigenvector for eigenvalue λ̄.
In particular, x cannot be a real vector if λ is not real.
We finish off with a calculation that pertains to real vectors that are linear
combinations of conjugate eigenvectors.
Let x = xr + i xi where xr and xi are both real: the real and imaginary
parts of the eigenvector x.
Note that xr ≠ k xi for any constant k. That is because if there were such
a k, we would have
x = k xi + i xi   and   x̄ = k xi − i xi
and so
xi = (1/(k + i)) x = (1/(k − i)) x̄,
so xi would be a nonzero multiple of both eigenvectors and therefore an
eigenvector for two different eigenvalues, an impossibility.
So xi and xr are both nonzero and an independent pair of real vectors.
Suppose Z is a real vector and a complex linear combination of x and x̄:
Z = (a + bi)x + (c + di)x̄ = (a + bi)(xr + i xi) + (c + di)(xr − i xi).
Multiplying this out and using the fact that Z is real and also that xi and
xr are independent, it is easy to see that c = a and d = −b and so
Z = 2a xr − 2b xi = (a + bi)x + (a − bi)x̄.
In other words, Z is this real linear combination of real vectors xr and xi.
54. Real Symmetric Matrices.
We now consider the situation where A is a real symmetric matrix.
If x is any complex vector then, using the summation convention and the
symmetry of A, the conjugate of x̄^T A x satisfies
conj( x̄^T A x ) = x_i a_{ij} x̄_j = x_i a_{ji} x̄_j = x̄_j a_{ji} x_i = x̄^T A x.
Since the only numbers that are their own conjugates are real, we conclude
that x̄^T A x is real.
Now suppose that λ is an eigenvalue for A and x is an eigenvector for
that eigenvalue.
x̄^T A x = x̄^T λ x = λ x̄^T x = λ ( ‖x_1‖² + · · · + ‖x_n‖² ).
The number on the far left is real and the factor ‖x_1‖² + · · · + ‖x_n‖² is real
and nonzero, so λ is real too.
We conclude that all eigenvalues of a real symmetric matrix are real.
There is a real eigenvector for each real eigenvalue.
Finally, suppose y1 is a real eigenvector for λ1 and y2 is a real eigenvector
for λ2 for real symmetric A. Then
λ1 y1^T y2 = (A y1)^T y2 = y1^T A y2 = y1^T (λ2 y2) = λ2 y1^T y2.
Unless λ1 = λ2 we must have y1^T y2 = y1 · y2 = 0. So eigenvectors for
different eigenvalues are orthogonal.
But we can go a bit further in this direction.
Suppose y1 is a normalized eigenvector for λ1 and real symmetric A. Let
W = y1⊥ . Let { y2 , . . . , yn } be an ordered orthonormal basis for W and let
T be the orthonormal basis { y1 , . . . , yn } of Rn .
Using the symmetry of A, for i > 1 we have
y1 · A yi = y1^T A yi = (A y1)^T yi = λ1 y1^T yi = λ1 · 0 = 0.
That means Aw ∈ W for all w ∈ W. And
P^{-1} A P = ( λ1  0 ; 0  B )
for an (n−1)×(n−1) real symmetric block matrix B, where the matrix
P is orthogonal, with columns the basis vectors of T.
Suppose that matrix B is orthogonally diagonalizable. That means there
is an ordered orthonormal basis C = { c1, . . . , c_{n−1} } of R^{n−1} so that
Q^{-1} B Q = diag( λ2, λ3, . . . , λn )
where the columns of Q are the members of orthonormal basis C.
Using block matrix notation define vectors
d_{i+1} = ( 0 ; ci ) ∈ R^n   for i = 1, . . . , n − 1
and let d1 = e1. So D = { d1, d2, . . . , dn } is a basis for R^n. Let R denote
the matrix whose columns are this ordered basis, in order. Then
R^{-1} ( λ1  0 ; 0  B ) R = R^{-1} P^{-1} A P R = diag( λ1, λ2, . . . , λn ).
The matrix PR is the product of two orthogonal matrices so is itself
orthogonal. Its columns h1, . . . , hn form an ordered orthonormal basis H of
R^n. So
A = PEn←H diag( λ1, λ2, . . . , λn ) PH←En
and H constitutes an orthonormal basis of eigenvectors for A.
These are the elements needed to create an induction argument (on dimension)¹⁸
to prove the facts which prompted our excursion into the realm of complex
matrices. These are results, in the end, about real matrices and vectors.
All roots of the characteristic polynomial of a real symmetric matrix
A are real. Eigenvectors for different eigenvalues are orthogonal. There
is a basis of eigenvectors for A. So there is an orthonormal basis S and
orthogonal matrix of transition PS←En so that PS←En APEn←S is diagonal.
¹⁸ The student is encouraged to complete this argument as a challenging exercise.
55. Real Skew Symmetric Matrices.
At this point we abandon symmetric matrices and consider a real skew
symmetric matrix A.
55.1. Exercise. Modify the argument from above, where we showed that real
symmetric matrices have real eigenvalues, to show instead that all nonzero
eigenvalues of a real skew symmetric matrix (i.e. AT = −A) are pure complex.
Suppose λi is nonzero and a pure complex eigenvalue for real skew symmetric A, and x = a + bi is an eigenvector for this eigenvalue, where
a, b ∈ Rn .
A(a + bi) = λi(a + bi) = −λb + λai.
So Aa = −λb and Ab = λa. Also
λaT b = (λa)T b = (Ab)T b = bT AT b = −bT Ab = −λbT a = −λaT b.
That means a · b = 0 so a and b are orthogonal.
Similarly, if λ1 i and λ2 i are different nonzero eigenvalues with eigenvectors
y1 = a + bi and y2 = c + di, respectively, then
λ1 aT y2 = λ1 aT c + λ1 aT di.
But also
λ1 aT y2 =(Ab)T y2 = bT AT y2 = −bT Ay2 = −bT λ2 iy2
=λ2 bT d − λ2 bT ci.
By a very similar calculation we have
λ1 bT y2 = λ1 bT c + λ1 bT di.
But, again
λ1 bT y2 = − (Aa)T y2 = −aT AT y2 = aT Ay2 = aT λ2 iy2
= − λ2 aT d + λ2 aT ci.
Equating the real and imaginary components of these four expressions shows that
a^T c = a^T d = b^T c = b^T d = 0.
So not only are y1 and y2 orthogonal, but the four component vectors
{ a, b, c, d } form a real orthogonal set of vectors.
55.2. Exercise. This next exercise is quite a challenge, but it is handled
in the same way (induction on rank) as the corresponding argument for
symmetric matrices.
Suppose A is a real skew symmetric n × n matrix. Then there is a real
orthogonal matrix of transition PEn←S so that PS←En A PEn←S is block diagonal
with blocks of two types. First, there are rank(A)/2 blocks of the form
( 0  λ ; −λ  0 )
for various nonzero real λ. The rest are 1 × 1 zero blocks.
56. Orthonormal Eigenbases and the Spectral Theorem.
If V is a real inner product space with inner product h·, ·i there might be
an orthonormal basis of eigenvectors for a linear transformation f : V → V .
We are interested in exactly when this can happen.
The first thing to do is create a matrix MS←S for f with respect to some
orthonormal basis S.
If there is an orthonormal basis of eigenvectors T then the matrix of transition PS←T is from one orthonormal basis to another, and so is orthogonal:
its transpose is its inverse. The matrix MT←T would be diagonal.
MS←S = PS←T MT←T PT←S .
The product on the right is its own transpose, so it is necessary, for this
orthonormal diagonalization to happen, that MS←S is symmetric.
So let’s consider f with symmetric matrix with respect to orthonormal S
as above.
In our effort to construct an orthonormal eigenbasis, then, our first step
is to find a (randomly chosen) basis for each eigenspace and use the
Gram-Schmidt process to find an orthonormal basis for each eigenspace.
The last result of Section 52 implies that these eigenvectors, all together,
will form a basis.
It is a fact that when the matrix of a linear transformation f : V → V
with respect to any one orthonormal basis is symmetric then its matrix
is symmetric with respect to any orthonormal basis and the orthonormal
eigenvectors, found as described above, do form a basis for V , so f is
diagonalizable.
The result is a special case of a theorem called The Spectral Theorem, a beautiful and important result with many generalizations and consequences. In our context, the spectrum of a linear transformation is the
set of its eigenvalues. You will learn more about this theorem, and variants,
in later courses.
This theorem can be regarded as a matrix factorization result, often
referred to as a “decomposition” of the matrix.
When M is a real symmetric matrix there is an orthogonal matrix P
and a diagonal matrix D so that
M = P D P^T = P D P^{−1}.
This is called an eigenvalue decomposition of M .
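Numerical software will produce this factorization for you. As a sketch (assuming Python with NumPy is available), the routine numpy.linalg.eigh returns the eigenvalues and an orthogonal matrix of eigenvectors of a symmetric matrix, from which the decomposition can be assembled and checked:

    import numpy as np

    rng = np.random.default_rng(1)
    B = rng.standard_normal((4, 4))
    M = B + B.T                              # a real symmetric matrix

    w, P = np.linalg.eigh(M)                 # eigenvalues w, orthogonal P with eigenvector columns
    D = np.diag(w)

    print(np.allclose(P @ D @ P.T, M))       # True:  M = P D P^T
    print(np.allclose(P.T @ P, np.eye(4)))   # True:  P is orthogonal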
Mathematics programs such as Maple, Mathematica, and MATLAB all provide efficient routines to compute these factor matrices, and also the factorizations mentioned in the next sections and many other factorizations too. Even so, the practical benefits of slight efficiency gains of one implementation over another in special cases keep the whole subject very important and active. We won't deal with these matters here, but only consider the nature of the factorizations these programs produce.
57. The Schur Decomposition.
Another factorization result for square matrices is the Schur decomposition. Its proof requires only a little tweaking of the work we did to prove
the spectral theorem for real symmetric matrices.
If M is an n × n matrix there is a factorization as
M = Q U Q^* = Q U Q^{−1}
where Q is a unitary matrix and U is upper triangular with the eigenvalues of M as diagonal entries.
If M is real and has all real eigenvalues, then Q and U can be chosen
to be real matrices.
Even if there are some complex eigenvalues, if M is real the columns
of Q corresponding to real diagonal entries of U can be chosen to be
real vectors, and U itself can be chosen to have a real column through
each real diagonal entry.
This is called a Schur decomposition of M .
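To see a Schur decomposition computed, one option is the schur routine in SciPy. The following sketch (assuming Python with NumPy and SciPy) applies it to a randomly generated matrix:

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(2)
    M = rng.standard_normal((4, 4))

    U, Q = schur(M, output='complex')           # M = Q U Q*, Q unitary, U upper triangular
    print(np.allclose(Q @ U @ Q.conj().T, M))   # True
    print(np.allclose(np.tril(U, -1), 0))       # True: U is upper triangular
    print(np.diag(U))                           # the eigenvalues of M appear on the diagonal of U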
Suppose that any matrix up to size (n − 1) × (n − 1) has been shown to be
“upper triangularizable” by a unitary matrix of transition of the prescribed
type when real matrices and eigenvalues are involved.
Given n × n matrix A, if A is real and has any real eigenvalue λ1 , we pick
that real eigenvalue. If A is not real, or has no real eigenvalues, let λ1 be
any choice of eigenvalue.
Find normalized eigenvector y1 for eigenvalue λ1 . If λ1 and A are real
choose y1 to be real and extend to an orthonormal basis Y = { y1 , . . . , yn }
of Rn . Otherwise, extend to an orthonormal basis for Cn .
Let S be the block matrix
S = ( y1 · · · yn ) .
We find that

S^{-1}AS =
\begin{pmatrix}
\lambda_1 & m_{1 2} & \cdots & m_{1 n} \\
0 & m_{2 2} & \cdots & m_{2 n} \\
\vdots & \vdots & & \vdots \\
0 & m_{n 2} & \cdots & m_{n n}
\end{pmatrix}
=
\begin{pmatrix}
\lambda_1 & K \\
0 & M
\end{pmatrix}

where K is the row consisting of the last n − 1 entries of the top row, (m_{1 2} \cdots m_{1 n}). Note that in case A and λ_1 are real so is K. The (n − 1) × (n − 1) block on the lower right is indicated by M, which will also be real in this event.
By assumption there is an (n − 1) × (n − 1) unitary P for which

P^{-1}MP = P^{-1}
\begin{pmatrix}
m_{2 2} & \cdots & m_{2 n} \\
\vdots & & \vdots \\
m_{n 2} & \cdots & m_{n n}
\end{pmatrix}
P
is upper triangular, where P and the upper triangular product are real under
the prescribed conditions.
The n × n block matrix

R =
\begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & & & \\
\vdots & & P & \\
0 & & &
\end{pmatrix}
is also unitary so the product Q = SR is too, and the latter will triangularize the matrix A. Indeed,

R^{-1}(S^{-1}AS)R =
\begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & & & \\
\vdots & & P^{-1} & \\
0 & & &
\end{pmatrix}
\begin{pmatrix}
\lambda_1 & K \\
0 & M
\end{pmatrix}
\begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & & & \\
\vdots & & P & \\
0 & & &
\end{pmatrix}
=
\begin{pmatrix}
\lambda_1 & K \\
0 & P^{-1}M
\end{pmatrix}
\begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & & & \\
\vdots & & P & \\
0 & & &
\end{pmatrix}
=
\begin{pmatrix}
\lambda_1 & KP \\
0 & P^{-1}MP
\end{pmatrix}
As in the case of the Spectral Theorem, an induction argument finishes
the proof.
58. Normal Matrices.
We are now in a position to determine exactly which matrices are diagonalizable using unitary matrices of transition.
An n × n matrix is called normal if it commutes with its conjugate transpose:
M is normal if and only if M^*M = MM^*.
Unitary, Hermitian, real symmetric and real skew symmetric matrices are all normal, but they are not the only normal matrices.
For instance if P is any unitary matrix, P^{−1} = P^* so P P^* = P^*P = I_n. But I_n + P is not unitary, and there is no reason for it to have symmetry properties, yet
(I_n + P)^*(I_n + P) = 2I_n + P + P^* = (I_n + P)(I_n + P)^*.
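A quick numerical check of this example, as a sketch assuming Python with NumPy; the random unitary matrix here is manufactured from the QR factorization of a random complex matrix:

    import numpy as np

    rng = np.random.default_rng(3)
    Z = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    P, _ = np.linalg.qr(Z)                               # a random unitary matrix

    M = np.eye(4) + P
    print(np.allclose(M.conj().T @ M, M @ M.conj().T))   # True:  I + P is normal
    print(np.allclose(M.conj().T @ M, np.eye(4)))        # False: I + P is not unitary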
58.1. Exercise. Suppose M is unitary, Hermitian or normal, and Q is unitary of the same size. Then Q∗ M Q is also, respectively, unitary, Hermitian
or normal.
Suppose M is real symmetric or real skew symmetric, and Q is orthogonal
of the same size. Then Q∗ M Q = QT M Q is also, respectively, symmetric or
skew symmetric.
Now suppose generic square matrix M is diagonalizable using unitary
transition matrix P : that is, the matrix D = P ∗ M P is diagonal. But then
D∗ = P ∗ M ∗ P is also diagonal, and DD∗ = D∗ D. So
M M ∗ = P P ∗ M P P ∗ M ∗ P P ∗ = P (P ∗ M P ) (P ∗ M ∗ P ) P ∗ = P DD∗ P ∗
= P D∗ DP ∗ = P (P ∗ M ∗ P ) (P ∗ M P ) P ∗ = P P ∗ M ∗ P P ∗ M P P ∗
= M ∗ M.
We have shown that any matrix which is diagonalizable by a unitary matrix
must be normal.
On the other hand, suppose M is a normal matrix. Find the Schur Decomposition M = QU Q∗ for M , where U is upper triangular and Q unitary.
Using the normality of M it is easy to see that U is normal too:
U ∗U = U U ∗.
Since U^* is lower triangular, the first-row, first-column entry of U^*U is \overline{a_{1 1}}\, a_{1 1} = |a_{1 1}|^2, where a_{i j} denotes the i, j entry of U. On the other hand the first-row, first-column entry of UU^* is
|a_{1 1}|^2 + |a_{1 2}|^2 + · · · + |a_{1 n}|^2.
That means all but the first entry of the first row of U must be 0.
Similarly (using the fact, just established, that a_{1 2} = 0) the second-row, second-column entry of U^*U is |a_{2 2}|^2. But the second-row, second-column entry of UU^* is
|a_{2 2}|^2 + |a_{2 3}|^2 + · · · + |a_{2 n}|^2.
That means every entry in the second row of U is 0 except for the entry in the second column.
We carry on this way for each row and conclude that U is diagonal and
M is diagonalizable using unitary matrix of transition Q.
We conclude:
A square matrix M is diagonalizable with a unitary matrix of transition P,
P^*MP = D,   P unitary, D diagonal,
if and only if M is normal.
If M is a real normal matrix and all the eigenvalues of M are real then P can be chosen to be real.
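You can watch this characterization at work numerically: when the input is normal, the upper triangular factor of a Schur decomposition comes out diagonal (up to roundoff), which is exactly a unitary diagonalization. A sketch, assuming Python with NumPy and SciPy:

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(4)
    Z = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    Q0, _ = np.linalg.qr(Z)
    M = np.eye(4) + Q0                           # normal, but neither Hermitian nor unitary

    U, Q = schur(M, output='complex')            # M = Q U Q*
    print(np.allclose(U, np.diag(np.diag(U))))   # True: the Schur factor is diagonal
    print(np.allclose(Q @ U @ Q.conj().T, M))    # True: a unitary diagonalization of M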
59. Real Normal Matrices.
Let’s carry on a bit further with real normal matrices.
We saw in Section 53 that complex eigenvalues of real matrices come
in conjugate pairs and eigenvectors for these pairs can be chosen to be
conjugate vectors as well. Specifically, we saw there that if λ = s + t i
is a complex eigenvalue for real matrix M with eigenvector x then x̄ is also an eigenvector of M but for eigenvalue λ̄ = s − t i.
We just found that if M is normal it can be diagonalized as P ∗ M P = D
with unitary P , whose columns are eigenvectors for M .
Assume that normal real M actually has a complex eigenvalue with unit
eigenvector x = xr + ixi where xr and xi are real vectors, and that (by
reordering the columns of P and, possibly, multiplying a column by −1) x
and x are the first two columns of P . Make sure that any real eigenvalues
correspond, in P , to real columns (and not a complex scalar multiple of a
real eigenvector.)
The first two elements on the diagonal of D are λ and λ̄.
In Section 53 we found that xr and xi are independent vectors, but here
more is true. Since P is unitary its columns x and x̄ are orthogonal, so the product x^* x̄ must vanish. Taking the complex conjugate of x^* x̄ = 0 gives
0 = x^T x = x_r · x_r − x_i · x_i + 2i x_r · x_i.
This means, first, that x_r and x_i are orthogonal and, second, that x_r · x_r equals x_i · x_i. Since x has unit magnitude, these two orthogonal real vectors each have magnitude 1/√2.
Both x_r and x_i are linear combinations of x and x̄, so these vectors are
orthogonal to all the other columns of P .
Let Q be the unitary matrix obtained from P by replacing the first two columns of P by √2 x_r and √2 x_i. The transposes of these real vectors are the first two rows of Q^*. We will refer to these first two rows as Q^*_1 and Q^*_2 respectively.
The first two rows of P^* are x̄^T and x^T, and

x̄^T = x_r^T − i x_i^T = \frac{Q^*_1}{\sqrt{2}} − \frac{i Q^*_2}{\sqrt{2}}

and

x^T = x_r^T + i x_i^T = \frac{Q^*_1}{\sqrt{2}} + \frac{i Q^*_2}{\sqrt{2}}.
As a matrix equation we have P^* = K^*Q^*, where K^* is the block diagonal unitary matrix

K^* =
\begin{pmatrix}
\frac{1}{\sqrt{2}} & \frac{-i}{\sqrt{2}} & \cdots & 0 \\
\frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} & \cdots & 0 \\
\vdots & \vdots & & \\
0 & 0 & & I
\end{pmatrix}

and I indicates the identity matrix of the appropriate size.
But then
D = P^*MP = K^*Q^*MQK.
The matrix KDK^* is

\begin{pmatrix}
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & \cdots & 0 \\
\frac{i}{\sqrt{2}} & \frac{-i}{\sqrt{2}} & \cdots & 0 \\
\vdots & \vdots & & \\
0 & 0 & & I
\end{pmatrix}
\begin{pmatrix}
\lambda & 0 & \cdots & 0 \\
0 & \bar{\lambda} & \cdots & 0 \\
\vdots & \vdots & & \\
0 & 0 & & R
\end{pmatrix}
\begin{pmatrix}
\frac{1}{\sqrt{2}} & \frac{-i}{\sqrt{2}} & \cdots & 0 \\
\frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} & \cdots & 0 \\
\vdots & \vdots & & \\
0 & 0 & & I
\end{pmatrix}

where R is diagonal, containing the other eigenvalues, and so KDK^* = Q^*MQ.

Multiplying this out we find that, for λ = s + t i,

KDK^* =
\begin{pmatrix}
s & t & \cdots & 0 \\
-t & s & \cdots & 0 \\
\vdots & \vdots & & \\
0 & 0 & & R
\end{pmatrix}
= Q^*MQ

where Q is orthogonal and its first two columns are the real vectors √2 x_r and √2 x_i.
We can proceed in just this way with the other complex eigenvalue conjugate pairs, permuting eigenvectors for them to the front columns, constructing 2 × 2 real diagonal blocks and real orthonormal basis vectors for each.
After the complex eigenvalues are exhausted, the remaining eigenvalues are
all real, each with real eigenvector. We conclude:
If M is a real normal matrix there is an orthogonal matrix Q so that
B = Q^{−1}MQ
is block diagonal, with either 2 × 2 blocks or 1 × 1 blocks. Each 2 × 2 block is of the form

\begin{pmatrix} s & t \\ -t & s \end{pmatrix},

one block for each complex eigenvalue pair λ = s + i t and λ̄ = s − i t. The two columns in Q corresponding to this block are √2 x_r and √2 x_i, where x_r + i x_i is a unit eigenvector for λ.
The 1×1 blocks contain real eigenvalues and the corresponding columns
in Q are real eigenvectors for these eigenvalues.
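The real Schur form produced by standard software exhibits exactly this block structure when the input matrix is normal. A sketch, assuming Python with NumPy and SciPy; the example matrix is a multiple of the identity plus a random skew symmetric matrix, which is normal but not symmetric:

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(5)
    B = rng.standard_normal((5, 5))
    M = 2.0 * np.eye(5) + (B - B.T)      # real and normal: M @ M.T equals M.T @ M

    T, Q = schur(M, output='real')       # M = Q T Q^T with Q real orthogonal
    print(np.allclose(Q @ T @ Q.T, M))   # True
    print(np.round(T, 3))                # block diagonal (up to roundoff): 2x2 blocks [[s, t], [-t, s]] and 1x1 blocks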
60. The LU Decomposition.
If M is an m × n matrix there is a factorization as
P M = LU
where L is m × m and lower triangular and U is m × n and upper
triangular and P is an m × m permutation matrix.
If M is real, L and U are real too.
This is called an LU decomposition of M .
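Standard software computes this factorization directly. As a sketch (assuming Python with NumPy and SciPy): SciPy's lu routine states the factorization as M = p L U with p a permutation matrix, so multiplying by p^T puts it in the form P M = L U used above.

    import numpy as np
    from scipy.linalg import lu

    rng = np.random.default_rng(6)
    M = rng.standard_normal((4, 6))          # rectangular matrices are allowed

    p, L, U = lu(M)                          # SciPy's convention:  M = p L U
    P = p.T                                  # so  P M = L U  with P a permutation matrix
    print(np.allclose(P @ M, L @ U))         # True
    print(np.allclose(np.triu(L, 1), 0))     # True: L is lower triangular (4 x 4)
    print(np.allclose(np.tril(U, -1), 0))    # True: U is upper triangular (4 x 6)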
The calculations needed to produce the decomposition correspond, essentially, to Gaussian elimination. We describe the algorithm for the construction below.
If the first column of M is zero, let H1 = Im and proceed to examine the
second column.
If the first column has a nonzero entry, pick i1 so that m_{i1,1} is a largest-magnitude entry in that column. Use elementary row matrices to clean out
(i.e. reduce to zero) all other nonzero entries of the first column of M . This
is accomplished by left multiplication by elementary row matrices of the first
kind. The product of these is a matrix H1 . If no “clean out” step is required
let H1 = Im .
In both cases H1 M has at most one nonzero entry in the first column. If
it has a nonzero entry, that entry is in row i1 .
Define the set of integers A2 to be { i1 } or, if i1 is not defined, the empty
set. This set will keep track of rows to be ignored in subsequent calculations.
Proceed to examine the second column.
Identify a largest-magnitude entry in the second column of H1 M which
is not in row j for any j ∈ A2 . If there is no such entry let A3 = A2 and
H2 = Im . If there is let A3 = A2 ∪ {i2 } where the specified entry is in
row i2 and let H2 be the product of the elementary row matrices needed
to clean out any other nonzero entries in that column for all rows except
rows in A2 . Let H2 = Im if no “clean out” step is required.
Let k be the lesser of m or n. We carry on as suggested above for k steps.
At the jth step for j < k we examine the entries in the jth column of
Hj−1 · · · H1 M to see if there are nonzero entries not in a row previously
selected and recorded as a member of Aj . If there is no such entry we let
Aj+1 = Aj and Hj = Im and move on to the next column. If there is such an entry, select a largest-magnitude such entry, lying in row ij say; let Aj+1 = Aj ∪ {ij } and let
Hj be the product of the elementary row matrices needed to clean out any
other nonzero entries in that column for all rows except those rows recorded
as “off-limits” in Aj . Let Hj = Im if no “clean out” is required.
This procedure terminates at the completion of the kth step, and the
matrix
Hk · · · H1 M = V
is an m × n matrix which has k leading zeroes in any row whose index is not in Ak. On the other hand, if i_j is a row index recorded in Ak then any row i_t with t > j has strictly more leading zeroes than does row i_j.
In other words, if the rows of V corresponding to the members of Ak are
placed in a new matrix in order of their selection to Ak , and the other rows
picked however you like (they have k leading zeroes), the re-ordered matrix
will be upper triangular.
Let P be a permutation matrix that re-orders the rows of V so that
P V = U is upper triangular.
Examining the calculations above which created V, now applied to the rows of the matrix P M in their permuted locations, we see that at each step the elementary row operations place zeroes in a column only at locations beneath the row containing the selected largest-magnitude entry of that column.
Performing the analogous row operations to the rows of P M in their new
locations produces
Qk · · · Q1 P M = U
where each Qj is invertible and lower triangular. The inverse and product of lower triangular matrices is lower triangular, so if we let L = Q_1^{−1} · · · Q_k^{−1} we have the decomposition
P M = LU
as required.
Implementations that take advantage of the nature of sparse matrices
(those with a high proportion of zero entries) or that reorganize the work to
cut down on roundoff error in the calculations are particularly important.
You will likely see and use this decomposition more than any other in
Engineering and other applied mathematics settings where they use Linear
Algebra.
One advertised use of an LU decomposition is to more quickly solve an
equation of the form
M x = b.
You can create a solution by solving, in order
Ly = P b and then U x = y.
Because of the triangular nature of L and U these can be solved by substitution.
This method would be advantageous over direct Gaussian elimination of
the augmented matrix if M is a fixed matrix (so you just need to find the
decomposition once) but there are many different “target” b-values.
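As a sketch of this usage pattern (assuming Python with NumPy and SciPy), lu_factor performs the expensive factorization once and lu_solve reuses it for each new right-hand side:

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    rng = np.random.default_rng(7)
    M = rng.standard_normal((300, 300))

    lu_piv = lu_factor(M)                # factor once: the expensive step

    for _ in range(3):                   # then reuse the factorization for many b-values
        b = rng.standard_normal(300)
        x = lu_solve(lu_piv, b)          # two triangular substitutions, much cheaper than refactoring
        print(np.allclose(M @ x, b))     # True each time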
61. The Singular Value Decomposition.
Next under discussion is a very interesting factorization with many uses.
If M is any real m × n matrix there is an orthonormal basis U =
{ u1 , . . . , um } of Rm and an orthonormal basis V = { v1 , . . . , vn } of Rn
and an m × n diagonal matrix Σ so that
M = PEm←U Σ PV ←En .
Further, Σ can be chosen so that its diagonal entries σ1 , σ2 , . . . , σk ,
where k is the least of m or n, satisfy
σi ≥ σi+1 ≥ 0 for all i = 1, . . . , k − 1.
The number t of nonzero entries on the diagonal of Σ is the rank of M .
This is the singular value decomposition, or SVD.
The considerations below in which the bases U and V are found also
produce a slightly different, and somewhat more compact, factorization:
If M is any real m × n matrix of rank t there is an orthonormal
set of vectors { u1 , . . . , ut } in Rm and an orthonormal set of vectors
{ v1 , . . . , vt } in Rn and a t × t diagonal matrix Σ so that
M = P Σ QT
where P is the matrix (u1 · · · ut ) and Q is the matrix (v1 · · · vt ) of
sizes m × t and n × t, respectively.
Σ can be chosen so that its diagonal entries σ1 , σ2 , . . . , σt satisfy
σi ≥ σi+1 > 0 for i = 1, . . . , t − 1.
This is the reduced singular value decomposition, or RSVD.
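Both forms can be produced with numpy.linalg.svd. A sketch, assuming Python with NumPy; the example matrix is built as a product so that its rank is 3 by construction:

    import numpy as np

    rng = np.random.default_rng(8)
    M = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 7))    # a 5 x 7 matrix of rank 3

    U, s, Vt = np.linalg.svd(M)              # full SVD: U is 5x5, Vt is 7x7, s holds singular values
    Sigma = np.zeros((5, 7))
    np.fill_diagonal(Sigma, s)
    print(np.allclose(U @ Sigma @ Vt, M))    # True: the factorization M = P_{Em<-U} Sigma P_{V<-En}
    print(int(np.sum(s > 1e-10)))            # 3, the rank t of M

    t = 3                                    # the reduced form keeps only the nonzero singular values
    P, Q = U[:, :t], Vt[:t, :].T
    print(np.allclose(P @ np.diag(s[:t]) @ Q.T, M))   # True:  M = P Sigma Q^T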
The n × n matrix M^T M is symmetric so there is an orthogonal n × n matrix P_{En←V} with column blocks (v1 v2 · · · vn) for which
P_{V←En} M^T M P_{En←V} = P_{En←V}^T M^T M P_{En←V}
is diagonal with nonzero entries λ1, . . . , λt arranged first and in nonincreasing order along the diagonal.
If M^T M v = λv for nonzero λ and v then
M^T M M^T M v = λ^2 v
and
M M^T (M v) = M λv = λ(M v).
The first equality implies M v ≠ 0 and the second then implies that M v is an eigenvector for M M^T for eigenvalue λ.
So each nonzero eigenvalue of M T M is an eigenvalue for M M T .
v1 , . . . , vt are linearly independent eigenvectors for nonzero eigenvalues
λ1 , . . . , λt for M T M and M v1 , . . . , M vt are eigenvectors for these same
eigenvalues for M M T .
If \sum_{i=1}^{t} a^i M v_i = 0 then M \sum_{i=1}^{t} a^i v_i = 0, so \sum_{i=1}^{t} a^i v_i ∈ nullspace(M).
But the span of v1 , . . . , vt intersects the nullspace of M T M (and therefore
the nullspace of M ) only at the zero vector. The linear independence of the
vi then implies that all ai = 0. So the M vi are independent, even those
associated with the equal eigenvalues.
This means the span of the eigenspaces for nonzero eigenvalues of M M T
has dimension at least t. Switching the positions of M and M T in this
argument leads us to conclude that the span of these eigenspaces for nonzero
eigenvalues of M M T has dimension exactly t.
Since v_i · v_j = v_i^T v_j = 0 when i ≠ j,
(M v_i) · (M v_j) = v_i^T M^T M v_j = v_i^T (λ_j v_j) = λ_j v_i · v_j = 0.
So M v1 , . . . , M vt is an orthogonal set of eigenvectors for M M T .
Note that
0 < (M v_i) · (M v_i) = v_i^T M^T M v_i = v_i^T (λ_i v_i) = λ_i v_i · v_i = λ_i.
This means that the λ_i are all positive.
Now let σ_i = √λ_i = ‖M v_i‖ and u_i = σ_i^{−1} M v_i for i = 1, . . . , t. Then extend this orthonormal set to an orthonormal basis U = { u_1, . . . , u_t, u_{t+1}, . . . , u_m } spanning all of R^m. These additional u_i are eigenvectors for M M^T for eigenvalue 0.
For each i = 1, . . . , t we have M vi = σi ui and also M T ui = σi vi . With
reference to these properties, the nonzero σi are called singular values of
M and the vectors ui and vi are called left and right singular vectors,
respectively, for σi and M .
The columnspace of M is Span({u1 , . . . , ut }) while the kernel of M is
trivial if t = n, and otherwise is Span({vt+1 , . . . , vn }).
We now define Σ to be the m × n matrix filled with zeroes except for
entries σi for 1 ≤ i ≤ t arrayed in order down the first t entries of the main
diagonal.
We find that Σ = P_{U←Em} M P_{En←V} so

M = P_{Em←U} Σ P_{V←En} = (u_1 \cdots u_m)\, Σ \begin{pmatrix} v_1^T \\ \vdots \\ v_n^T \end{pmatrix} = σ_1 u_1 v_1^T + · · · + σ_t u_t v_t^T
which is the sum of t nonzero m×n matrices. There are t(m+n+1) numbers
to record to reproduce M , which could be small in comparison to mn. The
entries of the matrices ui viT are each small, consisting of all products of the
entries of unit vectors. The σi are weights that indicate how important each
combination is in the sum that forms M .
Some filtering, approximation and data compression schemes rely on discarding those terms associated with certain (such as small) σi . Keeping the
terms corresponding to the r largest singular values we have an approximation to M given as
M ≈ σ1 u1 v1T + · · · + σr ur vrT .
The right side can also be interpreted as the matrix “closest” to M of rank
r.
You must record only r(m + n + 1) numbers to reproduce this approximation, which could represent a considerable compression.
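A sketch of such a truncation, assuming Python with NumPy. The matrix here is random, so the approximation is not impressive, but the bookkeeping is the point:

    import numpy as np

    rng = np.random.default_rng(9)
    M = rng.standard_normal((100, 80))

    U, s, Vt = np.linalg.svd(M, full_matrices=False)

    r = 10                                              # keep the r largest singular values
    M_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]         # sigma_1 u_1 v_1^T + ... + sigma_r u_r v_r^T

    print(np.linalg.norm(M - M_r) / np.linalg.norm(M))  # relative error of the rank-r approximation
    print(r * (100 + 80 + 1) / (100 * 80))              # fraction of the m*n entries actually stored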
62. ✸ Loose Ends and Next Steps.
In your next class19 you could go in several directions. Most likely you
will examine, a bit more closely than we did in Section 52, the effects of
using the field of complex numbers rather than the real numbers. Complex
inner products are a bit different.
Interesting factorization results will be proved, useful in the study of Differential Equations and other areas, such as the Jordan decomposition,
which states:
19Sections marked with the diamond symbol ✸, such as this one, consist of philosophical
remarks, advice and points of view. They can be skipped on first reading, or altogether.
Any complex matrix M can be factorized as
M = QAQ^{−1}
where Q is an invertible matrix and A is block diagonal, with diagonal blocks of the form

\begin{pmatrix}
\lambda & 1 & 0 & \cdots & 0 & 0 \\
0 & \lambda & 1 & \cdots & 0 & 0 \\
\vdots & & \ddots & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & \lambda & 1 \\
0 & 0 & 0 & \cdots & 0 & \lambda
\end{pmatrix}
You will prove the Cayley-Hamilton Theorem, which can be useful
when trying to understand the eigenspace structure of V , and provides an
alternative method to find the eigenspaces among other things.
This theorem states that every square matrix M satisfies its characteristic
polynomial20.
If p(x) = x^n + p_{n−1}x^{n−1} + · · · + p_1 x + p_0 is the characteristic polynomial for the n × n matrix M then
p(M) = M^n + p_{n−1}M^{n−1} + · · · + p_1 M + p_0 I_n = 0.
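You can check the theorem numerically for any particular matrix. A sketch assuming Python with NumPy: numpy.poly returns the (monic) characteristic polynomial coefficients of a square matrix, and the loop evaluates p(M) by Horner's method using matrix products.

    import numpy as np

    rng = np.random.default_rng(10)
    M = rng.standard_normal((4, 4))

    p = np.poly(M)                    # characteristic polynomial coefficients, leading coefficient 1

    PM = np.zeros_like(M)             # Horner evaluation of p(M)
    for c in p:
        PM = PM @ M + c * np.eye(4)

    print(np.allclose(PM, 0))         # True (up to roundoff): M satisfies its characteristic polynomial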
This theorem is easy to prove if there is a basis of eigenvectors for M , and
a direct calculation shows it to be true in dimensions two and three. But it
is not so easy to show in general.
Assuming this theorem, suppose p(x) has a factor (x − α) for real α and
p(x) = (x − α)q(x) where q is of degree n − 1 and (x − α) is not a factor of
q(x).
It is quite easy to show (we assume n > 1) that q(M ) is not the zero
matrix, and any nonzero vector in the columnspace of q(M ) must be killed
by M − αIn so it is an eigenvector for eigenvalue α.
Under the condition described above, the columnspace of q(M ) is
exactly the nullspace of M − αIn , the eigenspace for eigenvalue α. Any
calculator can find q(M ) as long as the dimension is not too big.
Unfortunately, if (x − α) is a factor of p(x) more than once and you are
looking for eigenspaces there are a number of possibilities and it is probably
most efficient, from a computational standpoint, to go back to the matrix
20 That every square matrix M satisfies some polynomial is easy to show. The list I_n, M, M^2, . . . , M^{n^2} is n^2 + 1 vectors in an n^2 dimensional space, so the set of these matrices must be dependent. The Cayley-Hamilton Theorem brings the minimal degree of such a polynomial down to no more than n.
M − αIn and calculate its nullspace for that eigenvalue directly. In later
classes you will examine what else can be done.
Another thing to notice about the Cayley-Hamilton Theorem is that the coefficient p_{n−1} in p(x) = x^n + p_{n−1}x^{n−1} + · · · + p_1 x + p_0 is −trace(M) while p_0 is (−1)^n det(M). We know that det(M) ≠ 0 exactly when M has an inverse. Cayley-Hamilton gives us a formula for it:
I_n = \frac{−1}{p_0}\,(M^n + p_{n−1}M^{n−1} + · · · + p_1 M) = \frac{−1}{p_0}\,(M^{n−1} + p_{n−1}M^{n−2} + · · · + p_1 I_n)\,M.
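The same bookkeeping gives a way (correct, though not particularly efficient) to produce the inverse numerically. A sketch assuming Python with NumPy, in the same spirit as the previous sketch:

    import numpy as np

    rng = np.random.default_rng(11)
    n = 4
    M = rng.standard_normal((n, n))
    p = np.poly(M)                            # p[0] = 1, ..., p[-1] = p_0, nonzero when M is invertible

    Minv = np.zeros_like(M)                   # Horner evaluation of M^(n-1) + p_{n-1}M^(n-2) + ... + p_1 I
    for c in p[:-1]:
        Minv = Minv @ M + c * np.eye(n)
    Minv = -Minv / p[-1]                      # divide by -p_0

    print(np.allclose(Minv @ M, np.eye(n)))   # True: the formula reproduces the inverse of M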
There might be a nonzero polynomial L(x) of lower degree than n for
which L(M ) = 0. The polynomial of least degree with leading coefficient 1
for which this is true is called the minimal polynomial for M . The minimal polynomial is a factor of the characteristic polynomial, and in fact has
all the same irreducible factors as has p(x), but possibly to lower (nonzero)
degree. The minimal polynomial is useful for the same kinds of things that
the characteristic polynomial is used to accomplish.
Factorizations of these polynomials can be used to break Rn into a chain
of subspaces, one inside the other, called invariant subspaces for M . An
invariant subspace for M is a subspace V of Rn for which M x ∈ V whenever
x ∈ V . The invariant subspaces are very useful in understanding M , and in
finding and interpreting matrix factorization results such as those we have
created in earlier sections.
They are essential in understanding simultaneous diagonalization or
simultaneous triangularization results. The basic (easiest) result in this
direction is the following fact:
Any collection of square matrices which commute with each other
can all be brought to triangular form by the same matrix of transition.
If they are all diagonalizable, they can all be brought to diagonal form
simultaneously.
Many other topics are natural extensions to what we have learned.
Tensors are a vital subject in many applied areas. You have seen two
already: inner product and determinant.
Derivatives of functions from Rn to Rm are matrices. You will study them
with the tools learned in linear algebra.
You may well spend more time on infinite dimensional spaces, using them
to study wavelet or Fourier decompositions of a function. The Laplace
and Fourier (and other) transforms are linear maps on infinite dimensional
spaces.
134
LARRY SUSANKA
If you are in Engineering you will almost certainly do a lot of careful work
analyzing how errors propagate through our matrix calculations, and how
to create numerical solutions to differential equations using matrices.
But ... the purpose of these notes is to provide a basis for this later work.
The time has come to call things off for the quarter.
Have fun in your future mathematical endeavors!
Index
A ∩ B, 10
A ∪ B, 10
A ⊂ B, 9
CoProj_w, 21, 56
CoProj_x(y)
complex vectors, 116
En , 73
GS , 107
M T , 38
MC←A , 76
Proj_W (onto any subspace), 112
Proj_w (onto a line), 21, 56
Proj_x(y)
complex vectors, 116
Refl_W (in any subspace), 112
Refl_w, 22, 56
Span(S), 63
U + W , 62, 102
U ⊕ W , 103
V D , 60
W ⊥ , 109
W1 + W2 + · · · + Wk , 104
W1 ⊕ W2 ⊕ · · · ⊕ Wk , 104
QA (position vector), 28
δi j , 36
det(f ), 85
[·]S , 73
[·]S, E , 75
Cn , 115
Mm×n , 60
N, 9
R, 9
Rv, 104
Rn , 18
Rn∗ , 55
0, 14, 19, 34, 59
P(t), 61
Pn (t), 61
ei , 19
p + N , 99
rref (A), 41
v⊥ , 109
z, 115
~i, ~j, ~k, 19
‖u‖, 20, 106, 116
‖z‖ (complex number), 115
a ∈ B, 9
colspace(M ), 98
dim(V ), 64
image(f ), 76
ker(f ), 76
nullity(A), 101
nullity(f ), 101
nullspace(M ), 98
rank(A), 101
rank(f ), 101
sgn(σ), 52
angle, 20, 106
approximate
eigenvalues, 95
eigenvector, 96
solution, 113
approximately diagonalize, 98
arrow, 12
arrow-vector, 12
augmented matrix, 43
base
point, 26
position vector, 28
basis, 63
ordered, 63
orthonormal, 111
standard, 19
block, 38
Cauchy-Schwarz inequality, 20, 107
Cayley-Hamilton Theorem, 132
characteristic polynomial, 57, 77, 85
closed operation, 59
columnspace, 98
commutative diagram, 77
complex numbers, 115
conjugate
of z, 115
transpose, 116
consistent, 32
coordinate
change, 79
functions, 55
map, 73
coordinates
general basis, 73
standard basis, 18
cross product, 22
decomposition
eigenvalue, 121
LU, 127
reduced singular value, 129
RSVD, 129
Schur, 122
singular value, 129
SVD, 129
determinant, 50, 52, 85
diagonal
entries, 36
matrices, 39
diagonalizable, 90
diagonalization, 89
diagonalize, 89
approximately, 98
differential equations, 92
dimension, 64
direct sum, 103
of several subspaces, 104
direction, 11
opposite, 13
same, 13
direction cosines, 21
displacement, 11
distance, 20, 106
dot product, 20
dual, 55
dummy index, 33
eigenbasis, 89
eigenfunction, 91
eigenspace, 57, 76
eigenvalue, 57, 76
approximate, 95
decomposition, 121
eigenvector, 57, 76
approximate, 96
element of, 8
elementary
column matrix, 39
operations, 31
row matrix, 37
entries, 18, 34
equivalence relation, 13
Euclidean Space, 21
Fibonacci sequence, 92
force, 11
Fourier coefficients, 111
Fundamental Theorem
of Algebra, 117
of Linear Algebra, 110
Gram-Schmidt orthonormalization process, 111
Hermitian, 116
homogeneous system, 31
identity matrix, 36
image, 76
inconsistent, 31
infinite dimensional, 64
inner product, 106
space, 107
instances, 12
intersection, 10
invariant subspaces, 133
inverse
iteration, 97
matrix, 36
inversion, 56
invertible, 36
isomorphic, 73
Jordan decomposition, 131
kernel, 76, 98
Kronecker delta function, 36
Laplace expansion, 50
leading
coefficient, 40
zeroes, 40
least squares solution, 113
lies
in or along, 29
linear
(in)dependent, 63
combination, 18, 63
equations, 31
functionals, 55
in the first slot, 106
transformation
Rn to Rm , 55
from V to W , 76
linear combination, 14
list, 64
lower triangular, 39
LU decomposition, 127
magnitude, 11
of a complex number, 115
of a vector, 20, 106
main diagonal, 36
matrix, 34
addition, 34
factorization, 121
multiplication, 34
of a linear transformation, 55, 76
of an inner product, 107
of transition, 79
minimal polynomial, 133
multiplication
scalar, 14
natural numbers, 9
nonsingular, 36
norm
of a complex vector, 116
of a vector, 20, 106
normal
matrix, 124
to a line or plane, 29
nullity of a matrix, 101
nullspace, 98
one-to-one, 98
onto, 98
ordered set, 10
orthogonal, 107
complement, 109
complex vectors, 116
matrices, 39, 114
orthonormal basis, 111
parametric formulas for lines, planes etc.,
30
permutation, 51
matrices, 37
perpendicular, 15, 21
pivot
column, 41
variables, 32
point-normal form, 29
points to, 29
pointwise
addition, 60
scalar multiplication, 60
position
map, 75
vector, 27
positive definite, 106
power iteration, 96
projection, 15, 107
onto a line, 21, 56
onto a plane, 21, 56
onto a subspace, 112
Pythagorean Theorem, 112
rank
of a linear transformation, 101
of a matrix, 101
real numbers, 9
recursive definition, 92
reduced singular value decomposition, 129
reflection
in a plane, 22, 56
in a subspace, 112
resultant, 14, 18
rotation
in R2 , 56
in R3 , 57, 112
row
echelon form, 40
reduced echelon form, 41
rowspace, 100
rref, 41
RSVD, 129
scalar
multiplication, 14
scalar multiplication, 18, 34, 59
scalars, 18
Schur decomposition, 122
self-adjoint, 116
sequence, 92
set, 8
signum, 52
similar, 81
simultaneous
diagonalization, 133
triangularization, 133
singular, 36
value decomposition, 129
values, 130
vectors, 130
skew symmetric, 39
solution to a system of linear equations,
31
parametric form, 31
span (verb or noun), 63
Spectral Theorem, 121
spectrum, 121
speed, 11
standard
basis, 19, 73
form of a complex number, 115
stick, 45
submatrix, 38
subset, 9
subspace, 59
Span(S), the span of a set S, 63
W ⊥ , the orthogonal complement of W ,
109
columnspace, 98
eigenspace, 57
image, 76, 98
kernel, 76, 98
nullspace, 98
solutions of a homogeneous system, 57
sum
of several subspaces, 104
of two vector subspaces, 62, 102
sum of two vectors, 13
SVD, 129
symmetric, 39, 106
system of differential equations, 92
trace, 39, 85
transition matrix, 79
transpose, 38
conjugate, 116
triangle inequality, 20, 107
triangular matrix, 39
triple product, 22
trivial vector space, 60
union, 10
unit vector, 21
unitary matrix, 116
upper triangular, 39
vector, 11
addition, 14, 18, 59
space, 59
velocity, 11
zero
dimensional, 24, 64
matrix, 34
vector, 14, 19, 59