Why Is Matrix Multiplication Associative?

1 Introduction.
It is not at all obvious that the strange multiplication that has been introduced for matrices should have any nice properties at all. Nevertheless, the single most important property of this
multiplication is associativity.
This handout discusses two proofs of this fact.
• You will see a bare-bones outline [1] of a structural proof, which indicates a reason why one might
expect this multiplication to be associative.
• You will then see the details [2] of a down-to-earth computational proof. This proof gives no clue
why the multiplication should be associative, but it leaves no doubt that it is associative.
2 The Structural Proof (Outline).
The following all turn out to be true statements.
(a): Functions of a certain type (never mind what type [3]) can be represented by matrices, in such a
way that every matrix represents a function:
\[
\text{function } f \;\xrightarrow{\ \text{is represented by}\ }\; \text{matrix } M_f ,
\qquad
\text{function } f \;\xleftarrow{\ \text{represents}\ }\; \text{matrix } M_f .
\]
(b): The composition of two functions of this type is another function of this type (and so can also be
represented by a matrix).
(c): The matrix that represents the composition f ◦ g of two functions of this type is the product of
the matrices that represent f and g respectively:
\[ M_{f \circ g} = M_f M_g . \]
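Statement (c) can be spot-checked numerically. The following is a minimal Python sketch, not a proof. It assumes, purely for illustration, that the function represented by a matrix M is the map sending a list of numbers x to Mx (the handout deliberately leaves the type of function unspecified). The helper names `mat_mul`, `as_function`, and `compose` are hypothetical and chosen only for this example.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    n = len(B)
    assert all(len(row) == n for row in A), "columns of A must match rows of B"
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(len(B[0]))]
            for i in range(len(A))]

def as_function(M):
    """Assumed representation: the function x -> Mx, where x is a list of numbers."""
    return lambda x: [sum(M[i][j] * x[j] for j in range(len(x)))
                      for i in range(len(M))]

def compose(f, g):
    """The composition f o g, that is, x -> f(g(x))."""
    return lambda x: f(g(x))

M_f = [[1, 2], [3, 4], [5, 6]]   # a 3 x 2 matrix
M_g = [[1, 0, -1], [2, 1, 0]]    # a 2 x 3 matrix
x = [1, 2, 3]

# Apply g and then f, versus applying the single matrix M_f M_g.
via_composition = compose(as_function(M_f), as_function(M_g))(x)
via_product = as_function(mat_mul(M_f, M_g))(x)
print(via_composition == via_product)
```

Under the stated assumption, both routes give the same output for this input, which is what (c) predicts.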
Now, function composition is clearly associative; that is, if X, Y, Z and W are any nonempty sets, and if
\[ h\colon X \to Y, \qquad g\colon Y \to Z, \qquad f\colon Z \to W \]
are any functions, then
\[ (f \circ g) \circ h = f \circ (g \circ h), \]
because, for any x ∈ X,
\[ \big((f \circ g) \circ h\big)(x) = f\big(g(h(x))\big) \]
and
\[ \big(f \circ (g \circ h)\big)(x) = f\big(g(h(x))\big). \]

[1] The details would take us far past your current knowledge of linear algebra.
[2] Some of the details are left as exercises.
[3] Never mind for now, that is!
Therefore: if M_f, M_g and M_h respectively represent functions f, g, and h, then we have
\[
M_f (M_g M_h) = M_f M_{g \circ h} = M_{f \circ (g \circ h)} = M_{(f \circ g) \circ h} = M_{f \circ g} M_h = (M_f M_g) M_h .
\]

3 The Computational Proof.
We must first establish that for any choice (A, B, C) of matrices: [4]
• either both of A(BC) and (AB)C are defined,
• or neither of A(BC) and (AB)C is defined.
Exercise 1
For any choice (A, B, C) of matrices, let Condition (∗) (which may be true or may be false)
be the following statement.
Condition (∗): There exist positive integers m, n, p and q for which A is m × n, B is n × p, and C is p × q.
[a]: Show that A(BC) is defined if and only if (A, B, C) satisfies Condition (∗).
[b]: Show that (AB)C is defined if and only if (A, B, C) satisfies Condition (∗).
[c]: Show that if (A, B, C) satisfies Condition (∗), then both A(BC) and (AB)C are m × q in shape.
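Before writing out the proofs asked for in Exercise 1, it can help to check the claims on small cases. The sketch below works with shapes only, written as pairs (rows, columns); the names `product_shape` and `condition_star` are hypothetical helpers, not notation from the handout. It is a sanity check over small shapes, not a substitute for the exercise.

```python
from itertools import product

def product_shape(s, t):
    """Shape of (s-shaped matrix) times (t-shaped matrix), or None if undefined."""
    if s is None or t is None:
        return None  # a product involving an undefined factor is undefined
    (rows_s, cols_s), (rows_t, cols_t) = s, t
    return (rows_s, cols_t) if cols_s == rows_t else None

def condition_star(a, b, c):
    """Condition (*): A is m x n, B is n x p, and C is p x q."""
    return a[1] == b[0] and b[1] == c[0]

dims = range(1, 4)
shapes = list(product(dims, dims))  # every shape with 1 to 3 rows and columns
checked = 0
for a, b, c in product(shapes, repeat=3):
    left = product_shape(product_shape(a, b), c)    # shape of (AB)C
    right = product_shape(a, product_shape(b, c))   # shape of A(BC)
    expected = (a[0], c[1]) if condition_star(a, b, c) else None
    assert left == expected and right == expected
    checked += 1
print(checked, "shape triples checked")
```

Every triple of small shapes behaves as parts [a], [b] and [c] predict: both groupings are defined exactly when Condition (∗) holds, and in that case both have shape m × q.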
[4] I put them inside parentheses because order matters: (A, B, C) is an element of a Cartesian product.
For the remainder of the handout we can restrict attention to choices (A, B, C) that satisfy Condition (∗)
(so that both products are defined). Let us put
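\[
A := \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix},
\quad
B := \begin{bmatrix} b_{11} & \cdots & b_{1p} \\ \vdots & & \vdots \\ b_{n1} & \cdots & b_{np} \end{bmatrix},
\quad \text{and} \quad
C := \begin{bmatrix} c_{11} & \cdots & c_{1q} \\ \vdots & & \vdots \\ c_{p1} & \cdots & c_{pq} \end{bmatrix}.
\]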
The proof that (AB)C = A(BC) will proceed entry-by-entry. We will first find the formula for a
typical element of (AB)C. Before we begin, observe that for any 1 ≤ i ≤ m and 1 ≤ k ≤ p, [5]
\[ (AB)_{ik} = \sum_{j=1}^{n} a_{ij} b_{jk} . \tag{1} \]
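Formula (1) translates almost directly into code. The sketch below stores a matrix as a list of rows and computes each entry of AB as the sum in (1); the function name `mat_mul` and the sample matrices are illustrative choices. Note that Python indices start at 0, whereas the handout's indices start at 1.

```python
def mat_mul(A, B):
    """Product AB, computed entry by entry from formula (1)."""
    m, n, p = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "A must be m x n and B must be n x p"
    # Entry (i, k) of AB is the sum over j of a_ij * b_jk.
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(p)]
            for i in range(m)]

A = [[1, 2, 3],
     [4, 5, 6]]        # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]         # 3 x 2

AB = mat_mul(A, B)
print(AB)  # [[58, 64], [139, 154]]
```

For instance, the top-left entry is 1·7 + 2·9 + 3·11 = 58, which is formula (1) with i = k = 1.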
Lemma 1 For any 1 ≤ i ≤ m and 1 ≤ ℓ ≤ q,
\[ \big((AB)C\big)_{i\ell} = \sum_{k=1}^{p} \sum_{j=1}^{n} a_{ij} b_{jk} c_{k\ell} . \tag{2} \]
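Proof. By the definition of matrix multiplication,
\begin{align*}
\big((AB)C\big)_{i\ell} &= \sum_{k=1}^{p} (AB)_{ik}\, c_{k\ell} \\
&= \sum_{k=1}^{p} \Bigg( \sum_{j=1}^{n} a_{ij} b_{jk} \Bigg) c_{k\ell} && \text{(by formula (1))} \\
&= \sum_{k=1}^{p} \sum_{j=1}^{n} a_{ij} b_{jk} c_{k\ell} . && \text{(multiply out)}
\end{align*}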
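Formula (2) can be checked numerically on a small example. In the sketch below, `triple_sum_entry` (a hypothetical helper name) evaluates the double sum in (2) directly, and the result is compared with the corresponding entry of (AB)C computed by two successive matrix products. The sample matrices are arbitrary, and indices are 0-based in the code.

```python
def mat_mul(A, B):
    """Product AB via formula (1): entry (i, k) is the sum over j of a_ij * b_jk."""
    n = len(B)
    assert all(len(row) == n for row in A)
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(len(B[0]))]
            for i in range(len(A))]

def triple_sum_entry(A, B, C, i, l):
    """Right-hand side of formula (2): sum over k and j of a_ij * b_jk * c_kl."""
    n, p = len(B), len(C)
    return sum(A[i][j] * B[j][k] * C[k][l] for k in range(p) for j in range(n))

A = [[1, 2], [3, 4], [5, 6]]    # 3 x 2, so m = 3 and n = 2
B = [[1, 0, 2], [-1, 3, 1]]     # 2 x 3, so p = 3
C = [[2], [0], [1]]             # 3 x 1, so q = 1

AB_C = mat_mul(mat_mul(A, B), C)
formula_2 = [[triple_sum_entry(A, B, C, i, l) for l in range(1)] for i in range(3)]
print(AB_C == formula_2)
```

Agreement on one example proves nothing in general, but it is a quick way to catch an index slip when working through the proof.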
Exercise 2 Imitate the proof of Lemma 1 to show that
\[ \big(A(BC)\big)_{i\ell} = \sum_{k=1}^{p} \sum_{j=1}^{n} a_{ij} b_{jk} c_{k\ell} . \tag{3} \]
Theorem 2 Matrix multiplication is associative.
Proof. Let (A, B, C) be any choice of matrices that satisfies Condition (∗) (so that both products are
well-defined m × q matrices). For any 1 ≤ i ≤ m and 1 ≤ ℓ ≤ q, by Equations (2) and (3),
\[ \big((AB)C\big)_{i\ell} = \big(A(BC)\big)_{i\ell} ; \]
therefore,
\[ (AB)C = A(BC). \]

[5] Recall that (D)ij is the entry in row i and column j of matrix D.
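As a closing sanity check on Theorem 2, the following Python sketch compares (AB)C with A(BC) for randomly generated integer matrices whose shapes satisfy Condition (∗). The helper names, the seed, the entry range, and the size limits are all arbitrary choices made for this illustration. Integer entries are used so that the arithmetic is exact; with floating-point entries, rounding can make the two products differ slightly even though the theorem holds for real numbers.

```python
import random

def mat_mul(A, B):
    """Product AB via formula (1)."""
    n = len(B)
    assert all(len(row) == n for row in A)
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(len(B[0]))]
            for i in range(len(A))]

def random_matrix(rows, cols, rng):
    """A rows x cols matrix with integer entries between -9 and 9."""
    return [[rng.randint(-9, 9) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)  # fixed seed so the run is reproducible
all_equal = True
for _ in range(200):
    # Choose m, n, p, q so that (A, B, C) satisfies Condition (*).
    m, n, p, q = (rng.randint(1, 5) for _ in range(4))
    A = random_matrix(m, n, rng)
    B = random_matrix(n, p, rng)
    C = random_matrix(p, q, rng)
    if mat_mul(mat_mul(A, B), C) != mat_mul(A, mat_mul(B, C)):
        all_equal = False
print(all_equal)  # True
```

No finite number of trials replaces the proof, but a mismatch here would signal a bug in the code rather than a failure of the theorem.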