Why Is Matrix Multiplication Associative?

1 Introduction

Although it is not at all obvious that the strange multiplication that has been introduced for matrices should have any nice properties at all, the single most important property of this multiplication is associativity. This handout discusses two proofs of this fact.

• You will see a bare-bones outline¹ of a structural proof, which indicates a reason why one might expect this multiplication to be associative.
• You will then see the details² of a down-to-earth computational proof. This proof gives no clue why the multiplication should be associative, but it leaves no doubt that it is associative.

2 The Structural Proof (Outline)

The following all turn out to be true statements.

(a): Functions of a certain type (never mind what type³) can be represented by matrices, in such a way that every matrix represents a function:

\[
\text{function } f
\;\underset{\text{represents}}{\overset{\text{is represented by}}{\rightleftarrows}}\;
\text{matrix } M_f .
\]

(b): The composition of two functions of this type is another function of this type (and so can also be represented by a matrix).

(c): The matrix that represents the composition $f \circ g$ of two functions of this type is the product of the matrices that represent $f$ and $g$ respectively:

\[
M_{f \circ g} = M_f M_g .
\]

Now, function composition is clearly associative; that is, if $X$, $Y$, $Z$ and $W$ are any nonempty sets, and if $h \colon X \to Y$, $g \colon Y \to Z$, $f \colon Z \to W$ are any functions, then

\[
(f \circ g) \circ h = f \circ (g \circ h),
\]

because, for any $x \in X$,

\[
\bigl((f \circ g) \circ h\bigr)(x) = f\bigl(g(h(x))\bigr)
\quad\text{and}\quad
\bigl(f \circ (g \circ h)\bigr)(x) = f\bigl(g(h(x))\bigr).
\]

Therefore: if $M_f$, $M_g$ and $M_h$ respectively represent functions $f$, $g$, and $h$, then we have

\[
M_f (M_g M_h) = M_f M_{g \circ h} = M_{f \circ (g \circ h)} = M_{(f \circ g) \circ h} = M_{f \circ g} M_h = (M_f M_g) M_h .
\]

¹ The details would take us far past your current knowledge of linear algebra.
² Some of the details are left as exercises.
³ Never mind for now, that is!

3 The Computational Proof
We must first establish that for any choice $(A, B, C)$ of matrices:⁴

• either both of $A(BC)$ and $(AB)C$ are defined,
• or neither of $A(BC)$ and $(AB)C$ is defined.

Exercise 1 For any choice $(A, B, C)$ of matrices, let Condition (∗) (which may be true or may be false) be the following statement.

Condition (∗): There exist positive integers $m$, $n$, $p$ and $q$ for which $A$ is $m \times n$, $B$ is $n \times p$, and $C$ is $p \times q$.

[a]: Show that $A(BC)$ is defined if and only if $(A, B, C)$ satisfies Condition (∗).
[b]: Show that $(AB)C$ is defined if and only if $(A, B, C)$ satisfies Condition (∗).
[c]: Show that if $(A, B, C)$ satisfies Condition (∗), then both $A(BC)$ and $(AB)C$ are $m \times q$ in shape.

For the remainder of the handout we can restrict attention to choices $(A, B, C)$ that satisfy Condition (∗) (so that both products are defined). Let us put

\[
A := \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \cdots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix},
\quad
B := \begin{pmatrix} b_{11} & \cdots & b_{1p} \\ \vdots & \cdots & \vdots \\ b_{n1} & \cdots & b_{np} \end{pmatrix},
\quad\text{and}\quad
C := \begin{pmatrix} c_{11} & \cdots & c_{1q} \\ \vdots & \cdots & \vdots \\ c_{p1} & \cdots & c_{pq} \end{pmatrix}.
\]

The proof that $(AB)C = A(BC)$ will proceed entry-by-entry. We will first find the formula for a typical element of $(AB)C$. Before we begin, observe that for any $1 \le i \le m$ and $1 \le k \le p$,⁵

\[
(AB)_{ik} = \sum_{j=1}^{n} a_{ij} b_{jk} . \tag{1}
\]

Lemma 1 For any $1 \le i \le m$ and $1 \le \ell \le q$,

\[
\bigl((AB)C\bigr)_{i\ell} = \sum_{k=1}^{p} \sum_{j=1}^{n} a_{ij} b_{jk} c_{k\ell} . \tag{2}
\]

Proof. By the definition of matrix multiplication,

\begin{align*}
\bigl((AB)C\bigr)_{i\ell}
  &= \sum_{k=1}^{p} (AB)_{ik} \, c_{k\ell} \\
  &= \sum_{k=1}^{p} \Bigl( \sum_{j=1}^{n} a_{ij} b_{jk} \Bigr) c_{k\ell}
     && \text{(by formula (1))} \\
  &= \sum_{k=1}^{p} \sum_{j=1}^{n} a_{ij} b_{jk} c_{k\ell}
     && \text{(multiply out).}
\end{align*}

Exercise 2 Imitate the proof of Lemma 1 to show that

\[
\bigl(A(BC)\bigr)_{i\ell} = \sum_{k=1}^{p} \sum_{j=1}^{n} a_{ij} b_{jk} c_{k\ell} . \tag{3}
\]

Theorem 2 Matrix multiplication is associative.

Proof. Let $(A, B, C)$ be any choice of matrices that satisfy Condition (∗) (so that both products are well-defined $m \times q$ matrices). For any $1 \le i \le m$ and $1 \le \ell \le q$, by Equations (2) and (3),

\[
\bigl((AB)C\bigr)_{i\ell} = \bigl(A(BC)\bigr)_{i\ell} ;
\]

therefore, $(AB)C = A(BC)$.

⁴ I put them inside parentheses because order matters: $(A, B, C)$ is an element of a Cartesian product.
⁵ Recall that $(D)_{ij}$ is the entry in row $i$ and column $j$ of matrix $D$.
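The entry-by-entry argument can also be checked numerically on one example. The sketch below is an illustration only, not a proof and not part of the handout: it computes $(AB)C$, $A(BC)$, and the double sum from formulas (2) and (3) for one randomly chosen triple satisfying Condition (∗). The helper names `matmul`, `double_sum_entry`, and `random_matrix`, the sizes $m, n, p, q = 2, 3, 4, 2$, and the seed are all assumptions made for this example. Matrices are stored as lists of rows and indices are 0-based, so row $i$ of the handout is index `i - 1` here.

```python
import random


def matmul(X, Y):
    """Product XY of matrices stored as lists of rows, entry by entry as in formula (1)."""
    inner = len(Y)
    # The product is defined only when X has as many columns as Y has rows.
    assert all(len(row) == inner for row in X), "inner dimensions must agree"
    cols = len(Y[0])
    return [
        [sum(X[i][j] * Y[j][k] for j in range(inner)) for k in range(cols)]
        for i in range(len(X))
    ]


def double_sum_entry(A, B, C, i, l):
    """Right-hand side of formulas (2) and (3), using 0-based indices i and l."""
    n, p = len(B), len(C)
    return sum(A[i][j] * B[j][k] * C[k][l] for k in range(p) for j in range(n))


def random_matrix(rows, cols, rng):
    """A rows x cols matrix of small integers, so all arithmetic is exact."""
    return [[rng.randint(-9, 9) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)      # fixed seed, chosen arbitrarily for reproducibility
m, n, p, q = 2, 3, 4, 2     # any positive integers satisfy Condition (*)
A = random_matrix(m, n, rng)
B = random_matrix(n, p, rng)
C = random_matrix(p, q, rng)

left = matmul(matmul(A, B), C)    # (AB)C
right = matmul(A, matmul(B, C))   # A(BC)
formula = [[double_sum_entry(A, B, C, i, l) for l in range(q)] for i in range(m)]

print(left == right == formula)   # prints True
```

Integer entries are used so that the comparison is exact. With floating-point entries the two products can differ by rounding error, even though they are equal as real matrices.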