Bulletin of Mathematical Analysis and Applications
ISSN: 1821-1291, URL: http://www.bmathaa.org
Volume 3, Issue 2 (2011), Pages 269-277.

STRASSEN'S MATRIX MULTIPLICATION ALGORITHM FOR MATRICES OF ARBITRARY ORDER

(COMMUNICATED BY MARTIN HERMANN)

IVO HEDTKE

Abstract. The well-known algorithm of Volker Strassen for matrix multiplication can only be used for (m2^k × m2^k) matrices. For arbitrary (n × n) matrices one has to add zero rows and columns to the given matrices before Strassen's algorithm can be applied. Strassen gave a strategy of how to set m and k for arbitrary n to ensure n ≤ m2^k. In this paper we study the number d of additional zero rows and columns and its influence on the number of flops used by the algorithm in the worst case (d = n/16), the best case (d = 1) and the average case (d ≈ n/48). The aim of this work is to give a detailed analysis of the number of additional zero rows and columns and of the additional work caused by Strassen's bad parameters. Strassen used the parameters m and k to show that his matrix multiplication algorithm needs less than 4.7 · n^(log_2 7) flops. We show that these parameters cause approximately 20% additional work in the worst case in comparison with the optimal strategy for the worst case. This is the main reason for the search for better parameters.

1. Introduction

In his paper "Gaussian Elimination is not Optimal" ([14]) Volker Strassen developed a recursive algorithm (we will call it S) for the multiplication of square matrices of order m2^k. The algorithm itself is described below; further details can be found in [8, p. 31]. Before we start with our analysis of the parameters of Strassen's algorithm, we take a short look at the history of fast matrix multiplication. The naive algorithm for matrix multiplication is an O(n^3) algorithm. In 1969 Strassen showed that there is an O(n^2.81) algorithm for this problem. Shmuel Winograd optimized Strassen's algorithm.
1991 Mathematics Subject Classification: Primary 65F30; Secondary 68Q17.
Key words and phrases: fast matrix multiplication, Strassen algorithm.
© 2011 Universiteti i Prishtinës, Prishtinë, Kosovë.
The author was supported by the Studienstiftung des Deutschen Volkes.
Submitted February 2, 2011. Published March 2, 2011.

While the Strassen-Winograd algorithm is the variant that is usually implemented in practice (for example in the well-known GEMMW package), there are asymptotically faster algorithms that are impractical to implement. The fastest known algorithm, devised in 1987 by Don Coppersmith and Winograd, runs in O(n^2.38) time. There is also an interesting group-theoretic approach to fast matrix multiplication by Henry Cohn and Christopher Umans, see [4], [5] and [13]. Most researchers believe that an optimal algorithm with O(n^2) runtime exists, although since then no further progress has been made towards finding one. Because modern architectures have complex memory hierarchies and increasing parallelism, performance has become a complex tradeoff, not just a simple matter of counting flops (in this article one flop means one floating-point operation; one addition is a flop and one multiplication is a flop, too). Algorithms that make use of this technology were described by Paolo D'Alberto and Alexandru Nicolau in [1]. Another well-known method is tiling: the naive algorithm can be sped up by a factor of two by using a six-loop implementation that blocks submatrices so that the data passes through the L1 cache only once.

1.1. The algorithm. Let A and B be (m2^k × m2^k) matrices. To compute C := AB, let

    A = [ A_11  A_12 ]    and    B = [ B_11  B_12 ]
        [ A_21  A_22 ]               [ B_21  B_22 ],

where A_ij and B_ij are matrices of order m2^(k-1).
With the following auxiliary matrices

    H_1 := (A_11 + A_22)(B_11 + B_22)
    H_2 := (A_21 + A_22) B_11
    H_3 := A_11 (B_12 - B_22)
    H_4 := A_22 (B_21 - B_11)
    H_5 := (A_11 + A_12) B_22
    H_6 := (A_21 - A_11)(B_11 + B_12)
    H_7 := (A_12 - A_22)(B_21 + B_22)

we get

    C = [ H_1 + H_4 - H_5 + H_7    H_3 + H_5             ]
        [ H_2 + H_4                H_1 + H_3 - H_2 + H_6 ].

This leads to a recursive computation. In the last step of the recursion the products of the (m × m) matrices are computed with the naive algorithm (the straightforward implementation with three for-loops, which we will call N).

1.2. Properties of the algorithm. The algorithm S needs (see [14])

    F_S(m, k) := 7^k m^2 (2m + 5) - 6 · 4^k m^2

flops to compute the product of two square matrices of order m2^k. The naive algorithm N needs n^3 multiplications and n^3 - n^2 additions to compute the product of two (n × n) matrices. Therefore F_N(n) = 2n^3 - n^2. In the case n = 2^k the algorithm S is better than N if F_S(1, k) < F_N(2^k), which is the case iff k ≥ 10. But if we use algorithm S only for matrices of order at least 2^10 = 1024, we get a new problem:

Lemma 1.1. The algorithm S needs (17/3)(7^k - 4^k) units of memory (we write "uom" in short; the number of floats or doubles) to compute the product of two (2^k × 2^k) matrices.

Proof. Let M(n) be the number of uom used by S to compute the product of matrices of order n. The matrices A_ij, B_ij and H_ℓ need 15(n/2)^2 uom. During the computation of the auxiliary matrices H_ℓ we need 7M(n/2) uom and 2(n/2)^2 uom as input arguments for the recursive calls of S. Therefore we get M(n) = 7M(n/2) + (17/4)n^2. Together with M(1) = 0 this yields M(2^k) = (17/3)(7^k - 4^k). □

As an example, if we compute the product of two (2^10 × 2^10) matrices (represented as double arrays) with S, we need 8 · (17/3)(7^10 - 4^10) bytes, i.e. 12.76 gigabytes of memory. That is an enormous amount of RAM for such a problem instance. Brice Boyer et al.
([3]) solved this problem with fully in-place schedules of Strassen-Winograd's algorithm (see the following paragraph), provided the input matrices may be overwritten.

Shmuel Winograd optimized Strassen's algorithm. The Strassen-Winograd algorithm (described in [11]) needs only 15 additions and subtractions, whereas S needs 18. Winograd also showed (see [15]) that the minimum number of multiplications required to multiply 2 × 2 matrices is 7. Furthermore, Robert Probert ([12]) showed that 15 additive operations are necessary and sufficient to multiply two 2 × 2 matrices with 7 multiplications.

Because of the bad properties of S with full recursion and large matrices, one can study the idea of using only one step of recursion. If n is even and we use one step of recursion of S (computing the remaining products with N), the ratio of this operation count to that required by N is (see [9])

    (7n^3 + 11n^2) / (8n^3 - 4n^2)  →  7/8    (n → ∞).

Therefore the multiplication of two sufficiently large matrices using S costs approximately 12.5% less than using N. Using the technique of stopping the recursion in the Strassen-Winograd algorithm early, there are well-known implementations, for example
• one on the Cray-2 by David Bailey ([2]),
• GEMMW by Douglas et al. ([7]) and
• a routine in the IBM ESSL library ([10]).

1.3. The aim of this work. Strassen's algorithm can only be used for (m2^k × m2^k) matrices. For arbitrary (n × n) matrices one has to add zero rows and columns to the given matrices (see the next section) to use Strassen's algorithm. Strassen gave a strategy of how to set m and k for arbitrary n to ensure n ≤ m2^k. In this paper we study the number d of additional zero rows and columns and the influence on the number of flops used by the algorithm in the worst case, the best case and the average case. It is known ([11]) that these parameters are not optimal. We only study the number d and the additional work caused by the bad parameters of Strassen.
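As a concrete illustration of Sections 1.1 and 1.2, the following Python sketch implements S (the seven products H_1, ..., H_7 with the naive algorithm N as base case) together with the flop counts F_S and F_N. This is only a plain sketch under our own naming, not the memory-efficient or Winograd variants discussed above:

```python
# Sketch of algorithm S: recurse on the seven products H1..H7 until
# the blocks have order at most m, then fall back to the naive
# algorithm N. NumPy is used only for convenient block arithmetic.
import numpy as np

def naive_mult(A, B):
    """Algorithm N: straightforward triple loop, 2n^3 - n^2 flops."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            for l in range(n):
                C[i, j] += A[i, l] * B[l, j]
    return C

def strassen(A, B, m):
    """Algorithm S for matrices of order m * 2^k; recursion stops at order m."""
    n = A.shape[0]
    if n <= m:
        return naive_mult(A, B)
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    H1 = strassen(A11 + A22, B11 + B22, m)
    H2 = strassen(A21 + A22, B11, m)
    H3 = strassen(A11, B12 - B22, m)
    H4 = strassen(A22, B21 - B11, m)
    H5 = strassen(A11 + A12, B22, m)
    H6 = strassen(A21 - A11, B11 + B12, m)
    H7 = strassen(A12 - A22, B21 + B22, m)
    return np.block([[H1 + H4 - H5 + H7, H3 + H5],
                     [H2 + H4,           H1 + H3 - H2 + H6]])

def F_S(m, k):
    """Flops used by S on matrices of order m * 2^k."""
    return 7**k * m**2 * (2*m + 5) - 6 * 4**k * m**2

def F_N(n):
    """Flops used by N on matrices of order n."""
    return 2 * n**3 - n**2

# For k = 0 the two counts coincide, and with full recursion S beats N
# for n = 2^k exactly from k = 10 on:
assert F_S(1, 0) == F_N(1)
assert all(F_S(1, k) >= F_N(2**k) for k in range(1, 10))
assert F_S(1, 10) < F_N(2**10)
```

Stopping the recursion at order m (rather than at order 1) is exactly what makes F_S(m, k) the relevant operation count for the parameter questions studied below.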
We give no better strategy of how to set m and k, and we do not analyze strategies other than Strassen's.

2. Strassen's parameters for matrices of arbitrary order

Algorithm S uses recursion to multiply matrices of order m2^k. If k = 0, then S coincides with the naive algorithm N, so we will only consider the case k > 0. To use S for arbitrary (n × n) matrices A and B (that means for arbitrary n) we have to embed them into matrices Ã and B̃, which are both (ñ × ñ) matrices with ñ := m2^k ≥ n. We do this by adding h := ñ - n zero rows and columns to A and B. This results in

    Ã B̃ = [ A        0_{n×h} ] [ B        0_{n×h} ]  =: C̃,
          [ 0_{h×n}  0_{h×h} ] [ 0_{h×n}  0_{h×h} ]

where 0_{p×q} denotes the (p × q) zero matrix. If we delete the last h columns and rows of C̃, we get the result C = AB.

We now focus on how to find m and k for arbitrary n with n ≤ m2^k. An optimal but purely theoretical choice is

    (m*, k*) = arg min { F_S(m, k) : (m, k) ∈ ℕ × ℕ_0, n ≤ m2^k }.

Further methods of finding m and k can be found in [11]. We choose another way. According to Strassen's proof of the main result of [14], we define

    k := ⌊log_2 n⌋ - 4   and   m := ⌊n · 2^(-k)⌋ + 1,        (2.1)

where ⌊x⌋ denotes the largest integer not greater than x. We define ñ := m2^k and study the relationship between n and ñ. The results are:
• worst case: ñ ≤ (17/16) n,
• best case: ñ ≥ n + 1 and
• average case: ñ ≈ (49/48) n.

2.1. Worst case analysis.

Theorem 2.1. Let n ∈ ℕ with n ≥ 16. For the parameters (2.1) and m2^k = ñ we have

    ñ ≤ (17/16) n.

If n is a power of two, we have ñ = (17/16) n.

Proof. For fixed n there is exactly one α ∈ ℕ with 2^α ≤ n < 2^(α+1). We define I^α := {2^α, ..., 2^(α+1) - 1}. Because of (2.1), for each n ∈ I^α the value of k is k = ⌊log_2 n⌋ - 4 = log_2 2^α - 4 = α - 4. Possible values for m are

    m = ⌊n / 2^(α-4)⌋ + 1 = ⌊16 n / 2^α⌋ + 1 =: m(n).

m(n) is increasing in n with m(2^α) = 17 and m(2^(α+1)) = 33. Therefore we have m ∈ {17, ..., 32}.
For each n ∈ I^α one of the following inequalities holds:

    (I_1^α)     2^α = 16 · 2^(α-4) ≤ n < 17 · 2^(α-4)
    (I_2^α)     17 · 2^(α-4) ≤ n < 18 · 2^(α-4)
    ...
    (I_16^α)    31 · 2^(α-4) ≤ n < 32 · 2^(α-4) = 2^(α+1).

Note that I^α = ∪_{i=1}^{16} I_i^α. It follows that for all n ∈ ℕ there exists exactly one α with n ∈ I^α, and for all n ∈ I^α there is exactly one i with n ∈ I_i^α. Note that for all n ∈ I_i^α we have k = α - 4 and m(n) = i + 16. If we only focus on I_i^α, the difference ñ - n has its maximum at the lower end of I_i^α (ñ is constant and n has its minimum at the lower end of I_i^α). On I_i^α the value of ñ and the minimum of n are

    ñ_i^α := (16 + i) · 2^(α-4)   and   n_i^α := (15 + i) · 2^(α-4).

Therefore the difference d_i^α := ñ_i^α - n_i^α is constant:

    d_i^α = (16 + i) · 2^(α-4) - (15 + i) · 2^(α-4) = 2^(α-4)

for all i. To set this in relation with n we study

    q_i^α := ñ_i^α / n_i^α = (n_i^α + d_i^α) / n_i^α = 1 + d_i^α / n_i^α = 1 + 2^(α-4) / n_i^α.

Finally, q_i^α is maximal iff n_i^α is minimal, which is the case for n_1^α = 16 · 2^(α-4) = 2^α. With q_1^α = 17/16 we get ñ ≤ (17/16) n, which completes the proof. □

[Figure 1. Different parameters (m = 2^d, k = 10 - d, n = m2^k) to apply in the Strassen algorithm for matrices of order 2^10: plot of F_S(2^d, 10 - d) against d = 0, 1, ..., 10.]

Now we want to use the result above to take a look at the number of flops we need for S in the worst case. The worst case is n = 2^k for any k ∈ ℕ with k ≥ 4. An optimal decomposition (in the sense of minimizing the number ñ - n of zero rows and columns we add to the given matrices) is m = 2^d with exponent k - d, because m · 2^(k-d) = 2^d · 2^(k-d) = 2^k = n. Note that these parameters have nothing to do with equation (2.1). Let us have a look at the influence of d:

Lemma 2.2. Let n = 2^k. In the decomposition n = m · 2^(k-d) we use m = 2^d. Then f(d) := F_S(2^d, k - d) has its minimum at d = 3.

Proof.
We have

    f(d) = 2 · 7^k (8/7)^d + 5 · 7^k (4/7)^d - 6 · 4^k.

Thus

    f(d+1) - f(d) = [2 · 7^k (8/7)^(d+1) + 5 · 7^k (4/7)^(d+1) - 6 · 4^k]
                    - [2 · 7^k (8/7)^d + 5 · 7^k (4/7)^d - 6 · 4^k]
                  = 2 · 7^k (8/7)^d (8/7 - 1) + 5 · 7^k (4/7)^d (4/7 - 1)
                  = 2 · 7^k (8/7)^d · 1/7 - 15 · 7^k (4/7)^d · 1/7
                  = 2 (4/7)^d 7^(k-1) (2^d - 7.5).

Therefore f(d) attains its minimum at d = min{d : 2^d - 7.5 > 0} = 3. □

[Figure 2. Comparison of Strassen's parameters (q_1: m = 17, exponent k - 4), the obviously better parameters (q_2: m = 16, exponent k - 4) and the optimal parameters of the lemma (m = 8, exponent k - 3) for the worst case n = 2^k: plots of q_1(k) and q_2(k) for k = 4, ..., 20, with the limits of the q_i dashed.]

Figure 1 shows that it is not optimal to use S with full recursion in the example n = 2^10. Now we study the worst case n = 2^k and different sets of parameters for F_S:

(1) If we use equation (2.1), we get the original parameters of Strassen: exponent k - 4 and m = 17. Therefore we define

    F_1(k) := F_S(17, k - 4) = 7^k · (39 · 17^2)/7^4 - 4^k · (6 · 17^2)/4^4.

(2) Obviously m = 16 would be a better choice, because with it we get m · 2^(k-4) = n (we avoid the additional zero rows and columns). Now we define

    F_2(k) := F_S(16, k - 4) = 7^k · (37 · 16^2)/7^4 - 6 · 4^k.

(3) Finally we use the lemma above. With m = 8 = 2^3 and exponent k - 3 we get

    F_3(k) := F_S(8, k - 3) = 7^k · (3 · 64)/49 - 6 · 4^k.

Now we analyze the F_i relative to each other. To this end we define q_i := F_i / F_3 (i = 1, 2). So we have q_i : {4, 5, ...} → ℝ, which is monotonically decreasing in k. With

    q_1(k) = 289 (4^k · 2401 - 7^k · 1664) / (12544 (4^k · 49 - 7^k · 32))

and

    q_2(k) = (4^k · 7203 - 7^k · 4736) / (147 (4^k · 49 - 7^k · 32))

we get

    q_1(4) = 3179/2624 ≈ 1.2115,    lim_{k→∞} q_1(k) = 3757/3136 ≈ 1.1980,
    q_2(4) = 124/123 ≈ 1.00813,    lim_{k→∞} q_2(k) = 148/147 ≈ 1.00680.

Figure 2 shows the graphs of the functions q_i. In conclusion, in the worst case the parameters of Strassen need approx. 20% more flops than the optimal parameters of the lemma.
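The results of this section can be checked numerically. The following Python sketch (helper names are ours) computes the parameters (2.1) directly and evaluates F_S itself rather than the closed forms above:

```python
# Numerical check of Section 2: Strassen's parameters (2.1) and the
# worst-case flop ratios q_1, q_2 against the optimal choice m = 8.

def F_S(m, k):
    """Flops used by algorithm S on matrices of order m * 2^k."""
    return 7**k * m**2 * (2*m + 5) - 6 * 4**k * m**2

def strassen_parameters(n):
    """Parameters (2.1): k = floor(log2 n) - 4, m = floor(n / 2^k) + 1."""
    k = n.bit_length() - 1 - 4      # n.bit_length() - 1 == floor(log2 n)
    m = n // 2**k + 1
    return m, k

# n_tilde = m * 2^k never exceeds (17/16) n (Theorem 2.1) and is
# always at least n + 1 (Theorem 2.3 below):
for n in range(16, 5000):
    m, k = strassen_parameters(n)
    n_tilde = m * 2**k
    assert n + 1 <= n_tilde and 16 * n_tilde <= 17 * n

# Worst-case ratios q_i = F_i / F_3 for n = 2^k:
q1 = lambda k: F_S(17, k - 4) / F_S(8, k - 3)   # Strassen's m = 17
q2 = lambda k: F_S(16, k - 4) / F_S(8, k - 3)   # padding-free m = 16
assert abs(q1(4) - 3179/2624) < 1e-12           # about 1.2115
assert abs(q2(4) - 124/123) < 1e-12             # about 1.00813
assert abs(q1(60) - 3757/3136) < 1e-9           # limit, about 1.1980
assert abs(q2(60) - 148/147) < 1e-9             # limit, about 1.00680
```

Evaluating F_S directly avoids any algebra mistakes in the closed forms; the assertions reproduce the values and limits stated above.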
2.2. Best case analysis.

Theorem 2.3. Let n ∈ ℕ with n ≥ 16. For the parameters (2.1) and m2^k = ñ we have

    n + 1 ≤ ñ.

If n = 2^j h - 1 with h ∈ {16, ..., 31}, we have ñ = n + 1.

Proof. As in Theorem 2.1, ñ has the constant value ñ_i^α = 2^(α-4)(16 + i) on each I_i^α. Therefore n < ñ holds. The difference ñ - n has its minimum at the upper end of I_i^α. There we have

    ñ - n = 2^(α-4)(16 + i) - (2^(α-4)(16 + i) - 1) = 1.

This shows n + 1 ≤ ñ. □

Let us focus on the flops we need for S again, and have a look at the example n = 2^p - 1. The original parameters (see equation (2.1)) for S are k = p - 5 and m = 32. Accordingly we define F(p) := F_S(32, p - 5). Because 2^p - 1 ≤ 2^p, we can instead add one zero row and column and use the lemma from the worst case. This gives the parameters m = 8 and exponent p - 3, and we define F̃(p) := F_S(8, p - 3). To analyze F and F̃ relative to each other we form

    q(p) := F(p)/F̃(p) = (4^p · 16807 - 7^p · 11776) / (4^p · 16807 - 7^p · 10976).

Note that q : {5, 6, ...} → ℝ is monotonically decreasing in p and has its maximum at p = 5. We get

    q(5) = 336/311 ≈ 1.08039,    lim_{p→∞} q(p) = 11776/10976 ≈ 1.07289.

Therefore we can say: in the best case the parameters of Strassen are approx. 8% worse than the optimal parameters from the lemma in the worst case.

2.3. Average case analysis. With E ñ we denote the expected value of ñ. We look for a relationship of the form ñ ≈ γn with γ ∈ ℝ, that is, E[ñ/n] = γ.

Theorem 2.4. For the parameters (2.1) of Strassen and m2^k = ñ we have

    E ñ = (49/48) n.

Proof. First we focus only on I^α. We write E^α := E|_{I^α} and E_i^α := E|_{I_i^α} for the expected value on I^α and I_i^α, respectively. We have

    E^α ñ = (1/16) Σ_{i=1}^{16} E_i^α ñ = (1/16) Σ_{i=1}^{16} ñ_i^α
          = (1/16) Σ_{i=1}^{16} (i + 16) 2^(α-4) = 2^(α-5) · 49.

Together with E^α n = (1/2)[(2^(α+1) - 1) + 2^α] = 2^α + 2^(α-1) - 1/2, we get

    g(α) := E^α[ñ/n] = (E^α ñ)/(E^α n) = (2^(α-5) · 49) / (2^α + 2^(α-1) - 1/2).
Now we want to calculate E_r := E|_{J(r)}[ñ/n], where J(r) := ∪_{t=0}^{r} I^(4+t), by using the values g(α). Because of |I^5| = 2|I^4| and |I^4 ∪ I^5| = 3|I^4| we have E_1 = (1/3) g(4) + (2/3) g(5). With the same argument we get

    E_r = Σ_{t=4}^{4+r} β_t g(t),   where   β_t = 2^(t-4) / (2^(r+1) - 1).

Finally we have

    E[ñ/n] = lim_{r→∞} E_r = lim_{r→∞} Σ_{t=4}^{4+r} β_t g(t)
           = lim_{r→∞} Σ_{t=4}^{4+r} (49 · 2^(2t-9)) / ((2^(r+1) - 1)(2^t + 2^(t-1) - 1/2))
           = 49/48,

which is what we intended to show. □

Compared with the worst case (ñ ≤ (17/16) n, 17/16 = 1 + 1/16), note that 49/48 = 1 + 1/48 = 1 + (1/3) · (1/16).

3. Conclusion

Strassen used the parameters m and k in the form (2.1) to show that his matrix multiplication algorithm needs less than 4.7 · n^(log_2 7) flops. We have shown that these parameters cause approx. 20% additional work in the worst case in comparison to the optimal strategy for the worst case. This is the main reason for the search for better parameters, as in [11].

References

[1] P. D'Alberto and A. Nicolau, Adaptive Winograd's Matrix Multiplications, ACM Trans. Math. Softw. 36, 1, Article 3 (March 2009).
[2] D. H. Bailey, Extra High Speed Matrix Multiplication on the Cray-2, SIAM J. Sci. Stat. Comput., Vol. 9, No. 3, 603-607, 1988.
[3] B. Boyer, J.-G. Dumas, C. Pernet, and W. Zhou, Memory efficient scheduling of Strassen-Winograd's matrix multiplication algorithm, International Symposium on Symbolic and Algebraic Computation 2009, Séoul.
[4] H. Cohn and C. Umans, A Group-theoretic Approach to Fast Matrix Multiplication, Proceedings of the 44th Annual Symposium on Foundations of Computer Science, 11-14 October 2003, Cambridge, MA, IEEE Computer Society, pp. 438-449.
[5] H. Cohn, R. Kleinberg, B. Szegedy and C. Umans, Group-theoretic Algorithms for Matrix Multiplication, Proceedings of the 46th Annual Symposium on Foundations of Computer Science, 23-25 October 2005, Pittsburgh, PA, IEEE Computer Society, pp. 379-388.
[6] D. Coppersmith and S.
Winograd, Matrix Multiplication via Arithmetic Progressions, STOC '87: Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing, 1987.
[7] C. C. Douglas, M. Heroux, G. Slishman and R. M. Smith, GEMMW: A Portable Level 3 BLAS Winograd Variant of Strassen's Matrix-Matrix Multiply Algorithm, J. of Comp. Physics, 110:1-10, 1994.
[8] G. H. Golub and C. F. Van Loan, Matrix Computations, Third ed., The Johns Hopkins University Press, Baltimore, MD, 1996.
[9] S. Huss-Lederman, E. M. Jacobson, J. R. Johnson, A. Tsao, and T. Turnbull, Implementation of Strassen's Algorithm for Matrix Multiplication, Proceedings of Supercomputing '96, IEEE, 1996.
[10] IBM Engineering and Scientific Subroutine Library Guide and Reference, 1992. Order No. SC23-0526.
[11] P. C. Fischer and R. L. Probert, Efficient Procedures for using Matrix Algorithms, Automata, Languages and Programming - 2nd Colloquium, University of Saarbrücken, Lecture Notes in Computer Science, 1974.
[12] R. L. Probert, On the additive complexity of matrix multiplication, SIAM J. Comput., 5:187-203, 1976.
[13] S. Robinson, Toward an Optimal Algorithm for Matrix Multiplication, SIAM News, Volume 38, Number 9, November 2005.
[14] V. Strassen, Gaussian Elimination is not Optimal, Numer. Math., 13:354-356, 1969.
[15] S. Winograd, On Multiplication of 2 × 2 Matrices, Linear Algebra and its Applications, 4:381-388, 1971.

Mathematical Institute, University of Jena, D-07737 Jena, Germany
E-mail address: Ivo.Hedtke@uni-jena.de