Course Project on:
Strassen's Matrix Multiplication
Under the Guidance of:
Prof. Subodh Kumar
Presented by:
Gaurav Jain
Lalchand
Basic Matrix Multiplication
Suppose we want to multiply two N × N matrices, A × B = C. For N = 2, the four entries of C are:
$$
\begin{aligned}
C_{11} &= a_{11} b_{11} + a_{12} b_{21} \\
C_{12} &= a_{11} b_{12} + a_{12} b_{22} \\
C_{21} &= a_{21} b_{11} + a_{22} b_{21} \\
C_{22} &= a_{21} b_{12} + a_{22} b_{22}
\end{aligned}
$$
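A 2 × 2 matrix multiplication can therefore be accomplished in 8 multiplications ($2^{\log_2 8} = 2^3$). Applied recursively to N × N matrices split into N/2 × N/2 blocks, the same scheme needs $N^{\log_2 8} = N^3$ multiplications.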
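To make the count concrete, here is a minimal Python sketch using plain nested lists (no particular library is assumed). `multiply_2x2` performs the direct product and counts scalar multiplications; `recursive_mult_count` is a hypothetical helper that counts multiplications when the same 2 × 2 block scheme is applied recursively, assuming n is a power of two.

```python
def multiply_2x2(a, b):
    """Direct 2x2 product; returns (C, number of scalar multiplications)."""
    c = [[0, 0], [0, 0]]
    mults = 0
    for i in range(2):
        for j in range(2):
            for k in range(2):
                c[i][j] += a[i][k] * b[k][j]  # C_ij = sum over k of a_ik * b_kj
                mults += 1
    return c, mults


def recursive_mult_count(n, products_per_level=8):
    """Scalar multiplications for an n x n product (n assumed a power of two)
    when every level splits into 2x2 blocks and does `products_per_level`
    block products."""
    if n == 1:
        return 1
    return products_per_level * recursive_mult_count(n // 2, products_per_level)


c, mults = multiply_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(c, mults)                  # [[19, 22], [43, 50]] 8
print(recursive_mult_count(16))  # 4096, i.e. 16**3
```

Passing `products_per_level=7` instead gives the count for a scheme with seven block products per level, which is exactly the saving Strassen's method targets.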
Strassen's Matrix Multiplication
$$
\begin{aligned}
P_1 &= (A_{11} + A_{22})(B_{11} + B_{22}) \\
P_2 &= (A_{21} + A_{22})\,B_{11} \\
P_3 &= A_{11}\,(B_{12} - B_{22}) \\
P_4 &= A_{22}\,(B_{21} - B_{11}) \\
P_5 &= (A_{11} + A_{12})\,B_{22} \\
P_6 &= (A_{21} - A_{11})(B_{11} + B_{12}) \\
P_7 &= (A_{12} - A_{22})(B_{21} + B_{22})
\end{aligned}
$$
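The seven products can be checked on plain numbers. The Python sketch below treats each block as a single scalar (an assumption made for brevity, so the inputs are 2 × 2 matrices of numbers); `strassen_products` is an illustrative name. Each entry uses exactly one multiplication, seven in total rather than eight.

```python
def strassen_products(a, b):
    """The seven Strassen products for 2x2 matrices of numbers.

    Each block is treated as a scalar here (assumption for illustration);
    in the full algorithm each block is itself a sub-matrix.
    """
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    return {
        "P1": (a11 + a22) * (b11 + b22),
        "P2": (a21 + a22) * b11,
        "P3": a11 * (b12 - b22),
        "P4": a22 * (b21 - b11),
        "P5": (a11 + a12) * b22,
        "P6": (a21 - a11) * (b11 + b12),
        "P7": (a12 - a22) * (b21 + b22),
    }


p = strassen_products([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(p)
# {'P1': 65, 'P2': 35, 'P3': -2, 'P4': 8, 'P5': 24, 'P6': 22, 'P7': -30}
```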
$$
\begin{aligned}
C_{11} &= P_1 + P_4 - P_5 + P_7 \\
C_{12} &= P_3 + P_5 \\
C_{21} &= P_2 + P_4 \\
C_{22} &= P_1 + P_3 - P_2 + P_6
\end{aligned}
$$
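Combining the seven products with the four C equations gives the full algorithm. The Python sketch below applies it recursively to n × n matrices stored as nested lists, assuming n is a power of two (padding for other sizes is not shown) and recursing all the way down to 1 × 1 blocks for clarity; practical implementations usually switch to ordinary multiplication below some cutoff size. The helper names are illustrative. Because each level needs seven recursive products instead of eight, the multiplication count becomes $7^{\log_2 n} = n^{\log_2 7} \approx n^{2.81}$.

```python
def add(x, y):
    """Element-wise sum of two equally sized matrices."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def sub(x, y):
    """Element-wise difference x - y."""
    return [[p - q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def split(m):
    """Return the four n/2 x n/2 quadrants (11, 12, 21, 22) of an n x n matrix."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def join(c11, c12, c21, c22):
    """Reassemble four quadrants into one matrix."""
    return ([r1 + r2 for r1, r2 in zip(c11, c12)] +
            [r1 + r2 for r1, r2 in zip(c21, c22)])


def strassen(a, b):
    """Strassen product of two n x n matrices; n is assumed a power of two."""
    if len(a) == 1:
        return [[a[0][0] * b[0][0]]]
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    p1 = strassen(add(a11, a22), add(b11, b22))
    p2 = strassen(add(a21, a22), b11)
    p3 = strassen(a11, sub(b12, b22))
    p4 = strassen(a22, sub(b21, b11))
    p5 = strassen(add(a11, a12), b22)
    p6 = strassen(sub(a21, a11), add(b11, b12))
    p7 = strassen(sub(a12, a22), add(b21, b22))
    c11 = add(sub(add(p1, p4), p5), p7)  # C11 = P1 + P4 - P5 + P7
    c12 = add(p3, p5)                    # C12 = P3 + P5
    c21 = add(p2, p4)                    # C21 = P2 + P4
    c22 = add(sub(add(p1, p3), p2), p6)  # C22 = P1 + P3 - P2 + P6
    return join(c11, c12, c21, c22)


def naive(a, b):
    """Reference O(n^3) product, used only to check the result."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


a = [[4 * i + j for j in range(4)] for i in range(4)]
b = [[i - j for j in range(4)] for i in range(4)]
assert strassen(a, b) == naive(a, b)
```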
[Figure: Strassen's matrix multiplication. Ref: Accelerating High Performance Applications with CUDA and MPI]
Why MPI + CUDA?
➢ The seven products are independent of one another, so the equations are naturally suited to a CUDA environment
➢ Limitation of CUDA: no communication between GPUs on different nodes
➢ MPI: data distribution mechanism
➢ CUDA: main execution engine
MPI + CUDA
Steps Performed
➢ Divide the input matrix into four equal parts
➢ Send the appropriate part to the corresponding process
➢ Each process computes its corresponding equation
➢ Each node contains a GPU; processes use kernels on their own GPU to compute the result
➢ Each process sends its result to the head process of its equation
➢ All heads collect data
➢ Each head computes a C equation
➢ All heads send their partial results to the master node
➢ The master combines and displays the result
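The flow above can be simulated in a single Python process to see how the pieces fit together. This is a sketch under several assumptions: each block is a single number, the products are grouped as (P1, P5), (P2, P6), (P3, P7), (P4) following the detailed-description slides, the first product in each group names that group's head, and the mapping of heads to C blocks in `HEAD_BLOCK` is hypothetical. No MPI or CUDA calls are made; in the real system each `worker` call would run as a kernel on a node's GPU and each hand-off would be an MPI message.

```python
# Products grouped per process group (from the slides); treating the first
# entry as the group's head is an assumption.
GROUPS = [("P1", "P5"), ("P2", "P6"), ("P3", "P7"), ("P4",)]

# Hypothetical assignment of each head to the C block it computes.
HEAD_BLOCK = {"P1": "C11", "P2": "C22", "P3": "C12", "P4": "C21"}

# The four C equations, written over a dict of the seven products.
C_EQUATIONS = {
    "C11": lambda p: p["P1"] + p["P4"] - p["P5"] + p["P7"],
    "C12": lambda p: p["P3"] + p["P5"],
    "C21": lambda p: p["P2"] + p["P4"],
    "C22": lambda p: p["P1"] + p["P3"] - p["P2"] + p["P6"],
}


def worker(product, a, b):
    """Stand-in for one process computing one product (blocks are scalars here)."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    formulas = {
        "P1": lambda: (a11 + a22) * (b11 + b22),
        "P2": lambda: (a21 + a22) * b11,
        "P3": lambda: a11 * (b12 - b22),
        "P4": lambda: a22 * (b21 - b11),
        "P5": lambda: (a11 + a12) * b22,
        "P6": lambda: (a21 - a11) * (b11 + b12),
        "P7": lambda: (a12 - a22) * (b21 + b22),
    }
    return formulas[product]()


def run_pipeline(a, b):
    # Processes send their results to the head of their group.
    at_heads = {group[0]: {p: worker(p, a, b) for p in group} for group in GROUPS}
    # All heads collect data: pool the groups' results so every head sees all seven.
    pooled = {}
    for results in at_heads.values():
        pooled.update(results)
    # Each head computes its C block and sends it to the master.
    at_master = {HEAD_BLOCK[h]: C_EQUATIONS[HEAD_BLOCK[h]](pooled) for h in at_heads}
    # The master combines the four blocks into the final result.
    return [[at_master["C11"], at_master["C12"]],
            [at_master["C21"], at_master["C22"]]]


print(run_pipeline([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Pooling every group's results into one dict stands in for the "all heads collect data" exchange; a real implementation would only move the products each head's C equation actually uses.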
Detailed Description – Step 1
$$
\begin{aligned}
P_1 &= (A_{11} + A_{22})(B_{11} + B_{22}) & P_5 &= (A_{11} + A_{12})\,B_{22} \\
P_2 &= (A_{21} + A_{22})\,B_{11} & P_6 &= (A_{21} - A_{11})(B_{11} + B_{12}) \\
P_3 &= A_{11}\,(B_{12} - B_{22}) & P_7 &= (A_{12} - A_{22})(B_{21} + B_{22}) \\
P_4 &= A_{22}\,(B_{21} - B_{11})
\end{aligned}
$$
Detailed Description – Step 2
$P_1, P_5$
$P_2, P_6$
$P_3, P_7$
$P_4$
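This pairing also determines which quadrants each group must receive in the "send" step. The Python sketch below splits A and B into named quadrants and, for each group, collects only the quadrants its products read. Treating each pair as one process group is an assumption based on these slides, and `split_quadrants`, `NEEDS` and `messages_for_groups` are illustrative names.

```python
# Quadrants read by each product, taken from the P1..P7 definitions.
NEEDS = {
    "P1": ["A11", "A22", "B11", "B22"],
    "P2": ["A21", "A22", "B11"],
    "P3": ["A11", "B12", "B22"],
    "P4": ["A22", "B11", "B21"],
    "P5": ["A11", "A12", "B22"],
    "P6": ["A11", "A21", "B11", "B12"],
    "P7": ["A12", "A22", "B21", "B22"],
}

# Assumed: each pair shown on the slides is handled by one process group.
GROUPS = [("P1", "P5"), ("P2", "P6"), ("P3", "P7"), ("P4",)]


def split_quadrants(m, name):
    """Divide an n x n matrix (n even) into four named n/2 x n/2 quadrants."""
    h = len(m) // 2
    return {
        name + "11": [row[:h] for row in m[:h]],
        name + "12": [row[h:] for row in m[:h]],
        name + "21": [row[:h] for row in m[h:]],
        name + "22": [row[h:] for row in m[h:]],
    }


def messages_for_groups(a, b):
    """Map each group to the quadrants it needs (the payload it would be sent)."""
    parts = {**split_quadrants(a, "A"), **split_quadrants(b, "B")}
    messages = {}
    for group in GROUPS:
        needed = sorted({q for p in group for q in NEEDS[p]})
        messages[group] = {q: parts[q] for q in needed}
    return messages


a = [[4 * i + j for j in range(4)] for i in range(4)]
b = [[4 * i + j + 16 for j in range(4)] for i in range(4)]
for group, payload in messages_for_groups(a, b).items():
    print(group, sorted(payload))
# ('P1', 'P5') ['A11', 'A12', 'A22', 'B11', 'B22']
# ('P2', 'P6') ['A11', 'A21', 'A22', 'B11', 'B12']
# ('P3', 'P7') ['A11', 'A12', 'A22', 'B12', 'B21', 'B22']
# ('P4',) ['A22', 'B11', 'B21']
```

No group needs all eight quadrants, so sending only the listed ones reduces the data each process receives.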
Detailed Description – Step 3
$P_1, P_5$
$P_2, P_6$
$P_3, P_7$
$P_4$
Declare Result
Experimental Result - 1
Experimental Result - 2
Experimental Result - 3
N. P. Karunadasa and D. N. Ranasinghe, "Accelerating High Performance Applications with CUDA and MPI."
J. Li and S. Ranka, "Strassen's Matrix Multiplication on GPUs."
Thanks