Strassen's Matrix Multiplication - Computer Science and Engineering


Course Project on: Strassen's Matrix Multiplication

Under the guidance of: Prof. Subodh Kumar

Presented by: Gaurav Jain, Lalchand

Basic Matrix Multiplication

Suppose we want to multiply two N x N matrices, A x B = C. If we split each matrix into four (N/2) x (N/2) blocks, the product is:

$$
\begin{aligned}
C_{11} &= A_{11} B_{11} + A_{12} B_{21} \\
C_{12} &= A_{11} B_{12} + A_{12} B_{22} \\
C_{21} &= A_{21} B_{11} + A_{22} B_{21} \\
C_{22} &= A_{21} B_{12} + A_{22} B_{22}
\end{aligned}
$$

A 2x2 block multiplication therefore needs 8 multiplications ($2^{\log_2 8} = 2^3$). When this is applied recursively to the blocks, the running time is $O(N^{\log_2 8}) = O(N^3)$.
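The following is a minimal Python sketch of the standard divide-and-conquer scheme described above. It assumes square matrices stored as lists of rows, with N a power of 2. The helper names (`add`, `split`, `join`, `block_multiply`) are illustrative and do not come from the original project. Each level makes the eight half-size products listed above, so the counter finishes at $8^{\log_2 N} = N^3$.

```python
def add(X, Y):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(M):
    """Split an N x N matrix (N even) into quadrants M11, M12, M21, M22."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def join(C11, C12, C21, C22):
    """Reassemble four quadrants into one matrix."""
    return ([a + b for a, b in zip(C11, C12)] +
            [a + b for a, b in zip(C21, C22)])


def block_multiply(A, B, counter):
    """Standard divide-and-conquer product of two N x N matrices
    (N a power of 2): eight half-size products per level.
    counter[0] accumulates the number of scalar multiplications."""
    if len(A) == 1:
        counter[0] += 1
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    C11 = add(block_multiply(A11, B11, counter), block_multiply(A12, B21, counter))
    C12 = add(block_multiply(A11, B12, counter), block_multiply(A12, B22, counter))
    C21 = add(block_multiply(A21, B11, counter), block_multiply(A22, B21, counter))
    C22 = add(block_multiply(A21, B12, counter), block_multiply(A22, B22, counter))
    return join(C11, C12, C21, C22)


if __name__ == "__main__":
    count = [0]
    print(block_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]], count))  # [[19, 22], [43, 50]]
    print(count[0])  # 8 = 2^3
```

With 4 x 4 inputs, the counter reaches 64 = 4^3.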

Strassen's Matrix Multiplication


$$
\begin{aligned}
P_1 &= (A_{11} + A_{22})(B_{11} + B_{22}) \\
P_2 &= (A_{21} + A_{22})\,B_{11} \\
P_3 &= A_{11}\,(B_{12} - B_{22}) \\
P_4 &= A_{22}\,(B_{21} - B_{11}) \\
P_5 &= (A_{11} + A_{12})\,B_{22} \\
P_6 &= (A_{21} - A_{11})(B_{11} + B_{12}) \\
P_7 &= (A_{12} - A_{22})(B_{21} + B_{22})
\end{aligned}
$$
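To make the seven products concrete, here is a small Python sketch of the base case in which every block is a single number. The helper `strassen_products` is hypothetical and written only for illustration. It performs seven multiplications where the standard scheme needs eight.

```python
def strassen_products(A, B):
    """Seven Strassen products for 2 x 2 matrices with scalar entries.
    Each entry below costs exactly one multiplication."""
    (A11, A12), (A21, A22) = A
    (B11, B12), (B21, B22) = B
    return {
        "P1": (A11 + A22) * (B11 + B22),
        "P2": (A21 + A22) * B11,
        "P3": A11 * (B12 - B22),
        "P4": A22 * (B21 - B11),
        "P5": (A11 + A12) * B22,
        "P6": (A21 - A11) * (B11 + B12),
        "P7": (A12 - A22) * (B21 + B22),
    }


if __name__ == "__main__":
    print(strassen_products([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

With A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], it returns P1 = 65, P2 = 35, P3 = -2, P4 = 8, P5 = 24, P6 = 22 and P7 = -30.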


$$
\begin{aligned}
C_{11} &= P_1 + P_4 - P_5 + P_7 \\
C_{12} &= P_3 + P_5 \\
C_{21} &= P_2 + P_4 \\
C_{22} &= P_1 + P_3 - P_2 + P_6
\end{aligned}
$$
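Combining the seven products with the four C equations gives the recursive algorithm. The Python sketch below assumes N is a power of 2 and recurses all the way down to 1 x 1 blocks for clarity. The function names are illustrative. Each level makes 7 half-size products plus a constant number of block additions, so $T(N) = 7T(N/2) + O(N^2) = O(N^{\log_2 7}) \approx O(N^{2.81})$.

```python
def add(X, Y):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    """Element-wise difference X - Y."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(M):
    """Split an N x N matrix (N even) into quadrants M11, M12, M21, M22."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def join(C11, C12, C21, C22):
    """Reassemble four quadrants into one matrix."""
    return ([a + b for a, b in zip(C11, C12)] +
            [a + b for a, b in zip(C21, C22)])


def strassen(A, B):
    """Multiply two N x N matrices (N a power of 2) using Strassen's
    seven products, applied recursively to the quadrants."""
    if len(A) == 1:
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    P1 = strassen(add(A11, A22), add(B11, B22))
    P2 = strassen(add(A21, A22), B11)
    P3 = strassen(A11, sub(B12, B22))
    P4 = strassen(A22, sub(B21, B11))
    P5 = strassen(add(A11, A12), B22)
    P6 = strassen(sub(A21, A11), add(B11, B12))
    P7 = strassen(sub(A12, A22), add(B21, B22))
    C11 = add(sub(add(P1, P4), P5), P7)   # P1 + P4 - P5 + P7
    C12 = add(P3, P5)                     # P3 + P5
    C21 = add(P2, P4)                     # P2 + P4
    C22 = add(add(sub(P1, P2), P3), P6)   # P1 - P2 + P3 + P6
    return join(C11, C12, C21, C22)


if __name__ == "__main__":
    print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Practical implementations usually stop the recursion at a cutoff block size and switch to standard multiplication, because the extra additions dominate for small blocks. Inputs whose size is not a power of 2 are typically padded.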


Ref: Accelerating High Performance Applications with CUDA and MPI

Why MPI + CUDA?

The seven products are independent block computations, so they map naturally onto the CUDA environment.

Limitation of CUDA: it provides no communication between GPUs on different nodes.

MPI: data distribution mechanism

CUDA: main execution engine

MPI + CUDA

Steps Performed

Divide each input matrix into four equal quadrants.

Send the appropriate quadrants to the corresponding process.

Each process computes its corresponding equation:

Each node contains a GPU.

Each process launches kernels on its own GPU to compute its result.


Each process sends its result to the head process of its equation.

All heads collect the data.

Each head computes its equation for C.

All heads send their partial results to the master node.

The master combines and displays the result.
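The steps above can be simulated on a single machine to check the data flow. In this Python sketch, threads from `concurrent.futures.ThreadPoolExecutor` stand in for MPI processes, and the hypothetical `kernel` function stands in for the CUDA kernel that each node would launch on its GPU. Only one level of Strassen splitting is performed. A real version would exchange blocks with MPI and run the block products on the GPU.

```python
from concurrent.futures import ThreadPoolExecutor


def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(M):
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def join(C11, C12, C21, C22):
    return ([a + b for a, b in zip(C11, C12)] +
            [a + b for a, b in zip(C21, C22)])


def kernel(X, Y):
    """Hypothetical stand-in for the per-node CUDA kernel: a plain block product."""
    inner = len(Y)
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(len(Y[0]))]
            for i in range(len(X))]


def distributed_strassen(A, B):
    # Master: divide both inputs into quadrants.
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    # Workers: each job forms its equation's operands and runs the kernel.
    # Threads simulate MPI processes that each own a GPU.
    jobs = {
        "P1": lambda: kernel(add(A11, A22), add(B11, B22)),
        "P2": lambda: kernel(add(A21, A22), B11),
        "P3": lambda: kernel(A11, sub(B12, B22)),
        "P4": lambda: kernel(A22, sub(B21, B11)),
        "P5": lambda: kernel(add(A11, A12), B22),
        "P6": lambda: kernel(sub(A21, A11), add(B11, B12)),
        "P7": lambda: kernel(sub(A12, A22), add(B21, B22)),
    }
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = {name: pool.submit(job) for name, job in jobs.items()}
        P = {name: f.result() for name, f in futures.items()}

    # Heads: combine the P results into the four C quadrants.
    C11 = add(sub(add(P["P1"], P["P4"]), P["P5"]), P["P7"])
    C12 = add(P["P3"], P["P5"])
    C21 = add(P["P2"], P["P4"])
    C22 = add(add(sub(P["P1"], P["P2"]), P["P3"]), P["P6"])

    # Master: assemble the final matrix.
    return join(C11, C12, C21, C22)


if __name__ == "__main__":
    print(distributed_strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Python threads do not speed up this arithmetic. The sketch only checks that dispatching the seven equations and recombining them reproduces the ordinary product.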

Detailed Description – Step 1

$$
\begin{aligned}
P_1 &= (A_{11} + A_{22})(B_{11} + B_{22}), & P_5 &= (A_{11} + A_{12})\,B_{22} \\
P_2 &= (A_{21} + A_{22})\,B_{11}, & P_6 &= (A_{21} - A_{11})(B_{11} + B_{12}) \\
P_3 &= A_{11}\,(B_{12} - B_{22}), & P_7 &= (A_{12} - A_{22})(B_{21} + B_{22}) \\
P_4 &= A_{22}\,(B_{21} - B_{11})
\end{aligned}
$$
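Because each process computes one equation, the master must send each process the quadrants its equation reads. The Python sketch below expresses this as a table of signed terms. It assumes that each process receives raw quadrants and performs its own additions. `P_TERMS`, `blocks_to_send` and `evaluate` are hypothetical names.

```python
# Signed quadrant terms of each Strassen product: (A-side, B-side).
P_TERMS = {
    "P1": ({"A11": 1, "A22": 1}, {"B11": 1, "B22": 1}),
    "P2": ({"A21": 1, "A22": 1}, {"B11": 1}),
    "P3": ({"A11": 1}, {"B12": 1, "B22": -1}),
    "P4": ({"A22": 1}, {"B21": 1, "B11": -1}),
    "P5": ({"A11": 1, "A12": 1}, {"B22": 1}),
    "P6": ({"A21": 1, "A11": -1}, {"B11": 1, "B12": 1}),
    "P7": ({"A12": 1, "A22": -1}, {"B21": 1, "B22": 1}),
}


def blocks_to_send(p):
    """Quadrants the master must send to the process computing product p."""
    a_terms, b_terms = P_TERMS[p]
    return sorted(a_terms) + sorted(b_terms)


def evaluate(p, blocks):
    """Evaluate product p when every quadrant is a scalar (1 x 1 stand-in)."""
    a_terms, b_terms = P_TERMS[p]
    left = sum(sign * blocks[name] for name, sign in a_terms.items())
    right = sum(sign * blocks[name] for name, sign in b_terms.items())
    return left * right


if __name__ == "__main__":
    for p in P_TERMS:
        print(p, blocks_to_send(p))
    total = sum(len(blocks_to_send(p)) for p in P_TERMS)
    print("quadrant transfers:", total)  # prints 24
```

Under this assumption, the master makes 24 quadrant transfers in total. If the master instead pre-added the operands, each process would need only two blocks (14 transfers), but the additions would then run on the master.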

Detailed Description – Step 2

P1, P5

P2, P6

P3, P7

P4

Detailed Description – Step 3

P1, P5

P2, P6

P3, P7

P4

Declare Result
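Once the products are computed, the C equations determine which P results must reach each combining head. The Python sketch below assumes one head per C quadrant, an arrangement the slides do not state. `C_TERMS`, `destinations` and `combine` are hypothetical names.

```python
# Signed P terms of each C quadrant, taken from the C equations.
C_TERMS = {
    "C11": {"P1": 1, "P4": 1, "P5": -1, "P7": 1},
    "C12": {"P3": 1, "P5": 1},
    "C21": {"P2": 1, "P4": 1},
    "C22": {"P1": 1, "P2": -1, "P3": 1, "P6": 1},
}


def destinations(p):
    """C quadrants (heads) that need the result of product p."""
    return [c for c, terms in C_TERMS.items() if p in terms]


def combine(c, P):
    """Evaluate quadrant c from scalar P values (1 x 1 stand-in for blocks)."""
    return sum(sign * P[name] for name, sign in C_TERMS[c].items())


if __name__ == "__main__":
    for p in ["P1", "P2", "P3", "P4", "P5", "P6", "P7"]:
        print(p, "->", destinations(p))
```

For example, P1 goes to the heads for C11 and C22, while P6 and P7 are each needed by only one head.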

Experimental Result - 1

Experimental Result - 2

Experimental Result - 3

References:

N. P. Karunadasa & D. N. Ranasinghe, "Accelerating High Performance Applications with CUDA and MPI"

Junjie Li & Sanjay Ranka, "Strassen's Matrix Multiplication on GPUs"

Thanks
