Presenters: Yunfei Shen, Luyao Li
6/28/2016
[1] Yi Zhang, Parallel Matrix Layout for MapReduce

Introduction
Plans (1, 2 & 3)
Data Analysis
Demo

MapReduce is efficient at parsing small files, but massive scientific raw data…
MapReduce is not optimized for matrix-form data and linear algebra kernels.

Consider a very simple matrix multiplication problem, A x B = C. Possible plans are [1]:
1. Simple dot product: C_{i,j} = ∑_k A_{i,k} * B_{k,j}
2. Divide A into s x t sub-matrices (blocks) and B into t x u ones: C_block = A_block * B_block

Sub-matrix
SequenceFile write -> key: location (x, y)
Mapper:
  Input:  Key1(index1, index2), IntWritable(v)
          A{(s,t), A_{s,t}}   B{(t,u), B_{t,u}}
  The last block in a row or column needs special handling because it may be smaller than the others: lastBlockIndex, lastBlockSize
  Output: Key2(index1, index2, index3), Value(index1, index2, v)
          {(S,T,U), <(s,t,A_{s,t}), (t,u,B_{t,u})>}
Reducer:
  Input:  Key2(index1, index2, index3), Value(index1, index2, v)
  1. Multiplication: C_{s,t,u} = A_{s,t} * B_{t,u}
  2. Sum: C_{s,u} = ∑_t C_{s,t,u}
  Output: Key1(index1, index2), IntWritable(v)
          C{(s,u), C_{s,u}}

Sub-matrix - map
A = 4*4, sub = 2*2; B = 4*4, sub = 2*2
Mapper input:
  Matrix  Key1(x,y)  IntWritable(v)
  A       (0, 0)     1
  A       (0, 1)     2
  A       (1, 0)     3
  A       (1, 1)     4
  B       (0, 2)     5
  B       (0, 3)     6
  B       (1, 2)     7
  B       (1, 3)     8
Mapper output:
  Key2(S,T,U,m)  Value(x,y,v)
  (0, 0, 1, 0)   (0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)
  (0, 0, 1, 1)   (0, 0, 5), (0, 1, 6), (1, 0, 7), (1, 1, 8)

Sub-matrix - reduce
C = 4*4
Reducer input:
  Key2(S,T,U,m)  Value(x,y,v)
  (0, 0, 1, 0)   (0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)
  (0, 0, 1, 1)   (0, 0, 5), (0, 1, 6), (1, 0, 7), (1, 1, 8)
Reducer output:
  Key1(x,y)  Value(x,y,v)
  (0, 2)     (0, 0, 19)
  (0, 3)     (0, 1, 22)
  (1, 2)     (1, 0, 43)
  (1, 3)     (1, 1, 50)

Plan 1: the simplest strategy is to have each reducer do just one of the block multiplications.
Reducer (S, T, U) = (0, 0, 1)
  Key2(S,T,U,m)  Value(x,y,v)
  (0, 0, 1, 0)   (0, 0, 1)
  (0, 0, 1, 1)   (0, 0, 2)
Job 1: do the multiplication in the Job 1 reducer:
  Key1(x,y)  IntWritable(v)
  (0, 2)     2
Job 2: do the sum in the Job 2 reducer:
  Key1(x,y)  IntWritable(v)
  (0, 2)     {…, 2, …}
Too many reducers: numS*numT*numU, and heavy network traffic!

Plan 2: a single reducer multiplies a single A block by a whole row of B blocks. This plan also takes two jobs.
Job 1: do the multiplication.
  A block (S, T), e.g. (0, 0)
  B blocks (T, K1,2…), e.g. (0, 0), (0, 1)
  Block grid: 0,0  0,1 / 1,0  1,1
  Matrix  Key2(S,T,U)  Value(x,y,v)
  A       (0, 0, -1)   (0, 0, 1)
  B       (0, 0, 1)    (0, 0, 2)
Job 2: do the sum.
This greatly decreases the number of reducers: at most numS*numT reducers are needed for this plan.

Plan 3: a single reducer computes the final C block, so there is no need for a second MapReduce job.
After the mapper, the A blocks and B blocks reach each reducer in the following order, for 0 <= T < numT:
  A[S,0] B[0,U]  A[S,1] B[1,U]  ...  A[S,numT-1] B[numT-1,U]
Number of reducers: numS*numU
Only one job is needed.
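The reducer counts quoted for the three plans can be checked with a few lines of plain Python. This is only a sketch, not Hadoop code: the helper names `block_counts` and `reducer_counts` are hypothetical, and it assumes one square block size for every dimension, with a smaller last block when the size does not divide evenly.

```python
import math


def block_counts(rows_a, cols_a, cols_b, block):
    """Return (numS, numT, numU) for A (rows_a x cols_a) times B (cols_a x cols_b).

    Assumption: square blocks of side `block`; the last block in a row or
    column may be smaller, so the counts are rounded up.
    """
    return (
        math.ceil(rows_a / block),
        math.ceil(cols_a / block),
        math.ceil(cols_b / block),
    )


def reducer_counts(num_s, num_t, num_u):
    """Maximum number of reducers each plan needs, as stated on the slides."""
    return {
        "plan1": num_s * num_t * num_u,  # one reducer per block multiplication
        "plan2": num_s * num_t,          # one reducer per A block
        "plan3": num_s * num_u,          # one reducer per C block
    }


# The 4*4 matrices with 2*2 sub-matrices used in the map/reduce example.
counts = reducer_counts(*block_counts(4, 4, 4, 2))
print(counts)  # {'plan1': 8, 'plan2': 4, 'plan3': 4}
```

Plan 1 grows with the product of all three block counts, which is where its heavy network traffic comes from; Plans 2 and 3 each drop one factor.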
Plan 3 example: row 0 of A blocks is paired with a column of B blocks at one reducer.
  Block grid: 0,0  0,1 / 1,0  1,1
  Reducer input order: A(0,0) B(0,1)  A(0,1) B(1,1)
  Matrix  Key2(S,U,T)  Value(x,y,v)
  A       (0, 1, 0)    (0, 0, 1)
  B       (0, 1, 0)    (0, 0, 2)
Multiply and sum in one job:
  if (ms != s || mu != u) write the reducer output

Data Analysis
  Size  s    t    u    Plan  Job  Reducer  Time (s)
  10K   50   68   72   1     2    280      60
                       2     2    35       58
                       3     1    40       32
  25k   103  112  119  1     2    196      120
                       2     2    36       109
                       3     1    36       65
  500k  504  523  547  1     2    125      4091
                       2     2    25       3692
                       3     1    25       3216
[Bar charts, one per size: Job, Reducer and Time for Plans 1, 2 and 3; the 500k chart plots Time/100.]

Demo
C = A x B
Use 2x2 sub-matrices, run with strategies 1, 2 and 3 respectively.
[Matrices A, B and C were shown as images on the slide.]
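To make the Plan 3 data flow concrete, the following plain-Python simulation mimics the map, shuffle and reduce steps on in-memory lists. It is a sketch under stated assumptions, not the presenters' Hadoop implementation: the function names are hypothetical, and a flag m (0 for A, 1 for B) is appended to the key (S, U, T) so that sorting places each A block directly before its matching B block.

```python
def split_blocks(matrix, block):
    """Return {(block_row, block_col): sub-matrix}; edge blocks may be smaller."""
    rows, cols = len(matrix), len(matrix[0])
    blocks = {}
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            blocks[(r0 // block, c0 // block)] = [
                row[c0:c0 + block] for row in matrix[r0:r0 + block]
            ]
    return blocks


def multiply(x, y):
    """Dense product of two small blocks (x is p*q, y is q*r)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def plan3(a, b, block):
    """Simulate Plan 3 and return {(S, U): C block}."""
    a_blocks, b_blocks = split_blocks(a, block), split_blocks(b, block)
    num_s = max(s for s, _ in a_blocks) + 1
    num_u = max(u for _, u in b_blocks) + 1

    # "Map": send A[S,T] to every U and B[T,U] to every S.
    # Key layout (S, U, T, m) is an assumption; m = 0 for A, m = 1 for B.
    emitted = []
    for (s, t), blk in a_blocks.items():
        for u in range(num_u):
            emitted.append(((s, u, t, 0), blk))
    for (t, u), blk in b_blocks.items():
        for s in range(num_s):
            emitted.append(((s, u, t, 1), blk))

    # "Shuffle": sorting by key gives A[S,0] B[0,U] A[S,1] B[1,U] ... per (S, U).
    emitted.sort(key=lambda kv: kv[0])

    # "Reduce": for each (S, U), multiply each A/B pair and sum the products.
    c_blocks = {}
    pending_a = None
    for (s, u, t, m), blk in emitted:
        if m == 0:
            pending_a = blk
            continue
        product = multiply(pending_a, blk)
        if (s, u) not in c_blocks:
            c_blocks[(s, u)] = product
        else:
            acc = c_blocks[(s, u)]
            for i, row in enumerate(product):
                for j, v in enumerate(row):
                    acc[i][j] += v
    return c_blocks


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(plan3(a, b, block=2))  # {(0, 0): [[19, 22], [43, 50]]}
```

In a real job the (S, U) part of the key would decide which reducer receives a record, while the T and m parts only control the order inside that reducer; here a single sort stands in for both.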