Vector vector operation

advertisement
§ Algorithms
for
Computer
arithmetic pipelining:
Vec-1
Pipelined
Vector
Multi-stage pipelines are like assembly lines in
automobile factories.
Cray-I, Cary X-MP, Cyber 205, CONVEX are
pipelined vector computers.
There are vector operations in vector computers
Programs should be “vectorized”.
The addition of 2 vectors:
t=1
t=2
…
t=k
t=k+1
 x1   y1   x1  y1 
x   y  x  y 
2
 2   2   2
    
x   y  x  y 
 n  n  n
n
D1
D2
Dk
…
x1+y1
x2+y2 x1+y1
…
…
xk+yk
xk-1+yk-1
…
…
x1+y1
x2+y2
x3+y3
x1+y1
x2+y2
…
xn+yn
Vec-2
We do not use many processors to perform
vector operations concurrently.
Vector operations can be completed in a rather
short time.
 Vector matrix operations:
Scalar vector operation
Vector vector operation
Scalar matrix operation
Matrix matrix operation
A matrix can be viewed as a very long vector.
When performing matrix-matrix multiplication
we need the following 2 operations:
(1) vector-vector multiplications:
 a1   b1   a1 * b1 
 a  b   a * b 
 2* 2   2 2 
    
a  b  a * bn 
 n  n  n

Vec-3
(2) summation of a vector:
 a1   a2   a12 
 a  a   a 
 3    4    34 
      

a  a  a
 n 1   n   n 1, n 

a1234
 a12   a34  

 a
  a  

a5678
56
78





      

a
 a
 a

n

3
,
n

2
n

1
,
n
n

3
,
n

2
,
n

1
,
n

 
 


…
a1234…n-1,n is obtained.
 The prefix sum problem
s1 = a1
s2 = a1+ a2
s3 = a1+ a2 + a3
…
sn =a1 + a2+…+ an
Vec-4
1st addition:
 a1  a 2   a12  S2
 a  a  a 
 3    4    34 
a 5  a 6  a 56 
a   a   a 
 7   8   78 
2nd addition:
a12   a 3   a13  S3
a  a  a  S
 12    34    14  4
a 56   a 7  a 57 
a   a   a 
 56   78   58 
3rd addition:
a14   a 5  a15  S5
a  a  a  S
6
 14    56    16 
a14  a 57  a17 S7
a  a  a  S
 14   58   18 
8
Vec-5
 The first order linear recurrence problem
Given x0
a1, a2,…,an
b1, b2,…,bn
find
x1=a1x0+b1
x2=a2x1+b2
x3=a3x2+b3
…
xn=anxn-1+bn
Consider n=8.
The vectorized algorithm:
x1=a1x0+b1
x2=a2x1+b2
x3=a3x2+b3
x4=a4x3+b4
x5=a5x4+b5
x6=a6x5+b6
x7=a7x6+b7
x8=a8x7+b8
Vec-6
Step1:
x2 = a2x1 + b2 = a2a1x0 + ( a2b1 + b2 )
x4 = a4x3 + b4 = a4a3x2 + ( a4b3 + b4 )
x6 = a6x5 + b6 = a6a5x4 + ( a6b5 + b6 )
x8 = a8x7 + b8 = a8a7x6 + ( a8b7 + b8 )
We need the following operations:
a 2   a1 
a   a 
 4   3,
a 6   a 5 
 a  a 
 8  7
a 2   b1 
a   b 
 4   3
 a 6   b5 
a  b 
 8  7
 a 2 b1   b 2 
a b   b 
 4 3   4 
 a 6 b5   b 6 
a b   b 
 8 7  8
reexpress:
x1 = a1'x0 + b1'
x2 = a2'x0 + b2'
x3 = a3'x2 + b3'
x4 = a4'x2 + b4'
Vec-7
x5 = a5'x4 + b5'
x6 = a6'x4 + b6'
x7 = a7'x6 + b7'
x8 = a8'x6 + b8'
Step 2:
x3 = a3'a2'x0 + ( a3'b2' + b3' )
x4 = a4'a2'x0 + ( a4'b2' + b4' )
x7 = a7'a6'x4 + ( a7'b6' + b7' )
x8 = a8'a6'x4 + ( a8'b6' + b8' )
We need the following operations:
 a3 ' a2 '  a3 ' b2 '  a3 ' b2 '
a ' a ' a ' b ' a ' b '
 4  2  ,  4  2    4 2 
a7 '  a6 ' a7 ' b6 ' a7 ' b6 '
 a '  a '  a ' b '  a ' b '
 8  6  8  6  8 6
 a3 ' b2 ' b3 '
a ' b ' b '
 4 2 4
 a7 ' b6 ' b7 '
 a ' b ' b '
 8 6  8
Vec-8
reexpress:
x1 = a1''x0 + b1''
x2 = a2''x0 + b2''
x3 = a3''x0 + b3''
x4 = a4''x0 + b4''
x5 = a5''x4 + b5''
x6 = a6''x4 + b6''
x7 = a7''x4 + b7''
x8 = a8''x4 + b8''
Step3:
x5 = a5''a4''x0 + ( a5''b4'' + b5'' )
x6 = a6''a4''x0 + ( a6''b4'' + b6'' )
x7 = a7''a4''x0 + ( a7''b4'' + b7'' )
x8 = a8''a4''x0 + ( a8''b4'' + b8'' )
We need the following operations:
 a5 " a4 "  a5 " b4 "
 a " a "  a " b "
 6  4  ,  6  4 
a7 " a4 " a7 " b4 "
 a " a "  a " b "
 8   4   8   4 
Vec-9
 a5 " b4 " b5 "
 a " b " b "
 6 4  6 
a7 " b4 " b7 "
 a " b " b "
 8 4   8 
reexpress:
x1
x2
x3
x4
x5
x6
x7
x8
xi = aI"'x0 + bi"' , 1  i  8
Function of
Original
Step 1
Step2
x0
x0
x0
x1  x0
x0
x2
x2  x0
x3  x2  x0
x4
x4
x4
x5  x4
x4
x6
x6  x4
x7  x6  x4
O(log n) vector operations
Vec-10




Step3
x0
x0
x0
x0
x0
x0
x0
x0
 Banded matrix operations
a banded matrix:
 a11
a
 21
A= 




a12
a22
a32
a23
a33
a43
a34
a44
a54




a45 
a55 
multiplication of a banded matrix and a
vector:
 a11
a
 21




a12
a22
a32
a23
a33
a43
a34
a44
a54
  b1 
 b 
 2 
 b3 
a45  b4 
a55  b5 
There may be many vector multiplications:
 b1 
b 
 2
a11 a12 0 0 0 b3 
b 
 4
b5 
Vec-11
The vector extracted from matrix A contain
many zeros.
a11b1  a12 b2 
 a11 a12
  b1  
a a a
 b   a b  a b  a b 
22 2
23 3 
 21 22 23
  2   21 1
a32 a33 a34

 b3   a32 b2  a33b3  a34 b4 

 b   a b  a b  a b 
a
a
a
43 44 45   4 
44 4
45 5 

 43 3


a54 a55  b5  a54 b4  a55b5 
The result can be obtained by:
 a11   b1   0   0   a12  b2 
a  b   a   b   a  b 
 22   2   21   1   23   3 
 a33   b3   a32   b2    a34   b4 
a  b  a  b     
 44   4   43   3  a45  b5 
 a55  b5  a54  b4   0   0 
Vec-12
 multiplication of two banded matrices
 a11 a12
  b11 b12

a a a
 b b b

 21 22 23
  21 22 23

a32 a33 a34
b32 b33 b34




 

a
a
a
b
b
b
43
44
45  
43 44 45 


a54 a55  
b54 b55 

a b 
 21 12
C0  a32 b23 
a b 
 43 34
a54 b45 
a11b11 
a 22 b22 
a33b33 
a 44 b44 
a55b55 
a12 b21 
a 23b32 

a34 b43 
a 45b54 


 0   0   a11   b11   a12  b21 
 a  b  a  b   a  b 
 21   12   22   22   23   32 
 a32   b23    a33   b33    a34   b43 
a  b  a  b     
 43   34   44   44  a45  b54 
a54  b45   a55  b55   0   0 
Vec-13
a22  b21   a21   b11 
 a  b  a  b 
C1   33    32    32    22 
a44  b43  a43  b33 
 a  b  a  b 
 55   54   54   44 
a32  b21 
C2  a43   b32 
   
a54  b43 
C-1 and C-2 can be calculated similarly.
 Linear systems
Ax  y , where A: n  n matrix
x : unknown column vector
y : column vector
If A is dense, we can apply the Guassian
elimination method to design a vectorized
algorithm. ( O(n2) vector operations)
Vec-14
 Banded triangular matrix
e.g. a 66 banded matrix with bandwith 3:
 a11
a
 21
 a31





a22
a32
a42
a33
a43
a53
a44
a54
a55
a64
a65








a66 
corresponds to a second order linear recurrence
problem:
y
x1  1
a11
x2  ( y2  a21 x1 ) / a22
and xi 

1
yi  ai (i 1) xi 1  ai (i  2) xi  2
aii

for 3 i 6
A linear system with a banded triangular
parameter matrix with bandwidth w can be
transformed
into
a
(w-1)th
order
linear
recurrence problem.
O(log n) vector operations ( similar to the first
oreder linear recurrence problem.)
Vec-15
 Tridiagonal linear systems – method 1
e.g. a 44 tridiagonal matrix:
 a1
c
 2



b1
a2
c3
b2
a3
c4



b3 
a4 
All of non-zero elements are clustered on the
central three diagonals.
Algorithm: (O(log n ) vector operations.)
Step 1: Apply LU decomposition. i.e. find
1
L
A=LU, where L   2



1
L3
u1 b1

u2
U 



Vec-16
1
L4
b2
u3




1



b3 
u 4 
Step 2: Let Ux=z. Solve a linear system with
banded triangular matrix:
Lz = y.
Step 3: Solve another linear system with banded
triangular matrix:
Ux = z.
How to find L and U ?
e.g. A = LU.
b1

 a1 b1
  u1

c a b
 L u L b  u
b2
2
1
2
1
2
2
2
2



L3u 2 L3b2  u3
b3

 c3 a3 b3  


 
L
u
L
b

u
c
a


4 3
4 3
4
4 4
 u1
u
 2

 u3
u 4
 a1
 a2  b1 L2
 a3  b2 L3
 a4  b3 L4
 L2 


 L3 

L 

 4
Vec-17
c2
u1
c3
u2
c4
u3
substitute Li into ui:
u1  a1
c2b1
u 2  a2 
u1
u3  a3 
c3b2
u2
u 4  a4 
c4b3
u3
The ui sequence is a first order recurrence, but
not linear.
Introduce qi :
u1 
a1 q1

1 q0
u 2  a2 
c2b1 a2 q1  c2b1q0 q2


u1
q1
q1
u3  a3 
c3b2 a3q2  c3b2 q1 q3


u2
q2
q2
c4b3 a4 q3  c4b3q2 q4
u 4  a4 


u3
q3
q3
Vec-18
In general,
a q cb q
q
ui  1 i 1 i i 1 i  2  i
qi 1
qi 1
The qi sequence is a second order linear recurrence:
given q0 = 1
q1 = a1
find qi = aiqi-1 – cibi-1qi-2,
i2
Summary:
Step 1: Solve the qi sequence.
q
Step 2: ui  i , 1  i  n
qi 1
c
Step 3: Li  i , 2  i  n
u i 1
O(log n ) vector operations.
Vec-19
 Tridiagonal linear systems — method 2
(divide-and-conquer)
e.g.
p1
p2
p3
p4
p
p5
p6
p7
p8
 a1
c
 2









b1
a2
c3
b2
a3
c4
b3
a4
c5
b4
a5
c6
b5
a6
c7
A
pi denotes the ith row.
Perform the following operations:
Q1  p1  b1 p2
a2
Q2  p2  c2 p1  b2 p3
a1
a3

Qi  pi  ci pi 1  bi pi 1
ai 1
ai 1

Q8  p8  c8 p7
a7
Vec-20
b6
a7
c8
b7
a8
y1 
y2 

y3 
y4 

y5 
y6 

y7 

y8 
y
Q1
Q2
Q3
Q4
Q
Q5
Q6
Q7
Q8
 d1 0 e1
0 d
0 e2
2

 f 3 0 d 3 0 e3

f 4 0 d 4 0 e4


f 5 0 d 5 0 e5

f 6 0 d 6 0 e6


f7 0 d7 0

f8 0 d8

z1 
z2 

z3 
z 4 
z5 

z6 
z7 

z8 
Rearrange:









d1 e1
f3 d3 e3
f5 d5 e5
f7 d7
d2 e2
f4 d4 e4
f6 d6 e6
f8 d8
z1
z3
z5
z7
z2
z4
z6
z8









An 88 tridiagonal linear system can be split
into two 44 independent tridiagonal linear
systems.
How to obtain the matrix Q ?
 k1   b1   a2 
k  b    a 
3
 2   2  
    
k  b    a 
 7  7 
8
( 1 division )
Vec-21
l2  c2    a1 
l   c    a 
 3   3   2
      
l   c    a 
 8  8  7
( 1 division )
 d1   a1   k1  c2   0   0 
d  a  k   c  l   b 
 2   2   2  3   2  1 
                  
d  a      l   b 
 7   7  k 7   c8   7   2 
 d 8   a8   0   0  l8   b7 
( 2 multiplications and 2 additions )
 e1   k1  b2 
e   k   b 
 2    2  3 
       
e   k  b 
 6   6  7 
( 1 multiplication )
 f 3  l3  c2 
 f  l   c 
 4    4  3 
       
 f   l  c 
 8   8  7 
( 1 multiplication )
 z1   y1   k1   y2   0   0 
 z   y  k   y  l   y 
 2   2   2  3   2  1 
                  
 z   y      l   y 
 7   7  k 7   y8   7   2 
 z8   y8   0   0  l8   y7 
( 2 multiplications and 2 additions )
Vec-22
After 12 vector operations, we obtain the matrix Q.
88  44  22  11.
total: O(log n ) vector operations.

general banded linear systems
e.g.
p1  a1
p 2 c2

p3  e3
p4 
p5 
p

p6 
p7 

p8 
p9 

p10 
b1 d1
a2 b2 d 2
c3 a3 b3 d 3
e4 c4 a4 b4 d 4
e5 c5 a5 b5
e6 c6 a6
e7 c7
e8

Vec-23
d5
b6
a7
c8
e9
d6
b7
a8
c9
e10
d7
b8 d8
a9 b9
c10 a10
y1 
y2 

y3 
y4 
y5 

y6 
y7 

y8 
y9 

y10 
Q1  f1
g1
i1
Q2 
f2
g2

Q3  h3
f3
g3
Q4  h4
f4
Q5  j5
h5
f5
Q

Q6 
j6
h6
Q7 
j7
h7

Q8 
j8
Q9 
j9

Q10 
i2
i3
g4
i4
g5
f6
i5
g6
f7
h8
g7
f8
h9
j10
i6
g8
f9
h10
f10













f1 g1 i1
h3 f3 g3 i3
j5 h5 f5 g5 i5
j7 h7 f7 g7
j9 h9 f9
f2 g2 i2
h4 f4 g4
j6 h6 f6
j8 h8
j10
Vec-24
i4
g6 i6
f8 g8
h10 f10
z1
z2
z3
z4
z5
z6
z7
z8
z9
z10












z1 
z2 

z3 
z 4 
z5 

z6 
z7 

z8 
z9 

z10 
How to obtain the matrix Q ?
Take Q5 as an example:
Q5 = pp3 + qp4 + p5 + rp6 + sp7

c3 e4
  p  0 
b a e
  q   c 
3
4
6

    5 
 d 4 a6 c7   r   b5 

  

d
b
s
0



6 7  
( a tridiagonal linear system )
Summary:
Step 1: Solve n tridagonal linear systems.
Step 2: Generate Q. At this point, the original
problem is split into two smaller problems with
half size.
Vec-25
Download