Embedded Computer Architecture
Loop Transformations
TU/e 5kk73
Henk Corporaal
Bart Mesman
Overview
• Motivation
• Loop level transformations catalogue
– Loop merging
– Loop interchange
– Loop unrolling
– Unroll-and-Jam
– Loop tiling
• Loop Transformation Theory and
Dependence Analysis
Many slides are courtesy of the DTSE people from IMEC and
of Dr. Peter Knijnenburg († 2007) from Leiden University
Loop Transformations
• Change the order in which the iteration
space is traversed.
• Can expose parallelism, increase available
ILP, or improve memory behavior.
• Dependence testing is required to check
validity of transformation.
Why loop transformations
Example 1: in-place mapping
Before:
for (j=1; j<M; j++)
  for (i=0; i<N; i++)
    A[i][j] = f(A[i][j-1]);
for (i=0; i<N; i++)
  OUT[i] = A[i][M-1];

After loop interchange and merging:
for (i=0; i<N; i++){
  for (j=1; j<M; j++)
    A[i][j] = f(A[i][j-1]);
  OUT[i] = A[i][M-1];
}

After the transformation each row A[i][*] is produced and consumed within a single i-iteration, so its storage can be reused in place.

(Figure: traversal of array A over i and j, producing OUT)
Why loop transformations
Example 2: memory allocation
Before: two loops of N cycles each, with B held in background memory (2 background memory ports needed):
for (i=0; i<N; i++)
  B[i] = f(A[i]);
for (i=0; i<N; i++)
  C[i] = g(B[i]);

After merging: 2N cycles, but only 1 background + 1 foreground port, since each B[i] is consumed right after it is produced and can be kept in foreground memory:
for (i=0; i<N; i++){
  B[i] = f(A[i]);
  C[i] = g(B[i]);
}

(Figure: CPU with foreground memory, connected via an external memory interface to background memory)
Loop transformation catalogue
• merge (fusion) – improve locality
• interchange – improve locality/regularity
• bump
• extend/reduce
• body split
• reverse – improve regularity
• skew
• tiling
• index split
• unroll
• unroll-and-jam
Loop Merge
for Ia = exp1 to exp2
  A(Ia)
for Ib = exp1 to exp2
  B(Ib)
⇒
for I = exp1 to exp2
  A(I)
  B(I)

• Improve locality
• Reduce loop overhead
Loop Merge:
Example of locality improvement
for (i=0; i<N; i++)
  B[i] = f(A[i]);
for (j=0; j<N; j++)
  C[j] = f(B[j],A[j]);

(Figure: location vs. time plot of productions and consumptions; the consumptions of the second loop lie far from the productions of the first)

for (i=0; i<N; i++){
  B[i] = f(A[i]);
  C[i] = f(B[i],A[i]);
}

(Figure: after merging, productions and consumptions lie close together in time)

Consumptions of the second loop are now closer to the productions and consumptions of the first loop.
• Is this always the case after merging loops?
Loop Merge not always allowed!
• Data dependencies from the first to the second loop can block Loop Merge
• Merging is allowed if, for every iteration I, the value consumed in loop 2 at iteration I was produced in loop 1 at an iteration ≤ I
• Enablers: Bump, Reverse, Skew

Not allowed: B[N-1] is consumed before iteration N-1 produces it (N-1 ≥ i):
for (i=0; i<N; i++)
  B[i] = f(A[i]);
for (i=0; i<N; i++)
  C[i] = g(B[N-1]);

Allowed: B[i-2] is produced before iteration i consumes it (i-2 < i):
for (i=0; i<N; i++)
  B[i] = f(A[i]);
for (i=2; i<N; i++)
  C[i] = g(B[i-2]);
Loop Bump
for I = exp1 to exp2
  A(I)
⇒
for I = exp1+N to exp2+N
  A(I-N)
Loop Bump: Example as enabler
for (i=2; i<N; i++)
  B[i] = f(A[i]);
for (i=0; i<N-2; i++)
  C[i] = g(B[i+2]);

i+2 > i ⇒ direct merging not possible:
we would consume B[i+2] before it is produced

Loop Bump ⇒
for (i=2; i<N; i++)
  B[i] = f(A[i]);
for (i=2; i<N; i++)
  C[i-2] = g(B[i+2-2]);

i+2-2 = i ⇒ merging possible

Loop Merge ⇒
for (i=2; i<N; i++){
  B[i] = f(A[i]);
  C[i-2] = g(B[i]);
}
Loop Extend
for I = exp1 to exp2
  A(I)
⇒ (with exp3 ≤ exp1 and exp4 ≥ exp2)
for I = exp3 to exp4
  if I ≥ exp1 and I ≤ exp2
    A(I)
Loop Extend: Example as enabler
for (i=0; i<N; i++)
  B[i] = f(A[i]);
for (i=2; i<N+2; i++)
  C[i-2] = g(B[i]);

Loop Extend ⇒
for (i=0; i<N+2; i++)
  if (i<N)
    B[i] = f(A[i]);
for (i=0; i<N+2; i++)
  if (i>=2)
    C[i-2] = g(B[i]);

Loop Merge ⇒
for (i=0; i<N+2; i++){
  if (i<N)
    B[i] = f(A[i]);
  if (i>=2)
    C[i-2] = g(B[i]);
}
Loop Reduce
for I = exp1 to exp2
  if I ≥ exp3 and I ≤ exp4
    A(I)
⇒
for I = max(exp1,exp3) to min(exp2,exp4)
  A(I)
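A small C sketch of Loop Reduce (an added illustration, not from the original slides): the guard restricts i to [2, N), so the loop header can absorb the condition and the per-iteration test disappears.

for (i=0; i<N; i++)
  if (i>=2)          /* guard evaluated N times */
    B[i] = g(A[i]);

/* after Loop Reduce: the bounds max(0,2)=2 and N carry the condition */
for (i=2; i<N; i++)
  B[i] = g(A[i]);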
Loop Body Split
for I = exp1 to exp2
  A(I)
  B(I)
⇒
for Ia = exp1 to exp2
  A(Ia)
for Ib = exp1 to exp2
  B(Ib)

A(I) must be single-assignment:
its elements should be written once
Loop Body Split: Example as enabler
for (i=0; i<N; i++){
  A[i] = f(A[i-1]);
  B[i] = g(in[i]);
}
for (j=0; j<N; j++)
  C[j] = h(B[j],A[N]);

Loop Body Split ⇒
for (i=0; i<N; i++)
  A[i] = f(A[i-1]);
for (k=0; k<N; k++)
  B[k] = g(in[k]);
for (j=0; j<N; j++)
  C[j] = h(B[j],A[N]);

Loop Merge ⇒
for (i=0; i<N; i++)
  A[i] = f(A[i-1]);
for (j=0; j<N; j++){
  B[j] = g(in[j]);
  C[j] = h(B[j],A[N]);
}
Loop Reverse
for I = exp1 to exp2
  A(I)

(Figure: location vs. time plot of productions and consumptions)

⇒
for I = exp2 downto exp1
  A(I)

Or written differently:
for I = exp1 to exp2
  A(exp2-(I-exp1))

(Figure: the same plot with the traversal order reversed)
Loop Reverse: Satisfy dependencies
for (i=0; i<N; i++)
  A[i] = f(A[i-1]);
No loop-carried dependencies allowed!
We cannot reverse this loop.

Enabler: data-flow transformations; exploit the associativity of the + operator:
A[0] = …
for (i=1; i<=N; i++)
  A[i] = A[i-1] + f(…);
… = A[N] …
⇒
A[N] = …
for (i=N-1; i>=0; i--)
  A[i] = A[i+1] + f(…);
… = A[0] …
Loop Reverse: Example as enabler
for (i=0; i<=N; i++)
  B[i] = f(A[i]);
for (i=0; i<=N; i++)
  C[i] = g(B[N-i]);

N-i > i (for i < N/2) ⇒ merging not possible

Loop Reverse ⇒
for (i=0; i<=N; i++)
  B[i] = f(A[i]);
for (i=0; i<=N; i++)
  C[N-i] = g(B[N-(N-i)]);

N-(N-i) = i ⇒ merging possible

Loop Merge ⇒
for (i=0; i<=N; i++){
  B[i] = f(A[i]);
  C[N-i] = g(B[i]);
}
Loop Interchange
for I1 = exp1 to exp2
  for I2 = exp3 to exp4
    A(I1, I2)
⇒
for I2 = exp3 to exp4
  for I1 = exp1 to exp2
    A(I1, I2)
Loop Interchange: index traversal
(Figure: traversal order of the i×j iteration space before and after interchange)

for(i=0; i<W; i++)
  for(j=0; j<H; j++)
    A[i][j] = …;

Loop Interchange ⇒
for(j=0; j<H; j++)
  for(i=0; i<W; i++)
    A[i][j] = …;
Loop Interchange
• Validity: check dependence direction
vectors.
• Often used to improve cache behavior.
• The innermost loop (loop index changes
fastest) should (only) index the right-most
array index expression in case of row-major
storage, like in C.
• Can improve execution time by 1 or 2
orders of magnitude.
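A minimal C sketch of the row-major rule above (added illustration): the second nest walks A with unit stride and typically runs far faster for large arrays.

double A[1024][1024], sum = 0.0;
int i, j;

for (j = 0; j < 1024; j++)       /* inner loop indexes the left subscript: stride-1024 accesses */
  for (i = 0; i < 1024; i++)
    sum += A[i][j];

for (i = 0; i < 1024; i++)       /* after interchange: inner loop indexes the right subscript, unit stride */
  for (j = 0; j < 1024; j++)
    sum += A[i][j];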
Loop Interchange (cont’d)
• Loop interchange can also expose
parallelism.
• If an inner loop does not carry a dependence (its entry in the direction vector equals '='), this loop can be executed in parallel.
• Moving this loop outwards increases the granularity of the parallel loop iterations.
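A sketch of this idea (added example; the pragma is OpenMP, which the slides do not presuppose): the j loop carries no dependence, so after moving it outward each parallel iteration receives a whole column of work.

/* the dependence runs along i only: direction vector (<,=) */
#pragma omp parallel for
for (j = 0; j < M; j++)          /* parallel loop, now outermost */
  for (i = 1; i < N; i++)
    A[i][j] = f(A[i-1][j]);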
Loop Interchange:
Example of locality improvement
for (i=0; i<N; i++)
  for (j=0; j<M; j++)
    B[i][j] = f(A[j],B[i][j-1]);

Loop Interchange ⇒
for (j=0; j<M; j++)
  for (i=0; i<N; i++)
    B[i][j] = f(A[j],B[i][j-1]);

• In the second loop nest:
– A[j] is reused N times (temporal locality)
– However: we lose spatial locality in B (assuming row-major ordering)
• ⇒ Exploration required
Loop Interchange: Satisfy dependencies
• No loop-carried dependencies allowed,
unless ∀ I2: prod(I2) ≤ cons(I2)
Compare:
for (i=0; i<N; i++)
  for (j=0; j<M; j++)
    A[i][j] = f(A[i-1][j+1]);   /* direction (<,>): interchange not allowed */

for (i=0; i<N; i++)
  for (j=0; j<M; j++)
    A[i][j] = f(A[i-1][j]);     /* direction (<,=): interchange allowed */

• Enablers:
– Data-flow transformations
– Loop Bump
– Skewing
Loop Skew:
Basic transformation
for I1 = exp1 to exp2
  for I2 = exp3 to exp4
    A(I1, I2)
⇒
for I1 = exp1 to exp2
  for I2 = exp3+α·I1+β to exp4+α·I1+β
    A(I1, I2-α·I1-β)
Loop Skew
(Figure: iteration space before and after skewing; the skewed space is sheared along i)

for(j=0; j<H; j++)
  for(i=0; i<W; i++)
    A[i][j] = ...;

Loop Skewing ⇒
for(j=0; j<H; j++)
  for(i=0+j; i<W+j; i++)
    A[i-j][j] = ...;
Loop Skew: Example as enabler
of regularity improvement
for (i=0; i<N; i++)
  for (j=0; j<M; j++)
    … = f(A[i+j]);

Loop Skew ⇒
for (i=0; i<N; i++)
  for (j=i; j<i+M; j++)
    … = f(A[j]);

Loop Extend & Interchange ⇒
for (j=0; j<N+M; j++)
  for (i=0; i<N; i++)
    if (j>=i && j<i+M)
      … = f(A[j]);
Loop skewing as enabler for Loop Interchange:
for (i=0; i<N; i++)
  for (j=0; j<M; j++)
    A[i+1][j] = f(A[i][j+1]);

(Figure: iteration spaces with dependence vectors before skewing, after skewing, and after interchange)
Loop Tiling
for I = 0 to exp1·exp2
  A(I)
⇒ (tile size exp1, tile factor exp2)
for I1 = 0 to exp2
  for I2 = exp1·I1 to exp1·(I1+1)
    A(I2)
Loop Tiling (1-D example)
(Figure: a 1-D iteration space folded into tiles of 4)

for(i=0; i<9; i++)
  A[i] = ...;

1-D Loop Tiling ⇒
for(j=0; j<3; j++)
  for(i=4*j; i<4*j+4; i++)
    if (i<9)
      A[i] = ...;
Loop Tiling
• Improve cache reuse by dividing the (2 or
higher dimensional) iteration space into
tiles and iterating over these tiles.
• Only useful when the working set does not fit into the cache, or when there is much cache interference.
• Two adjacent loops can legally be tiled if they can legally be interchanged.
2-D Tiling Example
for(i=0; i<N; i++)
  for(j=0; j<N; j++)
    A[i][j] = B[j][i];
⇒
for(TI=0; TI<N; TI+=16)
  for(TJ=0; TJ<N; TJ+=16)
    for(i=TI; i<min(TI+16,N); i++)      /* inside */
      for(j=TJ; j<min(TJ+16,N); j++)    /* a tile */
        A[i][j] = B[j][i];
2-D Tiling:
changes the execution order
(Figure: execution order over the i×j space, untiled vs. tiled)
What is the best Tile Size?
• Current tile size selection algorithms use a
cache model:
– Generate collection of tile sizes;
– Estimate resulting cache miss rate;
– Select best one.
• They typically only take the L1 cache into account.
• Most do not take n-way associativity into account.
Loop Index Split
for Ia = exp1 to exp2
  A(Ia)
⇒
for Ia = exp1 to p
  A(Ia)
for Ib = p+1 to exp2
  A(Ib)
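A small C sketch of Loop Index Split (added illustration; the split point p = N/2 is arbitrary): each half can then be transformed or scheduled independently, e.g. to isolate iterations that block a merge.

for (i = 0; i < N; i++)
  A[i] = f(in[i]);

/* after splitting at p = N/2: */
for (i = 0; i < N/2; i++)
  A[i] = f(in[i]);
for (i = N/2; i < N; i++)
  A[i] = f(in[i]);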
Loop Unrolling
• Duplicate loop body and adjust loop
header.
• Increases available ILP, reduces loop overhead, and increases possibilities for common subexpression elimination.
• Always valid!
(Partial) Loop Unrolling
for I = exp1 to exp2
  A(I)

Full unrolling ⇒
A(exp1)
A(exp1+1)
A(exp1+2)
…
A(exp2)

Partial unrolling (factor 2, trip count assumed even) ⇒
for I = exp1/2 to exp2/2
  A(2·I)
  A(2·I+1)
Loop Unrolling: Downside
• If unroll factor is not divisor of trip count,
then need to add remainder loop.
• If trip count not known at compile time,
need to check at runtime.
• Code size increases which may result in
higher I-cache miss rate.
• Global determination of optimal unroll
factors is difficult.
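A C sketch of the remainder-loop issue (added illustration; unroll factor 2, trip count N not known at compile time):

int i;
for (i = 0; i + 1 < N; i += 2) {   /* main unrolled loop */
  A[i]   = f(i);
  A[i+1] = f(i+1);
}
for (; i < N; i++)                 /* remainder loop, executed when N is odd */
  A[i] = f(i);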
Loop Unroll: Example of
removal of non-affine iterator
for (L=1; L < 4; L++)
  for (i=0; i < (1<<L); i++)
    A(L,i);
⇒
for (i=0; i < 2; i++)
  A(1,i);
for (i=0; i < 4; i++)
  A(2,i);
for (i=0; i < 8; i++)
  A(3,i);
Unroll-and-Jam
• Unroll the outer loop and fuse the new copies of the inner loop.
• Increases the size of the loop body and hence the available ILP.
• Can also improve locality.
Unroll-and-Jam Example
for (i=0; i<N; i++)
  for (j=0; j<N; j++)
    A[i][j] = B[j][i];

Unroll outer loop ⇒
for (i=0; i<N; i=i+2){
  for (j=0; j<N; j++)
    A[i][j] = B[j][i];
  for (j=0; j<N; j++)
    A[i+1][j] = B[j][i+1];
}

Jam (fuse the inner loops) ⇒
for (i=0; i<N; i+=2)
  for (j=0; j<N; j++) {
    A[i][j] = B[j][i];
    A[i+1][j] = B[j][i+1];
  }

• More ILP exposed
• Spatial locality of B enhanced
Simplified loop transformation script
• Give all loops the same nesting depth
  – Use dummy 1-iteration loops if necessary
• Improve regularity
  – Usually by applying loop interchange or reverse
• Improve locality
  – Usually by applying loop merge
  – Break data dependencies with loop bump/skew
  – Sometimes loop index split or unrolling is easier
Loop transformation theory
• Iteration spaces
• Polyhedral model
• Dependence analysis
Technical Preliminaries (1)
do i = 2, N
  do j = i, N
    x[i,j] = 0.5*(x[i-1,j] + x[i,j-1])
  enddo
enddo                                   (1)

• Address expression of x[i-1,j] in (1):
$\begin{pmatrix}1&0\\0&1\end{pmatrix}\begin{pmatrix}i\\j\end{pmatrix}+\begin{pmatrix}-1\\0\end{pmatrix}$

• perfect loop nest
• iteration space
• dependence vectors

(Figure: triangular iteration space 2 ≤ i ≤ j ≤ N with dependence vectors between its points)
Technical Preliminaries (2)
Switch the loop indexes:
do m = 2, N
  do n = 2, m
    x[n,m] = 0.5*(x[n-1,m] + x[n,m-1])
  enddo
enddo                                   (2)

The index transformation of (2) is
$\begin{pmatrix}m\\n\end{pmatrix}=\begin{pmatrix}0&1\\1&0\end{pmatrix}\begin{pmatrix}i\\j\end{pmatrix}$

and the address expression of x[i-1,j] transforms as
$\begin{pmatrix}1&0\\0&1\end{pmatrix}\begin{pmatrix}i\\j\end{pmatrix}+\begin{pmatrix}-1\\0\end{pmatrix}
=\begin{pmatrix}1&0\\0&1\end{pmatrix}\begin{pmatrix}0&1\\1&0\end{pmatrix}\begin{pmatrix}m\\n\end{pmatrix}+\begin{pmatrix}-1\\0\end{pmatrix}
=\begin{pmatrix}0&1\\1&0\end{pmatrix}\begin{pmatrix}m\\n\end{pmatrix}+\begin{pmatrix}-1\\0\end{pmatrix}$

an affine transformation.
Polyhedral Model
• A polyhedron is a set $\{x : Ax \le c\}$ for some matrix A and bounds vector c
• Polyhedra (or polytopes) are objects in a many-dimensional space without holes
• Iteration spaces of loops (with unit stride) can be represented as polyhedra
• Array accesses and loop transformations can be represented as matrices
Iteration Space
• A loop nest is represented as $BI \le b$ for iteration vector $I$
• Example:
for(i=0; i<10; i++)
  for(j=i; j<10; j++)

$\begin{pmatrix}-1&0\\1&0\\1&-1\\0&1\end{pmatrix}\begin{pmatrix}i\\j\end{pmatrix}\le\begin{pmatrix}0\\9\\0\\9\end{pmatrix}$
Array Accesses
• Any array access A[e1][e2] for linear index expressions e1 and e2 can be represented as an access matrix and offset vector: $AI + a$
• This can be considered as a mapping from the iteration space into the storage space of the array (which is a trivial polyhedron)
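For instance (added example): in a nest over (i,j), the access A[i+1][j-2] has access matrix and offset vector
$\begin{pmatrix}1&0\\0&1\end{pmatrix}\begin{pmatrix}i\\j\end{pmatrix}+\begin{pmatrix}1\\-2\end{pmatrix}$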
Unimodular Matrices
• A unimodular matrix T is a matrix with integer entries and determinant ±1.
• This means that such a matrix maps an object onto another object with exactly the same number of integer points in it.
• Its inverse T⁻¹ always exists and is unimodular as well.
Types of Unimodular Transformations
• Loop interchange
• Loop reversal
• Loop skewing for arbitrary skew factor
• Since unimodular transformations are closed under multiplication, any combination is again a unimodular transformation.
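For two nested loops, the standard 2×2 forms are (added for concreteness; f is the skew factor):
$T_{interchange}=\begin{pmatrix}0&1\\1&0\end{pmatrix},\quad T_{reversal}=\begin{pmatrix}1&0\\0&-1\end{pmatrix},\quad T_{skew}=\begin{pmatrix}1&0\\f&1\end{pmatrix}$
Their determinants are -1, -1, and 1 respectively, so each is unimodular.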
Application
• The transformed loop nest is given by $BT^{-1}I' \le b$, where $I' = TI$.
• Any array access matrix A is transformed into $AT^{-1}$.
• The transformed loop nest needs to be normalized by means of Fourier-Motzkin elimination, to ensure that the loop bounds are affine expressions in more outer loop indices.
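A small worked instance (added, reusing the Iteration Space example with $T=T^{-1}=\begin{pmatrix}0&1\\1&0\end{pmatrix}$ for interchange):
$BT^{-1}=\begin{pmatrix}-1&0\\1&0\\1&-1\\0&1\end{pmatrix}\begin{pmatrix}0&1\\1&0\end{pmatrix}=\begin{pmatrix}0&-1\\0&1\\-1&1\\1&0\end{pmatrix}$
so with $I'=(j,i)^T$ the constraints read $0 \le i \le 9$, $i \le j$, $j \le 9$; Fourier-Motzkin normalization then yields the nest
for(j=0; j<10; j++)
  for(i=0; i<=j; i++)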
Dependence Analysis
Consider the following statements:
S1: a = b + c;
S2: d = a + f;
S3: a = g + h;
• S1 → S2: true or flow dependence = RaW
• S2 → S3: anti-dependence = WaR
• S1 → S3: output dependence = WaW
Dependences in Loops
• Consider the following loop
for(i=0; i<N; i++){
S1:  a[i] = …;
S2:  b[i] = a[i-1];
}
• Loop-carried dependence S1 → S2.
• Need to detect whether there exist i and i' such that i = i'-1 within the loop space.
Definition of Dependence
• There exists a dependence if there are two statement instances that refer to the same memory location and (at least) one of them is a write,
• and there is no intervening write to that location between these two statement instances.
• In general, it is undecidable whether there exists a dependence.
Direction of Dependence
• If there is a flow dependence between two
statements S1 and S2 in a loop, then S1
writes to a variable in an earlier iteration
than S2 reads that variable.
• The iteration vector of the write is lexicographically less than the iteration vector of the read:
$I \prec I'$ iff $i_1 = i'_1 \wedge \dots \wedge i_{k-1} = i'_{k-1} \wedge i_k < i'_k$ for some $k$.
For example, $(1,3) \prec (2,0)$ (k=1) and $(1,3) \prec (1,4)$ (k=2).
Direction Vectors
• A direction vector is a vector
(=,=,…,=,<,*,*,…,*)
– where * can denote =, <, or >.
• Such a vector encodes a (collection of) dependences.
• A loop transformation should result in a new direction vector for the dependence that is also lexicographically positive.
Loop Interchange
• Interchanging two loops also interchanges
the corresponding entries in a direction
vector.
• Example: if the direction vector of a dependence is (<,>), then we may not interchange the loops, because the resulting direction would be (>,<), which is lexicographically negative.
Affine Bounds and Indices
• We assume loop bounds and array index expressions are affine expressions:
$a_0 + a_1 i_1 + \dots + a_k i_k$
• Each loop bound for loop index $i_k$ is an affine expression over the previous loop indices $i_1$ to $i_{k-1}$.
• Each array index expression is an affine expression over all loop indices.
Non-Affine Expressions
• Index expressions like i*j cannot be
handled by dependence tests. We must
assume that there exists a dependence.
• An important class of index expressions are
indirections A[B[i]]. These occur frequently
in scientific applications (sparse matrix
computations).
• In embedded applications???
Linear Diophantine Equations
• A linear Diophantine equation is of the form
$\sum_j a_j x_j = c$
• The equation has a solution iff $\gcd(a_1,\dots,a_n)$ is a divisor of c
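A quick added example: $2x + 4y = 6$ is solvable since gcd(2,4) = 2 divides 6 (e.g. x = 1, y = 1), whereas $2x + 4y = 7$ has no integer solution because 2 does not divide 7.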
GCD Test for Dependence
• Assume a single loop and two references A[a+b·i] and A[c+d·i].
• If there exists a dependence, then gcd(b,d) divides (c-a).
• Note the direction of the implication!
• If gcd(b,d) does not divide (c-a), then there exists no dependence.
GCD Test (cont’d)
• However, if gcd(b,d) does divide (c-a), then there might exist a dependence.
• The test is not exact, since it does not take the loop bounds into account.
• For example:
for(i=0; i<10; i++)
  A[i] = A[i+10] + 1;
Here gcd(1,1) = 1 divides 10, so the test reports a possible dependence, yet the accessed ranges A[0..9] and A[10..19] never overlap.
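A minimal C sketch of the 1-D GCD test (added illustration; function names are my own):

#include <stdio.h>

/* gcd of two non-negative ints (returns 0 only if both are 0) */
static int gcd(int x, int y) { return y == 0 ? x : gcd(y, x % y); }

/* Test accesses A[a + b*i] and A[c + d*i'].
   Returns 0 when a dependence is provably impossible,
   1 when a dependence MAY exist (the test is not exact). */
int gcd_may_depend(int a, int b, int c, int d) {
  int g = gcd(b < 0 ? -b : b, d < 0 ? -d : d);
  if (g == 0) return a == c;      /* both strides zero: same location iff a == c */
  return (c - a) % g == 0;
}

int main(void) {
  /* A[i] vs A[i+10]: gcd(1,1) = 1 divides 10, so the test cannot
     disprove the dependence, although the loop bounds do. */
  printf("%d\n", gcd_may_depend(0, 1, 10, 1));   /* prints 1 */
  return 0;
}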
GCD Test (cont’d)
• Using the theorem on linear Diophantine equations, we can test in arbitrary loop nests.
• We need one test for each direction vector.
• The vector (=,=,…,=,<,…) implies that the first k indices are the same.
• See the book by Zima for details.
Other Dependence Tests
• There exist many dependence tests:
– Separability test
– GCD test
– Banerjee test
– Range test
– Fourier-Motzkin test
– Omega test
• Exactness increases, but so does the cost.
Fourier-Motzkin Elimination
• Consider a collection of linear inequalities over the variables $i_1,\dots,i_n$:
$e_1(i_1,\dots,i_n) \le e_1'(i_1,\dots,i_n)$
…
$e_m(i_1,\dots,i_n) \le e_m'(i_1,\dots,i_n)$
• Is this system consistent, i.e. does there exist a solution?
• FM-elimination can determine this.
FM-Elimination (cont’d)
• First, create all pairs $L(i_1,\dots,i_{n-1}) \le i_n$ and $i_n \le U(i_1,\dots,i_{n-1})$. This is the solution space for $i_n$.
• Then create the new system
$L(i_1,\dots,i_{n-1}) \le U(i_1,\dots,i_{n-1})$
together with all original inequalities not involving $i_n$.
• This new system has one variable less, and we continue this way.
FM-Elimination (cont’d)
• After eliminating $i_1$, we end up with a collection of inequalities between constants, $c \le c'$.
• The original system is consistent iff every such inequality can be satisfied, i.e. there is no inequality like $10 \le 3$.
• There may be exponentially many new inequalities generated!
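A tiny worked instance (added): to eliminate $j$ from $\{i \le j,\ j \le 5,\ 3 \le i\}$, pair the lower bound $i \le j$ with the upper bound $j \le 5$ to get $i \le 5$; eliminating $i$ from $\{3 \le i,\ i \le 5\}$ then gives $3 \le 5$, which holds, so the original system is consistent.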
Fourier-Motzkin Test
• The loop bounds plus the array index expressions generate sets of inequalities, using new loop indices $i'$ for the sink of the dependence.
• Each direction vector generates the inequalities
$i_1 = i_1' \wedge \dots \wedge i_{k-1} = i_{k-1}' \wedge i_k < i_k'$
• If all these systems are inconsistent, then there exists no dependence.
• This test is not exact (there may be real solutions but no integer ones), but it is almost exact.
N-Dimensional Arrays
• Test in each dimension separately.
• This can introduce another level of
inaccuracy.
• Some tests (FM and Omega test) can test
in many dimensions at the same time.
• Otherwise, you can linearize an array:
Transform a logically N-dimensional array
to its one-dimensional storage format.
Hierarchy of Tests
• Try a cheap test first, then more expensive ones:
if (test1 = NO)
  then print 'NO'
  else if (test2 = NO)
    then print 'NO'
    else if (test3 = NO)
      then print 'NO'
      else …
Practical Dependence Testing
• Cheap tests, like GCD and Banerjee tests,
can disprove many dependences.
• Adding expensive tests only disproves a
few more possible dependences.
• The compiler writer needs to trade off compilation time against the accuracy of dependence testing.
• For time-critical applications, expensive tests like the Omega test (exact!) can be used.