MIT OpenCourseWare 6.189 Multicore Programming Primer, January (IAP) 2007

advertisement
MIT OpenCourseWare
http://ocw.mit.edu
6.189 Multicore Programming Primer, January (IAP) 2007
Please use the following citation format:
Rodric Rabbah, 6.189 Multicore Programming Primer, January (IAP)
2007. (Massachusetts Institute of Technology: MIT OpenCourseWare).
http://ocw.mit.edu (accessed MM DD, YYYY). License: Creative
Commons Attribution-Noncommercial-Share Alike.
Note: Please use the actual date you accessed this material in your citation.
For more information about citing these materials or our Terms of Use, visit:
http://ocw.mit.edu/terms
6.189 IAP 2007
Lecture 17
The Raw Experience
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
1
6.189 IAP 2007 MIT
Raw Chips
October 02
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
2
6.189 IAP 2007 MIT
Raw Microprocessor
● Tiled microprocessor with
point-to-point pipelined scalar
operand network
● Each tiles is 4 mm x 4mm
•
MIPS-style compute processor
–
–
–
Single-issue 8-stage pipe
32b FPU
32K D Cache, I Cache
● 4 on-chip mesh networks
•
•
•
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
3
Two for operands
One for cache misses, I/O
One for message passing
6.189 IAP 2007 MIT
Raw Microprocessor
●
●
●
●
16 tiles (16 issue)
180 nm ASIC (IBM SA-27E)
~100 million transistors
1 million gates
● 3-4 years of development
● 1.5 years of testing
● 200K lines of test code
● Core Frequency:
•
•
425 MHz @ 1.8 V
500 MHz @ 2.2 V
● Frequency competitive with IBM-
implemented PowerPCs in same
process
● 18W average power
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
4
6.189 IAP 2007 MIT
One Cycle in the Life of a Tiled Processor
Image by MIT OCW.
Image by MIT OCW.
● Application uses as many tiles as needed to exploit its parallelism
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
5
6.189 IAP 2007 MIT
Raw Motherboard
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
6
6.189 IAP 2007 MIT
Raw in Action
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
7
6.189 IAP 2007 MIT
MPEG-2 Encoder Performance
350 x 240 Images
720 x 480 Images
16
16
4
Speedup
30
8
30
4
10
1
10
1
1
4
16
8
4
# of Tiles
8
# of Tiles
„
„
„
„
Square – Linear speedup
Diamond – Hand-optimized, slice parallel implementation
Circle – Slice parallel implementation
Triangle – Baseline macroblock parallel implementation
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
8
6.189 IAP 2007 MIT
Frames/s
8
60
Frames/s
Speedup
60
MPEG-2 Encoder Performance
Encoding Rate (frames/s)
# Tiles
352 x 240
640 x 480
720 x 480
1
4.30
1.14
1.00
2
8.48
2.24
1.97
4
16.18
4.45
3.84
8
30.82
8.69
7.52
16
58.65
16.74
14.57
32
103*
30*
64
158*
51.90
* Estimated data rates
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
9
6.189 IAP 2007 MIT
Programmable Graphics Pipeline
Input
Vertex
V
Vertex
Sync
Triangle Setup
Pixel
P
Pixel
simplified graphics pipeline
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
10
6.189 IAP 2007 MIT
Phong Shading
● Per-pixel phong-shaded
polyhedron
● 162 vertices, 1 light
Output, rendered using Raw simulator
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
11
6.189 IAP 2007 MIT
Phong Shading (64-tiles)
Fixed pipeline
Reconfigurable pipeline
● 33% faster
● 150% better
utilization
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
12
6.189 IAP 2007 MIT
Shadow Volumes
● 4 textured triangles
● 1 point light
● Rendered in 3 passes
Output, rendered using Raw simulator
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
13
6.189 IAP 2007 MIT
Shadow Volumes (64-tiles)
Fixed pipeline
Pass 1
Pass 2
Pass 3
Reconfigurable pipeline
Pass 1 Pass 2
Pass 3
● 40% faster
cycles
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
14
6.189 IAP 2007 MIT
1020 Element Microphone Array
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
15
6.189 IAP 2007 MIT
Case Study: Beamformer
1,600
1,430
1,400
MFLOPS
1,200
1,000
800
640
600
400
240
200
19
0
C program
C program
1 GHz Pentium III 420 MHz single tile
Raw
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
16
Unoptimized
StreamIt
Optimized StreamIt
420 MHz 64 tile
Raw
420 MHz 16 tile
Raw
6.189 IAP 2007 MIT
The Raw Experience
● Insights into the design Raw architecture
● Raw parallelizing compiler
● StreamIt language and Compiler
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
17
6.189 IAP 2007 MIT
Scalability Problems in Wide Issue Processors
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
Bypass Net
ALU
RF
ALU
Wide
Fetch
(16 inst)
Control
ALU
PC
Unified
Load/Store
Queue
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 18
6.189 IAP 2007 MIT
Area and Frequency Scalability Problems
2
~N
ALU
ALU
ALU
ALU
ALU
ALU
ALU
N ALUs
ALU
~N3
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
ALU
ALU
ALU
ALU
ALU
ALU
Bypass Net
ALU
ALU
RF
19
6.189 IAP 2007 MIT
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
Operand Routing is Global
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
RF
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
20
Bypass Net
+
>>
6.189 IAP 2007 MIT
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
Idea: Make Operand Routing Local
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
ALU
ALU
ALU
ALU
ALU
ALU
Bypass Net
ALU
ALU
RF
21
6.189 IAP 2007 MIT
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
Idea: Exploit Locality
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
RF
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
22
Bypass Net
6.189 IAP 2007 MIT
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
Replace Crossbar with Point-To-Point Network Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
23
ALU
ALU
ALU
ALU
RF
6.189 IAP 2007 MIT
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 24
ALU
>>
ALU
RF
ALU
+
ALU
Replace Crossbar with Point-To-Point Network 6.189 IAP 2007 MIT
Operand Transport Latency
Crossbar
Point-to-Point Network
Non-local
Placement
~N
~ N½
Locality-driven
Placement
~N
~1
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
25
6.189 IAP 2007 MIT
Distribute the Register File
RF
ALU
RF
ALU
ALU
RF
RF
26
RF
ALU
ALU
RF
ALU
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
RF
ALU
ALU
RF
RF
RF
RF
ALU
ALU
RF
ALU
ALU
RF
ALU
ALU
ALU
RF
RF
RF
ALU
RF
6.189 IAP 2007 MIT
More Scalability Problems
RF
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
ALU
RF
ALU
RF
27
RF
ALU
ALU
RF
ALU
Unified
Load/Store
Queue
ALU
ALU
RF
RF
ALU
E
L
B
A
L
A
C
S
RF
ALU
ALU
RF
RF
RF
ALU
Control
RF
ALU
ALU
RF
Wide
Fetch
(16 inst)
RF
ALU
ALU
PC
RF
ALU
RF
6.189 IAP 2007 MIT
More Scalability Problems
PC
Wide
Fetch
(16 inst)
Control
Unified
Load/Store
Queue
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
28
6.189 IAP 2007 MIT
Distribute Everything
I$
D$
ALU
RF
I$
D$
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
D$
29
RF
I$
ALU
ALU
RF
PC
RF
PC
D$
PC
I$
D$
ALU
ALU
ALU
RF
PC
I$
RF
D$
PC
I$
PC
RF
I$
D$
RF
ALU
RF
PC
I$
Unified
Load/Store D$
Queue
I$
D$
PC
I$
D$
I$
PC
RF
RF
PC
RF
I$
ALU
D$
PC
PC
D$
D$
ALU
RF
RF
ALU
I$
I$
ALU
Control
PC
I$
PC
RF
D$
ALU
Wide
Fetch
(16 inst)
PC
ALU
D$
ALU
I$
ALU
PC
RF
ALU
PC
D$
6.189 IAP 2007 MIT
Tiled Processor
I$
I$
D$
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
D$
30
PC
RF
I$
ALU
ALU
PC
RF
I$
D$
RF
ALU
RF
PC
RF
ALU
ALU
ALU
PC
I$
D$
PC
I$
D$
RF
PC
I$
RF
ALU
ALU
ALU
RF
RF
PC
I$
D$
RF
D$
ALU
RF
PC
D$
I$
D$
PC
I$
PC
RF
PC
D$
D$
PC
I$
RF
ALU
RF
D$
I$
D$
I$
D$
ALU
I$
I$
PC
RF
ALU
PC
PC
ALU
D$
ALU
I$
RF
ALU
PC
D$
6.189 IAP 2007 MIT
Tiled Processor
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
31
6.189 IAP 2007 MIT
Tiled Processor (Taylor PhD 2007)
● Fast inter-tile
communication through
point-to-point pipelined
scalar operand network
(SON)
● Easy to scale for the
same reasons as
multicores
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
32
6.189 IAP 2007 MIT
Raw Compute Processor Internals
r24
r24
r25
r25
r26
r27
IF
r26
E
M1
D
r27
M2
A
TL
F
P
RF
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
U
33
6.189 IAP 2007 MIT
Tile-Tile Communication
add $25,$1,$2
Route $P->$E
Route $W->$P
sub $20,$1,$25
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
34
6.189 IAP 2007 MIT
Why Communication Is Expensive on Multicores
Multiprocessor Node 1
send
occupancy
Multiprocessor Node 2
Transport Cost
receive
receive
overhead
latency
send
send
overhead
latency
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
receive
occupancy
35
6.189 IAP 2007 MIT
Why Communication Is Expensive on Multicores
Multiprocessor Node 1
send
occupancy
send
latency
Destination node name
Sequence number
Value
Launch sequence
Commit Latency
Network injection
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
36
6.189 IAP 2007 MIT
Why Communication Is Expensive on Multicores
Multiprocessor Node 2
receive sequence
receive
occupancy
injection cost
receive
latency
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
37
6.189 IAP 2007 MIT
A Figure of Merit for SONs
● 5-tuple <SO, SL, NHL, RL, RO>
• • • • • Send occupancy
Send latency
Network hop latency
Receive latency
Receive occupancy
● Tip: Ordering follows timing of message from sender
to receiver
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
38
6.189 IAP 2007 MIT
The Interesting Region
Scalable
Multiprocessor
● Where is Cell in
this space?
<2, 14, 3, 14,4>
(on-chip)
Raw SON < 0, 0, 1, 2, 0>
Superscalar
< 0, 0, 0, 0, 0>
(scalable)
(not scalable) Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
39
6.189 IAP 2007 MIT
The Raw Experience
● Insights into the design Raw architecture
● Raw parallelizing compiler
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
40
6.189 IAP 2007 MIT
Raw Parallelizing Compiler (Lee PhD 2005)
Sequential
Program
Compiler
● Data distribution
● Instruction distribution
● Coordination
• • Communication
Control flow
Fine-grained
Orchestrated
Parallel Program
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
41
6.189 IAP 2007 MIT
Data Distribution
?
load r1 <- addr
add r1 <- r1, 1
?
M
M
M
M
?
?
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
42
6.189 IAP 2007 MIT
Instruction Distribution
seed.0=seed
pval1=seed.0*3.0
v1.2=v1
v2.4=v2
pval5=seed.0*6.0
pval0=pval1+2.0
pval2=seed.0*v1.2
pval3=seed.o*v2.4
pval4=pval5+2.0
tmp1.3=pval2+2.0
tmp0.1=pval0/2.0
tmp2.5=pval3+2.0
tmp3.6=pval4/3.0
tmp1=tmp1.3
tmp0=tmp0.1
tmp2=tmp2.5
pval7=tmp1.3+tmp2.5
tmp3=tmp3.6
pval6=tmp1.3-tmp2.5
v1.8=pval7*3.0
v2.7=pval6*5.0
v0.9=tmp0.1-v1.8
v1=v1.8
v0=v0.9
v3.10=tmp3.6-v2.7
v2=v2.7
v3=v3.10
basic block
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
43
6.189 IAP 2007 MIT
Clustering: Parallelism vs. Communication
seed.0=seed
seed.0=seed
pval1=seed.0*3.0
pval1=seed.0*3.0
v1.2=v1
v1.2=v1
v2.4=v2
v2.4=v2
pval5=seed.0*6.0
pval5=seed.0*6.0
pval0=pval1+2.0
pval2=seed.0*v1.2
pval0=pval1+2.0
pval2=seed.0*v1.2
pval3=seed.o*v2.4
pval3=seed.o*v2.4
pval4=pval5+2.0
pval4=pval5+2.0
tmp1.3=pval2+2.0
tmp1.3=pval2+2.0
tmp0.1=pval0/2.0
tmp0.1=pval0/2.0
tmp2.5=pval3+2.0
tmp3.6=pval4/3.0
tmp1=tmp1.3
tmp0=tmp0.1
tmp2=tmp2.5
pval7=tmp1.3+tmp2.5
tmp3=tmp3.6
tmp3=tmp3.6
pval6=tmp1.3-tmp2.5
pval6=tmp1.3-tmp2.5
v1.8=pval7*3.0
v1.8=pval7*3.0
v2.7=pval6*5.0
v2.7=pval6*5.0
v0.9=tmp0.1-v1.8
v0.9=tmp0.1-v1.8
v1=v1.8
v0=v0.9
tmp3.6=pval4/3.0
tmp1=tmp1.3
tmp0=tmp0.1
tmp2=tmp2.5
pval7=tmp1.3+tmp2.5
tmp2.5=pval3+2.0
v1=v1.8
v3.10=tmp3.6-v2.7
v0=v0.9
v2=v2.7
v2=v2.7
v3=v3.10
v3=v3.10
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
v3.10=tmp3.6-v2.7
44
6.189 IAP 2007 MIT
Adjusting Granularity: Load Balancing
seed.0=seed
pval1=seed.0*3.0
seed.0=seed
pval1=seed.0*3.0
v1.2=v1
v1.2=v1
v2.4=v2
v2.4=v2
pval5=seed.0*6.0
pval0=pval1+2.0
pval5=seed.0*6.0
pval2=seed.0*v1.2
pval0=pval1+2.0
pval3=seed.o*v2.4
pval2=seed.0*v1.2
pval3=seed.o*v2.4
pval4=pval5+2.0
pval4=pval5+2.0
tmp1.3=pval2+2.0
tmp1.3=pval2+2.0
tmp0.1=pval0/2.0
tmp0.1=pval0/2.0
tmp2.5=pval3+2.0
tmp3.6=pval4/3.0
tmp1=tmp1.3
tmp0=tmp0.1
tmp0=tmp0.1
tmp2=tmp2.5
pval7=tmp1.3+tmp2.5
tmp2.5=pval3+2.0
tmp2=tmp2.5
pval7=tmp1.3+tmp2.5
tmp3=tmp3.6
pval6=tmp1.3-tmp2.5
tmp3=tmp3.6
pval6=tmp1.3-tmp2.5
v1.8=pval7*3.0
v1.8=pval7*3.0
v2.7=pval6*5.0
v2.7=pval6*5.0
v0.9=tmp0.1-v1.8
v0.9=tmp0.1-v1.8
v1=v1.8
v0=v0.9
tmp3.6=pval4/3.0
tmp1=tmp1.3
v1=v1.8
v3.10=tmp3.6-v2.7
v0=v0.9
v2=v2.7
v3=v3.10
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
v3.10=tmp3.6-v2.7
v2=v2.7
v3=v3.10
45
6.189 IAP 2007 MIT
Placement
seed.0=seed
seed.0=seed
pval1=seed.0*3.0
pval0=pval1+2.0
pval1=seed.0*3.0
v1.2=v1
pval4=pval5+2.0
tmp3.6=pval4/3.0
tmp3=tmp3.6
v2.4=v2
tmp0.1=pval0/2.0
pval5=seed.0*6.0
pval0=pval1+2.0
pval5=seed.0*6.0
v3.10=tmp3.6-v2.7
pval2=seed.0*v1.2
pval3=seed.o*v2.4
tmp0=tmp0.1
pval4=pval5+2.0
tmp1.3=pval2+2.0
v3=v3.10
tmp0.1=pval0/2.0
tmp2.5=pval3+2.0
tmp3.6=pval4/3.0
tmp1=tmp1.3
tmp0=tmp0.1
tmp2=tmp2.5
pval7=tmp1.3+tmp2.5
v1.2=v1
v2.4=v2
pval2=seed.0*v1.2
pval3=seed.o*v2.4
tmp1.3=pval2+2.0
tmp2.5=pval3+2.0
tmp3=tmp3.6
pval6=tmp1.3-tmp2.5
v1.8=pval7*3.0
v2.7=pval6*5.0
v0.9=tmp0.1-v1.8
v1=v1.8
v0=v0.9
tmp1=tmp1.3
v3.10=tmp3.6-v2.7
pval7=tmp1.3+tmp2.5
tmp2=tmp2.5
pval6=tmp1.3-tmp2.5
v2=v2.7
v3=v3.10
v1.8=pval7*3.0
v0.9=tmp0.1-v1.8
v2.7=pval6*5.0
v2=v2.7
v1=v1.8
v0=v0.9
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
46
6.189 IAP 2007 MIT
Communication Coordination
Proc
Proc
Proc
receive
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
47
Switch
Switch
routes
Proc
send
6.189 IAP 2007 MIT
Instruction Scheduling
v1.2=v1
v2.4=v2
seed.0=seed
send(seed.0)
pval1=seed.0*3.0
seed.0=recv(0)
pval3=seed.o*v2.4
route(t,E)
route(W,t)
route(t,E)
route(N,t)
seed.0=recv()
pval2=seed.0*v1.2
seed.0=recv()
pval5=seed.0*6.0
pval0=pval1+2.0
tmp0.1=pval0/2.0
tmp2.5=pval3+2.0
pval4=pval5+2.0
tmp3.6=pval4/3.0
tmp1.3=pval2+2.0
tmp2=tmp2.5
send(tmp2.5)
route(t,E)
route(E,t)
send(tmp1.3)
tmp1=tmp1.3
tmp1.3=recv()
pval6=tmp1.3-tmp2.5
tmp2.5=recv()
v2.7=pval6*5.0
tmp0.1=recv()
pval7=tmp1.3+tmp2.5
route(t,W)
route(W,t)
send(tmp0.1)
tmp0=tmp0.1
tmp3=tmp3.6
route(t,E)
route(W,S)
route(W,t)
v1.8=pval7*3.0
Send(v2.7)
v2=v2.7
route(t,E)
route(W,N)
v1=v1.8
v0.9=tmp0.1-v1.8
route(S,t)
v2.7=recv()
v3.10=tmp3.6-v2.7
v0=v0.9
v3=v3.10
● Inter-tile cycle scheduling schedules communication
and can guarantee deadlock freedom
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
48
6.189 IAP 2007 MIT
Final Code Representation
v2.4=v2
route(t,E)
v1.2=v1
route(N,t)
seed.0=recv()
route(W,t)
seed.0=seed
route(t,E)
seed.0=recv(0)
route(t,E)
seed.0=recv()
route(t,W)
pval5=seed.0*6.0
route(W,S)
send(seed.0)
route(t,E)
pval3=seed.o*v2.4
route(E,t)
pval2=seed.0*v1.2
route(W,t)
pval4=pval5+2.0
route(S,t)
pval1=seed.0*3.0
tmp2.5=pval3+2.0
route(t,E)
tmp1.3=pval2+2.0
route(W,t)
tmp3.6=pval4/3.0
pval0=pval1+2.0
tmp2=tmp2.5
send(tmp1.3)
route(W,N)
tmp3=tmp3.6
tmp0.1=pval0/2.0
send(tmp2.5)
tmp1=tmp1.3
v2.7=recv()
send(tmp0.1)
tmp1.3=recv()
tmp2.5=recv()
v3.10=tmp3.6-v2.7
tmp0=tmp0.1
pval6=tmp1.3-tmp2.5
tmp0.1=recv()
v3=v3.10
v2.7=pval6*5.0
pval7=tmp1.3+tmp2.5
Send(v2.7)
v1.8=pval7*3.0
v2=v2.7
v1=v1.8
v0.9=tmp0.1-v1.8
v0=v0.9
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
49
6.189 IAP 2007 MIT
Control Coordination
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
50
6.189 IAP 2007 MIT
Asynchronous Global Branching
br x
br x
br x
br x
br x
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
51
6.189 IAP 2007 MIT
Summary
● Tiled microprocessors incorporate the best elements
of superscalars and multiprocessors
Superscalar
Multicore
Tiled Processor
with SON
PE-PE communication
Free
Expensive
Cheap
exploitation
of parallelism
Implicit
Explicit
Both
Clean semantics
Yes
No
Yes
scalable
No
Yes
Yes
power efficient
No
Yes
Yes
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
52
6.189 IAP 2007 MIT
Raw Project Contributors
Anant Agarwal
Saman Amarasinghe
Jonathan Babb
Rajeev Barua
Ian Bratt
Jonathan Eastep
Matt Frank
Ben Greenwald
Henry Hoffmann
Paul Johnson
Jason Kim
Theo Konstantakopoulos
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
Walter Lee
Jason Miller
James Psota
Arvind Saraf
Vivek Sarkar
Nathan Shnidman
Volker Strumpen
Michael Taylor
Elliot Waingold
David Wentzlaff
53
6.189 IAP 2007 MIT
Download