Executing a Program on the
MIT Tagged-Token Dataflow
Architecture*
(This is true black magic)
Presented by: Michael Bauer
ECE 259/CPS 221
Spring Semester 2008
Dr. Lebeck
* Based on Arvind and R. S. Nikhil, “Executing a Program on the MIT Tagged-Token Dataflow Architecture,”
in IEEE Transactions on Computers, March 1990
Arvind: Tagged-Token Dataflow Architecture
Pai-Mei: Five Point Palm Exploding Heart Technique
Notice they both have only one name and a special power with eight syllables.
Outline
1. Motivation
2. What is Dataflow?
3. Id
4. I-Structures
5. Tokens
6. Id Constructs
7. Putting It Together: Single Dataflow Processor
8. Supporting Multiple Processors
9. Conclusions
Motivation
- Lots of transistors to do computation
- Von Neumann model doesn’t fully exploit available parallelism
- Parallel programming is hard; dataflow is an alternative way of expressing parallelism
- Memory and communication latencies are growing
So what exactly is dataflow?
What is Dataflow?
- Dataflow is a non-von Neumann model of computation
- No concept of instruction order (as specified by a program counter)
- No separation between memory and computation (no longer any concept of load/store ordering)
- A program is specified as a dataflow graph: the movement of data from one operation to the next
- Static dataflow specifies resources at compile time, similar to VLIW*
- Dynamic dataflow performs resource allocation at runtime (the MIT architecture is dynamic dataflow)*
* Definitions from “Dataflow Architectures and Multithreading” in IEEE Micro, August 1994.
Id
Id, ego, superego...
…not that Id
- Id is a functional programming language
- Everything based on primitives and functions
- No objects (note: this is a fundamental problem for dataflow machines)
- Can be compiled into explicit dataflow graphs
[Figure: dataflow graph for the expression s + A[j] * B[j], with labeled arcs carrying data between operators]
- Dataflow graphs can then be executed by the dataflow machine
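To make that concrete, here is a tiny sketch (Python, purely illustrative; the node names and dictionary representation are my own assumptions, not anything from the paper) of how s + A[j] * B[j] decomposes into operators connected by arcs and fires in data-dependence order:

```python
# Hypothetical node/arc representation of the dataflow graph for s + A[j] * B[j].
# Each node fires as soon as all of its input arcs carry a value.

graph = {
    # node: (operation, input arcs)
    "a_j":  ("array-select", ["A", "j"]),   # A[j]
    "b_j":  ("array-select", ["B", "j"]),   # B[j]
    "prod": ("*",            ["a_j", "b_j"]),
    "sum":  ("+",            ["s", "prod"]),
}

def evaluate(graph, inputs):
    """Fire nodes in data-dependence order; no program counter is involved."""
    values = dict(inputs)                    # arcs that already carry values
    pending = dict(graph)
    while pending:
        for node, (op, args) in list(pending.items()):
            if all(a in values for a in args):       # all input tokens present
                x = [values[a] for a in args]
                if op == "array-select":
                    values[node] = x[0][x[1]]
                elif op == "*":
                    values[node] = x[0] * x[1]
                elif op == "+":
                    values[node] = x[0] + x[1]
                del pending[node]
    return values["sum"]

# s + A[j] * B[j] with s = 10, j = 2
print(evaluate(graph, {"s": 10, "j": 2, "A": [1, 2, 3], "B": [4, 5, 6]}))  # 10 + 3*6 = 28
```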
I-Structures
Dataflow is inherently stateless
I-structures add some sense of state to aid execution without compromising parallelism
I-structures are composed of:
1. A tag associated with the I-structure
2. Some number of physical locations to store data
3. A label for each location specifying whether the location is ‘absent’, ‘waiting’, or ‘present’
4. A queue for each location of tokens waiting to access that particular location
Two most important things to remember:
1. I-structures can only be written to once
2. I-structures can be allocated and their tag returned before any data has been written to them; all read requests block (are deferred) until a write occurs
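A rough sketch of those semantics (Python; the class and method names are assumptions of mine, not the hardware interface) with the absent/waiting/present states, the write-once rule, and a deferred-read queue:

```python
class IStructureSlot:
    """Sketch of one I-structure location: write-once, reads deferred until the write."""

    def __init__(self):
        self.state = "absent"      # 'absent' -> 'waiting' -> 'present'
        self.value = None
        self.deferred = []         # queue of readers waiting on this location

    def read(self, consumer):
        """consumer is a callback standing in for the token that wants the value."""
        if self.state == "present":
            consumer(self.value)               # data already here: respond immediately
        else:
            self.deferred.append(consumer)     # otherwise queue the request
            self.state = "waiting"

    def write(self, value):
        if self.state == "present":
            raise RuntimeError("I-structure locations may only be written once")
        self.value = value
        self.state = "present"
        for consumer in self.deferred:         # release every deferred reader
            consumer(value)
        self.deferred.clear()


# Reads may be issued before the producer writes; they are queued until then.
slot = IStructureSlot()
slot.read(lambda v: print("reader 1 got", v))
slot.read(lambda v: print("reader 2 got", v))
slot.write(42)        # both deferred readers now fire
```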
Tokens
Tokens represent the propagation of data along the edges of the dataflow graph
Token format:
<c.s, v>_p
c – context (specifies which “frame” the token is part of; used to resolve which dynamic invocation of a loop or a function call is referred to)
s – address of the destination instruction
v – the actual data
p – which input of the operation this is (e.g. the left or right operand of divide t1 t2)
c.s – is called the tag of the token
Operations take two tokens with matching tags and generate a new token:
op: <c.s, v1>_l x <c.s, v2>_r -> <c.t, (v1 op v2)>
where t is the address of the instruction that consumes the result of s.
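As a small illustration (Python; the names are mine and the port handling is simplified), two tokens with the same tag c.s meet at an operation and produce one token carrying the destination tag c.t:

```python
from collections import namedtuple
import operator

# <c.s, v>_p : tag = (context, statement address), v = value, p = input port ('l' or 'r')
Token = namedtuple("Token", ["context", "stmt", "value", "port"])

def fire(op, left, right, dest_stmt):
    """op: <c.s,v1>_l x <c.s,v2>_r -> <c.t, v1 op v2>, where t is the destination instruction."""
    assert (left.context, left.stmt) == (right.context, right.stmt)   # tags must match
    assert left.port == "l" and right.port == "r"
    result = op(left.value, right.value)
    # The destination port depends on which input the result feeds; 'l' is assumed here.
    return Token(left.context, dest_stmt, result, "l")

t1 = Token(context=7, stmt=12, value=6.0, port="l")
t2 = Token(context=7, stmt=12, value=3.0, port="r")
print(fire(operator.truediv, t1, t2, dest_stmt=13))   # <7.13, 2.0>_l
```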
Id Constructs (1) Conditionals
Conditionals are inherently hard for dataflow
Instead, utilize switch operators and combine operators
Switch operators only generate a token on one of their outputs
Note: these are not actual operators, just symbolic (I’m also ignoring the idea of a dataflow graph being well-behaved)
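A sketch of the switch behavior (Python, symbolic just like the slide; the function name is an assumption of mine): a data token and a boolean control token produce a token on exactly one of the two outputs, so only the taken arm of the conditional ever receives data:

```python
def switch(data_token, control_token):
    """Route the data token to the 'true' or 'false' output; the other output stays empty."""
    if control_token:
        return {"true": data_token, "false": None}
    else:
        return {"true": None, "false": data_token}

# Only the taken branch of the conditional gets a token to compute with.
print(switch(41, True))    # {'true': 41, 'false': None}
print(switch(41, False))   # {'true': None, 'false': 41}
```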
Id Constructs (2) Loops
Loops generate problems for dataflow due to their asynchronous nature.
What happens if ‘s’ tokens race ahead of ‘j’ tokens for different contexts?
Use loop throttling to control the rate at which different operations are performed.
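One way to picture loop throttling (a sketch under my own assumptions; the real mechanism in the architecture is more involved) is a bound k on how many iteration contexts may be in flight at once, so later iterations cannot race arbitrarily far ahead:

```python
from collections import deque

def throttled_loop(num_iterations, k):
    """Allow at most k iteration contexts to be active at a time (bounded-loop sketch)."""
    active = deque()                 # iteration indices currently holding a context
    for j in range(num_iterations):
        if len(active) == k:
            finished = active.popleft()          # wait for the oldest iteration to finish
            print(f"iteration {finished} done, context recycled")
        active.append(j)
        print(f"iteration {j} issued ({len(active)} contexts in flight)")
    while active:
        print(f"iteration {active.popleft()} done")

throttled_loop(num_iterations=5, k=2)
```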
Id Constructs (3) Functions
The architecture only supports operations with two inputs
How do you handle an n-argument function?
Idea: partial functions (see the sketch after this slide)
Recursively use I-structures to represent the (n-1)-argument functions until reaching n = 2
How do we handle recursive function calls?
Manager programs generate separate contexts and also allocate I-structures
Aside: How do we know to release I-structures after a function call returns?
Well-behaved dataflow graphs: all operations must have a token argument on each input and must generate a token output.
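The partial-function idea is essentially currying: applying one argument to an n-argument function yields an (n-1)-argument function, and so on until a plain two-input operation remains. A minimal Python sketch (names and structure are mine; the slides build these intermediate closures out of I-structures instead):

```python
def curry(f, n):
    """Turn an n-argument function into a chain of one-argument partial applications."""
    def apply_one(args_so_far):
        if len(args_so_far) == n:
            return f(*args_so_far)
        # Each partial application stands in for the structure holding the arguments so far.
        return lambda next_arg: apply_one(args_so_far + [next_arg])
    return apply_one([])

def dot3(a, b, c):          # a hypothetical 3-argument function
    return a + b * c

g = curry(dot3, 3)
print(g(10)(2)(3))          # 10 + 2*3 = 16
```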
Putting it Together: Single Dataflow Processor
1. Look at incoming/generated tokens and try to match them with operations in the Wait-Match Unit (WMU)
2. If all necessary tokens have arrived, fetch the instruction, any constants from memory, and data from I-structures
3. Perform the data operation and compute the tag for the next token
4. Generate the output token and forward it back to the WMU and the network
Where do you see the bottlenecks?
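A toy, sequential rendering of that loop (Python; the program encoding and token format are my own assumptions, not the TTDA’s) covering steps 1-4:

```python
import operator
from collections import defaultdict

# Program memory: statement address -> (operation, (destination statement, destination port)).
program = {
    12: (operator.mul, (13, "r")),   # A[j] * B[j]
    13: (operator.add, None),        # s + ... ; None means this is the final result
}

def run(tokens):
    wait_match = defaultdict(dict)            # tag (context, stmt) -> {port: value}
    while tokens:
        ctx, stmt, value, port = tokens.pop()
        slot = wait_match[(ctx, stmt)]
        slot[port] = value
        if len(slot) < 2:                     # 1. wait in the WMU until the partner arrives
            continue
        op, dest = program[stmt]              # 2. fetch the instruction (and any constants)
        result = op(slot["l"], slot["r"])     # 3. execute and compute the next tag
        del wait_match[(ctx, stmt)]
        if dest is None:
            print("result:", result)
        else:
            d_stmt, d_port = dest             # 4. form the output token and recirculate it
            tokens.append((ctx, d_stmt, result, d_port))

# s + A[j] * B[j] with s = 10, A[j] = 3, B[j] = 6, all in context 7
run([(7, 12, 3, "l"), (7, 12, 6, "r"), (7, 13, 10, "l")])   # prints: result: 28
```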
Supporting Multiple Processors
- Have multiple processing elements (PEs) connected by a token-passing network
- Can hide latency, since other work can occur at a PE before messages arrive
- Problem: How do managers coordinate across PEs?
- Problem: What if the application lacks sufficient parallelism?
Multi-threading, anyone?
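One common way to picture the distribution of work (an assumption on my part; the slides don’t specify the mapping) is hashing a token’s tag to choose the PE that will match and execute it, so partner tokens always meet at the same PE:

```python
def home_pe(context, stmt, num_pes):
    """Map a token's tag to a processing element; tokens with the same tag meet at one PE."""
    return hash((context, stmt)) % num_pes

# Both operands of an operation land on the same PE, so tag matching stays local to that PE.
print(home_pe(context=7, stmt=12, num_pes=4))
print(home_pe(context=7, stmt=12, num_pes=4))   # same value as above
```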
Conclusion
Dataflow architectures are very good at exploiting parallelism
In practice they suffer from several pathologies*:
1. Associative search to match tags doesn’t scale
2. Resource allocation is difficult as the number of resources increases
3. Handling data structures/objects is very difficult (e.g. SIMD)
4. Can’t get enough memory close enough to the processor
Von Neumann machines act enough like dataflow to perform better (e.g. out-of-order execution, superscalar, branch prediction)
Maybe we can get around some of these:
- Better ways of performing tag matching
- IRAM (Patterson, 1998) to get DRAM on chip
- New languages / programming models
- Modified memory interaction (e.g. use a von Neumann memory model, like WaveScalar)
* Taken from “Dataflow Architectures and Multithreading” in IEEE Micro, August 1994.