Alviso
Rick McGeer (HP)
Erik Rubow (Ericsson)
Stephen Lonergan (U Vic)
Amin Vahdat (UCSD)
© 2008 Hewlett-Packard Development Company, L.P.
The information contained herein is subject to change without notice
Outline
• Motivation
• A Quick Tour of Alviso
• The Problem of Unrestricted Communication in Parallel Systems
• Lessons from Hardware: Restricted Communication between modules
• Alviso: A Synchronous Language
• Restricted combinational communication
− Motivation
− The Mutex statement
− Strict priority on processes
− Recovering maximal parallelism
• Status and Conclusions
Alviso Motivation
• Make NetFPGA programming accessible to network designers
• NetFPGA: FPGA-based 4-port switch board
• Key to building high-speed software-defined networks
• Typical NetFPGA designer knows software routers, not VLSI design
NetFPGA
• Basic building block is a Xilinx FPGA (Virtex-II)
• Programming tools are Verilog (simulator-based HDL) and Synopsys/Cadence/Xilinx synthesis tools for FPGA
• Problems
− Verilog is a very low-level design tool
− Many details of hardware design must be mastered by the designer
− No high-level network-based design environment
• Why is this interesting?
− Software on general-purpose processors still can’t keep up with modern switching equipment
− Modern high-performance software routers require very substantial hardware (typically, GPGPUs)
Alviso
• C-like language whose modules can be easily realized as either hardware or software
• Some restrictions
− No memory allocation
− No functions or recursion
− Built-in parallelism
− No forks
Alviso Elements
• Module: Basic unit of design
− Roughly equivalent to an Object in software design
− Container of processes (see below) and variables
− No shared variables across module boundaries!
− All communications into/out of modules through “ports” (similar to software parameters)
− Ports are exactly equivalent to hardware ports
Alviso Elements
• Process: Basic element of computation
• Roughly equivalent to a thread in software
• Equivalent to a block of logic in hardware
• Begins immediately on load
• Runs to completion
Alviso Elements
• Port: Variable explicitly written or read by a process in a module
• Sole means of communication into/out of a module
• Equivalent to a hardware port
− Always latched (see below)
• Roughly equivalent (in software) to a public object variable with a get/set method (read = get, write = set)
The Problem of Parallel Design
• The central assumption of design: the finite state model of computation
• Every variable is a little FSM
− Quiescent unless explicitly perturbed by an instruction
− But parallel design breaks this model for shared variables
Sequential code:
    x = 2;
    x = x + 1;
x = 3…right?
Parallel code:
    proc thread1() {
        x = 2;
        x = x + 1;
    }
    proc thread2() {
        x = x * 100;
    }
Value of x is indeterminate: it depends on how the statements of the two threads interleave
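To make the indeterminacy concrete, the following Python sketch enumerates every interleaving of the two threads above and collects the final values of x. It assumes each statement executes atomically and that x starts at 0; neither assumption comes from the slides, and real systems can interleave at a finer grain. The helper names (interleavings, final_values) are illustrative only.

```python
# Each statement is modelled as an atomic update of the shared variable x
# (a simplifying assumption, not something the slides specify).
THREAD1 = [lambda x: 2, lambda x: x + 1]   # x = 2; x = x + 1;
THREAD2 = [lambda x: x * 100]              # x = x * 100;

def interleavings(a, b):
    """Yield every merge of a and b that preserves each list's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def final_values(x0):
    """Return the set of final values of x over all interleavings."""
    results = set()
    for schedule in interleavings(THREAD1, THREAD2):
        x = x0
        for stmt in schedule:
            x = stmt(x)
        results.add(x)
    return results

print(sorted(final_values(x0=0)))  # prints [3, 201, 300]
```

Three schedules give three different answers, so the result depends entirely on an external scheduler.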
All the Problems in Parallel Design Break Down into Solving This
• How do we recover a semantically consistent, deterministic model of design with efficient communications?
• A key to efficient multicore programming, hardware/software codesign, …
• There are other problems, but until this one is solved, their solutions are all built on sand…
Historical Answer: Restrict Communications
• Problem is fundamentally one of communication
• Unrestricted asynchronous communication breaks design model
• Solution 1: No shared variables between threads
− Inefficient: effectively, every thread is in its own address space
• Solution 2: Locks and semaphores: restrict ability of other threads to play with state during computation
− Deadlock!
− Locks themselves become a nondeterministic, asynchronous communication channel…
Requirements of our Solution
• Semantics independent of external systems (e.g., a thread scheduler)
• Efficient communication between threads
• Designs fully implementable in either hardware or software
• Module behavior identical independent of hardware or software realization: semantics independent of implementation
− A caveat: mixed hardware/software systems will vary in behavior, depending on mix of hardware/software components
− Hardware components are much faster than software components
Lessons From Hardware
• Hardware Design is…
− Highly parallel
− Efficient
− Deterministic
− Independent of mysteries such as thread scheduling…
• How did those guys do that?
− And, more to the point, how can we?
Classic Hardware Design
• Banks of acyclic “combinational” logic, separated by clocked latches
• Data flows unidirectionally in logic; latches update at clock edge
[Figure: Latch → Logic → Latch → Logic]
Means…
• Acyclic logic: logic banks compute in fixed time, the length of the longest path through the circuit
• Latches update only on clock edges: value of logic inputs stable during computation
• Computation divided into “cycles” of fixed length: no communication between logic blocks during computation
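The latch/logic discipline can be sketched in a few lines of Python. This is only an illustration of the two-phase idea (evaluate all logic on stable inputs, then update every latch at once); the function clock_cycle and the two-stage pipeline are hypothetical and do not come from the slides.

```python
def clock_cycle(latches, logic):
    """One clock cycle: evaluate all logic on stable inputs, then latch.

    latches: dict mapping latch name -> current value
    logic:   dict mapping latch name -> function(snapshot) -> next value
    """
    snapshot = dict(latches)  # logic inputs stay stable during computation
    next_values = {name: f(snapshot) for name, f in logic.items()}
    return {**latches, **next_values}  # all latches update together at the clock edge

# Hypothetical two-stage pipeline: stage1 increments the input, stage2 doubles stage1.
logic = {
    "stage1": lambda s: s["inp"] + 1,
    "stage2": lambda s: s["stage1"] * 2,
}
state = {"inp": 5, "stage1": 0, "stage2": 0}
state = clock_cycle(state, logic)  # stage2 still sees the old stage1 (0)
state = clock_cycle(state, logic)  # now stage2 sees stage1 == 6
print(state)  # prints {'inp': 5, 'stage1': 6, 'stage2': 12}
```

Because stage2 reads only the snapshot, the order in which the logic functions are evaluated within a cycle cannot change the result.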
Mapping Alviso to Hardware
[Figure: the Latch → Logic → Latch → Logic diagram, annotated with the Alviso elements Ports, Process(es), and Variables]
Adapting Hardware to Languages
• Shared Variables == Latches
• Logic Blocks == Threads
• Threads run for fixed block of time, then “wait” for next cycle of computation
• Shared variables only update when all threads are waiting
• No interrupts, no locks, no semaphores…
Alviso
• Synchronous/Reactive Language
− Computation in “zero” time, communication takes time “one”
− Means: no communication while computing
− Follows: Esterel, Lustre, ReactiveC, Signal, V++, SMV
• C-like syntax
• Major innovation: “wait” statement
• “wait”: halt computation and wait for variables to update
• Each thread must execute a wait statement within a fixed period of time
• Means: each cyclic computation graph (aka, loop) must contain a wait statement
A quick example
    proc thread1 {
        x = 2;
        while(true) {
            x++;
            wait;
        }
    }
    proc thread2 {
        wait;
        while(true) {
            x <<= 1;
            wait;
        }
    }
x = 3
But what about after the wait?
x = 4
Answer: Deterministic Priority
• What happens with conflicts on shared variable updates?
− No effect on computation: updates only visible after wait
− But x can only have one value…which should we choose?
− Answer: priority. Processes have deterministic priority (total order on processes). In the event of conflict, higher-priority process wins
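The following Python sketch simulates the quick example under these rules, using generators so that "yield" plays the role of "wait". It rests on several assumptions not stated in the slides: within a cycle a process sees its own pending writes, x starts at 0, and thread1 has higher priority than thread2 (which is consistent with the values 3 and then 4 shown for the example). The function run and its read/write callbacks are hypothetical names, not part of Alviso.

```python
def thread1(read, write):
    write("x", 2)                        # x = 2;
    while True:
        write("x", read("x") + 1)        # x++;
        yield                            # wait;

def thread2(read, write):
    yield                                # wait;
    while True:
        write("x", read("x") << 1)       # x <<= 1;
        yield                            # wait;

def run(procs, shared, cycles):
    """Simulate `cycles` cycles; procs is ordered from highest to lowest priority."""
    pending = [dict() for _ in procs]    # writes made by each process this cycle

    def make_io(i):
        def read(name):
            # Assumption: a process sees its own writes, else the latched value.
            return pending[i].get(name, shared[name])
        def write(name, value):
            pending[i][name] = value
        return read, write

    gens = [proc(*make_io(i)) for i, proc in enumerate(procs)]
    history = []
    for _ in range(cycles):
        for gen in gens:
            next(gen, None)              # run the process up to its next wait
        # Commit lowest priority first, so higher-priority writes overwrite them.
        for writes in reversed(pending):
            shared.update(writes)
            writes.clear()
        history.append(dict(shared))
    return history

history = run([thread1, thread2], {"x": 0}, cycles=3)
print([h["x"] for h in history])  # prints [3, 4, 5]
```

Swapping the priority order, run([thread2, thread1], ...), gives 3, 6, 12 instead: still deterministic, but with thread2 winning each conflict.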
Alviso Computational Graph
• wait statements lead to a computation graph that is a forest of DAGs
− Roots of the DAGs: initial statements of processes and statements immediately following wait statements
− Leaves: final statements of processes and wait statements
• Computation terminates at a leaf on each cycle
• Starts on the next cycle at the subsequent root
• Computation in cycle is traversal of the DAG from root to leaf
Interprocess Zero-Delay Signaling
• Sometimes, you just have to break the rules
• Occasionally, processes need to signal each other in the same cycle
− To gain exclusive access to a shared variable, for example
− Multi-cycle locking too inefficient
• Almost every S/R language eventually incorporates some form of zero-delay interprocess signaling
− Exceptions: V++, ReactiveC
− Almost always makes a hash of the semantics
− Question: How can we do interprocess zero-delay signaling without making a mess?
Answer: Go Back to Hardware
• Zero-delay signaling is OK: what makes a mess is zero-delay loops
− Hardware: run zero-delay wires in only one direction
− Software: impose a priority order on processes
• High-priority processes execute “first”
• Higher-priority processes can signal lower-priority processes (but not vice versa)
• Concrete realization: Mutex
Mutex
• Single-bit shared variable
− Two states: “locked” and “unlocked”
    mutex foo;
    if (foo.lock()) {
        …execute guarded code…
    }
• lock() operation
− Only succeeds (returns 1) if mutex is unlocked
− Prevents any subsequent lock on foo from succeeding until unlock() is executed
− unlock() releases lock at the beginning of next cycle
− So, e.g., if (foo.lock()) foo.unlock() holds lock for this cycle
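The lock()/unlock() rules above can be modelled with a small Python class. This is a sketch of the described semantics, not Alviso's implementation: the start_cycle hook, the run_cycle driver, and the process bodies are hypothetical, and processes are simply run in priority order within a cycle.

```python
class Mutex:
    """Single-bit lock whose release takes effect at the next cycle boundary."""

    def __init__(self):
        self.locked = False
        self._release_pending = False

    def lock(self):
        """Return 1 and take the lock if it is free; otherwise return 0."""
        if self.locked:
            return 0
        self.locked = True
        return 1

    def unlock(self):
        """Request a release; the mutex stays locked for the rest of this cycle."""
        self._release_pending = True

    def start_cycle(self):
        """Hypothetical hook: apply any pending release at the start of a cycle."""
        if self._release_pending:
            self.locked = False
            self._release_pending = False

def run_cycle(mutex, processes):
    """Run (name, body) pairs in priority order; return names whose lock succeeded."""
    mutex.start_cycle()
    return [name for name, body in processes if body(mutex)]

def grab_and_release(m):
    # Mirrors: if (foo.lock()) foo.unlock();  (holds the lock for this cycle only)
    if m.lock():
        m.unlock()
        return True
    return False

foo = Mutex()
procs = [("high", grab_and_release), ("low", grab_and_release)]
print(run_cycle(foo, procs))  # prints ['high']
print(run_cycle(foo, procs))  # prints ['high']
```

In this model the lower-priority process never acquires foo while the higher-priority one keeps requesting it, which is the strict-priority behavior the slides describe.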
Implementing Mutex Safely
• Hardware: no issue
− Arrange blocks of logic corresponding to processes in priority order
− Mutex signals flow from high-priority to low-priority process
− Arbitration on variable write works the same way
• Software: same idea
− Run processes in priority order
− High-priority processes run before lower-priority processes
− Mutex locks in high-priority process automatically visible to lower-priority process
− But price is very high: conceptually, we have serialized a parallel computation
Recovering Parallelism With Mutexes
• Recall: Each process defines a forest of DAGs
− Call each such DAG a fiber
• Each Mutex defines a partial order among fibers
− FA > FB iff
• Fiber A and Fiber B both lock Mutex F
• A is higher priority than B
• At every cycle, exactly one fiber per process will run
− For this cycle, choose any schedule consistent with partial orders on runnable fibers
− Optimization: locked mutexes don’t affect schedule (all lock operations in cycle will fail, and only successful locks introduce dependency)
− Therefore: disregard partial orders imposed by locked mutexes
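One way to picture the scheduling rule is as a topological sort over the runnable fibers. The Python sketch below uses the standard-library graphlib module (Python 3.9+) to group fibers into batches that could run in parallel. The data layout (a set of mutex names per fiber, integer priorities, a set of mutexes locked at the start of the cycle) and the function schedule are assumptions made for illustration; the slides do not describe the actual scheduler.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def schedule(fibers, priority, locked_at_start):
    """Group this cycle's runnable fibers into batches that may run in parallel.

    fibers:          dict fiber name -> set of mutex names it tries to lock
    priority:        dict fiber name -> int (larger means higher priority)
    locked_at_start: set of mutexes already locked when the cycle begins
    """
    ts = TopologicalSorter()
    names = list(fibers)
    for f in names:
        ts.add(f)
    for a in names:
        for b in names:
            # Only mutexes that are free this cycle can create a dependency.
            common = (fibers[a] & fibers[b]) - locked_at_start
            if common and priority[a] > priority[b]:
                ts.add(b, a)  # a must run before b
    ts.prepare()
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        batches.append(ready)
        ts.done(*ready)
    return batches

# Hypothetical cycle: A and B contend for mutex "m"; C and D are independent.
fibers = {"A": {"m"}, "B": {"m"}, "C": {"n"}, "D": set()}
priority = {"A": 4, "B": 3, "C": 2, "D": 1}
print(schedule(fibers, priority, set()))   # prints [['A', 'C', 'D'], ['B']]
print(schedule(fibers, priority, {"m"}))   # prints [['A', 'B', 'C', 'D']]
```

When "m" is already locked, every lock attempt on it fails, so the ordering between A and B is dropped and all four fibers can run in a single batch.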
Alviso Status And Conclusion
• Hardware synthesis chain written and tested on a few sample designs
− Need for zero-delay intermodule communication noted
− Arbitration on memory interface
• Software interpreter written and tested
• XML intermediate form under development
• Planned first release April 2011
• Is it perfect? Far from it…
− Need users to help us figure out how to make it better
− Contact: erik.rubow@ericsson.com, rick.mcgeer@hp.com