Alviso
Rick McGeer (HP), Erik Rubow (Ericsson), Stephen Lonergan (U Vic), Amin Vahdat (UCSD)
© 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice

Outline
• Motivation
• A Quick Tour of Alviso
• The Problem of Unrestricted Communication in Parallel Systems
• Lessons from Hardware: Restricted Communication between modules
• Alviso: A Synchronous Language
• Restricted combinational communication
  − Motivation
  − The Mutex statement
  − Strict priority on processes
  − Recovering maximal parallelism
• Status and Conclusions

Alviso Motivation
• Make NetFPGA programming accessible to network designers
• NetFPGA: FPGA-based 4-port switch board
• Key to building high-speed software-defined networks
• Typical NetFPGA designer knows software routers, not VLSI design

NetFPGA
• Basic building block is a Xilinx FPGA (Virtex-II)
• Programming tools are Verilog (simulator-based HDL) and Synopsys/Cadence/Xilinx synthesis tools for FPGA
• Problems
  − Verilog is a very low-level design tool
  − Many details of hardware design must be mastered by designer
  − No high-level network-based design environment
• Why is this interesting?
  − Software on general-purpose (GP) processors still can’t keep up with modern switching equipment
  − Modern high-performance software routers require very substantial hardware (typically, GPGPUs)

Alviso
• C-like language whose modules can be easily realized as either hardware or software
• Some restrictions
  − No memory allocation
  − No functions or recursion
  − Built-in parallelism
  − No forks

Alviso Elements
• Module: Basic unit of design
  − Roughly equivalent to an Object in software design
  − Container of processes (see below) and variables
  − No shared variables across module boundaries!
  − All communications into/out of modules through “ports” (similar to software parameters)
  − Exactly equivalent to hardware ports

Alviso Elements
• Process: Basic element of computation
• Roughly equivalent to a thread in software
• Equivalent to a block of logic in hardware
• Begins immediately on load
• Runs to completion

Alviso Elements
• Port: Variable explicitly written or read by a process in a module
• Sole means of communication into/out of a module
• Equivalent to a hardware port
  − Always latched (see below)
• Roughly equivalent (in software) to a public object variable with a get/set method (read = get, write = set)

The Problem of Parallel Design
• The
central assumption of design: the finite state model of computation
• Every variable is a little FSM
  − Quiescent unless explicitly perturbed by an instruction
  − But parallel design breaks this model for shared variables

    x=2; x=x+1;

x = 3…right?

    proc thread1() { x=2; x=x+1; }
    proc thread2() { x=x*100; }

Value of x is indeterminate

All the Problems in Parallel Design Break Down into Solving This
• How do we recover a semantically consistent, deterministic model of design with efficient communications?
• A key to efficient multicore programming, hardware/software codesign, …
• There are other problems, but without solving this one they are all built on a house of sand…

Historical Answer: Restrict Communications
• Problem is fundamentally one of communication
• Unrestricted asynchronous communication breaks design model
• Solution 1: No shared variables between threads
  − Inefficient: effectively, every thread is in its own address space
• Solution 2: Locks and semaphores: restrict ability of other threads to play with state during computation
  − Deadlock!
  − Locks themselves become a nondeterministic, asynchronous communication channel…
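To make the indeterminacy of the thread1/thread2 example concrete, the following is a minimal Python sketch that enumerates every statement-level interleaving of the two threads. It assumes each statement executes atomically and that x starts at 0; neither assumption comes from the slides, and real hardware can interleave at a finer grain and produce even more outcomes. The names `interleavings` and `final_values` are hypothetical helpers.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's own order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        chosen = set(slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n)]

# Each statement is modeled as a function from the old x to the new x.
thread1 = [lambda x: 2, lambda x: x + 1]   # x=2; x=x+1;
thread2 = [lambda x: x * 100]              # x=x*100;

def final_values(initial=0):
    """Collect the final value of x over all statement-level schedules."""
    results = set()
    for schedule in interleavings(thread1, thread2):
        x = initial  # assumed starting value, not given in the slides
        for step in schedule:
            x = step(x)
        results.add(x)
    return results

print(sorted(final_values()))  # [3, 201, 300]
```

Three schedules give three different answers, and which one occurs depends entirely on the scheduler, which is the dependence on external systems that the following requirements rule out.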
Requirements of our Solution
• Semantics independent of external systems (e.g., a thread scheduler)
• Efficient communication between threads
• Designs fully implementable in either hardware or software
• Module behavior identical independent of hardware or software realization: semantics independent of implementation
  − A caveat: mixed hardware/software systems will vary in behavior, depending on mix of hardware/software components
  − Hardware components are much faster than software components

Lessons From Hardware
• Hardware Design is…
  − Highly parallel
  − Efficient
  − Deterministic
  − Independent of mysteries such as thread scheduling…
• How did those guys do that?
  − And, more to the point, how can we?
Classic Hardware Design
• Data flows unidirectionally in logic, latches update at clock edge
• Banks of acyclic “combinational” logic, separated by clocked latches
[Figure: Latch, Logic, Latch, Logic]

Means…
• Acyclic logic: logic banks compute in fixed time, the length of the longest path through the circuit
• Latches update only on clock edges: value of logic inputs stable during computation
• Computation divided into “cycles” of fixed length: no communication between logic blocks during computation

Mapping Alviso to Hardware
[Figure labels: Ports, Latch, Logic, Process(es), Latch, Variables, Logic]

Adapting Hardware to Languages
• Shared Variables == Latches
• Logic Blocks == Threads
• Threads run for fixed block of time, then “wait” for next cycle of computation
• Shared variables only update when all threads are waiting
• No interrupts, no locks, no semaphores…

Alviso
• Synchronous/Reactive Language
  − Computation in “zero” time, communication takes time “one”
  − Means: no communication while computing
  − Follows: Esterel, Lustre, ReactiveC, Signal, V++, SMV
• C-like syntax
• Major innovation: “wait” statement
• “wait”: halt computation and wait for variables to update
• Each thread must execute a wait statement within a fixed period of time
• Means: each cyclic computation graph (aka, loop) must contain a wait statement

A quick example

    proc thread1 {
        x = 2;
        while(true) { x++; wait; }
    }
    proc thread2 {
        wait;
        while(true) { x <<= 1; wait; }
    }

x = 3
But what about after the wait?
x = 4

Answer: Deterministic Priority
• What happens with conflicts on shared variable updates?
  − No effect on computation: updates only visible after wait
  − But x can only have one value…which should we choose?
  − Answer: priority. Processes have deterministic priority (total order on processes). In the event of conflict, the higher-priority process wins

Alviso Computational Graph
• wait statements lead to a computation graph that is a forest of DAGs
  − Roots of the DAGs: initial statements of processes and statements immediately following wait statements
  − Leaves: final statements of processes and wait statements
• Computation terminates at a leaf on each cycle
• Starts on the next cycle at the subsequent root
• Computation in cycle is traversal of the DAG from root to leaf

Interprocess Zero-Delay Signaling
• Sometimes, you just have to break the rules
• Occasionally, processes need to signal each other in the same cycle
  − To gain exclusive access to a shared variable, for example
  − Multi-cycle locking too inefficient
• Almost every synchronous/reactive (S/R) language eventually incorporates some form of zero-delay interprocess signaling
  − Exceptions: V++, ReactiveC
  − Almost always makes a hash of the semantics
  − Question: How can we do interprocess zero-delay signaling without making a mess?
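The cycle structure and the deterministic-priority rule from the quick example can be sketched in Python, with each process written as a generator and `yield` standing in for `wait`. This is only an illustrative model under stated assumptions, not the Alviso implementation: it assumes a process sees its own writes within a cycle, that other processes see them only after every process has reached a wait, that thread1 outranks thread2, and that x starts at 0. `View` and `run_cycles` are hypothetical names.

```python
class View:
    """One process's window onto the shared variables for a single cycle."""
    def __init__(self):
        self.values = {}
        self.written = set()

    def begin_cycle(self, shared):
        self.values = dict(shared)  # snapshot of the latched values
        self.written = set()

    def __getitem__(self, name):
        return self.values[name]

    def __setitem__(self, name, value):
        self.values[name] = value
        self.written.add(name)

def thread1(v):
    v["x"] = 2
    while True:
        v["x"] += 1
        yield  # wait

def thread2(v):
    yield  # wait
    while True:
        v["x"] <<= 1
        yield  # wait

def run_cycles(processes, shared, cycles):
    """processes are listed highest priority first (an assumed convention)."""
    views = [View() for _ in processes]
    running = [proc(view) for proc, view in zip(processes, views)]
    history = []
    for _ in range(cycles):
        for view, gen in zip(views, running):
            view.begin_cycle(shared)
            next(gen, None)  # run this process up to its next wait
        # Commit lowest priority first so higher-priority writes win conflicts.
        for view in reversed(views):
            for name in view.written:
                shared[name] = view.values[name]
        history.append(dict(shared))
    return history

print([h["x"] for h in run_cycles([thread1, thread2], {"x": 0}, 3)])  # [3, 4, 5]
```

Swapping the priority order changes the committed values to 3, 6, 12, but for a fixed order the result is the same on every run, independent of how the processes are scheduled inside a cycle.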
Answer: Go Back to Hardware
• Zero-delay signaling is OK: what makes a mess is zero-delay loops
  − Hardware: run zero-delay wires in only one direction
  − Software: impose a priority order on processes
• High-priority processes execute “first”
• Higher-priority processes can signal lower-priority processes (but not vice-versa)
• Concrete realization: Mutex

Mutex
• Single-bit shared variable
  − Two states: “locked” and “unlocked”

    mutex foo;
    if (foo.lock()) { …execute guarded code… }

• lock() operation
  − Only succeeds (returns 1) if mutex is unlocked
  − Prevents any subsequent lock on foo from succeeding until unlock() is executed
  − unlock() releases lock at the beginning of next cycle
  − So, e.g., if (foo.lock()) foo.unlock() holds lock for this cycle

Implementing Mutex Safely
• Hardware: no issue
  − Arrange blocks of logic corresponding to processes in priority order
  − Mutex signals flow from high-priority to low-priority process
  − Arbitration on variable write works the same way
• Software: same idea
  − Run processes in priority order
  − High-priority processes run before lower-priority processes
  − Mutex locks in high-priority process automatically visible to lower-priority process
  − But price is very high: conceptually, we have serialized a parallel computation

Recovering Parallelism With Mutexes
• Recall: Each process defines a forest of DAGs
  − Call each such DAG a fiber
• Each Mutex defines a partial order among fibers
  − FA > FB iff
    • Fiber A and Fiber B both lock Mutex F
    • A is higher priority than B
• At every cycle, exactly one fiber per process will run
  − For this cycle, choose any schedule consistent with partial orders on runnable fibers
  − Optimization: locked mutexes don’t affect schedule (all lock operations in cycle will fail, and only successful locks introduce dependency)
  − Therefore: disregard partial orders imposed by locked mutexes

Alviso Status And Conclusion
• Hardware synthesis chain written and tested on a few sample designs
  − Need for zero-delay intermodule communication noted
  − Arbitration on memory interface
• Software interpreter written and tested
• XML intermediate form under development
• Planned first release April 2011
• Is it perfect? Far from it…
  − Need users to help us figure out how to make it better
  − Contact: erik.rubow@ericsson.com, rick.mcgeer@hp.com
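As an illustration of the Mutex semantics described in the Mutex and Implementing Mutex Safely slides, here is a minimal Python sketch of the software realization: processes run in priority order within a cycle, a successful lock() is visible at once to lower-priority processes, and unlock() takes effect at the start of the next cycle. The class layout, the `begin_cycle` hook, and the two-process workload are assumptions made for this sketch and are not taken from the Alviso interpreter.

```python
class Mutex:
    """Single-bit lock whose release is deferred to the next cycle."""
    def __init__(self):
        self.locked = False
        self._release_pending = False

    def lock(self):
        # Succeeds only if the mutex is currently unlocked.
        if self.locked:
            return False
        self.locked = True
        return True

    def unlock(self):
        # The release becomes visible at the beginning of the next cycle.
        self._release_pending = True

    def begin_cycle(self):
        if self._release_pending:
            self.locked = False
            self._release_pending = False

def simulate(cycles):
    """Return which process held the mutex in each cycle."""
    foo = Mutex()
    owners = []
    for cycle in range(cycles):
        foo.begin_cycle()
        owner = None
        # Highest priority first; "high" only contends on even cycles
        # (a hypothetical workload chosen to show both outcomes).
        for name, wants_lock in (("high", cycle % 2 == 0), ("low", True)):
            if wants_lock and foo.lock():
                owner = name   # guarded code would run here
                foo.unlock()   # holds the lock for this cycle only
        owners.append(owner)
    return owners

print(simulate(4))  # ['high', 'low', 'high', 'low']
```

Running the processes strictly in priority order is the serialized schedule the slides call expensive; the fiber partial order is what allows processes that do not contend for an unlocked mutex to run in parallel instead.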