The Problem with Threads

Based on work by Edward A. Lee (2006)
Presented by Leeor Peled, June 2010
Seminar in VLSI Architectures (048879)

Asynchronous computing
- During this course, we learned how to design asynchronous logic, how to coordinate and time its elements, and how to build async elements, controllers and data paths.
- It's now time to investigate further layers of computing systems and see if we can utilize what we learned there.
[Figure: layers of a computing system (signal level, RTL/CL level, SOC level, SW domain), annotated with: wire delays, gate delays; CE's, data dependency, clock skewing; handshake protocols, GALS ?; OS scheduling, interrupts, threads! ?]

SW Parallelism
- Most applications are serial
- HW manipulates inst/mem/data level parallelism
  - Superscalar execution, OOO, vectorization (SIMD)
  - Dependencies still limit the parallelism.
  - Still a high penalty on mem access, IO
- Thread level parallelism:
  - Software manipulation: on a high-latency stall, switch context
  - Good for multiple tasks (e.g. servers), but can we boost a single app?
- Yes. Write concurrent code!
- But:
  - Very hard to develop
  - Bug prone
  - Few software paradigms / programming models

SW Parallelism (cont.)
- Interesting similarity between SW and HW:
  - Asynchronous ≈ parallel ?
  - Faster, more efficient, but also non-deterministic
  - Various possibilities for the order of occurrence: must be prepared for each.
  - Race conditions may occur between threads just like between signals
- So why not use similar methods?

Parallelism examples - Fine Grain Parallelization
(Taken from Ginosar, "many-cores" slides)
- Convert (independent) loop iterations
      for ( i=0; i<10000; i++ ) { a[i] = b[i]*c[i]; }
- Into parallel tasks
      duplicable task XX(…) 10000 { ii = INSTANCE; a[ii] = b[ii]*c[ii]; }
- All tasks, or any subset, can be executed in parallel

[Figure: Linear Solver: Simulation snap-shots (Taken from Ginosar, "many-cores" slides)]

Parallelism examples (cont.)
- Unfortunately, not all applications are "embarrassingly parallel".
- In reality we employ various "design patterns" that were thoroughly investigated (and are available in libs)
- Producer-Consumer model:
      procedure producer() {
          while (true) {
              item = produceItem()
              if (itemCount == BUFFER_SIZE) { sleep() }
              putItemIntoBuffer(item)
              itemCount = itemCount + 1
              if (itemCount == 1) { wakeup(consumer) }
          }
      }

      procedure consumer() {
          while (true) {
              if (itemCount == 0) { sleep() }
              item = removeItemFromBuffer()
              itemCount = itemCount - 1
              if (itemCount == BUFFER_SIZE - 1) { wakeup(producer) }
              consumeItem(item)
          }
      }

Producer-Consumer visualization
- Looks familiar?
[Figure: producer-consumer visualization, from http://www.eonclash.com/Tutorials/Multithreading/MartinHarvey1.1/Ch9.html]

Threads: problem statement
- Real workloads must work very hard to sync concurrent code.
- The following example shows the problem with unprotected access:
      A: St [x],1
         St [y],1

      B: S = ld [x]
         T = ld [y]
         Print S,T
- Serial: functions A and B can be called in any order.
  - Possible outputs are 0,0 and 1,1
- Concurrent: 0,1 is also possible (what about 1,0?).
  - How would the program react?
- Design issues:
  - Memory ordering
  - Coherency
  - Consistency
  - Debuggability

Threads: problem statement (cont.)
- Invalid results are bad, but some problems are worse:
  - Deadlock
  - Livelocks
- Example: observer pattern (in Java):
      public class ValueHolder {
          public void addListener(listener) {…}
          public void setValue(newValue) {
              myValue = newValue;
              for (int i = 0; i < myListeners.length; i++) {
                  myListeners[i].valueChanged(newValue);
              }
          }
      }
- What's the problem?

Threads: problem statement (cont.)
- Invalid results are bad, but some problems are worse:
  - Deadlock
  - Livelocks
- Example: observer pattern (in Java), now with synchronized methods:
      public class ValueHolder {
          public synchronized void addListener(listener) {…}
          public synchronized void setValue(newValue) {
              myValue = newValue;
              for (int i = 0; i < myListeners.length; i++) {
                  myListeners[i].valueChanged(newValue);
              }
          }
      }
- What's the problem?

Threads: problem statement (cont.)
- Invalid results are bad, but some problems are worse:
  - Deadlock
  - Livelocks
- Example: observer pattern (in Java), now notifying a copy of the listener list outside the lock:
      public synchronized void addListener(listener) {…}
      public void setValue(newValue) {
          synchronized(this) {
              myValue = newValue;
              listeners = myListeners.clone();
          }
          for (int i = 0; i < listeners.length; i++) {
              listeners[i].valueChanged(newValue);
          }
      }
- What's the problem?
[Figure label: Other Synchronizing Object]

Threads: the bleak reality
[Figure: diagram of the sets "All programmers", "Programmers who use threads" and "Those who want to do it properly"]

Threads: current methods
- Currently, the only defenses against such problems are:
- The technical aspect:
  - Analyze software structure using dedicated tools (formal verification)
    - Blast, Intel Thread Checker
  - Use protected languages
    - Cilk, Split-C (also various SW TM flavors): lock/sync semantics
    - Guava (private mem space for unsynced objects)
  - Use predefined design patterns
    - Transactions (DB), TM
- The human aspect:
  - Employ experienced programmers
  - Apply a strict software design process (code reviews, debug sessions)
  - Coding rules (lock acquiring order)
- The business aspect: be prepared to recall and compensate often…

Parallel objects - solutions
- Lee's observation: it's not concurrency that is inherently difficult, it's just the thread model!
- Key issue here: a thread shares everything, so everything might change for it between two atomic actions.
- Threads may interleave in any way (memory ordering has vast options) and can change state on all other threads
[Figure: threads t0 and t1, state A changing to A']
- Parallel computation with threads can be shown to explode exponentially in the number of outcomes
  - Long, boring mathematical proof ahead…
- But in fact, we usually only need to share a single message or data stream!
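
To illustrate the last point, here is a minimal Java sketch (not taken from Lee's paper or from the original slides) in which two threads share nothing except a single bounded message stream. The class name MessageStreamDemo, the POISON end-of-stream marker and the buffer size of 4 are assumptions made for illustration only; java.util.concurrent.BlockingQueue is a standard Java library interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class MessageStreamDemo {
    // Hypothetical end-of-stream marker; real payloads here are never negative.
    private static final int POISON = -1;

    // Runs one producer and one consumer whose only shared object is the queue.
    public static List<Integer> run(int count) throws InterruptedException {
        BlockingQueue<Integer> stream = new ArrayBlockingQueue<>(4);
        // Written only by the consumer thread, read only after join().
        List<Integer> consumed = new ArrayList<>();

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    stream.put(i * i);      // blocks while the buffer is full
                }
                stream.put(POISON);         // signal end of stream
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                // take() blocks while the buffer is empty
                for (int item = stream.take(); item != POISON; item = stream.take()) {
                    consumed.add(item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();                    // makes the consumer's writes visible here
        return consumed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(5));         // prints [0, 1, 4, 9, 16]
    }
}
```

With one producer and one consumer, the FIFO queue fixes the order in which items are observed, so the result does not depend on how the scheduler interleaves the two threads. All blocking and waking is handled inside the queue rather than by hand-written sleep()/wakeup() calls.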
Some math
- Let:
  - $N = \{0, 1, 2, 3, \dots\}$
  - $B = \{0, 1\}$
  - $B^{*}$: the set of all finite bit sequences
  - $B^{\omega} = (N \to B)$: the set of all infinite bit sequences
  - $B^{**} = B^{*} \cup B^{\omega}$ will represent the state of the computing machine
  - $Q = (B^{**} \rightharpoonup B^{**})$: the set of partial functions from $B^{**}$ to $B^{**}$
- An imperative machine $M = (A, c)$ is composed of a finite set of atomic "instructions" $A \subset Q$ and a control function $c : B^{**} \to N$ that represents how they're sequenced.
- A "halt" instruction $h \in A$ is defined by: $\forall b \in B^{**},\ h(b) = b$
- A sequential program (of length $m$) is a function $p : N \to A$ s.t. $\forall n \ge m,\ p(n) = h$
- The set $P$ of all programs is countably infinite ($|P| = \aleph_0$)
- An execution of $p$ starts with $b_0 \in B^{**}$, and $\forall n \in N,\ b_{n+1} = p(c(b_n))(b_n)$

Some math (cont'd)
- Now, for multiple threads, we replace the program execution with
  - $b_{n+1} = p_i(c(b_n))(b_n),\ i \in \{1, 2\}$
- Each action is atomic, but at each step $i$ (the active context) is determined arbitrarily (we're assuming no simultaneous execution, for simplicity).
  - The correct notation should be: $b_{n+1} = p_{i_n}(c(b_n))(b_n),\ i_n \in \{1, 2\}$
  - Let $S = (\{1, \dots, m\} \to \{1, 2\})$ be the set of context vectors $(i_1, i_2, \dots, i_m)$, so $|S| = 2^m$
- Interleaving leads to exponential growth in the number of possible outcomes, even for a given set of programs and initial state.
- Further advantages of sequential programs:
  - The sequence $b_n$ is well defined.
  - The program computes a partial function, defined for each input that leads to halt.
  - $p_1$ and $p_2$ can be compared
- Multithreading also makes these exponentially harder.

Parallel objects - solutions (cont.)
- What other solutions do we have to activate multiple objects concurrently?
- Move from object-oriented design to actor-oriented design
  - Also similar to the async logic we discussed: each logical element is in charge of its own input/output
  - To compare: an OO equivalent in VLSI would mean that the signals have to be "responsible" for their own correct transfer
- Let us study the following 4 actor-oriented models of computation (MOCs):
  - Rendezvous
  - PN (process network)
  - SR (synchronous/reactive)
  - DE (discrete events)
- These MOCs are alternatives with similar computability strength, but one might be better than another for some design patterns

Actor oriented design - Rendezvous
- Based on Reo. Same functionality as before
- Each actor (producer/consumer/observer) is a process
  - (No more process per dataflow / data object)
- Communication is through rendezvous
- Producers are mutually exclusive (consumers are not)
  - Two possible 3-way rendezvous
- Merge is now the only non-deterministic element
- No deadlocks, no consistency (value ordering) problem

Actor oriented design - PN (process network)
- Based on the PN model of concurrency by Kahn & MacQueen ('77)
- Communication is through streams
  - Unbounded FIFOs
  - Blocking reads
- Same benefits, plus: queuing allows the observer/consumer to operate at different speeds (unless we explicitly add a dependency), or delay the observation indifferently

Actor oriented design - SR (sync/reactive)
- Concept based on synchronous languages such as Esterel, SIGNAL and Lustre (mostly used for RT/embedded systems like aircraft control, nuclear plants)
- Synchronous: time is an ordered sequence of instants
  - Actual evaluation is assumed to take zero time: instant reactions
- Reactive: instants are initiated by environmental events (Harel/Pnueli)
  - "When is just as important as what"
- At each clock tick, every signal is evaluated (iteratively if needed) or is absent
- Provides deterministic concurrency; events are ordered
- Scheduler picks the order of evaluation (may be done at compilation time, Edwards '98). Mutual dependency is handled by iterations.

Actor oriented design - DE (discrete events)
- Concept based on VHDL/Verilog or the Opnet network modeler
- Exact timing specification with rigorous semantics
- Each event is timed and processed chronologically.
- Merge (and the entire system) is deterministic.
- Unlike SR, here every evaluation takes a certain time delta
  - More realistic
  - However, evaluation order might introduce non-determinism if not defined properly

The road ahead
- Actor oriented design is not new; various languages exist:
  - CORBA event service (distributed push-pull)
  - ROOM and UML-2 (dataflow, Rational, IBM)
  - VHDL, Verilog (discrete events, Cadence, Synopsys, ...)
  - LabVIEW (structured dataflow, National Instruments)
  - Modelica (continuous-time, constraint-based, Linkoping)
  - OPNET (discrete events, Opnet Technologies)
  - SDL (process networks)
  - Occam (rendezvous)
  - Simulink (continuous-time, The MathWorks)
  - SPW (synchronous dataflow, Cadence, CoWare)
- However, most are domain specific, and the few general purpose ones never caught on
  - Programmers don't like new syntax
  - Adding libs to existing languages is not enough
  - UML case study?
- Lee's suggested solution is "coordination languages"
  - Polymorphic objects from other languages, general type-system

Ptolemy II Design environment
- Actors/components can be defined in C/C++, Java, Matlab, Python, Perl, …
- Visual editor, abstract syntax
- Varying concurrency models

Models of Computation in Ptolemy II
- CI: push/pull component interaction
- Click: push/pull with method invocation
- CSP: concurrent threads with rendezvous
- Continuous: continuous-time modeling with fixed-point semantics
- CT: continuous-time modeling
- DDF: dynamic dataflow
- DE: discrete-event systems
- DDE: distributed discrete events
- DPN: distributed process networks
- FSM: finite state machines
- DT: discrete time (cycle driven)
- Giotto: synchronous periodic
- GR: 3-D graphics
- PN: process networks
- Rendezvous: extension of CSP
- SDF: synchronous dataflow
- SR: synchronous/reactive
- TM: timed multitasking

Actor oriented design - examples
- Two implementations of sequential interleaving based on rendezvous
- Both are deterministic
- Barrier allows rendezvous to occur only when both inputs are ready
- Buffer can rendezvous with input OR with output.
- Commutator chooses one input for rendezvous (round robin)

Conclusions
- The bottom line from Lee's work is: instead of working with non-deterministic threads and attempting to prune this non-determinism, we should start with deterministic models, and add non-determinism only when needed.
- The problem in adopting it is still the lack of cooperation from users (same as with async VLSI design, in fact)
  - Only languages that are general purpose, and no new syntax
- A transparent solution would be simpler to enforce
  - Library based, compiler, HW…
- Can we take something back to the VLSI level?
  - Some synchronization schemes can be built in HW (which?)
  - Actor oriented approach: are we there?
  - Design methodology / tools?
