Concurrency in Programming Languages
Matthew J. Sottile, Timothy G. Mattson, Craig E Rasmussen
© 2009 Matthew J. Sottile, Timothy G. Mattson, and Craig E Rasmussen

Chapter 6 Objectives
• Examine the historical development of concurrent and parallel systems.
  – Hardware
  – Languages
• Discuss the relationship that languages had with evolving hardware.

Outline
• Evolution of machines
• Evolution of languages
• Limits of automatic parallelization

Early concurrency
• The biggest challenge with early computers was keeping them busy.
  – Increasing utilization.
• Idle CPUs meant wasted cycles.
• A common source of idle CPUs: I/O.
• How can the CPU be used while a program is waiting on I/O?

Addressing the I/O problem
• Two early methods became popular:
  – Multiprogramming
  – Interrupt-driven I/O
• Concurrency addressed the utilization problem by allowing other programs to use resources when one became blocked on I/O.

Multiprogramming
• Transparently multiplex the computing hardware to give the appearance of simultaneous execution of multiple programs.
• Prior to multicore, single-CPU machines used this approach to provide multitasking.

Interrupt-driven I/O
• Another development was interrupt-driven I/O.
• This allows hardware to notify the CPU when I/O operations are complete.
  – Avoids inefficient active polling and checking by the CPU.
  – The CPU can do other work and only worry about I/O operations when the I/O hardware tells it that they are ready.
• Interrupt-based hardware helps manage parallel devices within a machine.

CPU sophistication
• Multiprogramming and interrupt-based hardware addressed early utilization problems when CPUs were simple.
• CPUs continued to advance, though:
  – Huge increases in speed (cycles per second)
  – More sophisticated instruction sets
• This led to further hardware advances to support the increased capabilities of newer CPUs.

Memory performance
• The gap between CPU clock cycles and physical I/O devices (like tapes) was always large.
• Soon CPUs overtook digital memory performance as well.
  – Memory looked slower and slower relative to the CPU.
  – This results in the same utilization problem faced in the early days with I/O devices.

Hierarchical memory
• One approach to dealing with this growing gap between CPU performance and memory performance was caches.
• Place one or more small, fast memories between the CPU and main memory.
  – Recently accessed data is replicated there.
  – The locality assumption is often safe: memory that was recently accessed is likely to be accessed again.
  – Caches speed up these subsequent accesses; the cost of the first access from slow memory is amortized over the multiple subsequent accesses served from the fast cache.
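A rough worked example of the amortization argument (the numbers are hypothetical, not from the slides): if a cache hit costs 1 cycle, a miss costs 100 cycles, and 95% of accesses hit, the average access cost is about 0.95 × 1 + 0.05 × 100 = 5.95 cycles, far closer to cache speed than to main-memory speed.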
Pipelining
• Another advance was to decompose instructions into pieces such that the CPU can be structured like a factory assembly line.
  – Instructions start at one end, and as they pass through, subtasks are performed such that at the end the instruction is complete.
• This allows multiple instructions to be executing at any point in time, each at a different stage of the pipeline.
  – This is a form of parallelism: instruction-level parallelism.

Pipelining
• Pipelining allowed more complex instructions to be provided that required multiple clock cycles to complete.
  – Each clock cycle, part of the instruction could proceed.
• Instruction-level parallelism allowed this multicycle complexity to be hidden.
  – If the pipeline could be kept full, then an instruction could complete every cycle.

Vector processing
• Another method for achieving parallelism in the CPU was to allow each instruction to operate on a set of data elements simultaneously.
• Vector processing took this approach: instructions operated on small vectors of data elements.
• This became very popular in scientific computing, and later in multimedia and graphics processing.
  – Today's graphics processing units are a modern descendant of early vector machines.

Dataflow
• In the 1970s and 1980s, dataflow was proposed as an alternative architecture to the traditional designs dating back to the 1940s.
• In dataflow, programs are represented as graphs in which vertices represent computations and edges represent data flowing into and out of computations.
  – Large opportunities for parallelism: any computation with its data ready could execute.

Dataflow
• Example: the expression (a+b)*(c+d) can be represented as a dataflow graph in which two + nodes feed their results into a single * node.
• The independence of the + nodes means they can execute in parallel.
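A minimal sketch of how this dataflow example maps onto a modern language (Java futures; the class name DataflowExample and the input values are invented for illustration): the two additions have no dependence on each other, so they may run in parallel, and the multiplication waits until both results are available.

  import java.util.concurrent.*;

  public class DataflowExample {
      public static void main(String[] args) throws Exception {
          int a = 1, b = 2, c = 3, d = 4;                  // hypothetical inputs
          ExecutorService pool = Executors.newFixedThreadPool(2);
          // The two '+' nodes are independent, so each may run on its own thread.
          Future<Integer> left  = pool.submit(() -> a + b);
          Future<Integer> right = pool.submit(() -> c + d);
          // The '*' node fires only once both of its inputs have arrived.
          int result = left.get() * right.get();
          System.out.println(result);                       // prints 21
          pool.shutdown();
      }
  }

The same notion of a future reappears later in these notes in the discussion of MultiLISP.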
Massively parallel machines
• In the 1980s and early 1990s, "massively parallel" machines with very large numbers of parallel processors were created.
• Design goal: use many simple CPUs to solve problems with lots of available parallelism.
  – Many instructions would complete per second due to the high processor count.
  – This would outperform systems built from a few complex CPUs.
  – Relied on finding problems with lots of parallelism.

Distributed systems and clusters
• Special-purpose supercomputers (such as MPPs) were very expensive.
• Networks of workstations could achieve similar performance by writing programs that used message passing as the method for coordinating parallel processing elements.
• Performance was often lower than special-purpose supercomputers due to network latency, but the cost savings of buying clusters of workstations outweighed the performance impact.

Today: multicore and accelerators
• Processor manufacturers today can put multiple CPUs on a single physical chip.
  – Expensive shared-memory multiprocessors of the past are now cheap desktop or laptop processors!
• Demands of multimedia (video, 3D games) have led to the adoption of multicore and vector processing in special-purpose accelerators.
  – Most machines today have a graphics card that is more capable than a vector supercomputer from 20 to 30 years ago.

Outline
• Evolution of machines
• Evolution of languages
• Limits of automatic parallelization

FORTRAN
• The first widely used language was FORTRAN, the FORmula TRANslation language.
  – Dominant in numerical applications, which were the common application area for early computers.
• Standard FORTRAN took many decades to adopt concurrency constructs.
• Dialects of FORTRAN built for specific machines did adopt these constructs.
  – E.g., IVTRAN for the ILLIAC IV added the "DO FOR ALL" loop to provide parallel loops.

ALGOL
• The ALGOrithmic Language was introduced around the same time as FORTRAN.
  – Introduced control flow constructs present in modern languages.
• ALGOL 68 introduced concurrency constructs.
  – Collateral clauses allowed the programmer to express sequences of operations that could be executed in arbitrary order (or in parallel).
  – Added a data type for semaphores used for synchronization purposes.
  – Introduced "par begin", a way of expressing that a block of statements can be executed in parallel.

Concurrent Pascal and Modula
• Concurrent Pascal is a dialect of the Pascal language designed for developers of operating systems software.
  – Operating systems have been fundamentally concerned with concurrency since the introduction of multiprogramming and parallel I/O devices.
• Added constructs representing processes and monitors.
  – Monitors are similar to objects in that they provide data encapsulation and synchronization primitives for defining and enforcing critical sections that operate on this data.
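A loose sketch of the monitor idea, assuming Java as the illustration language (Java's synchronized methods descend from the monitor concept; the Counter class is invented for this note):

  // Monitor-style class: the data is encapsulated, and every method that
  // touches it is a critical section guarded by the object's lock.
  public class Counter {
      private long value = 0;   // protected data, reachable only through the methods below

      public synchronized void increment() {
          value++;              // at most one thread executes inside the monitor at a time
      }

      public synchronized long get() {
          return value;
      }
  }

Real monitors also let a thread wait inside the monitor for a condition to hold (in Java, wait/notify); the sketch shows only the mutual-exclusion aspect.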
A functional programmer doesn’t implement lower level details like how the list is represented or how the loop iterating over its elements is structured. • Higher level of abstraction leaves more decision making up to the compiler. – Such as parallelization. © 2009 Matthew J. Sottile, Timothy G. Mattson, and Craig E Rasmussen 26 Functional languages • MultiLISP was a dialect of Scheme that focused on concurrent and parallel programming. • Included the notion of a future variable. – A future is a variable that can be passed around but may not have a value associated with it until a later time. – Operations on these future values can be synchronized if they are required before they are available. • Futures have influenced modern languages, like Java and X10. © 2009 Matthew J. Sottile, Timothy G. Mattson, and Craig E Rasmussen 27 Dataflow languages • Dataflow languages like Id and VAL were created to program dataflow hardware. – Purely functional core languages restricted side effects and made derivation of data flow graphs easier. – Lack of side effects facilitates parallelism. – I-structures and M-structures were part of the Id language to provide facilities for synchronization, memory side effects, and I/O. • Modern languages like Haskell provide constructs (e.g.: Haskell MVars) that are based on features of dataflow languages. © 2009 Matthew J. Sottile, Timothy G. Mattson, and Craig E Rasmussen 28 Logic languages • Based on formal logic expressions. – Programmers stated sets of relations and facts to encode their problems. – Fundamental core of the languages are logical operators (AND, OR, NOT). – High degree of parallelism in logic expressions. If we have an expression “A and B and C”, A, B, and C can be evaluated in parallel. • Logic languages like PROLOG influenced modern languages like Erlang. © 2009 Matthew J. Sottile, Timothy G. Mattson, and Craig E Rasmussen 29 Parallel languages • A number of languages were explored that specifically focused on parallelism. – Earlier examples were focused on general purpose programming, with concurrency constructs as a secondary concern. • Languages like High Performance Fortran and ZPL instead focused on abstractions that were specifically designed to build parallel programs. © 2009 Matthew J. Sottile, Timothy G. Mattson, and Craig E Rasmussen 30 Parallel languages • In both HPF and ZPL, data distribution was a key concern. – Given a set of parallel processing elements, how does the programmer describe how large logical data structures are physically decomposed across the processors? – Goal was to let the compiler generate the often tedious and error prone code to handle data distribution and movement. © 2009 Matthew J. Sottile, Timothy G. Mattson, and Craig E Rasmussen 31 Parallel languages • Encourages users to take a “global view” of programs, not focus on local processor view. • This lets programmer focus on the problem they want to solve, instead of details about how to map their problem onto a parallel machine. • Parallel languages also focus on retargetability. – If parallelization decisions are fully controlled by the compiler, then it can make different decisions for different platforms. Portability is easier in this case. © 2009 Matthew J. Sottile, Timothy G. Mattson, and Craig E Rasmussen 32 Modern languages • With the introduction of multicore, concurrency constructs are being added to most mainstream languages now. – – – – Java: Threading and synchronization primitives. Fortran: Co-Array Fortran added to 2008 standard. 
Ada
• Ada was a language designed for the US Dept. of Defense in the 1970s and 1980s, and is still in use today.
• Intended to be used for critical systems in which high software assurance was required.
• Included constructs for concurrent programming early in its development.
• Ada used the "task" abstraction to represent concurrently executing activities.
  – Communication via message passing or shared memory.

Ada
• Tasks communicate via rendezvous.
• Tasks reach points where they block until another task reaches a paired point, at which time they communicate and continue.
  – The tasks "rendezvous" with each other.
• Ada 95 also introduced protected objects for synchronization of data access.
  – These provide mutual exclusion primitives to the programmer.

Functional languages
• Functional languages have been around since LISP in the 1950s.
• They are typically considered to be at a higher level of abstraction than imperative languages like C.
  – E.g., mapping a function over a list: a functional programmer doesn't implement lower-level details like how the list is represented or how the loop iterating over its elements is structured.
• A higher level of abstraction leaves more decision making up to the compiler.
  – Such as parallelization.

Functional languages
• MultiLISP was a dialect of Scheme that focused on concurrent and parallel programming.
• It included the notion of a future variable.
  – A future is a variable that can be passed around but may not have a value associated with it until a later time.
  – Operations that require a future's value before it is available are synchronized: they block until the value arrives.
• Futures have influenced modern languages, like Java and X10.

Dataflow languages
• Dataflow languages like Id and VAL were created to program dataflow hardware.
  – Purely functional core languages restricted side effects and made derivation of dataflow graphs easier.
  – The lack of side effects facilitates parallelism.
  – I-structures and M-structures were part of the Id language to provide facilities for synchronization, memory side effects, and I/O.
• Modern languages like Haskell provide constructs (e.g., Haskell MVars) that are based on features of dataflow languages.

Logic languages
• Based on formal logic expressions.
  – Programmers state sets of relations and facts to encode their problems.
  – The fundamental core of these languages is the logical operators (AND, OR, NOT).
  – There is a high degree of parallelism in logic expressions: given an expression "A and B and C", the terms A, B, and C can be evaluated in parallel.
• Logic languages like PROLOG influenced modern languages like Erlang.

Parallel languages
• A number of languages were explored that specifically focused on parallelism.
  – The earlier examples were focused on general-purpose programming, with concurrency constructs as a secondary concern.
• Languages like High Performance Fortran and ZPL instead focused on abstractions that were specifically designed to build parallel programs.

Parallel languages
• In both HPF and ZPL, data distribution was a key concern.
  – Given a set of parallel processing elements, how does the programmer describe how large logical data structures are physically decomposed across the processors?
  – The goal was to let the compiler generate the often tedious and error-prone code to handle data distribution and movement.

Parallel languages
• These languages encourage users to take a "global view" of programs rather than a local, per-processor view.
• This lets the programmer focus on the problem they want to solve, instead of the details of how to map their problem onto a parallel machine.
• Parallel languages also focus on retargetability.
  – If parallelization decisions are fully controlled by the compiler, then it can make different decisions for different platforms. Portability is easier in this case.

Modern languages
• With the introduction of multicore, concurrency constructs are now being added to most mainstream languages.
  – Java: threading and synchronization primitives.
  – Fortran: coarrays (Co-Array Fortran) added to the 2008 standard.
  – .NET languages: synchronization and threading.
  – Clojure: LISP derivative with software transactional memory.
  – Scala: concurrent functional language.
  – Haskell: software transactional memory, MVars, threads.

Modern languages
• Most new features in concurrent languages are based on features explored in earlier languages.
• Studying older languages that include concurrency constructs is informative in understanding what motivated their design and creation.

Outline
• Evolution of machines
• Evolution of languages
• Limits of automatic parallelization

Inference of parallelism is hard
• Most people agree that it is exceptionally difficult to automatically parallelize programs written in sequential languages.
  – Language features get in the way. E.g., pointers introduce the potential for aliasing, which restricts the compiler's freedom to parallelize (a small sketch appears at the end of these notes).
  – High-level abstractions are lost in low-level implementations. Complex loops and pointer-based data structures make it very challenging to infer structures that can be parallelized.

Move towards concurrent languages
• Vectorizing and parallelizing compilers are very powerful, but they are reaching their limits as the parallelism seen in practice increases.
• The big trend in language design is to introduce language features that are built to support concurrent and parallel programming.
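To make the aliasing point concrete, here is the small sketch referenced above (invented for these notes and written in Java, where array references can alias much as pointers do): whether the loop may be parallelized depends on whether the two references name the same array, which a compiler often cannot prove.

  public class AliasingExample {
      // Assumes the two arrays have the same length.
      // If 'dst' and 'src' are different arrays, every iteration is independent
      // and the loop could safely be parallelized or vectorized.
      // If a caller passes the same array for both, iteration i reads the value
      // written by iteration i-1, so the loop must run sequentially.
      static void shiftAndAdd(int[] dst, int[] src) {
          for (int i = 1; i < dst.length; i++) {
              dst[i] = src[i - 1] + 1;
          }
      }
  }

A compiler that cannot rule out the aliasing case must assume the worst and keep the loop sequential.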