New Architectures Need New Languages
A triumph of optimism over experience!
Ian Watson
3rd July 2009

‘Obvious Truths’
• Single processors will not get faster; we need to go to multi-core
• There will be a need for processors with many (> 32?) cores
• These will need to support general-purpose applications
• Application performance will need to scale with the number of cores

‘Obvious Truths’ (2)
• General-purpose parallel computing needs shared memory
• Current shared memory requires cache coherence
• Cache coherence doesn’t scale beyond 32 cores
• Updateable state makes general-purpose parallel programming difficult

‘Obvious Untruths’
• HPC already has all the answers to parallel programming
• Message passing is the answer (hardware or software or both)
• Conventional languages already have adequate threading and locking facilities
• We can program without state

So What Next?
• Simplifying the programming model must be the answer; removing facilities is desirable, e.g.
  – Arbitrary control transfer
  – Pointer arithmetic
  – Explicit memory reclamation
• Arbitrary state manipulation is the enemy of parallelism; we must restrict it!

Half Truths?
• Functional languages are the answer to parallelism; all we need is to add state (in a controlled way)
• Transactional memory can replace locking to simplify the handling of parallel state
• Transactional memory can remove the need for cache coherence

Functions + Transactions
• The Haskell work at Microsoft Research Cambridge has shown how transactions can be included in a functional language via monads
• Is this a style of programming which can be sold to the world as the way ahead for future multi-core (manycore) systems?
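A rough Scala illustration of the functions-plus-transactions idea, assuming shared state can be confined to a single `java.util.concurrent.atomic.AtomicReference` holding an immutable `Map` (Scala rather than Haskell, since Scala is the language used in the experiments). The names `FunctionalCommit`, `Accounts`, `transfer` and `commit` are hypothetical. Each change is a pure function from old state to new state, and `updateAndGet` commits it with compare-and-set, re-running the function if another thread committed first. This is optimistic concurrency on one cell, not general transactional memory, and purity of the update function is only a convention here; in the Haskell approach the STM monad enforces it through the type system.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical sketch: shared state lives in one AtomicReference holding an
// immutable Map; every change is a pure function committed by compare-and-set.
object FunctionalCommit {
  type Accounts = Map[String, Int]

  // Pure update: builds a new Map and leaves the old one untouched.
  // Throws NoSuchElementException if either account is missing.
  def transfer(from: String, to: String, amount: Int)(m: Accounts): Accounts = {
    val debited = m.updated(from, m(from) - amount)
    debited.updated(to, debited(to) + amount) // from == to is a no-op
  }

  // Commit point: updateAndGet re-applies `update` to the latest value until
  // its compare-and-set succeeds, so `update` must not have side effects.
  def commit(shared: AtomicReference[Accounts])(update: Accounts => Accounts): Accounts =
    shared.updateAndGet(current => update(current))
}
```

Because `transfer` never mutates the map it is given, a re-run after a failed compare-and-set is harmless; a function that printed or wrote elsewhere would repeat those effects.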
Selling a New Language
• It must be capable of expressing everything that people want
• It isn’t just a case of producing something which is a good technical solution
• It mustn’t be too complex
• It probably needs to look familiar
• It needs to be efficient to implement

The Problems
• FP is unfamiliar to many existing programmers
• Many people find it hard to understand
• Even more find monads difficult
• In spite of excellent FP compiler technology, imperative programming will probably always be more efficient

Can We Compromise?
• Pure functional programs can be executed easily in parallel because they don’t update global state
• But if we only exploit parallelism at the function level, local manipulation of state within a function causes no problems
• Can we work with such a model?

What Would We Gain?
• ‘Easy’ parallelism at function level
  – This could be either explicit or implicit
• Familiarity of low-level code
  – Can use iteration, assignment, updateable arrays etc.
• Potential increase in efficiency
  – Direct mapping to machine code
  – Explicit memory re-use

What Would We Lose?
• Clearly we lose referential transparency within any imperative code
• But this is inevitable if we want to manipulate state, even with monads
• Clearly, as described so far, we have no way to manipulate global state; we need more

Adding Transactions
• We should only use shared state when it is really necessary
• It should be clear in the language when this is happening
• It should be detectable statically
• Ideally, the need for atomic sections should be checkable automatically

Memory Architecture
• With the right underlying programming model we should be able to determine memory regions
  – Read only
  – Thread local
  – Global write once
  – Global shared (transactional)
• This can lead to a simpler, scalable memory architecture

Experiments
• Using Scala to investigate programming styles
  – Is open source
  – Has both imperative & functional features
  – Not currently transactional
• Using a Simics-based hardware simulator to experiment with memory architectures

Outstanding Questions
• Data parallelism
  – How to express it
  – How to handle in-place update of parallel data (array) structures
• Streaming applications
  – Purely functional?
  – Need message passing constructs?
  – Need additions to the memory model?

Conclusions
• None really so far!
• But I am convinced, from a technical viewpoint, that we need new programming approaches
• I am fairly convinced that we need to be pragmatic in order to sell a new approach, even if this requires compromising on ideals

Questions?
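The compromise proposed under ‘Can We Compromise?’ (imperative code inside a function, parallelism only between function calls) might look like this in Scala. This is a sketch under assumptions: `LocalStateParallel`, `histogram` and `parallelHistogram` are hypothetical names, and `scala.concurrent.Future` on the global execution context stands in for whatever parallel runtime a real system would use. `histogram` uses a local array updated in place and a `while` loop, but none of that state escapes, so each call behaves like a pure function and independent calls can run concurrently without locks.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object LocalStateParallel {
  // Externally pure: the result depends only on the arguments, and nothing
  // outside the call is modified. Internally imperative: a local array updated
  // in place and a while loop, neither of which escapes the call.
  def histogram(data: Seq[Int], buckets: Int): Vector[Int] = {
    val counts = new Array[Int](buckets)
    val it = data.iterator
    while (it.hasNext) {
      val slot = Math.floorMod(it.next(), buckets)
      counts(slot) += 1
    }
    counts.toVector // immutable copy; the mutable array never leaves the call
  }

  // Parallelism only at function level: each chunk is handled by an independent
  // call, so the calls can run concurrently without locks or shared updates.
  // Assumes chunks > 0.
  def parallelHistogram(data: Vector[Int], buckets: Int, chunks: Int): Vector[Int] = {
    val chunkSize = math.max(1, (data.length + chunks - 1) / chunks)
    val partials = Future.traverse(data.grouped(chunkSize).toVector) { chunk =>
      Future(histogram(chunk, buckets))
    }
    val results = Await.result(partials, Duration.Inf)
    // Merge the per-chunk results with a pure fold.
    results.foldLeft(Vector.fill(buckets)(0)) { (acc, part) =>
      acc.zip(part).map { case (x, y) => x + y }
    }
  }
}
```

The only synchronization is waiting for the futures to complete; the merge itself is a pure fold over immutable vectors.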
Transactional Memory
• Programming model to simplify manipulation of shared state
  – Speculative model
  – Sections of program declared ‘atomic’
  – They must complete without conflict, or be aborted and restarted
  – Must not alter global state until complete
  – Needs system support: software or hardware

Object-Based Transactional Memory Hardware
• Based on ‘object-aware’ caches
• Exploits object structure to simplify transactional memory operations
• Advantages over other hardware TM proposals
  – Handles cache overflow elegantly
  – Enables multiple memory banks with distributed commit

TM & Cache Coherence
• Fine-grain cache coherence is the major impediment to scalable multi-cores
• Updates to shared memory only occur when a transaction commits
• Caches only need to be updated at commit points (which tend to be coarser grain)
• If all shared memory is made transactional, the requirement for fine-grain coherence is removed

TM Programming
• TM constructs can be added to conventional programming languages
• But they require careful use to ensure correctness
• If transactional & non-transactional operations are allowed on the same data, the behaviour can become hard to understand

New Programming Models?
• Problems can often be simplified by restricting (unnecessary) programming facilities, e.g.
  – Arbitrary control transfer
  – Pointer arithmetic
  – Explicit memory reclamation
• A new approach is needed to simplify parallel programming & hardware

We Need Usable & Efficient Models
• Shared memory is essential for general-purpose programming
• Message passing alone (e.g. MPI, Occam) is not sufficient
• We need shared updateable state, so pure functional programming is not the answer
• The languages need to be simple and easily implementable

A Synthesis?
• Functional programming has something to offer: don’t use state unnecessarily
• But don’t be too ‘religious’: local, single-threaded state is simple & efficient
• Can all global shared state be handled transactionally?

Experiments
• Using the language Scala, which has both functional and imperative features
• Experimenting with applications
• Studying how techniques similar to ‘escape analysis’ can identify shared mutable state
• Looking at hardware implications, particularly memory architecture
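The speculative model described under ‘Transactional Memory’ (atomic sections that buffer their writes, commit only if nothing they read has changed, and otherwise restart) can be sketched as a small library, which is one way to experiment with transactions in Scala given that it is not transactional out of the box. This is a hypothetical, deliberately simplified design, not any existing STM: `TinyStm`, `TRef`, `Txn` and `atomic` are illustrative names, each `TRef` carries a version number, and commits are serialized by a single global lock. A transaction that has read stale data may compute a wrong intermediate result before it fails validation and restarts, so transaction bodies should be short and free of side effects.

```scala
import scala.collection.mutable

// Hypothetical minimal STM sketch: versioned cells, writes buffered per
// transaction, validation and publication under one global commit lock.
object TinyStm {
  // A shared cell. The (value, version) pair is replaced as a whole, so one
  // volatile read always sees a matching value and version.
  final class TRef[A](initial: A) {
    @volatile private[TinyStm] var cell: (A, Long) = (initial, 0L)
  }

  // Transaction log: versions of the cells read, and locally buffered writes.
  final class Txn private[TinyStm] () {
    private[TinyStm] val reads = mutable.Map.empty[TRef[_], Long]
    private[TinyStm] val writes = mutable.Map.empty[TRef[_], Any]

    def read[A](ref: TRef[A]): A = writes.get(ref) match {
      case Some(v) => v.asInstanceOf[A] // see this transaction's own write
      case None =>
        val (v, version) = ref.cell
        reads.getOrElseUpdate(ref, version) // keep the first version observed
        v
    }

    // Buffered only: shared state is untouched until commit.
    def write[A](ref: TRef[A], value: A): Unit = writes(ref) = value
  }

  private val commitLock = new Object

  // Run `body` speculatively; commit if no cell it read has changed since,
  // otherwise discard the log and restart with a fresh transaction.
  def atomic[R](body: Txn => R): R = {
    var result: Option[R] = None
    while (result.isEmpty) {
      val txn = new Txn
      val value = body(txn)
      val committed = commitLock.synchronized {
        val valid = txn.reads.forall { case (ref, version) => ref.cell._2 == version }
        if (valid) txn.writes.foreach { case (ref, v) =>
          val r = ref.asInstanceOf[TRef[Any]]
          r.cell = (v, r.cell._2 + 1) // publish value and bump version together
        }
        valid
      }
      if (committed) result = Some(value)
    }
    result.get
  }
}
```

Serializing every commit on one lock is simply the easiest way to make validate-then-publish atomic in a sketch; the object-based hardware with distributed commit outlined earlier is aimed at avoiding exactly this kind of single point of contention.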