[1/17] Design Patterns and Computer Architecture
Mark Murphy, Scott Beamer, Henry Cook, Andrew Waterman, Krste Asanovic, Kurt Keutzer

[2/17] Design Patterns and Architecture
- Design patterns (so far) are good at exposing ||ism
  - Only half of the battle / there is parallelism everywhere we look!
- We need to incorporate architectural information
  - But not too much: we don't want to drown in detail!
- Computer architects need patterns too!
  - Dwarfs were supposed to supplant benchmarks, remember?
  - Dwarfs -> Computational Patterns: too vague for architects
- Do design pattern writers need architectural patterns?
  - Standardize a vocabulary to discuss performance issues?

[3/17] Work In Progress
- The point of this talk is not to present any results
- I want your input on the results of brainstorming sessions between myself and the Architecture research group
- There are 40 minutes for this: ~20 of me presenting slides and the rest for discussion

[4/17] Pattern Language Exposes ||ism
(The pattern language, top to bottom:)
- Applications
- Productivity Layer
  - Structural Patterns (choose your high-level structure): Pipe-and-filter, Agent and repository, Process control, Event based implicit invocation, Model view controller, Iterative refinement, Map reduce, Layered systems, Arbitrary static task graph, Puppeteer
  - Computational Patterns (identify the key computations): Dense linear algebra, Sparse linear algebra, Spectral methods, Structured grids, Unstructured grids, N-body methods, Monte Carlo methods, Graph algorithms, Graphical models, Dynamic programming, Backtrack branch and bound, Finite state machine, Circuits
- Parallel Algorithm Strategy Patterns (refine the structure: what concurrent approach do I use? -- guided re-organization): Task parallelism, Data parallelism, Geometric decomposition, Pipeline, Discrete event, Recursive splitting
- Efficiency Layer
  - Implementation Strategy Patterns (utilize supporting structures: how do I implement my concurrency? -- guided mapping)
    - Program structure: SPMD, Strict data parallel, Loop parallelism, Fork/Join, Master/Worker, Task queue, Actors, Graph partitioning, BSP
    - Data structure: Shared queue, Shared hash table, Distributed array, Shared data, Memory parallelism
  - Concurrent Execution Patterns (implementation methods: what are the building blocks of parallel programming? -- guided implementation)
    - Advancing program counters: MIMD, SIMD, Thread pool, Task graph, Data flow, Speculation, Digital circuits
    - Coordination: Message passing, Collective communication, Mutual exclusion, Transactional memory, Collective synchronization, P2P synchronization

[5/17] Pattern Language Exposes ||ism
- Example from Machine Learning: compute the gradient of a scalar function w.r.t. a matrix B
- Each entry of the gradient requires N x N BLAS-2 matrix computations

[6/17] Pattern Language Exposes ||ism
- Example from Quantum Chemistry: need to compute a <# basis functions> x <# electrons> matrix
- Each entry of the matrix requires evaluating a number of functions and summing the results

[7/17] Pattern Language Exposes ||ism
- In both examples, we have (at least) two levels of ||ism:
  - Many entries in the matrix (Task Parallel)
  - Much work in computing each entry (Map/Reduce, Data Parallel)
- The pattern language can pretty much tell us this
- However, the right parallel program for a GPU-like manycore processor looks different in the two cases:
  - For the Machine Learning problem, only parallelize the computation of each matrix element
  - For the Chemistry problem, parallelize at both levels
- Knowing this requires understanding that GPU-like processors implement fine-grained data parallelism best (see the sketch below)
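To make the two mappings concrete, here is a minimal Python sketch (not from the talk); the function names, array shapes, and the inner summation bound n_terms are illustrative assumptions. The comments mark where each level of ||ism lives and which level a GPU-like machine would actually want to exploit.

```python
# Illustrative sketch only: the two examples share the same two-level loop
# structure, but the profitable GPU mapping differs. Names/shapes are assumed.
import numpy as np

def ml_gradient(B, data):
    """Machine-learning example: one gradient entry per (i, j).
    Each entry is dominated by N x N BLAS-2 work (matrix-vector products),
    so on a GPU-like manycore we would parallelize *inside* each entry and
    leave the outer loops alone."""
    N = B.shape[0]
    grad = np.empty_like(B)
    for i in range(N):                               # outer level: left serial
        for j in range(N):
            grad[i, j] = data[i] @ (B @ data[j])     # inner level: data parallel
    return grad

def chem_matrix(basis_fns, electrons, n_terms=16):
    """Quantum-chemistry example: a (#basis functions) x (#electrons) matrix.
    Each entry sums many independent function evaluations, so here we would
    parallelize at both levels: task parallel across entries, map/reduce
    data parallel within an entry."""
    M = np.empty((len(basis_fns), len(electrons)))
    for a, phi in enumerate(basis_fns):              # outer level: task parallel
        for b, r in enumerate(electrons):
            M[a, b] = sum(phi(r, k) for k in range(n_terms))  # inner: reduce
    return M
```

The loop nests look alike; the asymmetry only appears once we know that a GPU-like processor rewards the fine-grained, regular inner level -- exactly the architectural knowledge the pattern language currently leaves out.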
[8/17] SW writers understand HW arch?
- There has been a sentiment that the pattern language should be architecture-agnostic
- But architectural savvy is required for decisions like these; otherwise, the options are all unattractive:
  - Implement every possible parallelization, choose the best? ...
  - Choose one parallelization, hope it works? ...
  - Ask Bryan to parallelize your code?
- But clearly we can't write a pattern language around GTX200, just as we can't write it around LRB or Nehalem

[9/17] Performance Models?
- Abstract, simplistic models to capture the essence of low-level performance issues
- Extant example: LogP for distributed-memory machines
  - L -- network latency for a message
  - o -- CPU overhead of sending a message
  - g -- gap, the inverse of NIC bandwidth
  - P -- number of processors
  - e.g., the end-to-end time for a small message is roughly o + L + o = L + 2o
- [Diagram: P processors connected by a latency-L network]

[10/17] Performance Models?
- Could imagine a similar model for current manycores. How about this one?
- The BLIMP model:
  - B(L) -- bandwidth as a function of load/store block size
  - I -- # instruction fetch units
  - M -- # load/store units
  - P -- # execution pipelines
- [Diagram: example machine with I=4, P=8]

[11/17] Performance Models?
- The problems are obvious
- Sure -- you can analyze the FFT algorithm and Matrix Multiply. But what about my code?
- Can't handle data dependence in computational intensity
  - Example: SIFT feature extraction. Compute a "scale space"; for each maximum in the scale space, do a whole bunch of work. How many maxima are there?
- "Interesting" architectural features cannot be described
- Still ... better than nothing?

[12/17] Design Patterns and Architecture (outline recap)

[13/17] Architects need patterns too!
- "Benchmark addiction" was part of the motivation for Dwarfs
- We (i.e. Tim) quickly discovered that Dwarfs were far too vague and high-level to serve this purpose
- Reliance upon C-source-code benchmarks pigeon-holed architectural innovation
- Dwarfs were supposed to be anti-benchmarks: provide a non-source-code description of the computations that were important
- A Computational Pattern (~Dwarf) doesn't even imply a particular problem to be solved, much less a particular algorithm
- Can the fleshed-out pattern language be the solution?

[14/17] Anti-Benchmarks?
- An architecture-agnostic, patterns-based analysis of a program enumerates the space of implementations (e.g., Map/Reduce, Task Parallel)
- But architects still need their benchmark fix. What does this actually tell them? They need to know:
  - Is my cache big enough?
  - Should I include my whiz-bang u-arch widget?

[15/17] Anti-Benchmarks
- Suppose the pattern language somehow included the architectural savvy needed to make every possible implementation decision
- What happens when the architect changes the rules?

[16/17] Multiple Levels of Description
- Level 0: A patterns-based description
- Level 1: An "Abstract Machine" model?
- Level 2: A performance model?
- Level 3: A cycle-accurate simulation?
- Level 4: A joule-accurate simulation?

[17/17] Abstract Machines
- Alternate proposal for a performance model (K. Asanovic)
- Given a microarchitectural widget, how does its presence or absence affect the performance of a program?
- Map the program to two different machines (one with the widget, one without). How are the two mapped programs different?
- Mapping process TBD. SEJITS?
- Examples (see the sketch below):
  - An "Infinite ILP" machine: the superscalar analogue of PRAM
  - An infinite vector-width machine
  - An infinite-thread machine
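To make the "Infinite ILP machine" idea slightly more concrete, here is a hypothetical Python sketch of one report such an abstract machine could give: total work versus dataflow critical path for a small operation DAG. This is my illustrative reading of the proposal, not part of it; the Op class and the reduction example are invented for the sketch.

```python
# Hypothetical sketch: one report an "Infinite ILP" abstract machine might give.
# Work = cycles with no ILP at all; critical path = cycles with unbounded ILP.

class Op:
    """A node in a dataflow graph: an operation plus the nodes it depends on."""
    def __init__(self, name, *inputs):
        self.name = name
        self.inputs = inputs
        self.depth = 1 + max((i.depth for i in inputs), default=0)

def work(ops):
    return len(ops)                       # every op costs one unit of work

def critical_path(ops):
    return max(op.depth for op in ops)    # longest dependence chain

# Example: reduce 8 loaded values, as a serial chain vs. a balanced tree.
leaves = [Op(f"load{i}") for i in range(8)]

chain = list(leaves)
acc = leaves[0]
for v in leaves[1:]:
    acc = Op("+", acc, v)
    chain.append(acc)

tree = list(leaves)
level = leaves
while len(level) > 1:
    level = [Op("+", a, b) for a, b in zip(level[::2], level[1::2])]
    tree.extend(level)

for label, dag in (("chain", chain), ("tree", tree)):
    print(label, "work =", work(dag), "critical path =", critical_path(dag))
# Same work (15 ops), but the infinite-ILP machine finishes the tree in 4
# steps versus 8 for the chain -- that gap is what a superscalar widget
# could in principle exploit.
```

An infinite vector-width or infinite-thread machine could be probed the same way, just with a different notion of what counts as a single step.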
[18/17] Design Patterns and Architecture (outline recap)

[19/17] Architectural Meta-Patterns
- Hopefully by now I've conveyed my concern about the lack of architectural / performance information in design patterns
- Also, hopefully it is clear that I don't know the answer
- Maybe someone can write me a pattern? How should I tell you what I know about architecture?

[20/17] Thank You