Lecture 26: Parallel Computing Concepts
CS 210 Fundamentals of Programming I
April 18, 2016
Source: CSinParallel: Parallel Computing in the Computer Science Curriculum (http://csinparallel.org/), Parallel Computing Concepts Module
Thursday, we will look at some of the Patternlets in the Parallel Computing Module (uses OpenMP)

Outline
• Motivation
• Terminology
• Parallel speedup
• Options for communication
• Issues in concurrency

Motivation
• Moore's "Law": an empirical observation by Intel co-founder Gordon Moore in 1965
• The number of components in computer circuits had doubled each year since 1958
• Four decades later, that number has continued to double every two years or less

Motivation
• But the speedups in software performance were mostly due to increasing clock speeds
• The era of the "free lunch": just wait 18-24 months and software gets faster!
• Increased clock speed => increased heat

Figure: CPU temperature (power density) over time, actual and projected, rising past that of a hot plate and toward that of the sun by around 2020. Source: J.
Adams, 2014 CCSC:MW Conference Keynote

Motivation
• Around 2005, manufacturers stopped increasing clock speed
• Instead, they created multi-core CPUs; the number of cores per CPU chip is growing exponentially
• Most software has been designed for a single core; the end of the "free lunch" era
• Future software development will require understanding of how to take advantage of multi-core systems

Terminology
• parallelism: multiple (computer) actions physically taking place at the same time
• concurrency: programming in order to take advantage of parallelism (or virtual parallelism)
• Parallelism takes place in hardware; concurrency takes place in software

Terminology
• sequential programming: programming for a single core
• concurrent programming: programming for multiple cores or multiple computers
• process: the execution of a program
• thread: a sequence of execution within a program; each process has at least one thread

Terminology
• multi-core computing: computing with systems that provide multiple computational circuits per CPU package
• distributed computing: computing with systems consisting of multiple computers connected by computer network(s)
• Systems can have both

Terminology
• data parallelism: the same processing is applied to multiple subsets of a large data set in parallel
• task parallelism: different tasks or stages of a computation are performed in parallel

Terminology
• shared memory multiprocessing: e.g., a multi-core system, and/or multiple CPU packages in a single computer, all sharing the same main memory
• cluster: multiple networked computers managed as a single resource and designed for working as a unit on large computational problems
Terminology
• grid computing: distributed systems at multiple locations, typically with separate management, coordinated for working on large-scale problems
• cloud computing: computing services accessed via networking on large, centrally managed clusters at data centers, typically at unknown remote locations

Parallel Speedup
• The speedup of a parallel algorithm over a corresponding sequential algorithm is the ratio of the compute time for the sequential algorithm to the time for the parallel algorithm
• If the speedup factor is n, we say we have n-fold speedup
• The observed speedup depends on all implementation factors:
  • Number of processors
  • Other processes running at the same time
  • Communication overhead
  • Synchronization overhead
  • Inherently sequential computation
• Rarely do n processors give n-fold speedup, but occasionally you get better than n-fold speedup

Amdahl's Law
• Amdahl's Law is a formula for estimating the maximum speedup from an algorithm that is part sequential and part parallel:

  overall speedup = 1 / ((1 - P) + P/S)

• where P is the time proportion of the algorithm that can be parallelized
• where S is the speedup factor for that portion of the algorithm due to parallelization
• Note that the sequential portion has a disproportionate effect

Options for Communication
• message passing: communicating with basic operations send and receive to transmit information from one computation to another
• shared memory: communicating by reading and writing from local memory locations that are accessible by multiple computations

Options for Communication
• distributed memory: some parallel computing systems provide a service for sharing memory locations on a remote computer system, enabling non-local reads and writes to a memory location for communication

Issues in Concurrency
• Fault tolerance is the capacity of a computing system to continue to satisfy its specification in the presence of faults (causes of error)
• Scheduling means assigning computations (processes or threads) to processors (cores, distributed computers, etc.) according to time

Issues in Concurrency
• Mutually exclusive access to shared resources means that at most one computation (process or thread) can access a resource (such as a shared memory location) at a time; this is one of the requirements for correct interprocess communication (IPC)