Data Parallelism
Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Ever Wonder …
When did the term multicore become popular?
"A multi-core processor is a single computing component with two or more independent actual central processing units, which are the units that read and execute program instructions." (Wikipedia)

Art of Multiprocessor Programming 2

Let’s Ask Google Ngram!
[chart: usage of "multicore" in books by publication year]
This part since 2000 is obvious …
???
multicore cable
multicore fiber
… but we digress …

WordCount
alpha 8
bravo 3
charlie 9
…
zulu 1
easy to do sequentially … what about in parallel?

MapReduce
split text among mapping threads
chapter 1, chapter 2, …, chapter k

Map Phase
must count words! must count words! must count words!
a mapping thread per chapter: chapter 1, chapter 2, …, chapter k

Map Phase
chapter 1: alpha 9, juliet 2, alpha 1, tango 4, …
each mapper thread produces a stream of key-value pairs
key: word
value: local count

Mapper Class
    abstract class Mapper<IN, K, V> extends RecursiveTask<Map<K, V>> {
        IN input;
        public void setInput(IN anInput) {
            input = anInput;
        }
    }
IN is the input: document fragment
K is the key: individual word
V is the value: local count
RecursiveTask: a task that runs in parallel with other tasks
Map<K, V>: the task produces a map: word → count
setInput: initialize input: which document fragment?
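A minimal sketch of how mapper tasks of this shape might be forked and joined. The Mapper class is copied from the slide; LengthMapper and runMappers are hypothetical names introduced only for illustration, and the sketch assumes the tasks run on the JDK's common fork/join pool.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.RecursiveTask;

class MapperSketch {
    // Same shape as the Mapper class on the slide.
    abstract static class Mapper<IN, K, V> extends RecursiveTask<Map<K, V>> {
        IN input;
        public void setInput(IN anInput) { input = anInput; }
    }

    // Hypothetical mapper (not from the slides): maps each word to its length.
    static class LengthMapper extends Mapper<List<String>, String, Integer> {
        @Override
        protected Map<String, Integer> compute() {
            Map<String, Integer> map = new HashMap<>();
            for (String word : input) {
                map.put(word, word.length());
            }
            return map;
        }
    }

    // Hypothetical driver: one mapper task per fragment.
    static List<Map<String, Integer>> runMappers(List<List<String>> fragments) {
        List<LengthMapper> tasks = new ArrayList<>();
        for (List<String> fragment : fragments) {
            LengthMapper mapper = new LengthMapper();
            mapper.setInput(fragment);
            mapper.fork();              // schedule the task to run asynchronously
            tasks.add(mapper);
        }
        List<Map<String, Integer>> results = new ArrayList<>();
        for (LengthMapper mapper : tasks) {
            results.add(mapper.join()); // wait for each task and take its map
        }
        return results;
    }
}
```

Calling fork() on every task before the first join() is what lets the fragments be processed in parallel; joining each task right after forking it would serialize the work.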
WordCount Mapper
    class WordCountMapper extends mapreduce.Mapper<
            List<String>, String, Long
        > {
        …
    }
document fragment is list of words
map each word …
to its count in the fragment

WordCount Mapper
    protected Map<String,Long> compute() {
        Map<String,Long> map = new HashMap<>();
        for (String word : input) {
            map.merge(word, 1L, (x, y) -> x + y);
        }
        return map;
    }
the compute() method constructs the local word count
create a map to hold the output
examine each word in the document fragment
increment that word’s count in the map
when the local count is complete, return the map

Reduce Phase
a reducer thread merges mapper outputs
mapper outputs:
    alpha 2, juliet 1, tango 1, …
    alpha 1, foxtrot 1, papa 1, tango 1, …
    alpha 1, oscar 1, bravo 2, …
reducer output:
    alpha 4, bravo 2, …, zulu 1

Reduce Phase
alpha 3, bravo 2, …, zulu 1
the reducer task produces a stream of key-value pairs
key: word
value: word count

Reducer Class
    abstract class Reducer<K, V, OUT> extends RecursiveTask<OUT> {
        K key;
        List<V> valueList;
        public void setInput(K aKey, List<V> aList) {
            key = aKey;
            valueList = aList;
        }
    }
each reducer is given a single key (word)
and list of associated values (word count per fragment)
It produces a single summary value (the total count for that word)

WordCount
0.037 0.002 0.045 … 0.000
normalizing document wordcount gives a fingerprint vector

Document Fingerprint
a fingerprint is a point in a high-dimensional space

Clustering
romance novels
USENIX Proceedings
tango lyrics

k-means
Find k clusters from raw data
each vector is closer to those in the same cluster … than to those in different clusters.

MapReduce
split points among mapping threads
thread 1, thread 2, …, thread k

k-means
Reducer picks k “centers” at random

Reduce Phase
(0, c0), (1, c1), (2, c2)
reducer sends key-value pairs to mappers
key: cluster number
value: center point
thread 1, thread 2, …, thread k

Mappers
(0, c0), (1, c1), (2, c2)
Each mapper uses centers to assign each vector to a cluster

Mappers
(p0, 2), (p1, 1), (p2, 1), (p3, 0)
mapper sends key-value stream to reducer
key: point
value: cluster ID

Back at the Reducer
C0 = {…}, C1 = {…}, C2 = {…}
The reducer merges the streams … and assembles clusters …

Back at the Reducer
(0, c0’), (1, c1’), (2, c2’)
The reducer computes new centers based on new clusters …

Once is Not Enough
(0, c0’), (1, c1’), (2, c2’)
reducer sends new centers to mappers
process ends when centers become stable
thread 1, thread 2, …, thread k

To Recapitulate
We saw two problems … wordcount & k-means … with similar parallel solutions
Map part is parallel …
Reduce part is sequential.
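The recap (parallel map, sequential reduce) can be sketched for wordcount as follows. This is an illustrative sketch under stated assumptions, not the slides' framework: the mapper takes its fragment through a constructor, and the reduce step merges the local maps directly instead of creating a Reducer task per key. The names WordCountSketch and wordCount are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.RecursiveTask;

class WordCountSketch {
    // Map phase: one task per document fragment, counting words locally.
    static class WordCountMapper extends RecursiveTask<Map<String, Long>> {
        private final List<String> input;

        WordCountMapper(List<String> input) { this.input = input; }

        @Override
        protected Map<String, Long> compute() {
            Map<String, Long> map = new HashMap<>();
            for (String word : input) {
                map.merge(word, 1L, Long::sum);
            }
            return map;
        }
    }

    static Map<String, Long> wordCount(List<List<String>> fragments) {
        // Fork every mapper first so the fragments are counted in parallel.
        List<WordCountMapper> mappers = new ArrayList<>();
        for (List<String> fragment : fragments) {
            WordCountMapper mapper = new WordCountMapper(fragment);
            mapper.fork();
            mappers.add(mapper);
        }
        // Reduce phase (sequential): add each local count to the total.
        Map<String, Long> total = new TreeMap<>();
        for (WordCountMapper mapper : mappers) {
            mapper.join().forEach((word, count) -> total.merge(word, count, Long::sum));
        }
        return total;
    }
}
```

With the three mapper outputs shown on the Reduce Phase slide (alpha 2, alpha 1, alpha 1, …) the merged count for alpha is 4. The reduce loop touches a single shared map from one thread, which is why it needs no synchronization.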
abstraction

Map Function
(k1, v1) → list(k2, v2)
(doc, contents) → (word, count)
(cluster ID, center) → (point, cluster ID)

Reduce Function
(k2, list(v2)) → list(v2)
(word, list of counts) → count
(cluster ID, list of points) → new cluster center

Example
Distributed Grep
Map: line of document
Reduce: copy line to display

Example
URL Access Frequency
Map: (URL, local count)
Reduce: (URL, total count)

Example
Reverse web link graph
Map: (target link, source page)
Reduce: (target link, list of source pages)

Other Examples
histogram
matrix multiplication
PageRank
Betweenness centrality

Distributed MapReduce
Google, Hadoop, etc…
Communication by message
Fault-tolerance important

Multicore MapReduce
Phoenix, Phoenix++, Metis …
Communication by shared memory objects
Fault-tolerance unimportant

Costs
key-value layout
cache pressure
memory allocation
static vs dynamic
mechanism overhead

Part Two
Data Streams

Streams
source → data → transformation → data → transformation → data → consumer
sequence of transformations
sometimes in parallel
no relation to I/O streams

Streams
transformations given by mathematical functions

Streams
transformation creates new stream
no modifications or side-effects
correctness easier?
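A minimal sketch of the source → transformation → consumer picture using java.util.stream. The class and method names are hypothetical, and the two transformations (add 1, then double) are chosen only for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

class StreamPipelineSketch {
    // Each map() returns a new stream; the source list is never modified.
    static List<Integer> transform(List<Integer> source) {
        return source.stream()                      // source
                .map(x -> x + 1)                    // transformation: add 1
                .map(x -> 2 * x)                    // transformation: double
                .collect(Collectors.toList());      // consumer: gather into a List
    }
}
```

Because the lambdas have no side effects, the result depends only on the input list, which is the property the "correctness easier?" slide is pointing at.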
Functional Programming
functions map old state to new state
old state never changed
no complex side-effects
elegant, easier proofs of correctness

Oh, Really?
“Functional languages are unnatural to use; but so are knives and forks, diplomatic protocols, double-entry bookkeeping, and a host of other things modern civilization has found useful.”
Jim Morris, 1982

Haiku
esthetically pleasing
only works on those who understand Haiku

Karate
esthetically pleasing
works even on those who do not understand Karate

Jim Morris’s Question
Is functional programming more like Haiku or Karate?
1981: Haiku
Today: Karate

Laziness
No computation until absolutely necessary
1, 2, 3, …
x → x+1: add 1 to each element: no computation
x → 2x: double each element: no computation
sum: Σ 2(x_i + 1); sum is terminal

Laziness
1, 2, 3, …
x → x+1
x → 2x
collect in List
move to container is terminal

Laziness
1, 2, 3, …
x → x+1 and x → 2x combined into x → 2(x+1)
collect in List
Laziness permits optimizations

Laziness
Laziness permits infinite streams …
    Stream<Integer> fib = new FibStream();
1 1 2 3 5 8 13 21 34 …

Unbounded Random Stream
    Stream<Double> randomDoubleStream() {
        return Stream.generate(
            () -> random.nextDouble()
        );
    }
Unbounded stream of double-precision random numbers
Stream.generate: stream that generates new elements on the fly
() -> random.nextDouble(): function to call when generating new element
example of Java lambda expression (anon method)

WordCount
    List<String> readFile(String fileName) {
        …
        return reader
            .lines()
            .map(String::toLowerCase)
            .flatMap(s -> pattern.splitAsStream(s))
            .collect(Collectors.toList());
    }
puts each word from the document into a List
…: open the file, create a FileReader
how a stream program looks: each line of code creates a new stream
lines(): turn the reader into a stream of lines, each line a string (lines() is provided by BufferedReader)
map creates a new stream by applying a function to each stream element; here, converts to lower case
flatMap replaces one stream element with multiple stream elements; here, splits line into words
collect puts stream elements in a container; here, in a List; terminal operation
No loops
No conditionals
No mutable objects

WordCount
    Map<String,Long> map = text
        .stream()
        .collect(
            Collectors.groupingBy(
                Function.identity(),
                Collectors.counting()));
now let’s count the words
text: start with list of words
stream(): turn List into a stream
collect: put stream into a container (Map): word → count
Function.identity(): each element’s key is that element
Collectors.counting(): each element’s value is the number of times it appears
No loops
No conditionals
No mutable objects

k-Means
    class Point {
        Point(double x, double y) {…}
        Point plus(Point other) {…}
        Point scale(double x) {…}
        static Point barycenter(List<Point> cluster) {…}
    }
stream-based!

BaryCenter
[figure]

Stream Barycenter
    Point barycenter(List<Point> cluster) {
        double numPoints = cluster.size();
        Optional<Point> sum = cluster
            .stream()
            .reduce(Point::plus);
        return sum.get()
            .scale(1 / numPoints);
    }
numPoints: cluster size
stream(): turn List into Stream

Reduce
stream.reduce(+):
() → empty
(a) → a
(a, b) → a+b
(a, b, c) → (a+b)+c
etc. …
terminal operation

k-Means
Optional because sum might be empty!
    Point barycenter(List<Point> cluster) {
        double numPoints = cluster.size();
        Optional<Point> sum = cluster
            .stream()
            .reduce(Point::plus);
        return sum.get()
            .scale(1 / numPoints);
    }
reduce(Point::plus): sum points in cluster
sum.get().scale(1 / numPoints): extract sum and divide by # points

k-Means
    List<Point> points = readFile("cluster.dat");
    centers = randomDistinctCenters(points);
    double convergence = 1.0;
    while (convergence > EPSILON) {
        …
    }
read points from file
pick random centers
keep going until centers are stable

Compute New Clusters
    while (convergence > EPSILON) {
        Map<Integer,List<Point>> clusters = points
            .stream()
            .collect(
                Collectors.groupingBy(
                    p -> closestCenter(centers, p)
                )
            );
        ...
    }
stream(): turn list of points into a Stream
groupingBy: put each point in a map
key is closest center
value is list of points with that center

k-Means Cont’d
    while (convergence > EPSILON) {
        ...
        Map<Integer, Point> newCenters = clusters
            .entrySet()
            .stream()
            .collect(
                Collectors.toMap(
                    e -> e.getKey(),
                    e -> Point.barycenter(e.getValue())
                )
            );
        convergence = distance(centers, newCenters);
        centers = newCenters;
    }
compute new centers; map: ID → center
entrySet().stream(): turn map into a Stream of (ID, list of points) pairs
toMap: turn stream into a map: ID → barycenter
new key is still the cluster ID
new value is the cluster’s barycenter, using the method shown earlier
If centers have moved, start again with the new centers

Functional k-Means
many fewer lines of code
easier to read (really!)
easier to reason about
easier to optimize

Parallelism?
    Arrays.asList("Arlington", "Berkeley", "Clarendon", "Dartmouth", "Exeter")
        .stream()
        .forEach(s -> printf("%s\n", s));
So far, Streams are sequential
make List of strings and turn them into a stream
forEach applies a method to each element (not functional)
prints
    Arlington
    Berkeley
    Clarendon
    Dartmouth
    Exeter

Parallelism?
    Arrays.asList("Arlington", "Berkeley", "Clarendon", "Dartmouth", "Exeter")
        .parallelStream()
        .forEach(s -> printf("%s\n", s));
turn List into a parallel stream
prints the same five names, but in an order that can change from run to run (the slide shows three different sample orderings)

Parallelism?
    Arrays.asList("Arlington", "Berkeley", "Clarendon", "Dartmouth", "Exeter")
        .stream()
        .parallel()
        .forEach(s -> printf("%s\n", s));
can turn stream into a parallel stream

Pitfalls
    list.stream().forEach(s -> list.add(0));
lambda (function) must not modify source!

Pitfalls
    source.parallelStream()
        .forEach(s -> target.add(s));
possible exception or lost elements if target is not thread-safe
order added is non-deterministic

Conclusions
Streams support
functional programming
data parallelism
compiler optimizations
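One common way to sidestep the second pitfall is to let the stream build the result with collect instead of adding to a shared target inside forEach. The sketch below assumes that approach; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

class ParallelCollectSketch {
    // No shared mutable target: each worker accumulates privately and the
    // partial results are combined by the collector.
    static List<String> copyInParallel(List<String> source) {
        return source.parallelStream()
                .collect(Collectors.toList());
    }
}
```

For an ordered source such as a List, Collectors.toList() keeps the encounter order even on a parallel stream, so the result is deterministic, unlike the forEach version on the slide.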