Parallel and Distributed Programming Models and Languages 15-740/18-740 Computer Architecture In-Class Discussion

Dong Zhou
Kun Li
Mike Ralph
Why distributed computations?
• Buzzword: Big Data
• Take sorting as an example
– Amount of data that can be sorted in 60 seconds
– One computer can read ~60 MB/sec from one disk
– 2012 world record
• Flat Datacenter Storage by Ed Nightingale et al.
• 1470 GB
• 256 heterogeneous nodes, 1033 disks
• Google indexes 100 billion+ web pages
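A rough worked calculation of why one machine is not enough, using only the figures quoted on this slide. Decimal units (1 GB = 1000 MB) are an assumption, and the helper names are illustrative. Sorting also has to write its output, so the disk count below is only a lower bound.

```python
import math

DISK_READ_MB_PER_SEC = 60   # one disk, from the slide
WINDOW_SEC = 60             # the 60-second sorting window
RECORD_GB = 1470            # 2012 record quoted on the slide

def single_disk_gb(rate=DISK_READ_MB_PER_SEC, window=WINDOW_SEC):
    """Most data one disk can even read within the window, in GB."""
    return rate * window / 1000

def disks_needed(total_gb, rate=DISK_READ_MB_PER_SEC, window=WINDOW_SEC):
    """Fewest disks that could read total_gb once within the window."""
    return math.ceil(total_gb * 1000 / (rate * window))

print(single_disk_gb())         # 3.6
print(disks_needed(RECORD_GB))  # 409
```

At this read rate, the record needs at least a few hundred disks reading in parallel, which fits the 1033 disks actually used.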
Solution: use many nodes
• Grid computing
– Hundreds of supercomputers connected by a high-speed network
• Cluster computing
– Thousands or tens of thousands of PCs connected by high-speed LANs
• 1000 nodes potentially give 1000x speedup
Distributed computations are difficult to program
• Sending data to/from nodes
• Coordinating among nodes
• Recovering from node failure
• Optimizing for locality
• Debugging
• …
MapReduce
• A programming model for large-scale
computations
– Process large amounts of input, produce output
– No side-effects or persistent state
• MapReduce is implemented as a runtime library
– Automatic parallelization
– Load balancing
– Locality optimization
– Handling of machine failures
MapReduce design
• Input data is partitioned into M splits
• Map: extract information on each split
– Each map produces R partitions
• Shuffle and sort
– Bring the matching partition from each of the M maps to the same reducer
• Reduce: aggregate, summarize, filter or
transform
• Output is in R result files
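The dataflow above can be sketched as a single-process simulation. This is a minimal sketch, not how a real MapReduce runtime is built: there is no parallelism, no disk, and no fault tolerance. The names run_mapreduce, default_partition, parity_map and sum_reduce are hypothetical.

```python
import zlib
from collections import defaultdict

def default_partition(key, num_reducers):
    # Assumption: a stable hash of the key's text, modulo R.
    return zlib.crc32(repr(key).encode("utf-8")) % num_reducers

def run_mapreduce(splits, map_fn, reduce_fn, num_reducers,
                  partition_fn=default_partition):
    """Simulate map -> shuffle/sort -> reduce over M splits and R reducers."""
    # Map phase: one map task per split; each task fills R partitions.
    map_outputs = []
    for split in splits:
        partitions = [[] for _ in range(num_reducers)]
        for key, value in split:
            for k2, v2 in map_fn(key, value):
                partitions[partition_fn(k2, num_reducers)].append((k2, v2))
        map_outputs.append(partitions)

    # Shuffle and sort: reducer r collects partition r from all M map tasks.
    result_files = []
    for r in range(num_reducers):
        grouped = defaultdict(list)
        for partitions in map_outputs:
            for k2, v2 in partitions[r]:
                grouped[k2].append(v2)
        # Reduce phase: keys are visited in sorted order within a reducer.
        output = []
        for k2 in sorted(grouped):
            output.extend(reduce_fn(k2, grouped[k2]))
        result_files.append(output)
    return result_files  # R lists, standing in for the R result files

def parity_map(key, value):
    yield ("even" if value % 2 == 0 else "odd"), value

def sum_reduce(key, values):
    yield key, sum(values)

splits = [[("r1", 1), ("r2", 2)], [("r3", 3), ("r4", 4), ("r5", 5)]]  # M = 2
result_files = run_mapreduce(splits, parity_map, sum_reduce, num_reducers=2)
print(sorted(pair for f in result_files for pair in f))
# [('even', 6), ('odd', 9)]
```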
More specifically
• Programmer specifies two methods
– map(k, v) → <k', v'>*
– reduce(k', <v'>*) → <k'', v''>*
• All v' with same k' are reduced together
• Usually also specify:
– partition(k', total partitions) → partition for k'
– often a simple hash of the key
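A minimal sketch of a partition function. It assumes string keys and uses zlib.crc32 as the hash, because Python's built-in hash() for strings can differ between processes. The function names are illustrative. host_partition shows one possible custom partitioner that keeps all URLs from the same host in the same partition.

```python
import zlib
from urllib.parse import urlparse

def hash_partition(key, num_partitions):
    """Simple hash of the key, modulo the number of partitions."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def host_partition(url, num_partitions):
    """Partition URL keys by hostname rather than by the full key."""
    host = urlparse(url).hostname or ""
    return hash_partition(host, num_partitions)

# The same key always lands in the same partition.
print(hash_partition("be", 4) == hash_partition("be", 4))  # True
# Two URLs on one host share a partition.
print(host_partition("http://example.com/a", 8) ==
      host_partition("http://example.com/b?x=1", 8))       # True
```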
Runtime
MapReduce is widely applicable
• Distributed grep
• Distributed clustering
• Web link graph reversal
• Detecting approx. duplicate web pages
• …
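As one illustration from this list, web link graph reversal fits the map/reduce shape directly: map emits (target, source) for each link, and reduce collects all sources for a target. The sketch below runs in one process and assumes the links have already been extracted from each page. All names are hypothetical.

```python
from collections import defaultdict

def map_links(source, targets):
    # Emit one (target, source) pair per outgoing link.
    for target in targets:
        yield target, source

def reduce_links(target, sources):
    # Collect every page that links to this target.
    yield target, sorted(sources)

def reverse_link_graph(pages):
    grouped = defaultdict(list)
    for source, targets in pages.items():
        for target, src in map_links(source, targets):
            grouped[target].append(src)   # stands in for shuffle
    reversed_graph = {}
    for target in sorted(grouped):
        for key, value in reduce_links(target, grouped[target]):
            reversed_graph[key] = value
    return reversed_graph

pages = {"a.html": ["b.html", "c.html"], "b.html": ["c.html"]}
print(reverse_link_graph(pages))
# {'b.html': ['a.html'], 'c.html': ['a.html', 'b.html']}
```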
Dryad
• Similar goals to MapReduce
– Focus on throughput, not latency
– Automatic management of scheduling,
distribution, fault tolerance
• Computations expressed as a graph
– Vertices are computations
– Edges are communication channels
– Each vertex has several input and output edges
Why use a dataflow graph?
• Many programs can be represented as a
distributed dataflow graph
– The programmer may not have to know this
• "SQL-like" queries: LINQ
• Dryad will run them for you
Runtime
• Vertices (V) run arbitrary app code
• Vertices exchange data through files, TCP pipes, etc.
• Vertices communicate with JM to report status
• Job Manager (JM) consults name server (NS) to discover available machines
• JM maintains job graph and schedules vertices
• Daemon process (D) executes vertices
Job = Directed Acyclic Graph
[Figure: job graph. Inputs flow through processing vertices to outputs; channels (file, pipe, shared memory) connect the vertices.]
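A job graph of this kind can be sketched as a toy single-process executor. This is not Dryad's API: run_job and the vertex names are hypothetical, channels are plain Python values rather than files or pipes, and vertices run one at a time in topological order. It assumes Python 3.9+ for graphlib.

```python
from graphlib import TopologicalSorter

def run_job(vertices, edges, inputs):
    """vertices: name -> function taking a list of upstream outputs.
    edges: list of (producer, consumer) channels.
    inputs: name -> initial data for input vertices."""
    predecessors = {name: [] for name in vertices}
    for producer, consumer in edges:
        predecessors[consumer].append(producer)

    outputs = {}
    # static_order() yields each vertex after all of its predecessors
    # and raises CycleError if the graph is not acyclic.
    for name in TopologicalSorter(predecessors).static_order():
        upstream = [outputs[p] for p in predecessors[name]]
        if name in inputs:
            upstream = [inputs[name]] + upstream
        outputs[name] = vertices[name](upstream)
    return outputs

vertices = {
    "read_a": lambda ins: ins[0],
    "read_b": lambda ins: ins[0],
    "merge": lambda ins: sorted(ins[0] + ins[1]),
    "count": lambda ins: len(ins[0]),
}
edges = [("read_a", "merge"), ("read_b", "merge"), ("merge", "count")]
outputs = run_job(vertices, edges, {"read_a": [3, 1], "read_b": [2]})
print(outputs["merge"], outputs["count"])  # [1, 2, 3] 3
```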
Advantages of DAG over MapReduce
• Big jobs are more efficient with Dryad
– MapReduce: big jobs run as more than one MR stage
• Reducers of each stage write to replicated storage
• Output of reduce: 2 network copies, 3 disks
– Dryad: each job is represented with a DAG
• Intermediate vertices write to local files
• …
Pig Latin
• High-level procedural abstraction of
MapReduce
• Contains SQL-like primitives
• Example:
good_urls = FILTER urls BY pagerank > 0.2;
groups = GROUP good_urls BY category;
big_groups = FILTER groups BY COUNT(good_urls) > 10^6;
Output = FOREACH big_groups GENERATE category,
AVG(good_urls.pagerank);
• Plus user-defined functions (UDFs)
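To make the query's meaning concrete, here is a plain Python sketch that computes the same result step by step in memory. The record layout (dicts with category and pagerank fields) and the name top_categories are assumptions for illustration. The group-size threshold is left as a parameter.

```python
from collections import defaultdict

def top_categories(urls, min_group_size, min_pagerank=0.2):
    # good_urls = FILTER urls BY pagerank > 0.2;
    good_urls = [u for u in urls if u["pagerank"] > min_pagerank]

    # groups = GROUP good_urls BY category;
    groups = defaultdict(list)
    for u in good_urls:
        groups[u["category"]].append(u)

    # big_groups = FILTER groups BY COUNT(good_urls) > threshold;
    big_groups = {c: rows for c, rows in groups.items()
                  if len(rows) > min_group_size}

    # FOREACH big_groups GENERATE category, AVG(good_urls.pagerank);
    return {c: sum(r["pagerank"] for r in rows) / len(rows)
            for c, rows in big_groups.items()}

urls = [
    {"url": "a", "category": "news", "pagerank": 0.5},
    {"url": "b", "category": "news", "pagerank": 0.3},
    {"url": "c", "category": "blog", "pagerank": 0.9},
    {"url": "d", "category": "news", "pagerank": 0.1},
]
result = top_categories(urls, min_group_size=1)
print(sorted(result))  # ['news']
```

With a threshold of 1, only the news category survives: it has two URLs above the pagerank cutoff, while blog has one.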
Value
• Reduces development time
• Procedural vs. declarative
• Overhead/performance costs worthwhile?
[Figure: C/C++ and Assembly compared with Pig Latin and MapReduce]
Green-Marl
• High-level graph analysis language/compiler
• Uses basic data types and graph primitives
• Built-in graph functions
– BFS, RBFS, DFS
• Uses domain specific optimizations
– Both non-architecture and architecture specific
• Compiler translates Green-Marl to another high-level language (e.g., C++)
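For readers unfamiliar with the traversal primitives, this is roughly what a breadth-first traversal computes. The sketch is plain Python, not Green-Marl syntax, and the adjacency-dict representation and the name bfs_levels are assumptions.

```python
from collections import deque

def bfs_levels(adjacency, root):
    """Return each reachable node's distance (in hops) from root."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in level:      # first visit is the shortest path
                level[neighbor] = level[node] + 1
                queue.append(neighbor)
    return level

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(bfs_levels(graph, "a"))
# {'a': 0, 'b': 1, 'c': 1, 'd': 2}
```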
Tradeoffs
• Achieve speedup over hand-tuned parallel
equivalents
• Tested only on a single workstation
• Only works with graph representations
– Difficulty representing certain data sets and
computations
• Domain specific vs. general purpose languages
• Future work: more architectures, user-defined data structures
Questions and Discussion
Example: count word frequencies in web pages
• Input is files with one doc per record
• Map parses document into words
– key = document URL
– value = document contents
• Input to map
"doc1", "to be or not to be"
• Output of map
"to", "1"
"be", "1"
"or", "1"
"not", "1"
"to", "1"
"be", "1"
Example: count word frequencies in web pages
• Reduce: computes sum for a key
key = "be", values = "1", "1" → "2"
key = "not", values = "1" → "1"
key = "or", values = "1" → "1"
key = "to", values = "1", "1" → "2"
• Output of reduce saved
"to", "2"
"be", "2"
"or", "1"
"not", "1"
Example: Pseudo-code
Map(String input_key, String input_value):
  // input_key: document name
  // input_value: document contents
  for each word w in input_value:
    EmitIntermediate(w, "1");

Reduce(String key, Iterator intermediate_values):
  // key: a word, same for input and output
  // intermediate_values: a list of counts
  int result = 0;
  for each v in intermediate_values:
    result += ParseInt(v);
  Emit(AsString(result));
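A runnable Python rendering of the pseudo-code, as a minimal single-process sketch. The word_count driver is a hypothetical stand-in for the runtime's shuffle step, and splitting on whitespace is an assumed tokenizer. Counts stay as strings to mirror the pseudo-code.

```python
from collections import defaultdict

def map_fn(input_key, input_value):
    # input_key: document name; input_value: document contents
    for word in input_value.split():
        yield word, "1"

def reduce_fn(key, intermediate_values):
    # key: a word; intermediate_values: a list of counts (as strings)
    result = 0
    for v in intermediate_values:
        result += int(v)
    yield key, str(result)

def word_count(documents):
    # Group intermediate values by key (stands in for shuffle and sort).
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            intermediate[word].append(count)
    output = {}
    for word in sorted(intermediate):
        for key, value in reduce_fn(word, intermediate[word]):
            output[key] = value
    return output

print(word_count({"doc1": "to be or not to be"}))
# {'be': '2', 'not': '1', 'or': '1', 'to': '2'}
```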