MapReduce Online
Veli Hasanov
50051030
Fatih University
OUTLINE
1 Introduction
2 Background (Hadoop)
3 Pipelined MapReduce
4 Online Aggregation
5 Conclusion
1-Introduction
• MapReduce has emerged as a popular way to
harness the processing power of large clusters
• The Google MapReduce framework and the
open-source Hadoop system
• Data-centric fashion: transformations applied to
data sets
• Distributed execution, network communication,
and fault tolerance are handled by MR
1-Introduction
• Pipelining provides several important advantages
to a MapReduce framework:
- map->reduce => tasks can generate and refine an
approximation of their final answer during the
course of execution = online aggregation
- MapReduce jobs that run continuously can accept
new data as it arrives and analyze it immediately.
This allows MR to be used in applications such as
event monitoring and stream processing
- Pipelining can reduce job completion times by up
to 25% in some scenarios.
1-Introduction
• We present a modified version of the Hadoop
MapReduce framework that supports online
aggregation, which allows users to see “early
returns” from a job while it is being computed
OUTLINE
1 Introduction
2 Background (Hadoop)
3 Pipelined MapReduce
4 Online Aggregation
5 Conclusion
2-Hadoop Background
• 2.1 Programming Model
- The map function produces intermediate <key, value>
pairs, which the reduce function consumes
- Optionally, the user can supply a combiner function
(map-side pre-aggregation), which reduces network
traffic between map and reduce (see the sketch below)
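• To make the model concrete, here is a minimal word-count sketch against Hadoop's classic org.apache.hadoop.mapred API; the class names are illustrative, and the reducer doubles as the combiner because summing counts is associative.

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    // Map: emit an intermediate <word, 1> pair for every word in the record.
    class WordCountMap extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();
      public void map(LongWritable key, Text value,
                      OutputCollector<Text, IntWritable> out, Reporter reporter)
          throws IOException {
        for (String token : value.toString().split("\\s+")) {
          word.set(token);
          out.collect(word, ONE);
        }
      }
    }

    // Reduce (also usable as the combiner): sum the counts for each word.
    class WordCountReduce extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
      public void reduce(Text key, Iterator<IntWritable> values,
                         OutputCollector<Text, IntWritable> out, Reporter reporter)
          throws IOException {
        int sum = 0;
        while (values.hasNext()) sum += values.next().get();
        out.collect(key, new IntWritable(sum));
      }
    }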
2-Hadoop Background
• 2.2 Hadoop Architecture
- Hadoop MapReduce + Hadoop Distributed File System
(HDFS)
- HDFS is used to store both the input to the map step
and the output of the reduce step (intermediate results
are stored in each node’s local file system)
- A Hadoop installation = a single master node and many
worker nodes
- Master node runs the JobTracker (accepts jobs, divides
them into tasks, assigns tasks to workers)
- Worker node runs a TaskTracker (manages execution of
tasks; by default it has 2 map and 2 reduce slots); a
driver sketch follows
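• A job reaches the JobTracker through a driver like the hedged sketch below, which reuses WordCountMap/WordCountReduce from the previous slide; the input and output paths are illustrative.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class WordCountDriver {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCountDriver.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(WordCountMap.class);
        conf.setCombinerClass(WordCountReduce.class); // map-side pre-aggregation
        conf.setReducerClass(WordCountReduce.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));  // input in HDFS
        FileOutputFormat.setOutputPath(conf, new Path(args[1])); // output to HDFS
        JobClient.runJob(conf); // submits the job to the JobTracker
      }
    }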
2-Hadoop Background
• 2.3 Map Task Execution
- Each map task is assigned a portion of the input file (a split)
- By default a split contains a single HDFS block (64 MB), so the
number of file blocks = the number of map tasks.
- Execution of a map task is divided into 2 phases:
- 1. The map phase: reads the task’s split from HDFS, parses
it into records (<key, value> pairs), and applies the map
function to each record
- 2. The commit phase: registers the final output with the
TaskTracker, which informs the JobTracker that the task
has completed
- The OutputCollector (used in the map phase) stores map
output in a format that is easy for reduce tasks to consume
(see the sketch below).
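• The two phases can be pictured with this simplified sketch; it is not Hadoop's actual internal code, and commitTask() is a hypothetical placeholder for the real TaskTracker commit protocol.

    import java.io.IOException;
    import org.apache.hadoop.mapred.*;

    // Simplified sketch of a map task's life cycle, assuming a RecordReader
    // over the task's input split.
    class MapTaskSketch<K1, V1, K2, V2> {
      void run(RecordReader<K1, V1> reader, Mapper<K1, V1, K2, V2> mapper,
               OutputCollector<K2, V2> output, Reporter reporter) throws IOException {
        // 1. Map phase: read the split, parse records, apply the map function.
        K1 key = reader.createKey();
        V1 value = reader.createValue();
        while (reader.next(key, value)) {
          mapper.map(key, value, output, reporter);
        }
        // 2. Commit phase: register the final output with the TaskTracker,
        //    which informs the JobTracker that the task has completed.
        commitTask(); // hypothetical placeholder for the commit protocol
      }
      private void commitTask() { /* omitted */ }
    }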
2-Hadoop Background
• 2.4 Reduce Task Execution
is divided into three phases:
- Shuffle, sort, reduce
• The output of the reduce function is written to a
temporary location on HDFS. After completing,
the task’s HDFS output file is atomically renamed
from its temporary location to its final location.
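• The commit-by-rename step looks roughly like this minimal sketch; the paths are illustrative, not Hadoop's actual temporary-file layout.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    class CommitSketch {
      // Reduce output is written to a temporary HDFS path and atomically
      // renamed to its final location once the task completes.
      static void commit() throws java.io.IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        Path tmp = new Path("/job/_temporary/part-00000"); // illustrative path
        Path dst = new Path("/job/output/part-00000");     // illustrative path
        if (fs.rename(tmp, dst)) {
          // the task's output is now visible to consumers
        }
      }
    }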
• What happens if a map or reduce task fails
during execution?
2-Hadoop Background
• Map tasks write their output to local disk
– Output available after map task has completed
• Reduce tasks write their output to HDFS
– Once job is finished, next job’s map tasks can be
scheduled, and will read input from HDFS
• Therefore, fault tolerance is simple: simply rerun tasks on failure
– No consumers see partial operator output
Dataflow in Hadoop
[Figure, four steps: (1) the client submits a job and the JobTracker schedules map and reduce tasks; (2) map tasks read their input file blocks from HDFS; (3) map output is written to the local file system and pulled by reduce tasks via HTTP GET; (4) reduce tasks write the final answer to HDFS]
OUTLINE
1 Introduction
2 Background (Hadoop)
3 Pipelined MapReduce
4 Online Aggregation
5 Conclusion
3- Pipelined MapReduce
• 3.1 Pipelining within a job
. 3.1.1 Naïve Pipelining
- We modified Hadoop to send data directly from map
to reduce tasks.
- Clients submit jobs -> the JobTracker assigns map & reduce
tasks to the available TaskTracker slots.
- A TCP socket is used to pipeline the output of the
map function. As soon as map output is produced, the
mapper determines which reduce-task partition the
record should be sent to (see the sketch below).
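• A hedged sketch of the naïve idea, not HOP's actual code: each emitted record is hash-partitioned and pushed immediately over an open TCP socket to the reduce task that owns that partition.

    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.net.Socket;

    // Naïve pipelined collector: one open socket per reduce task; every
    // record is sent as soon as the map function produces it.
    class PipelinedCollector {
      private final DataOutputStream[] reducers;

      PipelinedCollector(Socket[] sockets) throws IOException {
        reducers = new DataOutputStream[sockets.length];
        for (int i = 0; i < sockets.length; i++)
          reducers[i] = new DataOutputStream(sockets[i].getOutputStream());
      }

      void collect(String key, String value) throws IOException {
        // Determine which reduce-task partition owns this key.
        int partition = (key.hashCode() & Integer.MAX_VALUE) % reducers.length;
        DataOutputStream out = reducers[partition];
        out.writeUTF(key);
        out.writeUTF(value);
        out.flush(); // eager: no buffering, no combiner
      }
    }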
3- Pipelined MapReduce
• 3.1.2 Refinements
• Naïve pipelining may suffer from several practical
problems:
- Problem 1. There may not be enough slots available to schedule
every task in a new job, and a large number of TCP connections
would be needed. => Map tasks write output to disk; once a
reduce task is assigned a slot, it can pull the records from the
map task. For the TCP problem, each reducer can be configured
to pull data from only a certain number of mappers at once.
- Problem 2. The map function was invoked by the same thread
that wrote output records to the pipeline sockets, i.e., if the
network is a bottleneck the mapper is prevented from doing
useful work. => Use separate threads: one stores map output in
an in-memory buffer, and another sends the buffered data to the
connected reducers (see the sketch below)
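• A minimal sketch of the two-thread refinement, assuming a simple String record type; sendToReducer stands in for the real socket write.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // The map thread only appends to an in-memory buffer; a separate sender
    // thread drains it, so a slow network never stalls the map function.
    class BufferedPipeline {
      private final BlockingQueue<String> buffer = new ArrayBlockingQueue<>(10000);

      // Called by the map thread; blocks only if the buffer itself is full.
      void emit(String record) throws InterruptedException {
        buffer.put(record);
      }

      // Runs in a dedicated sender thread.
      void senderLoop() throws InterruptedException {
        while (true) {
          String record = buffer.take(); // wait for buffered map output
          sendToReducer(record);         // may block on a slow network
        }
      }

      private void sendToReducer(String record) { /* socket write, omitted */ }
    }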
3- Pipelined MapReduce
• 3.1.3 Granularity of Map Output
Another problem with the naïve design is that it eagerly sends each
record as soon as it is produced, which prevents the use of map-side
combiners. => Instead of sending the buffer contents to reducers
directly, we wait for the buffer to grow to a threshold size. The
mapper then applies the combiner function and writes the buffer to
disk using the spill file format.
- When a map task generates a new spill file, it first queries the
TaskTracker for the number of unsent spill files. If this number
grows beyond a certain threshold, the mapper accumulates multiple
spill files instead of sending each one immediately.
- Once the queue of unsent spill files exceeds the threshold, the map
task merges and combines the accumulated spill files into a single
file, and then registers its output with the TaskTracker (see the
sketch below).
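• The spill policy can be sketched as follows for a word-count-style job; the thresholds and in-memory "spill" representation are illustrative, not HOP's actual spill file code.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Buffer map output to a threshold, pre-aggregate with the combiner,
    // and merge accumulated spills once too many remain unsent.
    class SpillBuffer {
      private static final int BUFFER_THRESHOLD = 10000; // records per spill
      private static final int SPILL_THRESHOLD = 4;      // max unsent spills
      private final Map<String, Integer> buffer = new HashMap<>();
      private final List<Map<String, Integer>> unsentSpills = new ArrayList<>();

      void collect(String word) {
        buffer.merge(word, 1, Integer::sum); // combiner logic: sum the counts
        if (buffer.size() >= BUFFER_THRESHOLD) spill();
      }

      private void spill() {
        unsentSpills.add(new HashMap<>(buffer)); // would use the spill file format
        buffer.clear();
        if (unsentSpills.size() > SPILL_THRESHOLD) {
          // Merge and combine all accumulated spills into a single output,
          // then register it with the TaskTracker (registration omitted).
          Map<String, Integer> merged = new HashMap<>();
          for (Map<String, Integer> s : unsentSpills)
            s.forEach((k, v) -> merged.merge(k, v, Integer::sum));
          unsentSpills.clear();
          unsentSpills.add(merged);
        }
      }
    }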
3- Pipelined MapReduce
• 3.2 Pipelining Between Jobs
• In the traditional Hadoop architecture, the output of
each job (j1, j2, ...) is written to HDFS.
• Furthermore, the JobTracker cannot schedule a
consumer job until the producer job has completed,
because scheduling a map task requires knowing the
HDFS block locations of the map’s input split.
• => In our modified version of Hadoop, the reduce tasks
of one job can optionally pipeline their output directly
to the map tasks of the next job. We also introduce
‘snapshot’ outputs that are published for online
aggregation and continuous queries.
3- Pipelined MapReduce
FAULT TOLERANCE
• Traditional fault tolerance algorithms for
pipelined dataflow systems are complex
• HOP approach: write to disk and pipeline
– Producers write data into an in-memory buffer
– In-memory buffer periodically spilled to disk
– Spills are also sent to consumers
– Consumers treat pipelined data as “tentative” until
the producer is known to have completed
– Fault tolerance via task restart; tentative output
discarded
OUTLINE
1 Introduction
2 Background (Hadoop)
3 Pipelined MapReduce
4 Online Aggregation
5 Conclusion
4- Online Aggregation
• Although MapReduce was originally designed as
a batch-oriented system, it is often used for
interactive data analysis (for example, ad-hoc
analytic queries)
• An interactive user would prefer a “quick and
dirty” approximation over a correct answer that
takes much longer to compute.
• How we extended our pipelined Hadoop
implementation to support online aggregation
within a single job (Section 4.1) and between
multiple jobs (Section 4.2).
4- Online Aggregation
• 4.1 Single-Job Online Aggregation
• In HOP, the data records produced by map
tasks are sent to reduce tasks shortly after
each record is generated.
• A snapshot is the output of a reduce task at a
certain point in time.
• It is important to know how accurate a snapshot
is, i.e., how closely it approximates the final
answer; this is a hard problem.
4- Online Aggregation
• Snapshots are computed periodically, as new data arrives at
each reducer.
• The user may
- specify how often snapshots should be computed
- specify whether to include data from tentative (unfinished)
map tasks
• If there are not enough free slots to allow all the reduce
tasks in a job to be scheduled, snapshots will not be
available for reduce tasks that are still waiting to be
executed
• Within a single job: periodically invoke the reduce function at
each reduce task on the available data
• Between jobs: periodically send a “snapshot” to consumer
jobs (a configuration sketch follows)
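• As a sketch of how such options might be set, the property keys below are hypothetical placeholders, not HOP's actual configuration names.

    import org.apache.hadoop.mapred.JobConf;

    class SnapshotConfigSketch {
      static JobConf configure() {
        JobConf conf = new JobConf();
        conf.setBoolean("hop.pipeline.enabled", true);            // hypothetical key
        conf.setInt("hop.snapshot.progress.interval", 25);        // hypothetical: every 25% of input
        conf.setBoolean("hop.snapshot.include.tentative", false); // hypothetical key
        return conf;
      }
    }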
4- Online Aggregation
[Figure: map tasks read their input blocks from HDFS and pipeline records to reduce tasks, which write snapshot answers to HDFS]
4- Online Aggregation
• 4.2 Multi-Job Online Aggregation
• Similar to single-job online aggregation,
but approximate answers are pipelined to the
map tasks of the next job (j1, j2, ...).
• Unfortunately, the output of the reduce function is
not monotonic, which is why co-scheduling a
sequence of jobs is required.
• The consumer job computes an approximation
4- Online Aggregation
[Figure: Job 1 reducers write answers to HDFS and pipeline their output directly to Job 2 mappers]
4- Online Aggregation
• Fault Tolerance for Multi-Job Online
Aggregation:
• Let’s assume we have 2 jobs (j1, j2). We consider
3 cases:
• 1. A task in j1 fails (handled as discussed earlier)
• 2. A task in j2 fails (the system restarts the failed task)
• 3. To handle failures in j1, tasks in j2 replace the output
received from the failed task with the most recent valid
snapshot from j1.
• If tasks from both jobs fail, a new task in j2 recovers the
most recent snapshot from j1 (see the sketch below).
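• For the last case, a restarted j2 task could recover roughly like this sketch, assuming j1 writes each snapshot to a known HDFS directory; the directory layout and helper are illustrative, not HOP's actual recovery code.

    import java.io.IOException;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    class SnapshotRecoverySketch {
      // Find the most recent snapshot written by j1 so a restarted j2 task
      // can resume from it.
      static Path latestSnapshot(FileSystem fs, Path snapshotDir) throws IOException {
        Path latest = null;
        long newest = Long.MIN_VALUE;
        for (FileStatus s : fs.listStatus(snapshotDir)) {
          if (s.getModificationTime() > newest) { // most recent snapshot wins
            newest = s.getModificationTime();
            latest = s.getPath();
          }
        }
        return latest;
      }
    }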
Stream Processing
• MapReduce is often applied to streams of
data that arrive continuously
– Click streams, network traffic, web crawl data, …
• Traditional approach: buffer, batch process
1. Poor latency
2. Analysis state must be reloaded for each batch
• Instead, run MR jobs continuously, and
analyze data as it arrives
Monitoring
• Example from the HOP evaluation: a continuous MapReduce job
monitored per-host system statistics and raised an alert when a host
began thrashing. The thrashing host was detected very rapidly, notably
faster than the 5-second TaskTracker-to-JobTracker heartbeat cycle
that is used to detect straggler tasks in stock Hadoop.
• We envision using these alerts to do early detection of stragglers
within a MapReduce job.
5- Conclusion
• HOP extends the applicability of the model to
pipelining behaviors, while preserving the simple
programming model and fault tolerance of a
full-featured MapReduce framework.
• Future topics
- Scheduling
- using MapReduce-style programming for even
more interactive applications.