HPMR: Prefetching and Pre-shuffling in Shared MapReduce Computation Environment
IEEE 2009
Sangwon Seo (KAIST), Ingook Jang, Kyungchang Woo, Inkyo Kim, Jin-Soo Kim, Seungryoul Maeng
2013.04.25
Advanced File Processing (파일처리 특론)
Presenter: 김태훈
Contents
1. Introduction
2. Related Work
3. Design
4. Implementation
5. Evaluations
6. Conclusion
2 /27
Introduction
• It is difficult to handle Internet services that deal with enormous volumes of data
  - Such services generate a large amount of data which needs to be processed every day
• To solve this problem, the MapReduce programming model is used
  - Supports distributed and parallel processing for large-scale data-intensive applications
  - Data-intensive applications, e.g., data mining and scientific simulation
3 /27
Introduction
• Hadoop
  - Based on the MapReduce model
  - Since Hadoop is a distributed system, its file system is called HDFS (Hadoop Distributed File System)
• An HDFS cluster consists of (a small read sketch follows below)
  - A single NameNode
    - A master server that manages the namespace of the file system and regulates clients' access to files
  - A number of DataNodes
    - Manage the storage directly attached to each DataNode
• HDFS placement policy
  - Places each of the three replicas on a node in the local rack
  - Advantage: improves write performance by cutting down inter-rack write traffic
4 /27
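To make the NameNode/DataNode split above concrete, here is a minimal read through the standard Hadoop FileSystem API. The path /data/sample.txt is a hypothetical example and the sketch is not part of HPMR; it only illustrates that the client gets metadata and block locations from the NameNode while the bytes themselves are streamed from DataNodes.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.IOException;

public class HdfsReadSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();        // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);            // namespace operations go to the NameNode
        Path path = new Path("/data/sample.txt");        // hypothetical file path
        try (FSDataInputStream in = fs.open(path)) {     // block locations come from the NameNode,
            byte[] buf = new byte[4096];                 // the data is read from DataNodes
            int n;
            while ((n = in.read(buf)) > 0) {
                System.out.write(buf, 0, n);
            }
        }
    }
}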
Introduction
[Figure: MapReduce data flow across two nodes. Files loaded from HDFS are divided into splits, read by RecordReaders through the InputFormat, processed by map tasks, passed through a Combiner and a Partitioner, shuffled over the network with a sort step, merged by reduce tasks, and written back to HDFS through the OutputFormat.]
• It is essential to reduce the shuffling overhead to improve the overall performance of the MapReduce computation (a combiner sketch follows below)
• The network bandwidth between nodes is also an important factor in the shuffling overhead
5 /27
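One standard way to cut the shuffle traffic pictured above is to run a Combiner on the map side so that values are pre-aggregated before they cross the network. The sketch below is a minimal word-count style job using the stock Hadoop MapReduce API; it only illustrates where the Combiner sits in the pipeline and is not part of HPMR itself.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountWithCombiner {

    // Map: emit <word, 1> for every token in the input line
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            StringTokenizer it = new StringTokenizer(value.toString());
            while (it.hasMoreTokens()) {
                word.set(it.nextToken());
                ctx.write(word, ONE);
            }
        }
    }

    // Reduce (also used as the Combiner): sum the counts for each word
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "wordcount");
        job.setJarByClass(WordCountWithCombiner.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation shrinks shuffle traffic
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

A Combiner is only safe because summing counts is associative and commutative; the same reducer class can therefore be reused for both roles.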
Introduction
• Hadoop's basic principle: moving computation is better than moving data
  - It is better to migrate the computation closer to the data
  - Used when the size of the data set is huge
  - Migrating the computation minimizes network congestion and increases the overall throughput1) of the system
1) Throughput: the amount of data processed within a specified period of time
6 /27
Introduction
• HOD (Hadoop-On-Demand, developed by Yahoo!)
  - A management system for provisioning virtual Hadoop clusters over a large physical cluster
• All physical nodes are shared by more than one Yahoo! engineer
  - Increases the utilization of physical resources
• When the computing resources are shared by multiple users, Hadoop's 'moving computation' policy is not effective
  - Because the resources are shared
  - Resources, e.g., computing, network, and hardware resources
7 /27
Introduction
• To solve this problem, two optimization schemes are proposed
  - Prefetching
    - Intra-block prefetching
    - Inter-block prefetching
  - Pre-shuffling
8 /27
Related work
• J. Dean and S. Ghemawat
  - Traditional prefetching techniques
• V. Padmanabhan and J. Mogul; T. Kroeger and D. Long; P. Cao, E. Felten et al.
  - Prefetching methods to reduce I/O latency
9 /27
Related work
• Zaharia et al.
  - LATE (Longest Approximate Time to End)
    - Performs speculative execution more efficiently in the shared environment
• Dryad (Microsoft)
  - Jobs can be expressed as a directed acyclic graph
• The degree of data locality is highly related to the MapReduce performance
10 /27
Design(Prefetching Scheme)
[Fig. 1. The intra-block prefetching in the Map phase: within the input split assigned to a map task, computation is in progress on one side while prefetching is in progress on the other.]
[Fig. 2. The intra-block prefetching in the Reduce phase: the same bi-directional processing is applied to the data expected for the reduce task.]
• Intra2)-block prefetching
  - Bi-directional processing
  - A simple prefetching technique that prefetches data within a single block while performing a complex computation
2) Intra: within, inside
11 /27
Design(Prefetching Scheme)
• While a complex job is performed on the left side, the to-be-required data are prefetched and assigned in parallel to the corresponding task
• Advantages of intra-block prefetching
  - 1. Uses the concept of a processing bar that monitors the current status of each side and invokes a signal if synchronization is about to be broken (see the sketch below)
  - 2. Tries to find the appropriate prefetching rate at which3) the performance can be maximized while minimizing the prefetching overhead
• Can minimize the network overhead
3) At which: when, where
12 /27
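A minimal sketch of the bi-directional idea above, assuming a hypothetical worker in which one thread computes over the records of a block from the front while a prefetch thread loads records from the back, and a shared "processing bar" check signals when the two sides are about to meet. The class and method names are illustrative only; the paper does not publish this code.

import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: bi-directional processing of one block with a "processing bar".
public class IntraBlockPrefetchSketch {
    private final int totalRecords;
    private final AtomicInteger computeIndex = new AtomicInteger(0);   // front side
    private final AtomicInteger prefetchIndex;                         // back side

    public IntraBlockPrefetchSketch(int totalRecords) {
        this.totalRecords = totalRecords;
        this.prefetchIndex = new AtomicInteger(totalRecords - 1);
    }

    // "Processing bar": signal when the two sides are about to cross.
    private boolean aboutToCollide() {
        return prefetchIndex.get() - computeIndex.get() <= 1;
    }

    public void run() throws InterruptedException {
        Thread prefetcher = new Thread(() -> {
            while (!aboutToCollide()) {
                int i = prefetchIndex.getAndDecrement();   // load record i into a local buffer
                prefetchRecord(i);
            }
        });
        prefetcher.start();

        while (computeIndex.get() < totalRecords) {
            int i = computeIndex.getAndIncrement();        // compute on record i (already local,
            computeRecord(i);                              // or prefetched by the other side)
        }
        prefetcher.join();
    }

    private void prefetchRecord(int i) { /* read record i of the block from HDFS into memory */ }
    private void computeRecord(int i)  { /* run the map function on record i */ }
}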
Design(Prefetching Scheme)
[Figure: Inter-block prefetching example. Candidate replicas of the required blocks reside on nodes n1, n2, and n3 at distances D=1, D=5, and D=8 from the requesting task, and the expected replicas are prefetched into the local rack.]
• Inter-block prefetching
  - Runs at the block level, by prefetching the expected block replica4) to a local rack
  - A2, A3, and A4 are prefetching the required blocks (D = distance)
4) Replica: a copy of a data block
13 /27
Design(Prefetching Scheme)
• Inter-block prefetching processing algorithm (see the sketch below)
  - 1. Assign the map task to the node that is the nearest to the required blocks
  - 2. The predictor generates the list of data blocks, B, to be prefetched for the target task t
15 /27
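A rough sketch of the two steps above, under the assumption that rack distance can be queried for each candidate replica. The Predictor, Replica, and Topology names are hypothetical stand-ins for HPMR's internal modules, which the paper does not expose as an API.

import java.util.Comparator;
import java.util.List;

// Hypothetical types standing in for HPMR internals.
record Replica(String blockId, String hostingNode) {}

interface Predictor {
    // Step 2: the list B of blocks expected to be needed by target task t.
    List<Replica> expectedBlocks(String taskId);
}

interface Topology {
    // Network distance between two nodes (e.g. small = same rack, larger = farther away).
    int distance(String nodeA, String nodeB);
}

public class InterBlockPrefetchSketch {
    private final Predictor predictor;
    private final Topology topology;

    public InterBlockPrefetchSketch(Predictor predictor, Topology topology) {
        this.predictor = predictor;
        this.topology = topology;
    }

    // Step 1: assign the map task to the candidate node nearest to its required blocks.
    public String pickNodeForTask(String taskId, List<String> candidateNodes) {
        List<Replica> needed = predictor.expectedBlocks(taskId);
        return candidateNodes.stream()
                .min(Comparator.comparingInt(node -> needed.stream()
                        .mapToInt(r -> topology.distance(node, r.hostingNode()))
                        .sum()))
                .orElseThrow();
    }

    // Prefetch the expected replicas toward the chosen node's rack, nearest first.
    public void prefetchForTask(String taskId, String chosenNode) {
        predictor.expectedBlocks(taskId).stream()
                .sorted(Comparator.comparingInt(r -> topology.distance(chosenNode, r.hostingNode())))
                .forEach(r -> copyReplicaToLocalRack(r, chosenNode));
    }

    private void copyReplicaToLocalRack(Replica replica, String targetNode) {
        // Would issue a block copy from replica.hostingNode() to a node in targetNode's rack.
    }
}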
Design(Pre-Shuffling Scheme)
• Pre-shuffling processing
  - The pre-shuffling module in the task scheduler looks over the input splits, or candidate data, in the map phase, and predicts which reducer the key-value pairs will be partitioned into (a minimal sketch follows below)
16 /27
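The prediction itself can be as simple as evaluating the job's partitioner on the expected keys before the map output is ever materialized. The sketch below uses Hadoop's default HashPartitioner to guess the target reducer for a key; how HPMR extracts the candidate keys from an input split is not shown, and predictTargetReducer is only an illustrative helper.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class PreShufflePredictionSketch {

    // Predict which reducer a map output key would be shuffled to,
    // using the same partitioner the job itself would use.
    public static int predictTargetReducer(String key, int numReduceTasks) {
        HashPartitioner<Text, IntWritable> partitioner = new HashPartitioner<>();
        return partitioner.getPartition(new Text(key), new IntWritable(1), numReduceTasks);
    }

    public static void main(String[] args) {
        int reducers = 4;
        for (String key : new String[] {"1949", "1950", "1951"}) {
            System.out.printf("key %s -> reducer %d%n", key, predictTargetReducer(key, reducers));
        }
    }
}

With the default hash partitioner this prediction is exact, because the same hash function is applied again at shuffle time.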
Design(Optimization)
• LATE (Longest Approximate Time to End) algorithm
  - Robustly performs speculative execution to maximize performance in a heterogeneous environment
  - Did not consider the data locality that can accelerate the MapReduce computation further
• D-LATE (Data-aware LATE) algorithm
  - Almost the same as LATE, except that a task is assigned as near as possible to the location where the needed data are present (see the sketch below)
17 /27
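A toy sketch of the data-aware twist: estimate each running task's remaining time the way LATE does (progress rate = progress score / elapsed time, remaining time ≈ (1 - progress) / rate), pick the slowest task, then launch its speculative copy on the free node closest to that task's input data. The class names and the locality tie-break are illustrative assumptions, not the paper's exact code.

import java.util.Comparator;
import java.util.List;

// Illustrative task snapshot: progress in [0, 1] and elapsed time in seconds.
record RunningTask(String taskId, double progressScore, double elapsedSeconds, String dataNode) {}

public class DlateSketch {

    // LATE-style estimate of the time left for a task.
    static double estimatedTimeLeft(RunningTask t) {
        double progressRate = t.progressScore() / Math.max(t.elapsedSeconds(), 1e-9);
        return (1.0 - t.progressScore()) / Math.max(progressRate, 1e-9);
    }

    // Pick the straggler (longest approximate time to end) ...
    static RunningTask pickStraggler(List<RunningTask> running) {
        return running.stream()
                .max(Comparator.comparingDouble(DlateSketch::estimatedTimeLeft))
                .orElseThrow();
    }

    // ... and place its speculative copy on the free node nearest to the task's data.
    static String pickSpeculativeNode(RunningTask straggler, List<String> freeNodes,
                                      java.util.function.ToIntBiFunction<String, String> distance) {
        return freeNodes.stream()
                .min(Comparator.comparingInt(node -> distance.applyAsInt(node, straggler.dataNode())))
                .orElseThrow();
    }
}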
Implementation – Optimizer (scheduler)
• Optimized scheduler
  - Predictor module
    - Not only finds stragglers, but also predicts the candidate data blocks and the reducers into which the key-value pairs are partitioned
  - D-LATE
    - Based on these predictions, the optimized scheduler performs the D-LATE algorithm
18 /27
Implementation – Optimizer (scheduler)
• Prefetcher
  - Monitors the status of worker threads and manages the prefetching synchronization with the processing bar
• Load Balancer
  - Checks the logs (including disk usage per node and current network traffic per data block)
  - Is invoked to maintain load balancing based on disk usage and network traffic (a small sketch follows below)
19 /27
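A tiny sketch of the balancing decision described above, assuming per-node statistics for disk usage and network traffic are available from the logs. The NodeStats record and the equal weighting of the two factors are assumptions made only for illustration.

import java.util.Comparator;
import java.util.List;

// Hypothetical per-node statistics gathered from the logs.
record NodeStats(String node, double diskUsageRatio, double networkMbps) {}

public class LoadBalancerSketch {

    // Combine disk usage and network traffic into one load score
    // (equal weighting is an arbitrary choice for this sketch).
    static double loadScore(NodeStats s, double maxNetworkMbps) {
        return 0.5 * s.diskUsageRatio() + 0.5 * (s.networkMbps() / maxNetworkMbps);
    }

    // Choose the least-loaded node as the target for the next prefetched block.
    static String pickPrefetchTarget(List<NodeStats> stats) {
        double maxNet = Math.max(
                stats.stream().mapToDouble(NodeStats::networkMbps).max().orElse(0.0), 1e-9);
        return stats.stream()
                .min(Comparator.comparingDouble(s -> loadScore(s, maxNet)))
                .map(NodeStats::node)
                .orElseThrow();
    }
}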
Evaluation
• Yahoo! Grid, which consists of 1670 nodes
  - Two dual-core 2.0 GHz AMD processors, 4 GB main memory
  - 400 GB ATA hard disk drives
  - Gigabit Ethernet network interface card
  - The entire set of nodes is divided into 40 racks which are connected with L3 routers
• All tests are configured so that HDFS maintains four replicas for each data block, whose size is 128 MB
• Three types of workload: wordcount, search log aggregator, similarity calculator
20 /27
Evaluation
• Fig. 7: we can observe that HPMR shows significantly better performance than the native Hadoop for all of the test sets
• Fig. 8: set #1 has the smallest ratio of the number of nodes to the number of map tasks
  - Set #5: due to a significant reduction in shuffling overhead
21 /27
Evaluation
• The prefetching latency is affected by disk overhead or network congestion
• Therefore, a long prefetching latency indicates that the corresponding node is heavily loaded
• The prefetching rate increases beyond 60%
22 /27
Evaluation
• This means that HPMR assures consistent performance even in a shared environment such as Yahoo! Grid, where the available bandwidth fluctuates severely
  - 4 Kbps ~ 128 Kbps
23 /27
Conclusion
• Two innovative schemes
  - The prefetching scheme
    - Exploits data locality
  - The pre-shuffling scheme
    - Reduces the network overhead required to shuffle key-value pairs
• HPMR is implemented as a plug-in type component for Hadoop
• HPMR improves the overall performance by up to 73% compared to the native Hadoop
• As a next step, we plan to evaluate more complicated workloads such as HAMA (an open-source Apache incubator project)
24 /27
Appendix : MapReduce Example
• MapReduce example: analysis of a weather data set
  - Each record is stored as one line, in ASCII form
  - Within a file, each field is stored at a fixed length, without delimiters
  - Example record: 0057332130999991950010103004+51317+028783FM12+017199999V0203201N00721004501CN0100001N9-01281-01391102681
• Query
  - From the NCDC data files written between 1901 and 2001, find the highest temperature (F) for each year
• Processing pipeline
  - Input: data files, in 64 MB chunk units
  - 1st Map: extract <offset, record> from the file
  - 2nd Map: extract <year, temperature> from each record
  - Shuffle: organize into per-year data groups
  - Reduce: merge and return the final result
25 /27
Appendix : MapReduce Example
• 1st Map: extract <Offset, Record> from the file
  - <Key_1, Value> = <offset, record>
  - <0,   0067011990999991950051507004...9999999N9+00001+99999999999...>
  - <106, 0043011990999991950051512004...9999999N9+00221+99999999999...>
  - <212, 0043011990999991950051518004...9999999N9-00111+99999999999...>
  - <318, 0043012650999991949032412004...0500001N9+01111+99999999999...>
  - <424, 0043012650999991949032418004...0500001N9+00781+99999999999...>
  - ...
  - (the year and the temperature are embedded at fixed positions in each record)
• 2nd Map: extract <Year, Temp> from each record (see the mapper sketch below)
  - <Key_2, Value> = <year, temp>
  - <1950, 0>
  - <1950, 22>
  - <1950, −11>
  - <1949, 111>
  - <1949, 78>
  - …
26 /27
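A sketch of the 2nd-Map step above in Hadoop's Java API, following the well-known MaxTemperature example from Hadoop: The Definitive Guide; the fixed character offsets (year at 15-19, signed temperature around 87-92, quality flag at 92) are taken from that example's NCDC record layout and should be treated as assumptions here.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Extracts <year, temperature> from each fixed-width NCDC record line.
public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int MISSING = 9999;

    @Override
    protected void map(LongWritable offset, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(15, 19);                 // year field
        int airTemperature;
        if (line.charAt(87) == '+') {                         // parseInt rejects a leading '+'
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);              // quality code of the reading
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}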
Appendix : MapReduce Example
• Shuffle
  - Because the 2nd Map produces too many results, they are re-organized into per-year data groups
  - This reduces the processing cost when merging in the Reduce step
  - 2nd Map output: <1950, 0>, <1950, 22>, <1950, −11>, <1949, 111>, <1949, 78>
  - After shuffle: <1949, [111, 78]>, <1950, [0, 22, −11]>
• Reduce: merge the candidate sets from all Maps and return the final result (see the reducer sketch below)
  - Mapper_1: (1950, [0, 22, −11]), (1949, [111, 78])
  - Mapper_2: (1950, [25, 15]), (1949, [30, 45])
  - Reducer: (1950, [0, 22, −11, 25, 15]) -> (1950, 25)
             (1949, [111, 78, 30, 45]) -> (1949, 111)
27 /27
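The matching reducer for the step above, again following the standard MaxTemperature example: for each year it scans the grouped temperature list and emits the maximum.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// For each year, emit the maximum temperature seen across all map outputs.
public class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text year, Iterable<IntWritable> temperatures, Context context)
            throws IOException, InterruptedException {
        int max = Integer.MIN_VALUE;
        for (IntWritable temp : temperatures) {
            max = Math.max(max, temp.get());          // e.g. (1950, [0, 22, −11, 25, 15]) -> 25
        }
        context.write(year, new IntWritable(max));
    }
}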
Appendix : Hadoop: The Definitive Guide, pp. 19-20
[Figures 1-4: code listings reproduced from the book]
28 /27