The IEEE International Conference on Cluster Computing 2010
CDRM: A Cost-effective Dynamic
Replication Management Scheme
for Cloud Storage Cluster
Qingsong Wei
Data Storage Institute, A-STAR, Singapore
Bharadwaj Veeravalli, Bozhao Gong
National University of Singapore
Lingfang Zeng, Dan Feng
Huazhong University of Science & Technology, China
Outline
1. Introduction
2. Problem Statement
3. Cost-effective Dynamic Replication
Management (CDRM)
4. Evaluation
5. Conclusion
1. Introduction

HDFS Architecture

[Figure: HDFS architecture. Clients exchange metadata with a single Name Node over the control path, while data blocks flow over the network directly between clients and the Data Nodes, each of which stores blocks on its local disks.]
1. Introduction

• In HDFS, files are striped into data blocks across multiple data nodes to enable parallel access.

[Figure: data striping. A file is divided into blocks B1, B2, ..., Bm, which are distributed across Node1 through Noden.]

• However, a block may become inaccessible when its data node is unavailable, and if any one block is unavailable, so is the whole file.
• Failure is the norm rather than the exception in a large-scale cloud storage system, so fault tolerance is required.
1. Introduction

[Figure: block replication. Blocks 1-5 are replicated across the data nodes so that each block has copies on more than one node.]

• Replication is used in HDFS.
• When one data node fails, the data is still accessible from its replicas, and the storage service need not be interrupted.
• Besides fault tolerance, replicas among data nodes can also be used to balance the workload.
2. Problem Statement

• Current replication management schemes:
  - Treat all data the same: the same replica number for all data
  - Treat all storage nodes the same
  - Fixed and static

[Figure: static replication. Blocks 1-5 are replicated at a fixed, uniform replica number across the data nodes.]

• The result: high cost and poor load balance.
2. Problem Statement

• Replica number is critical to management cost: more replicas, higher cost.

[Figure: consistency maintenance. When block 5 is modified, every other replica of block 5 in the cluster must be updated to maintain consistency.]

• Because a large number of blocks are stored in the system, even a small increase in replica number can result in a significant increase in overall management cost.
• So, what is the minimum number of replicas that should be kept in the system to satisfy the availability requirement?
2. Problem Statement

• Replica placement influences intra-request parallelism.

[Figure: intra-request parallelism. A client requests file (B1, B2, B3). Data Node1 (Sessionmax=3, Sessionfree=1) serves B3; Data Node2 (Sessionmax=3, Sessionfree=2) serves B1 and B2; Data Node3 (Sessionmax=2, Sessionfree=0) also holds B1, but requests to it are blocked because it has no free sessions.]
2. Problem Statement

• Replica placement also influences inter-request parallelism.

[Figure: inter-request parallelism. Client1 requests B3, B2, and B1 while Client2 requests B1. Data Node1 (Sessionmax=3, Sessionfree=0) holds B1; Data Node2 (Sessionmax=3, Sessionfree=1) holds B1 and B2; Data Node3 (Sessionmax=2, Sessionfree=0) holds B3.]

• How can these replicas be placed among the data nodes in a balanced way to improve access parallelism?
3. Cost-effective Dynamic Replication Management

• System Model

[Figure: system model. Requests arrive at total rate λ for blocks B1, ..., Bj, ..., BM, which are placed across data nodes Node1, ..., Nodei, ..., NodeN.]

• Data blocks have different attributes (pj, sj, rj, tj):
  - pj : popularity
  - sj : size
  - rj : replica number
  - tj : access latency requirement

• Data nodes are different, each characterized by (λi, τi, fi, ci):
  - λi : request arrival rate
  - τi : average service time
  - fi : failure rate
  - ci : maximum sessions
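To make the model concrete, here is a minimal sketch of the per-block and per-node attributes defined above; the Python classes and field names are our own illustration, not code from the paper.

```python
from dataclasses import dataclass

@dataclass
class Block:
    popularity: float    # p_j: how frequently this block is accessed
    size: int            # s_j: block size in bytes
    replicas: int        # r_j: current replica number
    latency_req: float   # t_j: access latency requirement

@dataclass
class DataNode:
    arrival_rate: float  # lambda_i: request arrival rate at this node
    service_time: float  # tau_i: average service time per request
    failure_rate: float  # f_i: probability that this node is unavailable
    max_sessions: int    # c_i: maximum concurrent service sessions
```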
3. Cost-effective Dynamic Replication Management

• Availability
  - Suppose file F is striped into m blocks {b1, b2, ..., bm}. To retrieve the whole file F, we must get all m blocks.
  - Availability is modeled as a function of the replica number:

    P(F_A) = 1 - \sum_{j=1}^{m} (-1)^{j-1} C_m^j \left( \prod_{i=1}^{r_j} f_i \right)^j

  - Suppose the expected availability for file F is A_expect, which is defined by users. To satisfy the availability requirement for a given file, we need

    1 - \sum_{j=1}^{m} (-1)^{j-1} C_m^j \left( \prod_{i=1}^{r_j} f_i \right)^j \geq A_{expect}

The minimum number of replicas can be calculated from the above inequality for a given expected availability.
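Under the simplifying assumption of a uniform node failure rate f and a uniform replica number r per block, the inclusion-exclusion sum above collapses to (1 - f^r)^m, and the minimum replica number can be found by a simple search. The sketch below is our own illustration; the function names and the m = 10 example are assumptions, not values from the paper.

```python
import math

def file_availability(m: int, r: int, f: float) -> float:
    # Inclusion-exclusion sum from the slide with q = f**r (all r replicas down);
    # it equals (1 - q)**m when failure rates and replica numbers are uniform.
    q = f ** r
    return 1.0 - sum((-1) ** (j - 1) * math.comb(m, j) * q ** j
                     for j in range(1, m + 1))

def min_replicas(m: int, f: float, a_expect: float, r_max: int = 20) -> int:
    # Smallest replica number r whose availability meets A_expect.
    for r in range(1, r_max + 1):
        if file_availability(m, r, f) >= a_expect:
            return r
    raise ValueError("target availability unreachable within r_max replicas")

# Example with the evaluation's settings (A_expect = 0.8, f = 0.1 and 0.2):
print(min_replicas(m=10, f=0.1, a_expect=0.8))  # -> 2
print(min_replicas(m=10, f=0.2, a_expect=0.8))  # -> 3
```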
3. Cost-effective Dynamic Replication Management

• Blocking Probability
  - Blocking probability is used as the criterion for placing replicas among data nodes to improve load balance.
  - A data node Si is modeled as an M/G/ci system with arrival rate λi and average service time τi; accordingly, the blocking probability of data node Si is given by

    B_i = \frac{(\lambda_i \tau_i)^{c_i} / c_i!}{\sum_{k=0}^{c_i} (\lambda_i \tau_i)^k / k!}

Replica placement policy: a replica is placed on the data node with the lowest blocking probability, so as to dynamically maintain overall load balance.
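A small sketch of evaluating this formula and applying the stated placement policy; the code and the sample node parameters are our own illustration, not the paper's implementation.

```python
import math

def blocking_probability(arrival_rate: float, service_time: float, c: int) -> float:
    # Erlang-B formula from the slide, with offered load a = lambda_i * tau_i:
    # B_i = (a**c / c!) / sum_{k=0}^{c} a**k / k!
    a = arrival_rate * service_time
    numerator = a ** c / math.factorial(c)
    denominator = sum(a ** k / math.factorial(k) for k in range(c + 1))
    return numerator / denominator

def pick_node(nodes):
    # Placement policy: choose the data node with the lowest blocking probability.
    # `nodes` is a list of (lambda_i, tau_i, c_i) tuples.
    return min(range(len(nodes)),
               key=lambda i: blocking_probability(*nodes[i]))

# Hypothetical three-node cluster: the lightly loaded node wins the next replica.
nodes = [(0.6, 2.0, 3), (0.2, 2.0, 3), (0.6, 2.0, 2)]
print(pick_node(nodes))  # -> 1
```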
3. Cost-effective Dynamic Replication Management

Framework of cost-effective dynamic replication management in HDFS:
1. The client sends a request to the Name Node to create a file <Availability, Block Number> consisting of blocks B1, B2, ..., Bm.
2. The Name Node calculates the replication factor and searches the Datanode B+Tree to obtain a Datanode list.
3. The Name Node returns the replication policy <Bi, Replication factor, DataNode list> to the client.
4. The client flushes and replicates the blocks to the selected Datanodes via replication pipelining.

A sketch of the Name Node's decision at step 2 follows below.
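Reusing min_replicas and blocking_probability from the sketches above, the decision at step 2 might look like this; a hedged sketch under our earlier uniform-failure-rate assumption, not the paper's code.

```python
def replication_policy(m, f, a_expect, nodes):
    # Step 2 sketch: derive the replication factor from the availability model,
    # then rank the data nodes by blocking probability for placement.
    r = min_replicas(m, f, a_expect)
    ranked = sorted(range(len(nodes)),
                    key=lambda i: blocking_probability(*nodes[i]))
    return r, ranked[:r]  # replication factor and target Datanode list
```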
4. Evaluation

• Setup
  - Our test platform is built on a cluster with one name node and twenty data nodes of commodity computers.
  - The operating system is Red Hat AS4.4 with kernel 2.6.20.
  - The Hadoop version is 0.16.1 and the Java version is 1.6.0.
  - The AUSPEX file system trace is used.
  - A synthesizer was developed to create workloads with different characteristics, such as data sets of different sizes, varying data rates, and different popularities. These characteristics reflect the differences among various workloads to the cloud storage cluster.
4. Evaluation

• Cost-effective Availability
  - Initially, there is one replica per object.
  - CDRM maintains only the minimal number of replicas needed to satisfy the availability requirement.
  - The higher the failure rate, the more replicas are required.

[Figure: replica number over time (min). Dynamic replication with data node failure rates of 0.1 and 0.2, Aexpect = 0.8.]
4. Evaluation

• Performance
  - CDRM vs. the HDFS default Replication Management (HDRM) under different popularity and workload intensity.
  - The performance of CDRM is much better than that of HDRM when popularity is small.
  - CDRM outperforms HDRM under heavy workload.

[Figure: average latency (ms) vs. popularity (%) for HDRM and CDRM at λ = 0.2 and λ = 0.6. Effect of popularity and access arrival rate, 20 data nodes.]
4. Evaluation

• Load Balance
  - The figure shows the difference between each data node's system utilization and the average system utilization of the cluster.
  - CDRM can dynamically distribute the workload across the whole cluster.

[Figure: difference of system utilization per data node (1-20) for CDRM and HDRM. System utilization among data nodes, popularity = 10%, λ = 0.6.]
5. Conclusion

Current replication management policies | CDRM
----------------------------------------+---------------------------------------------
Treats all data as the same             | Treats data as different
Treats all storage nodes as the same    | Treats storage nodes as different
Same replica number for all data        | Different replica number for different data
Static placement                        | Dynamic placement
High cost                               | Cost effective
Poor load balance                       | Good load balance
Low performance                         | High performance
For more questions, please contact
Dr. Qingsong Wei by email:
WEI_Qingsong@dsi.a-star.edu.sg