Mobile Computing - University of Virginia

advertisement

GCC 2003 Presentation

Dynamic Data Grid Replication

Strategy based on Internet Hierarchy

Sang Min Park

, Jai-Hoon Kim, and Young-Bae Ko

Ajou University

South Korea

Ajou University, South Korea

GCC 2003 Presentation

Contents

Introduction to Data Grid

Optimizations in Data Grid

Novel Replication Strategy based on Internet Hierarchy

Simulation

Simulation Results

Conclusions

2 Ajou University, South Korea

Introduction to Data Grid

3

Data Grid Motivations

Petabyte scale data production

Distributed data storage to store parts of data

Distributed computing resources which process the data

Two Most Important Approaches for Data Grid

Secure, reliable, and efficient data transport protocol

(ex. GridFTP)

Replication (ex. Replica catalog)

Replication

Large size files are partially replicated among sites

Reduce data access time

Application Scheduling, Dynamic replication issues are emerging

Ajou University, South Korea

GCC 2003 Presentation

Introduction to Data Grid

User

Typical Job Execution Scenario

Job Broker

Internet and

Data Grid

Grid site

CE SE

Job Distribution

Grid site

CE SE

CE

Grid site

SE

CE

Grid site

SE

Data Fetch and

Replication

Grid site

CE

Grid site

SE

Grid site

CE SE

Initial replication

Data from

Experiments (HEP,

Bioinformatics, etc)

Master Site

Large-size

Storage

Ajou University, South Korea 4

GCC 2003 Presentation

Optimizations in Data Grid

GCC 2003 Presentation

Reducing the Overall Job Execution Time

Scheduling Optimization

Deciding where to allocate the job

Considering location of replicas and computational capabilities of sites

Short-term Optimization

Deciding from where to fetch replicas

Considering available network bandwidth between sites

Long-term Optimization (Dynamic Replication Strategy)

Shortage of storage in a site

Deciding which file should be remaining as a replica

Better to replicate popular files because of its future usage

5 Ajou University, South Korea

GCC 2003 Presentation

Existing Dynamic Replication Strategies

Replica Optimization based on Site-level Locality

Replicate the file that is predicted to be used in future from the perspective of a site

Try to reduce the number of fetch

Delete Oldest, Delete LRU Method

Economic Strategy from European Data Grid

Developing OptorSim –Data Grid Optimization Simulator

Using Auction Protocol to trigger Long-term Optimization

Site-level Locality based on File access patterns

6 Ajou University, South Korea

GCC 2003 Presentation

Existing Dynamic Replication Strategies

The Limitations of the site-level optimization

A Site certainly have limitations of their storage size , which means that the rate of data request locality is also limited

There should be predictable file access patterns , but we do not know if there will be.

7 Ajou University, South Korea

GCC 2003 Presentation

Replication Strategy based on

Bandwidth Hierarchy (BHR)

Network-level Locality

A site is not the only possible source of locality

Another source of locality : Network-level locality

8

Slow Replica

Transmission

Fast Replica

Transmission

Network Region

(e.g., a country)

If the replica is located in a close site, not long delay would be taken to fetch this replica

Ajou University, South Korea

GCC 2003 Presentation

Replication Strategy based on

Bandwidth Hierarchy (BHR)

Bandwidth Hierarchy

Network Region Network Region

Network

Region

Narrow bandwidth

Contending network traffic

Grid Site

Router

Data moving path between sites within region

Data moving path between sites across regions

Ajou University, South Korea 9

GCC 2003 Presentation

Replication Strategy based on

Bandwidth Hierarchy (BHR)

Maximizing Network-level locality

1. Avoiding Replica Duplication in a region

No space here!

We should remove some file

A

Receiving New

Replica

X a Site

Delete this one!

Replica X is duplicated here!

X a Site

A Region

2. Considering popularity of file request at the region-level

10 Ajou University, South Korea

Simulation

OptorSim

Data Grid Dynamic Replication Simulation tool

Developed as part of European Data Grid Project

Implemented in Java

GCC 2003 Presentation

Implemented Our own Region-based Optimizer in OptorSim

11 Ajou University, South Korea

Simulation

Simulation Environment

Region A Region B

GCC 2003 Presentation

Region C Region D

Grid Site

Router

Master Site

Ajou University, South Korea

Intra-region link

Inter-region link

Link to master

12

Simulations

General configuration of parameters

Parameters

Number of jobs

Number of job types

Number of file accessed per job

Size of single file

Total size of files

Values

1000

50

15

1 GB

750 GB

Bandwidth and Storage Size

Parameters

Intra-region bandwidth

Inter-region bandwidth

Master-router bandwidth

Storage space at site

Values

1000 Mbps

1000 Mbps

2000 Mbps

50 GB

Ajou University, South Korea 13

GCC 2003 Presentation

Simulation Results

14

60000

50000

40000

30000

20000

10000

0

33174.9

47962.2

50070.9

BHR Delete LRU

Dynamic Replication Strategy

Delete Oldest

Total Job times of three strategies

Ajou University, South Korea

GCC 2003 Presentation

GCC 2003 Presentation

Simulation Results

60000

50000

40000

30000

20000

10000

0

BHR

Delete LRU

Delete Oldest

600 800 1000 1200 1400 1600 1800 2000

Bandwidth of Inter-region link

70000

60000

50000

40000

30000

20000

10000

0

30

BHR

Delete LRU

Delete Oldest

50 70 90

Size of SE

110 130

Total job time with varying bandwidth and storage size

15 Ajou University, South Korea

GCC 2003 Presentation

Conclusions

The existing dynamic replication strategies are based only on site-level locality of file request

BHR strategy is based on the network-locality

BHR shows quite good performance when hierarchy of bandwidth clearly appears, and size of storage at a site is small

We extend current site-level replica optimization study to more scalable way

16 Ajou University, South Korea

GCC 2003 Presentation

References

William H. Bell, David G. Cameron, Luigi Capozza, A. Paul Millar, Kurt Stockinger, and

Floriano Zini.: Simulation of Dynamic Grid Replication Strategies in OptorSim. In Proc. of the 3rd Int'l. IEEE Workshop on Grid Computing (Grid'2002), Baltimore, USA, November

2002. Springer Verlag, Lecture Notes in Computer Science.

William H. Bell, David G. Cameron, Ruben Carvajal-Schiaffino, A. Paul Millar, Kurt

Stockinger, and Floriano Zini.: Evaluation of an Economy-Based File Replication Strategy for a Data Grid. In International Workshop on Agent based Cluster and Grid Computing at

CCGrid 2003, Tokyo, Japan, May 2003. IEEE Computer Society Press.

Mark Carman, Floriano Zini, Luciano Serafini, and Kurt Stockinger.: Towards an Economy-

Based Optimisation of File Access and Replication on a Data Grid. In International

Workshop on Agent based Cluster and Grid Computing at International Symposium on

Cluster Computing and the Grid (CCGrid'2002), Berlin, Germany, May 2002. IEEE

Computer Society Press.

Ann Chervenak, Ian Foster, Carl Kesselman, Charles Salisbury and Steven Tuecke.: The

Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large

Scientific Datasets. Journal of Network and Computer Applications , 23:187-200, 2001.

EU Data Grid Project: http://www.eu-datagrid.org

17 Ajou University, South Korea

GCC 2003 Presentation

References

I. Foster, C. Kesselman and S. Tuecke.: The Anatomy of the Grid: Enabling Scalable Virtual

Organizations. International J. Supercomputer Applications, 15(3), 2001.

Wolfgang Hoschek, Javier Jaen-Martinez, Asad Samar, Heinz Stockinger and Kurt

Stockinger.: Data Management in an International Data Grid Project. 1st IEEE/ACM

International Workshop on Grid Computing (Grid'2000), Bangalore, India, Dec 2000.

OptorSim – A Replica Optimizer Simulation: http://edg-wp2.web.cern.ch/edgwp2/optimization/optorsim.html

Sang-Min Park and Jai-Hoon Kim.: Chameleon: A Resource Scheduler in a Data Grid

Environment. 2003 IEEE/ACM International Symposium on Cluster Computing and the Grid

(CCGRID'2003), Tokyo, Japan, May 2003. IEEE Computer Society Press.

Kavitha Ranganathan and Ian Foster.: Design and Evaluation of Dynamic Replication

Strategies for a High Performance Data Grid. International Conference on Computing in

High Energy and Nuclear Physics, Beijing, September 2001.

Kavitha Ranganathan and Ian Foster.: Identifying Dynamic Replication Strategies for a High

Performance Data Grid. International Workshop on Grid Computing, Denver, November

2001.

18 Ajou University, South Korea

Download