Chameleon: A Resource Scheduler in a Data Grid Environment

Sang-Min Park and Jai-Hoon Kim
Ajou University, South Korea

Contents
- Introduction to Data Grid
- Related Works
- Scheduling Model
- Scheduler Implementation
- Testbed and Application
- Results
- Conclusions

Introduction to Data Grid
- Data Grid motivations
  - Petabyte-scale data production
  - Distributed data storage to hold parts of the data
  - Distributed computing resources to process the data
- Two most important approaches for the Data Grid
  - A secure, reliable, and efficient data transport protocol (e.g., GridFTP)
  - Replication (e.g., the replica catalog)
- Replication
  - Large files are partially replicated among sites, reducing data access time
  - Application scheduling and dynamic replication issues are emerging

Related Works
- Data Grid
  - Replica catalog: maps a logical file name to its physical instances
  - GridFTP: a secure, reliable, and efficient file transfer protocol
- Job scheduling
  - Various scheduling algorithms exist for the computational Grid, e.g., Application Level Scheduling (AppLeS)
  - Large data collections have not been considered in these works
- Job scheduling in the Data Grid
  - Mostly analytical and simulation studies have been presented so far
  - Our work defines a more in-depth scheduling model

Scheduling Model - Assumptions
- Each site has both data storage and computing facilities
- Files are replicated at a subset of the Grid sites
- Each site has a different amount of computational capability
- Grid users request job execution through job schedulers
(Figure: sites A through D, each with a data store and computing facilities, connected over the Internet; job (data processing) requests arrive at the scheduler.)

Scheduling Model - System Factors
- Dynamic system factors: factors that change over time
  - Network bandwidth
    - Data transfer time is inversely proportional to network bandwidth
    - NWS is used as the tool for measuring and forecasting network bandwidth
  - Available computing nodes
    - Determine the execution time of jobs
    - Decided according to the job load on a site
- System attributes
  - Machine architecture (clusters, MPPs, etc.)
  - Processor speed, available memory, I/O performance, etc.

Scheduling Model - Application-Specific Factors
- Application-specific factors: factors unique to Data Grid applications
  - Size of the input data (replica)
    - If the data is not at the computing site, a data fetch is needed
    - Transferring large data consumes much time
  - Size of the application code
    - The application code must be migrated to the sites that perform the computation
    - Not critical to overall performance, since the code is small
  - Size of the produced output data
    - When the job runs at a remote site, the result data must be returned to the local site
    - Strongly related to the size of the input data

Scheduling Model - Application Scenarios
The model consists of 5 distinct application scenarios:
1. Local Data and Local Execution
2. Local Data and Remote Execution
3. Remote Data and Local Execution
4. Remote Data and Same Remote Execution
5. Remote Data and Different Remote Execution

Terms used in the scenarios:

  Parameter      Meaning
  N_i            Number of available computing nodes at site i
  D_input        Size of the input data (replica)
  D_app          Size of the application codes
  D_output       Size of the produced output data
  BW_WAN(i, j)   Bandwidth of the WAN connection between sites i and j
  BW_LAN(i)      Bandwidth of the LAN connection between nodes at site i
  Exec_i         Expected execution time of jobs at site i
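Collected together, these terms are enough to drive a cost estimate. The sketch below is a minimal Python illustration (names are invented for this example, not taken from the Chameleon source) of a site, a job, and the intra-site LAN term that recurs in every scenario formula:

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    nodes: int        # N_i: available computing nodes
    bw_lan: float     # BW_LAN(i): LAN bandwidth between master and nodes, MB/s
    exec_time: float  # Exec_i: expected job execution time at the site, s

@dataclass
class Job:
    d_input: float    # D_input: size of the input replica, MB
    d_app: float      # D_app: size of the application codes, MB
    d_output: float   # D_output: size of the produced output data, MB

def lan_time(site: Site, job: Job) -> float:
    """Intra-site term shared by all scenarios: the master sends the input
    and the code to each of the N_i nodes, then gathers the output."""
    moved = site.nodes * (job.d_input + job.d_app) + job.d_output
    return moved / site.bw_lan

# Example: 4 nodes, 100 MB/s LAN, a 10 MB replica, 1 MB code, 5 MB output.
print(lan_time(Site("local", 4, 100.0, 500.0), Job(10.0, 1.0, 5.0)))  # 0.49
```

The WAN terms of the individual scenarios can then be layered on top of this shared LAN term.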
Scenario 1: Local Data and Local Execution

  Time_1 = [N_local * (D_input + D_app) + D_output] / BW_LAN(local) + Exec_local

- The input data (replica) is located at the local site, and processing is performed on the locally available processors
- Data in motion: the input data (replica), the application code, and the output data
- The cost consists of:
  1. Data transfer time between the master and the computing nodes via the LAN
  2. Job execution time on the local processors

Scenario 2: Local Data and Remote Execution

  Time_2 = (D_input + D_app + D_output) / BW_WAN(local, remote_i)
         + [N_remote_i * (D_input + D_app) + D_output] / BW_LAN(remote_i)
         + Exec_remote_i

- The locally stored replica is transferred to the remote computation site
- The cost consists of:
  1. Data (input + codes + output) movement time via the WAN between the local and the remote site
  2. Data movement time via the LAN within the remote site
  3. Job execution time at the remote site

Scenario 3: Remote Data and Local Execution

  Time_3 = D_input / BW_WAN(local, remote_i)
         + [N_local * (D_input + D_app) + D_output] / BW_LAN(local)
         + Exec_local

- The remote replica is copied to the local site, and processing is performed locally
- The cost consists of:
  1. Input data movement time via the WAN between the local and the remote site
  2. Data movement time via the LAN within the local site
  3. Job execution time on the local processors
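As a numeric sanity check of the first three formulas, here is a hedged Python sketch; every number is invented for illustration (sizes roughly PSI-BLAST-like), not a measurement from the testbed:

```python
# Invented inputs: sizes in MB, bandwidths in MB/s, times in seconds.
D_INPUT, D_APP, D_OUTPUT = 502.0, 7.0, 10.0  # replica, code, output sizes
N_LOCAL, N_REMOTE = 8, 36                    # available nodes at each site
BW_LAN = 100.0                               # LAN bandwidth at both sites
BW_WAN = 2.0                                 # WAN bandwidth, local <-> remote_i
EXEC_LOCAL, EXEC_REMOTE = 2277.0, 977.0      # assumed execution times

def lan(n_nodes, bw):
    # Shared LAN term: distribute input+code to every node, gather output.
    return (n_nodes * (D_INPUT + D_APP) + D_OUTPUT) / bw

time1 = lan(N_LOCAL, BW_LAN) + EXEC_LOCAL                   # scenario 1
time2 = ((D_INPUT + D_APP + D_OUTPUT) / BW_WAN
         + lan(N_REMOTE, BW_LAN) + EXEC_REMOTE)             # scenario 2
time3 = D_INPUT / BW_WAN + lan(N_LOCAL, BW_LAN) + EXEC_LOCAL  # scenario 3

print(round(time1), round(time2), round(time3))  # 2318 1420 2569
```

With these particular numbers the remote site wins despite the 502 MB WAN transfer, because its larger node pool shrinks the execution term; the trade-off flips as BW_WAN decreases.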
Scenario 4: Remote Data and Same Remote Execution

  Time_4 = (D_app + D_output) / BW_WAN(local, remote_i)
         + [N_remote_i * (D_input + D_app) + D_output] / BW_LAN(remote_i)
         + Exec_remote_i

- The remote site holding the replica performs the computation
- The cost consists of:
  1. Data (codes + output) movement time via the WAN between the local and the remote site
  2. Data movement time via the LAN within the remote site
  3. Job execution time at the remote site

Scenario 5: Remote Data and Different Remote Execution

  Time_5 = D_input / BW_WAN(remote_i, remote_j)
         + (D_app + D_output) / BW_WAN(local, remote_j)
         + [N_remote_j * (D_input + D_app) + D_output] / BW_LAN(remote_j)
         + Exec_remote_j

- Remote site j performs the computation with the replica copied from remote site i
- The cost consists of:
  1. Input replica movement time via the WAN between remote sites i and j
  2. Data (codes + output) movement time via the WAN between the local site and remote site j
  3. Data movement time via the LAN within remote site j
  4. Job execution time at remote site j

Scheduling Model - Scheduler
Operations of the scheduler:
1. Predict the response time of each scenario
2. Compare the response times of the scenarios
3. Choose the best scenario, together with the site holding the data and the site to perform the job execution
4. Request data movement and job execution
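These scheduler steps reduce to an argmin over the per-scenario predictions. A minimal Python sketch, assuming the Time_1..Time_5 formulas have already produced the estimates (all numbers here are hypothetical):

```python
def choose_plan(estimates):
    """estimates maps (scenario, data_site, exec_site) -> predicted response
    time in seconds; the scheduler picks the cheapest plan."""
    best = min(estimates, key=estimates.get)
    return best, estimates[best]

# Hypothetical predictions for one job over the five scenarios.
estimates = {
    ("local data, local execution",  "local",    "local"):    2318.0,
    ("local data, remote execution", "local",    "remote_1"): 1420.0,
    ("remote data, local execution", "remote_1", "local"):    2569.0,
    ("remote data, same execution",  "remote_1", "remote_1"): 1169.0,
    ("remote data, diff execution",  "remote_1", "remote_2"): 1833.0,
}
plan, t = choose_plan(estimates)
print(plan, t)  # here, the site already holding the replica is cheapest
```

The chosen plan then tells the scheduler which data movement and which job submission to request.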
Scheduler Implementation
- A scheduler prototype, called Chameleon, was developed to evaluate the scheduling model
- Built on top of the services provided by Globus: GRAM, MDS, GridFTP, and the Replica Catalog
- Target application areas: HEP, Earth observation, and biology
- NWS is used for measuring and forecasting network bandwidth
- The scheduling algorithms are based on the scheduling models presented above
(Figure: Chameleon architecture. Jobs are submitted to the Scheduler, which gathers information through an Information Monitor (MDS, plus NWS network monitoring), takes resource locations from a Location Finder (Replica Catalog), copies data with a Data Mover (GridFTP), and runs jobs through a Runner (GRAM) on the Grid fabric: computational resources, storage, networks, and local schedulers.)

Testbed for Experiments

  Site             Location   Number of proc.   Local scheduler
  Ajou University  S. Korea    8                PBS
  Yonsei Univ. 1   S. Korea   12                PBS
  Yonsei Univ. 2   S. Korea   12                PBS
  KISTI            S. Korea   36                LSF
  KUT              S. Korea    6                PBS
  Chonbuk Univ.    S. Korea    1                Fork
  Pusan Univ.      S. Korea   24                PBS
  POSTECH          S. Korea    8                PBS
  AIST             Japan      10                SGE

Applications
- Gene sequence comparison applications (bioinformatics)
- Computationally intensive analysis over a large protein database
- Bio-scientists predict the structure and functions of a newly found protein by comparing it against a well-known protein database
- The database size exceeds 500 MB, and various versions of the protein database exist
- The large databases are replicated in the Data Grid
- Two well-known applications, BLAST and FASTA, are executed

Applications - Parameters

  Parameter                                  PSI-BLAST   FASTA
  Size of the input replica (protein DB)     502 MB      502 MB
  Size of the output data                    10 MB       200 MB
  Size of the application codes              7 MB        1 MB

Experimental Results (1)
Results when executing PSI-BLAST under a replication scenario in which only some of the testbed sites (A through H) hold the replicated database.
(Figure: testbed map marking sites with and without the replicated database, and a bar chart of response times for local execution, sites A, B, and C, the prediction for site A, and Chameleon. Each bar is broken into X = execution time, Y = replica fetch time, and Z = code + result movement time; purely local execution takes 2277 s of computation with no data movement.)

Experimental Results (2)
Results when executing FASTA in the same replication scenario, shown next to the PSI-BLAST results from the previous slide.
(Figure: bar charts of response times, with the same X/Y/Z breakdown, for local execution, sites A, B, and C, the prediction, and Chameleon; purely local FASTA execution takes 3140 s of computation with no data movement.)

Experimental Results (3)
Results when executing PSI-BLAST when no replication takes place.
(Figure: testbed map and a bar chart of response times, with the same X/Y/Z breakdown, for local execution, sites A, G, and C, the prediction for site C, and Chameleon.)

Experimental Results (4)
Increasing the number of replicas decreases the response time.

  Number of replicas   Sites with a replica
  1                    Local
  2                    Local, E
  3                    Local, E, D
  4                    Local, E, D, F
  5                    Local, E, D, F, G
  6                    Local, E, D, F, G, H
  7                    Local, E, D, F, G, H, B
  8                    Local, E, D, F, G, H, B, A
  9                    Local, E, D, F, G, H, B, A, C

(Figure: predicted and actual response times (sec.) plotted against the number of replicas; the response time falls as replicas are added.)
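The falling trend in this experiment follows directly from the model: the scheduler minimizes over the sites that hold a replica, so each added replica can only lower, or leave unchanged, the best predicted time. A toy Python sketch with invented per-site times:

```python
# Invented predicted response times (s) when each site's replica is used.
site_time = {"Local": 2300, "E": 1900, "D": 1700, "F": 1500, "G": 1400,
             "H": 1300, "B": 1250, "A": 1150, "C": 1100}

rollout = ["Local", "E", "D", "F", "G", "H", "B", "A", "C"]  # replication order
best = [min(site_time[s] for s in rollout[:k])
        for k in range(1, len(rollout) + 1)]

# Each prefix minimum is <= the previous one, mirroring the falling curve.
print(best)
```

The prefix-minimum structure is why the measured curve is monotone rather than merely trending downward.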
Conclusions
- Job scheduling models for the Data Grid: the models consist of 5 distinct scenarios
- A scheduler prototype, called Chameleon, was developed based on the presented scheduling models
- Meaningful experiments were performed with Chameleon on the constructed Grid testbed
- Better performance is achieved by considering data locations as well as computational capabilities

References
- ANTZ: http://www.antz.or.kr
- ApGrid: http://www.apgrid.org
- B. Allcock, J. Bester, J. Bresnahan, A. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel and S. Tuecke. "Secure, Efficient Data Transport and Replica Management for High-Performance Data-Intensive Computing," IEEE Mass Storage Conference, 2001.
- M. Baker, R. Buyya and D. Laforenza. "The Grid: International Efforts in Global Computing," International Conference on Advances in Infrastructure for E-Business, Science, and Education on the Internet (SSGRR 2000), L'Aquila, Italy, July 2000.
- F. Berman and R. Wolski. "The AppLeS Project: A Status Report," Proceedings of the 8th NEC Research Symposium, Berlin, Germany, May 1997.
- R. Buyya, K. Branson, J. Giddy and D. Abramson. "The Virtual Laboratory: A Toolset for Utilising the World-Wide Grid to Design Drugs," 2nd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2002), Berlin, Germany, May 2002.
- CERN DataGrid Project: http://www.cern.ch/grid/
- A. Chervenak, I. Foster, C. Kesselman, C. Salisbury and S. Tuecke. "The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets," Journal of Network and Computer Applications, 23:187-200, 2001.
- D. Düllmann, W. Hoschek, J. Jaen-Martinez, A. Samar, H. Stockinger and K. Stockinger. "Models for Replica Synchronisation and Consistency in a Data Grid," 10th IEEE Symposium on High Performance Distributed Computing (HPDC-10), San Francisco, California, August 2001.
- I. Foster and C. Kesselman. The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, 1999.
- I. Foster, C. Kesselman and S. Tuecke. "The Anatomy of the Grid: Enabling Scalable Virtual Organizations," International Journal of Supercomputer Applications, 15(3), 2001.
- C. Gibas. Developing Bioinformatics Computer Skills, O'Reilly, April 2001.
- The Globus Project: http://www.globus.org
- L. Guy, E. Laure, P. Kunszt, H. Stockinger and K. Stockinger. "Replica Management in Data Grids," Global Grid Forum Informational Document, GGF5, Edinburgh, Scotland, July 2002.
- W. Hoschek, J. Jaen-Martinez, A. Samar, H. Stockinger and K. Stockinger. "Data Management in an International Data Grid Project," 1st IEEE/ACM International Workshop on Grid Computing (Grid 2000), Bangalore, India, December 2000.
- K. Ranganathan and I. Foster. "Decoupling Computation and Data Scheduling in Distributed Data-Intensive Applications," 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11), Edinburgh, Scotland, July 2002.
- K. Ranganathan and I. Foster. "Design and Evaluation of Dynamic Replication Strategies for a High Performance Data Grid," International Conference on Computing in High Energy and Nuclear Physics, Beijing, September 2001.
- K. Ranganathan and I. Foster. "Identifying Dynamic Replication Strategies for a High Performance Data Grid," International Workshop on Grid Computing, Denver, November 2001.
- H. Stockinger, K. Stockinger, E. Schikuta and I. Willers. "Towards a Cost Model for Distributed and Replicated Data Stores," 9th Euromicro Workshop on Parallel and Distributed Processing (PDP 2001), Mantova, Italy, February 2001.
- S. Vazhkudai, S. Tuecke and I. Foster. "Replica Selection in the Globus Data Grid," 1st IEEE/ACM International Conference on Cluster Computing and the Grid (CCGrid 2001), Brisbane, Australia, May 2001.
- R. Wolski, N. Spring and J. Hayes. "The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing," Journal of Future Generation Computing Systems, 15(5-6):757-768, October 1999.