Experiences Using Cloud Computing for a Scientific Workflow Application
Jens Vöckler, Gideon Juve, Ewa Deelman, Mats Rynge, G. Bruce Berriman
ScienceCloud'11, 2011-06-08
Funded by NSF grant OCI-0910812

This Talk
An experience report on cloud computing.
- FutureGrid
  - Hardware
  - Middlewares
- Pegasus-WMS
- Periodograms
- Experiments
  - Periodogram I
  - Comparison of clouds using periodograms
  - Periodogram II

What is FutureGrid
- Something different for everyone:
  - 6 centers across the nation
  - Nimbus, Eucalyptus, Moab "bare metal"
  - Test bed for cloud computing (this talk)
- Start here: http://www.futuregrid.org/

What Comprises FutureGrid
  Resource      Type       Cores   Host CPU + RAM
  IU india      iDataPlex  1,416   8 x 2.9 GHz Xeon @ 24 GB
  IU xray       Cray XT5m    672   2 x 2.6 GHz 285 (E6) @ 8 GB
  UofC hotel    iDataPlex    672   8 x 2.9 GHz Xeon @ 24 GB
  UCSD sierra   iDataPlex    672   8 x 2.5 GHz Xeon @ 32 GB
  UFl foxtrot   iDataPlex    256   8 x 2.3 GHz Xeon @ 24 GB
  TACC alamo    PowerEdge  1,016   8 x 2.6 GHz Xeon @ 12 GB
  TOTALS                   4,704
Proposed: 16 x (192 GB + 12 TB / node) cluster; 8-node GPU-enhanced cluster.

Middlewares in FG
  Resource      Type       Cores   Moab          Eucalyptus   Nimbus
  IU india      iDataPlex  1,416   832 (59%)     400 (28%)    -
  IU xray       Cray XT5m    672   672 (100%)    -            -
  UofC hotel    iDataPlex    672   336 (50%)     -            336 (50%)
  UCSD sierra   iDataPlex    672   312 (46%)     120 (18%)    160 (24%)
  UFl foxtrot   iDataPlex    256   -             -            248 (97%)
  TACC alamo    PowerEdge  1,016   896 (88%)     -            120 (12%)
  TOTALS                   4,704   3,048 (65%)   520 (11%)    744 (18%)
Available resources as of 2011-06-06.

Pegasus WMS I
- Automating computational pipelines
- Funded by NSF/OCI; a collaboration with the Condor group at UW Madison
- Automates data management
- Captures provenance information
- Used by a number of domains, across a variety of applications
- Scalability:
  - Handle large data (kB…TB), and
  - Many computations (1…10^6 tasks)

Pegasus WMS II
- Reliability
  - Retry computations from the point of failure
- Construction of complex workflows
  - Based on computational blocks
  - Portable, reusable workflow descriptions (sketch below)
- Can run purely locally, or distributed among institutions
  - Laptop, campus cluster, grid, cloud

How Pegasus Uses FutureGrid
- Focus on Eucalyptus and Nimbus
  - No Moab "bare metal" at this point
- During experiments in Nov. 2010:
  - 544 Nimbus cores
  - 744 Eucalyptus cores
  - 1,288 total potential cores in 5 clouds across 4 clusters
- Actually used 300 physical cores (max)
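To make the "portable, reusable workflow description" mentioned above concrete, here is a minimal sketch of describing a single periodogram task as an abstract Pegasus workflow (DAX) with the Pegasus DAX3 Python API. The file names, transformation name, and algorithm argument are illustrative assumptions, not the actual workflow generator used for these experiments.

```python
# Minimal sketch (assumed file and argument names) of one periodogram
# task expressed as an abstract Pegasus workflow (DAX).
from Pegasus.DAX3 import ADAG, Job, File, Link

dax = ADAG("periodogram-sketch")

# Hypothetical input light curve and output periodogram files.
lightcurve = File("lightcurve-000001.tbl")
result = File("periodogram-000001.tbl")

# One compute task; "periodogram" names an executable in the
# transformation catalog, and the arguments are placeholders.
job = Job(name="periodogram")
job.addArguments("--algorithm", "plavchan", lightcurve, result)
job.uses(lightcurve, link=Link.INPUT)
job.uses(result, link=Link.OUTPUT, transfer=True)
dax.addJob(job)

# Write the abstract workflow; Pegasus later plans it onto a concrete
# site (laptop, campus cluster, grid, or cloud).
with open("periodogram.dax", "w") as fh:
    dax.writeXML(fh)
```

The DAX itself stays resource-independent; the mapping to a particular cloud happens at planning time, which is what lets the same description run on the FutureGrid, Magellan, and EC2 resources compared later in this talk.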
Pegasus FG Interaction
  (figure: how Pegasus WMS interacts with the FutureGrid clouds)

Periodograms
- Find extra-solar planets by:
  - Wobbles in the radial velocity of the star, or
  - Dips in the star's intensity
  (figure: star-planet wobble causing red/blue shift, and brightness-over-time light curve dipping during a planet transit)

Kepler Workflow
- 210k light curves released in July 2010
- Apply 3 algorithms to each curve, 3 times, with 3 different parameter sets
- Run the entire data set
- This talk's experiments:
  - 1 algorithm, 1 parameter set, 1 run
  - Either partial or full data set

Pegasus Periodograms
- 1st experiment is a "ramp-up"
  - 16k light curves, 33k computations (every light curve twice)
  - Try to see where things trip
  - Already found places needing adjustments
- 2nd experiment
  - Also 16k light curves
  - Across 3 comparable infrastructures
  - Testing hypothesized tunings
- 3rd experiment runs the full set

Periodogram Workflow
  (figure: structure of the periodogram workflow)

Excerpt: Jobs over Time
  (figure: jobs over time during a run)

Hosts, Tasks, and Duration (I)
  (chart: per-cloud requested, available, and actual hosts, jobs, tasks, and cumulative duration in hours for Eucalyptus india, Eucalyptus sierra, Nimbus sierra, Nimbus foxtrot, and Nimbus hotel; roughly 33k tasks in total)

Resource- and Job States (I)
  (figure: resource and job states over time for experiment I)

Cloud Comparison
- Compare academic and commercial clouds:
  - NERSC's Magellan cloud (Eucalyptus)
  - Amazon's cloud (EC2), and
  - FutureGrid's sierra cloud (Eucalyptus)
- Constrained node and core selection, because AWS costs $$:
  - 6 nodes, 8 cores each node
  - 1 Condor slot / physical CPU

Cloud Comparison II
  Site        CPU           RAM (swap)   Walltime   Cum. Dur.   Speed-Up
  Magellan    8 x 2.6 GHz   19 (0) GB    5.2 h      226.6 h     43.6
  Amazon      8 x 2.3 GHz   7 (0) GB     7.2 h      295.8 h     41.1
  FutureGrid  8 x 2.5 GHz   29 (½) GB    5.7 h      248.0 h     43.5
- Given 48 physical cores, a speed-up of ≈ 43 is considered pretty good
- AWS cost ≈ $31 (arithmetic sketch at the end of the deck)
  - 7.2 h x 6 x c1.large ≈ $29
  - 1.8 GB in + 9.9 GB out ≈ $2

Scaling Up I
- Workflow optimizations
  - Pegasus clustering ✔
  - Compress file transfers
- Submit-host Unix settings
  - Increase the open file-descriptor limit
  - Increase the firewall's open port range
- Submit-host Condor DAGMan settings
  - Idle job limit ✔

Scaling Up II
- Submit-host Condor settings
  - Socket cache size increase
  - File descriptors and ports per daemon
- Remote VM Condor settings
  - Use the condor_shared_port daemon
  - Use CCB for private networks
  - TCP for collector call-backs
  - Tune Condor job slots

Hosts, Tasks, and Duration (II)
  (chart: per-cloud requested and actual hosts, jobs, tasks, and cumulative duration in hours for Eucalyptus india, Eucalyptus sierra, Nimbus sierra, Nimbus foxtrot, and Nimbus hotel; roughly 210k tasks in total)

Resource- and Job States (II)
  (figure: resource and job states over time for experiment II)

Loose Ends
- Saturate the requested resources
  - Requires better monitoring ✔
- Clustering
- Better submit-host tuning
- Better data staging

Acknowledgements
- Funded by NSF grant OCI-0910812
- Ewa Deelman, Gideon Juve, Mats Rynge, Bruce Berriman
- FG help desk ;-)
- http://pegasus.isi.edu/
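Appendix-style note: the Cloud Comparison II slide quotes an AWS cost of roughly $31. The sketch below redoes that arithmetic as a back-of-the-envelope check. Only the node count, walltime, and data volumes come from the talk; the per-hour and per-GB prices are assumptions (2011-era EC2 list prices chosen to reproduce the quoted ~$29 and ~$2).

```python
# Back-of-the-envelope reproduction of the "AWS cost ~ $31" estimate.
# Prices below are assumed 2011-era EC2 list prices, not values from the talk.

NODES = 6                    # 6 EC2 nodes, 8 cores each (from the slide)
WALLTIME_H = 7.2             # measured walltime on Amazon (from the slide)
PRICE_PER_NODE_HOUR = 0.68   # assumed hourly rate (USD), chosen to match ~ $29

GB_IN, GB_OUT = 1.8, 9.9                 # data in/out of EC2 (from the slide)
PRICE_GB_IN, PRICE_GB_OUT = 0.10, 0.15   # assumed 2011 transfer rates (USD/GB)

compute = NODES * WALLTIME_H * PRICE_PER_NODE_HOUR
transfer = GB_IN * PRICE_GB_IN + GB_OUT * PRICE_GB_OUT

print(f"compute  ~ ${compute:.2f}")             # ~ $29
print(f"transfer ~ ${transfer:.2f}")            # ~ $2
print(f"total    ~ ${compute + transfer:.2f}")  # ~ $31
```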