Performance-Directed Resource Allocation

Seung-Hye Jang, Xingfu Wu, Valerie Taylor
Department of Computer Science, Texas A&M University
The International Grid Performance Workshop, Edinburgh, UK, June 22-23, 2005
http://prophesy.cs.tamu.edu

Outline
- Motivation & Goals
- Performance Prediction: Prophesy
- Case Studies
  - Grid Physics Network (GriPhyN): Grid2003 infrastructure, GEO LIGO pulsar search
  - Educational application on multiple servers: AADMLSS

Motivation
- Large-scale applications such as LHC simulations and galaxy formation simulations
- Distributed systems are available to run them

Goals
- To efficiently map jobs to appropriate resources
- To present a resource planner that uses performance prediction based upon historical data
- To select resources that reduce the execution time of the application

Performance Prediction
- Prophesy (http://prophesy.cs.tamu.edu): performance analysis and modeling of parallel and distributed applications
- Three main components: the PAIDE system (profiling and instrumentation), the databases, and the model builder

Prophesy System
- Data collection: web-based Prophesy GUI; profiling and instrumentation of the application on the execution systems
- Databases: template database, performance database, systems database
- Data analysis: model builder (actual and symbolic models) and predictor

Prophesy Model Builder
- Utilizes historical information in the Prophesy databases: the performance, template, and system databases
- Three modeling techniques:
  - Curve fitting: easy to generate the model, but exposes very few parameters
  - Parameterization: requires a one-time manual analysis, but exposes many parameters and allows exploring different system scenarios
  - Coupling: represents application performance in terms of its kernels and components; builds upon the previous techniques by curve fitting each kernel and combining the fits into a polynomial function

Curve Fitting: Usage
- Performance data plus a model template yield an analytical equation via a least-squares fit (LSF) in Octave; for matrix-matrix multiply a third-order fit (LSF: 3) is used
- Models can be built at several levels: application performance, function performance, basic-unit performance, and data-structure performance
- [Plot: measured vs. modeled execution time for matrix-matrix multiplication on 16 processors of an IBM SP]

Case Study 1: GriPhyN
- Architecture: transforms expressed in VDL are processed by the Chimera Virtual Data System and submitted through Grid middleware to Grid2003
- Resource selection is performed by Prophesy, with Ganglia monitoring the Grid2003 sites

Resource Selector
- Inputs: application name, input parameters, and the list of available sites
- The interface passes these to the Prophesy predictor, which returns rankings of the sites and a weight for each site

Application: GEO LIGO Pulsar Search
- The pulsar search is the process of finding celestial objects that may emit gravitational waves
- The GEO (German-English Observatory) / LIGO (Laser Interferometer Gravitational-Wave Observatory) pulsar search is the most frequent coherent search method; it generates the F-statistic for known pulsars

Grid2003 Testbed
- [Figure: Grid2003 testbed]

Execution Environment: Selected Grid2003 Sites
(Batch: local scheduler; VO: Virtual Organization)

Site Name                         CPUs  Batch   Processors (per node)        Cache Size  Memory
alliance.unm.edu (UNM)             436  PBS     1 x PIII 731 MHz             256 KB      1 GB
atlas.iu.edu (IU)                  400  PBS     2 x Intel Xeon 2.4 GHz       512 KB      2.5 GB
pdsfgrid3.nersc.gov (PDSF)         349  LSF     2 x PIII 650 MHz-1.8 GHz /
                                                2 x AMD 2100+ - 2600+        256 KB      2 GB
atlas.dpcc.uta.edu (UTA)           158  PBS     2 x Intel Xeon 2.4-2.6 GHz   512 KB      2 GB
nest.phys.uwm.edu (UWM)            296  Condor  1 x PIII 1 GHz               256 KB      0.5 GB
boomer1.oscer.ou.edu (OU)          286  PBS     3 x Intel Xeon 2 GHz         512 KB      2 GB
cmsgrid.hep.wisc.edu (UWMadison)    64  Condor  1 x Intel Xeon 2.8 GHz       512 KB      2 GB
cluster28.knu.ac.kr (KNU)          104  Condor  1 x AMD Athlon XP 1700+      256 KB      0.8 GB
acdc.ccr.buffalo.edu (Ubuffalo)     74  PBS     1 x Intel Xeon 1.6 GHz       256 KB      3.7 GB

Comparison Methods
- Load-based selection: uses Ganglia [1] to monitor Grid2003 and selects the least-loaded site
- Random selection: selects a site at random

[1] Ganglia, http://ganglia.sourceforge.net/.
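To make the comparison concrete, here is a minimal sketch of the three selection policies: prediction-based, load-based, and random. The site names, predicted times, and load values below are hypothetical placeholders, not output of the actual Prophesy predictor or Ganglia.

```python
import random

# Hypothetical predicted execution times (seconds) per site for one job;
# in the real system these would come from Prophesy's predictor.
predictions = {"PDSF": 3863.7, "IU": 9120.5, "UWMadison": 9435.8, "KNU": 7676.6}

# Hypothetical one-minute load averages per site, as Ganglia might report them.
loads = {"PDSF": 0.82, "IU": 0.35, "UWMadison": 0.10, "KNU": 0.55}

def prediction_based(preds):
    """Select the site with the smallest predicted execution time."""
    return min(preds, key=preds.get)

def load_based(load):
    """Baseline 1: select the least-loaded site, ignoring application behavior."""
    return min(load, key=load.get)

def random_selection(sites):
    """Baseline 2: select a site uniformly at random."""
    return random.choice(sorted(sites))

print(prediction_based(predictions))  # PDSF
print(load_based(loads))              # UWMadison
print(random_selection(predictions))  # any one of the four sites
```

Note how the two baselines never look at the application: the least-loaded or randomly chosen site may still be a poor match for this job, which is exactly what the experiments below measure.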
Experimental Results (GEO LIGO on Grid2003)

Parameters       Prediction-based      Load-based                        Random
Alpha    Freq    Site   Time (sec)     Site        Time (sec)   Error    Site         Time (sec)   Error
0.0065   0.002   PDSF     3863.66      UWMadison     9435.80    59.05%   UWMilwaukee    48065.83   60.09%
0.0085   0.001   IU       2850.39      UWMadison    11360.28    74.91%   KNU             7676.56   62.87%
0.0075   0.009   IU      22090.17      PDSF         20197.88    -9.37%   UNM            77298.13   71.42%
0.0055   0.009   IU      16216.25      UTA          27412.45    40.84%   UWMadison      31555.10   48.61%
0.0005   0.009   PDSF     1365.51      Ubuffalo      3226.00    57.67%   UWMilwaukee    16009.82   91.47%
0.0075   0.003   PDSF     6723.30      IU            7343.37     8.44%   KNU             8287.77   18.88%
0.0065   0.007   PDSF    13561.01      PDSF         13561.01     0.00%   UNM            52379.31   74.65%
0.0085   0.004   PDSF    10121.27      Ubuffalo     19649.22    48.49%   IU             11158.72    9.30%
0.0035   0.005   PDSF     5241.28      Ubuffalo     20799.05    74.80%   UWM            51936.49   89.91%
0.0065   0.009   IU      19184.36      UWMadison    24995.94    23.25%   OU             23441.16   18.16%
0.0045   0.009   IU      13278.68      UTA          20453.30    35.08%   UWMadison      14137.44    6.07%
0.0085   0.009   IU      25021.39      UWMadison    26246.68     4.67%   OU             31538.22   20.66%
Average                                                         32.62%                             52.01%

(Error: relative difference between that method's execution time and the prediction-based time.)

Case Study 2: AADMLSS
- African American Distributed Multiple Learning Styles System (AADMLSS), developed by Dr. Juan E. Gilbert

Site Selection Process
1. The user logs into AADMLSS; the username and password are validated.
2. On a first-time access the default concept is retrieved; otherwise the last concept is retrieved.
3. Network performance and server performance are measured, and the server with the best overall site performance is selected.
4. The selected server displays the concept and the user takes a quiz.
5. If the user passes the quiz, the next concept is presented by the same instructor; otherwise the current concept is repeated by a different instructor.
6. The cycle repeats until the user exits and logs out.

Testbed Overview

CATEGORY  SPECS                Loner (TX)         Tina (MA)          Interact (AL)
Hardware  CPU Speed (MHz)      997.62             1993.56            697.87
          Bus Speed (MB/s)     205                638                214
          Memory (MB)          256                256                256
          Hard Disk (GB)       30                 40                 10
Software  O/S                  Redhat Linux 9.0   Redhat Linux 9.0   Redhat Linux 9.0
          Web Server           Apache 2.0         Apache 2.0         Apache 2.0
          Web Application      PHP 4.2            PHP 4.2            PHP 4.1

Experimental Results: 3 Servers (Loner, Tina, Interact)

Concept     SRT-LOAD (%)   SRT-RANDOM (%)
3/0/0  D        6.21           14.05
3/0/1  D       12.13           21.94
3/0/2  N       14.02           25.83
3/0/3  N       18.12           23.52
3/1/0  N        8.05           12.04
3/1/1  N        7.31           12.25
3/1/2  N       12.60           18.74
3/1/3  N       10.96           19.11
3/2/0  N        7.93           12.58
3/2/1  N        8.05           14.25
3/2/2  N        9.14           15.97
3/2/3  D        9.79           20.58
3/3/0  D        8.94           13.64
3/3/1  D        8.26           16.74
3/3/2  D        9.21           15.21
3/3/3  D        9.97           19.36
AVERAGE        10.04           17.24

[Chart: site selection distribution across Loner, Tina, and Interact under Random, Load, and SRT selection, day (D) and night (N)]

Experimental Results: 2 Servers, local and remote (Loner, Interact)

Concept     SRT-LOAD (%)   SRT-RANDOM (%)
3/0/0  D        9.91           10.24
3/0/1  D       13.04           15.06
3/0/2  D       18.06           19.16
3/0/3  D       20.54           21.29
3/1/0  N        9.81            9.58
3/1/1  N        7.02            7.91
3/1/2  N       11.35           12.15
3/1/3  N       10.47           10.36
3/2/0  D        8.56            8.67
3/2/1  D        8.75            9.75
3/2/2  D       10.06           10.92
3/2/3  D       10.15           10.50
3/3/0  N        8.41            9.56
3/3/1  N        8.58            8.08
3/3/2  N        8.31            7.95
3/3/3  N       10.21           10.19
AVERAGE        10.83           11.34

[Chart: site selection distribution across Loner and Interact under Random, Load, and SRT selection, day (D) and night (N)]

Experimental Results: 2 Servers, remote and remote (Tina, Interact)

Concept     SRT-LOAD (%)   SRT-RANDOM (%)
3/0/0  D        3.13            4.03
3/0/1  D        4.26            5.97
3/0/2  D        7.02            8.28
3/0/3  D        8.64            9.02
3/1/0  D        3.25            4.94
3/1/1  D        3.27            4.10
3/1/2  D        3.93            5.97
3/1/3  D        3.64            4.08
3/2/0  D        3.15            3.32
3/2/1  D        4.39            5.20
3/2/2  D        5.80            5.97
3/2/3  D        6.52            6.95
3/3/0  D        4.39            5.64
3/3/1  D        4.16            5.20
3/3/2  D        4.81            5.73
3/3/3  D        5.02            5.58
AVERAGE         4.71            5.62

[Chart: daytime site selection distribution across Tina and Interact under Random, Load, and SRT selection]

Summary
- Presented a resource selection method based upon performance predictions
- Illustrated the advantages of using performance predictions with two case studies:
  - A large-scale scientific application, GEO LIGO, on Grid2003: on average 33% better than load-based selection; queue wait time predictions are now being considered
  - AADMLSS on three servers: on average 10% better than load-based selection

Thanks! Questions?
Seung-Hye Jang (jangs@cs.tamu.edu)
http://prophesy.cs.tamu.edu
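For completeness, the third-order least-squares fit mentioned on the curve-fitting slide (LSF: 3 for matrix-matrix multiply) can be sketched as follows. The timing samples are hypothetical illustrations, not Prophesy data, and the sketch uses NumPy rather than Octave.

```python
import numpy as np

# Hypothetical timing samples: matrix dimension n vs. measured run time (s).
sizes = np.array([200.0, 400.0, 600.0, 800.0, 1000.0])
times = np.array([0.9, 7.1, 24.2, 57.5, 112.0])

# Third-order least-squares fit: matrix-matrix multiply performs O(n^3)
# floating-point operations, so a cubic polynomial in n is the natural
# model form for the analytical equation.
coeffs = np.polyfit(sizes, times, deg=3)
model = np.poly1d(coeffs)

# The fitted model can then predict the run time for an unseen problem size,
# which is what the resource selector consumes when ranking sites.
print(f"predicted time for n=1200: {model(1200.0):.1f} s")
```

Fitting one such model per site from that site's historical runs, and then ranking sites by predicted time, is the essence of the prediction-based selection evaluated in both case studies.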