Prophesy: Analysis and Modeling of Parallel and Distributed Applications

Valerie Taylor, Texas A&M University
Seung-Hye Jang, Mieke Prajugo, Xingfu Wu – TAMU
Ewa Deelman – ISI
Juan Gilbert – Auburn University
Rick Stevens – Argonne National Laboratory
Sponsors: NSF, NASA

Performance Modeling

[Figure: throughput (timesteps/s) of an MD code versus number of processors (1 to 121), experimental (Exper.) and theoretical (Theo.) curves]

• Necessary for good performance
• Requires significant time and effort

Outline

• Prophesy Infrastructure
• Modeling Techniques
• Case Studies
• Summary

Problem Statement

Given:
• Performance models and analyses are critical, but require significant development time
• Parallel and distributed systems are complex

Goal: efficient execution of parallel and distributed applications.

Proposed solution:
• Automate as much as possible
• Community involvement

Prophesy System

[Diagram: the Prophesy GUI sits on top of three layers: data collection (profiling & instrumentation, actual execution), databases (template database, performance database, systems database), and data analysis (model builder, performance predictor)]

Automated Instrumentation

• In-line data collection
• Instrument at one of several predefined levels
• Allow for user-specified instrumentation

Example: a loop before and after automated instrumentation (instrumentation code is inserted around the selected region):

    T = E * f;
    for (i = 1; i < N; i++) {
        V[i] = A[i] * C[i];
        B[i] = A[2*i + 4];
    }

becomes

    T = E * f;
    /* INSTRUMENTATION CODE */
    for (i = 1; i < N; i++) {
        V[i] = A[i] * C[i];
        B[i] = A[2*i + 4];
    }
    /* INSTRUMENTATION CODE */

Databases

• Template Database
• Performance Database: hierarchical organization, organized into four areas: Application, Executable, Run, and Performance Statistics
• Systems Database

Prophesy Database

[Diagram: database schema linking the Application, Modules, Module_Info, Compilers, Executable, Inputs, Run, Functions, Function_Info, Control Flow, Library, Model Template, Model_Info, Application Performance, Function Performance, Basic Unit Performance, Data Structure Performance, Systems, Resource, and Connection tables]

Data Analysis

The model builder and performance predictor are used to:
• Develop performance models
• Make predictions
• Performance-tune codes
• Identify the best implementation
• Identify trends

Automated Modeling Techniques

Utilize information in the template and system databases. Three techniques are currently included:
• Curve fitting
• Parameterization
• Composition using coupling values

Curve Fitting: Usage

An analytical equation is produced by least-squares fitting (Octave: LSF). Model template entry for matrix-matrix multiply: LSF : 3. The fit draws on performance data from the Application Performance, Function Performance, Basic Unit Performance, and Data Structure Performance tables.

[Figure: measured versus modeled performance of matrix-matrix multiplication on 16 processors of an IBM SP]

Parameterization: Usage

An analytical equation is produced by parameterization (Octave: Parameterization) using system data. Model template entry for matrix-matrix multiply: Parameter(P, SGI Origin2000, N, ADDM, MPISR, MPIBC). System terms such as ADDM, MPISR, and MPIBC are drawn from the Systems, Resource, and Connection tables.

Modeling Techniques

Curve fitting (see the sketch after this list):
• Easy to generate the model
• Very few exposed parameters
Parameterization:
• Requires a one-time manual analysis
• Exposes many parameters
• Allows exploring different system scenarios
Coupling:
• Builds upon the previous techniques
• Identifies how to combine kernel models
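To make the curve-fitting step concrete, here is a minimal Python sketch (numpy substituted for the Octave least-squares fit the deck mentions). The timing data are hypothetical stand-ins for records Prophesy would pull from its performance database, and the three-term polynomial form is an assumption based on the "LSF : 3" template entry above.

    import numpy as np

    # Hypothetical matrix-multiply timings (problem size N, time in seconds).
    N = np.array([200.0, 400.0, 600.0, 800.0, 1000.0])
    t = np.array([0.09, 0.71, 2.41, 5.69, 11.15])

    # Three-term least-squares fit: t(N) ~ a*N^3 + b*N^2 + c*N.
    X = np.column_stack([N**3, N**2, N])
    (a, b, c), *_ = np.linalg.lstsq(X, t, rcond=None)
    print(f"t(N) ~= {a:.3e}*N^3 + {b:.3e}*N^2 + {c:.3e}*N")

    # Use the fitted equation to predict an unmeasured problem size.
    n = 1200.0
    print(f"predicted t({n:g}) = {a*n**3 + b*n**2 + c*n:.2f} s")

In Prophesy, the fitted equation would then be stored with the model template so the performance predictor can reuse it without refitting.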
Kernel Coupling

Two kernels (i and j), three measurements:
• P_i: performance of kernel i in isolation
• P_j: performance of kernel j in isolation
• P_ij: performance of kernels i and j coupled

Compute the coupling value:

    C_ij = P_ij / (P_i + P_j)

Coupling Categories

• C_ij = 1: no coupling
• C_ij > 1: destructive coupling
• C_ij < 1: constructive coupling

[Diagram: with no coupling, kernels A and B do not interact through a shared resource; with destructive coupling they contend for a shared resource; with constructive coupling each benefits from the other's use of the shared resource]

Using Coupling Parameters

Weighted averages determine how to combine the coupling values. Example: given the pair-wise coupling values for kernels A, B, and C, the total time is estimated as

    T = pi_1 * E_A + pi_2 * E_B + pi_3 * E_C

where E_X is the execution time of kernel X in isolation, P_XY is the measured time of the coupled pair X and Y, and

    pi_1 = (C_AB * P_AB + C_AC * P_AC) / (P_AB + P_AC)
    pi_2 = (C_AB * P_AB + C_BC * P_BC) / (P_AB + P_BC)
    pi_3 = (C_BC * P_BC + C_AC * P_AC) / (P_BC + P_AC)

Composition Method

Synthetic kernels (array updates):

    Kernel | Isolated time (s)
    A      | 196.44
    B      | 207.16
    C      | 574.19

    Kernel pair | Coupling value
    A-B         | 0.97
    B-C         | 0.75
    C-A         | 0.76

Resulting weights: pi_1 = 0.8472, pi_2 = 0.8407, pi_3 = 0.7591.
• Actual total time: 799.63 s
• Coupling-based estimate: 776.52 s (error: 2.89%)
• Adding individual times: 971.81 s (error: 23%)
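The composition above can be reproduced in a few lines of Python. This is a sketch under one stated assumption: the coupled pair times P_XY are recovered here from the rounded coupling values as C_XY * (P_X + P_Y), so the computed weights and estimate land close to, but not exactly on, the deck's figures (which come from unrounded measurements).

    # Coupling-based composition for the synthetic-kernel example.
    P = {"A": 196.44, "B": 207.16, "C": 574.19}   # isolated times (s)
    C = {("A", "B"): 0.97, ("B", "C"): 0.75, ("A", "C"): 0.76}

    def coupling(x, y):
        # Coupling values are symmetric: look up either ordering.
        return C.get((x, y), C.get((y, x)))

    def pair_time(x, y):
        # Coupled time of the pair, reconstructed as C_xy * (P_x + P_y).
        return coupling(x, y) * (P[x] + P[y])

    def weight(k):
        # Weighted-average coupling factor pi_k over k's pairings.
        others = [o for o in P if o != k]
        num = sum(coupling(k, o) * pair_time(k, o) for o in others)
        den = sum(pair_time(k, o) for o in others)
        return num / den

    pi = {k: weight(k) for k in P}
    T = sum(pi[k] * P[k] for k in P)
    print({k: round(v, 4) for k, v in pi.items()})   # ~0.84, 0.84, 0.76
    print(f"coupling estimate: {T:.2f} s; naive sum: {sum(P.values()):.2f} s")

The point of the weighting is visible in the output: the coupling-based estimate sits within a few percent of the measured total, while simply adding the isolated kernel times overshoots by roughly 23%.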
Coupling Method: Usage

An analytical equation is produced by composition (Octave: Coupling) from data and system information: adjacent kernels are identified from the Control Flow table, and the coupling values and performance data come from the Run, Functions, Inputs, Systems, Function Performance, and Coupling tables.

Case Studies

Prediction for resource allocation:
• Grid Physics Network (GriPhyN)
• Utilizes the Grid2003 infrastructure
• GEO LIGO application

Prediction for resource allocation:
• AADMLSS: an educational application
• Utilizes multiple servers

Case 1: GEO LIGO (GriPhyN)

The pulsar search is the process of finding celestial objects that may emit gravitational waves. The GEO (German-English Observatory) / LIGO (Laser Interferometer Gravitational-wave Observatory) pulsar search used here is a frequently used coherent search method that generates the F-statistic for known pulsars.

GriPhyN

[Diagram: applications are transformed using VDL in the Chimera Virtual Data System and submitted through grid middleware to Grid2003; resource selection is informed by Prophesy predictions and Ganglia monitoring data]

Resource Selector

Prophesy's resource-selector interface takes the application name, its input parameters, and the list of available sites; the predictor returns a ranking of the sites and a weight for each site.

Grid2003 Testbed

[Figure: map of the Grid2003 testbed sites]

Execution Environment

Site Name | CPUs | Batch | Compute Node Processors | Cache Size | Memory
alliance.unm.edu (UNM) | 436 | PBS | 1 x PIII 731 MHz | 256 KB | 1 GB
atlas.iu.edu (IU) | 400 | PBS | 2 x Intel Xeon 2.4 GHz | 512 KB | 2.5 GB
pdsfgrid3.nersc.gov (PDSF) | 349 | LSF | 2 x PIII 650 MHz-1.8 GHz / 2 x AMD 2100+-2600+ | 256 KB | 2 GB
atlas.dpcc.uta.edu (UTA) | 158 | PBS | 2 x Intel Xeon 2.4-2.6 GHz | 512 KB | 2 GB
nest.phys.uwm.edu (UWM) | 296 | Condor | 1 x PIII 1 GHz | 256 KB | 0.5 GB
boomer1.oscer.ou.edu (OU) | 286 | PBS | 3 x Intel Xeon 2 GHz | 512 KB | 2 GB
cmsgrid.hep.wisc.edu (UWMadison) | 64 | Condor | 1 x Intel Xeon 2.8 GHz | 512 KB | 2 GB
cluster28.knu.ac.kr (KNU) | 104 | Condor | 1 x AMD Athlon XP 1700+ | 256 KB | 0.8 GB
acdc.ccr.buffalo.edu (UBuffalo) | 74 | PBS | 1 x Intel Xeon 1.6 GHz | 256 KB | 3.7 GB

Experimental Results

[Table: for each pulsar-search parameter pair (Alpha, Freq), the selected site and execution time (sec) under prediction-based, load-based, and random selection, with each alternative's error relative to the prediction-based choice; selected sites include IU, PDSF, KNU, UNM, UTA, OU, UBuffalo, UWMilwaukee, and UWMadison. On average, load-based selection was 33.68% worse and random selection 58.62% worse than prediction-based selection.]

Case Study 2: AADMLSS

African American Distributed Multiple Learning Styles System (AADMLSS), developed by Dr. Juan E. Gilbert.

Site Selection Process

[Flowchart: the user logs into AADMLSS; once the username and password are validated, the system fetches the default concept (first-time access) or the user's last concept, measures network performance and server performance, selects the server with the best overall site performance, and displays the concept; on passing the quiz the user moves to the next concept (same instructor), on failing it the current concept is repeated (different instructor), until the user exits and logs out]
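The selection step in this flow can be sketched in a few lines. Below is a hedged Python illustration of "select the server with the best overall site performance", assuming, as the service-response-time charts that follow suggest, that overall performance combines network delay and server access time; the server names match the testbed, while the probe URLs and the measurement function are hypothetical.

    import time
    import urllib.request

    # Candidate AADMLSS servers (names from the testbed; URLs hypothetical).
    SERVERS = {
        "Loner":    "http://loner.example.edu/aadmlss/probe",
        "Prophesy": "http://prophesy.example.edu/aadmlss/probe",
        "Tina":     "http://tina.example.edu/aadmlss/probe",
        "Interact": "http://interact.example.edu/aadmlss/probe",
    }

    def measure_ms(url, timeout=2.0):
        """Hypothetical probe: round-trip time of a small request (ms),
        standing in for the network-delay and server-access-time
        measurements taken at login."""
        start = time.perf_counter()
        try:
            urllib.request.urlopen(url, timeout=timeout).read()
        except OSError:
            return float("inf")   # treat unreachable servers as worst
        return (time.perf_counter() - start) * 1000.0

    def select_server(servers=SERVERS):
        """Return the server with the lowest overall response time."""
        timings = {name: measure_ms(url) for name, url in servers.items()}
        return min(timings, key=timings.get), timings

    best, timings = select_server()
    print("measured (ms):", timings)
    print("serving concept from:", best)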
Testbed Overview

Category | Spec | Loner (TX) | Prophesy (TX) | Tina (MA) | Interact (AL)
Hardware | CPU Speed (MHz) | 997.62 | 3056.85 | 1993.56 | 697.87
Hardware | Bus Speed (MB/s) | 205 | 856 | 638 | 214
Hardware | Memory (MB) | 256 | 2048 | 256 | 256
Hardware | Hard Disk (GB) | 30 | 146 | 40 | 10
Software | O/S | Redhat Linux 9.0 | Redhat Linux Enterprise 3.0 | Redhat Linux 9.0 | Redhat Linux 9.0
Software | Web Server | Apache 2.0 | Apache 2.0 | Apache 2.0 | Apache 2.0
Software | Web Application | PHP 4.2 | PHP 4.3 | PHP 4.2 | PHP 4.1

Results – 4 Servers

SRT-LOAD and SRT-RANDOM give the percentage improvement of SRT-based (prediction) selection over load-based and random selection, respectively, per course/module/concept.

Course/Module/Concept | Day SRT-LOAD (%) | Day SRT-RANDOM (%) | Night SRT-LOAD (%) | Night SRT-RANDOM (%)
3/0/0 | 9.75 | 16.97 | 8.76 | 13.54
3/0/1 | 12.58 | 24.76 | 12.30 | 22.54
3/0/2 | 16.75 | 29.70 | 15.75 | 28.95
3/0/3 | 20.54 | 27.10 | 18.75 | 25.54
3/1/0 | 9.14 | 16.92 | 8.76 | 13.96
3/1/1 | 8.67 | 15.76 | 8.01 | 14.15
3/1/2 | 13.38 | 23.57 | 11.94 | 20.67
3/1/3 | 12.16 | 19.76 | 11.87 | 19.11
3/2/0 | 8.95 | 15.15 | 8.64 | 15.09
3/2/1 | 11.57 | 17.40 | 9.95 | 15.54
3/2/2 | 10.95 | 19.75 | 9.60 | 15.27
3/2/3 | 11.04 | 23.08 | 12.54 | 22.84
3/3/0 | 8.91 | 15.94 | 7.69 | 15.91
3/3/1 | 9.07 | 17.90 | 8.47 | 16.95
3/3/2 | 9.46 | 16.77 | 9.31 | 15.76
3/3/3 | 10.55 | 19.57 | 9.87 | 17.95
AVERAGE | 11.47 | 20.01 | 10.76 | 18.36

[Chart: site selection distribution across Loner, Prophesy, Tina, and Interact under random, load-based, and SRT-based selection, day (D) and night (N)]

Results – 3 Servers

Concept | Day/Night | SRT-LOAD (%) | SRT-RANDOM (%)
3/0/0 | D | 6.21 | 14.05
3/0/1 | D | 12.13 | 21.94
3/0/2 | N | 14.02 | 25.83
3/0/3 | N | 18.12 | 23.52
3/1/0 | N | 8.05 | 12.04
3/1/1 | N | 7.31 | 12.25
3/1/2 | N | 12.60 | 18.74
3/1/3 | N | 10.96 | 19.11
3/2/0 | N | 7.93 | 12.58
3/2/1 | N | 8.05 | 14.25
3/2/2 | N | 9.14 | 15.97
3/2/3 | D | 9.79 | 20.58
3/3/0 | D | 8.94 | 13.64
3/3/1 | D | 8.26 | 16.74
3/3/2 | D | 9.21 | 15.21
3/3/3 | D | 9.97 | 19.36
AVERAGE | | 10.04 | 17.24

[Chart: site selection distribution across Loner, Tina, and Interact under random, load-based, and SRT-based selection, day and night]

[Chart: average service response time (msec, AGENT) per concept for Loner, Tina, and Interact, broken into network delay and server access time]

Results – 2 Servers

Concept | Day/Night | SRT-LOAD (%) | SRT-RANDOM (%)
3/0/0 | D | 3.13 | 4.03
3/0/1 | D | 4.26 | 5.97
3/0/2 | D | 7.02 | 8.28
3/0/3 | D | 8.64 | 9.02
3/1/0 | D | 3.25 | 4.94
3/1/1 | D | 3.27 | 4.10
3/1/2 | D | 3.93 | 5.97
3/1/3 | D | 3.64 | 4.08
3/2/0 | D | 3.15 | 3.32
3/2/1 | D | 4.39 | 5.20
3/2/2 | D | 5.80 | 5.97
3/2/3 | D | 6.52 | 6.95
3/3/0 | D | 4.39 | 5.64
3/3/1 | D | 4.16 | 5.20
3/3/2 | D | 4.81 | 5.73
3/3/3 | D | 5.02 | 5.58
AVERAGE | | 4.71 | 5.62

[Chart: daytime site selection distribution across Tina and Interact under random, load-based, and SRT-based selection]

Summary

Prophesy, with two case studies of prediction-based resource allocation:
• GEO LIGO: on average 33% better than load-based selection
• AADMLSS: on average 4-11% better than load-based selection

Future work:
• Continue extending the application base
• Work on queue wait-time predictions

Performance Analysis Projects

• Prophesy: http://prophesy.cs.tamu.edu (over 20 conference and journal papers published)
• PAPI: http://icl.cs.utk.edu/papi/
• SCALEA-G: http://www.dps.uibk.ac.at/projects/scaleag/
• PerfTrack: http://web.cecs.pdx.edu/~karavan/perftrack
• Paradyn: http://www.cs.wisc.edu/~paradyn/
• Network Weather Service: http://nws.cs.ucsb.edu