Prophesy: Analysis and Modeling of Parallel and Distributed Applications

Valerie Taylor
Texas A&M University
Seung-Hye Jang, Mieke Prajugo, Xingfu Wu – TAMU
Ewa Deelman – ISI
Juan Gilbert – Auburn University
Rick Stevens – Argonne National Laboratory
SPONSORS: NSF, NASA
Performance Modeling
[Chart: MD code throughput (timesteps/s) versus number of processors (1 to 81), comparing experimental (Exper.) and theoretical (Theo.) curves]

• Necessary for good performance
• Requires significant time and effort
Outline

• Prophesy Infrastructure
• Modeling Techniques
• Case Studies
• Summary
Problem Statement

• Given:
  – Performance models and analyses are critical, but they require significant development time
  – Parallel and distributed systems are complex
• Goal: efficient execution of parallel and distributed applications
• Proposed solution:
  – Automate as much as possible
  – Community involvement
Prophesy System
[Diagram: Prophesy system architecture. A Prophesy GUI fronts three stages: data collection (profiling & instrumentation, actual execution), databases (template database, performance database, systems database), and data analysis (model builder, performance predictor).]
Automated Instrumentation


• In-line data collection
• Instrument at one of several predefined levels
• Allow for user-specified instrumentation

Example: a code fragment before and after instrumentation is inserted (a sketch of the collected timing data follows):

Before:
    T = E * f;
    for (I = 1; I < N; I++) {
      V(I) = A(I) * C(I);
      B(I) = A(2*I + 4);
    }

After:
    T = E * f;
    INSTRUMENTATION CODE
    for (I = 1; I < N; I++) {
      V(I) = A(I) * C(I);
      B(I) = A(2*I + 4);
    }
    INSTRUMENTATION CODE
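A minimal sketch, in Python for brevity, of what the inserted instrumentation does conceptually: bracket a code region with timing calls and record one entry per region per run. Prophesy's actual instrumentation is inserted into the source automatically and feeds the performance database; the helper below is hypothetical.

    import time
    from contextlib import contextmanager

    @contextmanager
    def instrumented(region, records):
        """Record the wall-clock time of a code region, keyed by region name."""
        start = time.perf_counter()
        try:
            yield
        finally:
            records.append((region, time.perf_counter() - start))

    records = []
    N = 1000
    A = list(range(2 * N + 8))   # sized so A[2*i + 4] stays in range
    C = list(range(N))
    V = [0] * N
    B = [0] * N

    # The loop from the slide, bracketed the way instrumentation code would be.
    with instrumented("main_loop", records):
        for i in range(1, N):
            V[i] = A[i] * C[i]
            B[i] = A[2 * i + 4]

    print(records)   # e.g. [("main_loop", 0.00021)] -> one row for the performance database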
Databases

• Three databases: template, performance, and systems
• Hierarchical organization
• Organized into 4 areas:
  – Application
  – Executable
  – Run
  – Performance statistics
Prophesy Database
[Diagram: Prophesy database schema. Tables include Application, Executable, Modules, Module_Info, Compilers, Model Template, Application Performance, Run, Inputs, Systems, Functions, Function_Info, Resource, Connection, Function Performance, Basic Unit Performance, Control Flow, Model_Info, Library, and Data Structure Performance.]
Data Analysis

• Components: model builder and performance predictor
• Used to:
  – Develop performance models
  – Make predictions
  – Performance-tune codes
  – Identify the best implementation
  – Identify trends
Automated Modeling Techniques


• Utilize information in the template and systems databases
• Currently include three techniques:
  – Curve fitting
  – Parameterization
  – Composition using coupling values
Curve Fitting: Usage
[Diagram: performance data from the application, function, basic-unit, and data-structure performance tables feeds a least-squares fit (LSF) in Octave; the resulting analytical equation is stored with the model template. Example: matrix-matrix multiply, LSF: 3.]
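A hedged sketch of the curve-fitting technique: fit measured matrix-multiply times to a cubic in the matrix dimension N (reading "LSF: 3" as a degree-3 least-squares fit) and use the fitted polynomial as the analytical model. The timing values below are made up for illustration.

    import numpy as np

    N = np.array([100, 200, 400, 800, 1600])        # matrix dimensions (hypothetical)
    t = np.array([0.002, 0.016, 0.13, 1.05, 8.4])   # measured times in seconds (hypothetical)

    coeffs = np.polyfit(N, t, deg=3)   # ordinary least-squares cubic fit
    model = np.poly1d(coeffs)

    print(model)         # the analytical equation, as stored with the model template
    print(model(1200))   # predicted time for an unmeasured problem size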
[Plot: matrix-matrix multiplication, 16 processors, IBM SP]
Parameterization: Usage
[Diagram: parameterization in Octave combines the model template with system data (ADDM, MPISR, MPIBC) drawn from the systems, resource, and connection tables to produce an analytical equation. Example: matrix-matrix multiply, parameterized as Parameter(P, SGI Origin2000, N, ADDM, MPISR, MPIBC).]
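In the same spirit, a sketch of what a parameterized model exposes; the functional form and parameter values below are illustrative assumptions, not Prophesy's actual equation for matrix multiply.

    # ADDM: add/multiply time; MPISR: MPI send/receive cost; MPIBC: MPI broadcast cost
    def predicted_time(N, P, addm, mpisr, mpibc):
        compute = 2 * N**3 / P * addm    # ~2N^3 flops spread over P processors
        sendrecv = (N**2 / P) * mpisr    # per-processor data exchange (assumed term)
        broadcast = N**2 * mpibc         # panel broadcast (assumed term)
        return compute + sendrecv + broadcast

    # Because the system terms are exposed, swapping in another machine's measured
    # ADDM/MPISR/MPIBC values supports "what if" analysis across systems.
    print(predicted_time(N=1024, P=16, addm=2e-9, mpisr=1e-6, mpibc=5e-6))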
Modeling Techniques

• Curve fitting
  – Easy to generate the model
  – Very few exposed parameters
• Parameterization
  – Requires one-time manual analysis
  – Exposes many parameters
  – Can explore different system scenarios
• Coupling
  – Builds upon the previous techniques
  – Identifies how to combine kernel models
Kernel Coupling


• Two kernels, i and j
• Three measurements:
  – Pi: performance of kernel i in isolation
  – Pj: performance of kernel j in isolation
  – Pij: performance of kernels i and j coupled
• Compute the coupling value (a sketch of this computation follows):

  Cij = Pij / (Pi + Pj)
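A minimal sketch of the coupling computation and its classification (the tolerance is an assumption; the pair time below is hypothetical, back-computed so that C_AB matches the 0.97 reported on the composition slide):

    def coupling_value(p_i, p_j, p_ij):
        """C_ij = P_ij / (P_i + P_j)."""
        return p_ij / (p_i + p_j)

    def classify(c_ij, tol=0.01):
        if c_ij > 1 + tol:
            return "destructive coupling"   # together slower than the parts suggest
        if c_ij < 1 - tol:
            return "constructive coupling"  # together faster (e.g., cache reuse)
        return "no coupling"

    p_a, p_b = 196.44, 207.16   # isolated kernel times from the composition slide (s)
    p_ab = 391.49               # hypothetical coupled time for the A-B pair (s)
    c = coupling_value(p_a, p_b, p_ab)
    print(round(c, 2), classify(c))   # 0.97 constructive coupling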
Coupling Categories



• Cij = 1: no coupling
• Cij > 1: destructive coupling
• Cij < 1: constructive coupling
Coupling Categories
[Diagram: each coupling category illustrated with kernels A and B and a shared resource: Cij = 1 (no coupling), Cij > 1 (destructive coupling), and Cij < 1 (constructive coupling)]
Using Coupling Parameters


• Use weighted averages to determine how to combine coupling values
• Example: given the pair-wise coupling values for kernels A, B, and C, we want (see the sketch below):

  T = α1·EA + α2·EB + α3·EC

  where

  α1 = (CAB·PAB + CAC·PAC) / (PAB + PAC)
  α2 = (CAB·PAB + CBC·PBC) / (PAB + PBC)
  α3 = (CBC·PBC + CAC·PAC) / (PBC + PAC)
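A sketch of the weighted-average composition; the pair execution times P_xy below are hypothetical, since the slides report coupling values but not the pair times used to form the weights:

    def alpha(c1, p1, c2, p2):
        """Average of two coupling values, weighted by the pair execution times."""
        return (c1 * p1 + c2 * p2) / (p1 + p2)

    def composed_time(e, c, p):
        """e: isolated kernel times; c: pair coupling values; p: pair times."""
        a1 = alpha(c["AB"], p["AB"], c["AC"], p["AC"])
        a2 = alpha(c["AB"], p["AB"], c["BC"], p["BC"])
        a3 = alpha(c["BC"], p["BC"], c["AC"], p["AC"])
        return a1 * e["A"] + a2 * e["B"] + a3 * e["C"]

    e = {"A": 196.44, "B": 207.16, "C": 574.19}   # isolated times from the next slide (s)
    c = {"AB": 0.97, "BC": 0.75, "AC": 0.76}      # coupling values from the next slide
    p = {"AB": 390.0, "BC": 590.0, "AC": 580.0}   # hypothetical pair times (s)
    print(composed_time(e, c, p))                 # composed estimate of T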
Composition Method

• Synthetic kernels (array updates), isolated times:
  – Kernel A: 196.44 s
  – Kernel B: 207.16 s
  – Kernel C: 574.19 s
• Pair-wise coupling values:

  Kernel Pair | Coupling
  A-B         | 0.97
  B-C         | 0.75
  C-A         | 0.76

• Weights: α1 = 0.8472, α2 = 0.8407, α3 = 0.7591
• Actual total time: 799.63 s
• Coupling time: 776.52 s (error: 2.89%)
• Adding individual times: 971.81 s (error: 23%)
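As a quick check, plugging the slide's α weights and isolated kernel times into T = α1·EA + α2·EB + α3·EC reproduces the coupling-based estimate (the small gap to 776.52 s is rounding in the published weights):

    alphas = (0.8472, 0.8407, 0.7591)   # alpha_1..alpha_3 from this slide
    e = (196.44, 207.16, 574.19)        # kernels A, B, C (s)
    print(sum(a * t for a, t in zip(alphas, e)))   # ~776.45 s vs. 799.63 s measured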
Coupling Method: Usage
[Diagram: the coupling method in Octave combines performance data and system information from the run, functions, inputs, systems, and control-flow tables with coupling values and performance data for adjacent kernels (function performance and coupling tables) to produce an analytical equation]
Case Studies

• Prediction: resource allocation
  – Grid Physics Network (GriPhyN)
  – Utilizes the Grid2003 infrastructure
  – GEO LIGO application
• Prediction: resource allocation
  – AADMLSS: an educational application
  – Utilizes multiple servers
Case 1: GEO LIGO (GriPhyN)

• The pulsar search is a process of finding celestial objects that may emit gravitational waves.
• The GEO (German-English Observatory) LIGO (Laser Interferometer Gravitational-Wave Observatory) pulsar search is the most frequently used coherent search method; it generates the F-statistic for known pulsars.
GriPhyN
[Diagram: GriPhyN execution path. A workflow is transformed using VDL by the Chimera Virtual Data System and submitted through Grid middleware; the resource-selection step consults Prophesy for predictions and Ganglia for monitoring, targeting Grid2003.]
Resource Selector
[Diagram: the resource selector. Given the application name, input parameters, and the list of available sites, the Prophesy interface invokes the predictor and returns rankings of the sites and weights for each site.]
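A hedged sketch of prediction-based site selection: rank candidate sites by predicted execution time and derive per-site weights. The prediction model, the per-site parameter, and the inverse-time weighting are illustrative stand-ins; the slide specifies only the selector's inputs and outputs.

    def rank_sites(sites, predict):
        """sites: {name: params}; predict: params -> predicted time (s)."""
        times = {name: predict(p) for name, p in sites.items()}
        ranking = sorted(times, key=times.get)                   # fastest site first
        inv = {name: 1.0 / t for name, t in times.items()}
        total = sum(inv.values())
        weights = {name: v / total for name, v in inv.items()}   # assumed: inverse-time weights
        return ranking, weights

    # Hypothetical per-site parameter (effective CPU speed, GHz) and a toy model:
    sites = {"IU": 2.4, "PDSF": 1.8, "KNU": 1.47}
    ranking, weights = rank_sites(sites, predict=lambda ghz: 1000.0 / ghz)
    print(ranking)   # ['IU', 'PDSF', 'KNU']
    print(weights)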
Grid2003 Testbed
[Figure: the Grid2003 testbed sites]
Execution Environment
Site Name                       | CPUs | Batch  | Compute Node Processors                      | Cache Size | Memory
alliance.unm.edu (UNM)          | 436  | PBS    | 1 × PIII 731 MHz                             | 256 KB     | 1 GB
atlas.iu.edu (IU)               | 400  | PBS    | 2 × Intel Xeon 2.4 GHz                       | 512 KB     | 2.5 GB
pdsfgrid3.nersc.gov (PDSF)      | 349  | LSF    | 2 × PIII 650 MHz-1.8 GHz; 2 × AMD 2100+-2600+ | 256 KB     | 2 GB
atlas.dpcc.uta.edu (UTA)        | 158  | PBS    | 2 × Intel Xeon 2.4-2.6 GHz                   | 512 KB     | 2 GB
nest.phys.uwm.edu (UWM)         | 296  | Condor | 1 × PIII 1 GHz                               | 256 KB     | 0.5 GB
boomer1.oscer.ou.edu (OU)       | 286  | PBS    | 3 × Intel Xeon 2 GHz                         | 512 KB     | 2 GB
cmsgrid.hep.wisc.edu (UWMadison)| 64   | Condor | 1 × Intel Xeon 2.8 GHz                       | 512 KB     | 2 GB
cluster28.knu.ac.kr (KNU)       | 104  | Condor | 1 × AMD Athlon XP 1700+                      | 256 KB     | 0.8 GB
acdc.ccr.buffalo.edu (UBuffalo) | 74   | PBS    | 1 × Intel Xeon 1.6 GHz                       | 256 KB     | 3.7 GB
Experimental Results
[Table: GEO LIGO runs on Grid2003 for twelve (Alpha, Freq) parameter sets, ranging from (0.0005, 0.009) to (0.0085, 0.009). For each run, the table lists the site selected and time (sec) under prediction-based selection (mostly PDSF and IU), and the selected site, time (sec), and error relative to the prediction-based choice under load-based, average, and random selection. Bottom-row average errors: 33.68% for load-based and 58.62% for random selection.]
Case Study 2: AADMLSS
African American Distributed Multiple Learning Styles System (AADMLSS), developed by Dr. Juan E. Gilbert.
Site Selection Process
[Flowchart: site selection in AADMLSS. The user logs in; once the username and password are validated, a first-time user gets the default concept and a returning user gets the last concept visited. Network performance and server performance are measured, the server with the best overall site performance is selected, and the concept is displayed. Passing the quiz leads to the next concept (same instructor); failing leads back to the current concept (different instructor). The cycle repeats until the user exits and logs out.]
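A minimal sketch of the selection step, assuming "best overall site performance" means the smallest sum of the measured network and server times; the combination rule and the probe values are assumptions, as the slide does not specify them:

    def best_server(measurements):
        """measurements: {server: (network_ms, server_access_ms)}."""
        return min(measurements, key=lambda s: sum(measurements[s]))

    # Hypothetical probe results for the testbed servers (ms):
    probes = {"Loner": (40, 220), "Prophesy": (35, 90),
              "Tina": (180, 140), "Interact": (120, 300)}
    print(best_server(probes))   # -> "Prophesy"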
Testbed Overview
Category | Spec            | Loner (TX)       | Prophesy (TX)                | Tina (MA)        | Interact (AL)
Hardware | CPU speed (MHz) | 997.62           | 3056.85                      | 1993.56          | 697.87
Hardware | Bus speed (MB/s)| 205              | 856                          | 638              | 214
Hardware | Memory (MB)     | 256              | 2048                         | 256              | 256
Hardware | Hard disk (GB)  | 30               | 146                          | 40               | 10
Software | O/S             | Redhat Linux 9.0 | Redhat Linux Enterprise 3.0  | Redhat Linux 9.0 | Redhat Linux 9.0
Software | Web server      | Apache 2.0       | Apache 2.0                   | Apache 2.0       | Apache 2.0
Software | Web application | PHP 4.2          | PHP 4.3                      | PHP 4.2          | PHP 4.1
Results – 4 Servers

Improvement of SRT-based server selection over load-based (SRT-LOAD) and random (SRT-RANDOM) selection, by course/module/concept, during the day and at night:

Course/Module/Concept | Day SRT-LOAD (%) | Day SRT-RANDOM (%) | Night SRT-LOAD (%) | Night SRT-RANDOM (%)
3/0/0   | 9.75  | 16.97 | 8.76  | 13.54
3/0/1   | 12.58 | 24.76 | 12.30 | 22.54
3/0/2   | 16.75 | 29.70 | 15.75 | 28.95
3/0/3   | 20.54 | 27.10 | 18.75 | 25.54
3/1/0   | 9.14  | 16.92 | 8.76  | 13.96
3/1/1   | 8.67  | 15.76 | 8.01  | 14.15
3/1/2   | 13.38 | 23.57 | 11.94 | 20.67
3/1/3   | 12.16 | 19.76 | 11.87 | 19.11
3/2/0   | 8.95  | 15.15 | 8.64  | 15.09
3/2/1   | 11.57 | 17.40 | 9.95  | 15.54
3/2/2   | 10.95 | 19.75 | 9.60  | 15.27
3/2/3   | 11.04 | 23.08 | 12.54 | 22.84
3/3/0   | 8.91  | 15.94 | 7.69  | 15.91
3/3/1   | 9.07  | 17.90 | 8.47  | 16.95
3/3/2   | 9.46  | 16.77 | 9.31  | 15.76
3/3/3   | 10.55 | 19.57 | 9.87  | 17.95
AVERAGE | 11.47 | 20.01 | 10.76 | 18.36
Results - 4 Servers
[Chart: site selection distribution (0-100%) across Loner, Prophesy, Tina, and Interact under random, load-based, and SRT-based selection, day (D) and night (N)]
Results – 3 Servers
Concept | SRT-LOAD (%) | SRT-RANDOM (%)
3/0/0 D | 6.21  | 14.05
3/0/1 D | 12.13 | 21.94
3/0/2 N | 14.02 | 25.83
3/0/3 N | 18.12 | 23.52
3/1/0 N | 8.05  | 12.04
3/1/1 N | 7.31  | 12.25
3/1/2 N | 12.60 | 18.74
3/1/3 N | 10.96 | 19.11
3/2/0 N | 7.93  | 12.58
3/2/1 N | 8.05  | 14.25
3/2/2 N | 9.14  | 15.97
3/2/3 D | 9.79  | 20.58
3/3/0 D | 8.94  | 13.64
3/3/1 D | 8.26  | 16.74
3/3/2 D | 9.21  | 15.21
3/3/3 D | 9.97  | 19.36
AVERAGE | 10.04 | 17.24

[Chart: site selection distribution (0-100%) across Loner, Tina, and Interact under random, load-based, and SRT-based selection, day (D) and night (N)]
Results – 3 Servers
[Chart: average service response time (msec) measured by the agent for Loner, Tina, and Interact on concepts 3/0/0 D through 3/2/3 D, broken into network delay and server access time]
Results – 2 Servers
Concept | SRT-LOAD (%) | SRT-RANDOM (%)
3/0/0 D | 3.13 | 4.03
3/0/1 D | 4.26 | 5.97
3/0/2 D | 7.02 | 8.28
3/0/3 D | 8.64 | 9.02
3/1/0 D | 3.25 | 4.94
3/1/1 D | 3.27 | 4.10
3/1/2 D | 3.93 | 5.97
3/1/3 D | 3.64 | 4.08
3/2/0 D | 3.15 | 3.32
3/2/1 D | 4.39 | 5.20
3/2/2 D | 5.80 | 5.97
3/2/3 D | 6.52 | 6.95
3/3/0 D | 4.39 | 5.64
3/3/1 D | 4.16 | 5.20
3/3/2 D | 4.81 | 5.73
3/3/3 D | 5.02 | 5.58
AVERAGE | 4.71 | 5.62

[Chart: day-time site selection distribution between Tina and Interact under random, load-based, and SRT-based selection]
Summary


• Prophesy: an infrastructure for automated performance analysis and modeling of parallel and distributed applications
• Two case studies with resource allocation:
  – GEO LIGO: on average 33% better than load-based selection
  – AADMLSS: on average 4-11% better than load-based selection
• Future work:
  – Continue extending the application base
  – Work on queue wait time predictions
Performance Analysis Projects

• Prophesy
  – http://prophesy.cs.tamu.edu
  – Published over 20 conference and journal papers
• PAPI
  – http://icl.cs.utk.edu/papi/
• SCALEA-G
  – http://www.dps.uibk.ac.at/projects/scaleag/
• PerfTrack
  – http://web.cecs.pdx.edu/~karavan/perftrack
• Paradyn
  – http://www.cs.wisc.edu/~paradyn/
• Network Weather Service
  – http://nws.cs.ucsb.edu