Performance-Directed Resource Allocation
Seung-Hye Jang, Xingfu Wu, Valerie Taylor
Department of Computer Science
Texas A&M University
The International Grid Performance Workshop,
Edinburgh, UK, June 22-23, 2005
Outline
- Motivation & Goals
- Performance Prediction: Prophesy
- Case Studies
  - Grid Physics Network (GriPhyN)
    - Grid2003 infrastructure
    - GEO LIGO pulsar search
  - Educational Application
    - Multiple servers
    - AADMLSS
http://prophesy.cs.tamu.edu
The International Grid Performance Workshop, June 23, 2005
2
Motivation
[Images: LHC simulations; galaxy formation simulation]
Distributed Systems are available…
Goals
- To efficiently map jobs to appropriate resources
- To present a resource planner that uses performance prediction based upon historical data
- To select resources to reduce the execution time of the application
Performance Prediction
- Prophesy (http://prophesy.cs.tamu.edu)
  - Performance analysis and modeling of parallel and distributed applications
  - Three main components:
    - PAIDE system
    - Database
    - Model builder
Prophesy System
[Architecture diagram: a web-based Prophesy GUI fronts three stages. Data collection: profiling & instrumentation, actual execution. Databases: template database, performance database, systems database. Data analysis: model builder, symbolic predictor.]
Prophesy Model Builder
- Utilizes historical information in the Prophesy databases
  - Performance database
  - Template database
  - System database
- Three techniques
  - Curve fitting
    - Easy to generate the model
    - Very few exposed parameters
  - Parameterization
    - Requires one-time manual analysis
    - Exposes many parameters; allows exploring different system scenarios
  - Coupling
    - Represents application performance in terms of its kernels and components
    - Builds upon the previous techniques: a curve fit for each kernel, combined into a polynomial function
Curve Fitting: Usage
[Diagram: performance data and a model template feed a least-squares fit (Octave LSF; e.g., degree 3 for matrix-matrix multiply), producing an analytical equation: performance functions at the application, basic-unit, and data-structure levels.]
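As a concrete illustration of the curve-fitting technique, the sketch below fits a degree-3 polynomial (matching the slide's Octave "LSF: 3" for matrix-matrix multiply) with NumPy; the timing data is made up for illustration, not taken from the slides.

```python
# Minimal sketch of least-squares curve fitting for a performance model,
# assuming a degree-3 polynomial fit as the slide's "LSF: 3" suggests.
# The timing data below is illustrative only.
import numpy as np

sizes = np.array([128, 256, 384, 512, 640, 768], dtype=float)  # matrix size n
times = np.array([0.02, 0.17, 0.55, 1.32, 2.58, 4.47])         # measured sec

# Fit t(n) ~= a*n^3 + b*n^2 + c*n + d, then extrapolate to a larger run.
coeffs = np.polyfit(sizes, times, deg=3)
predict = np.poly1d(coeffs)
print(predict(1024))  # predicted execution time for n = 1024
```

Once built, such a model lets the planner predict a run it has never executed, which is exactly what the resource selector needs.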
[Graph: matrix-matrix multiplication, 16 processors, IBM SP]
Case Study 1: GriPhyN
[Diagram: jobs are transformed using VDL by the Chimera Virtual Data System and submitted through Grid middleware to Grid2003; Prophesy performs resource selection, and Ganglia provides monitoring.]
Resource Selector
[Diagram: the application name, input parameters, and list of available sites are passed to the Prophesy interface and predictor, which return rankings of the sites and a weight for each site.]
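A minimal sketch of how such a predictor-backed selector could work: given application parameters and per-site performance models, rank the sites by predicted execution time. The site names and model coefficients here are toy assumptions, not the actual Prophesy models.

```python
# Hypothetical resource-selection sketch: rank candidate sites by predicted
# execution time. The per-site models and their coefficients are invented.
def rank_sites(app_params, site_models):
    """Return (site, predicted_time) pairs sorted fastest-first."""
    predictions = {site: model(app_params) for site, model in site_models.items()}
    return sorted(predictions.items(), key=lambda kv: kv[1])

site_models = {
    "IU":   lambda p: 1.0e6 * p["alpha"] * p["freq"],
    "PDSF": lambda p: 1.4e6 * p["alpha"] * p["freq"],
    "UTA":  lambda p: 2.0e6 * p["alpha"] * p["freq"],
}
ranked = rank_sites({"alpha": 0.0065, "freq": 0.002}, site_models)
print(ranked[0][0])  # -> IU, the fastest predicted site under these toy models
```

The sorted list corresponds to the "rankings of sites" output; normalized predicted times could serve as the per-site weights.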
Application: GEO LIGO Pulsar Search
- The pulsar search is the process of finding celestial objects that may emit gravitational waves.
- The GEO (German-English Observatory) LIGO (Laser Interferometer Gravitational-wave Observatory) pulsar search is the most frequently used coherent search method; it generates the F-statistic for known pulsars.
Grid2003 Testbed
Execution Environment

Site Name                        | CPUs | Batch  | Processors                                    | Cache Size | Memory
alliance.unm.edu (UNM)           | 436  | PBS    | 1 x PIII 731 MHz                              | 256 KB     | 1 GB
atlas.iu.edu (IU)                | 400  | PBS    | 2 x Intel Xeon 2.4 GHz                        | 512 KB     | 2.5 GB
pdsfgrid3.nersc.gov (PDSF)       | 349  | LSF    | 2 x PIII 650 MHz-1.8 GHz, 2 x AMD 2100+-2600+ | 256 KB     | 2 GB
atlas.dpcc.uta.edu (UTA)         | 158  | PBS    | 2 x Intel Xeon 2.4-2.6 GHz                    | 512 KB     | 2 GB
nest.phys.uwm.edu (UWM)          | 296  | CONDOR | 1 x PIII 1 GHz                                | 256 KB     | 0.5 GB
boomer1.oscer.ou.edu (OU)        | 286  | PBS    | 3 x Intel Xeon 2 GHz                          | 512 KB     | 2 GB
cmsgrid.hep.wisc.edu (UWMadison) | 64   | CONDOR | 1 x Intel Xeon 2.8 GHz                        | 512 KB     | 2 GB
cluster28.knu.ac.kr (KNU)        | 104  | CONDOR | 1 x AMD Athlon XP 1700+                       | 256 KB     | 0.8 GB
acdc.ccr.buffalo.edu (Ubuffalo)  | 74   | PBS    | 1 x Intel Xeon 1.6 GHz                        | 256 KB     | 3.7 GB

Selected Grid2003 sites (Batch: local scheduler, VO: Virtual Organization)
Comparison
- Load-based selection method
  - Uses Ganglia [1] to monitor Grid2003
  - Selects the least loaded site
- Random selection method

[1] Ganglia, http://ganglia.sourceforge.net/
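The load-based baseline can be sketched in a few lines; only the selection rule (pick the least loaded site, as monitored by Ganglia) comes from the slide, and the load values below are made up.

```python
# Load-based baseline: select the site with the smallest monitored load.
# The load numbers are illustrative, not real Ganglia readings.
def least_loaded(loads):
    """loads: {site: reported_load} -> site with the minimum load."""
    return min(loads, key=loads.get)

print(least_loaded({"IU": 0.82, "PDSF": 0.35, "UTA": 1.40}))  # -> PDSF
```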
Experimental Results

Alpha  | Freq  | Prediction Site | Time (sec) | Load Site | Time (sec) | Error  | Random Site | Time (sec) | Error
0.0065 | 0.002 | PDSF            | 3863.66    | UWMadison | 9435.80    | 59.05% | UWMilwaukee | 48065.83   | 60.09%
0.0085 | 0.001 | IU              | 2850.39    | UWMadison | 11360.28   | 74.91% | KNU         | 7676.56    | 62.87%
0.0075 | 0.009 | IU              | 22090.17   | PDSF      | 20197.88   | -9.37% | UNM         | 77298.13   | 71.42%
0.0055 | 0.009 | IU              | 16216.25   | UTA       | 27412.45   | 40.84% | UWMadison   | 31555.10   | 48.61%
0.0005 | 0.009 | PDSF            | 1365.51    | Ubuffalo  | 3226.00    | 57.67% | UWMilwaukee | 16009.82   | 91.47%
0.0075 | 0.003 | PDSF            | 6723.30    | IU        | 7343.37    | 8.44%  | KNU         | 8287.77    | 18.88%
0.0065 | 0.007 | PDSF            | 13561.01   | PDSF      | 13561.01   | 0.00%  | UNM         | 52379.31   | 74.65%
0.0085 | 0.004 | PDSF            | 10121.27   | Ubuffalo  | 19649.22   | 48.49% | IU          | 11158.72   | 9.30%
0.0035 | 0.005 | PDSF            | 5241.28    | Ubuffalo  | 20799.05   | 74.80% | UWM         | 51936.49   | 89.91%
0.0065 | 0.009 | IU              | 19184.36   | UWMadison | 24995.94   | 23.25% | OU          | 23441.16   | 18.16%
0.0045 | 0.009 | IU              | 13278.68   | UTA       | 20453.30   | 35.08% | UWMadison   | 14137.44   | 6.07%
0.0085 | 0.009 | IU              | 25021.39   | UWMadison | 26246.68   | 4.67%  | OU          | 31538.22   | 20.66%

Average error: 32.62% (load-based), 52.01% (random)
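The Error columns appear consistent with the relative slowdown of each alternative method versus the prediction-based choice; this reading is an inference from the numbers, not stated on the slide. For example, the second row's load-based error:

```python
# Assumed error metric: error = (t_other - t_predicted) / t_other * 100.
def relative_error(t_other, t_predicted):
    return 100.0 * (t_other - t_predicted) / t_other

# Row 2: prediction-based IU (2850.39 s) vs. load-based UWMadison (11360.28 s).
print(round(relative_error(11360.28, 2850.39), 2))  # -> 74.91, as in the table
```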
Case Study 2: AADMLSS
African American Distributed Multiple Learning Styles System (AADMLSS) developed by Dr. Juan E. Gilbert
Site Selection Process
[Flowchart: the user logs into AADMLSS; once the username and password are validated, the system measures network performance. On first access the default concept is retrieved, otherwise the last concept. The system then measures server performance, selects the server with the best overall site performance, and displays the concept. If the user passes the quiz, the next concept (same instructor) follows; otherwise the current concept is repeated with a different instructor. The loop continues until the user exits and logs out.]
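The "select server with best overall site performance" step can be sketched as below; modeling overall performance as the sum of measured network and server times is my assumption, and the timings are illustrative.

```python
# Pick the server with the best overall site performance, modeled here
# (as an assumption) as network time + server response time.
def select_server(measurements):
    """measurements: {server: (network_s, server_s)} -> best server name."""
    return min(measurements, key=lambda s: sum(measurements[s]))

servers = {
    "Loner":    (0.08, 0.30),  # illustrative timings, not from the slides
    "Tina":     (0.21, 0.15),
    "Interact": (0.35, 0.40),
}
print(select_server(servers))  # -> Tina (lowest total, 0.36 s)
```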
Testbed Overview

Category | Specs            | Loner (TX)       | Tina (MA)        | Interact (AL)
Hardware | CPU Speed (MHz)  | 997.62           | 1993.56          | 697.87
         | Bus Speed (MB/s) | 205              | 638              | 214
         | Memory (MB)      | 256              | 256              | 256
         | Hard Disk (GB)   | 30               | 40               | 10
Software | O/S              | Redhat Linux 9.0 | Redhat Linux 9.0 | Redhat Linux 9.0
         | Web Server       | Apache 2.0       | Apache 2.0       | Apache 2.0
         | Web Application  | PHP 4.2          | PHP 4.2          | PHP 4.1
Experimental Results: 3 Servers

Concept | SRT-LOAD (%) | SRT-RANDOM (%)
3/0/0 D | 6.21         | 14.05
3/0/1 D | 12.13        | 21.94
3/0/2 N | 14.02        | 25.83
3/0/3 N | 18.12        | 23.52
3/1/0 N | 8.05         | 12.04
3/1/1 N | 7.31         | 12.25
3/1/2 N | 12.60        | 18.74
3/1/3 N | 10.96        | 19.11
3/2/0 N | 7.93         | 12.58
3/2/1 N | 8.05         | 14.25
3/2/2 N | 9.14         | 15.97
3/2/3 D | 9.79         | 20.58
3/3/0 D | 8.94         | 13.64
3/3/1 D | 8.26         | 16.74
3/3/2 D | 9.21         | 15.21
3/3/3 D | 9.97         | 19.36
AVERAGE | 10.04        | 17.24

[Chart: site selection distribution across Loner, Tina, and Interact for the Random, Load, and SRT methods, day (D) and night (N)]
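The SRT-LOAD and SRT-RANDOM columns report percentage improvements of the SRT (prediction-based) selection over each baseline; a plausible formula (my assumption, not given on the slide) is the relative response-time reduction:

```python
# Assumed improvement metric: percent reduction in response time vs. a baseline.
def improvement(baseline_time, srt_time):
    return 100.0 * (baseline_time - srt_time) / baseline_time

# E.g., a 1.61 s load-based response vs. 1.45 s with SRT (illustrative values):
print(round(improvement(1.61, 1.45), 2))  # -> 9.94
```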
Experimental Results: 2 Servers (local and remote)

Concept | SRT-LOAD (%) | SRT-RANDOM (%)
3/0/0 D | 9.91         | 10.24
3/0/1 D | 13.04        | 15.06
3/0/2 D | 18.06        | 19.16
3/0/3 D | 20.54        | 21.29
3/1/0 N | 9.81         | 9.58
3/1/1 N | 7.02         | 7.91
3/1/2 N | 11.35        | 12.15
3/1/3 N | 10.47        | 10.36
3/2/0 D | 8.56         | 8.67
3/2/1 D | 8.75         | 9.75
3/2/2 D | 10.06        | 10.92
3/2/3 D | 10.15        | 10.50
3/3/0 N | 8.41         | 9.56
3/3/1 N | 8.58         | 8.08
3/3/2 N | 8.31         | 7.95
3/3/3 N | 10.21        | 10.19
AVERAGE | 10.83        | 11.34

[Chart: site selection distribution across Loner and Interact for the Random, Load, and SRT methods, day (D) and night (N)]
Experimental Results: 2 Servers (remote and remote)

Concept | SRT-LOAD (%) | SRT-RANDOM (%)
3/0/0 D | 3.13         | 4.03
3/0/1 D | 4.26         | 5.97
3/0/2 D | 7.02         | 8.28
3/0/3 D | 8.64         | 9.02
3/1/0 D | 3.25         | 4.94
3/1/1 D | 3.27         | 4.10
3/1/2 D | 3.93         | 5.97
3/1/3 D | 3.64         | 4.08
3/2/0 D | 3.15         | 3.32
3/2/1 D | 4.39         | 5.20
3/2/2 D | 5.80         | 5.97
3/2/3 D | 6.52         | 6.95
3/3/0 D | 4.39         | 5.64
3/3/1 D | 4.16         | 5.20
3/3/2 D | 4.81         | 5.73
3/3/3 D | 5.02         | 5.58
AVERAGE | 4.71         | 5.62

[Chart: site selection distribution (day) across Tina and Interact for the Random, Load, and SRT methods]
Summary
- Presented a resource selection method based upon performance predictions
- Illustrated the advantages of using performance predictions through two case studies
  - Large-scale scientific application GEO LIGO on Grid2003
    - On average 33% better than load-based selection
    - Now considering queue wait time predictions
  - AADMLSS on 3 servers
    - On average 10% better than load-based selection
Thanks!
Questions?
http://prophesy.cs.tamu.edu
Seung-Hye Jang (jangs@cs.tamu.edu)