Dynamic Workflow Management Using Performance Data

David W. Walker, Yan Huang, Omer F. Rana, and Lican Huang
Cardiff School of Computer Science
20 October 2006
Workflow Optimization in
Distributed Environments
Outline of Talk
• Background and introduction.
• The WOSE architecture for dynamic
Web services.
• Performance experiments and results.
• Summary and conclusions.
The WOSE Project
• The Workflow Optimisation Services for eScience Applications (WOSE) project.
• Funded by EPSRC Core e-Science
Programme.
• Collaboration between:
– Cardiff University
– Imperial College (Prof John Darlington)
– Daresbury Lab (Drs Martyn Guest and Robert Allan)
Workflow Optimisation
• Types of workflow optimisation
– Through service selection
– Through workflow re-ordering
– Through exploitation of parallelism
• When is optimisation performed?
– At design time (early binding)
– Upon submission (intermediate binding)
– At runtime (late binding)
Service Binding Models
• Late binding of abstract service to
concrete service instance means:
– We use up-to-date information to decide
which service to use when there are
multiple semantically equivalent services.
– We are less likely to try to use a service
that is unavailable.
Late Binding Case
• Search the registry for all services that are
consistent with the abstract service description.
• Select the optimal service based on current
information, e.g., host load.
• Execute this service. If it is not currently
available then try the next best service.
• This approach does not take into account the time
needed to transfer inputs to the service.
• In both the early and late binding cases we can
optimise the overall workflow.
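The late-binding steps above can be sketched as a small selection loop. Everything here is illustrative, not the WOSE API: the registry contents are hard-coded, and names such as `query_registry` and `estimate_response_time` are invented for the sketch.

```python
# Hypothetical sketch of late binding: rank candidate services by an
# up-to-date estimate, then fall back to the next best on failure.

def query_registry(abstract_name):
    """Return candidate concrete services matching the abstract description.
    In WOSE this would query a UDDI-style registry; stubbed here."""
    return [
        {"endpoint": "http://hostA/blastall", "load": 2.5},
        {"endpoint": "http://hostB/blastall", "load": 0.8},
    ]

def estimate_response_time(service):
    # Simplest proxy for expected response time: the host's current load.
    return service["load"]

def invoke(service, inputs):
    # Placeholder invocation; a real client would make a SOAP/HTTP call.
    if service.get("unavailable"):
        raise ConnectionError(service["endpoint"])
    return f"result from {service['endpoint']}"

def late_bound_invoke(abstract_name, inputs):
    candidates = sorted(query_registry(abstract_name),
                        key=estimate_response_time)
    for service in candidates:      # try the best first, then the next best
        try:
            return invoke(service, inputs)
        except ConnectionError:
            continue                # service unavailable: try the next one
    raise RuntimeError("no available service for " + abstract_name)

print(late_bound_invoke("blastall", {"query": "ACGT"}))
```

Because ranking happens at invocation time, the decision reflects the hosts' state at that moment rather than at design time.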
WOSE Architecture

[Architecture diagram: a User supplies a workflow script and a
configuration script; a Converter passes the workflow to the ActiveBPEL
workflow engine, which invokes Web service instances through a Proxy,
supported by a Discovery Service, an Optimization Service, registry
services (such as UDDI), and a Performance Monitor Service backed by a
history database.]

Work at Cardiff has focused on implementing a late binding model for
dynamic service discovery, based on a generic service proxy, and on
service discovery and optimisation services.
Service Discovery Issues
• Discovery of equivalent services could
be based on:
– Service name. Applicable when all service
providers agree on the naming of services.
– Service metadata.
– Service ontology.
• So far we have used the service name.
Performance-Based Service
Selection
• In general, “performance” could refer to:
– Service response time.
– The availability of the service.
– The accuracy of the results returned by the
service.
– The security of the service.
• In our work we have used service response
time as the basis for service selection.
• Our approach can be readily adapted for
other performance metrics.
Estimating Service Response Time
• Two methods for estimating the expected
service response time:
1. Based on current performance metrics from the
service hosts, e.g., load averages.
2. Based on the history of previous service
invocations on the service hosts. In general, this
requires a model that, for a given set of service
inputs on a given service host, will return the
expected service response time.
• So far we have used current (or very recent)
performance metrics returned by the
Ganglia monitoring system.
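As a rough illustration of the metric-based approach: gmond, the Ganglia monitoring daemon, publishes cluster state as XML (by default on TCP port 8649). The sketch below parses one-minute load averages from such a dump, assuming gmond's usual HOST/METRIC layout; the sample XML is fabricated for illustration.

```python
# Minimal sketch: extract per-host one-minute load averages from a
# Ganglia gmond XML dump (normally read from TCP port 8649).
import xml.etree.ElementTree as ET

def parse_load_averages(gmond_xml):
    """Map host name -> one-minute load average from a gmond XML dump."""
    loads = {}
    for host in ET.fromstring(gmond_xml).iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == "load_one":
                loads[host.get("NAME")] = float(metric.get("VAL"))
    return loads

# Fabricated sample in the shape gmond produces.
sample = """<GANGLIA_XML VERSION="3.0">
  <CLUSTER NAME="test">
    <HOST NAME="hostA"><METRIC NAME="load_one" VAL="2.50"/></HOST>
    <HOST NAME="hostB"><METRIC NAME="load_one" VAL="0.80"/></HOST>
  </CLUSTER>
</GANGLIA_XML>"""

print(parse_load_averages(sample))
```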
Estimating Service Response Time
(Continued)
• Distributed job management systems such as
Nimrod use the rate at which a computer completes
jobs as an indicator of how “good” the computer is.
• Nimrod doesn’t distinguish between different jobs.
• This approach requires a substantial long-term
record of job statistics in order to give satisfactory
results.
• The same approach could be applied to dynamic
invocation of Web services. This avoids the need for a
performance model for each Web service.
• Such an approach will sometimes make bad
decisions in individual cases, but overall should be
effective.
WOSE Sequence Diagram

WOSE can either invoke a static Web service directly (steps 2A and 3A)
or a dynamic Web service (steps 2-11).

[Sequence diagram: the workflow script is translated by an XSLT
converter and deployed to the workflow engine. Participants: WOSE
client, workflow engine, Web service, Proxy Service, Discovery Service,
Optimisation Service, and Performance Service.]

1. Request
2A. Direct invocation / 3A. Direct result
2. Dynamic invocation through proxy
3. Service query
4. List of services
5. Performance query
6. Performance data
7. List of services
8. Selected service
9. Invoke service
10. Result
11. Result through proxy
12. Result
Dynamic Service Selection within a Workflow

[Diagram: Service A feeds an abstract Service B; the proxy service
selects one of the concrete services B1-B5.]

Dynamic invocation is worthwhile only for sufficiently long-running
services, since the performance gained must offset the overhead of
service discovery and selection.

If the selected service is not available, WOSE will automatically
try the next best one.
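The break-even argument can be made concrete with a toy check. The 4252 s and 932 s response times come from Experiment 3 later in the talk; the 15-second discovery/selection overhead is an assumed figure for illustration.

```python
# Back-of-envelope check of when dynamic selection pays off: the time
# saved by picking a faster host must exceed the discovery overhead.
def worth_dynamic_binding(static_time, dynamic_time, overhead):
    """True if dynamically selected execution plus overhead beats static."""
    return dynamic_time + overhead < static_time

# Long-running service: the saving dwarfs the (assumed) 15 s overhead.
print(worth_dynamic_binding(static_time=4252, dynamic_time=932, overhead=15))

# Short service: the overhead swamps the saving.
print(worth_dynamic_binding(static_time=2.0, dynamic_time=1.0, overhead=15))
```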
Performance Experiments
• Is there any relationship between the current
load and service response time?
• This will depend on how variable the load is
over the duration of the service execution, as
well as how the OS schedules jobs.
• In general, we would expect the load/response-time
relationship to be stronger
when the service hosts are lightly loaded.
Experiment 1
• Try to keep load constant during service
execution by running N instances of a
long-running computation to create a
background workload.
• Then invoke Web service and measure
response time, i.e., time from invoking
dynamic service to receiving back the
result.
• The blastall Web service was used.
Experiment 1: Results

[Figure: scatter plot of service response time (0-6000) against
load average (0-12).]
Experiment 1: Discussion
• The plot shows that a higher load average
results in a longer service response time.
• The scatter in the results for any particular
value of the load average is probably
because the experiments were done on a
machine used by others, so we could not
fully control the load.
Experiment 2
• Create a synthetic, varying background
workload.
• Then invoke Web service and measure
response time.
• The blastall Web service was used.
Experiment 2: Results

[Figure: scatter plot of service response time (0-6000) against
load average (0-15).]
Experiment 2: Discussion
• Both experiments show a general tendency
for high load averages to result in longer
service response times.
• The large amount of scatter arises because the
load changes while the Web service
is running.
• No method can predict what the future load
will be, and hence any method of estimating
which service host will complete execution
the soonest will give the wrong answer
sometimes.
Experiment 3
• Is selection based on the current load average
better than making a random selection?
• If services are hosted on heterogeneous
machines we also have to take into account the
processing speed.
• Thus, we base service selection on the
performance factor, P, defined as:

P = (CPU speed in GHz) / (load average + 1)
Experiment 3 (continued)
• Run synthetic workload on one computer.
Record service response time for several
executions of the workflow, and compute the
average.
• Run synthetic workload on N computers each
hosting the service. Run the workflow and
dynamically select the service host based on
the performance factor. Do this several times
and compute the average.
Experiment 3: Results
• The average service response time for the
single machine was 4252 seconds.
• The average service response time when
selecting the optimal service from three hosts
was 932 seconds.
• Since all the machines used are of the same
type, this indicates that dynamic selection
based on the current load average does
result in better performance.
Conclusions and Future Work
• Dynamic service selection based on the load and CPU
speed can result in faster execution of a workflow.
• We are currently repeating the experiments using a
service that performs a molecular dynamics simulation.
• In the future we will also investigate dynamic service
selection based on performance history data, such as
rate at which a host completes service requests.
• Would like to develop statistical model of dynamic
service selection for different types of background
workload.
Thank you.
Questions?