Workflow Management and AI Planning for Grid applications
Jim Blythe
(joint work with Ewa Deelman and Yolanda Gil)
Outline
• Motivation
  - Workflows in grid-using communities
  - Challenges in supporting workflow management
• Research on workflow planning at USC/ISI
  - Using AI techniques in Pegasus to generate executable grid workflows
• Current and future directions
  - Working with varying levels of information
  - Resource allocation in workflows
  - Intelligent interactive assistance and automatic completion
  - Cognitive grids
USC INFORMATION SCIENCES INSTITUTE
Workflows in grid communities
• Models composed into end-to-end workflows that model/analyze complex phenomena or interactions
  - In-silico experimentation
  - Data collection and analysis
  - Encode community practice
• Reproducibility, reusability, pedigree
[Figure: example workflow whose task result is a hazard curve (SA vs. prob. exc.). Components shown include a UTM converter (get-Lat-Long-given-UTM), PEER-Fault ruptures (rfml), CVM-get-Velocity-at-point, a Basin-Depth calculator, the Field (2000) IMR (SA exc. prob.), and a Hazard Curve Calculator.]
Jim Blythe, NeSC 7/04
Example: LIGO Experiment
(Laser Interferometer Gravitational-Wave Observatory)
• Can be used to detect astronomical objects such as pulsars
• Two installations: in Louisiana (Livingston) and Washington State
  - Other projects: Virgo (Italy), GEO (Germany), Tama (Japan)
• Data collected during experiments is a collection of time series (multi-channel)
• Analysis is performed in time and Fourier domains
LIGO’s Pulsar Search
(Laser Interferometer Gravitational-wave Observatory)
[Figure: pulsar search data flow. Stages labeled in the diagram: interferometer, archive (raw channels, long time frames), extract channel (single frame, short time frames, 30 minutes), Short Fourier Transform, transpose, extract frequency range, construct image (time-frequency image, Hz vs. time), find candidate, store event in DB.]
Using Today’s Grid
• Users have high-level requirements naturally stated in terms of the application domain
  - Ex: Obtain frequency spectrum for signal S in instrument I and timeframe T
• Users have to turn these requirements into executable job workflows in detailed scripts
  - Which code generates the desired products? Which files contain it?
  - Where are the files? Which hosts support execution given the code's requirements? Are they available to me? What are the access policies? etc.
• Users must query Grid middleware: metadata catalog, replica locator, resource descriptor and monitoring, etc.
• Users must oversee execution
  - Diagnose failures, design recovery strategies
Challenges
• Usability: users must be proficient in grid computing
• Complexity: many interrelated choices and dead ends
• Solution cost (quality): evaluate alternatives
  - Performance, reliability, resource usage
• Global cost: minimize cost across organizations
  - May have contention over resources or collaboration on tasks
• Reliability of execution: job resubmission upon failure vs flexible recovery vs anticipation and avoidance
Outline
• Motivation
  - Scientific workflows
  - Challenges in supporting workflow management
• Research on workflow planning at USC/ISI
  - Using AI techniques in Pegasus to generate executable grid workflows
• Current and future directions
  - Working with varying levels of information
  - Resource allocation in workflows
  - Intelligent interactive assistance and automatic completion
  - Cognitive grids
Desiderata for workflow generator
• Allow users to refer to data requirements by descriptions, not file names
  - Intuitive, requires far less input
• Model variety of constraints declaratively
  - Data dependencies, resource constraints, user access rights, …
• Seek high quality workflows according to customizable metrics
[Figure: Abstract Workflow Generation, followed by Concrete Workflow Generation.]
Pegasus workflow environment
[Figure: the Pegasus workflow environment on the Grid, built on the Globus toolkit.]
Pegasus: Using AI Planning Techniques to Generate Executable Grid Workflows
• Given: desired result and constraints
  - A desired result (high-level, metadata description)
  - A set of application components described in the grid [possibly through the Transformation Catalog, TC]
  - A set of resources in the grid (dynamic, distributed) [through MDS, NWS]
  - A set of constraints and preferences on solution quality
• Find: an executable job workflow
  - A configuration of components that generates the desired result
  - A specification of resources where components can be executed and data can be stored
• Approach: Use AI planning techniques to search the solution space and evaluate tradeoffs
  - Exploit heuristics to direct the search for solutions and represent optimality and policy criteria
Workflow Generation as AI Planning
Goal (Provided by the user)
• A specification of the information the user requires and the desired location for the output file
Initial State (Automatically extracted from Grid environment)
• Information about the state of the Grid, information about data location
Operators (Encoded for the application domain)
• Represent the execution of a component at a particular location and the generation of particular file(s)
• File movements across the network
Heuristics as search control rules (Grid or application specific)
• Specify options that should be considered at a choice point in the search algorithm (e.g., execute “close” to the data)
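The goal / initial state / operator framing can be made concrete with a small sketch. This is only an illustration under stated assumptions: the fact strings, host names and file names are hypothetical, and the search is a plain breadth-first forward search, not the heuristic search that Prodigy performs.

```python
from collections import deque
from typing import FrozenSet, List, NamedTuple, Optional


class Operator(NamedTuple):
    name: str
    preconds: FrozenSet[str]  # facts that must hold before the step
    adds: FrozenSet[str]      # facts that hold after the step


def plan(initial: FrozenSet[str], goal: FrozenSet[str],
         operators: List[Operator]) -> Optional[List[str]]:
    """Breadth-first forward search over states (sets of true facts)."""
    frontier = deque([(initial, [])])
    seen = {initial}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for op in operators:
            if op.preconds <= state:
                nxt = state | op.adds
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [op.name]))
    return None


# Hypothetical toy grid: raw data sits at hostA, the code runs only on hostB,
# and the user wants the output file back at hostA.
initial = frozenset({"at raw.dat hostA", "available extract hostB"})
goal = frozenset({"at spectrum.dat hostA"})
operators = [
    Operator("transfer raw.dat hostA->hostB",
             frozenset({"at raw.dat hostA"}),
             frozenset({"at raw.dat hostB"})),
    Operator("run extract on hostB",
             frozenset({"at raw.dat hostB", "available extract hostB"}),
             frozenset({"at spectrum.dat hostB"})),
    Operator("transfer spectrum.dat hostB->hostA",
             frozenset({"at spectrum.dat hostB"}),
             frozenset({"at spectrum.dat hostA"})),
]
workflow = plan(initial, goal, operators)
```

The returned list of step names plays the role of the concrete workflow: both component executions and file movements appear as steps.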
Advantages of Using AI Planning
• Provides a broad-based, generic foundation
• Uses general techniques to search for solutions
• Explores alternatives, supports backtracking
• Incorporates domain-specific and domain-independent heuristics (as search control rules)
• Allows easy addition of new constraints and rules
• Incorporates optimality and policy into the search for solutions
• Interleaves decisions at various levels
• Can integrate the generation of workflows across users and policies within virtual orgs.
Chimera for abstract workflow generation
• Chimera
  - Input-output transforms for files, in ‘Virtual Data Language’:
DV third1->pulsar(a=@{input:"H2_sSFT_LSC-AS-Q_714384000_256_50_1.ilwd"},
b=@{output:"H2_pulsar_LSC-AS-Q_714384000_256_50.5_0.004_3.ilwd"},
t1="714384000", t2="714384255", format="ilwd", channel="LSC-AS-Q",
fcenter="50.5", fband="0.004", instrument="H2", ra="3.123643", de="+2.56234",
fderv1="0.0", fderv2="0.0", fderv3="0.0", fderv4="0.0", fderv5="0.0");
Ground-level planning operator with opaque file names
User can say: I want the file called "H2_pulsar_LSC-AS-Q_714384000_256_50.5_0.004_3.ilwd"
(operator pulsar-search
  (preconds
    ((<start-time> 7143800)
     (<channel> LSC-AS-Q)
     (<fcenter> 0.5)
     (<right-ascension> 50)
     (<sample-rate> 20)
     …)
    (and
      (created "H2_sSFT_LSC-AS-Q_714384000_256_50_1.ilwd")))
  (effects
    ()
    ((add
      (created "H2_pulsar_LSC-AS-Q_714384000_256_50.5_0.004_3.ilwd")))))
General operator with metadata parameters
User can say: I want the results of a pulsar search at this location and time
(operator pulsar-search
  (preconds
    ((<start-time> Number)
     (<channel> Channel)
     (<fcenter> Number)
     (<right-ascension> Number)
     (<sample-rate> Number)
     (<file> File-Handle)
     ;; These two are parameters for the frequency-extract.
     (<f0> (and Number (get-low-freq-from-center-and-band
                        <fcenter> <fband>)))
     (<fN> (and Number (get-high-freq-from-center-and-band
                        <fcenter> <fband>)))
     …)
    (and
      (forall ((<sub-sft-file-group>
                (and File-Group-Handle
                     (gen-sub-sft-range-for-pulsar-search
                      <f0> <fN> <start-time> <end-time>
                      <sub-sft-file-group>))))
        (and (sub-sft-group <start-time> <end-time>
                            <channel> <instrument> <format>
                            <f0> <fN> <sample-rate> <sub-sft-file-group>)
        ))))
  (effects
    ()
    ((add (created <file>))
     (add (pulsar <start-time> <end-time> <channel>
                  <instrument> <format>
                  <fcenter> <fband>
                  <fderv1> <fderv2> <fderv3> <fderv4> <fderv5>
                  <right-ascension> <declination> <sample-rate>
                  <file>)))))
Operator with host identified
Result is executable
(operator pulsar-search
  (preconds
    ((<host> (or Condor-pool Mpi))
     (<start-time> Number)
     (<channel> Channel)
     (<fcenter> Number)
     (<right-ascension> Number)
     (<sample-rate> Number)
     (<file> File-Handle)
     ;; These two are parameters for the frequency-extract.
     (<f0> (and Number (get-low-freq-from-center-and-band
                        <fcenter> <fband>)))
     (<fN> (and Number (get-high-freq-from-center-and-band
                        <fcenter> <fband>)))
     (<run-time> (and Number
                      (estimate-pulsar-search-run-time
                       <start-time> <end-time> <sample-rate>
                       <f0> <fN> <host> <run-time>)))
     …)
    (and (available pulsar-search <host>)
         (forall ((<sub-sft-file-group>
                   (and File-Group-Handle
                        (gen-sub-sft-range-for-pulsar-search
                         <f0> <fN> <start-time> <end-time>
                         <sub-sft-file-group>))))
           (and (sub-sft-group <start-time> <end-time>
                               <channel> <instrument> <format>
                               <f0> <fN> <sample-rate> <sub-sft-file-group>)
                (at <sub-sft-file-group> <host>)))))
  (effects
    ()
    ((add (created <file>))
     (add (at <file> <host>))
     (add (pulsar <start-time> <end-time> <channel>
                  <instrument> <format>
                  <fcenter> <fband>
                  <fderv1> <fderv2> <fderv3> <fderv4> <fderv5>
                  <right-ascension> <declination> <sample-rate>
                  <file>)))))
Planning for workflow generation
• Application components as operators
  - All needed parameters are specified
  - Parameters include host: plan is a concrete workflow
• Desired data (in descriptive form) as goals
• Preconditions combine
  - Data dependencies – derive input requirements from outputs
  - Task constraints – e.g. component must be run on an MPI machine
• World state should include available hosts, existing data products, network bandwidths, …
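Deriving input requirements from outputs amounts to regressing from the desired data product back through the components that can produce it. The sketch below illustrates that idea only; the `PRODUCERS` table and the product names are hypothetical stand-ins, not the actual LIGO operator set.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical component descriptions: output product -> (component, inputs).
PRODUCERS: Dict[str, Tuple[str, List[str]]] = {
    "pulsar-result": ("pulsar-search", ["sub-sft"]),
    "sub-sft": ("extract-frequency-range", ["sft"]),
    "sft": ("short-fourier-transform", ["raw-channel"]),
}


def regress(goal: str, existing: Set[str]) -> List[str]:
    """Return the components to run, in execution order, to produce `goal`."""
    if goal in existing:
        return []  # reuse a data product that already exists
    if goal not in PRODUCERS:
        raise ValueError(f"no component produces {goal!r}")
    component, inputs = PRODUCERS[goal]
    steps: List[str] = []
    for needed in inputs:
        for step in regress(needed, existing):
            if step not in steps:
                steps.append(step)
    steps.append(component)
    return steps


full = regress("pulsar-result", {"raw-channel"})
shortcut = regress("pulsar-result", {"sft"})
```

When an intermediate product already exists in the world state, the regression stops early, which is how existing data products get reused.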
Plan quality
• Objective function may include
  - Performance – expected runtime, variance
  - Reliability – probability of failure, expected number of retries
  - Computational cost – use of ‘expensive’ resources, conformance to policies
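One way to read such an objective function is as a weighted sum of the listed criteria. The sketch below is an assumption for illustration, not the metric used in Pegasus: the weights, field names and the retry model (independent attempts repeated until success, giving p / (1 - p) expected retries) are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class PlanEstimate:
    expected_runtime: float     # seconds
    failure_probability: float  # per attempt, in [0, 1)
    resource_cost: float        # arbitrary cost units


def expected_retries(p_fail: float) -> float:
    """Expected retries when independent attempts repeat until one succeeds."""
    if not 0.0 <= p_fail < 1.0:
        raise ValueError("failure probability must be in [0, 1)")
    return p_fail / (1.0 - p_fail)


def plan_cost(plan: PlanEstimate, w_runtime: float = 1.0,
              w_retries: float = 1.0, w_cost: float = 1.0) -> float:
    """Weighted sum of runtime, expected retries and resource cost (lower is better)."""
    return (w_runtime * plan.expected_runtime
            + w_retries * expected_retries(plan.failure_probability)
            + w_cost * plan.resource_cost)


fast_but_flaky = PlanEstimate(100.0, 0.5, 10.0)
slow_but_safe = PlanEstimate(120.0, 0.0, 5.0)
```

Changing the weights changes which plan wins, which is the sense in which the metric is customizable.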
Search control knowledge
Grid-specific
(control-rule only-transfer-from-loc-with-greatest-bandwidth
  (if (and (considering transfer-file)
           (trying-to-achieve (at ?file ?dest))
           (currently (at ?file ?loc1))
           (currently (at ?file ?loc2))
           (higher-bandwidth ?loc1 ?loc2 ?dest)))
  (then reject value ?loc2 as source))
Domain-specific
(control-rule prefer-mpi-to-condor-for-pulsar-search
  (if (and (considering pulsar-search)
           (type-of ?mpi Mpi)
           (type-of ?condor Condor-pool)))
  (then prefer value ?mpi as host to ?condor as host))
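The effect of the two control rules can be paraphrased in a few lines of Python. This is a loose rendering under assumptions: the function names, the bandwidth table and the host names are hypothetical, and a reject rule is modeled as filtering candidates while a prefer rule is modeled as reordering them.

```python
from typing import Dict, List, Tuple


def reject_slower_sources(sources: List[str], dest: str,
                          bandwidth: Dict[Tuple[str, str], float]) -> List[str]:
    """Reject any source for which another source has higher bandwidth to dest."""
    if not sources:
        return []
    best = max(bandwidth[(s, dest)] for s in sources)
    return [s for s in sources if bandwidth[(s, dest)] == best]


def prefer_mpi(hosts: List[Tuple[str, str]]) -> List[str]:
    """Order candidate hosts so Mpi hosts are tried before Condor-pool hosts."""
    ranked = sorted(hosts, key=lambda h: 0 if h[1] == "Mpi" else 1)
    return [name for name, _ in ranked]


# Hypothetical data: bandwidth in MB/s from each replica location to the target.
bandwidth = {("siteA", "target"): 10.0, ("siteB", "target"): 45.0}
kept = reject_slower_sources(["siteA", "siteB"], "target", bandwidth)
order = prefer_mpi([("pool1", "Condor-pool"), ("cluster1", "Mpi")])
```

The reject rule prunes alternatives outright, whereas the prefer rule only changes the order in which they are explored, so the Condor pool is still available on backtracking.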
Using local heuristics and global metrics
• Need local heuristics since the search space is too large to explore exhaustively
  - e.g. prefer host for program with high-bandwidth connection to where the output is required
• Need to test a global metric (e.g. overall runtime) since local heuristics can lead to a globally poor solution
  - Create as many plans as possible, return best
  - Search control to eliminate redundant solutions
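A toy example shows why a global metric is needed on top of local heuristics. The numbers and host names below are made up for illustration: choosing the fastest host for each task separately forces a costly transfer, while enumerating whole plans and scoring overall runtime finds a better assignment.

```python
from itertools import product

# Hypothetical runtimes (seconds) of two chained tasks on two hosts.
RUNTIME = {("t1", "A"): 10, ("t1", "B"): 12, ("t2", "A"): 30, ("t2", "B"): 10}
TRANSFER = 25  # seconds to move t1's output to a different host


def overall_runtime(assign):
    """Global metric: total runtime of t1, any transfer, then t2."""
    h1, h2 = assign
    move = 0 if h1 == h2 else TRANSFER
    return RUNTIME[("t1", h1)] + move + RUNTIME[("t2", h2)]


# Local heuristic: pick the fastest host for each task in isolation.
greedy = tuple(min("AB", key=lambda h: RUNTIME[(t, h)]) for t in ("t1", "t2"))

# Global check: create every plan and return the best one.
best = min(product("AB", repeat=2), key=overall_runtime)
```

Here the locally chosen plan runs t1 on A and t2 on B, paying the transfer, while the best plan keeps both tasks on B.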
Searching for Pulsars with the Pegasus Planner
• Used AI planning techniques to compose executable grid workflows with hundreds of jobs
• Applied to data from the Laser-Interferometer Gravitational Wave Observatory (LIGO), which aims to detect waves predicted by Einstein’s theory of relativity
• Used LIGO’s data collected during the first scientific run of the instruments in Fall 2002
• Targeted a set of 1000 locations of known pulsars as well as random locations in the sky
• Performed using compute and storage resources at Caltech, University of Southern California, and University of Wisconsin Milwaukee.
Sample Pulsar Search Results
SC 2002:
• Over 58 pulsar searches
• Total of
  - 330 tasks
  - 469 data transfers
  - 330 output files produced
• The total runtime was 11:24:35
Fall 2002:
• 185 pulsar searches
• Total of
  - 975 tasks
  - 1365 data transfers
  - 975 output files
• Total runtime 96:49:47
Pegasus: Status and Ongoing Work
• Fully automated generation of executable grid workflows
• Heuristic state-space search AI planner
  - Prodigy [Veloso et al 94]
  - Expressive language for control rules and heuristic estimation
• Integration with grid environment
  - Initially application and resource information populated manually
  - Work almost completed to do so automatically
• Exploring tradeoffs and optimization
  - Current heuristics address minimal execution time
  - Adding criteria for resource and replica selection
• If components are (well) described, AI planner can select application components and generate the entire workflow from scratch
Many areas of planning and scheduling research relevant for grid problems
• Planning for a dynamic environment: plan monitoring and repair, planning under uncertainty
• Scheduling: resource reasoning, temporal reasoning
• Plan quality: learning, acquiring preferences, local search planning
• Planning for information gathering: integrating access to grid services with workflow creation
• Domain modeling: merging ontologies, acquiring metadata descriptions, acquiring operators
Outline
• Motivation
  - Scientific workflows
  - Challenges and opportunities for Artificial Intelligence
• Research on workflow planning at USC/ISI
  - Using AI techniques in Pegasus to generate executable grid workflows
• Current and future directions
  - Working with varying levels of information
  - Resource allocation in workflows
  - Intelligent interactive assistance and automatic completion
  - Cognitive grids
Astronomy
• Galaxy Morphology (National Virtual Observatory)
  - Investigates the dynamical state of galaxy clusters
  - Explores galaxy evolution inside the context of large-scale structure.
  - Uses galaxy morphologies as a probe of the star formation and stellar distribution history of the galaxies inside the clusters.
  - Data intensive computations involving hundreds of galaxies in a cluster
[Figure caption: The x-ray emission is shown in blue, and the optical emission is in red. The colored dots are located at the positions of the galaxies within the cluster; the dot color represents the value of the asymmetry index. Blue dots represent the most asymmetric galaxies and are scattered throughout the image, while orange dots are the most symmetric, indicative of elliptical galaxies, and are concentrated more toward the center.]
People involved: Gurmeet Singh, Mei-Hui Su, many others
Astronomy
• Sloan Digital Sky Survey (GriPhyN project, Fermi, ANL)
• Finding clusters of galaxies from the Sloan Digital Sky Survey database of galaxies.
• Montage (NASA and NVO)
• Deliver science-grade custom mosaics on demand
  - Produce mosaics from a wide range of data sources (possibly in different spectra)
  - User-specified parameters of projection, coordinates, size, rotation and spatial sampling.
[Figure caption: Mosaic created by Pegasus-based Montage from a run of the M101 galaxy images on the Teragrid.]
BLAST: set of sequence comparison algorithms that are used to search sequence databases for optimal local alignments to a query
2 major runs were performed using Chimera and Pegasus:
1) 60 genomes (4,000 sequences each), processed in 24 hours
   - Genomes selected from DOE-sponsored sequencing projects
   - 67 CPU-days of processing time delivered
   - ~10,000 Grid jobs
   - >200,000 BLAST executions
   - 50 GB of data generated
2) 450 genomes processed
Speedups of 5-20 times were achieved because we used the compute nodes efficiently by keeping the submission of the jobs to the compute cluster constant.
Led by Veronika Nefedova (ANL) as part of the Paci Data Quest Expedition program
Biology Applications (cont’d)
Tomography (NIH-funded project)
• Derivation of 3D structure from a series of 2D electron microscopic projection images
• Reconstruction and detailed structural analysis
  - complex structures like synapses
  - large structures like dendritic spines
• Acquisition and generation of huge amounts of data
• Large amount of state-of-the-art image processing required to segment structures from extraneous background.
[Figure caption: Dendrite structure to be rendered by Tomography]
Work performed by Mei-Hui Su with Mark Ellisman, Steve Peltier, Abel Lin, Thomas Molina (SDSC)
Domain modeling
Current system: knowledge from several sources must be used
[Figure: a monolithic planner, with KBs combined in one location, acts as component selector, resource selector and execution monitor. Its inputs are task requirements, available resources, resource policies, user policies, existing data in files, resource queues, network bandwidth, and state info (files, resources) from Grid services (RLS, MCS etc). It passes concrete tasks to Grid task schedulers.]
Workflow planning: types of knowledge used
• Knowledge about application components and hosts
  - Constraints on appropriate hosts for components
  - Explicit preferences for workflow construction search
• Knowledge about data
  - Input-output conditions for components
  - Requires sufficient information for regression through workflow
  - Focused file semantics
Automatically generated operators for several application domains
Domains: Digital sky survey, LIGO, GEO, Galaxy morphology, Tomography
[Figure: task resource requirements, data dependencies (VDL*) and policies are combined to generate operators of the form (Operator … (preconditions ..) (effects ..)).]
Investigating patterns of data descriptions for more efficient planning
Planning under a range of knowledge conditions
The LIGO application is one point on a spectrum of domains characterized by available knowledge:

Representation of dependencies                                Planning techniques possible
(More knowledge) Metadata describes sets or parts of files    Good operator generality and efficient
Metadata: 1-1 per file                                        Good operator generality, may be inefficient
(Less knowledge) Logical files                                Regression possible, poor operator generality
Representing appropriate units of information with metadata
• e.g. One domain has 60,000 files, want to allocate 60 tasks each dealing with 1,000 files.
• In Chimera’s VDL, application components specified in terms of specific files:
(1000 files)
DV run59000->extractSFTData(
input=[@{input:"nSFT.59000"},…,@{input:"nSFT.59999"}],
output=[@{output:"eSFT.59000"},…,@{output:"eSFT.59999"}],
t1="714384000", t2="714384063", freq="1008",band="4",instrument="H2");
… 59 similar clauses…
(60000 files)
DV final->computeFStatistic(
input=[@{input:"eSFT.00000"},…,@{input:"eSFT.59999"}],…);
Reasoning with metadata about sets of files
• Replace with one operator, two input predicates
  - A metadata predicate now represents a range of files
  - Simpler to model, greater generality, more efficient for reasoner
(operator extractSFTData-range
  (params <begin-file> <number-of-files>)
  (preconds
    ((<begin-file> Number)
     (<number-of-files> (and Number
                             (> <number-of-files> 0)
                             (<= <number-of-files> (max-files extractSFT)))))
    (created-range noiseSFT <begin-file> 2 1 <number-of-files>))
  (effects ()
    ((add (created-range extractSFT <begin-file> 2 1 <number-of-files>)))))
Need to reify the metadata type
Requires library operators for ranges
• e.g. if a range of files exists, then so does any subrange
• Identifying a sufficient set of operators:
  - split-range (create a range by creating smaller ranges)
  - verify-range-from-files (f1, f2, .., fn => (range 1 n))
  - subranges-exist ((range 1 10) => (range 2 5))
  - empty-range-exists
  - join-ranges ..
More details: Blythe et al. IEEE Int. Sys 04
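The range operators can be illustrated with a small sketch. It assumes a range is a first file index plus a file count, mirroring the `<begin-file>` and `<number-of-files>` parameters of the operator; the Python names are hypothetical and only three of the listed operators are covered.

```python
from typing import List, NamedTuple, Optional


class FileRange(NamedTuple):
    first: int  # index of the first file in the range
    count: int  # number of files in the range

    @property
    def last(self) -> int:
        return self.first + self.count - 1


def subrange_exists(known: FileRange, wanted: FileRange) -> bool:
    """If a range of files exists, then so does any subrange of it."""
    return known.first <= wanted.first and wanted.last <= known.last


def split_range(r: FileRange, max_files: int) -> List[FileRange]:
    """Create a range by creating smaller ranges of at most max_files each."""
    if max_files <= 0:
        raise ValueError("max_files must be positive")
    return [FileRange(start, min(max_files, r.first + r.count - start))
            for start in range(r.first, r.first + r.count, max_files)]


def join_ranges(a: FileRange, b: FileRange) -> Optional[FileRange]:
    """Join two adjacent ranges into one, or return None if they are not adjacent."""
    if a.first + a.count == b.first:
        return FileRange(a.first, a.count + b.count)
    return None


# 60,000 files split into 60 tasks of 1,000 files each.
tasks = split_range(FileRange(0, 60000), 1000)
```

With this representation the 60 explicit VDL clauses collapse into one parameterized description, and the last task covers files 59000 through 59999.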
Outline
• Motivation
  - Scientific workflows
  - Challenges in supporting workflow management
• Research on workflow planning at USC/ISI
  - Using AI techniques in Pegasus to generate executable grid workflows
• Current and future directions
  - Working with varying levels of information
  - Resource allocation in workflows
  - Intelligent interactive assistance and automatic completion
  - Cognitive grids
A small workflow from the Montage domain
1202 nodes
Issues in resource allocation
• Reasoning about the whole workflow versus limited lookahead
  - May need to consider whole workflow for good allocation if later tasks are more constrained
  - Approximations may be necessary for large workflows, e.g. limited lookahead, task aggregation, other approaches..
• Developed Grasp approach (greedy randomized adaptive search), with S. Jain
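The general shape of a greedy randomized adaptive search can be sketched as follows. This is a generic illustration under simplifying assumptions, not the approach developed with S. Jain: tasks are treated as independent (no workflow dependencies or data transfers), and the parameter names `alpha`, `iterations` and `seed` are hypothetical.

```python
import random
from typing import Dict, List

Runtime = Dict[str, Dict[str, float]]  # runtime[task][host]


def makespan(assign: Dict[str, str], runtime: Runtime, hosts: List[str]) -> float:
    """Finish time of the most loaded host."""
    load = {h: 0.0 for h in hosts}
    for task, host in assign.items():
        load[host] += runtime[task][host]
    return max(load.values())


def construct(tasks, hosts, runtime, alpha, rng) -> Dict[str, str]:
    """Greedy randomized construction: pick at random among near-best hosts."""
    load = {h: 0.0 for h in hosts}
    assign = {}
    for task in tasks:
        finish = {h: load[h] + runtime[task][h] for h in hosts}
        lo, hi = min(finish.values()), max(finish.values())
        candidates = [h for h in hosts if finish[h] <= lo + alpha * (hi - lo)]
        host = rng.choice(candidates)
        assign[task] = host
        load[host] = finish[host]
    return assign


def local_search(assign, hosts, runtime) -> Dict[str, str]:
    """Move single tasks between hosts while the makespan strictly improves."""
    improved = True
    while improved:
        improved = False
        for task in list(assign):
            current = makespan(assign, runtime, hosts)
            for host in hosts:
                if host == assign[task]:
                    continue
                trial = dict(assign)
                trial[task] = host
                if makespan(trial, runtime, hosts) < current:
                    assign, improved = trial, True
                    break
    return assign


def grasp(tasks, hosts, runtime, iterations=50, alpha=0.3, seed=0):
    """Repeat construction plus local search and keep the best assignment."""
    rng = random.Random(seed)
    best = None
    for _ in range(iterations):
        candidate = local_search(construct(tasks, hosts, runtime, alpha, rng),
                                 hosts, runtime)
        if best is None or makespan(candidate, runtime, hosts) < makespan(best, runtime, hosts):
            best = candidate
    return best


tasks = ["t1", "t2", "t3", "t4"]
hosts = ["h1", "h2"]
runtime = {"t1": {"h1": 3, "h2": 3}, "t2": {"h1": 3, "h2": 3},
           "t3": {"h1": 2, "h2": 2}, "t4": {"h1": 2, "h2": 2}}
best = grasp(tasks, hosts, runtime)
```

Setting `alpha` to 0 makes the construction purely greedy, while larger values admit more randomization before the local search step.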
e.g. Grasp-based approach versus min-min and max-min

Scenario             Case             Random    Min-Min   Max-Min   Grasp        Grasp       Improvement
                                                                    (Makespan)   (MS+Idle)   factor
CF: 0.1 DMF: 100     […]nous case        446       272       234       166         177       1.409639
CF: 0.1 DMF: 100     […]nous case       1705       363       270       212         194       1.391753
CF: 0.1 DMF: 1000    […]nous case      12107       867      1557       833         658       1.317629
CF: 0.1 DMF: 1000    […]nous case       3383       901      1550       771         710       1.269014
CF: 0.1 DMF: 10000   […]nous case     122956     10219     10213      5228        5373       1.95352
CF: 0.1 DMF: 10000   […]nous case      26028     10565      8731      5266        5425       1.657995
CF: 1.0 DMF: 100     […]nous case       1514      1946      2280      1604        1030       1.88932
CF: 1.0 DMF: 100     […]nous case      15334      2321      3512      1544        1235       1.879352
CF: 10.0 DMF: 100    […]nous case      14493     11919     14174     10115        9820       1.213747
CF: 10.0 DMF: 100    […]nous case     154190     14499     17464     13996       11497       1.261112
CF: 100.0 DMF: 100   […]nous case     145473    101672    105873    101844      101448       1.002208
CF: 100.0 DMF: 100   […]nous case    1537510    117895    119980    126548      112551       1.047481

In Montage, the Grasp approach is always best, by up to 2x in many scenarios.
Related work: McCallum and Levine using ANTS
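The improvement-factor values on this slide appear to equal the best of the Min-Min and Max-Min results divided by the best of the two Grasp results. That reading is an inference from the numbers, not something the slide states, and the function name below is hypothetical; the check uses the two CF: 0.1 DMF: 100 rows.

```python
def improvement_factor(min_min: float, max_min: float,
                       grasp_a: float, grasp_b: float) -> float:
    """Best baseline result divided by best Grasp result (assumed definition)."""
    return min(min_min, max_min) / min(grasp_a, grasp_b)


first_row = round(improvement_factor(272, 234, 166, 177), 6)
second_row = round(improvement_factor(363, 270, 212, 194), 6)
```

A factor near 1, as in the CF: 100.0 rows, means Grasp and the better baseline are almost tied, while a factor near 2 means the Grasp result is about half the baseline's.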
The Process of Creating an Executable Workflow
User guided:
1. Creating a valid workflow template (human guided)
  - Selecting application components and connecting inputs and outputs
  - Adding other steps for data conversions/transformations
2. Creating instantiated workflow
  - Providing input data to pathway inputs (logical assignments)
Automated:
3. Creating executable workflow (automatically)
  - Given requirements of each model, find and assign adequate resources for each model
  - Select physical locations for logical names
  - Include data movement steps, including data deposition steps
CAT: Composition Analysis Tool to Create Workflow Templates
J. Kim, M. Spraragen, Y. Gil, IUI 04, ICAPS WS 04
• Declarative descriptions of models are linked to ontologies and reasoners
• System reasons about model constraints and points out errors and fixes
• User builds a workflow specification from library of models
• System guarantees correctness of workflow templates
Combining CAT with planning for workflows [w M. Spraragen]
• User creates initial workflow using CAT
• AI planner treats CAT workflow as a template to follow in re-generating a valid workflow
  - Follows CAT structure where possible
  - Seamlessly completes and/or corrects workflow
  - c.f. derivational analogy [e.g. Veloso 94]
• Changes to workflow presented to user as suggestions
Pervasive Knowledge Sources and Reasoners
(work with J. Blythe, E. Deelman, C. Kesselman, H. Tangmurarunkit)
[Gil et al, IEEE IS 04]
[Figure: a Workflow Manager receives a high-level specification of desired results, constraints, requirements and user policies, and maintains a Smart Workflow Pool with workflow history. Intelligent reasoners shown: Workflow Refinement, Policy Management, Resource Matching, Workflow Repair. Pervasive knowledge sources shown: Resource KB, Application KB, Policy KB, Other KB. Grid-side services shown: resource indexes, replica locators, simulation codes, policy information services, other Grid services. All operate over community distributed resources (e.g., computers, storage, network, simulation codes, data).]
Cognitive Grids: Pervasive Semantic Representations of the Environment at all Levels
[Figure: layered architecture, top to bottom.
 Users and Applications: send high-level request descriptions; receive current request status, results, provenance information.
 Intelligent Reasoners (matchmaking, refinement, repair, coordination, negotiation…): produce the refined workflow, using user and VO policy models, application component models, semantics for file-based data, policy knowledgebases, resource knowledgebases, and provenance and monitoring.
 Higher-Level Service (Virtual Data Tools, Resource Brokers): exchanges tasks and monitoring/resources knowledge, using resource policy descriptions and semantic resource descriptions.
 Basic Grid Middleware (Globus Toolkit, Condor-G, DAGMan).
 Grid Resources (Compute, Data, Network).]
Cognitive Grids: Distributed Intelligent Reasoners that Incrementally Generate the Workflow
[Figure: over time, the user’s request passes through levels of abstraction: relevant components, logical tasks, full abstract workflow, and tasks bound to resources and sent for execution. Reasoners shown: workflow refinement, application-level knowledge, policy reasoner, workflow repair, onto-based matchmaker. Parts of the workflow are marked as not yet executed, partial execution, or executed.]
Many Opportunities for AI Techniques
The Grid Now
• Syntax-based matchmaking of resources to job requirements
  - Condor matchmaker
  - Attribute based discovery and selection
• Scheduling of jobs based on Gridable users that specify job execution sequences and computing requirements
  - Scripting languages
  - Workflow languages, task graphs
• Explicit mappings from task to jobs, simple job brokers
• Explicit service negotiation and recovery strategies
The Future Grid
• Knowledge-based reasoning about resources enables
  - Semantic matchmaking
  - Aggregate resource reasoning
• Task-level reasoning to plan and schedule jobs and resources
  - More agility and coordination
• Wide range of users can specify high level requirements in a mixed-initiative mode
  - Mapping of high-level requirements to details required for execution
• End-to-end resource negotiation and adaptive strategies to accommodate failure
Summary: Scientific Workflows and AI
• Clear requirement to operate in complex, human-guided, dynamic decision space
• Need to support scientific exploration process
• Tremendous opportunity for AI techniques: flexible and expressive representations and reasoners
• Work to date demonstrates step forward
  - Pegasus can isolate users from complexities of the grid
• Many opportunities ahead for AI!
  - Cognitive grids
  - Interactive assistance and automatic completion
  - Active workflows
  - …
Related Work
• Improving grids with algorithmic approaches
  - GRaDS, GriPhyN (Chimera)
• Improving grids with knowledge/semantics
  - myGrid (semantic component matching)
  - Semantic grid, Knowledge grid
• Planning techniques for software and service composition
  - [Lansky et al 94] [Chien et al 96] [Golden et al 02] [McDermott 02] [McIlraith et al 02]
• ICAPS 04 Workshop on planning and scheduling for web and grid services
  http://www.isi.edu/ikcap/icaps04-ws
pegasus.isi.edu
Publications in AI forums
• “The Role of Planning in Grid Computing”, Jim Blythe, Ewa Deelman, Yolanda Gil, Carl Kesselman, Amit Agarwal, Gaurang Mehta, Karan Vahi. International Conference on Automated Planning and Scheduling (ICAPS) 2003.
• “Transparent Grid Computing: a Knowledge-Based Approach”, Jim Blythe, Ewa Deelman, Yolanda Gil, Carl Kesselman. Innovative Applications of Artificial Intelligence Conference (IAAI) 2003.
• “Artificial Intelligence in Grids: Workflow Planning and Beyond”, Yolanda Gil, Ewa Deelman, Jim Blythe, Carl Kesselman, H. Tangmurarunkit. IEEE Intelligent Systems, Jan/Feb 2004.
• …
Publications in Grid forums
• “Mapping Abstract Complex Workflows onto Grid Environments”, Ewa Deelman, Jim Blythe, Yolanda Gil, Carl Kesselman, Gaurang Mehta, Karan Vahi, Adam Arbree, Richard Cavanaugh, Kent Blackburn, Albert Lazzarini, Scott Koranda. Journal of Grid Computing, Vol. 1 No. 1, 2003.
• “Workflow Management in GriPhyN”, chapter in “The Grid Resource Management” book, E. Deelman, J. Blythe, Y. Gil, Carl Kesselman, 2003.
More info..
http://www.isi.edu/~blythe
http://www.isi.edu/ikcap/cognitive-grids
http://pegasus.isi.edu
Back-up slides
What makes a Grid?
• There are three key criteria:
  - Coordinates distributed resources,
  - using standard, open, general-purpose protocols and interfaces,
  - to deliver non-trivial qualities of service.
• What is not a Grid?
  - A cluster, a network attached storage device, a scientific instrument, a network, etc.
  - Each is an important component of a Grid, but by itself does not constitute a Grid
Challenging Technical Requirements
• Dynamic formation and management of virtual organizations
• Discovery & online negotiation of access to services: who, what, why, when, how
• Configuration of applications and systems able to deliver multiple qualities of service
• Autonomic management of distributed infrastructures, services, and applications
• Management of distributed state
• Open, extensible, evolvable infrastructure
Outline
• The Grid and the Globus Toolkit
• Workflow composition as an AI planning application
• The GriPhyN project and Grid Applications
• Pegasus, Planning for Execution in Grids
• Example of AI planning for workflows in Pegasus
• Recent work: variable-grain semantics and resource allocation
Applications
• Increasing in the level of complexity
• Use of individual application components
• Reuse of individual intermediate data products
• Description of Data Products using Metadata Attributes
• Execution environment is complex and very dynamic
  - Resources come and go
  - Data is replicated
  - Components can be found at various locations or staged in on demand
• Separation between
  - the application description
  - the actual execution description
GriPhyN Data Grid Challenge
• Provide a framework which enables Virtual Organizations around the world to perform computationally demanding analysis of large, geographically distributed datasets.
• The Virtual Organizations are large and highly distributed
• The datasets are large, currently on the order of terabytes and expected to grow to the level of 100s of petabytes in the next decade
• Provide seamless access to data: experimental raw data or processed data products
• Enable a user/application to ask for any domain-specific data, whether computed or not
Concept of Virtual Data
Planning for workflow generation
• Application components as operators
• Desired data as goals
• World state includes available hosts, existing data products, network bandwidths, …
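The mapping above (components as operators, desired data as goals, existing data in the world state) can be illustrated with a minimal backward-chaining sketch. This is not the planner used in Pegasus: the `Operator` class, the `plan` function and the file names are hypothetical, and hosts and bandwidths are left out of the state to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """An application component: consumes input data, produces output data."""
    name: str
    inputs: frozenset
    outputs: frozenset


def plan(goal, state, operators, _seen=frozenset()):
    """Return an ordered list of operator names that makes `goal` available,
    or None if no chain of operators can produce it from `state`."""
    if goal in state:
        return []                       # the data product already exists
    if goal in _seen:
        return None                     # guard against cyclic dependencies
    for op in operators:
        if goal not in op.outputs:
            continue
        steps = []
        for needed in sorted(op.inputs):
            sub = plan(needed, state, operators, _seen | {goal})
            if sub is None:
                break                   # this operator cannot be supported
            steps += [s for s in sub if s not in steps]
        else:
            return steps + [op.name]
    return None


# Hypothetical components, loosely named after the pulsar-search example.
OPERATORS = [
    Operator("createSFT", frozenset({"raw.gwf"}), frozenset({"sft.gwf"})),
    Operator("pulsar", frozenset({"sft.gwf"}), frozenset({"result.ilwd"})),
]

print(plan("result.ilwd", {"raw.gwf"}, OPERATORS))
```

If `sft.gwf` is already in the state, the same call returns only the `pulsar` step, which is the reuse of existing data products in miniature.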
Globus Toolkit Services
• Security
  – GSI (secure access to resources)
• Resource Management
  – GRAM (job scheduling to remote resources)
• Information Services
  – MDS (resource discovery)
• Data Management
  – GridFTP (data movement)
  – RLS (replica management)
Pegasus, Planning for Execution in Grids
• performs the mapping from an abstract workflow to a concrete workflow, which can be executed on the Grid
• isolates the user from many Grid details
• automatically locates physical locations for both components (transformations) and data, via Globus RLS and the Transformation Catalog
• finds appropriate resources to execute the components (via Globus MDS)
• whenever several alternatives are possible (e.g., alternative physical files, alternative resources) it makes a random choice (other heuristics in progress)
• publishes newly derived data products
• reuses existing data products where applicable
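As a rough illustration of the abstract-to-concrete mapping with random choice among alternatives, the sketch below uses plain dictionaries in place of the Transformation Catalog and the RLS. The function name `concretize`, the dictionary layout, the site names and the URLs are all assumptions made for this example; none of them are Pegasus or Globus APIs.

```python
import random


def concretize(abstract_jobs, replicas, transformations, rng=None):
    """Bind each abstract job to a site, an executable path and input replicas.

    replicas:        logical file name -> list of physical locations
    transformations: transformation name -> {site: executable path}
    """
    if rng is None:
        rng = random.Random()
    concrete = []
    for job in abstract_jobs:
        installed = transformations[job["transformation"]]
        site = rng.choice(sorted(installed))        # random choice of resource
        stage_in = {
            lfn: rng.choice(replicas[lfn])          # random choice of replica
            for lfn in job["inputs"]
            if lfn in replicas                      # others are made upstream
        }
        concrete.append({
            "name": job["name"],
            "site": site,
            "executable": installed[site],
            "stage_in": stage_in,
        })
    return concrete


# Hypothetical catalog contents.
TRANSFORMATIONS = {
    "extract": {"siteA": "/opt/bin/extract", "siteB": "/usr/local/bin/extract"},
}
REPLICAS = {
    "f.a": ["gsiftp://siteA.example.org/data/f.a",
            "gsiftp://siteC.example.org/data/f.a"],
}
JOBS = [{"name": "job1", "transformation": "extract", "inputs": ["f.a"]}]

for entry in concretize(JOBS, REPLICAS, TRANSFORMATIONS, random.Random(0)):
    print(entry)
```

Replacing the two `rng.choice` calls with a scoring function is where other heuristics would plug in.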
Use of MDS in Pegasus
• MDS provides up-to-date Grid state information
  – Total and idle job queue lengths on a pool of resources (Condor)
  – Total and available memory on the pool
  – Disk space on the pools
  – Number of jobs running on a job manager
• Can be used for resource discovery and selection
  – Developing various task-to-resource mapping heuristics
• Can be used to publish information necessary for replica selection
  – Developing replica selection components
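One possible task-to-resource mapping heuristic over this kind of state information is sketched below: filter pools by memory and disk, then prefer the shortest idle queue. The field names (`idle_jobs`, `free_memory_mb`, and so on) and the pool records are invented for illustration and are not actual MDS attribute names.

```python
def pick_pool(pools, min_free_memory_mb, min_free_disk_gb):
    """Return the name of the eligible pool with the fewest idle (waiting)
    jobs, breaking ties by running jobs; None if no pool qualifies."""
    eligible = [
        p for p in pools
        if p["free_memory_mb"] >= min_free_memory_mb
        and p["free_disk_gb"] >= min_free_disk_gb
    ]
    if not eligible:
        return None
    best = min(eligible, key=lambda p: (p["idle_jobs"], p["running_jobs"]))
    return best["name"]


# Hypothetical snapshot of pool state.
POOLS = [
    {"name": "poolA", "idle_jobs": 12, "running_jobs": 40,
     "free_memory_mb": 2048, "free_disk_gb": 100},
    {"name": "poolB", "idle_jobs": 3, "running_jobs": 55,
     "free_memory_mb": 1024, "free_disk_gb": 20},
    {"name": "poolC", "idle_jobs": 0, "running_jobs": 10,
     "free_memory_mb": 256, "free_disk_gb": 500},
]

print(pick_pool(POOLS, min_free_memory_mb=512, min_free_disk_gb=10))
```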
Abstract Workflow Reduction
[Figure: abstract workflow DAG with jobs a, b, c, d, e, f, g, h, i. Key: original node; input transfer node; registration node; output transfer node; node deleted by Reduction algorithm.]
• The output jobs for the DAG are all the leaf nodes, i.e. f, h, i.
• Each job requires 2 input files and generates 2 output files.
• The user specifies the output location.
Making use of Virtual Data
[Figure: the same DAG with jobs a through i, with the nodes deleted by the Reduction algorithm marked. Key: original node; input transfer node; registration node; output transfer node; node deleted by Reduction algorithm.]
• Jobs d, e, f have output files that have been found in the Replica Location Service.
• Additional jobs are deleted.
• All jobs (a, b, c, d, e, f) are removed from the DAG.
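The reduction step can be sketched as a backward walk from the leaf jobs that stops at any job whose outputs are already registered. The edges in `PARENTS` are an assumption (the figure's edges did not survive extraction), chosen so that f, h and i are the leaves; the file names are likewise invented, and this is not the Pegasus implementation.

```python
def reduce_workflow(parents, outputs, leaves, available):
    """Return the set of jobs that still have to run.

    parents:   job -> list of jobs it depends on
    outputs:   job -> set of logical files the job produces
    leaves:    jobs whose outputs the user asked for
    available: logical files already registered in the replica service
    """
    needed = set()
    stack = list(leaves)
    while stack:
        job = stack.pop()
        if job in needed:
            continue
        if outputs[job] <= available:
            continue                    # outputs exist: skip job and ancestors
        needed.add(job)
        stack.extend(parents.get(job, ()))
    return needed


# Hypothetical dependency structure for jobs a..i.
PARENTS = {
    "d": ["a", "b"], "e": ["b", "c"], "f": ["c"],
    "g": ["d", "e"], "h": ["e"], "i": ["g"],
}
# Each job generates 2 output files, as on the slide.
OUTPUTS = {job: {f"{job}.out1", f"{job}.out2"} for job in "abcdefghi"}
AVAILABLE = OUTPUTS["d"] | OUTPUTS["e"] | OUTPUTS["f"]

print(sorted(reduce_workflow(PARENTS, OUTPUTS, ["f", "h", "i"], AVAILABLE)))
```

Under these assumed edges only g, h and i remain, matching the slide: a, b and c drop out because nothing that still has to run depends on them.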
Planner picks execution and replica locations
Adding transfer nodes for the input files for the root nodes.
[Figure: DAG with jobs a through i after reduction, with input transfer nodes added. Key: original node; input transfer node; registration node; output transfer node; node deleted by Reduction algorithm.]
Staging data out and registering
new derived products in the RLS
• Staging and registering for each job that materializes data (g, h, i).
• Transferring the output files of the leaf job (f) to the output location.
[Figure: DAG with jobs a through i, with output transfer and registration nodes added. Key: original node; input transfer node; registration node; output transfer node; node deleted by Reduction algorithm.]
The final, executable DAG
[Figure: the input DAG (jobs a through i) shown beside the final executable DAG (jobs g, h, i with their transfer and registration nodes). Key: original node; input transfer node; registration node; output transfer node.]
Pegasus Configuration
[Diagram with the labels: VDL; Chimera; Abstract Workflow; Logical file name; Pegasus; Concrete Workflow; DAGMan/CondorG; Tasks; Grid.]
Example: LIGO Experiment
(Laser Interferometer Gravitational-Wave Observatory)
• Aims to detect gravitational waves predicted by the theory of relativity.
• Can be used to detect
  – binary pulsars
  – mergers of black holes
  – “starquakes” in neutron stars
• Two installations: in Louisiana (Livingston) and Washington State
  – Other projects: Virgo (Italy), GEO (Germany), Tama (Japan)
• Instruments are designed to measure the effect of gravitational waves on test masses suspended in vacuum.
• Data collected during experiments is a collection of time series (multi-channel)
• Analysis is performed in time and Fourier domains
Generating the planning problem
• Currently, static file representation for available hosts, bandwidths
• Query grid services prior to planning to find which relevant files exist
  – Future versions will make dynamic queries
• Goal is translated from user request, plan is translated into DAG format suitable for grid scheduler.
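The last step, turning a plan into a DAG description, might look like the sketch below. It assumes the target is a DAGMan input file (DAGMan/CondorG appears on the Pegasus configuration slide) and emits only `JOB` and `PARENT ... CHILD` lines; the `plan_to_dag` function, the step tuples and the `.sub` submit-file names are hypothetical.

```python
def plan_to_dag(steps):
    """Translate plan steps into DAGMan-style text.

    steps: list of (job_name, input_files, output_files) in plan order.
    A job becomes a child of every job that produces one of its inputs.
    """
    producer = {}
    for name, _inputs, outputs in steps:
        for f in outputs:
            producer[f] = name

    lines = [f"JOB {name} {name}.sub" for name, _inputs, _outputs in steps]
    for name, inputs, _outputs in steps:
        # Inputs without a producer are assumed to exist already on the grid.
        for parent in sorted({producer[f] for f in inputs if f in producer}):
            lines.append(f"PARENT {parent} CHILD {name}")
    return "\n".join(lines)


# Hypothetical two-step plan.
STEPS = [
    ("createSFT", ["raw.gwf"], ["sft.gwf"]),
    ("pulsar", ["sft.gwf"], ["result.ilwd"]),
]

print(plan_to_dag(STEPS))
```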
Physics (GriPhyN Project)
• High-energy physics
  – CMS: collaboration with Rick Cavanaugh, UFL
      – Processed simulated events
      – Cluster of 25 dual-processor Pentium machines
      – Computation: 7 days, 678 jobs with 250 events each
      – Produced ~200 GB of simulated data
  – ATLAS
• Gravitational-wave science (collaboration with Bruce Allen, A. Lazzarini and S. Koranda)
Pegasus’ Initial Solution
• Reused existing data products
• Used in a variety of complex applications
• Provided a feasible solution, but not necessarily a low-cost one
• To improve the quality of the solution:
  – Need to efficiently search a large problem space and apply both local and global optimizations
  – Need to compose various optimization strategies
LIGO’s Pulsar Search at SC’02
• Used LIGO’s data collected during the first scientific run of the instrument
• Targeted a set of 1000 locations: known pulsars or random locations
• Results of the analysis published to the LIGO Scientific Collaboration
• Performed using LDAS and compute and storage resources at Caltech, University of Southern California, and University of Wisconsin Milwaukee.
Fault-tolerant planning for a dynamic
environment
• Grid resources become unavailable, queue length & network bandwidth change
• Exploring plan repair strategies, balance of work done off-line and on-line
• Modeling failures, keeping statistics for creating plans more likely to succeed, conditional plans, ...
Fault-tolerant straw men
1. Current version: build fully detailed plan offline, resource allocation is fixed
   • Ignores world dynamics
2. Build abstract plan (without specifying hosts) offline, use a matchmaker online
   • Matchmaker makes local decisions only
Global reasoning is needed
for resource allocation
[Figure: task graph from Start to Finish with nodes labeled A (3), B (1), C (5).]
Approaches for fault-tolerant planning in
dynamic domains
• RAX (Jonsson et al.) general framework. As implemented:
  – offline: builds complete plan
  – online: adjusts temporal intervals
• Combining planning and scheduling
  – offline: build several abstract plans
  – online: reason about critical path to instantiate each plan
• MDP/POMDP approaches
• Open area...
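To make "reason about critical path" concrete, here is a small sketch that computes the longest dependency chain in a task graph. The durations 3, 1 and 5 echo the labels A, B and C used earlier in the talk, but the dependency structure (C waits for B, A is independent) is an assumption made for this example, as is the `critical_path` function itself.

```python
from functools import lru_cache


def critical_path(durations, parents):
    """Return (length, path) of the longest chain of dependent tasks.

    durations: task -> run time
    parents:   task -> list of tasks that must finish first
    """
    @lru_cache(maxsize=None)
    def finish(task):
        # Earliest finish = own duration + latest-finishing prerequisite.
        return durations[task] + max(
            (finish(p) for p in parents.get(task, ())), default=0
        )

    end = max(durations, key=finish)
    path = [end]
    while parents.get(path[-1]):
        path.append(max(parents[path[-1]], key=finish))
    return finish(end), path[::-1]


# Assumed structure: A and B can start at once, C must wait for B.
DURATIONS = {"A": 3, "B": 1, "C": 5}
PARENTS = {"C": ["B"]}

print(critical_path(DURATIONS, PARENTS))
```

Under this assumed structure B is the shortest task yet sits on the critical path, so a matchmaker that ranks tasks only by their own cost would give it the wrong priority.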
Summary: benefits of planning
• Automating workflow composition
  – Beginning to be addressed in Grid middleware
• Reasoning with explicit descriptions of data
  – More intuitive for users
  – Far fewer inputs required than at file level
• Better workflows by searching many plans
Challenge: understanding when different
approaches are more important
• Hypotheses:
  – Uneven task distribution, in terms of computational and data expense and resource constraints, will indicate global planning
  – Time-dependency, e.g. need to re-plan during execution, will indicate local planning
• Interesting project: use experiments in synthetic and real domains to test hypotheses and uncover new insights
Empirical tests
with synthetic LIGO problems
• Example: Problem requires 100 files on one machine. Vary the number that exist.
[Figure: chart titled "distribution - 1 machine". y-axis: run-time; x-axis: no of files; series: min, max, p-max, g-max, avg.]
Example search heuristics
(control-rule only-transfer-from-loc-with-greatest-bandwidth
  (if (and (current-ops (transfer-file))
           (current-goal (at <file> <dest>))
           (true-in-state (at <file> <loc1>))
           (true-in-state (at <file> <loc2>))
           (higher-bandwidth <loc1> <loc2> <dest>)))
  (then reject bindings ((<from-loc> . <loc2>))))

(control-rule prefer-mpi-to-condor-for-pulsar-search
  (if (and (current-ops (pulsar-search))
           (type-of <mpi> Mpi)
           (type-of <condor> Condor-pool)))
  (then prefer bindings ((<host> . <mpi>)) ((<host> . <condor>))))
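For readers less used to the rule syntax, the effect of the two heuristics can be approximated in Python. This is only a paraphrase under assumptions: `best_source` and `order_hosts` are hypothetical helpers, the host names and bandwidth figures are invented, and a real planner applies such rules during search rather than as standalone functions.

```python
def best_source(file, dest, locations, bandwidth):
    """Reject every source that has a higher-bandwidth alternative:
    what remains is the location with the greatest bandwidth to dest."""
    return max(locations[file], key=lambda src: bandwidth[(src, dest)])


def order_hosts(hosts, host_type):
    """Prefer (try first) MPI hosts over Condor pools for a task.
    A preference only orders the alternatives; none are discarded."""
    rank = {"Mpi": 0, "Condor-pool": 1}
    return sorted(hosts, key=lambda h: rank.get(host_type[h], 2))


# Hypothetical state: where the file is, and link bandwidths in Mb/s.
LOCATIONS = {"sft.gwf": ["hostA", "hostB"]}
BANDWIDTH = {("hostA", "hostC"): 100, ("hostB", "hostC"): 622}
HOST_TYPE = {"pool1": "Condor-pool", "cluster1": "Mpi"}

print(best_source("sft.gwf", "hostC", LOCATIONS, BANDWIDTH))
print(order_hosts(["pool1", "cluster1"], HOST_TYPE))
```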
What is Needed
• We need alternative foundations that offer
  – expressive representations
  – flexible reasoners
• Many Artificial Intelligence (AI) techniques are relevant:
  – Planning to achieve given requirements
  – Searching through problem spaces of related choices
  – Using and combining heuristics
  – Expressive knowledge representation languages
  – Reasoners that can incorporate rules, definitions, axioms, etc.
  – Schedulers and resource allocation techniques
Existing tools for building workflows:
abstract workflow generation
• Chimera
  – Input-output transforms at the level of actual files, in ‘Virtual Data Language’:
DV first1->createSFT( b=@{output:"H2_SFT_LSC-AS-Q_714384000_64.gwf"},
t1="714384000", t2="714384063", format="frame", channel="H2:LSC-AS-Q",
instrument="H2");
DV first2->createSFT( b=@{output:"H2_SFT_LSC-AS-Q_714384064_64.gwf"},
t1="714384064", t2="714384127", format="frame", channel="H2:LSC-AS-Q",
instrument="H2");
DV third1->pulsar(a=@{input:"H2_sSFT_LSC-AS-Q_714384000_256_50_1.ilwd"},
b=@{output:"H2_pulsar_LSC-AS-Q_714384000_256_50.5_0.004_3.123643_+2.56234.ilwd"},
t1="714384000", t2="714384255", format="ilwd", channel="LSC-AS-Q",
fcenter="50.5", fband="0.004", instrument="H2", ra="3.123643", de="+2.56234",
fderv1="0.0", fderv2="0.0", fderv3="0.0", fderv4="0.0", fderv5="0.0");
Existing tools 2: concrete planner
• Assigns specific hosts and data locations for tasks
• Makes random selection of resources and data
• Provided a feasible solution
• Reused existing data products
[Figure: example concrete workflow. Labels: Gridftp host://f.a ….lumpy.isi.edu/nfs/temp/f.a; INPUT; OUTPUT; lumpy.isi.edu://usr/local/bin/extract; Jet.caltech.edu://home/malcom/resample -I /home/malcolm/F.b1; F.c1; F.c2; Concat; Data Transfer Nodes; Replica Catalog Registration Nodes; Register /F.d at home/malcolm/f2.]
Sample Pulsar Search Results to Date
SC 2002 run:
• Over 58 pulsar searches
• Total of
  – 330 tasks
  – 469 data transfers
  – 330 output files produced
• The total runtime was 11:24:35.
To date:
• 185 pulsar searches
• Total of
  – 975 tasks
  – 1365 data transfers
  – 975 output files
• Total runtime 96:49:47
Issues we’re looking at now
• Planning and scheduling in a dynamic environment
  – May want to delay allocating some (but not all) later tasks until more information is available
• Planning that is adaptable to the available information
  – Variable-resolution translation to planning domains, e.g. [Blythe et al. Spring 04, ICAPS WS 04]
  – Don’t always have regressible metadata, resource constraints
• Mixed-initiative control of the planner
  – People will not use a system that disallows user control
  – Cannot specify all the needed information for automated planning up front
  – Need interfaces for user control, e.g. [Kim & Gil IUI 04]
Where does knowledge used by our
planners come from?
[Diagram: a planning operator, (Operator … (preconditions ..) (effects ..)), surrounded by its knowledge sources: task resource requirements; data dependencies (VDL*); user policies & preferences; resource policies.]
Each knowledge component is used for other purposes beyond planning
• Question: if operators are gathered from distributed services, can we still guarantee soundness and completeness?
• Under what kinds of conditions?
Metadata representation
• Replace with two clauses, two input predicates
  – A predicate now represents a range of files
  – Simpler to model, greater generality, more efficient for reasoner
(operator run-extractSFTData-range
  (preconds
    ((<begin-file> Number)
     (<number-of-files> (and Number (> <number-of-files> 0)))
     (<local-begin-file> (and Number
        (gen-smaller-number <number-of-files> 1000 <begin-file>))))
    (and (range "eSFT" <begin-file> 2 1 <local-begin-file>)
         (range "nSFT" <local-begin-file> 2 1 999)))
  (effects ()
    ((add (range "eSFT" <begin-file> 2 <number-of-files>)))))
Need to reify the metadata type
Requires library operators for ranges
• E.g. if a range of files exists, then so does any subrange
• Identifying a sufficient set of operators based on examples:
  – collate-into-range, join-subranges, subranges-exist, empty-range-exists
(operator subranges-exist
  (preconds
    ((<begin-file> Number)
     (<type> Object)
     (<number-of-files> (and Number (> <number-of-files> 0)))
     (<enclosing-begin>
        (and Number (gen-known-enclosing-begins <type> <begin-file>
                       2 1 <number-of-files>)))
     (<enclosing-number-of-files>
        (and Number (gen-known-enclosing-number-of-files <type> <enclosing-begin>
                       2 1 <number-of-files>
                       <begin-file>))))
    (created-range <type> <enclosing-begin> 2 1 <enclosing-number-of-files>))
  (effects ()
    ((add (created-range <type> <begin-file> 2 1 <number-of-files>)))))
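The idea behind two of the library operators can be sketched outside the planner. The example assumes a range is identified by a metadata type, a first file index and a file count; the extra constant arguments that appear in the operators above are left out because their meaning is not given here. `FileRange`, `subrange_exists` and `join_subranges` are names invented for this illustration.

```python
from typing import NamedTuple


class FileRange(NamedTuple):
    """A contiguous block of files of one metadata type."""
    kind: str
    begin: int
    count: int

    @property
    def end(self):
        return self.begin + self.count      # exclusive upper bound


def subrange_exists(known, query):
    """If a range of files exists, then so does any subrange of it."""
    return any(
        k.kind == query.kind and k.begin <= query.begin and query.end <= k.end
        for k in known
    )


def join_subranges(ranges):
    """Merge adjacent or overlapping ranges of the same type."""
    merged = []
    for r in sorted(ranges, key=lambda r: (r.kind, r.begin)):
        last = merged[-1] if merged else None
        if last is not None and last.kind == r.kind and r.begin <= last.end:
            new_end = max(last.end, r.end)
            merged[-1] = FileRange(last.kind, last.begin, new_end - last.begin)
        else:
            merged.append(r)
    return merged


KNOWN = [FileRange("eSFT", 0, 1000)]
print(subrange_exists(KNOWN, FileRange("eSFT", 100, 50)))
print(join_subranges([FileRange("eSFT", 0, 10), FileRange("eSFT", 10, 5)]))
```

One record stands in for a thousand per-file facts, which is the efficiency argument made for range predicates.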
What workflow planning tells us about
data and process semantics in the grid
• Data and process semantics are closely related
• Fuzzy boundary between data content descriptions and provenance (or reverse provenance)
[Figure: data ranges A, B, C from Instrument 1 and Instrument 2, processed by SFT algm 1 and SFT algm 2. Annotations on one product: SFT file for range B; using instrument 1; using algm 2; run on sft.isi.edu; created on 9/17/03; for Jim.]