Design Decisions Interoperability in a changing architecture Andrew Jones

BiodiversityWorld GRID Workshop
NeSC, Edinburgh – 30 June and 1 July 2005
Andrew Jones
Interop. in changing infrastructure
BiodiversityWorld requirements (1)
• Biodiversity Problem Solving Environment:
  • Heterogeneous, diverse resources
  • Facilitating integration of both legacy and newly developed resources
  • Flexible workflows
• Main challenges centre around metadata, interoperability, resource discovery, etc.
• High-performance computing secondary (though relevant)
BiodiversityWorld requirements (2)
• Distinctive features:
  • a biodiversity informatics GRID
  • interoperability with heterogeneous data, complex in structure
  • resilience to infrastructure change & interoperation with other GRIDs
  • interactive collaboration a secondary concern
• Assumptions about resources:
  • A resource worked either:
    • Essentially in ‘batch’ mode, or
    • Supporting a sequence of operations on a single resource, but involving exchange of minimal data
  • Reasonable to treat each resource (including databases) as a service offering its own, defined set of operations
BiodiversityWorld architectural overview
[Figure: architecture diagram. Labelled components: User interface; Metadata repository; Workflow enactment engine; Presentation; BGI API; BiodiversityWorld-GRID Interface (BGI); The GRID; Native BiodiversityWorld Resources; Wrapped resources.]
The BGI concept
• Standardised invocation mechanism
• Wrappers notionally divided into Grid-facing and resource-facing parts
[Figure: class diagram. Classes shown: <<interface>> BgiWrapperInterface; BgiImplementation_1, BgiImplementation_2, ...; <<abstract>> BdwAbstractWrapper; ConcreteWrapper_1, ConcreteWrapper_2, ...]
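The division into Grid-facing and resource-facing parts can be sketched as below. This is a minimal Python illustration, not BDW code: only the names BgiWrapperInterface and BdwAbstractWrapper come from the slide; the method signatures, the `SpeciesLookupWrapper` resource and the `LocalBgi` transport are assumptions made for the example.

```python
from abc import ABC, abstractmethod


class BdwAbstractWrapper(ABC):
    """Resource-facing part: adapts one resource to a uniform operation call."""

    @abstractmethod
    def operations(self) -> list:
        """Names of the operations this resource offers."""

    @abstractmethod
    def invoke(self, operation: str, args: dict) -> dict:
        """Run one operation against the underlying resource."""


class BgiWrapperInterface(ABC):
    """Grid-facing part: the only interface the rest of the system calls."""

    @abstractmethod
    def invoke_operation(self, resource: str, operation: str, args: dict) -> dict:
        """Route a standardised invocation to the named resource."""


class SpeciesLookupWrapper(BdwAbstractWrapper):
    """Hypothetical concrete wrapper around a tiny in-memory 'database'."""

    _synonyms = {"Bellis perennis": ["common daisy"]}

    def operations(self):
        return ["synonyms"]

    def invoke(self, operation, args):
        if operation != "synonyms":
            raise ValueError(f"unsupported operation: {operation}")
        return {"names": self._synonyms.get(args["name"], [])}


class LocalBgi(BgiWrapperInterface):
    """Hypothetical in-process transport; another mechanism could replace it."""

    def __init__(self):
        self._wrappers = {}

    def register(self, resource, wrapper):
        self._wrappers[resource] = wrapper

    def invoke_operation(self, resource, operation, args):
        return self._wrappers[resource].invoke(operation, args)


bgi = LocalBgi()
bgi.register("species-db", SpeciesLookupWrapper())
result = bgi.invoke_operation("species-db", "synonyms", {"name": "Bellis perennis"})
```

Replacing `LocalBgi` with a different Grid-facing implementation would leave the concrete wrappers unchanged, which is the kind of insulation the BGI is meant to provide.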
Why we protected ourselves from ‘the Grid’(!)
• Rapidly evolving standards
  • Previous experience in GRAB
    • Globus 2 approach needed ‘canned queries’, temporary files, etc.; unnatural for distributed request/response model
  • BiodiversityWorld
    • Globus and other software still evolving
    • Globus 3: Grid Services; Globus 4: WSRF; …
• Trade-off: abstraction layer (BGI); invocation mechanism
  • Insulates from change
  • Performance penalty
    • Assume computationally intensive applications lie in a single BDW resource
  • Proprietary invocation mechanism hinders interoperation with other Grid/Web services
Implementations of BGI
• RMI
• GT3 Grid Services (incomplete)
• Web services
• GT4/WSRF/Grid-Service-as-portal
Benefits & limitations
• Too many standards, so we defined a new one!!
• Interoperability with other projects restricted
  • Could wrap non-BDW resources, or
  • Implement alternative Grid-facing “glue” replacing invocation mechanism with some other standard
• Restrictions on highly interactive applications
  • BGI OK for coarse-grained interaction; not for dynamic interaction with potentially large data volumes
• Transmission and storage of intermediate results: method not specified
  • Can pass URI instead of data, but no specifications restricting what this might refer to
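One way to tighten the "URI instead of data" convention is to make the choice explicit and to restrict what a URI may refer to. The Python sketch below is illustrative only: the `Payload` type, the size threshold and the list of allowed schemes are assumptions, not part of the BGI.

```python
from dataclasses import dataclass
from typing import Callable, Optional
from urllib.parse import urlparse

# Assumed policy values, chosen only for the example.
ALLOWED_SCHEMES = {"http", "https", "gsiftp"}
INLINE_LIMIT = 64 * 1024  # bytes


@dataclass(frozen=True)
class Payload:
    """A result passed either inline or by reference (exactly one is set)."""

    data: Optional[bytes] = None
    uri: Optional[str] = None

    def __post_init__(self):
        if (self.data is None) == (self.uri is None):
            raise ValueError("set exactly one of data or uri")
        if self.uri is not None and urlparse(self.uri).scheme not in ALLOWED_SCHEMES:
            raise ValueError(f"unsupported URI scheme: {self.uri}")


def package(result: bytes, store: Callable[[bytes], str]) -> Payload:
    """Send small results inline; store large ones and pass their URI."""
    if len(result) <= INLINE_LIMIT:
        return Payload(data=result)
    return Payload(uri=store(result))


# Usage with a stand-in store that pretends to save the bytes.
small = package(b"abc", store=lambda b: "https://example.org/results/1")
large = package(b"x" * (INLINE_LIMIT + 1), store=lambda b: "gsiftp://example.org/results/2")
```

Here a coarse-grained operation can return either form, and a consumer can reject references it has no means of resolving.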
Transmission/storage of data
• Desirable to have uniform mechanisms for transmission and storage of data for:
  • Efficient operation of workflows
  • Re-use; composition of workflows
  • Supporting more flexible experimentation
Are workflows sufficient for flexible experimentation?
• Creating a workflow:
  • Workflows clearly good for capturing complex tasks
  • Good for ‘tweaking’ tasks
  • But is this how users think?
• If not, we should provide an environment that supports a more exploratory approach too, e.g.
  • User tries out some small subtasks
  • (S)he joins results together
  • Builds larger workflows from fragments
• This requires recording of interactions, so re-usable workflows can be composed
  • Storage of intermediate data sets
  • Provenance metadata (extending MDR)
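Recording interactions so that a re-usable workflow can later be composed might look like the sketch below. It is a Python illustration under stated assumptions: the `Step` and `Session` types, the `ref:` identifiers for stored intermediate data sets, and the resource names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One recorded interaction, with minimal provenance."""

    resource: str
    operation: str
    inputs: dict
    output_ref: str  # where the intermediate data set was stored


@dataclass
class Session:
    """Log of everything the user tried, in order."""

    steps: list = field(default_factory=list)

    def record(self, resource, operation, inputs, output_ref):
        step = Step(resource, operation, inputs, output_ref)
        self.steps.append(step)
        return step

    def extract_workflow(self, final_ref):
        """Walk back from a result and keep only the steps it depends on."""
        known = {s.output_ref for s in self.steps}
        wanted, needed = {final_ref}, []
        for step in reversed(self.steps):
            if step.output_ref in wanted:
                needed.append(step)
                wanted.update(
                    v for v in step.inputs.values()
                    if isinstance(v, str) and v in known
                )
        return list(reversed(needed))


s = Session()
s.record("species-db", "synonyms", {"name": "Bellis perennis"}, "ref:1")
s.record("climate-db", "layers", {"region": "UK"}, "ref:2")
s.record("occurrence-db", "points", {"names": "ref:1"}, "ref:3")
s.record("model", "predict", {"points": "ref:3", "layers": "ref:2"}, "ref:4")
s.record("species-db", "synonyms", {"name": "Quercus robur"}, "ref:5")  # abandoned trial
workflow = s.extract_workflow("ref:4")
```

The abandoned trial stays in the session log but is left out of the extracted workflow, so exploratory fragments can be joined into a larger workflow after the fact.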
How to achieve dynamic interaction?
• Some possibilities for future development
• Remote direct manipulation (and other remote interactions?)
  • BGI not well suited to fine-grained interaction with resources
  • Some resources may not be accessible except as stand-alone
  • May need (less portable) ‘by-pass’ mechanisms, e.g.
    • New BGI protocol
    • Using existing techniques, such as VNC
• Local direct manipulation, etc.
  • Achievable via component-based ‘plug-in’ approaches (e.g. using JavaBeans), but component interface must be defined
  • Requires data to be present locally; bandwidth concerns
  • Some bandwidth problems can be addressed by combining local specialised client component & remote server component (e.g. passing vectors, not bitmaps)
    • BGI may or may not be fast enough in this case
How to achieve data transmission/intermediate result storage?
• Low level
  • E.g. orchestrate facilities such as GridFTP, GRAM, …
• Higher-level
  • E.g. Inferno, SRB
Additional considerations
• Again, have problem of committing to other, evolving standards
• Need at least a thin API layer to protect resources from change
• And don’t want to break existing BDW system
More direct database exploitation with OGSA-DAI
• BioDA project is investigating relevance & suitability of OGSA-DAI in relation to bioinformatics projects
• 2 main possibilities within BDW:
  1. Augment BGI to support inclusion of queries in workflows and to be sent directly to OGSA-DAI enabled databases.
    • Distributed query processing facilities could assist in planning execution & distribution of data-orientated parts of a workflow. (For the current status of OGSA-DQP see Section 4.)
    • Very major revision to BDW protocols; also, many resources of interest are simply not exposed as databases.
  2. Provide facilities within individual wrappers that benefit from OGSA-DAI.
• Current exemplar (under development) takes approach (2)
…
BDW OGSA-DAI initial exemplar
[Figure: interaction diagram. Labelled components: Wrapper Module; BDWQueryActivity; OGSA-DAI R5 GDS; OGSA-DAI Client; Wrappers; Web DBs; Format file (xsl); XSLTransform; pull data. Numbered steps:
1. BGI invokeOperation()
2. Create GDS and query
3. Invoke wrapper
4. Query
5. Download URL
6. url: deliverFromURL(url)
7. XSL transform to BDW format
8. getOutPut()]
BDW OGSA-DAI exemplar extension
[Figure: interaction diagram. Labelled components: OGSA-DAI R5 GDS; OGSA-DAI Client; XSLTransform (three instances); mergeOutput; deliverToURL /GFTP. Numbered steps shown:
1. BGI InvokeOperation([ ])
7. XSL transform to BDW format
8. integrate output
9. To WF unit]
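The "XSL transform to BDW format" and "integrate output" steps of the extension can be pictured with the Python sketch below. It is a stand-in only: it does not use OGSA-DAI or XSLT, the `<records>` shape is an invented placeholder for the BDW format, and both function names and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_bdw_format(source_xml: str, record_tag: str, name_tag: str) -> ET.Element:
    """Stand-in for the transform step: map one source onto a common shape."""
    out = ET.Element("records")
    for rec in ET.fromstring(source_xml).iter(record_tag):
        ET.SubElement(out, "record").text = rec.findtext(name_tag)
    return out


def merge_output(parts) -> ET.Element:
    """Stand-in for the integrate step: concatenate the transformed records."""
    merged = ET.Element("records")
    for part in parts:
        merged.extend(list(part))
    return merged


# Two sources with different native layouts (made-up data).
db_a = "<rows><row><sci>Bellis perennis</sci></row></rows>"
db_b = (
    "<result>"
    "<item><taxon>Quercus robur</taxon></item>"
    "<item><taxon>Bellis perennis</taxon></item>"
    "</result>"
)

merged = merge_output([
    to_bdw_format(db_a, "row", "sci"),
    to_bdw_format(db_b, "item", "taxon"),
])
names = [record.text for record in merged]
```

Each source gets its own transform, and only the merged result would be passed on to the workflow unit.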
Conclusions
• BDW interoperation layer designed to meet requirements we were given
  • Suitable for high-level interactions
  • Not so good for dynamic interaction with resources (need for this now generally recognised)
  • Doesn’t specify how data is to be moved around
  • Applicable to other domains meeting similar criteria
• Interesting possibilities for extension
• But we have achieved a sustainable architecture; this is an important feature to retain in future systems
Some discussion points
(Arising from Jaspreet’s and Andrew’s talks)
1. Balance of requirements for different kinds of GRIDs (performance, resource discovery, sustainability, …): how does this affect decisions about architectures, protocols, …?
2. How can BDW protocols best be enhanced in future projects?
3. How can we best achieve interoperability between grids from different projects (including BDW)?
4. How can we make it easier for 3rd parties to
  • Introduce their resources to an existing BgiWrapperService?
  • Develop their own additional BgiWrapperServices?