OGSA-DAI Users Group Meeting - 07 April 2004
============================================
Notes by Mario Antonioletti and Andrew Borley.
Agenda for the day
==================
10.00 - 10.30
Registration
10.30 - 10.45
Welcome, Orientation & Vision of OGSA-DAI
Malcolm Atkinson
10.45 - 12.30
User Experiences
o Manfred Oevers - eDiaMoND Project.
"The Use of OGSA-DAI with DB2 Content Manager in the eDiaMoND
Project."
o Terry Sloan - FirstDIG Project.
"Using OGSA-DAI in a commercial environment"
o Noel Kelly - GeneGrid Project.
"GeneGrid: Using OGSA-DAI in Bioinformatics"
o Hans-Christian Hoppe.
"SIMDAT OGSA-DAI requirements"
o Arijit Mukherjee - myGrid Project.
"MyGrid and Gold and their use of OGSA-DAI"
o Kona Andrews - AstroGrid Project.
"AstroGrid and OGSA-DAI: Early experiences"
12.30 - 13.15
OGSA-DAI team: Current Release, R5 & Road Map
13.15 - 14.00
Lunch
14.00 - 15.30
Breakout sessions
o Requirements Breakout Session
o OGSA-DAI model and infrastructure breakout session
15.30 - 15.45
Refreshment Break
15.45 - 16.15
Integration of Results from Discussions
16.15 - 16.45
General discussion of the Future of OGSA-DAI
16.45
Departures
-----
Slides for this event are available from:
http://www.nesc.ac.uk/action/esi/contribution.cfm?Title=401
or
http://www.ogsadai.org.uk/docs/docs.php#ug1
-----
Malcolm Atkinson starts off the meeting.
Have two out of the three PIs for OGSA-DAI in the room.
We want people to give feedback about their experiences of OGSA-DAI as it is.
Explain clearly what you would like us to understand.
We will also be able to tell you about what we want to do.
We are constrained in how much we can achieve.
We are trying to:
- provide technology that is useful
- produce a standard
The two are inter-related.
Malcolm gives a potted history of OGSA-DAI (see slides).
OGSA-DAI started in Feb 2002. Initially motivated by a paper
by Paul Watson.
DAIT formally started in November 2003 but essentially there was
continuity. We are continuing to develop the same product.
The GGF DAIS WG BoF was held at GGF4, largely by the OGSA-DAI team.
At this same meeting OGSI was announced.
At GGF5 in Edinburgh the DAIS WG was started.
At GGF10 the WS-Resource Framework was announced.
We want to engage you in figuring out responses to this.
We can tell you our vision but you must tell us whether this
corresponds to what you want.
We believe that you cannot use data without knowing its
structure. Structure is supported by a DBMS. Structure
was also recognised in files which led to BinX and the
formation of the GGF DFDL WG.
Data sources are autonomously managed. Everyone who owns
data has requirements on how it should be used. Others may
want to know how their data is being used or charge for
the use of their data.
To develop applications like this is difficult. Our goal
is to try and help with this by providing a generic
framework. A generic framework is more difficult to
use than a tailored environment. There must be some pay
off ... (some of the advantages enumerated in the slides)
Goal is to progressively develop what we want to do. Improve
the quality of the technology. We have to place a high value
on the ingenuity of our users. The technology must support
extensibility and not prevent users from doing what they
want to do.
Now we want to hear from the users what they are doing, what they like
and what they want.
---
Manfred Oevers (eDiaMoND)
Have been using OGSA-DAI in the eDiaMoND project.
Say a little about the eDiaMoND project and then how we used
OGSA-DAI.
Exposed data with OGSA-DAI and then how we extended OGSA-DAI.
eDiaMoND has been mentioned by Tony Blair (see slide).
Lots of partners.
Providing grid infrastructure to build a prototype of how we can
support breast screening in the UK.
Have three application aspects:
- Capture of digitised mammograms, including the annotation
of images with metadata.
- need to enable the administration of those images.
- need to pull back the images to look at them
Implement this as an image store but also virtualize it.
Can use DB2 federation to federate databases and then expose this
to the Grid through OGSA-DAI.
Can also expose each database and federate using DQP.
Can also replicate this.
These are the possible solutions - the one adopted was federation
(DB2 Information Integrator) and expose this to the Grid using
OGSA-DAI.
Breast Care Units (BCU) operate independently - cannot warehouse the
data as there are issues as to who owns the data. The data is thus
distributed. Data is owned by each BCU - enable read only access
to each of these repositories.
Use DB2 and Content Manager. DB2 contains the data and images (DICOM).
Explanation given of the workflow (see slide) for each BCU.
Expose the Content Manager through OGSA-DAI.
Lessons learnt:
- ogsa-dai is very flexible. Integration of the content manager
was straightforward. Just had to write three classes. Expose
this through activities.
- liked the fact that you could chain activities together.
- registries are very powerful for development. Just had to
look at different registries to see what services were
available to try things out.
- once you expose your data resources you can.
- enabled an IBM product for the Grid.
Q (Andy Borley). what are the major things that you would like in
OGSA-DAI?
A more powerful registry and the ability to register more content in
OGSA-DAI. The way it is presently implemented is not very scalable.
---
Terry Sloan
I'm a project manager at EPCC. Have been involved with various Grid
projects. Talk about:
- FirstDIG
- INWA
FirstDIG is a collaboration with First plc. First plc operate worldwide
- UK's largest transport. Have various distributed data resources
with different types of information. There are means to collect this
kind of information.
First wanted to combine the data from all these sources to answer
some business questions:
- complaints versus lateness.
- revenue vs lost miles
- ... (more in the slides)
Data geographically distributed. Issue when you try to combine this
data. The data sources are heterogeneous. Looked at OGSA-DAI to
see if it could provide a solution. Deployed it and used it to
combine two databases - a Microsoft Access and a dBase IV based system.
Implemented a browser that showed the integration.
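A minimal sketch of the kind of cross-database combination described above, assuming two in-memory SQLite databases as stand-ins for the Access and dBase IV systems. The table names, column names and figures are made up for illustration and this is not the FirstDIG implementation. The join is done on the client because neither database can see the other.

```python
import sqlite3


def fetch_as_dict(connection, sql):
    """Run a two-column query and return its rows as {key: value}."""
    return dict(connection.execute(sql).fetchall())


# Stand-in for the first data source (hypothetical complaints table).
complaints_db = sqlite3.connect(":memory:")
complaints_db.execute("CREATE TABLE complaints (depot TEXT, total INTEGER)")
complaints_db.executemany(
    "INSERT INTO complaints VALUES (?, ?)", [("North", 12), ("South", 3)]
)

# Stand-in for the second, independent data source (hypothetical lateness table).
lateness_db = sqlite3.connect(":memory:")
lateness_db.execute("CREATE TABLE lateness (depot TEXT, late_minutes INTEGER)")
lateness_db.executemany(
    "INSERT INTO lateness VALUES (?, ?)",
    [("North", 340), ("South", 95), ("East", 10)],
)

complaints = fetch_as_dict(complaints_db, "SELECT depot, total FROM complaints")
lateness = fetch_as_dict(lateness_db, "SELECT depot, late_minutes FROM lateness")

# Inner join on depot: only depots present in both sources are kept.
combined = sorted(
    (depot, complaints[depot], lateness[depot])
    for depot in complaints.keys() & lateness.keys()
)
```

Pulling both tables to the client is only reasonable for small tables; a distributed query processor would push as much of the work as possible to the sources.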
The second project was the INWA - Innovation North Western Australia.
Funded by the ESRC under the collaboration programme. Various
partners (see slides). Using many different technologies to integrate
various public and private data resources.
Infrastructure shown on a slide.
Issues:
- client toolkit helps.
- interfaces to data analysis packages would be really great.
- CSV data resources are common and not supported well by JDBC/ODBC.
- no support for BIT type field.
- Certain characters, "&","<" are not handled well.
- ...
- the rolemap file is not encrypted.
- installation & set-up still too complex and error prone.
- would be useful to have an "always on" standard DAI reference
site to use to check installed system is working.
- ...
- large results sets .... increasing the JVM heap size is
not really viable ... want a streaming mechanism.
- integration - DQP is there but is restricted in the platforms
it supports.
- Why use OGSA-DAI? There are other solutions, e.g. Easysoft.
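The special-character issue in the list above can be illustrated with a short sketch, assuming database values are serialised into XML elements by hand. The cell element and the helper names are hypothetical; escape and unescape come from Python's standard xml.sax.saxutils module. This is not how OGSA-DAI itself builds its responses.

```python
from xml.sax.saxutils import escape, unescape


def to_xml_cell(value):
    """Wrap one database value in a hypothetical <cell> element, escaping &, < and >."""
    return "<cell>" + escape(str(value)) + "</cell>"


def from_xml_cell(element):
    """Recover the original text from an element produced by to_xml_cell."""
    inner = element[len("<cell>"):-len("</cell>")]
    return unescape(inner)


# A made-up value containing the characters reported as troublesome.
raw = "Marks & Spencer <depot 7>"
encoded = to_xml_cell(raw)
decoded = from_xml_cell(encoded)
```

Without the escaping step the ampersand and angle brackets would be read as markup and the enclosing document would no longer be well-formed.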
Q (Peter Henderson). With First bus you had Access and dBase IV and you said
that you did a distributed query ....?
yes (explanation as to how this was done).
Q (Paul Watson). Conclusions about Easysoft?
Nice thing about ogsa-dai is the driver issues ... the driver is
hidden away at the server and as a client you do not have to
worry about this.
Q (Guy Rixon). Why did you federate the databases rather than use a data
warehouse?
It's all distributed - there are a lot of depots to deal with - a
data warehouse would entail a large maintenance effort.
Q. Was it cheaper?
OGSA-DAI not quite there yet ...
---
Noel Kelly
Talk about GeneGrid.
GeneGrid is a commercial project in bioinformatics.
Fusion and Amtec are both companies, based in Belfast, that are involved
in this project.
There are lots and lots of data.
Objectives:
- Grid based framework for bioinformatics
- only 3 people are working on this so will use existing
solutions.
- want to do study in silico.
- develop specialist data sets.
- grid services for commercial 3rd party use.
- want to establish a R&D in N. Ireland.
Showed the GeneGrid Architecture (see slide)
Have three classes of databases:
- GeneGrid databases (being used by the project)
- Public databases
- proprietary databases (want to be included at some point)
Can use OGSA-DAI to integrate this. Have an Oracle database at
Fusion. Amtec have not stated what databases they have.
Have:
- results (Xindice/eXist)
- workflow definition (Xindice)
- workflow status (Xindice)
Have five public biological databases: 1 uses MySQL and 4 of them use
files.
Most of the databases are file based, which is currently not supported
by OGSA-DAI.
OGSA-DAI:
- ready to go solution
- easy implementation
- good documentation
- helpful and useful support
Issues:
- no support for flat file databases
Malcolm: our funders' Technical Advisory Group (TAG) told us not to do
files.
- service discovery (factories register with registry but no
metadata)
- CDATA wrappers - cannot access embedded xml documents.
- Perform documents addressed by the client toolkit.
- Service re-registration. Factories do not re-register or
register if the factory is started after the registry - hence if
a registry fails there is no path to recovery.
Service discovery waiting for next release - need
re-registration. Sent an email to support recently and they replied
soon after.
Not sure if the CDATA wrapper is an OGSA-DAI issue.
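The recovery problem raised above is commonly addressed with soft-state registration: each entry carries a lease and services re-register periodically, so a restarted registry is repopulated on the next heartbeat. The sketch below illustrates that general idea only. The Registry and Factory classes, the service name and the lease length are all hypothetical and do not reflect the OGSA-DAI or OGSI registry interfaces. Times are passed in explicitly so the run is deterministic.

```python
class Registry:
    """Hypothetical soft-state registry: entries lapse unless refreshed."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self._expiry = {}  # service name -> time at which its entry lapses

    def register(self, name, now):
        # Registering again simply extends the lease.
        self._expiry[name] = now + self.lease_seconds

    def live_services(self, now):
        return sorted(name for name, t in self._expiry.items() if t > now)


class Factory:
    """Hypothetical factory that re-registers on every heartbeat."""

    def __init__(self, name):
        self.name = name

    def heartbeat(self, registry, now):
        registry.register(self.name, now)


# Simulated timeline, times in seconds.
factory = Factory("gdsf-mysql")
registry = Registry(lease_seconds=30)
factory.heartbeat(registry, now=0)
before_failure = registry.live_services(now=10)

registry = Registry(lease_seconds=30)   # registry fails and restarts empty
after_restart = registry.live_services(now=20)

factory.heartbeat(registry, now=25)     # next heartbeat repopulates it
recovered = registry.live_services(now=26)
```

The same lease mechanism also removes entries for factories that have gone away, since they stop refreshing.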
Flat file database support is the main change that we have made
- implemented Perl scripts in place of XML:DB/JDBC drivers
- extensibility support requires Perl module development
Misc Contacts given at the end of the presentation.
Q (Patrick Dantressangle): large result sets were an issue in the last
talk - how did you do it?
Run a search on an index created by perl and not on the file. The
result database can build up very fast and run into memory problems -
this is a Xindice issue and not an OGSA-DAI one.
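A minimal sketch of searching via an index rather than the file itself, written in Python rather than the Perl used by the project. It assumes a tab-separated flat file with one record per line. The record identifiers and sequences are made up, and build_index and lookup are hypothetical helper names.

```python
import io


def build_index(handle):
    """Map each record identifier to the byte offset where its line starts."""
    index = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        record_id = line.split(b"\t", 1)[0].decode("ascii")
        index[record_id] = offset
    return index


def lookup(handle, index, record_id):
    """Seek straight to one record instead of scanning the whole file."""
    handle.seek(index[record_id])
    return handle.readline().decode("ascii").rstrip("\n")


# In-memory stand-in for a flat-file database opened in binary mode.
flat_file = io.BytesIO(b"P001\tMKTAYIAK\nP002\tGSHMLEDP\nP003\tAAVLLPVL\n")
index = build_index(flat_file)
hit = lookup(flat_file, index, "P002")
```

The index is built in one pass and is small compared with the data, so a query only reads the records it needs.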
---
Hans-Christian Hoppe
This is a project that has not started yet.
I'm working for Intel - used to work for Pallas but we were bought
out by Intel.
The SIMDAT project is currently under negotiation. Contains a number of
partners. Use Grid technologies that access data.
The project has to do with application areas and a number of key
technologies (see slide).
I am responsible for access to remote data repositories which may
use GT4 if this is stable enough or GT2/3 or Unicore.
Currently ogsa-dai is the only solution for data access and integration
out there.
phase 1: connectivity
phase 2: interoperability
phase 3: knowledge
(more details on the slides)
Have four application areas.
...
List of key requirements for the first 18 months in the slides.
Will start integrating with the upper layers. Users want a usable
product and not a research product.
For the next 12 months (after the first 18) want to work on
virtualizations and deal with various other issues (see the slides).
Wonder whether messaging layers such as those provided by Oracle 10g
might pull things.
Also key requirements outlined for months 30-48 are given in the
slides. As a lot of the users come from the HPC space, performance
will be an issue. By the end of the project it must be clear how
to handle performance issues.
In the first 3-6 months consolidate the requirements from the product
analysis. Then have a look at OGSA-DAI and provide a gap analysis.
Prioritise requirements and provide a roadmap.
By the first 12 months want to integrate the prototypes and ensure that
they work together.
Q (Malcolm Atkinson). After the gap analysis you may find that there
are gaps and ogsa-dai may not cope with all of these - do you have any
resources?
My company has got some resources - may have money from/for(?)
external resources.
Q (Peter Henderson). Is SIMDAT going to use Globus?
If you had asked me at the beginning of the year I would have said
that GT3 might have been useful. GT4 may be shaky ... possible
solution might be to use GT2 and then migrate to GT4 when this becomes
stable enough.
---
Malcolm: myGrid and AstroGrid have a special relationship with OGSA-DAI.
TAG specified them as our first users.
Arijit Mukherjee
MyGrid and Gold
Said a little bit about MyGrid.
OGSA-DQP uses OGSA-DAI. Only component in MyGrid that uses and extends
OGSA-DAI.
Goals for DQP in MyGrid are enumerated in one of the slides.
...
Presentation stuck fairly close to the content of the slides.
Experiences of using OGSA-DAI (based on R3.0.2)
positives:
- uniform access
- wraps jdbc/xmldb details
- extensible
- not difficult installation process
- excellent user manual
negatives:
- still very slow - OGSI?
- metadata extractor is only there for mysql - needs extension
- high initialisation cost
- performance issues for large data sets
- possible bugs in xml utilities
- if you have special characters in the database, e.g.
alpha, beta or other special characters it leads to
problems.
- need customisable streaming (cursor like features)
- either you get the whole result set or only a one-row data set.
- still contains hard-coded port number in the config files
- requires manual changes.
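The cursor-like streaming asked for in the list above can be sketched as follows, assuming Python's standard sqlite3 module as a stand-in for the database behind a data service. The readings table and the stream_rows helper are hypothetical and this is not an OGSA-DAI interface. The point is only that the caller chooses a chunk size between "one row" and "the whole result set".

```python
import sqlite3


def stream_rows(connection, query, chunk_size):
    """Yield the result set in chunks of at most chunk_size rows."""
    cursor = connection.execute(query)
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            break
        yield chunk


# Hypothetical table held in an in-memory database for the demonstration.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
connection.executemany(
    "INSERT INTO readings (id, value) VALUES (?, ?)",
    [(i, i * 0.5) for i in range(1, 11)],
)

chunks = list(
    stream_rows(connection, "SELECT id, value FROM readings ORDER BY id", 4)
)
chunk_sizes = [len(chunk) for chunk in chunks]
```

Because stream_rows is a generator, a consumer that processes each chunk as it arrives never holds more than chunk_size rows at once. The list() call here is only for inspection.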
Limited use of OGSA-DAI/OGSA-DQP in myGrid:
- caused by the lack of stability of OGSI
- now uncertainty about WS-RF
- have wrapped OGSA-DQP as a Web Service.
Would like a WS-I version.
Gold is a new 2m pound e-Science pilot project (Newcastle and Lancaster)
Provide infrastructure in Gold.
Use the CLRC metadata model. Same model is being applied to MyGrid and
Gold.
Gold would like to use OGSA-DAI. Gold will build on WS-I (requires a
stable platform). Would like to use a WS-I version of OGSA-DAI if and
when this becomes available.
Q (John Murison): a general query to all user groups - was the initial
contact with ogsa-dai good? Would a course be good?
Documentation good and the interaction with the group was also
good.
Q (Simon Laws): in the use of WS-I for the two projects do you intend
to produce query interfaces or are these hidden behind?
There will be separate interfaces: one will be a typed interface and
the other a general query interface. We are currently using Hibernate to
hide the database. If ogsa-dai produces a WS-I version then
we would like to use it.
---
Kona Andrews
Guy Rixon is also here.
Have different data sets that are distributed with different access
policies and different arrangements. Want to provide a virtual
observatory.
The component that we are working on in AstroGrid at the moment is the
data access component. Another group in AstroGrid has produced a
standard data access component that has been built using web services.
Decided fairly early on not to use Grid services.
Currently this lives in Tomcat/Axis.
Want an ogsa-dai client plug in on the datacenter component that talks
to ogsa-dai. Have had problems trying to get web services to talk to
grid services because Globus distributes its own Axis jars.
Have a temporary solution by installing ogsa-dai in a separate Java VM.
Want to use ogsa-dai as we want to talk to other grid resources
that deploy ogsa-dai.
Use ADQL that is fed to the data center - this is translated to
SQL. SQL gets shipped to ogsa-dai which uses sqlQueryStatement which
talks to Postgres; the output is piped to deliverFromURL, transformed
using xslTransform and delivered using deliverToFile/deliverToGFTP.
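The chain of activities described above can be pictured as a pipeline in which each stage consumes the output of the previous one. The following is only an analogy in plain Python, assuming SQLite in place of Postgres and a hand-written XML step in place of an XSL transform. The function names echo the activity names but are hypothetical and are not the OGSA-DAI client toolkit API, and the star data is made up.

```python
import sqlite3
from xml.sax.saxutils import escape


def sql_query_statement(connection, sql):
    """Stand-in for the query activity: yield result rows one at a time."""
    yield from connection.execute(sql)


def rows_to_xml(rows):
    """Stand-in for the transform step: turn each row into an XML fragment."""
    for row in rows:
        cells = "".join("<col>" + escape(str(v)) + "</col>" for v in row)
        yield "<row>" + cells + "</row>"


def deliver_to_list(fragments, sink):
    """Stand-in for a delivery activity: append each fragment to sink."""
    for fragment in fragments:
        sink.append(fragment)
    return sink


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE stars (name TEXT, magnitude REAL)")
connection.executemany(
    "INSERT INTO stars VALUES (?, ?)", [("Vega", 0.03), ("Sirius", -1.46)]
)

# Chain the stages: query -> transform -> deliver.
delivered = deliver_to_list(
    rows_to_xml(
        sql_query_statement(
            connection, "SELECT name, magnitude FROM stars ORDER BY name"
        )
    ),
    [],
)
```

Because the first two stages are generators, rows flow through the chain one at a time rather than being materialised as a whole result set between stages.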
Comments:
- astronomers often need large results sets
R3 not so good for this.
R3.1 better but still a problem.
- xslTransform seems to be hogging memory
- for small results (<1000 rows) there is a 40 fold cost in
using ogsa-dai; 10^4 - 10^6 rows acceptable; fails for 2*10^6
rows. Measurements made without xslt, which is a memory hog.
Many projects talk about read only access. We want to be able to
write to data resources (data warehousing). Would like to have
User-owned tables. Would like GDS to do transfer and bulk load.
Will be relying on the BulkLoad.
Hope to get query results to our warehouse.
...
Relying on sqlBulkLoadRowSet.
Will need table management facilities - currently this is rather thin.
Not using X.509 authentication at user granularity - Astrogrid uses
its own authentication mechanisms (users do not have their own
certificates).
Require to access Web Services (WS) from Grid Services (GS).
Requirements:
- ability to handle very large results sets (possible show
stopper)
- suite of db/table management activities
- list tables in db, describe table column types/space usage,
last access/modification time.
- ability to create and delete indices in tables
- ultra simple installation procedure
- porttype issue mentioned by Arijit is also an issue.
Q (Simon Laws). Can you say something about WS wrapping of GS?
Most AstroGrid components talk to the world via WS. Queries go
into WS which then invokes a GS.
Q (Simon Laws). Do you have a general mapping policy for naming
policies?
Not using X.509 to keep track of individual users ...
There is a registry in astrogrid which deals with EPRs but
also need contextual metadata.
Q (Malcolm Atkinson): about the table/index management - these could be
done by SQL but if you want special activities?
We have a development component called mySpace ... want to steal
functionality from that component ...
Guy Rixon: if the things that corresponded to the WSRF was a table
that might clean itself that would be really good. (???)
---
OGSA-DAI: Status and Future Plans
Neil Chue Hong
Try and go through a status update and go through the release roadmap.
This UG meeting will be used to collect requirements for future
releases.
Have a slightly smaller team - 5 FTEs at EPCC, 5 FTEs at IBM.
One PDRA at each of NeSC, Manchester, Newcastle
(content is given in the slides)
R3.1 is already out, which is a technical preview of R4.
Hoped that R5 would be DAIS compliant ... not sure when we can become
compliant with DAIS (question about stability of the spec). Reluctant
to change the interfaces too many times.
R4:
- uses GT3.2
- supported client toolkit library
- updated the activity engine
- additional dbms supported (SQL Server/Postgres)
- New GUI data browser
- bulk load supported
More functionality
- Metadata registration of DAISGR
- (more items mentioned in the slides)
Release date: end of April/early May.
R5: due October 2004.
- possible alignment with WS-RF and DAIS specs.
- at least a tech preview of OGSA-DAI for GT4
- possible WS-I interface implementation
- integrate DQP
- ...
- Coordinated OGSA-DAI contribution policy
R6: April 2005 R7: Oct 2005.
OGSA-DAI R5 will probably have a WSRF interface - up to you guys (i.e. the
users)? Currently investigating options for supporting different
interfaces in the future. Do not just have code changes to deal with;
also have to consider documentation and testing of code.
Issues outlined in slide.
Now have metadata extractors for Oracle/DB2 but this data comes out in
different formats - do we have to provide these in the same format?
... lots of points mentioned in the slides ...
Mention of the DAIS proposed specs. Do we support things like files
if they are not going to be standardised?
Malcolm: at GGF when we set up DAIS and at TAG there were strong
statements that we should not get involved with files which
was a complete contradiction of what our users
wanted. Trying to accommodate files but these are like tagged-on
additions.
Paul Watson: interested to know whether people are interested in doing
DQP across XML data resources.
...
Have been conducting a survey that Paul has been collating.
Paul: encourages folks to submit responses to the survey.
...
One of the problems that we have is versioning ... previously mentioned
the problem with jars when trying to use WS and GS at the same time.
Various different possible directions of OGSA-DAI development are
enumerated (see slides).
Malcolm: had a request for Perl ...
...
Neil emphasises the need for requirements to come from the user
community.
Q(Kona Andrews): how effectively will the client toolkit insulate
users from change?
We hope that it will be pretty good at doing this but it does depend
on the underlying infrastructure that is adopted and the amount of
change that is incurred from the Grid Community.
Malcolm Atkinson: that's an interesting question ... it's really about
whether the fundamental vision works, whether we can
amortise the development cost of a large m/w component
over lots of applications ... we joined the Globus
Alliance. Some users want to take things at face
value whereas others like to tinker with things
under the hood to get the last ounce of speed.
So in the end we face the question whether you can
support these with different branches of the same
technology or need different types of product.
-------------------------------------------------------------------------
There were two break out sessions.
- Requirements
- OGSA-DAI model and infrastructure
Details of the scope for each of these is given at:
http://www.nesc.ac.uk/esi/events/401/
-------------------------------------------------------------------------
Infrastructure Breakout Session
===============================
Simon Laws presented some slides to motivate discussion.
Objective - to discuss
- Collection of WS-* specs
- support for DAIS
Can't do everything in OGSA-DAI. Need to identify:
- requirements from projects
- requirements across projects
- requirements of high priority
Should try to avoid the politics.
First thing to discuss: WS-* stuff.
o Generally have:
- OGSI
- WS-RF
- WS-I
o Which need to support
- identity of resources
- stateful interactions
- metadata
- sessions
- lifetime management
- endpoint fragility
- security
o Visibility of the differences
- what is the cost (to administer/counter) of the different
approaches
Guy: this discussion is key as to whether astrogrid continues to use
OGSA-DAI. Astrogrid can re-implement the functionality of
OGSA-DAI if we have to maintain compatibility with a given
platform. Astrogrid does not use OGSI. Might want to use WSRF if it's
going to be the foundation of OGSA. We will not use it if it's
only going to buy us into ogsa-dai. We want to adopt a solution
across the virtual observatory. Having a WS-I OGSA-DAI followed
by WSRF would be cool.
Manfred: is it possible to start with WS-I and then migrate to WSRF
and then only add functionality.
Guy: if it is not possible to get a plain system.
Paul: would client libraries hide the differences and extra
functionality added later?...
Guy: suspicious about protocols based on the API and not what goes over
the wire.
Simon: there are a set of standards that are being developed out there
... there is a proposition that is being developed by WSRF tied to
resources. If we did not use that spec then we would have to propose
something else...
Guy: playing devil's advocate here but you could propose something
that is a WS-Resource property that picks up a bit of the WSRF
but then you would not use the whole of the WSRF.
Simon: issues about how you pick and match these things ... can have
different approaches to identify the resource ... still need to have
a metadata spec. There are a lot of things that we want to describe
a resource.
Mario: another issue is the time frame that it is going to take for the
specs to mature. Difficult to build on top of moving specs.
...
Simon: interested in the registry aspect that was mentioned by previous
presentations. Is it useful to identify a naming scheme for a
resource or whether you produce ....
Guy: in astrogrid the registry is a fairly mature concept - an internal
standard ... want the hooks to apply that with ogsa-dai.
Paul: my experience is that in December Savas proposed a common way to
do metadata that you could associate with a service but nobody liked
it as groups had already established a way of producing metadata.
Trying to standardise metadata or the naming scheme is very
difficult.
Manfred: also one person's data is another person's metadata ... one
point is not to make the distinction as you don't want to replicate
the methods....
...
Guy: it would be good in astronomy if for each GDS you could have a
chunk of xml that could be loaded by the owner ... so it is able
to upload a bit of contextual metadata. This would be associated
with the data resource but dished out by the GDS.
...
Paul: need to do the translation from the name of the resource to its
location. Any name that takes you directly to the resource is
not going to do the whole thing ... you would need higher-level
services. ...
Simon: if you are trying to map to an endpoint this is a general issue..
Manfred: is that not what the GSH was about - a stable name that could
be resolved to a single thing ....
Paul: a GSH was a Grid Service Handle, an identifier for a Grid service,
and the GSR (Grid Service Reference) gave you what you actually
contacted ...
Manfred: idea was to have a stable thing and something else to access
the resource...
...
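The handle/reference split discussed above amounts to one level of indirection: clients hold a stable name and resolve it to whatever endpoint is current. A minimal sketch of that idea, assuming an in-memory table. The HandleResolver class, the handle string and the endpoint URLs are all made up for illustration and are not the OGSI resolution interfaces.

```python
class HandleResolver:
    """Hypothetical resolver: maps a stable handle to the current reference."""

    def __init__(self):
        self._references = {}

    def bind(self, handle, reference):
        # Rebinding an existing handle models a service moving endpoint.
        self._references[handle] = reference

    def resolve(self, handle):
        try:
            return self._references[handle]
        except KeyError:
            raise LookupError("no reference bound for handle: " + handle) from None


resolver = HandleResolver()
handle = "handle:example/image-store"  # made-up stable name

resolver.bind(handle, "http://host-a.example.org:8080/services/store")
first = resolver.resolve(handle)

# The service moves; clients keep the same handle and resolve again.
resolver.bind(handle, "http://host-b.example.org:8080/services/store")
second = resolver.resolve(handle)
```

The cost is an extra lookup before every first contact, which is the performance concern raised in the discussion that follows.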
Guy: in astronomy working with things that label resources that do not
have to be services ... finding that you can't do squat with an
IVRN until you resolve these things. It works but it's a major
performance thing .... but this is probably out of scope with
what ogsa-dai are trying to do ...
Paul: there is a re-mapping thing in WSRF that has not been published
yet ... the renewable resource ...
Simon: do we believe that we are mapping to services or a resource or
the resource could be something that is being virtualised. Lots
of complexity that could provide flexibility, extensibility...
Guy: associate a set of properties with a resource ... always defined
outside of ogsa-dai.
Peter: would welcome an opportunity if we can all agree with what we
mean by wsrf ... as I understand it it's a collection of growing specs
which does not include ws-addressing or ws-security ... what are
we referring to ... WS-I and WSRF are not the same way of doing the
same thing; WS-I is just WSDL and SOAP.
Guy: if you use WSRF you use WS-Addressing and not WSRF... use OGSI for
security .... WS with security would do us ...
Paul: I don't think that OGSI said anything about security - have
to distinguish between OGSI and GT3. GT3 had OGSI with a particular
security policy.
Peter: what do we mean by WS-I? If the choice is WS-I then that leaves a
whole gap to fill in.
Simon: we are treating data as a first-class citizen which is what we
are trying to do with OGSA-DAI. Can do that by using WS-I or
WSRF. In WS-I could use a URI, or a registry ... you could use other
implicit contexts...
Guy: are we actually using the added features of OGSI to identify the
resource?
Simon: discussion gets complicated as we begin to intersect with the
DAIS model ... along with all the changes for WSRF ... DAIS has also
changed. OGSA-DAI has a simple model with data resources represented by
factories with sessions being created - DAIS now has a service that
represents a data resource with an interface ... you can create one
data service from another ... if you run a sql query on a data
resource the result set could be exposed as another service.
Guy: the resource identifier is spat out of the creation service ...
if you can do extra stuff, if you can speak wsrf, then that would
be a good thing.
Paul: That's one possibility ... also if people want a WS-I version of
OGSA-DAI would people accept a cut down version. Would people
accept less functionality?
Simon: may have minimal functionality within a WS-I and use registry
for identity and then describe a road map where you grow that
to add more functionality ...
Guy: thinking of not using the registry provided by ogsa-dai.
Functionality depends on what you speak .... wsrf in which case you can
pick up the extra functionality.
Paul: something similar would happen in a WS-I ... in one case you get
an EPR and in the other you would provide a token from the
context.
Guy: the wsrf becomes a job identifier ...
Paul: you can use that to get the status.....
Guy: you could say that there is no way to check the status of an
operation when you are using an asynchronous ...
Simon: could do the same thing in different ways ....from this morning
people have said that they want WS-I and are waiting for WSRF
to see what it has ....
?: it would be a mistake to drop GT4 support ... the people that want
WS-I are probably out of the room ... they just want the functionality
and do not care how this is done.
....
Guy: astrogrid do not want to use GT4...
Simon: there is an incremental way of doing this. ... Could provide a
WS-I thing fairly simply and provide an upgrade path to wsrf... have
to talk to the guys... do we use wsrf or invent it ourselves?
Paul: the idea of having a set of technology previews might be a good
thing but these are hard things ... we need to do some assessment ... if
people give us the functionality then we can decide but if people do not
give us the functionality then it's harder to do
Guy: could ogsa-dai provide a straw man for each of these cases to see
what is available and what is not ...
Simon: have been doing this internally within ogsa-dai ....
Paul: doing a questionnaire at the moment ... have about 30 responses
back but there is no simple picture becoming available...
A straw man would be a great way of managing expectations.
Guy: if people said that you wanted WS-CAF what would you do...?
Paul: you could throw that in...
Peter: if you use WS-I would you throw out the Globus dependency?
Simon: hardest thing would be to have Globus/non-Globus ...
Peter: not likely to be an ogsa-dai user/implementer ...
Paul: question that we have been asking is what kind of platform
people want. For WSRF we would probably use Globus but for a
WS-I solution we would probably use a raw Axis solution...
Peter: one of the potentially biggest users of ogsa-dai is NEESgrid
and they are probably screwed into Globus ... would you
abandon these?
Paul: it is not easy to pick a winner .. the community is not clear
Guy: you get three things ...
- what do you put at the service end
- what are you doing at the client
- what are you expecting from third party
Peter: WS-I/WSRF seems to be whether you go with Globus or not?
Simon: I don't agree with that ... there are things from Globus that you
would still need ... this is not an anti-Globus thing but how
you untangle bits of OGSA-DAI from Globus ... easy to be philosophical
until you start to implement. Globus, in the first instance, will
provide a WSRF implementation or you can go to IBM to get ...???...
In the first instance globus will be the first implementer.
Guy: there's supposed to be a WSRF implementation in....
...
Simon: given the amount of effort there is not a lot that we can do
... are there any groups that require function in different
languages/platforms.
Paul: all of myGrid will use axis/tomcat.
Guy: if you use a client toolkit you will have to support a lot of
different languages ...
....
Paul: the argument about client libraries ... some people will be happy
to use client libraries ... there is some concern that if you are using
BPEL that has to talk directly to a service then there could be an
issue ... in myGrid we have a workflow where you want to call things
directly...
Simon: Neil did suggest that we want to have identifiable resources in
workflow. You are dealing with streaming resources together ... have
an implication of the data that you are passing about ... the BPEL
people are looking at WSRF .... the client library would not have
a part to play in this.
Paul: getting other standards to buy in to wsrf is going to be key to
wsrf's success ... the same is true for transactions...
...
Manfred: I thought part of the thrust of wsrf was to push all of this
stuff
into the header ... the infrastructure can take care of all
this.
When you flow these things through the application interface
is
less composable.
Guy: you can do that but the service still has to provide some
interface...
Paul: I think you are right ... it's just to do with time scales for
tooling and specs to support this model ... therefore someone
whose project worked around BPEL would not work at the moment
until this changed.
Simon: we have been here before trying to change specifications
... trying to judge the mood of the community or the people
writing the specs ... I work for IBM and clearly IBM are very
interested in evolving WSRF.
...
Paul: this comes through with what people are saying in the
questionnaire where people are saying that they would like a
WSRF and WS-I implementation ...
...
Simon: keen to look at a number of areas by tech previewing ... look
at certain things in WS-I and certain things in WSRF without
having to look at everything else...
....
Files came out as key this morning as something that is
supported.
?: what is the motivation for TAG keeping clear of files?
Paul: my view is that we should look at structure in files ... so not
sure why Malcolm is stating that files are out of scope..
Simon: that was certainly said by GGF....
....
Guy: would like a plug-in for having file access ...
Simon: that's the way that DAIS is looking at things at the moment
... the output format might be a property of the service or the
operation that you call....
Guy: my assumption is that the plug-in would have a fixed output format.
....
Guy: another project in astronomy treats telescopes as data sources and
they might be interested in using OGSA-DAI for this kind of
thing.
...
Mention of Beth Plale and usage of streams.
Manfred: Beth has been talking to Steve Fisher in the context of the
European data grid....
...
An attempt is made to summarise the discussion of the key
points.
o Infrastructure
- Uncertainty
- People want to GT4
--------------------------------------------------------------------------
Requirements Breakout Group
===========================
This was by far the more popular group and was designated the smaller
space :-)
Apparently a flip chart was used with a set of requirements under
various categories. Folks were then forced to prioritise these
requirements.
Images of the flip charts are available from:
http://www.nesc.ac.uk/talks/401/flipchart.zip
Andrew Borley summarised the session as follows:
--
NCH announced goals for the session:
- List the requirements of each project
- Identify common requirements
- Prioritise requirements
We identified categories then went round the projects to identify
project requirements:
Data Access
(access to data sources - SQL queries, DB management, translation)
Data Integration
Application Support (client)
Performance/Reliability
Security
Usability/Support
Metadata/Registration
PD has a list of all participants in this discussion
For each category, we brainstormed some sub-points and each
participant voted for those important to his/her project.
Key
---
Item: No of votes [*** identifies major requirement]
Data access
-----------
Flat file access (structured/unstructured/arbitrary structure): 9 ***
Moving process closer to data: 5
Exploiting structure in file: 1
Unhandled characters (&, >, <, etc): 1
More data sources: 1
Customisable/incremental streaming of data: 2 (dependency for
Distributed Query)
More data formats: 3
Large result sets (including large blobs): 6 ***
Temporary table creation: 1
DB indexes: 1
Translation on input: 1
Distributed write: 1
Xpath/Xquery queries over xml files and streams (not inside an xml db): 1
Distributed transactions: 1
Data Integration
----------------
Schema integration: 7
Multi-model query (queries over relational, structured and
semi-structured data at the same time): 7
Distributed query (joins across different data sources): 10 ***
Mixed language queries (Xpath & SQL): 1
Data format transformation: 5
Application Support (client)
----------------------------
Client API: 7 ***
Activities: 3
Authorization/Authentications/A...: 4
Instrumentation: 1
Diagnostic tools (performance analysis): 6 ***
Packaging/Deployment: 5
Logging/Auditing: 2
Notification: 2
Polling properties: 1
Configuration wizard: 1
Performance/Reliability
-----------------------
Size of results: 3
CPU:
Memory: 4
Overheads (WS..): 1
Network bandwidth: 2
Latency:
Number of users:
Number of queries:
Must not fall over: 5 ***
Recovery/checkpoints: 2
Re-register: 3
Security
--------
Authorization: 8 ***
Authentication: 8 ***
Accounting: 3
Privacy: 7 ***
Role based access control: 8 ***
Roles for file access: 9 ***
Usability/Support
-----------------
Installation: 6
Configuration: 6
Plug-in packages/integration with other tools: 6
Reference site: 7 ***
Installation self test: 8 ***
Support: Everyone ***
Metadata/Registration
---------------------
More information in the registry
- schema: 4
- data: 3
- dms functions: 6
Tools to extract this data: 2
Matching algorithms: 1
Physical metadata: 6 (dependency for Distributed Query)
Logical metadata: 6 (dependency for Distributed Query)
Self-recovering registry: 7 ***
Conclusion
----------
Reliability
File access
Big result sets
--------------------------------------------------------------------------
Joining of groups - summaries from the two break out groups
(Summary of the infrastructure group)
Infrastructure
- community is not certain of what to do.
- suggested to provide straw men to show implications of
different frameworks.
- Some people want WS-I others GT4 (!not WSRF!) others
just want something that works
- WS-I first, WS-RF later as it matures
DAIS
- Keep functions that provide value
- perform documents, activities
- Validate the DAIS standard
- not a user requirement
- composability with other WS specs
Deployment
- Files, files, files
- some subclass of files, e.g. csv, dfdl
- model a file as a table in a database
- Perl
- What do we mean by WS-RF/WS-I
- standards or platforms
- Different client library languages
- Java, C++, C#, Perl, C?, Fortran?
- As a contribution
- Publish an API for the client library
Norman: speaking about the file requirements. In the DAIS context
there are many kinds of files. People have different
requirements ... was there another kind of consensus ... there
are different issues for OGSA-DAI ... support for the document
style interface - assume that it's going to keep this but
alongside it might support a DAIS interface.
Malcolm: several people have expressed a desire for continued support
for perform ...
(Summary of the requirements group)
Data Access
Top two requirements:
- flat file access
- big result sets/big BLOBS
lots of other ones.
enumerated functions/extensions and then asked each of the projects
to talk about the requirements. 15 people in the group with about 7
projects.
Big result sets are more to do with reliability as opposed to new
functionality.
Data Integration
- schema integration
- multi model query
- distributed query (join between data services)
- ...
Application support
- extensibility client api
- ...
Performance/Reliability
- large file access
- must not fall over
- recover and checkpoint
Security
- Authorisation
- accounting
- ...
Usability/support
- installation
- self test
- reference site
- support
Metadata/Registry
- self recovering registry
- physical/registry
- ...
Conclusion
- reliable
- file access
- big result sets
Malcolm: quite a lot of overlaps between the two groups. One of the
challenges for the Technical Review Board and the managers is
to try to reconcile all these results. We should carefully
cost all these requirements. Also, there may be sampling
errors as not all projects are represented.
Paul: conclusions from the group I was in are backed by the survey.
We've had about 30 responses. There are no clear winners. In
terms of functionality, reliability and big result sets are
about improving things that we have and not adding new
functionality ...
Malcolm: people were much clearer that it should do what it says on the
tin and not fall over ... in particular as JVMs don't recover
after memory failures big result sets were an anathema...
Paul: appreciate that building reliable systems takes time and effort
... not sure the cost for building in file access ... as Norman
said file access means different things to different people ...
Malcolm: Martin has done some work on file access ....
Martin: got the impression that people are interested in csv files and
various ways of encoding records on files ... is there
anything else?
Malcolm: I don't think that you can categorise biological files in
that way ...
Martin: there is DFDL ...
Malcolm: data cutter can export function ... could package data cutter
with ogsa-dai as a special action space for driving things
through data cutter. ... One of the outcomes is that there is
a fair level of consistency in terms of requirements. Came
out with a long list but people had to be pushed to
sub-select things .... do people want to have another user
group meeting ... in about October
*October was seen as a good time to have it*
- A mini workshop will be held at the next AHM.
- need to decide where the next meeting will be.
- would like to have a chair for the user group meeting.
Would like input from people. Is there any input?
Nice to hear about what people were doing with ogsa-dai.
Noel: it seems that a lot of other projects have not presented.
Malcolm: it's always nice to hear about continuity. Do people like a
training event juxtaposed or is that irrelevant?
...
Rob: should not commit to a beta release schedule
Malcolm: will take input on a chair and the scheduling for the next UG
meeting. ... thanks to everyone for coming.
Discussion about contributions:
- Legal aspect
- Project management side - maintaining a product.
Discussion about maintaining software contributions that OGSA-DAI
might not be able to keep. The OMII model was discussed.