2009-09 Vocabulary services and WDTS

Vocabulary Services
“Huuh - what is it good for…” (in WDTS anyway…)
4th September 2009
Jonathan Yu
CSIRO Land and Water
Talk outline
Water Data Transfer Standards
What are vocab services?
How are they used?
How are we using them in WDTS?
Future work
CSIRO.
Water Data Transfer Standards (WDTS)
• Joint effort by CSIRO & Bureau of Meteorology (BoM)
• Problem space: standardising the format of water observation data
• Currently, water data providers send data in various formats
WDTF
• Developing the Water Data Transfer Format (WDTF)
• Standardised XML format for sending and receiving water-related data (e.g. groundwater, river flow)
• Primarily used by water data providers to send their data to BoM for ingestion
• But also for exchange between other organisations
• Part of an integrated national water information system to help address the water crisis
Validating WDTF
• Potentially many agencies (over 200) submitting WDTF
• Can’t possibly examine each XML file for valid structure and content!
• Need mechanism(s) for validating WDTF
Why not use XML Schema?
Hang on, can’t we just use XML Schema to enforce validation rules?
• XML Schema alone is not sufficient: it can’t capture many of the semantics in business rules
<wdtf:Measurement gml:id="WQ24010722">
  <om:procedure xlink:href="http://www.bom.gov.au/std/water/xml/wio0.2/procedure/QualityMethod/bom/611"/>
  <om:procedureOperator xlink:href="http://www.bom.gov.au/std/water/xml/wio0.2/party/laboratory/w00233/SWC-LAB"/>
  <om:observedProperty xlink:href="http://www.bom.gov.au/std/water/xml/wio0.2/property//bom/WaterpH_pH"/>
  <om:featureOfInterest xlink:href="#s209021491"/>
  <wdtf:metadata>
    <wdtf:MeasurementObservationMetadata>
      <!-- Appropriate identifiers -->
      <wdtf:regulationProperty>Reg200806.s3.9g</wdtf:regulationProperty>
      <wdtf:securityConstraints>Unclassified</wdtf:securityConstraints>
    </wdtf:MeasurementObservationMetadata>
  </wdtf:metadata>
  <!-- Valid content and contextual use -->
  <wdtf:result uom="[pH]">7.56</wdtf:result>
</wdtf:Measurement>
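To make the point concrete, here is a rule that XML Schema can't express: the observed property must come from the vocabulary, and the result's unit of measure must suit that property. This is a minimal Python sketch, not the WDTS validator. The `om` and `wdtf` namespace URIs and the property-to-unit table are hypothetical stand-ins for what a vocabulary service would supply. Only the XLink namespace is the real W3C one.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs for the om and wdtf prefixes; only XLink's is the real W3C one.
NS = {
    "om": "urn:example:om",
    "wdtf": "urn:example:wdtf",
    "xlink": "http://www.w3.org/1999/xlink",
}
HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical vocabulary extract: observed-property URI -> the unit of measure it must use.
PROPERTY_UNITS = {
    "http://www.bom.gov.au/std/water/xml/wio0.2/property//bom/WaterpH_pH": "[pH]",
}


def check_measurement(xml_text):
    """Return business-rule violations for one Measurement (an empty list means it passed)."""
    root = ET.fromstring(xml_text)
    prop = root.find("om:observedProperty", NS)
    href = prop.get(HREF) if prop is not None else None
    if href not in PROPERTY_UNITS:
        return [f"unknown observed property: {href}"]
    result = root.find("wdtf:result", NS)
    uom = result.get("uom") if result is not None else None
    expected = PROPERTY_UNITS[href]
    if uom != expected:
        return [f"unit {uom!r} is not valid for {href}; expected {expected!r}"]
    return []


SAMPLE = """<wdtf:Measurement xmlns:wdtf="urn:example:wdtf" xmlns:om="urn:example:om"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <om:observedProperty
      xlink:href="http://www.bom.gov.au/std/water/xml/wio0.2/property//bom/WaterpH_pH"/>
  <wdtf:result uom="[pH]">7.56</wdtf:result>
</wdtf:Measurement>"""

print(check_measurement(SAMPLE))                           # []
print(check_measurement(SAMPLE.replace('"[pH]"', '"m"')))  # one unit error
```

Both documents would pass the same schema. Only the second breaks the contextual rule, and catching that requires knowledge held in the vocabulary.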
What are Vocab Services?
• Set of services for querying and managing vocabularies
1. Interfaces
  • SPARQL Protocol and RDF Query Language (SPARQL) queries
  • HTTP GET/POST, REST
  • HTML forms
2. Vocabularies
  • Descriptions of a domain in a specification language: concepts, properties, relationships, assertions
Vocabularies
• What do they look like?
• water regulation codes
• units of measure
• pizza classifications
• http://www.co-ode.org/ontologies/pizza/2007/02/12/
• wine vocabularies
What are vocab services good for?
1. Dictionary lookup:
What does this term mean? What is beetroot? What is Metres?
http://localhost:8080/VocabLookup/get/concept/vocab1.0/unit:Metres
2. Discovery and analysis:
What is it related to? I know I have beetroot in my fridge; what other related food is in my fridge? Which water regulation parameters use the unit Metres? Where does the concept “Gold” occur in geological surveys in Victoria?
http://portal.auscope.org/gmap.html
3. Interoperability and shared definitions and semantics:
Oh, this concept in my business model maps to this other concept in your business model! Oh, your WaterCourseLevel parameter is measured in metres? Mine is in millimetres; let’s talk.
4. Data validation:
Do I have milk in my fridge? Is this a valid water parameter? Is my XML data consistent with WDTF?
5. Configuration management and code generation:
Fill templates or generate code or other artifacts based on concepts, properties, or the conceptual structure of the vocabulary (my dinner, sitemaps, Java code, Schematron rules)
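Uses 1, 2 and 4 can be sketched over a toy in-memory vocabulary held as (subject, predicate, object) triples. Everything here is a hypothetical illustration: the concepts, the definition text and the `param:GroundWaterLevel` entry are made up. A real vocabulary service would answer the same questions through its SPARQL or HTTP interfaces rather than from a Python list.

```python
# Hypothetical vocabulary content as (subject, predicate, object) triples.
TRIPLES = [
    ("unit:Metres", "skos:prefLabel", "Metres"),
    ("unit:Metres", "skos:definition", "SI unit of length"),
    ("param:WaterCourseLevel", "skos:related", "unit:Metres"),
    ("param:GroundWaterLevel", "skos:related", "unit:Metres"),
    ("param:WaterpH_pH", "skos:related", "unit:pH"),
]


def lookup(concept):
    """Dictionary lookup: every (predicate, object) pair stated about `concept`."""
    return [(p, o) for s, p, o in TRIPLES if s == concept]


def related_to(target, predicate="skos:related"):
    """Discovery: which concepts point at `target` through `predicate`?"""
    return sorted(s for s, p, o in TRIPLES if p == predicate and o == target)


def is_known_concept(concept):
    """Validation: is `concept` described (as a subject) anywhere in the vocabulary?"""
    return any(s == concept for s, _, _ in TRIPLES)


print(lookup("unit:Metres"))
print(related_to("unit:Metres"))             # ['param:GroundWaterLevel', 'param:WaterCourseLevel']
print(is_known_concept("param:WaterpH_pH"))  # True
```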
Vocab Services in WDTS
Leveraging vocabulary services for…
• Representing schema control lists currently maintained in an Excel spreadsheet
• Validation services:
• Validate potentially large volumes of XML data in WDTF format
• > 200 data providers transferring their water data to BoM
• Need to ensure the format is followed
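One way to move a spreadsheet-maintained control list into a vocabulary is to export it as CSV and emit one SKOS concept per row. The Python sketch below assumes a few things the slides don't give. The CSV has hypothetical `code` and `label` columns, each code is a valid Turtle local name, and the namespace URI is a placeholder. The actual WDTS spreadsheet layout and vocabulary namespaces may differ.

```python
import csv
import io

SKOS = "http://www.w3.org/2004/02/skos/core#"


def turtle_literal(text):
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def control_list_to_skos(csv_text, prefix, namespace):
    """Convert a control list exported from a spreadsheet as CSV into SKOS concepts (Turtle).

    Assumes hypothetical 'code' and 'label' columns, and that every code is a valid
    Turtle local name (letters, digits, underscores).
    """
    lines = [f"@prefix skos: <{SKOS}> .", f"@prefix {prefix}: <{namespace}> .", ""]
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines.append(f"{prefix}:{row['code']} a skos:Concept ;")
        lines.append(f"    skos:prefLabel {turtle_literal(row['label'])} .")
    return "\n".join(lines)


# Hypothetical export with made-up labels and a placeholder namespace.
EXPORT = "code,label\nWaterpH_pH,Water pH\nWaterCourseLevel,Water course level\n"
print(control_list_to_skos(EXPORT, "param", "http://example.org/vocab/param#"))
```

Once the list lives in the vocabulary, the lookup and check queries can use it directly, and nobody has to hand-copy it from the spreadsheet.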
Typical usage of vocabulary service
• Specific cardinality constraints and vocabulary checking
• Using HTTP GET queries like:
• Is this a valid vocabulary definition?
http://localhost:8080/VocabLookup/get/concept/vocab1.0/param%3AWaterpH_pH
http://localhost:8080/VocabLookup/check/concept/vocab1.0/param%3AWaterpH_pH
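These request URLs follow a regular pattern of action, kind, vocabulary and term segments, so a client can build them mechanically. Below is a minimal Python sketch. The function names are hypothetical, and the base URL is the development address shown on the slides. The sketch percent-encodes every segment, as the `param%3AWaterpH_pH` example does. Other examples leave colons unencoded, and whether the service accepts both forms is an assumption. The response format isn't described here, so `fetch` just returns the body text.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8080/VocabLookup"


def lookup_url(action, kind, vocab, *terms, base=BASE_URL):
    """Build a URL such as .../check/concept/vocab1.0/param%3AWaterpH_pH.

    Every path segment is percent-encoded with no 'safe' characters, so ':' becomes
    %3A and a single quote becomes %27.
    """
    segments = [action, kind, vocab, *terms]
    return base + "/" + "/".join(quote(segment, safe="") for segment in segments)


def fetch(url, timeout=10.0):
    """Issue an HTTP GET and return the response body, assuming UTF-8 text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


print(lookup_url("check", "concept", "vocab1.0", "param:WaterpH_pH"))
# http://localhost:8080/VocabLookup/check/concept/vocab1.0/param%3AWaterpH_pH
```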
More query examples
• Is this URN or HTTP URI valid? E.g. urn:ogc:def:crs:EPSG::28349
http://localhost:8080/VocabLookup/check/property/vocab1.0/dc:identifier/%27urn:ogc:def:crs:EPSG::28349%27
• Does this water regulation code parameter have the right unit of measure associated with it?
http://localhost:8080/VocabLookup/check/relation/vocab1.0/param:WaterpH_pH/skos:related/dc:identifier/%27[pH]%27
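Because the services also expose a SPARQL interface, each check can be read as a SPARQL ASK query. The Python sketch below shows one possible translation, not necessarily how VocabLookup implements `/check/property` or `/check/relation`. The SKOS and Dublin Core element namespaces are the standard ones. The `param` namespace is hypothetical, and the sketch assumes identifiers are stored as plain literals.

```python
# Standard SKOS and Dublin Core element namespaces; the param namespace is hypothetical.
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "param": "http://example.org/vocab/param#",
}


def sparql_literal(text):
    """Quote a string as a SPARQL literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def prefix_block():
    """PREFIX declarations for every namespace in PREFIXES."""
    return "\n".join(f"PREFIX {p}: <{iri}>" for p, iri in PREFIXES.items())


def ask_property(prop, value):
    """Like /check/property: does any resource have `prop` equal to `value`?"""
    return f"{prefix_block()}\nASK {{ ?resource {prop} {sparql_literal(value)} . }}"


def ask_relation(subject, relation, prop, value):
    """Like /check/relation: is `subject` linked via `relation` to something whose `prop` is `value`?"""
    pattern = f"{subject} {relation} ?target . ?target {prop} {sparql_literal(value)} ."
    return f"{prefix_block()}\nASK {{ {pattern} }}"


print(ask_relation("param:WaterpH_pH", "skos:related", "dc:identifier", "[pH]"))
```

An ASK query returns a single boolean, which matches the yes/no nature of these checks.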
Generating Schematron code from query
• To check cardinality between one element and another
• Example:
Your HydroCollection XML data may have any number of <observationMember> nodes but must have only one <metadata> node
http://localhost:8080/VocabLookup/get/cardinality/wdtfstructure/wdtf:HydroCollection/wdtf:metadata
Or just get all of them:
http://localhost:8080/VocabLookup/getall/cardinality/wdtf-structure2/
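Once a cardinality constraint has been retrieved, turning it into Schematron is mostly string assembly. The Python sketch below builds one ISO Schematron `<pattern>` per constraint. It assumes the service's cardinality response has already been parsed into (context, child, minimum, maximum) values, since that response format isn't shown here, and it treats "only one" as exactly one. A complete schema would also need an `<sch:ns>` declaration binding the `wdtf` prefix, whose namespace URI isn't given in these slides.

```python
import xml.etree.ElementTree as ET

SCH = "http://purl.oclc.org/dsdl/schematron"  # ISO Schematron namespace
ET.register_namespace("sch", SCH)


def cardinality_pattern(context, child, min_occurs, max_occurs=None):
    """Build a Schematron <pattern> limiting how often `child` occurs under `context`.

    max_occurs=None means unbounded. The constraint values are assumed to have been
    parsed already from the vocabulary service's cardinality response.
    """
    count = f"count({child})"
    if max_occurs is None:
        test = f"{count} >= {min_occurs}"
    elif min_occurs == max_occurs:
        test = f"{count} = {min_occurs}"
    else:
        test = f"{count} >= {min_occurs} and {count} <= {max_occurs}"
    pattern = ET.Element(f"{{{SCH}}}pattern")
    rule = ET.SubElement(pattern, f"{{{SCH}}}rule", context=context)
    assertion = ET.SubElement(rule, f"{{{SCH}}}assert", test=test)
    assertion.text = f"{context} has the wrong number of {child} elements"
    return pattern


metadata_rule = cardinality_pattern("wdtf:HydroCollection", "wdtf:metadata", 1, 1)
print(ET.tostring(metadata_rule, encoding="unicode"))
```

ElementTree takes care of escaping `<` and `>` inside the `test` attribute, which is easy to get wrong when Schematron is assembled by hand.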
Problems and areas of difficulty
• Emerging requirements
• Open source tools
• Standards for representing vocabularies
• Implementation-specific details of how vocabularies are stored and managed
• Exploring what we can and can’t do with vocabulary services
• Method or approach used
• Encapsulating vocabulary in the ‘right’ way
• Various ways to represent something
• Best practices for querying
• Versioning
Conclusion
• WDTS project as the driving problem space for vocabulary services
• Vocabulary services:
Huuh - what is it good for… absolutely something!
• At least, for validating content and business rules in WDTF
Future work
• Continuing to develop solutions for current business rules and best practices for WDTF 1.0 and 1.1
• Validation of future (complex) business rules
• Currently unknown; we suspect we will continue to push the boundaries in leveraging vocabulary services
• Documentation generation
• Leverage the vocabulary service to aid documentation generation, i.e. populating constraints
• WaterML 2.0
• Worldwide standard for a water data exchange format
Questions?