Vocabulary Services
“Huuh - what is it good for…” (in WDTS anyway…)
4th September 2009
Jonathan Yu, CSIRO Land and Water

Talk outline
• Water Data Transfer Standards
• What are vocab services?
• How are they used?
• How are we using them in WDTS?
• Future work

Water Data Transfer Standards (WDTS)
• Joint effort by CSIRO and the Bureau of Meteorology (BoM)
• Problem space: standardising the format of water observation data
• Currently, water data providers send data in various formats

WDTF
• Develop the Water Data Transfer Format (WDTF)
• Standardised XML format for sending and receiving water-related data (e.g. groundwater, river flow)
• Primarily used by water data providers to send their data to BoM for ingestion
• But also for exchange between other organisations
• Part of an integrated national water information system to help address the water crisis

Validating WDTF
• Potentially lots of agencies (over 200) submitting WDTF
• We can’t possibly examine each XML file for valid structure and content!
• Need mechanism(s) for validating WDTF

Why not use XML Schema?
Hang on, can’t we just use XML Schema to enforce validation rules?
• XML Schema is not sufficient: it can’t capture much of the semantics in the business rules

<wdtf:Measurement gml:id="WQ24010722">
  <om:procedure xlink:href="http://www.bom.gov.au/std/water/xml/wio0.2/procedure/QualityMethod/bom/611"/>
  <om:procedureOperator xlink:href="http://www.bom.gov.au/std/water/xml/wio0.2/party/laboratory/w00233/SWC-LAB"/>
  <om:observedProperty xlink:href="http://www.bom.gov.au/std/water/xml/wio0.2/property//bom/WaterpH_pH"/>
  <om:featureOfInterest xlink:href="#s209021491"/>
  <wdtf:metadata>
    <wdtf:MeasurementObservationMetadata>
      <wdtf:regulationProperty>Reg200806.s3.9g</wdtf:regulationProperty>
      <wdtf:securityConstraints>Unclassified</wdtf:securityConstraints>
    </wdtf:MeasurementObservationMetadata>
  </wdtf:metadata>
  <wdtf:result uom="[pH]">7.56</wdtf:result>
</wdtf:Measurement>

Rules XML Schema can’t express here:
• Appropriate identifiers
• Valid content and contextual use

What are Vocab Services?
• Set of services for querying and managing vocabularies
1. Interfaces
   • SPARQL Protocol and RDF Query Language (SPARQL) queries
   • HTTP GET/POST, REST
   • HTML forms
2. Vocabularies
   • Descriptions of a domain in a specification language: concepts, properties, relationships, assertions

Vocabularies
• What do they look like?
  • Water regulation codes
  • Units of measure
  • Pizza classifications, e.g. http://www.co-ode.org/ontologies/pizza/2007/02/12/
  • Wine vocabularies

What are vocab services good for?
1. Dictionary lookup: what does this term mean?
   • What is beetroot? What is Metres?
   • http://localhost:8080/VocabLookup/get/concept/vocab1.0/unit:Metres
2. Discovery and analysis: what is it related to?
   • I know I have beetroot in my fridge; what other related food is in my fridge?
   • Which water regulation parameters use the unit Metres?
   • Where does the concept “Gold” occur in geological surveys in Victoria? http://portal.auscope.org/gmap.html
3. Interoperability, shared definitions and semantics
   • Oh, this concept in my business model maps to this other concept in your business model!
   • Oh, your WaterCourseLevel parameter is measured in metres? Mine is in millimetres – let’s talk
4. Data validation
   • Do I have milk in my fridge? Is this a valid water parameter? Is my XML data consistent with WDTF?
5. Configuration management and code generation
   • Fill templates, or generate code or other artifacts from the concepts, properties or conceptual structure in the vocabulary (my dinner, sitemaps, Java code, Schematron rules)
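To make the data-validation use case concrete, here is a minimal Python sketch that parses a trimmed copy of the Measurement example and checks two rules XML Schema cannot express: that the observed property is a known parameter, and that the result uses the unit related to that parameter. It is an illustration under stated assumptions, not the WDTS implementation. The `PARAM_UNITS` dictionary and the helpers `check_measurement` and `local_name` are hypothetical; `PARAM_UNITS` stands in for the vocabulary service (in WDTS these lookups would be HTTP queries against VocabLookup). The `wdtf` and `om` namespace URIs in the sample are placeholders, not the real WDTF ones.

```python
import xml.etree.ElementTree as ET
from typing import List

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical stand-in for the vocabulary service: parameter code -> the unit
# it is skos:related to. In WDTS this would be a VocabLookup HTTP query.
PARAM_UNITS = {"WaterpH_pH": "[pH]"}

# Trimmed copy of the slide's Measurement. The wdtf and om namespace URIs are
# placeholders, not the real WDTF/O&M ones.
SAMPLE = """<wdtf:Measurement xmlns:wdtf="urn:example:wdtf" xmlns:om="urn:example:om"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <om:observedProperty
      xlink:href="http://www.bom.gov.au/std/water/xml/wio0.2/property//bom/WaterpH_pH"/>
  <wdtf:result uom="[pH]">7.56</wdtf:result>
</wdtf:Measurement>"""


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def check_measurement(xml_text: str) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    root = ET.fromstring(xml_text)
    children = {local_name(child.tag): child for child in root}
    prop = children.get("observedProperty")
    result = children.get("result")
    if prop is None or result is None:
        return ["missing observedProperty or result"]

    problems = []
    # Assumption: the parameter code is the last path segment of the property URI.
    param = prop.get(XLINK_HREF, "").rstrip("/").rsplit("/", 1)[-1]
    unit = result.get("uom")
    if param not in PARAM_UNITS:
        problems.append(f"unknown parameter {param!r}")
    elif unit != PARAM_UNITS[param]:
        problems.append(f"{param} expects unit {PARAM_UNITS[param]!r}, got {unit!r}")
    # The result value must at least parse as a number.
    try:
        float(result.text or "")
    except ValueError:
        problems.append(f"result {result.text!r} is not numeric")
    return problems


if __name__ == "__main__":
    print(check_measurement(SAMPLE))
```

For the sample, `check_measurement(SAMPLE)` returns an empty list; changing the `uom` to `mg/L` yields a single unit-mismatch problem. Keeping the vocabulary behind one lookup table makes it easy to swap in real service calls later.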
Vocab Services in WDTS
Leveraging vocabulary services for:
• Representing schema control lists currently maintained in Excel spreadsheets
• Validation services:
  • Validating potentially very large volumes of XML data in WDTF format
  • More than 200 data providers transferring their water data to BoM
  • Need to ensure the format is followed

Typical usage of the vocabulary service
• Specific cardinality constraints and vocabulary checking
• Using HTTP GET queries such as:
  • Is this a valid vocabulary definition?
    http://localhost:8080/VocabLookup/get/concept/vocab1.0/param%3AWaterpH_pH
    http://localhost:8080/VocabLookup/check/concept/vocab1.0/param%3AWaterpH_pH

More query examples
• Is this URN or HTTP URI valid? E.g. urn:ogc:def:crs:EPSG::28349
  http://localhost:8080/VocabLookup/check/property/vocab1.0/dc:identifier/%27urn:ogc:def:crs:EPSG::28349%27
• Does this water regulation code parameter have the right unit of measure associated with it?
  http://localhost:8080/VocabLookup/check/relation/vocab1.0/param:WaterpH_pH/skos:related/dc:identifier/%27[pH]%27

Generating Schematron code from queries
• To check cardinality between one element and another
• Example: your HydroCollection XML data may have any number of <observationMember> nodes but must have only one <metadata> node
  http://localhost:8080/VocabLookup/get/cardinality/wdtfstructure/wdtf:HydroCollection/wdtf:metadata
• Or just get all of them:
  http://localhost:8080/VocabLookup/getall/cardinality/wdtf-structure2/

Problems and areas of difficulty
• Emerging requirements
• Open source tools
• Standards for representing vocabularies
• Implementation-specific details of how vocabularies are stored and managed
• Exploring what we can and can’t do with vocabulary services
• Method or approach used
• Encapsulating the vocabulary in the ‘right’ way
• Many possible ways to represent the same thing
• Best practices for querying
• Versioning
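The VocabLookup query URLs and the Schematron-generation step from the WDTS slides can be sketched in Python as follows. This is a sketch under assumptions, not the actual service code: it assumes the path layout seen in the examples (operation, kind, vocabulary, then one or more terms) and that the service decodes percent-encoded path segments, since the slides show both `param%3AWaterpH_pH` and `param:WaterpH_pH`. `vocab_url` and `schematron_rule` are hypothetical helpers, not part of VocabLookup. The response format of the cardinality query is not shown, so the sketch assumes it has already been parsed into (parent, child, minimum, maximum) values. It emits a single ISO Schematron `rule`, which would still need to sit in a `pattern` inside a `schema` that declares the `wdtf` prefix with `sch:ns`.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import quote

# Assumed base URL, copied from the localhost examples on the slides.
BASE_URL = "http://localhost:8080/VocabLookup"
# ISO Schematron namespace.
SCH_NS = "http://purl.oclc.org/dsdl/schematron"
ET.register_namespace("sch", SCH_NS)


def vocab_url(operation: str, kind: str, vocab: str, *terms: str) -> str:
    """Hypothetical helper: build e.g. .../check/concept/vocab1.0/param%3AWaterpH_pH.

    Every path segment is fully percent-encoded, so ':' becomes %3A and the
    quoted literal "'[pH]'" becomes %27%5BpH%5D%27.
    """
    segments = (operation, kind, vocab) + terms
    return BASE_URL + "/" + "/".join(quote(s, safe="") for s in segments)


def schematron_rule(parent: str, child: str,
                    min_occurs: int, max_occurs: Optional[int]) -> ET.Element:
    """Hypothetical helper: turn one cardinality constraint into a Schematron rule.

    max_occurs=None means unbounded.
    """
    count = f"count({child})"
    if max_occurs is None:
        test = f"{count} >= {min_occurs}"
        message = f"{parent} must have at least {min_occurs} {child}"
    elif min_occurs == max_occurs:
        test = f"{count} = {min_occurs}"
        message = f"{parent} must have exactly {min_occurs} {child}"
    else:
        test = f"{count} >= {min_occurs} and {count} <= {max_occurs}"
        message = f"{parent} must have {min_occurs} to {max_occurs} {child}"
    rule = ET.Element(f"{{{SCH_NS}}}rule", {"context": parent})
    assertion = ET.SubElement(rule, f"{{{SCH_NS}}}assert", {"test": test})
    assertion.text = message
    return rule


if __name__ == "__main__":
    print(vocab_url("check", "relation", "vocab1.0", "param:WaterpH_pH",
                    "skos:related", "dc:identifier", "'[pH]'"))
    # Assumption: "only one <metadata>" is read as exactly one.
    rule = schematron_rule("wdtf:HydroCollection", "wdtf:metadata", 1, 1)
    # ElementTree escapes special characters such as '<' in attribute values.
    print(ET.tostring(rule, encoding="unicode"))
```

For instance, `vocab_url("get", "cardinality", "wdtfstructure", "wdtf:HydroCollection", "wdtf:metadata")` yields the cardinality query with colons encoded as `%3A`, and the HydroCollection rule asserts `count(wdtf:metadata) = 1`. Building the rule with ElementTree rather than string concatenation keeps the generated Schematron well-formed.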
Conclusion
• The WDTS project provides a driving problem space for vocabulary services
• Vocabulary services: Huuh - what is it good for… absolutely something!
• At least for validating content and business rules in WDTF

Future work
• Continuing to develop solutions for current business rules and best practices for WDTF 1.0 and 1.1
• Validation of future (complex) business rules
  • Currently unknown; we suspect we will keep pushing the boundaries of what vocabulary services can be leveraged for
• Documentation generation
  • Leverage the vocabulary service to aid documentation generation, i.e. populating constraints
• WaterML 2.0
  • Worldwide standard for a water data exchange format

Questions?