Common Observations and Measurements Profile (COMP)
Today and Tomorrow
Data Management, IMD, CSB
September 2014

COMP Background

The Need for a Common Data Standard
• Incompatible data formats and metadata conventions among programs
• Assessment, comparison and integration of data sources requires excessive effort and is often impractical (quoting the DST)

Primary Audience
• Scientists and data scientists

Can make a major contribution to:
• Open Science (OGAP 2.0), where EC plays a leading role
• Open Data initiatives

Based on WaterML 2.0 Part 1
“The core aspect of the model is in the correct, precise description of time series. Interpretation of time series relies on understanding the nature of the process that generated them. This standard provides the framework under which time series can be exchanged with appropriate metadata to allow correct machine interpretation and thus correct use for further analysis.”

COMP XML Standards Architecture
• WaterML 2.0 Part 1 - Timeseries
• North American Profile of ISO 19115 (ISO/TS 19139)
• Observations and Measurements 2.0 (ISO 19156)
• SWE Common Data Model 2.0
• Geography Markup Language 3.2 (ISO 19136)

TimeSeriesML
• WMO/NOAA and EC want WaterML 2.0 Part 1 rebranded
• IMD is participating in the OGC TimeSeriesML SWG

COMP Features

Available Today
• XML Data Exchange: a common XML exchange profile for time series data that is 100% compliant with OGC international standards, with no local extensions
• COMP Viewer: when you open an online COMP XML file in your browser, the Viewer tracks down all the external references and presents a complete picture of the metadata and data as an HTML report in the official language of your choice, with outlining for easy navigation
• COMP Data Point Downloads: the Viewer also offers download options that extract data values into tabular formats for consumption by your analytical software

Future Value-Added Tools
• GIS Mapping
• Data Visualization

The Anatomy of COMP

Common Observations and Measurements Profile
A common XML exchange profile for time series data that is 100% compliant with the OGC international standards:
• wml2: WaterML 2.0 Part 1 - Timeseries
• om: Observations & Measurements
• swe: Sensor Web Enablement Common Data Model
• gml: Geography Markup Language

COMP Logical Data Model
Provides a simple, stable, logical layer used for:
• User interfaces
• Data resource modularization

Creating COMP XML
Two methods:
• Data Input Templates: Excel Input Templates
• Legacy system ETL: FME (Feature Manipulation Engine) with XSLT transformations
[Diagram: COMP components: Collection, Monitoring Site, Observation, Data Points, Contact, Procedure, SKOS Taxonomies]

Example of COMP Use of SKOS Taxonomies
• COMP XML: this XML fragment references a name and a unit of measure in SKOS taxonomies
• SKOS Taxonomies (Simple Knowledge Organization System): define these terms in English and French
• COMP Viewer: looks up these SKOS references and resolves them in English and French (en-CA, fr-CA)
[Diagram: taxonomies shown: EC, ISO-NAP, NAtChem (Air Quality), Substance, Unit of Measure, WaterML2, Species, Bio-organism, Water Quality, Water Quantity, Meteorology, Ice Service, Wild Life Service, ?]

COMP in Action
COMP XML File: http://www.ec.gc.ca/data_donnees/comp
When the user clicks on the file, it asks the browser to render the XML using the COMP Viewer instead of its own default XSLT script (see the 2nd line of syntax).
[Diagram: components: COMP Viewer, COMP Files, XSLT, SKOS Taxonomies, Download Service, Data Point Extraction Scripts]
1 Browser uses COMP XSLT
2 Selecting a download option invokes the Download Service
4 The Download Service pipes the output back to the browser, invoking its standard file download facilities

On-the-fly Download Options

Domain Variables (Where, When?)
Domain variables fix a data point within the space and time extents of the time series, e.g. date, time, latitude, longitude, altitude, flight number, pass number. They make up the key of a data point.
Other vocabularies: this fits within the concept of dimensions in data warehousing.

Range Variables (What?)
Range variables are the observed properties recorded in a time series, such as the values for wind speed, temperature, pollution levels, specimen size, etc.
Other vocabularies: this approximates the concept of facts in data warehousing.

Future Options?
• JSON
• KML
• OGC XML
• WMO/NOAA XML

COMP Demo
• COMP Viewer
• Data Point Downloads

COMP Next Steps

Data Warehouse: XML-Relational Hybrid
• Indexed SQL tables pointing to XML CLOBs
• Query dimensions: temporal extent, spatial extent, sites, variables, techniques

API: WFS Service
• Query in; response is a COMP XML payload
• Audience: EC, GOC, international
• Built-in functionality: COMP Viewer, Data Point Downloads
• Query-specific Collections of COMP components are assembled on-the-fly for the API

What users want - What COMP offers

Users want: easy access to high quality metadata that allows them to evaluate the quality of the data and the techniques used to acquire and process it.
COMP offers: the COMP Viewer gives them a comprehensive view of all the relevant metadata in an easily navigated format in their preferred official language.

Users want: easy access to data points for analysis, in formats compatible with their analytical software.
COMP offers: the COMP Viewer provides instant access to on-demand generation of data point text files in a variety of formats. Additional formats can be supported on request, both text and XML.

Users want: easy access to functionality wherever the dataset is found: on a Web portal, through a Web service API, in Web accessible folders, or on FTP servers.
COMP offers: COMP XML files come with all this functionality built in, no matter where they are stored online.

Backup Slides
Just in case!
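As a supplementary sketch of the domain and range variables described under On-the-fly Download Options: the Python below treats the domain variables as the key of each data point and the range variables as its observed values, then writes a tab-separated table. The column names, the sample records and the output layout are illustrative assumptions; this is not the COMP Download Service or any of its actual download formats.

```python
import csv
import io

# Hypothetical variable names: the slides give examples such as date,
# latitude and wind speed, but do not prescribe column names.
DOMAIN_VARS = ["date", "latitude", "longitude"]   # where, when: the key
RANGE_VARS = ["wind_speed", "temperature"]        # what: observed values


def split_point(record):
    """Split one flat record into (key, values) by domain/range role."""
    key = tuple(record[name] for name in DOMAIN_VARS)
    values = {name: record[name] for name in RANGE_VARS}
    return key, values


def index_points(records):
    """Index records by their domain key, rejecting duplicate keys."""
    indexed = {}
    for record in records:
        key, values = split_point(record)
        if key in indexed:
            raise ValueError(f"duplicate data point key: {key}")
        indexed[key] = values
    return indexed


def to_tsv(indexed):
    """Write domain columns first, then range columns, tab-separated."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(DOMAIN_VARS + RANGE_VARS)
    for key in sorted(indexed):
        row = list(key) + [indexed[key][name] for name in RANGE_VARS]
        writer.writerow(row)
    return buffer.getvalue()


# Made-up sample records, for illustration only.
records = [
    {"date": "2014-09-02", "latitude": 45.4, "longitude": -75.7,
     "wind_speed": 12.0, "temperature": 18.5},
    {"date": "2014-09-01", "latitude": 45.4, "longitude": -75.7,
     "wind_speed": 9.5, "temperature": 21.0},
]
points = index_points(records)
tsv_text = to_tsv(points)
print(tsv_text)
```

Keeping the key separate from the values mirrors the dimension and fact split mentioned on the slide: two records with the same domain values would describe the same data point, so the sketch rejects them.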
COMP Logical Data Model
[Diagram: entity-relationship model of Contact, Collection, Procedure, Monitoring Site, Observation, Time Series, Variable (Domain and Range), Data Point and Qualifier. Cardinality labels shown include 0:1, 0:n, 1:1 and 1:n; the labels "hierarchy" and "related" (each 0:n) appear next to Monitoring Site.]

COMP Input Templates
• Taxonomy Template: generates SKOS Taxonomies
• Contact Template: generates Contacts
• Collection Template: Collection
• Monitoring Site Template: generates Monitoring Sites
• Procedure Template: generates Procedures
• Data Point Template: generates Data Points
[Diagram: arrows labelled "references" link the generated files, which also include Observation and Time Series. File types shown are .RDF and .XML for taxonomies and .XML elsewhere.]
Define once and then reference wherever needed

Data Points in Composite CSV Format

Data Points in Composite TSV Format

Data Points in Simple TSV Format

Data Points in Coverage TLV Format
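
The "define once and then reference wherever needed" idea in the logical data model can be illustrated with a minimal Python sketch. The class names follow the entity names in the diagram, but every field, the nesting, and the use of string identifiers as references are assumptions made for illustration; the cardinalities of the actual COMP model are not encoded here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Variable:
    name: str
    role: str  # "domain" or "range" (assumed encoding)


@dataclass
class DataPoint:
    key: Tuple                 # values of the domain variables
    values: Dict[str, float]   # values of the range variables
    qualifiers: List[str] = field(default_factory=list)


@dataclass
class TimeSeries:
    variables: List[Variable]
    points: List[DataPoint] = field(default_factory=list)

    def range_names(self):
        """Names of the variables that play the range role."""
        return [v.name for v in self.variables if v.role == "range"]


@dataclass
class Observation:
    site_id: str        # reference to a Monitoring Site defined once
    procedure_id: str   # reference to a Procedure defined once
    series: TimeSeries


@dataclass
class Collection:
    title: str
    observations: List[Observation] = field(default_factory=list)

    def point_count(self):
        """Total number of data points across all observations."""
        return sum(len(o.series.points) for o in self.observations)


# Hypothetical registries: each site and procedure is defined once and
# then referenced by identifier from any number of observations.
sites = {"site-1": "Example monitoring site"}
procedures = {"proc-1": "Example sampling procedure"}

series = TimeSeries(
    variables=[Variable("date", "domain"), Variable("temperature", "range")],
    points=[
        DataPoint(("2014-09-01",), {"temperature": 21.0}),
        DataPoint(("2014-09-02",), {"temperature": 18.5}, ["estimated"]),
    ],
)
collection = Collection(
    "Example collection",
    [Observation("site-1", "proc-1", series)],
)

first = collection.observations[0]
print(sites[first.site_id], procedures[first.procedure_id])
print(collection.point_count(), series.range_names())
```

Holding sites and procedures in separate registries means a change to a site description is made in one place, which is the benefit the template slide points to.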