GeoViQua: the quality challenges for GEOSS
YANG Xiaoyu, BLOWER Jon, CORNFORD Dan, LUSH Victoria, MASO Joan, ZABALA Alaitz, NÜST Daniel
Center of Research in Ecology and Forestry Applications (CREAF)
contact@geoviqua.org
QUAlity aware VIsualisation for the Global Earth Observation system of systems
www.geoviqua.org

The problem
• Is there quality information in the GCI?
  – There is some, in the form of ISO 19115 DQ elements and lineage
  – Not enough
• The GEOSS Common Infrastructure does not follow a global model for quality
• The GEOPortal search and results
  – are not ranked by quality
  – do not show quality indicators
• Common data viewers do not generally include quality information in parallel with the data

The aim
GeoViQua will provide a set of scientifically developed software components and services that facilitate the creation, search and visualization of quality information on EO data, integrated and validated in the GEOSS Common Infrastructure.
• GEO S&T Label
• Community building
• Pilot case studies

Time table
Requirements and Data Model phase finished.
[Figure: project timetable (Gantt chart) spanning months 1 to 36, from Start to Final document. Activities shown: Search & Visualization; Workshops; GeoLabel; Metadata extraction; Quality elicitation; Pilot cases; Best practices quality encoding; Direct extraction from continuous variables; Extraction from categorical variables; User feedback; Validation; Prototypes; Mobile Solutions; Data ready; Quality recommendations; Testing solutions; User & technical requirements to CoP; Proposals evaluation; User & technical solutions to CoP.]

Community Views on Data Quality
• Many researchers refer to the ‘famous five’ as the common criteria for evaluating spatial data quality: lineage, completeness, consistency, positional accuracy, and attribute accuracy.
• Broad scientific acceptance of the common spatial quality elements does not mean they suffice for every “fitness-for-use” evaluation: user requirements can go far beyond the widely accepted ‘famous five’.
• We used semi-structured telephone and face-to-face interviews with a variety of geospatial data users and experts from a number of countries and application domains.

What do users want?
• Users are exceedingly interested in good quality metadata records
  – And in information that can help to assess the fitness-for-use of the data
• Users find metadata records typically incomplete, with essential data omitted
  – This makes dataset discovery and selection more difficult
• Users are also interested in ‘soft’ knowledge about data quality
  – Data providers’ comments on the overall quality of a dataset, known data errors, potential data usage
  – Peers’ reviews and recommendations (users contact their peers to obtain suggestions)
  – Dataset provenance, citation and licensing information
    • Citation is often incomplete (no valid producer contact details), and licensing is often missing
    • Citation: users rely on data from producers with a good reputation
• Currently, some of these cannot be recorded in standard metadata
• Need to compare metadata records easily and systematically
  – Side-by-side visualisation of all metadata elements would allow geospatial datasets to be compared more effectively, especially when datasets are very similar and differences are hard to distinguish

Producer’s-consumer’s quality
• Producer’s quality metadata
  – In the producer’s metadata records
  – Encoded in the classical ISO 19115/19139
  – Some extensions required
  – Stored in the current catalogues (GEOSS Clearinghouse, etc.)
• Consumer’s quality metadata
  – In independent metadata repositories
  – Linked to producer’s metadata by id
  – Future component of the GCI?
  – Contains comments, “like it”, star ratings, etc.

The ISO classical view
[Figure labels: Quality indicators; Provenance/Lineage; Usage]

Add ‘soft’ knowledge to producer’s metadata
[Diagram: quality model. Labels shown: Metadata Packages; Dataset series; 0..*; Metadata; User Feedback; Discovered Issues; Publication; Lineage; Quality; Scope; Dataset; Subset of data; Feature Type; Data Quality; Universe of Discourse; Non-quantitative Quality Information; Metaquality; Quality Element; Quality Parameter (ISO 19157); Completeness; Positional Accuracy; Temporal Accuracy; Thematic Accuracy; Quality Indicator (ISO 19157); Omission; Missing Items; Commission; Number of Missing Items; Quantitative attribute accuracy; Logical Consistency; Classification correctness; Misclassification rate; Usability; Non-quantitative attribute correctness; Misclassification matrix; Quality Measure (ISO 19157, UncertML).]

Quality model is much more than positional accuracy
• There are many quantifiable aspects that can be recorded
  – Consistency, completeness, positional, thematic and temporal accuracy…
• There are many qualitative aspects that are needed
  – Lineage (traceability), scientific papers, user feedback, data usage…

GeoViQua Data model: statistical uncertainties
Value of the vertical DEM accuracy:

  <gmd:DQ_QuantitativeAttributeAccuracy>
    <gmd:result>
      <gmd:DQ_QuantitativeResult>
        <gmd:valueUnit>m</gmd:valueUnit>
        <gmd:value>
          <gco:Record>3.6</gco:Record>
        </gmd:value>
      </gmd:DQ_QuantitativeResult>
    </gmd:result>
  </gmd:DQ_QuantitativeAttributeAccuracy>

Explicit recognition that errors acceptably fit a Normal distribution with mean 1.2:

  <gmd:DQ_QuantitativeAttributeAccuracy>
    <gmd:result>
      <gmd:DQ_QuantitativeResult>
        <gmd:valueType>
          <gco:RecordType xlink:href="http://www.uncertml.org/distributions/normal">
          </gco:RecordType>
        </gmd:valueType>
        <gmd:valueUnit>m</gmd:valueUnit>
        <gmd:value>
          <gco:Record>
            <un:NormalDistribution>
              <un:mean>1.2</un:mean>
              <un:variance>3.6</un:variance>
            </un:NormalDistribution>
          </gco:Record>
        </gmd:value>
      </gmd:DQ_QuantitativeResult>
    </gmd:result>
  </gmd:DQ_QuantitativeAttributeAccuracy>

• An overall positive bias was observed
• A difficult feature to convey by traditional means

The need for a measure dictionary
• Current quality measure names in the GCI
  – Nothing to do with the ISO 19138 list of possible measures
  – Not well defined
[Chart: quality measure names currently found in the GCI, each with a numeric value. Names shown: Absolute external positional accuracy; Anweisung Straßeninformationsbank (Bundes…; Codelist; omission completeness; Feature represented as a single object; horizontal; Horizontal Positional Accuracy; Lagegenauigkeit; Latitude Resolution; Longitude Resolution; Mean value of positional uncertainties (2D); Overlapping polygon; Quantitative Attribute Accuracy Assessment; Rate of missing items; Sach- und Geodatenüberprüfung; Temporal Resolution; Überprüfung der Toplogie; Valid code Test; Vertical Positional Accuracy; Vertical Resolution; vertikal; Vollständigkeit. Values shown, in the order extracted: 2, 1, 2, 198, 2, 3146, 3265, 3, 3437, 3350, 3, 2, 255, 87, 7, 2870, 2, 2, 1826, 812, 348, 4.]

Data Quality Measure Dictionary
• Some quality indicators are used, but the name and description of the measure used to derive the indicator are rarely well described.
• Problems can occur due to the lack of semantic definitions of quality measures.
  – “uncertainty at 90% significance level” ??
• A Quality Measure Dictionary is proposed that includes:
  – vocabularies for quality measures
  – associated semantic annotations
  – integration of UncertML concepts and vocabularies
• Composed of quality measures provided by
  – ISO 19138, ISO 19157
  – UncertML
[Diagram: structure of a dictionary entry. Labels shown: Quality Measure ID (ID="", Name="", Alias=""); Description; Definition; Quality element; Basic measure; Value type; UncertML Dictionary; Value structure; Parameter; Example use; Source reference; UncertML representation (URI=""); URI.]
• Each measure has a unique ID
  – and records the quality element, value type, quality basic measure, description, example use, etc.
• “uncertainty at 90% significance level” can be annotated using the UncertML vocabulary term “ConfidenceInterval” (URI: http://www.uncertml.org/statistics/confidence-interval)

  <un:ConfidenceInterval xmlns:un="http://www.uncertml.org/2.0">
    <un:lower level="0.05">
      <un:values>3.14</un:values>
    </un:lower>
    <un:upper level="0.95">
      <un:values>6.28</un:values>
    </un:upper>
  </un:ConfidenceInterval>

Quality Metadata Levels
Hierarchy: Multiseries > Series > Sheet or Scene > Dataset (raster or feature instance)
• Level: Multiseries
  – Positional accuracy: 2.5 m
  – Content date: 2009-2010
• Level: theme=contour line
  – Overwrite positional accuracy: 1.5 m
• Level: sheet=73-30
  – Overwrite content date: October 2009
• Level: dataset (theme=contour line, sheet=73-30)
  – Positional accuracy: 1.5 m
  – Content date: October 2009

GEOSS common infrastructure
[Diagram: GEOSS Common Infrastructure. Labels shown: Main GEO Web Site; Registered Community Resources; Client Tier; Registries; GEO Web Portals; Community Portals; Client Applications; Components & Services; Standards and Interoperability; Best Practices Wiki; Business Process Tier; GEOSS Clearinghouse; User Requirements; Community Catalogues; Workflow Management; Alert Servers; Processing Servers; Access Tier; GEONETCast; Product Access Servers; Sensor Web Servers; Model Access Servers.]

Before GEOSS
[Diagram. Labels shown: Capacity; Resource; User; SBA (Disasters, Health, Energy, Climate, Water, Weather, Ecosystems, Agriculture, Biodiversity); Business Process Tier; Capacity Catalogues; Access Tier; Product Access Servers; Model Access Servers; Sensor Web Servers; GEONETCast.]

How GEOSS worked yesterday
[Diagram. Labels shown: Capacity; Resource; User; SBA (Disasters, Health, Energy, Climate, Water, Weather, Ecosystems, Agriculture, Biodiversity); Business Process Tier; Components & Services Registry; Capacity Catalogues; Access Tier; Product Access Servers; Model Access Servers; Sensor Web Servers; GEONETCast; GEOSS Clearinghouse Catalogue; DB; GEO Web Portal; GEOSS Common Infrastructure.]

How GEOSS is going to work
[Diagram. Labels shown: Capacity; Resource; User; SBA (Disasters, Health, Energy, Climate, Water, Weather, Ecosystems, Agriculture, Biodiversity); Business Process Tier; Community Catalogues; Capacity Catalogues; Access Tier; Product Access Servers; Model Access Servers; Sensor Web Servers; GEONETCast; Components & Services Registry; GEOSS Clearinghouse Catalogue; EuroGEOSS Broker; DB; GEO Web Portal; GEOSS Common Infrastructure.]

How GEOSS is going to work
[Diagram: same components as the previous slide, with the EuroGEOSS Broker label shown twice.]

GeoViQua quality model
[Diagram: EuroGEOSS Broker model and GeoViQua Model. Labels shown: Metadata Packages; Dataset series; 0..*; Metadata; Comments/Peer Review; Discovered Issues; Publication; Lineage; Quality; Scope; Dataset; Subset of data; Feature types; Data Quality; Product Specification; Rules; Quality requirements; Universe of Discourse (i.e. Reality); Non-quantitative Quality Information; Metaquality; Quality Element; Quality Parameter (ISO 19113); Completeness; Positional Accuracy; Temporal Accuracy; Thematic Accuracy; Quality Indicator (ISO 19113); Omission; Missing Items; Commission; Number of Missing Items; Quantitative attribute accuracy; Logical Consistency; Classification correctness; Misclassification rate; Usability; Non-quantitative attribute correctness; Misclassification matrix; Quality measure (ISO 19114/ISO 19138, UncertML).]

Quality in GEOSS
[Diagram: the "How GEOSS is going to work" architecture with an added component labelled "Enhanced geo-search tools". Other labels shown: Capacity; Resource; User; SBA (Disasters, Health, Energy, Climate, Water, Weather, Ecosystems, Agriculture, Biodiversity); Business Process Tier; Components & Services Registry; Community Catalogues; Capacity Catalogues; Access Tier; Product Access Servers; Model Access Servers; Sensor Web Servers; GEONETCast; GEOSS Clearinghouse Catalogue; EuroGEOSS Broker; DB; GEO Web Portal; GEOSS Common Infrastructure.]

Including data quality in search
Enhanced geo-search tools
• SELECT * FROM GEOSS_GCI WHERE positional_accuracy < 20 AND classification_correctness > 90 -- percent
Devillers R, Bédard Y, Jeansoulin R (2005) Multidimensional Management of Geospatial Data Quality Information for its Dynamic Use Within GIS

Consumer’s data quality
• More informal
• Based on social network patterns
  – Comments
  – Linked data
  – Like it
  – Star ratings
• More dynamic
• Need for an encoding
• Need for an independent repository

GEOSSBack
http://www.ogc.uab.cat/GEOSSBack
• Just a prototype to play with and demonstrate a concept.

Producer’s+consumer’s GeoViQua Broker
[Diagram: "cmp GeoViQua Components Agreed So Far". Labels shown: CSW; Clearinghouse; WMS; SOS-Q + SensorML; Capacity Catalogues; Sensor Registry; Q; SOS-Q + SensorML; EuroGEOSS Discovery broker; Q; CSW; CSW-Q; GeoViQua Broker; CSW; WAF; Metadata Import tool; HDF; netCDF; others...; FeedBack Server.]

Quality Metadata comparison

Conclusions
• After user interviews
• Producer’s quality model
  – GeoViQua quality model is based on ISO
  – With extensions for ‘soft’ knowledge
  – Inclusion of UncertML
  – Quality measure dictionary
• Consumer’s quality model
  – Based on social network patterns
  – Encoded independently (from producers)
• Linked by the GeoViQua broker (extension/complement of the EuroGEOSS broker)

GEOLabel
Task performed in collaboration with the EGIDA FP7 project and the GEO task ST-09-02.
• What is it?
  – The GEO Label is intended to “assist the user to assess the scientific relevance, quality, acceptance and societal needs of the components” (ST-09-02 Task Team, 2010).
• Purposes?
  – be a quality indicator for GEOSS geospatial data and datasets
    • Problem: usability depends on the data application; there is no defined threshold.
  – improve user recognition and trust in validated datasets
    • Problem: who is going to certify this?
  – assist in searching by providing users with visual clues of dataset quality and relevance
  – provide accreditation, provenance, monitoring
  – increase visibility of EO data
  – emphasize open access and easy availability
• Possible shape?
  – Certification label
  – A formal way to present
    • quality indicators
    • provenance
    • attribution

GEOLabel
• Until the end of this week
• Publicly available on the web
• We encourage you to participate!

Please participate in the questionnaire: http://geolabel.questionpro.com
Just a couple of days left!
Thanks
Joan.Maso@uab.cat (CREAF)
How GEOSS is going to work
[Diagram: the GEOSS architecture with added components labelled "Quality aware visualisation tools" and "Quality Access Broker". Other labels shown: Capacity; Resource; User; SBA (Disasters, Health, Energy, Climate, Water, Weather, Ecosystems, Agriculture, Biodiversity); Business Process Tier; Components & Services Registry; Community Catalogues; Capacity Catalogues; Access Tier; Product Access Servers; Model Access Servers; Sensor Web Servers; GEONETCast; GEOSS Clearinghouse Catalogue; EuroGEOSS Broker; DB; GEO Web Portal; GEOSS Common Infrastructure.]

Quality map visualization
Quality aware visualisation tools
• Express data quality using maps
• Dark color represents poor quality and light color good quality
Blackmond Laskey K, Wright EJ, da Costa PCG (2009) Envisioning uncertainty in geospatial information
Devillers R, Bédard Y, Jeansoulin R (2005) Multidimensional Management of Geospatial Data Quality Information for its Dynamic Use Within GIS

Quality map visualization
Quality aware visualisation tools
• 3D representations
  – Representation of the estimated water balance surplus/deficit and its uncertainty (using bars above and below the surface)
• Map representations have some problems
  – They make the visualization more complicated and difficult to understand
  – They draw attention to the more uncertain objects!
Pang A (2001) Visualizing Uncertainty in Geo-spatial Data
MacEachren AM, Robinson A, Hopper S, Gardner S, Murray R, Gahegan M, Hetzler E (2005) Visualizing Geospatial Information Uncertainty: What We Know and What We Need to Know

Pilot Case scenarios
• Agriculture
• Global Carbon
• Air Quality
Based on many user stories from across the GEOSS SBAs.
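
Sketch: quality as grey level
The convention from the "Quality map visualization" slides (dark color for poor quality, light color for good quality) can be illustrated with a few lines of Python. This is only a minimal sketch under stated assumptions: the function names `quality_to_grey` and `quality_map`, the 0 to 1 score range and the 8-bit grey scale are hypothetical choices made for illustration and are not part of any GeoViQua component.

```python
def quality_to_grey(score: float) -> int:
    """Map a quality score in [0, 1] to an 8-bit grey level.

    Assumption for this sketch: 0 means worst quality (black, 0)
    and 1 means best quality (white, 255).
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    return round(score * 255)


def quality_map(scores):
    """Convert a 2D grid (list of rows) of quality scores to grey levels."""
    return [[quality_to_grey(s) for s in row] for row in scores]


if __name__ == "__main__":
    # Hypothetical per-cell quality scores for a tiny 2 x 2 raster.
    grid = [[0.0, 0.25], [0.75, 1.0]]
    for row in quality_map(grid):
        print(row)
```

A per-cell grey layer of this kind could be drawn beside the data layer rather than on top of it, which is one way to limit the problem noted on the slides that map representations draw attention to the more uncertain objects.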