GeoSciML: Enabling the Exchange of Geological Map Data

Bruce Simons
GeoScience Victoria, GPO Box 4440, Melbourne, Victoria, 3001, Australia
Bruce.Simons@dpi.vic.gov.au

Eric Boisvert
Geological Survey of Canada, 490 rue de la Couronne, Québec, G1K 9A9, Canada
eboisver@nrcan.gc.ca

Boyan Brodaric
Geological Survey of Canada, 615 Booth St, Ottawa, Ontario, K1A0E9, Canada
brodaric@nrcan.gc.ca

Simon Cox
CSIRO Exploration & Mining, ARRC, PO Box 1130, Bentley WA, 6102, Australia
Simon.Cox@csiro.au

Tim R. Duffy
British Geological Survey, Murchison House, W Mains Rd, Edinburgh, EH9 3LA, UK
trd@bgs.ac.uk

Bruce R. Johnson
U.S. Geological Survey, 954 National Center, Reston, VA 20192, U.S.A.
bjohnson@usgs.gov

John L. Laxton
British Geological Survey, Murchison House, W Mains Rd, Edinburgh, EH9 3LA, UK
jll@bgs.ac.uk

Steve Richard
Geological Survey of Arizona, 416 W. Congress St., #100, Tucson, Arizona, 85701, USA
steve.richard@azgs.az.gov

SUMMARY

The CGI data model working group has established an initial geology data model and XML-based exchange language to accommodate geological map data, referred to as GeoSciML. The language is based on prior work carried out at North American, European and Australian geological survey and research organisations. Unified Modelling Language (UML) has been used as a design aid for capturing the geological concepts and their properties. The UML model has then been converted to the GML-conformant GeoSciML. The design of GeoSciML meets the short-term goal of accommodating the geoscience information presented on geological maps, as well as being fully extensible to include the full range of geological concepts covered by the geosciences.

To demonstrate the ability of GeoSciML to deliver data via web feature services, a small subset has been selected as a testbed. This testbed will deliver lithostratigraphic units, boreholes, faults, contacts and compound materials from different national geological surveys.
Key words: data exchange, geology data modelling, GeoSciML, GML, Web Feature Services, XML.

AESC2006, Melbourne, Australia.

INTRODUCTION

The exchange of geoscientific information has traditionally been through hard copy media such as geological maps, reports and papers. The content and style of geological maps have often been left to the authors' or organisations' individual preferences or standards, with the success of the information transfer dependent on the skills of the user to 'interpret' the intent of the mapmaker. With the development of web-based data access interfaces, and increased requirements for machine-based data exchange by geoscientific agencies, this ability to interpret meaning is lost. Although standardised formats for geoscience data have long been seen as a desirable goal, the need for common models and encodings has assumed a greater practical significance for this machine-based data exchange.

The IUGS Commission for the Management and Application of Geoscience Information (CGI) established an initiative to develop a harmonised geoscience data model and exchange format based on GML (Geography Markup Language), referred to as GeoSciML (GeoScience Markup Language). These developments incorporate several novel aspects in the design of the data model and transfer format, as well as in the technical procedures involved in their creation.

Predecessor projects have strongly influenced the development of GeoSciML. These include multijurisdictional activities by the North American Geologic Map Data Model Steering Committee (2004) and Australian Government Geologists Information Policy Advisory Committee (2004), the CSIRO-led eXploration and Mining Markup Language (XMML) work (Cox, 2004) and individual agency work at the British Geological Survey (Sen and Duffy, 2005), GeoScience Victoria (Simons et al., 2005), and the BRGM.
Growing awareness of the overlap between these projects and the desire to minimise duplication led to agreement on the formation of a working group to move forward collaboratively on the development of a data model and transfer format under the auspices of the CGI.

GeoSciML accommodates the short-term goal of representing geoscience information associated with geological maps and observations, as well as being extensible in the long-term to other geoscience data. It is unique in its breadth of inputs and content as it draws from many national geoscience data model efforts. From these it establishes a common suite of feature types based on geological criteria (geological units, geological structures, fossils, geological relationships, earth materials, geological fabrics) or artefacts of geological investigations (specimens, sections, observations, measurements). Supporting objects, such as timescales and lexicons, are also included so that they can be used as classifiers for the primary objects.

The demonstration of the delivery of a subset of GeoSciML via Web Mapping Services (WMS) and Web Feature Services (WFS) has been undertaken by the CGI working group. This 'testbed' delivers an extensive suite of property information for lithostratigraphic and lithodemic geological units, faults, geological contacts and boreholes.
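
To make the idea of WFS delivery concrete, the following Python sketch builds the kind of GetFeature request a client might send to retrieve features inside a bounding box. It is an illustration under stated assumptions, not a description of the testbed services: the endpoint URL, the feature type name `gsml:MappedFeature` and the coordinates are hypothetical, and the parameter names follow the general OGC WFS 1.1.0 key-value-pair convention rather than anything specified in this paper.

```python
from urllib.parse import urlencode


def build_getfeature_url(endpoint, type_name, bbox, version="1.1.0"):
    """Build a WFS GetFeature URL (key-value-pair encoding) for one feature type.

    endpoint  -- base URL of the service (hypothetical in this sketch)
    type_name -- qualified feature type name (assumed, e.g. 'gsml:MappedFeature')
    bbox      -- (min_x, min_y, max_x, max_y); axis order depends on the
                 coordinate reference system agreed with the server
    """
    min_x, min_y, max_x, max_y = bbox
    if min_x >= max_x or min_y >= max_y:
        raise ValueError("bbox must be (min_x, min_y, max_x, max_y)")
    params = {
        "service": "WFS",
        "version": version,
        "request": "GetFeature",
        "typeName": type_name,
        "bbox": ",".join(str(value) for value in bbox),
    }
    # urlencode percent-escapes the ':' and ',' characters in the values.
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and extent, used only to show the call.
url = build_getfeature_url(
    "https://example.org/wfs",
    "gsml:MappedFeature",
    (144.0, -38.5, 146.0, -37.0),
)
print(url)
```

A server that supports the request would be expected to answer with a GML feature collection. The structure of that response depends on the application schema the service publishes, which this sketch does not attempt to model.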
METHOD AND RESULTS

The specific objectives of the CGI working group are to:

  develop a conceptual model of geoscientific information drawing on existing data models;
  implement an agreed subset of this model in an agreed schema language;
  implement an XML/GML encoding of the model subset;
  develop a testbed to illustrate the potential of the data model for interchange;
  identify areas that require standardised classifications in order to enable interchange.

Standards

In order to benefit from emerging geospatial web-service standards the focus is on GML-based XML data encodings for the transfer format. The modelling framework used for GeoSciML is based primarily on the Rules for Application Schema from ISO/TC 211 (ISO 19109:2004). The rules assert that geospatial information languages should be developed and governed within domain-specific communities. They also specify the term "feature" for a real-world object of interest. Features are classified into types on the basis of a characteristic set of properties. For example, a "GeologicUnit" is a feature type that has a rank, composition, morphology, outcrop character, colour, etc.

Complementing the ISO standards, GML has been developed as an XML encoding for geographic information. GML directly provides few concrete feature types, as these are intended to be created using the standard components in a domain-specific "GML Application Schema". GeoSciML is an example of one such schema.

Cox et al. (2004) established rules for converting models expressed in UML to GML-conformant XML. These rules are highly significant since they allow GML-compatible model development to take place in the intuitive graphical UML environment. They also ensure consistency of the XML schema derived from the UML.

The GeoSciML Model

Figure 1 shows a UML diagram of a summary version of GeoSciML to illustrate the framework within which the data model is being developed. Due to the complexity of the model we have not shown all the classes. The attributes of the various classes are also not shown. Four top-level GML classes are used as starting points:

1. AbstractFeature is the root of all classes representing real-world objects. These include GeologicFeatures, representing geological concepts (GeologicStructures, GeologicUnits), as well as artefacts of the evidence collection process (Site, Observation etc.) and artefacts of the geologic record (MappedFeatures).
2. AbstractGeometry is the GML object that describes the geometry of the features (e.g. point, line, polygon).
3. Metadata is the root of all classes dealing with metadata, including dataset metadata, mapped feature metadata and geological feature metadata.
4. Definition is the root of classes representing the reference systems, controlled vocabularies and dictionaries that constrain the values of the class properties.

Two additional classes specific to GeoSciML inherit directly from the top-level AbstractGML class:

5. GeologicPortrayal stores the model elements used to represent the selection and symbolization of MappedFeature AbstractGeometry instances, along with base map depiction and extents to draw a geologic map visualisation.
6. GeologicObject represents those geological concepts that can be described in their own right and may be properties of other geological concepts, but are not mappable features. Rocks (which are considered as types of CompoundMaterials) and fossils are two such objects.

A sample of the associations, and their roles, that exist between various classes is shown in the summary diagram (Figure 1) to illustrate the way that the model works. For example, a Fault can be made up of FaultSurfaces (role = faultSurface) that may have one or more Displacement attributes.

Figure 2. UML diagram of GeoSciML GeologicUnit and EarthMaterial classes, showing inheritance, associations and roles, and class attributes, with data types and cardinality.

The properties of geological units are shown in Figure 2. These are the subset of properties that have been chosen for testbed purposes. Like all classes in GeoSciML, geological units (GeologicUnit) inherit a name, description and GML id from AbstractGML. They also inherit an age and purpose from GeologicFeature, in addition to the GeologicUnit attributes bodyMorphology, outcropCharacter, genesis and exposureColour. Only lithostratigraphic and lithodemic units (types of LithologicUnits) are being considered as part of the testbed. Lithologic units have the additional attributes of rank, composition, weatheringCharacter, the presence of structures and metamorphicGrade. Lithostratigraphic units include additional attributes to account for specific bedding-related properties. Other types of GeologicUnits (such as Chronostratigraphic, Geomorphologic, Pedostratigraphic, Lithotectonic) are accommodated by the model.

The GeologicUnitPart class allows GeologicUnits to be made up of other GeologicUnits (for example 'Formations' with child 'Members'). The CompositionPart class allows GeologicUnits to be composed of CompoundMaterials, which in the model represent either rocks or unconsolidated material.

GeoSciML also includes an extensive suite of data types defined to accommodate the wide range of values that are assigned to geological observations. These are designed to accept single (e.g. fine grained) or range (e.g. fine grained to medium grained) values from controlled vocabularies, single or range numeric measurements with error values and units of measure, or combinations of these.

Testbed 2

The working group has established a testbed to demonstrate the delivery of geological map data via the web using Web Mapping Services (WMS), Web Feature Services (WFS) and GeoSciML. This follows on from demonstrations using XMML to exchange borehole data between the British and French geological surveys (see https://www.seegrid.csiro.au/twiki/bin/view/CGIModel/TestBed#CGI_Interoperability_Testbed_1) and the SEEGrid geochemistry demonstrator (Cox et al., 2005). The aim of Testbed 2 is to evaluate the ability of GeoSciML to deliver the rich and complex data used to generate geological maps from a variety of geological organisations via WMS and WFS. The geological surveys of Canada, USA, UK, Sweden, France and Arizona along with Geoscience Australia, GeoScience Victoria and CSIRO are participating in Testbed 2. Testbed 2 aims to deliver 4 use cases using GeoSciML:

Use Case 1: Client asks for a map showing geological units, faults, contacts and/or boreholes on a browser. Server returns a map with default symbolisation. User can click on any graphic feature from one layer to retrieve at least an HTML presentation of the attributes of that feature which is consistent with the CGI model. (Client can request other formats than HTML if server supports them.)

Use Case 2: Select mapped features by specifying a geographic bounding box and download the most specific information available for each mapped feature as GeoSciML.

Use Case 3: The user chooses to display mapped features representing geologic units, symbolized on the basis of age using the IUGS standard geologic age colour scheme, or on the basis of lithology using a CGI defined lithology colour scheme.

Use Case 4 (optional): Select a subset of geologic unit mapped features on the basis of age or lithology and highlight them with the same highlight colour.

Testbed 2 aims to deliver these four use cases using geological data from a variety of organisations and map scales as a demonstration by September 2006. The testbed results will be formally presented to the CGI during the IAMG conference in September 2006 with the intention of showcasing the results to seek broader geological community support for GeoSciML as the standard geological map data exchange language.

Controlled Vocabularies

GeoSciML is designed to provide the mechanism that will enable delivery of the geological and geographic information associated with geological maps. A current gap in GeoSciML is the lack of an agreed set of vocabularies for the data content, limiting the ability to deliver standard data content. Therefore, at present, GeoSciML provides a standard schema but does not define standard content within the schema. Future work involves developing agreed standards on data content, that is, controlled vocabularies, thesauri and dictionaries, by the international geological community. The CGI are establishing a separate collaborative effort to meet this need.

CONCLUSIONS

GeoSciML accommodates the short-term goal of representing geoscience information associated with geological maps and observations, as well as being extensible in the long-term to other geoscience data. It is unique in both the breadth of its inputs and content. This has been achieved by drawing on many local, national and international geoscience data model efforts, in conjunction with the work on international data exchange standards.

The working group has established a testbed to demonstrate the delivery of geological map data from a variety of national and state geological surveys using Web Mapping Services, Web Feature Services and GeoSciML. The success of this demonstrator will determine the future for XML-based geological data exchange languages.

REFERENCES

Cox, S.J.D., 2004, XMML – a standards conformant XML language for transfer of exploration data: Proceedings, ASEG/PESA Geophysical Conference and Exhibition, Sydney 2004.

Cox, S.J.D., Daisey, P.W., Lake, R., Portele, C. and Whiteside, A., 2004, Geography Markup Language (GML) 3.1.0, OpenGIS® Recommendation Paper, OGC document 03-105r1, xxi+580.

Cox, S.J.D., Dent, A., Esterle, J., Woodcock, R., Girvan, S., Mackey, T., Wyborn, L., Bandy, S., Ward, B., Hannant, T., Jenkins, G., Jolly, M., Atkinson, R. and Barrs, P., 2005, Standardized Web-access to Geoscience Datasets: the SEEGrid WFS Testbed. Proceedings of IAMG'05: GIS and Spatial Analysis, Vol. 2, 844–849.

Government Geologists Information Policy Advisory Committee, 2004, National Geological Data Model Version 1.0 Explanatory Notes. http://www.geoscience.gov.au/geoportal/standards.html

North American Geologic Map Data Model Steering Committee, 2004, NADM Conceptual Model 1.0 - A conceptual model for geologic map information: U.S. Geological Survey Open-File Report 2004-1334, 58 p., accessed online at URL http://pubs.usgs.gov/of/2004/1334. Also published as Geological Survey of Canada Open File 4737, 1 CD-ROM.

Sen, M. and Duffy, T., 2005, GeoSciML: Development of a generic GeoScience Markup Language. Computers & Geosciences, 31, 1095–1103.

Simons, B., Ritchie, A., Bibby, L., Callaway, G., Welch, S., and Miller, B., 2005, Designing and Building an Object-Relational Geoscientific Database using the North American Conceptual Geology Map Data Model (NADM-C1) from an Australian Perspective. Proceedings of IAMG'05: GIS and Spatial Analysis, Vol. 2, 929–934.

Figure 1. UML diagram showing the primary hierarchy of a selection of GeoSciML classes and relationship to base classes provided by GML, and a selection of associations between classes.
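
The inheritance chain described for geological units (AbstractGML, GeologicFeature, GeologicUnit, LithologicUnit) can be mirrored loosely in code. The Python sketch below is an illustration only and is not part of GeoSciML: attribute names are adapted from the text, all values are plain strings rather than the model's controlled-vocabulary data types, and the GeologicUnitPart association is simplified to a direct list of child units. The unit names and identifiers in the usage lines are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AbstractGML:
    # Properties every class inherits according to the text.
    gml_id: str
    name: str
    description: str = ""


@dataclass
class GeologicFeature(AbstractGML):
    age: Optional[str] = None
    purpose: Optional[str] = None


@dataclass
class GeologicUnit(GeologicFeature):
    body_morphology: Optional[str] = None
    outcrop_character: Optional[str] = None
    genesis: Optional[str] = None
    exposure_colour: Optional[str] = None
    # Simplification (assumption): child units are held directly instead of
    # through a separate GeologicUnitPart association class.
    parts: List["GeologicUnit"] = field(default_factory=list)

    def all_parts(self) -> List["GeologicUnit"]:
        """Return every nested child unit, depth first."""
        found = []
        for part in self.parts:
            found.append(part)
            found.extend(part.all_parts())
        return found


@dataclass
class LithologicUnit(GeologicUnit):
    rank: Optional[str] = None
    weathering_character: Optional[str] = None
    metamorphic_grade: Optional[str] = None


# Hypothetical formation with two child members.
formation = LithologicUnit(
    gml_id="unit.1",
    name="Example Formation",
    rank="formation",
    parts=[
        LithologicUnit(gml_id="unit.1a", name="Lower Member", rank="member"),
        LithologicUnit(gml_id="unit.1b", name="Upper Member", rank="member"),
    ],
)
print([part.name for part in formation.all_parts()])
```

Holding child units in a list keeps the sketch short. A fuller rendering would probably keep the association class so that properties of the part relationship itself, such as a role, have somewhere to live.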