Experiences of UML-to-GML Encoding

Roy Grønmo, Ida Solheim, David Skogan
SINTEF Telecom and Informatics
Forskningsveien 1, Pb 124 Blindern, N-0314 Oslo, Norway
{roy.gronmo | ida.solheim | david.skogan}@informatics.sintef.no

Abstract. This paper presents experiences gained from the development of an automatic conversion from a GI application schema to an XML exchange format. The application schema is expressed in the Unified Modelling Language (UML), and the chosen exchange format is the Geography Markup Language (GML) specified by the Open GIS Consortium (OGC). A set of conversion rules has been identified and implemented in a tool that reads UML class diagrams and writes corresponding GML code. A comprehensive cadastre model has constituted the test case. The work has been performed as part of the national GeNorway project.

1 Introduction

1.1 Two Prevailing Encoding Approaches for GI

In 2001, a controversial issue dominated the relationship between two standardisation bodies for geographic information, the Open GIS Consortium (OGC) and ISO/TC 211 (ISO). The disagreement can be summed up in the following double question: How should geographic information be encoded, and what should the exchange format look like?

According to ISO, the data provider and the data receiver are supposed to agree on a so-called application schema. An application schema is typically a set of UML class diagrams expressing the structure and content of the data to be exchanged. The standard ISO 19109 Rules for application schema (ISO, 2001b) prescribes how to make an application schema in UML. The standard ISO 19118 Encoding (ISO, 2001c) prescribes conversion rules for the translation from an application schema in UML into a corresponding XML Schema (W3C, 2002). Thereby ISO has created an XML format for encoding of geographic information.

On the other hand, OGC has developed another XML format for GI encoding, called Geography Markup Language (GML) (OGC, 2001).
GML plays a central role in OGC’s successful Web Mapping testbeds and Web Services specifications, and is being implemented in GI systems in several countries. GML is currently a competitor to ISO’s XML format.

1.2 The GeNorway Project

SINTEF has been involved in specifying ISO standards as well as implementing UML model-based tools in GI projects such as DISGIS (Grønmo et al., 2000) and JNIP (Grønmo and Skogan, 2001). These efforts are continued within the project GeNorway – Model-based infrastructure for living geospatial data in eNorway. GeNorway is an ongoing two-year project funded by the Norwegian Research Council and with the GIS vendor Norkart as project owner. The project will test the practicability of selected standards from ISO/TC 211 in an implementation of a Web Feature Server (WFS) according to OGC’s specification. Therefore, GeNorway has to develop solutions conforming to both ISO standards and OpenGIS specifications. Approach and preliminary results were presented at the ACM GIS 2001 in Atlanta (Grønmo, 2001).

A WFS is a Web service with predefined XML requests and responses. GML shall be used within WFS to represent the geographic models and instances to be communicated between a client and a server. Important questions in GeNorway have therefore been:

• Can GML be generated automatically from ISO-conform UML models?
• If so, does the generated GML code prove to be as expressive, compact and readable as hand-coded GML?
• If not, how and why does automatic GML generation fail?

Since neither UML nor GML is designed to match the other, UML-to-GML encoding is not trivial. This paper will start the discussion by identifying some design criteria for UML-to-GML encoding.

2 Design Criteria

Encoding a UML class diagram into an XML Schema can be done in a number of different ways. This topic is well covered by David Carlson (Carlson, 2001), who points out that the encoding strategy will vary depending on the problem scenario.
The GeNorway project has chosen a set of general design criteria for UML-to-GML encoding. These design criteria will be the basis for evaluation of the UML-to-GML conversion rules presented in the next section. For convenience, the term UML model is used for a UML class diagram, the term GML is used for GML 2.0, and the term ISO is used for ISO/TC 211. The design criteria are:

1. The UML models shall fulfil the rules specified by ISO 19103 Conceptual schema language (ISO, 2001a) and ISO 19109 Rules for application schema (ISO, 2001b).
2. The UML models shall be conceptual and neutral to implementation choices. The UML models shall not be modified to “fit” GML requirements. That means, the UML modeller shall not need to know anything about GML.
3. The generated GML schema shall be fully determined by the UML model. This eliminates the possibility of user configuration. The major advantage is that agreement on a UML application schema implies agreement on the to-be-generated GML application schema. A positive side effect is that the corresponding code generation tool will be easier to implement.
4. The generated GML schemas should exploit the constructs provided by XML Schema (inheritance, data types, xlinks, facets etc). This will make the GML schemas easy to read and understand.
5. The generated GML schemas and corresponding GML documents shall fulfil the GML specification.

Design criterion 1 is specific to ISO UML models. Criteria 2, 3 and 4 are all general UML-to-XML design criteria. Criterion 5 is specific to GML.

3 UML-to-GML Conversion Rules

A conversion rule transforms schema constructs of one schema language into schema constructs of another schema language. The UML-to-GML conversion rules must support both general UML constructs and ISO-specific constructs. Some of these constructs need nothing but trivial conversions, while others require non-trivial conversions.
UML class Road:

    classification : CharacterString
    number : CharacterString
    linearGeometry : GM_Curve

Converted GML schema type:

    <complexType name="RoadType">
      <complexContent>
        <extension base="gml:AbstractFeatureType">
          <sequence>
            <element name="classification" type="string"/>
            <element name="number" type="string"/>
            <element name="linearGeometry" type="gml:LineStringPropertyType"/>
          </sequence>
        </extension>
      </complexContent>
    </complexType>

Figure 1: Converting an ISO-conform UML class Road to a GML Schema RoadType

Figure 1 illustrates a trivial conversion from the UML constructs class, attribute and attribute type. A UML class Road is converted to an XML Schema complexType. The attributes within the UML class are converted to XML Schema elements within the corresponding complexType. The attribute types are converted as shown in Table 1. This table shows examples of trivial conversions of ISO-specific constructs:

• from ISO basic types (given in ISO 19103 Conceptual schema language) to XML Schema basic types, and
• from ISO geometry types (given in ISO 19107 Spatial Schema) to GML geometry types.

Table 1: Left part: ISO basic types converted to XML. Right part: ISO Spatial Schema types converted to GML

    Basic Type according     XML Schema       Spatial Type according    GML 2.0 Type
    to ISO 19103             Basic Type       to ISO 19107
    CharacterString          string           GM_Point                  PointPropertyType
    Integer                  integer          GM_Curve                  LineStringPropertyType
    Date                     date             GM_CompositeSurface       PolygonPropertyType
    Boolean                  boolean
    Real                     decimal

Tables 1 and 2 together present the complete set of conversion rules. Table 1 presents only trivial conversion rules. Table 2 presents some trivial and some non-trivial conversion rules. Non-trivial conversions are necessary for encoding inheritance, associations and order of attributes and associations. The non-trivial conversions are discussed further in the next section.

Table 2: UML constructs converted to GML 2.0

UML construct: Package
Conversion to GML 2.0:
    Packages are ignored.

UML construct: Class
Conversion to GML 2.0:
    Classes with stereotype Enumeration or CodeList are converted to:

        <simpleType name="UMLCLASSNAMEType">
          <restriction base="string">

    Classes that inherit from a superclass are converted to a complexType and an element declaration:

        <complexType name="UMLCLASSNAMEType">
          <complexContent>
            <extension base="UMLSUPERCLASSNAMEType">
              ...
              <attributeGroup ref="gml:AssociationAttributeGroup"/>
        ...
        <element name="UMLCLASSNAME" type="UMLCLASSNAMEType" substitutionGroup="UMLSUPERCLASSNAME"/>

    Classes that do not inherit from other classes and have at least one navigable association to another class are converted to a GML collection type and an element declaration:

        <complexType name="UMLCLASSNAMEType">
          <complexContent>
            <extension base="gml:AbstractFeatureCollectionBaseType">
        ...
        <element name="UMLCLASSNAME" type="UMLCLASSNAMEType" substitutionGroup="gml:_FeatureCollection"/>

    All other classes are converted to a GML feature type and an element declaration:

        <complexType name="UMLCLASSNAMEType">
          <complexContent>
            <extension base="gml:AbstractFeatureType">
        ...
        <element name="UMLCLASSNAME" type="UMLCLASSNAMEType" substitutionGroup="gml:_Feature"/>

    All classes that are abstract are converted to an abstract XML element type.

UML construct: Attribute
Conversion to GML 2.0:
    Attributes within classes of stereotype Enumeration or CodeList are converted to <enumeration value="UMLATTRIBUTENAME"/> within the <restriction base="string"> element within the <simpleType> of the corresponding class.
    Attributes within all other classes are converted to <element name="UMLATTRIBUTENAME" ...> within the <sequence> of the <complexType> of the corresponding class.

UML construct: Attribute type
Conversion to GML 2.0:
    Attribute types are ignored for attributes within classes stereotyped as Enumeration or CodeList.
    Attribute types that are identified as ISO/TC 211 basic types or ISO/TC 211 spatial types are converted according to Table 1.
    All other types are assumed to be user-defined types that appear as class names within the UML model. The converted type will be UMLATTRIBUTETYPEType. (If these types are not defined as class names within the UML model, the GML Schema will not be a legal XML Schema.)
    The resulting type, CONVERTED_UMLATTRIBUTETYPE, is inserted as the value of the type attribute within the <element> of the corresponding attribute: <element name="UMLATTRIBUTENAME" type="CONVERTED_UMLATTRIBUTETYPE" ...>

UML construct: Association
Conversion to GML 2.0:
    Composition, aggregation and association are treated the same way.
    Navigable UML class associations are converted to explicit GML featureAssociation types and an element declaration (two-way associations result in two type declarations and two element declarations):

        <complexType name="UMLCLASSNAME.ROLENAMEATTHEOTHERCLASSType">
          <complexContent>
            <restriction base="gml:FeatureAssociationType">
              <sequence minOccurs="0">
                <element ref="UMLOTHERCLASSNAME"/>
              </sequence>
              <attributeGroup ref="gml:AssociationAttributeGroup"/>
        ...
        <element name="UMLCLASSNAME.ROLENAMEATTHEOTHERCLASS" type="UMLCLASSNAME.ROLENAMEATTHEOTHERCLASSType" substitutionGroup="gml:featureMember"/>

    This explicit GML featureAssociation type will be part of the sequence of the UML class that has the navigable association:

        <complexType name="UMLCLASSNAMEType">
          <sequence>
            <element ref="UMLCLASSNAME.ROLENAMEATTHEOTHERCLASS">

    A navigable UML class association depends on an explicit role name on the “visible” side of the association (Figure 2).

UML construct: Cardinality
Conversion to GML 2.0:
    Attribute and association cardinalities are converted to values of the minOccurs and maxOccurs attributes within the corresponding <element>. Integer values are converted to themselves; * is converted to the XML Schema value unbounded.

UML construct: Inheritance
Conversion to GML 2.0:
    UML class inheritance is converted to XML element type inheritance by <extension> elements:

        <complexType name="UMLSUBCLASSNAMEType">
          <complexContent>
            <extension base="UMLSUPERCLASSNAMEType">

    Multiple inheritance is not supported.

UML construct: Other UML constructs (including operations)
Conversion to GML 2.0:
    Ignored.
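The attribute, attribute type and cardinality rules of Tables 1 and 2 can be sketched in a few lines of Python. This is only an illustration of the rules as stated above, not the GeNorway tool itself. The function names are hypothetical, the "gml:" prefix is taken from Figure 1, and two details are assumptions that the tables do not spell out: an attribute without a stated cardinality is treated as 1, and a bare * is treated as 0..*.

```python
# Mappings copied from Table 1 (the "gml:" prefix follows Figure 1).
ISO_BASIC_TO_XSD = {
    "CharacterString": "string",
    "Integer": "integer",
    "Date": "date",
    "Boolean": "boolean",
    "Real": "decimal",
}
ISO_SPATIAL_TO_GML = {
    "GM_Point": "gml:PointPropertyType",
    "GM_Curve": "gml:LineStringPropertyType",
    "GM_CompositeSurface": "gml:PolygonPropertyType",
}


def convert_type(uml_type):
    """Attribute type rule: use Table 1, otherwise assume a user-defined class."""
    if uml_type in ISO_BASIC_TO_XSD:
        return ISO_BASIC_TO_XSD[uml_type]
    if uml_type in ISO_SPATIAL_TO_GML:
        return ISO_SPATIAL_TO_GML[uml_type]
    # User-defined types are named after their UML class, with the suffix "Type".
    return uml_type + "Type"


def convert_cardinality(cardinality):
    """Cardinality rule: '1' or '0..*' becomes a (minOccurs, maxOccurs) pair."""
    lower, sep, upper = cardinality.partition("..")
    if not sep:
        # Assumption: a single value n means n..n, and a bare '*' means 0..*.
        lower, upper = ("0", "*") if lower == "*" else (lower, lower)
    return (lower, "unbounded" if upper == "*" else upper)


def attribute_element(name, uml_type, cardinality="1"):
    """Attribute rule: one <element> per UML attribute."""
    min_occurs, max_occurs = convert_cardinality(cardinality)
    occurs = ""
    # XML Schema defaults minOccurs and maxOccurs to 1, so they are
    # written out only when they differ from that default.
    if (min_occurs, max_occurs) != ("1", "1"):
        occurs = f' minOccurs="{min_occurs}" maxOccurs="{max_occurs}"'
    return f'<element name="{name}" type="{convert_type(uml_type)}"{occurs}/>'


# The Road class of Figure 1.
road = [
    ("classification", "CharacterString"),
    ("number", "CharacterString"),
    ("linearGeometry", "GM_Curve"),
]
for attr_name, attr_type in road:
    print(attribute_element(attr_name, attr_type))
```

Run on the Road class, the sketch prints the three element declarations shown in Figure 1. Because the output depends only on the input model, it also reflects design criterion 3: no user configuration is involved.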
[Figure 2 diagram: two example associations, Country to City and Shop to Items, each carrying the role name +visibleRoleSide at one association end.]

Figure 2: A navigable UML class association must have a role name on the "visible" side

4 Problems Encountered

The UML-to-GML conversion rules have been implemented in a code generation tool. This tool has been tested successfully with a comprehensive cadastre model spanning about a hundred classes. The cadastre model has been made conformant to the requirements of ISO 19103 Conceptual schema language and ISO 19109 Rules for application schema. The model makes extensive use of UML inheritance, associations and attributes, and should thus supply a good test case.

When applying the conversion rules to this model, we were able to satisfy the design criteria to a large extent. However, some problems were encountered, of which the most important are discussed in the following subsections.

4.1 Inheritance in GML

The two GML base types feature and featureCollection are incompatible with a general inheritance hierarchy. This problem is explained by looking at the concrete example shown in Figure 3. The example defines an inheritance structure in ISO UML. The question is how to compose an inheritance structure in the GML Schema.

[Figure 3 diagram: RealEstate (owner : String) is the superclass of Farm (hasAnimals : Boolean) and Building (numberOfFloors : Integer). Farm has an association with role name +farmField and cardinality 0..* to Field (vegetation : String).]

Figure 3: How to determine proper GML base types for the GML types corresponding to this inheritance hierarchy?

According to GML’s rules, all the types containing other types shall inherit from featureCollection. All other types shall inherit from feature. Thus GML’s rules dictate that Farm shall inherit from featureCollection, while RealEstate, Building and Field shall inherit from feature. On the other hand, design criterion 4 requires that an XML Schema maintain the same inheritance structure as the UML model, instead of copying attributes down from supertypes.
This fact implies that Farm and Building must inherit from RealEstate, while RealEstate (and Field) may be assigned a proper GML predefined supertype. The GML supertype chosen for RealEstate will indirectly be the supertype also for the subtypes Farm and Building. Generally speaking, a GML supertype can only be chosen for the root type of any inheritance hierarchy within the application schema. GML’s rules prescribe one XML Schema inheritance structure, and design criterion 4 prescribes another. The example above shows that they are in conflict with each other.

4.2 Multiple Inheritance in ISO UML

ISO UML allows multiple inheritance, whereas XML Schema does not. Supporting it would therefore require inherited UML attributes to be copied down to the subclasses. Such copying violates design criterion 4 by not exploiting the XML Schema inheritance construct, and thus makes the GML less readable. If the conversion rules do not support multiple inheritance within UML, they violate design criterion 1.

4.3 Ordered subelements in GML

There is no way to specify the order of UML attributes and associations, whereas the corresponding subelements within a GML schema will have a specified order. Thus, UML-to-GML encoding will supply subelements in an unpredictable order. This fact may reduce the readability of the generated GML (design criterion 4) and cause problems when regenerating GML from UML.

4.4 Global Declarations in GML

GML states that all element and type declarations must be defined globally within each application schema. The UML associations are converted to GML featureMember types and elements with names corresponding to the association role name. ISO UML prescribes that a UML association role name be unique within each class, not necessarily within the application schema. Two classes may therefore use the same role name, in which case the corresponding global GML declarations would clash unless the names are made unique.

4.5 Modelling of Value Domain Restrictions in ISO UML

ISO UML has no guidelines on how to model value domain restrictions, while XML Schema has built-in support for facets.
Facets are a powerful tool to restrict the value domain of simple XML elements. Examples of such value domain restrictions are that a string must have a length of eight characters, that a string must start with an alphanumeric character, or that only even numbers are legal integer values. UML itself does not provide any construct that corresponds to facets, and ISO UML has not defined how UML extension mechanisms can be used to support this. The generated schemas therefore cannot exploit facets, which is a violation of design criterion 4 of exploiting the constructs available in XML Schema.

4.6 Complicated Definition of Associations in GML

Associations are modelled by GML with the use of explicit featureMember elements (the “feature-property” model). These featureMember elements will then contain or refer to the elements that participate in the association. Carlson (2001) and ISO 19118 define associations directly by contained subelements. This makes for a simpler encoding from UML, and the GML becomes more readable. Defining explicit featureMember elements violates design criterion 4 by making the XML Schema unnecessarily complex.

5 Proposed changes to ISO UML and to GML

Changes to GML and ISO UML can solve almost all of the problems listed in the previous section. This paper proposes the following changes to ISO UML:

• Exclude multiple inheritance. This change will solve problem 4.2. Multiple inheritance has often been a major source of complexity and errors (Shan et al., 1993, Swaine, 1989, Madsen, 1995). This is why languages such as XML Schema and Java have chosen not to provide unrestricted multiple inheritance. ISO 19103 (ISO, 2001a) states that “Multiple inheritance shall be used at a minimum, because it tends to increase model complexity.”
• Define a way to express value domain restrictions corresponding to XML Schema facets. This will solve problem 4.5. A possible approach may be use of the Object Constraint Language (OCL) (Warmer and Kleppe, 1999).
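To make the facet proposal concrete, the following Python sketch shows the kind of simpleType declaration a facet-aware conversion rule might emit. It is a hypothetical illustration: the helper restricted_simple_type and the type name ParcelIdType are invented for this example, and the sketch assumes that the value domain restrictions have already been extracted from the UML model (for instance from OCL constraints) as pairs of facet name and value. Only length and pattern, which are standard XML Schema facets, are used.

```python
from xml.sax.saxutils import quoteattr


def restricted_simple_type(name, base, facets):
    """Build a simpleType that restricts `base` with (facet, value) pairs.

    Assumption: the facets were already derived from the UML model;
    no OCL parsing is attempted here.
    """
    lines = [
        f"<simpleType name={quoteattr(name)}>",
        f"  <restriction base={quoteattr(base)}>",
    ]
    for facet, value in facets:
        # Each facet becomes one child element of <restriction>.
        lines.append(f"    <{facet} value={quoteattr(str(value))}/>")
    lines += ["  </restriction>", "</simpleType>"]
    return "\n".join(lines)


# Two of the restrictions mentioned in section 4.5: a string of exactly
# eight characters that starts with an alphanumeric character.
schema_fragment = restricted_simple_type(
    "ParcelIdType",
    "string",
    [("length", 8), ("pattern", "[A-Za-z0-9].*")],
)
print(schema_fragment)
```

The design choice here is to keep the UML side free of XML Schema vocabulary: the modeller states a constraint, and only the conversion rule decides which facet expresses it, in line with design criterion 2.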
The proposed changes to GML are:

• Allow prefixing of associations by the corresponding ISO UML class name. This change will solve problem 4.4.
• Remove the featureCollection base type. Use the feature base type for all previously defined featureCollections. A feature containing subelements is implicitly a featureCollection, and there is no need to state this explicitly. This change will solve problem 4.1. The removal of featureCollection is no loss because: A featureCollection contains a set of general subelements, which is the only difference from a feature type. The possibility to contain general subelements will be lost if we remove featureCollection. But the containment has to be refined anyway to ensure that only the “correct” subelements are contained. Once this containment is refined, the general containment relation adds nothing.
• Remove the “feature-property” model. Compensate by letting the subelements appear directly as part elements. This change will solve problem 4.6. The change will make the GML encoding agree with Carlson (2001).

The remaining problem to solve is the order of attributes (4.3). Design criterion 2 asserts that the UML models shall be conceptual and neutral to implementation choices. Hence, the attribute order is irrelevant in UML, but it is relevant in XML Schema. Regeneration of an XML Schema may therefore reorder the attributes and thereby cause failure in data transfer. The problem may be overcome by some extension to the code generation tool, either user intervention or automatic interpretation of a “master” XML Schema containing the wanted attribute order.

6 Related Work

Two other recent works addressing the topic of UML-to-GML encoding are discussed below.

Patterns in GML (Galdos, 2002) is a draft submitted to OGC by Galdos Systems Inc. This company has been central in defining GML. The draft describes the intentions and models of GML and presents encoding rules from UML to GML.
However, there are no clear guidelines for modelling UML application schemas, and there is no relation to ISO UML modelling guidelines. (The latter fact violates our design criterion 1 of compliance with ISO standards.) The UML modelling presented by Galdos (op.cit.) is characterised as follows:

• It introduces UML extensions for XML Schema constructions such as complexType, simpleType and restriction.
• It requires that feature types inherit from the GML predefined base types featureCollection and feature, and that this inheritance structure is modelled explicitly.

The above items make it obvious that UML is used in an implementation-dependent way, and that a UML modeller must have good knowledge of GML (violation of our design criterion 2).

UML Model and Encoding Rules of GML2 (Portele, 2002) is a discussion paper (draft) submitted to OGC. This paper is intended to be input to the process of the ISO New Work Item Proposal for GML (ISO, 2002b) where Portele is the appointed leader. The paper shows how application schemas can be modelled according to ISO UML. This part of the work coincides to some degree with the work presented in this paper. Portele (op.cit.) introduces UML classes that correspond to the GML elements feature and featureCollection. This fact makes the UML model less accessible to non-GML experts and ties it to GML implementation (violation of our design criterion 2).

Neither of these two drafts is capable of separating the UML models from their implementations. The principle of implementation-neutral UML models has a considerable advantage when it comes to generating code to different implementations. GML may be today’s choice, but tomorrow other needs may arise, such as Java generation, CORBA IDL generation, service interface generation, and so on. GML-oriented UML models not only confuse people that are not GML experts, but also prevent usage of the same UML models for different purposes.
7 Conclusions and Future Work

In the GeNorway project we have elaborated a set of conversion rules and developed an automatic tool for translating UML class models to GML Schemas. The findings of this work can be summarised as follows:

• We have shown that it is possible to generate GML from conceptual and implementation-neutral UML models that comply with the modelling guidelines of ISO/TC 211. Thereby, we have applied the model-based approach of ISO 19118 Encoding and verified its practicability in the OGC/GML world.
• Our design criteria have to a large extent been satisfied. However, the work has disclosed some disadvantages of both ISO UML and GML when it comes to model conversion. Most of these can be eliminated by reasonable changes to ISO UML and GML.
• A valuable side effect of the UML-to-GML encoding tool is its inherent and automatic quality control of UML models. Thus, our test case consisting of a large and comprehensive cadastre model has been checked, corrected and refined as part of the encoding experiment.

It is important to point out that conceptual, implementation-neutral modelling does not require any knowledge of GML (or other specific implementation technologies). Consequently, when a new version of GML (or any other implementation platform or programming language) is adopted, there is no need to rewrite the model; only the appropriate conversion rules need to change.

It is worth checking if the findings of GeNorway can be related to work performed by the Object Management Group (OMG). There are two topics in OMG that are particularly relevant: (1) OMG has launched the idea of a model-driven architecture (MDA) (OMG, 2002a), which to a large extent coincides with ISO 19118 Encoding. OMG’s MDA has attracted attention and aroused interest around the world and also among OGC members. (2) XML Metadata Interchange (XMI) (OMG, 1999) is a cross-domain OMG standard for conversion between UML and XML.
The question of using XMI for GI encoding deserves further investigation.

The model-based approach used by the GeNorway project is applicable to more than data exchange formats. Also services can be defined in the same way. The authors of this paper believe that Web services, e.g. OGC’s WFS and WMS, can have their interfaces generated automatically from UML models. UML interface models, or operations in UML class diagrams, will probably serve this purpose. UML-specified interfaces should be translatable (“encodable”) into e.g. WFS specifications, or into corresponding interface specifications written in e.g. the Web Services Description Language (WSDL) (Ariba et al., 2001). This topic encourages further work.

8 References

Ariba, IBM and Microsoft (2001), Web Services Description Language (WSDL) 1.1, W3C Note: www.w3.org/TR/wsdl

Carlson, D. (2001), Modeling XML Applications with UML - Practical e-Business Applications, Addison Wesley.

Galdos (2002), Patterns in GML - Draft - www.galdosinc.com, 9th of January, 2002.

Grønmo, R. (2001), Supporting GI standards with a model-driven architecture. ACM GIS 2001, Atlanta, USA. 2001.

Grønmo, R., Berre, A.-J., Solheim, I., Hoff, H. and Lantz, K. (2000), DISGIS: An Interoperability Framework for GIS - Using the ISO/TC 211 Model-based Approach. Global Spatial Data Infrastructure (GSDI) 4, Cape Town, South Africa. 2000.

Grønmo, R. and Skogan, D. (2001), SINTEF Report: Joint Nordic test case using ISO/TC 211 standards, STF40 A01010. 2001.

ISO (2001a), Draft Technical Specification 19103, Geographic information - Conceptual schema language, ISO/TC 211 N 1082. 12th of July, 2001.

ISO (2001b), Final text of CD 19109, Geographic information - Rules for application schema, ISO/TC 211 N 1127. 19th of July, 2001.

ISO (2001c), Final text of CD 19118, Geographic information - Encoding, ISO/TC 211 N 1136. 9th of August, 2001.

ISO (2002a), ISO/TC 211 Geographic information/Geomatics: www.isotc211.org

ISO (2002b), New work item proposal: Geographic information - Geography Markup Language (GML), ISO/TC 211 N 1220. 8th of February, 2002.

Madsen, O. (1995), Open issues in object-oriented programming - A Scandinavian perspective, Software Practice & Experience, 25: 3-43 Suppl. 4 DEC 30 1995.

OGC (2001), Geography Markup Language (GML) 2.0, OGC Recommendation Paper, 01-029. February, 2001.

OGC (2002), Open GIS Consortium: www.opengis.org

OMG (1999), XML Metadata Interchange (XMI) Version 1.1, OMG Document ad/9910-02. October 25, 1999.

OMG (2002a), Object Management Group's Model Driven Architecture: www.omg.org/mda

OMG (2002b), Unified Modelling Language: www.uml.org

Portele, C. (2002), UML Model and Encoding Rules of GML2 - Discussion Paper (Draft), OpenGIS Project Document 02-005. 11th of January, 2002.

Shan, Y.-P., Cargill, T., Cox, B., Cook, W., Loomis, M. and Snyder, A. (1993), Is Multiple Inheritance Essential to OOP, ACM SIGPLAN NOTICES, 28 (10): 360-363 OCT 1993.

Swaine, M. (1989), Is Multiple Inheritance Necessary, Dr. Dobbs Journal, 14 (3): 107-& MAR 1989.

W3C (2002), XML Schema: www.w3.org/XML/Schema

Warmer, J. B. and Kleppe, A. G. (1999), The Object Constraint Language: Precise Modeling With UML, Addison-Wesley Pub Co.