Harmonized Model Management Group Recommendation Paper
ISO TC 211 Best Practices

Datatypes, Interfaces and Types
John R. Herring, US/HMMG

1 Introduction

Three of the most commonly used stereotypes in the ISO TC 211 Harmonized Model are DataType, Interface, and Type. They represent, in different ways, the two sides of an object: datatypes represent state, interfaces represent behavior, and types are interfaces with a canonical partial state structure. The definitions and descriptions culled from the UML specifications (both UML 1.x and UML 2.0) are:

«DataType»
A data type is a type whose values have no identity (i.e., they are pure values). … In the metamodel, a DataType defines a special kind of Classifier in which Operations are all pure functions (i.e., they can return DataValues but they cannot change DataValues, because they have no identity). … A Primitive defines a predefined DataType, without any relevant UML substructure (i.e., it has no UML parts). A primitive datatype may have a logical algebra of operations and constraints defined outside of UML.

«Interface»
[A] named set of operations that characterize the behavior of an element. … In the metamodel, an Interface contains a set of Operations that together define a service offered by a Classifier realizing the Interface. A Classifier may offer several services, which means that it may realize several Interfaces, and several Classifiers may realize the same Interface. … Interfaces may not have Attributes, Associations, or Methods. An Interface may participate in an Association provided the Interface cannot see the Association; that is, a Classifier (other than an Interface) may have an Association to an Interface that is navigable from the Classifier but not from the Interface. … All [operations] defined in an Interface are public.

«Type»
Specifies a domain of objects together with the operations applicable to the objects, without defining the physical implementation of those objects.
A type may not contain any methods, maintain its own thread of control, or be nested. However, it may have attributes and associations. The associations of a Type are defined solely for the purpose of specifying the behavior of the type's operations and do not represent the implementation of state data.

Thus, an interface defines purely functional behavior, consisting of operations and their signatures. A data type defines an immutable value, usually represented in a programming language by a data structure consisting directly of primitives or other data types. A type (or abstract type) consists of operations, along with a logical data structure that defines some of the behavioral elements of the concrete classes that implement (realize) it. In the extreme case, the data members of a type can be sufficient to create a generic type constructor (or initializer) and a generic type query interface that should be implementable by all concrete classes that realize it. This logical design pattern for the three kinds of classifiers is:

[Figure 1-1: Common Interface, Type, DataType and Object Pattern. Elements shown: «interface» X-interface; «type» X-type, with operations +getData() : X-dataType and +setData(in data : X-dataType) : bool; «datatype» X-dataType, labeled as the State; and the implementation class X.]

17 February 2016 Created by Dr. John R. Herring Page 1 of 7

NOTE – A Word on Equivalences of Models: In defining a mechanism for modeling, ISO TC 211 must be careful not to let the peculiarities of particular tools’ implementations or interpretations of UML, or UML, XML or Java constraints that are not based on inescapable logic, bias the way in which models are built. For example, the models in the ISO 191xx standards use multiple inheritance even though Java and XML do not support it, and most self-proclaimed object experts claim (based on implementation difficulties usually tied to a particular language or programming environment) that it has ‘insurmountable’ problems.
Java’s use of interfaces and XML’s use of choice blocks are both alternative, albeit partial, solutions to implementing it, and both bypass the problems usually attributed to the practice.

2 Data Type

Data types are inherently transient unless contained inside a persistent object, and essentially abstract, because most if not all object management solutions require some form of identity based on logical or physical storage location. The transient nature of datatypes is due to their lack of identity. Since they cannot be identified, they cannot be stored unless they are placed in an identifiable container (an object of another class). For the same reason, only their container can point to them, because the container is the only object that knows where the value is stored (because of encapsulation). The closest thing to a data type in programming is the value of a C structure: a collection of named primitives and other C structures. The C structure itself is a simple transient container, identified by its memory address or variable name, which can hold the value of a data type.

Data types are abstract in the sense that they can essentially never be stored unless they are in a container belonging to some object, either as an attribute of an object class or as a member of a strong aggregation role of that class. Since data types are not identifiable except by their value, that value must have a container whose value is changeable and expressible as the data type.

In most programming languages, the only data types are the primitives built into the language, such as Integer, Real, String, and Boolean. If we declare a variable to be of such a type, as in “X : Integer”, then only a transient local slot is created that can contain an integer value.
Further, if the routine is recursive (can call itself directly or indirectly), then each activation of that routine on the stack has its own data type slot “X”, which can hold a different value. In essence, X is not an Integer but an Integer container (a transient one at that). The closest we get to pure data types are the temporary constants created in expressions such as “X = 1 + 3”. Literally, inside the machine, the 1 and the 3 are created in machine registers, a ‘4’ is created as the output of an arithmetic computation, and that value is then used to modify the ‘object’ X.

Every object has an associated data type, consisting of the information stored internally in the object (except for the identity of the object, which is not considered part of the object’s value). In most programming languages, expressions such as “A = B” transfer the data type information from one object (B) to another (A), going through whatever casting operations (transformations between alternative representations) are needed. Using the pattern in Figure 1-1, this is equivalent to A.X-type::setData(B.X-type::getData()). If the two sequences of assignments (A = B, B = A) and (B = A, A = B) always get you back to where you started, then the casting operations from A to B and from B to A are inverses of each other (each round trip is the identity), and the two classes have equivalent data types.

This sort of abstract constructor/initializer process is key to ISO 19118: Encoding. For example, the XML produced by an encoder is essentially the content of the type’s associated data type. The source and target systems are assumed to have equivalent, but not equal, object types for the encoded XML. In short, the following has to work:

System1::X --encode--> XML::X --decode--> System2::X

That does not say that the two object classes involved are the same, since their behavior for other operations and their internal data structures can be radically different. In practice, however, they usually do not differ in behavior.
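The Figure 1-1 pattern, the A = B equivalence, and the encode/decode round trip described above can be illustrated with a minimal Python sketch. It is an analogy under stated assumptions, not a TC 211 encoding: a frozen dataclass stands in for a «DataType» (immutable and compared by value; Python objects still have an id(), so “no identity” is only approximated by never relying on it), an abstract base class stands in for the «Type» X-type with getData/setData, and two implementation classes store the same state differently. All names (PointData, PointType, TuplePoint, DictPoint, assign, encode, decode) and the space-separated text used in place of XML are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


# «DataType» stand-in: immutable and compared by value, so any two equal
# PointData instances are interchangeable.
@dataclass(frozen=True)
class PointData:
    x: float
    y: float


# «Type» stand-in (the X-type of Figure 1-1): operations plus a canonical
# state exposed only as a PointData value. It cannot be instantiated.
class PointType(ABC):
    @abstractmethod
    def get_data(self) -> PointData: ...

    @abstractmethod
    def set_data(self, data: PointData) -> bool: ...


# Hypothetical implementation class 1: stores the state as a tuple.
class TuplePoint(PointType):
    def __init__(self) -> None:
        self._xy = (0.0, 0.0)

    def get_data(self) -> PointData:
        return PointData(*self._xy)

    def set_data(self, data: PointData) -> bool:
        self._xy = (data.x, data.y)
        return True


# Hypothetical implementation class 2: same data type, different storage.
class DictPoint(PointType):
    def __init__(self) -> None:
        self._fields = {"x": 0.0, "y": 0.0}

    def get_data(self) -> PointData:
        return PointData(self._fields["x"], self._fields["y"])

    def set_data(self, data: PointData) -> bool:
        self._fields = {"x": data.x, "y": data.y}
        return True


def assign(target: PointType, source: PointType) -> None:
    """A = B, written as A.setData(B.getData())."""
    target.set_data(source.get_data())


# Hypothetical external form standing in for the ISO 19118 XML encoding.
def encode(obj: PointType) -> str:
    data = obj.get_data()
    return f"{data.x!r} {data.y!r}"


def decode(text: str, target: PointType) -> PointType:
    x, y = (float(token) for token in text.split())
    target.set_data(PointData(x, y))
    return target


# System1::X --encode--> text --decode--> System2::X
a = TuplePoint()
a.set_data(PointData(1.5, -2.0))
b = decode(encode(a), DictPoint())
print(b.get_data() == a.get_data())  # True: equivalent, not identical, objects
```

The two implementation classes share nothing but the data type exchanged through get_data/set_data, which is the sense in which they are “equivalent but not equal”.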
A data type comes with certain inherent semantics, defined by its operations, attributes and constraints. For example, the positive integers can be defined by a set of axioms called Peano’s axioms, which, combined with additional axioms describing subtraction, give the full arithmetic of the integers. Any representations that satisfy these axioms are mathematically equivalent. The most common example in computer science is the difference between ones’-complement and twos’-complement integers, which are equivalent (in their common domain) but incompatible representations of the integers.

Because it lacks identity, a datatype can participate only in a strong aggregation (composition). Because of a UML limitation, that aggregation cannot be navigable backward, from the datatype to its container. ISO TC 211 has made an exception to this last rule: a datatype may have an outward-pointing association if one of the following two conditions is true:

1. The target of the association is another datatype, and the association is a strong aggregation.
2. The target of the association is a well-known immutable object for which a universally recognized identifier exists.

In case 1, the structure is equivalent to a member attribute of the data type. In case 2, the structure is equivalent to the datatype having a member attribute whose value is the identity of the target. For example, in ISO 19107 the datatype DirectPosition has an association to SC_CRS. This is equivalent to DirectPosition having an attribute of type CharacterString (or a namespace-enhanced name such as GenericName from ISO 19103, or RS_Identifier from ISO 19115) to hold the identity of the coordinate reference system. Technically this violates UML rules, but it does not violate their intent. Figure 2-1 shows the two alternatives. The first is taken directly from ISO 19107, and the second is the fully UML-compliant equivalent model.
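Case 2 can be sketched in Python along the lines of the second, UML-compliant alternative (DirectPosition2): the coordinate reference system is held as an identifier value, and following the former association becomes a lookup. This is an illustration under stated assumptions only: a plain str stands in for GenericName, CRS and REGISTRY are hypothetical stand-ins for well-known immutable SC_CRS instances (one entry, EPSG:4326 for WGS 84, is shown), and resolve_crs is an invented helper, not part of ISO 19107 or any library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CRS:
    """Hypothetical stand-in for a well-known, immutable SC_CRS instance."""
    identifier: str
    name: str


# Hypothetical registry; a real system would consult an authority such as EPSG.
REGISTRY = {"EPSG:4326": CRS("EPSG:4326", "WGS 84")}


@dataclass(frozen=True)
class DirectPosition2:
    # coordinate: Number [1..* ordered]
    coordinate: tuple[float, ...]
    # coordinateReferenceSystem: GenericName [0..1]; a str stands in here
    coordinate_reference_system: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.coordinate:
            raise ValueError("coordinate needs at least one ordinate")

    @property
    def dimension(self) -> int:
        """Derived attribute /dimension: the number of ordinates."""
        return len(self.coordinate)


def resolve_crs(pos: DirectPosition2) -> Optional[CRS]:
    """Follow the former association by looking up the stored identifier."""
    if pos.coordinate_reference_system is None:
        return None
    return REGISTRY[pos.coordinate_reference_system]


p = DirectPosition2((45.0, -93.0), "EPSG:4326")
print(p.dimension, resolve_crs(p).name)  # 2 WGS 84
```

Because the identifier is just a value, two positions naming the same CRS compare equal, as a «DataType» should. The cost is that the identifier must remain valid, which is why the exception is limited to immutable, well-known targets.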
The diagrams are from a model drawn in Enterprise Architect, which uses UML 1.4 notation not supported in Rational Rose. This is not a significant issue, since the current ISO TC 211 method for using Rational Rose has equivalent characteristics.

[Figure 2-1 DirectPosition example: Use of Associations by datatypes. First alternative (ISO 19107): «DataType» DirectPosition (+coordinate: Sequence<Number>, +/dimension: Integer) with association role +coordinateReferenceSystem [0..1] to the abstract class SC_CRS; the opposite role +directPosition has multiplicity 0..*; also shown are RS_ReferenceSystem and the attributes kindCode: SC_KindCode and remarks: CharacterString. Second alternative (fully UML-compliant): «DataType» DirectPosition2 (+coordinate: Number [1..* ordered], +/dimension: Integer, +coordinateReferenceSystem: GenericName [0..1]).]

In general, if a datatype in the ISO 191xx documents has an association role named “referenceToB” pointing to a type “B”, then it should be replaceable by an attribute “referenceToB” of type CharacterString (or a similar type) that contains the identity of a logically immutable instance of type “B”.

3 Interface

The UML specification makes several statements about interfaces that describe their nature and use. Some of them are:

An interface is only a collection of operations with a name. It cannot be directly instantiated.
The purpose of an interface is to collect a set of operations that constitute a coherent service offered by classifiers. Interfaces provide a way to partition and characterize groups of operations.
An interface does not imply any internal structure of the realizing classifier. For example, it does not define which algorithm to use for realizing an operation.
Several classifiers may realize the same interface. The relationship between interface and class is not necessarily one-to-one; a class may offer several interfaces and one interface may be offered by more than one class.
The same operation may be defined in multiple interfaces that a class supports; if their specifications are identical, then there is no conflict; otherwise, the model is ill formed. Moreover, a class may contain additional operations besides those found in its interfaces.
[A] classifier offering the interface must provide not only the operations declared in the interface but also those declared in the ancestors of the interface.

Interfaces, along with the similar types, give a free structure for defining behavior without the risk of creating contradictions by defining data structure. Since interfaces are just sets of protocols for operations (no methods, no data), there are no logical problems with any form of inheritance. A concrete class must implement every operation in all of the interfaces it realizes; it must therefore implement the union of the operation protocols defined by every interface it directly realizes or reaches transitively through some form of inheritance. Interface classifiers can be used in operation protocols (for example, as parameter types) if the operation depends only on the values that would be returned by the interface operations.

4 Type

The type is a partial behavioral definition of an object, just as the data type is the structural definition of an object. It is similar to an interface in the way it is realized by implementation classes. Types are abstract in that they can never be instantiated (they have no methods and no directly defined internal data structure). In general, the attributes and association roles of a type are abstract, and each implementation of a type can be different. For example, if a type has an attribute “point” of type DirectPosition, then the implementation class must have a way to get and set this value as if it were an attribute.
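A rough Python analogy for the interface and type discussion above, with its assumptions stated up front: typing.Protocol classes stand in for «Interface» classifiers (signatures only, no state), an abstract base class with an abstract property stands in for a «Type» with the attribute “point”, and a frozen dataclass stands in for DirectPosition. Measurable, Locatable, Segment, total_length, PointFeature, StoredPoint and SplitPoint are hypothetical names. Segment realizes both interfaces and so supplies the union of their operations; total_length depends only on what the interface operation returns; the two PointFeature implementations expose the same abstract attribute with different internal storage.

```python
import math
from abc import ABC, abstractmethod
from collections.abc import Iterable
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class DirectPosition:
    """Simplified «DataType» stand-in."""
    coordinate: tuple[float, ...]


# «Interface» stand-ins: operation signatures only, no attributes.
class Measurable(Protocol):
    def length(self) -> float: ...


class Locatable(Protocol):
    def start(self) -> DirectPosition: ...


@dataclass(frozen=True)
class Segment:
    """Realizes both interfaces, so it implements the union of their operations."""
    a: DirectPosition
    b: DirectPosition

    def length(self) -> float:
        return math.dist(self.a.coordinate, self.b.coordinate)

    def start(self) -> DirectPosition:
        return self.a


def total_length(items: Iterable[Measurable]) -> float:
    """Typed by an interface: uses only the values that length() returns."""
    return sum(item.length() for item in items)


# «Type» stand-in: an abstract attribute "point" that implementations must
# make gettable and settable, however they choose to store it.
class PointFeature(ABC):
    @property
    @abstractmethod
    def point(self) -> DirectPosition: ...

    @point.setter
    @abstractmethod
    def point(self, value: DirectPosition) -> None: ...


class StoredPoint(PointFeature):
    """Implementation 1: keeps the DirectPosition value as is."""
    def __init__(self, value: DirectPosition) -> None:
        self._value = value

    @property
    def point(self) -> DirectPosition:
        return self._value

    @point.setter
    def point(self, value: DirectPosition) -> None:
        self._value = value


class SplitPoint(PointFeature):
    """Implementation 2: keeps only a mutable list of ordinates."""
    def __init__(self, value: DirectPosition) -> None:
        self.point = value  # goes through the setter below

    @property
    def point(self) -> DirectPosition:
        return DirectPosition(tuple(self._ordinates))

    @point.setter
    def point(self, value: DirectPosition) -> None:
        self._ordinates = list(value.coordinate)


origin = DirectPosition((0.0, 0.0))
segments = [Segment(origin, DirectPosition((3.0, 4.0))),
            Segment(origin, DirectPosition((0.0, 2.0)))]
print(total_length(segments))  # 7.0
```

Instantiating PointFeature directly raises TypeError, mirroring the rule that a type can never be instantiated; only its realizing classes can.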
If the implied semantics of a type require an attribute only to be readable, the attribute declaration should be prefaced by the stereotype «readonly», and the implementations need only implement the get operation for that attribute. A similar mechanism is to mark the attribute as ‘derived’, which adds the semantics that the attribute can be determined from the values of other attributes (not necessarily of the type), so its value can only be affected indirectly.

The UML specification makes several statements about types that describe their nature and use. Some of them are:

[A type] specifies a domain of objects together with the operations applicable to the objects, without defining the physical implementation of those objects. A type may not contain any methods, maintain its own thread of control, or be nested. However, it may have attributes and associations. The associations of a Type are defined solely for the purpose of specifying the behavior of the type's operations and do not represent the implementation of state data.
Although an object may have at most one Implementation Class, it may conform to multiple different Types.
An Implementation Class is said to realize a Type if it provides all of the operations defined for the Type with the same behavior as specified for the Type’s operations. An Implementation Class may realize a number of different Types.
[The] physical attributes and associations of the Implementation Class do not have to be the same as those of any Type it realizes, and the Implementation Class may provide methods for its operations in terms of its physical attributes and associations.

ISO/TS 19103, Clause 6.3 (Classes), states:

A class according to this Technical Specification is viewed as a specification and not as an implementation.
Attributes are considered abstract and do not have to be directly implemented (i.e. as fields in a record or instance variables in an object). This is not in conflict with the process of encoding as described in ISO 19118, as this describes an external representation that does not have to be equivalent to the internal representation.
For each class defined according to this Technical Specification, the set of attributes defined with this class, together with the sets of attributes of classes that are reachable directly or indirectly via associations, shall be sufficient to fully support the implementation of each operation defined for this particular class.

This clause essentially says that the classes in all ISO TC 211 documents are to be treated as if they were marked with the stereotype «Type». To ensure this behavior, most classes in the Harmonized Model (HM) should be stereotyped «Type».

5 Rules

1. Interfaces must be marked with the stereotype «Interface».
2. «Interface» classifiers must not inherit or be inherited from anything other than another «Interface» classifier. All others may ‘realize’ the «Interface» classifier.
3. «Interface» classifiers must not realize «Type» classifiers.
4. «Interface» classifiers must not be involved in any associations.
5. Data types must be marked with the stereotype «DataType».
6. «DataType» classifiers must not have identities.
7. «DataType» classifiers must not be the target of association roles that are by reference.
8. «DataType» classifiers must not be the source of association roles whose targets are not other «DataType» classifiers, except when the target class’s instances are essentially immutable (not subject to change) and all have well-known identifiers that could be expressed as a «DataType» classifier, as in the case of coordinate reference systems, prime meridians, or standards authorities such as EPSG (European Petroleum Survey Group), OGC, CEN, IEC or ISO.
9. «DataType» classifiers must not have attribute members other than those whose type is another «DataType» classifier or a built-in Primitive.
10. Primitives, which are «DataType» classifiers, must not be involved in any associations.
11. «DataType» classifiers cannot inherit from anything other than another «DataType» classifier.
12. Types must be marked with the stereotype «Type».
13. «Type» classifiers must not inherit or be inherited from anything other than another «Type» classifier. All others may ‘realize’ the «Type» classifier.

6 Guidelines

1. Most classifiers in the HM should be «Type», «Interface» or «DataType» classifiers, depending on usage and complexity. This guideline does not apply if the standard is at the implementation level, in which case an implementation-language-specific UML pattern should be followed, such as the one defined for GML application schemas in ISO 19136.
2. Abstract classes should be «Type» classifiers.
3. Services should be «Interface» or «Type» classifiers. Services should be «Type» classifiers only if they have publicly accessible state or relation information that is essential to their operations.
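Several of the rules in clause 5 are mechanical enough to check automatically. The following is a minimal sketch, assuming a hypothetical in-memory model (the Classifier dataclass below, not the API of Enterprise Architect, Rational Rose or any other UML tool) that records each classifier’s stereotype, parents, realized classifiers and association count. It covers only rules 2, 3, 4, 10, 11 and 13; the rule numbers in the messages refer to clause 5, and the classifier names in the example are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Classifier:
    """Hypothetical model element; stereotype is "Interface", "DataType", "Type" or ""."""
    name: str
    stereotype: str = ""
    primitive: bool = False
    parents: list["Classifier"] = field(default_factory=list)
    realizes: list["Classifier"] = field(default_factory=list)
    associations: int = 0


def check(model: list[Classifier]) -> list[str]:
    """Return one message per violation found; an empty list means none found."""
    problems: list[str] = []
    for c in model:
        for p in c.parents:
            # Rule 2: Interfaces inherit only from, and are inherited only by, Interfaces.
            if (c.stereotype == "Interface") != (p.stereotype == "Interface"):
                problems.append(f"rule 2: {c.name} / {p.name}")
            # Rule 11: a DataType inherits only from another DataType.
            if c.stereotype == "DataType" and p.stereotype != "DataType":
                problems.append(f"rule 11: {c.name} / {p.name}")
            # Rule 13: Types inherit only from, and are inherited only by, Types.
            if (c.stereotype == "Type") != (p.stereotype == "Type"):
                problems.append(f"rule 13: {c.name} / {p.name}")
        if c.stereotype == "Interface":
            # Rule 3: an Interface must not realize a Type.
            if any(r.stereotype == "Type" for r in c.realizes):
                problems.append(f"rule 3: {c.name}")
            # Rule 4: an Interface must not be involved in associations.
            if c.associations:
                problems.append(f"rule 4: {c.name}")
        # Rule 10: a Primitive must not be involved in associations.
        if c.primitive and c.associations:
            problems.append(f"rule 10: {c.name}")
    return problems


measurable = Classifier("Measurable", "Interface")
curve = Classifier("Curve", "Type", realizes=[measurable])
impl = Classifier("CurveImpl", realizes=[curve, measurable])
bad = Classifier("BadInterface", "Interface", parents=[curve], realizes=[curve])
print(check([measurable, curve, impl]))  # []
print(check([bad]))  # ['rule 2: BadInterface / Curve', 'rule 13: BadInterface / Curve', 'rule 3: BadInterface']
```

Realization is deliberately left unrestricted except for rule 3, since rules 2 and 13 allow any other classifier to realize an «Interface» or «Type». A production checker would read the model from a tool export (for example XMI) rather than build it by hand.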