The work reported here was carried out within the Distributed Geographical Information Systems (DISGIS) project in the period from
June 1996 to June 1999. DISGIS is an EC-supported Esprit project whose focus has been to develop mechanisms for
GIS interoperability based on state-of-the-art technology from the distributed systems domain.
The DISGIS Interoperability approach is based on the ISO RM-ODP model for Open Distributed Processing, the usage of UML for model-specification and the usage of XML and/or binary streaming for data exchange.
For the GIS domain, The OpenGIS Guide (OGC 1996b) gives the following description of the term “Geodata interoperability”:
“Geodata interoperability” refers to the ability of digital systems to 1) freely exchange all kinds of spatial information about the Earth and about the objects and phenomena on, above, and below the Earth’s surface; and 2) cooperatively, over networks, run software capable of manipulating such information.
[Figure 1: layered diagram. Clients (Client A, Client B) and application servers each stack a Common Model, a Common API and streaming; the servers add a Feature Model and a server-mapping layer on top of the data servers (OpenGIS SQL, OpenGIS COM, proprietary DB). User services shown include catalogue services, data access and electronic business.]
Figure 1 Internet/Intranet Interoperability between clients and servers
Figure 1 shows the Internet/Intranet interoperability level focused on in DISGIS. The aim is to support interoperability between components on the feature-model level, independent of the underlying storage structure and access mechanisms.
Even though it is important to define platform-dependent storage and manipulation interfaces for a data server level, such as the OGC Simple feature specification for SQL and COM/OLE, our aim has been to specify an interoperability interface on a storage-independent application server level.
The DISGIS Interoperability Reference Model is an interoperability approach based on the ISO RM-ODP model.
ISO RM-ODP uses five viewpoints to describe open distributed systems. Each viewpoint is an abstraction of the complete system that focuses on a specific area of concern. Because each viewpoint has a different emphasis, the viewpoints provide a basis for discussing various aspects of interoperability. The Interoperability Reference Model identifies one type of interoperability in each of the five ISO RM-ODP viewpoints and shows how these types of interoperability can be achieved.
[Figure 2: the five ODP viewpoints with supporting models and tools. Enterprise viewpoint (business process models, context/scope): domain use cases and process/activity models; use case tools and process-modelling tools. Information viewpoint (domain model): type models; OMT/UML tools. Computational viewpoint (architecture model, IT): collaboration models; role-modelling tools (OOram). Engineering viewpoint (distribution structure and patterns): frameworks for distribution concerns. Technology viewpoint (infrastructure mappings): CORBA, Java APIs, ActiveX, DCOM, SQL/ODBC, ODMG, ...]
Figure 2 ISO RM-ODP and UML as a basis for the Interoperability Reference Model
The DISGIS Methodology is based on the ISO RM-ODP framework and the usage of the Unified Modeling Language (UML) as a notation. The approach focuses on creating models according to the five ODP viewpoints. System interoperability issues are resolved with respect to all five viewpoints.
The DISGIS reference model is based on the generic use of ODP-principles (ISO Reference Model on Open Distributed Processing) [i] and ISO CSMF (Conceptual Schema Modelling Facilities) [ii].
The ODP viewpoints represent abstractions or “views” into a system, and provide a useful conceptual framework around which concrete solutions can be developed. The main aspects of the DISGIS methodology are summarised below in terms of enterprise, information, computational, engineering, and technology modelling.
• Enterprise modelling describes a system in terms of the business and user requirements for the system, captured as use cases.
• Computational modelling describes a system in terms of its interacting components and interaction APIs, based on a distributed service architecture.
• Information modelling describes a system in terms of a geodata domain model. The UML models are specified according to the defined UML modelling rules, so that code can be generated from them.
• Engineering modelling describes a system in terms of logical distribution, without considering the enabling technology for realising the actual distribution. Patterns for distribution concerns (DASCo) are used.
• Technology modelling is a mapping of the system specifications onto specific technology.
Figure 2 depicts the abstract ODP viewpoints and the methods, tools, and techniques that can be used to describe each of them.
The enterprise viewpoint of ISO RM-ODP is concerned with the purpose, scope and policies of a computer system in an organisation, and it shows how a computer system functions within an organisation.
To allow two GIS software solutions from two different organisations to interoperate there is a need for organisational
interoperability.
[Figure 3: pilot configuration. Editor clients (GISDK AutoCad Editor, NMA FYSAK Editor, Sysdeco Editor), each with cl/map FW, DG MOD 4, DG API 4 and DG CFW, exchange <XML,Binary> streams over physical communication (sockets, CORBA, ..) with the GISDK server, the NMA QuadriServer, the Sysdeco server and a coordinate transformation server, each with DG CFW, DG API 4 and DG Mod 4. Back ends shown: Oracle ProC, SQL/3, Quadri Store, MapServer, Oracle 7.3.]
Figure 3 Pilots per June 1999
The enterprise level focus in DISGIS has been on maintenance and editing of geodata, based on requirements from data providers such as national mapping agencies.
• Geodata – a collection of objects representing geographical information
• Feature – an object with geographical location
• Featuretype – a metadescription which is used for characterising sets of features
• Typeregistry – a collection of featuretypes
A feature is a representation of a real world entity or an abstraction of the real world. It has a spatial domain, a temporal domain, or a spatial/temporal domain as an attribute. Features usually represent entities [15].
Features have attributes and relations to other features. In addition most features (ideally all) should have geometric attributes describing the location of the feature in the geographic space. A feature is characterised by its attributes and relations.
The area of concern and the requirements were identified by analysing users’ working habits and existing systems, and through interviews. The goal of the geodata editor is to provide the services needed to obtain and perform updates on geodata within the context of consistent transactions. The goal of the users is to use the services provided by the system to perform update-transactions on the geodata.
A user should be able to obtain geographical objects, modify them, delete them and commit a set of changes. The geographical objects are provided by one or more geodata servers that may be distributed. The user may issue
geographical queries and receive the corresponding geographical objects. Geographical objects are represented by
features that are linked with featuretypes describing their nature. For example, a geographical object (feature) ‘road E18’ has a featuretype ‘road’. The user may then modify attributes (e.g. location) of objects, delete objects or create objects.
All update actions are performed in the context of a transaction.
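The editing use case can be illustrated with a minimal C++ sketch, assuming a simplified in-memory store. The names Feature, Action, ActionKind and GeodataStore are hypothetical and are not taken from the DISGIS API; the sketch only shows create, modify and delete actions being applied within a single all-or-nothing transaction.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified feature: an identifier, a featuretype name and a
// single location attribute kept as text.
struct Feature {
    std::string id;
    std::string featureType;
    std::string location;
};

enum class ActionKind { Create, Modify, Delete };

// One update action inside a transaction.
struct Action {
    ActionKind kind;
    Feature feature;
};

class GeodataStore {
public:
    // Apply all actions or none: the actions are applied to a working copy,
    // which replaces the stored data only if every action is valid.
    bool commit(const std::vector<Action>& actions) {
        std::map<std::string, Feature> working = features_;
        for (const Action& a : actions) {
            const bool exists = working.count(a.feature.id) != 0;
            switch (a.kind) {
            case ActionKind::Create:
                if (exists) return false;   // cannot create an existing feature
                working[a.feature.id] = a.feature;
                break;
            case ActionKind::Modify:
                if (!exists) return false;  // cannot modify a missing feature
                working[a.feature.id] = a.feature;
                break;
            case ActionKind::Delete:
                if (!exists) return false;  // cannot delete a missing feature
                working.erase(a.feature.id);
                break;
            }
        }
        features_.swap(working);
        return true;
    }

    // Returns the stored feature, or nullptr if the id is unknown.
    const Feature* find(const std::string& id) const {
        auto it = features_.find(id);
        return it == features_.end() ? nullptr : &it->second;
    }

    std::size_t size() const { return features_.size(); }

private:
    std::map<std::string, Feature> features_;
};
```

In this sketch, committing a Create action for the feature ‘road E18’ with featuretype ‘road’ succeeds, whereas a sequence that contains a Delete of an unknown feature is rejected and leaves the store unchanged.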
The geodata server provides access to geographical data. It handles security, data access and transactions. The geodata server offers its services to local users as well as users in distributed locations.
The functional requirements described by these use cases are support for user sessions, transactions, data retrieval and data update. There may be multiple users (clients) and a set of distributed geodata servers.
The non-functional requirements are that the system services should be accessible through the Internet, that the system should not be restricted to a specific implementation language or hardware platform, and that user authorisation must be handled by the system.
The computational viewpoint of ISO RM-ODP is concerned with describing the components of a distributed system. It describes the interaction patterns between the components and their interfaces. To be able to interoperate in the computational viewpoint, two systems must be component and services interoperable. Two systems are component and services interoperable if they agree on the set of services offered by the components of the two systems and on the interfaces of these components. By defining standardised interfaces to these components, the components of one system will be able to request services from components in another system.
[Figure 4: clients send ActionSequences and Requests (for Types and for Geodata) to the Feature Model Server, which sits in front of the data servers, and receive a Result or a ConflictSet in return.]
Figure 4 DISGIS API interaction
The DISGIS API has deliberately been designed to be flexible and as simple as possible, defining only the operations that every system should support. The interaction protocol shown in Figure 4 illustrates the two types of request, for the typeregistry and for geodata, and the return of a result. The result class is specialised into different result classes. Instances of these are returned to the client as a response to the corresponding request.
The interaction level is aimed at browser and editing clients that need rapid access to the features of interest. To achieve sufficient performance, the features must be present in the client node and cannot be accessed through remote references.
[Figure 5: UML class diagram. dg_Result (from Base) is specialised into dg_ErrorResult (errorMsg : String), dg_TypeRegistryResult (typeRegistry : dg_TypeRegistry, from typeRegistry) and dg_GeodataResult (dg_GeodataSet, from geodataSet).]
Figure 5: Result model
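The result model can be sketched in C++ as follows. This is a minimal illustration under stated assumptions: the class names follow Figure 5, but the member functions, the simplified dg_TypeRegistry and dg_GeodataSet stand-ins, and the RequestKind and handleRequest names are hypothetical and not part of the DISGIS API.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real model classes.
struct dg_TypeRegistry { std::vector<std::string> featureTypeNames; };
struct dg_GeodataSet   { std::vector<std::string> featureIds; };

// Base class: every request is answered with some kind of result.
class dg_Result {
public:
    virtual ~dg_Result() = default;
    virtual bool isError() const { return false; }
};

class dg_ErrorResult : public dg_Result {
public:
    explicit dg_ErrorResult(std::string msg) : errorMsg_(std::move(msg)) {}
    bool isError() const override { return true; }
    const std::string& errorMsg() const { return errorMsg_; }
private:
    std::string errorMsg_;
};

class dg_TypeRegistryResult : public dg_Result {
public:
    explicit dg_TypeRegistryResult(dg_TypeRegistry r) : typeRegistry_(std::move(r)) {}
    const dg_TypeRegistry& typeRegistry() const { return typeRegistry_; }
private:
    dg_TypeRegistry typeRegistry_;
};

class dg_GeodataResult : public dg_Result {
public:
    explicit dg_GeodataResult(dg_GeodataSet s) : geodataSet_(std::move(s)) {}
    const dg_GeodataSet& geodataSet() const { return geodataSet_; }
private:
    dg_GeodataSet geodataSet_;
};

// Hypothetical request kinds, matching the two request types in the text.
enum class RequestKind { Types, Geodata, Unknown };

// Returns the result subclass that corresponds to the request.
std::unique_ptr<dg_Result> handleRequest(RequestKind kind) {
    switch (kind) {
    case RequestKind::Types: {
        dg_TypeRegistry registry;
        registry.featureTypeNames.push_back("ROAD");
        registry.featureTypeNames.push_back("HOUSE");
        return std::make_unique<dg_TypeRegistryResult>(std::move(registry));
    }
    case RequestKind::Geodata: {
        dg_GeodataSet set;
        set.featureIds.push_back("ID1306");
        return std::make_unique<dg_GeodataResult>(std::move(set));
    }
    default:
        return std::make_unique<dg_ErrorResult>("unsupported request");
    }
}
```

A client would first check isError() and then downcast to the result class that matches the request it sent.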
Since all clients work with copies of the original persistent objects residing in the server, consistency problems may occur when the persistent objects are modified. To enable a client to keep its local copy of the GeodataSet up to date with changes in the server, the client can be notified of changes in the server data. The notification model chosen is rather simple, but has the potential for later extensions.
An example of the use of notification is as follows:
1. The clients register their areas of interest with the server.
2. If the server gets a transaction that intersects with one or more client areas, it sends a notificationEvent to the clients concerned. The server sends one notification per client.
3. If the client is interested, it asks the server for more details. The server responds with the area that has been changed or, alternatively, with the ActionSequence that was the source of the event.
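Steps 1 and 2 of the notification example can be sketched as below. The sketch assumes that areas of interest are axis-aligned bounding boxes, which the text does not state, and the names Area and NotificationServer are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Assumed representation of an area: an axis-aligned bounding box.
struct Area {
    double minX;
    double minY;
    double maxX;
    double maxY;

    // Two boxes intersect if they overlap on both axes (touching counts).
    bool intersects(const Area& other) const {
        return minX <= other.maxX && other.minX <= maxX &&
               minY <= other.maxY && other.minY <= maxY;
    }
};

class NotificationServer {
public:
    // Step 1: a client registers an area of interest.
    void registerInterest(const std::string& clientId, const Area& area) {
        interests_[clientId].push_back(area);
    }

    // Step 2: given the area touched by a transaction, return the clients
    // that should receive a notification. Each client is listed at most
    // once, even if several of its registered areas are affected.
    std::vector<std::string> clientsToNotify(const Area& changed) const {
        std::vector<std::string> result;
        for (const auto& entry : interests_) {
            for (const Area& area : entry.second) {
                if (area.intersects(changed)) {
                    result.push_back(entry.first);
                    break;  // one notification per client
                }
            }
        }
        return result;
    }

private:
    std::map<std::string, std::vector<Area>> interests_;
};
```

Step 3, where an interested client asks for the changed area or the ActionSequence, would be an ordinary request to the server and is left out of the sketch.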
The information viewpoint in ISO RM-ODP describes the information that flows in a system and is processed by a system. It focuses on the structuring of semantic information, typically the information that will be stored in a database and communicated between the components of a system. An information model is used to describe the information viewpoint. This information model defines the structure and semantics of the information used in a system by defining objects, their properties and their relationships.
A DISGIS server communicates geodata with a DISGIS client, and the geodata objects are described according to the DISGIS geodata model. The geodata model follows the principles of the ISO/TC211 general feature model and of the feature model specified in the OGC, but is not one-to-one with these, in particular because they are still under development. Some particular requirements, such as sharing of geometry between features, are also supported by the DISGIS model. Because the DISGIS approach has been to define the model fully in UML and to generate much of the underlying representation and model manipulation code, it should be easy to change the model to comply with the ISO/TC211 and OGC models in the future.
In the DISGIS model, features and geometry attributes are considered persistent objects in the sense that they are uniquely identified by an identifier (an "Id") and may hence be stored permanently in a DISGIS server. Parts of the geodata model are depicted in Figure 6.
[Figure 6: UML class diagram showing dg_GeoObject (from base), dg_Attribute (from attribute; name : string), dg_GeodataSet (from geodataSet), dg_SpatialObject, dg_SpatialObjectType (from type), dg_Feature, dg_SpatialPropertyObject, dg_SpatialReferenceSystem (from geometry), dg_Point, dg_Curve and dg_Surface (from geometry), with the associations composites {set}, attributes, spatialObjectType, spatials {set} and spatRefSystem.]
Figure 6 The DISGIS Geodata Model – overview
The DISGIS geodata model includes a dictionary describing types of features (e.g. "HOUSE", "ROAD") and feature attributes. The purpose of the dictionary is to formally describe the types of feature used in a certain application domain. This makes it possible to validate new feature objects, checking that they have a correct internal structure, and to have the dictionary create new, empty feature objects. In addition, a dictionary acts as a description of the application domain. The dictionary is contained in FeatureTypeRegistries.
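The two uses of the dictionary, validating a feature and creating an empty one, can be sketched as follows. This is a simplified illustration, assuming that a feature type is just a name plus a set of required attribute names; the FeatureType, Feature and FeatureTypeRegistry definitions here are hypothetical and much simpler than the DISGIS model.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Assumed, simplified feature type: a name and the attributes it requires.
struct FeatureType {
    std::string name;
    std::set<std::string> attributeNames;
};

// Assumed, simplified feature: attribute values are kept as text.
struct Feature {
    std::string id;
    std::string typeName;
    std::map<std::string, std::string> attributes;
};

class FeatureTypeRegistry {
public:
    void add(const FeatureType& type) { types_[type.name] = type; }

    // Creates a feature of a registered type with every declared attribute
    // present and empty. Throws std::out_of_range if the type is unknown.
    Feature createEmpty(const std::string& typeName, const std::string& id) const {
        const FeatureType& type = types_.at(typeName);
        Feature feature{id, typeName, {}};
        for (const std::string& attribute : type.attributeNames) {
            feature.attributes[attribute] = "";
        }
        return feature;
    }

    // A feature is valid if its type is registered and it carries exactly
    // the attributes that the type declares.
    bool validate(const Feature& feature) const {
        auto it = types_.find(feature.typeName);
        if (it == types_.end()) return false;
        const std::set<std::string>& expected = it->second.attributeNames;
        if (feature.attributes.size() != expected.size()) return false;
        for (const auto& attribute : feature.attributes) {
            if (expected.count(attribute.first) == 0) return false;
        }
        return true;
    }

private:
    std::map<std::string, FeatureType> types_;
};
```

A feature produced by createEmpty always passes validate, while a feature with an unregistered type or an undeclared attribute does not.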
Feature objects are communicated between a DISGIS server and a DISGIS client in collections called geodata-sets.
The geodata-set is an object with both geometric and non-geometric attributes.
The geometry of geodata is described in a real world coordinate system called a spatial reference system. The features in a geodata-set may all belong to the same spatial reference system or to different ones.
As far as possible, the DISGIS model has been described according to emerging international standards of geographic information in ISO/TC211 and OGC.
The engineering viewpoint focuses on mechanisms for distribution and support for distribution transparencies and support services such as security and persistence. In order to achieve interoperability between two environments it is necessary to have mappings between their support for these transparencies. This can be done through a higher abstraction layer that maps to the implementation and representation of these services in various environments.
The following are the most important distribution transparencies that should be supported by a distribution abstraction layer for multiple platforms.
1. Communication transparency with respect to physical communication mechanisms. This corresponds to ODP access transparency.
2. Location transparency with respect to location of GIS servers. This corresponds to ODP location transparency.
3. Replication transparency
The Distributed Communication Framework (DCF) provides the support for these transparencies. The DCF is an object-oriented framework implemented in C++ with the purpose of simplifying the development of distributed applications. The framework allows incremental development of distributed applications by separating the application’s functionality from the distribution-specific issues. The framework can utilise multiple underlying communication infrastructures. Within the project, CORBA and socket communication have been used, but the framework can also easily support others such as DCOM or Java RMI.
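The idea of separating application functionality from distribution-specific issues can be illustrated with a small sketch. It is not the DCF interface, which is not shown in this paper: Transport, LoopbackTransport and GeodataClient are hypothetical names, and the in-process loopback merely stands in for a socket or CORBA binding.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical transport abstraction: the application only sees this
// interface, never the underlying communication mechanism.
class Transport {
public:
    virtual ~Transport() = default;
    virtual std::string send(const std::string& request) = 0;
};

// In-process stand-in for a real binding such as sockets or CORBA: the
// request is passed directly to a handler function.
class LoopbackTransport : public Transport {
public:
    using Handler = std::function<std::string(const std::string&)>;

    explicit LoopbackTransport(Handler handler) : handler_(std::move(handler)) {}

    std::string send(const std::string& request) override {
        return handler_(request);
    }

private:
    Handler handler_;
};

// Application-side code: written once against Transport, so a different
// transport can be substituted without changing this class.
class GeodataClient {
public:
    explicit GeodataClient(Transport& transport) : transport_(transport) {}

    std::string requestTypes() { return transport_.send("GET_TYPES"); }

private:
    Transport& transport_;
};
```

Because GeodataClient depends only on the abstract interface, the application can be developed and tested against the loopback first and bound to a real communication mechanism later, which is the kind of incremental development described above.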
To ease the support of existing clients with proprietary models, a client mapping framework has been developed for mapping instances of one data model to another. It is typically used for mapping between the DISGIS geodata model and a proprietary model.
The technology viewpoint of ISO RM-ODP is concerned with the underlying infrastructure in a distributed system. It describes the hardware and software components used in a distributed system. To achieve interoperability in the technology viewpoint an infrastructure that allows the components of a distributed system to interoperate is needed. This infrastructure may be provided by a Distributed Object Environment (DOE) that allows objects to interoperate across computer networks, hardware platforms, operating systems and programming languages.
[Figure 7: code generation pipeline. C++ header files and C++ .cpp files, including the binary and XML stream encoding, are generated and compiled (90 %); the remaining C++ .cpp files (10 %) are manually implemented and compiled.]
Figure 7 Automatic code generation from UML to C++ and XML/Binary streaming
The initial approach in DISGIS aimed at supporting the native binary streaming mechanisms for the various distributed environments, such as CORBA object-by-value and DCOM. These mechanisms are however platform dependent, and not interoperable between platforms. The next step therefore focused on a platform neutral streaming using XML or binary C++ streaming.
XML is emerging as the leading publishing format of the World Wide Web. It is therefore a technology that businesses are expected to deal with anyway. Using XML for data exchange will then reduce the number of technologies that businesses must deal with, whereas SOSI files, EXPRESS files and other formats will often introduce another technology for a given business.
The XML format is platform and language neutral, whereas, for instance, the binary Streaming Service from ObjectSpace is C++-dependent¹. With XML, the client can have a Java implementation while the server has a C++ implementation; only the interface has to be the same. The data can be browsed using the Internet browsers that most clients will have anyway, so no special browser has to be created for proprietary formats such as EXPRESS. The validity of the XML file can also be checked with any validating XML parser.
¹ The Streaming<ToolKit> is a complete implementation of the Universal Streaming Service, a nonintrusive mechanism for facilitating platform independent object persistence and network transport of objects.
When designing a communication protocol, one must try to meet the following demands:
1) Flexibility: the possibility to use different implementation languages such as Java, C++ or other languages.
2) Efficiency: the volume of GIS data implies that efficiency, both in terms of memory usage and execution time, is a high-priority objective when designing communication protocols. It is important to have efficient implementations of bulk data transfers in the distributed environment.
These two objectives are often conflicting: efficiency means low flexibility and vice versa. Performance measurements have been made to compare the XML encoding schema with the binary encoding schema from ObjectSpace that has been used for the C++ structure streaming in DISGIS.
The disadvantage of using XML instead of binary structures is the larger size of the encoded data and thereby the lower speed of the applications.
It is important to measure the space complexity of the different encoding schemas. We define the space complexity as the size (in bytes) of a given geodata set when encoded using a given encoding schema. As a simple example, consider the encoding of a 3D position in the geodata model. The class representing a 3D position is dg_Position3D.
This class contains three data members, x, y and z, which are all doubles. In memory, the size of a dg_Position3D is 24 bytes (8 bytes per double) plus 4 bytes to hold the type info (the size of a pointer). A dg_Position3D instance therefore occupies 28 bytes in memory.
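The class listing itself is not reproduced here. A minimal sketch that is consistent with the description might look as follows; only the three double members come from the text, while the constructor, the virtual destructor and the two constants are assumptions made for illustration. In particular, the sketch assumes that the "type info" is the hidden pointer a C++ compiler adds to objects of a class with virtual members.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical reconstruction of the class described in the text: only the
// three double members x, y and z are stated in the source.
class dg_Position3D {
public:
    dg_Position3D(double px, double py, double pz) : x(px), y(py), z(pz) {}

    // A virtual member makes the compiler store a hidden per-object pointer,
    // assumed here to be the "type info" mentioned in the text.
    virtual ~dg_Position3D() = default;

    double x;
    double y;
    double z;
};

// Bytes taken by the coordinate data alone: 3 * 8 = 24 with 8-byte doubles.
constexpr std::size_t kCoordinateBytes = 3 * sizeof(double);

// Bytes taken by a whole instance. This is platform dependent: the 28 bytes
// quoted in the text imply 4-byte pointers, and the value is typically
// larger where pointers are 8 bytes.
constexpr std::size_t kInstanceBytes = sizeof(dg_Position3D);
```

The instance size is therefore always at least the coordinate data plus one pointer, with the exact figure depending on pointer size and alignment.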
The XML encoding of a 3D position is:
<dg_Position3D Id="ID1306" x="265492.832000"
y="117897.250000"
z="0.000000"
>
</dg_Position3D>
This equals 97 bytes, of which the type info is 27 characters. It should be noted that the range of byte values (0-255) is not fully utilised by the XML encoding. Since this is an ASCII encoding, only a small subset of this range is used (probably less than half the range of a byte), which gives a potential for data compression.
The size of the binary encoding is 80 bytes, of which 13 bytes are used for the type id. The range of byte values is fully utilised by binary encodings, so we expect the relative information content of a binary encoding to be higher than that of the corresponding XML encoding.
                         Type size   Data size   Total size
Memory representation        4          24           28
Binary encoding             13          67           80
XML encoding                27          70           97
(all sizes in bytes)
Since the dg_Position3D is one of the most used data model instances in a typical geodata set, this gives a good indication of the amount of overhead necessary for transferring information through a generic framework.
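The XML encoding of a position and the overhead figures in the table can be reproduced with a short sketch. The function names encodePositionXml and sizeRatio and the Position3D struct are hypothetical; the element and attribute names follow the XML example above, but the output is written on a single line, so its byte count need not match the 97 bytes measured for the original layout.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Plain coordinate triple used only for this illustration.
struct Position3D {
    double x;
    double y;
    double z;
};

// Encodes a position as a single-line XML element with six decimals per
// coordinate. The id is written as given, without XML escaping.
std::string encodePositionXml(const std::string& id, const Position3D& p) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(6)
        << "<dg_Position3D Id=\"" << id << "\""
        << " x=\"" << p.x << "\""
        << " y=\"" << p.y << "\""
        << " z=\"" << p.z << "\""
        << "></dg_Position3D>";
    return out.str();
}

// Ratio between an encoded size and a reference size, both in bytes.
double sizeRatio(std::size_t encodedBytes, std::size_t referenceBytes) {
    return static_cast<double>(encodedBytes) / static_cast<double>(referenceBytes);
}
```

With the sizes from the table, sizeRatio(97, 28) is about 3.5 for the XML encoding and sizeRatio(80, 28) is about 2.9 for the binary encoding, both relative to the in-memory representation.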
[Figure 8: plot titled "XML Encoding to Binary Encoding ratio", showing Size(XML) / Size(Binary), with values between about 1.62 and 1.66, against the number of features (0 to 35000).]
Figure 8 Relative size of XML versus binary encoding per number of features
Here we see that the XML to binary encoding ratio is between 1.62 and 1.66. However, we also see that this ratio increases slightly with the number of features, meaning that the XML encoding grows slightly faster than the binary encoding as features are added.
Finally, we compressed the files using a standard Huffman encoding (WinZip 8.0 with maximum compression) and measured the size of the resulting compressed files. The compression ratio indicates the relative amount of information contained in a file. Compressing an XML-encoded geodata set gave a compression ratio of approximately 10-12 %; for a binary-encoded geodata set the compression ratio was only slightly higher. With a compression ratio of 10 %, it may be efficient to compress the content before sending it, since this may give an expected performance gain of about five times the current speed. Because the compression ratios are very similar, communication using either XML or binary encodings may benefit equally from compression.
In our performance measurements, the algorithm used for conversion to XML streaming was less efficient than the C++ streaming algorithm, by a factor of 15 to 30 in time. The encoding algorithm is applied twice in the measured times: first the data is encoded at the server, and then it is decoded at the client. Since the size of an XML encoding is less than twice the size of a binary encoding, the actual algorithm creating the XML encoding must be approximately 5-10 times slower than the algorithm converting data to a binary encoding. This gives high expectations for the prospects of using XML to communicate efficiently, since the only real obstacle is the difference in size. Since this difference is less than a factor of two, we would expect that XML could be used as a generic encoding format in place of traditional binary formats, which depend on a specific programming language. Experience from Internet development has also shown that open, simple and flexible solutions win over more complex and specialised solutions with higher efficiency. Still, the approach of generating streaming code from the UML models also allows optimised binary streaming to be generated for special situations.
The usage of UML as an implementation-neutral specification language, with semi-automatic mappings to various platforms and storage structures, has been promoted within ISO/TC211 and OpenGIS. Partners have also been responsible for the direct development of some standard parts, in particular ISO 15046-3 (Conceptual Schema Language – using UML), 15046-18 (Encoding – using XML) and 15046-19 (Service architecture). In addition, direct input has been provided to OpenGIS on the usage of UML and the definition of the OpenGIS Service architecture.
As shown in figure 13 DISGIS will demonstrate that it is possible to create a portability/interoperability layer on top of the OpenGIS platform-implementation specifications. This level will hide the platform differences through a unified
object model that can be directly supported in an object-oriented language, such as C++ or Java. This level will be realised on top of OpenGIS implementations using available GIS server technologies such as Oracle SDC and ESRI
SDE.
This layer will be on the level where ISO/TC 211 aims at standardising service interfaces: a common abstract specification from which platform-dependent profiles can be derived. The next figure shows how the abstract specification, in UML, can serve as an interface between ISO/TC211 work and OpenGIS work.
[Figure 9: the ISO side (spatial sub schema, metadata, quality sub schema, General Feature Model, rules for application schema, coordinate reference systems) and the OGC side (feature relationship, spatial sub schema) meet in an Abstract Service Spec (UML with precision; common implementation specification), which also serves as the direct C++/Java portability/interoperability interface. Labelled links: 1. Conformance, 2. Conformance, 3. Reverse mapping for portability and interoperability. Implementation specifications shown: COM/MIDL, CORBA/IDL, ODBC/SQL, SQL3/MM, SDAI/EXPRESS, ODMG/ODL.]
Figure 9 Harmonisation of ISO/TC211 and OpenGIS
The goal of the DISGIS API is the direct C++/Java portability/interoperability layer, equivalent to the OpenGIS Abstract specification/common implementation specification, which can also be adopted as a standard by ISO/TC211. A further goal is to investigate how this level can be specified in UML (following modelling guidelines) at a sufficient level of precision, so that:
• Conformance can be shown with the relevant ISO parts.
• Conformance between an abstract specification and an implementation specification can be verified. This is necessary for potential new implementation specifications (ODMG/ODL, SDAI/EXPRESS, …).
• Interoperability between implementations of different implementation specifications can be enabled, by showing the reverse mapping from implementation specification to abstract specification.
• A portability layer can be made (at the cost of performance).
The DIS results of the DISGIS project are an ODP methodology approach to interoperability, with associated UML modelling tools and code generation tools for C++ and XML/binary streaming, supported by a distributed communication framework.
The GIS results of the DISGIS project are an architecture and protocol, the DISGIS API, for interaction between clients and servers in a distributed environment, supporting a geodata model suitable for the requirements of the partners and related to the geodata models of ISO/TC211 and OpenGIS. The Client Mapping Framework is a framework that helps with the mapping between the geodata model and clients’ internal, proprietary models. A coordinate transformation service is an example of a reusable geospatial processing service. An example use of XML for geodata encoding is shown through a Java-based XML browser based on the ISO/TC211 15046-18 Encoding standard. The project also provides a proof of concept for the integration of the ISO/TC211 and OpenGIS standards, in particular by demonstrating a common portability/interoperability feature layer that can be placed on top of different geodata servers.
The experiences and lessons learned from the pilot case implementations are that
The interoperability approach described here can also be applied to other domains. In particular, it has been tried in the domains of finance and C4I. The COMPASS² project has applied this methodology in the context of financial component standardisation for the general ledger component of accounting systems. RM-ODP will provide the basis for reasoning about system distribution and UML will provide the notation for describing these systems from different viewpoints. C4I _ MACCIS (Reference !!)
[1] ISO/IEC, “ISO/IEC 10746-1 Information technology - Basic reference model of Open Distributed Processing - Part 1: Overview”, ISO ITU-T X.901 - ISO/IEC DIS 10746-1, 1996.
[2] ISO/IEC, “ISO/IEC 10746-2 Information technology - Open Distributed Processing - Reference Model: Foundations”, 1996.
[3] ISO/IEC, “ISO/IEC 10746-3 Information technology - Open Distributed Processing - Reference Model: Architecture”, 1996.
[4] ISO/IEC, “ISO/IEC DIS 10746-4 Information technology - Open Distributed Processing - Part 4: Architectural semantics”, 1996.
[5] OMG/UML, “UML Notation”, http://www.rational.com/uml/html/notation, 1997.
[6] DISGIS, “DISGIS Methodology - Deliverable MD2.3”, DISGIS project report MD 2.3, June 1997.
[7] MAGMA, “Magma Software engineering handbook”, SINTEF, draft 1997.
[8] OBOE, “Business Object Methodology Handbook”, SINTEF, 1998.
[9] OMG/UML, “OCL - Object Constraint Language Specification”, version 1.1 ed., http://www.rational.com/uml/html/ocl/, 1997.
[10] A.-J. Berre, J. Ø. Aagedal, and A. R. Silva, “SIMOD - An ODP-extended Role-Modeling Methodology for Distributed Objects”, presented at Thirtieth Annual Hawaii International Conference on System Sciences, Wailea, Hawaii, 1997.
[11] A.R.Silva, T.Goncales, F.Rose, A.J.Berre, and J.Aagedal, “Organization, Information System and Distribution Modeling: An Integrated Approach”, presented at EDOC 97' (Enterprise Distributed Object Computing Workshop), 1997.
[12] T. Reenskaug, P. Wold, and O. A. Lehne, Working with Objects - The OOram Software Engineering Method: Manning Publications, ISBN 1-884777-10-4, 1996.
[13] C. Alexander, A Pattern Language. New York: Oxford University Press, 1977.
[i] SC21 N8926rev, “ITU-T X.901 | ISO/IEC 10746-1 ODP Reference Model Part 1. Overview”, ISO/IEC JTC1/SC 21/N, Draft International Standard
[ii] ISO/xx, “CSMF - Conceptual Modelling Facility
² Component Based Accounting Systems and Services, ESPRIT Project No. 25717