Semantic Technology based Data Transformation of Dutch Cadastral Parcels into INSPIRE Cadastral Parcels

Zhengjie Fan, Peter van Oosterom, Sisi Zlatanova
GIS-technology, OTB Institute, Delft University of Technology

1. INTRODUCTION

This paper presents an approach, based on semantic technology (ontologies and rules), for transforming data between two data models. We use the Dutch (LKI) cadastral parcel model and the INSPIRE cadastral parcel model to compare a traditional transformation method (using FME) with the semantic technology based transformation method. The semantic technology method relies more on encoding knowledge of the different data models (and the mapping between them) and less on the actual procedural steps (using predefined functions from a toolbox) needed to perform the transformation. A reasoner uses this knowledge and performs the actual transformation. This paper illustrates that the semantic technology based approach is also capable of performing the transformation from topology (in the Dutch model) to geometry (in the INSPIRE model). This comes in addition to the transformation of thematic attributes (values) and reclassification, which is a well-known advantage of ontology-based transformation.

As often discussed in the literature, geographical data are highly heterogeneous due to different data formats, spatial/geographic data types, spatial reference systems, data schemas/models (structure and constraints), classifications, taxonomies, terminologies/vocabularies, metadata models, scales, degrees/amounts of detail, etc. Nevertheless, data very often have to be transformed from one model into another. Data schema transformation in particular is a widely studied problem in both research and industry. In this paper, we compare the traditional way of spatial data transformation with a semantic technology based method of data transformation.

This paper is organized as follows. Section 2 illustrates the traditional way of data transformation.
Section 3 introduces the ontology approach for the same data transformation case as in Section 2.

2. AN EXAMPLE OF DATA TRANSFORMATION WITH EXISTING SOFTWARE

Various conversion tools for geographical models currently exist, such as Altova MapForce 2010 and the DBConvert product line. Software packages such as ArcGIS, Bentley Systems, Intergraph, MapInfo, etc. can import various file formats, convert them to internal data structures, and export them again. One of the most elaborate tools for mapping between different spatial data sets is FME (www.safe.com). The developers argue that “FME is the world's only complete spatial ETL solution that enables GIS Professionals to quickly translate, transform, integrate and distribute spatial data.” It “converts and integrates data in 225+ formats, manipulate data into the exact data model you need and share spatial data and ETL tasks over the web.” Therefore this data conversion tool was selected to illustrate the traditional data transformation approach.
[Figure 1 content (UML package CadastralParcels), as far as it can be recovered from the diagram:

«featureType» CadastralParcel: geometry: GM_Object; inspireId: Identifier; label: CharacterString; nationalCadastralReference: CharacterString; «voidable» areaValue: Area [0..1], referencePoint: GM_Point [0..1], validFrom: DateTime [0..1], validTo: DateTime [0..1]; «lifeCycleInfo, voidable» beginLifespanVersion: DateTime, endLifespanVersion: DateTime [0..1]; constraints {geometryType} {areaValueUoM} {validTo} {endLifespanVersion}

«featureType» BasicPropertyUnit: inspireId: Identifier; nationalCadastralReference: CharacterString; «voidable» areaValue: Area [0..1], validFrom: DateTime, validTo: DateTime [0..1]; «lifeCycleInfo, voidable» beginLifespanVersion: DateTime, endLifespanVersion: DateTime [0..1]; constraints {areaValueUoM} {validTo} {endLifespanVersion}

«featureType» CadastralZoning: geometry: GM_MultiSurface; inspireId: Identifier [0..1]; label: CharacterString; nationalCadastalZoningReference: CharacterString; «voidable» estimatedAccuracy: Length [0..1], level: CadastralZoningLevelValue, levelName: LocalisedCharacterString [1..*], name: GeographicalName [0..*], originalMapScaleDenominator: Integer [0..1], referencePoint: GM_Point [0..1], validFrom: DateTime [0..1], validTo: DateTime [0..1]; «lifeCycleInfo, voidable» beginLifespanVersion: DateTime, endLifespanVersion: DateTime [0..1]; constraints {zoningLevelHierarchy} {estimatedAccuracyUoM} {validTo} {endLifespanVersion}

«featureType» CadastralBoundary: geometry: GM_Curve; inspireId: Identifier [0..1]; «voidable» estimatedAccuracy: Length [0..1], validFrom: DateTime [0..1], validTo: DateTime [0..1]; «lifeCycleInfo, voidable» beginLifespanVersion: DateTime, endLifespanVersion: DateTime [0..1]; constraints {estimatedAccuracyUoM} {validTo} {endLifespanVersion}

«codeList» CadastralZoningLevelValue: 1stOrder, 2ndOrder, 3rdOrder

Association roles (all «voidable»): basicPropertyUnit, zoning, parcel, upperLevelUnit]

Figure 1: UML class diagram:
Cadastral Parcels (INSPIRE, 2009).

For the test with FME, the conversion between the Dutch Cadastre (LKI) model and the INSPIRE Cadastral Parcel model was used. The Cadastre is the registry of land ownership in a country. A cadastral map often contains two main classes: parcel and boundary. A parcel is a piece of land that is owned (with homogeneous rights) by a person or an organization. A boundary is a line that marks a change in the rights and is shared by two neighbouring parcels. Together, the boundaries of a parcel completely surround it. Different European countries represent cadastral data in different data schemas. A uniform INSPIRE data schema has been defined as a common transformation target, so that people can access the cadastral data of different EU countries (UML class diagram in Figure 1). The Dutch cadastral parcel model (LKI) also has, among others, two classes: NL_Parcel and NL_Boundary; see Figure 2 (van Oosterom and Lemmen, 2001).

[Figure 2 content (UML class diagram NL), as far as it can be recovered from the diagram:

NL_Parcel: object_id: int; location: GM_Point; d_location: GM_Point; rotation: int; bbox: GM_Envelope; l_num: int; calculated_area: float; legal_area: float; quality_code: char [2]; municip: char [5]; section: char [2]; sheet: char [4]; parcel: char [5]; tmin: DateTime; tmax: DateTime

NL_Boundary: object_id: int; interp_code: int; shape: GM_Curve; bbox: GM_Envelope; line_len: int; source_cd: char [2]; quality_code: char [2]; tmin: DateTime; tmax: DateTime

Association roles: l_obj_id, r_obj_id, outer_ring_1bnd, inner_ring_1bnd (l_num-1), first_left, first_right, last_left, last_right

Note in diagram: all references to NL_Boundary are signed (+ or -) indicating proper direction]

Figure 2: UML class diagram: Dutch LKI Cadastral Parcels.

The INSPIRE Cadastral Parcel model keeps the geometric information in the parcel and optionally also in the boundary. In the Dutch Cadastre model, by contrast, the parcel holds topology information and the boundary holds the geometric information.
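The contrast between the two models can be made concrete with a small sketch. The Python classes below are an illustration only, not part of the original experiment: they keep a handful of the attributes from Figures 1 and 2, replace the GM_* geometry types with plain coordinate lists, and read l_obj_id and r_obj_id as the ids of the parcels to the left and right of a boundary, which is an assumption. The helper field_names is hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional

Point = tuple[float, float]


@dataclass
class NLBoundary:
    """Dutch LKI boundary: this is where the coordinates are stored."""
    object_id: int
    shape: list[Point]   # GM_Curve in Figure 2, simplified to a point list
    l_obj_id: int        # assumed: id of the parcel on the left
    r_obj_id: int        # assumed: id of the parcel on the right


@dataclass
class NLParcel:
    """Dutch LKI parcel: topology references only, no ring geometry."""
    object_id: int
    location: Point        # GM_Point in Figure 2
    outer_ring_1bnd: int   # signed reference to a boundary on the outer ring


@dataclass
class InspireCadastralParcel:
    """INSPIRE parcel: carries its own geometry."""
    inspire_id: str
    national_cadastral_reference: str
    label: str
    geometry: list[Point]  # GM_Object in Figure 1, simplified to a closed ring
    reference_point: Optional[Point] = None


def field_names(cls) -> set[str]:
    """Return the attribute names declared on a dataclass (hypothetical helper)."""
    return {f.name for f in fields(cls)}
```

In this sketch, the only coordinates on the Dutch side beyond the parcel's location point sit in NLBoundary.shape, whereas InspireCadastralParcel.geometry must be filled for every parcel. Producing that value from the boundary shapes is the topology-to-geometry step discussed in the text.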
Therefore the transformation from Dutch Cadastre data to INSPIRE Cadastre data must collect the coordinates of the boundaries surrounding each parcel, following the topology information in the Dutch Cadastre. We selected a portion of the Dutch cadastral data and stored it in Oracle. Two FME functions were then used to compute the geometry of the parcels; see Figure 3. One is AreaBuilder, which assembles the line features of several neighbouring boundaries into a single polygon. The other is PointOnAreaOverlayer, which is used to identify each polygon with its id in the database. Part of the result of the translation is shown in Figure 4.

Figure 3: FME data transformation procedure

Figure 4: FME data transformation result

Our assumption is that semantic technology can improve the whole transformation procedure and can derive the geometry from the topology. First, an ontology is well suited to defining a unified meaning for objects and the relationships between them. In the above test, topology is such a relationship and can therefore be defined in an ontology. Second, semantic technology provides reasoning functions that uncover implicit objects and relationships, which could help to compute the geometry of the parcels. These two strengths are exactly what the above transformation task requires. In the next section we therefore introduce semantic technologies briefly and then illustrate, with a simple example, how an ontology can help to carry out the transformation.

3. A SMALL SEMANTIC TECHNOLOGY BASED TEST EXAMPLE

An ontology is a network of concepts and relationships that specifies the knowledge of the domain people are working in (Uschold and Grüninger, 1996). As a key technology in the semantic field, an ontology is a unified structure that not only acts as a conceptualization but can also be shared.
It gives a standard explanation of the concepts and relations used by a system (Agarwal, 2005; Visser et al., 2002; Bittner et al., 2005). Moreover, ontology reasoning can help to find implicit or contradictory relationships that can hardly be discovered by human beings.

Figure 5: A small cadastral test example

Figure 5 shows a small example of three parcels, five nodes and seven boundaries. The ids of the primitives are labeled in the graph. Each line has a predefined direction. It is assumed that the start point is the first point of the line and the end point is the last point of the line. The example is edited in the Web Ontology Language (OWL; see http://www.w3.org/TR/owl-features/) with the ontology editor Protégé (see http://protege.stanford.edu/); see Figure 6.

Figure 6: The example edited in Protégé (individual boundary 3 is shown)

Figure 7: Data transformation rules in SWRL (left: first boundary, right: next boundary)

Two rules are defined for the transformation task (see Figure 7). The first rule obtains the coordinates of a parcel's first outer ring boundary. The second rule obtains the coordinates of the next boundary, based on the topology information given by the first outer ring boundary. The rules are written in the Semantic Web Rule Language (SWRL; see http://www.w3.org/Submission/SWRL/) and executed in the ontology reasoner RacerPro (see http://www.racer-systems.com/); see Figure 8.

Figure 8: Ontology-based Transformation Result

Compared with the FME transformation process, the ontology-based transformation process is more flexible. In FME, the functions that achieve the transformation must be selected and combined by the user, and these functions are designed into the software beforehand. The ontology method instead uses user-defined rules to carry out the transformation (and does not specify the exact execution order).
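To make the effect of the two rules tangible, the following Python sketch emulates what they derive: the first boundary's coordinates, followed by the coordinates of each next boundary until the ring closes. It is not SWRL and does not reproduce the rules of Figure 7. The field next_ref (for each parcel, the signed id of the boundary that follows on that parcel's ring) is a hypothetical simplification of the LKI boundary-to-boundary references, and the two-parcel data set is invented for illustration rather than taken from Figure 5.

```python
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass
class Boundary:
    object_id: int
    shape: list[Point]        # coordinates from start point to end point
    next_ref: dict[int, int]  # hypothetical: parcel id -> signed id of next boundary


@dataclass
class Parcel:
    object_id: int
    first_ref: int            # signed reference to the first outer ring boundary


def oriented(boundaries: dict[int, Boundary], ref: int) -> list[Point]:
    """Coordinates of a signed reference: '+' as stored, '-' reversed."""
    shape = boundaries[abs(ref)].shape
    return shape if ref > 0 else shape[::-1]


def outer_ring(parcel: Parcel, boundaries: dict[int, Boundary]) -> list[Point]:
    # Rule 1: start with the coordinates of the first outer ring boundary.
    ring = list(oriented(boundaries, parcel.first_ref))
    ref = boundaries[abs(parcel.first_ref)].next_ref[parcel.object_id]
    # Rule 2: keep appending the next boundary until the ring closes.
    for _ in range(len(boundaries)):
        if ref == parcel.first_ref:
            return ring
        ring.extend(oriented(boundaries, ref)[1:])  # skip the shared node
        ref = boundaries[abs(ref)].next_ref[parcel.object_id]
    raise ValueError("ring does not close; topology is inconsistent")


# Hypothetical data: two unit squares that share boundary 2.
boundaries = {
    1: Boundary(1, [(0, 0), (1, 0)], {1: 2}),
    2: Boundary(2, [(1, 0), (1, 1)], {1: 3, 2: 4}),
    3: Boundary(3, [(1, 1), (0, 1), (0, 0)], {1: 1}),
    4: Boundary(4, [(1, 0), (2, 0), (2, 1), (1, 1)], {2: -2}),
}
ring_1 = outer_ring(Parcel(1, +1), boundaries)
ring_2 = outer_ring(Parcel(2, -2), boundaries)
```

Both rings come out closed, and the shared boundary 2 is traversed as stored for parcel 1 and reversed for parcel 2, which is what the signed references are for. Unlike a rule base, this sketch fixes the execution order in the loop; with a reasoner, that ordering is left to the system.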
Providing just the knowledge (in an ontology and rules), which the system then uses to reason and derive the result (that is, the transformation), is more intuitive for users. The person who wants to transform the data does not need to read through the list of functions and figure out how to combine them in the right execution order. In the ontology method, they only need to know the relationships in the data and define the transformation process by rules.

BIBLIOGRAPHY

Agarwal, P., 2005. Ontological considerations in GIScience. International Journal of Geographical Information Science, 19:501-536.

Bittner, T., Donnelly, M., and Winter, S., 2005. Ontology and semantic interoperability. In Prosperi, D. and Zlatanova, S., editors, Large-Scale 3D Data Integration: Challenges and Opportunities. CRC Press, Boca Raton, FL.

INSPIRE, 2009. D2.8.I.6 INSPIRE Data Specification on Cadastral Parcels – Guidelines (v3.0, 7-9-2009). INSPIRE Thematic Working Group Cadastral Parcels.

van Oosterom, P. and Lemmen, Ch., 2001. Spatial Data-management on a very large cadastral Database. Computers, Environment and Urban Systems, 25(4-5):509-528.

Uschold, M. and Grüninger, M., 1996. Ontologies: principles, methods, and applications. Knowledge Engineering Review, 11(2):93-155.

Visser, U., Stuckenschmidt, H., and Schlieder, C., 2002. Interoperability in GIS-enabling technologies. In AGILE Conference on Geographic Information Science, pages 25-27.