The Construction and Implementation of a Marine Environment Geographic Information System (GIS)

Tesfazghi Ghebreegziabeher 1*, Leo Van Biesen 2, Patrice Yamba 3
1,2,3 Vrije Universiteit Brussel / Department ELEC, Pleinlaan 2, 1050 Brussels, Belgium

Introduction

The area of interest is the Guayaquil Estuary (Ecuador, South America). It is described by spatial and temporal data covering three distinct aquatic environments: the estuary-water-column zone, the estuary-floor-surficial zone and the estuary-bottom zone (undifferentiated sedimentary deposits). Integrated multidisciplinary groups have been involved in the data collection and measurements with the aim of constructing a GIS for this particular area of interest [11]. The complex data contributed by the disciplines include:

- Sedimentology data, intended to explain the distribution of sediments within the estuary environment and their relationship with related measured parameters.
- Biological data, intended to decipher the spatial distribution of the diverse species.
- Physical measurements, intended to describe the speed and direction of the water current with respect to temporal variation.
- Chemistry data, intended to explain the spatial dissemination of chemicals and their relationship with the three hosting submarine environments.
- Pre-existing geological data, compiled and integrated to describe the rock types and the geological setting, which is considered the provenance of the estuarine sedimentary deposits.
- Structural data, which help to explain the tectonic history and trends of the estuary channels.

The heterogeneity and diversity of the data and the complexity of the environment required a data model that can store the data, link them to the GIS application and convert them into information that can be retrieved, analysed, displayed spatially and visualised in two or three dimensions.
The GIS data model (the external model) was built on an RDBMS and eventually integrated with the special internal system entities generated by MicroStation GeoGraphics (Bentley). The process includes the acquisition, definition, integration, association and analysis of the varied phenomena that are represented as samples or measured data in the GIS.

The Data Model

The design of a GIS starts from the notion that everything, whether dynamic or static, has a position that pins down its exact occurrence on the earth. Vector models are preferable to raster models for assigning a distinct position to data measured or collected on the surface of the earth [10]. The location is described using a co-ordinate system characterised by a specific projection system. The co-ordinates indicate the position of the alphanumeric attribute data attached to an object (point, line) in a GIS database record.

On this basis, the entities that best describe the environment were selected. The entities are composed of attributes drawn from the disciplines listed above. The candidate entities representing the multidisciplinary environment GIS were Location, Field Data, Campaign Date and Measurement Results. The last of these amalgamates the results of all the disciplines, which were expected to relate to the same entities of the constructed GIS. Amalgamating the per-discipline measurement results was required because they share the same presentation and classification: each record is identified by mmcode, and p1, p2, p3 are informative attribute names belonging to a specific entity (e.g. Sedimentology). Repeating similar informative attributes in separate entities creates unnecessary layers of entities, inflates the system, slows the retrieval process [5] and requires multiple joins and aliases to identify unambiguous georeferenceable records within the GIS.
This problem was avoided by normalisation, which removes redundant attributes without affecting the content and context of the data (Figure 1).

[Figure 1 content: "The Issue of Normalization in a Multi Discipline GIS Data". The entities Eng, Chm, Bio and Sed each hold the redundant attributes mmcode | p1 | p2 | p3. If normalized, they are replaced by a single entity Measurement_Results with the attributes mmcode | p1 | p2 | p3 | discpcode.]

Figure 1: Normalisation of the Multidisciplinary GIS Attribute Entities

The Integration of the GIS Building Block Entities

First, the dimension entities (user defined object entities, UDO), which store data related to the geographic objects at the centre of the investigation, were selected. Geographic co-ordinates and referencing attributes were defined within these entities. They were connected to the central fact-bearing entities (user defined entities, UDE) and to the system generated entities (SGE) by means of migrating foreign keys. In a GIS project in which complex data are collected, multidisciplinary correlation is required and the different layers of GIS information must be related to one another, integrating the GIS database is a prerequisite [4], [11]. Integration is necessary for understanding the functional dependencies, maintaining system integrity, processing spatial data and, eventually, monitoring the environment.

Figure 2 shows the building block entities that compose the GIS. The entities of the MicroStation GeoGraphics GIS comprise internally linked system tables [7] (block 1), which are relationally linked to the user defined objects (UDO, block 2). The UDO are the 'interlinkers' to the data model (block 3) via the system generated entities (SGE).
The system entities can be created with the Data Definition Language (DDL) of SQL [7], [13] or can be created automatically. The cardinality of the relationships is one-to-one, one-to-many or many-to-one.

[Figure 2 content: Block 1, partial view of the system entities; Block 2, user defined GIS objects ('interlinkers'); Block 3, user defined data model.]

Figure 2: The GIS objects Relationship

Block 1 (System Entities): these entities are generated during the creation of the GIS project.

Block 2 (User Defined Objects): these entities are part and parcel of the external model. The feature definitions, graphical properties and symbology are defined within the GIS application (Figure 3), and the relational constraints and validation are carried out in the external and internal models.

Block 3 (User Defined Entities): the entities at this level are wholly defined by the DDL of SQL. The removal of redundant attributes (normalisation) was based on the entity relationships and used the built-in relational constraint tools of Microsoft Access.

Figure 3: Constructing The GIS Feature Entity

2.1.1 The GeoReferencing Entities

The Location entity is described by the co-ordinate attributes and one or more internal model attributes. For geodetic referencing, the Location entity is used as the primary entity [7]. All of its attributes are mandatory. Depending on the data items and the required visualisation, a real world object can be represented by a two-dimensional point (latitude and longitude) [10] or by a three-dimensional point that adds an elevation attribute to the two-dimensional one (6). Both were tested in this case study; in terms of visualisation, the two-dimensional representation is a subset of the three-dimensional one, for the reasons stated below. A two-dimensional feature has latitude and longitude as mandatory attributes, which implies that the spatial relationships and their visualisation are implemented on a two-dimensional map.
A three-dimensional feature has latitude, longitude and elevation as mandatory attributes, which implies that information retrieval in the GIS uses the third attribute, elevation, to georeference the record and to visualise it in a three-dimensional perspective. A GIS should display real world phenomena in a way that supports the decisions required by the user. This is done by keeping the latitude and longitude fixed and changing the elevation of the GIS record.

Based on the structure of the GIS object entity, a unique record of a GIS database can be divided into three components. This distinguishes a GIS database from a plain RDBMS database: the GI database is composed of three types of data fields.

- Identification fields: attributes used strictly to identify records uniquely and to search, sort and join the entities.
- Informative fields: attributes containing data items that can be used for later retrieval and filtering of information.
- 'Super identification' fields: specific predefined fields that migrate and descend as physical pointers from the internal model. These fields are used to retrieve spatially associated GIS features.

As indicated above, the LOCATION entity can be referenced by one or more combined mandatory key attributes. However, a single attribute is still required that uniquely identifies database records belonging to different fields but to the same associated entity. The station attribute is a key at the relational linkage level, but only as long as its values do not repeat. The OBJID is a pointer attribute that bridges the gap between the internal and external systems and is used to retrieve spatially referenceable GIS information, for the following combined reasons:

- A location can be identified by its station code (station).
- At a single location, data may be collected or measurements made by different disciplines.
- At a location, data may depend functionally on temporal and spatial (x, y, z) attributes.

The presence of the elevation (z) attribute within the LOCATION entity creates a problem when samples or measurements are taken at different depths at a single location. At this point the mandatory attribute station loses its constraining power and cannot be used to produce unambiguous spatial information. The objid and mapid identification attributes therefore take over the task of maintaining system integrity and functionality when the problem is interpreted and visualised using the GIS. The MSLINK and the mapid (MicroStation attributes) [6] take over the unique identification of GIS records and the optimisation of the system. The MSLINK is the id of a real world object (primary key). The mapid of ugmap migrates to the dimension entities Location, Structure, Estuary and Lithology (Figure 2) as a key with enforced referential integrity, and the join extends the GIS information link on the basis of equal attribute types constrained by the same DDL. The Estuary is the main aquatic zone and is characterised by different tributaries and geomorphology. Spatially, the measurement sites are situated within the Estuary. The relationship between Location (for measurement) and Estuary is m:1 (many-to-one).

Integrating and Implementation Processes

In this case study, the GIS database management system is the result of amalgamating an external model and an internal model (the GIS application). It allows the data to be transformed into geographic information (GI) [10]. The data definition statements (CREATE, ALTER) and data manipulation statements (INSERT, UPDATE, DELETE) of SQL are used to define and maintain all the attributes of the GI database [13]. The attributes of a specific entity, for instance the Location entity attributes defined by DDL, are bundled together to form a single database record (row), which can be attached or linked to a geographic object.
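As a minimal sketch of how such keys and the many-to-one link between Location and Estuary could be declared, the fragment below uses Python's built-in sqlite3 module rather than the database products used in the case study. The table and column names follow the entities named in the text, but the column types, the estuary_mslink foreign key column and all inserted values are assumptions made for illustration only.

```python
import sqlite3

# Hypothetical, simplified schema. Names follow the paper's entities; the
# column types and the estuary_mslink column are assumptions of this sketch.
SCHEMA = """
CREATE TABLE estuary (
    mslink INTEGER PRIMARY KEY,
    mapid  INTEGER NOT NULL,
    name   TEXT    NOT NULL
);
CREATE TABLE location (
    mslink         INTEGER PRIMARY KEY,   -- object id, unique per GIS record
    mapid          INTEGER NOT NULL,
    station        TEXT    NOT NULL,      -- may repeat when depths differ
    eastingkm      REAL    NOT NULL,
    northingkm     REAL    NOT NULL,
    avgdepthm      REAL    NOT NULL,
    estuary_mslink INTEGER NOT NULL REFERENCES estuary(mslink)  -- many-to-one
);
"""


def build_database():
    """Create an in-memory database with referential integrity switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only if enabled
    conn.executescript(SCHEMA)
    return conn


def demo():
    """Insert illustrative rows and report (row count, orphan rejected?)."""
    conn = build_database()
    conn.execute("INSERT INTO estuary VALUES (1, 4, 'Guayaquil Estuary')")
    # Illustrative values: the same station sampled at two depths.
    # The station code repeats, so only MSLINK keeps the records unique.
    conn.execute("INSERT INTO location VALUES (10, 4, '18', 619.0, 9749.0, 11.6, 1)")
    conn.execute("INSERT INTO location VALUES (11, 4, '18', 619.0, 9749.0, 5.6, 1)")
    try:
        # Estuary 99 does not exist, so this row must be rejected.
        conn.execute("INSERT INTO location VALUES (12, 4, '2', 625.3, 9763.4, 3.2, 99)")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    count = conn.execute("SELECT COUNT(*) FROM location").fetchone()[0]
    return count, rejected
```

Calling demo() stores the two records that share a station code and refuses the record that points to a non-existent estuary, which is the behaviour expected from an enforced referential integrity key.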
The limitation encountered is that the DDL of a relational database cannot define a bit, OLE or hypertext data type that can be linked to GIS features, unless other applications such as GeoMedia (Intergraph) or the optional spatial Oracle server are used. The same attribute names may occur in different entities of the same GIS. Such attributes are qualified by an alias related to the entity name, which removes the restriction that attribute names must be unique across entities. One of the system entities (ugtable_cat) contains a field in which the table alias of each GIS object is defined. The form Entity.AttributeName therefore fully qualifies an attribute name during data retrieval [7].

GIS Integrity constraints

The location object is identified by an objid (MSLINK and MAPID) and a STATION. The former are pointer keys from the system entity UGMAP, and the latter is part of the candidate key of the user defined entity. Entities exchange GIS information by means of their keys. The relational linkage is determined during the design by choosing attributes whose properties, as defined by DDL, are of the same type. The relationship between Location, Field Data and Measurement_Results is verified as follows. If values of the Station attribute recur because an elevation attribute is present at the feature object definition level, then the Location key station loses its constraining power and the principal object keys remain the MSLINK and mapid. The Measurement_Results entity, on the other hand, always contains its user-defined key mmcode besides the migrating keys mentioned above. The same holds for the Field Data entity. Establishing entity relationships helps to avoid update and delete anomalies, and thereby unnecessary loss and modification of the information asset.
For instance, there should be no data in the Measurement_Results entity that cannot be geodetically referenced in the Location entity by means of the MSLINK and MAPID. This rule upholds the justifiability of the data and safeguards the integrity of the GIS information. Enabling the cascade update and cascade delete tools of the database can also keep the information synchronised [13], [12]. Each value of the matching attribute must possess the same type and should have the same value. For example, if the migrating primary key of the object entity Location is deleted, then every record mapped to it through the relationship with the results entity is deleted as well. Cascade update is an option that keeps the system synchronised. Cascade deletion preserves referential integrity at the cost of potentially massive deletion. Both must therefore be used with circumspection.

Composing and populating the GIS

Populating Attribute data

Populating the GIS database is the phase carried out after the implementation and before the retrieval of GIS information. Generally, two types of data populate the structured GIS: attribute data and geographic objects. The database is first loaded with alphanumeric attribute values, with or without associated bit or image data, for GI retrieval. This step requires flat files prepared so that they can be imported or loaded into the structured entity. In the case study, loading was done using macros that detect a structured flat file and load its attribute values [12], [13]. The structure of the flat file must conform to the structure of the GIS entity. Incoming data are constrained by the validation rules defined together with the entity structure. Any incoming data that violate the validation rules are rejected by the system and do not enter the GIS.

Populating Geographic Objects

It was not possible to use ANSI SQL to load geographic co-ordinates and produce features (points, lines or polygons).
This was done using external programs or subroutines [8] and feature loader programs (e.g. Intergraph Feature Loader), because geographic objects possess properties and symbology that cannot be defined by SQL. The populating process includes converting the textual latitude and longitude values (e.g. 02:45:34, 80:23:45) to numerical values using an SQL (Structured Query Language) program and then passing them to a loading program. MBE [8] was used to generate and populate the required GIS map with the required object symbology and properties, which were eventually associated with the GIS database records. A sub procedure that loads GIS objects (points and circles), each with its specific dimension, at their respective co-ordinates and elevation values is illustrated as follows.

    Sub main
        ' Points: colour 7, style 0, weight 3
        MbeSendKeyin "place point"
        MbeSendCommand "ACTIVE COLOR 7"
        MbeSendCommand "ACTIVE STYLE 0"
        MbeSendCommand "ACTIVE WEIGHT 3"
        MbeSendKeyin "xy=8955.85,299.977,4.9"
        MbeSendKeyin "xy=9009.5,372.7955,4.7"
        ' …
        ' Circles: colour 10, style 0, weight 2
        MbeSendKeyin "place circle radius"
        MbeSendCommand "ACTIVE COLOR 10"
        MbeSendCommand "ACTIVE STYLE 0"
        MbeSendCommand "ACTIVE WEIGHT 2"
        MbeSendKeyin "xy=8955.85,299.977,4.9"
        MbeSendKeyin "xy=9009.5,372.7955,4.7"
        ' …
        MbeSendReset
    End Sub

The result of the program is displayed in a newly opened 3D seed file [3]. No topology clean-up is required. The geographic objects must be referred to a particular projection system so that the GIS information is geodetically referenced. The case study used the SA-1956 ellipsoid and the Mercator projection system [9]. The elements populated to and associated with a feature must be topologically clean. The thematic type of each defined element must be consistent with the feature set-up. The elements should be importable into and exportable to GIS projects that share the same working directories. These objects act as an intermediary between the internal and external models to perform the required spatial data retrieval.
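The co-ordinate conversion step described above was done with an SQL program in the case study; the sketch below shows the same arithmetic in Python purely for illustration. The function names dms_to_decimal and keyin_for_point are hypothetical, the hemisphere sign is deliberately not handled, and the generated string only mimics the xy= key-in format of the listing.

```python
def dms_to_decimal(text):
    """Convert a 'dd:mm:ss' string to unsigned decimal degrees.

    Assumption of this sketch: the hemisphere (N/S, E/W) is not encoded
    in the string, so no sign is applied.
    """
    degrees, minutes, seconds = (float(part) for part in text.strip().split(":"))
    return degrees + minutes / 60.0 + seconds / 3600.0


def keyin_for_point(x, y, z):
    """Build a key-in string in the 'xy=x,y,z' form used by the loading listing."""
    return f"xy={x},{y},{z}"


def convert_pair(lat_text, lon_text):
    """Convert a textual latitude/longitude pair to a tuple of decimal degrees."""
    return dms_to_decimal(lat_text), dms_to_decimal(lon_text)
```

For the example values quoted in the text, convert_pair("02:45:34", "80:23:45") gives approximately 2.7594 and 80.3958 decimal degrees, which could then be projected and passed to a loading program.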
The geometric co-ordinates, origins and data linkage information of all the features are stored within the GIS application (Figure 4). Table 1 shows the x and y co-ordinates and the calculated average measured depth in meters.

Figure 4: The Generated Geometric Shapes

The attributes are stored in the structured entities, which include a unique identifier for the corresponding spatial object together with the relevant attribute groups. The unique spatial object identifier links the attribute data to the corresponding spatial data [7]. Commonly, the attributes of spatial entities include spatial values such as area and perimeter, which can be derived from the geometric representation. Every geometric shape contains a centroid to which the associated database records are attached.

    MSLINK  mapid  Station  EastingKm  NorthingKm  AvgDepthm  Area       Perimeter
    10      4      18       619.031    9749.013    11.607     7.869953   11.268
    11      4      19       621.742    9747.027    5.586      6.009852   9.806
    12      4      2        625.346    9763.399    3.202      7.23916    10.778
    13      4      20       617.491    9753.064    11.512     19.091859  17.48
    14      4      3        626.588    9761.006    2.233      7.23916    10.778
    15      4      4        628.344    9770.452    1.601      7.23916    10.778
    16      4      5        624.575    9756.538    5.68       7.23916    10.778
    17      4      6        626.99     9750.894    4.507      19.091859  17.48

Table 1

Spatial Analyses and Results

The GIS entities were structured in such a way that the content of the multidisciplinary entities can be correlated spatially to solve, or suggest a solution to, a selected problem. The following problems were addressed:

- Locating potential sites of discipline-specific anomalous concentrations of the measured parameters or analysed attribute results.
- Delineating the dimensional extent of measured parameter values that disclose abnormal concentrations, and suggesting a solution to that problem.
- Superimposing the different processed data or information layers to observe which attributes share common environmental problems; for instance, an area could be classified as a toxic zone based on the measured concentrations of different parameters.
- Relating the geological structures and the distribution of the different lithologies, as a source of the different sediment types deposited within the marine environment, and determining whether the estuary channels are related to the tectonic structure of the area.

The components of a GIS information layer include attribute values belonging to multidisciplinary categories and the geodetically referenceable locations of the data, represented by geographic features such as points and polygonal shapes at randomly measured locations. The interpretation uses the GIS as a spatial technology tool by incorporating specific criteria such as the temporal attributes (dd-mmm-yyyy and hh-mm-ss), the measured depth and the analysis results of various parameters pertaining to a specific discipline. The nature of the environment can be understood only if the scientific results in the GIS can be filtered with a user-friendly and interactive language, such as SQL, and then integrated to visualise the environmental impact.

SQL (Structured Query Language) is a powerful RDBMS language used for data definition and manipulation, and for sorting, inserting, updating and filtering attribute values in a structured and associated GIS database [1], [13], [12]. It is flexible and powerful as long as it is applied to the purpose it is designed for. SQL's data manipulation language (DML) can be used to retrieve, insert and modify data, in addition to its querying capabilities such as select, project, join and embedding. The consistency and integrity of the query results depend on the robustness of the data model and of the designed GIS.
The display, review, location and creation of topologically resymbolised and spatially associated attribute information is done with the built-in commands of the GIS application [6], [7].

Anomalous Zone of Pesticides (endrin)

Basing the geographic information on detailed processed data, and selecting specific conditions that help to identify the problem and suggest a solution, is as important as the problem being investigated. Environmental impacts can be determined using GIS only if detailed and relevant criteria are selected. The following query retrieves the locations with anomalous concentrations of the pesticide endrin, which is banned in some countries.

    SELECT * FROM LOCATION
    WHERE STATION IN (
            SELECT STATION FROM RESULTS
            WHERE PARAMETER = 'endrin'
              AND PRACT_VALUE > (SELECT AVG(PRACT_VALUE) FROM RESULTS WHERE PARAMETER = 'endrin')
              AND avgdepthm > (SELECT AVG(avgdepthm) FROM location WHERE avgdepthm BETWEEN 0 AND 1))
      AND (LOCATION.AREA > 5);

The query searches the measured locations for the pesticide endrin, retrieves the values and checks whether each exceeds the average measurement over the stations; the depth is then compared with the stated average depth per location. If both conditions are TRUE, the dimensional extension around the centroid associated with the shape (the area) is compared with the threshold. The result is displayed as a table. Graphic queries, review, locate, annotation, spatial topology creation, zoning and display are performed with the GIS application tools.

The area affected by above-average concentrations of endrin according to the specified criteria lies in the north-east of the area of interest. Below-average concentrations are distributed along the east-west channel (zone 1), which implies a lower concentration, probably due to the distance from the source of the pesticide being investigated (Figure 5). The concentration in zone 1 is lower than in zone 2.
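The anomaly criteria can be tried out on a toy database. The sketch below is a simplified variant of the query in the text (it keeps the above-average and area conditions but omits the depth condition), run with Python's built-in sqlite3 module. The two-table schema, the station labels A, B, C and all values are made up for illustration and are not case study data.

```python
import sqlite3


def above_average_stations(conn, parameter, min_area):
    """Stations whose value for `parameter` exceeds that parameter's mean
    and whose associated shape area exceeds `min_area`."""
    sql = """
        SELECT l.station
        FROM location AS l
        WHERE l.area > ?
          AND l.station IN (
              SELECT r.station
              FROM results AS r
              WHERE r.parameter = ?
                AND r.pract_value > (SELECT AVG(pract_value)
                                     FROM results
                                     WHERE parameter = ?)
          )
        ORDER BY l.station
    """
    return [row[0] for row in conn.execute(sql, (min_area, parameter, parameter))]


def demo():
    """Build a toy database with invented values and run the anomaly filter."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE location (station TEXT PRIMARY KEY, area REAL);
        CREATE TABLE results  (station TEXT, parameter TEXT, pract_value REAL);
    """)
    conn.executemany("INSERT INTO location VALUES (?, ?)",
                     [("A", 7.2), ("B", 19.1), ("C", 3.0)])
    conn.executemany("INSERT INTO results VALUES (?, ?, ?)",
                     [("A", "endrin", 0.2), ("B", "endrin", 0.9),
                      ("C", "endrin", 1.0), ("A", "lead", 50.0)])
    return above_average_stations(conn, "endrin", 5)
```

In this toy data the mean endrin value is 0.7, so stations B and C are above average, but only B also has an area larger than 5; demo() therefore returns the single station B. Passing the parameter name and the area threshold as bound values keeps the same function usable for other parameters such as lead.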
Figure 5: The Pesticide Anomalies Delineated Zones

Lead Anomalous Zone Delineation

The SQL spatial analysis below is based on the following criteria:

    SELECT * FROM LOCATION
    WHERE STATION IN (
            SELECT STATION FROM RESULTS
            WHERE PARAMETER = 'lead'
              AND PRACT_VALUE > (SELECT AVG(PRACT_VALUE) * 1.5 FROM RESULTS WHERE PARAMETER = 'lead')
              AND SAMP_DEPTH_M > (SELECT AVG(SAMP_DEPTH_M) FROM RESULTS WHERE SAMP_DEPTH_M > 5))
      AND (LOCATION.AREA > 1);

The lead concentration must be more than one and a half times the average, and the sampled depth in meters must be greater than the average depth, computed only over depths greater than 5 meters. The area of influence, which is predetermined, is larger than 1 km2. Eight spatial locations are displayed, three of which are characterised by a depth greater than 10 m. The lead is distributed mainly in the north-west of the study area (Figure 6). This suggests that the spatial dissemination of lead decreases at depths greater than 10 m; the higher anomalous dissemination of lead is related to shallow depths.

Figure 6: The Heavy Metals (Lead) Anomalies Delineated Zones

The Sediments Profile

    select * from location
    where maxdepth between 3 and 11
      and station in (select station from results
                      where parameter = 'sand' and pract_value > 56)
    UNION
    select * from location
    where maxdepth between 3 and 11
      and station in (select station from results
                      where parameter = 'ppddt' and pract_value between 0.01 and 2);

Here the sand content is expressed in % and the ppddt concentration in ng/g. Refer to Figure 7.

[Figure 7 content: The Measured Sub Marine Morphology; Sand% > 56; vertical exaggeration = 1:1000 m.]

Figure 7: Sediment distribution

Figure 7 shows the estuary-bottom zone of contaminated sediment deposits, where the analysed sand content is greater than 56%. The cross-sectional view of the area depicts the eastern flank of the measurement position.

Conclusions

This research work has shown that the construction of a feature-based GIS involving multidisciplinary, heterogeneous marine environment data requires a robust data model.
This enhances the integrity of the GIS data, its interoperability, the consistency of the retrieved spatial information and the optimality of the system. Furthermore, the implementation and lasting exploitation of the GIS depend on the definition of the conceptual schema and on the established relational integrity constraints. A hybrid and flexible querying methodology, involving SQL and graphical tools, made it possible to retrieve spatial and temporal information. Eventually it was possible to delineate and correlate zones that display abnormal concentrations of the measured and analysed parameters (pesticides, contaminated sediments) and to demarcate zones of higher and lower bio-species occurrence at different locations. Visualising the marine environment on a 3D map gives a submarine environmental view (profile) by exaggerating the vertical extension.

Acknowledgements

The research work on the construction and implementation of a Marine Geographic Information System (GIS) has been funded by ABOS (Belgian Agency for Co-operation and Development), VLIR (Flemish Inter-university Council) and the EC Programme MAST 3 (contract no. MAS3-CT97-0100). The authors gratefully acknowledge the ABOS and the EU MAST Office for their financial support.

References

[1] Egenhofer, M., 1989. Spatial query languages. PhD Thesis, University of Maine, USA. In: Int. J. of Geographic Information Systems, 1992, vol. 6, no. 2, 71-55.
[2] Ghebreegziabeher, T., 1992. The evolution of coastal plains, south-west Belgium. MSc Thesis, Vrije Universiteit Brussel, Fundamental and Applied Quaternary Geology program, 66 p.
[3] Molenaar, M., Rikkers, R., Stuiver, J. A Query Oriented Implementation of a topologic data structure for 3 dimensional vector maps. In: Int. J. Geographic Information Systems, 1994, vol. no 3, 234-260.
[4] Ghebreegziabeher, T., 1995. The Design and Development of a Geo_Database Application in Geo-data Processing. Vrije Universiteit Brussel, Department of Computer Science (Informatica), 64 p.
[5] Date, C.J.
1995. An Introduction to Database Systems, 6th ed. Addison Wesley, 839 p.
[6] Ghebreegziabeher, T., Peirlnckx, L., and Van Biesen, L., 1996. Design and Development of Marine Geographical Information Systems (GIS). ELEC (Vrije Universiteit Brussel).
[7] Bentley (ed.), 1996. MicroStation GeoGraphics. Bentley Systems, Incorporated, Pennsylvania.
[8] Bentley (ed.), 1996. MicroStation Basic. Bentley Systems, Incorporated, Pennsylvania.
[9] Bentley (ed.), 1997. MicroStation GeoCoordinator. Bentley (Mizar) Systems, Inc., Pennsylvania.
[10] Jones, C. B., 1997. Geographic Information Systems and Computer Cartography. Addison Wesley, 289 p.
[11] Van Biesen, L., Cisneros, Z., Yamba, P., Ghebreegziabeher, T., Peirlnckx, L., Tackx, M., Tores, F., Roose, P., Gomez, H., Wartel, S., and Vincx, M., 1998. Development of a multi-disciplinary geographical information system (GIS) of the Marine Environment in view of the monitoring and modeling of the Guayas estuary and the Estero Salado (Ecuador). In: Proceedings of the OCEANS '98 conference, Nice.
[12] Simpson, A., et al., 1997. Mastering Access 97, fourth ed.
[13] ORACLE, 1995. SQL*Plus user's guide and other manuals (Oracle Corporation).