Geographic Information Systems (GIS), SGO1910 & SGO4930, Spring 2004
Lecturer: Karen O'Brien (karen.obrien@cicero.uio.no)
Seminar leader: Gunnar Berglund (gunnarbe@student.sv.uio.no)

Geographic Databases

A GIS can answer the question: What is where?
• WHAT: characteristics of attributes or features.
• WHERE: location in geographic space.

A GIS links attribute data and spatial data.

[Figure: A GIS links attribute data (flat file, relations) with map data (point file, line file, area file, topology).]
[Figure: Flat file database: a table of records, each holding a value for every attribute.]
[Figure 3.4: Arc/Node Map Data Structure with Files. Polygon "A" is bounded by two arcs through 13 numbered points. The points file stores an x,y coordinate pair for each of points 1 to 13; the arcs file lists arc 1 as points 1,2,3,4,5,6,7 and arc 2 as points 1,8,9,10,11,12,13,7; the file of arcs by polygon lists polygon A as arcs 1,2, together with its area and attributes.]

What is a Data Model?
A data model is a logical construct for the storage and retrieval of information. Attribute data models are needed for the DBMS. DBMS data models originate in computer science.
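The arc/node structure in Figure 3.4 can be sketched in Python as three lookup tables: points, arcs, and arcs by polygon. This is a minimal illustration, not the file layout of any particular GIS. The coordinates are hypothetical (the figure only labels them "xy"), and the helper names `polygon_ring` and `polygon_area` are invented for this example. The polygon boundary is rebuilt by chaining its arcs end to end, reversing an arc when it is stored in the opposite direction.

```python
# Hypothetical coordinates: the figure labels points 1-13 only as "xy".
POINTS = {
    1: (0.0, 0.0), 2: (0.0, 1.0), 3: (0.0, 2.0), 4: (1.0, 2.0),
    5: (2.0, 2.0), 6: (3.0, 2.0), 7: (3.0, 0.0), 8: (0.0, -1.0),
    9: (1.0, -1.0), 10: (1.5, -1.0), 11: (2.0, -1.0), 12: (2.5, -1.0),
    13: (3.0, -1.0),
}
# Arcs file: each arc is an ordered list of point IDs; first and last are nodes.
ARCS = {1: [1, 2, 3, 4, 5, 6, 7], 2: [1, 8, 9, 10, 11, 12, 13, 7]}
# File of arcs by polygon.
POLYGON_ARCS = {"A": [1, 2]}


def polygon_ring(poly_id):
    """Chain a polygon's arcs end to end into one closed ring of point IDs."""
    arc_ids = POLYGON_ARCS[poly_id]
    ring = list(ARCS[arc_ids[0]])
    for arc_id in arc_ids[1:]:
        arc = ARCS[arc_id]
        if arc[0] == ring[-1]:        # arc continues in its stored direction
            ring.extend(arc[1:])
        elif arc[-1] == ring[-1]:     # arc must be traversed backwards
            ring.extend(reversed(arc[:-1]))
        else:
            raise ValueError(f"arc {arc_id} does not connect to the ring")
    if ring[0] != ring[-1]:
        raise ValueError(f"polygon {poly_id} is not closed")
    return ring


def polygon_area(poly_id):
    """Area from the ring's coordinates using the shoelace formula."""
    coords = [POINTS[p] for p in polygon_ring(poly_id)]
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(coords, coords[1:]))
    return abs(twice_area) / 2.0


print(polygon_ring("A"))
print(polygon_area("A"))
```

Because each arc is stored only once, a neighbouring polygon could reference the same arc instead of repeating its coordinates; this sharing is what makes the structure topological.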
Definitions
• Database: an integrated set of data on a particular subject.
• Geographic (= spatial) database: a database containing geographic data on a particular subject for a particular area.
• Database Management System (DBMS): software to create, maintain and access databases.

A DBMS contains:
• a data definition language
• a data dictionary
• a data-entry module
• a data update module
• a report generator
• a query language

Advantages of Databases
• Avoid redundancy and duplication
• Reduce data maintenance costs
• Separate applications from the data
• Allow applications to persist over time
• Support multiple concurrent applications
• Improve data sharing
• Allow security and standards to be defined and enforced

Disadvantages of Databases
• Expense
• Complexity
• Performance, especially for complex data types
• Integration with other systems can be difficult

Characteristics of DBMS (1)
• Data model support for multiple data types; e.g., MS Access supports Text, Memo, Number, Date/Time, Currency, AutoNumber, Yes/No, OLE Object, Hyperlink and Lookup Wizard
• Loading of data from files, databases and other applications
• Indexing for rapid retrieval

Characteristics of DBMS (2)
• Query language: SQL
• Security: controlled access to data, with multi-level groups
• Controlled update using a transaction manager
• Backup and recovery

Role of DBMS
• Geographic Information System tasks: data load, editing, visualization, mapping, analysis
• Database Management System tasks: storage, indexing, security, query

Data Retrieval
Data retrieval is the ability of the DBMS or GIS to return, on demand, data that were previously stored. Geographic search is the key to GIS data retrieval, yet many forms of data organization are incapable of geographic search. A GIS either has an embedded DBMS or links to a commercial DBMS.

Types of DBMS Model
• Hierarchical
• Network
• Relational (RDBMS)
• Object-oriented (OODBMS)
• Object-relational (ORDBMS)

Historically, databases were structured hierarchically in files...
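Indexing and controlled update through a transaction manager can be tried out with Python's built-in `sqlite3` module. The sketch below is a hypothetical example (the `parcels` table, the index name and the owner names are invented): it defines a table and an index, checks that SQLite's query planner names the index for a lookup by owner, and runs two statements as one transaction. When the second statement fails, the whole transaction is rolled back, so the first statement's update is undone as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Data definition: a table plus an index for rapid retrieval by owner.
conn.execute("CREATE TABLE parcels (parcel_no INTEGER PRIMARY KEY, owner TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_parcels_owner ON parcels (owner)")

with conn:  # the connection's context manager commits on success
    conn.executemany("INSERT INTO parcels VALUES (?, ?)",
                     [(1, "Owner A"), (2, "Owner B")])

# EXPLAIN QUERY PLAN names the index when SQLite decides to use it.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT parcel_no FROM parcels WHERE owner = ?", ("Owner B",)
).fetchall()
uses_index = any("idx_parcels_owner" in row[-1] for row in plan)

# Controlled update: both statements belong to one transaction.
try:
    with conn:  # rolls back if any statement raises
        conn.execute("UPDATE parcels SET owner = 'Owner C' WHERE parcel_no = 1")
        conn.execute("INSERT INTO parcels VALUES (2, 'Duplicate')")  # duplicate key
except sqlite3.IntegrityError as err:
    print("transaction rolled back:", err)

rows = conn.execute("SELECT parcel_no, owner FROM parcels ORDER BY parcel_no").fetchall()
print(uses_index, rows)
```

After the rollback, parcel 1 still belongs to "Owner A": either every statement in a transaction takes effect, or none does.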
[Figure: Example of a hierarchical structure: Norge (Norway) at the top, counties (Oppland, Akershus, Hordaland) at the next level, and municipalities in Akershus (Bærum, Asker, Ski) below.]

Relational DBMS
Data are stored as tuples (pronounced "tup-el"), conceptualized as tables.
Table: data about a class of objects
• A two-dimensional list (array)
• Rows = objects
• Columns = object states (properties, attributes)

Relation Rules
• Only one value in each cell (the intersection of a row and a column)
• All values in a column are about the same subject
• Each row is unique
• Column sequence has no significance
• Row sequence has no significance

In object terms, a table is an object class, a row is an object, and a column is a property. Object classes with geometry are called feature classes.

Relational Join
• The fundamental query operation
• Table joins use common keys (column values)
• The table (attribute) join concept has been extended to the geographic case

[Figure: Relational Data Bases: separate files (a patient record, a purchase record, an accident report and a file of addresses), each with its own attributes, linked through a common key value such as 42.]

Most DBMSs are now relational. A relational database is based on multiple flat files of records, with dissimilar attribute structures, connected by a common key attribute.

Retrieval Operations
• Searches by attribute: find and browse
• Data reorganization: select, renumber and sort
• Compute: create new attributes based on calculated values

Spatial Retrieval Operations
Attribute queries are not very useful for geographic search. In a map database the records are features. The spatial equivalent of a find is locate; the GIS highlights the result. Spatial equivalents of the DBMS queries locate sets of features or build new GIS layers.

The Retrieval User Interface
GIS query is usually by command line, batch, or macro.
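A relational join on a common key, together with the compute and sort operations listed above, can be sketched in SQL through Python's `sqlite3` module. The check-in and check-out dates and room numbers follow the patient record on the slide (read as US month/day/year dates); the two-table layout and the patient names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stay data follow the slide's patient record; the names are hypothetical.
    CREATE TABLE stay (patient_id INTEGER, check_in TEXT, check_out TEXT, room TEXT);
    CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO stay VALUES (42, '1996-02-01', '1996-02-04', 'N763'),
                            (78, '1996-02-03', '1996-02-04', 'N712');
    INSERT INTO patient VALUES (42, 'Patient A'), (78, 'Patient B');
""")

# Join on the common key, compute a new attribute (nights) and sort on it.
rows = conn.execute("""
    SELECT p.name, s.room,
           CAST(julianday(s.check_out) - julianday(s.check_in) AS INTEGER) AS nights
    FROM stay AS s
    JOIN patient AS p ON p.patient_id = s.patient_id
    ORDER BY nights DESC
""").fetchall()
print(rows)
```

Because row sequence has no significance in a relation, the `ORDER BY` clause is what makes the output order predictable.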
Most GIS packages use the GUI of the computer's operating system to support both a menu-type query interface and a macro or programming language. SQL is a standard interface to relational databases and is supported by many GISs.

SQL
Structured (Standard) Query Language, pronounced "sequel"
• Developed by IBM in the 1970s
• Now the de facto and de jure standard for accessing relational databases
• Three types of usage: stand-alone queries, high-level programming, and embedding in other applications

Types of SQL Statements
• Data Definition Language (DDL): create, alter and delete data structures, e.g. CREATE TABLE, CREATE INDEX
• Data Manipulation Language (DML): retrieve and manipulate data, e.g. SELECT, UPDATE, DELETE, INSERT
• Data Control Language (DCL): control the security of data, e.g. GRANT, CREATE USER, DROP USER

Spatial Relations
• Equals: the geometries are the same
• Disjoint: the geometries share no points
• Intersects: the geometries share at least one point
• Touches: the geometries meet at a common boundary, but their interiors do not intersect
• Crosses: the geometries cross (for example, two lines crossing at a point, or a line passing through a polygon)
• Within: one geometry lies completely inside the other
• Contains: one geometry completely contains the other (the inverse of Within)
• Overlaps: geometries of the same dimension overlap
• Relate: tests for intersections between the interiors, boundaries or exteriors of two geometries

Spatial Methods
• Distance: shortest distance between two geometries
• Buffer: geometric buffer
• ConvexHull: smallest convex polygon containing a geometry
• Intersection: points common to two geometries
• Union: all points in either geometry
• Difference: points of the first geometry that are not in the second
• SymDifference: points in either, but not both, of the input geometries

Spatial Search
Buffering is spatial retrieval around points, lines, or areas based on distance. Overlay is a spatial retrieval operation that is equivalent to an attribute join.

[Figures: identify, recode and data overlay; types of overlay operations (AND, OR, MAX, MIN); buffers in raster and vector form.]

Complex Retrieval: Map Algebra
Combinations of spatial and attribute queries can build complex and powerful GIS operations, such as weighting.
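Buffer search and raster overlay come down to simple computations. The sketch below uses plain Python with hypothetical data. `within_distance` finds point features within a buffer distance of a location (a vector buffer search restricted to points), and `overlay` combines two equally sized raster grids cell by cell, using AND, OR, MAX, MIN or a weighted sum as the combining operation. The function names, layer names and weights are invented for illustration.

```python
import math


def within_distance(points, center, radius):
    """Return the IDs of points whose distance to `center` is at most `radius`."""
    cx, cy = center
    return sorted(pid for pid, (x, y) in points.items()
                  if math.hypot(x - cx, y - cy) <= radius)


def overlay(grid_a, grid_b, op):
    """Combine two equally sized raster grids cell by cell with `op`."""
    return [[op(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(grid_a, grid_b)]


# Hypothetical point features and a buffer search around (0, 0).
wells = {"w1": (0, 0), "w2": (3, 4), "w3": (6, 8)}
near = within_distance(wells, (0, 0), 5)   # w2 lies exactly 5 units away

# Hypothetical binary raster layers: 1 = condition met, 0 = not met.
steep = [[1, 0], [1, 1]]
forest = [[0, 0], [1, 1]]

and_grid = overlay(steep, forest, lambda a, b: a & b)
or_grid = overlay(steep, forest, lambda a, b: a | b)
max_grid = overlay(steep, forest, max)
min_grid = overlay(steep, forest, min)
# Map algebra weighting: slope counts twice as much as forest cover.
weighted = overlay(steep, forest, lambda a, b: 2 * a + b)

print(near, and_grid, or_grid, weighted)
```

For binary 0/1 layers, MAX and MIN give the same result as OR and AND; they differ once layers hold graded values, which is where weighting becomes useful.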
Summary
• Database: an integrated set of data on a particular subject
• Databases offer many advantages over files
• Relational databases dominate

Part II: Working with Attributes in ArcGIS

Issues to discuss:
• how attribute data is stored in a table of rows and columns
• how attribute data is associated with features
• tabular field types supported in ArcGIS
• types of table relationships
• how tables can be related to each other
• how to join tables based on a common field

Review
A geographic database contains both spatial and tabular data. The spatial data contain feature shape and location information, while the tabular data contain the attributes of the features. Feature attributes are often spread across multiple tables.

Anatomy of a Table
Each table in a database has the same basic format: an array of rows and columns. Rows are also called records, and columns are also called fields.

Some tables, such as a feature class's default feature attribute table, have a preset number of columns. For instance, a polygon coverage's feature attribute table has four standard columns (Area, Perimeter, Coverage#, and Coverage-ID), while a line shapefile's feature attribute table has only one default column, named Shape. Other tables are completely user-defined.

In the example table, three columns were added by the user: Name, Country, and Population. ArcMap automatically adds a further column (FID) for display purposes. The name of this column depends on the type of data source: for example, it is called FID for a coverage or shapefile, OBJECTID for a geodatabase feature class, and Order_ID for a grid.

Because some databases and some operations do not support field names containing blanks, you should avoid creating such fields. In addition, every column in a table must have a unique name, although columns in the same table can have different formats.

NOTE: Norwegian characters (æ, å, ø) can also create problems, as can decimal formats (10,1 versus 10.1).
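The naming advice above can be turned into a quick check before a table is created. This is a hypothetical helper, not an ArcGIS tool: the "safe name" rule (an ASCII letter followed by ASCII letters, digits or underscores) and the case-insensitive duplicate check are assumptions chosen to catch blanks, æ/ø/å and repeated names. `parse_decimal` shows one way to read values written with either a decimal comma or a decimal point, assuming no thousands separators.

```python
import re

# Assumed rule for a "safe" name: an ASCII letter, then ASCII letters,
# digits or underscores only (so no blanks and no æ, ø, å).
SAFE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def check_field_names(names):
    """Return a list of problems found in a proposed list of field names."""
    problems = []
    seen = set()
    for name in names:
        if not SAFE_NAME.fullmatch(name):
            problems.append(f"unsafe name: {name!r}")
        if name.upper() in seen:  # names differing only in case count as duplicates
            problems.append(f"duplicate name: {name!r}")
        seen.add(name.upper())
    return problems


def parse_decimal(text):
    """Read '10,1' or '10.1' as 10.1 (assumes no thousands separators)."""
    return float(text.strip().replace(",", "."))


print(check_field_names(["Name", "Country", "Population"]))  # []
print(check_field_names(["Land use", "Høyde", "NAME", "name"]))
print(parse_decimal("10,1"), parse_decimal("10.1"))
```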
Tabular data field types
Tables can store dates, numbers, and text, but most tabular formats offer several field types for storing this information, and the available field types vary between tabular formats. Choosing the best field type for the values to be stored is an important consideration. The field types supported in ArcCatalog™ include short integer, long integer, float, double, text, date, object ID, and blob.

Information stored in tables is organized by fields and field types. When defining a table's fields, be aware that each database has its own rules about which names and characters are permitted.

ArcGIS Tabular Formats
ArcGIS supports multiple formats for storing and managing tabular data. Each of ArcGIS's primary spatial formats has its own native tabular format: coverages use INFO-formatted tables, shapefiles store their attributes in dBASE (.dbf) format, and geodatabases rely on the format of their supporting RDBMS (Oracle, for example). Deciding on the proper format for storing attribute information is an important part of database design and can affect how efficiently you can access feature attributes.

To facilitate sharing data in different formats, ArcCatalog and ArcToolbox contain tools to convert between the various tabular formats. In addition, some formats, such as the coverage, can link to independent tables regardless of their format. Tabular information can therefore be stored in a variety of formats: in the example, feature information is stored in the coverage feature attribute table, data about the owners are stored in dBASE format, and tax information is stored in a relational database format.
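Choosing a field type is largely a question of range and precision. The sketch below is a hypothetical helper, not part of ArcGIS: it assumes that a short integer is a 2-byte signed integer and a long integer a 4-byte signed integer, and suggests the smallest of SHORT, LONG, DOUBLE or TEXT that holds all the given values (dates and blobs are left out). `as_float32` shows why float (single precision) can be too coarse: a value is rounded when squeezed into 4 bytes.

```python
import struct

SHORT_RANGE = (-32_768, 32_767)                # assumed 2-byte signed integer
LONG_RANGE = (-2_147_483_648, 2_147_483_647)   # assumed 4-byte signed integer


def suggest_field_type(values):
    """Suggest the smallest of SHORT, LONG, DOUBLE or TEXT that holds all values.
    Dates and blobs are not handled in this sketch."""
    if not values:
        raise ValueError("need at least one value")
    if all(isinstance(v, int) for v in values):
        lo, hi = min(values), max(values)
        if SHORT_RANGE[0] <= lo and hi <= SHORT_RANGE[1]:
            return "SHORT"
        if LONG_RANGE[0] <= lo and hi <= LONG_RANGE[1]:
            return "LONG"
        return "DOUBLE"  # beyond 32 bits; integers above 2**53 lose precision
    if all(isinstance(v, (int, float)) for v in values):
        return "DOUBLE"
    return "TEXT"


def as_float32(x):
    """Round-trip x through 4-byte single precision, as a FLOAT field would store it."""
    return struct.unpack("f", struct.pack("f", x))[0]


print(suggest_field_type([12, 300, -5]))   # SHORT
print(suggest_field_type([40_000]))        # LONG
print(suggest_field_type([1.5, 2]))        # DOUBLE
print(suggest_field_type(["Oslo", 3]))     # TEXT
print(as_float32(123456.789))              # not exactly 123456.789
```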
Associating Tables
Because features often have many attributes, most database design guidelines recommend organizing a database into multiple tables, each focused on a specific topic, instead of one large table containing all the necessary fields. This scheme is more efficient because it eliminates duplicate information: each piece of information is stored only once, in one table. Tables can be "connected" so that when you need information that isn't in the current table, you can access it from an associated table.

Two tables can be connected if each contains a field with common values. Each table must have at least one field containing a unique value for each record; in database terminology, this field is called the primary key. Even if all the other fields contain duplicate values, the primary key ensures that each row is unique. Row uniqueness is important when connecting two tables, because you want to be sure that the correct rows are matched. As a general rule, you connect tables from the primary key field in one table to the common field (called the foreign key) in the other table.

In the zoning example, the ZONE_CODE field exists in both tables, contains common values in each, and has a unique value in each row of the zoning code table, so the tables can be connected on this field. In both tables, ZONE_CODE holds the same values: codes for zoning types. The zoning code table also contains a description of each code. The descriptions are not stored in the feature attribute table, but users will want to access them often, so the tables are connected to make the zoning descriptions easy to reach.

Table Relationships
In ArcMap, you can connect two tables using either a join or a relate. To know which method to use, you need to know how individual records in each table relate to one another.
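The primary key to foreign key connection described above can be expressed directly in SQL. The sketch below uses Python's `sqlite3` module with hypothetical zone codes and descriptions; only the field name ZONE_CODE comes from the text. It joins each parcel to its zoning description and shows that, once foreign key checking is switched on, SQLite rejects a parcel whose code has no matching primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE zoning  (ZONE_CODE TEXT PRIMARY KEY, DESCRIPTION TEXT);
    CREATE TABLE parcels (PARCEL_ID INTEGER PRIMARY KEY,
                          ZONE_CODE TEXT REFERENCES zoning (ZONE_CODE));
    INSERT INTO zoning  VALUES ('R1', 'Residential'), ('C1', 'Commercial');
    INSERT INTO parcels VALUES (1, 'R1'), (2, 'R1'), (3, 'C1');
""")

# Primary key (zoning.ZONE_CODE) matched against foreign key (parcels.ZONE_CODE).
rows = conn.execute("""
    SELECT p.PARCEL_ID, p.ZONE_CODE, z.DESCRIPTION
    FROM parcels AS p JOIN zoning AS z ON z.ZONE_CODE = p.ZONE_CODE
    ORDER BY p.PARCEL_ID
""").fetchall()
print(rows)

# A parcel whose code has no matching primary key is rejected.
try:
    conn.execute("INSERT INTO parcels VALUES (4, 'X9')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

Each description is stored once in `zoning`, yet every parcel can reach it through the join. This is the duplication saving the text describes.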
You need to know whether one or more than one record in the first table is associated with one or more than one record in the second table. There are four possible relationship types (also called cardinalities): one-to-one, one-to-many, many-to-one, and many-to-many.

Cardinality
Cardinality is a property of a relationship between objects in a database, describing how many objects of type A are associated with how many objects of type B. For example, one parcel can have one owner (one-to-one), one parcel can have many owners (one-to-many), many parcels can have one owner (many-to-one), or many parcels can have many different owners (many-to-many).

Connecting tables with joins
You can connect two tables in ArcMap using a join. Joins work with shapefiles, coverages, and geodatabase feature classes. Once the tables are connected, you can query, symbolize, and analyze your data based on the joined values.

Table joins are designed for one-to-one and many-to-one relationships; for other cardinalities you should use a relate instead. If you join two tables with one-to-many or many-to-many cardinality, only the first matching record for each key value is joined, and all later matches are omitted.

The names of the common fields need not be identical, but the fields must be of the same type (e.g., text, date, float). In the ArcMap Join Data dialog, you specify which tables to join and which fields contain the matching values.

Joined tables are not permanently connected. The fields from one table are appended to the other, and the source table name appears in each field name, so you can tell which table a field comes from. Table joins are "virtual": the two tables still exist as separate entities, and you can remove a join whenever you want.
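The "first match only" behaviour is easy to overlook, so here is a plain-Python sketch that mimics it. `join_first_match` is a hypothetical helper, not ArcMap's implementation. It appends the fields of the first matching record, prefixes them with the join table's name, and leaves the target table unchanged, as a virtual join would. The data are invented; the key fields are deliberately named differently in the two tables, since the text notes that common field names need not be identical.

```python
def join_first_match(target, join_table, target_key, join_key, join_name):
    """Append fields from `join_table` to each record of `target`.

    Only the first join record per key value is used, so with one-to-many
    data the later matches are silently dropped. Joined fields are prefixed
    with `join_name`, echoing how the source table name shows in field names.
    """
    first_by_key = {}
    for rec in join_table:
        first_by_key.setdefault(rec[join_key], rec)  # keep the first match only
    joined = []
    for rec in target:
        merged = dict(rec)  # the target table itself is left unchanged
        match = first_by_key.get(rec[target_key])
        if match is not None:
            for field, value in match.items():
                merged[f"{join_name}.{field}"] = value
        joined.append(merged)
    return joined


# Hypothetical one-to-many data: parcel 1 has two owners.
parcels = [{"Parcel_no": 1}, {"Parcel_no": 2}]
owners = [{"PARCEL": 1, "Owner": "Owner A"},
          {"PARCEL": 1, "Owner": "Owner B"},
          {"PARCEL": 2, "Owner": "Owner C"}]

result = join_first_match(parcels, owners, "Parcel_no", "PARCEL", "owners")
for row in result:
    print(row)
# "Owner B" does not appear anywhere in the result.
```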
Connecting tables with relates
Another way to connect tables in ArcMap is to create a relate. Like a join, a relate defines a relationship between two tables based on a common field. Unlike a join, a relate does not append the fields of one table to the other. Instead, you access data in the related table by selecting records in one table and viewing the related records in the other. You relate tables instead of joining them when there is a one-to-many or many-to-many relationship between them.

For example, the Attributes of parcels table and the Owners table have a one-to-many relationship (a parcel may have more than one owner), so the two tables are related on the Parcel_no field. Selecting vacant parcels in the attribute table then selects the records with matching parcel numbers in the related table.

In ArcMap, you create a relate in the Relate dialog by choosing the tables to relate and the field in each on which the relate is based. To access data in a related table, open one of the tables and select the records whose related records you want to display. Click Options, point to Related Tables, and click the name of the relate you want to access. The related table displays with the related records selected. It doesn't matter which table you open, because table relates in ArcMap are bi-directional.

If your data are involved in both joins and relates, the order in which you create them matters. If you join another table to a table that has a relate, the relate is removed. If you create a relate on a joined table, the relate is removed when the join is removed. As a rule of thumb, create your joins first, then add your relates.

Join
Suppose you have a parcels feature attribute table and another attribute table that contains the names of parcel owners. Consider first a one-to-one relationship.
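A relate can be pictured as "select here, follow the key, select there". The plain-Python sketch below models the parcels/owners example with hypothetical records. Only the Parcel_no field comes from the text; the Landuse field, the owner names and the helper `related_records` are invented. Because the same function works from either table, it also illustrates why a relate is bi-directional.

```python
def related_records(selected, other_table, key):
    """Return the records in `other_table` whose `key` matches a selected record."""
    keys = {rec[key] for rec in selected}
    return [rec for rec in other_table if rec[key] in keys]


# Hypothetical data; parcel 101 has two owners (one-to-many).
parcels = [{"Parcel_no": 101, "Landuse": "Vacant"},
           {"Parcel_no": 102, "Landuse": "Residential"},
           {"Parcel_no": 103, "Landuse": "Vacant"}]
owners = [{"Parcel_no": 101, "Owner": "Owner A"},
          {"Parcel_no": 101, "Owner": "Owner B"},
          {"Parcel_no": 102, "Owner": "Owner C"},
          {"Parcel_no": 103, "Owner": "Owner D"}]

# Select vacant parcels, then follow the relate to the owners table.
vacant = [p for p in parcels if p["Landuse"] == "Vacant"]
vacant_owners = related_records(vacant, owners, "Parcel_no")
print([o["Owner"] for o in vacant_owners])

# The same relate works in the other direction.
owner_b = [o for o in owners if o["Owner"] == "Owner B"]
print(related_records(owner_b, parcels, "Parcel_no"))
```

Unlike the join sketch, nothing is lost here: both owners of parcel 101 are returned, because no fields are being squeezed into a single row.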
Each parcel has only one owner, so each record in the feature attribute table relates to exactly one record in the owners table. For this type of relationship, you would use a table join.

Next, consider one-to-many and many-to-one relationships. If one parcel can have many owners, the relationship is one-to-many; if many parcels can have one owner, the relationship is many-to-one. For a one-to-many relationship you should create a table relate; for a many-to-one relationship you should create a table join.

Finally, many features can relate to many records in the other attribute table: in this situation, many parcels could have many owners. This is the most complex relationship and can be difficult to manage. For this type of relationship, associate the two tables using a table relate.

Other items to discuss:
• sort, calculate, and freeze data in a table
• create summary statistics
• edit feature attributes using the Attributes dialog
• use the Field Calculator to update attribute values
• create a graph
• create a report

There are a number of ways to display attribute data. You can change the way data are displayed in a table, and you can take data out of the table and display them as statistics, or in a graph or report.

Access and edit attribute data in ArcMap
To edit feature attributes, you start an editing session. During an editing session, you can view the attributes of selected features by clicking the Attributes button to open the Attributes dialog. The dialog has two parts: a tree view on the left that lists each selected feature, and a pane on the right that shows the attribute values of the feature currently selected in the tree view. If you select features from more than one layer, each layer that contains selected features is listed in the tree view.
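The decision rule above (join for one-to-one and many-to-one, relate for one-to-many and many-to-many) can be sketched as two small functions. They are hypothetical helpers, not ArcGIS features. `cardinality` only inspects the key values currently in the tables, treating a key that repeats on one side as "many" on that side. Real cardinality is a rule about the data model: a table with no repeats today may still allow them, so treat the result as a hint.

```python
from collections import Counter


def cardinality(first_keys, second_keys):
    """Classify how two key columns relate, judged from the data present:
    a key that repeats on one side means 'many' on that side."""
    first_many = any(n > 1 for n in Counter(first_keys).values())
    second_many = any(n > 1 for n in Counter(second_keys).values())
    return {(False, False): "one-to-one",
            (False, True): "one-to-many",
            (True, False): "many-to-one",
            (True, True): "many-to-many"}[(first_many, second_many)]


def connection_method(card):
    """Join for one-to-one and many-to-one; relate otherwise."""
    return "join" if card in ("one-to-one", "many-to-one") else "relate"


# Parcels (first) linked to owners (second) by parcel number: parcel 1 has two owners.
card = cardinality([1, 2, 3], [1, 1, 2, 3])
print(card, connection_method(card))
```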
You can access individual selected features by expanding their layer.

You can edit the attribute values of a single feature by clicking next to the field name and typing the new value. To change a value for all the selected features at once, click the layer name and type the new value next to the field you want to update. For example, the OWNER values for all the selected homes can be changed to "Alvi Contracting" in one step.

Table manipulation
You can manipulate tables to change how data are viewed. Right-clicking any field brings up a context menu with many choices. You can sort the records on a selected field in either ascending or descending order; both numeric and character fields can be sorted. You can also calculate the values of selected records using the Field Calculator, which updates all the values, or a selected set of values, in a field at one time.

Using the Field Calculator
In ArcMap, you can edit feature attributes by creating simple calculations or logical expressions in the Field Calculator. The Field Calculator works on the selected features, or on all features in a layer if none are selected.

To open the Field Calculator, first start an editing session, then open the desired layer's attribute table by right-clicking the layer name in the Table of Contents and choosing Open Attribute Table. When the table displays, right-click the field you want to update and choose Calculate Values.

To create an expression in the Field Calculator, you combine fields, functions, and operators. As you click fields and functions, they appear in the expression box; you can also type directly into the box.
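The Field Calculator rule "selected features, or all features if none are selected" can be modelled in a few lines of Python. `calculate_field` is a hypothetical helper, and the table rows (FID, AREA, VALUE, OWNER) are invented. The sketch performs a bulk update of the OWNER field on a selection, applies a VALUE = AREA * 1.5 expression with nothing selected, and then sorts the rows in descending order, as the field context menu does.

```python
def calculate_field(rows, field, expression, selected_fids=None):
    """Set `field` to expression(row) on the selected rows, or on every row
    when the selection is empty (mirroring the Field Calculator rule above)."""
    selection = set(selected_fids or [])
    for row in rows:
        if not selection or row["FID"] in selection:
            row[field] = expression(row)
    return rows


# Hypothetical attribute table.
homes = [{"FID": 0, "AREA": 10.0, "VALUE": 0.0, "OWNER": "Owner A"},
         {"FID": 1, "AREA": 4.0, "VALUE": 0.0, "OWNER": "Owner B"},
         {"FID": 2, "AREA": 6.0, "VALUE": 0.0, "OWNER": "Owner C"}]

# Update only the selected features (FIDs 1 and 2).
calculate_field(homes, "OWNER", lambda row: "Alvi Contracting", selected_fids=[1, 2])

# Nothing selected: the expression VALUE = AREA * 1.5 is applied to every row.
calculate_field(homes, "VALUE", lambda row: row["AREA"] * 1.5)

# Sort in descending order of AREA.
for row in sorted(homes, key=lambda r: r["AREA"], reverse=True):
    print(row)
```

Note that the rows are updated in place, much as the calculator writes directly into the table. Sorting, by contrast, only changes the order in which rows are shown.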
For example, an expression can update the VALUE field with the result of multiplying each feature's area by 1.5.

Freeze or unfreeze a column
Freezing a column locks it as the left-most column in the table view. You can then use the horizontal scroll bar to see the other columns: when you scroll, the frozen column stays in place while all the other columns move. A frozen column is easy to identify because a thick black line separates it from the other columns in the table.

Calculating statistics
After selecting features on the map or records in a table, you may want to calculate simple summary statistics for the data. To do so, choose Statistics from ArcMap's Selection menu. After you select the layer and the field of the feature attribute table for which you want statistics, a list of summary statistics and a frequency distribution chart are displayed in the Selection Statistics window.

Graphs in ArcGIS
Values for ArcGIS graphs come directly from feature attribute tables. You can present your data and analysis results using many styles of graphs, both two- and three-dimensional. Some graphs are better than others at presenting certain kinds of information, so consider carefully what you want to present before choosing a graph style.

Once you've created a graph, you can add it to a map in ArcMap's Layout View. On the layout, a graph becomes a graphic element that you can size and position as desired. However, a graph placed on a layout is static: changes to its source table are not reflected in the graph.

Reports
A report presents tabular information about map features (from a feature attribute table) in an attractive format. You can choose which fields from the table to display and how to display them.
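The summary statistics and frequency distribution shown in a selection statistics window can be reproduced with Python's `statistics` module. This is a sketch with hypothetical values. The text does not say whether ArcMap reports the population or the sample standard deviation, so the choice of `pstdev` (population) here is an assumption. Likewise, the equal-width classes in `frequency_distribution` are one simple way to build a frequency chart, not necessarily ArcMap's.

```python
import statistics


def summary_statistics(values):
    """Count, minimum, maximum, sum, mean and (population) standard deviation."""
    return {
        "Count": len(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Mean": statistics.mean(values),
        "Standard Deviation": statistics.pstdev(values),
    }


def frequency_distribution(values, bins):
    """Count values in `bins` equal-width classes from the minimum to the maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1  # avoid a zero width when all values are equal
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last class
        counts[index] += 1
    return counts


areas = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values of the selected features
print(summary_statistics(areas))
print(frequency_distribution(areas, 3))
```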
Once you've created a report, you can place it on your map layout with your geographic data or save it as a file for distribution. A report can include a title, page numbers and the current date, summary statistics, images, borders and, of course, the data from the feature attribute table.

Displaying your data in a report allows you to organize them. You can sort records based on the values in one or more fields; given a list of cities, for example, you could sort them by total population. You can also group records: for example, you could group cities by country, making it easy to see which city has the largest population in a given country. For each group you can calculate summary statistics: sum, average, count, standard deviation, minimum, and maximum.

You can export reports to different file types, including Adobe's Portable Document Format (PDF), Rich Text Format (RTF), or plain text (TXT).
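The grouping, sorting and per-group statistics of a report can be sketched with `itertools.groupby`. The city records below are hypothetical placeholders, and `report_by_country` is an invented helper that returns a dictionary rather than a formatted page. It groups cities by country, lists each group from largest to smallest population, and adds a few of the summary statistics mentioned above.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical city records.
cities = [
    {"Name": "City A", "Country": "Country X", "Population": 500_000},
    {"Name": "City B", "Country": "Country Y", "Population": 120_000},
    {"Name": "City C", "Country": "Country X", "Population": 80_000},
    {"Name": "City D", "Country": "Country Y", "Population": 640_000},
]


def report_by_country(records):
    """Group records by Country, list each group's cities from largest to
    smallest, and add per-group summary statistics."""
    # groupby only merges adjacent records, so sort by the grouping field first.
    ordered = sorted(records, key=lambda r: (r["Country"], -r["Population"]))
    report = {}
    for country, group in groupby(ordered, key=itemgetter("Country")):
        group = list(group)
        pops = [r["Population"] for r in group]
        report[country] = {
            "cities": [r["Name"] for r in group],
            "largest": group[0]["Name"],
            "count": len(pops),
            "sum": sum(pops),
            "average": sum(pops) / len(pops),
            "minimum": min(pops),
            "maximum": max(pops),
        }
    return report


for country, summary in report_by_country(cities).items():
    print(country, summary)
```

Sorting before grouping matters. Without it, `groupby` would start a new group every time the country changed between adjacent records.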